Long cold starts (FLAME)

enoonan · October 1, 2024, 11:58pm

My FLAME cold starts take a really long time - like around 20 seconds. I’m using them to run headless browsers (not super unlike https://worldpagespeed.fly.dev/), and I basically want my FLAME nodes to only do that.

I don’t understand exactly what all factors into cold start boot times. My suspicion is that my main app just has a lot of dependencies that take time to compile, even if they aren’t going to be used by the FLAME node.

Is that true? If so, would it then make sense to just spin up a separate Fly app specifically to run headless browsers? I see that with FLAME.FlyBackend we also have the option of pointing at whatever Docker image we want.

#flame

chrismccord · October 2, 2024, 1:17am

What is the size of your app image? There’s nothing to compile or build, since the runner launches the prebuilt Docker image from the parent. Are you certain the time is the cold start itself and not something app-specific, like the time to load your headless chromedriver process? worldpagespeed cold starts are in the 5-10s range, with that time being spent actually starting chromedriver and driving the browser. You can also look into starting with a warm pool: set min to however many runners you want kept warm, and set min_idle_shutdown_after so the pool can idle down below min when no work is needed. That keeps deploys from causing users to hit a cold pool.
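For reference, a warm pool along those lines might be configured roughly like this. It's only a sketch: the pool name and every number are placeholders, not values recommended anywhere in this thread.

```elixir
# Hypothetical pool settings; MyApp.BrowserRunner and all values are placeholders.
{FLAME.Pool,
 name: MyApp.BrowserRunner,
 # Keep one runner booted so the first request after a deploy skips the cold start.
 min: 1,
 max: 10,
 max_concurrency: 10,
 # Runners above `min` shut down after this much idle time.
 idle_shutdown_after: :timer.seconds(30),
 # The `min` runners may also idle down, after a longer quiet period.
 min_idle_shutdown_after: :timer.minutes(30)}
```

The tradeoff is paying for one idle machine during busy periods in exchange for users rarely hitting a cold pool.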

enoonan · October 2, 2024, 6:34pm

Hey Chris!

I actually chatted with you about this project a bit at ElixirConf. I have borrowed quite a bit from the worldpagespeed repo in setting this up, and I massively appreciate your responding here.

So the size of the app image is 1.08GB. I got that by doing fly auth docker and then pulling the image locally. Not sure there’s another way to see it. Is that a lot?

The actual machine boot time is pretty snappy. I get this message in the logs when I do a cold start:

Machine created and started in 3.514s

But then after that it seems like it takes another ~10 seconds for it to start handling the request. Is that to be expected? WPS always seemed faster to me.

The request processing itself takes about 4 seconds, which is OK and has as much to do with the site being tested as anything else. So the total cold start round trip is hovering in the 18 second range.

Seems like the most likely solution down the road will be to just keep at least one BrowserRunner FLAME node warm at all times.

chrismccord · October 2, 2024, 7:22pm

That image size is reasonable and about what worldpagespeed is. I can’t say where the time is spent, but I would check your app supervision tree to ensure you aren’t waiting on extra services that you don’t need. You can also enable more logging to see if reported FLAME times match what you are experiencing:

    {FLAME.Pool,
     ...
     log: :info}

Then your fly logs will show times like:

syd [info]19:20:24.116 [info] runner connect: completed in 8493ms
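To compare that logged connect time with what a caller actually experiences, one option is to time a no-op call against an idle pool, which leaves out the browser work entirely. A minimal sketch, assuming the pool is named Twfex.BrowserRunner as later in this thread; ColdStartTimer is a hypothetical helper, not part of FLAME:

```elixir
defmodule ColdStartTimer do
  # Hypothetical helper (not part of FLAME): runs a zero-arity function and
  # returns {elapsed_milliseconds, result}.
  def measure_ms(fun) when is_function(fun, 0) do
    # :timer.tc/1 returns {microseconds, value}.
    {micros, result} = :timer.tc(fun)
    {div(micros, 1000), result}
  end
end

# From an IEx session on the parent node, while no runners are booted:
#
#   {ms, :ok} =
#     ColdStartTimer.measure_ms(fn ->
#       FLAME.call(Twfex.BrowserRunner, fn -> :ok end)
#     end)
```

If the measured round trip is much larger than the logged runner connect time, the extra seconds are being spent outside the FLAME boot itself.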

enoonan · October 2, 2024, 8:32pm

Thank you Chris.

This definitely gives me enough visibility to be able to fine-tune this in the future.

For the supervision tree, I’m actually using your children function and have it set like this:

    children(
      always: TwfexWeb.Telemetry,
      parent: {DNSCluster, query: Application.get_env(:twfex, :dns_cluster_query) || :ignore},
      parent: {Phoenix.PubSub, name: Twfex.PubSub},
      # Start the Finch HTTP client for sending emails
      parent: {Finch, name: Twfex.Finch},
      # Start a worker by calling: Twfex.Worker.start_link(arg)
      # {Twfex.Worker, arg},
      flame: Twfex.Browser.HeadlessDriver,
      parent: Twfex.Repo,
      parent: {AshAuthentication.Supervisor, otp_app: :example},
      parent:
        {FLAME.Pool,
         name: Twfex.BrowserRunner,
         min: 0,
         max: 10,
         max_concurrency: 10,
         idle_shutdown_after: :timer.hours(2)},
      # Start to serve requests, typically the last entry
      parent: TwfexWeb.Endpoint
    )

So in theory, the only things running on the FLAME node are Telemetry and the headless driver that the BrowserRunner pool calls into.
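For anyone reading without the worldpagespeed source open, a helper like that can be sketched as a filter over tagged child specs. This is a guess at the shape, not the actual worldpagespeed code; ChildrenSketch and the parent? argument are made up for illustration. In a real application the flag would come from is_nil(FLAME.Parent.get()), which is nil on the parent node.

```elixir
defmodule ChildrenSketch do
  # Hypothetical stand-in for the `children` helper; not copied from worldpagespeed.
  # `parent?` is passed in so the filter can be tried without FLAME loaded.
  def children(specs, parent?) when is_list(specs) and is_boolean(parent?) do
    Enum.flat_map(specs, fn
      # Started on every node.
      {:always, spec} -> [spec]
      # Started only on the parent (web) node.
      {:parent, spec} -> if parent?, do: [spec], else: []
      # Started only on FLAME runners.
      {:flame, spec} -> if parent?, do: [], else: [spec]
    end)
  end
end

# On a runner (parent? == false) only the :always and :flame entries survive:
# ChildrenSketch.children([always: Telemetry, parent: Repo, flame: HeadlessDriver], false)
# => [Telemetry, HeadlessDriver]
```

Note that this sketch starts :flame children only on runners, so local development without a separate runner node would need its own handling.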

system · October 9, 2024, 8:33pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
