I am experiencing an issue with my deployed applications where the APIs experience significant cold start delays. Specifically:
- After a deployment, the first request to the API takes a long time to respond and occasionally fails.
- The health check for the app also experiences delays and sometimes fails on the first run.
- Subsequent requests (2nd, 3rd, etc.) work fine.
- If the API is idle for a few minutes (around 5 minutes), the first request again experiences a delay or fails, showing a repeated cold-start behavior.
This behavior is impacting the reliability of our application, especially for endpoints that need to be responsive immediately after deployment or during idle periods.
> After a deployment, the first request to the API takes a long time to respond and occasionally fails.
This is likely either because the machine takes a bit to spin up and be ready for requests (solution: tune the grace period in your health check) or because on first deploy the machine is stopped; in this case the image is updated but stopped machines aren’t started (let sleeping machines lie).
Also, things definitely work better if you define an explicit HTTP health check as described in that link rather than relying on the implicit one. The implicit check is a trivial TCP "is the process up?" test that doesn't take app startup time into account, so early failures trigger exponential back-off from the Fly proxy - which is exactly the slow-first-request behavior you're seeing.
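For reference, an explicit check with a grace period looks roughly like this in fly.toml (the port, path, and timings below are illustrative assumptions - tune them to your app's actual startup time):

```toml
[http_service]
  internal_port = 8080          # assumed port

[[http_service.checks]]
  grace_period = "30s"          # time allowed for boot before failed checks count
  interval = "15s"
  timeout = "5s"
  method = "GET"
  path = "/healthz"             # hypothetical health endpoint
```

The grace period should comfortably exceed your observed boot time, otherwise the proxy still sees early failures.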
Also, your app is extremely slow to reach the point where it starts serving requests, likely because you're starting a boatload of services on app startup (I spotted redis, nats, and launchdarkly at a quick glance). I'd recommend splitting those off into separate Fly apps and running only your main web process in this one, for faster boot times.
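Even before splitting things out, deferring heavy client construction until first use gets the server listening sooner. A minimal sketch of that pattern (the class and function names here are hypothetical stand-ins, not your actual SDKs):

```python
from functools import lru_cache


class RedisClient:
    """Hypothetical stand-in for a heavy client (redis, nats, launchdarkly...)."""

    def __init__(self) -> None:
        # Imagine a slow network handshake happening here.
        self.connected = True


@lru_cache(maxsize=None)
def get_redis() -> RedisClient:
    """Build the client on first use, then reuse the same instance."""
    return RedisClient()

# The web server can start accepting requests immediately; the first
# request that actually needs Redis pays the connection cost, not boot.
```

This moves the cost off the boot path, so health checks pass sooner; the trade-off is that the first request touching each dependency is slower.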
> If the API is idle for a few minutes (around 5 minutes), the first request again experiences a delay or fails, showing a repeated cold start behavior.
This is because your machines are configured to auto-stop after a few minutes when they are not serving any requests. This saves money because machines aren’t running when nobody is using them, with the downside that indeed, when a request comes in, the machine has to wake up (cold-boot) before it can serve the request.
Properly tuning the grace period helps here too, since it shortens the wait before the wake-up request is answered. But if you absolutely need fast first responses all the time, auto-stop/auto-start can also be turned off entirely. More information here.
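If you go that route, the relevant knobs live in fly.toml and look roughly like this (values are illustrative):

```toml
[http_service]
  internal_port = 8080          # assumed port
  # Either disable stopping entirely...
  auto_stop_machines = "off"
  auto_start_machines = true
  # ...or keep auto-stop but always leave one machine running:
  # auto_stop_machines = "stop"
  # min_machines_running = 1
```

The second variant keeps the cost savings for extra machines while guaranteeing one warm machine for first requests.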
What’s the spec of your machine? I wonder if your lengthy boot-up process is saturating the CPU and triggering throttling, which would slow down your app’s readiness even further.

Also, what stack are you using? We’ve seen feedback here that Java APIs go through some kind of bytecode caching on first boot, which hammers the CPU at start-up and makes the throttling problem worse.