Throttling on deploy?

jjjoooeee · July 16, 2025, 11:38pm

Hi, I have a Java app that’s been running on a shared-cpu-1x@256MB machine for a while now. About 2 weeks ago I deployed a new version and noticed that responsiveness dropped significantly, but it eventually recovered on its own. I did another deploy today and noticed the same thing, so this time I looked at the Grafana metrics.

What I see is that the “balance” blue line drops from 8.33 minutes to ~0 within 1 minute of the deploy, and then load average, steal and throttle start to go up. Once the balance recovers, the app is usable again, but it’s unclear what to do about deploys resetting the burst balance. I did see the announcements and documentation about the CPU throttling changes, but they don’t seem to address this.
Thanks!

halfer · July 17, 2025, 9:03pm

Would you share the graphs you’re looking at? It may help to look at top or similar inside your app. What kind of load (say req/sec) is your app under? Or does your app produce a higher load during deploy?
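A rough in-process equivalent of checking `top` is to sample the JVM’s own cumulative CPU time and divide by elapsed wall-clock time. This is a minimal sketch assuming a HotSpot-based JDK (it uses `com.sun.management.OperatingSystemMXBean`, which other JVMs may not provide); `CpuSampler` is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;

// Rough in-process stand-in for `top`: samples this JVM's cumulative CPU time
// and turns the delta into "cores in use" (1.0 = one full core busy).
// Relies on com.sun.management.OperatingSystemMXBean (HotSpot-specific).
// CpuSampler is a hypothetical helper, not part of any library.
final class CpuSampler {
    private final com.sun.management.OperatingSystemMXBean os =
        ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);

    /** Cumulative CPU time used by this process in nanoseconds, or -1 if unsupported. */
    long processCpuNanos() {
        return os.getProcessCpuTime();
    }

    /** Average cores used over a window: CPU time consumed / wall-clock time elapsed. */
    static double utilization(long cpuNanosDelta, long wallNanosDelta) {
        if (wallNanosDelta <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return (double) cpuNanosDelta / wallNanosDelta;
    }

    /** Blocks for roughly windowMillis and returns the average utilization over it. */
    double sample(long windowMillis) {
        long cpu0 = processCpuNanos();
        long wall0 = System.nanoTime();
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag; report the partial window
        }
        long cpu1 = processCpuNanos();
        long wall1 = System.nanoTime();
        return utilization(cpu1 - cpu0, wall1 - wall0);
    }
}
```

Calling `sample(1000)` in a loop from a background thread during the first few minutes after a deploy, and logging the result, would show whether startup really runs well above the shared CPU’s baseline.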

jjjoooeee · July 17, 2025, 9:19pm

Fly’s Grafana dashboard → “Instance load and CPU” section → “CPU Quota Balance and Throttling” graph.

Requests per second are practically negligible: there have been fewer than 200 requests since yesterday’s deploy. There may be a higher load on startup from JVM JIT compilation and dependency injection/reflection, but I’m not sure how that would burn through 8 minutes of balance in less than a minute when actual CPU utilization was under 50% according to the same Grafana chart.
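A back-of-the-envelope check of that puzzle, as a sketch under assumptions that are not Fly’s documented accounting: treat the balance as seconds of full-core CPU time (8.33 min ≈ 500 s), assume shared-cpu-1x has a baseline of 1/16 of a core, and assume the balance drains at (utilization − baseline) per second. `BalanceDrain` is a hypothetical helper.

```java
// Back-of-the-envelope: how long should a CPU burst balance last?
// ASSUMPTIONS (not Fly's documented formula): balance is counted in seconds of
// full-core CPU time, it drains by (utilization - baseline) every second, and
// shared-cpu-1x has a baseline of 1/16 of a core (6.25%).
final class BalanceDrain {
    static final double SHARED_1X_BASELINE = 1.0 / 16; // assumed baseline share of one core

    /** Seconds until the balance reaches zero, or +Infinity if usage is at or below baseline. */
    static double secondsToEmpty(double balanceSeconds, double utilization, double baseline) {
        double netDrain = utilization - baseline; // CPU-seconds lost per wall-clock second
        if (netDrain <= 0) {
            return Double.POSITIVE_INFINITY;      // balance never drains (it would accrue instead)
        }
        return balanceSeconds / netDrain;
    }

    /** Average utilization (in cores) needed to empty the balance within windowSeconds. */
    static double utilizationToEmpty(double balanceSeconds, double windowSeconds, double baseline) {
        return baseline + balanceSeconds / windowSeconds;
    }
}
```

Under those assumptions, 50% utilization would take 500 / 0.4375 ≈ 1143 s (about 19 minutes) to empty a 500 s balance, and emptying it within 60 s would need roughly 8.4 cores of usage, which a single vCPU can’t supply. So consumption alone does not seem to explain the drop.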

jjjoooeee · July 17, 2025, 10:07pm

and zoomed in a little more:

halfer · July 17, 2025, 10:22pm

Things I’d do here:

I’d first redeploy temporarily with a better CPU to understand what’s happening here, especially if the graph you show can be reproduced on every deploy. I also thought about that CPU throttling post myself.

Secondly, I wonder if there are options you can supply in Java to slow down the initial CPU usage. I think you can also use the JVM to record what is happening in that ten-minute window (either on your current machine type, or the temporarily improved one). I don’t use Java, but as a random guess, could the garbage collector be doing unnecessary stuff?
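One way to record that window is JDK Flight Recorder, which ships with OpenJDK 11 and later. It can be started from the command line with `-XX:StartFlightRecording=duration=600s,filename=startup.jfr`, or programmatically as in this sketch; the resulting `.jfr` file can be opened in JDK Mission Control to see CPU load, GC and JIT activity. The event names are standard JFR events, while `StartupRecorder`, the 20 ms sampling period and the file name are illustrative choices.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;

// Sketch: capture the startup window with JDK Flight Recorder from inside the app.
// Call StartupRecorder.start(...) as early as possible in main().
// StartupRecorder is a hypothetical helper; the jdk.jfr API is standard (JDK 11+).
final class StartupRecorder {
    static Recording start(Path destination, Duration window) throws IOException {
        Recording r = new Recording();
        r.enable("jdk.CPULoad").withPeriod(Duration.ofSeconds(1));         // JVM and machine CPU load
        r.enable("jdk.GarbageCollection");                                 // each GC and its duration
        r.enable("jdk.Compilation");                                       // JIT compilations
        r.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20)); // where Java threads spend CPU
        r.setDestination(destination); // data is written here when the recording stops
        r.setDuration(window);         // stop automatically after the window
        r.start();
        return r;
    }
}
```

Because a duration and destination are set, the recording writes itself out after the window without further calls; calling `stop()` earlier also writes the file.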

It might be worth experimenting with a different JVM, in case there is a more efficient one available.

Finally, it may be worth looking at your app, or its dependencies, to see whether they do any work at startup that you don’t know about.

mayailurus · July 17, 2025, 10:39pm

This is a known shortcoming of the balance system, from what Fly.io has said in the past:

(A deploy counts as a “new version” in this context, as I understand it.)
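To illustrate why a reset hurts a JVM app in particular, here is a toy token-bucket model. It is not Fly’s scheduler: it assumes a baseline of 1/16 of a core, a balance counted in CPU-seconds, that a new machine version starts with an empty balance, and a made-up 120 CPU-seconds of startup work. `StartupBurstSim` is a hypothetical name.

```java
// Toy token-bucket model (NOT Fly's actual scheduler): how long does a fixed
// amount of startup CPU work take on one shared vCPU, depending on the burst
// balance the machine starts with? Assumptions: balance in CPU-seconds, usage
// above baseline is paid for from the balance, one tick = one second.
final class StartupBurstSim {
    /** Wall-clock seconds needed to finish workCpuSeconds of CPU work. */
    static long secondsToFinish(double workCpuSeconds, double startBalance,
                                double baseline, double maxBalance) {
        double remaining = workCpuSeconds;
        double balance = startBalance;
        long seconds = 0;
        while (remaining > 1e-9) {
            // At most one full core; above baseline only while the balance lasts.
            double cap = Math.min(1.0, baseline + balance);
            double used = Math.min(remaining, cap);
            remaining -= used;
            // Unused baseline accrues (up to the cap); usage above baseline drains it.
            // used <= baseline + balance, so the balance never goes negative.
            balance = Math.min(maxBalance, balance + baseline - used);
            seconds++;
        }
        return seconds;
    }
}
```

With those made-up numbers, `secondsToFinish(120, 500, 1.0 / 16, 500)` finishes in 120 s, while `secondsToFinish(120, 0, 1.0 / 16, 500)` takes 1920 s (32 minutes), because without a balance the warm-up work only ever gets the baseline slice.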

halfer · July 17, 2025, 10:49pm

Sure, but isn’t the real problem that something is consuming CPU at deploy, and if that can be tackled, the non-ideal CPU throttling would be academic?

mayailurus · July 17, 2025, 11:22pm

Hm… I think both are worth looking at, to be honest. (“Do less at startup” is always good advice, after all.)

jjjoooeee · July 18, 2025, 6:58pm

Thanks for the pointer @mayailurus – I saw the threads but missed the part where it talked about “machine versions” (or I just didn’t equate “new deploy” with “machine version”). Either way, the behavior is at least explained, if not ideal. I mean, I guess I could completely rewrite my app in Rust instead of running it on the JVM…

@btoews, is there any plan for the “hopefully change in the future” part of the comment? Could the CPU Performance page in the Fly Docs be updated to talk about deployments and versions? It currently only talks about applications and machines…

halfer · July 19, 2025, 12:34pm

If your initial CPU throttling comes from warming up the JIT compilation cache, maybe in the short term you can detune it so it does less work there? If you’re running at tiered-compilation level 3 or 4, maybe you can drop it down one or two levels?
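A minimal sketch for checking whether that tuning helps, assuming a HotSpot-based JVM such as OpenJDK: it reads the active `TieredStopAtLevel` value (HotSpot’s default is 4, i.e. full C2 compilation; `-XX:TieredStopAtLevel=1` stops at the C1 compiler, which typically lowers warm-up CPU at some cost to peak throughput) and totals JIT and GC time so far. `WarmupReport` is a hypothetical helper.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

// Sketch for comparing startup runs with different -XX:TieredStopAtLevel values.
// HotSpot-specific (uses com.sun.management). WarmupReport is a hypothetical helper.
final class WarmupReport {
    /** Current value of -XX:TieredStopAtLevel (HotSpot's default is 4). */
    static int tieredStopAtLevel() {
        HotSpotDiagnosticMXBean hs = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Integer.parseInt(hs.getVMOption("TieredStopAtLevel").getValue());
    }

    /** Approximate accumulated JIT compilation time in ms, or -1 if not available. */
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // null if no JIT
        return (jit != null && jit.isCompilationTimeMonitoringSupported())
            ? jit.getTotalCompilationTime()
            : -1;
    }

    /** Accumulated GC time in ms across all collectors. */
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t; // -1 means "undefined" for that collector, so skip it
            }
        }
        return total;
    }

    static String summary() {
        return String.format("TieredStopAtLevel=%d jit=%dms gc=%dms",
            tieredStopAtLevel(), jitMillis(), gcMillis());
    }
}
```

Logging `WarmupReport.summary()` a minute or so after startup, once with default flags and once with a lower level, gives a rough before/after comparison to set against the throttling graph.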

system · July 26, 2025, 12:34pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.