We’ve slowly been rolling out changes to how CPU quotas are enforced for machines with shared vCPUs. As of today, these changes are fully enabled. For the vast majority of users, this results in more predictable performance from machines, but some users’ CPU usage is throttled as a result.
The rollout of these changes has been tracked in the Predictable Processor Performance thread, but it’s super long and hard to follow. So, I’m posting this new thread to make sure folks are aware of today’s changes.
The TL;DR is that shared vCPUs are allowed to use 1/16th of a CPU core whereas performance vCPUs are allowed to use an entire CPU core. We allow a balance of unused vCPU time to be accumulated and spent in bursts to lessen the impact of throttling for bursty applications. More details can be found in the docs as well as in the other thread.
Don’t you think there’s a big gap between 1/16 and 16/16? Like something in-between would be nice to have. My workload is very bursty and gets throttled every time with shared CPUs, but with performance I barely hit 50% load (the workload is a mix between network, CPU and disk).
Would you consider offering some shared CPUs with more allocation? Maybe something like half of a CPU or 33%.
It’s quite likely that we’ll add other vCPU options in the future. In the meantime, you could get 50% CPU by using a shared-cpu-8x but only running your app with a single thread.
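To make the 50% figure concrete, here is a minimal sketch of the arithmetic. The 1/16 and full-core shares come from the posts above; the function names, and the assumption that the quota is pooled across a machine's vCPUs while a single thread is capped at one core, are illustrative only and not Fly.io's implementation.

```python
from fractions import Fraction

# Per-vCPU shares as described in the thread.
SHARE_PER_VCPU = {
    "shared": Fraction(1, 16),
    "performance": Fraction(1, 1),
}


def machine_quota_cores(kind: str, vcpus: int) -> Fraction:
    """Total sustained CPU allowance for the machine, in cores."""
    return SHARE_PER_VCPU[kind] * vcpus


def single_thread_ceiling(kind: str, vcpus: int) -> Fraction:
    """Sustained share of one core that a single-threaded app could use.

    Assumption: the quota is pooled across vCPUs, and one thread can
    never use more than one core.
    """
    return min(Fraction(1), machine_quota_cores(kind, vcpus))


for size in (1, 2, 4, 8):
    pct = float(single_thread_ceiling("shared", size)) * 100
    print(f"shared-cpu-{size}x: {pct:.2f}% of one core for a single thread")
```

Under these assumptions a shared-cpu-8x works out to 8 × 1/16 = 1/2 of a core, which is where the 50% comes from.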
After the rollout our apps stopped working. I had to move from 1x with 512MB RAM (which worked fine before) to 8x (which doesn’t allow for less than 2GB RAM) just to get them running again…
Wow, pretty surprised by this. I’m catching up on the quota changes and from the original thread:
When we say this change won’t be noticeable to the overwhelming majority of our users, we mean it: the guts of this change have been rolled out for weeks, and we’ve been tracking quota usage. A tiny fraction of organizations on Fly.io, which have been benefiting from our lax scheduling, will lose some performance from this change. We’re reaching out to them directly.
Cool - so looks like quota enforcement rolled out this morning. We began throttling and had an outage as a result. Zero communication or indication that we’d be affected until it happened.
What is the expectation of your customers, that we are all reading through every thread in community and checking our workloads against future changes?
We now have an outage too. I thought we’d mitigated this after last time by delaying a short expensive task for 10 minutes after start-up (as a new deploy sees the balance reset), but the rate of accrual now seems slower. I’m trying to mitigate (again) by delaying it by an hour.
Edit: an hour delay only built up a balance under 3.5 minutes despite utilisation being under 6% for pretty much all the time.
So basically, you’re saying that any application that is capable of running more than a single thread cannot safely use shared-cpu-Nx without being constantly throttled?
We were running our production on two shared-cpu-8x machines when the outage hit for no apparent reason (did we miss the official communication or is this community forum the only source of official communication?).
Anyway, the baseline on the CPU load charts was labeled “6.25%” (=1/16 of 100%), so it’s not completely clear why someone would pay for shared-cpu-8x when it works just as well as shared-cpu-1x.
I’m so glad I came across this thread because I’ve been pulling out my hair trying to figure out why I started to see apps break yesterday in my afternoon.
At a high level I think these changes make sense, but I think there’s a critical flaw: certain apps tend to burst quite a bit at boot due to parsing code (Node.js) or warming up a VM (Node.js/V8, Java, etc.). With this change I’m seeing a Node.js app take 10-20 minutes before it’s usable, where previously it took maybe 20-30 seconds, just because the startup sequence is throttled so aggressively out of the gate.
Is it possible to either bless apps with a CPU burst balance right away, or at least honor the balance at a machine level so that a new deployment can come up quickly without starting from a burst balance of 0?
It feels like an anti-pattern to over-provision an app just so it can boot up properly where otherwise the running cpu utilization would fit in a 1x cpu allotment.
This part is expected, I think. The balance accumulated is the difference between the time you were entitled to and the time you actually used, so if the entitlement is much lower then the accumulation rate goes way down, too.
As a simplified example, suppose that you had a shared-1x Machine with a flat 0% CPU usage (just for the sake of argument). Then each 80ms cycle, you would be entitled to 5ms of time but use 0ms, thus accumulating… only 5ms. Over an hour, that would amount to (5/80) × 60min = 3.75min.
(And in the past, that would have been more like (24/80) × 60min = 18min, which is already way above the ~8min ceiling.)
Aside: Personally, I think it’s best to think of these in terms of the actual 80ms cycles—and treat the percentages as just mnemonics and intuition-builders.
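The same arithmetic as a small sketch, for anyone who wants to plug in their own numbers. The 80ms cycle, the 5ms and 24ms entitlements, and the ~8min ceiling are taken from the posts above; the function name, the constant-usage simplification, and treating the ceiling as exactly 8 minutes are assumptions for illustration.

```python
PERIOD_MS = 80  # length of one scheduling cycle, per the post above


def accrued_minutes(entitled_ms, used_ms, wall_minutes, ceiling_minutes=None):
    """Burst balance accrued over wall_minutes at constant per-cycle usage.

    Each cycle saves (entitled - used) ms; nothing accrues when usage
    meets or exceeds the entitlement.
    """
    saved_per_cycle = max(entitled_ms - used_ms, 0)
    accrued = saved_per_cycle * wall_minutes / PERIOD_MS
    if ceiling_minutes is not None:
        accrued = min(accrued, ceiling_minutes)
    return accrued


print(accrued_minutes(5, 0, 60))                        # 3.75 (current entitlement)
print(accrued_minutes(24, 0, 60))                       # 18.0 (older, laxer figure)
print(accrued_minutes(24, 0, 60, ceiling_minutes=8.0))  # 8.0 (capped, assuming an 8min ceiling)
```

Any real usage during the hour lowers the per-cycle saving, so an idle-ish machine would land somewhat below 3.75 minutes.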
@btoews Did Fly ever send out an email PSA for the CPU change? I don’t see many emails from Fly at all, and this Discourse is supposed to be your main form of communication, but folks like @mayailurus and I seem to be more active than Fly itself.
+1 to this. This caused multiple outages of my app because deployments hit the burst quota during Django’s startup process, and it will likely cause untold outages for apps that don’t have uptime reporting. We added uptime reporting two days ago and started seeing outages that previously never occurred (or that we weren’t aware of). What’s interesting is that the two outages we’ve had due to this change probably don’t even register as outages, since there was no “crash”, just a server restarting 10x slower than it did previously.
I don’t have a problem with the burst limitations in theory once an app is up and running. I would not have become a paid member of Fly if it required me to upgrade to a professional tier to host our staging server without outages.
EDIT: Our outage was caused by our deployment strategy, which was previously rolling. You can avoid a deployment outage by switching to bluegreen, which waits for health checks to pass.
EDIT 2: Blue-green deployment isn’t possible in my case due to our use of volumes, so it’s more likely I’ll just migrate off Fly at this rate. I’ve spent thousands on this platform, so this is pretty disappointing to be honest.
Good idea. A lot of apps are like that, especially those based on Java. This reflects my experience with TeamCity Server that has a horrendous startup sequence, but afterwards it works decently well. shared-cpu-4x made it happy though.
I wonder if this would fix it for us too. I have to use 8x just to run the app now even though it used to run fine at 1x, and watching the graphs it has been under 2% CPU usage most of the time today. But startup throttles so hard that it just never boots up right; even after an hour it’s still timing out DB requests etc. because of the nonstop throttling. Once it stabilizes, CPU goes down to under the 1x level.
I’m so pleased I came across this thread. I thought I was going crazy. Our app had been working fine (and still does once it has booted), but all of a sudden has been breaking during deploys.
In short, what used to be a 30-second boot-up now takes 8 minutes with our Django app. Once it’s booted, everything is good and snappy. And that makes sense when looking at the CPU Quota Balance and Throttling graph:
The period of throttling you see is from the machine booting to the Django app loading.
Is this an intentional consequence of the changes? It seems extremely unhelpful that I’ll need to scale up a machine to manage minimal load just so that it doesn’t take 8 minutes to boot.
I also think the way these changes were communicated was fairly poor.
When we say this change won’t be noticeable to the overwhelming majority of our users, we mean it: the guts of this change have been rolled out for weeks, and we’ve been tracking quota usage. A tiny fraction of organizations on Fly.io, which have been benefiting from our lax scheduling, will lose some performance from this change. We’re reaching out to them directly.
Do you still believe this to be accurate? I didn’t think we were particularly out of the ordinary in our usage.
+1, it’d be fair for apps that have saved cpu resources.
@btoews Also, I observe another rough edge regarding slow startup (Node.js app in my case). Throttling seems to take place even after startup, and the app is unusably slow (for like 10 minutes). If I restart the machine right after startup, then the whole second startup becomes as fast as before this change and my app becomes usable right away. Is there some kind of rollover of throttling beyond the 80ms time window?
The rationale for this change has been explained previously, at length, and it’s clearly in keeping with industry standards and not really controversial. However, as we’ve seen again and again, you (Fly) seem incapable of communicating proactively about it, with the end result that your customers are angry at you. What is preventing you from sending us email letting us know when QoS-impacting changes are being made to services? I genuinely do not understand why @kurt isn’t running around with his hair on fire right now. Read the thread! Everyone in here is angry about this (whether they leave or not isn’t for me to say), and there are clearly still some implementation details to be ironed out around startup bursting. But instead of communicating with us about it, you’re just letting people simmer in the replies. Please, start proactively communicating with your customers.
I get the idea behind that change, but I have a couple of comments:
First off, it’d be nice to get some “CPU credits” in advance. Indeed our instances are bursting mostly at boot time like many others have pointed out in this thread. With this change we now have to deploy fewer but bigger instances just in order to deploy them; that’s a shame.
Second, I don’t quite understand what we’re seeing in Grafana:
Here we had an instance with 1 vCPU and the baseline at 6.25%. Fair enough.
But after upgrading to 4 vCPU, I’d expect the baseline to be at 50%? It goes to 50% briefly but falls down to 6.25%. Why is that? And then the usage seems to be throttled based on the 6.25%. Does that look correct to you?
We currently provide a small amount of initial balance, enough to run a machine at 100% for a brief 5-second burst after launch. We may consider adjusting this amount based on feedback, but to make it higher we may also need to add additional restrictions (such as limiting the number of restarts) to prevent potential abuse.
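As a rough sketch of why a 5-second initial balance may not cover a CPU-heavy boot: the 5-second burst and the 1/16 baseline come from this thread, but the two-phase model below (full speed until the burst is spent, then the baseline share, with no further accrual) and the 30 CPU-second boot are simplifying assumptions, not a description of the actual scheduler.

```python
def boot_wall_seconds(work_cpu_s, initial_burst_s=5.0, baseline=1 / 16):
    """Wall-clock seconds to finish work_cpu_s of CPU work on one shared vCPU.

    Simplified model (an assumption): run at 100% until the initial burst
    is used up, then continue at the baseline share.
    """
    burst = min(work_cpu_s, initial_burst_s)
    remaining = work_cpu_s - burst
    return burst + remaining / baseline


# A hypothetical boot that needs 30 CPU-seconds:
print(boot_wall_seconds(30))                        # 405.0 seconds, about 6.75 minutes
print(boot_wall_seconds(30, initial_burst_s=30.0))  # 30.0 seconds if the burst covered the boot
```

Under this model a boot that used to take about 30 seconds stretches to several minutes, which is in the same range as the slowdowns reported earlier in the thread.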
The Grafana queries are divided over the number of vCPUs, so the balance percentage will stay constant but the overall utilization percentage will be lower when you scale to more vCPUs. Hope that clarification helps make sense of what you’re seeing in the graphs.
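One way to read that normalization, as a hedged sketch: if utilization is averaged over the number of vCPUs, the same workload shows a lower percentage on a bigger machine while the per-vCPU baseline stays at 6.25%. The function name and the sample numbers are made up for illustration and are not the actual Grafana queries.

```python
BASELINE_PCT = 100 / 16  # 6.25% per vCPU, constant across machine sizes


def normalized_utilization(per_vcpu_busy):
    """Average busy fraction across all vCPUs, as a percentage."""
    return 100 * sum(per_vcpu_busy) / len(per_vcpu_busy)


# The same workload (a quarter of one core) on two machine sizes:
one_vcpu = normalized_utilization([0.25])
four_vcpu = normalized_utilization([0.25, 0.0, 0.0, 0.0])

print(one_vcpu, BASELINE_PCT)   # 25.0 6.25 -> well above the baseline
print(four_vcpu, BASELINE_PCT)  # 6.25 6.25 -> right at the baseline
```

So a flat 6.25% line on a 4-vCPU machine would still correspond to four times the absolute CPU time of the same line on a 1-vCPU machine, assuming this reading is right.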