Yep. The CPU quotas and bursting no longer apply to performance vCPUs.
This is great to hear, thanks! Definitely simplifies things quite a bit!
Thank you!
Proceeding with the staged rollout, CPU quotas for shared vCPUs are 25% enabled for the next hour. This should take effect within the next 5 minutes.
Update:
Shared machines running above 76.6% CPU should be seeing charts like this
We had to cut this short due to high system load on some servers that might have been related to enforcing CPU quotas. Hopefully the 37 minutes was enough for folks to observe the impact at a 25% enforcement. We’ll keep folks updated on whether the system load issue will impact the rollout schedule moving forward.
Update: we figured out the load spike and will continue with the rollout as planned. The next step is enabling quotas at 25% on Tuesday around the same time.
Thanks. We wrote in earlier but got a negative response; I’ll write in again.
I talked to support and it sounds like we’ve got this worked out with your case.
I’ve had several back-and-forths with support and still haven’t gotten a definitive answer on whether this change will be applied to us in two days’ time, or until when we will be granted a grace period. (I’ve requested one only until the middle of November.) The last response again seemed to imply that we only need to make small changes and that it shouldn’t be a problem.
I’m finding it quite disappointing that the idea that change management requires planning and resources, and that unscheduled changes are disruptive, does not seem to be understood.
This is a critical piece of our infrastructure, impacting thousands of customers as we come up to a critical season for them. Any wobbles will impact their confidence in our platform. We take our commitment to our customers seriously, and I also have to balance that with my financial responsibilities to the company, so I’m not just going to throw some resources at it and see what happens. We need to plan, test, and implement this change methodically.
This shouldn’t be a controversial position, and continuing to push that this is a small change and we should roll with it signals a lack of respect for your customers’ time.
Hey Chris, we did confirm back on 18 October that we were going to work with you on this. That was reiterated on the 22nd. Apologies that we weren’t able to give firmer timelines or implementation details at the time.
That said, I hope that you found my latest email much less ambiguous!
Sorry if that’s the vibe you got. I was just checking to see if you were aware of the update that performance Machines will now be excluded; it wasn’t a request to move workloads to them.
Not sure if this is the best place to post this but I’m giving it a try.
I have seen graphs like this for some of our apps where the balance suddenly drops. Is this a bug or some expected behavior I’m just not aware of? Have you made some adjustments to the balances?
Looking at this app, the drops line up with machine updates. CPU quota balances currently don’t carry across machine versions. That’s something we’ll hopefully change in the future though.
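To illustrate why a chart would show a sudden drop, here is a toy model of a burst balance that is lost when a machine is updated. Everything in it is an assumption made for illustration: the accrual rule (balance grows while usage is below baseline and shrinks while above), the cap, the value a fresh machine version starts with, and the function and parameter names. It is not the platform's actual accounting.

```python
def simulate(usage_per_second, baseline, max_balance,
             update_at=frozenset(), fresh_balance=0.0):
    """Return the balance after each second of a toy burst-balance model.

    usage_per_second: CPU usage as a fraction of one vCPU, one entry per second.
    update_at: seconds at which a new machine version starts (hypothetical input).
    fresh_balance: balance a new machine version starts with (assumed value).
    """
    balance = fresh_balance
    history = []
    for second, usage in enumerate(usage_per_second):
        if second in update_at:
            # The balance does not carry across machine versions.
            balance = fresh_balance
        # Accrue while below baseline, spend while above, clamp to [0, max].
        balance = max(0.0, min(max_balance, balance + (baseline - usage)))
        history.append(balance)
    return history


# An idle machine accrues balance for 5 seconds, then is updated.
history = simulate([0.0] * 10, baseline=0.0625, max_balance=5.0, update_at={5})
```

In this run the balance climbs steadily for the first five seconds and then falls back at second 5, which is the shape of drop described above.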
Just in case anyone was curious about this: the app was accidentally setting an environment variable, GOMEMLIMIT=1, which causes Go’s runtime to try to limit its memory usage to 1 byte. Hallpass is our SSH server and needs the app’s environment variables in order to include them in shells spawned by users. The GOMEMLIMIT variable was causing hallpass to run garbage collection constantly to try to meet its memory target. This was taking 15% CPU. We addressed this by adding a prefix to app environment variables given to hallpass and then stripping those prefixes for SSH sessions.
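The prefix-and-strip approach can be sketched roughly as follows. This is only an illustration of the idea, not hallpass's code: the prefix string and both function names are hypothetical, and the thread does not say what prefix is actually used.

```python
from typing import Dict

# Hypothetical prefix; the real one is not stated in the thread.
APP_ENV_PREFIX = "APPENV_"


def prefix_app_env(app_env: Dict[str, str]) -> Dict[str, str]:
    """Rename app variables before handing them to the server process.

    With the prefix in place, a variable such as GOMEMLIMIT is no longer
    picked up by the server's own runtime.
    """
    return {APP_ENV_PREFIX + name: value for name, value in app_env.items()}


def env_for_session(server_env: Dict[str, str]) -> Dict[str, str]:
    """Recover the app's original variables for a user shell."""
    return {
        name[len(APP_ENV_PREFIX):]: value
        for name, value in server_env.items()
        if name.startswith(APP_ENV_PREFIX)
    }


app_env = {"GOMEMLIMIT": "1", "PORT": "8080"}
server_env = prefix_app_env(app_env)       # what the server process sees
session_env = env_for_session(server_env)  # what a spawned shell sees
```

The server process never sees a bare GOMEMLIMIT, while shells spawned for users still get the app's variables under their original names.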
When was this fix deployed to the machines? Do we need to redeploy our apps?
We made the change last week. If you are setting Go-related environment variables and are seeing lots of CPU usage from hallpass, you can try redeploying your app.
Continuing with the rollout schedule, we’re enabling quotas at 25% today. Starting now, quotas are enabled in the CDG region. In 30 minutes, we’ll enable them in IAD. 30 minutes later, we’ll enable them in other regions. I’ll update this post as today’s rollout continues.
Update 10:38 MST: Enabled in IAD.
Update 13:03 MST: Enabled everywhere.
Now that it’s 25% enforced, I’m assuming the yellow baseline and blue burst quota will decrease with each rollout of the increased enforcement? And eventually be back at the previous values once it’s 100% enforced?
Thanks!
That’s correct.
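As a rough way to picture how the baseline shrinks with each step, the 76.6% threshold quoted earlier in the thread is consistent with a linear interpolation between no limit (100%) and a 6.25% (1/16) shared baseline. Both the linear form and the 6.25% figure are assumptions here; the thread states neither, and the function name is made up for this sketch.

```python
# Assumed fully enforced baseline for a shared vCPU (1/16 of a core).
ASSUMED_FULL_BASELINE = 0.0625


def effective_baseline(enforcement: float,
                       full_baseline: float = ASSUMED_FULL_BASELINE) -> float:
    """Baseline as a fraction of one vCPU at a given enforcement level.

    enforcement is between 0.0 (no quota) and 1.0 (fully enforced).
    Assumes the baseline moves linearly from 1.0 down to full_baseline.
    """
    if not 0.0 <= enforcement <= 1.0:
        raise ValueError("enforcement must be between 0.0 and 1.0")
    return 1.0 - enforcement * (1.0 - full_baseline)


at_25 = effective_baseline(0.25)   # 0.765625, i.e. about 76.6%
at_50 = effective_baseline(0.50)   # 0.53125
at_100 = effective_baseline(1.00)  # 0.0625, the assumed final baseline
```

Under these assumptions, each step in the rollout lowers the threshold above which a shared machine starts spending its burst balance, until it reaches the final baseline at 100%.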
Continuing with the rollout schedule, I’m enabling quotas at 50% for the next hour. I’ll update this post when I’ve switched back to 25%.
Update 12:55 EST: Rolled back to 25%
Quotas are now enabled at 50%
Do you know if these CPU changes affect memory usage at all? I’m seeing more efficient memory usage but I’m not sure if they’re related to anything. Nothing drastic changed in my app land.
These changes shouldn’t have any direct impact on machines’ memory usage.