fly.io site is currently inaccessible...

thewilkybarkid1 · November 26, 2024, 3:27pm

I’m now thinking that the problems we’ve experienced aren’t related to the downtime, but to CPU quotas having been turned on:

We have an HTTP cache stored in a volume, and run a cleanup process on startup to prune old entries (cacache’s verify). This was known to be a bit intensive, but only lasted for a minute or two.

Hidden in that thread is:

So I think that on the deploy the CPU quota balance was reset to 0, the intensive process started and was immediately throttled (which caused the HTTP server running on the instance to slow to a crawl). Once the throttled task eventually completed, the throttling was lifted, allowing the HTTP server to run as expected.

I’ve not been able to recreate this with a machine restart, as the balance is kept (the restart consumes some of it, but doesn’t get near 0). I’ll have to confirm by turning off the intensive task (or at least delaying its start), and see how that deploy goes…

Edit: testing with a 10 minute delay
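
For anyone wanting to try the same thing, here is a rough sketch of how the delay could be wired up in Node. It isn't our exact code: `scheduleDeferredCleanup` is a made-up helper name, and the 10 minute default just mirrors the test above. The task passed in would be something like `() => cacache.verify(cachePath)`, where `cachePath` is a placeholder for wherever the cache lives on the volume.

```typescript
// Hypothetical helper (not from any library): defer a CPU-heavy startup
// task so it doesn't compete with the HTTP server right after a deploy,
// when the CPU quota balance may still be low.
type Cleanup = () => Promise<unknown>;

function scheduleDeferredCleanup(
  cleanup: Cleanup,
  delayMs: number = 10 * 60 * 1000, // 10 minutes, matching the test above
): ReturnType<typeof setTimeout> {
  return setTimeout(() => {
    // Wrap in a promise chain so both sync throws and async rejections
    // are logged instead of crashing the server process.
    Promise.resolve()
      .then(cleanup)
      .catch((error: unknown) => {
        console.error("cache cleanup failed", error);
      });
  }, delayMs);
}

// Assumed usage at startup, once the HTTP server is listening:
//   scheduleDeferredCleanup(() => cacache.verify(cachePath));
```

The timer handle is returned so that a shutdown hook can call `clearTimeout` on it if the machine stops before the delay elapses.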

Edit 2: the deploy happened, the CPU quota balance dropped but the HTTP server handles requests fine. The balance is now replenishing:

Edit 3: the intensive task ran successfully with enough balance in place, though there was some ‘stealing’:
