fly.io site is currently inaccessible...

ddddevelll · November 26, 2024, 1:44am

Our web service is down
When will it be back to normal?

doncote · November 26, 2024, 1:45am

Ours is down as well

croberts · November 26, 2024, 1:48am

Our IAD servers are out as well, along with the machine API.

doncote · November 26, 2024, 1:49am

fly.io itself, our machines in several different regions, the cli.

this isn’t good.

elliotdickison · November 26, 2024, 1:50am

We had managed to make it through the current incident unscathed until a few minutes ago but it appears to be worsening. Many (but not all!) of our apps in iad are reporting no machines.

rodolfo · November 26, 2024, 1:50am

Major infrastructure issue. Our API has been down for over 2 hours.

flippyhead · November 26, 2024, 1:52am

Yeah they say “degraded API performance” but I can’t get any of my API calls to work:

| Error: server returned a non-200 status code: 504

And cannot even load fly.io anymore.

doncote · November 26, 2024, 1:52am

Has there been any communication other than the status page updates?

kyleatcausadix · November 26, 2024, 1:56am

Not that I’m aware, and I don’t think we should expect it for a hot minute—lots of signal to indicate this has ballooned (see what I did there) into a widespread, likely even global, outage.

At least the status page and Discourse sites are up

#hugops

doncote · November 26, 2024, 1:58am

our apps appear to be up again, cli is working, fly.io is accessible.

ACPixel · November 26, 2024, 2:24am

Things seem to be getting better, though I still have no CLI access at all which is making it really hard to restore our services

#hugops for sure, this musta gotten way bigger than they expected

larry · November 26, 2024, 2:46am

Still can’t redeploy via CI.

kyleatcausadix · November 26, 2024, 2:53am

Yep, we’re in the same boat. Deploys are still 504-ing (not Depot), and all attempts to roll existing instances are also 504-ing. Not out of the woods yet.

bobbyhiddn · November 26, 2024, 2:59am

Down for me as well as of 7:59 MST. 504’ing after waiting for the depot. Sucks as I had just gotten a solution ready to test apparently right as it went down. Always how it goes lol

larry · November 26, 2024, 3:06am

@bobbyhiddn
Have similar incidents happened before? In my view, an outage lasting several hours is a very serious incident. If such things happen frequently, we may need to seriously consider migrating away from Fly.io. I really like the convenience that Fly.io brings, but stability is always the highest priority as we operate in the financial payment industry.

bobbyhiddn · November 26, 2024, 3:14am

So far, no. I’ve been using it for a few months now and the convenience has been a huge value add as it lets me black box most of the deployment stream while testing. This has been my first major incident with the platform. So far, none of my products are making money, just some development ideas, so it’s not a huge deal for me, but if they were, I would be concerned.

flippyhead · November 26, 2024, 3:15am

I am surprised they haven’t bothered to comment here, though.

ACPixel · November 26, 2024, 3:17am

We’ve been here for just about a year and a half. For sure not the first major outage. This is however one of the longest-lasting ones that I’ve personally seen. I think I’ve experienced about 4-5 other large outages with Fly, most lasting less than an hour with only one that I can remember lasting more than 2. This is by far the worst one I’ve experienced and is causing a lot of issues on our end. Definitely feel for the team, scaling server infra from scratch like this is a massive undertaking.

ACPixel · November 26, 2024, 3:17am

Take it as a sign that they’re all-in on trying to fix it. I’m sure they will reply once they aren’t all hands on deck

ryush00 · November 26, 2024, 3:31am

We are 30 minutes away from our school project presentation, and we are very flustered.