fly.io site is currently inaccessible...

al0xd · November 26, 2024, 3:42am

This is absolutely frustrating! It’s been over 2 hours, and our application is completely down. The system keeps returning 504 Gateway Timeout errors. We can’t scale machines, and deployments are entirely non-functional. The flyctl command is also completely unresponsive, as if it’s “dead.”

This situation is severely impacting our business. We urgently need assistance from the Fly team to resolve this issue immediately.

Extremely disappointed with the current state of service!

hodoli · November 26, 2024, 3:43am

+1
still facing 10+ upset VIP customers

Direwolf · November 26, 2024, 3:45am

I love some of the cool features of Fly.io, but this has been one too many production outages. I’ll use it for dev / low-impact apps going forward, but I gotta move my production load off of here after this.

thunderbolt.sanchez · November 26, 2024, 3:45am

Deploy/Image API calls not working. 2 apps, one cannot load machines in the UI, the other, I cannot stop the machine. 1 app has been in ‘Deploying’ state for a couple/few hours. During this time, my ‘Current month so far’ $$ number has ticked up albeit not a big number.

Need to refresh images and I can see the ‘logs’ API working via CLI (Grafana isn’t working for me yet). The apps are getting traffic but I can’t do anything about it (stop/refresh/etc.).

hodoli · November 26, 2024, 3:46am

same. so upsetting and frustrating.
We’ll move on to another Cloud after this…

fredwu · November 26, 2024, 4:34am

What’s infuriating to me is the status page calling this “degraded API performance”. How is a major outage like this, when many of our sites were completely offline, simply classified as “degraded API performance”?

thunderbolt.sanchez · November 26, 2024, 4:41am

What I don’t like: the last 2 updates on the incident have quite optimistic verbiage. The last one was maybe an hour ago and I’m still in the same unusable state as I was in my morning (APAC region).

codedyne · November 26, 2024, 4:43am

Same issue still with Error: server returned a non-200 status code: 504.

Honestly surprising that even when paying for the service, there is no way to contact support without paying at a minimum an extra $30 a month… and only for help during business hours.

ACPixel · November 26, 2024, 4:54am

At this point, it really is starting to get a bit ridiculous

kyleatcausadix · November 26, 2024, 5:10am

Right there with you.

Incidents are tough, no doubt, and folks deserve space and our grace to tame fires. That said, this wall of opacity and status page marketing spin ain’t it. Just sitting on our hands over here smashing return on flyctl and spamming the refresh button waiting for any signal that things are going to change.

ACPixel · November 26, 2024, 5:12am

I agree, I appreciate that they are likely all working on trying to get this solved, but not having any meaningful updates, and the fact that the few updates we have gotten seem to not actually be factual, makes this a really tough situation.

khuezy · November 26, 2024, 5:17am

Systems issues I can tolerate; it’s the lack of communication and transparency that’s bothersome. Just hire 1 community manager to be active on this Discourse and Discord, man…

mauvia_m · November 26, 2024, 5:26am

Can’t build! I’m getting this, “Error: failed to fetch an image or build from source: failed to list volumes: context deadline exceeded”. It 504s other times.

What region is everybody in? My machine is in the Illinois location I think.

ACPixel · November 26, 2024, 5:26am

Yeah, just about anything related to deploy/cli/api is currently down.

thunderbolt.sanchez · November 26, 2024, 5:27am

Singapore & Sydney

hodoli · November 26, 2024, 5:55am

Still experiencing the issue. Our prod app has been unusable.
Does anyone have an update?

Region: NRT (Tokyo)

ACPixel · November 26, 2024, 5:56am

Unfortunately no real updates beyond their status page, which says they are scaling up their systems to handle the increased load (presumably of everyone trying to get their apps back online)

simoncocking · November 26, 2024, 6:06am

It’s not just deploys that are failing - we lean heavily on FLAME and basically every call fails with a 503 Service Unavailable response because instances cannot be started. It’s been like this now for ~9 hours, basically our entire business day.

Luckily I was able to patch around the problem early on, but since then I’ve been unable to fly ssh into our instances either. Not Good.

ACPixel · November 26, 2024, 6:07am

Yeah, seems like a LOT of stuff broke. I also need to SSH into a machine atm and have been unable to for the last ~5 hours.

berona · November 26, 2024, 6:08am

The same here. Machines have been unresponsive for over 8 hours now and I can’t deploy or scale to another region.