I’m facing a serious service interruption for my app, muradi. More than 7 hours ago, this status message appeared:
“We are performing emergency maintenance on a host some of your app’s instances are running on. Apps may be unavailable until the maintenance is completed.”
Unfortunately, the issue is still ongoing, and the app remains completely inaccessible.
Here’s what I’ve tried so far:
Ran fly deploy – no change.
Attempted to scale/redeploy – still no resolution.
The app has an attached volume, which is currently tied to what seems to be a stuck machine on the affected host.
My Questions:
What is the current status of the maintenance?
Can I migrate my app and attached volume away from the affected host?
Is there any way Fly.io can help with restoring, detaching, or restarting the volume/machine?
This extended downtime is impacting my users, and I’m looking for a way to get the app back online as soon as possible.
Any help from the community or Fly.io staff would be greatly appreciated!
When I run fly incidents hosts list, I get:
Host Issues count: 1
ID | MESSAGE | STARTED AT | LAST UPDATED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
51w9vn5mjwo3qedj | We are performing emergency maintenance on a host some of your apps instances are running on. Apps may be unavailable until the maintenance is completed. | 2025-05-20 09:02:38 +0000 UTC | 2025-05-20 09:02:38 +0000 UTC
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Error: failed to update machine configuration for e286e732a9ed28 [app]: machine 'e286e732a9ed28' requires manual intervention, it can't be automatically replaced because its volume 'vol_4qp3wn8e2nn6w7wv' is on an unreachable host
These repairs can take a very long time, and may end with the volume being lost permanently. Having multiple places to write data is exactly what gives you durability on this platform, although no one doubts the inconvenience of doing so!
My overall impression is that you really want to sign up for Fly Support, if you don’t have that already. (That’s what I would do, if I had irreplaceable data here on Fly.io.)
I do wonder whether Fly’s host reliability is quite where I think it should be. That said, @mayailurus is quite right: running a cluster of hosts is the only way you will get a reliable storage service.
Your comment suggests to me that you expect volume writes in a cluster to be done manually, which is perplexing; I would have thought you would just need some software to automate that.
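As a rough illustration of what such software does, here is a minimal Python sketch of a quorum write. Every write goes to all replicas, and it counts as durable once a majority acknowledges it, so one unreachable host does not take the app down. The Replica class, the replicated_write function, and the in-memory storage are hypothetical stand-ins, not any Fly.io API; real replication (for example in a database cluster) also handles catch-up, ordering, and failover, which this sketch skips.

```python
from dataclasses import dataclass, field


class ReplicaUnavailable(Exception):
    """Raised by a hypothetical replica whose host cannot be reached."""


@dataclass
class Replica:
    # Hypothetical in-memory stand-in for a volume living on one host.
    name: str
    reachable: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        if not self.reachable:
            raise ReplicaUnavailable(self.name)
        self.data[key] = value


def replicated_write(replicas: list[Replica], key: str, value: str) -> int:
    """Write to every replica and succeed only if a majority acknowledged.

    Returns the number of acknowledgements. Note: this sketch does not roll
    back the replicas that did accept the write when quorum is not met.
    """
    acks = 0
    for replica in replicas:
        try:
            replica.write(key, value)
            acks += 1
        except ReplicaUnavailable:
            # A single dead host should not block writes to the others.
            continue
    quorum = len(replicas) // 2 + 1
    if acks < quorum:
        raise RuntimeError(
            f"only {acks}/{len(replicas)} replicas acknowledged; need {quorum}"
        )
    return acks


if __name__ == "__main__":
    cluster = [Replica("host-a"), Replica("host-b", reachable=False), Replica("host-c")]
    print(replicated_write(cluster, "user:1", "hello"))  # 2
```

With three replicas the write survives one unreachable host; that is the property a single volume on a single host cannot give you.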
Of course, in addition to cluster syncing, you should have backups as well. Do you have them?
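If the app happens to keep its data in SQLite on that volume (an assumption; the post doesn’t say which database is used), a backup can be as simple as the following sketch built on Python’s sqlite3 online backup API (Connection.backup). The backup_sqlite function name and the paths are hypothetical, and the backup only helps if the resulting file is then copied off the volume, to object storage or another host.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_sqlite(db_path: str, backup_dir: str) -> Path:
    """Copy a SQLite database to a timestamped file and return its path.

    Hypothetical helper: assumes the app's data lives in a SQLite file.
    """
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest = dest_dir / f"{Path(db_path).stem}-{stamp}.db"
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(dest)
    try:
        # Connection.backup produces a consistent copy even while the
        # source database is being used by the running app.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return dest
```

Run something like this on a schedule and ship the file elsewhere; a backup stored on the same volume disappears along with the host.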