Hardware failure in LHR: two of my DBs down for 24 days as a result

Without notification of any kind, as has always been my experience with failures on your systems.

2024-04-30 15:48:56 UTC Some of your apps in LHR are on a host which has suffered a hardware failure and will be down for an extended period. You are not being charged for the resources on this host. You may await the host’s return to service, or you may spin up new Machines in the region and restore from backups.

2024-04-25 13:09:01 UTC We are performing emergency maintenance on a host some of your apps instances are running on. Apps may be unavailable until the maintenance is completed.

I really can’t get my head around the fact that you don’t notify your customers about this, and that I had to find out by chance when my boss tried to use the app and it was down. Luckily, it’s not a heavily used app, which is why we didn’t find out earlier.

But this is so unprofessional, especially after the experiences I’ve been having with your failures. I’m sorry, but it’s not a service I’m comfortable using, so we’ll be moving all our stuff out.

In fact, we haven’t moved anything else over because of that. It’s just unreliable. We can’t trust a service that behaves like this.

And after this “rant”, can anyone help me recover my database apps?

I tried to troubleshoot them using the linked doc, “Troubleshoot apps when a host is unavailable” (Fly Docs).

But of course I can’t list any volume snapshots because the “machine” is down.


The lack of notification on fly.io’s part here (nothing aside from a notice when logging into the panel) is abysmal. Having set up a development environment a few months ago to move away from DigitalOcean, I’m seriously considering moving elsewhere before production is deployed here.

Yes, things should be deployed across multiple machines and AZs, but for non-production workloads this isn’t really a requirement. I saw jobs not running, went to check what the issue was, and found it was just completely down, and a new machine hadn’t been spun up automatically? Unless I’m missing some big announcement, that’s really poor.

Email notifications for host issues were announced a couple of weeks ago, so you should receive an email for any future issues affecting Machines.

Hey, @wjordan, is there a chance we can get the machines restored after this long?

Also, I deleted the faulty apps to try to recreate them and get the service up and running. Could that cause a problem when “restoring” the machines and volumes, if they are restored?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.