Emergency maintenance for over 24 hours?

ScottLovegrove · October 16, 2024, 8:36am

Hi,

I got an email yesterday saying there was some emergency maintenance going on with one of my apps (the database, it turns out). I figured “ok, fine, nothing to worry about”, but now, after 24 hours, it’s still saying this:

To make matters worse, some of the suggestions it gives (like cloning) won’t work as there is an active incident going on with my app, which is less than helpful.

What other options do I have here? I can’t just delete the database and recreate it.

Elder · October 16, 2024, 8:46am

Hi @ScottLovegrove

Is your database one instance only?

In my experience it can take quite a long time. It might mean a hardware failure that needs physical access, with parts being fixed or replaced. That can take days in some cases.

ScottLovegrove · October 16, 2024, 8:59am

Yes it is. It’s only a hobby project, but I have space for one more machine, so when it does eventually come back up, I’ll see if I can increase the instance count.

Ugh, not ideal, but I guess it’s a valid situation. It would be good to know that that’s the issue, though. At the least, Fly could provide an update telling me it’s a hardware issue.

Elder · October 16, 2024, 9:23am

They do volume snapshots once per day, so check if you can restore from the most recent snapshot. Fly recommends 3 instances for a prod setup because, with a single instance, an SSD failure means the data written since the last snapshot is lost.

For a single-server setup you could trigger backups a few times per day, for example with a cron job that runs a backup script.
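As a rough illustration of that kind of backup script, here is a minimal sketch in Python. It assumes `pg_dump` is installed where the script runs and that the database is reachable from there. The `DATABASE_URL` and `BACKUP_DIR` environment variables, the `backup-*.dump` file naming, and the retention count of 8 are all made up for this example; none of it is something Fly provides.

```python
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def backup_filename(now: datetime) -> str:
    # Timestamped name so that sorting by name also sorts by age.
    return now.strftime("backup-%Y%m%dT%H%M%SZ.dump")


def build_pg_dump_command(database_url: str, out_path: Path) -> list[str]:
    # Custom-format dump, which can be restored later with pg_restore.
    return [
        "pg_dump",
        "--format=custom",
        f"--file={out_path}",
        f"--dbname={database_url}",
    ]


def prune_old_backups(directory: Path, keep: int) -> list[Path]:
    # Delete everything except the `keep` newest dumps; return what was removed.
    backups = sorted(directory.glob("backup-*.dump"))
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return stale


def main() -> int:
    # DATABASE_URL and BACKUP_DIR are assumed names for this sketch.
    database_url = os.environ["DATABASE_URL"]
    backup_dir = Path(os.environ.get("BACKUP_DIR", "backups"))
    backup_dir.mkdir(parents=True, exist_ok=True)

    out_path = backup_dir / backup_filename(datetime.now(timezone.utc))
    # check=True raises if pg_dump fails, so cron reports a non-zero exit.
    subprocess.run(build_pg_dump_command(database_url, out_path), check=True)

    prune_old_backups(backup_dir, keep=8)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved as, say, `backup.py`, it could be run from a crontab entry such as `0 */6 * * * python3 /path/to/backup.py` (every six hours) with the environment variables set for the cron user. The dumps should live somewhere other than the database volume itself, otherwise a disk failure takes the backups with it.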

ScottLovegrove · October 16, 2024, 9:39am

Ok, so I can see a bunch of snapshots. Is the suggestion here to create a new database, then try and restore one of the snapshots to the new database?

ScottLovegrove · October 16, 2024, 9:41am

Ah, I can see in the dashboard a “How to use” button on the snapshots with the commands on how to create a new database from that snapshot. I’ll give that a try, thanks for the pointer.

ScottLovegrove · October 16, 2024, 11:01am

So I used the command to create a new postgres database using the snapshot, but the command to create ended up quitting with “context deadline exceeded”, and when I go to the new database in the dashboard, it’s only completed 1/3 checks, with the following error:

500 Internal Server Error failed to connect to repmgr node: failed to connect to `host=fdaa:2:7bf2:a7b:328:90fb:e9b7:2 user=repmgr database=repmgr`: server error (FATAL: database "repmgr" does not exist (SQLSTATE 3D000))

It also won’t let me run a fly postgres attach command; that fails with a “no active leader found” error, presumably because of the above error.

I’ve tried stopping the new machine and starting it back up again, but I’m getting the same 500 error, so now I appear stuck as to what to do next.

ScottLovegrove · October 16, 2024, 1:32pm

Managed to get the snapshot loaded. I had to use version 15.2 of the Postgres image rather than 15.3.
