Production postgres database down

tremblay · August 14, 2024, 11:49pm

I’m having a big issue right now with my production database.
I can’t scale up or down, and the database is currently unreachable. I figured I would try to redeploy the same image, and I received this message:

Error: failed to update machine configuration for ID_HERE [app]: machine 'ID_HERE' requires manual intervention, it can't be automatically replaced because its volume 'vol_ID_HERE' is on an unreachable host

Not quite sure what I can do. No configurations were changed in the recent or not-so-recent past. It just died.

tremblay · August 14, 2024, 11:54pm

2024-08-14T23:52:42Z app[app id] yyz [info]sentinel | 2024-08-14T23:52:42.898Z WARN cmd/sentinel.go:276 no keeper info available {"db": "ID", "keeper": "OTHER ID"}
2024-08-14T23:52:42Z app[app id] yyz [info]sentinel | 2024-08-14T23:52:42.900Z ERROR cmd/sentinel.go:1018 no eligible masters
2024-08-14T23:52:48Z app[app id] yyz [info]sentinel | 2024-08-14T23:52:48.098Z WARN cmd/sentinel.go:276 no keeper info available {"db": "ID", "keeper": "OTHER ID"}
2024-08-14T23:52:48Z app[app id] yyz [info]sentinel | 2024-08-14T23:52:48.102Z ERROR cmd/sentinel.go:1018 no eligible masters
2024-08-14T23:52:53Z app[app id] yyz [info]sentinel | 2024-08-14T23:52:53.364Z WARN cmd/sentinel.go:276 no keeper info available {"db": "ID", "keeper": "OTHER ID"}
2024-08-14T23:52:53Z app[app id] yyz [info]sentinel | 2024-08-14T23:52:53.366Z ERROR cmd/sentinel.go:1018 no eligible masters

mayailurus · August 15, 2024, 12:26am

Hi… It sounds like the physical host that that Machine is on has failed. There is more context on this situation in a recent post:

Have you been able to check your Fly.io dashboard recently? Typically, a notification of this kind of disruption appears there.

kurt · August 15, 2024, 12:45am

I looked at your DB, and it appears it was created as a development postgres with only one machine. These will go down when the hardware the machine is on fails, and come back up when we recover the hardware.

Right now you have two choices:

You can wait for the host to come back up
You can restore the latest backup to a new database
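If you take the backup route, the restore could look roughly like the sketch below. It assumes the snapshot-restore flow from Fly's Postgres docs (`flyctl volumes snapshots list` to find a snapshot, then `flyctl postgres create --snapshot-id` to build a new app from it). The `run` wrapper, the `DRY_RUN` switch, the function names, and the new app name are all hypothetical, and `--initial-cluster-size 3` is added here only to get the high-availability layout Kurt recommends. `flyctl postgres create` may still prompt for things like region and VM size.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 the flyctl commands are only printed,
# otherwise they are executed as-is.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1 (hypothetical helper): list snapshots of the old volume so you can
# pick the most recent snapshot id. $1 is the volume id.
list_snapshots() {
  run flyctl volumes snapshots list "$1"
}

# Step 2 (hypothetical helper): create a NEW Postgres app from that snapshot.
# $1 is the snapshot id, $2 the (hypothetical) name for the new app.
restore_snapshot() {
  snapshot_id=$1
  new_app=$2
  run flyctl postgres create --snapshot-id "$snapshot_id" --name "$new_app" --initial-cluster-size 3
}
```

Restoring into a new app leaves the original volume untouched, so you can still fall back to it once the host recovers.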

Either way, if you need maximum uptime you should select one of the production-grade Postgres cluster configurations. Production-grade PG clusters are highly available: if any one member fails, the cluster elects a healthy member as primary.

Your existing DB can upgrade once it becomes available again.

tremblay · August 15, 2024, 12:58am

Yeah that was clearly my mistake when creating it. Is there a way I can upgrade it with a fly command? Or do I need to chain a bunch of stuff together?

I absolutely want to maximize uptime on this (not looking forward to tomorrow morning…)

Do we have any idea what kind of timeline we’re looking at? I don’t think I can afford to go back to the latest backup, as we have operations that would need to be redone in a specific order (let’s just call it a nightmare scenario on my end).

Thanks,

Eric

tremblay · August 15, 2024, 1:27am

Yeah, thanks. The host did indeed go down (the notification popped up maybe 2 minutes after I created this post and reached out via email).

Not much is workable. If I absolutely HAVE to, I can recover from a snapshot, but at this point I’d rather take my licks and accept a little downtime than lose a bunch of data in the process. Hopefully the host is operational soon.

travtarr · August 15, 2024, 5:39am

As far as I’m aware, you’ll need to use individual commands.
Once the original host is back up as Kurt mentioned, you’ll need to do the following:

# get volume id
flyctl vol ls -a <pg_app_name>

# use above volume id here
flyctl vol fork <volume-id> -a <pg_app_name>

# repeat the fork command so there is one volume per machine; for the 3 machines needed for repmgr, that's the original volume plus two forks

flyctl scale count 3 -a <pg_app_name>
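The fork-then-scale steps above can be wrapped in a small script, sketched here under the same assumption as the post (one forked volume per extra machine, then `flyctl scale count`). The `run` wrapper, `DRY_RUN` switch, and `fork_and_scale` function are hypothetical; the app name and volume id are placeholders you would take from `flyctl vol ls`.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 the flyctl commands are only printed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: fork the existing volume (total - 1) times so that,
# together with the original volume, there is one volume per machine,
# then scale the app to that many machines.
# $1 = app name, $2 = volume id, $3 = total number of machines
fork_and_scale() {
  app=$1
  volume_id=$2
  total=$3
  i=1
  while [ "$i" -lt "$total" ]; do
    run flyctl volumes fork "$volume_id" -a "$app"
    i=$((i + 1))
  done
  run flyctl scale count "$total" -a "$app"
}
```

Running it once with `DRY_RUN=1` is a cheap way to check the exact commands before touching a production app.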

If you need to increase the size of the machines, it’s easier to do it before you scale up the machine count, since the scaling operation copies the existing machine’s config to the new machines.

# get machine's id
flyctl m ls -a <pg_app_name>

# example values only; pick CPU and memory sizes that fit your workload
flyctl m update <machine-id> --vm-cpus 2 --vm-memory 2048

If you change the memory, you’ll also want to adjust the shared_buffers setting to match. It defaults to 25% of the machine’s memory at the time it was originally launched, and the value is expressed in 8 kB units. If you set the memory to 2048 MB as in the command above, you’d set shared_buffers to 65536 (65536 × 8 kB = 512 MB = 2048 MB × 0.25).

flyctl postgres config update -a <pg_app_name> --shared-buffers 65536
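The 25%-of-memory rule of thumb above is easy to get wrong by a factor of 8 or 1024, so here is a small sketch that does the unit conversion. The `shared_buffers_pages` function is hypothetical; it only encodes the arithmetic from the post (MB to kB, take a quarter, divide into 8 kB pages) and does not read anything from Fly.

```shell
#!/bin/sh
# Hypothetical helper: shared_buffers as 25% of machine memory,
# expressed in 8 kB pages. $1 = machine memory in MB.
shared_buffers_pages() {
  mem_mb=$1
  # MB -> kB (* 1024), take 25% (/ 4), then count 8 kB pages (/ 8)
  echo $(( mem_mb * 1024 / 4 / 8 ))
}

shared_buffers_pages 2048   # prints 65536
```

For a 4096 MB machine, `shared_buffers_pages 4096` gives 131072.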

system · August 22, 2024, 5:40am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.