Postgres troubles

Hi, I have a problem with my Fly Postgres instance:

500 Internal Server Error failed to connect to local node: failed to connect to host user=repmgr database=repmgr`: server error (FATAL: the database system is in recovery mode (SQLSTATE 57P03)

Hey there, could you provide more information?

fly status --app <pg-app-name>

postgres status
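A minimal sketch for collecting that kind of information in one go. It assumes flyctl is installed and logged in. The helper name collect_pg_diagnostics is hypothetical. The function only wraps the standard fly status, fly checks list, fly machine list and fly volumes list commands, so Machines and leftover volumes show up side by side in one file.

```shell
#!/bin/sh
# Sketch: collect the diagnostics usually asked for when a Fly Postgres app
# misbehaves. Assumes flyctl is installed and logged in.
# collect_pg_diagnostics is a hypothetical helper name.

collect_pg_diagnostics() {
  app="$1"
  if [ -z "$app" ]; then
    echo "usage: collect_pg_diagnostics <pg-app-name>" >&2
    return 2
  fi
  for cmd in "status" "checks list" "machine list" "volumes list"; do
    echo "=== fly $cmd --app $app ==="
    # $cmd is intentionally unquoted so "checks list" splits into two words.
    # On failure, report fly's exit code and carry on with the next command.
    fly $cmd --app "$app" || echo "(command failed with exit code $?)"
  done
}
```

Usage: collect_pg_diagnostics <pg-app-name> > pg-diagnostics.txt 2>&1, then attach the file to your reply.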

Hi Quis,

Looking at your app from the backend, it looks like you originally had a cluster of three nodes, right? And then you destroyed two?

It looks like, however you did it, the cluster leader did not realize that it is now a cluster of one. Try adding a new Machine with fly m clone to restore your cluster to health.

We did delete two instances of the database cluster, but everything was fine for about a week.

fly m clone doesn’t help, not even when attaching a volume.

It’s not immediately clear to me what’s going on, but I am not a Fly PG expert. My colleague Shaun above is, but it’s the weekend, so I’m not sure when he might get back to you. If you want to get back into service ASAP, I would recommend the following.

Create a new cluster from the existing one.

fly pg create --initial-cluster-size 1 --fork-from quispostgres -n <NEW_PG_APP_NAME>

This will create a new cluster with the data from your current cluster. If by chance that new cluster doesn’t come up, then there’s some sort of data corruption issue, so you should restore from a snapshot made prior to this occurring:

fly pg create --initial-cluster-size 1 --snapshot-id <SNAPSHOT_ID> -n <NEW_PG_APP_NAME>
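The fork-first, snapshot-second order can be scripted roughly as follows. This is a sketch under assumptions not confirmed in this thread. It assumes a failed fork makes fly pg create exit non-zero, and that the snapshot restore should go to a separate app name so it cannot collide with a half-created fork. restore_pg_cluster and its arguments are hypothetical. Snapshot IDs for a volume can be listed with fly volumes snapshots list <VOLUME_ID>.

```shell
#!/bin/sh
# Sketch: try forking the broken cluster first, and fall back to a snapshot
# restore only if the fork fails.
# Assumptions (not confirmed in this thread): a failed fork makes
# `fly pg create` exit non-zero; the snapshot restore uses a separate,
# hypothetical app name. restore_pg_cluster is a hypothetical helper.

restore_pg_cluster() {
  old_app="$1"      # broken cluster, e.g. quispostgres
  fork_app="$2"     # name for the forked cluster
  snapshot_id="$3"  # optional: snapshot to fall back to
  restore_app="$4"  # optional: name for the snapshot-based cluster

  if fly pg create --initial-cluster-size 1 --fork-from "$old_app" -n "$fork_app"; then
    echo "forked $old_app into $fork_app"
    return 0
  fi

  if [ -z "$snapshot_id" ] || [ -z "$restore_app" ]; then
    echo "fork failed; pass a snapshot id and a new app name to restore" >&2
    return 1
  fi

  echo "fork failed; restoring $restore_app from snapshot $snapshot_id" >&2
  fly pg create --initial-cluster-size 1 --snapshot-id "$snapshot_id" -n "$restore_app"
}
```

Run interactively, fly pg create may still prompt for the organization, region and VM size.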

Then, for each Fly App that uses Fly PG on the backend, do the following (note that “DATABASE_URL” is literal; everything else in caps and angle brackets is a placeholder to replace):

# Remove the old database config from app
fly secrets unset -a <YOUR_FRONTEND_APP> --stage DATABASE_URL
# Add the config for the new database
fly pg attach -a <YOUR_FRONTEND_APP> --database-user <NEW_DB_USER_NAME> --database-name <OLD_DB_NAME> <NEW_PG_APP_NAME>
# Deploy the staged secret changes
fly secrets deploy -a <YOUR_FRONTEND_APP>
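If several apps use the old cluster, the three commands above can be looped over them. This is a minimal sketch with a hypothetical helper name, repoint_apps. It stops at the first failing command, so a problem with one app is not silently skipped. The user, database and app names are placeholders you supply.

```shell
#!/bin/sh
# Sketch: run the unset / attach / deploy steps for several frontend apps.
# repoint_apps is a hypothetical helper; all names passed in are placeholders.

repoint_apps() {
  if [ "$#" -lt 4 ]; then
    echo "usage: repoint_apps <new-pg-app> <db-user> <db-name> <app>..." >&2
    return 2
  fi
  new_pg_app="$1"; db_user="$2"; db_name="$3"
  shift 3
  for app in "$@"; do
    printf '%s\n' "--- $app ---"
    # Stage removal of the old connection string without deploying yet.
    fly secrets unset -a "$app" --stage DATABASE_URL || return 1
    # Attach the new cluster; this sets DATABASE_URL for the app.
    fly pg attach -a "$app" --database-user "$db_user" --database-name "$db_name" "$new_pg_app" || return 1
    # Roll out the staged secret changes.
    fly secrets deploy -a "$app" || return 1
  done
}
```

Usage: repoint_apps <NEW_PG_APP_NAME> <NEW_DB_USER_NAME> <OLD_DB_NAME> <APP_1> <APP_2>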

Yes, I have already done this, but creating a new database cluster only works with a snapshot from 2 days ago. With any more recent snapshot, the same error occurs.

At the moment, we have lost 2 days of data.
If possible, we would at least like to know the cause of this failure, so that we can avoid it in the future.

This is happening to me too.

Hi, please check my case too.

@quisprof @frsatneedle Mind sending me the names of your PG apps?

The problems started in the “quispostgres” application.

I took a look at your setup, and it appears your instance is in recovery mode. This state is reserved for replicas that fall out of sync with the Primary. I also noticed you have quite a few lingering Volumes tied to this App. Did you accidentally remove your Primary instance and/or attach the wrong Volume to your Machine?
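The SQLSTATE in the original error, 57P03 (cannot_connect_now), means the server process is running but refusing new connections. This happens while it is starting up, shutting down or running recovery. PostgreSQL's pg_isready reports this case as exit code 1. The sketch below checks for it from inside the Machine, for example after fly ssh console --app quispostgres. The helper names and the localhost:5432 default are assumptions. On Fly Postgres you may need to point it at the port Postgres itself listens on rather than at a proxy in front of it.

```shell
#!/bin/sh
# Sketch: translate pg_isready's exit status into a readable state.
# Documented pg_isready exit codes: 0 accepting, 1 rejecting (e.g. during
# startup), 2 no response, 3 no attempt (e.g. invalid parameters).
# describe_pg_isready / check_local_postgres are hypothetical helper names.

describe_pg_isready() {
  case "$1" in
    0) echo "accepting connections" ;;
    1) echo "rejecting connections (e.g. still starting up or in recovery)" ;;
    2) echo "no response" ;;
    3) echo "no attempt made (invalid parameters?)" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

check_local_postgres() {
  # Assumed defaults: adjust PGHOST/PGPORT to where Postgres itself listens.
  pg_isready -h "${PGHOST:-localhost}" -p "${PGPORT:-5432}" >/dev/null 2>&1
  describe_pg_isready "$?"
}
```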

Initially it was a cluster of 3 instances, but to optimize resources I kept only one Machine. Everything was fine for about a week, and then suddenly it wasn’t.
