Unable to recover Postgres backup

I’m following the instructions here: Backup, Restores, & Snapshots · Fly Docs

Here are the logs from the terminal:

fly postgres create --snapshot-id vs_gR22ajNNzv4OoS1jVmA
? Choose an app name (leave blank to generate one): db-recovery
? Select Organization: ORG
Some regions require a paid plan (bom, fra, maa).
See https://fly.io/plans to set up a plan.

? Select region: Warsaw, Poland (waw)
? Select configuration: Specify custom configuration
? Initial cluster size - Specify at least 3 for HA 2
? Select VM size: shared-cpu-1x - CPU Kind: Shared, vCPUs: 1 Memory: 256MB
? Volume size 1
Creating postgres cluster in organization ORG
Creating app...
Setting secrets on app db-recovery...
Restoring 1 of 2 machines with image flyio/postgres-flex:15.3@sha256:e5882c1841195860fb002e4eebfb84b47afbf193e0c7bd739dfd1056ef7c6b62
Waiting for machine to start...
Machine 5683622f52e98e is created
Restoring 2 of 2 machines with image flyio/postgres-flex:15.3@sha256:e5882c1841195860fb002e4eebfb84b47afbf193e0c7bd739dfd1056ef7c6b62
Waiting for machine to start...
Machine 6e82de57a24308 is created
==> Monitoring health checks
  Waiting for 5683622f52e98e to become healthy (started, 1/3)
  Waiting for 6e82de57a24308 to become healthy (started, 3/3)

Postgres cluster db-recovery created
  Username:    postgres
  Password:    CIwRAlYnVbf5Yiv
  Hostname:    db-recovery.internal
  Flycast:     fdaa:2:6ec2:0:1::9
  Proxy port:  5432
  Postgres port:  5433
  Connection string: postgres://postgres:XXX@db-recovery.flycast:5432

Save your credentials in a secure place -- you won't be able to see them again!

Connect to postgres
Any app within the ORG organization can connect to this Postgres using the above connection string

Now that you've set up Postgres, here's what you need to understand: https://fly.io/docs/postgres/getting-started/what-you-should-know/

I think the first sign of some problem is this part of the logs:

  Waiting for 5683622f52e98e to become healthy (started, 1/3)
  Waiting for 6e82de57a24308 to become healthy (started, 3/3)

After the new app was created, I ran fly status -a db-recovery:

ID            	STATE  	ROLE   	REGION	CHECKS                        	IMAGE                             	CREATED             	UPDATED
6e82de57a24308	started	primary	waw   	3 total, 3 passing            	flyio/postgres-flex:15.3 (v0.0.43)	2023-09-04T14:19:09Z	2023-09-04T14:19:24Z	
5683622f52e98e	started	error  	waw   	3 total, 1 passing, 2 critical	flyio/postgres-flex:15.3 (v0.0.43)	2023-09-04T14:18:26Z	2023-09-04T14:19:01Z

And of course, when I attach the Postgres app to my core app, it doesn’t work. I’ve tried 3 times with different snapshots and machines, but the result is always the same. What’s going on?
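
For repeated recovery tests, the check above can be scripted. This is only a rough sketch: machines_in_error is a made-up helper name, not part of flyctl, and it assumes the table layout shown above (machine ID in the first column as 14 lowercase hex characters, ROLE in the third column).

```shell
# Sketch only: machines_in_error is a hypothetical helper, not part of flyctl.
# Reads `fly status` output on stdin and prints "ID ROLE" for every machine row
# whose ROLE column is "error".
# Assumption: machine IDs are 14 lowercase hex characters, as in the output above.
machines_in_error() {
  awk 'length($1) == 14 && $1 ~ /^[0-9a-f]+$/ && $3 == "error" { print $1, $3 }'
}

# Usage (needs flyctl and access to the app):
#   fly status -a db-recovery | machines_in_error
```

An empty result would mean no machine reports the error role; it does not prove the restore is healthy otherwise.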

I’ve changed some names, but the snapshot and machine IDs are the real ones. I’ll keep this app around for a day or so, so that someone from Fly.io can take a look at it, but I don’t want to keep it for long and increase costs unnecessarily. I’m doing recovery tests before the app release, and backup recovery has to work.

And here are logs from the db-recovery app:

2023-09-04T14:37:45.646 app[5683622f52e98e] waw [info] repmgrd | Running...

2023-09-04T14:37:45.674 app[5683622f52e98e] waw [info] repmgrd | [2023-09-04 14:37:45] [NOTICE] repmgrd (repmgrd 5.3.3) starting up

2023-09-04T14:37:45.674 app[5683622f52e98e] waw [info] repmgrd | [2023-09-04 14:37:45] [INFO] connecting to database "host=fdaa:2:6ec2:a7b:84:5f2e:775f:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5"

2023-09-04T14:37:45.677 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:45.676 UTC [10812] FATAL: database "repmgr" does not exist

2023-09-04T14:37:45.677 app[5683622f52e98e] waw [info] repmgrd | [2023-09-04 14:37:45] [ERROR] connection to database failed

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd | [2023-09-04 14:37:45] [DETAIL]

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd | connection to server at "fdaa:2:6ec2:a7b:84:5f2e:775f:2", port 5433 failed: FATAL: database "repmgr" does not exist

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd |

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd | [2023-09-04 14:37:45] [DETAIL] attempted to connect using:

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd | user=repmgr connect_timeout=5 dbname=repmgr host=fdaa:2:6ec2:a7b:84:5f2e:775f:2 port=5433 fallback_application_name=repmgr options=-csearch_path=

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd | exit status 6

2023-09-04T14:37:45.678 app[5683622f52e98e] waw [info] repmgrd | restarting in 5s [attempt 224]

2023-09-04T14:37:46.047 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:46.045 UTC [10814] FATAL: database "repmgr" does not exist

2023-09-04T14:37:46.171 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:46.170 UTC [10816] FATAL: database "repmgr" does not exist

2023-09-04T14:37:46.486 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:46.486 UTC [10818] WARNING: database "postgres" has a collation version mismatch

2023-09-04T14:37:46.486 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:46.486 UTC [10818] DETAIL: The database was created using collation version 2.31, but the operating system provides version 2.36.

2023-09-04T14:37:46.486 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:46.486 UTC [10818] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

2023-09-04T14:37:46.516 app[5683622f52e98e] waw [info] Registering standby

2023-09-04T14:37:46.550 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:46.549 UTC [10821] FATAL: database "repmgr" does not exist

2023-09-04T14:37:46.552 app[5683622f52e98e] waw [info] failed post-init: failed to register new standby: failed to register standby: exit status 1. Retrying...

2023-09-04T14:37:47.421 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:47.420 UTC [10823] FATAL: database "repmgr" does not exist

2023-09-04T14:37:47.484 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:47.484 UTC [10825] WARNING: database "postgres" has a collation version mismatch

2023-09-04T14:37:47.484 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:47.484 UTC [10825] DETAIL: The database was created using collation version 2.31, but the operating system provides version 2.36.

2023-09-04T14:37:47.484 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:47.484 UTC [10825] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

2023-09-04T14:37:47.516 app[5683622f52e98e] waw [info] Registering standby

2023-09-04T14:37:47.548 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:47.546 UTC [10828] FATAL: database "repmgr" does not exist

2023-09-04T14:37:47.550 app[5683622f52e98e] waw [info] failed post-init: failed to register new standby: failed to register standby: exit status 1. Retrying...

2023-09-04T14:37:48.052 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:48.051 UTC [10830] FATAL: database "repmgr" does not exist

2023-09-04T14:37:48.177 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:48.176 UTC [10832] FATAL: database "repmgr" does not exist

2023-09-04T14:37:48.484 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:48.484 UTC [10834] WARNING: database "postgres" has a collation version mismatch

2023-09-04T14:37:48.484 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:48.484 UTC [10834] DETAIL: The database was created using collation version 2.31, but the operating system provides version 2.36.

2023-09-04T14:37:48.484 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:48.484 UTC [10834] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

2023-09-04T14:37:48.519 app[5683622f52e98e] waw [info] Registering standby

2023-09-04T14:37:48.552 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:48.551 UTC [10837] FATAL: database "repmgr" does not exist

2023-09-04T14:37:48.554 app[5683622f52e98e] waw [info] failed post-init: failed to register new standby: failed to register standby: exit status 1. Retrying...

2023-09-04T14:37:49.428 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:49.427 UTC [10839] FATAL: database "repmgr" does not exist

2023-09-04T14:37:49.485 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:49.484 UTC [10841] WARNING: database "postgres" has a collation version mismatch

2023-09-04T14:37:49.485 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:49.484 UTC [10841] DETAIL: The database was created using collation version 2.31, but the operating system provides version 2.36.

2023-09-04T14:37:49.485 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:49.484 UTC [10841] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

2023-09-04T14:37:49.519 app[5683622f52e98e] waw [info] Registering standby

2023-09-04T14:37:49.553 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:49.552 UTC [10844] FATAL: database "repmgr" does not exist

2023-09-04T14:37:49.555 app[5683622f52e98e] waw [info] failed post-init: failed to register new standby: failed to register standby: exit status 1. Retrying...

2023-09-04T14:37:50.058 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:50.057 UTC [10846] FATAL: database "repmgr" does not exist

2023-09-04T14:37:50.183 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:50.182 UTC [10848] FATAL: database "repmgr" does not exist

2023-09-04T14:37:50.485 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:50.485 UTC [10850] WARNING: database "postgres" has a collation version mismatch

2023-09-04T14:37:50.485 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:50.485 UTC [10850] DETAIL: The database was created using collation version 2.31, but the operating system provides version 2.36.

2023-09-04T14:37:50.485 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:50.485 UTC [10850] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE postgres REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

2023-09-04T14:37:50.515 app[5683622f52e98e] waw [info] Registering standby

2023-09-04T14:37:50.548 app[5683622f52e98e] waw [info] postgres | 2023-09-04 14:37:50.546 UTC [10853] FATAL: database "repmgr" does not exist

2023-09-04T14:37:50.550 app[5683622f52e98e] waw [info] failed post-init: failed to register new standby: failed to register standby: exit status 1. Retrying...

Also, when running pg_dump I get the following message in the dump file:

WARNING:  database "db" has a collation version mismatch
DETAIL:  The database was created using collation version 2.31, but the operating system provides version 2.36.
HINT:  Rebuild all objects in this database that use the default collation and run ALTER DATABASE raczekteam REFRESH COLLATION VERSION, or build PostgreSQL with the right library version.

I’ve also tried to update collation version following those instructions: postgresql - Collation version mismatch - Database Administrators Stack Exchange

After that, the warning went away when doing the SQL dump, so I guess it might also fix the original issue. But I have to wait until tomorrow for the next snapshot to be created. Is it possible to create a snapshot on demand?
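
For reference, the collation fix boils down to the two statements the HINT describes. The sketch below only prints them: collation_refresh_sql is a made-up helper name, and using REINDEX DATABASE as the "rebuild all objects" step is my assumption based on the commonly suggested fix, not something confirmed by Fly.io.

```shell
# Sketch only: collation_refresh_sql is a hypothetical helper.
# Prints the SQL suggested by the HINT for one database name.
# Assumption: REINDEX DATABASE is an acceptable way to rebuild the affected objects.
# The name is wrapped in double quotes, so it must not itself contain a double quote.
collation_refresh_sql() {
  db_name="$1"
  if [ -z "$db_name" ]; then
    echo "usage: collation_refresh_sql <database-name>" >&2
    return 1
  fi
  printf 'REINDEX DATABASE "%s";\n' "$db_name"
  printf 'ALTER DATABASE "%s" REFRESH COLLATION VERSION;\n' "$db_name"
}

# Print the statements for review before running them.
collation_refresh_sql postgres
```

The output can be pasted into a psql session that is connected to that same database, since REINDEX DATABASE only works on the database you are currently connected to.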

You need to start the new Postgres cluster with the same image as the old one:

fly postgres create --snapshot-id vs_gR22ajNNzv4OoS1jVmA --image-ref flyio/postgres-flex:15.2

I’m assuming the 15.2 image, since the error looks like something I just ran into myself while trying to do the same thing.
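
If you run restore tests often, the command can be wrapped so the image tag is always passed explicitly. A minimal sketch: build_restore_cmd is a made-up helper name, and it only prints the command so you can review it before running it. The --snapshot-id and --image-ref flags are the ones used above; the 15.2 tag is still my assumption.

```shell
# Sketch only: build_restore_cmd is a hypothetical helper name.
# It prints the restore command instead of running it, so it can be reviewed first.
build_restore_cmd() {
  snapshot_id="$1"   # e.g. vs_gR22ajNNzv4OoS1jVmA
  image_tag="$2"     # assumed tag of the image the original cluster was created with
  if [ -z "$snapshot_id" ] || [ -z "$image_tag" ]; then
    echo "usage: build_restore_cmd <snapshot-id> <image-tag>" >&2
    return 1
  fi
  printf 'fly postgres create --snapshot-id %s --image-ref flyio/postgres-flex:%s\n' \
    "$snapshot_id" "$image_tag"
}

build_restore_cmd vs_gR22ajNNzv4OoS1jVmA 15.2
```

Requiring both arguments is deliberate: without an explicit tag, the restore falls back to whatever image fly postgres create picks by default.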


Hmm interesting, it actually made it work. Thank you! I will still have to test it with 15.3 after a new snapshot is created today.

OK, this is really weird. My current Postgres app is using the flyio/postgres-flex:15.3 (v0.0.43) image. When I use the same version to create an app from the snapshot, it doesn’t work, but if I use 15.2, it works. What’s going on?

fly postgres create --snapshot-id vs_gR22ajNNzv4OoS1jVmA --image-ref flyio/postgres-flex:15.3

fly status -a app-db-recovery
ID            	STATE  	ROLE   	REGION	CHECKS                        	IMAGE                             	CREATED             	UPDATED
6e82e24f2e7387	started	error  	waw   	3 total, 1 passing, 2 critical	flyio/postgres-flex:15.3 (v0.0.43)	2023-09-06T08:43:10Z	2023-09-06T08:43:35Z	
3287966b599d85	started	primary	waw   	3 total, 3 passing            	flyio/postgres-flex:15.3 (v0.0.43)	2023-09-06T08:43:40Z	2023-09-06T08:43:54Z	

fly status -a app-db
ID            	STATE  	ROLE   	REGION	CHECKS            	IMAGE                             	CREATED             	UPDATED
4d8979df452587	started	primary	waw   	3 total, 3 passing	flyio/postgres-flex:15.3 (v0.0.43)	2023-06-30T13:14:38Z	2023-09-04T15:14:07Z	
17811359ae2289	started	replica	waw   	3 total, 3 passing	flyio/postgres-flex:15.3 (v0.0.43)	2023-06-30T13:15:06Z	2023-09-04T15:13:30Z

I can see that this new image is just 5 days old, so the cluster was definitely created with version 15.2 originally. But then why does it show that it was created with version 15.3?

EDIT: By the way, from what I remember, I also updated the version to 15.3 with the command flyctl image update. I just forgot that I did so. But still, after updating, it should be possible to use the newest image version.
