panic: FLY_CONSUL_URL or CONSUL_URL are required with postgres-ha deploy

juliancarrivick · February 6, 2024, 7:27am

I’ve been able to partially fix this with the following steps:

1. fly scale count 0. Note: because of the aforementioned problem with volumes not being mounted, this lost everything, but I had the data backed up via wal-g.
2. fly consul attach (see here)
3. fly scale count 1
4. This time the volume was mounted (but empty), so I restored the data from the wal-g backup.
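The first three steps above can be wrapped in a small script. This is a minimal sketch, assuming `fly` is on PATH and is run from the app's directory; the `DRY_RUN` switch and the function names are hypothetical additions for illustration. The wal-g restore is left as a manual step because the exact restore command depends on how the backups were configured.

```shell
#!/bin/sh
# Hypothetical helper: print each command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Sketch of steps 1-3 from the post. Step 1 scales the app to zero machines;
# in the post this lost the data because the volumes were not mounted,
# so only run it with a backup in hand.
reattach_consul() {
  run fly scale count 0 || return 1
  run fly consul attach || return 1
  run fly scale count 1 || return 1
  # Step 4 (restoring from the wal-g backup onto the new volume) is done by hand.
  echo "now restore the data from the wal-g backup"
}
```

Running `DRY_RUN=1 reattach_consul` prints the commands without touching the app, which is a cheap way to review the sequence before running it for real.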

At this point I was stuck in a boot loop: on startup it tried to update the database with the current OPERATOR_PASSWORD, but failed because the database was in a read-only state. The node was also identified as a replica in the Fly UI. I fixed this by forcing a promotion (thanks to this SO answer):

su stolon
/usr/lib/postgresql/14/bin/pg_ctl promote -D /data/postgres
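A slightly more defensive version of the forced promotion first checks that the node really is in recovery (read-only) mode before promoting it. This is a sketch, assuming it runs as root on the machine and that `psql` can reach the local Postgres through the standard libpq variables (PGHOST, PGPORT, PGUSER, PGPASSWORD), which you would need to set for your cluster; `in_recovery` and `promote_if_replica` are hypothetical helper names. The `pg_ctl` path and data directory are the ones from the commands above.

```shell
#!/bin/sh
PG_BIN=/usr/lib/postgresql/14/bin
PG_DATA=/data/postgres

# Prints "t" if this node is a read-only standby, "f" if it is a primary.
# Connection settings come from the standard libpq environment variables.
in_recovery() {
  psql -tAc 'SELECT pg_is_in_recovery()'
}

promote_if_replica() {
  state=$(in_recovery) || { echo "could not query postgres" >&2; return 1; }
  if [ "$state" = "t" ]; then
    echo "node is in recovery, promoting"
    # Same command as in the post, run as the stolon user.
    su stolon -c "$PG_BIN/pg_ctl promote -D $PG_DATA"
  else
    echo "node is already a primary, nothing to do"
  fi
}
```

Checking `pg_is_in_recovery()` first makes the script safe to re-run: on a node that is already the primary it does nothing.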

At this point I had a working leader but no redundancy, so I ran fly scale count 2, but the new replica got stuck in a boot loop again, repeatedly checking stolon status. Its database never seems to come up, because running

export $(cat /data/.env | xargs)
stolonctl status

would give me something like the following:

=== Keepers ===

UID             HEALTHY PG LISTENADDRESS                        PG HEALTHY      PG WANTEDGENERATION     PG CURRENTGENERATION
232f9d636332    true    fdaa:0:47b5:a7b:232:f6de:cf8c:2:5433    true            1                       0
233582125d22    false   (no db assigned)                        false           0                       0
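When polling the cluster repeatedly, the Keepers table can be filtered down to the keepers reporting HEALTHY = false. A minimal sketch, assuming the column layout shown above (UID in the first column, HEALTHY in the second); `unhealthy_keepers` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `stolonctl status` output on stdin and print the UID of every keeper
# whose HEALTHY column is "false".
unhealthy_keepers() {
  awk '
    /^=== Keepers ===/ { in_keepers = 1; header_seen = 0; next }
    /^===/             { in_keepers = 0; next }   # any other section ends the table
    in_keepers && $1 == "UID" { header_seen = 1; next }
    in_keepers && header_seen && $2 == "false" { print $1 }
  '
}
```

After exporting the environment as above, `stolonctl status | unhealthy_keepers` would print `233582125d22` for the output shown.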

Meanwhile, I'm getting constant errors in the monitoring console:

2024-02-06T06:56:19.675 app[6e824532a22108] syd [info] exporter | INFO[0046] Established new database connection to "fdaa:0:47b5:a7b:233:5821:25d2:2:5433". source="postgres_exporter.go:970"

2024-02-06T06:56:20.141 app[6e824532a22108] syd [info] checking stolon status

2024-02-06T06:56:20.676 app[6e824532a22108] syd [info] exporter | ERRO[0047] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:47b5:a7b:233:5821:25d2:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:47b5:a7b:233:5821:25d2:2]:5433: connect: connection refused source="postgres_exporter.go:1658"

2024-02-06T06:56:21.141 app[6e824532a22108] syd [info] checking stolon status

2024-02-06T06:56:22.142 app[6e824532a22108] syd [info] checking stolon status

However, I can connect to the DB directly if I SSH into the failing machine and run psql, and the leader's password works there too, so some synchronisation is happening. I will probably give up on this soon and leave a single leader until I'm ready to move to postgres-flex. Unfortunately, creating a postgres-flex app is currently returning a 504, but that's another issue.
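One way to narrow down the mismatch between the working psql session and the exporter's "connection refused" is to probe the keeper's Postgres port both on localhost and on the machine's private IPv6 address from the log. A sketch, assuming `pg_isready` (shipped with the PostgreSQL client tools) is on PATH; `probe_keeper` is a hypothetical helper. If only localhost answers, Postgres on that keeper is probably not yet listening on the private address.

```shell
#!/bin/sh
# Probe a Postgres port on localhost and on the given private address.
# Usage: probe_keeper ADDRESS [PORT]   (PORT defaults to 5433, as in the logs)
probe_keeper() {
  addr=$1
  port=${2:-5433}
  for host in localhost "$addr"; do
    # pg_isready exit codes: 0 accepting, 1 rejecting, 2 no response, 3 no attempt
    if pg_isready -h "$host" -p "$port" >/dev/null 2>&1; then
      echo "$host:$port accepting connections"
    else
      echo "$host:$port not accepting connections (pg_isready exit $?)"
    fi
  done
}
```

For the failing replica in the logs above, this would be `probe_keeper fdaa:0:47b5:a7b:233:5821:25d2:2`.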
