Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2... complete
psql: error: could not translate host name "app-db.internal" to address: Name or service not known
Thanks for your response. I was in the middle of adding a replica to see if that would somehow resolve my issue.
$ flyctl status -a gameday-math-staging-db
App
  Name     = gameday-math-staging-db
  Owner    = gamedaymath
  Version  = 20
  Status   = running
  Hostname = gameday-math-staging-db.fly.dev
  Platform = nomad

Instances
ID       PROCESS VERSION REGION DESIRED STATUS            HEALTH CHECKS                   RESTARTS CREATED
e3638027 app     20      mia    run     running (replica) 3 total, 2 passing, 1 critical  0        12m2s ago
e7fba105 app     20      mia    run     running (leader)  3 total, 3 passing              4        2h55m ago
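The replica is showing one critical health check; this is roughly how I've been looking at which check that is (assuming flyctl checks list covers this Nomad app):
$ flyctl checks list -a gameday-math-staging-db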
$ flyctl logs --app gameday-math-staging-db
2022-11-22T20:27:44Z app[e3638027] mia [info] "wal_compression": "on",
2022-11-22T20:27:44Z app[e3638027] mia [info] "wal_keep_size": "0",
2022-11-22T20:27:44Z app[e3638027] mia [info] "work_mem": "64MB"
2022-11-22T20:27:44Z app[e3638027] mia [info] },
2022-11-22T20:27:44Z app[e3638027] mia [info] "maxStandbysPerSender": 50,
2022-11-22T20:27:44Z app[e3638027] mia [info] "deadKeeperRemovalInterval": "1h"
2022-11-22T20:27:44Z app[e3638027] mia [info]}
2022-11-22T20:27:44Z app[e3638027] mia [info]generated new config
2022-11-22T20:27:44Z app[e3638027] mia [info]keeper | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]sentinel | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]proxy | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]exporter | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]exporter | INFO[0000] Starting Server: :9187 source="postgres_exporter.go:1837"
2022-11-22T20:27:44Z app[e3638027] mia [info]proxy | [WARNING] 325/202744 (554) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2022-11-22T20:27:44Z app[e3638027] mia [info]proxy | [NOTICE] 325/202744 (554) : New worker #1 (570) forked
2022-11-22T20:27:45Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:47Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:47Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:48Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:49Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:50Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:51Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy | [WARNING] 325/202751 (570) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy | [NOTICE] 325/202751 (570) : haproxy version is 2.2.9-2+deb11u3
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy | [NOTICE] 325/202751 (570) : path to executable is /usr/sbin/haproxy
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy | [ALERT] 325/202751 (570) : backend 'bk_db' has no server available!
2022-11-22T20:27:52Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:53Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:54Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:55Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:56Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:27:57Z app[e7fba105] mia [info]keeper | 2022-11-22 20:27:57.368 UTC [27800] WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | pg_basebackup: checkpoint completed
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | pg_basebackup: write-ahead log start point: 16/AB000028 on timeline 1
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | pg_basebackup: starting background WAL receiver
2022-11-22T20:27:57Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | 0/588428 kB (0%), 0/1 tablespace (/data/postgres/backup_label )
2022-11-22T20:27:57Z app[e7fba105] mia [info]keeper | 2022-11-22 20:27:57.502 UTC [27807] ERROR: replication slot "stolon_7063fb99" does not exist
2022-11-22T20:27:57Z app[e7fba105] mia [info]keeper | 2022-11-22 20:27:57.502 UTC [27807] STATEMENT: START_REPLICATION SLOT "stolon_7063fb99" 16/AB000000 TIMELINE 1
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper | pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "stolon_7063fb99" does not exist
2022-11-22T20:27:58Z app[e3638027] mia [info]keeper | 12985/588428 kB (2%), 0/1 tablespace (/data/postgres/base/16386/99537 )
2022-11-22T20:27:58Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:59Z app[e3638027] mia [info]keeper | 68392/588428 kB (11%), 0/1 tablespace (/data/postgres/base/16386/90809 )
2022-11-22T20:27:59Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:00Z app[e3638027] mia [info]keeper | 127906/588428 kB (21%), 0/1 tablespace (/data/postgres/base/16386/90783 )
2022-11-22T20:28:00Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:01Z app[e3638027] mia [info]keeper | 201062/588428 kB (34%), 0/1 tablespace (/data/postgres/base/16386/90790 )
2022-11-22T20:28:01Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:02Z app[e3638027] mia [info]keeper | 268608/588428 kB (45%), 0/1 tablespace (/data/postgres/base/16386/90779 )
2022-11-22T20:28:02Z app[e7fba105] mia [info]sentinel | 2022-11-22T20:28:02.049Z WARN cmd/sentinel.go:287 received db state for unexpected db uid {"receivedDB": "", "db": "7063fb99", "keeper": "88370a2"}
2022-11-22T20:28:02Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:02Z app[e3638027] mia [info]keeper is healthy, db is healthy, role: standby
2022-11-22T20:28:03Z app[e3638027] mia [info]keeper | 351742/588428 kB (59%), 0/1 tablespace (/data/postgres/base/16386/90803 )
2022-11-22T20:28:04Z app[e3638027] mia [info]keeper | 411057/588428 kB (69%), 0/1 tablespace (/data/postgres/base/16386/90796 )
2022-11-22T20:28:05Z app[e3638027] mia [info]keeper | 488089/588428 kB (82%), 0/1 tablespace (/data/postgres/base/16386/16920 )
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | 582285/588428 kB (98%), 0/1 tablespace (/data/postgres/base/13756/3601_fsm )
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:06Z app[e7fba105] mia [info]keeper | 2022-11-22 20:28:06.197 UTC [27800] WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | 588437/588437 kB (100%), 0/1 tablespace (/data/postgres/global/pg_control )
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | 588437/588437 kB (100%), 1/1 tablespace
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | pg_basebackup: write-ahead log end point: 16/AB000100
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | pg_basebackup: waiting for background process to finish streaming ...
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | pg_basebackup: error: child process exited with exit code 1
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | pg_basebackup: removing data directory "/data/postgres"
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper | 2022-11-22T20:28:06.329Z ERROR cmd/keeper.go:1364 failed to resync from followed instance {"error": "sync error: exit status 1"}
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | 2022-11-22T20:28:11.359Z ERROR cmd/keeper.go:1109 db failed to initialize or resync
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | pg_basebackup: checkpoint completed
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | pg_basebackup: write-ahead log start point: 16/AD000028 on timeline 1
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | pg_basebackup: starting background WAL receiver
2022-11-22T20:28:11Z app[e7fba105] mia [info]keeper | 2022-11-22 20:28:11.461 UTC [27874] WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper | 0/588428 kB (0%), 0/1 tablespace (/data/postgres/backup_label )
2022-11-22T20:28:12Z app[e3638027] mia [info]keeper | 51816/588428 kB (8%), 0/1 tablespace (/data/postgres/base/16386/90809 )
2022-11-22T20:28:13Z app[e3638027] mia [info]keeper | 167014/588428 kB (28%), 0/1 tablespace (/data/postgres/base/16386/90790 )
2022-11-22T20:28:14Z app[e3638027] mia [info]keeper | 250336/588428 kB (42%), 0/1 tablespace (/data/postgres/base/16386/90779 )
2022-11-22T20:28:15Z app[e3638027] mia [info]keeper | 314525/588428 kB (53%), 0/1 tablespace (/data/postgres/base/16386/103836 )
2022-11-22T20:28:16Z app[e3638027] mia [info]keeper | 410001/588428 kB (69%), 0/1 tablespace (/data/postgres/base/16386/90796 )
2022-11-22T20:28:17Z app[e3638027] mia [info]keeper | 479170/588428 kB (81%), 0/1 tablespace (/data/postgres/base/16386/99372 )
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 566853/588428 kB (96%), 0/1 tablespace (/data/postgres/base/16386/90762 )
2022-11-22T20:28:18Z app[e7fba105] mia [info]keeper | 2022-11-22 20:28:18.180 UTC [27874] WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | WARNING: skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 588437/588437 kB (100%), 0/1 tablespace (/data/postgres/global/pg_control )
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 588437/588437 kB (100%), 1/1 tablespace
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | pg_basebackup: write-ahead log end point: 16/AD000100
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | pg_basebackup: waiting for background process to finish streaming ...
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | pg_basebackup: syncing data to disk ...
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | pg_basebackup: renaming backup_manifest.tmp to backup_manifest
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | pg_basebackup: base backup completed
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.481 UTC [672] LOG: starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.481 UTC [672] LOG: listening on IPv6 address "fdaa:0:ddd8:a7b:88:3:70a:2", port 5433
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.482 UTC [672] LOG: listening on Unix socket "/tmp/.s.PGSQL.5433"
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.485 UTC [673] LOG: database system was interrupted; last known up at 2022-11-22 20:28:11 UTC
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.533 UTC [673] LOG: entering standby mode
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.536 UTC [673] LOG: redo starts at 16/AD000028
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.536 UTC [673] LOG: consistent recovery state reached at 16/AD000100
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.537 UTC [672] LOG: database system is ready to accept read-only connections
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper | 2022-11-22 20:28:18.547 UTC [677] LOG: started streaming WAL from primary at 16/AE000000 on timeline 1
The issue seems to be with the private networking.
After running flyctl ssh console -a gameday-math-staging-db, I cannot resolve the internal hostname from DATABASE_URL (DATABASE_URL=postgres://****:****@top2.nearest.of.gameday-math-staging-db.internal:5432/gameday_math_staging).
It looks like the internal DNS entries for the Postgres VMs are missing. I can't figure out what to run from flyctl to add those entries; it feels like the VMs aren't registering with DNS when they start, and I don't think that's a problem I can solve on my own.
I am able to use flyctl ssh console and connect to each Postgres VM and then use psql to query the database. So, postgres itself seems to be running fine on the VMs.
But anytime .internal addresses are involved, things fail:
$ flyctl postgres connect -a gameday-math-staging-db
Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2... complete
psql: error: could not translate host name "gameday-math-staging-db.internal" to address: Name or service not known
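To make the DNS part concrete, this is the kind of check I've been running from inside one of the DB VMs (assuming dig is present in the image, and that Fly's internal resolver answers at fdaa::3, which is my understanding from the private-networking docs):
$ flyctl ssh console -a gameday-math-staging-db
# dig +short aaaa gameday-math-staging-db.internal
# dig +short aaaa gameday-math-staging-db.internal @fdaa::3
# dig +short txt _apps.internal @fdaa::3
If I understand the docs right, that last query should return the names of the apps the resolver thinks belong to the organization, which would show whether this Postgres app is registered at all.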
@jswanner please try running fly restart gameday-math-staging-db. You're probably right about a VM not getting registered in DNS; a restart will rerun that registration process (there's currently no way to trigger it without a restart).
Thanks for your response. Unfortunately, it didn’t work:
$ flyctl postgres restart -a gameday-math-staging-db
Attempting to restart replica(s)
Restarting e3638027-0b73-e230-2a74-f66d5734c934
Performing a failover
Postgres cluster has been successfully restarted!
$ flyctl postgres connect -a gameday-math-staging-db
Error can't get role for fdaa:0:ddd8:a7b:88:3:70a:2: 500: context deadline exceeded
Granted, I had to use flyctl postgres restart and not fly restart:
$ fly restart gameday-math-staging-db
Error this command has been removed. please use `fly apps restart` instead
$ fly apps restart gameday-math-staging-db
Error postgres apps should use `fly pg restart` instead
Update:
Trying to connect to Postgres now fails with the earlier error again:
$ flyctl postgres connect -a gameday-math-staging-db
Connecting to fdaa:0:ddd8:a7b:88:3:70a:2... complete
psql: error: could not translate host name "gameday-math-staging-db.internal" to address: Name or service not known
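As a stopgap I can still reach Postgres by skipping DNS entirely and pointing psql at one of the VMs' 6PN addresses from the logs, something like this (assuming 5433 is the port Postgres itself listens on, per the log lines above, and reusing the credentials from DATABASE_URL):
$ flyctl ssh console -a gameday-math-staging-db
# psql "postgres://****:****@[fdaa:0:ddd8:a7b:88:3:70a:2]:5433/gameday_math_staging"
So, as before, the database itself is reachable; it's only the .internal name resolution that's broken.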
Oh this is unrelated to DNS issues, I think. It seems like your PG nodes are restarting. I believe your disk may be filling up. Try resizing it to 5+ GB and see if that gets you unstuck?
I've increased the volume size, but it doesn't seem to have helped.
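For reference, this is roughly how I'm checking the volumes and disk usage after the resize (assuming the data volume is mounted at /data, as the log paths suggest):
$ flyctl volumes list -a gameday-math-staging-db
$ flyctl ssh console -a gameday-math-staging-db
# df -h /data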
I'm pretty sure it was me who triggered those restarts.
As it stands right now, after restarting the postgres app, the logs fill up with the following message:
2022-11-22T22:30:43.171 app[e7fba105] mia [info] sentinel | 2022-11-22T22:30:43.170Z WARN cmd/sentinel.go:276 no keeper info available {"db": "b1c0ee90", "keeper": "88370a2"}
The following is an excerpt of the logs prior to that:
2022-11-22T22:32:04Z app[e7fba105] mia [info]exporter | INFO[0004] Established new database connection to "fdaa:0:ddd8:a7b:2cc3:0:d185:2:5433". source="postgres_exporter.go:970"
2022-11-22T22:32:05Z app[e7fba105] mia [info]exporter | ERRO[0005] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:ddd8:a7b:2cc3:0:d185:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:ddd8:a7b:2cc3:0:d185:2]:5433: connect: connection refused source="postgres_exporter.go:1658"
2022-11-22T22:32:06Z app[e7fba105] mia [info]error connecting to local postgres context deadline exceeded
2022-11-22T22:32:06Z app[e7fba105] mia [info]checking stolon status
2022-11-22T22:32:06Z app[e7fba105] mia [info]keeper | 2022-11-22T22:32:06.402Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-22T22:32:06Z app[e7fba105] mia [info]keeper is healthy, db is healthy, role: master
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy | [WARNING] 325/223206 (569) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy | [NOTICE] 325/223206 (569) : haproxy version is 2.2.9-2+deb11u3
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy | [NOTICE] 325/223206 (569) : path to executable is /usr/sbin/haproxy
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy | [ALERT] 325/223206 (569) : backend 'bk_db' has no server available!
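Since the keeper complains that /tmp/.s.PGSQL.5433 is missing, I can at least check on that node whether Postgres is actually up and which role it thinks it has, with something like this (assuming the local socket lets the postgres user in without a password):
$ flyctl ssh console -a gameday-math-staging-db
# ls -l /tmp/.s.PGSQL.5433
# psql -h /tmp -p 5433 -U postgres -c 'select pg_is_in_recovery();'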
Can you try fly image update and see if it helps? I think we've made improvements since the build you're running that may make it more obvious what's happening (or fix it entirely).
Ah I think we found the issue. It seems like you moved the db between organizations? DNS didn’t get the message, so it doesn’t think you’re allowed to ask for IPs. Give it a bit.
Oh, weird. Sorry, I guess I should have brought that up, but I moved the apps a week or so ago? Everything had been up and running since I moved them, so I wasn't thinking of that as part of the problem.