Unable to connect to postgres

I’m not able to connect to postgres via flyctl:

flyctl postgres connect -a app-db

Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2... complete
psql: error: could not translate host name "app-db.internal" to address: Name or service not known

My application also isn’t able to connect to it:

tcp connect (top2.nearest.of.app-db.internal:5432): non-existing domain - :nxdomain

I’m guessing I should see an answer when using flyctl dig:

flyctl dig AAAA app-db.internal

But, I do not:

;; opcode: QUERY, status: NXDOMAIN, id: 50520
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;app-db.internal. IN AAAA

How do I fix what I believe to be an internal network problem?

Thanks for your time and help.

Hi @jswanner,

I assume that app-db is not the actual name of your app, but since you didn’t mention it, that’s worth verifying.

Also, please post the output of fly status -a app-db, using the actual name of your app. Does the app even exist?

The command to connect to my own Postgres DB console does work, so it’s not a systemic problem.

Hi @Mark,

Thanks for your response. I was in the middle of adding a replica to see if I could resolve my issue that way, somehow.

$ flyctl status -a gameday-math-staging-db
App
  Name     = gameday-math-staging-db
  Owner    = gamedaymath
  Version  = 20
  Status   = running
  Hostname = gameday-math-staging-db.fly.dev
  Platform = nomad

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS                  HEALTH CHECKS                   RESTARTS        CREATED
e3638027        app     20      mia     run     running (replica)       3 total, 2 passing, 1 critical  0               12m2s ago
e7fba105        app     20      mia     run     running (leader)        3 total, 3 passing              4               2h55m ago

Thanks for showing the status.

You can also get the logs for the DB app if that’s helpful: fly logs -a gameday-math-staging-db.

The Postgres console should connect to the primary/leader, since any changes need to happen there.

I’m not aware of any issues in mia either: https://status.flyio.net/
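
One more thought: as a possible workaround while the .internal name isn’t resolving, you might be able to reach the database by proxying a local port to the app over WireGuard instead of relying on internal DNS. Roughly (the local port and the password placeholder here are just examples):

$ fly proxy 15432:5432 -a gameday-math-staging-db
$ psql postgres://postgres:<password>@localhost:15432

I’m not certain the proxy path avoids whatever is broken here, so treat that as a sketch rather than a fix.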

$ flyctl logs --app gameday-math-staging-db
2022-11-22T20:27:44Z app[e3638027] mia [info]        "wal_compression": "on",
2022-11-22T20:27:44Z app[e3638027] mia [info]        "wal_keep_size": "0",
2022-11-22T20:27:44Z app[e3638027] mia [info]        "work_mem": "64MB"
2022-11-22T20:27:44Z app[e3638027] mia [info]    },
2022-11-22T20:27:44Z app[e3638027] mia [info]    "maxStandbysPerSender": 50,
2022-11-22T20:27:44Z app[e3638027] mia [info]    "deadKeeperRemovalInterval": "1h"
2022-11-22T20:27:44Z app[e3638027] mia [info]}
2022-11-22T20:27:44Z app[e3638027] mia [info]generated new config
2022-11-22T20:27:44Z app[e3638027] mia [info]keeper   | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]sentinel | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]proxy    | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]exporter | Running...
2022-11-22T20:27:44Z app[e3638027] mia [info]exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2022-11-22T20:27:44Z app[e3638027] mia [info]proxy    | [WARNING] 325/202744 (554) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2022-11-22T20:27:44Z app[e3638027] mia [info]proxy    | [NOTICE] 325/202744 (554) : New worker #1 (570) forked
2022-11-22T20:27:45Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:47Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:47Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:48Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:49Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:50Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:51Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy    | [WARNING] 325/202751 (570) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy    | [NOTICE] 325/202751 (570) : haproxy version is 2.2.9-2+deb11u3
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy    | [NOTICE] 325/202751 (570) : path to executable is /usr/sbin/haproxy
2022-11-22T20:27:51Z app[e3638027] mia [info]proxy    | [ALERT] 325/202751 (570) : backend 'bk_db' has no server available!
2022-11-22T20:27:52Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:53Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:54Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:55Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:56Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   | pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   | WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:27:57Z app[e7fba105] mia [info]keeper   | 2022-11-22 20:27:57.368 UTC [27800] WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   | pg_basebackup: checkpoint completed
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   | pg_basebackup: write-ahead log start point: 16/AB000028 on timeline 1
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   | pg_basebackup: starting background WAL receiver
2022-11-22T20:27:57Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   |      0/588428 kB (0%), 0/1 tablespace (/data/postgres/backup_label        )
2022-11-22T20:27:57Z app[e7fba105] mia [info]keeper   | 2022-11-22 20:27:57.502 UTC [27807] ERROR:  replication slot "stolon_7063fb99" does not exist
2022-11-22T20:27:57Z app[e7fba105] mia [info]keeper   | 2022-11-22 20:27:57.502 UTC [27807] STATEMENT:  START_REPLICATION SLOT "stolon_7063fb99" 16/AB000000 TIMELINE 1
2022-11-22T20:27:57Z app[e3638027] mia [info]keeper   | pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR:  replication slot "stolon_7063fb99" does not exist
2022-11-22T20:27:58Z app[e3638027] mia [info]keeper   |  12985/588428 kB (2%), 0/1 tablespace (/data/postgres/base/16386/99537    )
2022-11-22T20:27:58Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:27:59Z app[e3638027] mia [info]keeper   |  68392/588428 kB (11%), 0/1 tablespace (/data/postgres/base/16386/90809    )
2022-11-22T20:27:59Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:00Z app[e3638027] mia [info]keeper   | 127906/588428 kB (21%), 0/1 tablespace (/data/postgres/base/16386/90783    )
2022-11-22T20:28:00Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:01Z app[e3638027] mia [info]keeper   | 201062/588428 kB (34%), 0/1 tablespace (/data/postgres/base/16386/90790    )
2022-11-22T20:28:01Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:02Z app[e3638027] mia [info]keeper   | 268608/588428 kB (45%), 0/1 tablespace (/data/postgres/base/16386/90779    )
2022-11-22T20:28:02Z app[e7fba105] mia [info]sentinel | 2022-11-22T20:28:02.049Z        WARN    cmd/sentinel.go:287     received db state for unexpected db uid {"receivedDB": "", "db": "7063fb99", "keeper": "88370a2"}
2022-11-22T20:28:02Z app[e3638027] mia [info]checking stolon status
2022-11-22T20:28:02Z app[e3638027] mia [info]keeper is healthy, db is healthy, role: standby
2022-11-22T20:28:03Z app[e3638027] mia [info]keeper   | 351742/588428 kB (59%), 0/1 tablespace (/data/postgres/base/16386/90803    )
2022-11-22T20:28:04Z app[e3638027] mia [info]keeper   | 411057/588428 kB (69%), 0/1 tablespace (/data/postgres/base/16386/90796    )
2022-11-22T20:28:05Z app[e3638027] mia [info]keeper   | 488089/588428 kB (82%), 0/1 tablespace (/data/postgres/base/16386/16920    )
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | 582285/588428 kB (98%), 0/1 tablespace (/data/postgres/base/13756/3601_fsm )
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:06Z app[e7fba105] mia [info]keeper   | 2022-11-22 20:28:06.197 UTC [27800] WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | 588437/588437 kB (100%), 0/1 tablespace (/data/postgres/global/pg_control   )
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | 588437/588437 kB (100%), 1/1 tablespace
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | pg_basebackup: write-ahead log end point: 16/AB000100
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | pg_basebackup: waiting for background process to finish streaming ...
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | pg_basebackup: error: child process exited with exit code 1
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | pg_basebackup: removing data directory "/data/postgres"
2022-11-22T20:28:06Z app[e3638027] mia [info]keeper   | 2022-11-22T20:28:06.329Z        ERROR   cmd/keeper.go:1364      failed to resync from followed instance {"error": "sync error: exit status 1"}
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   | 2022-11-22T20:28:11.359Z        ERROR   cmd/keeper.go:1109      db failed to initialize or resync
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   | pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   | WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   | pg_basebackup: checkpoint completed
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   | pg_basebackup: write-ahead log start point: 16/AD000028 on timeline 1
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   | pg_basebackup: starting background WAL receiver
2022-11-22T20:28:11Z app[e7fba105] mia [info]keeper   | 2022-11-22 20:28:11.461 UTC [27874] WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:11Z app[e3638027] mia [info]keeper   |      0/588428 kB (0%), 0/1 tablespace (/data/postgres/backup_label        )
2022-11-22T20:28:12Z app[e3638027] mia [info]keeper   |  51816/588428 kB (8%), 0/1 tablespace (/data/postgres/base/16386/90809    )
2022-11-22T20:28:13Z app[e3638027] mia [info]keeper   | 167014/588428 kB (28%), 0/1 tablespace (/data/postgres/base/16386/90790    )
2022-11-22T20:28:14Z app[e3638027] mia [info]keeper   | 250336/588428 kB (42%), 0/1 tablespace (/data/postgres/base/16386/90779    )
2022-11-22T20:28:15Z app[e3638027] mia [info]keeper   | 314525/588428 kB (53%), 0/1 tablespace (/data/postgres/base/16386/103836   )
2022-11-22T20:28:16Z app[e3638027] mia [info]keeper   | 410001/588428 kB (69%), 0/1 tablespace (/data/postgres/base/16386/90796    )
2022-11-22T20:28:17Z app[e3638027] mia [info]keeper   | 479170/588428 kB (81%), 0/1 tablespace (/data/postgres/base/16386/99372    )
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 566853/588428 kB (96%), 0/1 tablespace (/data/postgres/base/16386/90762    )
2022-11-22T20:28:18Z app[e7fba105] mia [info]keeper   | 2022-11-22 20:28:18.180 UTC [27874] WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | WARNING:  skipping special file "./postgresql.auto.conf"
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 588437/588437 kB (100%), 0/1 tablespace (/data/postgres/global/pg_control   )
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 588437/588437 kB (100%), 1/1 tablespace
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | pg_basebackup: write-ahead log end point: 16/AD000100
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | pg_basebackup: waiting for background process to finish streaming ...
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | pg_basebackup: syncing data to disk ...
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | pg_basebackup: renaming backup_manifest.tmp to backup_manifest
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | pg_basebackup: base backup completed
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.481 UTC [672] LOG:  starting PostgreSQL 14.4 (Debian 14.4-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.481 UTC [672] LOG:  listening on IPv6 address "fdaa:0:ddd8:a7b:88:3:70a:2", port 5433
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.482 UTC [672] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.485 UTC [673] LOG:  database system was interrupted; last known up at 2022-11-22 20:28:11 UTC
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.533 UTC [673] LOG:  entering standby mode
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.536 UTC [673] LOG:  redo starts at 16/AD000028
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.536 UTC [673] LOG:  consistent recovery state reached at 16/AD000100
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.537 UTC [672] LOG:  database system is ready to accept read-only connections
2022-11-22T20:28:18Z app[e3638027] mia [info]keeper   | 2022-11-22 20:28:18.547 UTC [677] LOG:  started streaming WAL from primary at 16/AE000000 on timeline 1

The issue seems to be with the private networking.

After running flyctl ssh console -a gameday-math-staging-db

I cannot resolve the internal hostname from DATABASE_URL (DATABASE_URL=postgres://****:****@top2.nearest.of.gameday-math-staging-db.internal:5432/gameday_math_staging):

root@e3638027:/# dig +short AAAA gameday-math-staging-db.internal
root@e3638027:/# dig +short AAAA top2.nearest.of.gameday-math-staging-db.internal

From that machine, I can resolve the app’s address, so some parts of the DNS setup are working, just not for the Postgres instances:

root@e3638027:/# dig +short AAAA gameday-math-staging.internal
fdaa:0:ddd8:a7b:2cc3:1:ede2:2

It looks like the internal address entries for the Postgres VMs are missing from DNS. I can’t figure out what to run from flyctl to add those DNS entries. It feels like the VMs aren’t registering with DNS when they start, and I don’t think that’s a problem I can solve myself.

I am able to use flyctl ssh console to connect to each Postgres VM and then use psql to query the database, so Postgres itself seems to be running fine on the VMs.
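
For reference, this is roughly what does work from inside a VM (the port and socket path come from the keeper log lines above; the psql user here is a placeholder, so the exact one may differ):

$ flyctl ssh console -a gameday-math-staging-db
root@e3638027:/# psql -h /tmp -p 5433 -U postgres -d gameday_math_staging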

But anytime .internal addresses are involved, things fail:

$ flyctl postgres connect -a gameday-math-staging-db
Connecting to fdaa:0:ddd8:a7b:2cc3:0:d185:2... complete
psql: error: could not translate host name "gameday-math-staging-db.internal" to address: Name or service not known

@jswanner please try running fly restart gameday-math-staging-db. You are probably correct about a VM not getting registered in DNS; a restart will rerun that process (there’s currently no way to do that without a restart).
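
You could also double-check from inside the VM whether the records exist at all by pointing dig at the internal resolver explicitly (as I understand it, its address is the nameserver entry in /etc/resolv.conf on Fly VMs), something like:

root@e3638027:/# dig +short aaaa gameday-math-staging-db.internal @$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)

If that still comes back empty after the restart, it’s something we need to fix on our end.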

Hi @kurt,

Thanks for your response. Unfortunately, it didn’t work:

$ flyctl postgres restart -a gameday-math-staging-db
Attempting to restart replica(s)
 Restarting e3638027-0b73-e230-2a74-f66d5734c934
Performing a failover
Postgres cluster has been successfully restarted!
$ flyctl postgres connect -a gameday-math-staging-db 
Error can't get role for fdaa:0:ddd8:a7b:88:3:70a:2: 500: context deadline exceeded

Granted, I had to use flyctl postgres restart and not fly restart:

$ fly restart gameday-math-staging-db
Error this command has been removed. please use `fly apps restart` instead
$ fly apps restart gameday-math-staging-db                                                                                                                                  
Error postgres apps should use `fly pg restart` instead

Update:

Trying to connect to Postgres now fails with the earlier error again:

$ flyctl postgres connect -a gameday-math-staging-db
Connecting to fdaa:0:ddd8:a7b:88:3:70a:2... complete
psql: error: could not translate host name "gameday-math-staging-db.internal" to address: Name or service not known

Oh this is unrelated to DNS issues, I think. It seems like your PG nodes are restarting. I believe your disk may be filling up. Try resizing it to 5+ GB and see if that gets you unstuck?
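
If it helps, extending a volume is roughly this (the volume ID is whatever fly volumes list shows for your DB app; -s is the new size in GB):

$ fly volumes list -a gameday-math-staging-db
$ fly volumes extend <volume-id> -s 5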

I’ve increased the volume size, but it doesn’t seem to have helped.

I’m pretty sure I was the one who triggered that.

As it stands right now, after restarting the postgres app, the logs fill up with the following message:

2022-11-22T22:30:43.171 app[e7fba105] mia [info] sentinel | 2022-11-22T22:30:43.170Z WARN cmd/sentinel.go:276 no keeper info available {"db": "b1c0ee90", "keeper": "88370a2"}

The following is an excerpt of the logs prior to that:

2022-11-22T22:32:04Z app[e7fba105] mia [info]exporter | INFO[0004] Established new database connection to "fdaa:0:ddd8:a7b:2cc3:0:d185:2:5433".  source="postgres_exporter.go:970"
2022-11-22T22:32:05Z app[e7fba105] mia [info]exporter | ERRO[0005] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:ddd8:a7b:2cc3:0:d185:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:ddd8:a7b:2cc3:0:d185:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2022-11-22T22:32:06Z app[e7fba105] mia [info]error connecting to local postgres context deadline exceeded
2022-11-22T22:32:06Z app[e7fba105] mia [info]checking stolon status
2022-11-22T22:32:06Z app[e7fba105] mia [info]keeper   | 2022-11-22T22:32:06.402Z        ERROR   cmd/keeper.go:719       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2022-11-22T22:32:06Z app[e7fba105] mia [info]keeper is healthy, db is healthy, role: master
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy    | [WARNING] 325/223206 (569) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy    | [NOTICE] 325/223206 (569) : haproxy version is 2.2.9-2+deb11u3
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy    | [NOTICE] 325/223206 (569) : path to executable is /usr/sbin/haproxy
2022-11-22T22:32:06Z app[e7fba105] mia [info]proxy    | [ALERT] 325/223206 (569) : backend 'bk_db' has no server available!

Can you try fly image update and see if it helps? I think we’ve made improvements since the build you’re running that may make it more obvious what’s happening (or fix it entirely).

Actually, I updated the image earlier today, after the problems started; it was one of the things I tried in order to fix the problem:

$ flyctl image update -a gameday-math-staging-db 
Error image is already running the latest image

Thanks again for your help.

Ah, I think we found the issue. It seems like you moved the DB between organizations? DNS didn’t get the message, so it doesn’t think you’re allowed to ask for those IPs. Give it a bit.

Oh, weird. Sorry, I guess I should have brought that up: I moved the apps a week or so ago. Everything had been up and running since the move, so I wasn’t thinking of that as part of the problem.

It’s working! Thanks so much for your help!

Ah yes, there was a bug for a long time where moving an app to a different org did not propagate to our internal state. It was fixed late last week.

Hopefully that won’t happen again.