Elixir server can't connect to Postgres DB

Hi. Last night my LiveView app lost its connection to the Postgres DB and is now stuck in a perpetual “pending” state, even after I restarted the Postgres instance. How can I fix this?

Can you take a look at your app logs to identify any specific issues?

fly logs -a your-app-name

Phoenix is very helpful about letting users know what’s happening :slight_smile:

Feel free to paste the output here if there’s no sensitive data in it.

Thanks for the quick response :slight_smile:
This is a relevant excerpt from the Elixir app log:

2022-04-09T20:29:57Z app[ed47e1cc] ams [info]20:29:57.665 [error] Postgrex.Protocol (#PID<0.1910.2>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (intersplat-db.internal:5432): host is unreachable - :ehostunreach
2022-04-09T20:35:20Z app[ed47e1cc] ams [info]20:35:20.043 [error] Postgrex.Protocol (#PID<0.1907.2>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (intersplat-db.internal:5432): timeout

Running the same logs command against my database app (intersplat-db) returns nothing at all.
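For context, my repo config follows the standard Fly.io Elixir setup, roughly like this (a sketch from memory; my_app and MyApp.Repo are placeholders, not my real module names):

# config/runtime.exs
database_url =
  System.get_env("DATABASE_URL") ||
    raise "environment variable DATABASE_URL is missing"

config :my_app, MyApp.Repo,
  url: database_url,
  # Fly's private network (.internal hostnames) is IPv6-only
  socket_options: [:inet6],
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")

So the :ehostunreach above suggests the name is resolving but there’s no route to the DB VM, i.e. the database itself is down or unreachable, rather than anything being wrong with my credentials.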

Hello! It looks like your DB crashed. It’s running an older version of our Postgres setup that’s using etcd for a backing store instead of Consul. I just fixed that. Can you see if it’s working now?

When you get a chance, it’s worth running fly image show -a intersplat-db and then fly image update -a intersplat-db.
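Briefly, in case it helps anyone else: fly image show prints the image and version the database VM is currently running, and fly image update rolls it forward to the latest release of that image (this restarts the VM in the process).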

I ran the commands, and it deployed successfully. However, my Elixir instance is now having trouble connecting to the DB: FATAL: password authentication failed for user "dark_bush_7960_g4k3rxg1e0x0qznj". Did something change configuration-wise for Elixir apps?

Looking! That’s definitely a problem.
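In the meantime, some background that may help: when an app is attached to a Fly Postgres cluster, flyctl creates a database user and stores the credentials in the app’s DATABASE_URL secret, which is roughly of this shape (illustrative, with the password elided):

postgres://dark_bush_7960_g4k3rxg1e0x0qznj:<password>@intersplat-db.internal:5432/<database>

A failure like the one above usually means the password in that secret no longer matches what the server expects. fly secrets list -a your-app-name will confirm the secret exists (it shows digests, not values), and fly secrets set DATABASE_URL=... deploys a corrected value if one turns out to be needed.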

Sorry to jump on norseboat’s post, but I’m having exactly the same issue here. My DB crashed at some point today. I just tried to update the image per the suggestion above, and it didn’t go well:

2022-04-09T21:36:30Z   [info]proxy    | [WARNING] 098/213630 (565) : bk_db/pg1 changed its IP from (none) to fdaa:0:3118:a7b:a9a:0:386c:2 by flydns/dns1.
2022-04-09T21:36:30Z   [info]proxy    | [WARNING] 098/213630 (565) : Server bk_db/pg1 ('lhr.chrx-db1.internal') is UP/READY (resolves again).
2022-04-09T21:36:30Z   [info]proxy    | [WARNING] 098/213630 (565) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2022-04-09T21:36:31Z   [info]exporter | ERRO[0002] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:3118:a7b:a9a:0:386c:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:3118:a7b:a9a:0:386c:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2022-04-09T21:36:34Z   [info]exporter | INFO[0006] Established new database connection to "fdaa:0:3118:a7b:a9a:0:386c:2:5433".  source="postgres_exporter.go:970"
2022-04-09T21:36:35Z   [info]exporter | ERRO[0007] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:3118:a7b:a9a:0:386c:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:3118:a7b:a9a:0:386c:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2022-04-09T21:36:37Z   [info]proxy    | [WARNING] 098/213637 (565) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2022-04-09T21:36:37Z   [info]proxy    | [NOTICE] 098/213637 (565) : haproxy version is 2.2.9-2+deb11u3
2022-04-09T21:36:37Z   [info]proxy    | [NOTICE] 098/213637 (565) : path to executable is /usr/sbin/haproxy
2022-04-09T21:36:37Z   [info]proxy    | [ALERT] 098/213637 (565) : backend 'bk_db' has no server available!
2022-04-09T21:36:42Z   [info]keeper   | {"level":"warn","ts":"2022-04-09T21:36:42.347Z","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002bddc0/#initially=[etcd-na.fly-shared.net:443]","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: request timed out"}
2022-04-09T21:36:42Z   [info]sentinel | {"level":"warn","ts":"2022-04-09T21:36:42.349Z","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0002bfdc0/#initially=[etcd-na.fly-shared.net:443]","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: request timed out"}
2022-04-09T21:36:42Z   [info]sentinel | 2022-04-09T21:36:42.349Z	FATAL	cmd/sentinel.go:2021	cannot create sentinel: cannot create store: cannot create kv store: etcdserver: request timed out
2022-04-09T21:36:42Z   [info]keeper   | 2022-04-09T21:36:42.351Z	FATAL	cmd/keeper.go:2118	cannot create keeper: cannot create store: cannot create kv store: etcdserver: request timed out
2022-04-09T21:36:42Z   [info]panic: error checking stolon status: {"level":"warn","ts":"2022-04-09T21:36:42.348Z","logger":"etcd-client","caller":"v3@v3.5.0/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0004281c0/#initially=[etcd-na.fly-shared.net:443]","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: request timed out"}
2022-04-09T21:36:42Z   [info]cannot create kv store: etcdserver: request timed out
2022-04-09T21:36:42Z   [info]: exit status 1
2022-04-09T21:36:42Z   [info]goroutine 9 [running]:
2022-04-09T21:36:42Z   [info]main.main.func2(0xc0000d0000, 0xc000075710)
2022-04-09T21:36:42Z   [info]	/go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:81 +0x72c
2022-04-09T21:36:42Z   [info]created by main.main
2022-04-09T21:36:42Z   [info]	/go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:72 +0x43b
2022-04-09T21:36:42Z   [info]Main child exited normally with code: 2
2022-04-09T21:36:42Z   [info]Reaped child process with pid: 541 and signal: SIGKILL, core dumped? false
2022-04-09T21:36:42Z   [info]Reaped child process with pid: 544 and signal: SIGKILL, core dumped? false
2022-04-09T21:36:42Z   [info]Reaped child process with pid: 538, exit code: 1
2022-04-09T21:36:42Z   [info]Reaped child process with pid: 536, exit code: 1
2022-04-09T21:36:42Z   [info]Reaped child process with pid: 565, exit code: 1
2022-04-09T21:36:42Z   [info]Starting clean up.
2022-04-09T21:36:42Z   [info]Umounting /dev/vdc from /data

My database is chrx-db1.
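For context (my own reading of the log, not anything official): this image runs stolon for HA, which has roughly these moving parts:

store (etcd here)   - where the cluster state lives
sentinel            - watches the store and decides which node is primary
keeper              - manages the local postgres process; also needs the store
proxy (haproxy)     - routes client connections to the current primary

So when etcd times out, the sentinel and keeper both die with the FATAL lines above, postgres never comes up, and haproxy reports that backend 'bk_db' has no server available, which is exactly what the log shows.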

@norseboat there is something up with replication on this DB. We have it running properly with a single node, but adding replicas is causing problems. We’re going to keep working on it, but you should be good to go right now.
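If you want to check replication state yourself, something along these lines should work (a sketch; pg_stat_replication is standard Postgres, the connect command is flyctl’s):

fly postgres connect -a intersplat-db
select client_addr, state, sync_state from pg_stat_replication;

An empty result there just means no replicas are currently attached.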

@kurt Thank you so much! It seems to be working as intended now :smile: Excellent service as always :rocket:

@aaronrussell I recovered your DB and moved it off the etcd coordinator we were using. It had the same issue and should be much more reliable now.

:pray: thank you