We’re bringing PG clusters back up now. It’s going to take 30+ minutes for all of them to settle (assuming the Consul issue we think we fixed is actually fixed).
Coming back to the original issue: I still have connection issues from time to time. It happened several times today. As soon as it happens again I’ll check the logs and let you know the details.
@mathiasn will you run fly status -a <db-name>
and see if it’s been restarting? If you have random connection errors, it might mean that postgres is dying + restarting. This could happen because of OOMs, or potentially because the VM is failing health checks.
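While the underlying restarts get diagnosed, the app side can at least tolerate a dying-and-restarting Postgres. A minimal retry-with-backoff sketch (generic Python; the `connect` callable stands in for whatever driver call the app actually uses, e.g. `psycopg2.connect(...)` — that wiring is an assumption, not shown in this thread):

```python
import time

def with_retries(connect, attempts=5, base_delay=0.5):
    """Call `connect` with exponential backoff.

    Useful when the DB VM is restarting and connections fail
    transiently: retry a few times before giving up.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

In practice you’d pass `lambda: psycopg2.connect(dsn)` (or your driver’s equivalent) and catch that driver’s own connection exception instead of the built-in `ConnectionError`.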
You can also run fly status -a <db-name> --all
to see if there are any failed Postgres VMs.
Yeah, I checked that, but that wasn’t the problem. Here’s what I see in my logs:
2021-10-04T16:28:32.607314094Z app[c3991336] fra [info] keeper | 2021-10-04T16:28:32.607Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-04T16:28:34.372375382Z app[c3991336] fra [info] sentinel | 2021-10-04T16:28:34.371Z INFO cmd/sentinel.go:94 sentinel leadership lost
2021-10-04T16:28:34.613629365Z app[c3991336] fra [info] sentinel | 2021-10-04T16:28:34.611Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection)"}
2021-10-04T16:28:44.614824417Z app[c3991336] fra [info] sentinel | 2021-10-04T16:28:44.613Z INFO cmd/sentinel.go:82 Trying to acquire sentinels leadership
2021-10-04T16:28:45.248003267Z app[c3991336] fra [info] sentinel | 2021-10-04T16:28:45.247Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Unexpected response code: 500"}
2021-10-04T16:28:45.287357763Z app[c3991336] fra [info] keeper | 2021-10-04T16:28:45.283Z ERROR cmd/keeper.go:1042 error retrieving cluster data {"error": "Unexpected response code: 500"}
2021-10-04T16:28:45.414813784Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:45.414Z INFO cmd/proxy.go:346 check function error {"error": "cannot get cluster data: Unexpected response code: 500"}
2021-10-04T16:28:45.629651205Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:45.629Z INFO cmd/proxy.go:304 check timeout timer fired
2021-10-04T16:28:45.631443594Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:45.629Z INFO cmd/proxy.go:158 Stopping listening
2021-10-04T16:28:45.638706631Z app[c3991336] fra [info] keeper | 2021-10-04 16:28:45.637 UTC [15334] LOG: unexpected EOF on client connection with an open transaction
2021-10-04T16:28:50.602536023Z app[c3991336] fra [info] keeper | 2021-10-04T16:28:50.602Z INFO cmd/keeper.go:1505 our db requested role is master
2021-10-04T16:28:50.603381998Z app[c3991336] fra [info] keeper | 2021-10-04T16:28:50.603Z INFO cmd/keeper.go:1543 already master
2021-10-04T16:28:50.622795592Z app[c3991336] fra [info] keeper | 2021-10-04T16:28:50.622Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-10-04T16:28:50.623014301Z app[c3991336] fra [info] keeper | 2021-10-04T16:28:50.622Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-04T16:28:50.877236098Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:50.876Z INFO cmd/proxy.go:124 Starting proxying
2021-10-04T16:28:50.879662303Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:50.879Z INFO cmd/proxy.go:268 master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-04T16:28:51.274763461Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:51.274Z INFO cmd/proxy.go:286 proxying to master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-04T16:28:51.910751156Z app[c3991336] fra [info] sentinel | 2021-10-04T16:28:51.910Z INFO cmd/sentinel.go:89 sentinel leadership acquired
2021-10-04T16:28:52.615550171Z app[c3991336] fra [info] keeper | 2021-10-04 16:28:52.614 UTC [20923] LOG: PID 15338 in cancel request did not match any process
2021-10-04T16:28:56.398172558Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:56.397Z INFO cmd/proxy.go:268 master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-04T16:28:56.728869510Z app[c3991336] fra [info] proxy | 2021-10-04T16:28:56.728Z INFO cmd/proxy.go:286 proxying to master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-04T16:29:00.955099000Z app[c3991336] fra [info] keeper | 2021-10-04T16:29:00.954Z INFO cmd/keeper.go:1505 our db requested role is master
2021-10-04T16:29:00.955828456Z app[c3991336] fra [info] keeper | 2021-10-04T16:29:00.955Z INFO cmd/keeper.go:1543 already master
2021-10-04T16:29:00.972456457Z app[c3991336] fra [info] keeper | 2021-10-04T16:29:00.972Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-04T16:29:01.886192529Z app[c3991336] fra [info] proxy | 2021-10-04T16:29:01.885Z INFO cmd/proxy.go:268 master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-04T16:29:02.009029488Z app[c3991336] fra [info] sentinel | 2021-10-04T16:29:02.008Z INFO cmd/sentinel.go:82 Trying to acquire sentinels leadership
2021-10-04T16:29:02.013667831Z app[c3991336] fra [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc00013f420, 0x0, 0x0, 0x0, 0x0)
2021-10-04T16:29:02.013688101Z app[c3991336] fra [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.0/candidate.go:120 +0xc5
2021-10-04T16:29:02.013695461Z app[c3991336] fra [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection
2021-10-04T16:29:02.019037149Z app[c3991336] fra [info] sentinel | exit status 2
2021-10-04T16:29:02.019051229Z app[c3991336] fra [info] supervisor stopping
2021-10-04T16:29:02.032714308Z app[c3991336] fra [info] keeper | 2021-10-04 16:29:02.031 UTC [20933] FATAL: terminating connection due to administrator command
2021-10-04T16:29:02.035267053Z app[c3991336] fra [info] keeper | 2021-10-04 16:29:02.034 UTC [20927] FATAL: terminating connection due to administrator command
2021-10-04T16:29:02.036928923Z app[c3991336] fra [info] keeper | 2021-10-04 16:29:02.036 UTC [20925] FATAL: terminating connection due to administrator command
2021-10-04T16:29:02.048222135Z app[c3991336] fra [info] keeper | waiting for server to shut down....2021-10-04 16:29:02.048 UTC [601] LOG: database system is shut down
2021-10-04T16:29:02.149057075Z app[c3991336] fra [info] keeper | server stopped
2021-10-04T16:29:02.158022202Z app[c3991336] fra [info] keeper | signal: interrupt
2021-10-04T16:29:02.570294779Z app[c3991336] fra [info] Main child exited normally with code: 0
2021-10-04T16:29:02.571891309Z app[c3991336] fra [info] Starting clean up.
2021-10-04T16:29:02.587783295Z app[c3991336] fra [info] Umounting /dev/vdc from /data
UPDATE: Postgres is not coming back up :-/
2021-10-04T16:30:25.422012886Z runner[c3991336] fra [info] Starting instance
2021-10-04T16:30:25.461124774Z runner[c3991336] fra [info] Configuring virtual machine
2021-10-04T16:30:25.464100656Z runner[c3991336] fra [info] Pulling container image
2021-10-04T16:30:26.205737502Z runner[c3991336] fra [info] Unpacking image
2021-10-04T16:30:26.216660758Z runner[c3991336] fra [info] Preparing kernel init
2021-10-04T16:30:26.363695085Z runner[c3991336] fra [info] Setting up volume 'pg_data'
2021-10-04T16:30:26.577683944Z runner[c3991336] fra [info] Configuring firecracker
2021-10-04T16:30:56.139020719Z runner[c3991336] fra [info] Starting instance
2021-10-04T16:35:01.266231914Z runner[cd1f75b0] fra [info] Starting instance
The log basically stops here.
Update: 2021-10-04 18:44 CEST: It’s back
Having the same issues. The Postgres instance becomes unavailable from time to time, causing the app to die. I set up an external uptime monitor a while back, and this is how the results look:
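The monitor setup itself isn’t shown in the thread, but a crude reachability probe along these lines is enough to catch the outages described above (a minimal sketch; the host/port values would be whatever your DB exposes, e.g. `<db-name>.internal:5432` — that’s an assumption):

```python
import socket

def tcp_check(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds
    within `timeout` seconds -- a crude 'is the DB reachable?' probe."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from cron (or an external monitor) and log the result with a timestamp; gaps in the `True`s line up with the failed-instance windows in the `fly status --all` output below.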
Here’s the result of that command:
Instances
ID TASK VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
7ecbb18c app 7 ⇡ sin run running (leader) 3 total, 3 passing 0 4h33m ago
500ef53b app 7 ⇡ sin stop failed 0 4h37m ago
5b78f730 app 7 ⇡ sin stop failed 0 4h39m ago
3d40f194 app 7 ⇡ sin stop failed 0 4h40m ago
ee951a50 app 7 ⇡ sin stop failed 0 4h41m ago
98adb042 app 7 ⇡ sin stop failed 3 total, 3 critical 0 2021-10-03T19:50:29Z
8ca6c001 app 6 sin stop failed 0 2021-10-03T18:50:22Z
49b10ddd app 6 sin stop failed 0 2021-10-03T18:37:10Z
31ab093c app 5 sin stop failed 0 2021-10-03T17:50:28Z
a306c3e3 app 5 sin stop failed 0 2021-10-03T17:37:19Z
c430a194 app 5 sin stop failed 0 2021-10-03T17:20:02Z
02596840 app 5 sin stop failed 0 2021-10-03T17:03:23Z
307767e8 app 5 sin stop failed 0 2021-10-03T16:54:49Z
2d413ce5 app 5 sin stop failed 0 2021-10-03T16:50:48Z
e62b4fce app 5 sin stop failed 0 2021-10-03T16:47:14Z
199311b9 app 5 sin stop failed 0 2021-10-03T16:46:44Z
5821abeb app 5 sin stop failed 3 total, 1 passing, 2 critical 2 2021-10-03T16:41:24Z
1521ab66 app 5 sin stop failed 0 2021-10-03T16:39:59Z
b59b36cf app 4 sin stop failed 0 2021-10-03T16:36:40Z
f7969e46 app 4 sin stop failed 0 2021-10-03T16:04:48Z
851f5d97 app 4 sin stop failed 0 2021-10-03T15:48:19Z
90570902 app 4 sin stop failed 0 2021-10-03T15:40:24Z
742a4728 app 4 sin stop failed 0 2021-10-03T15:35:56Z
654b7404 app 4 sin stop failed 0 2021-10-03T15:33:35Z
5642d15f app 4 sin stop failed 0 2021-10-03T15:32:23Z
6e2eab61 app 4 sin stop failed 0 2021-10-03T15:31:39Z
77648b94 app 4 sin stop failed 3 total, 3 critical 1 2021-09-26T18:06:18Z
Note that I did not push a newer version/image of the DB.
It just happened again… @kurt
fly status --all
shows
Instances
ID TASK VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
e7ad6e45 app 3 fra run running (leader) 3 total, 3 passing 0 23h58m ago
cd1f75b0 app 3 fra stop failed 0 2021-10-04T16:35:11Z
c3991336 app 3 fra stop failed 3 total, 1 passing, 2 critical 1 2021-10-03T17:26:22Z
801adfdd app 3 fra stop failed 3 total, 2 passing, 1 critical 3 2021-10-03T17:10:36Z
4deb1dca app 3 fra stop failed 3 total, 1 passing, 2 critical 3 2021-10-03T17:00:24Z
e4da363d app 3 fra stop failed 3 total, 1 passing, 2 critical 4 2021-10-03T16:46:26Z
452e7fe8 app 3 fra stop failed 3 total, 1 passing, 2 critical 7 2021-09-29T15:22:43Z
So that’s not the problem.
We’re rolling out new Consul services in various regions, and we think this will help. Give us some more time to dig into the exact issues here, but I’m hoping we have a better fix later this afternoon.
And it’s happening again
2021-10-07T16:02:20.907933950Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:20.907Z INFO cmd/keeper.go:1505 our db requested role is master
2021-10-07T16:02:20.908999264Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:20.908Z INFO cmd/keeper.go:1543 already master
2021-10-07T16:02:20.929239637Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:20.929Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-10-07T16:02:20.929371297Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:20.929Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-07T16:02:21.169834105Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:21.169Z INFO cmd/proxy.go:304 check timeout timer fired
2021-10-07T16:02:21.169860804Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:21.169Z INFO cmd/proxy.go:158 Stopping listening
2021-10-07T16:02:32.101632298Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:32.101Z INFO cmd/keeper.go:1505 our db requested role is master
2021-10-07T16:02:32.102407174Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:32.102Z INFO cmd/keeper.go:1543 already master
2021-10-07T16:02:32.118596240Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:32.118Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-10-07T16:02:32.118780259Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:32.118Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-07T16:02:35.703068136Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:35.702Z INFO cmd/proxy.go:286 proxying to master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-07T16:02:43.297418985Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:43.297Z INFO cmd/keeper.go:1505 our db requested role is master
2021-10-07T16:02:43.298335980Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:43.298Z INFO cmd/keeper.go:1543 already master
2021-10-07T16:02:43.321491617Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:43.321Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-10-07T16:02:43.321680596Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:43.321Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-07T16:02:46.951184079Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:46.950Z INFO cmd/proxy.go:124 Starting proxying
2021-10-07T16:02:46.953021309Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:46.952Z INFO cmd/proxy.go:268 master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-07T16:02:50.703502541Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:50.702Z INFO cmd/proxy.go:304 check timeout timer fired
2021-10-07T16:02:50.703539851Z app[a8319d13] fra [info] proxy | 2021-10-07T16:02:50.703Z INFO cmd/proxy.go:158 Stopping listening
2021-10-07T16:02:54.494465587Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:54.494Z INFO cmd/keeper.go:1505 our db requested role is master
2021-10-07T16:02:54.495482691Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:54.495Z INFO cmd/keeper.go:1543 already master
2021-10-07T16:02:54.517188336Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:54.516Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-10-07T16:02:54.517608534Z app[a8319d13] fra [info] keeper | 2021-10-07T16:02:54.517Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-10-07T16:03:05.343635336Z app[a8319d13] fra [info] proxy | 2021-10-07T16:03:05.343Z INFO cmd/proxy.go:286 proxying to master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
check timeout timer fired, Stopping listening…
And later
2021-10-07T16:04:43.670033576Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:43.669Z INFO cmd/keeper.go:1094 our db boot UID is different than the cluster data one, waiting for it to be updated {"bootUUID": "edd638f0-b719-4ac9-a1b8-a6183d405b20", "clusterBootUUID": "040d9b75-7aff-4f4d-b726-6e37092c6756"}
2021-10-07T16:04:45.293758097Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:45.293Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:04:47.794762575Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:47.794Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:04:49.925324436Z app[a8319d13] fra [info] keeper is healthy, db is healthy, role: master
2021-10-07T16:04:50.295569268Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:50.295Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:04:52.031582378Z app[a8319d13] fra [info] proxy | 2021-10-07T16:04:52.031Z INFO cmd/proxy.go:289 not proxying to master address since we aren't in the enabled proxies list {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-07T16:04:52.796538858Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:52.796Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:04:54.797452657Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:54.797Z INFO cmd/keeper.go:1094 our db boot UID is different than the cluster data one, waiting for it to be updated {"bootUUID": "edd638f0-b719-4ac9-a1b8-a6183d405b20", "clusterBootUUID": "040d9b75-7aff-4f4d-b726-6e37092c6756"}
2021-10-07T16:04:54.925143875Z app[a8319d13] fra [info] error connecting to local postgres context deadline exceeded
2021-10-07T16:04:54.925178254Z app[a8319d13] fra [info] checking stolon status
2021-10-07T16:04:55.297561989Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:55.297Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:04:57.798035077Z app[a8319d13] fra [info] keeper | 2021-10-07T16:04:57.797Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:05:00.298994303Z app[a8319d13] fra [info] keeper | 2021-10-07T16:05:00.298Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:05:02.800328429Z app[a8319d13] fra [info] keeper | 2021-10-07T16:05:02.799Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:05:03.159353080Z app[a8319d13] fra [info] proxy | 2021-10-07T16:05:03.159Z INFO cmd/proxy.go:124 Starting proxying
2021-10-07T16:05:03.161504023Z app[a8319d13] fra [info] proxy | 2021-10-07T16:05:03.161Z INFO cmd/proxy.go:268 master address {"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-07T16:05:05.301526030Z app[a8319d13] fra [info] keeper | 2021-10-07T16:05:05.301Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-07T16:05:05.932521206Z app[a8319d13] fra [info] keeper | 2021-10-07T16:05:05.932Z INFO cmd/keeper.go:1094
Can we stop/start your db? We can switch it to a local consul by stopping all VMs and starting them back up. It’ll cause about 30s of downtime.
Go for it. It doesn’t matter whether you as a person or Fly as a service brings my service down.
Ok, updated. I just noticed this DB is only running one instance. We don’t completely support that; our service is designed for 2+ VMs at all times for redundancy. The change we just shipped will likely improve the reliability of your DB as is, but there will definitely be issues running DBs with one member.
If you really want to run Postgres with only one VM, you’re better off running the raw Postgres image directly. Believe it or not, the clustering infrastructure in our image makes a single VM less reliable.
Ok, thank you, I’ll adjust that.
My DB has been going down from time to time too, for the past ~30 minutes. The app is staging-db.
fly status
App
Name = staging-db
Owner = ulysse
Version = 7
Status = running
Hostname = staging-db.fly.dev
Instances
ID TASK VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
1df13ea8 app 7 cdg run running (failed to co) 3 total, 2 critical 3 9m55s ago
b6cb9747 app 7 cdg run running (failed to co) 3 total, 1 passing, 2 critical 12 2021-10-03T17:34:29Z
I ran fly scale count 2
for the DB app. That’s correct, isn’t it? I was just wondering because fly status
still shows one instance although fly scale show
shows Count 2.
Also, fly vm status <vm-id>
does not accept the instance ID given by fly status
You’ll need to add a volume for it to scale to: fly volumes create pg_data --size 10 --region fra
@kurt My db is still down. Any news?
There is already such a volume:
fly volumes show vol_x915grnn8ern70qy
ID: vol_x915grnn8ern70qy
Name: pg_data
Region: fra
Size GB: 10
Encrypted: false
Created at: 07 Jun 21 14:31 UTC
I guess I shouldn’t create another volume on top of that…
Update 2: I created another volume, and have two instances now, thank you!
@yannis sorry, missed your note. We’re looking!
My DB was offline again for a good 5–10 minutes.
My DB and app are offline, and the site is posted on Show HN too.
Edit: It’s back up after a deploy.