Postgres instance suddenly down?

Our prod Postgres instance went into a pending state out of the blue and won’t recover.

fly status -a [redacted]
App
  Name     = [redacted]  
  Owner    = [redacted]          
  Version  = 0                        
  Status   = pending                  
  Hostname = [redacted]

Deployment Status
  ID          = d27f018e-3eb6-be78-6c73-ff495429133b         
  Version     = v0                                           
  Status      = successful                                   
  Description = Deployment completed successfully            
  Instances   = 2 desired, 2 placed, 2 healthy, 0 unhealthy  

Instances
ID VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED 

And the logs output:

2021-04-10T17:52:13.958Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:13.953Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:52:22.644Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:22.637Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:52:31.320Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:31.313Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:52:39.281Z f8fc2ce6 ams [info] Shutting down virtual machine
2021-04-10T17:52:39.999Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:39.992Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:52:45.163Z f8fc2ce6 ams [info] Sending signal SIGTERM to main child process w/ PID 509
2021-04-10T17:52:45.177Z f8fc2ce6 ams [info] postgres_exporter | Interrupting...
2021-04-10T17:52:45.178Z f8fc2ce6 ams [info] keeper            | Interrupting...
2021-04-10T17:52:45.179Z f8fc2ce6 ams [info] sentinel          | Interrupting...
2021-04-10T17:52:45.180Z f8fc2ce6 ams [info] proxy             | Interrupting...
2021-04-10T17:52:45.182Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:45.168Z	INFO	cmd/sentinel.go:1816	stopping stolon sentinel
2021-04-10T17:52:45.210Z f8fc2ce6 ams [info] keeper            | 2021-04-10 17:52:45.193 UTC [614] LOG:  received SIGHUP, reloading configuration files
2021-04-10T17:52:45.222Z f8fc2ce6 ams [info] postgres_exporter | Exited
2021-04-10T17:52:45.225Z f8fc2ce6 ams [info] proxy             | Exited
2021-04-10T17:52:45.226Z f8fc2ce6 ams [info] sentinel          | Exited
2021-04-10T17:52:46.180Z f8fc2ce6 ams [info] Reaped child process with pid: 560 and signal: SIGHUP, core dumped? false
2021-04-10T17:52:46.181Z f8fc2ce6 ams [info] Reaped child process with pid: 614, exit code: 0
2021-04-10T17:52:46.225Z f8fc2ce6 ams [info] keeper            | Exited
2021-04-10T17:52:48.188Z f8fc2ce6 ams [info] Main child exited normally with code: 0
2021-04-10T17:52:48.189Z f8fc2ce6 ams [info] Starting clean up.
2021-04-10T17:52:48.189Z f8fc2ce6 ams [info] Reaped child process with pid: 557, exit code: 0
2021-04-10T17:52:48.202Z f8fc2ce6 ams [info] Umounting /dev/vdc from /data
2021-04-10T17:52:50.295Z f8fc2ce6 ams [info] Starting instance
2021-04-10T17:52:50.344Z f8fc2ce6 ams [info] Configuring virtual machine
2021-04-10T17:52:50.353Z f8fc2ce6 ams [info] Pulling container image
2021-04-10T17:52:51.511Z f8fc2ce6 ams [info] Unpacking image
2021-04-10T17:52:51.534Z f8fc2ce6 ams [info] Preparing kernel init
2021-04-10T17:52:51.932Z f8fc2ce6 ams [info] Setting up volume 'pg_data'
2021-04-10T17:52:52.399Z f8fc2ce6 ams [info] Configuring firecracker
2021-04-10T17:52:54.539Z f8fc2ce6 ams [info] Starting virtual machine
2021-04-10T17:52:54.733Z f8fc2ce6 ams [info] Starting init (commit: 0512da4)...
2021-04-10T17:52:54.764Z f8fc2ce6 ams [info] Mounting /dev/vdc at /data
2021-04-10T17:52:54.769Z f8fc2ce6 ams [info] Running: `docker-entrypoint.sh /fly/start.sh` as root
2021-04-10T17:52:54.797Z f8fc2ce6 ams [info] 2021/04/10 17:52:54 listening on [fdaa:0:1850:a7b:aa3:0:1509:2]:22 (DNS: [fdaa::3]:53)
2021-04-10T17:52:54.989Z f8fc2ce6 ams [info] system            | Tmux socket name: overmind-fly-mbrlW8IIpZWXzJ87RFl0eb
2021-04-10T17:52:54.991Z f8fc2ce6 ams [info] system            | Tmux session ID: fly
2021-04-10T17:52:54.992Z f8fc2ce6 ams [info] system            | Listening at ./.overmind.sock
2021-04-10T17:52:55.092Z f8fc2ce6 ams [info] keeper            | Started with pid 554...
2021-04-10T17:52:55.097Z f8fc2ce6 ams [info] sentinel          | Started with pid 557...
2021-04-10T17:52:55.098Z f8fc2ce6 ams [info] proxy             | Started with pid 559...
2021-04-10T17:52:55.099Z f8fc2ce6 ams [info] postgres_exporter | Started with pid 563...
2021-04-10T17:52:55.226Z f8fc2ce6 ams [info] postgres_exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2021-04-10T17:52:55.478Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:55.471Z	INFO	cmd/sentinel.go:2000	sentinel uid	{"uid": "5852a8e4"}
2021-04-10T17:52:57.403Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:57.398Z	INFO	cmd/sentinel.go:82	Trying to acquire sentinels leadership
2021-04-10T17:52:57.441Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:52:57.437Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:52:57.672Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:52:57.668Z	INFO	cmd/sentinel.go:89	sentinel leadership acquired
2021-04-10T17:52:59.953Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:52:59.945Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:53:02.462Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:53:02.454Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:53:04.964Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:53:04.956Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:53:06.876Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:06.868Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:07.464Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:53:07.456Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:53:09.964Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:53:09.957Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:53:12.466Z f8fc2ce6 ams [info] keeper            | 2021-04-10T17:53:12.458Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T17:53:12.868Z f8fc2ce6 ams [info] keeper            | 2021-04-10 17:53:12.864 UTC [611] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-10T17:53:12.871Z f8fc2ce6 ams [info] keeper            | 2021-04-10 17:53:12.868 UTC [611] LOG:  listening on IPv6 address "fdaa:0:1850:a7b:aa3:0:1509:2", port 5433
2021-04-10T17:53:12.874Z f8fc2ce6 ams [info] keeper            | 2021-04-10 17:53:12.872 UTC [611] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-04-10T17:53:12.907Z f8fc2ce6 ams [info] keeper            | 2021-04-10 17:53:12.904 UTC [613] LOG:  database system was shut down at 2021-04-10 17:52:45 UTC
2021-04-10T17:53:12.917Z f8fc2ce6 ams [info] keeper            | 2021-04-10 17:53:12.914 UTC [611] LOG:  database system is ready to accept connections
2021-04-10T17:53:15.550Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:15.544Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:17.171Z f8fc2ce6 ams [info] postgres_exporter | INFO[0021] Established new database connection to "fdaa:0:1850:a7b:aa3:0:1509:2:5433".  source="postgres_exporter.go:970"
2021-04-10T17:53:17.190Z f8fc2ce6 ams [info] postgres_exporter | INFO[0021] Semantic Version Changed on "fdaa:0:1850:a7b:aa3:0:1509:2:5433": 0.0.0 -> 12.5.0  source="postgres_exporter.go:1539"
2021-04-10T17:53:17.242Z f8fc2ce6 ams [info] postgres_exporter | INFO[0022] Established new database connection to "fdaa:0:1850:a7b:aa3:0:1509:2:5433".  source="postgres_exporter.go:970"
2021-04-10T17:53:17.254Z f8fc2ce6 ams [info] postgres_exporter | INFO[0022] Semantic Version Changed on "fdaa:0:1850:a7b:aa3:0:1509:2:5433": 0.0.0 -> 12.5.0  source="postgres_exporter.go:1539"
2021-04-10T17:53:21.221Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:21.215Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:26.891Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:26.884Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:35.572Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:35.567Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:41.241Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:41.235Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:46.928Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:46.922Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:53:55.615Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:53:55.610Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:04.283Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:04.276Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:09.960Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:09.954Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:18.778Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:18.774Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:27.454Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:27.449Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:36.241Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:36.235Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:40.440Z f8fc2ce6 ams [info] Shutting down virtual machine
2021-04-10T17:54:44.914Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:44.906Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T17:54:48.760Z f8fc2ce6 ams [info] Sending signal SIGTERM to main child process w/ PID 509
2021-04-10T17:54:48.778Z f8fc2ce6 ams [info] sentinel          | Interrupting...
2021-04-10T17:54:48.779Z f8fc2ce6 ams [info] proxy             | Interrupting...
2021-04-10T17:54:48.780Z f8fc2ce6 ams [info] postgres_exporter | Interrupting...
2021-04-10T17:54:48.781Z f8fc2ce6 ams [info] keeper            | Interrupting...
2021-04-10T17:54:48.783Z f8fc2ce6 ams [info] sentinel          | 2021-04-10T17:54:48.761Z	INFO	cmd/sentinel.go:1816	stopping stolon sentinel
2021-04-10T17:54:48.801Z f8fc2ce6 ams [info] postgres_exporter | Exited
2021-04-10T17:54:48.892Z f8fc2ce6 ams [info] sentinel          | Exited
2021-04-10T17:54:48.894Z f8fc2ce6 ams [info] proxy             | Exited
2021-04-10T17:54:49.781Z f8fc2ce6 ams [info] Reaped child process with pid: 556 and signal: SIGHUP, core dumped? false
2021-04-10T17:54:49.782Z f8fc2ce6 ams [info] Reaped child process with pid: 611, exit code: 0
2021-04-10T17:54:49.785Z f8fc2ce6 ams [info] Reaped child process with pid: 1010 and signal: SIGPIPE, core dumped? false
2021-04-10T17:54:49.790Z f8fc2ce6 ams [info] keeper            | Exited
2021-04-10T17:54:51.790Z f8fc2ce6 ams [info] Main child exited normally with code: 0
2021-04-10T17:54:51.792Z f8fc2ce6 ams [info] Reaped child process with pid: 553, exit code: 0
2021-04-10T17:54:51.793Z f8fc2ce6 ams [info] Starting clean up.
2021-04-10T17:54:51.809Z f8fc2ce6 ams [info] Umounting /dev/vdc from /data
2021-04-10T18:28:38.797Z b6d6bbf9 ams [info] Starting instance
2021-04-10T18:28:38.853Z b6d6bbf9 ams [info] Configuring virtual machine
2021-04-10T18:28:38.860Z b6d6bbf9 ams [info] Pulling container image
2021-04-10T18:28:40.032Z b6d6bbf9 ams [info] Unpacking image
2021-04-10T18:28:40.056Z b6d6bbf9 ams [info] Preparing kernel init
2021-04-10T18:28:40.274Z b6d6bbf9 ams [info] Setting up volume 'pg_data'
2021-04-10T18:28:40.701Z b6d6bbf9 ams [info] Configuring firecracker

Please advise…

I just tried to scale the memory size, perhaps it was running out of memory? Will report back…

EDIT - still getting the same issue:

2021-04-10T18:54:24.405Z a6eed214 ams [info] Starting instance
2021-04-10T18:54:24.461Z a6eed214 ams [info] Configuring virtual machine
2021-04-10T18:54:24.468Z a6eed214 ams [info] Pulling container image
2021-04-10T18:54:25.697Z a6eed214 ams [info] Unpacking image
2021-04-10T18:54:25.747Z a6eed214 ams [info] Preparing kernel init
2021-04-10T18:54:26.001Z a6eed214 ams [info] Setting up volume 'pg_data'
2021-04-10T18:54:26.716Z a6eed214 ams [info] Configuring firecracker
2021-04-10T18:54:43.744Z a6eed214 ams [info] Starting virtual machine
2021-04-10T18:54:43.966Z a6eed214 ams [info] Starting init (commit: 0512da4)...
2021-04-10T18:54:43.999Z a6eed214 ams [info] Mounting /dev/vdc at /data
2021-04-10T18:54:44.008Z a6eed214 ams [info] Running: `docker-entrypoint.sh /fly/start.sh` as root
2021-04-10T18:54:44.040Z a6eed214 ams [info] 2021/04/10 18:54:44 listening on [fdaa:0:1850:a7b:aa3:0:1509:2]:22 (DNS: [fdaa::3]:53)
2021-04-10T18:54:44.234Z a6eed214 ams [info] system            | Tmux socket name: overmind-fly-t_f5eGIMsNqn7UL-aydj5m
2021-04-10T18:54:44.235Z a6eed214 ams [info] system            | Tmux session ID: fly
2021-04-10T18:54:44.236Z a6eed214 ams [info] system            | Listening at ./.overmind.sock
2021-04-10T18:54:44.337Z a6eed214 ams [info] postgres_exporter | Started with pid 563...
2021-04-10T18:54:44.339Z a6eed214 ams [info] keeper            | Started with pid 555...
2021-04-10T18:54:44.343Z a6eed214 ams [info] sentinel          | Started with pid 558...
2021-04-10T18:54:44.344Z a6eed214 ams [info] proxy             | Started with pid 560...
2021-04-10T18:54:44.483Z a6eed214 ams [info] postgres_exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2021-04-10T18:54:44.777Z a6eed214 ams [info] sentinel          | 2021-04-10T18:54:44.767Z	INFO	cmd/sentinel.go:2000	sentinel uid	{"uid": "3a8612ce"}
2021-04-10T18:54:46.925Z a6eed214 ams [info] sentinel          | 2021-04-10T18:54:46.920Z	INFO	cmd/sentinel.go:82	Trying to acquire sentinels leadership
2021-04-10T18:54:46.952Z a6eed214 ams [info] keeper            | 2021-04-10T18:54:46.948Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:54:47.196Z a6eed214 ams [info] sentinel          | 2021-04-10T18:54:47.192Z	INFO	cmd/sentinel.go:89	sentinel leadership acquired
2021-04-10T18:54:47.615Z a6eed214 ams [info] sentinel          | 2021-04-10T18:54:47.609Z	ERROR	cmd/sentinel.go:1886	cannot get keepers info	{"error": "unexpected end of JSON input"}
2021-04-10T18:54:49.460Z a6eed214 ams [info] keeper            | 2021-04-10T18:54:49.453Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:54:51.961Z a6eed214 ams [info] keeper            | 2021-04-10T18:54:51.954Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:54:54.462Z a6eed214 ams [info] keeper            | 2021-04-10T18:54:54.455Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:54:56.962Z a6eed214 ams [info] keeper            | 2021-04-10T18:54:56.955Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:54:59.463Z a6eed214 ams [info] keeper            | 2021-04-10T18:54:59.456Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:55:01.907Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:01.900Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:01.962Z a6eed214 ams [info] keeper            | 2021-04-10T18:55:01.957Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:55:04.462Z a6eed214 ams [info] keeper            | 2021-04-10T18:55:04.458Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:55:06.968Z a6eed214 ams [info] keeper            | 2021-04-10T18:55:06.962Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-10T18:55:07.549Z a6eed214 ams [info] keeper            | 2021-04-10 18:55:07.544 UTC [617] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-10T18:55:07.551Z a6eed214 ams [info] keeper            | 2021-04-10 18:55:07.549 UTC [617] LOG:  listening on IPv6 address "fdaa:0:1850:a7b:aa3:0:1509:2", port 5433
2021-04-10T18:55:07.555Z a6eed214 ams [info] keeper            | 2021-04-10 18:55:07.553 UTC [617] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-04-10T18:55:07.587Z a6eed214 ams [info] keeper            | 2021-04-10 18:55:07.584 UTC [618] LOG:  database system was shut down at 2021-04-10 17:54:48 UTC
2021-04-10T18:55:07.596Z a6eed214 ams [info] keeper            | 2021-04-10 18:55:07.594 UTC [617] LOG:  database system is ready to accept connections
2021-04-10T18:55:10.576Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:10.571Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:19.248Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:19.241Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:27.921Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:27.917Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:36.597Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:36.593Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:42.271Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:42.265Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:47.950Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:47.944Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:55:56.639Z a6eed214 ams [info] sentinel          | 2021-04-10T18:55:56.633Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:05.411Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:05.404Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:11.087Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:11.080Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:19.769Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:19.763Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:25.675Z a6eed214 ams [info] postgres_exporter | INFO[0101] Established new database connection to "fdaa:0:1850:a7b:aa3:0:1509:2:5433".  source="postgres_exporter.go:970"
2021-04-10T18:56:25.690Z a6eed214 ams [info] postgres_exporter | INFO[0101] Semantic Version Changed on "fdaa:0:1850:a7b:aa3:0:1509:2:5433": 0.0.0 -> 12.5.0  source="postgres_exporter.go:1539"
2021-04-10T18:56:25.731Z a6eed214 ams [info] postgres_exporter | INFO[0101] Established new database connection to "fdaa:0:1850:a7b:aa3:0:1509:2:5433".  source="postgres_exporter.go:970"
2021-04-10T18:56:25.743Z a6eed214 ams [info] postgres_exporter | INFO[0101] Semantic Version Changed on "fdaa:0:1850:a7b:aa3:0:1509:2:5433": 0.0.0 -> 12.5.0  source="postgres_exporter.go:1539"
2021-04-10T18:56:28.439Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:28.435Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:37.104Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:37.098Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:40.051Z a6eed214 ams [info] Shutting down virtual machine
2021-04-10T18:56:42.781Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:42.776Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "873c68a3", "keeper": "fdaa01850a7baa00150a2"}
2021-04-10T18:56:45.581Z a6eed214 ams [info] Sending signal SIGTERM to main child process w/ PID 510
2021-04-10T18:56:45.598Z a6eed214 ams [info] sentinel          | Interrupting...
2021-04-10T18:56:45.599Z a6eed214 ams [info] proxy             | Interrupting...
2021-04-10T18:56:45.600Z a6eed214 ams [info] postgres_exporter | Interrupting...
2021-04-10T18:56:45.601Z a6eed214 ams [info] keeper            | Interrupting...
2021-04-10T18:56:45.603Z a6eed214 ams [info] sentinel          | 2021-04-10T18:56:45.582Z	INFO	cmd/sentinel.go:1816	stopping stolon sentinel
2021-04-10T18:56:45.634Z a6eed214 ams [info] postgres_exporter | Exited
2021-04-10T18:56:45.638Z a6eed214 ams [info] sentinel          | Exited
2021-04-10T18:56:45.639Z a6eed214 ams [info] proxy             | Exited
2021-04-10T18:56:46.601Z a6eed214 ams [info] Reaped child process with pid: 557 and signal: SIGHUP, core dumped? false
2021-04-10T18:56:46.603Z a6eed214 ams [info] Reaped child process with pid: 1030 and signal: SIGPIPE, core dumped? false
2021-04-10T18:56:46.604Z a6eed214 ams [info] Reaped child process with pid: 617, exit code: 0
2021-04-10T18:56:46.639Z a6eed214 ams [info] keeper            | Exited
2021-04-10T18:56:48.609Z a6eed214 ams [info] Main child exited normally with code: 0
2021-04-10T18:56:48.610Z a6eed214 ams [info] Reaped child process with pid: 554, exit code: 0
2021-04-10T18:56:48.611Z a6eed214 ams [info] Starting clean up.
2021-04-10T18:56:48.627Z a6eed214 ams [info] Umounting /dev/vdc from /data

We’re looking at this! It’s something specific to ams we haven’t identified yet.

Our ams datacenter is having difficulty reaching the Vault servers we use to keep, among other things, application secrets in sync. Right now, these Postgres VMs can’t get the secrets they need to boot successfully. Our immediate concern is to get that corrected and bring your DB back up.

We’re not sure what stopped the VMs in the first place. It was likely a similar issue: the DBs use Consul to keep leader state, and if they lose connectivity to Consul for an extended period they’ll fail health checks and get restarted.
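
To illustrate the restart behavior described above, here is a minimal sketch of a liveness check that only fails after Consul has been unreachable for longer than a grace period. This is not the actual health check: the class name, the 30-second threshold, and the injected clock are all assumptions made for illustration.

```python
import time
from typing import Callable


class ConsulLivenessCheck:
    """Hypothetical check: unhealthy once Consul has been unreachable too long."""

    def __init__(
        self,
        grace_seconds: float = 30.0,  # assumed threshold, not a real platform value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.grace_seconds = grace_seconds
        self._clock = clock
        # Treat startup as the last known-good contact.
        self._last_ok = clock()

    def record(self, consul_reachable: bool) -> None:
        """Record the outcome of one connectivity probe."""
        if consul_reachable:
            self._last_ok = self._clock()

    def healthy(self) -> bool:
        """Short blips are tolerated; an extended outage fails the check."""
        return self._clock() - self._last_ok <= self.grace_seconds
```

A supervisor that restarts the VM whenever healthy() returns False would produce a stop/start loop like the one in the logs for as long as the connectivity problem lasts.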

FYI we’re bringing up postgres replicas in the cdg region while we figure out what’s up so you can get your app going again.

Cool. That’s within the same app, right? So it should have the same internal hostname and secrets? (Or is the replica a separate app because of the extraordinary situation?)

It’ll be the same postgres app, yes, so same connection strings and stuff.

Yours is now running in cdg, can you double check and make sure it’s all good?

Still down on our end, cannot connect.

I just noticed that! Internal dns has a bunch of now-bad IPs, we’re cleaning that up as well.

Give it a try now?

Yeah, still nothing :confused:

Also, it seems our API is now unreachable from the external network as well.

Extra note: while on WireGuard, the DB server is responding but rejecting the password that we have set.

Meanwhile, the API is timing out when trying to reach the server (the API is also hosted on ams):
Postgrex.Protocol (#PID<0.407.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect ([redacted].internal:5432): host is unreachable - :ehostunreach

EDIT: here’s what the logs said:

2021-04-10T21:04:15.088Z e02c58e6 cdg [info] keeper            | 2021-04-10 21:04:15.084 UTC [1342] FATAL:  password authentication failed for user "postgres"
2021-04-10T21:04:15.091Z e02c58e6 cdg [info] keeper            | 2021-04-10 21:04:15.084 UTC [1342] DETAIL:  Role "postgres" does not exist.
2021-04-10T21:04:15.094Z e02c58e6 cdg [info] keeper            | 	Connection matched pg_hba.conf line 8: "host all all ::0/0 md5"

Also, the logs for the cdg instance still have lots of errors? And one more thing: if I restart our API on ams right now, will it also be unable to access secrets (and consequently fail to boot)?

EDIT: The DB has also stopped responding when accessed directly via WireGuard.

Are you using the <name>.internal DNS over WireGuard? It’s still returning bad IPs sometimes. If you run flyctl ips private, you can find the running instance’s IP address and connect to that directly.

Your ams-hosted API might also have issues if it restarts. It’s worth moving that to cdg for now as well.
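
While the internal DNS is still handing out stale addresses, something like the following can help tell the good ones from the bad ones. It is only a rough sketch: it resolves every address behind a hostname and keeps the ones that accept a TCP connection. The function names are made up for illustration, and the default port 5432 is taken from the connection error quoted earlier in the thread. It needs to run somewhere that can reach the private network (for example, over WireGuard).

```python
import socket
from typing import List


def resolve_all(hostname: str, port: int) -> List[str]:
    """Return every distinct address the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses: List[str] = []
    for info in infos:
        address = info[4][0]  # sockaddr tuple starts with the IP string
        if address not in addresses:
            addresses.append(address)
    return addresses


def is_reachable(address: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to (address, port) succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable, and timed-out connections
        return False


def reachable_addresses(hostname: str, port: int = 5432, timeout: float = 2.0) -> List[str]:
    """Filter the resolved addresses down to the ones that accept connections."""
    return [a for a in resolve_all(hostname, port) if is_reachable(a, port, timeout)]
```

A successful TCP connection only shows that something is listening on that address. It says nothing about whether authentication will succeed, so a password error has to be debugged separately.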

Yes, that’s what we’re using. Connecting to the IP directly instead of the .internal hostname now works without problems, but we still get an incorrect password error when trying to connect.

Same on our API.

Oh, I missed the incorrect password error. Let me get in there and see what’s up. Is this a username/password you use for an app you attached with our CLI?

We just use the postgres user that was created upon first deployment and its corresponding password.