Database suddenly down, can't restart it

When I try restarting the machine, I see this:

Error: failed to restart machine 1781570f115989: could not stop machine 1781570f115989: failed to restart VM 1781570f115989: failed_precondition: machine still active, refusing to start
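For reference, the restart attempt was roughly the following (reconstructed, so the exact flags may differ; the machine ID is the one from the error):

$ fly machine restart 1781570f115989 -a customer-dashboard-production-db

Judging by the "could not stop machine" part, it seems to be the stop step inside the restart that's being refused.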

Here are the logs. I'm completely lost as to what could have happened; I didn't touch anything. Does anybody recognise this?

$ fly logs -a customer-dashboard-production-db

Waiting for logs...

2023-06-26T20:32:51.402 app[1781570f115989] ams [info] keeper | 2023-06-26T20:32:51.400Z FATAL cmd/keeper.go:2118 cannot create keeper: cannot create store: cannot create kv store: Put "https://consul-fra-2.fly-shared.net/v1/catalog/register?wait=5000ms": dial tcp [2a09:8280:1::6:5ab1]:443: connect: connection refused

2023-06-26T20:32:51.402 app[1781570f115989] ams [info] keeper | exit status 1

2023-06-26T20:32:51.402 app[1781570f115989] ams [info] keeper | restarting in 5s [attempt 1]

2023-06-26T20:32:51.402 app[1781570f115989] ams [info] sentinel | 2023-06-26T20:32:51.400Z FATAL cmd/sentinel.go:2030 cannot create sentinel: cannot create store: cannot create kv store: Put "https://consul-fra-2.fly-shared.net/v1/catalog/register?wait=5000ms": dial tcp [2a09:8280:1::6:5ab1]:443: connect: connection refused

2023-06-26T20:32:51.405 app[1781570f115989] ams [info] sentinel | exit status 1

2023-06-26T20:32:51.405 app[1781570f115989] ams [info] sentinel | restarting in 3s [attempt 1]

2023-06-26T20:32:51.773 app[1781570f115989] ams [info] checking stolon status

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] panic: error checking stolon status: cannot create kv store: Put "https://consul-fra-2.fly-shared.net/v1/catalog/register?wait=5000ms": dial tcp [2a09:8280:1::6:5ab1]:443: connect: connection refused

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] : exit status 1

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] goroutine 9 [running]:

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] main.main.func2(0xc0000d2000, 0xc000086a00)

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] /go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:81 +0x72c

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] created by main.main

2023-06-26T20:32:51.887 app[1781570f115989] ams [info] /go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:72 +0x43b

2023-06-26T20:32:52.523 app[1781570f115989] ams [info] Starting clean up.

2023-06-26T20:32:52.523 app[1781570f115989] ams [info] Umounting /dev/vdb from /data

2023-06-26T20:32:53.531 app[1781570f115989] ams [info] [ 3.509158] reboot: Restarting system

2023-06-26T20:33:08.887 app[1781570f115989] ams [info] Starting init (commit: 08b4c2b)...

2023-06-26T20:33:08.952 app[1781570f115989] ams [info] Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755

2023-06-26T20:33:08.964 app[1781570f115989] ams [info] Preparing to run: `docker-entrypoint.sh start` as root

2023-06-26T20:33:09.039 app[1781570f115989] ams [info] 2023/06/26 20:33:08 listening on [fdaa:0:b44e:a7b:23c5:f057:59e3:2]:22 (DNS: [fdaa::3]:53)

2023-06-26T20:33:09.235 app[1781570f115989] ams [info] cluster spec filename /fly/cluster-spec.json

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] cluster spec already exists

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] {

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "initMode": "existing",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "existingConfig": {

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "keeperUID": "23c5a392142f2"

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] },

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "pgParameters": {

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "archive_command": "if [ $ENABLE_WALG ]; then /usr/local/bin/wal-g wal-push \"%p\"; fi",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "archive_mode": "on",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "archive_timeout": "60",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "effective_cache_size": "192MB",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "effective_io_concurrency": "200",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "maintenance_work_mem": "64MB",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_connections": "300",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_parallel_workers": "8",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_parallel_workers_per_gather": "2",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_worker_processes": "8",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "random_page_cost": "1.1",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "shared_buffers": "64MB",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "wal_compression": "on",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "work_mem": "4MB"

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] },

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "maxStandbysPerSender": 50,

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "deadKeeperRemovalInterval": "1h"

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] }

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] system memory: 256mb vcpu count: 1

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] {

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "initMode": "existing",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "existingConfig": {

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "keeperUID": "23c523a8f2"

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] },

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "pgParameters": {

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "archive_command": "if [ $ENABLE_WALG ]; then /usr/local/bin/wal-g wal-push \"%p\"; fi",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "archive_mode": "on",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "archive_timeout": "60",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "effective_cache_size": "192MB",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "effective_io_concurrency": "200",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "maintenance_work_mem": "64MB",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_connections": "300",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_parallel_workers": "8",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_parallel_workers_per_gather": "2",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "max_worker_processes": "8",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "random_page_cost": "1.1",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "shared_buffers": "64MB",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "wal_compression": "on",

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "work_mem": "4MB"

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] },

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "maxStandbysPerSender": 50,

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] "deadKeeperRemovalInterval": "1h"

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] }

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] generated new config

2023-06-26T20:33:09.239 app[1781570f115989] ams [info] keeper | Running...

2023-06-26T20:33:09.247 app[1781570f115989] ams [info] sentinel | Running...

2023-06-26T20:33:09.247 app[1781570f115989] ams [info] proxy | Running...

2023-06-26T20:33:09.255 app[1781570f115989] ams [info] exporter | Running...

2023-06-26T20:33:09.690 app[1781570f115989] ams [info] exporter | INFO[0000] Starting Server: :9187 source="postgres_exporter.go:1837"

2023-06-26T20:33:09.717 app[1781570f115989] ams [info] proxy | [WARNING] 176/203309 (538) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.

2023-06-26T20:33:09.745 app[1781570f115989] ams [info] proxy | [NOTICE] 176/203309 (538) : New worker #1 (562) forked

2023-06-26T20:33:09.745 app[1781570f115989] ams [info] proxy | [WARNING] 176/203309 (562) : bk_db/pg1 changed its IP from (none) to fdaa:0:b44e:a7b:23c5:f057:59e3:2 by flydns/dns1.

2023-06-26T20:33:09.745 app[1781570f115989] ams [info] proxy | [WARNING] 176/203309 (562) : Server bk_db/pg1 ('ams.customer-dashboard-production-db.internal') is UP/READY (resolves again).

2023-06-26T20:33:09.745 app[1781570f115989] ams [info] proxy | [WARNING] 176/203309 (562) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.

2023-06-26T20:33:09.891 app[1781570f115989] ams [info] keeper | 2023-06-26T20:33:09.889Z FATAL cmd/keeper.go:2118 cannot create keeper: cannot create store: cannot create kv store: Put "https://consul-fra-2.fly-shared.net/v1/catalog/register?wait=5000ms": dial tcp [2a09:8280:1::6:5ab1]:443: connect: connection refused

2023-06-26T20:33:09.891 app[1781570f115989] ams [info] keeper | exit status 1

2023-06-26T20:33:09.891 app[1781570f115989] ams [info] keeper | restarting in 5s [attempt 1]

2023-06-26T20:33:09.892 app[1781570f115989] ams [info] sentinel | 2023-06-26T20:33:09.889Z FATAL cmd/sentinel.go:2030 cannot create sentinel: cannot create store: cannot create kv store: Put "https://consul-fra-2.fly-shared.net/v1/catalog/register?wait=5000ms": dial tcp [2a09:8280:1::6:5ab1]:443: connect: connection refused

2023-06-26T20:33:09.892 app[1781570f115989] ams [info] sentinel | exit status 1

2023-06-26T20:33:09.892 app[1781570f115989] ams [info] sentinel | restarting in 3s [attempt 1]

2023-06-26T20:33:10.237 app[1781570f115989] ams [info] checking stolon status

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] panic: error checking stolon status: cannot create kv store: Put "https://consul-fra-2.fly-shared.net/v1/catalog/register?wait=5000ms": dial tcp [2a09:8280:1::6:5ab1]:443: connect: connection refused

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] : exit status 1

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] goroutine 9 [running]:

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] main.main.func2(0xc0000d2000, 0xc000086a00)

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] /go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:81 +0x72c

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] created by main.main

2023-06-26T20:33:10.372 app[1781570f115989] ams [info] /go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:72 +0x43b

2023-06-26T20:33:10.988 app[1781570f115989] ams [info] Starting clean up.

2023-06-26T20:33:10.990 app[1781570f115989] ams [info] Umounting /dev/vdb from /data

2023-06-26T20:33:11.997 app[1781570f115989] ams [info] [ 3.246590] reboot: Restarting system
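As far as I can tell, every failure in there is the same one: the VM can't reach Consul at consul-fra-2.fly-shared.net (connection refused), so stolon's keeper and sentinel can't create their KV store, the status check panics, and the machine reboots in a loop. In case it helps anyone, this is roughly how I'd check the Consul endpoint from inside the machine (a sketch only; /v1/status/leader is just a cheap Consul health endpoint, and the session may not survive the crash loop):

$ fly ssh console -a customer-dashboard-production-db
# curl -sv https://consul-fra-2.fly-shared.net/v1/status/leader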

I also found a related notice somewhere in the dashboard.

Hi,

Possibly an issue with ams? Other people have also reported issues, e.g.

Nothing on the status page though :thinking:

We have one host in ams having issues, it will show up in your dashboard as shown above if you’re on that host.
