Postgres has fallen over

Hi

My Postgres db seems to have fallen over. I don’t think I’ve done anything out of the ordinary, and I usually don’t expect Postgres to fall over.

fly ssh console -C "connect" --app recilution-db
Connecting to top1.nearest.of.recilution-db.internal... complete
psql: error: connection to server at "recilution-db.internal" (fdaa:0:50b4:a7b:276d:0:9ff2:2), port 5432 failed: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

I can, however, ssh into the box. I tried to restart Postgres from there:

postgres@6d75ae7f:/usr/local/bin$ pg-restart
{"success":false,"message":"exit status 1","data":""}
postgres@6d75ae7f:/usr/local/bin$ exit
logout
# pg-restart
{"success":false,"message":"exit status 1","data":""}

There’s a whole load of postgres processes running, owned by stolon. I found the stolonctl command but have no idea how to use it.

I’m at the stage where I could just flatten the box and start again, but I’d really like to know what happened and how to prevent it in the future.

Any pointers or direction would be much appreciated. I’ll probably start reading the stolon docs next, but I kind of wanted this stuff to be transparent.

It looks like the disk filled up. We’re looking at it; the errors it’s giving about a “pre existing shared memory block” are new.

It’s running again. Will you check and see if it has the right data? I’m concerned that it might have corrupted what’s on disk.

For future reference, you shouldn’t really use pg-restart from within the VM. If you need to restart Postgres, you can run fly vm stop <id> and it’ll do what you need. Stolon needs to manage the postgres process to keep things happy, and restarting it without restarting stolon can cause problems.

Did you happen to check the logs before you ran pg-restart?

Thanks Kurt

Looks like my app is back up. The data looks fine at a quick glance.

/var/logs/postgresql was empty when I checked it. I think that was before I hit pg-restart.

How can I monitor/query volumes to see if they’re filling up?
df -h on the box suggests there’s still over 6GB of space left?

Thanks
Phil

The /data/ volume is only 1GB; the OS runs on a temporary 8GB volume that gets reset on every boot. /data/ doesn’t look full, though. It’s possible the WAL filled it up and then got cleared.
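
For anyone who wants to keep an eye on this themselves, here is a rough sketch of a usage check for the volume mount rather than the root disk. It assumes Python 3 is available wherever you run it, which may not be the case on the database VM itself. The 80% threshold and the helper names are made up for illustration, and the WAL directory location is not given in this thread, so it is left as something you pass in. The percentage can differ slightly from what df reports, since df accounts for reserved blocks.

```python
import os
import shutil
import sys


def usage_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def dir_size_bytes(path):
    """Total size of all files under `path`, e.g. a WAL directory."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Files can disappear mid-walk (WAL segments get recycled).
                pass
    return total


def is_nearly_full(path, threshold=80.0):
    """True if the filesystem holding `path` is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    # /data is the volume mount mentioned in this thread; override via argv.
    mount = sys.argv[1] if len(sys.argv) > 1 else "/data"
    if not os.path.exists(mount):
        sys.exit(f"{mount} does not exist on this machine")
    print(f"{mount}: {usage_percent(mount):.1f}% used")
    if len(sys.argv) > 2:
        # Optional second argument: a directory to size (e.g. the WAL dir).
        wal_dir = sys.argv[2]
        print(f"{wal_dir}: {dir_size_bytes(wal_dir) / 1e6:.1f} MB")
    if is_nearly_full(mount):
        print(f"WARNING: {mount} is nearly full")
```

Run on a schedule, something like this would catch a slow fill. A short WAL spike that clears itself again could still slip between checks, so it complements the logs rather than replacing them.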

Logs don’t get written to the filesystem; you can view them with fly logs -a <db>.

Thanks Kurt, that makes sense. Thanks again for your help!