I’ve recently had sudden database errors, which turned out to be caused by a DB machine ending up in a zombie state. My questions:
- Is there a way to get notified about such drastic changes in availability? I’ve seen that you integrate with Grafana, but it seems to me that this case deserves a more native solution.
- If Grafana is the only way, how do I set it up so that it alerts me whenever a machine is not running?
FYI, if you’re reading this because you’re facing similar issues, these are the errors that suddenly started appearing:
- PG::ReadOnlySqlTransaction: ERROR: cannot execute INSERT in a read-only transaction
- PQsocket() can’t get socket descriptor
- PQconsumeInput() server closed the connection unexpectedly
This is how I solved it:
- Check the status with
fly status -a APP_NAME-db
- I saw there was a minor update available, so I updated it:
fly image update -a APP_NAME-db
- More importantly, however, I saw that one of the 3 machines was in a state called ‘zombie’. I restarted that machine:
fly machine restart MACHINE_ID
(This step solved the issue.)
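
Until there is a native notification, one stopgap could be a small script run on a schedule that lists the machines and flags any that are not running. The sketch below is untested against a real cluster and rests on a few assumptions: that `fly machines list --json` prints a JSON array of machine objects with `id` and `state` fields, that `started` is the only healthy state for a Postgres cluster, and that flyctl is installed and authenticated wherever the script runs. The function names are made up for illustration.

```python
import json
import subprocess

# Assumption: for an always-on Postgres cluster, "started" is the only healthy state.
HEALTHY_STATES = {"started"}


def unhealthy_machines(machines_json):
    """Return a list of (id, state) pairs for machines not in a healthy state.

    Assumes the input is a JSON array of objects with "id" and "state" keys.
    """
    machines = json.loads(machines_json)
    return [
        (m.get("id", "?"), m.get("state", "unknown"))
        for m in machines
        if m.get("state") not in HEALTHY_STATES
    ]


def check_app(app_name):
    """Ask flyctl for the app's machines and return the unhealthy ones.

    Assumes `fly machines list -a APP --json` is available and authenticated.
    """
    result = subprocess.run(
        ["fly", "machines", "list", "-a", app_name, "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise if flyctl itself fails
    )
    return unhealthy_machines(result.stdout)


def report(app_name):
    """Print one line per unhealthy machine and return how many were found."""
    bad = check_app(app_name)
    for machine_id, state in bad:
        print(f"{app_name}: machine {machine_id} is in state '{state}'")
    return len(bad)
```

Calling `report("APP_NAME-db")` from a cron job or CI schedule and sending the output to email or chat would at least surface a zombie machine early. For apps that stop machines on purpose (for example with auto-stop), `stopped` would have to be added to `HEALTHY_STATES`, otherwise every idle machine would be reported.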