Looking at the logs for my Postgres app, it does seem very unhappy.
The health checks keep flapping between passing and failing.
(For brevity, I’ve removed many duplicate “cmd/sentinel.go:276 no keeper info available” lines from the excerpt below; there were a lot of them.)
I’m curious about the “Your instance has hit resource limits” errors. Most of them report the system spending between roughly 1s and 2.25s of each 10-second window waiting on I/O; a couple also report waiting on memory, and one on CPU. This is on a database cluster with no active users, and the database is tiny (<50MB).
The machine is a shared-cpu-1x@256MB, and the metrics show Firecracker memory usage well below that limit (~166MB).
As mentioned previously, this personal side-project has worked flawlessly for a number of years, and these issues only started in the last few weeks.
There have been no app deployments or significant data changes (e.g. bulk imports or similar) that could explain the sudden issues.
I’m not sure whether this is a Stolon issue or something else.
2025-04-02T22:38:17Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:38:16.436Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T22:38:18Z app[21781507a60489] syd [info]keeper | 2025-04-02 22:38:18.835 UTC [9850] LOG: could not receive data from client: Connection reset by peer
2025-04-02T22:38:20Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:38:20.119Z ERROR cmd/sentinel.go:1018 no eligible masters
2025-04-02T22:38:39Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-02T22:38:42Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:38:41.797Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T22:40:16Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
2025-04-02T22:40:18Z app[21781507a60489] syd [info]keeper | 2025-04-02T22:40:18.118Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "context deadline exceeded"}
2025-04-02T22:40:24Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:40:24.845Z ERROR cmd/sentinel.go:1895 cannot get keepers info {"error": "unexpected end of JSON input"}
2025-04-02T22:40:41Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:40:41.729Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T22:40:41Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:40:41.798Z ERROR cmd/sentinel.go:1018 no eligible masters
2025-04-02T22:40:54Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-02T22:52:20Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-02T23:03:43Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
2025-04-02T23:03:57Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-02T23:28:27Z app[21781507a60489] syd [info]sentinel | 2025-04-02T23:28:27.702Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T23:51:30Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] cpu: system spent 1.68s of the last 10 seconds waiting on cpu (276.39µs)
2025-04-02T23:51:41Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-02T23:54:46Z app[21781507a60489] syd [info]sentinel | 2025-04-02T23:54:46.118Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T01:05:51Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.36s of the last 10 seconds waiting on io (30.3µs)
2025-04-03T01:06:01Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:06:22Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] memory: system spent 1.07s of the last 10 seconds waiting on memory (71.21µs)
[✗] io: system spent 1.38s of the last 10 seconds waiting on io (21.72µs)
2025-04-03T01:06:42Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:08:38Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.38s of the last 10 seconds waiting on io (34.43µs)
2025-04-03T01:08:52Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:10:22Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] memory: system spent 1.88s of the last 10 seconds waiting on memory (40.2µs)
[✗] io: system spent 2.25s of the last 10 seconds waiting on io (21.9µs)
2025-04-03T01:10:52Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:31:52Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.41s of the last 10 seconds waiting on io (46.53µs)
2025-04-03T01:32:02Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:41:23Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.06s of the last 10 seconds waiting on io (37.39µs)
2025-04-03T01:41:32Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T02:38:24Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.2s of the last 10 seconds waiting on io (41.62µs)
2025-04-03T02:38:42Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:38:41.971Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:38:55Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:38:54.789Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:39:24Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
HTTP GET http://172.19.137.58:5500/flycheck/pg: 500 Internal Server Error Output: [✗] transactions: Timed out (321.27ms)
2025-04-03T02:39:29Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:39:29.316Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:39:40Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:39:40.596Z ERROR cmd/sentinel.go:1895 cannot get keepers info {"error": "unexpected end of JSON input"}
2025-04-03T02:39:44Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-03T02:39:56Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:39:56.040Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:04Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
2025-04-03T02:40:04Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:03.719Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:07Z health[21781507a60489] syd [error]Health check for your postgres role has failed. Your cluster's membership is inconsistent.
2025-04-03T02:40:08Z app[21781507a60489] syd [info]keeper | 2025-04-03T02:40:06.925Z ERROR cmd/keeper.go:742 error getting pg state {"error": "query returned 0 rows"}
2025-04-03T02:40:17Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:17.318Z ERROR cmd/sentinel.go:1018 no eligible masters
2025-04-03T02:40:29Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:29.236Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:29Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:29.317Z ERROR cmd/sentinel.go:1018 no eligible masters
2025-04-03T02:40:48Z health[21781507a60489] syd [info]Health check for your postgres role is now passing.
2025-04-03T02:40:55Z app[21781507a60489] syd [info]proxy | [WARNING] 092/024054 (683) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5119ms. 0 active and 1 backup servers left. Running on backup. 5 sessions active, 0 requeued, 0 remaining in queue.
2025-04-03T02:40:55Z app[21781507a60489] syd [info]proxy | [WARNING] 092/024054 (683) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5119ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2025-04-03T02:40:55Z app[21781507a60489] syd [info]proxy | [ALERT] 092/024054 (683) : backend 'bk_db' has no server available!
2025-04-03T02:40:55Z app[21781507a60489] syd [info]keeper | 2025-04-03 02:40:55.566 UTC [27806] LOG: could not receive data from client: Connection reset by peer
2025-04-03T02:40:55Z app[21781507a60489] syd [info]keeper | 2025-04-03 02:40:55.966 UTC [27805] LOG: could not receive data from client: Connection reset by peer
2025-04-03T02:40:58Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:58.041Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:58Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:58.437Z ERROR cmd/sentinel.go:1018 no eligible masters
2025-04-03T02:41:04Z app[21781507a60489] syd [info]proxy | [WARNING] 092/024104 (683) : Backup Server bk_db/pg is UP, reason: Layer7 check passed, code: 200, check duration: 2883ms. 0 active and 1 backup servers online. Running on backup. 0 sessions requeued, 0 total in queue.
2025-04-03T02:41:04Z app[21781507a60489] syd [info]proxy | [WARNING] 092/024104 (683) : Server bk_db/pg1 is UP, reason: Layer7 check passed, code: 200, check duration: 2958ms. 1 active and 1 backup servers online. 0 sessions requeued, 0 total in queue.
2025-04-03T02:41:26Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:41:26.443Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:41:39Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-03T02:41:52Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:41:52.142Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:44:39Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:44:38.915Z WARN cmd/sentinel.go:276 no keeper info available {"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:44:56Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
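To get a quick sense of how often each resource tripped the check, the “[✗]” detail lines above can be tallied with a short script. This is only a hypothetical sketch: summarize_pressure is my own helper name, not part of any Fly.io or Stolon tooling, and it assumes every detail line keeps the exact “system spent Xs of the last N seconds waiting on <resource>” wording shown in the excerpt. It reports each stall as a fraction of its window.

```python
import re
from collections import defaultdict

# Assumption: detail lines look exactly like the ones in the excerpt, e.g.
# "[✗] io: system spent 1.36s of the last 10 seconds waiting on io (30.3µs)"
PRESSURE_LINE = re.compile(
    r"\[✗\]\s+(?P<resource>\w+): system spent (?P<stalled>[\d.]+)s "
    r"of the last (?P<window>\d+) seconds waiting on (?P=resource)"
)


def summarize_pressure(log_text: str) -> dict[str, dict]:
    """Hypothetical helper: count, max and mean stall fraction per resource."""
    fractions: dict[str, list[float]] = defaultdict(list)
    for line in log_text.splitlines():
        match = PRESSURE_LINE.search(line)
        if match:
            # Fraction of the check window spent stalled on this resource.
            stalled = float(match["stalled"])
            window = float(match["window"])
            fractions[match["resource"]].append(stalled / window)
    return {
        resource: {
            "count": len(values),
            "max_fraction": max(values),
            "mean_fraction": sum(values) / len(values),
        }
        for resource, values in fractions.items()
    }


if __name__ == "__main__":
    # A few detail lines copied from the excerpt above.
    sample = """\
[✗] cpu: system spent 1.68s of the last 10 seconds waiting on cpu (276.39µs)
[✗] memory: system spent 1.88s of the last 10 seconds waiting on memory (40.2µs)
[✗] io: system spent 2.25s of the last 10 seconds waiting on io (21.9µs)
[✗] io: system spent 1.06s of the last 10 seconds waiting on io (37.39µs)
"""
    for resource, stats in sorted(summarize_pressure(sample).items()):
        print(f"{resource}: {stats['count']} failures, "
              f"max {stats['max_fraction']:.1%}, mean {stats['mean_fraction']:.1%}")
```

Fed the full log, this makes it easy to see that each failure reflects a stall of roughly 10–23% of the 10-second window rather than the full 10 seconds. Lines unrelated to the resource checks are simply skipped.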