My postgres standalone instance (mess-with-dns-pg) crashed today and started printing out these errors repeatedly. Restarting the instance manually fixed the issue, but it didn’t restart automatically.
Is there a way I can set up a healthcheck so that the postgres instance automatically restarts itself if there’s a problem?
2022-08-21T17:33:30Z app[3461a67a] iad [info]keeper | 2022-08-21 17:33:30.956 GMT [17951] FATAL: pre-existing shared memory block (key 131073, ID 7) is still in use
2022-08-21T17:33:30Z app[3461a67a] iad [info]keeper | 2022-08-21 17:33:30.956 GMT [17951] HINT: Terminate any old server processes associated with data directory "/data/postgres".
2022-08-21T17:33:31Z app[3461a67a] iad [info]keeper | 2022-08-21T17:33:31.130Z ERROR cmd/keeper.go:1526 failed to start postgres {"error": "postgres exited unexpectedly"}
2022-08-21T17:33:31Z app[3461a67a] iad [info]sentinel | 2022-08-21T17:33:31.469Z WARN cmd/sentinel.go:276 no keeper info available {"db": "0f41122e", "keeper": "ab805b922"}
2022-08-21T17:33:31Z app[3461a67a] iad [info]sentinel | 2022-08-21T17:33:31.475Z ERROR cmd/sentinel.go:1009 no eligible masters
So this is a thing we’re not sure how to handle. That Postgres crashed because it OOMed, which can corrupt data. We do pretty aggressive cleanup when a new VM boots against that disk to make it work, but I’m not fully comfortable doing that automatically on crash (it is better for someone to go “this is, in fact, ok”).
I think you can configure the built-in healthcheck to restart the instance for you. Run:
fly config save -a mess-with-dns-pg
# edit fly.toml under [checks.pg]
# restart_limit = 3 (instead of restart_limit = 0)
fly deploy -i flyio/postgres:14 -a mess-with-dns-pg
This should make our health checker do the (possibly destructive) restart for you after 3 consecutive postgres healthcheck failures.
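To make the "restart after N failures in a row" behavior concrete, here is a minimal Python sketch of that counting logic. It is an illustration only, not Fly's actual health checker: the function name `first_restart_index` is made up, and it assumes that a passing check resets the failure count and that `restart_limit = 0` means "never restart".

```python
from typing import Optional, Sequence


def first_restart_index(results: Sequence[bool], restart_limit: int) -> Optional[int]:
    """Return the index of the check that would trigger a restart, or None.

    `results` is a sequence of health check outcomes (True = passing).
    Hypothetical helper for illustration; assumes a limit of 0 disables restarts.
    """
    if restart_limit <= 0:
        return None  # assumption: 0 means the checker never restarts the app

    consecutive_failures = 0
    for index, passed in enumerate(results):
        if passed:
            # assumption: any passing check resets the streak
            consecutive_failures = 0
        else:
            consecutive_failures += 1
        if consecutive_failures >= restart_limit:
            return index
    return None


# Two failures, a pass, then three failures in a row: the restart would
# fire on the last check (index 5) when restart_limit is 3.
print(first_restart_index([False, False, True, False, False, False], 3))
```

Under these assumptions, a check that flaps between passing and failing never triggers a restart; only an unbroken run of failures does.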
Also, you should know these Postgreses aren’t “managed” exactly, they’re automated Fly apps. Normally people think “managed” means “a human responds when something goes wrong”, but that’s not something we do. We’ve worked to make this more obvious, but I’m guessing we set the wrong expectation when you created yours.
Thanks so much! I can definitely stop asking questions about my database problems here if it’s not helpful. I can’t always tell if a problem I’m having is unique to me or not.
Oh we love the questions. I was just concerned that you were expecting something else from “managed”. Asking questions in the forum is exactly what we want. And we’re happy to look at DBs when people ask forum questions.
You can do all kinds of fun stuff with that. fly deploy -i flyio/postgres:14 just deploys our public build of that exact app. If you fork it and fiddle around, you can deploy your changes over the DB app we created for you.