how to automatically restart a managed postgres when it crashes?

julia · August 21, 2022, 5:40pm

My postgres standalone instance (mess-with-dns-pg) crashed today and started printing out these errors repeatedly. Restarting the instance manually fixed the issue, but it didn’t restart automatically.

Is there a way I can set up a healthcheck so that the postgres instance automatically restarts itself if there’s a problem?

2022-08-21T17:33:30Z app[3461a67a] iad [info]keeper   | 2022-08-21 17:33:30.956 GMT [17951] FATAL:  pre-existing shared memory block (key 131073, ID 7) is still in use
2022-08-21T17:33:30Z app[3461a67a] iad [info]keeper   | 2022-08-21 17:33:30.956 GMT [17951] HINT:  Terminate any old server processes associated with data directory "/data/postgres".
2022-08-21T17:33:31Z app[3461a67a] iad [info]keeper   | 2022-08-21T17:33:31.130Z	ERROR	cmd/keeper.go:1526	failed to start postgres	{"error": "postgres exited unexpectedly"}
2022-08-21T17:33:31Z app[3461a67a] iad [info]sentinel | 2022-08-21T17:33:31.469Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "0f41122e", "keeper": "ab805b922"}
2022-08-21T17:33:31Z app[3461a67a] iad [info]sentinel | 2022-08-21T17:33:31.475Z	ERROR	cmd/sentinel.go:1009	no eligible masters

kurt · August 22, 2022, 6:31pm

So this is a thing we’re not sure how to handle. That Postgres crashed because it OOMed, which can corrupt data. We do pretty aggressive cleanup when a new VM boots against that disk to make it work, but I’m not fully comfortable doing that automatically on crash (it is better for someone to go “this is, in fact, ok”).

I think you can configure the built-in healthcheck to restart for you if you run:

fly config save -a mess-with-dns-pg

# edit fly.toml under [checks.pg]
# restart_limit = 3 (instead of restart_limit = 0)

fly deploy -i flyio/postgres:14 -a mess-with-dns-pg

This should make our health checker do the (possibly destructive) restart for you after 3 consecutive Postgres healthcheck failures.
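If you'd rather script the fly.toml edit than do it by hand, here is a minimal sketch. It assumes the saved config has a [checks.pg] table containing a restart_limit line, as described above. The helper name set_pg_restart_limit and the sample file contents are made up for illustration; the real file written by fly config save will likely have more fields, so check it before deploying.

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl): set restart_limit inside the
# [checks.pg] table of a fly.toml, leaving every other table untouched.
set_pg_restart_limit() {
  file="$1"
  limit="$2"
  # The sed address range runs from the [checks.pg] header to the next
  # table header, so restart_limit lines in other tables are not changed.
  sed "/^\[checks\.pg\]/,/^\[/ s/^\([[:space:]]*restart_limit[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1${limit}/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Demo on a throwaway file. The contents are an assumed, simplified
# stand-in for what `fly config save` writes, not a real generated config.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/fly.toml" <<'EOF'
app = "mess-with-dns-pg"

[checks.pg]
  restart_limit = 0

[checks.other]
  restart_limit = 0
EOF

set_pg_restart_limit "$demo_dir/fly.toml" 3
cat "$demo_dir/fly.toml"
```

Against a real app you would run the helper on the fly.toml saved in the first step, look over the result, and then run the fly deploy command from above.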

Also, you should know these Postgreses aren’t “managed” exactly, they’re automated Fly apps. Normally people think “managed” means “a human responds when something goes wrong”, but that’s not something we do. We’ve worked to make this more obvious, but I’m guessing we set the wrong expectation when you created yours.

julia · August 22, 2022, 6:47pm

thanks so much! I can definitely stop asking questions about my database problems here if it’s not helpful – I can’t always tell if a problem I’m having is unique to me or not.

kurt · August 22, 2022, 6:50pm

Oh we love the questions. I was just concerned that you were expecting something else from “managed”. Asking questions in the forum is exactly what we want. And we’re happy to look at DBs when people ask forum questions.

julia · August 22, 2022, 7:01pm

nope, I’m definitely only expecting “I can ask questions in the forums and hopefully get answers eventually”

By “managed” I just meant that I don’t have control over the docker image so I don’t know how it works (like it’s not an image I built myself)

kurt · August 22, 2022, 7:08pm

Ok cool. That fits! phew.

The actual source code for the Postgres app is here, btw: fly-apps/postgres-ha on GitHub (Postgres + Stolon for HA clusters as Fly apps).

You can do all kinds of fun stuff with that. fly deploy -i flyio/postgres:14 just deploys our public build of that exact app. If you fork it and fiddle around, you can deploy your changes over the DB app we created for you.
