Recurring error since latest service interruption incident

Overall, the HA clusters sometimes require a lot of manual intervention and knowledge of the nitty-gritty mechanics…

(I think you’ve already seen @uncvrd’s classic post, for example!)

Basically, you can either follow steps along those lines to repair the cluster from the inside, or take the simpler but less elegant approach of forking it:

https://community.fly.io/t/urgency-problems-with-postgres-the-database-is-not-responding/19926/2

(Ideally, there would be a fully managed alternative, which Fly is apparently still working on. That may end up costing ~$80/month, though. It’s rather unclear at this point…)
