Postgres health checks perpetually failing

For context, I noticed yesterday that one of the database replicas had 2 of its 3 health checks in a critical state, and it hasn’t recovered since. I tried restarting the app, but the replica falls back into the same state, with 2 of 3 health checks failing. I believe some data was also lost yesterday while a user was trying to onboard onto the app.

The app has plenty of memory and volume space, so I don’t think those are the issue. One of the other replicas is a few deploy versions behind, but all 3 of its health checks are passing. I tried upgrading the Postgres image to the newer Fly Postgres image this morning, but ran into the same health check issue.

Would love some support, if possible.

I ended up solving this by spinning up a new database container from a volume snapshot and attaching it to the app container. I’m still not entirely sure how this issue occurred in the first place, but hopefully that’s helpful context for anyone who runs into the same thing.