Urgent: problems with Postgres, the database is not responding

Hi Quis,

My name is John, from the Infra team. A few Fly Postgres clusters hit a bug that was triggered by a routine maintenance action we run against our host server fleet. We’re still trying to diagnose what this bug or interaction was; it’s quite obscure. We’ll share more on that once we understand it; right now, let’s get your cluster back to health.

Because it’s not yet clear what this bug is, we recommend restoring your database to a new Fly Postgres cluster and pointing your front-end apps at it. The following steps will do this for you:

# Create a new Postgres app from one of your existing app's volumes
# (Do this once)
fly pg create --initial-cluster-size 3 --fork-from <OLD_DB_APP_NAME>:<OLDEST_VOL_ID> -n <NEW_DB_APP_NAME>
# Repeat from here for every front-end app that connects to the database
# Remove the old database config from the app
fly secrets unset -a <FRONTEND_APP_NAME> --stage DATABASE_URL
# Add the config for the new database
fly pg attach -a <FRONTEND_APP_NAME> --database-user <NEW_DB_USER> --database-name <OLD_DB_NAME> <NEW_DB_APP_NAME>
fly secrets deploy -a <FRONTEND_APP_NAME>
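
If several front-end apps share this database, the per-app steps above could be wrapped in a small loop. This is only a sketch under stated assumptions: the app names, database user, and database name below are hypothetical placeholders, and the script only prints the commands (a dry run) so you can review them before running anything.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace with your own.
NEW_DB_APP_NAME="my-new-db"
NEW_DB_USER="my_new_user"
OLD_DB_NAME="my_old_db"

# Print (do not run) the three per-app commands for one front-end app.
print_reattach_cmds() {
  app="$1"
  echo "fly secrets unset -a $app --stage DATABASE_URL"
  echo "fly pg attach -a $app --database-user $NEW_DB_USER --database-name $OLD_DB_NAME $NEW_DB_APP_NAME"
  echo "fly secrets deploy -a $app"
}

# Hypothetical front-end app names; list every app that connects to the database.
for app in web-app worker-app; do
  print_reattach_cmds "$app"
done
```

Once the printed commands look right, you can run them by hand or pipe the output to `sh`.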

I have just started my day and will be checking the forums throughout it, so I’m ready to respond if you need help.