Our postgres app is stuck at “pending”. I ran multiple deployments to update some secrets and env vars, but the changes did not apply, so I forced the deployment by stopping the stale VM. That left the app with no VMs at all, and no new VM was created to replace the one I stopped.
We now have a postgres cluster with no VMs, so none of our apps are working. This is currently only a staging environment, but it is still clearly unacceptable. An app should never be stuck in this state.
fly status --all -a $APP shows no instances. I tried fly scale count 1 -a $APP and that hasn’t worked either. I really need a resolution on this, very desperately.
After waiting ~30 minutes with no VMs being created in lhr, I finally managed to get something to start in iad. I’m now working out how to restore the pg data using wal-g, though I need to be able to STOP postgres on the server to restore the data.
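As a rough sketch of why postgres has to be stopped: a generic wal-g restore fetches a base backup into the data directory and then replays WAL on the next start, so nothing should be running against that directory. The helper below only prints the commands for review and runs nothing. The function name print_restore_plan, the data directory argument, and the choice of LATEST are illustrative assumptions, not what was actually run here, and recovery.signal assumes PostgreSQL 12 or newer. How to stop postgres on a Fly VM is not covered.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a generic wal-g restore sequence.
# $1 = postgres data directory (assumed path, supply your own)
# $2 = backup name, defaults to LATEST
print_restore_plan() {
  pgdata=$1
  backup=${2:-LATEST}
  cat <<EOF
wal-g backup-list
wal-g backup-fetch ${pgdata} ${backup}
touch ${pgdata}/recovery.signal
# set restore_command = 'wal-g wal-fetch %f %p' in postgresql.conf
EOF
}
```

Calling print_restore_plan with your own data directory gives a checklist to adapt. The wal-g storage settings (bucket, credentials) are assumed to be configured in the environment already.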
For anyone else who finds themselves with a similar problem and is using wal-g for backups:
The big caveat is to make sure you have a copy of the superuser password and to set it for flypgadmin again after the restore, otherwise you will not be able to use your restored database.
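A minimal sketch of that password step, assuming you can reach the restored server with psql as some role that is allowed to alter flypgadmin. The helper name alter_password_sql is hypothetical. It only builds the SQL statement, doubling any single quotes in the password so the string literal stays valid, and it assumes the role name is a plain lowercase identifier.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that resets a role's password.
# $1 = role name (assumed to be a plain identifier such as flypgadmin)
# $2 = password; single quotes are doubled, as SQL string literals require
alter_password_sql() {
  role=$1
  password=$2
  escaped=$(printf '%s' "$password" | sed "s/'/''/g")
  printf "ALTER ROLE %s WITH PASSWORD '%s';\n" "$role" "$escaped"
}

# Example use (not executed here; the connection details are assumptions):
#   alter_password_sql flypgadmin "$SAVED_PASSWORD" | psql -U postgres
```

Printing the statement first makes it easy to check before piping it to psql. SAVED_PASSWORD is a placeholder for wherever you kept the copy of the superuser password.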