Postgres machines down?

The DB had been working fine for a few weeks and I didn’t change anything, but now I can’t connect to it from my app. Last I checked, it wasn’t close to full. I can’t restart it or see the config with the fly CLI.

fly pg config -a mayorgame-db show
gives me:

command is not compatible with this image

fly pg restart --app <name>
gives me:

Error failed to obtain lease: failed to get lease on VM 73287903f11685: dial tcp [fc01:a7b:92::]:3593: i/o timeout

Metrics aren’t showing up in my logs either.

This database is on a host that had a hardware failure. We’re working to restore it. We’re also working on ways to communicate hardware failures more aggressively, because there’s no way you would have known! It is statistically likely we’ll have hardware fail most days, so we don’t update our global status page per host anymore, but you should still be able to figure this out.

When it comes back, I would recommend adding a second Postgres node to your cluster. This is the best way to ensure you’re resilient to hardware issues.

Thanks kurt! How would one do this?

This should do it: the High Availability & Global Replication page in the Fly Docs.
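
For anyone landing here later, a rough sketch of what adding a second node can look like, assuming a Machines-based Postgres app where replicas are added by cloning an existing machine with `fly machine clone`. The `add_replica` helper and its arguments are made up for illustration; the machine ID and region are placeholders you would fill in yourself (for example from `fly machine list`), and the docs page is the authority if your image works differently.

```shell
# Sketch only: add a second Postgres node by cloning an existing machine.
# Assumes a Machines-based Postgres app; add_replica is a hypothetical helper,
# not part of the fly CLI.
add_replica() {
  machine_id="$1"  # ID of a healthy machine already in the cluster
  region="$2"      # region code for the new node
  app="$3"         # name of the Postgres app
  fly machine clone "$machine_id" --region "$region" --app "$app"
}

# Example call (placeholders; only makes sense once the cluster is healthy again):
# add_replica <machine-id> <region> mayorgame-db
```

Putting the clone on a different host (or in a different region) is the point here, since a replica on the same failed hardware would not have helped.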


I’m experiencing this issue now. It’s been two days. Do we know when it’ll be addressed?