Machine and volume host unreachable in sin region (all operations return 408)

Hi,

My Postgres machine and its attached volume are both on an unreachable host in the sin region and I’m unable to perform any operation on them.

Details:

  • App: onth-database

  • Machine ID: 3d8d3979f6dd08

  • Volume ID: vol_498zy2yn2e08jx9v

  • Region: sin, Zone: 158a

What I see:

  • fly status -a onth-database shows the machine as started but the volume is flagged with * (host unreachable)

  • fly machine restart/stop/kill all return 408 errors

  • Proxy logs show: fly-proxy-p2p/tls/tcp-backhaul: unexpected end of file

  • My production app (onth-api) is down as a result

I’m on a hobby plan so I can’t open a support ticket. Is there anything I can do, or does this require Fly staff to intervene on the physical host?

Thanks,
Hugo

Hi, this host is currently offline due to a hardware issue. Please do not run single-node Postgres for production apps as you will hit this exact case and your database will be unavailable. You can restore from a backup or wait for the host to be restored to service.

Definitely what Lillian said.

There are a few managed providers that will supply a small PG database free of charge; that might be worth a go. If it is reasonably close to your host, latency should not be much of an issue. I use Supabase, though bear in mind they pause any databases that are getting insufficient use.

Yes, but I will lose data if I restore the database from a previous snapshot. What exactly is the situation? How long will it take to restore the host?

Were you using daily snapshots? If so, you have at most a day’s data loss. Did you have any other backups?


There is a network issue on this host; it’s unknown how long it will take to restore, but our team is working on it.
Please note that today’s issue doesn’t involve data loss, but tomorrow’s might. From our volumes documentation:

  • Create redundancy in your primary region: If your app needs a volume to function, and the NVMe drive hosting your volume fails, then that instance of your app goes down. There’s no way around that. You can run multiple Machines with volumes in your app’s primary region to mitigate hardware failures.
  • Create and store backups: If you only have a single copy of your data on a single volume, and that drive fails, then the data is lost. Fly.io takes daily snapshots and can retain them from 1 to 60 days (5 days by default), but the snapshots shouldn’t be your primary backup method.
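As a rough illustration of the "create and store backups" point above, here is a minimal Python sketch that takes a logical dump with `pg_dump` on a schedule of your choosing. It is not from the Fly.io docs. The function names, the `backups` directory, and the connection URL are hypothetical, and it assumes `pg_dump` is installed and the script runs somewhere that can reach the database.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_pg_dump_command(database_url: str, out_dir: str, now: datetime) -> list:
    """Build a pg_dump argv that writes a timestamped custom-format dump."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_file = Path(out_dir) / f"backup-{stamp}.dump"
    return [
        "pg_dump",
        "--format=custom",           # compressed archive, restorable with pg_restore
        f"--file={out_file}",
        f"--dbname={database_url}",  # pg_dump accepts a connection URI here
    ]


def run_backup(database_url: str, out_dir: str) -> Path:
    """Run pg_dump and return the path of the dump file it wrote."""
    cmd = build_pg_dump_command(database_url, out_dir, datetime.now(timezone.utc))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if pg_dump fails
    return Path(cmd[2].split("=", 1)[1])


if __name__ == "__main__":
    # Hypothetical connection URL; replace with your own credentials and host.
    print(run_backup("postgres://user:password@localhost:5432/app", "backups"))
```

The dump only helps if it ends up somewhere other than the volume's host, so copy the resulting file to separate storage after each run.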

Okay thanks for the feedback!
I will wait for the issue to be resolved and then create another instance of the DB in the same region.

The app seems to be working again. Thanks, lilian & halfer, for the quick support :folded_hands:
