Can I create a new machine in ORD to attach my PG volume to?

skyline88 · March 25, 2026, 11:11pm

- Machine: 4d89213c256138
- Region: ORD
- Issue: VM resize (shared-cpu-1x → performance-1x) caused EXT4 filesystem corruption (bad block bitmap checksum). Postgres aborts on startup.
- Snapshot restore stuck: vol_vgjzndkg6ypp56pv has been in the “restoring” state for 35+ minutes
- Can’t launch new machines: ORD returns “insufficient resources” for all sizes, including shared-cpu-1x
- Volume fork worked (vol_493z1dyzj6l3jdz4), but I can’t attach it to a machine because ORD has no capacity
- Production database is down
- Snapshot IDs: vs_7PZqQeaq75G1h9AqVD2laNbz (1 hr), vs_DzQY5okY4l1Gc8GNv37Xxe5 (1 day), vs_jQxD2A8Dbg1BclXZ4zNJ6G0D (2 days)
- Single node, no cluster: just one machine with one volume

I need either ORD capacity freed up or help completing the snapshot restore.

halfer · March 25, 2026, 11:34pm

Is this a self-hosted Postgres? If so, is it in a cluster, and how many nodes do you have in the cluster? I believe folks in this forum have rebuilt a single unhealthy node in a cluster from the other healthy nodes.

skyline88 · March 25, 2026, 11:37pm

Hey halfer, thanks for the response. Single node, no cluster: just one machine (4d89213c256138) with one volume. The volume has filesystem corruption (EXT4 bad block bitmap checksum), and Postgres won’t start. I have a clean forked volume (vol_493z1dyzj6l3jdz4) ready to go, but I can’t launch any machines in ORD; I’m getting “insufficient resources” errors on all VM sizes, including shared-cpu-1x. I need either ORD capacity or help with the stuck snapshot restore (35+ minutes in the “restoring” state).

halfer · March 25, 2026, 11:40pm

OK, a clean forked volume is a good start. At least you have not lost your data. I also assume you have snapshots configured and tested as working.

Is this a good time to boot up a managed Postgres instance and import your data into that?

If not, and if ORD is out of capacity, consider spinning up something in a neighbouring region for now. I assume getting back online is more important than region-related latency issues.

skyline88 · March 25, 2026, 11:42pm

Thanks, yes: the forked volume gives me confidence the data is intact. I have snapshots, but I haven’t been able to test them because the restore has been stuck for 40+ minutes.

I’m considering spinning up a fresh Postgres in a nearby region (DFW or IAD) to get back online, but the forked volume with my data is stuck in ORD, so I’d come back online with an empty database. Is there a way to move or restore data across regions, or do I need to wait for ORD capacity to come back so I can read from the forked volume first?

I’m open to managed Postgres suggestions too. What would you recommend?

halfer · March 25, 2026, 11:45pm

I don’t use PG on Fly. However, I think people have had a good experience with Fly’s MPG. It is way, way safer than running a single node on a Fly app; people getting burned by that configuration is commonplace here. Fly MPG seems to be supported in your region.

How much data do you have in GB? Would it take you a long time to move the data to another region?

skyline88 · March 25, 2026, 11:47pm

Thanks. It’s only about 583 MB of data, so moving regions would be quick once I can read from the forked volume. The problem is that I can’t launch any machine in ORD to access it.

MPG sounds like the right move long-term. Is that `fly postgres create` with the `--flex` flag, or is there a separate setup? And can I restore from a volume snapshot into MPG?

halfer · March 25, 2026, 11:49pm

I think `fly postgres` is for the self-hosted option; you want `flyctl mpg`.

I am surprised ORD is completely full. ~~I will see if I can create a machine there~~. Update: I am struggling to find an image to launch, but the capacity number does look problematic.

skyline88 · March 25, 2026, 11:51pm

That would be great. If there’s capacity, I can get my data back and then move to MPG.

halfer · March 26, 2026, 12:00am

Hmm, I assume negative capacity means there is a problem:

```
$ curl -s 'https://api.machines.dev/v1/platform/regions?size=shared-cpu-1x' | jq -c '.Regions[]|[.code,.capacity]' | head -15
["ams",-13386]
["arn",7667]
["bom",1150]
["cdg",16463]
["dfw",-5210]
["ewr",-4051]
["fra",-8822]
["gru",11250]
["iad",-65322]
["jnb",2828]
["lax",-5298]
["lhr",7722]
["nrt",-5285]
["ord",-52415]
["sin",-3931]
```

mayailurus · March 26, 2026, 12:04am

The capacity API is no longer accurate, unfortunately. See this other recent forum thread for more context:

https://community.fly.io/t/persistent-could-not-reserve-resource-error-in-ord/27356

mayailurus · March 26, 2026, 7:20pm

As a small side note, for those who happened to find this thread first, the other thread has some suggestions for getting a machine created in ORD despite these capacity errors.

(And it looks like that approach did work in this case.)