How to restore a volume where a machine can't be provisioned due to low capacity?

URGENT: Cannot launch machine in sjc due to volume conflict

My app REDACTED uses a volume logs_volume in region sjc. I scaled down all machines and now cannot create any new machine with the volume attached — all attempts return:

“failed to launch VM: insufficient resources to create new machine with existing volume”

Please help reattach it to a new performance machine so I can bring the app back online. This is a production client.

Hi… The sjc region is somewhat prone to these capacity crunches, unfortunately.

Most of the time machine creation just works, and dfw and iad are solid bets for region choices. But some regions, like sjc, gru, and bom, tend to be in high demand.

Volume forking is typically a way out of this, if you haven’t tried that yet.
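For anyone landing here later, this is a rough sketch of what scripting the fork could look like. It only builds the `fly volumes fork` command, and runs it if you ask it to. The volume ID `vol_123` and the app name `my-app` are placeholders I made up, and the helper names are mine too. Check `fly volumes fork --help` on your flyctl version before relying on it.

```python
import subprocess
from typing import List


def build_fork_command(volume_id: str, app: str) -> List[str]:
    """Build the flyctl command that forks an existing volume.

    volume_id and app are supplied by the caller; nothing is looked up here.
    """
    return ["fly", "volumes", "fork", volume_id, "--app", app]


def fork_volume(volume_id: str, app: str) -> str:
    """Run the fork and return flyctl's stdout (raises if flyctl exits non-zero)."""
    completed = subprocess.run(
        build_fork_command(volume_id, app),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Placeholder values: substitute your own volume ID and app name.
    print(" ".join(build_fork_command("vol_123", "my-app")))
```

Once the fork exists, you would create the new machine against the forked volume rather than the original one.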


Aside: If you’re trying to reach Fly Support, it’s best to use your dedicated email address or the dashboard portal, rather than posting here in the community forum.

There’s a note in the machine creation API docs that is somewhat relevant here:

Important: This request can fail, and you’re responsible for handling that failure. If you ask for a large Machine, or a Machine in a region we happen to be at capacity for, you might need to retry the request, or to fall back to another region. If you’re working directly with the Machines API, you’re taking some responsibility for your own orchestration!

The scale command allows a region list to be supplied, but I believe this text tells us how to think about the platform: in multi-region terms. In other words, it is not an error for a region to be unavailable (even though Fly do also want to roll out new capacity where these gaps happen).
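To make the "retry, or fall back to another region" advice from the quoted note concrete, here is a minimal sketch of that loop. It is not Fly's API: `create_machine` is a hypothetical callable you would write yourself (wrapping the Machines API or flyctl), and `CapacityError` is a made-up exception standing in for whatever "insufficient resources" failure your wrapper detects. Bear in mind that a volume lives in one region, so falling back to another region only helps if the data is available there too.

```python
import time
from typing import Callable, Optional, Sequence


class CapacityError(Exception):
    """Hypothetical error: the region could not place the machine."""


def launch_with_fallback(
    create_machine: Callable[[str], str],
    regions: Sequence[str],
    attempts_per_region: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each region in order, retrying with exponential backoff.

    Returns whatever create_machine returns on the first success.
    Raises RuntimeError once every region has used up its attempts.
    """
    last_error: Optional[Exception] = None
    for region in regions:
        for attempt in range(attempts_per_region):
            try:
                return create_machine(region)
            except CapacityError as exc:
                last_error = exc
                # Back off before retrying the same region; move straight
                # on to the next region after the final attempt.
                if attempt < attempts_per_region - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError(
        "no capacity in any of: " + ", ".join(regions)
    ) from last_error
```

You would call it with your preferred region first, for example `launch_with_fallback(my_create, ["sjc", "lax"])`, where `my_create` is your own wrapper and `lax` is just an example fallback. Only `CapacityError` is retried; any other exception propagates immediately, so real bugs are not hidden behind retries.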

Anyway, yes, it’s a pain! I hope you get it sorted out.

Another thought occurs to me, based on the assumption that your volume contains valuable data and you need it to restore service to your client.

Are you regularly backing up or replicating the volume? I also run a single-host app, but the data is not valuable, and I am aware I can lose the data at any time. Volumes are based on on-host NVMe drives, and they’re super fast, but there have been reports in this forum of less than stellar reliability.

(There is an automatic daily snapshot, which is kept for five days by default. However, a daily snapshot may not be enough for your purposes, and a custom backup is recommended.)
