Hi, I’ve been experiencing this error for some hours now:
Failed: error creating a new machine: failed to launch VM: insufficient resources to create a new machine with existing volume ‘vol_xxxxxxxxxxxxxx’
I’ve tried to switch to other regions (cloning the required volumes, of course), but none of them have been successful for me. I always get the same error.
My deployment requires GPUs, so my region options are limited. According to the output of ‘fly platform regions’, the ams region has available capacity; however, the error persists.
Hi… It would probably help to post the exact commands that you’ve been trying, since some ways of creating Machines will stubbornly cling to older volumes, etc.
Also, is it complaining about the same volume ID each time, and the like?
The ‘fly platform regions’ output just shows capacity for the default Machine size (which is performance-1x, non-GPU), whereas GPU hosts are a completely disjoint subset, as I understand it.
You can try to use the underlying Machines API to query instead for the desired GPU flavor, but that part of the infrastructure is suspect lately (unfortunately). For example, iad is currently showing negative capacity (for the default performance-1x).
The volume ID changes if I use different regions (primary_region field), but if I use the same region between runs, the command always refers to the same volume ID (the one assigned to the ‘data’ mount in the toml file for the current region being attempted)
Thanks… Capacity looks super-tight for a100-80gb right now, assuming that the API actually is still accurate:
region   capacity
ams      3
iad      0
sjc      0
syd      8
Also2, what command were you using to clone the volume? Remember that you (non-intuitively) have to say in advance that it’s going to be for a GPU Machine, …
I was using this to clone the volume: fly volumes fork vol_xxxxxxxxxxx --region zzz
I saw that the ord region has a capacity of 19 for the a100-40gb size. I’ll recreate the volume for that region using the --vm-gpu-kind parameter and reattempt the deploy using the a100-40gb size. I already tried this other size, but maybe the key is in the way the volume was forked (I honestly doubt it, but who knows…)
Well… that did the trick. The ·$%&# thing just created the machine…
I destroyed the previous volume in the ord region and re-forked it, setting the GPU kind to ‘a100-40gb’ (the one available in that region).
Then I switched my deployment to the ord region and set vm.size to ‘a100-40gb’.
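For anyone landing here later, the fly.toml side of that change would look roughly like the fragment below. It is only a sketch built from the values mentioned in this thread: the mount destination is a placeholder (the real path isn't given here), and the rest of the file is omitted.

```toml
# fly.toml (fragment): illustrative sketch, not the full config
app = 'my-app'             # app name as shown in the deploy output
primary_region = 'ord'     # region where the re-forked volume lives

[mounts]
  source = 'data'          # volume name used for the 'data' mount
  destination = '/data'    # placeholder: the actual mount path is an assumption

[[vm]]
  size = 'a100-40gb'       # GPU size that showed capacity in ord
```

The volume itself still has to be forked with a matching --vm-gpu-kind value, since the GPU kind on the volume and the size in the config apparently need to agree for the Machine to be placed.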
fly deploy output:
==> Verifying app config
Validating fly.toml
✓ Configuration is valid
--> Verified app config
==> Building image
Searching for image 'myorg/an-image' remotely...
image found: img_xxxxxxxxxxxxxxxx
Watch your deployment at https://fly.io/apps/my-app/monitoring
INFO Using wait timeout: 10m0s lease timeout: 13s delay between lease refreshes: 4s
Process groups have changed. This will:
* create 1 "app" machine
No machines in group app, launching a new machine
-------
⠏ Machine xxxxxxxxxxxxxxxx [app] was created
The machine has been created, and in Grafana I can see: