I have been running a GPU instance for some inference workloads for a while now, and recent changes have pushed my Docker image over the 4GB max. For my non-GPU instances I have simply attached a disk and I store my dependencies on that persistent disk, but when I try to attach a volume to my GPU instance I keep getting this error:
Error: error creating a new machine: failed to launch VM: insufficient resources to create new machine with existing volume ‘vol_4m8jwj9kn6xk69dr’ (Request ID: 01K7537D3NV0QGJYANDK7M150M-iad) (Trace ID: ceb381967f4b9778356e3b558dd2bc18)
I have tried forking the volume multiple times, and I have tried several different GPU types, but nothing seems to get it to work. Any ideas how I could get this to actually mount a persistent volume and schedule a GPU?
@koz The image size limit for GPU machines is larger than for normal machines; we support up to 50GB. (I believe we might have also raised the limit for normal machines a bit past 4GB as well, but that's beside the point.) That said, moving your large dependencies like models onto a volume is a recommended best practice, as it helps keep your images slim and speeds up your builds/deploys significantly.
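For completeness, creating a dedicated volume for model weights is just the usual volumes workflow. Something like the following (the app name, volume name, region, and size here are placeholders you'd adjust):

```
# Create a volume to hold model weights / large dependencies
# (app name, volume name, region, and size are placeholders)
fly volumes create models_data --app my-gpu-app --region iad --size 50
```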
However, the reason you're hitting the error actually doesn't have anything to do with your image size! The insufficient resources to create new machine with existing volume error means there's no capacity to create a new GPU machine of your specified size on the host your volume lives on. Volumes are stored locally, on the host's NVMe drives. This makes them very fast, but it also means they're pinned to a specific physical host.
Forking the volume is the right approach when you hit this, but by default the volume placement logic doesn't know what size machine you'll want to attach to it. This can lead to the forked volume being placed on a host without capacity, or even on a non-GPU host (which will fail for obvious reasons when you try to schedule a GPU machine on it).
The solution is to use the --vm-gpu-kind, --vm-gpus, and --vm-memory hint flags on fly volume fork. These let you specify the size of machine you need, as well as the GPU type, ensuring the fork lands on a host with capacity for your machine.
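For example, something along these lines should work (the volume ID is the one from your error; the app name, GPU kind, GPU count, and memory value are placeholders you'd adjust to match the machine you actually plan to run):

```
# Fork the existing volume, hinting at the machine we plan to attach it to:
# one GPU of the given kind plus 32GB of RAM (values here are examples only).
fly volume fork vol_4m8jwj9kn6xk69dr \
  --app my-gpu-app \
  --vm-gpu-kind l40s \
  --vm-gpus 1 \
  --vm-memory 32768   # assuming this is specified in MB, like other vm-memory flags
```

Once the fork lands on a GPU host with enough headroom, you can attach it to your new GPU machine the usual way and scheduling should go through.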