Hi, I’ve been struggling with an existing GPU machine for the last three days. According to logs, the machine is unable to start because there are no available resources to run the machine on.
I already tried scaling down to 0 and back up to 1 to force the machine to be recreated, but I got this error: ‘Error: failed to launch VM: insufficient resources to create new machine …’
So, is this related to the deprecation of GPUs on Fly.io?
Hi again… That capacity API has unfortunately gotten even less reliable since we last spoke, and I think it is simply completely wrong today, …
All queries that I’ve tried just now, even ones for vanilla (non-GPU) Machines, have claimed capacity: 1 in all regions.
I’ve added the Questions / Help category to this thread, which will improve the odds of someone from Fly.io weighing in on the GPU deprecation side. (I wouldn’t be surprised if that was independently/simultaneously a problem.)
Aside: If you still have an attached volume, then that will tie you to a particular underlying physical host machine, and fly scale on its own won’t shake you free of that. You may need the forking trick again…
I’ve taken a look at this, and mayailurus is right on.
You have a volume in ORD, so any new GPU machine you bring up in ORD will try to use a specific host machine. That host is in the process of being decommissioned, so no new machines can start on it.
You’ll want to fork that volume, delete the original, and then try to bring up the machine again.
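In case it helps, here is a rough sketch of how that fork-and-delete sequence might look with flyctl. The `run` and `replace_volume` helpers are hypothetical names made up for this post (they are not part of flyctl), and the app name and volume ID are whatever you pass in. It assumes the machine is already scaled to 0 so the old volume is unattached, and it only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only. DRY_RUN defaults to 1, so commands are printed rather than
# executed until you explicitly set DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# replace_volume is a hypothetical helper, not a flyctl command.
# Usage: replace_volume <app-name> <old-volume-id>
replace_volume() {
  app="$1"
  old_volume="$2"

  # Note the ID and region of the volume pinned to the old host.
  run fly volumes list -a "$app"

  # Create an independent copy of the volume.
  run fly volumes fork "$old_volume" -a "$app"

  # Check that the fork shows up before removing anything.
  run fly volumes list -a "$app"

  # Remove the original so new machines cannot be tied to it.
  run fly volumes destroy "$old_volume" -a "$app"

  # Bring the machine back up.
  run fly scale count 1 -a "$app"
}
```

A dry run would be something like `DRY_RUN=1 replace_volume my-gpu-app vol_123`, where both arguments are placeholders. For a real run it is probably safer to execute the steps one at a time and confirm the fork appears in the volume list before destroying the original, since the sketch does not check for that itself.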