We’ve gotten a lot better at reporting and minimizing capacity issues over the past few months, but GPUs threw a bit of a monkey wrench into the works. Say, for instance, you want to play with our new A10s and try deploying something to iad. Only, we don’t have A10s in iad. Exactly which GPU types are available, and where, is confusing; I work here and it still trips me up sometimes.
Previously, when attempting to use GPU kinds not available in your chosen region, you’d get an error like this:
Error: error creating a new machine: failed to launch VM: no capacity available in iad (Request ID: 01HWR0QZFGN6ZAJHM5HE4JNTDX-dfw) (Trace ID: e2016c3b35a6ac3d614de36dd3da0b6b)
Try choosing a different region for machine creation
This is very confusing because it looks an awful lot like any other capacity error that may or may not go away in a few minutes, when in fact this GPU kind isn’t available at all in the chosen region (unless we add it there later, of course). It also doesn’t even indicate the GPU as the cause!
Now the error message is much clearer:
Error: error creating a new machine: failed to launch VM: GPU GPU_KIND_A100_PCIE_40GB not available in region iad (Request ID: 01HX7B3ND90JWSRMP5V4BRKGFV-dfw) (Trace ID: 101a9526c61a86907cf5cd09d7d14101)
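If you script your deploys, the new wording makes it possible to tell the two cases apart and only retry the one that might clear up on its own. Here is a minimal sketch in Python, assuming the message text looks like the two examples above. The function name `classify_launch_error` and the labels it returns are made up for illustration, and the error wording is not a stable API, so treat the patterns as something that may need updating.

```python
import re

# Patterns copied from the two example messages in this post.
# Assumption: the wording stays the same; it is not a documented contract.
GPU_UNAVAILABLE = re.compile(
    r"GPU (?P<kind>\S+) not available in region (?P<region>\S+)"
)
NO_CAPACITY = re.compile(r"no capacity available in (?P<region>\S+)")


def classify_launch_error(message):
    """Return (label, details) for a machine-launch error message.

    label is one of:
      "gpu_unavailable" - this GPU kind is not offered in the region; retrying won't help
      "no_capacity"     - possibly temporary; a retry or another region may work
      "other"           - anything this sketch does not recognize
    """
    match = GPU_UNAVAILABLE.search(message)
    if match:
        return "gpu_unavailable", match.groupdict()
    match = NO_CAPACITY.search(message)
    if match:
        return "no_capacity", match.groupdict()
    return "other", {}
```

A wrapper script could retry on "no_capacity" and fail fast on "gpu_unavailable", pointing the user at the docs for which regions carry that GPU kind.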
Hopefully this helps distinguish temporary capacity errors from permanent unavailability so you’re not like me, trying several times to deploy an app before it occurs to you to check the docs. We hope you enjoy working with our shiny new GPUs!