@lillian This topic comes up every week or so. The advice I have given a few times is that a capacity failure should be handled by the user, by falling back to a series of acceptable alternatives. I infer that from this text in the API docs:
> [Machine creation] can fail, and you’re responsible for handling that failure. If you ask for a large Machine, or a Machine in a region we happen to be at capacity for, you might need to retry the request, or to fall back to another region. If you’re working directly with the Machines API, you’re taking some responsibility for your own orchestration!
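For anyone following along, the retry-then-fall-back pattern the docs describe might look roughly like the sketch below. It is not an official Fly client. `create_with_fallback` and `CapacityError` are hypothetical names, and `create` stands in for whatever call you actually make to the Machines API. That callable is assumed to raise `CapacityError` when a region is full. Any other exception is treated as a real error and is not retried.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class CapacityError(Exception):
    """Hypothetical: raised by your create function when a region is at capacity."""


def create_with_fallback(
    create: Callable[[str], T],
    regions: Sequence[str],
    retries_per_region: int = 2,
    delay_seconds: float = 1.0,
) -> T:
    """Try each region in order of preference, retrying a few times before moving on."""
    failures: List[str] = []
    for region in regions:
        for attempt in range(1, retries_per_region + 1):
            try:
                return create(region)
            except CapacityError as exc:
                # Record the failure; wait before retrying the same region.
                failures.append(f"{region} (attempt {attempt}): {exc}")
                if attempt < retries_per_region:
                    time.sleep(delay_seconds)
        # This region is exhausted; fall through to the next acceptable one.
    raise CapacityError("all regions exhausted: " + "; ".join(failures))


if __name__ == "__main__":
    # Fake create function for illustration: pretend sjc is full.
    def fake_create(region: str) -> dict:
        if region == "sjc":
            raise CapacityError("no capacity")
        return {"id": "example", "region": region}

    print(create_with_fallback(fake_create, ["sjc", "lax", "sea"], delay_seconds=0))
```

The key design choice is that the region list is ordered by preference, so the user (not the platform) decides which alternatives are acceptable. A single-region list like `["sjc"]` simply fails after its retries, which is the situation the OP seems to be in.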
Is this still correct advice, in your view? (I appreciate that commands like `scale` can do this on the user’s behalf, but to apply the same advice there, I wonder whether readers should advocate that all Fly users configure additional regions. Based on the OP’s error, I assume they are set up to deploy only in `sjc`, and thus need to expand that list in order to be resilient.)