Scaling v2: `Error: failed to obtain lease: failed to get lease on VM`

We rely heavily on scaling up and down across different regions for our ‘job processing’ application.
In v1 we could just run `fly scale count 1` or `fly scale count 40` and, with a set of regions configured, it scaled up and down within those regions. Since upgrading to v2 we intermittently run into this error when scaling:

Error: failed to obtain lease: failed to get lease on VM XXXX: machine ID XXX lease currently held by XXX@XXX.XX, expires at 2023-06-06T07:17:43Z

Running `fly m stop XXX` returns the same error message.

The Machines API allows a user to obtain a lease on a machine to prevent other users from updating it concurrently. Various flyctl commands make use of this. Off the top of my head, I can think of at least two possible causes for this error:

  • On occasion, your `fly scale` commands (or other `fly` commands that work on machines, like `fly deploy`) are failing to release the leases that they acquire on machines. Intermittent network or API issues could cause this, for example. Generally flyctl will attempt to print a warning if it’s unable to release leases; perhaps you might find some in your logs.
  • You’re running multiple `fly scale` commands simultaneously.

Perhaps one of these describes your situation?

If you’re sure that there are no other users operating on a machine, you can use `fly machines leases clear <machine ID>` to clear the lease for a machine (or replace the machine ID with `--select` to choose the machine interactively from a list).
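If the orchestrator drives flyctl from a script, one way to cope with a stale lease is to read the expiry out of the error message and retry once it has passed. Below is a rough Python sketch of that idea, not anything flyctl provides. It assumes the error text always has the shape quoted above, that the timestamp is UTC with whole seconds, and that `fly scale count` can run without prompting. The names `parse_lease_error`, `seconds_until`, and `scale_with_retry` are made up for illustration.

```python
import re
import subprocess
import time
from datetime import datetime, timezone

# Pattern modelled on the error message quoted in this thread (an assumption:
# the real message format may vary between flyctl versions).
LEASE_ERROR = re.compile(
    r"failed to get lease on VM (?P<vm>\S+): machine ID (?P<machine>\S+) "
    r"lease currently held by (?P<holder>\S+), "
    r"expires at (?P<expires>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)"
)


def parse_lease_error(output):
    """Return (machine_id, expiry) if output contains the lease error, else None."""
    match = LEASE_ERROR.search(output)
    if match is None:
        return None
    expiry = datetime.strptime(
        match.group("expires"), "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return match.group("machine"), expiry


def seconds_until(expiry, now=None):
    """Seconds from now until expiry, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (expiry - now).total_seconds())


def scale_with_retry(count, attempts=3, run=subprocess.run, sleep=time.sleep):
    """Run `fly scale count <count>`, waiting out a held lease between attempts.

    `run` and `sleep` are injectable so the logic can be exercised without flyctl.
    Returns True on success, False if every attempt hit a held lease.
    """
    for _ in range(attempts):
        result = run(
            ["fly", "scale", "count", str(count)],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            return True
        parsed = parse_lease_error(result.stderr + result.stdout)
        if parsed is None:
            # Some other failure: do not retry blindly.
            raise RuntimeError(result.stderr)
        _, expiry = parsed
        # Wait until the lease should have expired, plus a one-second margin.
        sleep(seconds_until(expiry) + 1)
    return False
```

Calling `scale_with_retry(40)` would then replace a bare `fly scale count 40` in the orchestrator. Waiting for the expiry is the conservative choice; if you are certain nothing else is touching the machine, the machine ID returned by `parse_lease_error` could instead be passed to `fly machines leases clear`.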

Neither option really applies to my situation. Yes, we regularly scale out, but that is managed by a single orchestrator, so no commands run in parallel.

Releasing a lease is an option I can try out.

However, I feel v2 is a step back from v1, where scaling up and down just worked.
