We heavily use the idea of scaling up and down in different regions for our ‘job processing’ application.
In v1 we could just run fly scale count 1 or fly scale count 40, and by setting a bunch of regions it scaled up and down within those regions. After upgrading to v2 we keep running into this error from time to time when scaling:
Error: failed to obtain lease: failed to get lease on VM XXXX: machine ID XXX lease currently held by XXX@XXX.XX, expires at 2023-06-06T07:17:43Z
fly m stop XXX returns the same error message.
The Machines API allows a user to obtain a lease on a machine to prevent other users from updating it concurrently. Various flyctl commands make use of this. Off the top of my head, I can think of at least two possible causes for this error:
- On occasion, your fly scale commands (or other fly commands that work on machines, like fly deploy) are failing to release the leases that they acquire on machines. Intermittent network or API issues could cause this, for example. Generally flyctl will attempt to print a warning if it's unable to release leases; perhaps you might find some in your logs.
- You're running multiple fly scale commands simultaneously.
Perhaps one of these describes your situation?
If you're sure that there are no other users operating on a machine, you can use fly machines leases clear <machine ID> to clear the lease for a machine (or replace the machine ID with --select to choose the machine interactively from a list).
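For example, a recovery sequence along these lines might work if the orchestrator hits this error; the app name and machine ID below are placeholders, not values from this thread:

```shell
# Find the machine whose lease is stuck (app name is a placeholder).
fly machines list --app job-processor

# Clear the stale lease on that machine, then retry the scale command.
fly machines leases clear 3d8d9012a34567 --app job-processor
fly scale count 40 --app job-processor
```

Clearing a lease is safe only if you're certain no other process (a deploy, another scale command) is mid-operation on that machine.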
Neither option really applies to my situation. Yes, we regularly scale out, but that is managed by a single orchestrator, so no commands run in parallel.
Releasing a lease is an option I can try out.
However, I feel v2 is a step back from v1, where scaling up and down just worked.