Hi everyone,
We’ve been running a Phoenix/Elixir app (`rivena`) on Fly with 2 shared-cpu-1x machines in CDG. Last Friday, one machine got stuck and never recovered on its own.
The error, repeated by the proxy every 1-2 seconds for hours:
`machine ID 48e3e4ef57d358 lease currently held by 93c3d86b-a3ac-5d3d-9a47-0725b7c98f14@tokens.fly.io, expires at 2026-06-13T14:06:26Z`
Some retries also got `rate limit exceeded`.
This happened 3 times on the same machine between 13:36 and 14:06 UTC on June 13. The second machine (`d89492eb145718`) was completely unaffected and handled all traffic fine, so users saw no downtime.
We ended up destroying the stuck machine and creating a fresh one, which started immediately without issues.
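In case it helps anyone triaging the same message: the lease error carries the holder and an expiry timestamp, so it is easy to check whether the lease should already have lapsed. Below is a minimal Python sketch for that. The regex is an assumption based only on the single log line quoted above (the real message format may vary), and the function names `parse_lease_error` and `seconds_until_expiry` are made up for illustration; this is not a Fly API.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the one error line in this post; treat it as an assumption.
LEASE_RE = re.compile(
    r"machine ID (?P<machine>\S+) lease currently held by (?P<holder>\S+), "
    r"expires at (?P<expires>\S+)"
)


def parse_lease_error(line):
    """Return machine ID, lease holder and expiry (UTC) from a lease error line, or None."""
    match = LEASE_RE.search(line)
    if match is None:
        return None
    expires = datetime.strptime(match["expires"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "machine": match["machine"],
        "holder": match["holder"],
        "expires": expires.replace(tzinfo=timezone.utc),
    }


def seconds_until_expiry(lease, now):
    """Seconds left on the lease at `now`; zero or negative means it should have expired."""
    return (lease["expires"] - now).total_seconds()


if __name__ == "__main__":
    line = (
        "machine ID 48e3e4ef57d358 lease currently held by "
        "93c3d86b-a3ac-5d3d-9a47-0725b7c98f14@tokens.fly.io, "
        "expires at 2026-06-13T14:06:26Z"
    )
    lease = parse_lease_error(line)
    remaining = seconds_until_expiry(lease, datetime.now(timezone.utc))
    print(lease["machine"], lease["holder"], remaining)
```

If the remaining time is well below zero and the proxy is still reporting the same holder, that would suggest the lease is being renewed or never released, rather than simply waiting out its expiry.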
A few questions:
- Is this a known issue with Fly’s proxy / machine leasing?
- What triggers a lease to become permanently held like this?
- Is there a way to recover without destroying the machine?
- Should we expect it to happen again?
Thanks!