Deployment started failing with "failed to release lease for machine"

eipe · October 21, 2024, 1:21pm

On Friday we started getting failing deployments in our GitHub PR workflow (without making any configuration changes). I got a response from Fly staff that basically said “try again”. And I did, on Monday morning. And it worked! … For a few hours.

Now we are getting the same problem again, and this is serious since we cannot work effectively in a critical phase.

Here is a snippet from the GitHub workflow log:

This deployment will:
 * create 1 "app" machine
> Launching new machine
No machines in group app, launching a new machine
> Machine 185e053c449208 [app] was created
WARN failed to release lease for machine 185e053c449208: lease not found
✖ Failed: timeout reached waiting for health checks to pass for machine 185e053c449208: failed to get VM 185e053c449208: Get "https://api.machines.dev/v1/apps/klimsek-editor-fixes-288/machines/185e053c449208": net/http: request canceled
Error: timeout reached waiting for health checks to pass for machine 185e053c449208: failed to get VM 185e053c449208: Get "https://api.machines.dev/v1/apps/klimsek-editor-fixes-288/machines/185e053c449208": net/http: request canceled
Error: Process completed with exit code 1.

Grateful for any insights or suggestions.

eipe · October 21, 2024, 2:46pm

Re-triggered and now it worked. I have no idea what is happening.

eipe · October 22, 2024, 7:55am

And now the problems are back again…

eipe · October 22, 2024, 10:55am

When I log in to the dashboard I see that the app has now deployed successfully (without me doing anything).

From fly deploy --help I found the following:

      --lease-timeout string             Time duration to lease individual machines while running deployment. All
                                         machines are leased at the beginning and released at the end. The lease
                                         is refreshed periodically for this same time, which is why it is
                                         short. flyctl releases leases in most cases. (default "13s")

      --wait-timeout string              Time duration to wait for individual machines to transition states and
                                         become healthy. (default "5m0s")

      --release-command-timeout string   Time duration to wait for a release command finish running, or 'none' to
                                         disable. (default "5m0s")

      --deploy-retries string            Number of times to retry a deployment if it fails (default "auto")

I have not configured any of these, so the default values apply.

Do you have any ideas on what I could try? Since it feels like it gets “stuck”, I am not sure that just extending a timeout would help (and which one, even?). It often goes fairly quickly when everything is working. Perhaps I should decrease the timeout and add 3 retries? What do you think?
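
To make the "shorter timeout plus retries" idea concrete, here is a minimal sketch of how it might look in the workflow step. Only the flag names come from the help output above. The retry helper, the 3 attempts, the 30-second pause and the 2m0s wait timeout are my own assumptions, untested against this failure, and nothing here is confirmed by Fly staff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempt
# budget is used up, pausing between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command...>
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Possible use in the CI step (the values are guesses, not recommendations):
# retry 3 30 fly deploy --wait-timeout 2m0s --deploy-retries 3
```

The outer wrapper is there because the failure above looks like a cancelled API request rather than an unhealthy app, so a fresh fly deploy a little later may behave differently from a retry inside the same run. Whether --deploy-retries alone would cover that case is something I do not know.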
