Machines stuck in “starting” state

onnwen · October 2, 2025, 7:20am

Hi everyone,

I have an app running on two machines. From time to time, both machines get stuck in the “starting” state, and the app becomes completely inaccessible.

The first time this happened was months after the initial deploy, but recently it’s been occurring more and more often. Each time, the only workaround I’ve found is to clone the machines and force-destroy the ones stuck in “starting”.

Has anyone else experienced something similar, or is there a known cause/fix for this behavior?

Thanks in advance for any help!

erlangga · October 2, 2025, 9:02am

Same here, +1. We really need this patch to be deployed quickly.

here is the log

dashan108 · October 2, 2025, 1:46pm

Confirmed, I'm also experiencing this (I mentioned it in this message).

Sam-Fly · October 2, 2025, 5:07pm

We’re continuing to investigate the cause of these. From what we’ve seen, it seems to impact a small number of machines when waking from a suspended state. As such, changing your app to use a stop instead of a suspend should avoid them altogether.
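To see whether an app is configured to suspend in the first place, a rough check like the one below might help. It assumes the app's fly.toml sets auto_stop_machines as a double-quoted string; the helper name uses_suspend is hypothetical, and machines configured some other way would not be caught by it.

```shell
# Sketch only: reports whether a fly.toml asks for suspend on idle.
# Assumes a line of the form: auto_stop_machines = "suspend"
# uses_suspend is a hypothetical helper, not part of the fly CLI.
uses_suspend() {
  grep -Eq '^[[:space:]]*auto_stop_machines[[:space:]]*=[[:space:]]*"suspend"' "${1:-fly.toml}"
}

# Example:
#   if uses_suspend fly.toml; then
#     echo 'consider auto_stop_machines = "stop" and redeploy'
#   fi
```

If the check matches, changing the value to "stop" and redeploying is one way to apply the suggestion above.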

If switching away from suspend is not an option, two things to try if you find a machine in this state:

1. Run a machine metadata update with fly machine update <machine-id> --yes --metadata foo=bar. The update should force it out of the starting state. You can update any value, but a metadata update doesn’t change anything in your actual machine settings, so it’s a good fit for cases like this.
2. If that fails, clone a fresh machine with fly machine clone and destroy the stuck one.
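The two steps above could be chained into a small helper, sketched here. The function name unstick_machine is hypothetical, and the --force flag on destroy is an assumption based on the "force-destroy" mentioned earlier in the thread. The script takes a successful update command to mean the machine was nudged, which may not always hold, and you may also need to pass your app name to each command.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: try a no-op metadata update first,
# then fall back to clone + destroy if the update command fails.
# unstick_machine is a hypothetical helper, not part of the fly CLI.
unstick_machine() {
  local machine_id="$1"
  if [ -z "$machine_id" ]; then
    echo "usage: unstick_machine <machine-id>" >&2
    return 2
  fi

  # Step 1: metadata update (the foo=bar key/value is arbitrary).
  if fly machine update "$machine_id" --yes --metadata foo=bar; then
    echo "metadata update accepted for $machine_id"
    return 0
  fi

  # Step 2: clone a fresh machine, then destroy the stuck one.
  # Stop here if the clone fails so the original is not removed.
  echo "metadata update failed; cloning $machine_id" >&2
  fly machine clone "$machine_id" || return 1
  fly machine destroy "$machine_id" --force  # --force: assumed, see note above
}
```

It would be called as unstick_machine <machine-id>. Checking the machine's state afterwards is still worthwhile, since the script only looks at exit codes.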

dangra · October 16, 2025, 2:42pm

Hello. This turned out to be a tricky bug, for which @rian found the solution.