How to connect to a machine that is crash-looping?

porton · September 17, 2025, 6:33pm

This is a bug:

I can’t connect to a machine in a bad state using fly ssh console. Because I can’t connect, I can’t fix its state, and because I can’t fix its state, I can’t connect: a full loop.

khuezy · September 17, 2025, 6:45pm

If it’s an infrastructure-level bad state, there’s not much you can do. You can try scaling to 0 and back up (for example fly scale count 0, then fly scale count 1). But if it’s an application-level issue, debug it locally before deploying to Fly.

lillian · September 17, 2025, 7:11pm

Just a guess: your machine has a volume, it is crash-looping, and you need to change a file on the volume to stop the crash loop? If that’s the case, change the entrypoint to /bin/sleep inf instead of your app’s start command, and the machine will stay running so you can SSH in.
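The same "sleep instead of the app" idea can be made switchable so you don't have to edit the entrypoint each time. Below is a minimal sketch, assuming Python is available in the image and that the image's sleep accepts inf (GNU coreutils does; some minimal images may not). The script name entrypoint.py and the MAINTENANCE_MODE variable are hypothetical, not Fly.io settings.

```python
import os
import sys

# Hypothetical variable name; Fly.io does not define or read it.
MAINTENANCE_VAR = "MAINTENANCE_MODE"


def choose_command(env, app_cmd):
    """Return the argv to run: an endless sleep in maintenance mode, else the app."""
    if env.get(MAINTENANCE_VAR) == "1":
        # Keep the machine alive without starting the app, so `fly ssh console` can attach.
        return ["sleep", "inf"]
    return list(app_cmd)


def run(app_cmd):
    argv = choose_command(os.environ, app_cmd)
    # Replace this process so signals from the platform reach the chosen command directly.
    os.execvp(argv[0], argv)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        run(sys.argv[1:])
    else:
        # No app command given: print usage instead of launching anything.
        print("usage: entrypoint.py APP_COMMAND [ARGS...]", file=sys.stderr)
```

With the image's entrypoint set to something like python entrypoint.py ./your-server, running fly secrets set MAINTENANCE_MODE=1 (which by default redeploys the machines) should leave them idling so you can SSH in and fix the volume, and unsetting it brings the app back.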

porton · September 17, 2025, 7:25pm

Yes, that’s the situation I ran into.

For now, I solved it by destroying all machines and volumes (it is not yet a production system with meaningful data in volumes). But the bug report remains valid.

jfent · September 18, 2025, 8:30pm

What specifically is meant by “bad state”? Is the machine in the started state (run fly status -a APP_NAME to check) but you’re still unable to connect with fly ssh console?

Or is the machine in stopped state, even after you just ran fly m start MACHINE_ID?

The former could be a bug.

The latter is not, and would be akin to saying “My music won’t play when I put my scratched CD into my CD player, so the CD player must be buggy”. Your app is the CD, Fly is the CD player.

porton · September 18, 2025, 8:55pm

By “bad state” I mean when, for example, the machines list shows 0/2 checks passing (red rather than green).

halfer · September 18, 2025, 9:05pm

Sounds like your health checks are failing. Pop your TOML config here (in a Markdown code block, please) if you want advice on it. In short, I’d suggest commenting out the health checks; you may then find your machine stays up long enough to get a console on it.
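Commenting out the checks by hand is usually quickest, but here is a rough Python sketch of that step for illustration. comment_out_checks is a hypothetical helper, and the table names it treats as health checks ([checks], [checks.NAME], [[http_service.checks]], [[services.http_checks]], [[services.tcp_checks]]) reflect my understanding of fly.toml rather than an exhaustive list, so verify them against the fly.toml reference. It also assumes no multi-line array has a line starting with [.

```python
import re

# Table headers assumed to define health checks in fly.toml (verify against the docs).
CHECK_HEADER = re.compile(
    r"^\s*\[\[?\s*(checks(\.[^\]]+)?|http_service\.checks|services\.(http_checks|tcp_checks))"
    r"\s*\]\]?\s*(#.*)?$"
)
# Any line that opens a TOML table or array-of-tables header.
ANY_HEADER = re.compile(r"^\s*\[")


def comment_out_checks(toml_text: str) -> str:
    """Prefix every line of a health-check table (header and keys) with '# '."""
    out = []
    in_check = False
    for line in toml_text.splitlines():
        if ANY_HEADER.match(line):
            # A new table starts: decide whether it is a check table.
            in_check = bool(CHECK_HEADER.match(line))
        if in_check and line.strip():
            out.append("# " + line)
        else:
            out.append(line)
    result = "\n".join(out)
    # Preserve the trailing newline, if the input had one.
    return result + "\n" if toml_text.endswith("\n") else result
```

After writing the result back to fly.toml, a fly deploy would apply the config without the checks; re-enable them once the machine is healthy again.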