How to connect to a machine that is crash-looping?

This is a bug:

I can’t connect to a machine in a bad state using fly ssh console. Because I can’t connect, I can’t fix its state, and because I can’t fix its state, I can’t connect: a full loop.

If it’s an infra-level bad state, then there’s not much you can do beyond trying to scale to 0 and back up. But if it’s an application-level issue, debug it locally before deploying to Fly.

Just a guess: your machine has a volume, it is crash-looping, and you need to change a file on the volume to stop the crash loop? If that’s the case, set the entrypoint to /bin/sleep inf instead of your app’s code, and the machine will stay running so you can SSH in.
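For anyone landing here later, a minimal sketch of what that override might look like in fly.toml, assuming your image ships a /bin/sleep that accepts inf (GNU coreutils does; some minimal images may not). Treat it as a temporary debugging setting, not something to leave in place:

```toml
# fly.toml (fragment): temporary override for debugging only.
# Assumption: the image contains /bin/sleep and it accepts "inf".
[experimental]
  # Run sleep instead of the app, so the machine stays up
  # long enough to open a console and fix the volume.
  entrypoint = ["/bin/sleep", "inf"]
```

After deploying with this, fly ssh console should have a running machine to attach to. If the image also defines a CMD, its arguments may still be passed to the new entrypoint, so you might need to override cmd in the same section as well. Remove the override and redeploy once the file on the volume is fixed.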

Yes, I ran into exactly that situation.

For now, I solved it by destroying all machines and volumes (it is not yet a production system with meaningful data in volumes). But the bug report remains valid.

What specifically is meant by “bad state”? Is the machine in the started state (try fly status -a APP_NAME to check machine state) but you’re still unable to run fly ssh console?

Or is the machine in stopped state, even after you just ran fly m start MACHINE_ID?

The former could be a bug.

The latter is not, and would be akin to saying “My music won’t play when I put my scratched CD into my CD player, so the CD player must be buggy”. Your app is the CD; Fly is the CD player.

By bad state I mean when, e.g., the machines list shows 0/2 checks passed (red rather than green).

Sounds like your health checks are failing. Pop your TOML config here (in a Markdown code block please) if you want advice on it. In short, I’d suggest commenting the health checks out; you may then find your machine comes alive sufficiently to get a console on it.
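As a rough illustration of commenting the checks out, here is a sketch that assumes the checks are defined under http_service in fly.toml. The port, path, and timing values are hypothetical placeholders, not taken from the original poster's config:

```toml
# fly.toml (fragment): values below are hypothetical examples.
[http_service]
  internal_port = 8080
  force_https = true

  # Health check temporarily disabled while debugging.
  # Uncomment once the app starts cleanly again.
  # [[http_service.checks]]
  #   grace_period = "10s"
  #   interval = "30s"
  #   method = "GET"
  #   timeout = "5s"
  #   path = "/"
```

If your checks live in a top-level [checks] section or under [[services]] instead, the same idea applies: comment out the whole check table, redeploy, and then try fly ssh console again.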