failed to update machine {machine_id}: failed to destroy VM {machine_id}: not_found: machine not found

Trying to deploy my app but a machine seems to be unreachable so I can’t destroy it and thus can’t deploy.

Failed to update machines: failed to update machine 784e476cdd0938: failed to destroy VM 784e476cdd0938: not_found: machine not found Retrying…

If I try to destroy it from the dashboard, I get:

But when I try to deploy, this is what I get:

It’s been like this for hours now. Nothing elsewhere in the forum has been helpful.

If I try to force destroy the machine:

$ fly machine destroy 784e476cdd0938 --force
Error: machine 784e476cdd0938 was not found in app ‘wotw-sync-up’

But it is part of that app. The machine is listed, but apparently the platform can't communicate with it. How do I force it to detach, or kill it, or whatever? It's stopping me from being able to deploy.

Hi… These cases are always tough, since we forum readers can't reach in and look at the actual metadata, etc. Does the --exclude-machines option to fly deploy help at all?

--exclude-machines strings - Deploy to all machines except machines with these IDs. Multiple IDs can be specified with comma separated values or by providing the flag multiple times.

(You may also need to fly m cordon it first.)
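A minimal sketch of that workaround, assuming a POSIX shell with flyctl installed and logged in. The app name and machine ID are the ones from this thread; substitute your own. The DRY_RUN switch and the run helper are hypothetical conveniences, not part of flyctl: by default the script only prints the commands. Whether cordoning a machine that the API reports as not found actually succeeds is untested here, so that step is allowed to fail.

```shell
#!/bin/sh
# Sketch: deploy around a stuck ("zombie") machine.
# APP and STUCK are example values from this thread; replace them.
set -eu

APP="wotw-sync-up"
STUCK="784e476cdd0938"

# Hypothetical switch: 1 = only print commands, 0 = actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# 1. Cordon the stuck machine so the proxy stops routing to it.
#    This may fail if the API can't reach the machine; carry on regardless.
run fly machine cordon "$STUCK" --app "$APP" || true

# 2. Deploy to every machine except the stuck one.
run fly deploy --app "$APP" --exclude-machines "$STUCK"
```

Run it once as-is to check the printed commands, then with DRY_RUN=0 to execute them. Flag spellings follow flyctl's help text as quoted above; check `fly machine cordon --help` and `fly deploy --help` on your installed version.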


Whoa! --exclude-machines worked! Not a permanent solution, but definitely helps!

Thank you, @mayailurus!


To be fair, this doesn’t resolve the issue with the specific machine that I can’t destroy. It merely allows me to deploy without it failing.

Still need to figure out how to handle this zombie machine. I’m assuming I need someone from Fly to actually investigate? Not sure how I ping them though.

True… The Support plan is the only guaranteed way to do that.

If you were already on the fence about getting one, then that would definitely be the surest way forward.

Personally, I think I would just wait a week or two and see if a notification about a physical host failure appears. (Sometimes those glitch for a while, like a flickering lightbulb, before really being declared dead.) Also, there are grounds for anticipating a few reboots of the global metadata subsystem, as part of the fixes for the problem behind last week’s API disruption.

(There’s a postmortem for that expected to be published this week, as well.)

Deleting the app itself seems to reliably remove zombie machines—according to what others have posted, at least—but then of course it takes time to re-create…

Ha. Yeah, if I’m deleting the app entirely, it’s because I’m moving it to a new host. Hoping not to get to that point.

I posted in another thread about machine issues we're having during deployments. I wanted to share here that the Machines API looks like it's having problems across most regions, according to https://rtt.fly.dev. Most regions aren't even displaying a response time for me.
