Cannot deploy app, machine not found, cannot destroy machine either

rtrann · October 21, 2024, 4:29am

Hey,

A machine on one of my apps is messed up and I can no longer deploy the app. The app is also no longer functioning properly (it cannot be loaded at its URL). I tried to destroy the machine in the dashboard, but that fails.

Can someone help me with this?

Deploy attempt

Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found-

Cleared lease for e28603db77e938

Error: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found (Request ID: 01JAPMED4PZD03Z4ASZ6GTG885-sea) (Trace ID: 616120d3607da2fe72d396c8dfa629f7)

CLI destroy attempt

fly machine destroy e28603db77e938 --force
Error: machine e28603db77e938 was not found in app ‘hyperserve-docs-dev’

Dashboard destroy attempt

AlbyJames · October 22, 2024, 6:08pm

I’m having the exact same issue. I can’t restart my app or my machine.

greg · October 22, 2024, 6:28pm

Hi,

It looks like there is an issue with the API right now: Fly.io Status - Increased API failures

ktosiek · October 22, 2024, 7:31pm

It’s getting resolved as we speak! From the status page:

Increased API failures

Identified
2024-10-22 19:25:14 UTC
We are currently in the process of rolling out a fix across our fleet.

earl · October 22, 2024, 8:04pm

It would also be helpful if fly deploy warned you when running it will break your app. Mine was running fine before the deploy. I could have waited to release instead of having an outage.

andrewmcgrath · October 22, 2024, 8:13pm

I have no idea how fly has such bad observability of their own infra. I could beat their ops team to every single outage by simply watching a script that monitors the forum activity. In 2024 there’s no excuse for this other than bad culture around product and deployments.

ktosiek · October 22, 2024, 8:13pm

@earl that would be great! But I’m not sure how to anticipate all the places where flyctl would need to check for warnings.

Anyway, it seems it’s better now. At least I’ve managed to deploy a tiny app to iad.

ktosiek · October 22, 2024, 8:16pm

I hope they’ll release some analysis of what went wrong. It was quite a long incident considering the impact.

andrewmcgrath · October 22, 2024, 8:17pm

The last one, regarding the postgres connections, was similar. It was over 12 hours between when it was first posted in the forums (by me) and when their status page was updated.

andrewmcgrath · October 22, 2024, 8:29pm

Still can’t deploy on my end.

Tearsnake · October 22, 2024, 8:36pm

“Waiting for depot builder…” hangs indefinitely and fly scale doesn’t work. fly deploy without Depot works, but I can’t see the new machines in fly status.

mayailurus · October 22, 2024, 9:02pm

They actually do always do so now, in the Infrastructure Log. For example, the September 1 global outage was explained down to the level of individual lines of Rust code.

Today’s probably won’t be covered there until next Tuesday (October 29), though, since it’s a weekly update tempo…

andrewmcgrath · October 22, 2024, 9:04pm

It’s a good log. I wish there was a “this is up to date as of X date” note so we’d know if they missed something or if it’s still just being written up.

Still can’t do deploys on my end.

mayailurus · October 22, 2024, 9:09pm

Sorry to hear that, … Yeah, the navigability is arguably starting to get a little sprawl-y over there, more broadly.

earl · October 22, 2024, 9:34pm

For anyone following along: still hard down, and I’m now getting different errors when I try to deploy. As of 2:32pm PDT / 21:32 UTC.

jacobdejean · October 22, 2024, 9:35pm

Feels pretty disingenuous to say things like ‘parts of the api are down’ when for a non-zero percent of us what that actually means is ‘production server is down and the mechanism to recover is also down’. I guess it just doesn’t roll off the tongue the same… I mean I get it, but my customer (whose app is down) doesn’t.

andrewmcgrath · October 22, 2024, 9:46pm

Exactly. Their last update really made me mad. What parts of the API are down, which are up? Because from my perspective the whole thing is broken, and has been for hours now.

rtrann · October 22, 2024, 10:10pm

I think my original problem was unrelated to the current discussion. When I paid for a support plan to fix the issue I was facing (an unresponsive machine), they fixed that and then the APIs went down. Painful.

I had been experiencing the original issue since Friday last week, but now I’m still unable to move forward and it’s Tuesday.

I’m wondering why fly doesn’t know when a machine goes down like mine did, for days, before I had to pay for support to tell them.

andrewmcgrath · October 22, 2024, 10:16pm

lol yeah, see my earlier comment

andrewmcgrath · October 22, 2024, 10:25pm

Success. Finally able to update my apps.

Really need to prioritize moving over to AWS…