Cannot deploy app, machine not found, cannot destroy machine either

Hey,

A machine on one of my apps is messed up and I can no longer deploy the app. The app is also no longer functioning properly (it cannot be loaded at its URL). I tried to destroy the machine in the dashboard, but that fails too.

Can someone help me with this?

Deploy attempt

Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found Retrying…
Failed to update machines: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found-


:heavy_check_mark: Cleared lease for e28603db77e938

Error: failed to update machine e28603db77e938: failed to destroy VM e28603db77e938: not_found: machine not found (Request ID: 01JAPMED4PZD03Z4ASZ6GTG885-sea) (Trace ID: 616120d3607da2fe72d396c8dfa629f7)

CLI destroy attempt

fly machine destroy e28603db77e938 --force
Error: machine e28603db77e938 was not found in app ‘hyperserve-docs-dev’

Dashboard destroy attempt

I’m having the same exact issue. Can’t restart my app or restart machine.

Hi,

It looks like there is an issue with the API right now: Fly.io Status - Increased API failures :neutral_face:


It’s getting resolved as we speak! From the status page:

Increased API failures

Identified
2024-10-22 19:25:14 UTC
We are currently in the process of rolling out a fix across our fleet.

It would also be helpful if, when you run fly deploy during an incident like this, it warned you that doing so will break your app. It was running fine before the deploy. I could have waited to release instead of having an outage.
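Until something like that exists in flyctl, a guard on the user side is one option. Below is a rough sketch only, not anything flyctl provides: `should_deploy`, `guarded_deploy`, the keyword list, and the `fetch_open_incidents` callback are all hypothetical names. How you actually read open incident titles from the status page is left to the caller, since I don't want to guess at its format.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical keywords; tune to whatever incident titles matter to you.
# Matching is a plain substring check, so expect some false positives.
BLOCKING_KEYWORDS = ("api", "deploy", "machines")


def should_deploy(incident_titles: Sequence[str],
                  blocking_keywords: Sequence[str] = BLOCKING_KEYWORDS) -> bool:
    """Return False if any open incident title mentions a blocking keyword."""
    lowered = [title.lower() for title in incident_titles]
    return not any(keyword in title
                   for title in lowered
                   for keyword in blocking_keywords)


def guarded_deploy(fetch_open_incidents: Callable[[], Sequence[str]],
                   deploy_cmd: Sequence[str] = ("fly", "deploy")) -> int:
    """Run the deploy command only if no blocking incident is open.

    fetch_open_incidents is supplied by the caller (assumption: it returns
    the titles of currently open incidents from the status page).
    Returns 1 when the deploy is skipped, else the command's exit code.
    """
    incidents = list(fetch_open_incidents())
    if not should_deploy(incidents):
        print("Open incidents that may affect deploys:")
        for title in incidents:
            print(f"  - {title}")
        print("Skipping deploy; the running app is left untouched.")
        return 1
    return subprocess.run(list(deploy_cmd)).returncode
```

With today's incident title, `guarded_deploy(lambda: ["Increased API failures"])` would skip the deploy instead of taking down a working app.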


I have no idea how fly has such bad observability of their own infra. I could beat their ops team to every single outage by simply watching a script that monitors the forum activity. In 2024 there’s no excuse for this other than bad culture around product and deployments.


@earl that would be great! But I’m not sure how to anticipate all the places where flyctl would need to check for warnings.

Anyway, it seems it’s better now. At least I’ve managed to deploy a tiny app to iad.

I hope they’ll release some analysis of what went wrong; it was quite a long incident considering the impact.

The last one, regarding the postgres connections, was similar. Over 12 hours passed between when it was first posted in the forums (by me) and when their status page was updated.


Still can’t deploy on my end.

“Waiting for depot builder…” hangs forever and fly scale doesn’t work. fly deploy without Depot works, but I can’t see the new machines in fly status.


They actually always publish that kind of analysis now, in the Infrastructure Log. For example, the September 1 global outage was explained down to the level of individual lines of Rust code.

Today’s probably won’t be covered there until next Tuesday (October 29), though, since it’s a weekly update tempo…


It’s a good log. I wish there was a “this is up to date as of X date” note so we know whether they missed something or it’s still just being written up.

Still can’t do deploys on my end.


Sorry to hear that, :adhesive_bandage:… Yeah, the navigability is arguably starting to get a little sprawl-y over there, more broadly.

For anyone following along: still hard down, and I’m now getting different errors when I try to deploy. As of 2:32 PM Pacific / 21:32 UTC.

Feels pretty disingenuous to say things like ‘parts of the api are down’ when for a non-zero percent of us what that actually means is ‘production server is down and the mechanism to recover is also down’. I guess it just doesn’t roll off the tongue the same… I mean, I get it, but my customer (whose app is down) doesn’t.


Exactly. Their last update really made me mad. What parts of the API are down, which are up? Because from my perspective the whole thing is broken, and has been for hours now.


My original problem was unrelated to the current discussion, I think. When I paid for a support plan to fix the issue I was facing (an unresponsive machine), they fixed that, and then the APIs went down. Painful.

I had been experiencing the original issue since Friday last week, but now I’m still unable to move forward and it’s Tuesday.

I’m wondering why fly doesn’t know when a machine goes down like it did, for days, before I have to pay for support to tell them.

lol yeah, see my earlier comment


Success. Finally able to update my apps.

Really need to prioritize moving over to AWS…