Machines won't start!

All our machines are gone and now new ones won't start.

Can someone help us as our whole app is down?

Error: could not start machine 3d8d700dc31e48: failed to start VM 3d8d700dc31e48: failed_precondition: unable to start machine from current state: ‘created’ (Request ID: 01HXC0APZG26WPGTP8J49637MC-lhr)

We are having the same issue here. All our infrastructure is down.

Can someone tell us when this is gonna be fixed, please?

Eliot

Hey! We have an active incident with machine starts. We have identified the root cause and are pushing out a fix.

Any update with this? Unable to launch any instances.
Is there any way we can launch instances? I tried scaling to 0 and back up as well.

We also can’t seem to push new deployments through the ctl (it hangs during upload of new images), and the live logs show: "failed to change machine state: machine getting replaced, refusing to start"

Ridiculous. They took more than 3 hours to fix it while everything was down. I’ve been on AWS before and was thinking of shifting to Fly, but after this experience I really cannot trust them. Moving to Render instead.

Does it work for you? I still can’t push a new deployment.

Machines in one of our apps still aren’t starting. The service has been down for hours and customers are complaining.

Surprised about the status page…

Hi guys, I am looking at the issue with machines failing to start. It would help if you could share your app name or a machine ID so I can check.
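
If it helps anyone gather those details, here is a minimal sketch that pulls the machine ID, state, and request ID out of an error like the one in the first post. The function name `extract_details` is hypothetical, and the pattern assumes the message format matches that single example; it is not a documented flyctl output format.

```python
import re

# Pattern modelled on the one error message quoted in this thread.
# Assumption: other start failures follow the same wording.
ERROR_RE = re.compile(
    r"could not start machine (?P<machine_id>[0-9a-f]+):"
    r".*?current state: \W?(?P<state>\w+)\W?"
    r".*?Request ID: (?P<request_id>[\w-]+)"
)


def extract_details(message):
    """Return machine_id, state, and request_id as a dict, or None if no match."""
    match = ERROR_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "Error: could not start machine 3d8d700dc31e48: failed to start VM "
        "3d8d700dc31e48: failed_precondition: unable to start machine from "
        "current state: ‘created’ (Request ID: 01HXC0APZG26WPGTP8J49637MC-lhr)"
    )
    print(extract_details(sample))
```

Pasting the resulting machine ID and request ID into a reply should be enough for someone to look up the failing start.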

I redeployed the app and the machines are passing health checks but still appear stopped when doing fly machines list and are not responding to HTTP requests.

The app name is wavekit-audio-encoder-production.

It seems I can start machines with fly machines start, but they are still not responding to HTTP requests.

Ok things seem to be working now!

I am seeing your machines starting and then gracefully shutting themselves down a minute later.

hey @dominiks, do you mind sharing your fly deploy output or the error you get when deploying?

Yeah this is normal. I implemented scale to zero before Fly offered it and haven’t migrated yet.

It basically stops when trying to push

WARN failed to finish build in graphql: Post "https://api.fly.io/graphql": context canceled

Hi @aychtang, could you share the app name or machine ID? I see your machines failing to start because of a missing file .env.local but can’t find a machine in “replacing” state.

Same here.

Registry seems fucked too

The push refers to repository [registry-iad.fly.io/customer-credits]
ff10b430d065: Waiting
a0195938cfde: Waiting
ac8cfc402b6d: Waiting
f47424876570: Waiting
889fe9bc0aaa: Waiting
e0ce87e958a0: Waiting
f0fc506f7982: Waiting
46cf0e48c979: Waiting
c3e35b2afb23: Waiting
b7fcac299347: Waiting

Just tried a different registry, same thing: registry-iad.fly.io

I’ve enabled LOG_LEVEL=debug

The push refers to repository [registry-iad.fly.io/customer-credits]
ff10b430d065: Waiting
DEBUG Sending remote builder heartbeat pulse to http://fdaa:0:b44e:a7b:41:b74d:a294:2:8080/flyio/v1/extendDeadline...

DEBUG Sending remote builder heartbeat pulse to http://fdaa:0:b44e:a7b:41:b74d:a294:2:8080/flyio/v1/extendDeadline...

DEBUG Sending remote builder heartbeat pulse to http://fdaa:0:b44e:a7b:41:b74d:a294:2:8080/flyio/v1/extendDeadline...

DEBUG Sending remote builder heartbeat pulse to http://fdaa:0:b44e:a7b:41:b74d:a294:2:8080/flyio/v1/extendDeadline...

DEBUG Sending remote builder heartbeat pulse to http://fdaa:0:b44e:a7b:41:b74d:a294:2:8080/flyio/v1/extendDeadline...
ff10b430d065: Waiting
a0195938cfde: Waiting
ac8cfc402b6d: Waiting
f47424876570: Waiting
889fe9bc0aaa: Waiting
e0ce87e958a0: Waiting
f0fc506f7982: Waiting
46cf0e48c979: Waiting
c3e35b2afb23: Waiting
b7fcac299347: Waiting
ec4a38999118: Waiting
DEBUG Sending remote builder heartbeat pulse to http://fdaa:0:b44e:a7b:41:b74d:a294:2:8080/flyio/v1/extendDeadline...