fly deploy failing - machine does not go to started state

Looking for pointers on how to address a deploy failure. (Sorry if I miss an L; my keyboard is a bit flaky.)

I am deploying v73 of my app, so it is not a new app.

mix phx.server works on my local box.

But the deploy looks like this:

Running nvwweb release_command: /app/bin/migrate
release command failed - aborting deployment. error waiting for release_command machine 5683923f15398e to start: timed out waiting for machine to reach started state: failed to wait for VM 5683923f15398e in started state: Get "https://api.machines.dev/v1/apps/nvwweb/machines/5683923f15398e/wait?instance_id=01HE5J7MFZ6AJRGYQ6Q8MKPWXS&state=started&timeout=60": net/http: request canceled
You can increase the timeout with the --wait-timeout flag

I have tried recommitting the code. I have redeployed and destroyed machines.

The machine itself gets stuck in the following state:
nvwweb
ID: 5683923f15398e
NAME: floral-fog-8022
STATE: created
REGION: sin
IMAGE: nvwweb:deployment-01HE5J7ETMHM32Q810RQC37WSD
IP ADDRESS: fdaa:0:6d28:a7b:a754:e736:d85c:2
VOLUME: (empty)
CREATED: 2023-11-01T13:47:42Z
LAST UPDATED: 2023-11-01T13:47:42Z
APP PLATFORM: v2
PROCESS GROUP: fly_app_release_command
SIZE: shared-cpu-2x:512MB

Any pointers would be helpful…
I cannot get the machine to the “started” state. Maybe there is something wrong with /app/bin/migrate, but the error is a little vague.

Hi, your machine in Singapore just took an inordinately long amount of time to pull its image:

Successfully prepared image registry.fly.io/nvwweb:deployment-01HE5J7ETMHM32Q810RQC37WSD (26m52.684533528s)

Increasing the --wait-timeout to 1800 seconds might help work around this. It will take a while, but at least the deployment shouldn’t break.
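To see why 1800 seconds is a reasonable figure, the pull time reported above can be converted to seconds and compared with the timeout. The following is only a rough sketch: the helper name duration_to_seconds is made up for illustration, and it assumes a duration of the form "<minutes>m<seconds>s" with no hours component.

```shell
# Convert a duration of the form "<minutes>m<seconds>s"
# (for example "26m52.684533528s") to whole seconds, rounding up.
# Assumption: no hours and no millisecond-only forms; this is a sketch.
duration_to_seconds() {
  d=$1
  minutes=0
  case "$d" in
    *m*) minutes=${d%%m*}; d=${d#*m} ;;   # split off the minutes part
  esac
  secs=${d%s}          # drop the trailing "s"
  whole=${secs%%.*}    # integer part of the seconds
  case "$secs" in
    *.*) whole=$((whole + 1)) ;;          # round any fraction up
  esac
  echo $((minutes * 60 + whole))
}

pull_seconds=$(duration_to_seconds "26m52.684533528s")
echo "image pull took about ${pull_seconds}s"   # prints: image pull took about 1613s

# 1800 seconds leaves some headroom over that, so the deploy would be run as:
#   fly deploy --wait-timeout 1800
```

The flag goes on the fly deploy command line, as the error message itself suggests.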

If you want something faster: Notice that what’s failing is the release_command, and this works by creating an ephemeral machine in your primary region, running the release command, and then deleting the machine. So another thing you can try is forcing a different primary region for the deployment so the release_command machine is created in a region that’s not affected by image pull issues:

PRIMARY_REGION=hkg fly deploy

And of course, we’re looking into the slow image pulls in that particular region, so that these workarounds will no longer be needed.

  • Daniel

Sorry for the obvious question, but how do I set the wait timeout? In the Dockerfile? On the command line? Apologies in advance for being an idiot.

I am trying the following:
fly machine destroy 9080e092c03dd8 --force

machine 9080e092c03dd8 was found and is currently in created state, attempting to destroy…

9080e092c03dd8 has been destroyed

scottsproule@Scotts-Mac-mini nvwweb % fly deploy --wait-timeout 1000

Please note that it seems I have to destroy the old machines, as they are stuck. Just FYI; it is not a problem to do.
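For anyone who has several stuck machines, here is a dry-run sketch that prints a destroy command for each machine still in the created state. It assumes a whitespace-separated listing with the ID in the first field and the STATE in the third, matching the field order of the listing quoted earlier in the thread; real flyctl output may be formatted differently and would need adapting. The function name and the second sample row are made up for illustration.

```shell
# Print (but do not run) a destroy command for every machine whose STATE is "created".
# Assumption: whitespace-separated rows with ID in field 1 and STATE in field 3.
stuck_destroy_commands() {
  awk '$3 == "created" { print "fly machine destroy " $1 " --force" }'
}

# Sample rows: the first is taken from this thread, the second is a made-up
# example of a healthy machine that should be left alone.
listing='5683923f15398e floral-fog-8022 created sin
0000000000example example-machine started sin'

printf '%s\n' "$listing" | stuck_destroy_commands
```

Printing the commands first makes it easy to review them before running anything destructive.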


Thank you. The longer timeout allowed me to deploy successfully.

I get the following warning even though the app is still working:
The app is not listening on the expected address and will not be reachable by fly-proxy.
35b105789 [app] update finished: success
It then suggests:

You can fix this by configuring your app to listen on the following addresses:

  • 0.0.0.0:8080

Do I need to do this?

Thank you again. Changing the wait timeout allowed it to deploy.
