Failure to redeploy a previously working Fly.io deployment

So I've been able to push from my local copy of the code to my GitHub repository and use continuous deployment to keep it up to date. Since last night, though, I keep getting this error whenever I try to deploy, either via a PR or manually with fly deploy:


Running game-site release_command: /app/bin/migrate
Starting machine
error starting release_command machine: failed to start VM 185e159a223128: aborted: machine destroyed, cannot add any more events

-------
 ✖ Failed: failed to start VM 185e159a223128: aborted: machine destroyed, cannot add any more events
-------
Error: release command failed - aborting deployment. failed to start VM 185e159a223128: aborted: machine destroyed, cannot add any more events (Request ID: 01JSYR189S4Y3P6JJ8XZBAWDD3-sjc) (Trace ID: 03fad0e974f2d2c5bcf5777510b7130a)

Here are the live logs


2025-04-28T17:51:06.680 runner[4d89664a452578] sjc [info] Successfully prepared image registry.fly.io/game-site:deployment-01JSYRBE1KYZCZ05MARPCN9ZH4 (2.499402566s)

2025-04-28T17:51:13.000 runner[4d89664a452578] sjc [info] Configuring firecracker

2025-04-28T17:51:16.319 app[4d89664a452578] sjc [info] 2025-04-28T17:51:16.319403773 [01JSYRC7XT51KM5R9MF4PSS4FQ:main] Running Firecracker v1.7.0

2025-04-28T17:51:17.213 app[4d89664a452578] sjc [info] INFO Starting init (commit: d15e62a13)...

2025-04-28T17:51:17.362 app[4d89664a452578] sjc [info] INFO Preparing to run: `/app/bin/migrate` as nobody

2025-04-28T17:51:17.367 app[4d89664a452578] sjc [info] ERROR Error: failed to spawn command: /app/bin/migrate: No such file or directory (os error 2)

2025-04-28T17:51:17.368 app[4d89664a452578] sjc [info] does `/app/bin/migrate` exist and is it executable?

2025-04-28T17:51:17.369 app[4d89664a452578] sjc [info] [ 0.981954] reboot: Restarting system

2025-04-28T17:51:17.458 app[4d89664a452578] sjc [warn] Virtual machine exited abruptly

2025-04-28T17:51:17.504 runner[4d89664a452578] sjc [info] machine restart policy set to 'no', not restarting 

Happy to give more information as needed, but this will be the 15th time that I've pushed to the GitHub repo, and every other time it worked fine. Any help would be great.

Hi… Is /app/bin/migrate a shell script? (I.e., starting with #!?)

Sometimes those get CR-LF anomalies when people switch between Windows and macOS/Linux…
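
A quick way to rule that out locally, from a bash shell (I'm guessing at the usual rel/overlays/bin location for the release scripts):

$ file rel/overlays/bin/migrate   # reports "with CRLF line terminators" if CRs have crept in
$ grep -rlI $'\r' rel/            # lists any files under rel/ that contain carriage returns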

#!/bin/sh
set -eu

cd -P -- "$(dirname -- "$0")"
exec ./game_site eval GameSite.Release.migrate

It exists and it seems to have the right lines of code.

Would it have anything to do with permissions on the files? I've tried to see if there is any difference between the files from when I first deployed and now, and I can't seem to find any real differences; even going back to previous commits for those files doesn't help. Thanks for the quick feedback, though.

It could! That’s what the “is it executable?” in its message was referring to, in fact—the x bit in the permissions. (For the user named nobody, in this instance.)
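
If it does turn out to be that, fixing the bit locally and committing is usually enough, since git records the executable bit and the Dockerfile's COPY preserves it (paths here are my guess at the usual Phoenix release-overlay layout):

$ ls -l rel/overlays/bin/migrate   # look for the x's, as in -rwxr-xr-x
$ chmod +x rel/overlays/bin/*      # set the bit, then commit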

At times like this, I usually try changing the CMD to sleep inf temporarily and then fly ssh console to poke around in the filesystem and see what is actually there:

$ fly deploy   # after changing CMD.
$ fly m start  # ensure at least one running.
$ fly ssh console
# ls -l /app/bin/migrate
# od -c /app/bin/migrate  # will show offending CRs explicitly as `\r`.

(It can also help to disable auto-stop during this debugging phase, to keep it from shutting down in the middle of your SSH session, :sweat_smile:.)
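
To be concrete, that temporary change is just the last line of the Dockerfile (and remember to revert it afterwards):

# temporary, for debugging only: keep the machine alive so there's something to SSH into
CMD ["sleep", "inf"]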

-rwxr-xr-x 1 nobody root 96 Apr 24 02:32 bin/migrate
0000000   #   !   /   b   i   n   /   s   h  \n   s   e   t       -   e
0000020   u  \n  \n   c   d       -   P       -   -       "   $   (   d
0000040   i   r   n   a   m   e       -   -       "   $   0   "   )   "
0000060  \n   e   x   e   c       .   /   g   a   m   e   _   s   i   t
0000100   e       e   v   a   l       G   a   m   e   S   i   t   e   .
0000120   R   e   l   e   a   s   e   .   m   i   g   r   a   t   e  \n
0000140

Also, how do you stop it from auto-shutting down?

ls -l /app/bin/migrate
-rwxr-xr-x 1 nobody root 96 Apr 24 02:32 /app/bin/migrate
od -c /app/bin/migrate
0000000   #   !   /   b   i   n   /   s   h  \n   s   e   t       -   e
0000020   u  \n  \n   c   d       -   P       -   -       "   $   (   d
0000040   i   r   n   a   m   e       -   -       "   $   0   "   )   "
0000060  \n   e   x   e   c       .   /   g   a   m   e   _   s   i   t
0000100   e       e   v   a   l       G   a   m   e   S   i   t   e   .
0000120   R   e   l   e   a   s   e   .   m   i   g   r   a   t   e  \n
0000140

Thanks… I was just about to ask about the working directory, :sweat_smile:

Those two look ok, actually. Can you run /app/bin/migrate as root, from the SSH session?

This is normally the auto_stop_machines setting in fly.toml, although the 7-day free trial has an undocumented 5-minute time limit (from what I hear), and I don’t think that one can be disabled.

Thanks again.

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

Assuming that it’s the 3rd line, what do I change it to?

root@1781997b5e2e08:/app# /app/bin/migrate
18:52:19.672 [info] Migrations already up
root@1781997b5e2e08:/app# 

Also, just so it's here, here are the permissions for the local files:

ls -l rel/overlays/bin
total 16
-rwxr-xr-x 1 vinny vinny 99 Apr 28 10:58 migrate
-rwxr-xr-x 1 vinny vinny 52 Apr 28 10:27 migrate.bat
-rwxr-xr-x 1 vinny vinny 91 Apr 28 10:27 server
-rwxr-xr-x 1 vinny vinny 49 Apr 28 10:27 server.bat

Changing that to 'off' should do the trick.

Interesting… How about runuser -u nobody -- /app/bin/migrate?

(It’s typically not possible to SSH in as nobody, so this one is a little roundabout.)

runuser -u nobody -- /app/bin/migrate
19:04:36.875 [info] Migrations already up

Huh… The good news is that you should be able to change your release_command, etc., to incorporate that runuser prefix, and then set USER root in your Dockerfile. (That’s roughly how I have my own Elixir project arranged, due to LiteFS.) It really should work without that, though…

It might help to post your full Dockerfile, even if it’s the one that fly launch auto-generated. That’s where the user-fiddling details would be defined.

Also, for tagging purposes, is this Elixir? (Or maybe Erlang?)


It’s in Elixir.

Dockerfile

# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
# instead of Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
# This file is based on these images:
#
#   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
#   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20250203-slim - for the release image
#   - https://pkgs.org/ - resource for finding needed packages
#   - Ex: hexpm/elixir:1.14.5-erlang-26.2.5.8-debian-bullseye-20250203-slim
#
ARG ELIXIR_VERSION=1.14.5
ARG OTP_VERSION=26.2.5.8
ARG DEBIAN_VERSION=bullseye-20250203-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
  mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv

COPY lib lib

COPY assets assets

# compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && \
  apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/game_site ./

# #Added this line
# RUN chmod 755 /app/bin/*
# #Added this line
# RUN [ -f /app/bin/migrate ] && chmod +x /app/bin/migrate 

USER nobody

# If using an environment that doesn't automatically reap zombie processes, it is
# advised to add an init process such as tini via `apt-get install`
# above and adding an entrypoint. See https://github.com/krallin/tini for details
# ENTRYPOINT ["/tini", "--"]

CMD ["/app/bin/server"]

fly.toml

# fly.toml app configuration file generated for game-site on 2025-04-11T18:56:22-07:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = 'game-site'
primary_region = 'sjc'
kill_signal = 'SIGTERM'

[build]

[deploy]
  release_command = '/app/bin/migrate'

[env]
  PHX_HOST = 'game-site.fly.dev'
  PORT = '8080'

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = 'off'
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

  [http_service.concurrency]
    type = 'connections'
    hard_limit = 1000
    soft_limit = 1000

[[vm]]
  memory = '1gb'
  cpu_kind = 'shared'
  cpus = 1

When changing

[deploy]
  release_command = '/app/bin/migrate'

to

[deploy]
release_command = 'runuser -u nobody -- /app/bin/migrate'

and

USER nobody

to

USER root

I get this error after running fly deploy

 2025-04-28T19:47:57.129 runner[2865100f702798] sjc [info] Pulling container image registry.fly.io/game-site:deployment-01JSYZ1DJ5AVFBMWYEY1B6B4RD

2025-04-28T19:48:01.830 runner[2865100f702798] sjc [info] Successfully prepared image registry.fly.io/game-site:deployment-01JSYZ1DJ5AVFBMWYEY1B6B4RD (4.700915395s)

2025-04-28T19:48:03.526 runner[2865100f702798] sjc [info] Configuring firecracker

2025-04-28T19:48:05.982 app[2865100f702798] sjc [info] 2025-04-28T19:48:05.982223047 [01JSYZ28G3RK0XW2C0C7T3BWWX:main] Running Firecracker v1.7.0

2025-04-28T19:48:07.043 app[2865100f702798] sjc [info] INFO Starting init (commit: d15e62a13)...

2025-04-28T19:48:07.184 app[2865100f702798] sjc [info] INFO Preparing to run: `runuser -u nobody -- /app/bin/migrate` as root

2025-04-28T19:48:07.187 app[2865100f702798] sjc [info] INFO [fly api proxy] listening at /.fly/api

2025-04-28T19:48:07.233 runner[2865100f702798] sjc [info] Machine started in 1.352s

2025-04-28T19:48:07.264 app[2865100f702798] sjc [info] runuser: failed to execute /app/bin/migrate: No such file or directory

2025-04-28T19:48:07.501 app[2865100f702798] sjc [info] 2025/04/28 19:48:07 INFO SSH listening listen_address=[fdaa:12:95db:a7b:181:7e5c:e38d:2]:22

2025-04-28T19:48:08.193 app[2865100f702798] sjc [info] INFO Main child exited normally with code: 1

2025-04-28T19:48:08.211 app[2865100f702798] sjc [info] INFO Starting clean up.

2025-04-28T19:48:09.781 app[2865100f702798] sjc [info] WARN could not unmount /rootfs: EINVAL: Invalid argument

2025-04-28T19:48:09.782 app[2865100f702798] sjc [info] [ 3.708968] reboot: Restarting system

2025-04-28T19:48:10.691 runner[2865100f702798] sjc [info] machine restart policy set to 'no', not restarting 

and

image: registry.fly.io/game-site:deployment-01JSYZ1DJ5AVFBMWYEY1B6B4RD
image size: 49 MB

Watch your deployment at https://fly.io/apps/game-site/monitoring

Running game-site release_command: runuser -u nobody -- /app/bin/migrate
Starting machine

-------
 ✖ release_command failed
-------
Error release_command failed running on machine 2865100f702798 with exit code 1.
Checking logs: fetching the last 100 lines below:
2025-04-28T19:48:05Z 2025-04-28T19:48:05.982223047 [01JSYZ28G3RK0XW2C0C7T3BWWX:main] Running Firecracker v1.7.0
2025-04-28T19:48:07Z  INFO Starting init (commit: d15e62a13)...
2025-04-28T19:48:07Z  INFO Preparing to run: `runuser -u nobody -- /app/bin/migrate` as root
2025-04-28T19:48:07Z  INFO [fly api proxy] listening at /.fly/api
2025-04-28T19:48:07Z Machine started in 1.352s
2025-04-28T19:48:07Z runuser: failed to execute /app/bin/migrate: No such file or directory
2025-04-28T19:48:07Z 2025/04/28 19:48:07 INFO SSH listening listen_address=[fdaa:12:95db:a7b:181:7e5c:e38d:2]:22
2025-04-28T19:48:08Z  INFO Main child exited normally with code: 1
-------
Error: release command failed - aborting deployment. machine 2865100f702798 exited with non-zero status of 1

Did I miss a line that I should have changed in another file?

No, that really should have worked…

If you comment out release_command entirely, does the rest of the deploy go through?

So I removed the line, and the “rest” of it went through, but now the site is down…

--> Build Summary:
--> Building image done
image: registry.fly.io/game-site:deployment-01JSYZEGMPPQJF9XR77ZB3C1H3
image size: 49 MB

Watch your deployment at https://fly.io/apps/game-site/monitoring

-------
Updating existing machines in 'game-site' with rolling strategy

-------
 ✔ [1/2] Cleared lease for 1781997b5e2e08
 ✔ [2/2] Cleared lease for 080e45dc109068
-------
Checking DNS configuration for game-site.fly.dev

Visit your newly deployed app at https://game-site.fly.dev/

Whereas before, at least the site was still up.

All the machines are down and are running into issues trying to start back up now too.

Yikes, :dragon:… You should be able to revert† to an older build via fly deploy --image registry.fly.io/<long-string>. (You can get the list of recent ones via fly releases --image.)
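
Spelled out, the sequence would be roughly this (with <older-id> standing in for whichever release you pick from the list):

$ fly releases --image   # list recent releases along with their image references
$ fly deploy --image registry.fly.io/game-site:deployment-<older-id>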


Aside: I haven’t been able to reproduce any of these release_command anomalies on my own Machines, despite trying several variations, so I don’t think this is a global glitch…


†Later edit: It appears that you’re back up and running now, but I should insert a caveat for general reference that any incompatible changes to fly.toml, secrets, etc., since the time of that older image have to be manually undone. fly deploy --image only affects the Dockerfile part; there unfortunately isn’t a single command that rolls back all aspects (of an app).


Yes, it is up for me now too.

However, if your site is down, but your machines are stable, then at least you have a clear debugging path; you can shell into your container using flyctl, and then see if your listener has died, or run networking tools to see what IP address your listener has attached to.
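
Something along these lines, for instance (ps and ss aren't in the debian-slim base image, hence the apt-get first):

$ fly ssh console
# apt-get update && apt-get install -y procps iproute2   # slim images ship without ps/ss
# ps aux | grep beam   # is the release's BEAM process still alive?
# ss -tlnp             # which addresses and ports are actually being listened on?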


Thanks for the help! I would have let you know yesterday, but I reached my max limit on replies for my first day.

It was, in fact, that at some point my migrate script and a few other files within the rel folder got formatted with the Windows ^M line endings:

-#!/bin/sh
-
-# configure node for distributed erlang with IPV6 support
-export ERL_AFLAGS="-proto_dist inet6_tcp"
-export ECTO_IPV6="true"
-export DNS_CLUSTER_QUERY="${FLY_APP_NAME}.internal"
-export RELEASE_DISTRIBUTION="name"
-export RELEASE_NODE="${FLY_APP_NAME}-${FLY_IMAGE_REF##*-}@${FLY_PRIVATE_IP}"
-
-# Uncomment to send crash dumps to stderr
-# This can be useful for debugging, but may log sensitive information
-# export ERL_CRASH_DUMP=/dev/stderr
-# export ERL_CRASH_DUMP_BYTES=4096
+#!/bin/sh^M
+^M
+# configure node for distributed erlang with IPV6 support^M
+export ERL_AFLAGS="-proto_dist inet6_tcp"^M
+export ECTO_IPV6="true"^M
+export DNS_CLUSTER_QUERY="${FLY_APP_NAME}.internal"^M
+export RELEASE_DISTRIBUTION="name"^M
+export RELEASE_NODE="${FLY_APP_NAME}-${FLY_IMAGE_REF##*-}@${FLY_PRIVATE_IP}"^M
+^M
+# Uncomment to send crash dumps to stderr^M
+# This can be useful for debugging, but may log sensitive information^M
+# export ERL_CRASH_DUMP=/dev/stderr^M
+# export ERL_CRASH_DUMP_BYTES=4096^M

I went through each commit until I found a working one and checked the diff against my current (local) working repo.
