Docker Build & Deploy Fail on Apple M1

I am trying to deploy the example Elixir app by following the getting started guide, but fly deploy fails with a segfault on an Apple M1.

Without python2, the node-gyp rebuild fails during [build 9/16] RUN npm --prefix ./assets ci --progress=false --no-audit --loglevel=error because Python is not found (perhaps this needs to be fixed in the getting-started Dockerfile?). After adding python2 as an additional dependency to the Dockerfile, a local docker build works, but only for arm64:

$ docker build .
[+] Building 0.8s (27/27) FINISHED
 => [internal] load build definition from Dockerfile                                                                                             0.1s
 => => transferring dockerfile: 2.10kB                                                                                                           0.0s
 => [internal] load .dockerignore                                                                                                                0.0s
 => => transferring context: 34B                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/alpine:3.13.3                                                                                 0.7s
 => [internal] load metadata for docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3                                                       0.7s
 => [build  1/16] FROM docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3@sha256:4eefe11eb82f277cd82ba5d792b82c0e82f7d41e3a021676b5f1eb1  0.0s
 => [internal] load build context                                                                                                                0.0s
 => => transferring context: 2.84kB                                                                                                              0.0s
 => [app 1/5] FROM docker.io/library/alpine:3.13.3@sha256:826f70e0ac33e99a72cf20fb0571245a8fee52d68cb26d8bc58e53bfa65dcdfa                       0.0s
 => CACHED [app 2/5] RUN apk add --no-cache libstdc++ openssl ncurses-libs                                                                       0.0s
 => CACHED [app 3/5] WORKDIR /app                                                                                                                0.0s
 => CACHED [app 4/5] RUN chown nobody:nobody /app                                                                                                0.0s
 => CACHED [build  2/16] RUN apk add --no-cache build-base npm python2                                                                           0.0s
 => CACHED [build  3/16] WORKDIR /app                                                                                                            0.0s
 => CACHED [build  4/16] RUN mix local.hex --force &&     mix local.rebar --force                                                                0.0s
 => CACHED [build  5/16] COPY mix.exs mix.lock ./                                                                                                0.0s
 => CACHED [build  6/16] COPY config config                                                                                                      0.0s
 => CACHED [build  7/16] RUN mix deps.get --only prod &&     mix deps.compile                                                                    0.0s
 => CACHED [build  8/16] COPY assets/package.json assets/package-lock.json ./assets/                                                             0.0s
 => CACHED [build  9/16] RUN npm --prefix ./assets ci --progress=false --no-audit --loglevel=error                                               0.0s
 => CACHED [build 10/16] COPY priv priv                                                                                                          0.0s
 => CACHED [build 11/16] COPY assets assets                                                                                                      0.0s
 => CACHED [build 12/16] RUN npm run --prefix ./assets deploy                                                                                    0.0s
 => CACHED [build 13/16] RUN mix phx.digest                                                                                                      0.0s
 => CACHED [build 14/16] COPY lib lib                                                                                                            0.0s
 => CACHED [build 15/16] COPY rel rel                                                                                                            0.0s
 => CACHED [build 16/16] RUN mix do compile, release                                                                                             0.0s
 => CACHED [app 5/5] COPY --from=build --chown=nobody:nobody /app/_build/prod/rel/hello_elixir ./                                                0.0s
 => exporting to image                                                                                                                           0.0s
 => => exporting layers                                                                                                                          0.0s
 => => writing image sha256:9a12e58ad929571d205cc15b4f203f3fc9ad9f9f5b1fd9d3f30ac34377862960

The build for amd64 fails, which is not entirely unexpected:

$ docker build --platform linux/amd64 .
[+] Building 2.2s (16/26)
 => [internal] load build definition from Dockerfile                                                                                             0.0s
 => => transferring dockerfile: 37B                                                                                                              0.0s
 => [internal] load .dockerignore                                                                                                                0.0s
 => => transferring context: 34B                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/alpine:3.13.3                                                                                 1.6s
 => [internal] load metadata for docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3                                                       1.6s
 => [internal] load build context                                                                                                                0.0s
 => => transferring context: 2.84kB                                                                                                              0.0s
 => [app 1/5] FROM docker.io/library/alpine:3.13.3@sha256:826f70e0ac33e99a72cf20fb0571245a8fee52d68cb26d8bc58e53bfa65dcdfa                       0.0s
 => [build  1/16] FROM docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3@sha256:4eefe11eb82f277cd82ba5d792b82c0e82f7d41e3a021676b5f1eb1  0.0s
 => CACHED [app 2/5] RUN apk add --no-cache libstdc++ openssl ncurses-libs                                                                       0.0s
 => CACHED [app 3/5] WORKDIR /app                                                                                                                0.0s
 => CACHED [app 4/5] RUN chown nobody:nobody /app                                                                                                0.0s
 => CACHED [build  2/16] RUN apk add --no-cache build-base npm python2                                                                           0.0s
 => CACHED [build  3/16] WORKDIR /app                                                                                                            0.0s
 => CACHED [build  4/16] RUN mix local.hex --force &&     mix local.rebar --force                                                                0.0s
 => CACHED [build  5/16] COPY mix.exs mix.lock ./                                                                                                0.0s
 => CACHED [build  6/16] COPY config config                                                                                                      0.0s
 => ERROR [build  7/16] RUN mix deps.get --only prod &&     mix deps.compile                                                                     0.5s
------
 > [build  7/16] RUN mix deps.get --only prod &&     mix deps.compile:
#16 0.482 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#16 0.487 Segmentation fault
------
executor failed running [/bin/sh -c mix deps.get --only prod &&     mix deps.compile]: exit code: 139
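The two numbers in this log describe the same event. A shell reports a process killed by signal n with exit status 128 + n, so exit code 139 is signal 11 (SIGSEGV), which matches the qemu line. Here is a small POSIX sh sketch that decodes such a status. decode_exit is a hypothetical helper name and is not part of Docker or flyctl:

```shell
#!/bin/sh
# Hypothetical helper: explain a command's exit status.
# Statuses above 128 mean the process was killed by signal (status - 128).
decode_exit() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # `kill -l N` prints the signal name without the SIG prefix, e.g. SEGV
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with status $status"
  fi
}

decode_exit 139
```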

I have seen that the suggested fix (discussed here and here) is to run fly deploy --remote-only, but this also fails (I tried it with two different Elixir apps):

$ fly deploy --remote-only
Deploying summer-pine-7691
==> Validating app configuration
--> Validating app configuration done
Services
TCP 80/443 ⇢ 4000

Error error connecting to docker: An unknown error occured.

Is there anything else I can do, short of using an external service to build the Docker image for the right architecture? I tried setting up a Rosetta-emulated Homebrew installation to run Docker under x86_64, but Docker then complains about the wrong architecture, even when I launch it explicitly under Rosetta using arch -x86_64 ..., so at this point I'm not sure what else to try.

Docker on M1 is really buggy with x86_64 images. About 50% of our test runs crash with that QEMU segfault.

The --remote-only flag should work, though. Could you try stopping Docker entirely and then running:

LOG_LEVEL=debug fly deploy --remote-only

Yup, this is more a Docker problem than a Fly problem at this point. The remote option works great, assuming your codebase is a normal size: it has to be uploaded to Fly each time you run a build, and the build itself runs on Fly.

It's not as fast as I'd like, and I had to press a key every so often to make sure it wasn't stuck, but I've got an M1 too and it's the only choice at this point.
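If you switch between an M1 and an Intel machine, here is a minimal POSIX sh sketch that picks the remote build automatically. It assumes `uname -m` reports arm64 (or aarch64) on ARM hosts. Note that a shell started under Rosetta reports x86_64 even on an M1. fly_build_flag is a hypothetical helper, not a flyctl feature:

```shell
#!/bin/sh
# Hypothetical helper: print the flyctl build flag for a host architecture.
# Defaults to the current machine's `uname -m`; pass an arch to override.
# Caveat: under Rosetta, `uname -m` reports x86_64 even on Apple Silicon.
fly_build_flag() {
  host_arch="${1:-$(uname -m)}"
  case "$host_arch" in
    arm64|aarch64)
      # amd64 images would be built under QEMU here and may segfault,
      # so use Fly's remote builder instead.
      printf '%s\n' "--remote-only" ;;
    *)
      # Other hosts: print nothing and let flyctl use its default.
      : ;;
  esac
}
```

For example, fly deploy $(fly_build_flag) runs fly deploy --remote-only on an arm64 Mac and a plain fly deploy elsewhere. Printing nothing for other hosts leaves flyctl's own default build behaviour unchanged.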

We just released flyctl 0.0.243, which upgraded buildkit and which may help with builds on M1. Can you give it a try?

Thanks for all the help!

The new version improves the developer experience of the remote build, but does not yet enable local builds on M1, correct? (I still get a segfault when using --local-only, but flyctl now automatically runs a remote build by default, which works fine and lets me deploy the image.)