Docker Build & Deploy Fail on Apple M1

I am trying to deploy the example Elixir app by following the getting started guide, but fly deploy fails with a segfault on an Apple M1.

I first had to add python2 as an additional dependency in the Dockerfile. Without it, the node-gyp rebuild fails during [build 9/16] RUN npm --prefix ./assets ci --progress=false --no-audit --loglevel=error because python is not found (perhaps this needs to be fixed in the getting-started Dockerfile?). With that change, a local docker build works, but only for arm64:

$ docker build .
[+] Building 0.8s (27/27) FINISHED
 => [internal] load build definition from Dockerfile                                                                                             0.1s
 => => transferring dockerfile: 2.10kB                                                                                                           0.0s
 => [internal] load .dockerignore                                                                                                                0.0s
 => => transferring context: 34B                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/alpine:3.13.3                                                                                 0.7s
 => [internal] load metadata for docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3                                                       0.7s
 => [build  1/16] FROM docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3@sha256:4eefe11eb82f277cd82ba5d792b82c0e82f7d41e3a021676b5f1eb1  0.0s
 => [internal] load build context                                                                                                                0.0s
 => => transferring context: 2.84kB                                                                                                              0.0s
 => [app 1/5] FROM docker.io/library/alpine:3.13.3@sha256:826f70e0ac33e99a72cf20fb0571245a8fee52d68cb26d8bc58e53bfa65dcdfa                       0.0s
 => CACHED [app 2/5] RUN apk add --no-cache libstdc++ openssl ncurses-libs                                                                       0.0s
 => CACHED [app 3/5] WORKDIR /app                                                                                                                0.0s
 => CACHED [app 4/5] RUN chown nobody:nobody /app                                                                                                0.0s
 => CACHED [build  2/16] RUN apk add --no-cache build-base npm python2                                                                           0.0s
 => CACHED [build  3/16] WORKDIR /app                                                                                                            0.0s
 => CACHED [build  4/16] RUN mix local.hex --force &&     mix local.rebar --force                                                                0.0s
 => CACHED [build  5/16] COPY mix.exs mix.lock ./                                                                                                0.0s
 => CACHED [build  6/16] COPY config config                                                                                                      0.0s
 => CACHED [build  7/16] RUN mix deps.get --only prod &&     mix deps.compile                                                                    0.0s
 => CACHED [build  8/16] COPY assets/package.json assets/package-lock.json ./assets/                                                             0.0s
 => CACHED [build  9/16] RUN npm --prefix ./assets ci --progress=false --no-audit --loglevel=error                                               0.0s
 => CACHED [build 10/16] COPY priv priv                                                                                                          0.0s
 => CACHED [build 11/16] COPY assets assets                                                                                                      0.0s
 => CACHED [build 12/16] RUN npm run --prefix ./assets deploy                                                                                    0.0s
 => CACHED [build 13/16] RUN mix phx.digest                                                                                                      0.0s
 => CACHED [build 14/16] COPY lib lib                                                                                                            0.0s
 => CACHED [build 15/16] COPY rel rel                                                                                                            0.0s
 => CACHED [build 16/16] RUN mix do compile, release                                                                                             0.0s
 => CACHED [app 5/5] COPY --from=build --chown=nobody:nobody /app/_build/prod/rel/hello_elixir ./                                                0.0s
 => exporting to image                                                                                                                           0.0s
 => => exporting layers                                                                                                                          0.0s
 => => writing image sha256:9a12e58ad929571d205cc15b4f203f3fc9ad9f9f5b1fd9d3f30ac34377862960

The build for amd64 fails, which is not entirely unexpected:

$ docker build --platform linux/amd64 .
[+] Building 2.2s (16/26)
 => [internal] load build definition from Dockerfile                                                                                             0.0s
 => => transferring dockerfile: 37B                                                                                                              0.0s
 => [internal] load .dockerignore                                                                                                                0.0s
 => => transferring context: 34B                                                                                                                 0.0s
 => [internal] load metadata for docker.io/library/alpine:3.13.3                                                                                 1.6s
 => [internal] load metadata for docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3                                                       1.6s
 => [internal] load build context                                                                                                                0.0s
 => => transferring context: 2.84kB                                                                                                              0.0s
 => [app 1/5] FROM docker.io/library/alpine:3.13.3@sha256:826f70e0ac33e99a72cf20fb0571245a8fee52d68cb26d8bc58e53bfa65dcdfa                       0.0s
 => [build  1/16] FROM docker.io/hexpm/elixir:1.12.1-erlang-24.0.1-alpine-3.13.3@sha256:4eefe11eb82f277cd82ba5d792b82c0e82f7d41e3a021676b5f1eb1  0.0s
 => CACHED [app 2/5] RUN apk add --no-cache libstdc++ openssl ncurses-libs                                                                       0.0s
 => CACHED [app 3/5] WORKDIR /app                                                                                                                0.0s
 => CACHED [app 4/5] RUN chown nobody:nobody /app                                                                                                0.0s
 => CACHED [build  2/16] RUN apk add --no-cache build-base npm python2                                                                           0.0s
 => CACHED [build  3/16] WORKDIR /app                                                                                                            0.0s
 => CACHED [build  4/16] RUN mix local.hex --force &&     mix local.rebar --force                                                                0.0s
 => CACHED [build  5/16] COPY mix.exs mix.lock ./                                                                                                0.0s
 => CACHED [build  6/16] COPY config config                                                                                                      0.0s
 => ERROR [build  7/16] RUN mix deps.get --only prod &&     mix deps.compile                                                                     0.5s
------
 > [build  7/16] RUN mix deps.get --only prod &&     mix deps.compile:
#16 0.482 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#16 0.487 Segmentation fault
------
executor failed running [/bin/sh -c mix deps.get --only prod &&     mix deps.compile]: exit code: 139
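
The segfault above comes from qemu, which Docker uses to emulate amd64 on an arm64 host. A minimal sketch of how one might detect that situation before building is shown here; the function names machine_to_platform and needs_emulation are hypothetical helpers, not part of Docker or flyctl, and the mapping covers only the two architectures discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: map a `uname -m` machine name to a Docker platform string.
machine_to_platform() {
  case "$1" in
    arm64|aarch64) echo "linux/arm64" ;;
    x86_64|amd64)  echo "linux/amd64" ;;
    *)             echo "unknown" ;;
  esac
}

# Hypothetical helper: succeed (exit 0) if building TARGET on MACHINE would
# need emulation. MACHINE defaults to the current host's `uname -m`.
# An unrecognized machine is treated as needing emulation.
needs_emulation() {
  target="$1"
  machine="${2:-$(uname -m)}"
  [ "$(machine_to_platform "$machine")" != "$target" ]
}

if needs_emulation "linux/amd64"; then
  echo "linux/amd64 builds will run under emulation on this host"
fi
```

On an M1, `uname -m` reports arm64 in a native shell, so the check flags the amd64 build as emulated, which is where the qemu crash occurs.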

I have seen that the suggested fix (discussed here and here) is to run fly deploy --remote-only, but this also errors out (tried it for 2 different Elixir apps):

$ fly deploy --remote-only
Deploying summer-pine-7691
==> Validating app configuration
--> Validating app configuration done
Services
TCP 80/443 ⇢ 4000

Error error connecting to docker: An unknown error occured.

Is there anything else I can do, short of using an external service to build the Docker image for the right architecture? I tried setting up a Rosetta-emulated brew installation to run Docker under x86_64, but Docker then complains about the wrong architecture, even if I launch it explicitly under Rosetta using arch -x86_64 .... At this point I’m not sure what else to try.

Docker on M1 is really buggy with x86_64 images. About 50% of our test runs crash with that qemu segfault.

The --remote-only command should work, though. Will you try stopping Docker entirely and then run:

LOG_LEVEL=debug fly deploy --remote-only

Yup, this is more of a Docker problem than a Fly problem at this point, though. The remote option works great, assuming your code is a normal size: it needs to be uploaded to Fly each time you run a build, and the build itself runs on Fly.

It’s not as fast as I’d like, and I had to tap a few keys a few times to make sure it wasn’t stuck, but I’ve got an M1 too and it’s the only choice at this point.

We just released flyctl 0.0.243, which upgraded buildkit and which may help with builds on M1. Can you give it a try?

Thanks for all the help!

The new version improves the developer experience of the remote build, but does not yet enable local builds on M1, correct? (I still get a segfault when using --local-only, but flyctl now automatically runs a remote build by default, which works fine and lets me deploy the image.)
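
For anyone scripting deploys across mixed machines, the choice between the two flags mentioned in this thread could be sketched as below. This is only an illustration: choose_deploy_flag is a hypothetical helper, and preferring --local-only on non-ARM hosts is an assumption rather than a Fly recommendation. The script prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: pick a fly deploy flag from a `uname -m` machine name.
# ARM hosts (such as the M1) get --remote-only, since local amd64 builds
# segfault under qemu; anything else defaults to --local-only (an assumption).
choose_deploy_flag() {
  case "${1:-$(uname -m)}" in
    arm64|aarch64) echo "--remote-only" ;;
    *)             echo "--local-only" ;;
  esac
}

# Print the command that would be run, without executing it.
echo "fly deploy $(choose_deploy_flag)"
```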