Failing to `fly launch` phoenix app; hanging on docker build

TLDR

I have a new phoenix app I want to deploy. I’m getting an error where (it looks like…) docker build is hanging on a build step and eventually a timeout occurs. The Dockerfile is correctly building locally using docker build --no-cache ., so I would expect it to work when launching / deploying, but it is not.

Here is a gist with the Dockerfile.

Here is the error from fly launch:

Error error building: failed commit on ref "layer-sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7": "layer-sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7" failed size validation: 0 != 31366757: failed precondition

Can anyone help debug this hanging build? How can I get insight into where it’s hanging? I’m using:

  • Elixir 1.14.0, OTP 25.0.4
  • Mac OS Monterey with M1
  • fly v0.0.387 darwin/arm64 Commit: d46c14f3 BuildDate: 2022-09-01T21:01:46Z
  • docker.io/library/debian:bullseye-20220801-slim
  • docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim

Longer

The phoenix app is a tiny demo with one DB table (using postgres) but does have two things that make it slightly more involved:

  1. It uses a library with NIFs written in Rust, so the rust compiler needs to be installed when building
  2. There are a couple JS dependencies, so a package.json was created and Node/npm need to be installed when building, plus a step to run npm ci.

I edited the Dockerfile Fly generated during fly launch to incorporate these steps and it is now correctly building locally.

Here is the failure output after running fly launch --remote-only (I also tried without the --remote-only option).

App is not running, deploy...
Deploying siwe-ex
==> Validating app configuration
--> Validating app configuration done
Services
TCP 80/443 ⇢ 8080
Remote builder fly-builder-cold-flower-4020 ready
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.12 linux x86_64
Sending build context to Docker daemon  52.06kB
[+] Building 435.9s (6/33)
 => [internal] load remote build context                                                                                                                                        0.0s
 => copy /context /                                                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/debian:bullseye-20220801-slim                                                                                                0.7s
 => [internal] load metadata for docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim                                                                      0.8s
 => ERROR [builder  1/23] FROM docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim@sha256:cf8adcc00f80ae2d7ac417a62fe8138a3f8bf1b4e55d2a60f94b80663440  435.1s
 => => resolve docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim@sha256:cf8adcc00f80ae2d7ac417a62fe8138a3f8bf1b4e55d2a60f94b80663440c440                0.0s
 => => sha256:09d0092b35e56d3af6e1773e4e1287fff0b8a3a5a893513304d776b17559f21e 44.38MB / 44.38MB                                                                                0.4s
 => => sha256:16cb966efea4fb9ecec54d2f52775acae50143f6f133978efb537674ec09b62e 5.25MB / 5.25MB                                                                                  0.1s
 => => sha256:cf8adcc00f80ae2d7ac417a62fe8138a3f8bf1b4e55d2a60f94b80663440c440 772B / 772B                                                                                      0.0s
 => => sha256:209f0875608f8e44721f7bf32c056b3d282e8ebb70e57b1218b31cad39ba25e1 1.16kB / 1.16kB                                                                                  0.0s
 => => sha256:65f04ead5fb13e52a5be7136607e3d07d479acd68317e8cbee4f47dbad312346 2.33kB / 2.33kB                                                                                  0.0s
 => => sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7 0B / 31.37MB                                                                                   435.1s
 => => sha256:adf116327167d4f5af462c5e2d060a61b970589d8e0e92284fc6d2db04d22cec 14.35MB / 14.35MB                                                                                0.2s
 => ERROR [stage-1 1/6] FROM docker.io/library/debian:bullseye-20220801-slim@sha256:a811e62769a642241b168ac34f615fb02da863307a14c4432cea8e5a0f9782b8                          435.1s
 => => resolve docker.io/library/debian:bullseye-20220801-slim@sha256:a811e62769a642241b168ac34f615fb02da863307a14c4432cea8e5a0f9782b8                                          0.0s
 => => sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7 0B / 31.37MB                                                                                   435.1s
 => => sha256:a811e62769a642241b168ac34f615fb02da863307a14c4432cea8e5a0f9782b8 1.85kB / 1.85kB                                                                                  0.0s
 => => sha256:139a42fa3bde3e5bad6ae912aaaf2103565558a7a73afe6ce6ceed6e46a6e519 529B / 529B                                                                                      0.0s
 => => sha256:6a8065e4ba130e5f581c9fb79da4f58bc6513fa65fae1aba12418be4c5a7ed76 1.46kB / 1.46kB                                                                                  0.0s
------
 > [builder  1/23] FROM docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim@sha256:cf8adcc00f80ae2d7ac417a62fe8138a3f8bf1b4e55d2a60f94b80663440c440:
------
------
 > [stage-1 1/6] FROM docker.io/library/debian:bullseye-20220801-slim@sha256:a811e62769a642241b168ac34f615fb02da863307a14c4432cea8e5a0f9782b8:
------
Error error building: failed commit on ref "layer-sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7": "layer-sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7" failed size validation: 0 != 31366757: failed precondition
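
For anyone reading that last line: the two numbers in "failed size validation" appear to be the bytes actually written versus the bytes expected for the layer, which matches the `0B / 31.37MB` progress line in the output above. A minimal sketch for pulling those values out of the message (the function name is made up for illustration and is not part of flyctl or Docker):

```shell
# Hypothetical helper: reads the error text on stdin and prints
# "<layer digest> <bytes received> <bytes expected>".
parse_size_validation() {
  sed -n 's/.*"layer-sha256:\([0-9a-f]\{64\}\)" failed size validation: \([0-9]*\) != \([0-9]*\):.*/\1 \2 \3/p'
}

# Usage: paste the error line from the build output.
#   echo '<error line>' | parse_size_validation
```

Reading it this way, the builder received none of the 31,366,757 bytes of that layer before giving up, which points at the download of the base image rather than at any step in the Dockerfile itself.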

Update: seems like maybe an issue with the debian version: bullseye-20220801-slim.

I created a brand new phoenix app and didn’t change anything other than the DEBIAN_VERSION to bullseye-20220801-slim in the Dockerfile. Deploying the generated phoenix app is now hanging with same output as above.

Hi @benjreinhart!

You said the Dockerfile builds correctly locally, right?

Try deploying with the local Dockerfile build. If it deploys and works that way, then it may help to narrow it down to the build machines.

fly deploy --local-only

That command does the deploy using the local Docker build.


Hey @Mark, thank you for your reply, I appreciate it!

Yes it builds locally without errors when running docker build (either with or without the --no-cache flag). However, after trying the method you suggested, it fails with a segfault:

 => ERROR [builder  4/17] RUN mix local.hex --force &&     mix local.rebar --force                                                                                              0.6s
------
 > [builder  4/17] RUN mix local.hex --force &&     mix local.rebar --force:
#13 0.606 qemu: uncaught target signal 11 (Segmentation fault) - core dumped
#13 0.616 Segmentation fault
------
Error failed to fetch an image or build from source: error building: executor failed running [/bin/sh -c mix local.hex --force &&     mix local.rebar --force]: exit code: 139

The Fly Elixir guide mentions this as an M1 issue and recommends using the --remote-only option which brings me back to the original issue in this post. I’m not sure what the fly CLI is doing differently from my docker build command. I briefly took a look at the flyctl source but nothing immediately stood out in the time I had to debug.
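
As a rough illustration of why the two paths behave differently: the remote builder reports `linux x86_64`, so a local build on an arm64 Mac has to run the build steps under qemu emulation, which is where the segfault shows up. A small sketch of that decision, assuming the deploy target is amd64 (the helper names are made up for illustration):

```shell
# Map the various spellings of an architecture to one name.
normalize_arch() {
  case "$1" in
    arm64|aarch64) echo arm64 ;;
    x86_64|amd64)  echo amd64 ;;
    *)             echo "$1" ;;
  esac
}

# Succeeds (exit 0) when host and target architectures differ,
# i.e. when a local build would need emulation.
needs_emulation() {
  [ "$(normalize_arch "$1")" != "$(normalize_arch "$2")" ]
}

# $1 = host arch (e.g. from `uname -m`), $2 = target arch (assumed amd64).
suggest_build_flag() {
  if needs_emulation "$1" "${2:-amd64}"; then
    echo "--remote-only"
  else
    echo "--local-only"
  fi
}

# Usage:
#   fly deploy "$(suggest_build_flag "$(uname -m)")"
```

On an M1 this picks `--remote-only`, which is the same recommendation the guide makes.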

In either case, it seems I’m currently unable to deploy this using local or remote builds on a fresh phoenix app (v1.6.11). If I have more time I can try downgrading the base image versions since the 1.14.0 images are new.

Any other pointers would be much appreciated!

Hi @benjreinhart,

I went back to your gist with the Dockerfile. I took that as a starting point and, before using it, tweaked it according to other known good Dockerfiles of mine. I was able to deploy using the remote builders. Here’s my working Dockerfile. Keep in mind, my app doesn’t use anything of the Rust tooling, so that might be an issue on your end.

# Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian instead of
# Alpine to avoid DNS resolution issues in production.
#
# https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
# https://hub.docker.com/_/ubuntu?tab=tags
#
#
# This file is based on these images:
#
#   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
#   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20210902-slim - for the release image
#   - https://pkgs.org/ - resource for finding needed packages
#   - Ex: hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20210902-slim
#
ARG ELIXIR_VERSION=1.14.0
ARG OTP_VERSION=25.0.4
ARG DEBIAN_VERSION=bullseye-20220801-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies
# RUN apt-get update -y && apt-get install -y build-essential git \
#     && apt-get clean && rm -f /var/lib/apt/lists/*_*

# install build dependencies (including curl and node)
# https://github.com/nodesource/distributions/blob/master/README.md#debmanual
RUN apt-get update -y && apt-get install -y build-essential git curl gcc g++ make \
    && curl -fsSL https://deb.nodesource.com/setup_18.x | bash - \
    && apt-get install -y nodejs \
    && rm -f /var/lib/apt/lists/*_* \
    && apt-get clean

# # Install curl which is used to install rust and node
# RUN apt-get update -y && apt-get install -y curl

# Install Rust which is needed to compile NIFs written in rust (like the siwe elixir lib)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain stable --profile minimal --target x86_64-unknown-linux-gnu -y

# Add .cargo/bin (Rust) to PATH
ENV PATH="/root/.cargo/bin:${PATH}"

# # Install Node/npm which is needed for installing node dependencies
# RUN curl -fsSL https://deb.nodesource.com/setup_18.x | bash - \
#     && apt-get install -y nodejs

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv

COPY lib lib

COPY assets assets

# install node packages
RUN npm --prefix ./assets ci --progress=false --no-audit --loglevel=error

# # Install JS dependencies
# WORKDIR /app/assets
# RUN npm ci
# WORKDIR /app

# compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && apt-get install -y libstdc++6 openssl libncurses5 locales \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/siwe_example ./

USER nobody

CMD ["/app/bin/server"]
# Appended by flyctl
ENV ECTO_IPV6 true
ENV ERL_AFLAGS "-proto_dist inet6_tcp"

# # Appended by flyctl
# ENV ECTO_IPV6 true
# ENV ERL_AFLAGS "-proto_dist inet6_tcp"

# # Appended by flyctl
# ENV ECTO_IPV6 true
# ENV ERL_AFLAGS "-proto_dist inet6_tcp"

# # Appended by flyctl
# ENV ECTO_IPV6 true
# ENV ERL_AFLAGS "-proto_dist inet6_tcp"

# # Appended by flyctl
# ENV ECTO_IPV6 true
# ENV ERL_AFLAGS "-proto_dist inet6_tcp"

I left your original parts but commented them out so you can see.

Hope this helps!

Currently having the same issue deploying with github actions.

  deploy_to_dev:
    name: Deploy to dev
    needs: [dependencies, static_code_analysis, unit_tests]
    runs-on: ubuntu-latest
    timeout-minutes: 25
    environment: dev
    env:
      TARGET: dev
      REPO_NAME: ${{ github.event.repository.name }}
    steps:
      - uses: actions/checkout@v3
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --dockerfile ./Dockerfile --remote-only

Maybe something with the signatures? This guy keeps popping up and never gets downloaded:

#10 [builder  1/19] FROM docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim@sha256:cf8adcc00f80ae2d7ac417a62fe8138a3f8bf1b4e55d2a60f94b80663440c440

Update:

What was missing was updating the GitHub Action that sets up the BEAM, specifically pinning its uses line to a commit hash of a version that supports OTP 25.

    - uses: actions/checkout@v3
    - name: Set up Elixir
      uses: erlef/setup-beam@69fb25ba7e2c546dfc085871ecf2fc5382c9843d  # <----- THIS
      with:
        elixir-version: 1.14.0
        otp-version: 25.0.4
        fetch-depth: 0

Hey @Mark, thanks again for your help here, much appreciated.

To be clear, I also cannot get a fresh phoenix app to build, that is, without rust and NPM and any special elixir dependencies. If I run mix phx.new and then try to fly launch (after changing to the correct base images since the generated Dockerfile contains broken links), it does not work. This is true when using the local-only and remote-only options. docker build works.

I did try the Dockerfile you sent me as well in my existing project, and that too still hangs with the original issue in the top of this post.

I’m on a mac with an M1 and I’m using

  • fly v0.0.387 darwin/arm64 Commit: d46c14f3 BuildDate: 2022-09-01T21:01:46Z
  • Phoenix 1.6.11
  • docker.io/library/debian:bullseye-20220801-slim
  • docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim

Hi @benjreinhart,

One thing I’m wondering about is whether you are using fly launch to deploy? That’s what it sounds like. Even though the initial fly launch failed, it completed its primary tasks of registering the fly app in the system and generating the Elixir release commands and the Dockerfile, so its purpose is done.

At that point, you should just use fly deploy. The fly launch command also appends ENV settings to the end of the Dockerfile each time it runs. I saw several copies of those settings in your original Dockerfile and I deleted them.

Also, try this command: fly apps list. Then, delete any builder apps you see. They are named like fly-builder-.... At this point I’m just guessing, but perhaps the builder has a problem.

You can delete one with fly apps destroy fly-builder-app-name-123.
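
A small sketch for picking the builder apps out of that list. It assumes the app name is the first whitespace-separated column of the `fly apps list` output, so check that against what your CLI version prints; the function name is made up for illustration:

```shell
# Hypothetical helper: reads `fly apps list` output on stdin and
# prints only the names that start with "fly-builder-".
list_builder_apps() {
  awk '$1 ~ /^fly-builder-/ { print $1 }'
}

# Usage: print the builder names, then destroy each one by hand
# with the `fly apps destroy <name>` command shown above.
#   fly apps list | list_builder_apps
```

It only prints names and leaves the destroy step manual on purpose, since destroying an app cannot be undone.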

From the builder logs, it sounds like it failed trying to fetch the docker image having tried for 0s.

You can also try the fly doctor command to see if any issues can be spotted.

@Mark Thanks for the reply. I have been deleting all the apps and trying to start again from scratch, but still no luck. This is reproducible on my machine by destroying all fly apps, running mix phx.new and then fly launch. I’ve tried in the ORD and San Jose regions.

The one thing that is consistent is it’s hanging on this exact same docker layer each time on the remote builds:

 => => sha256:1efc276f4ff952c055dea726cfc96ec6a4fdb8b62d9eed816bd2b788f2860ad7 0B / 31.37MB

This is true whether or not it’s on the first go using fly launch or on subsequent attempts with fly deploy. I did destroy the remote build app in between launch/deploy attempts. When I use the local-only option, it doesn’t hang on that layer, but it does segfault at the end of the build because of the M1 issue.

The only next step I see is to try to understand exactly what that layer is referring to / why it’s hanging. I could also do this on another machine but I was evaluating fly as a potential new hosting for my team who all have M1 macs, so hoping to get past any issues that may be related to that.

Thanks again for all your help!

Well dang. I don’t know what else could be the issue either.

It is interesting that it’s the same docker layer. :thinking:

That’s a strange issue. M1 user here, and my only problem was when deploying from GHA, and that was because the action that sets Erlang and Elixir up was not the latest version (which prevented it from setting up Erlang). What is the layer it’s hanging on? Mine was

#10 [builder 1/19] FROM docker.io/hexpm/elixir:1.14.0-erlang-25.0.4-debian-bullseye-20220801-slim@sha256:cf8adcc00f80ae2d7ac417a62fe8138a3f8bf1b4e55d2a60f94b80663440c440

I got past all that by changing bullseye to buster in the Dockerfile.
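
In a Dockerfile laid out like the one posted earlier in this thread, that change is a one-line edit to `ARG DEBIAN_VERSION`, and because both the builder and runner images are derived from that value, the new tag has to exist for both `hexpm/elixir` and `debian` on Docker Hub. A sketch of the edit; the function name is made up for illustration, and the tag is left for you to look up rather than guessed here:

```shell
# Hypothetical helper: rewrite the DEBIAN_VERSION build arg in place.
# $1 = path to the Dockerfile, $2 = new Debian tag.
set_debian_version() {
  tmp="$(mktemp)" || return 1
  # Replace the whole ARG line; all other lines pass through unchanged.
  sed "s|^ARG DEBIAN_VERSION=.*|ARG DEBIAN_VERSION=$2|" "$1" > "$tmp" &&
    cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage (substitute a buster tag you have confirmed exists for both images):
#   set_debian_version Dockerfile "<buster tag>"
```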

Confirmed, you need to bump your docker image, as unfortunately the image Phoenix used to generate for the Dockerfile got yanked from the registry: