Deploy "Failed due to unhealthy allocations"

Hi everyone,

One of my environments failed to deploy with the message “Failed due to unhealthy allocations”. I read in other threads that this can be related to a port error, but I doubt it in my case: I used the same configuration as my other environments, and it works fine for them.

It looks more like a DNS access error, but I’m not sure how to resolve it, and the error message is not very clear:

2022-08-18T07:19:57Z   [info]07:19:57.199 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:20:02Z   [info]07:20:02.200 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
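
For context, the fly6pn entries in those lines come from libcluster polling the .internal DNS name to find peer nodes. My setup is based on the standard Fly template, so the exact values may differ slightly, but the topology looks roughly like this:

  app_name =
    System.get_env("FLY_APP_NAME") ||
      raise "FLY_APP_NAME not available"

  # Poll <app>.internal every 5s and try to connect to <app>@<6pn-address> nodes.
  config :libcluster,
    topologies: [
      fly6pn: [
        strategy: Cluster.Strategy.DNSPoll,
        config: [
          polling_interval: 5_000,
          query: "#{app_name}.internal",
          node_basename: app_name
        ]
      ]
    ]

If I’m reading the log right, DNS resolution itself seems to work (it keeps finding the node of the old instance, 70b9e805); it’s the Erlang distribution connection to it that fails.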

Here are the full logs:

Logs
==> Verifying app config
--> Verified app config
==> Building image
WARN Error connecting to local docker daemon: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/_ping": dial unix /var/run/docker.sock: connect: permission denied
Remote builder fly-builder-restless-haze-8424 ready
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.12 linux x86_64
[+] Building 10.1s (0/1)                                                                  
[+] Building 1.8s (28/28) FINISHED                                                        
 => CACHED [internal] load remote build context                                      0.0s
 => CACHED copy /context /                                                           0.0s
 => [internal] load metadata for docker.io/library/alpine:3.15.3                     1.7s
 => [internal] load metadata for docker.io/hexpm/elixir:1.13.4-erlang-24.3.4-alpine  1.7s
 => [build  1/17] FROM docker.io/hexpm/elixir:1.13.4-erlang-24.3.4-alpine-3.15.3@sh  0.0s
 => [app 1/6] FROM docker.io/library/alpine:3.15.3@sha256:f22945d45ee2eb4dd463ed5a4  0.0s
 => CACHED [app 2/6] RUN apk add --no-cache libstdc++ openssl ncurses-libs imagemag  0.0s
 => CACHED [app 3/6] RUN apk --no-cache add msttcorefonts-installer fontconfig &&    0.0s
 => CACHED [app 4/6] WORKDIR /app                                                    0.0s
 => CACHED [app 5/6] RUN chown nobody:nobody /app                                    0.0s
 => CACHED [build  2/17] RUN apk add --no-cache build-base npm git                   0.0s
 => CACHED [build  3/17] WORKDIR /app                                                0.0s
 => CACHED [build  4/17] RUN mix local.hex --force &&     mix local.rebar --force    0.0s
 => CACHED [build  5/17] COPY mix.exs mix.lock ./                                    0.0s
 => CACHED [build  6/17] COPY config/config.exs config/prod.exs config/              0.0s
 => CACHED [build  7/17] RUN mix deps.get --only prod &&     mix deps.compile        0.0s
 => CACHED [build  8/17] COPY assets/package.json assets/package-lock.json ./assets  0.0s
 => CACHED [build  9/17] RUN npm --prefix ./assets ci --progress=false --no-audit -  0.0s
 => CACHED [build 10/17] COPY priv priv                                              0.0s
 => CACHED [build 11/17] COPY lib lib                                                0.0s
 => CACHED [build 12/17] COPY assets assets                                          0.0s
 => CACHED [build 13/17] RUN mix assets.deploy                                       0.0s
 => CACHED [build 14/17] RUN mix compile                                             0.0s
 => CACHED [build 15/17] COPY config/runtime.exs config/                             0.0s
 => CACHED [build 16/17] COPY rel rel                                                0.0s
 => CACHED [build 17/17] RUN mix release                                             0.0s
 => CACHED [app 6/6] COPY --from=build --chown=nobody:nobody /app/_build/prod/rel/e  0.0s
 => exporting to image                                                               0.0s
 => => exporting layers                                                              0.0s
 => => writing image sha256:7f6cb87fc04d56300b4030a494711654f5ebeeeec87f71856c6edaa  0.0s
 => => naming to registry.fly.io/encheres-immo-beta:deployment-1660807007            0.0s
--> Building image done
==> Pushing image to fly
The push refers to repository [registry.fly.io/encheres-immo-beta]
55132ca85db4: Layer already exists 
f4a292c3c3d6: Layer already exists 
10dc836ba99a: Layer already exists 
922c23e78fac: Layer already exists 
a5ef1b54ee81: Layer already exists 
a1c01e366b99: Layer already exists 
deployment-1660807007: digest: sha256:875364c51e687cdb769d3fd3b36b6fd55bf5bad1dc2a727d4cbef131c4cb610a size: 1577
--> Pushing image done
image: registry.fly.io/encheres-immo-beta:deployment-1660807007
image size: 234 MB
==> Creating release
--> release v14 created

--> You can detach the terminal anytime without stopping the deployment
==> Release command detected: /app/bin/migrate

--> This release will not be available until the release command succeeds.
         Starting instance
         Configuring virtual machine
         Pulling container image
         Unpacking image
         Starting instance
         Configuring virtual machine
         Pulling container image
         Unpacking image
         Starting instance
         Configuring virtual machine
         Pulling container image
         Unpacking image
         Preparing kernel init
         Configuring firecracker
         Starting virtual machine
         Preparing kernel init
         Configuring firecracker
         Starting virtual machine
         Preparing kernel init
         Configuring firecracker
         Starting virtual machine
         Starting init (commit: 9b0a951)...
         UUID=d636a499-882d-4cfd-aabe-f02454a95bf1
         Starting init (commit: 9b0a951)...
         Setting up swapspace version 1, size = 536866816 bytes
         UUID=d636a499-882d-4cfd-aabe-f02454a95bf1
         Preparing to run: `/app/bin/migrate` as nobody
         2022/08/18 07:15:13 listening on [fdaa:0:33f5:a7b:cbb7:6fc3:8343:2]:22 (DNS: [fdaa::3]:53)
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.928 [info] Migrations already up
         Starting init (commit: 9b0a951)...
         Setting up swapspace version 1, size = 536866816 bytes
         UUID=d636a499-882d-4cfd-aabe-f02454a95bf1
         Preparing to run: `/app/bin/migrate` as nobody
         2022/08/18 07:15:13 listening on [fdaa:0:33f5:a7b:cbb7:6fc3:8343:2]:22 (DNS: [fdaa::3]:53)
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.928 [info] Migrations already up
         Main child exited normally with code: 0
         Reaped child process with pid: 572 and signal: SIGUSR1, core dumped? false
         Starting clean up.
         Preparing to run: `/app/bin/migrate` as nobody
         2022/08/18 07:15:13 listening on [fdaa:0:33f5:a7b:cbb7:6fc3:8343:2]:22 (DNS: [fdaa::3]:53)
         Reaped child process with pid: 570 and signal: SIGUSR1, core dumped? false
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.928 [info] Migrations already up
         Main child exited normally with code: 0
         Reaped child process with pid: 572 and signal: SIGUSR1, core dumped? false
         Starting clean up.
         Main child exited normally with code: 0
         Reaped child process with pid: 572 and signal: SIGUSR1, core dumped? false
         Starting clean up.
         Starting instance
         Configuring virtual machine
         Pulling container image
         Unpacking image
         Preparing kernel init
         Configuring firecracker
         Starting virtual machine
         Starting init (commit: 9b0a951)...
         Setting up swapspace version 1, size = 536866816 bytes
         UUID=d636a499-882d-4cfd-aabe-f02454a95bf1
         Preparing to run: `/app/bin/migrate` as nobody
         2022/08/18 07:15:13 listening on [fdaa:0:33f5:a7b:cbb7:6fc3:8343:2]:22 (DNS: [fdaa::3]:53)
         Reaped child process with pid: 570 and signal: SIGUSR1, core dumped? false
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'
         07:15:14.928 [info] Migrations already up
         Main child exited normally with code: 0
         Reaped child process with pid: 572 and signal: SIGUSR1, core dumped? false
         Starting clean up.
==> Monitoring deployment

 1 desired, 1 placed, 0 healthy, 0 unhealthy [restarts: 1] [health checks: 1 total, 1 crit
 1 desired, 1 placed, 0 healthy, 0 unhealthy [restarts: 2] [health checks: 1 total, 1 crit
 1 desired, 1 placed, 0 healthy, 1 unhealthy [restarts: 2] [health checks: 1 total, 1 crit
 1 desired, 1 placed, 0 healthy, 1 unhealthy [restarts: 2] [health checks: 1 total, 1 critical]
Failed Instances

Failure #1

Instance
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS CREATED   
f87db3ff                14      fra     run     running 1 total, 1 critical     2        7m13s ago

Recent Events
TIMESTAMP               TYPE                    MESSAGE                                                         
2022-08-18T07:15:19Z    Received                Task received by client                                        
2022-08-18T07:15:19Z    Task Setup              Building Task Directory                                        
2022-08-18T07:15:22Z    Started                 Task started by client                                         
2022-08-18T07:17:08Z    Restart Signaled        healthcheck: check "a61773ab9e61f7afdefca4f759fca6f9" unhealthy
2022-08-18T07:17:12Z    Terminated              Exit Code: 0                                                   
2022-08-18T07:17:12Z    Restarting              Task restarting in 1.038306486s                                
2022-08-18T07:17:19Z    Started                 Task started by client                                         
2022-08-18T07:19:04Z    Restart Signaled        healthcheck: check "a61773ab9e61f7afdefca4f759fca6f9" unhealthy
2022-08-18T07:19:09Z    Terminated              Exit Code: 0                                                   
2022-08-18T07:19:09Z    Restarting              Task restarting in 1.16697939s                                 
2022-08-18T07:19:16Z    Started                 Task started by client                                         

2022-08-18T07:19:22Z   [info]Reaped child process with pid: 583, exit code: 0
2022-08-18T07:19:25Z   [info]07:19:25.092 [debug] Tzdata polling for update.
2022-08-18T07:19:25Z   [info]07:19:25.586 [info] tzdata release in place is from a file last modified Tue, 22 Dec 2020 23:35:21 GMT. Release file on server was last modified Tue, 16 Aug 2022 01:15:47 GMT.
2022-08-18T07:19:25Z   [info]07:19:25.586 [debug] Tzdata downloading new data from https://data.iana.org/time-zones/tzdata-latest.tar.gz
2022-08-18T07:19:26Z   [info]07:19:26.082 [debug] Tzdata data downloaded. Release version 2022c.
2022-08-18T07:19:26Z   [info]07:19:26.935 [info] Tzdata has updated the release from 2020e to 2022c
2022-08-18T07:19:26Z   [info]07:19:26.935 [debug] Tzdata deleting ETS table for version 2020e
2022-08-18T07:19:26Z   [info]07:19:26.938 [debug] Tzdata deleting ETS table file for version 2020e
2022-08-18T07:19:27Z   [info]07:19:27.104 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:27Z   [info]07:19:27.117 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:19:32Z   [info]07:19:32.118 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:32Z   [info]07:19:32.131 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:19:37Z   [info]07:19:37.132 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:37Z   [info]07:19:37.144 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:19:42Z   [info]07:19:42.145 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:42Z   [info]07:19:42.159 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:19:47Z   [info]07:19:47.160 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:47Z   [info]07:19:47.171 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:19:52Z   [info]07:19:52.171 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:52Z   [info]07:19:52.183 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:19:57Z   [info]07:19:57.184 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:19:57Z   [info]07:19:57.199 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:20:02Z   [info]07:20:02.200 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:20:02Z   [info]07:20:02.215 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:20:07Z   [info]07:20:07.215 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:20:07Z   [info]07:20:07.226 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:20:12Z   [info]07:20:12.227 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:20:12Z   [info]07:20:12.241 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
2022-08-18T07:20:17Z   [info]07:20:17.242 [debug] [libcluster:fly6pn] polling dns for 'encheres-immo-beta.internal'
2022-08-18T07:20:17Z   [info]07:20:17.254 [warning] [libcluster:fly6pn] unable to connect to :"encheres-immo-beta@fdaa:0:33f5:a7b:b9b8:70b9:e805:2"
--> v14 failed - Failed due to unhealthy allocations - rolling back to job version 13 and deploying as v15 

--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

Here is the output of flyctl status:

App
  Name     = encheres-immo-beta          
  Owner    = encheres-immo               
  Version  = 15                          
  Status   = running                     
  Hostname = encheres-immo-beta.fly.dev  
  Platform = nomad                       

Deployment Status
  ID          = 9f9c9bc7-84ff-30c2-55e9-2447ff9d0d78                                                                                   
  Version     = v15                                                                                                                    
  Status      = failed                                                                                                                 
  Description = Failed due to unhealthy allocations - not rolling back to stable job version 15 as current job has same specification  
  Instances   = 1 desired, 1 placed, 0 healthy, 1 unhealthy                                                                            

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS          HEALTH CHECKS           RESTARTS        CREATED              
cc150bd9        app     15 ⇡    fra     stop    complete        1 total, 1 critical     2               40m23s ago          
f87db3ff        app     14      fra     stop    complete        1 total, 1 critical     2               45m22s ago          
6c50052c        app     13      fra     stop    complete        1 total, 1 critical     2               12h45m ago          
70b9e805        app     9       fra     run     running         1 total, 1 passing      0               2022-07-28T13:15:36Z

Here is the output of flyctl checks list:

Health Checks for encheres-immo-beta
  NAME                             | STATUS  | ALLOCATION | REGION | TYPE | LAST UPDATED         | OUTPUT                                    
-----------------------------------*---------*------------*--------*------*----------------------*-------------------------------------------
  a61773ab9e61f7afdefca4f759fca6f9 | passing | 70b9e805   | fra    | TCP  | 2022-07-28T13:16:09Z | TCP connect 172.19.4.98:4000: Success[✓]  
                                   |         |            |        |      |                      |                                           
                                   |         |            |        |      |                      |                                           

Thanks for any help or leads :slight_smile:

Hi @Nev,

Reviewing the logs, I noticed the warnings about the certificate:

     07:15:14.362 [warning] Description: 'Authenticity is not established by certificate path validation'
              Reason: 'Option {verify, verify_peer} and cacertfile/cacerts is missing'

Check your runtime.exs file and ensure it isn’t trying to terminate the TLS/SSL connection using a cert there.

From an app of mine as an example:

  config :web, Web.Endpoint,
    server: true,
    # # Force redirect to HTTPS
    # force_ssl: [rewrite_on: [:x_forwarded_proto], host: nil],
    url: [host: "my-custom-site.com", port: 80],
    http: [
      port: String.to_integer(System.get_env("PORT") || "4000"),
      # IMPORTANT: configure IPv6 support
      transport_options: [socket_opts: [:inet6]]
    ],
    secret_key_base: secret_key_base

It is set up to use inet6 for IPv6, and force_ssl is commented out.

Not sure if that’s the problem, but the logs would point me in that direction.

After thinking about it a bit more: if you do want to terminate TLS with your own certificates, I wonder if the Dockerfile is missing the CA certificates. That warning seems to indicate the expected file is missing.
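
If the warning is coming from an outbound TLS connection (on Fly, a database repo with ssl: true is a common source), the clean fix is to supply a CA bundle and enable peer verification rather than leaving the connection unverified. A rough sketch, assuming it is the Ecto repo and that the runtime image has the ca-certificates package at the usual Alpine path — both assumptions on my part, not something I can see from your logs:

  # Hypothetical example -- adjust the app/repo names, host, and cacert path to your setup.
  config :encheres_immo, EncheresImmo.Repo,
    url: System.get_env("DATABASE_URL"),
    ssl: true,
    ssl_opts: [
      verify: :verify_peer,
      # Needs the ca-certificates package installed in the runtime image.
      cacertfile: "/etc/ssl/certs/ca-certificates.crt",
      server_name_indication: String.to_charlist(System.get_env("DATABASE_HOST") || "localhost")
    ]

Whether that connection is actually the source of the warning is something you’d have to confirm on your side.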

I have to admit I barely understand anything when it comes to DevOps… But for runtime.exs, it looks like I have exactly the same code:

  config :encheres_immo, EncheresImmoWeb.Endpoint,
    server: true,
    url: [host: System.get_env("DOMAIN"), port: 80],
    http: [
      port: String.to_integer(System.get_env("PORT") || "4000"),
      transport_options: [socket_opts: [:inet6]]
    ],
    secret_key_base: secret_key_base,
    cache_static_manifest: "priv/static/cache_manifest.json"

And here is my Dockerfile:

# Using the Hex.pm docker images. You have much better version control for
# Elixir, Erlang and Alpine.
#
#   - https://hub.docker.com/r/hexpm/elixir/tags
#   - Ex: hexpm/elixir:1.11.2-erlang-23.3.2-alpine-3.13.3
#
# Debugging Notes:
#
#   docker run -it --rm hello_elixir /bin/ash

###
### First Stage - Building the Release
###
FROM hexpm/elixir:1.13.4-erlang-24.3.4-alpine-3.15.3 AS build

# install build dependencies
RUN apk add --no-cache build-base npm git


# prepare build dir
WORKDIR /app


# extend hex timeout
ENV HEX_HTTP_TIMEOUT=20


# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force


# set build ENV as prod
ENV MIX_ENV=prod
ENV SECRET_KEY_BASE=nokey


# Copy over the mix.exs and mix.lock files to load the dependencies. If those
# files don't change, then we don't keep re-fetching and rebuilding the deps.
COPY mix.exs mix.lock ./
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.get --only ${MIX_ENV} && \
    mix deps.compile

# install npm dependencies
COPY assets/package.json assets/package-lock.json ./assets/
RUN npm --prefix ./assets ci --progress=false --no-audit --loglevel=error

COPY priv priv


# NOTE: If using TailwindCSS, it uses a special "purge" step and that requires
# the code in `lib` to see what is being used. Uncomment that here before
# running the npm deploy script if that's the case.
COPY lib lib


# build assets
COPY assets assets
RUN mix assets.deploy


# compile and build release
RUN mix compile
# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/
COPY rel rel
RUN mix release


###
### Second Stage - Setup the Runtime Environment
###

# prepare release docker image
FROM alpine:3.15.3 AS app

RUN apk add --no-cache libstdc++ openssl ncurses-libs imagemagick
RUN apk --no-cache add msttcorefonts-installer fontconfig && \
    update-ms-fonts && \
    fc-cache -f

WORKDIR /app

RUN chown nobody:nobody /app
USER nobody:nobody

COPY --from=build --chown=nobody:nobody /app/_build/prod/rel/encheres_immo ./

ENV HOME=/app
ENV MIX_ENV=prod
ENV SECRET_KEY_BASE=nokey
ENV PORT=4000

CMD ["bin/encheres_immo", "start"]

# Appended by flyctl
ENV ECTO_IPV6 true
ENV ERL_AFLAGS "-proto_dist inet6_tcp"
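
For what it’s worth, the two ENV lines flyctl appended are the usual IPv6 bits: ERL_AFLAGS "-proto_dist inet6_tcp" makes Erlang distribution run over IPv6, and ECTO_IPV6 is normally read in runtime.exs so the database socket uses IPv6 on Fly’s private network. I believe my repo config follows the standard generated pattern, roughly this (a sketch from the Fly/Phoenix template, not copied from my file):

  # Standard Phoenix/Fly pattern: switch the Postgres socket to IPv6 when ECTO_IPV6 is set.
  maybe_ipv6 = if System.get_env("ECTO_IPV6"), do: [:inet6], else: []

  config :encheres_immo, EncheresImmo.Repo,
    url: System.get_env("DATABASE_URL"),
    socket_options: maybe_ipv6,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")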