"Failed due to unhealthy allocations" on Phoenix Deployment

--> v2 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v3 

I’d love some help getting this health check to pass. I’ve looked at 4 or 5 similar issues posted by others, but so far I’ve had no luck with the configurations that solved those issues. The app works just fine in the dev environment on localhost:4000, and I don’t see any errors in the log. Here are my config/runtime.exs and fly.toml. Does anyone know what’s misconfigured?


runtime.exs

import Config

if System.get_env("PHX_SERVER") do
  config :my_app, MyApp.Endpoint, server: true
end

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise """
      environment variable DATABASE_URL is missing.
      For example: ecto://USER:PASS@HOST/DATABASE
      """

  config :my_app, MyApp.Repo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
    socket_options: [:inet6]

  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise """
      environment variable SECRET_KEY_BASE is missing.
      You can generate one by calling: mix phx.gen.secret
      """

  host = System.get_env("PHX_HOST") || "xyz.fly.dev"
  port = String.to_integer(System.get_env("PORT") || "8080")

  config :my_app, MyAppWeb.Endpoint,
    url: [host: host, port: 443, scheme: "https"],
    http: [
      # Enable IPv6 and bind on all interfaces.
      # Set it to  {0, 0, 0, 0, 0, 0, 0, 1} for local network only access.
      # See the documentation on https://hexdocs.pm/plug_cowboy/Plug.Cowboy.html
      # for details about using IPv6 vs IPv4 and loopback vs public addresses.
      ip: {0, 0, 0, 0, 0, 0, 0, 0},
      port: port
    ],
    secret_key_base: secret_key_base
end

fly.toml

app = "xyz"
kill_signal = "SIGTERM"
kill_timeout = 5
processes = []

[deploy]
  release_command = "/app/bin/migrate"

[env]
  PHX_HOST = "xyz.fly.dev"
  POOL_SIZE = "15"
  PORT = "8080"

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "30s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

I found @chrismccord’s fly-apps/live_beats repository on GitHub and copied its configuration, but still no dice… If I disable (comment out) the services.tcp_checks part of fly.toml, the app starts successfully in production and I can get into iex, but there’s nothing served at xyz.fly.dev. If I try xyz.fly.dev:80 I get an “ERR_SSL_PROTOCOL_ERROR”.

Hi @PinOcean!

Do you see anything helpful from running fly logs? I mean, besides the single line you included. :slight_smile:

Hey @Mark! Thanks for helping me out on this!

I can’t make anything out from the logs, but hopefully someone with more experience can! This is my first time with Phoenix and with Fly.io. Here’s the output from fly logs:

2022-09-29T14:07:28Z runner[559cbb3e] lax [info]Starting instance
2022-09-29T14:07:28Z runner[559cbb3e] lax [info]Configuring virtual machine
2022-09-29T14:07:28Z runner[559cbb3e] lax [info]Pulling container image
2022-09-29T14:07:31Z runner[559cbb3e] lax [info]Unpacking image
2022-09-29T14:07:31Z runner[559cbb3e] lax [info]Preparing kernel init
2022-09-29T14:07:31Z runner[559cbb3e] lax [info]Configuring firecracker
2022-09-29T14:07:31Z runner[559cbb3e] lax [info]Starting virtual machine
2022-09-29T14:07:32Z app[559cbb3e] lax [info]Starting init (commit: xxxxxxx)...
2022-09-29T14:07:32Z app[559cbb3e] lax [info]Setting up swapspace version 1, size = 512 MiB (536866816 bytes)
2022-09-29T14:07:32Z app[559cbb3e] lax [info]no label, UUID=6230325c-222d-427b-9acc-163cee5c3cf7
2022-09-29T14:07:32Z app[559cbb3e] lax [info]Preparing to run: `/app/bin/migrate` as nobody
2022-09-29T14:07:32Z app[559cbb3e] lax [info]2022/09/29 14:07:32 listening on [xxxx:x:xxxx:xxx:xxxx:xxxx:xxxx:x]:22 (DNS: [fdaa::3]:53)
2022-09-29T14:07:35Z app[559cbb3e] lax [info]14:07:35.072 [info] Migrations already up
2022-09-29T14:07:35Z app[559cbb3e] lax [info]Starting clean up.
2022-09-29T14:07:41Z runner[76fa70db] lax [info]Starting instance
2022-09-29T14:07:42Z runner[76fa70db] lax [info]Configuring virtual machine
2022-09-29T14:07:42Z runner[76fa70db] lax [info]Pulling container image
2022-09-29T14:07:44Z runner[76fa70db] lax [info]Unpacking image
2022-09-29T14:07:46Z runner[76fa70db] lax [info]Preparing kernel init
2022-09-29T14:07:46Z runner[76fa70db] lax [info]Configuring firecracker
2022-09-29T14:07:46Z runner[76fa70db] lax [info]Starting virtual machine
2022-09-29T14:07:46Z app[76fa70db] lax [info]Starting init (commit: xxxxxxx)...
2022-09-29T14:07:46Z app[76fa70db] lax [info]Preparing to run: `/app/bin/server` as nobody
2022-09-29T14:07:46Z app[76fa70db] lax [info]2022/09/29 14:07:46 listening on [xxxx:x:xxxx:xxx:xxxx:xxxx:xxxx:x]:22 (DNS: [fdaa::3]:53)
2022-09-29T14:07:53Z app[76fa70db] lax [info]14:07:53.027 [info] tzdata release in place is from a file last modified Fri, 22 Oct 2021 02:20:47 GMT. Release file on server was last modified Sat, 24 Sep 2022 04:40:44 GMT.
2022-09-29T14:07:54Z app[76fa70db] lax [info]14:07:54.376 [info] Tzdata has updated the release from 2021e to 2022d
2022-09-29T14:12:56Z runner[76fa70db] lax [info]Shutting down virtual machine
2022-09-29T14:12:56Z app[76fa70db] lax [info]Sending signal SIGTERM to main child process w/ PID 514
2022-09-29T14:12:56Z app[76fa70db] lax [info]14:12:56.491 [info] SIGTERM received - shutting down
2022-09-29T14:12:58Z app[76fa70db] lax [info]Starting clean up.

Here’s the last output from fly launch:

Monitoring Deployment

1 desired, 1 placed, 0 healthy, 1 unhealthy [health checks: 1 total, 1 critical]
v24 failed - Failed due to unhealthy allocations - rolling back to job version 23
Failed Instances

==> Failure #1

Instance
  ID            = 1b01b568             
  Process       = app                  
  Version       = 24                   
  Region        = lax                  
  Desired       = run                  
  Status        = running              
  Health Checks = 1 total, 1 critical  
  Restarts      = 0                    
  Created       = 4m51s ago            

Recent Events
TIMESTAMP            TYPE       MESSAGE                 
2022-09-30T01:02:57Z Received   Task received by client 
2022-09-30T01:02:57Z Task Setup Building Task Directory 
2022-09-30T01:03:00Z Started    Task started by client  

Recent Logs
2022-09-30T01:03:07.000 [info] 01:03:07.589 [info] Tzdata has updated the release from 2021e to 2022d
***v24 failed - Failed due to unhealthy allocations - rolling back to job version 23 and deploying as v25 

Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

Hi @PinOcean,

Thanks for the logs. It looks like it’s failing the health checks and so being shut down.

A reason I’ve seen for that before is when an endpoint setting in runtime.exs like server: true is missing. However, I see that in your file too.
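One way to check whether that setting actually reached the running endpoint is to read the application config from a remote IEx session on the instance. This is only a sketch: :my_app and MyAppWeb.Endpoint are the placeholder names used in this thread, so substitute your own app and endpoint module.

```elixir
# Run inside an IEx session attached to the running release.
# :my_app and MyAppWeb.Endpoint are placeholders taken from this thread.

# Read the config stored under the endpoint module and look at :server.
# Anything other than true suggests the endpoint was never told to start
# its HTTP listener, which would leave the TCP health check with nothing
# to connect to.
Application.get_env(:my_app, MyAppWeb.Endpoint)[:server]
```

The key passed to Application.get_env/2 has to be the exact module the endpoint is defined as, since config is stored per module name.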

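While this won’t make it succeed, I tuned the grace_period in my fly.toml file down to “5s”. The grace period is how long Fly waits after the VM starts before it begins health checking, so this basically means “don’t take as long to figure out if it’s healthy”.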

Otherwise I’m not seeing anything obvious either!

Have you tried creating a brand new Phoenix app and deploying that? Just wondering if there’s something else going on.

Good idea about the brand new Phoenix app! I just deployed the Phoenix template app and it deployed no problem… I’m not sure what this means though.

This was it @Mark!

A reason I’ve seen for that before is when an endpoint setting in runtime.exs like server: true is missing. However, I see that in your file too.

My config had: config :my_app, MyApp.Endpoint, server: true
The correct line is: config :my_app, MyAppWeb.Endpoint, server: true.

After changing MyApp.Endpoint to MyAppWeb.Endpoint, it deployed correctly!
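For anyone comparing against their own file, here is a sketch of the corrected top of config/runtime.exs, assuming the same placeholder names as above (:my_app for the OTP app and MyAppWeb.Endpoint for the endpoint module). The only change is the module name in the first config call.

```elixir
# config/runtime.exs (excerpt)
import Config

# Config is keyed by module name. The key here must be the same module
# that the :prod block below configures (MyAppWeb.Endpoint), not
# MyApp.Endpoint.
if System.get_env("PHX_SERVER") do
  config :my_app, MyAppWeb.Endpoint, server: true
end
```

The mistake is easy to miss because config stored under a module that doesn’t exist raises no error. The value just sits under an unused key, the real endpoint never sees server: true, and so nothing listens on the internal port that the TCP check probes.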

Thank you very much for your time, you helped me a lot and I hope this will also help others who make the same mistake in the future :slight_smile:

Great! Glad you found the issue! Best of luck!