Unable to deploy basic Elixir Phoenix app

tejpochiraju · June 25, 2021, 2:46pm

Hi all,

I am trying to deploy a basic, unmodified Phoenix app generated with mix phx.new --no-ecto --no-webpack test.

I followed this official guide and the only changes from that flow are:

I am not provisioning or using a DB
I have modified the Dockerfile to remove references to nodejs assets

Environment:

Erlang: 24.0.1
Elixir: 1.12.1
Phoenix: 1.5.9
Docker Build: Alpine 3.13.3
Docker Final: Alpine 3.13.3

I am running into cryptic deployment errors with no detail in the logs (flyctl logs output below):

2021-06-25T14:16:19.451539499Z proxy[78cf7399] sin [warn] Health check status changed 'passing' => 'warning'
2021-06-25T14:16:30.241535982Z proxy[78cf7399] sin [error] Health check status changed 'warning' => 'critical'
2021-06-25T14:17:59.084636935Z runner[78cf7399] sin [info] Shutting down virtual machine
2021-06-25T14:17:59.362818009Z app[78cf7399] sin [info] Sending signal SIGTERM to main child process w/ PID 507
2021-06-25T14:17:59.363761686Z app[78cf7399] sin [info] 14:17:59.362 [info] SIGTERM received - shutting down
2021-06-25T14:18:01.364829239Z app[78cf7399] sin [info] Main child exited normally with code: 0
2021-06-25T14:18:01.365074944Z app[78cf7399] sin [info] Starting clean up.
2021-06-25T14:18:04.578037634Z runner[78cf7399] sin [info] Starting instance
2021-06-25T14:18:04.600596503Z runner[78cf7399] sin [info] Configuring virtual machine
2021-06-25T14:18:04.601492870Z runner[78cf7399] sin [info] Pulling container image
2021-06-25T14:18:05.536423944Z runner[78cf7399] sin [info] Unpacking image
2021-06-25T14:18:05.541977528Z runner[78cf7399] sin [info] Preparing kernel init
2021-06-25T14:18:05.940926148Z proxy[78cf7399] sin [info] Health check status changed 'critical' => 'passing'
2021-06-25T14:18:05.963694523Z runner[78cf7399] sin [info] Configuring firecracker
2021-06-25T14:18:06.238034542Z runner[78cf7399] sin [info] Starting virtual machine
2021-06-25T14:18:06.351047090Z app[78cf7399] sin [info] Starting init (commit: cc4f071)...
2021-06-25T14:18:06.365021317Z app[78cf7399] sin [info] Running: `bin/test start` as nobody
2021-06-25T14:18:06.372326336Z app[78cf7399] sin [info] 2021/06/25 14:18:06 listening on [fdaa:0:2e6e:a7b:f0f:78cf:7399:2]:22 (DNS: [fdaa::3]:53)
2021-06-25T14:18:07.371621581Z app[78cf7399] sin [info] Reaped child process with pid: 548, exit code: 0
2021-06-25T14:18:08.373370354Z app[78cf7399] sin [info] Reaped child process with pid: 569 and signal: SIGUSR1, core dumped? false
2021-06-25T14:18:13.653300820Z proxy[78cf7399] sin [warn] Health check status changed 'passing' => 'warning'
2021-06-25T14:18:21.860043995Z proxy[78cf7399] sin [error] Health check status changed 'warning' => 'critical'
2021-06-25T14:19:52.553677381Z runner[78cf7399] sin [info] Shutting down virtual machine
2021-06-25T14:19:52.813712435Z app[78cf7399] sin [info] Sending signal SIGTERM to main child process w/ PID 507
2021-06-25T14:19:52.814684545Z app[78cf7399] sin [info] 14:19:52.813 [info] SIGTERM received - shutting down
2021-06-25T14:19:54.816738719Z app[78cf7399] sin [info] Main child exited normally with code: 0
2021-06-25T14:19:54.817030962Z app[78cf7399] sin [info] Starting clean up.

I thought it might be a missing config :test, TestWeb.Endpoint, server: true line in runtime.exs but adding this line results in the same output.

I can confirm that the app works fine locally with mix phx.server.

kurt · June 25, 2021, 3:37pm

You can run fly status instance 78cf7399 to help troubleshoot this. It looks like the process is getting restarted because the health check is failing; you’ll see messages like this:

2021-06-25T14:18:03Z  Terminated        Exit Code: 0
2021-06-25T14:17:59Z  Restart Signaled  healthcheck: check "a61773ab9e61f7afdefca4f759fca6f9" unhealthy
2021-06-25T14:16:13Z  Started           Task started by client

This might be because the check is taking too long to become healthy. Will you open your fly.toml file and find the grace period setting in the checks, then change it to "30s"? It should look something like this:

  [[services.tcp_checks]]
    grace_period = "30s"
    interval = "15s"
    restart_limit = 6
    timeout = "2s"

tejpochiraju · June 25, 2021, 4:35pm

I was using grace_period = "30s" and increased it to grace_period = "120s" without luck.

I think the error is being caused by some misconfiguration in runtime.exs. This is what I am using now:

import Config

if config_env() == :prod do
  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise """
      environment variable SECRET_KEY_BASE is missing.
      You can generate one by calling: mix phx.gen.secret
      """

  app_name =
    System.get_env("FLY_APP_NAME") ||
      raise "FLY_APP_NAME not available"

  config :hello_elixir, TestWeb.Endpoint,
    server: true,
    url: [host: "#{app_name}.fly.dev", port: 80],
    http: [
      ip: {0, 0, 0, 0, 0, 0, 0, 0},
      port: String.to_integer(System.get_env("PORT") || "4000"),
      # IMPORTANT: support IPv6 addresses
      transport_options: [socket_opts: [:inet6]]
    ],
    secret_key_base: secret_key_base

  config :test, TestWeb.Endpoint, server: true
end

Since the config from the guide didn’t work, I added the ip: {0, 0, 0, 0, 0, 0, 0, 0} and config :test, TestWeb.Endpoint, server: true lines. I have also tried ip: {0, 0, 0, 0}.

For reference, this is my fly.toml:

# fly.toml file generated for quiet-cherry-4788 on 2021-06-25T19:38:28+05:30

app = "quiet-cherry-4788"

kill_signal = "SIGTERM"
kill_timeout = 5

[env]

[[services]]
  http_checks = []
  internal_port = 4000
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "120s"
    interval = "15s"
    restart_limit = 6
    timeout = "2s"

What do the tcp_checks do?

michael · June 25, 2021, 4:57pm

tcp_checks just make sure a tcp connection can be opened on that port. Can you try removing all the default checks to see if your app boots normally?
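To illustrate what "a tcp connection can be opened on that port" means, here is a minimal sketch of that kind of probe in Elixir. It is not Fly's implementation; the TcpCheck module and passing?/3 function are hypothetical names, and the 2-second default only mirrors the timeout = "2s" value from the fly.toml above.

```elixir
# Hypothetical helper, not Fly's actual check: reports whether a TCP
# connection to host:port can be opened within timeout_ms milliseconds.
defmodule TcpCheck do
  def passing?(host, port, timeout_ms \\ 2_000) do
    opts = [:binary, active: false]

    case :gen_tcp.connect(String.to_charlist(host), port, opts, timeout_ms) do
      {:ok, socket} ->
        # Connection opened: nothing is sent, the socket is just closed again.
        :gen_tcp.close(socket)
        true

      {:error, _reason} ->
        # Refused, timed out, unreachable, and so on.
        false
    end
  end
end

# Demo: listen on an ephemeral local port, probe it, close it, probe again.
{:ok, listener} = :gen_tcp.listen(0, [:binary, active: false])
{:ok, port} = :inet.port(listener)

IO.inspect(TcpCheck.passing?("127.0.0.1", port), label: "listener open")
:gen_tcp.close(listener)
IO.inspect(TcpCheck.passing?("127.0.0.1", port), label: "listener closed")
```

A probe like this only succeeds if something is actually listening on the checked port at an address the checker can reach, so an endpoint that never starts its server, or binds somewhere else, would fail it no matter how long the grace period is.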

tejpochiraju · June 25, 2021, 5:08pm

Thanks, commented out the [[services.tcp_checks]] section and flyctl deploy ran without error this time.

... 
deployment-1624640425: digest: sha256:8bc227608272cb6df1d0bb27b5a7894c2a0c417129729b385e87a9964a26707b size: 1364
--> Pushing image done
Image: registry.fly.io/quiet-cherry-4788:deployment-1624640425
Image size: 21 MB
==> Creating release
Release v7 created

You can detach the terminal anytime without stopping the deployment
Monitoring Deployment

1 desired, 1 placed, 1 healthy, 0 unhealthy
--> v7 deployed successfully

However, flyctl logs shows that the app failed to start/stay running.

2021-06-25T17:00:39.358241369Z proxy[86902b71] sin [error] Health check status changed 'warning' => 'critical'
2021-06-25T17:00:44.424154029Z runner[9652833a] sin [info] Starting instance
2021-06-25T17:00:44.450757797Z runner[9652833a] sin [info] Configuring virtual machine
2021-06-25T17:00:44.451684794Z runner[9652833a] sin [info] Pulling container image
2021-06-25T17:00:47.808318660Z runner[9652833a] sin [info] Unpacking image
2021-06-25T17:00:48.101055001Z runner[9652833a] sin [info] Preparing kernel init
2021-06-25T17:00:48.486761234Z runner[9652833a] sin [info] Configuring firecracker
2021-06-25T17:00:48.749830556Z runner[9652833a] sin [info] Starting virtual machine
2021-06-25T17:00:48.889217946Z app[9652833a] sin [info] Starting init (commit: cc4f071)...
2021-06-25T17:00:48.904804396Z app[9652833a] sin [info] Running: `bin/test start` as nobody
2021-06-25T17:00:48.914353684Z app[9652833a] sin [info] 2021/06/25 17:00:48 listening on [fdaa:0:2e6e:a7b:ead:9652:833a:2]:22 (DNS: [fdaa::3]:53)
2021-06-25T17:00:49.910658058Z app[9652833a] sin [info] Reaped child process with pid: 547, exit code: 0
2021-06-25T17:00:50.764067263Z app[9652833a] sin [info] 17:00:50.763 [info] Access TestWeb.Endpoint at http://localhost:4000
2021-06-25T17:00:50.912336458Z app[9652833a] sin [info] Reaped child process with pid: 568 and signal: SIGUSR1, core dumped? false
2021-06-25T17:01:14.678015004Z runner[86902b71] sin [info] Shutting down virtual machine
2021-06-25T17:01:14.936088457Z app[86902b71] sin [info] Sending signal SIGTERM to main child process w/ PID 507
2021-06-25T17:01:14.937070735Z app[86902b71] sin [info] 17:01:14.936 [info] SIGTERM received - shutting down
2021-06-25T17:01:16.940217136Z app[86902b71] sin [info] Main child exited normally with code: 0
2021-06-25T17:01:16.940385645Z app[86902b71] sin [info] Starting clean up.

Is there any way to get more logs? Are there some IPv6/IPv4 specifics I am missing?

Mark · June 25, 2021, 6:28pm

I noticed that your runtime.exs file is using :hello_elixir as the application name. That needs to be changed to your app name, :test.
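For illustration, the runtime.exs posted earlier with only that change applied might look like the sketch below. It assumes the OTP app really is :test (as generated by mix phx.new ... test) and otherwise keeps the settings from the earlier post unchanged.

```elixir
import Config

if config_env() == :prod do
  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise """
      environment variable SECRET_KEY_BASE is missing.
      You can generate one by calling: mix phx.gen.secret
      """

  app_name =
    System.get_env("FLY_APP_NAME") ||
      raise "FLY_APP_NAME not available"

  # The first argument is the OTP app name. With :hello_elixir here, these
  # settings were stored under an app that does not exist in this project,
  # so TestWeb.Endpoint never saw them.
  config :test, TestWeb.Endpoint,
    server: true,
    url: [host: "#{app_name}.fly.dev", port: 80],
    http: [
      ip: {0, 0, 0, 0, 0, 0, 0, 0},
      port: String.to_integer(System.get_env("PORT") || "4000"),
      # IMPORTANT: support IPv6 addresses
      transport_options: [socket_opts: [:inet6]]
    ],
    secret_key_base: secret_key_base
end
```

With server: true set in this block, the separate config :test, TestWeb.Endpoint, server: true line from the earlier post is redundant and can be dropped.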

tejpochiraju · June 26, 2021, 3:19am

Ah, crap. Copy-paste and that catches me out! Thanks, deployed and it’s up.

Will deploy one of our apps and evaluate for a bit.

tejpochiraju · June 26, 2021, 6:57am

Managed to deploy one of our SQLite3-based test apps. This has Litestream.io integrated and I hit a couple of weird DB initialisation errors before it finally worked. I will try to put together a short tutorial once I understand what happened.

Thanks for your support, chaps! Excited to see if we can migrate our workloads to Fly.

Mark · June 26, 2021, 1:18pm

The use of SQLite3 sounds interesting. I’m also unfamiliar with litestream.io. I’d be interested in learning what you’re doing there.

tejpochiraju · June 26, 2021, 2:38pm

The deployed app is the Edge version of our IoT event ingress - https://bodh.iotready.co

We use SQLite3 for the Edge version to keep resource usage low. Litestream makes backups to S3 or similar trivially simple. We then have a simple script to create a read replica of the DB every 30s to load into Grafana.

On a default 256MB VM, I was just able to simulate 400 concurrent WebSocket clients, each sending an event every 1s. I ran into OOM errors (I guess) at ~500 persistent clients. I was also able to get 1200+ one-shot clients working that publish and disappear.

I am trying to benchmark the number of simultaneous clients we can reliably support at a couple of different VM sizes but this is already more than plenty.

I think Litestream + SQLite is more reliable (and much faster) than a networked PostgreSQL server for such applications.

Mark · June 27, 2021, 1:40am

Very cool! Thanks for sharing!
