Basic Phoenix app fails to deploy

Hi there. I’m trying to get started deploying a completely bog standard Phoenix / Elixir app using Fly.io. I think I’m doing absolutely nothing outside the box, so I’m stumped as to what could be going wrong. What looks like is broken is that the app doesn’t start at port 8080 (despite it being defined in the fly.toml file, which I have not edited by hand).

Github repo: GitHub - sgardn/fly_metapod: Phoenix / Elixir deployment on Fly.io
Elixir 1.16.0
Erlang 24.3.4.14
Phoenix 1.7.10
flyctl 0.1.147

Fly.toml

# fly.toml app configuration file generated for fly-metapod on 2024-02-02T12:07:55-05:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = 'fly-metapod'
primary_region = 'bos'
kill_signal = 'SIGTERM'

[build]

[deploy]
  release_command = '/app/bin/migrate'

[env]
  PHX_HOST = 'fly-metapod.fly.dev'
  PORT = '8080'

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

  [http_service.concurrency]
    type = 'connections'
    hard_limit = 1000
    soft_limit = 1000

[[vm]]
  cpu_kind = 'shared'
  cpus = 1
  memory_mb = 1024

Launching

Running fly launch with a boilerplate Phoenix app and attaching a minimal Postgres database during installation ends up showing me the following:

No machines in group app, launching a new machine

WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
  - 0.0.0.0:8080
Found these processes inside the machine with open listening sockets:
  PROCESS                                                                                                                                                                                                                                                                                                                                                                                                                        | ADDRESSES
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------*---------------------------------------
  /app/erts-12.3.2.14/bin/beam.smp -- -root /app -progname erl -- -home /nonexistent -- -proto_dist inet6_tcp -- -noshell -s elixir start_cli -mode embedded -setcookie LOEBTJLCTZLK2TZ3ZXQDFTWON4D6QQNT5Y2NJDV4SQXOSBRPNOWQ==== -name fly-metapod-01HNNCN5DCF9BGB85XSZ26X5JQ@fdaa:2:d6eb:a7b:1ed:dc47:95c3:2 -config /app/releases/0.1.0/sys -boot /app/releases/0.1.0/start -boot_var RELEASE_LIB /app/lib -- -extra --no-halt | [::]:39359
  /.fly/hallpass                                                                                                                                                                                                                                                                                                                                                                                                                 | [fdaa:2:d6eb:a7b:1ed:dc47:95c3:2]:22
  /app/erts-12.3.2.14/bin/epmd -daemon                                                                                                                                                                                                                                                                                                                                                                                           | 0.0.0.0:4369, [::]:4369

Finished launching new machines
-------
NOTE: The machines for [app] have services with 'auto_stop_machines = true' that will be stopped when idling

-------
Checking DNS configuration for fly-metapod.fly.dev

Visit your newly deployed app at https://fly-metapod.fly.dev/

Logs

Running fly logs shows me the following:

2024-02-02T17:14:33Z proxy[1852636a411d58] bos [info]Starting machine
2024-02-02T17:14:33Z app[1852636a411d58] bos [info][    0.047000] PCI: Fatal: No config space access function found
2024-02-02T17:14:33Z app[1852636a411d58] bos [info] INFO Starting init (commit: bfa79be)...
2024-02-02T17:14:33Z app[1852636a411d58] bos [info] INFO Preparing to run: `/app/bin/server` as nobody
2024-02-02T17:14:33Z app[1852636a411d58] bos [info] INFO [fly api proxy] listening at /.fly/api
2024-02-02T17:14:33Z app[1852636a411d58] bos [info]2024/02/02 17:14:33 listening on [fdaa:2:d6eb:a7b:1ed:dc47:95c3:2]:22 (DNS: [fdaa::3]:53)
2024-02-02T17:14:33Z runner[1852636a411d58] bos [info]Machine started in 438ms
2024-02-02T17:14:34Z proxy[1852636a411d58] bos [info]machine started in 1.442739911s
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]17:14:36.207 [notice] Application fly_metapod exited: FlyMetapod.Application.start(:normal, []) returned an error: shutdown: failed to start child: DNSCluster
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]    ** (EXIT) an exception was raised:
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]        ** (UndefinedFunctionError) function :net_kernel.get_state/0 is undefined or private
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]            (kernel 8.3.2.4) :net_kernel.get_state()
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]            (dns_cluster 0.1.2) lib/dns_cluster.ex:163: DNSCluster.warn_on_invalid_dist/0
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]            (dns_cluster 0.1.2) lib/dns_cluster.ex:84: DNSCluster.init/1
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]            (stdlib 3.17.2.4) gen_server.erl:423: :gen_server.init_it/2
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]            (stdlib 3.17.2.4) gen_server.erl:390: :gen_server.init_it/6
2024-02-02T17:14:36Z app[1852636a411d58] bos [info]            (stdlib 3.17.2.4) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
2024-02-02T17:14:36Z app[1852636a411d58] bos [info] WARN Reaped child process with pid: 363 and signal: SIGUSR1, core dumped? false
2024-02-02T17:14:36Z app[1852636a411d58] bos [info] WARN Reaped child process with pid: 365 and signal: SIGUSR1, core dumped? false
2024-02-02T17:14:37Z app[1852636a411d58] bos [info]{"Kernel pid terminated",application_controller,"{application_start_failure,fly_metapod,{{shutdown,{failed_to_start_child,'Elixir.DNSCluster',{undef,[{net_kernel,get_state,[],[]},{'Elixir.DNSCluster',warn_on_invalid_dist,0,[{file,\"lib/dns_cluster.ex\"},{line,163}]},{'Elixir.DNSCluster',init,1,[{file,\"lib/dns_cluster.ex\"},{line,84}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,423}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,226}]}]}}},{'Elixir.FlyMetapod.Application',start,[normal,[]]}}}"}
2024-02-02T17:14:37Z app[1852636a411d58] bos [info]Kernel pid terminated (application_controller) ({application_start_failure,fly_metapod,{{shutdown,{failed_to_start_child,'Elixir.DNSCluster',{undef,[{net_kernel,get_state,[],[]},{'Elixir.DNSCluster',warn_on_invalid_dist,0,[{file,"lib/dns_cluster.ex"},{line,163}]},{'Elixir.DNSCluster',init,1,[{file,"lib/dns_cluster.ex"},{line,84}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,423}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}}},{'Elixir.FlyMetapod.Application',start,[normal,[]]}}})
2024-02-02T17:14:37Z app[1852636a411d58] bos [info]Crash dump is being written to: erl_crash.dump...done
2024-02-02T17:14:38Z app[1852636a411d58] bos [info] INFO Main child exited normally with code: 1
2024-02-02T17:14:38Z app[1852636a411d58] bos [info] INFO Starting clean up.
2024-02-02T17:14:38Z app[1852636a411d58] bos [info] WARN hallpass exited, pid: 307, status: signal: 15 (SIGTERM)
2024-02-02T17:14:38Z app[1852636a411d58] bos [info]2024/02/02 17:14:38 listening on [fdaa:2:d6eb:a7b:1ed:dc47:95c3:2]:22 (DNS: [fdaa::3]:53)
2024-02-02T17:14:39Z app[1852636a411d58] bos [info][    6.296606] reboot: Restarting system
2024-02-02T17:14:39Z runner[1852636a411d58] bos [info]machine has reached its max restart count (10)
2024-02-02T17:14:40Z proxy[1852636a411d58] bos [info]waiting for machine to be reachable on 0.0.0.0:8080 (waited 5.560859277s so far)
2024-02-02T17:14:46Z proxy[1852636a411d58] bos [info]waiting for machine to be reachable on 0.0.0.0:8080 (waited 11.564189499s so far)
2024-02-02T17:14:51Z proxy[1852636a411d58] bos [error]failed to connect to machine: gave up after 15 attempts (in 16.568935362s)

Am I actually running into trouble with Erlang in net_kernel.get_state? Or is that a red herring, and I’m actually seeing a healthcheck fail repeatedly because the server is running on the wrong port?

The warning message that you see when you try to fly launch indicates that your app process is listening on port 39359, while your fly.toml is expecting it to listen on port 8080. You’ll need to either change the port in the app or the fly.toml setting.
More info here:

That makes sense to me, but I’ve followed the walkthrough instructions here (Getting Started · Fly Docs) for getting started with Elixir with (I think) no divergence. So I don’t understand how the command fly launch is ignoring the env value that’s explicitly been set.

These two lines:

# fly.toml
[env]
  PHX_HOST = 'fly-metapod.fly.dev'
  PORT = '8080'

hopefully set the PORT env variable that then gets used in config/runtime.exs:

if config_env() == :prod do
  ...
  host = System.get_env("PHX_HOST") || "example.com"
  port = String.to_integer(System.get_env("PORT") || "4000")

Other variables are successfully referenced and resolved from using System.get_env(<key>) here, since higher up in the file DATABASE_URL is required for prod deploys. I’ve seen fly deploy fail without a valid DATABASE_URL, so some subset of env and secrets are useful. The only difference is that DATABASE_URL is a secret instead of an env variable:

$ fly config env
Secrets
NAME           	DIGEST          	CREATED AT
DATABASE_URL   	966ec5a9a720988b	2024-02-02T17:07:52
SECRET_KEY_BASE	d072155b5cea1a0d	2024-02-02T17:07:17

Environment Variables
NAME    	VALUE
PHX_HOST	fly-metapod.fly.dev
PORT    	8080

If the env variables weren’t retrievable at all, we’d at least see that the server is mistakenly bound to :4000 instead of :39359 - so something else during the invocation of fly launch has likely gone wrong which overrides PORT.

I’ve also experimented with hardcoding the PORT to be 8080 here, and after running fly deploy, I still do not see any difference in the logs. The service still crash loops and makes me question my sanity.

I’m new to both Phoenix / Fly, so it’s always possible I’m doing something foolish here - but I feel like I’m rubber ducking vocally and am not sure what else I could have missed.

My questions are…

  • how could the command used to start the server be ignoring the value explicitly declared by the fly.toml file?
  • what commands can I use to peek at service startup logs outside of what’s provided by fly logs? I’m seeing “hint: look at your startup logs, servers often print the address they are listening on”, but not sure how I can do that
  • does it seem reasonable that this is a regression, since I don’t see anyone else having run into this before?
    • seems completely unavoidable from my end, given that all I’ve done is type mix phx.new foobar, cd foobar && fly launch, and attached a Postgres instance
1 Like

OK, you’re right that your ports seem to be configured correctly.

In the logs I see

[notice] Application fly_metapod exited: FlyMetapod.Application.start(:normal, []) returned an error: shutdown: failed to start child: DNSCluster

You might need to add the DNS_CLUSTER_QUERY to the [env] section of your fly.toml to get it to start?
Like

[env]
DNS_CLUSTER_QUERY = "my-app-name.internal"

Here’s a branch where I add the DNS_CLUSTER_QUERY to the fly.toml file. But I still see the same crash looping behavior in the logs:

2024-02-02T21:35:03.575 app[1852636a411d58] bos [info] [ 6.293768] reboot: Restarting system
2024-02-02T21:35:03.732 runner[1852636a411d58] bos [info] machine did not have a restart policy, defaulting to restart
2024-02-02T21:35:06.206 app[1852636a411d58] bos [info] [ 0.048666] PCI: Fatal: No config space access function found
2024-02-02T21:35:06.433 app[1852636a411d58] bos [info] INFO Starting init (commit: bfa79be)...
2024-02-02T21:35:06.448 app[1852636a411d58] bos [info] INFO Preparing to run: `/app/bin/server` as nobody
2024-02-02T21:35:06.453 app[1852636a411d58] bos [info] INFO [fly api proxy] listening at /.fly/api
2024-02-02T21:35:06.464 app[1852636a411d58] bos [info] 2024/02/02 21:35:06 listening on [fdaa:2:d6eb:a7b:1ed:dc47:95c3:2]:22 (DNS: [fdaa::3]:53)
2024-02-02T21:35:06.483 runner[1852636a411d58] bos [info] Machine started in 449ms
2024-02-02T21:35:07.252 proxy[1852636a411d58] bos [error] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] 21:35:08.938 [notice] Application fly_metapod exited: FlyMetapod.Application.start(:normal, []) returned an error: shutdown: failed to start child: DNSCluster
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] ** (EXIT) an exception was raised:
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] ** (UndefinedFunctionError) function :net_kernel.get_state/0 is undefined or private
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] (kernel 8.3.2.4) :net_kernel.get_state()
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] (dns_cluster 0.1.2) lib/dns_cluster.ex:163: DNSCluster.warn_on_invalid_dist/0
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] (dns_cluster 0.1.2) lib/dns_cluster.ex:84: DNSCluster.init/1
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] (stdlib 3.17.2.4) gen_server.erl:423: :gen_server.init_it/2
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] (stdlib 3.17.2.4) gen_server.erl:390: :gen_server.init_it/6
2024-02-02T21:35:08.940 app[1852636a411d58] bos [info] (stdlib 3.17.2.4) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
2024-02-02T21:35:09.457 app[1852636a411d58] bos [info] WARN Reaped child process with pid: 363 and signal: SIGUSR1, core dumped? false
2024-02-02T21:35:09.458 app[1852636a411d58] bos [info] WARN Reaped child process with pid: 365 and signal: SIGUSR1, core dumped? false
2024-02-02T21:35:10.257 proxy[1852636a411d58] bos [error] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
2024-02-02T21:35:10.485 app[1852636a411d58] bos [info] {"Kernel pid terminated",application_controller,"{application_start_failure,fly_metapod,{{shutdown,{failed_to_start_child,'Elixir.DNSCluster',{undef,[{net_kernel,get_state,[],[]},{'Elixir.DNSCluster',warn_on_invalid_dist,0,[{file,\"lib/dns_cluster.ex\"},{line,163}]},{'Elixir.DNSCluster',init,1,[{file,\"lib/dns_cluster.ex\"},{line,84}]},{gen_server,init_it,2,[{file,\"gen_server.erl\"},{line,423}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,226}]}]}}},{'Elixir.FlyMetapod.Application',start,[normal,[]]}}}"}
2024-02-02T21:35:10.485 app[1852636a411d58] bos [info] Kernel pid terminated (application_controller) ({application_start_failure,fly_metapod,{{shutdown,{failed_to_start_child,'Elixir.DNSCluster',{undef,[{net_kernel,get_state,[],[]},{'Elixir.DNSCluster',warn_on_invalid_dist,0,[{file,"lib/dns_cluster.ex"},{line,163}]},{'Elixir.DNSCluster',init,1,[{file,"lib/dns_cluster.ex"},{line,84}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,423}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,390}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}}},{'Elixir.FlyMetapod.Application',start,[normal,[]]}}})
2024-02-02T21:35:10.624 app[1852636a411d58] bos [info] Crash dump is being written to: erl_crash.dump...done
2024-02-02T21:35:11.461 app[1852636a411d58] bos [info] INFO Main child exited normally with code: 1

However, removing the configuration for the DNSClusterQuery entirely from config/runtime.exs did. Here’s a branch where I’ve done that (which successfully deploys with fly deploy).

Functionally, this means that the DNSCluster process isn’t started in lib/fly_metapod/application.ex, since this line should fail to find any value now:

  {DNSCluster, query: Application.get_env(:fly_metapod, :dns_cluster_query) || :ignore},

This leaves me with more questions:

  • Are there options during fly launch that end up controlling use of dns_cluster_query? I did manage to deploy once earlier this week and the only difference I can see is that the other app somehow omitted this line in config/runtime.exs:
    • config :fly_metapod, :dns_cluster_query, System.get_env("DNS_CLUSTER_QUERY")
    • (diffing repos is how I figured out that the line was “removable”)
  • What implications does removing process doing cluster querying have? Does this mean that I’m limited to running a single instance of my app?

This is still baffling to me, and I would think should be affecting anyone else attempting to bootstrap a basic Phoenix app. I may run the walkthrough again with a newer Erlang version tomorrow - I’d expect more users would run into this, unless there was something particular about my setup that makes the faulty behavior have a relatively small blast radius.

:netkernel.get_state/0 was introduced in OTP 25, but you’re on 24, so you need to bump your erlang version. fly launch runs mix phx.gen.release --docker underneath, which will take your dev box’s erlang/elixir versions and put those in your Dockerfile. So you’ll need to bump those in your Dockerfile as well as your system install and retry.

1 Like

Example:

ARG ELIXIR_VERSION=1.14.0
ARG OTP_VERSION=25.0.4
ARG DEBIAN_VERSION=bullseye-20220801-slim

Ok sweet, happy to bump versions.

I’m assuming the less naive view of the logs is realizing that net_kernel.get_state isn’t being run on something that failed to initialize (an empty object), but rather that the raised error literally does mean “the function isn’t available”. :sweat_smile:

I’m wildly new to the Elixir / Erlang ecosystem, but I’d think there would be a way of signaling higher required Erlang versions from within a mix.exs package file.

Digging around - looks like this has come up before (but not as much traffic as I’d expect): Specifying minimal Erlang version in mix.exs - Questions / Help - Elixir Programming Language Forum

Is it possible this sort of thing isn’t common / painful because…

  1. Erlang/OTP major versions seem to be bumped only once a year
  2. Backwards incompatible changes are rare
  3. This makes bumping to the next major version relatively painless / safe?

Will try version bumping quickly to see if that fixes.

Nice, it worked - let the record show I’ve bumped to Erlang/OTP 25.3.2.8 and fly launch command works and attaches a Postgres database in one go.

I’m still moderately curious what disabling the DNSCluster does to the deployed service, but definitely unblocked for now.

Thanks Chris and andie! :pray:

2 Likes

I published DNSCluster 0.1.3 which supports OTP 24. We only used get_state to show a helpful warning message. Of course, staying more up to date on 25 works as well :slight_smile:

3 Likes