Elixir app not starting up

Hey guys,

I’m currently stuck with the deployment of an Elixir app. The app itself is a very small Phoenix app that does maintenance tasks on an external database, so there’s barely any code in there.

But I’m not really sure what to make of the health check during deployment… Here are the recent logs:

2021-05-07T14:18:14Z [info] 14:18:14.408 [info] Access PidroAdminV2Web.Endpoint at http://BLABLA.fly.dev
2021-05-07T14:18:14Z [info] Reaped child process with pid: 608 and signal: SIGUSR1, core dumped? false
2021-05-07T14:18:17Z [info] 14:18:17.805 [info] tzdata release in place is from a file last modified Tue, 22 Dec 2020 23:35:21 GMT. Release file on server was last modified Sun, 24 Jan 2021 19:35:23 GMT.
2021-05-07T14:18:19Z [info] 14:18:19.831 [info] Tzdata has updated the release from 2020e to 2021a
2021-05-07T14:18:27Z [error] Health check status changed 'warning' => 'critical'
***v8 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v9


App
  Name     = BLABLA
  Owner    = personal
  Version  = 7
  Status   = running
  Hostname = BLABLA.fly.dev

Deployment Status
  ID          = 59920b40-BLA-9883-dd89-9f50ecc17383
  Version     = v7
  Status      = running
  Description = Deployment is running pending automatic promotion
  Instances   = 1 desired, 1 placed, 0 healthy, 0 unhealthy    

Instances
    ID       VERSION REGION DESIRED STATUS  HEALTH CHECKS       RESTARTS CREATED
    34b09f6a 7       ams    run     running 1 total, 1 critical 1        3m59s ago

Secrets are set properly, the release was created successfully (see further up in the logs), and, as I said, the app itself isn’t complex. It’s actually pretty close to the demo app from @Mark’s deployment post. The only difference is that I use an external DB.

Does anybody have an idea where else I could look to find out what went wrong?

Also, when it says “Running” under “App”, “Deployment Status” and “Instances”, does that refer to the Docker container? Sorry if that’s a stupid question, I’m just trying to wrap my head around it and figure out how to debug things better here :slightly_smiling_face:

Thank you so much!

Marcel

Does your endpoint config look like this?

port: String.to_integer(System.get_env("PORT") || "4000"),

Hi Mark,

thanks for getting back to me. Yes, that’s the config (inside runtime.exs):

  config :pidro_admin_v2, PidroAdminV2Web.Endpoint,
    server: true,
    url: [host: "#{app_name}.fly.dev", port: 80],
    http: [
      port: String.to_integer(System.get_env("PORT") || "4000"),
      # IMPORTANT: support IPv6 addresses
      transport_options: [socket_opts: [:inet6]]
    ],
    secret_key_base: secret_key_base

Check your messages. I sent you a DM.

Glad you got it working and found the config problem! Have a blast playing!

BTW, I updated the Elixir guide and added a section at the end for clustering your nodes together. Something fun to play with!


Thanks Mark, I just checked it out and it looks excellent! This might be very handy once I move my card game engine over. Most of our players are either from Scandinavia or the US and I’m currently serving both from Ireland. I will give this a try on Fly! :raised_hands:


I seem to be having the same issue, and cannot for the life of me figure it out.

Secrets are set, nearly identical config to @m.fahle, and yet the application simply doesn’t come up. Nothing in the logs, and the HTTP health check fails.

I followed the guide in the Fly docs, but I also updated to Elixir 1.12.1 and Erlang 24.0.1, with the adjustments from this comment: Elixir Getting Started Guide - #16 by zimt28

How big is your VM? I’ve found that I must run fly scale memory 1024 before my apps will spin up. Elixir apps can start without extra memory, but in my experience that only really works with an “empty” Phoenix app; the initial 256 MB is just a little too small for anything more than that.

I’ve also found that my apps don’t generally use more than 400 MB, so 1024 feels like overkill for me, but we can’t scale to 512, so… :frowning: Still, 1024 allows for lots of useful ETS tables! :smiley:

@rushsteve12 @OldhamMade you can try tweaking the health check grace period in fly.toml. It’s possible the process is just taking longer to boot than we wait, by default, before running health checks.

Try adding this in the health check block:

grace_period = "10s"

I had to do that in our LiveView cluster example.
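For context, here is a rough sketch of where that line might sit in fly.toml, inside the existing [[services]] section. Apart from grace_period, the values are illustrative assumptions rather than settings taken from anyone’s app in this thread.

```toml
# Sketch only: goes inside the existing [[services]] section of fly.toml
[[services.http_checks]]
  grace_period = "10s" # give the app time to boot before checks count
  interval = "15s"     # illustrative value
  timeout = "2s"       # illustrative value
  method = "get"
  path = "/"           # illustrative: any route your app answers with 200
  protocol = "http"
```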


@OldhamMade I tried scaling up to 1024 MB of memory with no luck.

@kurt I have grace_period = "30s" in both my health checks:

  [[services.tcp_checks]]
    grace_period = "30s" # allow some time for startup
    interval = "15s"
    restart_limit = 6
    timeout = "2s"

  [[services.http_checks]]
    grace_period = "30s" # allow some time for startup
    interval = "60s"
    restart_limit = 6
    timeout = "2s"
    method = "get"
    path = "/en/"
    protocol = "https"
    tls_skip_verify = false

When I run fly deploy it reports that 1 check is passing and 1 is critical. I can’t figure out how to get it to tell me which is which, but I’m assuming that it’s the HTTP check that’s failing since it takes 6 minutes (6 tries * 60s) to give up.

Could it be the protocol = "https" line there?

To ask the “obvious” questions:

  • Have you tried the http_checks using standard HTTP?
  • Have you tried increasing the timeout from 2s to 10s?
  • Have you also tried disabling the http_checks completely?
  • Have you tried regenerating your fly.toml file and changing only the kill_signal before deployment?

When I run fly deploy it reports that 1 check is passing and 1 is critical.

I often see this during deploys, but the deploys complete as expected, so I don’t think the critical warning matters so much as long as the overall deploy reports success.

Oh yes, I’m betting this is the https protocol like @OldhamMade said. If you try disabling that check it will probably boot.

If your app isn’t serving TLS directly (most aren’t), that check will always fail. For HTTP checks it’s better to check over plain http, often with a special route. @Mark probably has a way to do this in Phoenix.
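One possible way to do this in Phoenix, offered as a sketch and not as anything confirmed in this thread, is a tiny Plug mounted near the top of the endpoint that answers a dedicated path before the request reaches the router and its pipelines. The module name PidroAdminV2Web.HealthCheck and the /health path are hypothetical; the http_checks block would then point at it with path = "/health" and protocol = "http".

```elixir
defmodule PidroAdminV2Web.HealthCheck do
  # Hypothetical health-check plug: answers GET /health with 200 "ok"
  # and halts, so the request never reaches the router.
  @behaviour Plug
  import Plug.Conn

  @impl true
  def init(opts), do: opts

  @impl true
  def call(%Plug.Conn{method: "GET", request_path: "/health"} = conn, _opts) do
    conn
    |> send_resp(200, "ok")
    |> halt()
  end

  # Every other request passes through untouched.
  def call(conn, _opts), do: conn
end

# Then, typically in lib/pidro_admin_v2_web/endpoint.ex, near the top
# of the plug list:
#
#     plug PidroAdminV2Web.HealthCheck
```

Because the plug halts the connection, it stays independent of routes like /en/ that may depend on locale handling or other pipeline logic.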


I figured it out!

The issue was twofold, the second part being very dumb on my part:

  1. As @OldhamMade and @kurt said, it should be http, not https
  2. Your app actually has to pass the health check…

Thank you all for your help!


Ah, that’s great! Our health check UX is not as helpful as we want it to be; we need to give much more obvious errors about that.
