I’m currently stuck deploying an Elixir app. The app itself is a very small Phoenix app that runs maintenance tasks on an external database, so there’s barely any code in it.
But I’m not sure what to make of the health check failing during deployment… Here are the recent logs:
```
2021-05-07T14:18:14Z [info] 14:18:14.408 [info] Access PidroAdminV2Web.Endpoint at http://BLABLA.fly.dev
2021-05-07T14:18:14Z [info] Reaped child process with pid: 608 and signal: SIGUSR1, core dumped? false
2021-05-07T14:18:17Z [info] 14:18:17.805 [info] tzdata release in place is from a file last modified Tue, 22 Dec 2020 23:35:21 GMT. Release file on server was last modified Sun, 24 Jan 2021 19:35:23 GMT.
2021-05-07T14:18:19Z [info] 14:18:19.831 [info] Tzdata has updated the release from 2020e to 2021a
2021-05-07T14:18:27Z [error] Health check status changed 'warning' => 'critical'
***v8 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v9
```

```
App
  Name     = BLABLA
  Owner    = personal
  Version  = 7
  Status   = running
  Hostname = BLABLA.fly.dev

Deployment Status
  ID          = 59920b40-BLA-9883-dd89-9f50ecc17383
  Version     = v7
  Status      = running
  Description = Deployment is running pending automatic promotion
  Instances   = 1 desired, 1 placed, 0 healthy, 0 unhealthy

Instances
ID        VERSION  REGION  DESIRED  STATUS   HEALTH CHECKS         RESTARTS  CREATED
34b09f6a  7        ams     run      running  1 total, 1 critical   1         3m59s ago
```
Secrets are set properly, the release was created successfully (that’s higher up in the logs), and, as I said, the app itself is nothing complex. It’s actually pretty close to the demo app from @Mark’s deployment post; the only difference is that I use an external DB.
Does anybody have an idea where else I could look to find out what went wrong?
Also, when it says “running” under “App”, “Deployment Status” and “Instances”, does that refer to the Docker container? Sorry if that’s a stupid question; I’m just trying to wrap my head around it and figure out how to debug things better here.
Thanks Mark, I just checked it out and it looks excellent! This could be very handy once I move my card game engine over. Most of our players are in either Scandinavia or the US, and I’m currently serving both from Ireland. I’ll give this a try on Fly!
I seem to be having the same issue, and cannot for the life of me figure it out.
Secrets are set, my config is nearly identical to @m.fahle’s, and yet the application simply doesn’t come up. There’s nothing in the logs, and the HTTP health check fails.
I followed the guide in the Fly docs, but I also updated to Elixir 1.12.1 and Erlang 24.0.1, with the adjustments from this comment: “Elixir Getting Started Guide - #16 by zimt28”.
How big is your VM? I’ve found that I have to run `fly scale memory 1024` before my apps will spin up. Elixir apps can start without increasing memory, but in my experience that only really works with an “empty” Phoenix app; the default 256 MB is just a little too small for anything more than that.
I’ve also found that my apps don’t generally use more than 400 MB, so 1024 MB feels like overkill for me, but we can’t scale to 512 MB, so… Still, 1024 MB leaves room for lots of useful ETS tables!
@rushsteve12 @OldhamMade you can try tweaking the health check `grace_period` in fly.toml. It’s possible the process is just taking longer to boot than we wait for health checks by default.
@OldhamMade I tried scaling up to 1024 MB of memory, with no luck.
@kurt I have `grace_period = "30s"` in both of my health checks:
```toml
[[services.tcp_checks]]
grace_period = "30s" # allow some time for startup
interval = "15s"
restart_limit = 6
timeout = "2s"

[[services.http_checks]]
grace_period = "30s" # allow some time for startup
interval = "60s"
restart_limit = 6
timeout = "2s"
method = "get"
path = "/en/"
protocol = "https"
tls_skip_verify = false
```
When I run `fly deploy`, it reports that 1 check is passing and 1 is critical. I can’t figure out how to get it to tell me which is which, but I’m assuming it’s the HTTP check that’s failing, since it takes 6 minutes (6 tries × 60s interval) to give up.
Have you tried running the http_checks over plain HTTP?
Have you tried increasing the timeout from 2s to 10s?
Have you also tried disabling the http_checks completely?
Have you tried regenerating your fly.toml file and changing only the kill_signal before deployment?
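Applied to the `[[services.http_checks]]` table quoted above, the first two suggestions only change a couple of lines. A sketch of what that might look like, with every other value copied unchanged from that config:

```toml
[[services.http_checks]]
grace_period = "30s" # unchanged: allow some time for startup
interval = "60s"
restart_limit = 6
timeout = "10s"      # raised from 2s so a slow first response isn't counted as a failure
method = "get"
path = "/en/"
protocol = "http"    # plain HTTP instead of https; tls_skip_verify only matters for TLS checks, so it is dropped
```

Deleting this table entirely (the third suggestion) leaves only the TCP check, which also settles the question of which of the two checks was the critical one.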
> When I run `fly deploy` it reports that 1 check is passing and 1 is critical.
I often see this during deploys, but the deploys complete as expected, so I don’t think the critical warning matters much as long as the overall deploy reports success.
Oh yes, I’d bet this is the `https` protocol, as @OldhamMade said. If you try disabling that check, the app will probably boot.
If your app isn’t serving TLS directly (most aren’t), that check will always fail. For HTTP checks it’s better to check over plain `http`, often against a dedicated route. @Mark probably has a way to do this in Phoenix.
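On the fly.toml side, a check against a dedicated route over plain HTTP might look like the sketch below. The `/health` path is hypothetical: the app in this thread does not have it, so it would need a matching route in the Phoenix router that cheaply returns a 200. The shorter 15s interval is an assumption too, chosen to match the TCP check so a failing deploy gives up sooner than the 6 minutes mentioned above.

```toml
[[services.http_checks]]
grace_period = "30s" # allow some time for startup
interval = "15s"     # assumption: same interval as the TCP check, for faster feedback
restart_limit = 6
timeout = "2s"
method = "get"
path = "/health"     # hypothetical route; it must exist in the Phoenix router
protocol = "http"    # the app itself doesn't serve TLS, so check it over plain HTTP
```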