I tried deploying a Phoenix (LiveView) app and ran into errors, I think because the migrations failed: the tables had already been created with their post-migration columns, so the columns referred to in the migrations didn’t exist. I was able to get past this by connecting to Postgres from outside Fly, restoring the database from a backup, and deleting the migration files. But then the deploy failed “due to unhealthy allocations”.
I thought perhaps there was still a problem with the Dockerfile so I destroyed the app without destroying the database, and then learned how to set the DATABASE_URL environment variable (using fly secrets set DATABASE_URL=...). But now I’m back to where I started. Here’s the log:
And here’s what I see in the terminal where I ran fly launch:
...
Monitoring Deployment
1 desired, 1 placed, 0 healthy, 1 unhealthy [health checks: 1 total, 1 critical]
v4 failed - Failed due to unhealthy allocations - no stable job version to auto revert to
Failed Instances
==> Failure #1
Instance
ID = 2a2d797b
Process =
Version = 4
Region = sjc
Desired = run
Status = running
Health Checks = 1 total, 1 critical
Restarts = 0
Created = 4m53s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2022-01-23T23:49:56Z Received Task received by client
2022-01-23T23:49:56Z Task Setup Building Task Directory
2022-01-23T23:49:59Z Started Task started by client
Recent Logs
2022-01-23T23:49:59.000 [info] Starting init (commit: 0c50bff)...
2022-01-23T23:49:59.000 [info] Preparing to run: `/app/bin/server` as nobody
2022-01-23T23:49:59.000 [info] 2022/01/23 23:49:59 listening on [fdaa:0:46ae:a7b:2295:2a2d:797b:2]:22 (DNS: [fdaa::3]:53)
2022-01-23T23:50:00.000 [info] Reaped child process with pid: 546, exit code: 0
2022-01-23T23:50:02.000 [info] Reaped child process with pid: 567 and signal: SIGUSR1, core dumped? false
2022-01-23T23:50:20.000 [error] Health check status changed 'passing' => 'critical'
***v4 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v5
Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort
The thread you linked to says use http instead of https for the checks. Could you try that?
Bump up the grace period even further, to 60s, just to rule it out.
If you set up and ship metrics from your app (configurable in fly.toml), you can see whether it’s a resource constraint of some sort. I believe dashboards are available on the “Sign In · Fly” page.
Oh, and flyctl vm status <vm-id> should show VM events. There’s a chance of finding something there as well.
When a deployment fails, the first step is to look at a failed VM and see what you can figure out. RAM increases are only useful if the VM had an out of memory error (which you might see in the logs). The health check grace period is only helpful if health checks took too long to pass.
To see the status of a specific VM, run fly status --all to get a list of VMs. Find one with status failed, then run fly vm status <id>. This will give you a lot more information. Make sure you check the exit code: if it’s 0, the failure came from health checks; if it’s non-zero, something is crashing the process.
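The exit-code rule of thumb above can be written down as a small helper. This is only a sketch in Python: the function names are made up for illustration and are not part of flyctl, and it assumes the pasted event text contains a line of the form `Terminated Exit Code: N`, which is how the `fly vm status` output further down this thread reports it.

```python
import re
from typing import Optional

# Hypothetical helper, not part of flyctl: it only scans text you paste in.
_EXIT_CODE = re.compile(r"Exit Code:\s*(-?\d+)")


def exit_code_from_events(events_text: str) -> Optional[int]:
    """Return the last exit code mentioned in the events text, or None."""
    matches = _EXIT_CODE.findall(events_text)
    return int(matches[-1]) if matches else None


def diagnose(events_text: str) -> str:
    """Apply the rule of thumb: 0 -> health checks, non-zero -> crash."""
    code = exit_code_from_events(events_text)
    if code is None:
        return "no exit code found; the VM may still be running"
    if code == 0:
        return "exit code 0: the process was stopped after failing health checks"
    return f"exit code {code}: the process itself crashed; read the logs"


if __name__ == "__main__":
    sample = "2022-02-02T04:24:20Z Terminated Exit Code: 0"
    print(diagnose(sample))
```

With exit code 0, raising RAM is unlikely to help; the thing to chase is why the check could not reach the app.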
$ fly status --all
App
Name = ssauction
Owner = personal
Version = 9
Status = running
Hostname = ssauction.fly.dev
Deployment Status
ID = a43a4cae-e858-e340-906a-f99a03bfb6ca
Version = v9
Status = failed
Description = Failed due to unhealthy allocations - no stable job version to auto revert to
Instances = 1 desired, 1 placed, 0 healthy, 1 unhealthy
Instances
ID       PROCESS VERSION REGION DESIRED STATUS   HEALTH CHECKS       RESTARTS CREATED
3576c1df app     9 ⇡     sea(B) stop    complete 1 total, 1 critical 0        6m1s ago
63e9f7df app     8       sjc    run     running  1 total, 1 critical 0        2022-01-24T00:14:03Z
$ fly vm status 3576c1df
Instance
ID = 3576c1df
Process =
Version = 9
Region = sea
Desired = stop
Status = complete
Health Checks = 1 total, 1 critical
Restarts = 0
Created = 7m12s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2022-02-02T04:18:57Z Received Task received by client
2022-02-02T04:18:57Z Task Setup Building Task Directory
2022-02-02T04:19:04Z Started Task started by client
2022-02-02T04:23:57Z Alloc Unhealthy Task not running for min_healthy_time of 10s by deadline
2022-02-02T04:23:58Z Killing Sent interrupt. Waiting 5s before force killing
2022-02-02T04:24:20Z Terminated Exit Code: 0
2022-02-02T04:24:20Z Killed Task successfully killed
Checks
ID SERVICE STATE OUTPUT
d6dd6a7392c47a522d5161aff2bffadd tcp-8080 critical dial tcp 172.19.6.2:8080: connect: connection refused
Recent Logs
I realized that the one config change I made to my app (in config/dev.exs) was to enable Tailwind CSS. IIRC I followed these instructions to do so: Adding Tailwind CSS to Phoenix 1.6. I read them again and found section 8, “Building CSS in Production”. So I made the changes described there and got this error while building the Docker image:
I found Tailwind Standalone for Phoenix · Fly and (after backing out the “8. Building CSS in Production” changes I describe above) I made the suggested changes and confirmed it works in dev. But fly launch continues to fail with:
...
Recent Logs
2022-02-02T05:03:37.000 [info] Unpacking image
2022-02-02T05:03:37.000 [info] Preparing kernel init
2022-02-02T05:03:38.000 [info] Configuring firecracker
2022-02-02T05:03:38.000 [info] Starting virtual machine
2022-02-02T05:03:38.000 [info] Starting init (commit: 0c50bff)...
2022-02-02T05:03:38.000 [info] Preparing to run: `/app/bin/server` as nobody
2022-02-02T05:03:38.000 [info] 2022/02/02 05:03:38 listening on [fdaa:0:46ae:a7b:ac2:82c2:4078:2]:22 (DNS: [fdaa::3]:53)
2022-02-02T05:03:39.000 [info] Reaped child process with pid: 546, exit code: 0
2022-02-02T05:03:41.000 [info] Reaped child process with pid: 567 and signal: SIGUSR1, core dumped? false
***v10 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v11
For this guide, we’ll use a Dockerfile and build a release for our Fly deployment. Internally, Fly’s networking uses IPv6, so there is a little config we can do to our application to make it a smooth experience.
…and then has subsections titled:
“Use releases” - configure the app to deploy using Releases including the section on containers
“Runtime configuration” - update config/runtime.exs to configure it for Fly
“Generate release config files” - use the mix release.init command
I didn’t do any of that. I followed the instructions in Getting Started · Fly Docs and ran fly launch. Does fly launch do all of the above? Or should I try deleting my app and starting over, following the above instructions?
About the health check, Fly does a basic TCP connection check when there’s a [[services]] block present in your fly.toml. So that’s what’s failing here: tcp-8080 critical dial tcp 172.19.6.2:8080: connect: connection refused
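To make that check concrete, here is a minimal sketch in Python of what a basic TCP connection check amounts to. It illustrates the general technique only and is not Fly's actual checker; `tcp_check` is a made-up name. “Connection refused” typically means the host answered but nothing was listening on that port.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> str:
    """Try one TCP connect and describe the outcome.

    Illustrative only: this mimics a generic TCP health check,
    not Fly's own implementation.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "passing"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "critical: connection refused"
    except OSError as exc:
        # Timeouts, unreachable networks, DNS failures, and so on.
        return f"critical: {exc}"


if __name__ == "__main__":
    # Assumes the app was started locally and should listen on 8080,
    # the port named in the failing check in this thread.
    print(tcp_check("127.0.0.1", 8080))
```

Run against a locally started release or container, this gives a quick yes/no on whether anything is accepting connections on the expected port, before any HTTP request is involved.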
Considering the above and other setup work you’ve attempted so far, IMO, I’d suggest starting afresh with just the Fly docs & guides. Of course, Fly folks may be able to “check in the back” and suggest things based on internal knowledge to sort you out.
I’ve seen some posts here saying they’ve had to remove things suggested from the hexdocs.pm guides. At the very least, you could use it for cross-reference.
And finally, sorry I don’t know how many of the guides & links you’ve been through already; just posting here for thoroughness or whatever.
Fly sample apps are available at both github.com/fly-apps and github.com/superfly.
There’s also fly.io/phoenix-files if you want to keep up with Phoenix and LiveView stuff at Fly.
Note that phoenix-files is separate from fly.io/blog…
About the health check, Fly does a basic TCP connection check when there’s a [[services]] block present in your fly.toml . So that’s what’s failing here…
But since inside my [[services]] block I have http_checks = [], I’m already using http instead of https for the checks, right?
And all the TCP connection check failure is telling me is that the deployed Docker container is not responding to HTTP requests, right? Can I run the Docker image locally to give me more troubleshooting info?
Of course, Fly folks may be able to “check in the back” and suggest things based on internal knowledge to sort you out.
I sure wish they would.
Considering the above and other setup work you’ve attempted so far, IMO, I’d suggest starting afresh with just the Fly docs & guides.
I feel like I’ve already used only the Fly docs. But since I don’t know what else to try I’ll start over yet again unless I get more advice.
This is of course based only on my understanding & inference; could be totally wrong.
But since inside my [[services]] block I have http_checks = [] , I’m already using http instead of https for the checks, right?
Although http_checks = [] is present, it’s an empty list, so Fly must be treating it as “no HTTP checks” and therefore defaulting to the TCP connection check (as seen in the health check failure).
And all the TCP connection check failure is telling me is that the deployed Docker container is not responding to HTTP requests, right?
Afraid I don’t know the exact check function that’s used, except for what I’ve seen on the forum (basic TCP connection check).
Can I run the Docker image locally to give me more troubleshooting info?
fly launch may have generated a Dockerfile you can use for local testing.
Yet another link, this one looks comprehensive, I hope it helps
Sorry I couldn’t be of more help…
I’ve only tried to fish out info from others who have had success with this from the forum.
There’s a lot of good bits here and there that can surely be put into gold standard guides, covering all common use cases seen so far.
The first time I ran mix phx.new ssauction_live_fly and then fly launch I saw:
...
We recommend upgrading to Phoenix 1.6.3 which includes a release configuration for Docker-based deployment.
...
I did that and fly launch failed. I neglected to record why.
So I rm -rfed the whole directory and started again, but upgraded to Phoenix 1.6.6 after running mix phx.new ssauction_live_fly but before running fly launch. This time I got:
--> Building image done
==> Pushing image to fly
The push refers to repository [registry.fly.io/ssauction]
29f06f2baaee: Pushed
4c686833369d: Layer already exists
f75686d47dae: Layer already exists
d3cce7faa027: Layer already exists
6129aa9d37ee: Layer already exists
ba5a5fe43301: Layer already exists
deployment-1644093446: digest: sha256:abc7146f666cbb07e18d4e9824579a75740f17e9de140f974eea89b227a84fd0 size: 1575
--> Pushing image done
Image: registry.fly.io/ssauction:deployment-1644093446
Image size: 117 MB
==> Creating release
Release v2 created
Release command detected: this new release will not be available until the command succeeds.
You can detach the terminal anytime without stopping the deployment
==> Release command
Command: /app/bin/migrate
Starting instance
Configuring virtual machine
Pulling container image
Unpacking image
Preparing kernel init
Starting virtual machine
Starting init (commit: 0c50bff)...
2022/02/05 20:38:02 listening on [fdaa:0:46ae:a7b:2295:baae:94ff:2]:22 (DNS: [fdaa::3]:53)
20:38:04.497 [info] Migrations already up
Main child exited normally with code: 0
Reaped child process with pid: 559 and signal: SIGUSR1, core dumped? false
Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
Starting clean up.
Monitoring Deployment
1 desired, 1 placed, 0 healthy, 1 unhealthy [health checks: 1 total, 1 critical]
v0 failed - Failed due to unhealthy allocations - no stable job version to auto revert to
Failed Instances
==> Failure #1
Instance
ID = 1ba57533
Process =
Version = 0
Region = sjc
Desired = run
Status = running
Health Checks = 1 total, 1 critical
Restarts = 0
Created = 4m57s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2022-02-05T20:38:15Z Received Task received by client
2022-02-05T20:38:15Z Task Setup Building Task Directory
2022-02-05T20:38:18Z Started Task started by client
Recent Logs
2022-02-05T20:38:19.000 [info] Reaped child process with pid: 546, exit code: 0
2022-02-05T20:38:21.000 [info] Reaped child process with pid: 567 and signal: SIGUSR1, core dumped? false
2022-02-05T20:38:49.000 [error] Health check status changed 'passing' => 'critical'
***v0 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v1
Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort
So even on a newly generated Phoenix app I’m still getting “Failed due to unhealthy allocations”. I don’t know what to do at this point (except throw up my hands and give up).
The Phoenix upgrade process prints a whole bunch of instructions for adjusting your config files. I missed these when I upgraded a Phoenix app.
That error probably means there’s some config missing, particularly the endpoint config.
If you generate a brand new project with Phoenix 1.6.6, the runtime.exs file has everything you need. It should include an endpoint block like this:
secret_key_base =
  System.get_env("SECRET_KEY_BASE") ||
    raise """
    environment variable SECRET_KEY_BASE is missing.
    You can generate one by calling: mix phx.gen.secret
    """

host = System.get_env("PHX_HOST") || "example.com"
port = String.to_integer(System.get_env("PORT") || "4000")

config :fizz, FizzWeb.Endpoint,
  url: [host: host, port: 443],
  check_origin: :conn,
  http: [
    # Enable IPv6 and bind on all interfaces.
    # Set it to {0, 0, 0, 0, 0, 0, 0, 1} for local network only access.
    # See the documentation on https://hexdocs.pm/plug_cowboy/Plug.Cowboy.html
    # for details about using IPv6 vs IPv4 and loopback vs public addresses.
    ip: {0, 0, 0, 0, 0, 0, 0, 0},
    port: port
  ],
  secret_key_base: secret_key_base
Does yours have that?
That health check error means it can’t connect to your app server, which is most likely because it’s not listening on the right IP / port combo.
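The “right IP / port combo” point can be reproduced locally with a short Python sketch. It shows general socket behaviour, not Phoenix or Fly code, and the helper names are made up for illustration: a server bound only to the IPv4 loopback address accepts connections there, but a client dialing a different address on the same port cannot connect.

```python
import socket


def listen_on(host: str, port: int = 0) -> socket.socket:
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    srv = socket.socket(family, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    srv = listen_on("127.0.0.1")  # IPv4 loopback only
    port = srv.getsockname()[1]
    # Same address the server bound to.
    print("127.0.0.1:", can_connect("127.0.0.1", port))
    # Different address, same port.
    print("::1:", can_connect("::1", port))
    srv.close()
```

In this thread's terms, the failing check dialed port 8080, so the endpoint has to listen on an address the check can reach and on that same port. Note that the config above falls back to port 4000 when PORT is unset, which may be worth checking.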
Hi, I’m having a similar error. My LiveView app runs locally (port 4000), and I can get through the Postgres stuff (so the inet changes I needed to make work), but I think I’m lost among the port modifications. In prod.exs I have