Hey folks,
I have a working Phoenix application running, but I realised that the default health check doesn't actually check that the machine is "healthy" (as in: can serve web requests). The other day, I deployed a bug: the application started, but every request failed because of a configuration mistake.
So, I wanted to add a custom `[[services.http_checks]]` section to the `fly.toml` that checks whether an HTTP request to `/healthy` returns `200`.
TL;DR: The health check never worked, so I couldn't deploy new versions. When I deployed a version, the new machine never became "healthy", yet it kept running even after I canceled the deployment.
Here is my `fly.toml`:
```toml
app = "redacted"
primary_region = "arn"
kill_signal = "SIGTERM"

[build]

[deploy]
  release_command = "/app/bin/migrate"
  strategy = "canary"

[env]
  PORT = "8080"
  DNS_CLUSTER_QUERY = "redacted"

[http_service]
  internal_port = 8080
  force_https = false
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]

  [http_service.concurrency]
    type = "connections"
    hard_limit = 1000
    soft_limit = 1000

[[services.ports]]
  handlers = ["http"]
  port = 8080

[[services.http_checks]]
  interval = 10000
  grace_period = "5s"
  method = "get"
  path = "/healthy"
  protocol = "http"
  timeout = 2000
  tls_skip_verify = false
```
And the relevant parts of my `runtime.exs`:
```elixir
host = get_env!("PHX_HOST")
port = get_env("PORT", 4000, :int)

config :vcp, VcpWeb.Endpoint,
  url: [host: host, port: 443, scheme: "https"],
  http: [
    ip: {0, 0, 0, 0, 0, 0, 0, 0},
    port: port
  ],
  secret_key_base: secret_key_base,
  check_origin: [
    "https://#{host}",
    "https://www.#{host}"
  ]
```
When I run `fly deploy --remote-only`, the application is built and deployed correctly, but the step that waits for the machine to become healthy never completes. When I stop the deployment with Ctrl+C, the deployment stops, but the leases are not cleared right away. I had to clear them manually with `fly machines leases clear`.
When I run `fly checks list`, I don't see a specific error, just:
```
➜ fly checks list --debug --verbose
Health Checks for redacted
  NAME                      | STATUS  | MACHINE        | LAST UPDATED | OUTPUT
----------------------------*---------*----------------*--------------*----------------------------
  servicecheck-00-http-8080 | warning | d89dee2c400348 | 45s ago      | waiting for status update
----------------------------*---------*----------------*--------------*----------------------------
```
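For comparison, the Fly.io `fly.toml` reference also documents a `[[http_service.checks]]` table, which attaches checks directly to `[http_service]` instead of to a `[[services]]` entry (the check above shows up as a `servicecheck`). A sketch of the same `/healthy` check in that form, reusing the path and timings from my `fly.toml` with durations written as strings, would look roughly like this. It is not what my config uses, and it has not been verified against this app:

```toml
# Hypothetical alternative: the /healthy check attached to [http_service]
# rather than to a separate [[services]] entry. Not verified for this app.
[[http_service.checks]]
  grace_period = "5s"   # same grace period as the services check
  interval = "10s"      # 10000 ms in the services check
  method = "GET"
  path = "/healthy"
  timeout = "2s"        # 2000 ms in the services check
```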
I currently have 6 machines that are stopped, but I can't kill or delete them.
Please help