Thanks for trying to help, @halfer. For now, the best workaround is to make starting the endpoint dependent on the startup process, which wouldn’t be too hard with Elixir but would still require a bit of refactoring.
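For illustration, here is a minimal sketch of what that workaround could look like. It relies on the fact that an OTP supervisor starts its children in order and waits for each child's `init/1` to return before starting the next one. The module names `Sketch.Startup` and `Sketch.FakeEndpoint` and the `:delay_ms` option are hypothetical and not taken from the reproduction repository; in a real Phoenix app the second child would be the actual endpoint module.

```elixir
defmodule Sketch.Startup do
  # Hypothetical stand-in for the slow startup process.
  use GenServer
  require Logger

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    Logger.info("Startup process started")
    # Placeholder for the real startup work. Because it runs inside init/1,
    # the supervisor does not start the next child until it has finished.
    Process.sleep(Keyword.get(opts, :delay_ms, 200))
    Logger.info("Startup process completed")
    {:ok, :ready}
  end
end

defmodule Sketch.FakeEndpoint do
  # Hypothetical stand-in for the web endpoint (e.g. a Phoenix endpoint).
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> :listening end, name: __MODULE__)
end

children = [
  # Order matters: the endpoint is only started once Startup.init/1 returns.
  {Sketch.Startup, delay_ms: 200},
  Sketch.FakeEndpoint
]

# :rest_for_one restarts the endpoint too if the startup process crashes.
{:ok, _sup} = Supervisor.start_link(children, strategy: :rest_for_one)
```

With this ordering nothing listens on the port until the startup work has completed, so early requests would be refused rather than served by an instance that is not ready. The trade-off is that the whole supervision tree blocks on startup, which is why it needs some refactoring in a real app.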
I’m simply trying to point out to the Fly team that the documented (and logical) behavior of not forwarding requests to machines until their health checks are passing is not working as intended.
I have set up an all-in-one minimal reproduction repository to demonstrate the issue: GitHub - sheerlox-repros/fly_health_check
Here are the resulting logs from following the instructions in the README:
2025-02-22T20:15:06Z proxy[48ed67dc599668] cdg [info]Starting machine
2025-02-22T20:15:06Z app[48ed67dc599668] cdg [info]2025-02-22T20:15:06.547959464 [01JMQMPCFX63KNFJYGREQX0WPR:main] Running Firecracker v1.7.0
2025-02-22T20:15:07Z app[48ed67dc599668] cdg [info] INFO Starting init (commit: 67f51b8b)...
2025-02-22T20:15:07Z app[48ed67dc599668] cdg [info] INFO Preparing to run: `/app/bin/server` as nobody
2025-02-22T20:15:07Z app[48ed67dc599668] cdg [info] INFO [fly api proxy] listening at /.fly/api
2025-02-22T20:15:07Z runner[48ed67dc599668] cdg [info]Machine started in 1.304s
2025-02-22T20:15:07Z proxy[48ed67dc599668] cdg [info]machine started in 1.310374664s
2025-02-22T20:15:08Z app[48ed67dc599668] cdg [info]2025/02/22 20:15:08 INFO SSH listening listen_address=[fdaa:3:d216:a7b:16a:f8cb:a07a:2]:22
2025-02-22T20:15:10Z app[48ed67dc599668] cdg [info]20:15:10.566 [info] Startup process started
2025-02-22T20:15:10Z app[48ed67dc599668] cdg [info]20:15:10.583 [info] Running FlyHealthCheckWeb.Endpoint with Bandit 1.6.7 at :::8080 (http)
2025-02-22T20:15:10Z app[48ed67dc599668] cdg [info]20:15:10.596 [info] Access FlyHealthCheckWeb.Endpoint at https://fly-health-check.fly.dev
2025-02-22T20:15:10Z app[48ed67dc599668] cdg [info] WARN Reaped child process with pid: 690 and signal: SIGUSR1, core dumped? false
2025-02-22T20:15:10Z app[48ed67dc599668] cdg [info]20:15:10.788 [error] Healthcheck failed: :not_ready
2025-02-22T20:15:10Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.224864138s
2025-02-22T20:15:10Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.235562454s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.269394045s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.297547363s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.342454458s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.358317458s
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.009 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.020 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.053 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.060 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.072 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.082 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.105 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.112 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.124 [error] Request before health checks are passing
2025-02-22T20:15:11Z app[48ed67dc599668] cdg [info]20:15:11.126 [error] Request before health checks are passing
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.451439049s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.481055955s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.523875611s
2025-02-22T20:15:11Z proxy[48ed67dc599668] cdg [info]machine became reachable in 3.535894472s
2025-02-22T20:15:15Z app[48ed67dc599668] cdg [info]20:15:15.566 [info] Startup process completed
2025-02-22T20:15:15Z app[48ed67dc599668] cdg [info]20:15:15.794 [info] Healthcheck passed
2025-02-22T20:15:16Z health[48ed67dc599668] cdg [info]Health check on port 8080 is now passing.