Can’t seem to get Healthcheck-based Routing to work.
I have an app with nginx and php-fpm. As php-fpm takes sometime to start I’ve setup php-fpm to respond to ping requests with ping.path = /_fpm/ping
and proxy the requests via nginx with:
location = /_fpm/ping {
access_log off;
include fastcgi.conf;
fastcgi_pass 127.0.0.1:9000;
}
I’ve setup a health check in fly.toml like this:
[[http_service.checks]]
method = "GET"
path = "/_fpm/ping"
grace_period = "3s"
interval = "5s"
timeout = "2s"
Health check is working correctly:
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
--------------------------*---------*----------------*--------------*---------
servicecheck-00-http-80 | passing | 6e8254d5c09d87 | 4m12s ago | pong
--------------------------*---------*----------------*--------------*---------
I have scale to 0 enabled for this app. I expect that when there’s a new request, the machine will start and wait for the health check to pass before the request is sent to the machine. However it seems like the request is sent to the machine right after it’s started and before the health check is passed resulting in 502. This is visible from the logs:
2024-05-28T05:49:34.305 proxy[6e8254d5c09d87] nrt [info] Starting machine
...
2024-05-28T05:49:34.741 runner[6e8254d5c09d87] nrt [info] Machine started in 433ms
2024-05-28T05:49:34.741 proxy[6e8254d5c09d87] nrt [info] machine started in 436.819157ms
2024-05-28T05:49:34.741 app[6e8254d5c09d87] nrt [info] 2024/05/28 05:49:34 INFO SSH listening listen_address=[fdaa:3:1193:a7b:2df:8580:82d:2]:22 dns_server=[fdaa::3]:53
2024-05-28T05:49:34.776 proxy[6e8254d5c09d87] nrt [info] machine became reachable in 34.410035ms
2024-05-28T05:49:34.778 app[6e8254d5c09d87] nrt [info] 2024/05/28 05:49:34 [error] 329#329: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 172.16.1.154, server: _, request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "***.fly.dev"
2024-05-28T05:49:34.778 app[6e8254d5c09d87] nrt [info] 172.16.1.154 - - [28/May/2024:05:49:34 +0000] "GET / HTTP/1.1" 502 157 "-" "curl/8.5.0"
2024-05-28T05:49:34.779 app[6e8254d5c09d87] nrt [info] 2024/05/28 05:49:34 [info] 329#329: *1 client 172.16.1.154 closed keepalive connection
2024-05-28T05:49:34.894 app[6e8254d5c09d87] nrt [info] [28-May-2024 05:49:34] NOTICE: fpm is running, pid 321
2024-05-28T05:49:34.894 app[6e8254d5c09d87] nrt [info] [28-May-2024 05:49:34] NOTICE: ready to handle connections
2024-05-28T05:49:39.180 app[6e8254d5c09d87] nrt [info] 127.0.0.1 - 28/May/2024:05:49:39 +0000 "GET /_fpm/ping" 200
2024-05-28T05:49:39.590 health[6e8254d5c09d87] nrt [info] Health check on port 80 is now passing.
...
2024-05-28T05:49:45.333 app[6e8254d5c09d87] nrt [info] 127.0.0.1 - 28/May/2024:05:49:45 +0000 "GET /index.php" 302
2024-05-28T05:49:45.333 app[6e8254d5c09d87] nrt [info] 172.16.1.154 - - [28/May/2024:05:49:45 +0000] "GET / HTTP/1.1" 302 5 "-" "curl/8.5.0"
As shown from the logs, the first request was sent to the machine right after machine became reachable and before Health check on port 80 is now passing thus resulting immediate 502. I’ve included a normal request at the end after the health check has passed as a reference.
Is this the expected behavior? Did I misunderstand how healthcheck-based routing should work?