I recently upgraded an app to Rails 7.x, which includes the /up health check.
My fly.toml config wasn’t created using fly launch, so I’m manually adding the [[http_service.checks]] section to instruct Fly to use the new Rails health endpoint.
I just have a couple of questions to confirm my understanding of how it works.
Here’s the minimal config I’m planning to use:
[[http_service.checks]]
grace_period = "5s"
interval = "10s"
timeout = "2s"
path = "/up"
Questions
- Do I have to specify method = 'get' and protocol = 'http'? If these are omitted, what method & protocol does the check default to?
- If my config includes a release_command, am I correct in assuming that the check won’t run in the temporary machine/app instance that executes the release command?
- When is the first check attempted? E.g. for the above config, will it first check after 5s (grace period), after 10s (interval) or after 15s (grace period + interval)?
- When deciding on a suitable grace_period, when does the “clock start ticking”?
The docs for this simply say:
grace_period: The time to wait after a Machine starts before checking its health. Make sure this is long enough for your app to start up. For example, if your app takes 2 seconds to start up, give it some runway by setting grace_period to at least 3 seconds.
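To make the three readings from the "first check" question concrete, here is a small Python sketch that turns the config values into candidate offsets. The parse_duration helper is hypothetical and only handles the "s" and "ms" suffixes; it says nothing about how Fly actually parses durations or schedules checks.

```python
# Hypothetical helper: convert a duration string such as "5s" or "1500ms"
# into seconds. Only the two suffixes needed here are supported.
def parse_duration(text: str) -> float:
    if text.endswith("ms"):
        return float(text[:-2]) / 1000
    if text.endswith("s"):
        return float(text[:-1])
    raise ValueError(f"unsupported duration: {text!r}")

# Values copied from the [[http_service.checks]] section above.
check = {"grace_period": "5s", "interval": "10s", "timeout": "2s"}
grace = parse_duration(check["grace_period"])
interval = parse_duration(check["interval"])

# The three possible readings, as offsets from whenever the clock starts.
candidates = {
    "after grace_period": grace,
    "after interval": interval,
    "after grace_period + interval": grace + interval,
}

for label, offset in candidates.items():
    print(f"{label}: first check at t+{offset:g}s")
```

Which of these offsets applies, and what "t" is measured from, is exactly what I'm hoping someone can confirm.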
Using the [[http_service.checks]] config shown above, an excerpt of the logs is below.
- The first health log can be seen at 01:46:34.733. This is ~10s after the runner first starts pulling the container image, and roughly the point in time that the previous running process is killed.
- A second health log appears at 01:46:42.602, which is ~8s after the first health log.
2024-10-24T01:46:24.029 runner[6e824274a06987] syd [info] Pulling container image...
2024-10-24T01:46:30.887 runner[6e824274a06987] syd [info] Successfully prepared image...
2024-10-24T01:46:33.603 runner[6e824274a06987] syd [info] Configuring firecracker
2024-10-24T01:46:34.248 app[6e824274a06987] syd [info] INFO Sending signal SIGINT to main child process w/ PID 327
2024-10-24T01:46:34.254 app[6e824274a06987] syd [info] - Gracefully stopping, waiting for requests to finish
2024-10-24T01:46:34.257 app[6e824274a06987] syd [info] === puma shutdown: 2024-10-24 12:46:34 +1100 ===
2024-10-24T01:46:34.257 app[6e824274a06987] syd [info] - Goodbye!
2024-10-24T01:46:34.500 app[6e824274a06987] syd [info] INFO Main child exited normally with code: 0
2024-10-24T01:46:34.514 app[6e824274a06987] syd [info] INFO Starting clean up.
2024-10-24T01:46:34.517 app[6e824274a06987] syd [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-10-24T01:46:34.518 app[6e824274a06987] syd [info] [598969.556542] reboot: Restarting system
2024-10-24T01:46:34.733 health[6e824274a06987] syd [warn] Health check on port 3000 is in a 'warning' state. Your app may not be responding properly. Services exposed on ports [80, 443] may have intermittent failures until the health check passes.
2024-10-24T01:46:40.357 app[6e824274a06987] syd [info] 2024-10-24T01:46:40.357474982 [01JAY386CG5KFD3V01D541R3AY:main] Running Firecracker v1.7.0
2024-10-24T01:46:40.695 app[6e824274a06987] syd [info] [ 0.267627] PCI: Fatal: No config space access function found
2024-10-24T01:46:41.055 app[6e824274a06987] syd [info] INFO Starting init (commit: 04656915)...
2024-10-24T01:46:41.120 app[6e824274a06987] syd [info] INFO starting statics vsock server
2024-10-24T01:46:41.121 app[6e824274a06987] syd [info] INFO Preparing to run...
2024-10-24T01:46:41.124 app[6e824274a06987] syd [info] INFO [fly api proxy] listening at /.fly/api
2024-10-24T01:46:41.140 runner[6e824274a06987] syd [info] Machine created and started in 17.115s
2024-10-24T01:46:41.415 app[6e824274a06987] syd [info] 2024/10/24 12:46:41 INFO SSH listening listen_address=[fdaa:0:a7e1:a7b:2dd:945d:ba22:2]:22 dns_server=[fdaa::3]:53
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] Puma starting in single mode...
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Puma version: 6.4.3 (ruby 3.3.1-p55) ("The Eagle of Durango")
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Min threads: 3
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Max threads: 3
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Environment: production
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * PID: 327
2024-10-24T01:46:42.602 health[6e824274a06987] syd [error] Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
2024-10-24T01:46:45.035 app[6e824274a06987] syd [info] * Listening on http://0.0.0.0:3000
2024-10-24T01:46:45.044 app[6e824274a06987] syd [info] Use Ctrl-C to stop
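As a sanity check on the timings, this Python sketch computes the gaps between a few of the log lines above. The timestamps are copied from the excerpt; the event names are my own labels, not anything Fly emits.

```python
from datetime import datetime

def ts(text: str) -> datetime:
    """Parse a timestamp in the format used by the log excerpt."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")

# Timestamps copied from the log excerpt; the keys are my own labels.
events = {
    "pull_started": ts("2024-10-24T01:46:24.029"),
    "old_process_exit": ts("2024-10-24T01:46:34.500"),
    "health_warning": ts("2024-10-24T01:46:34.733"),
    "machine_started": ts("2024-10-24T01:46:41.140"),
    "health_failed": ts("2024-10-24T01:46:42.602"),
    "puma_listening": ts("2024-10-24T01:46:45.035"),
}

def gap(start: str, end: str) -> float:
    """Seconds elapsed between two labelled events, rounded to milliseconds."""
    return round((events[end] - events[start]).total_seconds(), 3)

for start, end in [
    ("pull_started", "health_warning"),
    ("old_process_exit", "health_warning"),
    ("health_warning", "health_failed"),
    ("machine_started", "health_failed"),
    ("health_failed", "puma_listening"),
]:
    print(f"{start} -> {end}: {gap(start, end)}s")
```

By these numbers, the failed check is logged about 1.5s after the "Machine created and started" line and about 2.4s before Puma reports that it is listening. Neither gap lines up with the 5s grace period or the 10s interval.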
I’m unable to reconcile the timing of these two health log entries exactly to the check configuration, so any help would be greatly appreciated.