http_service.checks random questions

I recently upgraded an app to Rails 7.x, which includes the /up health check.

My fly.toml config wasn’t created using fly launch, so I’m manually adding the [[http_service.checks]] section to tell Fly to use the new Rails health endpoint.

I just have a couple of questions to confirm my understanding of how it works.

Here’s the minimal config I’m planning to use:

	[[http_service.checks]]
		grace_period = "5s"
		interval = "10s"
		timeout = "2s"
		path = "/up"

Questions

  1. Do I have to specify method = 'get' and protocol = 'http'? If these are omitted, what method & protocol does the check default to?

  2. If my config includes a release_command, am I correct in assuming that the check won’t run in the temporary machine/app instance that executes the release command?

  3. When is the first check attempted?
    e.g. for the above config, will it first check after 5s (grace period), after 10s (interval) or after 15s (grace period + interval)?

  4. When deciding on a suitable grace_period, when does the “clock start ticking”?
    The docs for this simply say:

  • grace_period: The time to wait after a Machine starts before checking its health. Make sure this is long enough for your app to start up. For example, if your app takes 2 seconds to start up, give it some runway by setting grace_period to at least 3 seconds.

Using the [[http_service.checks]] config shown above, an excerpt of the logs is below.

  • The first health log can be seen at 01:46:34.733.
    This is ~10s after the runner first starts pulling the container image, and roughly the point in time that the previous running process is killed.

  • A second health log appears at 01:46:42.602, about 8s after the first.

2024-10-24T01:46:24.029 runner[6e824274a06987] syd [info] Pulling container image...
2024-10-24T01:46:30.887 runner[6e824274a06987] syd [info] Successfully prepared image...
2024-10-24T01:46:33.603 runner[6e824274a06987] syd [info] Configuring firecracker
2024-10-24T01:46:34.248 app[6e824274a06987] syd [info] INFO Sending signal SIGINT to main child process w/ PID 327
2024-10-24T01:46:34.254 app[6e824274a06987] syd [info] - Gracefully stopping, waiting for requests to finish
2024-10-24T01:46:34.257 app[6e824274a06987] syd [info] === puma shutdown: 2024-10-24 12:46:34 +1100 ===
2024-10-24T01:46:34.257 app[6e824274a06987] syd [info] - Goodbye!
2024-10-24T01:46:34.500 app[6e824274a06987] syd [info] INFO Main child exited normally with code: 0
2024-10-24T01:46:34.514 app[6e824274a06987] syd [info] INFO Starting clean up.
2024-10-24T01:46:34.517 app[6e824274a06987] syd [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-10-24T01:46:34.518 app[6e824274a06987] syd [info] [598969.556542] reboot: Restarting system
2024-10-24T01:46:34.733 health[6e824274a06987] syd [warn] Health check on port 3000 is in a 'warning' state. Your app may not be responding properly. Services exposed on ports [80, 443] may have intermittent failures until the health check passes.
2024-10-24T01:46:40.357 app[6e824274a06987] syd [info] 2024-10-24T01:46:40.357474982 [01JAY386CG5KFD3V01D541R3AY:main] Running Firecracker v1.7.0
2024-10-24T01:46:40.695 app[6e824274a06987] syd [info] [ 0.267627] PCI: Fatal: No config space access function found
2024-10-24T01:46:41.055 app[6e824274a06987] syd [info] INFO Starting init (commit: 04656915)...
2024-10-24T01:46:41.120 app[6e824274a06987] syd [info] INFO starting statics vsock server
2024-10-24T01:46:41.121 app[6e824274a06987] syd [info] INFO Preparing to run...
2024-10-24T01:46:41.124 app[6e824274a06987] syd [info] INFO [fly api proxy] listening at /.fly/api
2024-10-24T01:46:41.140 runner[6e824274a06987] syd [info] Machine created and started in 17.115s
2024-10-24T01:46:41.415 app[6e824274a06987] syd [info] 2024/10/24 12:46:41 INFO SSH listening listen_address=[fdaa:0:a7e1:a7b:2dd:945d:ba22:2]:22 dns_server=[fdaa::3]:53
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] Puma starting in single mode...
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Puma version: 6.4.3 (ruby 3.3.1-p55) ("The Eagle of Durango")
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Min threads: 3
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Max threads: 3
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * Environment: production
2024-10-24T01:46:41.861 app[6e824274a06987] syd [info] * PID: 327
2024-10-24T01:46:42.602 health[6e824274a06987] syd [error] Health check on port 3000 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
2024-10-24T01:46:45.035 app[6e824274a06987] syd [info] * Listening on http://0.0.0.0:3000
2024-10-24T01:46:45.044 app[6e824274a06987] syd [info] Use Ctrl-C to stop

I’m unable to reconcile the timing of these two health log entries exactly to the check configuration, so any help would be greatly appreciated.
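
For reference, the gaps between the log entries above can be worked out from the timestamps. This is only a small arithmetic sketch: it assumes the log timestamps are UTC (the excerpt shows no zone, though they line up with Puma's +1100 line), and the event names are made up for illustration.

```ruby
require "time"

# Timestamps copied from the log excerpt; the trailing "Z" (UTC) is an assumption.
EVENTS = {
  image_pull:     "2024-10-24T01:46:24.029Z",
  first_health:   "2024-10-24T01:46:34.733Z",
  puma_starting:  "2024-10-24T01:46:41.861Z",
  second_health:  "2024-10-24T01:46:42.602Z",
  puma_listening: "2024-10-24T01:46:45.035Z"
}.transform_values { |stamp| Time.iso8601(stamp) }

# Seconds elapsed between two named events, rounded to milliseconds.
def gap(from, to)
  (EVENTS.fetch(to) - EVENTS.fetch(from)).round(3)
end

puts gap(:image_pull, :first_health)       # image pull start -> first health log
puts gap(:first_health, :second_health)    # first -> second health log
puts gap(:puma_starting, :puma_listening)  # Puma boot time
puts gap(:second_health, :puma_listening)  # second health log -> Puma listening
```

That works out to roughly 10.7s, 7.9s, 3.2s and 2.4s. Note that Puma only starts listening after the second health entry has already been logged.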

I’ve done some further investigation/troubleshooting and can add the following to the above:

  • Adding method = 'get' and protocol = 'http' to the config does not seem to make any difference. The health check never passes, and the deployment ultimately fails with this error:
Error: failed to update machine 6e824274a06987: Unrecoverable error: timeout reached waiting for health checks to pass for machine 6e824274a06987: failed to get VM 6e824274a06987: Get "https://api.machines.dev/v1/apps/loottest/machines/6e824274a06987": net/http: request canceled
  • If I comment out the entire [[http_service.checks]] section and redeploy, I can confirm that the health endpoint correctly responds with a 200 OK by navigating my browser to https://<tls server name>/up, so the app and health endpoint seem to be fine

  • Since my app uses http_service.force_https = true, I tried changing the health check config to use protocol = 'https' and tls_skip_verify = true. With this configuration I can see entries in the logs where the health endpoint is being hit, but the following error is logged, which suggests that using protocol = 'http' was correct:

HTTP parse error, malformed request: #<Puma::HttpParserError: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?>

No matter what I try, I can’t get the http_service.checks config to work, even though my app and the /up endpoint work fine when there’s no health check configured.

Any help would be appreciated.

I was hoping that you might get official responses to your questions (3) and (4) above. (Even though official replies are not the norm in the community forum, of course.)

In the interim…

Getting a 200 OK in the browser unfortunately isn’t conclusive: the browser follows redirects, and a redirect is counted as a failed health check. (Which surprises a lot of people.)

The best way to verify is from inside the machine itself…

$ fly ssh console
# apt-get update
# apt-get install --no-install-recommends curl
# curl -i 'http://localhost:3000/up'  # no `-L` (!)

(Assuming a Debian-based image.)

The response needs to be a straight 200 OK; even a 301 or 302 redirect will count as unhealthy, :dragon:.
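
The same check can be scripted. Below is a minimal Ruby sketch, assuming the app listens on localhost:3000 as in the curl command above. `healthy_status?` and `probe` are hypothetical helper names, and the "only 200 passes" rule is taken from the description above rather than from Fly's documentation. Net::HTTP does not follow redirects on its own, so a 301 is reported as-is, like curl without `-L`.

```ruby
require "net/http"
require "uri"

# Per the rule described above: only a plain 200 counts as healthy.
def healthy_status?(code)
  code.to_i == 200
end

# Issues a single GET without following redirects and reports what came back.
def probe(url)
  response = Net::HTTP.get_response(URI(url))
  {
    code: response.code,
    location: response["location"], # redirect target, if any
    healthy: healthy_status?(response.code)
  }
end

# Example, run from inside the machine:
#   p probe("http://localhost:3000/up")
```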

Hope this helps a little!

Thanks @mayailurus , that is super helpful and yes, you’re absolutely right - turns out I’m getting a 301 response, not a 200.

The redirect is from http:// to https://, which I’m guessing is either because of…

[http_service]
  force_https = true

…in my fly.toml config, or because of…

config.force_ssl = true

…in my Rails app’s config/environments/production.rb file. This file and my config/puma.rb are both the default, out-of-the-box Rails versions.

As mentioned above, when I tried using https and tls_skip_verify for the health check, I was getting an error from Puma that seemed to suggest it doesn’t accept SSL connections; so I just need to figure out how my setup differs from one generated using fly launch.

Thanks for the tip though, that has saved me some time. Once I get this sorted, I might suggest a few changes to the fly docs to make this all a bit easier for the next person that comes along.

OK, I’ve figured it out. :tada:

Answering some of my own questions here:

On question 1: no, you don’t need to specify method = 'get' and protocol = 'http'; these appear to be the default values if left out.
It is safe to omit them from your fly.toml config if you prefer a more minimal configuration.

As for where the redirect comes from: it was the latter, config.force_ssl = true. With this setting in your Rails app, a request to http:// will typically receive a 301 redirect to https://.

However, Rails will not redirect if the X-Forwarded-Proto: https header is present in the request. This header basically tells Rails “although you see this as an http request, an SSL-terminating proxy is in front of you, and it originated from the browser as https.”

The Fly proxy/load balancer sets this header for requests coming to your app from the outside, but it seems that it doesn’t automatically set this header for health check requests.
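
To make the redirect behaviour concrete, here is a deliberately simplified Ruby model of the decision described above. It is not Rails' actual implementation: `redirect_to_https?` is a hypothetical name, and real header handling has more cases than this sketch covers.

```ruby
# Toy model: a request is treated as secure if it arrived over https,
# or if a proxy in front of the app declared https via X-Forwarded-Proto.
def redirect_to_https?(scheme:, headers: {})
  forwarded = headers["X-Forwarded-Proto"]
  secure = scheme == "https" || forwarded == "https"
  !secure
end

# Health check over plain http without the header: would be redirected.
puts redirect_to_https?(scheme: "http")

# Same request carrying the header the proxy adds for outside traffic: no redirect.
puts redirect_to_https?(scheme: "http", headers: { "X-Forwarded-Proto" => "https" })
```

The first call corresponds to the failing health check (301), the second to a request that reaches the app with the header set.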

(Tip: alternatively, you could enable the Rails config.assume_ssl = true setting if you’re sure that all requests hitting your Rails app come through an SSL-terminating proxy or load balancer. I chose not to enable it because I’m trying to stick as close to vanilla Rails as I can, and the Rails defaults don’t enable it.)
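
For anyone who does want that alternative, here is a sketch of where the setting would go, assuming a standard Rails 7.1+ config/environments/production.rb. It is a config fragment rather than a standalone script, and it was not tested against Fly health checks in this thread.

```ruby
# config/environments/production.rb (fragment, not a standalone script)
Rails.application.configure do
  # Assume every request reached the app through an SSL-terminating proxy.
  config.assume_ssl = true

  # Unchanged from the setup described above.
  config.force_ssl = true
end
```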

Here’s what my final health check config looks like:

	[[http_service.checks]]
		grace_period = "5s"
		interval = "10s"
		timeout = "2s"
		path = "/up"
		[http_service.checks.headers]
			X-Forwarded-Proto = "https"
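
To confirm from inside the machine that the extra header is what turns the 301 into a 200, the check request can be approximated in Ruby. This is only a sketch: `build_check_request` and `run_check` are hypothetical helpers, localhost:3000 is taken from the logs above, the 2-second timeouts mirror timeout = "2s", and the real health checker may send other headers that this does not reproduce.

```ruby
require "net/http"

# Builds a GET for the health path with the same header as the config above.
def build_check_request(path, forwarded_proto: "https")
  request = Net::HTTP::Get.new(path)
  request["X-Forwarded-Proto"] = forwarded_proto
  request
end

# Sends the request over plain http and returns the status code as a string.
def run_check(host: "localhost", port: 3000, path: "/up")
  Net::HTTP.start(host, port, open_timeout: 2, read_timeout: 2) do |http|
    http.request(build_check_request(path)).code
  end
end

# Example, run from inside the machine:
#   puts run_check
```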

I hope this can help the next poor person that struggles with manually configuring health checks for their Rails application.

If/when I have time, I’ll propose some docs changes that include some of this information, as it would have saved me a heap of time if it were better documented.
