HTTP Health Checks won't pass

I have a Docker app (appmasker-node) that just runs Caddy. On startup, Caddy fetches its config from a Node API, which tells it to respond to /health-check. The app starts up and I can hit /health-check successfully. However, the health checks never pass, and the build gets replaced by an old successful deployment.

Here is my latest .toml:

kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  auto_rollback = true

[mounts]
destination = "/config"
source = "config"

# port 80 mapping
[[services]]
internal_port = 80
protocol = "tcp"

[[services.ports]]
port = "80"

# port 443 mapping
[[services]]
internal_port = 443
protocol = "tcp"

[[services.ports]]
port = "443"

# port 2021 mapping (admin api)
[[services]]
internal_port = 10001
protocol = "tcp"

[[services.ports]]
port = "10001"

[[services.http_checks]]
  grace_period = "3m"
  interval = 10000
  method = "get"
  protocol = "https"
  path = "/health-check"
  timeout = 5000
  tls_skip_verify = true

Are you serving a valid certificate from Caddy? You can run fly checks list to see exactly what the failure is. I’m betting it can’t connect over HTTPS for some reason.

That was revealing. I had to switch the order of my services in the toml so that it didn’t try to access port 10001 for the check. I see that the IP Address being hit is that of the local instance and not the public domain.
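
For anyone wondering why the order mattered: in TOML, a [[services.http_checks]] table attaches to the most recently declared [[services]] entry. In the first config the check came after the port 10001 service, so that is the port it was run against. A minimal illustration, using the ports from the config above:

```toml
[[services]]
internal_port = 80
protocol = "tcp"

# This check belongs to the port 80 service above,
# because that is the most recent [[services]] entry.
[[services.http_checks]]
path = "/health-check"
protocol = "http"

[[services]]
internal_port = 10001
protocol = "tcp"
# No checks are declared after this header, so none attach to port 10001.
```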

I updated the toml so that the health check runs on port 80, but as the output below shows, it still makes the call with https://

kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  auto_rollback = true

[mounts]
destination = "/config"
source = "config"


# port 2021 mapping (admin api)
[[services]]
internal_port = 10001
protocol = "tcp"

[[services.ports]]
port = "10001"

# port 443 mapping
[[services]]
internal_port = 443
protocol = "tcp"

[[services.ports]]
port = "443"

# port 80 mapping
[[services]]
internal_port = 80
protocol = "tcp"

[[services.http_checks]]
  grace_period = "3m"
  interval = 10000
  method = "get"
  protocol = "http"
  path = "/health-check"
  timeout = 5000
  tls_skip_verify = true

[[services.ports]]
port = "80"

But I still get this:

NAME                             STATUS   ALLOCATION REGION TYPE LAST UPDATED OUTPUT                               
c9c500b52d465d25f1653e631e15f72b critical 45a3b4c0   iad    HTTP 5m27s ago    Get                                  
                                                                              "https://172.19.2.138/health-check": 
                                                                              remote error: tls: internal error                           

Do you see a Caddy error as well that tells you why the health check request is failing?

Yes, I’m using on-demand TLS, which makes an API call to my server to determine whether a given host should be issued a cert. In this case, the host is 172.19.0.218, which I don’t have as a valid host (I don’t really want to keep track of all my regions’ internal IP addresses), but I do have appmasker-node.fly.dev as a valid host. I believe this is an issue because the health check is using https:// even though I specified protocol = "http" in my toml.

2021-10-28T18:13:12.787 app[68e0fea8] iad [info] {"level":"debug","ts":1635444792.7874205,"logger":"http.stdlib","msg":"http: TLS handshake error from 136.144.56.219:61854: certificate for hostname '172.19.0.218' not allowed; non-2xx status code 404 returned from https://api.appmasker.com/domain/check"}

That checks out, yeah. I don’t think the health check here should be HTTPS. @michael @kurt Are we missing something?

Right, health checks should be HTTP; checking HTTPS is hard. I don’t think you can even send SNI with a health check, so the Let’s Encrypt on-demand stuff won’t work.

The best way to do this is to configure Caddy to not require TLS for the health check URL. I’m not completely familiar with Caddy config but that should be possible.

Derp, I misread what was happening here.

I’m betting Caddy is redirecting the health check from http → https. So we’re not making an https health check call on purpose, but we are following the redirect (which then fails). Configuring Caddy to not require https for that specific endpoint should fix you up.

You’re right, Caddy does redirect http → https in this case. Using on-demand tls while also specifying an exception for an IP address is non-trivial with Caddy. I’ll opt for TCP health checks for now while I figure this quirk out. Thank you!
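
For reference, a TCP check on the port 80 service might look roughly like the sketch below. The service and port stanzas are copied from the config earlier in the thread; the grace_period, interval, and timeout values are illustrative assumptions, not values anyone here has tested.

```toml
[[services]]
internal_port = 80
protocol = "tcp"

[[services.ports]]
port = "80"

# TCP check: passes if a connection to internal_port can be opened.
# No HTTP request is sent, so Caddy's http -> https redirect is never followed.
[[services.tcp_checks]]
grace_period = "30s"  # assumed value
interval = "15s"      # assumed value
timeout = "2s"        # assumed value
```

The trade-off is that a TCP check only confirms Caddy is listening; it does not confirm that /health-check actually returns a 2xx response.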

When I’ve done this with nginx, I’ve created a separate HTTP server on a different port with no TLS requirement. Assuming you can do that in Caddy, it also might work! TCP checks will work just fine though.
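
On the fly.toml side, the separate-port approach could be sketched as follows. Port 8080 is a hypothetical choice, and the sketch assumes Caddy has been configured separately to serve /health-check over plain HTTP on that port with no redirect; that Caddy configuration is not shown here and would need to be worked out.

```toml
# Hypothetical extra service used only for health checks.
# Assumes Caddy answers plain HTTP on 8080 without redirecting to https.
[[services]]
internal_port = 8080  # assumed port, not from the original config
protocol = "tcp"

# Note: this also exposes 8080 publicly, like the other services in this config.
[[services.ports]]
port = "8080"

[[services.http_checks]]
grace_period = "3m"
interval = 10000
method = "get"
protocol = "http"
path = "/health-check"
timeout = 5000
```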