App failing health check doesn't receive another health check

We notice a weird issue recently. Once our app failed a health check, fly proxy won’t do subsequent health check hence the machine stucks at the “critical” state.

Also we can’t reproduce this consistently

Screenshot below shows the log of our health check, notice that the last health check failed at 11:37 UTC, and there is no subsequent activity for this machine:

This is the sate of our machine, notice that the health check API responded with timeout

This is our health check configuration

[http_service]
  internal_port = 30002
  force_https = true
  auto_stop_machines = "off"
  auto_start_machines = true
  min_machines_running = 2
  processes = ['app']

  [[http_service.checks]]
    interval = '30s'
    timeout = '10s'
    grace_period = '30s'
    method = 'GET'
    path = '/health'

Turns out the service has 100% CPU, so it’s not a fly issue

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.