We notice a weird issue recently. Once our app failed a health check, fly proxy won’t do subsequent health check hence the machine stucks at the “critical” state.
Also we can’t reproduce this consistently
Screenshot below shows the log of our health check, notice that the last health check failed at 11:37 UTC, and there is no subsequent activity for this machine:
This is the sate of our machine, notice that the health check API responded with timeout
This is our health check configuration
[http_service]
internal_port = 30002
force_https = true
auto_stop_machines = "off"
auto_start_machines = true
min_machines_running = 2
processes = ['app']
[[http_service.checks]]
interval = '30s'
timeout = '10s'
grace_period = '30s'
method = 'GET'
path = '/health'