Health checks on Machines

Is any form of (local or remote) health check available for Machine apps? If so, what’s the behaviour when a check fails? I’d expect the Machine to be torn down as if it had been hit with fly m stop.

I’ve asked this before but got no answers [0][1]. Today the main app process on one of the Machine VMs stopped responding on its service ports (reason unknown). The VM kept receiving traffic it couldn’t serve, and eventually hit 100% CPU usage (also unclear why). Ideally, health checks would have kicked in and stopped this Machine, rather than Fly Proxy continuing to send requests to a Machine that is in no position to fulfill them.
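To make concrete what I mean by a "local" health check, here is a rough sketch of a watchdog that could run inside the VM next to the app. It is only an illustration, not anything Fly provides: the function names, the 127.0.0.1:8080 address, and the thresholds are all made up, and I'm assuming that exiting the process is an acceptable way to take the VM out of service. Note that a plain TCP connect only shows the port is accepting connections, so an app that accepts but never replies would still pass.

```python
import socket
import time


def port_is_responsive(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def watchdog(host, port, max_failures=3, interval=5.0,
             probe=port_is_responsive, sleep=time.sleep):
    """Probe until max_failures consecutive probes fail.

    Returns the total number of probes made. The probe and sleep
    callables are injectable so the loop can be exercised without a
    real service or real delays.
    """
    failures = 0
    probes = 0
    while failures < max_failures:
        probes += 1
        if probe(host, port):
            failures = 0  # any success resets the streak
        else:
            failures += 1
        if failures < max_failures:
            sleep(interval)
    return probes


# Hypothetical usage (address and port are placeholders):
#   watchdog("127.0.0.1", 8080)
#   raise SystemExit(1)  # assumption: exiting is how the VM gets stopped
```

Requiring several consecutive failures keeps a single slow response from stopping the VM; what happens after the exit would depend on how the Machine is configured to restart.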

The only thing in logs was a running stream of:

error.message="failed to connect to fly machine: Supposedly started, and not stopped, but: connection timed out" 2022-12-01T19:33:06Z proxy[3d8dd59fe15689] sea [error]
error.message="failed to connect to fly machine: Supposedly started, and not stopped, but: connection timed out" 2022-12-01T19:33:06Z proxy[3d8dd59fe15689] sea [error]
error.message="failed to connect to fly machine: Supposedly started, and not stopped, but: connection timed out" 2022-12-01T19:33:06Z proxy[3d8dd59fe15689] sea [error]
error.message="failed to connect to fly machine: Supposedly started, and not stopped, but: connection timed out" 2022-12-01T19:33:06Z proxy[3d8dd59fe15689] sea [error]

[0] How do I change the restart policy for Machines? - #8 by ignoramous

[1] Non-service health checks - #5 by ignoramous

Unfortunately, the [checks] block doesn’t work for Machine apps. See: Non-service health checks - #5 by ignoramous