If I increase the interval and check the events associated with the machine instead of its current state, that works quite well. The logic is just a lot more complicated.
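For anyone trying the same thing, here is a rough sketch of the event-based approach. It assumes each event is a dict with a `status` string and a `timestamp` in epoch milliseconds, and that `"started"` marks a running machine; check those names and status values against what the API actually returns for your machines. `running_seconds` and its parameters are made-up names for illustration.

```python
from typing import Iterable, Mapping

# Assumed status values; adjust to match the events you actually see.
RUNNING = {"started"}
NOT_RUNNING = {"stopped", "suspended", "destroyed"}


def running_seconds(
    events: Iterable[Mapping],
    window_start_ms: int,
    window_end_ms: int,
    running_at_start: bool = False,
) -> float:
    """Sum the time a machine spent running inside a polling window."""
    total_ms = 0
    # If the machine was already running when the window opened, count from there.
    started_at = window_start_ms if running_at_start else None

    for ev in sorted(events, key=lambda e: e["timestamp"]):
        # Clamp each event into the window so out-of-range events add nothing extra.
        ts = min(max(ev["timestamp"], window_start_ms), window_end_ms)
        status = ev.get("status")
        if status in RUNNING and started_at is None:
            started_at = ts
        elif status in NOT_RUNNING and started_at is not None:
            total_ms += ts - started_at
            started_at = None

    # Still running when the window closed: count up to the window end.
    if started_at is not None:
        total_ms += window_end_ms - started_at
    return total_ms / 1000


sample = [
    {"status": "started", "timestamp": 1_000},
    {"status": "stopped", "timestamp": 4_000},
    {"status": "started", "timestamp": 9_000},
]
print(running_seconds(sample, 0, 10_000))
```

Working from events means a short start/stop cycle between two polls still gets counted, which a point-in-time state check would miss.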
Your product might be eligible for the Extensions Program. If so, you’ll get access to Outbound Webhooks which include machine state changes.
If you don’t go that route, API polling and direct Prometheus integration are the two threads I’d pull on. A word of caution: Fly Metrics can be a bit rocky at times, so I’d only suggest that interface for convenience, and make sure you have a reconciliation plan for true-up.
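One possible shape for that reconciliation step, as a sketch only: compare per-machine totals derived from metrics against totals derived from API polling and flag anything that drifts past a tolerance. Treating the polled numbers as the source of truth is my assumption here, and `reconcile`, the machine IDs, and the 60-second tolerance are all hypothetical.

```python
from typing import Dict, Mapping


def reconcile(
    metrics_seconds: Mapping[str, float],
    polled_seconds: Mapping[str, float],
    tolerance_seconds: float = 60.0,
) -> Dict[str, float]:
    """Return per-machine corrections (polled minus metrics) beyond the tolerance."""
    adjustments: Dict[str, float] = {}
    # Cover machines that appear in either source, since one may have gaps.
    for machine_id in sorted(set(metrics_seconds) | set(polled_seconds)):
        from_metrics = metrics_seconds.get(machine_id, 0.0)
        from_polling = polled_seconds.get(machine_id, 0.0)
        delta = from_polling - from_metrics
        if abs(delta) > tolerance_seconds:
            adjustments[machine_id] = delta
    return adjustments


# Hypothetical machine IDs and totals, in seconds.
from_metrics = {"machine-a": 3600, "machine-b": 100}
from_polling = {"machine-a": 3590, "machine-b": 400, "machine-c": 120}
print(reconcile(from_metrics, from_polling))
```

A positive correction means metrics under-counted that machine, and a negative one means they over-counted.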