Flycast returns PR03/PU03 ("unreachable worker host") between Fly Machines, even when the target machine is started and healthy

We’re seeing repeatable private routing failures from one Fly Machine app to another Fly Machine app via Flycast.

  • Target app is running on Fly Machines (same org, region nrt)

  • Target app was created via the Machines API, and its network status is as follows

  • Target machine starts successfully and passes health checks (they return 200)

  • Source machine gets a 502 after ~35-36 s

  • Target machine logs at the same time show:

    • [PR03] could not find a good candidate within 40 attempts at load balancing

    • last error: [PU03] unreachable worker host. the host may be unhealthy. this is a Fly issue.
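
To make the timing above easier to reproduce, here is a minimal sketch of a probe that could be run from the source machine. The URL `http://target-app.flycast/health`, the `probe` and `classify` helpers, and the 30-second threshold are all illustrative assumptions, not values taken from our setup or from Fly's documentation.

```python
import time
import urllib.error
import urllib.request

# Hypothetical placeholder; substitute the target app's real Flycast address.
TARGET_URL = "http://target-app.flycast/health"


def classify(status, elapsed_s):
    """Roughly bucket a probe result; the thresholds are illustrative guesses."""
    if status is not None and 200 <= status < 400:
        return "ok"
    if status == 502 and elapsed_s >= 30:
        # Resembles the symptom in this thread: a 502 only after ~35 s of waiting.
        return "proxy-exhausted-retries"
    if status is None:
        return "no-http-response"
    return "other-http-error"


def probe(url, timeout_s=60.0):
    """Issue one GET and return (status_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # an HTTP error response, e.g. a 502 from the proxy
    except (urllib.error.URLError, OSError):
        status = None  # DNS failure, refused connection, timeout, and so on
    return status, time.monotonic() - start
```

Calling `probe(TARGET_URL)` a few times and passing each result to `classify` separates a slow 502 (the pattern described above) from a fast failure or a missing response, which may help narrow down whether the app or the path to it is at fault.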

I’m fairly lost at this point.
Has anyone seen similar behavior? What diagnostics or checklist would you recommend to isolate whether this is a service config issue, a process-group/routing setup issue, or platform-side networking/proxy behavior?

:waving_hand: This should be fixed now. There was a problem between the two specific hosts you were (un)fortunate enough to land on. Sorry about that!

Thanks for the update. Yes, I see it working.

Is this the type of issue that won’t show up on Fly status? I ask because I’d been having this problem for hours and checked Fly status several times.

Our status page only shows platform-wide issues (or issues affecting at least multiple nodes); this was an issue specific to a single pair of hosts. I do agree it should still have generated a warning in your app’s dashboard (we have something specifically for this kind of single-host failure), but that wasn’t triggered this time, which is a problem I am going to fix. Apologies for the inconvenience.


Understood. I really appreciate the quick response, thanks.