We’re seeing repeatable private routing failures from one Fly Machine app to another Fly Machine app via Flycast.
- The target app is running on Fly Machines (same org, region nrt).
- The target app is created via the Machines API, and its network status is as follows.
- The target machine starts successfully and passes health checks (the health check returns 200).
- The source machine gets a 502 after ~35-36s.
- Logs for the target machine at the same time show:
  [PR03] could not find a good candidate within 40 attempts at load balancing
  last error: [PU03] unreachable worker host. the host may be unhealthy. this is a Fly issue.
I’m fairly lost at this point.
Has anyone seen similar behavior, and what diagnostics or checklist would you recommend to isolate whether this is a service config issue, a process-group/routing setup issue, or platform-side networking/proxy behavior?
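
To make the 502 after ~35-36s easier to reproduce and compare across runs, a small timing probe along these lines could be run from the source machine. This is only a sketch: the names `probe` and `run` are hypothetical helpers, and the URL passed in (for example `http://<target-app>.flycast/`) is a placeholder that has to match the target app's actual name and service port.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout: float = 60.0) -> Tuple[Optional[int], float, str]:
    """Issue one GET and return (status, elapsed_seconds, detail).

    status is None when no HTTP response was received at all
    (DNS failure, refused connection, or client-side timeout).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, detail = resp.status, "ok"
    except urllib.error.HTTPError as exc:
        # Non-2xx responses such as a 502 land here with a status code.
        status, detail = exc.code, str(exc.reason)
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        status, detail = None, str(exc)
    return status, time.monotonic() - start, detail


def run(url: str, attempts: int = 5, pause: float = 1.0) -> None:
    """Probe the URL several times and print one line per attempt."""
    for attempt in range(1, attempts + 1):
        status, elapsed, detail = probe(url)
        print(f"attempt={attempt} status={status} "
              f"elapsed={elapsed:.1f}s detail={detail}")
        time.sleep(pause)
```

Calling `run("http://<target-app>.flycast/")` with the placeholder filled in prints the status and elapsed time per attempt. A consistent elapsed time before each 502 would suggest the request is waiting on a fixed retry or timeout budget rather than failing inside the target app itself.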
