Hi Fly.io Support,
I’m running into a persistent routing issue between two apps in the same organisation (webhouse) and would appreciate
your help diagnosing it.
Setup
- webhouse-cronjobs — a cron scheduler app, region: arn
- webhouse-whop — a monitoring app, region: arn, custom domain whop.webhouse.net
Both apps are in the webhouse org, both deployed to Stockholm (arn).
The problem
When webhouse-cronjobs makes outbound HTTP POST requests to https://whop.webhouse.net, about half of the requests
fail with TypeError: fetch failed after 94–221 ms. The other half succeed normally.
In webhouse-whop’s logs I can see the corresponding error on the proxy side:
error.message="could not complete HTTP request to instance: client error (SendRequest)"
proxy fra request.url="/"
The FRA (Frankfurt) edge proxy is handling the request but failing to forward it to the ARN machine. Since both apps
are in the same org and same region, I would expect traffic to route ARN→ARN directly.
What we tried
- fly-prefer-region: arn header on all outbound requests. No reliable improvement; still a ~50% failure rate.
- Dedicated IPv4 (37.16.16.161) replacing the shared IP, plus direct DNS A records (no CNAME chain). Same failure
rate.
- Retry logic (3 retries, 5s fixed delay). All 3 retries fail on the same request, so the underlying routing issue
persists across retries.
- Flycast (http://webhouse-whop.flycast). We allocated a private IPv6 (fdaa:2c:438c:0:1::3) and DNS resolves correctly
from the cronjobs machine, but the connection fails with an unexpected EOF/SSL error. Our app uses [http_service]
with force_https = true; we suspect this may interfere with Flycast even on the private network.
- .internal hostname on port 3000. Connection refused, likely because Next.js binds to IPv4 only and the .internal
address is IPv6.
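For reference, the retry behaviour described above can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not our exact code: postWithRetry, RetryOptions and FetchLike are illustrative names, "3 retries" is read as three re-attempts after the first call, and the fetch function is passed in as a parameter (Node's global fetch should fit that shape). The sketch also logs the error's cause property, which with Node's built-in fetch usually carries the underlying network error behind the generic TypeError: fetch failed.

```typescript
// Illustrative sketch only: names and defaults are assumptions, not our production code.

// Minimal shape of the fetch function we depend on; Node's global fetch is compatible.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

interface RetryOptions {
  retries: number; // re-attempts after the first call
  delayMs: number; // fixed delay between attempts, in milliseconds
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function postWithRetry(
  url: string,
  payload: unknown,
  options: RetryOptions,
  fetchImpl: FetchLike,
): Promise<{ ok: boolean; status: number; attempts: number }> {
  let lastError: unknown;
  const maxAttempts = options.retries + 1;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetchImpl(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify(payload),
      });
      return { ok: res.ok, status: res.status, attempts: attempt };
    } catch (err) {
      lastError = err;
      // The generic "fetch failed" TypeError often hides the real error in `cause`.
      const cause = err instanceof Error ? (err as Error & { cause?: unknown }).cause : undefined;
      console.error(`attempt ${attempt}/${maxAttempts} failed:`, err, "cause:", cause);
      // Wait the fixed delay before the next attempt, but not after the last one.
      if (attempt < maxAttempts) {
        await sleep(options.delayMs);
      }
    }
  }
  // Every attempt threw: surface the last error to the caller.
  throw lastError;
}
```

Called as postWithRetry("https://whop.webhouse.net", payload, { retries: 3, delayMs: 5000 }, fetch), this makes up to four attempts five seconds apart. In our case every attempt for an affected request fails, which is why we believe the problem is in routing rather than a transient error.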
Current workaround
We moved from an external scheduler to node-cron running inside webhouse-whop itself, calling http://127.0.0.1:3000
via loopback. This works, but it means we can no longer use webhouse-cronjobs to trigger jobs on webhouse-whop, which
defeats the purpose of having a dedicated scheduler.
We also had to stop HTTP health-probing our other Fly apps (e.g., webhouse-whapi) from webhouse-whop because those
probes intermittently fail with the same FRA routing issue — even though the apps are accessible from outside Fly
without any problems.
Questions
- Why does traffic between two apps in the same org and same region (arn) route through the FRA edge proxy at all?
- Is there a supported way to make Fly apps in the same org communicate reliably, either via Flycast or another
private networking mechanism, when using [http_service] with force_https = true?
- Is the Flycast SSL issue a known limitation with [http_service]? Would switching to [[services]] resolve it?
Happy to provide full logs, app names, or any other details. This is affecting our production monitoring
infrastructure.
Thanks,
Christian Broberg
WebHouse