Hi team,
Since around 2026-05-06 ~08:35 UTC, app machines in fra have lost public IPv6 egress. There is no IPv6 default route inside the container, and every TCP/HTTP attempt to a public AAAA host (Google, Cloudflare, AWS Cognito eu-central-1) times out. IPv4 to the same hosts works. 6PN-internal IPv6 between Fly machines and to Postgres on *.internal still works fine — the machine is reachable over its fdaa:… 6PN address for SSH and the app’s database connections are healthy.
This affects every machine in the app: a fly machine restart did not change the symptom, and a fly secrets-driven rolling restart that produced a new machine version also did not. So this isn’t a single bad host.
Reproduction (run inside any started machine)
Save as fly-ipv6-repro.sh and pipe through fly ssh console:
fly ssh console --app <your-app> -C "bash -s" < fly-ipv6-repro.sh
…or just paste the body of the script directly into an interactive fly ssh console session. Script:
#!/usr/bin/env bash
# Self-contained IPv6 egress reproducer. Run from inside a Fly machine.
# Exit codes inside the body don't matter; we just print pass/fail per check.
set -u
echo "=== machine=$(hostname) uptime=$(uptime -p) when=$(date -u +%FT%TZ) ==="
echo
echo "--- IPv6 default route ---"
ip -6 route show default || true
echo
echo "--- DNS for www.google.com (returns both A and real AAAA) ---"
getent ahosts www.google.com | sort -u
echo
echo "--- IPv4 control: wget -4 https://www.google.com ---"
timeout 8 wget -q -O /dev/null --timeout=6 --tries=1 -4 https://www.google.com \
&& echo IPv4_OK || echo IPv4_FAIL_$?
echo
echo "--- IPv6 wget to https://www.google.com (real AAAA) ---"
timeout 10 wget -q -O /dev/null --timeout=8 --tries=1 -6 https://www.google.com \
&& echo IPv6_OK || echo IPv6_FAIL_$?
echo
echo "--- IPv6 wget to https://www.cloudflare.com (different AS, real AAAA) ---"
timeout 10 wget -q -O /dev/null --timeout=8 --tries=1 -6 https://www.cloudflare.com \
&& echo IPv6_OK || echo IPv6_FAIL_$?
echo
echo "--- Raw TCP6 SYN to a public v6 IP:443 (no TLS, no HTTP) ---"
v6=$(getent ahostsv6 www.google.com | awk '!/::ffff:/{print $1; exit}')
echo "target=$v6"
timeout 8 bash -c "</dev/tcp/$v6/443" \
&& echo TCP6_OK || echo TCP6_FAIL_$?
echo
echo "--- 6PN sanity: ping Fly internal DNS over IPv6 ---"
timeout 6 ping -6 -c 2 -W 2 fdaa:0:1::3 || true
Observed output
=== machine=<redacted> uptime=up 35 minutes when=2026-05-06T11:46:48Z ===
--- IPv6 default route ---
(empty — no v6 default route configured)
--- DNS for www.google.com ---
142.251.157.119 STREAM www.google.com
2001:4860:4826:7700:: STREAM
2001:4860:4827:7700:: STREAM
2001:4860:4828:7700:: STREAM
...
--- IPv4 control: wget -4 https://www.google.com ---
IPv4_OK
--- IPv6 wget to https://www.google.com (real AAAA) ---
IPv6_FAIL_124
--- IPv6 wget to https://www.cloudflare.com (different AS, real AAAA) ---
IPv6_FAIL_124
--- Raw TCP6 SYN to a public v6 IP:443 ---
target=2001:4860:4829:7700::
TCP6_FAIL_124
--- 6PN sanity: ping Fly internal DNS over IPv6 ---
2 packets transmitted, 2 received, 0% packet loss
FAIL_124 means the outer timeout killed the call. Without the outer guard, wget -6 eventually exits on its own --timeout, but only after a much longer wall-clock time; the typical observed time before it gives up and falls back to v4 is ~30s.
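For anyone unsure where the 124 comes from, here is a minimal sketch of the exit-status behaviour, assuming GNU coreutils timeout (as found in Debian/Ubuntu-based images). The variable names rc_killed and rc_ok are only illustrative.

```shell
#!/usr/bin/env bash
# A command that outlives the guard: timeout kills it and reports 124.
timeout 1 sleep 5 && rc_killed=0 || rc_killed=$?

# A command that finishes in time: timeout passes its exit status through.
timeout 5 true && rc_ok=0 || rc_ok=$?

echo "killed=$rc_killed finished=$rc_ok"
```

So a FAIL_124 line in the output above says only that the call was still hanging when the guard expired, not that the remote end refused it.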
Application-level symptom
We’re a .NET 10 ASP.NET Core app using Microsoft.AspNetCore.Authentication.JwtBearer against AWS Cognito eu-central-1. Cognito is one of the few public AWS endpoints that publishes real AAAA records (most others, like s3.eu-central-1.amazonaws.com, return only IPv4-mapped ::ffff:… addresses, so connects go via v4 anyway). Microsoft.IdentityModel.Protocols.OpenIdConnect.ConfigurationManager uses a managed HttpClient to fetch the OIDC discovery doc; on Linux it does not implement Happy-Eyeballs the way wget/curl do, so it stalls on the v6 connect for the full backchannel timeout, never gets a usable BaseConfiguration, and on every authenticated request the JwtBearer middleware throws:
Microsoft.IdentityModel.Tokens.SecurityTokenInvalidIssuerException:
IDX10204: Unable to validate issuer. validationParameters.ValidIssuer is null
or whitespace AND validationParameters.ValidIssuers is null or empty.
…which produces a 401 to the client with error="invalid_token", error_description="The issuer '<authority>' is invalid". So at the app level it looks like a Cognito issuer-mismatch bug, but it’s actually a missing OIDC config caused by failed v6 egress.
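To check which endpoints publish real AAAA records rather than only IPv4-mapped ::ffff: addresses, a filter along these lines may help. It is a rough sketch: real_aaaa is a hypothetical helper name, and it applies the same idea as the awk filter in the repro script.

```shell
#!/usr/bin/env bash
# real_aaaa (hypothetical helper): read `getent ahostsv6 <host>` output on stdin
# and print only native IPv6 addresses, dropping IPv4-mapped ::ffff: entries.
real_aaaa() {
  awk 'NF && $1 !~ /^::ffff:/ { print $1 }' | sort -u
}

# Usage inside a machine (hostnames taken from the post):
#   getent ahostsv6 www.google.com | real_aaaa
#   getent ahostsv6 s3.eu-central-1.amazonaws.com | real_aaaa
```

A non-empty result means the host has at least one native IPv6 address; an empty result means the lookup returned only IPv4-mapped entries.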
Environment
- Region: fra
- Image: based on mcr.microsoft.com/dotnet/aspnet:10.0
- Process group: app, two app machines (min_machines_running = 1, rolling deploy)
- 6PN intra-org IPv6: working
- Public IPv4 egress: working
- Public IPv6 egress: 100% timeout (no default route)
What I’ve ruled out
- DNS: getent ahostsv6 returns expected real AAAA records.
- Per-host route: fails identically to Google, Cloudflare, AWS Cognito (different ASes).
- Bad single host: issue persists across fly machine restart and across a fly secrets-triggered rolling restart that bumped to a new machine version. Both machines in the app affected.
- Cognito side: fetching the OIDC discovery doc over IPv4 from inside the same machine returns the expected issuer field. Customer side and config are fine.
- Process-level disable as a workaround: setting DOTNET_SYSTEM_NET_DISABLEIPV6=1 immediately broke the app. Postgres on *.internal is IPv6-only over 6PN, so disabling v6 process-wide killed DB connectivity at startup (Npgsql.NpgsqlException: Name or service not known). Reverted within seconds.
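For reference, the IPv4 discovery-doc check can be sketched roughly as follows. The helper names extract_issuer and fetch_issuer_v4 are hypothetical, <user-pool-id> is a placeholder for the real pool ID, and the sed expression assumes the "issuer" key and its value sit on one line (no jq in the image).

```shell
#!/usr/bin/env bash
# extract_issuer (hypothetical helper): pull the "issuer" value out of an
# OIDC discovery document read from stdin, without needing jq.
extract_issuer() {
  sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# fetch_issuer_v4 (hypothetical helper): fetch a discovery URL over IPv4 only
# and print its issuer field.
fetch_issuer_v4() {
  wget -q -O - --timeout=6 --tries=1 -4 "$1" | extract_issuer
}

# Usage inside a machine:
#   fetch_issuer_v4 "https://cognito-idp.eu-central-1.amazonaws.com/<user-pool-id>/.well-known/openid-configuration"
```

If the printed issuer matches the configured authority, the Cognito side and the app config are consistent and the failure is confined to the v6 path.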
Workaround in place
Configuring JwtBearerOptions.BackchannelHttpHandler with a SocketsHttpHandler.ConnectCallback that filters DNS results to AddressFamily.InterNetwork — IPv4 only, scoped to OIDC/JWKS traffic. 6PN/Postgres continues over IPv6 unaffected. Deploying that now.
Questions
- Is public IPv6 egress from fra known to be broken right now, or is it specific to our org? A peer reported the same symptom shape from gru to a Supabase AAAA endpoint a few days ago; it could be the same root cause surfacing in a different region.
- Can someone on staff check the v6 default-route advertisement / RA on our machines in fra without us needing to reproduce live? Happy to share machine IDs and the app name in a DM.
- Is there a traceroute6 / mtr -6 you’d like us to run for one diagnostic pass? Neither is in our runtime image, but we can apt-get install once if it’d help.
Thanks!