I have a Managed Postgres (MPG) cluster (basic plan, syd, PG 16, engine mpgd) that my app can talk to fine internally, but every external admin path is broken. I’ve done a fair bit of layer-by-layer diagnosis and it looks like the issue is on the managed control-plane / auth side rather than my machine. Hoping someone can confirm or point me at a fix.
What IS working:
- My Fly app reaches the DB over the internal network without any problem. SELECT 1 from a fly ssh console Rails runner returns a result, and both app + worker machines are healthy/started. So the database itself is up and serving.
- WireGuard tunnel is up: I can ping6 the cluster’s 6PN address with 0% packet loss (~20ms).
- Raw TCP to the cluster’s 6PN address on :5432 completes the handshake (nc -z succeeds).
- fly agent ping is healthy, and fly mpg status <cluster> shows Status: ready.
- The v1 control-plane API (/api/v1/postgres/<cluster>) returns credentials: ok and password: ok.
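For anyone who wants to repeat the raw TCP check without netcat, here is a minimal Python sketch of roughly what nc -z does. The function name tcp_connects is my own, and the host you pass in would be the cluster's 6PN address (reachable only while the WireGuard tunnel is up). It only proves the TCP handshake completes; it says nothing about whether Postgres will talk afterwards.

```python
import socket


def tcp_connects(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to (host, port) completes.

    Hypothetical helper, roughly equivalent to `nc -z host port`.
    Works for IPv6 literals such as a 6PN address because
    socket.create_connection resolves the host via getaddrinfo.
    """
    try:
        # The socket is closed again as soon as the handshake succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False


if __name__ == "__main__":
    import sys

    # Usage: python tcp_check.py <cluster 6PN address> [port]
    if len(sys.argv) >= 2:
        target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 5432
        print(tcp_connects(sys.argv[1], target_port))
```

A True here combined with psql failing is exactly the situation described below: the listener accepts the socket, and the failure happens later.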
What is NOT working:
- fly mpg proxy <cluster> binds the local port and logs Proxying localhost:NNNN to remote […]:5432, but every connection shows accepted new connection … connection closed immediately. psql through it fails with server closed the connection unexpectedly / before or while processing the request.
- Connecting psql directly to the 6PN address (bypassing the proxy entirely) fails the same way, with both sslmode=require and sslmode=disable. So it’s not the proxy and not TLS negotiation. The TCP socket is accepted, but the moment any Postgres protocol bytes arrive (SSLRequest or plaintext StartupMessage) the backend slams the socket shut.
- The public .flympg.net hostnames (both the pgbouncer.* and direct.* forms) don’t resolve at all (NXDOMAIN). I assume the basic plan has no public endpoint, so that route is expected to fail; flagging it just in case.
- The v2 control-plane API (/api/v1/postgresv2/<cluster>) returns credentials: error and an empty/error password value, while v1 (above) returns ok. That mismatch feels like the real smell.
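To take psql out of the picture entirely, the "socket closes as soon as protocol bytes arrive" behaviour can be checked with a bare SSLRequest probe. This is a minimal sketch, not what psql literally runs: the function names (build_ssl_request, classify_reply, probe) are my own, and the only thing taken from the PostgreSQL wire protocol is the 8-byte SSLRequest message (length 8, code 80877103). A healthy server answers with a single byte, S or N; an empty read or a reset means the peer hung up without answering.

```python
import socket
import struct

# SSLRequest code from the PostgreSQL frontend/backend protocol:
# (1234 << 16) | 5679 == 80877103
SSL_REQUEST_CODE = 80877103


def build_ssl_request() -> bytes:
    """Int32 length (8, counting itself) followed by the Int32 request code."""
    return struct.pack("!ii", 8, SSL_REQUEST_CODE)


def classify_reply(reply: bytes) -> str:
    """Map the first byte read after SSLRequest to a human-readable verdict."""
    if reply == b"S":
        return "server accepts TLS"
    if reply == b"N":
        return "server refuses TLS but is speaking the protocol"
    if reply == b"":
        return "closed without replying"
    return f"unexpected reply: {reply!r}"


def probe(host: str, port: int = 5432, timeout: float = 3.0) -> str:
    """Hypothetical helper: send SSLRequest and report what comes back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_ssl_request())
        try:
            reply = sock.recv(1)
        except ConnectionResetError:
            return "connection reset without replying"
        except socket.timeout:
            return "no reply within timeout"
    return classify_reply(reply)
```

Running probe against the cluster's 6PN address (or the local port opened by fly mpg proxy) and getting "closed without replying" or a reset would match the symptom above, independent of any client library or credentials, since SSLRequest is sent before authentication.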
My read: the listener/pgbouncer shell answers TCP, but the managed auth layer behind it is in a bad state (consistent with the v2 credentials: error), so it kills every real connection. My credentials are valid (they match the live v1 API value), my network/WireGuard is fine, and the DB is healthy internally — so this looks like a Fly-side managed-Postgres front-end problem, not a client issue.
Questions:
- Does the v2 credentials: error while v1 says ok indicate a known control-plane sync issue?
- Will fly mpg restart <cluster> safely kick the auth/pgbouncer layer back into shape, or is this something only Fly support can repair?
- Is there a current known issue with fly mpg proxy on recent flyctl (v0.4.57)?
Happy to share cluster ID / region privately if a Fly staffer wants to look. Thanks!