Cert shows "Issued / verified and active" but edge serves no certificate

Hit what looks like a control-plane vs. edge desync on a custom hostname today.
Posting in case it helps anyone else and so Fly can take a look.

App / hostname

  • App: my-app (name redacted)
  • Broken hostname: app.client-domain.com (redacted)
  • Working sibling hostname on the same app / same IPs: app.other-domain.dev (redacted)

So this isn’t a DNS, app-routing, or anycast problem — a second hostname on
the same machines worked fine throughout.

Symptoms

TLS handshake fails immediately with zero bytes read, no peer certificate:

$ echo | openssl s_client -connect app.client-domain.com:443 \
    -servername app.client-domain.com
CONNECTED(00000005)
... SSL handshake has read 0 bytes and written 1577 bytes
SSL routines::unexpected eof while reading

curl returns (35) SSL_ERROR_SYSCALL / HTTP 000:

$ curl -sS -o /dev/null -w "HTTP %{http_code}\n" https://app.client-domain.com/
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL
HTTP 000
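
For anyone wanting to script the same comparison against the broken and working hostnames, here is a minimal sketch. The function names classify_handshake and probe_host are made up for illustration, and it assumes openssl s_client prints the peer certificate as a PEM block when one is presented (its default behaviour).

```shell
# Classify captured `openssl s_client` output read from stdin:
# prints "cert-served" if a PEM certificate block is present, else "no-cert".
classify_handshake() {
  if grep -q -- '-----BEGIN CERTIFICATE-----'; then
    echo "cert-served"
  else
    echo "no-cert"
  fi
}

# Probe one hostname on port 443 with SNI set to that hostname.
# Needs network access; $1 is the hostname.
probe_host() {
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>&1 \
    | classify_handshake
}
```

Running probe_host app.client-domain.com and probe_host app.other-domain.dev side by side should print no-cert for the broken hostname and cert-served for the sibling if you are hitting the same failure.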

Meanwhile the Fly control plane insisted the cert was healthy:

$ fly certs show app.client-domain.com -a my-app
  Status                  = Issued
  Certificate Authority   = Let's Encrypt
  Issued                  = rsa,ecdsa
  Expires                 = ~2 months from now
✓ Certificate is verified and active

Edge proxy logs (visible via fly logs) repeatedly showed
client problem: invalid authority for that hostname.

What didn’t fix it

  1. fly certs add <hostname> (idempotent re-add) — reported “Certificate
    created” but TLS still failed.
  2. fly deploy --strategy immediate (no-op deploy to force a re-sync) — no
    change to edge behaviour.

What did fix it

Full remove + re-add cycle:

$ fly certs remove app.client-domain.com -a my-app --yes
✓ Certificate app.client-domain.com deleted from app my-app

$ fly certs add app.client-domain.com -a my-app
✓ Certificate created for app.client-domain.com

After ~15s the edge started serving a valid Let’s Encrypt cert and the
hostname returned HTTP 200. No DNS changes were needed (already correct,
matching the working sibling).
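
If you would rather not retry by hand after the re-add, a small polling loop works. This is only a sketch: retry_until and edge_serves_tls are hypothetical helper names, and the attempt count and delay are arbitrary.

```shell
# Run a command until it succeeds or the attempts run out.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
retry_until() {
  _attempts="$1"
  _delay="$2"
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep when another attempt is coming.
    if [ "$_i" -lt "$_attempts" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}

# Succeeds once the edge completes a TLS handshake with a certificate curl
# can verify. curl exits 0 whatever the HTTP status is; a handshake failure
# like the (35) above gives a non-zero exit. Needs network; $1 = hostname.
edge_serves_tls() {
  curl -s -o /dev/null --max-time 10 "https://$1/" 2>/dev/null
}
```

For example, retry_until 12 5 edge_serves_tls app.client-domain.com polls every 5 seconds for up to about a minute and returns non-zero if the hostname never comes up.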

Guess at the cause

The cert record existed in the control plane (so fly certs show reported
“verified and active”) but the SNI/cert binding was missing on the edge
proxy. Idempotent fly certs add didn’t re-push the binding; only a full
delete forced the edge to re-sync when the cert was re-added.

Apologies for this. The root cause is actually last month’s incident with our certificate store, Vault. We currently renew certificates 30 days ahead of expiry, so your certificate would have been renewed during that window, but the renewed certificate wasn’t persisted into Vault mid-incident. This didn’t break anything until now, when the previous certificate expired.

Removing and re-adding it fixed things because that issued and stored a fresh certificate. I’ve also done a pass reconciling any other certificates from that incident window that were missed in the same way. Our existing process handles Vault outages fine, but it’s possible this incident caused writes to appear successful that were then lost, and we didn’t account for that.

We also have some changes going out this week that close a gap in our ongoing reconciliation, so a case like this will be caught well before the certificate reaches its expiry date.
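
A rough external check for this failure mode, independent of any platform-side reconciliation: since renewal happens 30 days ahead of expiry, a certificate served at the edge with noticeably fewer than 30 days left suggests a renewal that never reached the edge. The sketch below assumes that reasoning holds; served_cert and cert_expires_within are hypothetical helper names, and the threshold is yours to pick.

```shell
# Print the raw s_client output (including the leaf certificate in PEM form)
# for a hostname. Needs network access; $1 = hostname.
served_cert() {
  echo | openssl s_client -connect "$1:443" -servername "$1" 2>/dev/null
}

# Reads a PEM certificate on stdin; $1 = number of days.
# Returns 0 (flagged) if the certificate expires within that many days.
# Also returns 0 when no certificate can be parsed at all, so a failed
# handshake gets flagged too.
cert_expires_within() {
  # `-checkend N` exits 0 when the cert is still valid N seconds from now.
  if openssl x509 -noout -checkend "$(( $1 * 86400 ))" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

Something like served_cert app.client-domain.com | cert_expires_within 21 && echo "renewal may not have reached the edge" could then run from cron. The 21 days is an arbitrary margin below the 30-day renewal window.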