App: eziopro-prod-db (postgres-flex 17.2, 3-node HA, jnb region)
Discovered: 2026-05-28 during pre-deploy preparation
Volume snapshots have not been created on this cluster for ~4 months, despite being configured correctly. Specifically:
- `flyctl volumes show <vol-id> -a eziopro-prod-db` reports:
  Snapshot retention: 5
  Scheduled snapshots: true
- `flyctl volumes snapshots list <vol-id>` returns “No snapshots available” for all 3 volumes:
  - vol_40ln1g39o0y5plm4 (primary)
  - vol_vdmwg8okqjoykjkv (replica)
  - vol_4o5glq6e5dkwolxv (replica)
- Three manual triggers via `flyctl volumes snapshots create <vol-id>` on 2026-05-28 returned “Scheduled to snapshot” success acknowledgments, but no snapshots ever appeared in `snapshots list`, even after ~15 minutes of polling.
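For anyone wanting to repeat the polling check, it can be sketched in Python roughly as follows. This is a hypothetical helper, not a Fly.io tool: it assumes `flyctl` is on PATH and already authenticated, and it treats the “No snapshots available” message quoted above as the only signal that a volume has no snapshots. The 900-second timeout mirrors the ~15 minutes of polling; the 30-second interval is an arbitrary choice.

```python
import subprocess
import time
from typing import Callable

# Volume IDs from this report.
VOLUME_IDS = [
    "vol_40ln1g39o0y5plm4",  # primary
    "vol_vdmwg8okqjoykjkv",  # replica
    "vol_4o5glq6e5dkwolxv",  # replica
]

# Assumption: this message appears in the output only when a volume has no snapshots.
EMPTY_MARKER = "No snapshots available"


def flyctl_snapshot_list(vol_id: str) -> str:
    """Run `flyctl volumes snapshots list <vol-id>` and return its stdout."""
    result = subprocess.run(
        ["flyctl", "volumes", "snapshots", "list", vol_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def poll_for_snapshot(
    vol_id: str,
    list_fn: Callable[[str], str] = flyctl_snapshot_list,
    timeout_s: float = 900.0,  # ~15 minutes, as in the report
    interval_s: float = 30.0,  # hypothetical polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True as soon as a snapshot shows up, False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if EMPTY_MARKER not in list_fn(vol_id):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Calling `poll_for_snapshot(vol)` for each entry in `VOLUME_IDS` after a manual `snapshots create` reproduces the check; `list_fn`, `sleep`, and `clock` are injectable only so the loop can be exercised without a live `flyctl`.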
The volumes themselves are healthy and the cluster is functioning normally (replication, queries, and writes are all fine). All 3 volumes were created ~4 months ago and the cluster has been operational throughout. With a retention of 5 and the default daily automatic snapshot schedule, I would expect to see at least 5 rotating snapshots per volume at all times; there are zero.
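The expectation can be made concrete with a small calculation. This sketch assumes retention is counted in days and that exactly one automatic snapshot is taken per day, so the number of retained snapshots is capped by whichever is smaller, the volume's age or the retention window. The function name is hypothetical.

```python
def expected_snapshot_count(
    volume_age_days: int, retention_days: int, snapshots_per_day: int = 1
) -> int:
    """Snapshots expected to exist under an assumed daily schedule with
    retention-based pruning (model assumed, not taken from Fly.io docs)."""
    if volume_age_days < 0 or retention_days < 0 or snapshots_per_day < 0:
        raise ValueError("inputs must be non-negative")
    # Days of snapshots taken so far, capped by how many days retention keeps.
    return min(volume_age_days, retention_days) * snapshots_per_day
```

For volumes ~4 months (~120 days) old with a retention of 5, `expected_snapshot_count(120, 5)` gives 5 per volume, against 0 observed.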
I’ve since enabled Tigris/WAL backups as a workaround (`flyctl postgres backup enable`), and they are working correctly. However, a silently broken volume snapshot pipeline seems like a meaningful platform issue, especially for users who rely on volume snapshots as their backup mechanism without realizing that no snapshots are being created.
Could you investigate why the snapshot worker hasn’t been creating snapshots for this cluster, and whether other postgres-flex clusters might be affected by the same issue?
Happy to provide additional diagnostic output if useful.
Thanks,
Richmond