Postgres WAL Overflow without Replication volumes

Hi @TheFlyingCoder. This is an interesting problem. If you’d be willing to share, I’d be curious to know what version of Fly PG you’re running (fly image show), how large your volume was, and how much was being taken up by WAL.

I’m not a Postgres expert, but AFAIK the database can produce a significant amount of WAL even without replication. WAL segments record changes before they’re flushed to Postgres’s main table storage, and they’re only recycled or removed after a checkpoint, so a write-heavy workload on a standalone instance can still accumulate a lot of them.

A few queries that I can think of to run with fly pg connect (some of which you may have already tried):

  • SELECT * FROM pg_replication_slots to see if any replication slots were accidentally created that might be preventing WAL cleanup.
  • SHOW max_wal_size to check that the soft limit for WAL size isn’t too high relative to the volume size. (The most current version of Fly PG configures this to be 10% of the disk size when the database is created.)
  • SHOW wal_keep_size to check that keeping extra WAL files is disabled (should be 0).
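  • SHOW archive_command and SHOW archive_library to see if there’s any WAL archiving enabled that might be preventing WAL cleanup. (Not sure how this would have been turned on, but it’s quick to check anyway. Note that archive_library only exists on Postgres 15 and later, so older versions will report it as an unrecognized parameter.)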

(Several of these are described here in the official Postgres docs.)
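
If it helps, here's a rough sketch that bundles the checks above into one batch you could paste into the fly pg connect session. It also adds a query for how much space the WAL directory is currently using, which is my own addition rather than something from the list. I'm assuming you're connected to the primary as a superuser (or a role with pg_monitor), since pg_ls_waldir() is restricted and pg_current_wal_lsn() doesn't work on a replica.

```sql
-- Replication slots that could be pinning old WAL, with a rough estimate
-- of how much WAL each one is holding back (assumes this is the primary).
SELECT slot_name,
       slot_type,
       active,
       restart_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- WAL-related settings in one result set. Reading from pg_settings instead
-- of using SHOW means a parameter that doesn't exist on your Postgres
-- version is simply missing from the output rather than raising an error.
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_wal_size', 'wal_keep_size',
               'archive_mode', 'archive_command', 'archive_library')
ORDER BY name;

-- Number of files in pg_wal and their total size.
SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```

Comparing wal_size from the last query against max_wal_size and the volume size should show whether WAL is actually what filled the disk, or whether something else is growing.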