Fly Postgres v1 replication errors after OOM

For completeness: I never figured out exactly what went wrong, but I ended up fixing it by recreating the volume:

  • Figured out which volume was in use: fly volumes list
  • Created a new one to replace it: fly volumes create pg_data --region syd --size 10
  • Scaled back up (luckily, the new VM attached to the new volume): fly scale count 2
  • Deleted the dud volume: fly volumes destroy $vol_id
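
In case it helps anyone else, here is a rough sketch of those steps as a script. It only prints the commands unless DRY_RUN=0 is set. The run helper, the recreate_volume function and the OLD_VOL_ID placeholder are my own names for illustration, not anything from flyctl, and the region and size are just the values I used. It assumes you run it from the app directory so fly picks up the right app.

```shell
#!/bin/sh
# Sketch only: by default each command is printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Placeholder: the ID of the dud volume, as shown by `fly volumes list`.
OLD_VOL_ID="${OLD_VOL_ID:-vol_xxxxxxxx}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recreate_volume() {
  # 1. Find the volume that is currently in use.
  run fly volumes list
  # 2. Create the replacement volume (name, region and size from my setup).
  run fly volumes create pg_data --region syd --size 10
  # 3. Scale back up so a new VM can attach to the new volume.
  run fly scale count 2
  # 4. Delete the old volume. This is irreversible, so check first that
  #    the new VM really did attach to the new volume.
  run fly volumes destroy "$OLD_VOL_ID"
}

recreate_volume
```

I would run step 4 by hand rather than from a script, since nothing guarantees which volume the new VM picks.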

I was then still left with the "no keeper info available" log messages, but according to this post, this is fixed in a newer Postgres image. So I updated with fly image update, and hopefully the messages will stop after ~48 hours.
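
To check whether the messages are actually tapering off, something like the following could be used to count them in saved log output. This is a minimal sketch: count_keeper_warnings is a name I made up, and it assumes you have already captured some output from fly logs into a file or pipe.

```shell
#!/bin/sh
# Hypothetical helper: count lines on stdin that contain the keeper warning.
# grep -c prints the count; `|| true` stops a zero count from being
# treated as a failure (grep exits non-zero when nothing matches).
count_keeper_warnings() {
  grep -c -F 'no keeper info available' || true
}

# Example with made-up log lines standing in for saved `fly logs` output.
printf '%s\n' \
  'sentinel: no keeper info available' \
  'keeper: postgres is ready' |
  count_keeper_warnings
```

Comparing the count for a sample taken before fly image update with one taken a day or two later should show whether the update helped.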