PostgreSQL database backup & restore

nickluger · March 30, 2022, 6:50pm

What is the status of PostgreSQL DB backup & restore? Is something planned, or is there an ETA for this?

Many managed database services out there offer some kind of PITR besides automatic snapshots.

I discovered volume snapshots, but they have an RPO of 24 hours?

Also, read replicas don’t protect against human error or malicious events (DROP TABLE), as they propagate changes instantly (although delayed replication would work).

We could set all this up ourselves using something like WAL-G (here’s a nice tutorial), but that ties up a lot of resources, especially to get it right and to have recovery tested properly, too.

Still, one should maybe back up to an external provider like AWS S3 for increased resilience.

I think a solid, production-ready DB setup is a pretty important feature for Fly, because when picking a cloud provider everything lives or dies with the database. The location of the database determines everything else, because even small latencies add up tremendously when doing READ-WRITE-READ patterns etc.

What’s the current recommendation for disaster recovery? Is there something on the way? What’s your experience with that?

Thanks in advance!

kurt · April 1, 2022, 11:22pm

Doh, I had a draft reply all written up for you and never posted it. Here it is!

Delayed read replicas are your best bet, and what we suggest. We don’t do this automatically, but you can configure one cluster to be a delayed read replica of a second cluster. This would make a good doc! Stolon includes some settings for creating clusters that are delayed replicas of other clusters, so you can probably run two Fly.io Postgres apps in this config: stolon/standbycluster.md at master · sorintlab/stolon · GitHub

Our long-term goal is to “give” Postgres to a company like Supabase that is focused on a really nice dev UX for Postgres itself. The launch story would be the same, but their tooling would handle things like point-in-time restores, forks, etc. You can use them right now with Fly apps, if you want.

The plumbing we build is meant to be general purpose. We’ve made it easy to create Postgres, but probably won’t ship Postgres-specific infrastructure. We will be exposing volume snapshot settings soon, though. And potentially incremental volume snapshots.

nickluger · April 2, 2022, 6:03am

Hey Kurt,

thank you for your comprehensive answer, that helps us move forward and set up a solid strategy.

I have to qualify my statement about latency, though: latency between AWS and Fly in some regions (e.g., fra) seems perfectly sufficient for splitting hosting between providers, in case that’s what someone wants to do.

kurt · April 2, 2022, 3:35pm

Yes, good point! There are many regions with <1ms latency between us and AWS. Generally, if they’re in the same city they work really well.

spiffytech · April 20, 2022, 8:31pm

Are there options for me to set up point-in-time restores for Fly’s Postgres? All of the off-the-shelf tools I see want to either run an agent on the Postgres server or SSH into it.

binajmen · May 12, 2022, 7:24pm

Could you elaborate on that? Are you suggesting we look at Supabase for the Postgres hosting? What would be the consequence regarding egress traffic cost? I would love to know more about your long-term strategy for Postgres, if you have any big plans.

joelmoss · May 13, 2022, 12:00pm

Yes, I would also like to know about this “giving” of Postgres. Surely that would affect latency between our apps and the DB if the DB is hosted elsewhere, i.e., at Supabase?

I’m currently evaluating Heroku alternatives, so I need to know things like this.

lessless · May 14, 2022, 11:51am

@kurt @nickluger, it seems that it’s possible to create volume snapshots manually and hence have a “simple” app to apply a custom backup strategy.

For example, create snapshots every 6 hours and keep snapshots only for the last three months.

That’s if snapshots are not removed automatically.
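A minimal sketch of the scheduling and retention logic such an app would need, assuming "three months" is approximated as 90 days. The function names and constants are hypothetical, and the actual calls that create or delete a volume snapshot are deliberately left out, since they depend on whatever tooling Fly exposes:

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(hours=6)  # from the post: one snapshot every 6 hours
RETENTION = timedelta(days=90)          # assumption: "three months" taken as 90 days


def snapshot_due(last_snapshot_at, now):
    """True if there is no snapshot yet or the newest one is at least one interval old."""
    return last_snapshot_at is None or now - last_snapshot_at >= SNAPSHOT_INTERVAL


def expired_snapshots(snapshot_times, now):
    """Return the snapshot timestamps that fall outside the retention window."""
    cutoff = now - RETENTION
    return [t for t in snapshot_times if t < cutoff]


if __name__ == "__main__":
    now = datetime(2022, 5, 14, 12, 0)
    existing = [datetime(2022, 1, 1), datetime(2022, 5, 14, 6, 0)]
    print(snapshot_due(max(existing), now))  # True: the newest snapshot is 6 hours old
    print(expired_snapshots(existing, now))  # only the January snapshot is past retention
```

A scheduled job could call snapshot_due before taking a new snapshot and then delete whatever expired_snapshots returns.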

nickluger · May 14, 2022, 5:32pm

Yes, that’s possible.

For us, it’s not an acceptable solution, though, as we need a shorter RPO: we cannot tolerate a worst-case data loss of 6 hours. It depends on the use case, of course.

We decided to go with AWS RDS in the end as they have PITR with an RPO of 5 minutes, everything fully managed for a reasonable price. Latency is not as low as within the Fly network, but sufficient for us.
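To make the RPO comparison concrete, here is a small sketch of how much data is lost when restoring from the most recent backup before a failure. The function name and the example timestamps are made up for illustration:

```python
from datetime import datetime, timedelta


def data_loss(backup_times, failure_at):
    """Time between the latest backup taken at or before the failure and the failure itself.

    Returns None if no backup exists before the failure.
    """
    earlier = [t for t in backup_times if t <= failure_at]
    if not earlier:
        return None
    return failure_at - max(earlier)


if __name__ == "__main__":
    # Snapshots every 6 hours; the failure hits one minute before the next snapshot.
    snapshots = [
        datetime(2022, 5, 14, 0, 0),
        datetime(2022, 5, 14, 6, 0),
        datetime(2022, 5, 14, 12, 0),
    ]
    print(data_loss(snapshots, datetime(2022, 5, 14, 11, 59)))  # 5:59:00
```

With 6-hour snapshots the loss approaches the full interval in the worst case, whereas PITR with a 5-minute RPO bounds it at roughly 5 minutes.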

lessless · June 2, 2022, 10:21am

We are taking the same direction.

qqwy · June 3, 2022, 9:08pm

Some of you might already be aware, but others might find this interesting as well:
I recently opened a PR on Fly’s postgres-ha repository, to add support for wal-g to Fly’s postgres image.

Feedback from people savvy with wal-g or the streaming backup/restore space in general would be very welcome!

Elder · August 19, 2022, 8:45pm

@qqwy Awesome! Any chance it will get accepted?

qqwy · September 2, 2022, 10:59am

It was accepted and merged just now. You should be able to run fly image update on your DB app to pick up the changes, and then start using it as per the instructions in the PR.
