Is Postgres on Fly ready to host production databases?

I have a production app running on Fly. I don’t have any data to migrate, so I started with Fly Postgres, deployed to the same region as the main app (ewr). After reading the comments on this forum for a while, I got the impression that Fly Postgres is missing some features and isn’t functionally complete as a managed DB offering.

I’m thinking about creating a managed database on DigitalOcean, in New York, and connecting to it through DATABASE_URL from my Phoenix app. Would that be better than Postgres on Fly?

I’m a solo developer and not that interested in operations; I want something that works, with minimal latency. I also want backups, but that’s about it. Should I stay with Fly, then?

The Fly Postgres offering is just a regular Fly deployment. It works, but it does require intervention on your side to perform backups. That’s not much fun, and enough hassle that I’m looking at other solutions for my customers as well while the Fly version gets worked out. I think it depends a lot on how much latency you can tolerate. In two tests so far, between Fly and AWS in the Frankfurt and Toronto regions, I’ve seen around 5-8 ms of latency.

So I looked into Crunchy Data’s Bridge product, which was mentioned elsewhere.

This product is a bit hidden on their site, as it has only been in production since September. But it’s great! It’s basically like Heroku’s Postgres offering, but it can run on a variety of clouds. It won’t run directly on Fly yet, but there are enough cloud host options that I suspect you can find a low-latency one.

I’ll be testing it out this week to see how it goes.

Backups are right around the corner. We’re already backing up all volumes, so it’s mostly a matter of exposing the functionality to download and restore.

As far as latency goes: Fly Postgres clusters are definitely meant to be used from a Fly-hosted app. Unfortunately, a few ms of latency is expected when crossing between providers that aren’t in the same datacenters.

@aswinmohanme As to your main question: the Postgres offering is still considered beta. We’ve made large improvements and take uptime and any issues very seriously, but hiccups are to be expected while we’re in beta.

I expect we’ll be a lot more production-ready within a month! It should be improving every week 🙂

Nice! Will these backups be taken in a consistent manner, so that both data and logs are preserved? I’m just thinking ahead for both PG and MySQL installations.

Separately, I have been testing a fairly chatty Rails app peered with AWS RDS over WireGuard. The latency is just too high for this kind of app (a ~200 ms request with 20-30 SQL queries took around 5x as long). So I’m looking forward to the future of internally hosted databases 😃
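
The cost of a chatty app talking to a remote database can be estimated with a back-of-the-envelope model. This sketch assumes every query runs sequentially and pays one full network round trip, and it ignores query execution time, connection setup, and pipelining; the 25-query figure is simply the midpoint of the 20-30 range mentioned here.

```python
# Back-of-the-envelope model: each sequential SQL query pays one network
# round trip to the database, so cross-provider latency adds up linearly.


def added_latency_ms(sequential_queries: int, round_trip_ms: float) -> float:
    """Extra time a request spends waiting on the network for its queries."""
    return sequential_queries * round_trip_ms


def implied_round_trip_ms(base_ms: float, slowdown: float, queries: int) -> float:
    """Per-query overhead implied by an observed slowdown factor."""
    extra_ms = base_ms * slowdown - base_ms
    return extra_ms / queries


if __name__ == "__main__":
    # 5-8 ms Fly <-> AWS round trips, for a request issuing 25 queries:
    print(added_latency_ms(25, 5), added_latency_ms(25, 8))  # 125 200

    # A ~200 ms request that became ~5x slower with 20-30 queries:
    for q in (20, 25, 30):
        print(q, round(implied_round_trip_ms(200, 5, q), 1))
    # 20 40.0
    # 25 32.0
    # 30 26.7
```

The model only shows how per-query latency compounds with query count; it can’t explain why the implied overhead in the Rails test is higher than the 5-8 ms measured in Frankfurt and Toronto.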

Yes, they’re point-in-time LVM snapshots under the covers. N+1 latency is no joke.

Any updates on when Postgres on Fly will be out of beta / production-ready?

We’re prepping to migrate our core systems away from AWS (Fargate, Aurora Postgres, and a couple of smaller services) in a few weeks. Fly is my top pick, but I’m a bit concerned about the database side of things.

Our last remaining todo is self service backup restores. We’re very close!

We’re happy with reliability; I think our Postgres clusters are a good place to host production data right now. We just think people need to be able to restore clusters without talking to us before we can remove the “beta” label.

Sorry if I’m off topic, but looking at the Postgres HA example code, wouldn’t it be more efficient to skip attempting the write (and then catching the PG read-only error) when we already know we’re not in the primary region? For example:

if FLY_REGION != PRIMARY_REGION {
  [replay the request in PRIMARY_REGION]
  return
}

[proceed with local write]
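
A runnable version of the idea above can be sketched in framework-agnostic Python. It assumes the app sets a PRIMARY_REGION environment variable itself (FLY_REGION is set by Fly on each instance) and uses Fly’s fly-replay response header to ask the proxy to re-run the request in the primary region. The function name, return shape, and status code are illustrative assumptions, not Fly’s example code.

```python
import os
from typing import Mapping, Optional


def route_write(env: Optional[Mapping[str, str]] = None) -> tuple[int, dict[str, str], str]:
    """Decide where a write should run before touching the database.

    Returns (status, headers, body). Outside the primary region, the response
    carries a `fly-replay` header asking Fly's proxy to replay the request in
    the primary region, so no write is ever attempted against a replica.
    """
    env = os.environ if env is None else env
    current = env.get("FLY_REGION")      # set by Fly on every instance
    primary = env.get("PRIMARY_REGION")  # assumption: set yourself, e.g. in fly.toml [env]

    if current and primary and current != primary:
        # Status code is illustrative; the replay is driven by the header.
        return 409, {"fly-replay": f"region={primary}"}, ""

    # Same region as the primary (or running locally): do the write here.
    return 200, {}, "write performed locally"  # placeholder for the real write
```

The trade-off is that every request reaching this function is replayed from non-primary regions, whether or not it would have ended up writing.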

This code would replay all requests in the primary region.

However, detecting a potential write instead of relying on exceptions is smart. In the official Fly Ruby gem, POST requests, and requests that happen immediately after a write, are replayed before they even hit the application layer.

See https://github.com/superfly/fly-ruby/blob/main/lib/fly-ruby/regional_database.rb#L69-L75
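
The detect-before-the-app approach can be approximated as a pure decision function. This is not a port of fly-ruby: the set of write methods, the 5-second window, and tracking the client’s last write time (for example in a cookie) are hypothetical choices for illustration, so check regional_database.rb for the gem’s actual rules.

```python
import time
from typing import Optional

# HTTP methods treated as potential writes; GET/HEAD/OPTIONS count as reads.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}

# Hypothetical window: after a write, keep sending this client's reads to the
# primary for a few seconds so it sees its own writes despite replica lag.
REPLAY_WINDOW_SECONDS = 5.0


def should_replay(method: str, in_primary_region: bool,
                  last_write_at: Optional[float], now: Optional[float] = None) -> bool:
    """Return True if the request should be replayed in the primary region."""
    if in_primary_region:
        return False  # already next to the writable database
    if method.upper() in WRITE_METHODS:
        return True   # likely write: replay before the app even runs
    if last_write_at is None:
        return False  # plain read with no recent write from this client
    now = time.time() if now is None else now
    return now - last_write_at < REPLAY_WINDOW_SECONDS
```

Because the decision happens before any application code runs, a misclassified read only costs a replay, while a missed write would still hit the read-only error, so erring toward replaying is the safer default.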

Thanks Joshua,

I should have been more specific about the code. It’s in a function that is explicitly about to do a write.

Right - same principle there!

Is there a status update on Postgres database backups?

@user16 We take a snapshot of each volume every 24 hours. Here’s some documentation that covers how to view your snapshots and how to perform a restore: Multi-region PostgreSQL.

Adding on to the theme of Fly Postgres’s production readiness: is it possible to back up the snapshots to another provider, like an AWS S3 bucket, in addition to the snapshots that Fly provides? Maybe I’m paranoid, but it just feels sketchy having my only database backups on one provider.

I’d strongly recommend wal-g (wal-g/wal-g on GitHub, “Archival and Restoration for Postgres”). You could run it in a Docker container on Fly, point it at your DB, and it’ll back up your WAL logs to S3 (and, I think, to any other cloud provider of choice). This is a streaming backup as well, so it basically runs 24/7.

There are also simple commands to restore DBs from the WAL backups in your bucket, as well as commands for manual snapshots and restores (you could schedule those with cron, too).

And since these are WAL logs, this is the best right-up-to-the-moment-of-the-crash backup system I know of.
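
For reference, the commonly documented way to wire wal-g in is through Postgres’s own archiving hooks rather than the separate-container setup described above. This Python sketch just renders those pieces; it assumes you can install the wal-g binary in the Postgres image and edit postgresql.conf, and the bucket name and paths in the usage line are hypothetical. AWS credentials are expected as secrets in the environment and are deliberately not rendered.

```python
import shlex


def walg_config(s3_prefix: str, pgdata: str, aws_region: str) -> dict:
    """Render the settings needed to archive Postgres WAL to S3 with wal-g."""
    pgdata_q = shlex.quote(pgdata)
    return {
        # Environment wal-g reads to locate the bucket (plus AWS_ACCESS_KEY_ID /
        # AWS_SECRET_ACCESS_KEY, supplied separately as secrets).
        "env": {"WALG_S3_PREFIX": s3_prefix, "AWS_REGION": aws_region},
        # postgresql.conf: hand each completed WAL segment to wal-g for upload.
        # Changing archive_mode requires a server restart.
        "postgresql_conf": {
            "archive_mode": "on",
            "archive_command": "wal-g wal-push %p",
        },
        # Periodic full base backup, e.g. nightly from cron; archived WAL fills the gaps.
        "cron": f"0 3 * * * wal-g backup-push {pgdata_q}",
        # Restore: fetch the latest base backup, then replay archived WAL on startup.
        "restore": {
            "fetch_base_backup": f"wal-g backup-fetch {pgdata_q} LATEST",
            "restore_command": "wal-g wal-fetch %f %p",
        },
    }


if __name__ == "__main__":
    cfg = walg_config("s3://my-bucket/pg-backups", "/path/to/pgdata", "us-east-1")
    for key, value in cfg["postgresql_conf"].items():
        print(f"{key} = '{value}'")
```

On PostgreSQL 12 and later, a restore also needs a recovery.signal file in the data directory before starting the server, so that it replays WAL using restore_command. Note that the cron job needs the same WALG_*/AWS_* variables and Postgres connection settings as the server.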

This is very interesting! We are an Elixir custom-app development agency, and we are seriously considering adopting the Fly.io platform for our services and for our customers.

Currently, the only feature we really lack is a continuous backup and point-in-time recovery strategy for the Postgres offering.

We are also considering wal-g or pgBackRest, but we don’t understand how to set them up: could you provide more details and maybe an example?

Shaun,

What might be the issue if I follow the guide you linked to and no volumes are listed?

We have an operational database, but fly volumes list <app-name> returns no volumes.

UPDATE: I can list volumes if I use the -a flag, i.e. fly volumes list -a <app-name>. So perhaps the docs are just out of date?

However, it shows just one volume, created a month ago. So it doesn’t appear to be benefiting from a ‘daily snapshot’. Or is there something I’m missing, or that the documentation is missing?

Thanks

@adamwiggall Not sure how this was missed, but there’s a mount specification in the fly.toml file. I went ahead and removed it from the repo; you should try removing it locally as well.

A volume should not be required to run the migration.

@adamwiggall Thanks for catching that missing -a flag. I added it to the doc.

Shaun,

Thanks for the swift response. I’m totally lost on what you’re asking me to do, though.

My fly.toml doesn’t appear to have a ‘mount specification’, and when you say ‘I went ahead and removed it from the repo’, I’m not sure what you mean.

My intention here was to be able to see snapshot(s), perhaps have the ability to archive them to a 3rd party service, and to feel comforted that if something goes awry I can get back up and running. Your mention of running a migration has me puzzled too.

Sorry for not grasping what you mean.