Early look: PostgreSQL on Fly. We want your opinions.

We’re going to rely on disk snapshots for backups and also ship features for quickly cloning disks (or restoring from backup).

Our goal is to build plumbing that makes HA for Postgres, MongoDB, and other databases work, then make the actual app projects open source.

The Postgres HA cluster app is what we’re running for this, which means two things:

  1. You can make pull requests to enhance functionality
  2. You can fork it and run it without going through our launcher

One (unique) thing we’re doing is giving out replica privileges on Postgres clusters. You can, for example, create a Postgres replica that streams to S3. We’d love to roll all that into our primary Postgres HA project, but you shouldn’t have to wait on us to do it. :slight_smile:

All our persistent volumes live on NVMe drives, each between 2 and 7 TB. You can expect a guaranteed proportional share of IOPS (a 500 GB volume is about 25% of a 2 TB NVMe drive's capacity), with much higher bursts. You will probably be surprised how fast these disks are.
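As a back-of-the-envelope illustration of that proportionality (the drive sizes come from the post above; reading "proportional" as volume size divided by drive capacity is my assumption, not an official formula):

```python
def iops_share(volume_gb, drive_capacity_tb=2):
    # Assumed: the guaranteed IOPS share is proportional to the volume's
    # slice of the drive. Drive capacity uses decimal TB (1 TB = 1000 GB).
    return volume_gb / (drive_capacity_tb * 1000)

print(iops_share(500))  # 500 GB on a 2 TB drive -> 0.25
```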


I’m repeatedly getting “Oops, something went wrong! Could you try that again?” when I try to run fly pg create and then complete the questions.

Is this still available to test out in flyctl?


$ fly version  
flyctl 0.0.167

Not sure what was going on with the CLI, but I was able to create one through the API manually. :slight_smile:

Can you try with the CLI again, but with debug logging?

LOG_LEVEL=debug fly pg create

Sure, same error message and nothing too useful in the output.

DEBUG Loaded flyctl config from /Users/scott/.fly/config.yml
DEBUG Working Directory: /Users/scott
DEBUG App Config File: 
? App name: contaim-cli-test
DEBUG --> POST https://api.fly.io/graphql {"query":"{ organizations { nodes { id slug name type } } }","variables":null}
DEBUG <-- 200 https://api.fly.io/graphql (358.69ms)
? Select organization: Contaim (contaim-a9d13f1f-a0b9-44df-8d78-d55fbce237aa)
DEBUG --> POST https://api.fly.io/graphql {"query":"query { platform { requestRegion regions { name code gatewayAvailable } } }","variables":null}
DEBUG <-- 200 https://api.fly.io/graphql (183.45ms)
Oops, something went wrong! Could you try that again?

@Scott could you paste the output of https://debug.fly.dev here?

=== Headers ===
X-Forwarded-Ssl: on
Via: 2 fly.io
Accept-Encoding: gzip, deflate, br
Fly-Forwarded-Proto: https
Fly-Forwarded-Ssl: on
Fly-Forwarded-Port: 443
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us
X-Forwarded-Proto: https
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.2 Safari/605.1.15
X-Forwarded-Port: 443
Fly-Region: chi

=== ENV ===
FLY_APP_NAME=debug
FLY_REGION=ord
FLY_VM_MEMORY_MB=128
HOME=/root
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TERM=linux
WS=this
is
a
test
cgroup_enable=memory

2021-02-02 04:39:55.750442595 +0000 UTC m=+619738.961021422

@michael Can you remove my IP address information once you’ve got all you need?

@Scott thanks for the debug info. Looks like we couldn’t detect your region which caused a panic in flyctl (my fault) – could you update to the latest flyctl version and try again? Thanks!


Yep, looks good! I was able to create/delete it. Thanks for the quick turnaround.

I would love to give this a shot as soon as daily backups are a thing; without that I would be hesitant to try it. Also, will PgBouncer or something of the sort be supported?

Also would be nice to support Redis in a similar fashion (globally shared between the regions as a full blown instance).

Is it possible to connect to it externally without wireguard? For example inside of github actions to run migrations?

We’ll be offering volume snapshots for backups. More on that soon.

We’re planning on baking PgBouncer into the Postgres app but haven’t gotten to it yet.

We have a prototype of this you can test out now: Upstash for Redis®* · Fly Docs. It’s using KeyDB instead of Redis which supports master-master replication.

Not right now, but you could launch an application that proxies an exposed port to postgres over the private network.
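A minimal sketch of that proxy approach, as the fly.toml for a tiny standalone app (the app name, public port, internal hostname, and the use of socat are all my assumptions for illustration, not something Fly ships):

```toml
# fly.toml for a hypothetical "pg-proxy" app that exposes Postgres publicly.
app = "pg-proxy"

[[services]]
  internal_port = 5432
  protocol = "tcp"

  [[services.ports]]
    port = 10000   # public port you'd point GitHub Actions at

# The app itself would just run something like socat, forwarding the
# exposed port to the Postgres app over the private network
# ("my-postgres.internal" is a placeholder hostname):
#   socat TCP-LISTEN:5432,fork,reuseaddr TCP:my-postgres.internal:5432
```

Anything exposed this way is reachable from the public internet, so you'd want to keep strong credentials and tear the proxy down when you're done.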

Question on pricing: do you technically pay 2x the VM pricing to cover the replica as well?

It’s just a fly app with sprinkles on top so pricing is the same. You can add more replicas or scale back to one if you want. We’ll have some docs covering scaling scenarios in a few days.

Hmmm, I guess I am confused: it seems there is always a replica. Is it a bad idea to only run with one instance (no replica)?

fly autoscale show outputs Disabled - how would we remove the replica or add more in the future if need be? Sorry if I am completely missing something here haha.

This brings up another point I wanted to ask about: do PG apps scale in a similar way to normal Fly apps? If so, how does this work? Is this based on connection counts? How many connections do we get per VM?

Trying to grasp how scaling works when app servers start scaling and eating up pg connections and how we keep the PG scale in sync if that makes sense.

Thanks in advance! SUPER excited to see PG on Fly :raised_hands:

It is a bad idea to run with one instance. The only way to do resilient data on Fly is to run replicas.

Postgres clusters don’t autoscale. Right now, scaling is all up to you! I’d like to introduce vertical autoscaling, but automatically adding instances is probably wrong for PG.

You manually scale them the same way you scale other apps. Here’s how you’d add a third instance in Sydney:

flyctl volumes create pg_data --region syd --size 10
flyctl scale count 3

Apps connect directly to Postgres instances.

Interesting, so just by adding a new volume and setting the scale count to 3 it will automatically know to run a PG replica there?

Is there some magic happening behind the scenes?

We’re launching this Fly app when you create postgres clusters.

It uses Stolon (along with our shared Consul service) to manage replication + leader election. So when a third instance starts up, it figures out it should be a replica from the existing leader.

Very cool, and it will add these replicas based on the regions where volumes are created?

Also, was I correct in my findings regarding pricing on the PG apps? Being that there is at least one replica, are these billed for both the leader and the replica?

Is there a concept of connection limits on these postgres instances? Do you have a best practice for when to scale and how (vertically vs. horizontally). Also wondering if there is a best practice for handling paired app servers scaling up and using more connections/memory/cpu.

Yes, volumes “restrict” where instances can run, so it’ll always launch them where you have volumes created.

There’s no special pricing for PG apps, they’re just normal Fly apps. So you do pay for both VMs (or all three if you add another replica).

Postgres has its own connection limits; we’re using the default of 100 per instance.

As a really rough heuristic, I’d shoot for VM sizes with 1GB of memory per 10-20GB of Postgres data. This varies wildly depending on workload but it seems to be a reasonable guideline for most full stack apps. I wouldn’t add replicas to scale most apps, but I would add replicas to run in other regions.
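That sizing heuristic can be sketched as a quick calculation (the 1 GB per 10-20 GB ratio is the rough guideline from the post, not an official formula, and it varies wildly by workload):

```python
def recommended_memory_gb(data_gb):
    # Rough heuristic: about 1 GB of RAM per 10-20 GB of Postgres data.
    # Returns a (low, high) estimate in GB.
    return data_gb / 20, data_gb / 10

low, high = recommended_memory_gb(100)
print(low, high)  # 100 GB of data -> 5.0 to 10.0 GB of RAM
```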