Fly Postgres ReadOnly Nodes and HA

noahschu · October 26, 2024, 2:39pm

I have a lot of questions related to Fly PG specifically around High Availability, Read Replicas, and Load Balancing. I read through most the docs including the HA & Global Replication Guide but haven’t really found the info I am looking for. Maybe I have missed it somewhere along the way but figured a topic post like this might be helpful for other too.

Let’s assume I created a single node (machine) PG instance in fly and then added 2 replicas in the same region EWR.

The first machine is a writer master node and the replicas are ReadOnly replicas right? If so how do I connect to the ReadOnly replicas? In aws it’s often <hostname>-ro. I see in the HA article the port 5433 is for read replicas but when I use this port to connect I can still perform write operations! How is this possible? Is there a different hostname that I should be using to connect to the read replicas?
How is load balancing done across the replicas? Fly scaling model doesn’t seem like it would support load balancing across multiple machines right? It would instead a single machine until the request/connection limits are hit then start sending additional requests to the second machine?
Is it possible to see replication lag (in ms) or get any information around replication to reader nodes?

I feel like my mental model of Fly PG must be a little off because of my confusion around the above questions. Please correct me if so. But as I understand it Fly PG not a “managed PG” but is just a pre-configured machine that runs a fairly base version of PG and the fly team has added some sugar to it in the CLI.

mayailurus · October 26, 2024, 5:12pm

I don’t think I can agree with this part, although “fairly base” is always in the eye of the beholder, … It’s definitely way more than apt-get install postgresql, though.

I’d suggest SSHing in and then poking around a little with ps aux, ss -tnpl, etc.

No, but the docs really do lead people in that direction… In reality, it’s the port for when you don’t care whether it’s writable:

Try cloning a node into bos, and then you’ll see the distinction way better.

If your overall aim is to spread queries out evenly over all 3 ewr machines, then I think Fly is expecting you to query db-app-name.internal (not .flycast) and combine with your own round-robining.

Conversely, if you did specifically only want to connect to replicas for some reason, then the (relatively) new metadata-based DNS queries may eventually allow that. (At a glance, it doesn’t look like the roles are currently getting published there, although some generic minutiae are.)

Hope this helps a little!

noahschu · October 26, 2024, 5:56pm

Thank you this definitely helps!

So if I set up two connection pools in my service as:

postgres://<user>:<password>@myapp.internal:5432/<db>
postgres://<user>:<password>@myapp.internal:5433/<db>

The first will connect via the internal ha proxy to main writer (always). The second will bypass the proxy and connect to any node (writer or reader). However if I want to achieve actual load balancing within the same region I need to implement some extra logic by obtaining the IP’s of the machines and spreading connections across them. This sounds a bit involved / brittle so probably will try and avoid writing custom LB logic and managing connections.

But TLDR is (let me know if I have this wrong)

5432 → writer
5433 → any replica
5433 does not load balance across replicas (I wonder if I had say 20 machines connecting on 5433 would it be somewhat uniform how their connections get distributed to the replicas)

Again thanks so much for the response and also the links to other useful threads!

mayailurus · October 26, 2024, 10:55pm

I would divide this into a 2 × 2 matrix:

	`.internal`	`.flycast`
5432 (primary)	single node	single node
5433 (all)	deterministic list*	coarse-grained load balancing†

*People say that vanilla Postgres clients tend to default to always just using the first entry, making the bottom-left square effectively no load balancing—unless additional measures are taken.

†The definition of “coarse-grained”, which it sounds like you’ve probably seen at least in part already, just in case:

Fly Proxy doesn’t consider the exact number of concurrent connections or requests. From the proxy’s point of view a Machine is handling between 0 and the soft limit, handling between the soft limit and the hard limit, or is at the hard limit. Along with region, this is how the proxy decides which Machine gets traffic.

I.e., only three effectively distinct levels of “load”.

(And since the default soft limit is 1000 connections, that first level lumps a huge amount together.)

Also relevant is the fact that this proxy understands only HTTP requests and generic TCP connections, so it further won’t make a distinction between an idle PG connection and an active one.

It might be worth checking whether your pooler already has knobs for load-balancing at the query layer across multiple addresses obtained via DNS. Otherwise, maybe start with Flycast port 5433 (when writes aren’t required) and see if the results aren’t actually tolerable.

(I don’t run heavily used clusters myself, so my interpretation of the load-balancing aspect is straight from the book—and hence may be overly pessimistic.)

Aside: Admittedly, the above table mixes the results of DNS queries in with various kinds of proxy magic, but it gives the right overall idea…

mayailurus · October 26, 2024, 10:55pm

Added proxy

noahschu · October 27, 2024, 12:06am

Amazing thank you!

People say that vanilla Postgres clients tend to default to always just using the first entry, making the bottom-left square effectively no load balancing—unless additional measures are taken

This has been my experience from the limited testing I have done connecting on 5433.

So my current takeaways are that having replicas in the same region is really just to provide fail over support and not to allow distributing read load across different machines. Correct? If it was I would think a more PG focused load-balancing solution would be in place.

For what I have right now I feel confident that the writer node can be vertically scaled to handle the read requests but it is good to know that it will become a bit of lift if I need to scale out horizontally. Caveat being - I can somewhat leverage different zones as a way to “distribute” read load since I can use the top1.nearest.of in that region.

Regardless I feel much better about my understanding thank you for taking the time to share all this info

system · November 3, 2024, 12:07am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.