Sort of. Since you’re connecting to port 5432, the HA setup that Fly uses will always send the connection over to the primary even if the IP is pointing at the replica, so you’re 99% OK and not too far from where we started.
But I can see how this would be worrying, so it makes sense to run the connections straight to Postgres for now, until the bouncer is cleanly deployed. Either way you’re still in the single digits for application connections, so I’m not sure the bouncer is an immediate necessity. I usually don’t add a bouncer until the first major connection-pool-related outage.
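To make the port-5432 behavior concrete, here’s a minimal node-postgres sketch; the app name `my-db`, the credentials, and the database name are placeholders, not values from this thread:

```typescript
import { Client } from 'pg';

// Hypothetical app name and credentials. On Fly's private network, port 5432
// is fronted by the HA proxy layer, so this connection should always land on
// the primary, regardless of which instance the DNS name resolves to.
const client = new Client({
  connectionString: 'postgres://app_user:secret@my-db.internal:5432/app_db',
});

async function main() {
  await client.connect();
  // pg_is_in_recovery() is false on the primary and true on a replica,
  // so through port 5432 this should always print false.
  const { rows } = await client.query('SELECT pg_is_in_recovery() AS replica');
  console.log('landed on a replica?', rows[0].replica);
  await client.end();
}

main().catch(console.error);
```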
Totally. We mainly wanted a better understanding, and sure, we’d like to keep the option of switching to PgBouncer in case the VMs start scaling up and Prisma starts eating up connections.
As a side note, to better understand the port 5432 / IPv6 address behavior: when a replica becomes the leader because the leader has died, will the replica take over the address previously owned by the leader?

I don’t entirely understand how the leader is replaced when it dies. Is a new leader created, or is the replica used immediately while a new leader spins up? Does the replica become writable if it’s the new leader? If so, and a new leader is spun up, will it use the same IPv6 address as the original leader?
- Postgres HA always creates two volumes and two VMs/instances.
- They communicate over Consul to figure out who’s the leader (primary) and who’s the replica.
- Writes always go to the primary.
- There’s an HAProxy layer that forwards all connections on port 5432 to the primary, irrespective of which instance the connection is made to.
- Connections on port 5433 are handled right there by whichever instance you hit, with no proxying (see the sketch after this list).
- If one instance goes down, the other instance will take over as primary.
- The instance that went down will be restarted, and since the IP is pinned to the volume, when it comes back up it will have the same IP address.
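Here’s a hedged sketch of how you could check those roles yourself: `my-db.internal` resolves to every instance’s IPv6 address, and port 5433 skips the proxy, so asking each address directly shows who’s primary (the app name and credentials are placeholders again):

```typescript
import { promises as dns } from 'node:dns';
import { Client } from 'pg';

// Resolve all instance addresses for the (hypothetical) app "my-db", then ask
// each instance directly over the un-proxied port 5433 whether it's a replica.
async function mapRoles() {
  const addrs = await dns.resolve6('my-db.internal');
  for (const addr of addrs) {
    const client = new Client({
      host: addr,
      port: 5433, // direct port: no forwarding to the primary
      user: 'app_user',
      password: 'secret',
      database: 'app_db',
    });
    await client.connect();
    const { rows } = await client.query('SELECT pg_is_in_recovery() AS replica');
    console.log(`${addr}: ${rows[0].replica ? 'replica' : 'primary'}`);
    await client.end();
  }
}

mapRoles().catch(console.error);
```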
So let’s say that in our current case you were lucky enough to choose the IP of the instance that stayed up, and that’s the primary: you’d experience no interruption if the replica went down. If the primary went down, downtime would be the time it takes for the primary to restart; if it sees it’s still the primary it takes the connections itself, and if the other instance has taken over, all connections will be proxied to it.
If you chose the IP of the replica, all your connections would be proxied to the primary and you wouldn’t notice.
If the primary went down, downtime would be the time it takes for the replica to elect itself; if the replica went down, downtime would be the time it takes for it to restart.
So not doing DNS resolution might, in some cases, add downtime equivalent to the time it takes to restart the service.
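As a rough illustration of the two options (the IPv6 address, app name, and credentials are made up):

```typescript
// Pinned to one instance's raw IPv6 address: if that instance dies, new
// connections fail until it restarts, even though the other instance is fine.
const pinned = 'postgres://app_user:secret@[fdaa:0:1234:a7b:1::2]:5432/app_db';

// Using the internal DNS name: each *new* connection resolves to whatever
// instances are currently up, so re-connects find the survivor right away.
const resolved = 'postgres://app_user:secret@my-db.internal:5432/app_db';
```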
So it seems that by pinning PgBouncer to a single IP, HA no longer gives 100% uptime, since it takes time to start the new instance up, and I’m guessing that in the Postgres world this is pretty slow.
Just out of curiosity: is my understanding correct that if this was properly pinned to the internal DNS record, it would instantly switch to the replica as the leader once it determines that the leader is dead?
Should we go about forking this PgBouncer image to get away from Alpine and move it to slim? I’d assume a lot of users here running Fly Postgres would need a PgBouncer solution.
> HA no longer gives 100% uptime
> it would instantly switch to the replica
Unfortunately, 100% uptime or instant switchover was never on the table. The election takes non-zero time (not sure how long), there’s a lag before the election trigger considers the primary “dead”, etc. And Postgres restarts are fairly quick, but the real time-to-effective-operation depends on DB size, really: the server starts quickly, but it needs to read a lot into RAM before the cache hit rate is effective.
> if this was properly pinned to the internal DNS record
It is properly pinned, in the sense that the internal DNS is consistent with what is supposed to be running at all times. If there’s a deviation from what we want, we want to fix the deviation, not the DNS. So if we lose a VM we do change the DNS immediately, while we try to bring up the old VM with its pinned address.
But most changes of any sort will cause a few connections to become invalid and error out, HA or not. The only question is how much downtime there is until newly established connections are guaranteed to work.
The HA system with DNS lookups attempts to minimise that time, but once a connection is established the DNS is not looked up again for the lifetime of that connection — so all connections held to a primary will break when it goes down, no matter how fast the replica elects itself.
The advantage gained by having a DNS entry updated quickly is that the re-established connections will quickly see only the other instance, even before the restart finishes.
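From the application side, that suggests a simple pattern, sketched here with the hypothetical `my-db.internal` name from earlier: don’t try to keep connections alive through a failover, just retry, because each fresh connection performs a fresh DNS lookup.

```typescript
import { Pool } from 'pg';

// The pool connects by DNS name, so every newly opened connection re-resolves it.
const pool = new Pool({
  connectionString: 'postgres://app_user:secret@my-db.internal:5432/app_db',
});

// Retry wrapper: connections held to a dead primary will error out once;
// subsequent attempts open fresh connections that see the new primary.
async function queryWithRetry(sql: string, attempts = 5) {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await pool.query(sql);
    } catch (err) {
      console.warn(`query failed (attempt ${i}), retrying:`, err);
      await new Promise((resolve) => setTimeout(resolve, 1000 * i));
    }
  }
  throw new Error('database unreachable after retries');
}
```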
And yeah, I’m thinking of forking this and also raising a PR. If you’re up to it you can try it before I get around to it, but I’m thinking early next week.