How do I connect my node app to postgres to make it resilient to an outage?

Hey everyone :wave:

I was looking to see if anyone here has achieved HA with fly postgres. Today we had some issues with our Fly Postgres which has a standard setup using the fly cli to launch (2 VMs, same region, leader/replica, 2 volumes, etc.) but when there was some outages caused by some volume arrays going down in the single region where the two VMs were deployed, our node apps (running prisma client) started to fail.

Is there a way to better handle this failure on the application client side of things, possibly a way to switch over to the healthy VM quickly and start writing to that while theoriginall leader becomes healthy again?

@dan.wetherald here’s a note we’re thinking of adding to the docs as well. This approach is supported by libpq based libraries, but I’m not sure if your app is based on libpq. It should be possible to use the ideas anywhere, though:


To better use the Postgres HA system Fly offers, there are a few changes you can make on the application server:

Postgres supports a way to specify multiple hosts on the connection query string. The default DATABASE_URL will specifies a URL of the form postgresql://db-app.internal:5432/dbname which will point connections at either the primary or the replica DB instances. If the connection lands on the replica instance, the HA Proxy running in there will proxy the connection to the primary. Either way, if your application client caches the DNS call to db-app.internal and remembers only either the primary or the replica, connections will be inactive a long time in the event that the cached instance fails.

By specifying multiple hosts, it’s possible directly tell your client both the primary and the replica address (and replicas addresses as well), and ask it to choose a suitable read-only or read/write connection. By specifying postgresql://host1:123,host2:456/somedb?target_session_attrs=any, your client will always know alternative hosts that it can connect to in the event of a problem with any one of them. The target_session_attrs property allows you to specify that either read-only or read/write connections are acceptable (any) or that the connection must be writable (read-write). This allows you to configure a replica-ok or primary-only application connection respectively.

Because Fly instance IP addresses are always tied to a specific volume [TK- need to confirm this], you can pre-configure the full list of possible hosts in your DATABASE_URL beforehand. This prevents DNS caching errors, while also letting your application directly cycle between good hosts based on what it needs.

You can also set this up dynamically during application startup by querying the db-app.internal DNS AAAA name using your language’s equivalent of dig. This provides the list of instance addresses that are running your database, and you can set the target_session_attrs according to whether you’re currently configuring the primary or a read replica. Using the new nearest.of.db-app.internal special address (which orders the addresses by nearness) it’s also possible to set up Postgres to connect to closer instances preferentially.

Support for multiple hosts and target_session_attrs is available in the libpq library, which most common Postgres libraries use.


References:

2 Likes

Ah very cool, I am taking a look at what is possible with prisma and it is looking like they are using libpq fortunately.

Here is some options they have for connection string: prisma2/postgresql.md at master · maticzav/prisma2 · GitHub

So for now to get something launched would probably be to just spin up a few more regions so that there is at least two regions where our PG app is running, which we have already gone ahead and done, the leader in ORD, replica in ORD and now a second replica in IAD.

As for constructing the URI, Im not sure I completely understand what should be host1, host2, host3? nor do I completely understand what I should be selecting for the target_session_attrs value based on the fly postgres leaders/replicas/ports, etc. After reading the linked article above, Im not sure if I can use any or read-write for the current example.

To keep things simple, for now, I will need to assign IPs to the app to avoid the DNS cache, as the app currently does not have any IPs when running fly ips list. I have gone ahead and done so for both ipv4 and ipv6, so now I have 2 IPs, but the way I understand this is that these are app level IPs not region based IP, meaning I do not have IPs for host1,host2,host3, etc. This is where I have gotten stuck, am I missing something?

Thanks again @sudhir.j