Fly PG Read Replica Multi-Region Clusters with Prisma / Node

Hello everyone!

We are getting ready to deploy a new version of our service on Fly and would love to take advantage of the Multi-Region Postgres Clusters but we are using Prisma/Node for this new project.

Looking over a few discussions/issues on Github it does not look like there is an “official” solution for solving this, wanted to see if anyone has got something up and working yet.

We see this as a SUPER powerful and valuable offering from Fly and would love to utilize it.

Being that we serve optimized checkout pages for merchants who have customers all over the globe accessing their checkout pages at the same time, this could be a big performance enhancement (and natural load balancer) for us :muscle:

Thanks in advance!

Hey Dan! We’ve got a few customers successfully working with Prisma on this. Our fly-node library is a very bare bones collection of helpers to make this work. We’d like to make this library more useful.

It would be helpful if you could describe a bit more about your app behavior and which frameworks you’re using, if any.

Here’s another example of a project built by @kentcdodds on Remix.js that takes advantage of request replays as well as writing directly to the primary region over the Fly private network.

Depending on your app behavior, one or both of these techniques should be easy to plug in. Feel free to DM me here as well!

Joshua

1 Like

Hey @joshua - thanks for getting back to me!

We plan to deploy a few apps on Fly (all with regional autoscale):

Merchants:

  • Merchant GraphQL Relay - Node API - Prisma
  • Merchant Dashboard - React

Checkouts:

  • Checkouts GraphQL Relay - Node API - Prisma
  • Checkout Client - NextJS

DB:

  • Postgres Cluster

The Checkout Client will be a NextJS web server with cache that will auto scale up and down based on load and geo. The idea is to have the client deployed as close to the customer as possible for optimal page load times.

This being said, we would like these to also have have the Checkouts GraphQL API nearby as well, which will also need a local PG Read Replica nearby to prevent an overall bottle neck at the DB level (for reads).

I have seen a few posts/threads out there that have the prismaRead and prismaWrite concepts, which could easily be the solution, but I wanted to see what the team at Fly recommends for such a setup. It would be nice if there was no need to manually change each resolver to use the read vs write connection and behave similar to your Rails example.

Great, that helps a lot!

So the helpers in the node package linked above are middleware designed to hook into an Express server. This is functionally equivalent to how the Ruby gem operates. The logic is simple enough: POST requests or GET requests that throw write exceptions are replayed in the primary region. We wrote about this approach in detail in this blog post.

However, this is a bit harder to pull off in a generic way with GraphQL servers, since all requests are POSTs. But the natural separation between write mutations and read queries may be helpful here. If your framework has a way to inject some “middleware” - or anywhere pre-request - it should be possible to centralize logic to decide which requests should be replayed.

This is the first step. The gotchas exist in the space of reads that write. Like writes to statistics tables, or a last_seen_at field every time a user hits the app. These are good candidates for the prismaWrite concept.

If you could provide some example code using the frameworks you chose, we can work from there! Or feel free to DM and invite to a private repo if that works for you.

Joshua

1 Like

I will shoot you a DM to get you some code examples, but would love to keep this going in public as well for the rest of the community.

OK, first thing: which web server and/or API framework are you using with NodeJS?

Web Server: apollo-server
API Framework: graphql / graphql-relay

So the key is really knowing which requests are going to write to your database, and are therefore replayable. If, for example, you only write in mutations, the job is a lot simpler!

I see a few options with Apollo Server. I’ll need to dig in more to give you some code to work with. But here’s the general idea.

We’d want to be able to hook into the framework in two places: at the start of the request, and in cases where write exceptions from the database are raised.

For #1, we want to be able to stop request completely and return a HTTP 409 response with the Fly-Replay header. Ideally this happens as far up the stack as possible. For example, you could simply check if the body of the request contains mutation and replay it in your primary region. This might be good enough to start out!

One way to do this with Apollo Server would be to switch to using it as express middleware. Here we could pass in a middleware before Apollo Server that literally checks for mutation in the body. We could start here as an experiment.

Another way might be to create an Apollo Server plugin and hook into its requestDidStart event, or another event deeper in the stack should you need more context.

One more thing to think about is whether your UI will be fetching data immediately after it writes. If you do this, there’s a risk that the query will return stale data from the nearby replica. There are a few ways to deal with this, such as setting an expiring cookie to temporarily send queries to the primary region.

I hope this all makes sense. I’ll look at setting up an example with the express middleware and apollo server.

1 Like

Awesome! This sounds very interesting. I am definitely following and excited to see some examples :raised_hands:t2:

Hello, I didn’t read the full thread but this might give you some ideas.
We have a similar setup (Node - Nestjs - Prisma)
And what we do to access replicas (and avoid the magic with the fly-replay header), is we make 2 instances. 1 for write (always connects to our leading db) and 1 for read (connects to the current replica and if it’s not present, then it just falls back to the leading db).
This involves some code changes though, because you always need to specify which instance you want to use, this should be straightforward. I recommend wrapping these 2 instances into 1 class/object so you can just do databaseService.read.table.readingAction or databaseService.write.table.writingAction.
Hope it helps!
Also, here is a little utility to parse the current replica URL. (typescript)

private getRegionalDbUrl(
    dbUrl: string,
    region: string,
  ): { root: string; replica: string } {
    const parts = dbUrl.split('@');
    return {
      replica: `${parts[0]}@${region}.${parts[1]}`.replace('5432', '5433'),
      root: dbUrl,
    };
  }

I worked up a minimal example using Prisma and the Apollo Server stack. This example will catch write errors to Postgres in regions with read replica, then replay the request in the primary Postgres region.

This approach is generally way more efficient than writing to the database over the network, as the latter will suffer from latency. While this approach is naive, you could easily modify this to replay all mutations automatically, as mentioned above. I’ll be looking into ways to do that. For now, this may suffice for testing purposes.

What you’ll want to test properly is whether you see replication lag in your UI. For example, you might create a product and redirect a user to a ‘product view’ page. That product data may not have arrived in the replica. There are various ways to solve this, like setting a cookie. We can discuss as things progress.

There are no doubt other edge cases to consider, so do bring them up here.

1 Like

Woh! This is an amazing start :tada:

I will absolutely play with this. We would mostly be using this replication on the API that powers checkout pages, so the goal is loading the checkout page as fast as possible (typically this includes product information ie. line items) and possibly pre-filled information from the customer from a prior session) So this “should” eliminate some of the replication lag edge cases.

Possibly some issues with dynamic “calculation” fields within our nodes that run things like “total” on the order, if some line items have not been added to the replica this could return incorrect financial calculations possibly?

Thanks again!

1 Like

Generally the problems will occur when you want to RYOW (read your own write) from the client. It’s probably best to try out your app and see what works.