Postgres replication seems to fail

Hello,
We run a globally scaled image hosting service, and we recently noticed that when someone in Sydney uploads something, it appears in the lead database but not on the replicas (so the user won’t see it on their dashboard). Internally, our backend uses two Prisma (ORM) instances, one for reads and one for writes. The write instance always connects to the leading cluster in Amsterdam, and the read instance connects to the current region’s replica (this eliminates the need for the fly-replay header, if I understood the docs correctly).
Has anyone experienced a similar issue, and if so, how did you solve it?
Thanks in advance

How large are these uploads? And how exactly are you writing to the primary database?

If it’s a relatively large file, there’s probably just some replication lag between when it gets written and when the replica is up to date. Our Ruby library handles this by sending requests to the primary region for 5s after a write, but there are other techniques that could work too.
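As a rough illustration of that "stick to the primary for a few seconds after a write" idea, here is a minimal TypeScript sketch. It is not the Ruby library's implementation; `StickyPrimaryRouter` and its methods are hypothetical names, and the 5000 ms default only mirrors the 5s figure mentioned above.

```typescript
// Hypothetical helper; the names below are illustrative and are not part of
// Prisma or any Fly.io library.
type ReadTarget = "primary" | "replica";

class StickyPrimaryRouter {
  private readonly lastWriteAt = new Map<string, number>();
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so the window can be tested without real timers.
  constructor(windowMs: number = 5000, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Call this right after a successful write for the given user.
  recordWrite(userId: string): void {
    this.lastWriteAt.set(userId, this.now());
  }

  // Decide which connection a read for this user should use.
  targetForRead(userId: string): ReadTarget {
    const last = this.lastWriteAt.get(userId);
    if (last === undefined) return "replica";
    if (this.now() - last < this.windowMs) return "primary";
    this.lastWriteAt.delete(userId); // window expired, forget the write
    return "replica";
  }
}
```

With two Prisma instances, a read path could pick the write instance whenever `targetForRead(userId)` returns `"primary"` and the regional read instance otherwise. Note that the map lives in one process's memory, so with several app instances the "last write" marker would have to travel some other way (a cookie or a shared store, for example).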

If you run fly checks list -a <postgres-name> you can see replication lag check status, too.

-How large are these uploads?
The files are uploaded to AWS before the write request to the DB even begins (and most of them are only a few kilobytes, mostly screenshots).
-And how exactly are you writing to the primary database?
Not sure what you mean here. Here is a link to Prisma for reference: https://www.prisma.io/
-If you run…
Didn’t know about that, thank you.

Oh I missed the note about read/write connections on Prisma. This is a hard problem, but the typical trick is to read from the write connection after an upload.

Does this clear up after a few moments? Also, silly question, but is the Sydney instance of your database healthy? fly status should show you, at least.

Assuming it’s healthy and the data shows up after a refresh, you’ll need to build some logic to work around replication lag.

The fly-replay approach solves this for most people. It would be worth experimenting with unless you have a reason to do writes directly to postgres over a long distance.
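For reference, the fly-replay approach means an app instance outside the primary region answers a write request with a `fly-replay: region=<code>` response header, and the Fly proxy replays the request in that region. A minimal, framework-agnostic sketch of that decision follows; `replayDecision` is a hypothetical helper, and `ams` and `syd` are used only as example region codes for Amsterdam and Sydney.

```typescript
// Hypothetical helper: decides whether a request should be replayed in the
// primary region. The caller is expected to copy `headers` onto the response.
interface ReplayDecision {
  replay: boolean;
  headers: Record<string, string>;
}

const READ_ONLY_METHODS = ["GET", "HEAD", "OPTIONS"];

function replayDecision(
  method: string,
  currentRegion: string,
  primaryRegion: string,
): ReplayDecision {
  // Assumption: anything that is not a read-only HTTP method may write.
  const mayWrite = !READ_ONLY_METHODS.includes(method.toUpperCase());
  if (mayWrite && currentRegion !== primaryRegion) {
    return { replay: true, headers: { "fly-replay": `region=${primaryRegion}` } };
  }
  return { replay: false, headers: {} };
}
```

In a Fastify or NestJS app this could run in a request hook: when `replay` is true, set the header and end the response without touching the database. The current region would typically come from the `FLY_REGION` environment variable; how the primary region is configured is left as an assumption here.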

-Does this clear up after a few moments?
I just checked by switching to a VPN in Melbourne (I’m in Hungary, so I’m normally connected to the leading cluster), and no, it doesn’t. No data seems to replicate.

  • Is the Sydney instance of your database healthy
    Yes
  • The fly-replay approach solves this for most people.
    The main problem is that Prisma abstracts the database logic: Node doesn’t even connect to the DB, the underlying Rust engine does. But if there is no other solution, I might try to figure something out.

We have the beginnings of a node library that makes handling fly-replay almost seamless. Are you using Express or Fastify? See: Fly PG Read Replica Multi-Region Clusters with Prisma / Node - #4 by joshua

I’m going to look at your database to see what’s up. Replication should not be failing unless the DB instance is failing somehow (and you’d see health checks for that).

  • Are you using Express or Fastify
    We are using Nestjs with the Fastify adapter.
  • I’m going to look at your database to see what’s up.
    Thank you
  • We have the beginnings of a node library that makes handling fly-replay almost seamless.
    Looking forward to using it and contributing.

@wowjesus I noticed that your app is running an older image that may be contributing to the health check failures I’m seeing. Do you mind if I push through an update to get your app on our latest image? The latest image also contains improved replication lag checks, which would be useful to reference in this case.

Sorry for the delayed reply, I took a quick nap. And no, I absolutely don’t mind, thank you!

No problem at all! I went ahead and pushed through the image update, and it looks like it cleared the failing health checks. You should now see the new replication lag health checks by running fly checks list.

Also, you can get a more detailed view of the state of replication by running select * from pg_stat_replication; against master.
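When reading that output, the `sent_lsn` and `replay_lsn` columns (present in PostgreSQL 10 and later) show how far each replica has replayed compared with what was sent to it. The sketch below flags replicas whose gap exceeds a threshold; `ReplicationRow`, `lsnDiffBytes`, and `laggingReplicas` are hypothetical names, and the rows are assumed to have been fetched already with whatever client you use.

```typescript
// Only the columns this sketch needs from pg_stat_replication.
interface ReplicationRow {
  application_name: string;
  sent_lsn: string | null;
  replay_lsn: string | null;
}

// An LSN is printed as two hexadecimal numbers separated by a slash,
// e.g. "0/3000060": the high and low 32 bits of a 64-bit WAL position.
function parseLsn(lsn: string): [number, number] {
  const parts = lsn.split("/");
  const hi = parseInt(parts[0] ?? "", 16);
  const lo = parseInt(parts[1] ?? "", 16);
  if (parts.length !== 2 || Number.isNaN(hi) || Number.isNaN(lo)) {
    throw new Error(`invalid LSN: ${lsn}`);
  }
  return [hi, lo];
}

// Byte distance between two LSNs (exact while the gap stays below 2^53).
function lsnDiffBytes(ahead: string, behind: string): number {
  const [aHi, aLo] = parseLsn(ahead);
  const [bHi, bLo] = parseLsn(behind);
  return (aHi - bHi) * 4294967296 + (aLo - bLo);
}

// Names of replicas that are behind by more than maxLagBytes, or that
// report no position at all.
function laggingReplicas(rows: ReplicationRow[], maxLagBytes: number): string[] {
  return rows
    .filter(
      (r) =>
        r.sent_lsn === null ||
        r.replay_lsn === null ||
        lsnDiffBytes(r.sent_lsn, r.replay_lsn) > maxLagBytes,
    )
    .map((r) => r.application_name);
}
```

A replica that is missing from the result of the query entirely is not connected to the primary at all, which is a different failure from a replica that is connected but lagging.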

Hope that helps!

Thank you so much, the issue seems to be solved. I appreciate the fast replies and help!

Also one more question, where can I see if a new version of the pg image gets pushed?
GitHub?

That’s a great question. While you could monitor our GitHub repo for new releases, there’s currently no way for you to know which version you’re running. That being said, this is a problem I am actively working on, and I’m hoping to have it addressed within the next week or two. So be on the lookout.

I will, thank you.