Data replication delay solutions

I’ve noticed with both Postgres and SQLite (LiteFS) that the application in a non-primary region can sometimes show stale data after a mutation, because there’s a slight delay in data propagation. Does anyone have a good solution to this?

I deployed the LiteFS getting started demo app and created a video to demonstrate what I mean:

Tips welcome

You can use the same fly-replay header to redirect their request to the primary region (see the docs). This takes a little more work in the application, but it is the preferred option when only a very small portion of the data needs to be fully synchronized.

There are a few ways to handle it. The simplest is to set a timestamp in a cookie on the client after a write and then redirect its read requests to the primary for a few seconds. Propagation is typically subsecond so this handles most cases.
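Here is a rough sketch of that cookie approach in Python, not tied to any particular web framework. The cookie name `last_write_ts`, the 5-second window, and the helper names are all made up for illustration, and it assumes your app already knows its own region and the primary region (for example from environment variables). The only Fly-specific piece is the `fly-replay: region=...` response header.

```python
import time

WRITE_COOKIE = "last_write_ts"  # hypothetical cookie name
STICKY_SECONDS = 5.0            # hypothetical window; propagation is usually subsecond


def mark_write(now=None):
    """Cookie value to set on the response after a successful write."""
    now = time.time() if now is None else now
    return f"{now:.3f}"


def replay_headers(cookies, current_region, primary_region, now=None):
    """Return headers that replay a read to the primary, or {} to serve locally.

    `cookies` is a plain dict of request cookies.
    """
    now = time.time() if now is None else now
    if current_region == primary_region:
        return {}  # already on the primary, nothing to do
    try:
        last_write = float(cookies.get(WRITE_COOKIE, ""))
    except ValueError:
        return {}  # no cookie, or not a number
    # abs() tolerates small clock skew and rejects nonsense like "inf".
    if abs(now - last_write) < STICKY_SECONDS:
        return {"fly-replay": f"region={primary_region}"}
    return {}
```

In a request handler you would call `replay_headers(...)` before touching the database and, if it returns a non-empty dict, send an empty response with those headers instead of rendering the page.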

LiteFS provides a replication position file for each database. If your database is called db, the position file is called db-pos. Reading this file gives you the transaction ID and the database checksum, formatted as two slash-separated, hex-encoded uint64 numbers (e.g. %016x/%016x). The first number (TXID) increases monotonically, so you can read it on the primary after the write and then have your replica wait until it catches up to that number.
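A minimal sketch of the replica side of that in Python, assuming the position file has the format described above. The function names, the timeout, and the polling interval are just illustrative choices, and how you get the primary's TXID over to the replica (a cookie, a header, etc.) is up to you.

```python
import time
from pathlib import Path


def parse_pos(text):
    """Split 'TXID/CHECKSUM' (both hex) into two integers."""
    txid_hex, checksum_hex = text.strip().split("/")
    return int(txid_hex, 16), int(checksum_hex, 16)


def read_txid(pos_path):
    """Read the current TXID from a position file such as 'db-pos'."""
    return parse_pos(Path(pos_path).read_text())[0]


def wait_for_txid(pos_path, target_txid, timeout=2.0, interval=0.01):
    """Poll until the local TXID reaches target_txid.

    Returns True if the replica caught up, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_txid(pos_path) >= target_txid:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

On the primary you would call `read_txid` right after the write and hand that number to the client. On the replica, call `wait_for_txid` with it before reading, and fall back to replaying the request to the primary if it returns False.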

Postgres has a replication position as well, called the Log Sequence Number (LSN). You can track the replica’s position with that number. I’m sure there are other folks with more expertise on that than me, though.
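As a rough illustration of the LSN idea: on Postgres 10 and later you can read the primary's position with `SELECT pg_current_wal_lsn()` after the write, and the replica's with `SELECT pg_last_wal_replay_lsn()`. Both come back as text like `16/B374D848` (two hex numbers, high and low 32 bits). The Python helpers below are hypothetical and only do the comparison. Running the queries is left to whatever database driver you use.

```python
def parse_lsn(text):
    """Convert an LSN string like '16/B374D848' into a single integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replica_caught_up(replica_lsn, target_lsn):
    """True once the replica has replayed at least up to target_lsn."""
    return parse_lsn(replica_lsn) >= parse_lsn(target_lsn)
```

Comparing the raw strings would give wrong answers (for example `A/0` sorts after `10/0` as text), which is why the sketch converts to integers first.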
