Multi-region Postgres deployment and consistency

I have a question about what happens if I deploy a Postgres cluster following the recommended approach: Multi-region PostgreSQL · Fly

Does Fly ensure that a write request doesn’t complete until the write has been propagated to every region? If not, how do we avoid inconsistencies? If a secondary region sends a write to the primary region and then immediately reads from its closer region before the change has propagated, it would get old data.

So my question is: does Fly ensure writes don’t complete until propagation has finished? If not, how are inconsistencies avoided?

There are some notes on consistency here; the TL;DR is that you might be able to just ignore the problem for many kinds of apps. Replication usually finishes before the subsequent HTTP request arrives, and if the cost of stale data is low, that’s probably good enough: Globally Distributed Postgres · Fly

Our Ruby gem “pins” people to the primary region for a few seconds after a write. @josh wrote the code for this, but the basic idea is that you set a cookie after a write. When new requests come in, we check whether the cookie exists, and if it does, we fly-replay the request to the primary region.
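As Rack middleware, the idea could look roughly like this. This is not the actual gem’s code; the cookie name, the five-second window, and the 409 status are made-up illustrations, but the fly-replay response header and the FLY_REGION environment variable are real Fly features:

```ruby
# Minimal sketch of "pin to primary after a write" as Rack middleware.
# Not the real fly-ruby implementation; cookie name and window are assumptions.
class PinToPrimary
  PIN_COOKIE  = "fly-pin-to-primary".freeze
  PIN_SECONDS = 5

  def initialize(app, primary_region:)
    @app = app
    @primary_region = primary_region
  end

  def call(env)
    request = Rack::Request.new(env)

    # If this client wrote recently and we are not in the primary region,
    # ask the Fly proxy to replay the request there instead of serving a
    # possibly stale read from the local replica.
    if request.cookies[PIN_COOKIE] && ENV["FLY_REGION"] != @primary_region
      return [409, { "fly-replay" => "region=#{@primary_region}" }, []]
    end

    status, headers, body = @app.call(env)

    # Treat any non-GET/HEAD request as a write: set a short-lived cookie
    # so the next few requests get pinned to the primary.
    unless %w[GET HEAD].include?(request.request_method)
      Rack::Utils.set_cookie_header!(headers, PIN_COOKIE,
                                     value: "1", max_age: PIN_SECONDS, path: "/")
    end

    [status, headers, body]
  end
end
```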

Yikes :grimacing: I hadn’t considered this problem. In my app, new reads happen immediately after a write, which means those reads could be out of date unless I put something like this in place. That seems like a great deal of added complexity. I wish there were a more reliable way to know for sure whether the regional DB is up to date with the latest write. How long does propagation typically take, and are there any guarantees on how long it could take?

New reads happen in separate requests? Or write + read in the same request?

Replication is very fast in most cases, but there are no guarantees. Replicas can fall behind for a bunch of reasons (this is true even within a single datacenter; it was the biggest issue we had with overloaded databases at compose.com).

We should be able to make it transparent when we build an npm package!

What kinds of writes is your app going to do? One other way to solve this is to increment a data version after you do a write, like:

UPDATE app_info SET data_version = data_version + 1;

Then set that value in a cookie for the user. On each HTTP request in a replica region, you can query the current version and compare it with the cookie to decide whether the replica is caught up for that user.
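A rough Rails-flavored sketch of that, assuming a one-row app_info table with an integer data_version column (the helper names and cookie lifetime here are made up):

```ruby
# Hypothetical controller helpers for the version-cookie check.
# Assumes a single-row app_info table with an integer data_version column.

# Call this on the primary right after a write.
def bump_data_version!
  row = ActiveRecord::Base.connection.execute(
    "UPDATE app_info SET data_version = data_version + 1 RETURNING data_version"
  ).first
  cookies[:data_version] = { value: row["data_version"].to_s, expires: 1.minute.from_now }
end

# Call this in a replica region before trusting a read.
def replica_caught_up?
  return true if cookies[:data_version].blank?

  replica_version = ActiveRecord::Base.connection
    .select_value("SELECT data_version FROM app_info").to_i
  replica_version >= cookies[:data_version].to_i
end
```

If replica_caught_up? returns false, you could fly-replay the request to the primary region (like the gem does) or just query the primary directly for that request.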

New reads happen in separate requests. Having an app_info table seems like a reasonable solution I may pursue to avoid arbitrarily making non-primary readers wait after writing. It’s just kind of a bummer that there’s no way to make the write request hold off on completing until propagation has finished. I think FaunaDB does this by default, and I was hoping for the same here; I kind of assumed that was the case until I started thinking more about this.

Just a bunch of extra complexity I don’t want to worry about in my app… But we’ll see what it looks like when I’m done I guess.

I wanted to see if I could trigger a consistency issue between requests, so I tweaked our Rails demo: https://fly-global-rails.fly.dev/

When you click “record results”, it replays the request in Santiago, Chile, stores the last submitted ID in a cookie, then redirects to a new URL to make a second HTTP request. After a save, it now shows the cookie value alongside the latest ID in the current database replica. I haven’t been able to get them to diverge yet. I’m going to see if I can simulate longer write lag to at least show how it might work.

I think you were looking to credit Joshua Sierles, not me. :slight_smile:

Dangit! @jsierles is who I meant. :wink:

You might get the synchronous replication behavior you describe using the Redis WAIT command. The tradeoff is that if there’s a persistent network issue, your app could stall on writes while waiting for replicas to catch up.
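For reference, WAIT from the redis gem looks roughly like this (the exact way to send the command can vary between redis-rb versions):

```ruby
# Rough sketch of WAIT. It blocks until the given number of replicas have
# acknowledged all preceding writes, or until the timeout (in milliseconds).
require "redis"

redis = Redis.new(url: ENV.fetch("REDIS_URL", "redis://localhost:6379"))
redis.set("greeting", "hello")

# Returns the number of replicas that acknowledged the write.
acked = redis.call("WAIT", 1, 1000)
puts "write acknowledged by #{acked} replica(s)"
```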

EDIT: Sorry, I just noticed this post is about Postgres. The same idea applies, though, and it is possible to enable synchronous replication in Postgres.
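A hedged sketch of the per-transaction version of that from a Rails app. It only actually waits if the primary’s synchronous_standby_names setting lists at least one standby (which the stock Fly setup may not configure), and remote_apply needs Postgres 9.6 or newer:

```ruby
# Sketch: make one critical write wait until a synchronous standby has applied it,
# so a follow-up read on that standby can see the change.
# `order` is a placeholder model; the surrounding app code is assumed.
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute(
    "SET LOCAL synchronous_commit TO 'remote_apply'"
  )
  order.save!
end
```

Scoping it with SET LOCAL keeps the stall risk mentioned above limited to the writes that actually need the guarantee, rather than slowing down every commit.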

That’s good to know. If I experience issues I’ll look into this :+1:
