How do you improve DB performance for users across the world?

So with Fly I have been deploying to sin, fra, and sjc, with my DB in SFO (on DigitalOcean currently). The issue I’m having is that requests from California respond in around 200ms, whereas requests from Germany and India take 2-3 seconds. I was wondering what people do in this situation? I think moving my DB to Fly might help, but I’m not sure how. Prisma doesn’t support read-only replicas, so the same DB URL needs to have both read and write permissions. Here are the statistics from my tests:

| Location | Basic endpoint (returns static JSON) | Endpoint with SQL (3-4 queries) |
| --- | --- | --- |
| California | 195ms | 210ms |
| India | 489ms | 2.82s |
| Frankfurt | 233ms | 2.31s |

Checks are done by checklyhq.com

We have plumbing for this! It’s a little ragged, but you can get it to work with read replicas. Check this out; it’s using a Postgres cluster in Santiago, Chile with read replicas elsewhere in the world. Try posting something and you’ll see the region change: https://rails-on-fly.fly.dev

This works by catching read-only errors and telling our proxy to replay those requests to Chile: rails-on-fly/application_controller.rb at main · fly-apps/rails-on-fly · GitHub

It should be possible to do this with Prisma as well, assuming you can put a little Express middleware in there. The Rails app uses the read-only replica as its only database connection in the other regions: rails-on-fly/fly.rb at main · fly-apps/rails-on-fly · GitHub
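Roughly, the Express version of that controller rescue might look something like this untested sketch. The `Post` model, the env var fallbacks, the error-string match, and the 409 status are all assumptions; the Rails controller linked above is the real reference.

```typescript
// Sketch: replay writes to the primary region via the fly-replay response header.
// Assumes DATABASE_URL in each region points at the nearest (read-only) replica.
import express, { Request, Response, NextFunction } from "express";
import { PrismaClient } from "@prisma/client";

const app = express();
app.use(express.json());

const prisma = new PrismaClient(); // assumes a `Post` model in your Prisma schema

const PRIMARY_REGION = process.env.PRIMARY_REGION ?? "scl"; // where the writable primary lives
const CURRENT_REGION = process.env.FLY_REGION ?? PRIMARY_REGION;

app.post("/posts", async (req, res, next) => {
  try {
    // In a read-only region this write fails with a Postgres
    // "cannot execute INSERT in a read-only transaction" error.
    const post = await prisma.post.create({ data: req.body });
    res.json(post);
  } catch (err) {
    next(err);
  }
});

// Error handler: if a write hit a replica, ask Fly's proxy to replay the
// whole request (method, headers, body) in the primary region instead.
app.use((err: unknown, req: Request, res: Response, next: NextFunction) => {
  const message = err instanceof Error ? err.message : String(err);
  const isReadOnlyError = /read-?only transaction/i.test(message);

  if (isReadOnlyError && CURRENT_REGION !== PRIMARY_REGION) {
    res.set("fly-replay", `region=${PRIMARY_REGION}`);
    return res.status(409).end();
  }
  next(err);
});

app.listen(8080);
```

The upside over keeping two connection strings in the app is that each region’s app only ever talks to its local replica, and only requests that actually write pay the cross-region cost.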

If you want to try this out, we can help you get Postgres configured properly. It’s currently a little more difficult than it should be to set up a bunch of read regions, but we really want this to be slick. Especially with Prisma!

I had no idea fly-replay existed. Is this documented somewhere? I assume the proxy picks up the header in the response, replays the exact request (with headers, body, etc.) in the given region, and then sends the response from that region instead of the region the request was originally directed to?

That’s exactly it! It’s not documented anywhere yet. We’ve been working with a couple of folks to solve the multi-region DB problem, and this is the best we’ve come up with.

Thanks! I’m currently trying to architect the best way to do this as well. I’m not sure if it makes more sense to maintain two database connections for read/write or to just do a replay here on Fly.

The project we’re working on will be open source but we’re planning to use Fly for the managed/hosted version, so I’m really trying to avoid a lot of special changes to make things work on/off Fly in the application itself.

Our plan is to have libraries that make the Fly bits almost seamless. What language is your app written in?

The problem we found with read/write connections in the app is that they’re still terribly slow cross region. Shipping the whole request over makes everyone’s n+1 queries quick.

The project is all Go.

We embed NATS and a distributed cache service. The only outside dependency right now is the database, and we only plan to support Postgres.

Ah, that sounds super cool! What Postgres driver are you using? I can try and get you an example config this week. We want this whole process to be transparent, so we should be able to come up with something that works for your OSS version and opts in to Fly region handling.

I’m curious: what’s the benefit of this over using the direct read/write split support in Rails? I set things up that way on Fly and it works great, as long as the Consul cluster stays up :laughing:

Of course it’s very cool that this is possible for any framework that doesn’t support the split.

This is pretty cool, but it seems like a bit of effort that I don’t have the time (or expertise) for. I ended up switching to only using the SJC data center (+ Cloudflare) and saw Frankfurt drop down to 500ms.

Direct read/write support works great for local data centers. From what we’ve seen, though, most apps don’t work well with high latency between app servers and the primary database. Requests with writes tend to do multiple queries, so even 50ms of added latency stacks up (five sequential queries means an extra ~250ms on one request). Our replay feature works around that, but you’d still be better off with whatever your framework suggests in the primary region.

@nahtnam we’re working on minimizing the effort. When you’re ready, let us know and we’ll make it as easy as possible. If HTTP caching works for you, though, that’s great.

I see. Now that I’ve re-read the code, it looks like a pretty cool concept. I’ll give it a spin!

We use pgx, though it has no built-in support for separate read/write configurations. We have a configuration option for separate read/write clients that we originally intended to use on Fly, but I think the replay will be the route we push for, and we’ll likely remove the read/write configs.

Happy to work with you on this. I fully intend to implement the one-click Fly deploy for this whenever we are more stable (and it’s available for everyone to offer). We’re not trying to lock people into a managed service if they don’t want/need it. We hope to add value elsewhere as well.

It seems it would be faster if you shut down any frontend that doesn’t have a DB nearby, even for users in India or the EU.

That’s the plan! We’d like to add some magic here to scale app servers alongside DBs, and even bring up DB replicas in regions that get busy. Maybe not full serverless, but close!

That’s exactly what I did!

Went from 3 seconds to under 1.

We need to document some of this. There are a few stages of “make apps faster” and I think we could have saved you some trouble just by writing them down!

Sounds good. I likely wouldn’t switch until external connections are allowed, there’s some sort of backup system (or support for backing up to an S3 bucket), and some sort of admin interface for the DB. Quite a high bar, but I honestly don’t have the ability to set up and maintain something that doesn’t have those features.

We’re thinking through our managed offering now so it’s really helpful to hear your sticking points. Thanks for sharing.

You can do external connections now. Save the config file with flyctl config save, add a services section to expose port 5432, then deploy. We’ll add that to the docs.
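For reference, a minimal sketch of what that services section in fly.toml might look like; the external port number, the empty handler list, and the hostname below are assumptions, and the exact shape may differ from what flyctl config save produces for your Postgres app.

```toml
# Hypothetical fly.toml services section for exposing Postgres externally.
# internal_port is where Postgres listens inside the VM; the external port
# (10000 here) is an arbitrary choice of a free port.
[[services]]
  internal_port = 5432
  protocol = "tcp"

  [[services.ports]]
    handlers = []   # raw TCP passthrough, no TLS termination at the edge
    port = 10000
```

After deploying, you’d connect with something like postgres://user:password@your-pg-app.fly.dev:10000/dbname (again, an assumption about the hostname and port).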

We’ve also been doing volume snapshots but haven’t exposed them in the API or flyctl yet; we should have more on that this week or next.

What do you think you’d want in an admin panel? RDS-like control over the cluster? Or do you mean some visibility into the cluster state, but generally it’s all managed for you, like Heroku?
