Wrong automatically set DATABASE_URL resulted in poor performance

kombustor · March 11, 2023, 5:27pm

Hello everyone,

I recently set up a Fly app and attached a Postgres DATABASE_URL with the fly pg attach command. I also created a 2 node cluster of the database, with 1 leader and 1 replica. The app was deployed to the same two regions. The automatically assigned DATABASE_URL was using the hostname “top2.nearest.of.myapp.internal”.

However, I noticed that this connection always connected to the worst (furthest away) replica instead of the closest one, i.e. the top 2 with 2 replicas. This resulted in super slow queries that affected the performance of my app.

After some investigation, I found that setting the DATABASE_URL to “top1.nearest.of.myapp.internal” solved the issue. This way, the connection always connected to the closest replica, reducing the round trip time and improving the app’s performance.

Why was the automatically assigned URL the top2 one and not top1? Maybe I am misunderstanding something.

Best regards

MatthewIngwersen · March 13, 2023, 5:22pm

Hi! This is an interesting question. Thanks for bringing this up!

First of all, does your DATABASE_URL have you connecting to port 5432 or to port 5433? Connections made to port 5432 on any of your PG instances, whether leader or replica, will be proxied to the leader. As a result, writes to the database will “just work,” but the downside is that if the PG leader is in a different region, it will always be slow if you use port 5432. Connecting to port 5433 will go directly to the instance you’re connecting to, so if it’s in the same region, it should be fast. On the other hand, that instance might be read-only, so your application will need to have a way to redirect writes to the leader. You can read more about all this here.

Regarding the default of top2: I asked internally, and this is to ensure that even if the closest PG instance is down, your app can still connect to a different instance. A DNS request for top2.nearest.of.<database-app>.internal will return the addresses of the two closest PG instances, in order. If your particular application/programming language/DB library/framework always tries the first address returned first, that will be the closest one, and it will work as you expect. But if it does something more sophisticated (like choosing between the two randomly), then it might sometimes choose the further replica.

It’s also worth noting that your configuration isn’t what we generally recommend. Fly PG was built for 2+ instances in the primary region, with one replica in each remote region. In particular, the leader must be in the primary region, so for redundancy you’ll want to have at least two instances there. (The link above has some info on this.)

If you want to troubleshoot your specific situation further, it’d be helpful to have a little more information on what you were seeing. (E.g. were you observing very slow queries from both of your app instances, or just one? How are you checking which PG instance is receiving connections? Also, can you check whether your database is running on the machines platform?)

system · March 20, 2023, 5:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
No access to DATABASE_URL hostname	3	149	April 2, 2024
Database connection problem Questions / Help	4	377	May 27, 2022
What hostname to use for postgres to use stolon proxy Questions / Help elixir , postgres	0	280	October 26, 2022
Client Side Postgres High Availability Connection Tips how-to	0	1271	November 22, 2021
Retrieving direct DATABASE_URL after losing it (without internal host?) Questions / Help postgres	4	391	October 3, 2023

Wrong automatically set DATABASE_URL resulted in poor performance

Related topics