I recently set up a Fly app and attached a Postgres DATABASE_URL with the fly pg attach command. I also created a 2 node cluster of the database, with 1 leader and 1 replica. The app was deployed to the same two regions. The automatically assigned DATABASE_URL was using the hostname “top2.nearest.of.myapp.internal”.
However, I noticed that this connection always connected to the worst (furthest away) replica instead of the closest one, i.e. the top 2 with 2 replicas. This resulted in super slow queries that affected the performance of my app.
After some investigation, I found that setting the DATABASE_URL to “top1.nearest.of.myapp.internal” solved the issue. This way, the connection always connected to the closest replica, reducing the round trip time and improving the app’s performance.
Why was the automatically assigned URL the top2 one and not top1? Maybe I am misunderstanding something.
Hi! This is an interesting question. Thanks for bringing this up!
First of all, does your DATABASE_URL have you connecting to port 5432 or to port 5433? Connections made to port 5432 on any of your PG instances, whether leader or replica, will be proxied to the leader. As a result, writes to the database will “just work,” but the downside is that if the PG leader is in a different region, it will always be slow if you use port 5432. Connecting to port 5433 will go directly to the instance you’re connecting to, so if it’s in the same region, it should be fast. On the other hand, that instance might be read-only, so your application will need to have a way to redirect writes to the leader. You can read more about all this here.
Regarding the default of top2: I asked internally, and this is to ensure that even if the closest PG instance is down, your app can still connect to a different instance. A DNS request for top2.nearest.of.<database-app>.internal will return the addresses of the two closest PG instances, in order. If your particular application/programming language/DB library/framework always tries the first address returned first, that will be the closest one, and it will work as you expect. But if it does something more sophisticated (like choosing between the two randomly), then it might sometimes choose the further replica.
It’s also worth noting that your configuration isn’t what we generally recommend. Fly PG was built for 2+ instances in the primary region, with one replica in each remote region. In particular, the leader must be in the primary region, so for redundancy you’ll want to have at least two instances there. (The link above has some info on this.)
If you want to troubleshoot your specific situation further, it’d be helpful to have a little more information on what you were seeing. (E.g. were you observing very slow queries from both of your app instances, or just one? How are you checking which PG instance is receiving connections? Also, can you check whether your database is running on the machines platform?)