Is your DATABASE_URL host configured to top1.nearest.of.yourapp-db.internal? It sounds like your reads are being routed back to the primary instead of hitting the replica. You can see your full (secret) DATABASE_URL by connecting with fly ssh console and then running export | grep DATABASE_URL.
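If you'd rather not eyeball the whole secret, here is a minimal Python sketch that parses DATABASE_URL and reports only the host, assuming the usual postgres://user:pass@host/db shape and that Python is available where you run it. uses_nearest_top1 is a hypothetical helper name, not part of fly_postgres or flyctl.

```python
import os
from urllib.parse import urlsplit


def uses_nearest_top1(database_url: str) -> bool:
    """Hypothetical helper: True if the URL's host is top1.nearest.of.<app>.internal."""
    # urlsplit lowercases .hostname and strips credentials and port.
    host = urlsplit(database_url).hostname or ""
    return host.startswith("top1.nearest.of.") and host.endswith(".internal")


if __name__ == "__main__":
    url = os.environ.get("DATABASE_URL", "")
    # Print only the host so the password never ends up in your terminal scrollback.
    print(f"host={urlsplit(url).hostname!r} top1_nearest={uses_nearest_top1(url)}")
```

If this prints a plain yourapp-db.internal host (or a top2 host), that is worth comparing against what your Ecto config actually ends up using.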
I think you’re right that it’s connecting to the wrong postgres, but I don’t think DNS is the problem. You can test DNS after you’ve SSHed in by running:
One potential issue is that top2 returns two IP addresses, and the second one is not very close to the app server. I don't think the underlying Erlang bits use the second IP address, but it's possible!
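To see exactly what a client gets back for top1 versus top2, a minimal Python sketch can ask the system resolver for IPv6 results (Fly's .internal names resolve over the IPv6 private network). This assumes Python is available on a machine inside the app's private network; resolve_ipv6 is a hypothetical name. Note that getaddrinfo may reorder results (glibc, for example, applies RFC 3484-style destination address selection), so the first address a client tries is not guaranteed to be the first record DNS returned.

```python
import socket
import sys


def resolve_ipv6(host: str) -> list[str]:
    """Return each distinct IPv6 address for host, in the order getaddrinfo yields them."""
    # Port is irrelevant for the lookup, so pass None.
    infos = socket.getaddrinfo(host, None, socket.AF_INET6, socket.SOCK_STREAM)
    addrs: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # IPv6 sockaddr is (address, port, flowinfo, scope_id)
        if ip not in addrs:  # dedupe while keeping resolver order
            addrs.append(ip)
    return addrs


if __name__ == "__main__":
    # e.g. python3 resolve.py top1.nearest.of.yourapp-db.internal top2.nearest.of.yourapp-db.internal
    for name in sys.argv[1:]:
        try:
            print(name, resolve_ipv6(name))
        except socket.gaierror as err:
            print(name, "lookup failed:", err)
```

Running it with both names side by side shows whether top2 really hands back a second, more distant address, and in which order your resolver presents them.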
Our Elixir package should probably use top1; I can't remember why I encouraged top2. I'm pretty sure this isn't the problem, though.
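If you want to try top1 yourself in the meantime, the change is just a host rewrite on the connection URL. Here is a minimal Python sketch of that string manipulation, assuming a postgres://user:pass@host[:port]/db URL; prefer_top1 is a hypothetical helper, not part of fly_postgres, and in an Elixir app the equivalent rewrite would live wherever your repo URL is built.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a leading "topN.nearest.of." prefix, e.g. "top2.nearest.of.".
_TOPN_PREFIX = re.compile(r"^top\d+\.nearest\.of\.", re.IGNORECASE)


def prefer_top1(database_url: str) -> str:
    """Hypothetical helper: rewrite a topN.nearest.of host to top1.nearest.of.

    Credentials, port, path and query are preserved. URLs whose host does not
    start with topN.nearest.of. are returned unchanged.
    """
    parts = urlsplit(database_url)
    # Split "user:pass@host:port" on the LAST "@" so an "@" inside the
    # password can't be mistaken for the start of the host.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not _TOPN_PREFIX.match(hostport):
        return database_url
    new_hostport = _TOPN_PREFIX.sub("top1.nearest.of.", hostport, count=1)
    return urlunsplit(parts._replace(netloc=userinfo + at + new_hostport))


if __name__ == "__main__":
    print(prefer_top1("postgres://user:pass@top2.nearest.of.yourapp-db.internal/yourapp"))
    # prints postgres://user:pass@top1.nearest.of.yourapp-db.internal/yourapp
```

Working on the raw netloc rather than urlsplit's lowercased hostname keeps the rest of the URL byte-for-byte identical.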
We’ll see if we can think of anything else. You could get this behavior if the Fly.Postgres configuration isn’t applied properly. Can you post your Ecto config here?
When I was first starting with the fly_postgres library, I would sometimes get inconsistent DB connection times. It was stupid frustrating and didn't make sense. This was back before we had the top2.nearest.of DNS names to help direct it. It was also before I added the specific region to connect to in the DNS name, which we've since moved away from.
My app was randomly connecting to a PG instance that wasn't necessarily close: sometimes it was the close one, sometimes it wasn't. This reminds me of that, so I'm wondering whether top1 would be a more reliable option than top2.