Really long postgres query from same-region server

My site (currently hosted at kent.dev) is deployed to a bunch of regions all over the world. The node server and the postgres db are both deployed to each region.

Interestingly, sometimes my queries to postgres take a while to complete. More than 100-200ms (and in some cases way more, like I had one that was a full 1000ms a minute ago). It seems to only happen in the primary region:

2021-08-05T04:27:10.374213046Z app[486aa3bd] dfw [info] prisma:query - 249ms - SELECT "public"."Session"."id", "public"."Session"."createdAt", "public"."Session"."userId", "public"."Session"."expirationDate" FROM "public"."Session" WHERE "public"."Session"."id" = $1 LIMIT $2 OFFSET $3

In other regions it’s quite fast:

2021-08-05T04:26:48.645706448Z app[b2282ad8] ams [info] prisma:query - 2ms - SELECT "public"."Session"."id", "public"."Session"."createdAt", "public"."Session"."userId", "public"."Session"."expirationDate" FROM "public"."Session" WHERE "public"."Session"."id" = $1 LIMIT $2 OFFSET $3

I’ve added a console.log to make sure I’m connecting to the correctly region:

2021-08-05T04:17:54.730588448Z app[486aa3bd] dfw [info] Connecting to dfw.kcd-postgres.internal:5432
2021-08-05T04:18:06.340777890Z app[b2282ad8] ams [info] Connecting to ams.kcd-postgres.internal:5433

I’m not sure what I’m doing wrong here, but it’s making my site a lot more sluggish than I would like it to be :sweat_smile:

In case it’s useful, the source is here: GitHub - kentcdodds/remix-kentcdodds: My remix rewrite of kentcdodds.com. My fly app name is withered-frost-3196 and the db app name is kcd-postgres.

I’m at the end of my debugging abilities here :grimacing:

You’re not crazy, it’s slow. I’m almost positive this is my fault. We’ve been migrating disks to new hosts in DFW which was your original primary region, but it looks like the master is running in AMS now. We’ll get it moved back to DFW tomorrow morning, I’m sorry about that!

1 Like

Oh good to know. So does that mean that AMS will be this slow when you make the change? :grimacing:

It will if it hits the primary, but I’m not sure why it’s going all the way to the primary instead of reading from the local secondary.

I’m not sure I understand what’s happening. Can you explain it a bit?

We needed to migrate apps to new hardware in DFW this week. We do that without downtime for postgres apps by adding a new replica, waiting for data to sync, then turning off the old one. Your cluster only allows the primary in DFW, but it couldn’t elect one there and was stuck in read-only mode while it retried. We removed the constraint and the primary moved to AMS which restored writes while we investigate. Our resident PG expert is digging into this now – we’ll get it back in DFW once we can and fix whatever caused it.

1 Like

Great, thanks for explaining!

1 Like

This has been fixed :slight_smile: Queries are now nice and fast at >3ms :fire:

2 Likes

Doh! I typed up a reply on Friday to let you know but never hit send.

I’m glad it’s working now :smiley:

1 Like