postgres slowness is happening again

luizkowalski · August 23, 2022, 1:35pm

A couple of months ago I noticed some slowness in my app. After lots of debugging, @jsierles pointed to top2.nearest.of as the root cause of the issue: most likely I should have been using top1. After updating it, the slowness was gone.

This morning I upgraded the fly-ruby gem to 0.4.1 and noticed that the slowness is back. This is one of the log lines I got from my app:

2022-08-23T13:14:03.222 app[5e53d40d] fra [info] [4b874cd4-d9b4-4176-aa2d-37b77aa987dd] {"method":"POST","path":"/m","format":"*/*","controller":"MessagesController","action":"create","status":200,"duration":10844.91,"view":4.36,"db":7785.25,"host":"sumiu.link","ip":"46.161.11.227","time":"2022-08-23T15:14:03+02:00"}

Notice the "db":7785.25: over 7 s spent in the database. I tried rolling back the gem, thinking it could be the issue, and also tried scaling the machine up to 512 MB (it currently runs on 256 MB), with no success.
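To spot requests like this across many log lines, the JSON payload can be pulled out of each line and its "db" field (milliseconds, like "duration") compared against a threshold. A minimal Ruby sketch, assuming the log format shown above; the helper names and the 1-second threshold are hypothetical:

```ruby
require "json"

# Hypothetical threshold: flag requests that spend more than 1 second in the DB.
SLOW_DB_MS = 1_000.0

# Extracts the JSON payload that follows the log prefix and parses it.
# Returns nil when the line contains no JSON object.
def parse_request_log(line)
  start = line.index("{")
  return nil unless start

  JSON.parse(line[start..])
end

# Summarizes how much of the request time was spent waiting on the database.
def db_report(line)
  entry = parse_request_log(line)
  return nil unless entry

  db = entry.fetch("db").to_f
  total = entry.fetch("duration").to_f
  {
    path: entry["path"],
    db_ms: db,
    total_ms: total,
    db_share: (db / total).round(2),
    slow: db > SLOW_DB_MS
  }
end

line = '2022-08-23T13:14:03.222 app[5e53d40d] fra [info] [4b874cd4-d9b4-4176-aa2d-37b77aa987dd] {"method":"POST","path":"/m","format":"*/*","controller":"MessagesController","action":"create","status":200,"duration":10844.91,"view":4.36,"db":7785.25,"host":"sumiu.link","ip":"46.161.11.227","time":"2022-08-23T15:14:03+02:00"}'

report = db_report(line)
puts report.inspect
```

For the line above, db_share comes out to 0.72, i.e. roughly 72% of the 10.8 s request was spent in the database.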

I haven’t changed anything: no new features, and no spike in users.

Are there any known issues with Postgres at the moment? What else could cause this?

jsierles · August 23, 2022, 3:33pm

So rolling back the gem didn’t help here? This being a POST request, it should have been replayed to the primary instance. Do you have the PRIMARY_REGION env var set?
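The replay mentioned here relies on Fly's proxy re-sending a request to another region when the app answers with a fly-replay response header. Below is a simplified, hypothetical Rack-style middleware that illustrates the idea for write requests. It is not fly-ruby's actual implementation; the class name, the 409 status, and the constructor keywords are assumptions for the sketch. By default it reads FLY_REGION and PRIMARY_REGION from the environment.

```ruby
# Hypothetical sketch (not fly-ruby's code): ask Fly's proxy to replay write
# requests in the primary region by returning a "fly-replay" header.
class ReplayWritesToPrimary
  WRITE_METHODS = %w[POST PUT PATCH DELETE].freeze

  def initialize(app, current_region: ENV["FLY_REGION"], primary_region: ENV["PRIMARY_REGION"])
    @app = app
    @current_region = current_region
    @primary_region = primary_region
  end

  def call(env)
    if replay?(env)
      # Empty body; the proxy is expected to re-send the original request
      # to the primary region. The 409 status is an assumption of this sketch.
      [409, { "fly-replay" => "region=#{@primary_region}" }, []]
    else
      @app.call(env)
    end
  end

  private

  # Replays only writes, only from a non-primary region, and never when
  # PRIMARY_REGION is unset or empty.
  def replay?(env)
    return false if @primary_region.nil? || @primary_region.empty?
    return false if @current_region == @primary_region

    WRITE_METHODS.include?(env["REQUEST_METHOD"])
  end
end
```

With fra as the primary, as in this thread, a VM in fra never replays in this sketch; only a VM in another region (gru here) would bounce POSTs to fra. In a Rails app such a class could be registered with config.middleware.use ReplayWritesToPrimary.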

luizkowalski · August 23, 2022, 3:45pm

Nope, it doesn’t look like the gem is at fault here.

This being a POST request, it should have been replayed to the primary instance.

I’m always hitting the primary instance, for both reads and writes, since I’m in Berlin and the primary region is fra.

Do you have the PRIMARY_REGION env var set?

yup, I’ve never changed that

jsierles · August 23, 2022, 3:58pm

And your DATABASE_URL is set to top1.nearest.of...? Can you try fly dig top1.nearest.of.pg-app-name.internal -a appname? Does the IP there match the IP of fly dig fra.pg-app-name.internal -a appname?

luizkowalski · August 23, 2022, 4:02pm

Hmm, no, they match… sometimes.
Running it multiple times, they match about 90% of the time, which probably explains why some requests go through really quickly while others hang for 7+ seconds.
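The repeated comparison described here can be scripted to get a match rate instead of eyeballing fly dig output. A minimal Ruby sketch using the standard resolv library; it assumes it runs on a VM attached to the app's private network, where .internal names resolve to IPv6 (AAAA) addresses. The helper names and the sample count are hypothetical, and pg-app-name is the placeholder used in the thread.

```ruby
require "resolv"

# Default resolver: returns the AAAA (IPv6) addresses of a hostname as strings.
# Assumes Fly's private-network DNS is reachable from where this runs.
AAAA_RESOLVER = lambda do |host|
  Resolv::DNS.open do |dns|
    dns.getresources(host, Resolv::DNS::Resource::IN::AAAA).map { |r| r.address.to_s }
  end
end

# Resolves both hostnames `samples` times and counts how often their address
# sets agree (two empty results also count as a match). The resolver is
# injectable so the counting logic can be tested without Fly's DNS.
def compare_resolution(host_a, host_b, samples: 20, resolver: AAAA_RESOLVER)
  matches = samples.times.count do
    resolver.call(host_a).sort == resolver.call(host_b).sort
  end
  { samples: samples, matches: matches, match_rate: matches.fdiv(samples) }
end

# Example, from a VM on the private network:
# p compare_resolution("top1.nearest.of.pg-app-name.internal", "fra.pg-app-name.internal")
```

A match_rate well below 1.0 would correspond to the intermittent mismatches reported above.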

jsierles · August 23, 2022, 4:03pm

Huh, that’s not good. For now, you can set your URL hostname to postgres-app-name.internal. fly-ruby should automatically adjust the URL in the secondary region to point to the regional replica.
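The adjustment described here, pointing a secondary-region VM at its regional replica, can be illustrated by prefixing the database hostname with the VM's region. This is a hypothetical sketch of the idea, not fly-ruby's actual code: the helper name is made up, and it only rewrites the hostname, whereas the real gem may change other parts of the URL as well.

```ruby
require "uri"

# Hypothetical helper: returns the URL a VM should use. VMs outside the primary
# region get "<region>." prepended to the hostname; otherwise the URL is unchanged.
def regional_database_url(database_url, current_region:, primary_region:)
  uri = URI.parse(database_url)
  return uri.to_s if primary_region.to_s.empty? || current_region == primary_region

  uri.host = "#{current_region}.#{uri.host}"
  uri.to_s
end

# regional_database_url("postgres://fatia-pizza.internal:5432/sumiu_web",
#                       current_region: "gru", primary_region: "fra")
# returns "postgres://gru.fatia-pizza.internal:5432/sumiu_web"
```

This prefixing scheme is why the input should be the bare app hostname: feeding it a URL that already starts with top1.nearest.of or a region would produce a nonsensical host such as gru.top1.nearest.of....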

luizkowalski · August 23, 2022, 4:16pm

I didn’t see any difference, tbh. The timing is still inconsistent; here are two requests:

db:1166.66

and

db:7788.06

No noticeable difference after the change.

jsierles · August 23, 2022, 4:39pm

Where is your secondary located? A last attempt could be to change the host to fra.pg-app-name.internal and only run in the primary region, just to test the primary.

luizkowalski · August 23, 2022, 4:44pm

The secondary is located in gru.

Setting the DB URL to fra.pg-app.internal worked:

"db":5.15

Seems like something is wrong with the replicas/routing.

jsierles · August 23, 2022, 4:50pm

Could you share the org name where your pg app is located, or the name of that app? It would be helpful to see the health of the pg app.

If it’s flapping for some reason, particularly the primary, it would be normal for the IPs to change like that. You can check the status of the pg app with fly status -a pg-app-name and fly logs -a pg-app-name.

UPDATE: We found your app and are looking into what might cause this. For now, I’d recommend keeping the app running only in fra with the current hostname so visitors in gru don’t get slow reads.

luizkowalski · August 23, 2022, 5:03pm

the app name is “fatia-pizza”, apps connected to it are “sumiu-web” and “sumiu-worker”

kurt · August 23, 2022, 7:26pm

OK, this was an issue with our app mis-detecting ping times for your database instances, which broke the results of top1.nearest.of. It should be fixed now. We added a health check so we’ll get alerted if this ever happens again.

luizkowalski · August 23, 2022, 8:12pm

holy shit that’s why I love fly

All is working now! Thanks a lot, both of you.

update: it is back again

luizkowalski · August 24, 2022, 9:28am

@kurt it is happening again

fly dig top1.nearest.of and fly dig fra.pg-app from sumiu-web are returning different IPs again, as before.

jsierles · August 24, 2022, 10:32am

Sorry about this. Is this still happening right now? I checked just now and it looks like the IPs are now resolving correctly.

luizkowalski · August 24, 2022, 10:39am

Yup, still happening. The IPs are resolving correctly, but the slowness is still here. Changing from top1.nearest.of to fra.pg-app didn’t help either, which is weird. Scaling Postgres down to a single instance (forcing it to have exactly one instance, in fra) actually solved it… somehow it seems like the requests are still going to gru.
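One way to test the suspicion that traffic is still going to gru is to time TCP connects from the fra VM to each candidate host. A connect takes roughly one network round trip, so a supposedly local host with a transatlantic-sized connect time points at cross-region traffic, and a request that issues several sequential queries pays that cost each time. A minimal Ruby sketch using only the standard library; the helper name, sample count, and timeout are assumptions, and port 5432 comes from the URLs in this thread. It only shows where the TCP connection terminates (which may be a proxy in front of Postgres), not necessarily where queries are executed.

```ruby
require "socket"

# Opens `samples` TCP connections to host:port and returns the median connect
# time in milliseconds (the upper-middle value when `samples` is even).
def median_connect_ms(host, port, samples: 5, timeout: 2)
  times = Array.new(samples) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    Socket.tcp(host, port, connect_timeout: timeout).close
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  end
  times.sort[samples / 2]
end

# Example, from the fra VM ("pg-app-name" is the thread's placeholder):
# p median_connect_ms("fra.pg-app-name.internal", 5432)
# p median_connect_ms("gru.pg-app-name.internal", 5432)
```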

jsierles · August 24, 2022, 10:45am

Weird! Which version of fly-ruby are you using now?

luizkowalski · August 24, 2022, 10:46am

From GitHub’s main branch, actually.

jsierles · August 24, 2022, 10:48am

OK, and you still have PRIMARY_REGION set? I’d suggest disabling fly-ruby by unsetting PRIMARY_REGION, switching back to fra.pgapp, and then scaling the cluster back up. If requests in fra are fast, then we might suspect fly-ruby, though that seems unlikely here.

It’s worth also double checking that the correct URL is set on the FRA VM via fly ssh console.

luizkowalski · August 24, 2022, 11:07am

I triple-checked PRIMARY_REGION and it is set to fra on both instances (fra and gru). Disabling the middleware by removing it works, but it kills the multi-region feature: everything is routed to fra (I’m testing through a VPN set to Brazil and Argentina).

It’s worth also double checking that the correct URL is set on the FRA VM via fly ssh console.

I’ve done that a hundred times already, because I thought I might have screwed something up, but the database URL is correct. I tried a number of URLs:

postgres://fatia-pizza.internal:5432/sumiu_web
postgres://fra.fatia-pizza.internal:5432/sumiu_web
postgres://top1.nearest.of.fatia-pizza.internal:5432/sumiu_web
