I'm seeing slow ActiveRecord database interactions

luizkowalski · July 19, 2022, 8:00pm

Hey

I’m seeing a craaazy slow AR interactions on my app for a couple of hours. Here is what the app is doing:

signed_content = crypt.encrypt_and_sign(data[:content])
message = Message.create!(content: signed_content, password:, mid:, burn_after_read:)
Success(message:)

That’s pretty much it. but the request is consistently slow, taking 7, 10 seconds

...INFO -- : [92be833b-5d57-4f5f-bd86-b89362bdffcc] Completed 200 OK in 10765ms (Views: 4.3ms | ActiveRecord: 7749.6ms | Allocations: 21196)

I instrumented the app with Skylight like this

instrument("saving") do
  message = Message.create!(content: signed_content, password:, mid:, burn_after_read:)
end

and this is what the response looks like

the replication lag of my postgres app is weird too:

Could this be the cause of the slow AR queries?

luizkowalski · July 19, 2022, 8:09pm

Update: I scaled in the Postgres from 2 to 1, disabling the replication and the issue is gone…

Completed 200 OK in 256ms (Views: 0.6ms | ActiveRecord: 4.5ms | Allocations: 1960)

It is the replication lagging behind for sure.

Any idea how to fix it?

lubien · July 20, 2022, 2:10pm

Hi @luizkowalski

Can you send me the name of this app so we can take a look on those issues?

luizkowalski · July 20, 2022, 2:24pm

hey lubien

the app name is fatia-pizza-app and the pg app is called fatia-pizza.

just to update that I don’t see this issue anymore, but I do see a big repl. lag in the last 6 hours:

luizkowalski · July 21, 2022, 2:52pm

Hey @lubien, any news?

I’m still seeing problems now and then. up until now, it was all good, now the issue is back…

Also, I deployed with names, the app is now called sumiu-web while postgres hasn’t changed

catflydotio · July 21, 2022, 3:00pm

Hi @luizkowalski, it seems you’re running this app in both Frankfurt and São Paulo. Is it possible you have app instances trying to reach pg across the world? That would be really slow.

luizkowalski · July 21, 2022, 4:02pm

Unlikely for three reasons:

I’m currently in Berlin and I can see the app routing to fra
I have fly-ruby gem and I inspected the config on prod, looks right
the issue goes away when I scaled down the postgres to 1 replica

I’m not entirely sure but I would blame the replication lagging. does that make sense? could a lag be causing it?

jsierles · July 21, 2022, 5:28pm

It’s unlikely this would be caused by replication lag. That would lead to stale data, but not slow requests.

@luizkowalski Are you using puma or unicorn? If so, are you running more than one process per VM? On puma, this is usually set via WEB_CONCURRENCY which sets the workers in config/puma.rb. By default, I believe you’ll get 2 workers.

If that’s the case, you may be running into this known issue: Connection hijacking in secondary regions must take forking servers into account · Issue #12 · superfly/fly-ruby · GitHub

A quick fix is to set concurrency to 1. I have a fix slated for the core problem, but haven’t gotten around to it.

luizkowalski · July 21, 2022, 6:00pm

I will give it a try for sure, thanks a lot @jsierles!

I still wonder why is it that the problem goes away when I scale down the postgres app…

my problem here is that I can’t easily reproduce, sometimes I open the app and it is just slow, I then scale down the postgres app, and boom, the issue is gone, later I scale up and I don’t see the problem for hours until the cycle begins again.

anyway, I will tweak my puma config, maybe disabling workers for good until it is fixed

also, I think it is worth notice that I’m having this issue on my primary region (fra), not on a secondary region (gru)

kurt · July 21, 2022, 6:35pm

What does the DATABASE_URL in your environment look like? Is it something like top1.nearest.of.<db>.internal? Or is it just <db>.internal?

The second form will return any postgres instance. So your app in Frankfurt could be connecting to your DB in São Paulo, and then getting routed back to Frankfurt from there.

luizkowalski · July 21, 2022, 6:49pm

irb(main):001:0> ENV['DATABASE_URL']
=> "postgres://...@top2.nearest.of.fatia-pizza.internal:5432/sumiu_web"

this is what it looks like actually, in both instances but on gru I see that the fly-ruby does the job and routes to 5433

kurt · July 21, 2022, 6:50pm

Do you have two database nodes running in frankfurt or just one? If you only have one, that hostname is returning IPs from both regions.

luizkowalski · July 21, 2022, 7:44pm

ohh ok looks like we are getting somewhere…

this is what I have:

postgres:

VM Resources for fatia-pizza
        VM Size: shared-cpu-1x
      VM Memory: 256 MB
          Count: 2
 Max Per Region: Not set

these are the allocations

and two volumes

this morning I recreate the web app and I connected the postgres to it running fly pg attach --postgres-app fatia-pizza -a sumiu-web

this is where the URL came from

should it be top1?

kurt · July 21, 2022, 9:35pm

Yeah, top1 will work better for you. We deploy 2 Postgres instances in your primary region for write redundancy, so we do top2 by default. It’s actually best to run 2x fra DBs if you can. Changing to top1 should fix the problem right now though.

luizkowalski · July 22, 2022, 9:37am

If I understand what you said, I should add a new “fra” volume?

anyway, you said “we do top2 by default”. why is that exactly? I’m asking because the docs don’t mention “top2”, it is always “top1” and I got this URL by attaching postgres to my app, so fly actually set “top2”.

when you say “top1 will work better for you” I wonder if there are cases where top2 is better? if so, when?

sorry, lots of questions hahaha trying to get to the bottom of it

Topic		Replies	Views
Slow Postgres performance? Build debugging postgres , rails , ruby	8	792	April 19, 2023
Increasing replication lag on Postgres (single region) Questions / Help postgres	2	374	November 23, 2021
Experiencing very slow response times Questions / Help postgres	10	1631	October 5, 2024
Fly Postgres Connections Extremely Slow from Django Questions / Help postgres , django	7	817	August 1, 2023
postgres slowness is happening again	41	592	August 24, 2022

I'm seeing slow ActiveRecord database interactions

Related topics