At this point it would be really helpful to start profiling your application to understand where the request time is being spent. The only possible network issue from your description of the code would be app to database, but given they are in the same region, it is unlikely to be causing such a large backup.
Can you define “many change events at once”? Are we talking dozens coming in at the same time? Hundreds? Thousands? If the database pool is backing up because you have a thundering herd problem, that would be helpful to pinpoint, but profiling Ecto would tell us where the time is being spent. If all you’re really doing is a couple of reads and a conditional write, then the rest of the Phoenix stack wouldn’t be blocking on anything.

Having said that, please rule out any other plugs that could be talking to an external service or hitting a single-process bottleneck. Again, Ecto profiling would help us rule that out if we see Ecto is where the time is being spent. Monitoring the VM processes ordered by message queue size would also rule out a single-process bottleneck in your system. The LiveDashboard shows this, but only if you happen to catch it (all data is ephemeral).
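For example, something along these lines (just a rough sketch; run it in a remote IEx shell on the instance while the requests are backing up) would show the busiest mailboxes:

```elixir
# Rough sketch: top 5 processes by message queue length.
Process.list()
|> Enum.map(fn pid ->
  case Process.info(pid, [:message_queue_len, :registered_name]) do
    nil -> {pid, 0, []}
    info -> {pid, info[:message_queue_len], info[:registered_name]}
  end
end)
|> Enum.sort_by(fn {_pid, queue_len, _name} -> queue_len end, :desc)
|> Enum.take(5)
```

If a single process shows a consistently large queue, that is your bottleneck.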
You can see the telemetry events that Ecto dispatches here:
https://hexdocs.pm/ecto/Ecto.Repo.html#module-adapter-specific-events
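For instance, a minimal handler along these lines would log slow queries (the event name assumes the default `:telemetry_prefix` for a repo called `MyApp.Repo`, and the 100ms threshold is arbitrary):

```elixir
defmodule MyApp.EctoQueryLogger do
  require Logger

  # Call MyApp.EctoQueryLogger.attach() once at boot, e.g. in Application.start/2.
  def attach do
    :telemetry.attach(
      "log-slow-queries",
      [:my_app, :repo, :query],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event(_event, measurements, metadata, _config) do
    total_ms = System.convert_time_unit(measurements.total_time, :native, :millisecond)

    if total_ms > 100 do
      Logger.warning("slow query (#{total_ms}ms): #{metadata.query}")
    end
  end
end
```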
You could also wrap whatever your main operations are in spans and log the times:
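Something roughly like this (module and event names are made up) wraps an operation in `:telemetry.span/3` and logs the duration reported by the `:stop` event:

```elixir
defmodule MyApp.ChangeHandler do
  def handle_change(event) do
    # Emits [:my_app, :handle_change, :start] and [..., :stop] events,
    # the latter with a :duration measurement in native time units.
    :telemetry.span([:my_app, :handle_change], %{}, fn ->
      result = do_handle_change(event)
      {result, %{}}
    end)
  end

  defp do_handle_change(_event), do: :ok
end

defmodule MyApp.SpanLogger do
  require Logger

  def attach do
    :telemetry.attach(
      "log-handle-change",
      [:my_app, :handle_change, :stop],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  def handle_event(_event, %{duration: duration}, _metadata, _config) do
    ms = System.convert_time_unit(duration, :native, :millisecond)
    Logger.info("handle_change took #{ms}ms")
  end
end
```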
More information will help narrow down what is happening.
Thank you @chrismccord.
Actually, I just started running ab against a URL which does not even touch the database (and just redirects, as there is no such route): ab -n 100 -c 10 http://127.0.0.1:4000/myhandler
Locally it’s not a problem at all:
Concurrency Level: 10
Time taken for tests: 3.192 seconds
Complete requests: 100
Failed requests: 0
Non-2xx responses: 100
Total transferred: 62100 bytes
HTML transferred: 15600 bytes
Requests per second: 31.33 [#/sec] (mean)
Time per request: 319.229 [ms] (mean)
Time per request: 31.923 [ms] (mean, across all concurrent requests)
Transfer rate: 19.00 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.2      0      1
Processing:   170  299   52.2    287    442
Waiting:      168  298   52.2    287    442
Total:        170  299   52.1    288    442
But on Fly with three instances:
Concurrency Level: 10
Time taken for tests: 17.385 seconds
Complete requests: 100
Failed requests: 0
Non-2xx responses: 100
Total transferred: 60200 bytes
HTML transferred: 7800 bytes
Requests per second: 5.75 [#/sec] (mean)
Time per request: 1738.490 [ms] (mean)
Time per request: 173.849 [ms] (mean, across all concurrent requests)
Transfer rate: 3.38 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       68  122   57.4     95    289
Processing:    23 1561 1542.3   1750   4398
Waiting:       22 1560 1542.5   1749   4398
Total:         94 1683 1532.3   1859   4516
I’ll check what I can find out on the machine itself and in the LiveDashboard, and, as you suggested, which other plugs are involved.
Found it. It was the log backend for LogDNA (ex_logdna). It looks like every log line was being sent synchronously.
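A quick way to see this kind of thing (rough check, and whether a single call blocks depends on the backend and on Logger’s sync/async mode) is to time one log call from a remote IEx shell; with a blocking backend it takes milliseconds rather than microseconds:

```elixir
require Logger

{micros, :ok} = :timer.tc(fn -> Logger.info("latency check") end)
IO.puts("Logger.info took #{micros}µs")
```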
So yeah, it looks like I made some wrong assumptions while debugging this. Thanks a lot for your help!