Slow response times?

I’m seeing intermittently very slow response times for our application deployed in EWR, and I’m also having a hard time connecting to it, but the status page says all systems are operational. Is this a Fly issue, or an error just on my end? Any suggestions for how to debug?

Will you try this with curl?

curl -v -o /dev/null -sS https://<url>

I’m curious if it’s the TLS handshake or something else.

Getting 200s just fine that way. It’s really hard to pin this down because the app isn’t inaccessible or consistently slow, but if you look at: see those response-time humps earlier today, and ongoing right now?

Do you know which region it’s actually hitting, by chance? If you visit it you’ll see a FLY_REGION header. Your app is in ewr, but connections could be landing in another location.
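You can also check from the terminal by dumping just the response headers. A sketch, assuming the region header appears in the response (example.com is a stand-in for your app’s URL and won’t actually send one, hence the fallback):

```shell
# Dump only the response headers and filter for the region header.
# example.com stands in for the app's URL and won't return a
# fly-region header; the '|| echo' keeps the pipeline from failing
# when nothing matches.
curl -sSI https://example.com | tr -d '\r' | grep -i '^fly-region' \
  || echo 'no fly-region header found'
```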

That page says LGA

and EWR

It seems like the app itself might be responding slowly:

As far as we can tell, there’s no performance issue caused by the load balancing. When you had problems connecting, what was happening?

It just eventually timed out to a blank HTML page. It’s happening again for me right now :face_with_monocle: ( that is)

I typed both URLs into my browser; each took about 15 seconds to load, and they eventually loaded at the same time…

Is it possible there’s some routing error here that’s local to my geographic area?

That screenshot you posted has response times in the 40ms range, yeah? That would be fine. We’re talking 10+ seconds to load a page here.

That screenshot is actually 40 seconds, not ms. Definitely very slow responses.

Will you run a traceroute and paste the output?

Yeah 40 seconds is definitely not acceptable :frowning:

traceroute -I
 1 (  2.136 ms  1.670 ms *
 2 (  8.520 ms  6.838 ms  9.678 ms
 3 (  13.085 ms  12.833 ms  12.485 ms
 4 (  9.627 ms  21.917 ms  15.457 ms
 5 (  14.415 ms  14.794 ms  17.509 ms
 6 (  14.811 ms  16.067 ms  14.577 ms
 7  * * *
 8 (  13.353 ms  14.623 ms  21.722 ms
 9 (  27.864 ms  16.011 ms  19.263 ms
10 (  13.421 ms  12.675 ms  17.250 ms
11  * * *
12  * * *
13 (  13.433 ms  18.703 ms  12.175 ms

(Traceroute over UDP still hasn’t finished; it’s just an endless list of * * * — I assume it’ll go to 64 hops and then stop.)

That traceroute looks fine. It’s pretty weird that you’re getting slow responses, though. If you get another blank page, will you pop open the web inspector and see what the Network tab says about it?

The slow response times in that graph are actually at the app level. I don’t think they’re related to us (it would be hard for us to slow that particular metric down), but the other things you’re seeing are suspicious.

Well, the app’s database is also on Fly in the same region, so my guess was that the app wasn’t responding until it got a response back from the database (which could cause that graph, no?)

Response times are back to normal now. Very odd, and hard for me to grok why a Phoenix app would just start being slow (intermittently!) for a period of time one day and then stop.

It’s unlikely that the delay is between the app and the database. Not impossible, but in-region networking is as simple as it gets. The complexity is all in the path from you → our proxy → your app.

We still haven’t found anything to indicate what the problem is, though. Still looking!

Thank you! Any idea if there’s other monitoring I could enable on my end to see if it’s for sure the app itself? (probably out of scope but worth asking :slight_smile: )

The simplest thing to do is expose your own Prometheus metrics to our scraper, then hook up a Grafana dashboard. That will let you see what Phoenix itself is doing (our metrics only reflect what our proxy observes).
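As a sketch, assuming the Phoenix app already serves Prometheus metrics on port 9091 at /metrics (for example via the PromEx library — port and path here are assumptions, not your actual values), the fly.toml side would look something like:

```toml
# fly.toml — tell Fly's scraper where the app exposes Prometheus metrics.
# The port and path are assumptions; match them to whatever your metrics
# endpoint actually listens on.
[metrics]
  port = 9091
  path = "/metrics"
```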

We heard from multiple customers today that response times were extraordinarily slow as well. Our Node.js monitoring doesn’t record any anomalous response times, but the Fly metrics do show high latency.

We tracked down a bottleneck that might have been the cause of some of these reports. It’s hard to tell app by app, but we did find latency spikes between our edge proxies and the worker servers that run app VMs. These should have improved substantially over the last day.

If you’re getting weird latency spikes, check the concurrency section of your app config to make sure it has type = "requests" defined, like this:

    [services.concurrency]
      hard_limit = 500
      soft_limit = 250
      type = "requests"

If there’s no type defined, or it’s set to "connections", your requests might get stuck waiting for a brand-new TCP connection. "requests" reuses an existing connection pool, which should help mitigate this problem.