My app is suddenly running very slowly, but it appears to be a slow connection rather than a problem with the app itself.
A little context: when I first deployed the app to Fly, it would respond in <100ms from my location (UK) with the app deployed in fra. Now it seems to be >600ms at best. This was a sudden change and was not related to any deployment.
I have quite extensive tracing in my app and can see that it is running as quickly as ever. I’m fairly confident it’s not the app.
This has been happening for at least a few days now.
I found this URL from another thread: https://debug.fly.dev which shows me connecting to the syd region. I don’t know how reliable this link is, but it does somewhat explain the slow responses from my app.
Is anyone experiencing the same at the moment? We’re building up to launch and the response times at the moment are going to be a real killer for us.
Yes, I also experienced this about 4 hours ago, related to the slow API issue on Fly. It was resolved a few hours ago, but maybe the fix is still propagating to your region. Or maybe it regressed in your region.
Oh, “suddenly” sort of implies it started recently. Anyway, look at your memory usage: a memory leak can cause a system to grind to a halt once usage reaches 80%+ of capacity.
It’s a rust app and currently sits below 100MB usage. Restarting the machine doesn’t make a difference. Tracing shows that the app responds in ~20ms for most operations, with the slowest being ~200ms.
I used “suddenly” in the sense that it wasn’t a gradual change. One day it was just slow!
Sounds like a proxy issue then? You say you get +500ms latency, but according to WonderNetwork’s Global Ping Statistics the London => Sydney round trip is under 300ms. Where is the extra 200ms coming from?
Just to jump in, I would suspect this is the cause:
“I found this URL from another thread: https://debug.fly.dev which shows me connecting to the syd region. I don’t know how reliable this link is, but it does somewhat explain the slow responses from my app.”
I think Fly have that app in all regions. When you request it, it should show which region it thinks is closest, which won’t always be the one that’s actually closest. Sometimes routing goes wrong somewhere along the line, likely with the ISP. I just tried that app and I was served from lhr, so I got 50-60ms latency. If your requests are being wrongly routed thousands of miles away, that would certainly explain the delay. It would also explain why it happened at random and not because of anything you did. They’ll probably need a traceroute to debug.
Yeah, it seems like it was a problem with IPv6. We use an AAAA record to connect to our app. Grafana also showed my traffic going through syd so it was a routing issue.
Fly has sorted it on their end with their ISP, and everything is back to the speed we expect (<100ms for most endpoints).
Fly’s support has been amazing, it must be said. They responded within minutes of my support ticket, and the issue was resolved within a couple of hours. Very impressed!
It’s been fixed now and it was a routing issue in Fly’s network (specifically IPv6). Support has fixed the issue with their ISP and everything is back to running at the speeds we expect!
I think the 600ms+ responses were due to the traffic being routed like this: me > syd > fra > syd > me.
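As a rough sanity check on that path, here is a back-of-the-envelope sketch in Python. The round-trip figures are assumptions for illustration, not measurements: roughly 300ms for UK to syd (in line with the London => Sydney figure quoted earlier in the thread), a similar figure assumed for syd to fra, and about 15ms assumed for the nearby lhr and fra hops. `estimated_response_ms` is a hypothetical helper, and the estimate ignores the extra round trips a fresh TLS handshake would add.

```python
# Rough latency budget for a request that enters the network at the wrong edge.
# Every RTT below is an illustrative assumption, not a measurement.

def estimated_response_ms(client_edge_rtt_ms, edge_app_rtt_ms, app_time_ms):
    """One client<->edge round trip, one edge<->app round trip, plus app time."""
    return client_edge_rtt_ms + edge_app_rtt_ms + app_time_ms

# Misrouted path: me > syd > fra > syd > me
misrouted = estimated_response_ms(
    client_edge_rtt_ms=300,  # assumed UK <-> syd round trip
    edge_app_rtt_ms=300,     # assumed syd <-> fra round trip
    app_time_ms=20,          # typical app time reported by tracing
)

# Expected path: me > lhr > fra > lhr > me
expected = estimated_response_ms(
    client_edge_rtt_ms=15,   # assumed UK <-> lhr round trip
    edge_app_rtt_ms=15,      # assumed lhr <-> fra round trip
    app_time_ms=20,
)

print(f"misrouted ~ {misrouted} ms, expected ~ {expected} ms")
```

Under these assumptions the misrouted path lands a little over 600ms and the expected path well under 100ms, which is consistent with what I was seeing before and after the fix.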
I’ve been noticing 1-2 second response times lately on some very simple endpoints. How did you work out that your traffic was being routed like “me > syd > fra > syd > me”? About half of the round-trip time is the TLS handshake, and I’m struggling to figure out if this is normal or not.
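One way to see how much of the round trip the handshake accounts for is to time each phase of a single request separately. The following is a minimal sketch using only the Python standard library. `phase_durations_ms` and `time_request` are hypothetical helper names, the hostname in the usage comment is a placeholder, and the TCP connect phase here also includes DNS resolution.

```python
import socket
import ssl
import time


def phase_durations_ms(marks):
    """Turn ordered (label, timestamp_seconds) marks into per-phase durations in ms."""
    return {
        label: round((t - prev_t) * 1000, 1)
        for (_, prev_t), (label, t) in zip(marks, marks[1:])
    }


def time_request(host, path="/", port=443, timeout=10):
    """Time TCP connect, TLS handshake, and time to first byte for one HTTPS GET."""
    marks = [("start", time.perf_counter())]
    # create_connection resolves the name and completes the TCP handshake.
    with socket.create_connection((host, port), timeout=timeout) as raw:
        marks.append(("tcp_connect", time.perf_counter()))
        ctx = ssl.create_default_context()
        # wrap_socket performs the TLS handshake on an already connected socket.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            marks.append(("tls_handshake", time.perf_counter()))
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}\r\n"
                "Connection: close\r\n\r\n"
            )
            tls.sendall(request.encode("ascii"))
            tls.recv(1)  # block until the first response byte arrives
            marks.append(("first_byte", time.perf_counter()))
    return phase_durations_ms(marks)


# Usage (placeholder hostname, substitute your own app):
#   print(time_request("your-app.fly.dev"))
```

A handshake costs at least one extra round trip to whichever edge terminates TLS. A `tls_handshake` figure far above the `tcp_connect` figure, or both being in the hundreds of milliseconds, would suggest the connection is landing on a distant edge rather than the app being slow.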
Thanks, I’ll give that a shot. In the meantime (for Fly support, if they see this), something odd: on my staging site I have a machine running in iad, but my handshakes are occurring in sin?
For me, I went to https://debug.fly.dev and it showed the region as being syd instead of lhr as I was expecting. In my case it was an ISP issue downstream of Fly on IPv6.
@khuezy here is my traceroute, I need to check to see what I’m looking at exactly but in the meantime…
traceroute to foundry-staging-trpc-web.fly.dev (66.241.124.45), 64 hops max, 40 byte packets
1 192.168.1.254 (192.168.1.254) 1.331 ms 1.111 ms 0.970 ms
2 172.8.144.1 (172.8.144.1) 1.939 ms 2.179 ms 1.965 ms
3 64.148.105.112 (64.148.105.112) 2.011 ms 3.690 ms 2.422 ms
4 12.243.128.102 (12.243.128.102) 10.878 ms 8.497 ms 7.964 ms
5 12.122.128.101 (12.122.128.101) 12.017 ms 6.073 ms 3.879 ms
6 192.205.37.26 (192.205.37.26) 10.925 ms 4.808 ms 4.303 ms
7 96.110.44.157 (96.110.44.157) 12.777 ms
96.110.44.153 (96.110.44.153) 11.958 ms
96.110.44.149 (96.110.44.149) 11.770 ms
8 96.110.33.66 (96.110.33.66) 11.479 ms
96.110.33.78 (96.110.33.78) 12.501 ms
96.110.33.74 (96.110.33.74) 12.422 ms
9 75.149.231.130 (75.149.231.130) 11.240 ms 4.684 ms 3.536 ms
10 * * *
See anything odd here? debug.fly.dev does indeed show my region as sin even though I am right next to lax so that seems strange. Is this something I need to flag to Fly support to fix as a downstream ISP issue as @tdwells90 had done?
EDIT
My A/AAAA records are managed through Cloudflare, so here is the traceroute using my actual domain, which points to my Fly IPv4/IPv6 addresses:
traceroute to trpc-web.foundry-staging.xyz (66.241.124.45), 64 hops max, 40 byte packets
1 dsldevice (192.168.1.254) 1.132 ms 0.871 ms 0.825 ms
2 172-8-144-1.lightspeed.irvnca.sbcglobal.net (172.8.144.1) 2.017 ms 1.943 ms 1.853 ms
3 64.148.105.112 (64.148.105.112) 1.791 ms 2.527 ms 2.055 ms
4 12.243.128.102 (12.243.128.102) 8.360 ms 8.677 ms 8.385 ms
5 ggr2.la2ca.ip.att.net (12.122.128.101) 4.958 ms 6.214 ms 6.481 ms
6 192.205.37.26 (192.205.37.26) 5.243 ms 3.789 ms 3.495 ms
7 be-3302-cs03.losangeles.ca.ibone.comcast.net (96.110.44.153) 4.302 ms
be-3402-cs04.losangeles.ca.ibone.comcast.net (96.110.44.157) 5.050 ms
be-3102-cs01.losangeles.ca.ibone.comcast.net (96.110.44.145) 12.157 ms
8 be-3212-pe12.600wseventh.ca.ibone.comcast.net (96.110.33.70) 12.516 ms
be-3312-pe12.600wseventh.ca.ibone.comcast.net (96.110.33.74) 4.418 ms
be-3412-pe12.600wseventh.ca.ibone.comcast.net (96.110.33.78) 4.177 ms
9 75.149.231.130 (75.149.231.130) 4.149 ms 3.731 ms 3.773 ms
10 *^C
It looks like you aborted before the traceroute completed. Weird how you’re routing to Illinois in the first one. Something fishy is going on with the routing, given how many times it bounces around. You’ll need to wait for a Fly dev to investigate.
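If it helps to compare the two traces hop by hop, a small parser can pull out the round-trip times per hop and show where replies stop. This is only a sketch in Python: `parse_traceroute` and `last_responding_hop` are hypothetical helpers, and the parser assumes the plain output format pasted above, where a hop with several responders continues on lines without a hop number.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")     # "<hop number> <rest of line>"
RTT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")  # "12.777 ms"


def parse_traceroute(text):
    """Return {hop_number: [rtt_ms, ...]}; hops that only timed out map to []."""
    hops = {}
    current = None
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            current = int(m.group(1))
            hops[current] = []
            rest = m.group(2)
        else:
            rest = line  # continuation line for the current hop, or the header
        if current is not None:
            hops[current].extend(float(v) for v in RTT_RE.findall(rest))
    return hops


def last_responding_hop(hops):
    """Highest hop number that returned at least one reply, or None."""
    answered = [hop for hop, rtts in hops.items() if rtts]
    return max(answered) if answered else None


# A few lines taken from the first trace in this thread.
SAMPLE = """\
traceroute to foundry-staging-trpc-web.fly.dev (66.241.124.45), 64 hops max, 40 byte packets
6 192.205.37.26 (192.205.37.26) 10.925 ms 4.808 ms 4.303 ms
7 96.110.44.157 (96.110.44.157) 12.777 ms
96.110.44.153 (96.110.44.153) 11.958 ms
96.110.44.149 (96.110.44.149) 11.770 ms
9 75.149.231.130 (75.149.231.130) 11.240 ms 4.684 ms 3.536 ms
10 * * *
"""

hops = parse_traceroute(SAMPLE)
print(last_responding_hop(hops), {hop: min(r) for hop, r in hops.items() if r})
```

Letting the trace run to completion and noting the last hop that replies is usually the most useful detail to include in a support ticket.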