I noticed that a simple POST request to my server in NRT (I’m in South Korea) was taking significantly longer than I expected, so I ran a traceroute.
traceroute to my-app.fly.dev, 64 hops max, 52 byte packets
[first 7 hops redacted - local network]
8 112.174.80.174 163.839 ms 162.803 ms 163.766 ms
9 te-0-11-0-3-6-pe01.seattle.wa.ibone.comcast.net (66.208.228.45) 169.728 ms 151.755 ms 166.945 ms
10 be-2301-cs03.seattle.wa.ibone.comcast.net (96.110.39.225) 164.305 ms
be-2201-cs02.seattle.wa.ibone.comcast.net (96.110.39.205) 145.163 ms
be-2401-cs04.seattle.wa.ibone.comcast.net (96.110.39.229) 145.101 ms
11 be-2413-pe13.seattle.wa.ibone.comcast.net (96.110.44.94) 143.716 ms
be-2313-pe13.seattle.wa.ibone.comcast.net (96.110.44.90) 119.933 ms
be-2113-pe13.seattle.wa.ibone.comcast.net (96.110.44.82) 141.594 ms
12 96-87-9-102-static.hfc.comcastbusiness.net (96.87.9.102) 136.545 ms 162.770 ms 158.224 ms
I’m not an expert at this, but it seems it’s hopping to Seattle for some reason.
The domain is registered on Cloudflare and points to a single-region Fly app in nrt.
Is there something I can do on fly.io to get a better route?
This route didn’t make much sense routing-wise. I tried to cross-check, and the latest trace I see from Korea Telecom (AS4766) is:
traceroute to 77.83.140.34 (77.83.140.34), 20 hops max, 60 byte packets
1 192.168.0.1 (192.168.0.1) 0.425 ms 0.448 ms
2 * *
3 112.188.61.105 (112.188.61.105) 2.290 ms 2.308 ms
4 112.188.53.29 (112.188.53.29) 2.231 ms 2.249 ms
5 112.174.47.49 (112.174.47.49) 9.314 ms 9.332 ms
6 112.174.86.154 (112.174.86.154) 9.571 ms 9.587 ms
7 63-222-57-229.static.as3491.net (63.222.57.229) 34.190 ms 34.187 ms
8 Hu0-0-1-0.br06.tok02.as3491.net (63.218.250.22) 33.734 ms 33.787 ms
9 * *
10 * *
11 103.84.154.10 (103.84.154.10) 38.008 ms 37.968 ms
12 77.83.140.34 (77.83.140.34) 36.826 ms *
It seems fine now. Can you please re-check?
Maybe you tested when PCCW AS3491 had broken connectivity to either side: Korea Telecom AS4766 or NETACTUATE AS36236 (upstream in Japan for fly.io).
Let’s cross-check AS3491, since they feed route collectors (RIPE RIS RRC01/19/23, with their full table). This shows the prefix has been stable, with not much change on the AS3491 side. So the only guess here is that something inside AS4766 triggered it. I cannot be sure, since its full routes are not visible at a collector.
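When comparing traces like the two in this thread, it can help to see where the round-trip time jumps between hops. Below is a rough Python sketch that does this for pasted traceroute text. The names `parse_hops`, `largest_jump`, and `SAMPLE` are made up for illustration, and the parser only assumes the common `N host (ip) x ms y ms` layout, so other traceroute variants may need adjusting.

```python
import re

# Matches values such as "34.190 ms"; hops that only print "*" yield nothing.
RTT = re.compile(r"(\d+(?:\.\d+)?) ms")
# A hop line starts with its hop number followed by whitespace.
HOP = re.compile(r"^\s*(\d+)\s")


def parse_hops(trace):
    """Map hop number -> lowest RTT in ms reported for that hop."""
    best = {}
    current = None
    for line in trace.splitlines():
        match = HOP.match(line)
        if match:
            current = int(match.group(1))
        if current is None:
            continue  # header line before the first hop
        # Lines without a hop number are extra routers for the current hop.
        for rtt in RTT.findall(line):
            value = float(rtt)
            best[current] = min(value, best.get(current, value))
    return best


def largest_jump(hops):
    """Return (previous_hop, hop, increase_ms) for the biggest RTT increase."""
    ordered = sorted(hops.items())
    jumps = [(a[0], b[0], b[1] - a[1]) for a, b in zip(ordered, ordered[1:])]
    return max(jumps, key=lambda j: j[2]) if jumps else None


# A few hops copied from the Korea Telecom trace above.
SAMPLE = """\
5 112.174.47.49 (112.174.47.49) 9.314 ms 9.332 ms
6 112.174.86.154 (112.174.86.154) 9.571 ms 9.587 ms
7 63-222-57-229.static.as3491.net (63.222.57.229) 34.190 ms 34.187 ms
8 Hu0-0-1-0.br06.tok02.as3491.net (63.218.250.22) 33.734 ms 33.787 ms
"""

if __name__ == "__main__":
    prev_hop, hop, delta = largest_jump(parse_hops(SAMPLE))
    print(prev_hop, hop, round(delta, 1))  # prints: 6 7 24.6
```

In the good trace the biggest step is at the hand-off from Korea Telecom to AS3491 (roughly 25 ms, which is plausible for Korea to Tokyo). Keep in mind that per-hop RTTs are noisy because routers often deprioritize replies to traceroute probes, so treat a jump as a hint about where to look rather than proof.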
This came up while I was investigating why 0.2-0.5% of my users’ requests either time out or have unusually long response times, even though my server-side logs are fairly consistent, without spikes.
I’m fairly new to this topic, is there anything I can do on my side to better point the domain on fly by any chance? Currently my app is only available in East Asia and does get affected by delays over 100-150ms.
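Since the server-side logs look clean, one way to confirm the problem is on the network path is to measure from the client side. The following is a minimal Python sketch, not a definitive tool: it times TCP handshakes to the app and reports how many samples are slow or failed. The function names and the 150 ms threshold are assumptions chosen for illustration (the threshold mirrors the delays mentioned above), and a handshake only measures the path to the edge, not the full request.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, timeout=5.0):
    """Time one TCP handshake in ms; return None on timeout or error."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # covers timeouts, DNS failures, refused connections
        return None


def summarize(samples, slow_ms=150.0):
    """Summarize samples; None entries count as failed connections."""
    ok = sorted(s for s in samples if s is not None)
    failed = len(samples) - len(ok)
    slow = sum(1 for s in ok if s > slow_ms)
    bad_pct = 100.0 * (slow + failed) / len(samples) if samples else 0.0
    return {
        "count": len(samples),
        "failed": failed,
        "median_ms": statistics.median(ok) if ok else None,
        "max_ms": ok[-1] if ok else None,
        "slow_or_failed_pct": bad_pct,
    }


if __name__ == "__main__":
    # Hostname taken from the traceroute at the top of this thread.
    samples = []
    for _ in range(50):
        samples.append(tcp_connect_ms("my-app.fly.dev"))
        time.sleep(1)  # spread samples out to catch intermittent issues
    print(summarize(samples))
```

Running this from a machine on the affected ISP over a longer period, and saving a traceroute whenever a slow sample shows up, would give concrete evidence of when the path changes.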
The Internet at large can be wild. These issues can occur anywhere, though they are more common in Asia. In the US and EU you will generally find more small and mid-sized networks peered with each other, and when they are not, traffic goes via their upstream (often a tier 1 network). With one known exception at this point, all transit-free networks peer with each other, so indirect paths are not that long. In Asia, however, many large backbones are known not to be connected to each other, so a failure on a primary path can send traffic all the way to the US.
I’m fairly new to this topic, is there anything I can do on my side to better point the domain on fly by any chance? Currently my app is only available in East Asia and does get affected by delays over 100-150ms.
Not much, unfortunately. You can steer your own traffic a bit by using other networks or overlays to reach a destination (when you know routing is bad), but end users hitting your app need it to be reachable with decent latency. Your best case would be when Fly launches a local region near you, which will have less chance of bad routing.