Being routed to a very distant edge instance

Hi, for the past couple of weeks, communicating with my apps has been very slow. I’ve noticed that even though I am based near PHX, my requests are sent to LHR. Here are the results of a debug curl:

$ curl -s -I -H "flyio-debug: doit" http://debug.fly.dev | grep flyio-debug
flyio-debug: {"n":"edge-cf-lon1-a6eb","nr":"lhr","ra":"67.1.197.253","rf":"Verbatim","sr":"lhr","sdc":"lon1","sid":"3d8d314f456d89","st":0,"nrtt":0,"bn":"worker-cf-lon1-79f0","fbn":null}

And an mtr to my application’s static IPv4 address:

$ mtr -r -n -o "L BAWV MI" 137.66.15.125
Start: 2024-10-03T09:01:24-0700
HOST: fedora                      Loss%   Best   Avg  Wrst StDev  Javg Jint
  1.|-- 192.168.0.1                0.0%    0.2   0.3   0.5   0.1   0.0  0.3
  2.|-- 75.160.240.26              0.0%    3.0   3.2   3.7   0.2   0.2  1.5
  3.|-- 75.160.241.201             0.0%    2.1   3.2   3.9   0.4   0.5  3.5
  4.|-- 4.68.73.122                0.0%    5.5   6.5  10.4   1.4   1.2  8.9
  5.|-- ???                       100.0    0.0   0.0   0.0   0.0   0.0  0.0
  6.|-- 4.30.181.50               10.0%   60.2  60.4  60.8   0.3   0.3  2.2
  7.|-- 87.245.233.230             0.0%  130.0 140.4 173.0  17.8   8.7 66.5
  8.|-- 87.245.208.155             0.0%  129.9 146.0 196.7  24.2  14.5 105.8
  9.|-- 137.66.15.125              0.0%  128.4 136.7 158.0  10.8   8.1 60.5
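
For anyone reading a report like the one above, the interesting part is usually the hop where the round-trip time jumps. Here is a rough Python sketch that finds it. It assumes the `mtr -r -n -o "L BAWV MI"` column order shown above (loss first, then best RTT); `parse_hops` and `biggest_jump` are made-up helper names, not part of mtr.

```python
import re

# Matches report lines such as "  6.|-- 4.30.181.50   10.0%   60.2 ..."
# Groups: hop number, host, loss percentage, best RTT in ms.
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+([\d.]+)%?\s+([\d.]+)")

REPORT = """\
HOST: fedora                      Loss%   Best   Avg  Wrst StDev  Javg Jint
  1.|-- 192.168.0.1                0.0%    0.2   0.3   0.5   0.1   0.0  0.3
  2.|-- 75.160.240.26              0.0%    3.0   3.2   3.7   0.2   0.2  1.5
  3.|-- 75.160.241.201             0.0%    2.1   3.2   3.9   0.4   0.5  3.5
  4.|-- 4.68.73.122                0.0%    5.5   6.5  10.4   1.4   1.2  8.9
  5.|-- ???                       100.0    0.0   0.0   0.0   0.0   0.0  0.0
  6.|-- 4.30.181.50               10.0%   60.2  60.4  60.8   0.3   0.3  2.2
  7.|-- 87.245.233.230             0.0%  130.0 140.4 173.0  17.8   8.7 66.5
  8.|-- 87.245.208.155             0.0%  129.9 146.0 196.7  24.2  14.5 105.8
  9.|-- 137.66.15.125              0.0%  128.4 136.7 158.0  10.8   8.1 60.5
"""

def parse_hops(report):
    """Return (hop, host, loss_pct, best_ms) for every hop that answered."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header lines and anything else that is not a hop
        loss, best = float(m.group(3)), float(m.group(4))
        if loss >= 100.0:
            continue  # hop never replied, so its RTT columns are meaningless
        hops.append((int(m.group(1)), m.group(2), loss, best))
    return hops

def biggest_jump(hops):
    """Return the pair of consecutive answering hops with the largest RTT increase."""
    return max(zip(hops, hops[1:]), key=lambda pair: pair[1][3] - pair[0][3])

prev, nxt = biggest_jump(parse_hops(REPORT))
print(f"largest jump: hop {prev[0]} ({prev[3]} ms) -> hop {nxt[0]} ({nxt[3]} ms)")
```

On this report it points at hops 6 and 7, where the best RTT goes from 60.2 ms to 130.0 ms. Intermediate hops can deprioritise replies, so treat this as a hint about where the path leaves the region rather than proof.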

My IPv4 traffic is also taking the long route. I don’t know how long it’s been happening, but I definitely started noticing some latency-sensitive applications slowing down in recent weeks.

For anyone curious about their IPv4/6 routes (and who doesn’t want to use the command line), I set up a little test here: https://my-route.fly.dev

The app is running in DFW, but that shouldn’t matter. Anycast is supposed to route you to the closest edge location.
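
If you would rather script the check, here is a small Python sketch that reads the `flyio-debug` header line from the first post. It assumes, based only on the values seen in this thread, that the `nr` field holds the region code of the edge that answered. The helper names and the `expected_regions` set are made up for illustration.

```python
import json

def parse_flyio_debug(header_line):
    """Split a 'flyio-debug: {...}' header line and decode its JSON value."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "flyio-debug":
        raise ValueError(f"not a flyio-debug header: {header_line!r}")
    return json.loads(value)

def edge_region(header_line):
    # Assumption: "nr" is the region of the edge node that handled the request.
    return parse_flyio_debug(header_line)["nr"]

def is_far_edge(header_line, expected_regions):
    """True when the answering edge is not in the set of regions you expect."""
    return edge_region(header_line) not in expected_regions

# Header line copied from the first post in this thread.
BEFORE = 'flyio-debug: {"n":"edge-cf-lon1-a6eb","nr":"lhr","ra":"67.1.197.253","rf":"Verbatim","sr":"lhr","sdc":"lon1","sid":"3d8d314f456d89","st":0,"nrtt":0,"bn":"worker-cf-lon1-79f0","fbn":null}'

# Hypothetical expectation for a client in the western US.
expected_regions = {"lax"}

print(edge_region(BEFORE), is_far_edge(BEFORE, expected_regions))
```

The input is exactly what `curl -s -I -H "flyio-debug: doit" http://debug.fly.dev | grep flyio-debug` prints, so the line can be pasted or piped in as-is.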

Sorry for the inconvenience, and thanks for providing the curl and debug mtr information. We can work with our network provider to fix the routes for ISPs that are being routed incorrectly, so these reports are helpful.

Hello @treebones ,

We have made some changes on our end. Can you try again and see if you are still being routed to lhr? If you are, please still include an mtr.

Thank you.

Your changes worked wonderfully! Thank you very much.

~ ❯❯❯ curl -s -I -H "flyio-debug: doit" http://debug.fly.dev | grep flyio-debug
flyio-debug: {"n":"edge-nac-lax1-93ef","nr":"lax","ra":"67.1.197.253","rf":"Verbatim","sr":"lax","sdc":"lax1","sid":"73d8dd921b5891","st":0,"nrtt":0,"bn":"worker-cf-lax1-7042","fbn":null}
~ ❯❯❯ mtr -r -n -o "L BAWV MI" 137.66.15.125
Start: 2024-10-03T16:28:56-0700
HOST: fedora                      Loss%   Best   Avg  Wrst StDev  Javg Jint
  1.|-- 192.168.0.1                0.0%    0.2   0.3   0.4   0.1   0.1  0.6
  2.|-- 75.160.240.26              0.0%    3.1   4.2  12.3   2.8   1.9 12.5
  3.|-- 75.160.241.201             0.0%    2.4   3.7   5.5   0.9   1.0  7.1
  4.|-- 4.68.73.122                0.0%    5.6  10.4  23.6   5.6   5.1 38.7
  5.|-- 129.250.8.90               0.0%    5.4   7.7  21.1   4.7   3.3 23.1
  6.|-- 129.250.3.85              50.0%   14.3  14.8  15.6   0.5   0.3  1.5
  7.|-- 129.250.3.79               0.0%   14.3  31.3  44.7   9.4   9.4 69.2
  8.|-- ???                       100.0    0.0   0.0   0.0   0.0   0.0  0.0
  9.|-- ???                       100.0    0.0   0.0   0.0   0.0   0.0  0.0
 10.|-- 104.225.18.44              0.0%   14.5  15.1  16.9   0.8   0.6  5.1
 11.|-- 137.66.15.125              0.0%   14.7  14.9  15.5   0.3   0.3  2.3

Hi,
I’m evaluating fly.io right now to move an application there, but I’m perplexed by the latency I observe. I deployed a test app in the CDG region and keep being routed to the IAD edge despite being located in Paris. I can’t get below 160ms latency (despite the app responding in <5ms). I’ve got quite a good fibre connection and usually don’t get above 20ms to more or less any Western European DC. What’s happening here?

Here is the debug header output and an mtr trace:

❯ curl -I -H 'flyio-debug: doit' http://[2a09:8280:1::4a:3c88:0]:80/
HTTP/1.1 400 Bad Request
server: Fly/a71b98465 (2024-10-09)
via: 1.1 fly.io
fly-request-id: 01J9W654DT46HP999BP2R6QB2R-iad
flyio-debug: {"n":"edge-cf-iad2-69b2","nr":"iad","ra":"2a01:cb08:8b4:XXXX:XXXX:XXXX:XXXX:XXXX","rf":"Verbatim","sr":null,"sdc":null,"sid":null,"st":null,"nrtt":null,"bn":null,"fbn":null}
date: Thu, 10 Oct 2024 21:42:58 GMT
❯ mtr 2a09:8280:1::4a:3c88:0
                                                                        Packets               Pings
 Host                                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 2a01cb0808b4XXXXXXXXXXXXXXXXXXXX.ipv6.abo.wanadoo.fr               0.0%    74    2.7   2.7   2.4   6.6   0.5
 2. 2a01cb08a00402020193025300770047.ipv6.abo.wanadoo.fr               0.0%    74    4.3   4.1   3.4   5.1   0.3
 3. 2a01:cfc0:200:8000:193:252:102:135                                 0.0%    74    4.6   5.0   4.1   8.2   0.8
 4. bundle-ether149.pastr4.paris.opentransit.net                      34.2%    74    4.7  15.8   4.4 256.5  48.2
 5. 2001:688:0:2:1::231                                               18.9%    74   81.4  92.8  80.5 397.1  55.1
 6. comcast.gw.opentransit.net                                         0.0%    73   81.5  81.5  80.8  82.6   0.4
 7. be-3307-cs03.beaumeade.va.ibone.comcast.net                        0.0%    73   81.5  81.6  80.5  99.0   2.1
 8. be-3311-pe11.ashburn.va.ibone.comcast.net                          0.0%    73   81.6  81.3  80.6  84.4   0.5
 9. 2001:559:0:6::ba                                                   0.0%    73   81.2  86.6  81.0 141.7  12.7
10. 2a09:8280:1::4a:3c88:0                                             0.0%    73   80.5  81.1  80.4  82.6   0.4

Any help or direction would be much appreciated.

I have quite a similar situation: on my home ISP it is always iad, though when I switch to mobile it selects waw/fra correctly.
Sometimes I get cdg. Any EU-based routing is acceptable, but in most cases it’s iad.

traceroute to debug.fly.dev (77.83.140.164), 64 hops max, 40 byte packets
 1  lan (192.168.1.1)  5.392 ms  3.587 ms  3.695 ms
 2  kra-bng1.neo.tpnet.pl (83.1.4.239)  8.007 ms  8.269 ms  17.372 ms
 3  kra-r12.tpnet.pl (80.50.122.5)  9.929 ms
    kra-r21.tpnet.pl (80.50.18.5)  7.249 ms
    kra-r12.tpnet.pl (80.50.122.5)  9.699 ms
 4  * 193.251.141.75 (193.251.141.75)  31.057 ms
    kra-r22.tpnet.pl (195.116.35.226)  8.937 ms
 5  193.251.128.13 (193.251.128.13)  508.263 ms
    193.251.141.75 (193.251.141.75)  23.046 ms
    193.251.128.13 (193.251.128.13)  108.475 ms
 6  193.251.128.13 (193.251.128.13)  112.594 ms
    193.251.248.206 (193.251.248.206)  111.085 ms  115.076 ms
 7  be-3107-cs01.beaumeade.va.ibone.comcast.net (96.110.32.185)  112.059 ms
    193.251.248.206 (193.251.248.206)  107.312 ms
    be-3407-cs04.beaumeade.va.ibone.comcast.net (96.110.32.197)  109.353 ms
 8  be-3111-pe11.ashburn.va.ibone.comcast.net (96.110.32.122)  107.798 ms
    be-3107-cs01.beaumeade.va.ibone.comcast.net (96.110.32.185)  109.402 ms
    be-3207-cs02.beaumeade.va.ibone.comcast.net (96.110.32.189)  108.910 ms
 9  be-3411-pe11.ashburn.va.ibone.comcast.net (96.110.32.134)  106.898 ms
    50.248.117.70 (50.248.117.70)  108.190 ms
    be-3211-pe11.ashburn.va.ibone.comcast.net (96.110.32.126)  109.582 ms
10  50.248.117.70 (50.248.117.70)  107.440 ms * *

Hi there :wave:,

I have made some adjustments (read: Traffic Engineering :tm:), and from our monitoring it looks like traffic originating from Orange should now be routed correctly to European regions. Can you confirm whether this has resolved your bad routes?

Yep, so far so good, thanks! Now I see CDG. Latency is much better: 2-3x on WSS connections and 5-6x on HTTP requests.
On the positive side, I did a lot of optimisations in the app because initially I thought something was wrong in my code :smile:

As I understand it, this is more a question for the ISP, and some clients could still have this issue?
I’m just curious why Paris was selected rather than FRA or WAW, which seem to be closer to me. I’m based in Poland at the moment.

I’m getting consistent ~10ms latency to my app, so I think the magic worked quite well!
Looks like it improved latency on my mobile network too (on a different operator, still routed to cdg but ~30ms quicker).

Thanks a lot!

Long story short: in BGP-land, routes are not chosen by geographical distance but by a set of complicated criteria, mainly the AS path (AS = Autonomous System, think ISPs), i.e. “how many ISPs do I need to pass through to deliver this packet?”. ISPs can also prefer or avoid upstreams based on business relationships (are they a direct customer of ours? do they pay us more?).

In this case, Orange probably has much better “peering” in Paris – they can effectively reach us in a single hop there. This is probably why it was selected by Orange. I am not sure if they peer as much in FRA or WAW. This could also be a business decision (preferring to route traffic back to Paris inside their own backbone network might be cheaper for them than to FRA).
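
To make the two criteria above concrete, here is a toy Python sketch of how a router might rank candidate routes: policy preference first, then AS-path length. It is a deliberate simplification (real BGP best-path selection has more tie-breakers), and every AS name, region label, and preference value below is a made-up placeholder, not Orange's or our actual configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    exit_region: str            # where traffic would be handed over (illustrative label)
    as_path: tuple              # networks the route passes through, nearest first
    local_pref: int = 100       # policy knob: higher means "we prefer this upstream"

def best_route(routes):
    """Pick the route with the highest local_pref, then the shortest AS path."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path)))

# Hypothetical candidates as seen from one ISP. Geography appears nowhere.
direct_peering = Route("cdg", ("AS-DEST",))
via_transit = Route("fra", ("AS-TRANSIT-A", "AS-DEST"))
preferred_upstream = Route("iad", ("AS-TRANSIT-B", "AS-DEST"), local_pref=200)

# With equal preference, the shortest AS path wins: the direct peering.
print(best_route([via_transit, direct_peering]).exit_region)

# A business preference outranks path length, wherever that route exits.
print(best_route([via_transit, direct_peering, preferred_upstream]).exit_region)
```

The first call picks the single-hop peering and the second picks the preferred upstream even though its path is longer, which is how a client can end up on an edge that is far away on the map.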
