The above is a Tornado deployment. I'm also seeing this on other services, including some with a simple nginx setup. Is something going on? Thank you.
Sure! Here are some other interesting data points. Updown.io is showing no problems, but when I try to ping servers from my local machine and from other machines out in the wild (mostly DigitalOcean), I often get timeouts.
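One way to put numbers on "often get timeouts" is to repeat TCP connects to the service port from each vantage point (local machine, DigitalOcean droplets) and count failures, rather than relying on ICMP ping. This is a minimal sketch, assuming Python 3 and a service listening on port 443; `probe_tcp` and `summarize` are hypothetical helpers, not part of any Fly.io tooling.

```python
import socket
import time


def probe_tcp(host, port, attempts=10, timeout=3.0, pause=0.5):
    """Try `attempts` TCP connects; return latencies in ms, with None for failures."""
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the name and completes the TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                results.append((time.monotonic() - start) * 1000.0)
        except OSError:
            # timeouts and refused/reset connections are all OSError subclasses
            results.append(None)
        if pause and i < attempts - 1:
            time.sleep(pause)  # space out attempts so one slow moment isn't counted repeatedly
    return results


def summarize(results):
    """Count failures and report the slowest successful connect."""
    ok = [r for r in results if r is not None]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok),
        "max_ms": max(ok) if ok else None,
    }
```

For example, `summarize(probe_tcp("registry.fly.io", 443))` run from both a failing and a healthy machine gives comparable failure counts and worst-case connect times to post alongside a traceroute.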
% traceroute registry.fly.io
traceroute to registry.fly.io (77.83.143.220), 64 hops max, 52 byte packets
1 10.0.0.1 (10.0.0.1) 1.491 ms 1.215 ms 1.006 ms
2 192.168.99.1 (192.168.99.1) 1.260 ms 1.256 ms 1.167 ms
3 148-64-111-65.public.monkeybrains.net (148.64.111.65) 3.077 ms 3.061 ms 3.303 ms
4 172.17.19.170 (172.17.19.170) 3.296 ms 3.334 ms 3.187 ms
5 172.17.18.50 (172.17.18.50) 2.303 ms 1.749 ms 1.683 ms
6 172.17.22.244 (172.17.22.244) 1.659 ms 2.490 ms 1.553 ms
7 208.52.0.73 (208.52.0.73) 1.908 ms 2.646 ms 1.926 ms
8 192.175.30.252 (192.175.30.252) 2.405 ms 2.695 ms 2.323 ms
9 192.175.29.226 (192.175.29.226) 3.132 ms 3.334 ms 3.185 ms
10 be13.cr2-55smarket.bb.as11404.net (192.175.30.220) 5.641 ms 5.044 ms 4.646 ms
11 be11.cr3-11greatoaks.bb.as11404.net (192.175.30.38) 5.131 ms 5.423 ms 5.087 ms
12 cr1-9greatoaks-be3.bb.as11404.net (192.175.30.214) 5.058 ms 4.987 ms 5.120 ms
13 * * *
14 * * *
15 * * *
16 * *
I don't have traceroute6 installed; I'll look at installing it.
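When comparing traceroutes from several machines, it helps to pull out the last hop that answered and how many silent (`* * *`) hops follow it. Silent hops only mean no reply arrived within the probe timeout; many routers and hosts drop or rate-limit traceroute probes, so they do not by themselves prove where traffic is lost. A minimal sketch, assuming the usual `N host (addr) rtt ...` line format; `last_responding_hop` is a hypothetical helper.

```python
import re

# A hop line starts with the hop number; header lines ("traceroute to ...") don't match.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def last_responding_hop(traceroute_text):
    """Return (hop_number, host, silent_hops_after) or None if no hop answered."""
    last = None
    silent_after = 0
    for line in traceroute_text.splitlines():
        m = HOP_LINE.match(line)
        if not m:
            continue
        hop, rest = int(m.group(1)), m.group(2)
        if rest.replace("*", "").strip() == "":
            # every probe for this hop timed out
            silent_after += 1
            continue
        last = (hop, rest.split()[0])
        silent_after = 0
    if last is None:
        return None
    return last[0], last[1], silent_after
```

Applied to the registry.fly.io output, this reports hop 12 (`cr1-9greatoaks-be3.bb.as11404.net`) followed by four silent hops.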
Hmm, I can’t quite tell which region that’s hitting. Can you provide the results of curl -I http://registry.fly.io -H "flyio-debug: doit" from wherever it’s failing?
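To repeat that check on a schedule (for instance from one of the DigitalOcean machines), the `curl -I ... -H "flyio-debug: doit"` request can be approximated in Python. This is a minimal sketch under assumptions: `head_with_debug` is a hypothetical helper; like `curl -I` without `-L`, it sends a HEAD request, does not follow redirects, and returns whatever status and headers come back. Which debug headers Fly.io actually returns is not shown in this thread.

```python
import http.client
import time
from urllib.parse import urlsplit


def head_with_debug(url, timeout=10.0):
    """Send a HEAD request with the flyio-debug header; return (status, headers, elapsed_ms)."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    start = time.monotonic()
    try:
        conn.request("HEAD", path, headers={"flyio-debug": "doit"})
        resp = conn.getresponse()
        # getheaders() returns (name, value) pairs exactly as the server sent them
        return resp.status, resp.getheaders(), (time.monotonic() - start) * 1000.0
    finally:
        conn.close()
```

Logging the elapsed time with each result makes the "sometimes takes a while" cases visible next to the headers that identify the region.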
Thanks so much, by the way. Here you go, from my local machine, where requests don't always fail but sometimes take a while. It might also be better to look at my “proxy-sea” service instead: that's just nginx, so there are fewer confounding variables (like my Tornado instance). While I don't believe it to be the case, nikola-sharder could have some Tornado bug causing a stall.
For the sake of transparency, I just wanted to report that I'm also seeing some failures from DigitalOcean to DigitalOcean. It's pretty hard for me to explain all of these happening at once.
Okay, thank you. To clarify: was there an issue with my servers, meaning I should consider adding more instances or similar, or was it something in the intermediate Fly servers that are under your control? Thanks!
traceroute to api.fly.io (2a09:8280:1:f28:246e:d6a:949:dbbf), 30 hops max, 80 byte packets
1 2003:a:1344:24fc:: (2003:a:1344:24fc::) 0.409 ms 0.663 ms 0.447 ms
2 2003:a:1344:2400:e228:6dff:fe6b:db2a (2003:a:1344:2400:e228:6dff:fe6b:db2a) 1.959 ms 3.004 ms 2.808 ms
3 2003:0:1406:6419::1 (2003:0:1406:6419::1) 18.567 ms 18.555 ms 19.113 ms
4 2003:0:1406:2410::2 (2003:0:1406:2410::2) 19.816 ms 20.031 ms 20.743 ms
5 e0-51.switch2.fra2.he.net (2001:470:0:5f6::1) 27.225 ms 26.358 ms 27.413 ms
6 e0-34.core2.ams2.he.net (2001:470:0:4b7::2) 34.424 ms 27.266 ms *
7 100ge2-1.core1.ams1.he.net (2001:470:0:489::1) 41.138 ms 41.125 ms 41.112 ms
8 amsix.as36236.net (2001:7f8:1::a503:6236:1) 35.961 ms 32.552 ms 41.109 ms
9 2607:f740:d:10::4 (2607:f740:d:10::4) 46.596 ms 46.584 ms 36.058 ms
10 2607:f740:d:16::2 (2607:f740:d:16::2) 35.256 ms 35.479 ms 35.467 ms
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 …
I see a few TLS handshake EOF errors to the API through Amsterdam. This looks like a network issue between you and AMS. If it has cleared up, we probably can't figure out why; if it happens again, post here and we can do some more digging.
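If the problem comes back, timestamped handshake results from the affected machine would be useful evidence to post. Note that the EOF errors above were seen on Fly's side; from the client, the same failure might surface as a timeout, a connection reset, or `ssl.SSLEOFError`, depending on timing. A minimal sketch, assuming Python 3 and port 443; `tls_handshake_once` is a hypothetical helper and the outcome labels are made up for illustration.

```python
import socket
import ssl
import time


def tls_handshake_once(host, port=443, timeout=5.0):
    """Open a TCP connection, complete a TLS handshake, and classify the outcome."""
    ctx = ssl.create_default_context()
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake immediately (do_handshake_on_connect=True)
            with ctx.wrap_socket(raw, server_hostname=host):
                pass
        return "ok", (time.monotonic() - start) * 1000.0
    except ssl.SSLEOFError:
        # the connection was closed in the middle of the handshake
        return "tls_eof", None
    except ssl.SSLError as exc:
        return f"tls_error:{exc.reason}", None
    except socket.timeout:
        return "timeout", None
    except OSError as exc:
        # refused, reset, unreachable, DNS failures, etc.
        return f"os_error:{exc.__class__.__name__}", None
```

Calling it in a loop with a timestamp per line, for example `print(time.strftime("%H:%M:%S"), tls_handshake_once("api.fly.io"))`, yields a log that can be matched against server-side errors.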