IPv4 not reachable from Brazil

Hi there, folks.

Sorry if this isn’t the right place to post this question (I couldn’t find a “Networking” tag), but I couldn’t find any other official channel.

Right now (and since at least yesterday), our server at gitpace.com is not reachable over IPv4 from the south of Brazil (tested from at least 3 cities, on 3 distinct providers).

$ dig gitpace.com A
#...
;; ANSWER SECTION:
gitpace.com.		60	IN	A	169.155.63.33

$  dig gitpace.com AAAA
# not set
$ nc -v 169.155.63.33 443
# does not connect
$ dig gitpace.fly.dev A    
#...
;; ANSWER SECTION:
gitpace.fly.dev.	300	IN	A	169.155.63.33

$ dig gitpace.fly.dev AAAA
#...
;; ANSWER SECTION:
gitpace.fly.dev.	300	IN	AAAA	2a09:8280:1::1:aebf

$ nc -v gitpace.fly.dev 443
Connection to gitpace.fly.dev (2a09:8280:1::1:aebf) 443 port [tcp/https] succeeded!

Ping works, for some reason that escapes my knowledge:

$ ping gitpace.com
PING gitpace.com (169.155.63.33) 56(84) bytes of data.
64 bytes from 169.155.63.33 (169.155.63.33): icmp_seq=1 ttl=50 time=42.2 ms
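
For anyone who wants to repeat the nc checks above per address family, here is a minimal sketch in Python. The function name `tcp_reachable` and the 5-second timeout are my own choices for illustration, not anything provided by Fly.io. It only attempts a TCP connect, so it tests something different from ping, which uses ICMP and can succeed even when a TCP port does not accept connections.

```python
import socket


def tcp_reachable(host, port, family, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds over `family`.

    `family` is socket.AF_INET (IPv4) or socket.AF_INET6 (IPv6).
    Hypothetical helper for illustration only.
    """
    try:
        # Resolve only addresses of the requested family (A or AAAA records).
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No address of this family, e.g. no AAAA record is set.
        return False

    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, or unreachable: try the next address.
            continue
    return False
```

Calling `tcp_reachable("gitpace.com", 443, socket.AF_INET)` and `tcp_reachable("gitpace.fly.dev", 443, socket.AF_INET6)` would roughly mirror the two nc commands, with the address family made explicit instead of left to the resolver.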

Tracepaths:

$ tracepath gitpace.com
 1?: [LOCALHOST]                      pmtu 1500
 1:  192.168.0.1                                           9.531ms 
 1:  192.168.0.1                                           8.654ms 
 2:  10.13.0.1                                            17.332ms 
 3:  bd0792f5.virtua.com.br                               16.812ms 
 4:  embratel-T0-6-0-0-2-4004-agg02.pltqn.embratel.net.br  17.623ms 
 5:  200.230.28.243                                       26.304ms asymm  8 
 6:  200.230.28.0                                         25.678ms asymm  7 
 7:  200.230.28.0                                         21.710ms 
 8:  200.250.247.150                                      24.196ms asymm  9 
 9:  ae1.3502.edge2.SaoPaulo1.level3.net                  43.816ms asymm 11 
10:  209.219.163.148.ptr.anycast.net                      48.663ms asymm 13 
11:  65.219.163.148.ptr.anycast.net                       47.091ms asymm 14 
12:  65.219.163.148.ptr.anycast.net                       47.143ms asymm 14 
13:  65.219.163.148.ptr.anycast.net                       43.764ms asymm 14 
14:  no reply
15:  no reply
16:  no reply
17:  no reply
18:  no reply
19:  no reply
20:  no reply
21:  no reply
22:  no reply
23:  no reply
24:  no reply
25:  no reply
26:  no reply
27:  no reply
28:  no reply
29:  no reply
30:  no reply
     Too many hops: pmtu 1500
     Resume: pmtu 1500 
$ tracepath gitpace.fly.dev
 1?: [LOCALHOST]                        0.007ms pmtu 1500
 1:  2804:14d:4082:900d:4a29:52ff:fe46:5b6b               10.754ms 
 1:  2804:14d:4082:900d:4a29:52ff:fe46:5b6b                8.724ms 
 2:  2804:14d:4082::1                                     17.670ms 
 3:  2804:14d:403f::161                                   19.539ms 
 4:  2804:a8:2:d2::759                                    17.220ms asymm  6 
 5:  no reply
 6:  no reply
 7:  2804:a8:2:d0::132a                                   29.024ms asymm  9 
 8:  2804:a8:2:d0::132a                                  131.291ms asymm  9 
 9:  2001:13b4:4000:4::e03                               165.220ms asymm 11 
10:  2607:f740:1:1::5                                    154.356ms asymm 13 
11:  2607:f740:1:1::3                                     52.394ms asymm 14 
12:  2607:f740:1:1::3                                     53.603ms asymm 14 
13:  2607:f740:1:1::3                                     46.857ms asymm 14 
14:  no reply
15:  no reply
16:  no reply
17:  no reply
18:  no reply
19:  no reply
20:  no reply
21:  no reply
22:  no reply
23:  no reply
24:  no reply
25:  no reply
26:  no reply
27:  no reply
28:  no reply
29:  no reply
30:  no reply
     Too many hops: pmtu 1500
     Resume: pmtu 1500 

Is this a problem on my providers’ end? I’m using Claro NXT Telecomunicacoes Ltda, but I tested with Vivo and Osirnet as well. Please let me know if there’s anything else I can help with!

Best!
Bruno


Thanks for the detailed report!

Pings working got me thinking this was probably on our end.

Yesterday we had some issues in GRU and it appears it didn’t recover cleanly. I’ve now fixed the issue.


Thanks, Jerome! Do you do post-mortem analysis on issues like this one? (How can we make sure this won’t happen in other parts of the world?)


This could’ve happened anywhere, indeed.

We don’t yet do post-mortems. We’re too small and don’t have enough time on our hands. Not to diminish the impact of this issue, but it only affected a single region and some IP ranges. We do want to do post-mortem analysis at some point, and we’re hiring a lot more infrastructure and operations people to help us get there. We’re happy to give you more details on a case-by-case basis though!

I’ve already logged a ticket (with a potential solution) to work on a way to detect this issue the next time it happens, so we can deal with it automatically and swiftly.
