Is the `sin` region of Fly experiencing networking problems? I'm getting random TLS handshake timeouts.

Health checks suddenly got TLS timeouts and then suddenly were working again.

This is one of the most perplexing bugs I've ever run into.


Random TLS handshakes timing out, what is happening?

This has happened in the past, yet it's not reported on the status page?

Is it possible to run an mtr or traceroute from your health checks when they detect a handshake timeout? This kind of issue tends to be isolated to single ISPs and we can’t really catch all of them on our side. We do know that our platform itself seems to be okay right now in sin.
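For anyone wanting to try this, here is a rough sketch of a check that attempts a TLS handshake with a timeout and, only when it fails, captures an mtr report (falling back to traceroute). The function names, the 5-second timeout, and the 5 report cycles are all assumptions made for illustration, not anything Fly provides, and it assumes `mtr` or `traceroute` is installed on the machine running the check.

```python
import shutil
import socket
import ssl
import subprocess


def tls_handshake_error(host, port=443, timeout=5.0):
    """Return None if the TLS handshake succeeds, else a short error string."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake; the socket timeout covers it too.
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except OSError as exc:  # includes timeouts and ssl.SSLError
        return f"{type(exc).__name__}: {exc}"


def trace_command(host, have_mtr=None):
    """Build the trace command: mtr in report mode if available, else traceroute."""
    if have_mtr is None:
        have_mtr = shutil.which("mtr") is not None
    if have_mtr:
        return ["mtr", "--report", "--report-cycles", "5", host]
    return ["traceroute", host]


def check_and_trace(host, port=443, timeout=5.0):
    """Return None when healthy, else (handshake_error, trace_output)."""
    error = tls_handshake_error(host, port, timeout)
    if error is None:
        return None
    cmd = trace_command(host)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
        trace = result.stdout or result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        trace = f"could not run {cmd[0]}: {exc}"
    return error, trace
```

Calling `check_and_trace("5m-vm.fly.dev")` from the machine doing the checks would then give you both the handshake error and the route at the moment of failure, which is the part that is usually missing from a report.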

What confuses me is that I'm able to access the endpoint in my browser across different devices now, and even using curl, but a Fly machine somehow cannot access a different Fly machine.

I've also experienced this endpoint not being accessible to device A while device B could reach it,

same browser, same internet connection.

And for me it's just really, really perplexing for health checks to behave like that.

Ok, then that sounds like a different problem. I had the impression that your health checks were running from outside Fly.

In this case it could be a single-host issue affecting the host running your check machine. Can you share the name of your app doing these checks?

And it’s frustrating because I know this has happened in the past and somehow it’s back again.

I've also tried deploying on multiple

The name of the app is control-vm, which is hitting example machines: 1d-vm, 4h-vm, 1h-vm, etc.

I've configured it to hit different endpoints from a different provider for now because nothing is happening.

The initial image is from an app called fk-me (sorry); those are logs from health checks being performed by Fly.

Please let me know how else I can help debug, I really wanna deploy my platform on Fly.

This is also confusing because it’s saying not reachable, but as you can see there's a green dot and the last health check is passing.

One last concrete example:

(control-vm.fly.dev) pinging → 5m-vm.fly.dev

and getting TLS TIMEOUT, but I am perfectly able to hit that endpoint myself.

Something must be wrong in Fly networking (I think) @PeterCxy

pic1: log of control-vm pinging 5m-vm.fly.dev and timing out

pic2: i am able to hit that endpoint perfectly fine

Finally reported in status haha

I guess I am 3 for 3 in reporting issues not yet on the status page :rofl:

I'm glad I'm not crazy, though it really concerns me for running production apps on Fly.

For more context, this only affects new IPv6 addresses assigned in the past little while, which is also why it didn’t get caught on our side. We do have alerts for networking problems in general, but this one is a little bit weird. We’ll definitely need to include newly-assigned IPs in alerts going forward.
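Since the problem turned out to be specific to IPv6, one way to narrow this kind of thing down from the client side is to try the handshake over each address family separately. This is only a sketch: `handshake_per_family` is a made-up helper name, the timeout is arbitrary, and it only tries the first address returned for each family.

```python
import socket
import ssl


def handshake_per_family(host, port=443, timeout=5.0):
    """Try a TLS handshake over IPv4 and IPv6 separately and report each result."""
    results = {}
    context = ssl.create_default_context()
    for label, family in (("ipv4", socket.AF_INET), ("ipv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            results[label] = f"no address: {exc}"
            continue
        sockaddr = infos[0][4]  # first resolved address for this family
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                # The handshake happens here and is subject to the same timeout.
                with context.wrap_socket(sock, server_hostname=host):
                    results[label] = "ok"
        except OSError as exc:  # includes timeouts and ssl.SSLError
            results[label] = f"{type(exc).__name__}: {exc}"
    return results
```

If `ipv4` comes back `"ok"` while `ipv6` times out, that would match what was described here, and it would also explain why a browser on a different network path can look perfectly healthy at the same moment.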

All good, stubbornly waiting for the fix even though it's already ~2am here.

I still greatly love Fly, I'm just a bit more concerned about instability / unreliability.

Looks like it is fixed!!!