health checks suddenly got tls and then suddenly was working again
this was one of the most perplexing bugs for me ever
health checks suddenly got tls and then suddenly was working again
this was one of the most perplexing bugs for me ever
has happened in the past yet not reported in status?
Is it possible to run a mtr or traceroute from your health checks when it detects a handshake timeout? This kind of issue tends to be isolated to single ISPs and we can’t really catch all of them on our side. We do know that our platform itself seems to be okay right now in sin.
what confuses me is im able to access the endpoint in my browser across different devices now and even using curl but fly somehow cannot access a different fly machine
i’ve also experienced this endpoint not being accessible to device a, but device b yes
same browser, same internet connection
and for me just really really perplexing for health check behavior to do that
Ok, then that sounds like a different problem. I had the impression that your health checks were running from outside Fly.
In this case it could be a single-host issue affecting the host running your check machine. Can you share the name of your app doing these checks?
and it’s frustrating because ik this has happened in the past and somehow it’s back again
ive also tried deploying on multiple
the name of the app is control-vm which is hitting example machines: 1d-vm, 4h-vm, 1h-vm, etc
ive configured it to hit different endpoints from a diff provider for now because nothing is happening
the initial image is app called fk-me (sorry), those are logs from health checks being performed by fly
please let me know how else i can help debug, i really wanna deploy my platform on fly
this is also confusing because it’s saying not reachable, but as you can see green dot + last health check is passing
last concrete example
(control-vm.fly.dev) pinging → 5m-vm.fly.dev
and getting TLS TIMEOUT but i am perfectly able to hit that endpoint
something must be wrong in fly networking (i think) @PeterCxy
pic1: log of control-vm pinging 5m-vm.fly.dev and timing out
pic2: i am able to hit that endpoint perfectly fine
Finally reported in status haha
i guess i am 3 for 3 in reporting issues not yet in status page ![]()
im glad im not crazy though but really concerns me running production apps in fly
For more context, this only affects new IPv6s assigned in the past little while, which is also why it didn’t get caught on our side. We do have alerts for networking problems in general, but this one is a little bit weird. We’ll definitely need to include newly-assigned IPs in alerts going forward.
all good, stubbornly waiting for fix even if already ~2am here
i still greatly love fly - just a bit more concerned with instability / unreliability
looks like it is fixed!!!