I’m having some issues talking to one of my servers if I ping it from Seattle (seeing a lot of read timeouts and hangs on the TLS handshake). If I hop on a VPN and ping it from France, it works fine. It seems to have issues going through the sjc edge, but that’s the only edge I’ve seen problems with.
There were some transient issues momentarily in one of the sjc ingest edges, but they should be clearing up now.
Oh, I think I was hitting this too.
Still seeing the problem on our end. Any updates on the incident?
There’s now an official status-page entry open for this one. That’s usually the best place to look for updates…
sjc and ord are down for all my apps.
CLI commands for fly and sprite are timing out intermittently.
The status page is underselling it for me. “Increased latency” is not what I am experiencing. Everything is failing for me. I don’t even have any jobs running in SJC but because that’s where I’m closest to, the control plane is basically down for me right now.
iad not accessible now as well
There is also no mention of other regions that are having connectivity problems like ord or iad.
There is also no mention of the APIs that the fly and sprite CLI tools use being inaccessible.
Hi, just to provide an update here too: the issue was affecting edge hosts in sjc, so if you happen to be routed to an sjc edge, the issue would have affected you regardless of what you’re trying to do. On the other hand, apps that are actually located in the region are probably less affected for users outside (since in that case, routing is completely within our control and we can route around the bad edges).
I do agree that the wording on the status page could have been better. We’ll provide a more detailed write-up in our Infra Log in a couple of days.
Yeah, I’m in the LA area and can’t connect to DFW machines. Definitely a larger impact than the incident status page indicates. Still ongoing even after the latest update.
Mind running curl -H "flyio-debug: doit" against any fly-hosted address, e.g. https://fly.io?
fly-request-id: 01KSRAYWP2ACZ1CR29XJX3GEYR-lax
flyio-debug: {"n":"edge-cf-lax1-2427","nr":"lax","ra":"4.53.155.98","rf":"Verbatim","sr":"lax","sdc":"lax1","sid":"8dd95ecee97738","st":0,"nrtt":0,"bn":"worker-cf-lax1-6d95","mhn":null,"mrtt":null}
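For anyone collecting several of these to compare, here is a minimal Python sketch that pulls the two headers above out of raw response text and parses the JSON payload. The meaning of the short keys is not documented in this thread, so the sketch only passes them through by name; reading the suffix of fly-request-id as a region code is an assumption based on the sample above, and parse_debug_headers is a hypothetical helper name, not part of any Fly.io tooling.

```python
import json

# Sample header lines copied from the post above.
SAMPLE = (
    'fly-request-id: 01KSRAYWP2ACZ1CR29XJX3GEYR-lax\n'
    'flyio-debug: {"n":"edge-cf-lax1-2427","nr":"lax","ra":"4.53.155.98",'
    '"rf":"Verbatim","sr":"lax","sdc":"lax1","sid":"8dd95ecee97738","st":0,'
    '"nrtt":0,"bn":"worker-cf-lax1-6d95","mhn":null,"mrtt":null}'
)


def parse_debug_headers(raw_headers: str) -> dict:
    """Extract fly-request-id and the flyio-debug JSON from raw header lines."""
    headers = {}
    for line in raw_headers.splitlines():
        # Split on the first colon only, so the JSON value stays intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    debug = json.loads(headers.get("flyio-debug", "{}"))
    request_id = headers.get("fly-request-id", "")

    # Assumption: the part after the last "-" is a region code (e.g. "lax").
    suffix = request_id.rpartition("-")[2] if "-" in request_id else None

    return {
        "request_id": request_id,
        "request_id_suffix": suffix,
        # Keys are passed through as-is; their meanings are not confirmed here.
        "n": debug.get("n"),
        "nr": debug.get("nr"),
        "sr": debug.get("sr"),
        "bn": debug.get("bn"),
    }


if __name__ == "__main__":
    for key, value in parse_debug_headers(SAMPLE).items():
        print(f"{key}: {value}")
```

Comparing the "n" and "bn" values across a few runs should at least show whether requests keep landing on the same hosts.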
Got a few curl: (35) Recv failure: Connection reset by peer failures trying to run that
Not seeing the connection reset errors on that curl any more. Accessing servers from Seattle seems to be working better now
It should be okay now, sorry for the bit of back-and-forth. The gist is that we are very constrained on edge capacity on the West Coast, for a number of reasons. SJC was the first region to fall over. We attempted to reroute to LAX, but LAX turned out to be capacity-constrained as well, which we didn’t expect. You’ll see more details once we post our Infra Log write-up for this.
appreciate the transparency
Thanks Peter, it’s resolved for me now. One thing to note: I did notice this issue about a week ago for only a little bit, so the symptoms did show up earlier but took down a few apps for a couple hours today.
Looks like this is back! My sites are spotty, going on and off, and the fly.io site is not working well for me either.
Hey, apologies, we had a brief hiccup with a follow-up deploy to this incident. Are you still seeing issues in the last few minutes?
Hey! We have been experiencing issues the whole afternoon; none of our apps in sjc are reachable at the moment.