Is anyone else experiencing major connectivity issues trying to connect to their app or even the main fly.io website? Even logging into this community site (which directs to fly.io, not community, to authenticate) was extremely slow (60+ seconds).
When I try to access fly.io I’m intermittently getting the “This site can’t be reached” error. In the last hour I was able to load my dashboard maybe once or twice. Ditto for my app.
Ordinarily I’d chalk this up to my personal computer being slow, my app being poorly programmed, etc. But I’m able to access the rest of the internet just fine, and the fact that the main fly.io website can’t be reached gives me pause. I don’t see anything on the status site (https://status.flyio.net/), though, and I can reach the status site just fine (I think the community and status subdomains are unaffected).
Just curious if anyone else is experiencing this or if it really is just me.
Everything went down at some point last night for me… Couldn’t reach fly, couldn’t reach my deployed apps, couldn’t get status from the cli, nothing…
Been a lot of issues lately seems like.
Things seem to be back up now, but it’s unclear how long they were down. It seems less like the machines were down and more like traffic could not be routed.
Just to confirm for everyone in this thread, this description is accurate:
We’ve been fighting a DDoS for more than 24 hours now. The Fly Platform itself has been operating normally, but in the regions the DDoS is targeting (which move around), there have been connectivity issues. Here is a thread with more information.
We appreciate the kind words, but frankly this is Fly.io’s problem to solve and we’re kind of bummed you and so many others got caught up in it. I think we have a pretty rad infra team and they’re working really hard to evolve our toolkit for handling these sorts of issues. While I’m easily excited by cool tech and we should probably tell you all sorts of details about what we’re building to handle attacks, you should feel free to tell us how we could have handled communicating this better. We have thick skin.
Hey, the incident seems to be fully resolved based on the status page but my uptime monitoring is still reporting very high latencies and timeouts in some regions.
Should I try to restart machines/redeploy, or may I provide some info for you to take a look?
With the caveat that a DDoS is an attack and so could resume at any time: yes, it looks like things have quieted down, though we still have some protective measures active just in case.
Latency can be hard to debug, so if your app is designed for it, a good old turn-it-off-then-back-on is always a good start. A new deploy could do this, or running flyctl m restart <MACHINE IDS>. If you still have problems, you can make a post in this forum or email support.
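If it helps to have numbers to compare before and after a restart (or to paste into a forum post), here is a minimal sketch of a latency probe in Python, using only the standard library. It is not a Fly.io tool. The function names `probe` and `summarize` and the URL `https://example-app.fly.dev/` are hypothetical placeholders, so substitute your own app's address. It only measures from wherever you run it, so it won't reproduce per-region results from an uptime monitor.

```python
import http.client
import statistics
import sys
import time
import urllib.request


def probe(url, attempts=5, timeout=10.0):
    """Time `attempts` GET requests to `url`.

    Returns a list of durations in seconds. A request that times out,
    fails to connect, or returns an HTTP error status is recorded as None.
    """
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                response.read()
            samples.append(time.monotonic() - start)
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            samples.append(None)
    return samples


def summarize(samples):
    """Reduce a list of durations (None = failed request) to a few numbers."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "min": min(ok) if ok else None,
        "median": statistics.median(ok) if ok else None,
        "max": max(ok) if ok else None,
    }


if __name__ == "__main__":
    # Hypothetical default URL; pass your own app's URL as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "https://example-app.fly.dev/"
    print(target, summarize(probe(target)))
```

Running it once before the restart and once after gives you a rough comparison of median latency and failure count, which is more useful to include in a support request than "it feels slow".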