One of my apps (nikola-sharder) has suddenly started seeing major delays and frequent 599s, with no changes on my end. I’ve just tried to re-deploy but the issue remains. Any ideas what might be up here?
I’m going to try removing and re-adding regions. Thanks!
Update: it looks like the requests fail when they come from my DigitalOcean servers, but not from my local network.
We don’t have anything that would generate a 599. Are you running a proxy in front of your Fly app? We had some issues with our routing layer that might be the cause of this: https://status.flyio.net/
Hey, sorry, I forgot you don’t give 599s. My Tornado health checker is reporting 599s, but I think the real issue is timeouts. The 599s are synthetic, but the timeouts are real. Updown.io reports intermittent TLS timeouts.
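For anyone chasing something similar: Tornado's HTTP client uses code 599 when no HTTP response was received at all (a timeout, for example), so the number alone doesn't say which stage failed. Below is a rough sketch, using only the Python standard library, of a probe that separates TCP connect failures from TLS handshake failures. The names `classify_failure` and `probe` and the stage labels are made up for illustration; this is not how Tornado or Updown.io do their checks.

```python
import socket
import ssl
import time
from typing import Tuple


def classify_failure(exc: BaseException) -> str:
    """Map a probe exception to a coarse label (labels are illustrative)."""
    # Check TLS errors first: ssl.SSLError is itself a subclass of OSError.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "tls-cert"
    if isinstance(exc, ssl.SSLError):
        return "tls-error"
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "dns"
    if isinstance(exc, ConnectionError):
        return "connect"
    return "other"


def probe(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[str, float]:
    """Return (result, elapsed_seconds).

    result is "ok", or "<stage>:<label>" where stage is "tcp" or "tls",
    so a handshake timeout shows up as "tls:timeout".
    """
    start = time.monotonic()
    stage = "tcp"
    try:
        # The timeout set here also applies to the TLS handshake below.
        with socket.create_connection((host, port), timeout=timeout) as raw:
            stage = "tls"
            ctx = ssl.create_default_context()
            with ctx.wrap_socket(raw, server_hostname=host):
                pass  # handshake completed
        return "ok", time.monotonic() - start
    except OSError as exc:
        return f"{stage}:{classify_failure(exc)}", time.monotonic() - start
```

Running `probe()` against the app's hostname in a loop from both the failing servers and a working network should show whether the failures are `tls:timeout` (handshake stalls) or something earlier such as `tcp:timeout`.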
There’s no proxy in front of my Fly app. By the way, is there a way to explicitly route a request to each region, to narrow in on the issue?
OK, so the problem is persisting after I tried removing and re-adding regions, redeploying, and scaling up and down. Here’s a curl that often fails with a TLS timeout. I’m going to try checking region by region next. Any ideas? Thanks!
Yes, so I just suspended it to try the old “turn it off and turn it on again” trick. I’m happy to report that may have “fixed” the issue after I resumed it.
Though I’m not yet sure and I am still spooked as this is a key service for my app.
I flushed the cached certificates for your app around the same time you restarted. Odds are, my flush fixed it. We might have had lingering cache issues for your cert from the router issues earlier today.
This morning, I noticed there were still some issues with TLS handshakes in certain regions. I did a thing (restarted our proxy) that appears to have fixed it.
We’re still on this. It’s very strange, but I think we’re making progress! You should see far fewer check failures now (but you’ll still see them intermittently).