App suddenly getting 599s

davidhodge · March 20, 2021, 7:01am

@jerome sorry for my delay here. I moved on for a bit when the issue went away! So as far as I know there shouldn’t be a service I’m using that would be taking a long time to do handshakes or that would be aborting them.

1. The majority of traffic will come from a hand-rolled health checker with python/tornado. If the above is happening this seems the most likely culprit.
2. updown.io
3. The actual app. While this is the primary “use-case” and the service itself is critical, the actual traffic from real users is quite low.

If you have any ideas on how to investigate, especially #1, I’d be curious to take a look.

jerome · March 21, 2021, 7:25pm

Is it possible to see the source for the hand-rolled health checker? Or perhaps describe what it does specifically? If it keeps a connection open without completing the TLS handshake, that might’ve caused this issue.
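For comparison, here is a minimal sketch of a check that avoids that pattern: it bounds the connect plus TLS handshake with a timeout and always closes the connection afterwards. It uses the standard library's asyncio rather than tornado, and the name `check_tls` and its parameters are made up for illustration, so treat it as a rough shape rather than what the real checker does.

```python
import asyncio
import ssl
from typing import Optional


async def check_tls(host: str, port: int = 443, timeout: float = 5.0,
                    context: Optional[ssl.SSLContext] = None) -> bool:
    """Return True if a TCP connect plus full TLS handshake finishes in time."""
    ctx = context or ssl.create_default_context()
    try:
        # open_connection only returns once the TLS handshake has completed.
        # wait_for cancels the attempt if it stalls, which tears down the
        # half-open connection instead of leaving it hanging.
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port, ssl=ctx, server_hostname=host),
            timeout=timeout,
        )
    except (asyncio.TimeoutError, OSError):
        # Covers timeouts, refused connections, and TLS errors
        # (ssl.SSLError is a subclass of OSError).
        return False
    # Always close, so no idle connection is left behind on the server.
    writer.close()
    try:
        await asyncio.wait_for(writer.wait_closed(), timeout=timeout)
    except (asyncio.TimeoutError, OSError):
        pass
    return True
```

Calling it would look something like `asyncio.run(check_tls("example.com"))`, where the hostname is a placeholder.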

We’ve used updown.io in the past and it’s never caused these issues so I’m assuming this is fine.

I think the way we were spawning the asynchronous task was at fault here. I learned there was no way to know if the operation had timed out (despite having logic to that effect). We’ve refactored this whole bit to make it detectable and it should now be fine. We have further optimizations to make concerning TLS handshakes which should also help.
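To illustrate what “detectable” means here, this is a rough Python sketch, not our actual code: the async operation is wrapped so that a timeout comes back as an explicit result rather than disappearing. The names `Outcome` and `run_with_deadline` are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Optional


@dataclass
class Outcome:
    timed_out: bool
    value: Optional[Any] = None


async def run_with_deadline(op: Awaitable[Any], seconds: float) -> Outcome:
    """Await op, reporting a timeout explicitly instead of losing it."""
    try:
        value = await asyncio.wait_for(op, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled op by the time we get here.
        return Outcome(timed_out=True)
    return Outcome(timed_out=False, value=value)


async def demo() -> tuple:
    # One operation that finishes well inside its deadline, one that does not.
    fast = await run_with_deadline(asyncio.sleep(0.01, result="ok"), 1.0)
    slow = await run_with_deadline(asyncio.sleep(1.0, result="late"), 0.05)
    return fast, slow
```

The caller can then branch on `timed_out` (log it, count it, return an error) instead of guessing whether the task ever finished.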

davidhodge · March 21, 2021, 8:52pm

Sure, I’ll make a slimmed down version and send it over. What’s the best email to use for you?

jerome · March 21, 2021, 10:33pm

support @ our domain works. You might get an automated response, but we’ll see it!
