502 errors from app ... possibly related to 525?

greg · June 30, 2021, 12:27am

Hello,

This may be related to the 525 errors (Cloudflare 525 error randomly occurs - #3 by greg), if kurt or someone on the team is playing with the proxy. A few moments ago I got a notification of another app failing, a different one from the app in that thread, this time with 502 errors.

And when I look at the app status, all is well. Has one instance, running in ams. I haven’t touched it for ages, no deploys or changes.

But when I look at the logs, with flyctl logs, I see pages of errors where it looks like the request arriving in lhr, from me or a healthcheck, is failing:

2021-06-30T00:22:34.487955794Z proxy lhr [error] error.code=1002 error.message="No suitable (healthy) instance found to handle request" request.method="HEAD" request.url="/admin/signin" request.id="01F9D4NTQM2PJX6XF2SW08A786" response.status=502

That’s from a healthcheck test URL which should respond and work. I can’t connect to it either; requests time out for me in the browser too.
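For anyone wanting to pull the fields out of a proxy line like the one above, here is a minimal Python sketch. It assumes the layout visible in that line (a few leading tokens, then space-separated key=value pairs with optional double quotes); that layout is only inferred from this output, not from any documented format, and `parse_proxy_line` is a hypothetical helper name.

```python
import shlex

# Proxy error line copied from the log output above.
SAMPLE_LINE = (
    '2021-06-30T00:22:34.487955794Z proxy lhr [error] error.code=1002 '
    'error.message="No suitable (healthy) instance found to handle request" '
    'request.method="HEAD" request.url="/admin/signin" '
    'request.id="01F9D4NTQM2PJX6XF2SW08A786" response.status=502'
)


def parse_proxy_line(line):
    """Return (head, fields) for one log line.

    head   -- tokens without "=" (timestamp, source, region, level)
    fields -- dict of the key=value pairs, with surrounding quotes removed
    """
    head, fields = [], {}
    # shlex.split keeps a quoted value together as part of a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
        else:
            head.append(token)
    return head, fields


if __name__ == "__main__":
    head, fields = parse_proxy_line(SAMPLE_LINE)
    print(head)
    print(fields["error.code"], fields["response.status"], fields["request.url"])
```

The `request.id` value is probably the most useful field to quote when reporting a problem like this.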

Hmm … not good. I can try making two instances as a temporary fix? Or give it a restart? Shouldn’t need to but wondered if you were changing anything currently?

greg · June 30, 2021, 12:35am

Ah, it’s back up again now.

I didn’t touch anything. Not sure if you did anything at your end? The status still shows the same one instance, in ams(B). It doesn’t appear to have been replaced.

I had a load of errors in the log, e.g.:

2021-06-30T00:22:04.825421114Z proxy lhr [error] error.code=1002 error.message="No suitable (healthy) instance found to handle request" request.method="GET" request.url="/favicon.ico" request.id="01F9D4MXT37EHASSBNSMTM1QWC" response.status=502

And indeed it did not work.

… but now I’m seeing 200s again. And can connect to the app in browser too.

Don’t know if those request IDs shed any light at your end, whether this is connected to the proxy/525/restart, or just a coincidence?

But it’s working again now. Down for maybe 7 minutes.

And yep, got a 502 Cloudflare error. So that 525 does seem SSL related, and so not related to the app itself.

greg · June 30, 2021, 12:43am

Ah, if it helps, it does appear to be proxy-related. Scrolling through the logs, I see e.g.:

2021-06-30T00:22:01.955175043Z app[8b2d659a] ams [info] GET /healthcheck 200 16 - 0.632 ms
2021-06-30T00:22:04.825421114Z proxy lhr [error] error.code=1002 error.message="No suitable (healthy) instance found to handle request" request.method="GET" request.url="/favicon.ico" request.id="01F9D4MXT37EHASSBNSMTM1QWC" response.status=502 
2021-06-30T00:22:06.961117721Z app[8b2d659a] ams [info] GET /healthcheck 200 16 - 2.582 ms

… and that /healthcheck is what I have in fly.toml. So the app itself was working the whole time, and reporting 200 to that internal check. That would explain why your system did not auto-replace it (I assume that would happen on a healthcheck failure?).

But the outside world could not connect to it as the proxy was reporting no instance was found to serve the request. Which is a problem.
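A rough Python sketch of that comparison, in case it helps someone else separate "app is unhealthy" from "proxy can't find the app". It assumes the two line shapes seen in the excerpt above (an `app[...]` line with `METHOD path status`, and a `proxy` line with `response.status=`); `summarize` and `SAMPLE_LINES` are hypothetical names, and other log shapes will simply be skipped or counted with a `?` status.

```python
import re
from collections import Counter

# The three log lines quoted above, unchanged.
SAMPLE_LINES = [
    '2021-06-30T00:22:01.955175043Z app[8b2d659a] ams [info] GET /healthcheck 200 16 - 0.632 ms',
    '2021-06-30T00:22:04.825421114Z proxy lhr [error] error.code=1002 '
    'error.message="No suitable (healthy) instance found to handle request" '
    'request.method="GET" request.url="/favicon.ico" '
    'request.id="01F9D4MXT37EHASSBNSMTM1QWC" response.status=502',
    '2021-06-30T00:22:06.961117721Z app[8b2d659a] ams [info] GET /healthcheck 200 16 - 2.582 ms',
]

# Assumed layout: timestamp, source (app[id] or proxy), region, [level], rest.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<source>app\[[0-9a-f]+\]|proxy)\s+"
    r"(?P<region>\S+)\s+\[(?P<level>\w+)\]\s+(?P<rest>.*)$"
)


def summarize(lines):
    """Count lines per (source, region, HTTP status)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not one of the two shapes this sketch knows about
        if m["source"] == "proxy":
            source = "proxy"
            status = re.search(r"response\.status=(\d+)", m["rest"])
        else:
            source = "app"
            # App access-log lines look like: METHOD path status ...
            status = re.search(r"^[A-Z]+ \S+ (\d{3})\b", m["rest"])
        counts[(source, m["region"], status.group(1) if status else "?")] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(summarize(SAMPLE_LINES).items()):
        print(key, n)
```

If the app rows are all 200 while the proxy rows are 502 over the same time window, that points at the routing layer rather than the instance, which is the pattern seen here.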

Pesky internet!

michael · June 30, 2021, 12:49am

Is it looking better now?

greg · June 30, 2021, 12:52am

Yep, it’s still up

Like I say, I didn’t intervene. Was wondering whether to, but didn’t seem like anything I could do given the app instance was saying it was healthy. And I haven’t changed it.

It seems like the wobble was at your proxy, which connects it to the outside world? I noticed errors for lhr and sea, but there may have been others. The proxy was reporting no instance to connect to, even though there was one.

michael · June 30, 2021, 12:54am

Yeah, this was on our end. Still trying to figure out what happened, but right now it seems a network issue caused Consul to lose a leader for a few minutes, which then caused bad state in several other services, including our proxy.

greg · June 30, 2021, 12:58am

Ah. That would explain it.

I wasn’t sure whether to report the issue, as I didn’t know how long it would last.

My other thought was to add more VMs in different regions, but I guess in this case even that wouldn’t have helped, as the proxy controls access to them. So it’s all dependent on that.
