Any issues with your routing over the past half hour?

greg · March 23, 2021, 3:42pm

Hello,

I got an alert that an app wasn’t working, and indeed it wasn’t. It was timing out.

I looked at others too.

I checked your home page and that was down too, so it wasn’t my app, code etc. Has anything been changed recently?

(They have come back up again now and your site is back up too)

jerome · March 23, 2021, 3:43pm

There was a short blip in our London location. We’re investigating this.

greg · March 23, 2021, 3:51pm

Ah, that would explain it. I should have said it was LHR.

So … how does the situation work with multiple regions? Say I have an app in LHR and IAD. What happens to requests in those cases? I had assumed that if a region was down, which I understand does happen, requests would just be handled by the other region, and so would get no error. But that didn’t seem to be the case here, as I did get an error.

Or would that have kicked in at some point and taken LHR out of service, essentially, so all requests would have been routed to IAD?

Currently it is a good time of the day for me, and probably for you too, but if this happened at 5am or something I’m wondering what would have happened.

kurt · March 23, 2021, 4:05pm

There are lots of different regional problems that we respond to in different ways. This one was a network issue that caused connections from outside to fail. We’re trying to figure out if it was us (meaning, our network provider) or upstream of us (meaning, we’re too small to get someone’s attention).

When this type of thing happens, we’ll typically route around an affected region. This is not an automated process: we get paged, check to see what’s happening, and then basically pull the plug by hand. Withdrawing routes for a region is disruptive to traffic that is flowing, so we’re somewhat careful. This particular flap lasted ~3 minutes, so we didn’t get far enough into diagnosing to respond.

We didn’t have any issues connecting to LHR over our own backhaul this time. People going through Paris would still be able to connect to your VMs.

There are other things that can happen with regions. If there’s a power outage, for example, routes get withdrawn automatically (and VMs get rescheduled other places, if possible).

It actually doesn’t matter what time it is, we are always on call so the response is basically the same at 5am as late afternoon. Also, @jerome has a young child so he’s awake 24x7. Convenient.

greg · March 23, 2021, 4:13pm

I see. I wasn’t sure what kind of automated/manual process was involved.

Thanks for the detailed explanation.

kurt · March 23, 2021, 4:14pm

No problem! And just so I say this out loud, a 3 min network flap sucks and we will get better at hiding those from you.
