Regional emergency power maintenance in SEA

jssjr · March 20, 2023, 8:15pm

We’ve updated our statuspage, but want to make sure everyone is aware of an upcoming maintenance window.

One of our upstream providers in the SEA region will be performing emergency maintenance today, Monday, March 20th, 2023, at 22:45 UTC. The maintenance window is 5 hours. We are adding server capacity elsewhere in the SEA region, but Machines and Apps with volumes will not be automatically migrated and will experience a service interruption during this time period.

Based on discussion with the provider we’re hopeful the communicated 5 hour window is overly cautious. It’s likely we’re only going to lose this subset of hosts for less than an hour. Regardless of 5 minutes or 5 hours, we know this sucks and we’re taking steps to make it less painful.

Right now we’re adding capacity in a separate SEA datacenter so that we can drain applications from the affected servers and bring them back on new servers. We don’t have a way to do this automatically for Machines and Apps that use volumes yet. Around 30% of the applications in this region have volumes attached and are likely to be impacted.

kurt · March 20, 2023, 9:26pm

Since this is the forum, we can speculate a little. We got a heads up about emergency power maintenance a few hours ago. There are some interesting details here: first, the generators aren’t going to kick in. Whatever they need to fix is between the generators and the actual hardware.

Given the huge disruption (this is not a tiny facility, this is wildly disruptive for everyone using it), we think they failed an IR test or some other diagnostic that made them think they’re at high risk of a fire. A five hour maintenance window is probably preferable to a fire, all things considered.

kaygee · March 20, 2023, 9:39pm

Thanks for the update. If my only Postgres volumes for a very low traffic app were in the SEA region and my Phoenix app failed over to SJC but none of my volumes can be connected to, would it help to try to restore in a different region from a snapshot or would the snapshots also be in SEA?

Realtime edit: My app stopped throwing Postgrex connection errors at 2023-03-20T21:29:44Z so maybe it is fixed?

kurt · March 20, 2023, 9:40pm

You should be able to restore from a snapshot! Those are stored externally and generally remain available if there’s a disaster in a single region.

ian11 · March 20, 2023, 11:50pm

Is there a way to use the internal network during this outage?

Also, it would be nice if fly sent out emails about outages.

DAlperin · March 21, 2023, 12:20am

Internal networking in other regions should be unaffected. SEA might experience some interruptions if your app is physically on a host in the affected datacenter.

sosedoff · March 21, 2023, 1:14am

Just saw the status flash in the dashboard: “We’re addressing an incident that affects one or more of your apps.” Is this generic messaging or does it really apply to some of the apps? AFAIK I’m only using the ORD region and was surprised to see SEA. The same status update is listed on the app page, which has nothing to do with the SEA region.

nathanw · March 21, 2023, 1:28am

Hey @sosedoff, thanks for the feedback.

You’re right, that language is a bit confusing. Incidents on our public status page (https://status.flyio.net/) are appearing in the UI now.

The specific maintenance incident you’re referring to was very region specific. If you’re in ORD only, then you’re not affected and that message is probably causing more confusion than it’s solving. We’ll iterate on some ideas to make that messaging clearer, especially for regional issues.

system · March 28, 2023, 1:29am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
