There seems to be an outage with TLS termination

Some (but not all) Fly apps return SSL_ERROR_ACCESS_DENIED_ALERT when I try to access them (from Sydney).

This also seems to be affecting the CLI.

I am seeing the same from here in London, hitting an instance in FRA, and I’m also unable to log in to the Fly web panel.

It’s related to TLS termination. Plain HTTP traffic is going through okay.

We’re on it. We’re experiencing a serious network disruption that’s causing issues connecting to our certificate store and other problems. We’re attempting workarounds and will keep our status page updated.

Checking logs on our (crashed) applications returns:

Error Post "https://api.fly.io/graphql": unexpected EOF 

I'm not able to log in from the CLI either. Our production site is down!

It does seem like a major network outage; a fair few non-Fly services seem to be down too.

OVH is having a global outage. Our log servers (and Vault, and Nomad) live on OVH.

Absolute genius move from OVH, putting their status page on their own network.

That would do it. I can’t even reach OVH’s status page at the moment. http://status.ovh.com/

Thanks for the update.

Edit: Reddit has noticed the OVH outage too: "Full OVH Network Outage - Can't even get to status page" (r/ovh)

That’s “industry standard” though. Facebook had the same problem a week ago with their status page being on-net.

The outage is spanning most of their datacenters, seems like. Their US datacenters don’t seem to be having issues.

Our Vault and Nomad servers are in multiple OVH datacenters in Montreal. We’ve been working on spreading those across regions (we should!), but neither of those is built for that, so it’s an interesting problem. It’s a bit frustrating for 99% of our infrastructure to be up and running and have Vault / Nomad as central points of failure like this.

We use Consul heavily too and just launched servers in other regions a few days ago.

Their US stuff seems to be out for me as well. Weird.

Oh you’re right, it’s their VIN region that’s still working. But maintenance there seemed to break everything else: https://twitter.com/olesovhcom/status/1448196879020433409

We’re seeing connectivity return. It’s going to take us a bit to clean up once networking is restored, but not too long!

Looks like we’re at least partially back in Sydney

Every Fly service is down for us. Our production servers are down and we can't even access fly.io.

How long until everything is restored? @kurt

We’re working to get things restored as fast as we can.

Thank you and the team for your efforts :pray:

Vault is taking longer to recover than we’d hoped. We use Vault for certificates and application secrets, so apps can’t boot until it recovers. This is affecting our own apps as well.

Vault is back, apps should be recovering. Our API is still down but we’re working on that as well.
