How confident should I be about my app not going down?

First, read this post: "Reliability: It's Not Great"

You can do some work to mitigate issues in our infrastructure. Mostly it’s just a matter of avoiding moving parts:

Switch to Machine-based apps: Most apps use something called Nomad. We have ongoing issues with Nomad (mostly because we're holding it wrong; today, because Nomad had an operational failure we haven't seen before). The new Machine-backed apps have far fewer moving parts, so you are less likely to have an app process go away because of something in our infrastructure.

This is not the simplest switch; you currently have to create new apps to get off Nomad.
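
To see why fewer moving parts matters, here's a back-of-the-envelope sketch. It assumes a request only succeeds if every component on its path is up, that components fail independently, and it uses made-up availability numbers. `series_availability` is a hypothetical helper for illustration, not part of any Fly.io tooling.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a path that needs *every* component to be up.

    Assumes independent failures, which is a simplification.
    """
    return prod(component_availabilities)


# Hypothetical per-component availabilities, not measured numbers.
many_parts = [0.999] * 5   # a path with five moving parts
fewer_parts = [0.999] * 3  # the same path with two parts removed

print(f"5 parts: {series_availability(many_parts):.4f}")
print(f"3 parts: {series_availability(fewer_parts):.4f}")
```

With these made-up numbers, five parts at 99.9% each come out around 99.50%, while three parts come out around 99.70%. Real failures are often correlated, so treat this as intuition rather than a prediction.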

Run 2+ Machines for every app you care about: Single-instance apps are especially brittle in our infrastructure. This is true on Nomad AND Machines. You should run 2+ instances; this protects you from hardware-level issues.
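
A similar sketch shows why a second instance helps. It assumes each instance fails independently (for example, because they run on different hardware) and uses a hypothetical per-instance availability. `redundant_availability` is an illustrative helper, not a real API.

```python
def redundant_availability(instance_availability, instances):
    """Probability that at least one of `instances` copies is up.

    Assumes each instance fails independently (e.g. on different hardware).
    Instances sharing a host tend to fail together, so real gains are smaller.
    """
    if instances < 1:
        raise ValueError("need at least one instance")
    all_down = (1 - instance_availability) ** instances
    return 1 - all_down


# Hypothetical: each Machine is up 99% of the time.
for n in (1, 2, 3):
    print(n, f"{redundant_availability(0.99, n):.6f}")
```

Under those assumptions, going from one instance at 99% to two takes you from 99% to 99.99%. Because correlated hardware failures break the independence assumption, the real improvement is smaller than this.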


In this kind of config, an app that’s running is most likely to stay available. Deploys may still break, since there are a bunch of moving pieces when you deploy an app. Our global proxy could also still fail, and there’s no way you can mitigate that.
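
The proxy caveat can be sketched the same way. This assumes the proxy sits in series in front of your redundant instances and that all failures are independent; the function name and the numbers are hypothetical.

```python
def app_availability(proxy_availability, instance_availability, instances):
    """Availability of an app behind a single shared proxy.

    The proxy is in series with the app's instances, so adding instances
    only raises availability toward the proxy's own availability.
    Assumes independent failures; all inputs are hypothetical.
    """
    instances_up = 1 - (1 - instance_availability) ** instances
    return proxy_availability * instances_up


proxy = 0.999  # hypothetical proxy availability
for n in (1, 2, 5, 10):
    print(n, f"{app_availability(proxy, 0.99, n):.6f}")
```

However many instances you add, the result never exceeds the proxy's own availability (0.999 here), which is the sense in which proxy failures are out of your hands.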

Paid plans won't change anything about how our system behaves, but they will get you more direct access to engineers here. That's helpful when you're trying to figure out if a problem is us or you, and occasionally helpful for identifying bugs or operational issues that our monitoring hasn't keyed us in on yet.
