Two or more machines, all placed on a single (failed) host?

Following on from @Elder’s comment here:

Although our app had a spare instance (just like Fly recommends), it seems Fly placed both instances on the same physical server, causing them both to go down and become unrecoverable when that physical server failed.

A couple of questions for Fly:

  1. Are you aware of situations, now or previously (earlier flyctl versions, Fly’s v1 to v2 migrations, etc.), in which an app with two or more machines would (or maybe still could) end up being provisioned on the same host?

  2. Are you able to find out how many apps with multiple machines are currently provisioned on a single host (and possibly alert the app owners)?
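
To make question 2 concrete, here is a rough sketch of the kind of check I have in mind. It assumes Fly has some internal inventory of (app, machine, host) placements; the function name, the tuple layout, and the sample data are all hypothetical and not taken from any Fly API.

```python
from collections import defaultdict


def apps_on_single_host(placements):
    """Return apps that have two or more machines, all on one host.

    `placements` is an iterable of (app, machine_id, host_id) tuples.
    This layout is an assumption made for illustration only.
    """
    hosts_by_app = defaultdict(set)
    machines_by_app = defaultdict(set)
    for app, machine_id, host_id in placements:
        hosts_by_app[app].add(host_id)
        machines_by_app[app].add(machine_id)
    # Flag an app only if it has redundancy on paper (2+ machines)
    # but every machine shares the same host.
    return sorted(
        app
        for app, machines in machines_by_app.items()
        if len(machines) >= 2 and len(hosts_by_app[app]) == 1
    )


# Made-up sample data, not real apps or hosts.
sample = [
    ("app-a", "m1", "host-1"),
    ("app-a", "m2", "host-1"),
    ("app-b", "m3", "host-1"),
    ("app-b", "m4", "host-2"),
    ("app-c", "m5", "host-3"),
]
print(apps_on_single_host(sample))
```

With this sample, only app-a would be flagged: app-b is spread across two hosts, and app-c has a single machine, so it was never redundant in the first place.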


@Whistler

They are aware.

From the docs:

Prevent downtime when there’s a single host issue

Fly.io strongly recommends running at least two Machines per app in your primary region and we have features that can help make app availability and resiliency more affordable. When possible, we place volumes on different hosts in the primary region. If one host goes down, then your app will still have a Machine and volume on a healthy host.

As you can see, it’s only “when possible”, and it only refers to instances with volumes.

When you clone a volume, you get an option, --volume-requires-unique-zone (on by default), to ensure the new one is placed on separate hardware. There is no such option for instances without volumes, which is what I had.

I assume that to improve reliability, one needs to attach a volume even if it’s not needed, or run another instance in a different region, which is not so great.
