Host statuses for Machines and volumes added to the API and flyctl

If you’ve deployed an app, used volumes, or read the community forum, then you probably know that we strongly recommend running multiple Machines in your apps and using multiple volumes when possible. That’s because each Machine and volume is assigned to a specific host server in our fleet, and if a host has a hardware problem or is otherwise unavailable, then the resources that it hosts will also be unavailable. It also means that those resources can’t be updated or destroyed, and some information about them will be missing in the Machines API.

Building things this way makes a globally distributed cloud with fast-launching VMs possible, but in order for the arrangement to work well, you (or flyctl, or whatever orchestration system you’re using) need to know which Machines and volumes are unavailable. Until now, there hasn’t been a good way to get this information.

To improve this, we’ve just added a host_status field to Machines and volumes returned from the Machines API. While we may add some additional values in the future, the following three statuses are live right now:

  • ok: The Machines API contacted the resource’s host without issue. The information returned is complete. Updating or destroying the resource should work normally.
  • unreachable: The Machines API couldn’t contact the resource’s host. The information returned is incomplete. Updating or destroying the resource will probably fail.
  • unknown: Something went wrong, so we don’t know the host’s status. (Hopefully you will never see this!)
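
If you’re writing your own orchestration, the three statuses above map fairly naturally onto a decision about whether to touch a resource. Here’s a minimal sketch of that idea in Python. The function name plan_action and the labels it returns are made up for illustration and aren’t part of the Machines API or flyctl. Treating a missing or unrecognized host_status the same as unknown is also our assumption here, chosen so that any values added later are handled conservatively.

```python
def plan_action(resource: dict) -> str:
    """Decide what to do with a Machine or volume based on its host_status.

    `resource` is assumed to be one Machine or volume object as decoded
    from a Machines API JSON response. The returned labels are hypothetical.
    """
    # Assumption: a missing field is treated like "unknown".
    status = resource.get("host_status", "unknown")

    if status == "ok":
        # The host was contacted; updates and destroys should work normally.
        return "proceed"
    if status == "unreachable":
        # The host couldn't be contacted; updates and destroys will probably fail.
        return "skip"
    # "unknown", or any value this sketch doesn't recognize.
    return "retry-later"


if __name__ == "__main__":
    print(plan_action({"id": "vol_rd9kkvvz39ox01wn", "host_status": "unreachable"}))
```

An orchestrator could call something like this before each update or destroy, and route anything that isn’t "proceed" to a retry queue or an alert instead of failing partway through a deploy.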

Additionally, as of flyctl v0.2.43, the fly machines list and fly volumes list commands will mark Machines and volumes on unreachable hosts for you.

Here’s a quick view of the JSON returned by the volume-list endpoint with one (fictional) volume whose host is down:

    "id": "vol_rd9kkvvz39ox01wn",
    "name": "test_vol",
    "state": "created",
    "size_gb": 1,
    "region": "ord",
    "zone": "ed69",
    "host_status": "unreachable",

And here’s how it looks in fly volumes list:

vol_rd9kkvvz39ox01wn*	created	test_vol	1GB 	ord 	ed69 	false    	           	22 hours ago	

* These volumes' hosts could not be reached.
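
If you consume the API directly instead of going through flyctl, you can do similar marking yourself. The sketch below is only an illustration: it assumes the volume-list response is a JSON array of volume objects, it reuses the fictional volume from above, and it adds a second invented volume (vol_example0000000ok) for contrast. The helper name mark_unreachable is hypothetical, and this is not how flyctl implements its output.

```python
import json

# Made-up response body. The first volume is the fictional one shown above;
# the second is invented so the example has a healthy volume to compare with.
SAMPLE_RESPONSE = """
[
  {"id": "vol_rd9kkvvz39ox01wn", "name": "test_vol", "state": "created",
   "size_gb": 1, "region": "ord", "zone": "ed69", "host_status": "unreachable"},
  {"id": "vol_example0000000ok", "name": "other_vol", "state": "created",
   "size_gb": 1, "region": "ord", "zone": "ed69", "host_status": "ok"}
]
"""


def mark_unreachable(payload: str) -> list[str]:
    """Return volume IDs, appending '*' to those on unreachable hosts."""
    volumes = json.loads(payload)  # assumption: top level is a JSON array
    marked = []
    for vol in volumes:
        suffix = "*" if vol.get("host_status") == "unreachable" else ""
        marked.append(f"{vol['id']}{suffix}")
    return marked


if __name__ == "__main__":
    for line in mark_unreachable(SAMPLE_RESPONSE):
        print(line)
```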

Let us know what you think or if you have any questions!