Dynamic Machine metadata - How it's Made

If you haven’t checked out the other post yet, go do that first!

There is some (I think) kinda neat engineering that went into those new features, and I have a feeling you nerds might find it kind of interesting too. So I’m introducing a new type of fresh produce accompaniment post called “How it’s Made” where we can geek out about the technology behind the features. Not every fresh produce will have them, but we’ll see how it goes :).

We broke the Machine config mutability model (but that’s OK)

The mutability model for Machines is supposed to be: “it isn’t [mutable].” This works out pretty well. When you make a request to update a Machine by, say, changing an environment variable:

  • a new config version gets built and saved in flyd with your changed env
  • the Machine record in our state stores get pointed at the new version
  • your Machine is magically replaced with one which matches the new version spec

What this means is that generally a given Machine config version never changes once it’s created. This is great!
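
If it helps to see that flow as code, here is a minimal sketch of the "versions never change" idea. The names (`ConfigVersion`, `MachineRecord`, `update_env`) are made up for illustration and are not how flyd stores things; the only point is that an update builds a new version and repoints the record, leaving the old version untouched.

```python
import itertools
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class ConfigVersion:
    """One immutable config version (hypothetical shape)."""
    version_id: int
    env: MappingProxyType  # read-only view of the env vars


class MachineRecord:
    """Hypothetical stand-in for a Machine record in a state store."""

    def __init__(self, env):
        self._ids = itertools.count(1)
        self.versions = {}
        self.current = self._save(env)

    def _save(self, env):
        # Copy the env so later changes to the caller's dict cannot leak in.
        version = ConfigVersion(next(self._ids), MappingProxyType(dict(env)))
        self.versions[version.version_id] = version
        return version

    def update_env(self, changes):
        # Build and save a *new* version, then point the record at it.
        # A real system would also replace the VM at this step.
        new_env = {**self.current.env, **changes}
        self.current = self._save(new_env)
        return self.current
```

Calling `update_env({"LOG_LEVEL": "debug"})` returns a version with a new `version_id`, while the previous version stays in `versions` exactly as it was saved.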

Until it isn’t.

Up until now, if you wanted to update the metadata field, you’d update it like any other part of the Machine config, and the regular update process would run, VM restart included.

For dynamic metadata we obviously do not want to trigger a Machine restart. So we just… broke the Machine mutability model.

The metadata update endpoints go through a separate update process that writes straight to our state stores and modifies the current config version in place, without creating a new version.
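
To make the contrast between the two paths concrete, here is a toy sketch. Everything in it (`Machine`, `update_config`, `set_metadata`, the `restarts` counter) is hypothetical and only models the behavior described above: the regular path bumps the version and restarts, the metadata path touches the current config and does neither.

```python
class Machine:
    """Toy model of a Machine with two update paths (illustrative only)."""

    def __init__(self, config):
        self.config = dict(config)  # the current config version
        self.version = 1
        self.restarts = 0

    def update_config(self, **changes):
        # Regular path: a new config version, and the VM gets replaced.
        self.config = {**self.config, **changes}
        self.version += 1
        self.restarts += 1

    def set_metadata(self, key, value):
        # Metadata path: patch the current version in place.
        # No new version, no restart.
        metadata = dict(self.config.get("metadata", {}))
        metadata[key] = value
        self.config["metadata"] = metadata
```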

Macaroons! (how we made this secure)

Macaroons are a special type of access credential that carry within them a list of “caveats” restricting the power of the token. A macaroon starts off all-powerful, and each caveat restricts it further. For example, a token that grants user A readonly access to app B in organization O might have the caveats:

  • Must be logged in as user A
  • RWX organization O
  • R app B

The interesting thing about Macaroons is that anybody holding one can add additional caveats, further restricting what it can do. An added caveat can only reduce the scope of a token (every caveat must evaluate true, independently).
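
For the curious, here is a minimal sketch of why adding a caveat is safe for anyone to do. It uses the general macaroon construction (an HMAC chain where each caveat is signed with the previous signature as the key), not our actual token format. The helper names (`mint`, `attenuate`, `verify`) and the caveat strings are assumptions for illustration.

```python
import hashlib
import hmac


def _mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def mint(root_key: bytes, identifier: str) -> dict:
    """Issue an all-powerful token: no caveats yet."""
    return {"id": identifier, "caveats": [],
            "sig": _mac(root_key, identifier.encode())}


def attenuate(token: dict, caveat: str) -> dict:
    """Anyone holding a token can add a caveat; no root key needed."""
    return {"id": token["id"],
            "caveats": token["caveats"] + [caveat],
            "sig": _mac(token["sig"], caveat.encode())}


def verify(root_key: bytes, token: dict, check) -> bool:
    """Recompute the chain from the root key, then test every caveat."""
    sig = _mac(root_key, token["id"].encode())
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat.encode())
    if not hmac.compare_digest(sig, token["sig"]):
        return False  # a caveat was removed, reordered, or altered
    # Every caveat must evaluate true, independently.
    return all(check(caveat) for caveat in token["caveats"])
```

Because each signature is derived from the one before it, a holder can extend the chain but cannot strip a caveat off without knowing the root key. The `check` callback is where the verifier decides whether a given caveat holds for the current request.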

We’re gradually rolling out Macaroon tokens inside of Fly.io, and we already have an infrastructure for handling them.

The tokens we already issue always include a “must be logged in as user” caveat (this caveat means that our Macaroons are safe to pass around on insecure channels, because they’re not simple bearer tokens and can’t do anything by themselves).

For our use case, though, we don’t want any “Must be logged in as user” caveats. A Machine isn’t a user. On the flip side, we don’t want to be issuing Macaroons without some kind of safeguard to keep them from being pure bearer tokens.

So we came up with a notion of “service tokens”.

Where ordinary authentication tokens are granted to a user, service tokens are granted to an internal Fly.io service. Like all Macaroon tokens, they’re constrained to a specific set of actions on specific resources. The Metadata service tokens, for instance, are limited to a specific application.

For features like these, we wanted a token we could safely inject into VMs that would let them update metadata and nothing else, and that would also be totally safe if exfiltrated from the VM. To make this happen we invented two new caveats:

  1. A Machine Feature caveat, in this case one that says only the metadata feature
  2. A From Machine caveat, which requires that a token be used from inside a specific Machine

The From Machine caveat is kind of cool. It works because we own the whole stack, so we know exactly where requests are coming from. When you make a request to our API via _api.internal in a Machine, flaps (our per-host API gateway) knows which IPv6 address on the host the request is coming from. Because we assign the IPv6 addresses, we can easily map that address to a Machine ID and use it to check that the caveat passes.
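
Here is a rough sketch of what checking those two caveats could look like. The address table, the caveat encoding as `(kind, value)` pairs, and the function names are all assumptions made up for this example; the real check lives inside flaps and our Macaroon infrastructure.

```python
import ipaddress

# Hypothetical table of addresses assigned to Machines on one host.
ADDRESS_TO_MACHINE = {
    ipaddress.ip_address("fdaa::1"): "machine-a",
    ipaddress.ip_address("fdaa::2"): "machine-b",
}


def machine_for(source_ip: str):
    """Map a request's source address to a Machine ID, or None."""
    return ADDRESS_TO_MACHINE.get(ipaddress.ip_address(source_ip))


def caveats_pass(caveats, source_ip: str, feature: str) -> bool:
    """Check Machine Feature and From Machine caveats for one request."""
    machine_id = machine_for(source_ip)
    for kind, value in caveats:
        if kind == "machine_feature":
            if value != feature:
                return False
        elif kind == "from_machine":
            if value != machine_id:
                return False
        else:
            return False  # unknown caveat kinds fail closed
    return True
```

With caveats `[("machine_feature", "metadata"), ("from_machine", "machine-a")]`, a metadata request from `fdaa::1` passes, while the same token used from any other address fails. That is the property that makes an exfiltrated token useless.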

Magic proxy

As cool as our magic token is, it didn’t feel seamless enough to just put it in an environment variable for people to use. So… we built an authenticated proxy into our init process that runs as PID 1 on every Machine. That proxy opens a unix socket at /.fly/api which you can proxy requests through. Every request that gets made through it gets the auth token added.
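
If you want to talk to that socket from code, here is a minimal sketch using only the Python standard library. The socket path `/.fly/api` comes from above; the class and function names are just for this example, and the request path is left for the caller to supply since no specific route is assumed here.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects over a Unix socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # The host name is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def request_via_proxy(method, path, body=None, socket_path="/.fly/api"):
    """Send one HTTP request through the socket; return (status, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        headers = {}
        if body is not None:
            headers["Content-Type"] = "application/json"
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Note that the caller never sets an Authorization header: as described above, the proxy adds the auth token to every request that passes through it.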

Questions?

I think this stuff is pretty cool, but I’m biased :wink:

Regardless let us know if you have any questions! We’d be happy to answer them.

Very cool! So the big question is, when can I start making my own macaroons for fly?

Also another question is, when can we start creating custom macaroon policies for machines? E.g. Changing the caveats so that the macaroon can access more than just the Metadata endpoints.

tldr: as soon as we figure out a good UX. Is there something you’re building that would benefit from this? What do you need it to do?

I think customized service tokens in Machines could get pretty complicated. We’ve had a hard time coming up with ideas for what people could build if they had these.

However, I think it might make sense to let you create your own custom Macaroon we could use to generate the service token. So maybe it’s the same problem.

I’m picturing it being similar to AWS instance profiles, where a machine gets AWS credentials to access services based on the instance profile.

Though that might be overkill for what I want it for; having the ability to create our own macaroons would probably be enough. The immediate use case is being able to give a machine access to create / destroy apps in a particular org (not necessarily the same org). We currently use an API token that has full access to everything, because the alternative workaround is creating separate fly.io accounts, and that’s a pain to manage.

Hey @charsleysa, sorry this got lost in the shuffle. I know you’ve seen it, but for anyone else looking, we did ship an answer to the org-specific use case: Org Scoped Tokens.

I think the biggest challenge with letting users create their own macaroons is the UX, as Kurt was saying before. Macaroons are cool to talk about, but are really an implementation detail more than anything. That means to expose the power to users we need to expose some screen/command with knobs users can turn to describe the level of access they want. This is a famously hard problem that, if not carefully addressed, ends up with something like the IAM console. If you don’t mind, it would be awesome if you could keep sharing when you run into situations where being able to build your own specifically scoped access tokens would be useful to you. This would be super helpful because:

  • it helps us understand what the knobs of a potential future tool for this might look like
  • sometimes it might be something generally useful that we haven’t thought of but could ship within our existing token issuance framework

Adding another use case: We’d like to create readonly tokens that can

  • list all apps in the org
  • list machine resources and states
  • ideally also query metrics for each app

This information would be aggregated in an internal dashboard that only runs client-side code but could optionally pass API requests through a private CORS proxy if needed.

We use fly.io for CI preview environments, which means that we create many short-lived environments via Terraform where each environment consists of (currently) six apps.