Assuming Cloud Roles on Fly.io Machines

So you’ve got an app serving AI cat images generated from the weather forecast, running as an ECS task in us-east-1, but customers in Australia and Europe have been complaining that the latency is too high. You quickly realise that fixing this means replicating your ECS tasks and ECR images into ap-southeast-2 and eu-central-1, and setting up a load balancer to direct traffic between them. A daunting task.

You’ve been using Fly.io to host some of your other apps; however, this one needs to read the weather forecast and training data from an S3 bucket. Accessing the bucket from outside AWS would mean creating an AWS user and setting that user’s access key ID and secret as Fly secrets in your app. Your security team has caught wind of your plan to create an AWS user and are asking questions about rotating credentials, sharing secrets and observability on Fly.io.

Suddenly using Fly.io is no longer as easy as running fly launch, and it might actually be easier to set up your app in multiple AWS regions. However, you’ve just heard that Fly.io has launched an OIDC provider for machines with a slick AWS integration. You can now access AWS services from your Fly.io machines just by configuring Fly.io as an identity provider in your AWS account and setting the AWS_ROLE_ARN environment variable.

Note: You can also authenticate against other services that accept OIDC tokens, like Azure, Google Cloud Platform and HashiCorp Vault; we’re just not doing any magic for them yet.

How to Read from an S3 Bucket

Reading objects from S3 in your Fly.io machines is as easy as:

  1. Create a Fly.io app by running fly launch in the same directory as your Dockerfile.
  2. Find the slug for your organisation with fly orgs list.
  3. Create an OpenID Connect identity provider in your AWS account with these settings (this only has to be done once per org):
     Provider URL: https://oidc.fly.io/<org-slug>
     Audience: sts.amazonaws.com
  4. Create an IAM role. When choosing the trusted entity, select Web Identity -> Identity Provider -> oidc.fly.io, then attach the AmazonS3ReadOnlyAccess policy.
  5. Set AWS_ROLE_ARN as an environment variable in your fly.toml, and the AWS SDK will know how to do the rest:
     [env]
       AWS_ROLE_ARN = "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
  6. Deploy your app: fly deploy --region ams,iad,syd
  7. Clone it into your other 2 regions: fly clone -r ams && fly clone -r syd

NOTE: If you’re setting this up for your personal org, the slug is a little harder to find, but it appears in the URL when you click “Apps” on the dashboard.
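Since everything hinges on AWS_ROLE_ARN, it can help to fail fast at startup if the value is missing or still contains the placeholders from the [env] block above. The Go sketch below assumes the standard arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME> shape in the "aws" partition; roleARN and parseRoleARN are hypothetical names, not part of any SDK, and the check never contacts AWS.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// roleARN holds the parts of arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>.
type roleARN struct {
	AccountID string
	RoleName  string
}

// parseRoleARN is a hypothetical startup check. It only understands the
// standard "aws" partition and makes no calls to AWS.
func parseRoleARN(s string) (roleARN, error) {
	rest, ok := strings.CutPrefix(s, "arn:aws:iam::")
	if !ok {
		return roleARN{}, fmt.Errorf("not an IAM ARN: %q", s)
	}
	// rest looks like "<ACCOUNT_ID>:role/<ROLE_NAME>".
	account, resource, ok := strings.Cut(rest, ":")
	if !ok || len(account) != 12 || strings.Trim(account, "0123456789") != "" {
		return roleARN{}, fmt.Errorf("account ID must be 12 digits: %q", s)
	}
	name, ok := strings.CutPrefix(resource, "role/")
	if !ok || name == "" {
		return roleARN{}, fmt.Errorf("not a role ARN: %q", s)
	}
	return roleARN{AccountID: account, RoleName: name}, nil
}

func main() {
	arn, err := parseRoleARN(os.Getenv("AWS_ROLE_ARN"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "AWS_ROLE_ARN:", err)
		os.Exit(1)
	}
	fmt.Printf("will assume role %q in account %s\n", arn.RoleName, arn.AccountID)
}
```

Running this as a release command or at the top of main surfaces a typo in fly.toml before the SDK fails with a less obvious credentials error.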

How does this work?

How does this black magic work, you ask? We’ve just rolled out a new endpoint in our Machines API that mints OpenID Connect (OIDC) tokens for machines. You can grab one right now through the Unix socket at /.fly/api by running
curl --unix-socket /.fly/api -X POST "http://localhost/v1/tokens/oidc" in any of your machines, and you’ll get back a JWT with the following claims:

{
    "app_id": "3179105",
    "app_name": "app-name",
    "aud": "https://fly.io/<org-slug>",
    "exp": 1712615659,
    "iat": 1712615059,
    "image": "image:latest",
    "image_digest": "sha256:48933d82921c947df7858e84b841046bd7352b39b4705a00cf458704b73e46bf",
    "iss": "https://oidc.fly.io/<org-slug>",
    "jti": "f2cfdee1-becc-4d32-bcb9-af07a2b041a5",
    "machine_id": "e286534eb706e8",
    "machine_name": "machine-name",
    "machine_version": "01HTZWZD0V5FYBCE1PGYKSD3YX",
    "nbf": 1712615059,
    "org_id": "288875",
    "org_name": "org-slug",
    "region": "sea",
    "sub": "org-slug:app-name:machine-name"
}

This token can then be used to authenticate against any third party that supports OIDC tokens. Third parties validate it using the OpenID configuration at https://oidc.fly.io/<org-slug>/.well-known/openid-configuration, which points to the public keys used to verify the token’s signature. The keen-eyed among you might have noticed our decision to include the org name in the issuer: this is a hardening feature that other platforms offer at a premium, and we’ve included it by default.

We’ve also baked some extra magic in for accessing AWS services. Nothing too fancy, though: when AWS_ROLE_ARN is set, we write the token to a file at /.fly/oidc_token every 9 minutes to keep it fresh, and set the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_SESSION_NAME environment variables. The AWS SDK handles the rest!

Why Use OIDC?

You might also have noticed that GitHub Actions provides a similar feature. To summarise their documentation:

Using OIDC tokens allows for good security practices like:

  • No Long-Lived Secrets: You don’t need to store your cloud credentials as long-lived secrets. Instead, you can configure an OIDC trust that allows your machines to request short-lived access tokens.
  • Authentication (AuthN) and Authorization (AuthZ) management: You have more granular control over how machines can use credentials, using your cloud provider’s AuthN and AuthZ tooling.
  • Rotating Credentials: Our tokens are only valid for 10 minutes before they expire, ensuring these and your cloud credentials are rotated frequently.

Customizing token claims

Currently you can customize the audience (aud) claim of your tokens by providing a value for aud in the body of the POST request. This lets you specify the intended recipient of the token. Here’s what that looks like with curl:

curl --unix-socket /.fly/api -X POST "http://localhost/v1/tokens/oidc" --data '{"aud":"sts.amazonaws.com"}'

The sub claim follows the format org-slug:app-name:machine-name and isn’t customizable, but if you’ve got a particular use case, or there’s another claim you’d like to see added, we’d love to hear about it!
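If you run your own service that accepts these tokens and authorizes by app, the fixed sub format makes the pieces easy to recover. A minimal Go sketch, assuming none of the three parts contains a colon; subject and parseSub are hypothetical names, and the input must come from a token whose signature you have already verified.

```go
package main

import (
	"fmt"
	"strings"
)

// subject holds the three parts of the sub claim.
type subject struct {
	Org, App, Machine string
}

// parseSub splits a sub claim of the form org-slug:app-name:machine-name.
// It assumes none of the parts contains a colon.
func parseSub(sub string) (subject, error) {
	parts := strings.Split(sub, ":")
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return subject{}, fmt.Errorf("unexpected sub format: %q", sub)
	}
	return subject{Org: parts[0], App: parts[1], Machine: parts[2]}, nil
}

func main() {
	s, err := parseSub("org-slug:app-name:machine-name")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("org=%s app=%s machine=%s\n", s.Org, s.App, s.Machine)
}
```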

Creating AWS Trust Policies

You can use the trust policy of an IAM role to restrict which machines in your organisation can assume that role in AWS. For example, the following policy only allows machines in the app foo-bar within your org to assume the role it’s attached to:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::012345678910:oidc-provider/oidc.fly.io/<org-slug>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.fly.io/<org-slug>:aud": "sts.amazonaws.com"
        },
        "StringLike": {
          "oidc.fly.io/<org-slug>:sub": "<org-slug>:foo-bar:*"
        }
      }
    }
  ]
}

If you’d like to see something similar for other third parties like GCP or Azure, I’d love to hear what your use cases look like!

I know it’s bad form for me to comment on our own stuff here, but I had nothing to do with any of this so I feel comfortable saying out loud: this is so fucking cool.

I agree!

So my question is gonna be: can we use this to access Fly.io APIs? It would be pretty cool to no longer need to manually manage API keys for Fly.

What’s the use case you’re thinking about here? Being able to run flyctl commands from within a machine without needing to pass in an API key through fly secrets?

That, and being able to use the Machines API. I’m also thinking that it’ll open up support for using OIDC in GitHub Actions for deployment.

Why not leverage IAM Roles Anywhere?

Can we use this instead of a pgbouncer ec2 instance to connect to RDS in a private network?

We’ve highlighted the AWS use case here, but using OIDC tokens allows Fly.io machines to authenticate to other cloud providers as well. If you’ve got a use case for IAM Roles Anywhere that this doesn’t solve, I’d love to hear it.

I’m not familiar with PgBouncer, but this only helps with assuming IAM roles; it doesn’t help with access to private networks.

Being able to do this the other way around, swapping an OIDC token for a macaroon, is something we’ve talked about, but there are other features we’re focusing on for now.

That’s awesome, and I’m already using it! Kudos to everyone involved.

I think my request is related to @charsleysa’s: I’d like to run commands against Fly.io from GitHub Actions, like I already do for AWS with the aws-actions/configure-aws-credentials action.

Is there a public roadmap or something like that for these kinds of features?

Love to hear it!

Unfortunately we don’t have a public roadmap. Being able to swap OIDC tokens for macaroons is definitely on our radar; we’ve just got other macaroon features we’d like to roll out first!

@moss/@thomas this is awesome and I’m starting to integrate it.

I do have one major question: does the value of AWS_ROLE_ARN absolutely need to be baked into fly.toml?

The reason I ask is that we use the same fly.toml file for multiple environments (staging/prod) of the app, and so we have a separate role for each environment too.

We target each app in our pipelines by appending -a staging-app to the fly deploy command, and this works great for our use case.

I’ve added [env] AWS_ROLE_ARN = "" to fly.toml and am hoping that passing -e AWS_ROLE_ARN=${{ vars.AWS_ROLE_ARN }} on the command line will suffice 🙌

About to commit and test on staging, wish me luck 😅, but if you could confirm as soon as possible that would be great.

Thank you,

This does indeed appear to be working; at least, I can see /.fly/oidc_token when ssh’ing into the running container.

But I’m getting operation error SQS: ReceiveMessage, get identity: get credentials: failed to refresh cached credentials, failed to retrieve jwt from provide source, unable to read file at /.fly/oidc_token: open /.fly/oidc_token: no such file or directory

which seems to be a permission issue.

Really weird, or I’m missing something basic! Permissions, hey 😅

My app’s process, a Go executable using aws-sdk-go-v2, is running as root:

329 root 0:00 /app/bin serve

The executable is owned by root:

-rwxr-xr-x 1 root root 36233888 Sep 6 17:21 bin

And the directory:

drwxr-xr-x 1 root root 4096 Sep 6 17:36 app

And so is the token:

-rw-r--r-- 1 root root 1538 Sep 6 17:54 oidc_token

Any suggestions/thoughts?

Thank you

Got it! It takes a tick or two for the file to actually become available:

2024-09-06T21:43:56.839 app[XXX] lhr [info] [#1] aws fly token error: stat /.fly/oidc_token: no such file or directory

2024-09-06T21:43:56.847 runner[XXX] lhr [info] Machine created and started in 5.175s

2024-09-06T21:43:58.241 app[XXX] lhr [info] [#2] fly token: &os.fileStat

It would be nice if the steps listed in the Fly Blog post “AWS without Access Keys”:

  1. init detects AWS_ROLE_ARN is set as an environment variable.
  2. init sends a request to /v1/tokens/oidc via /.api/proxy.
  3. init writes the response to /.fly/oidc_token.
  4. init sets AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_SESSION_NAME.
  5. The entrypoint boots, and (say) runs aws s3 get-object.

could give some indication of that 😅

I have a prepared minimum reproducible app if it would help.

Maybe I’m missing something, but I cannot seem to create an OpenID Connect provider in AWS without a “thumbprint list”. Is there supposed to be one of those documented here, and I just skimmed past it somehow?

Screenshot? I don’t recall needing anything other than the fly provider URL.

Getting error: “2024-10-14T20:16:38.611 app[328714d1c3d368] sea [info] ERROR: Error: 2 UNKNOWN: Getting metadata from plugin failed with error: The file at /.fly/api does not exist, or it is not a file.”

Even though I can access via SSH using:

curl --unix-socket /.fly/api -X POST "http://localhost/v1/tokens/oidc" --data '{"aud":"https://sts.googleapis.com"}'

Am I doing something wrong?

I am using linked organisations. Should I set up the provider in IAM for the top-level organisation (the billing org, which has no apps), or will I need to set it up for each of the organisations whose apps need access to AWS?