Connecting your Fly Apps to your Tailscale tailnet

Here at Fly.io, we’re pretty big fans of Tailscale for a lot of reasons, and we especially like the simple, identity-aware network access control. We’re also pretty big fans of Fly.io, and we have a whole stable of internal applications and services deployed on our platform that we need to use every day.

We found ourselves with a constant need to connect these deployments to our Tailnet, and each time we solved it a slightly different way - leading to duplicated effort and inconsistency.

So we came up with a solution that is stateless, easy to understand and maintain, and gives us the reliability and security we need to control access to our apps. We wanted to share it with you.

Our approach

The pattern we decided on was to utilise Tailscale subnet routing to expose an organisation’s 6PN to the Tailnet, and then leverage Tailscale’s powerful ACL system to manage access to it.

This works well enough if you have a small deployment within one organisation. As you may know, all apps within an organisation are network-accessible via its 6PN, and it’s simple to share that connectivity on to the Tailnet.

But because we have a number of organisations, we took it one step further and utilised a low-key yet amazing feature of the Fly.io platform - Flycast! This allows a Fly app to securely provide internal-only, load-balanced, region-aware connectivity to a service via your Fly.io 6PN, even across organisations. As a bonus, thanks to the Tailnet and Fly.io’s 6PN, your traffic is encrypted in transit.

Here’s a fun little diagram to show how we’re putting this together:

First, the one-off setup for the infrastructure we need!

Tailscale: Define a new Tag

Define a tag in your Tailnet policy file:

{
  // (other tailnet policy file entries here)

  "tagOwners": {
    "tag:fly-router": [],
  }
}

Tailscale: Create an OAuth client

Follow the instructions for creating an OAuth client in Tailscale.

Allow the OAuth client Read and Write permissions for the Devices scope, and provide the tag:fly-router tag as created in the previous step.

This allows the OAuth client to manage devices with the fly-router tag, including creating auth keys to join devices to the Tailnet under that tag.

You can use an auth key instead, but it will be tied to a user and will expire after 90 days, after which you will have to renew it and re-deploy.

Fly.io: Create a new organisation

$ fly orgs create tailscale-router
New organizations start on the $5/mo Hobby Plan.

Your organization tailscale-router (tailscale-router) was created successfully. Visit https://fly.io/dashboard/tailscale-router/billing to add a credit card and enable deployment.

Visit the URL and add your credit card if required.

Fly.io: Create a new application

$ fly apps create -o tailscale-router
? Choose an app name (leave blank to generate one): 
New app created: twilight-moon-8960

Fly.io: Set the OAuth key as a secret

$ fly secrets set -a twilight-moon-8960 --stage TS_AUTHKEY="tskey-client-<id><key>"

Fly.io: Deploy the Tailscale image

Use this simple fly.toml to deploy the official Tailscale container image:

app = "twilight-moon-8960"

[build]
    image = "tailscale/tailscale"

[processes]
    app = "sh -c \"export PATH=$PATH:/usr/local/bin; containerboot\""

[env]
    # TS_AUTHKEY set via secret
    TS_HOSTNAME = "fly-router"
    TS_EXTRA_ARGS = "--advertise-tags tag:fly-router"

    # Leave this commented for now:
    # TS_ROUTES = ""

Run fly deploy. You should see the host fly-router appear in your Tailscale device list. If you run into trouble, start with fly logs.

Fly.io: Figure out your 6PN subnet

Run fly ips private and look at the IP that is returned. This address is in your 6PN, which is what we want to route via Tailscale. For example:

ID            	REGION	IP                              
9829fc10a47998	syd   	fdaa:3:d3ad:beef:192:2ef7:abcf:2

From here, you can decide to route your entire 6PN (e.g. if you’re deploying this in to an existing organisation with apps you want to provide access to), or the subnet within the 6PN where services are Flycast in to:

  • Full 6PN: fdaa:3:d3ad::/48
  • Flycast only: fdaa:3:d3ad:0:1::/80

Update your fly.toml to set the TS_ROUTES environment variable to tell tailscaled which subnet it is responsible for, and then re-deploy the application.

From here on, we’ll assume you chose to expose the Flycast subnet.
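If you would rather not work out the prefixes by hand, here is a minimal Python sketch of the arithmetic. It assumes the layout shown in the example above: the first 48 bits of the address identify the 6PN, and Flycast addresses sit in the 0:1::/80 block beneath it. The function names are made up for illustration and are not part of any Fly.io or Tailscale tooling.

```python
import ipaddress

# Assumption: Flycast addresses live in <6PN /48>:0:1::/80, as in the example
# above. Adding 1 << 48 sets the fifth 16-bit group of the address to 1.
FLYCAST_OFFSET = 1 << 48


def sixpn_prefix(private_ip: str) -> ipaddress.IPv6Network:
    """Return the /48 network that contains a 6PN address."""
    return ipaddress.ip_network(f"{private_ip}/48", strict=False)


def flycast_prefix(private_ip: str) -> ipaddress.IPv6Network:
    """Return the /80 Flycast block inside the same 6PN (assumed layout)."""
    base = int(sixpn_prefix(private_ip).network_address)
    return ipaddress.IPv6Network((base + FLYCAST_OFFSET, 80))


if __name__ == "__main__":
    ip = "fdaa:3:d3ad:beef:192:2ef7:abcf:2"  # address from `fly ips private`
    print(f"Full 6PN:     {sixpn_prefix(ip)}")
    print(f"Flycast only: {flycast_prefix(ip)}")
    # Line to place in the [env] section of fly.toml:
    print(f'TS_ROUTES = "{flycast_prefix(ip)}"')
```

Whichever prefix you pick is the value that goes into TS_ROUTES in the [env] section of fly.toml.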

Optional: Tailscale: Automatically approve routing

Normally, when a Tailscale device boots and wants to advertise a route, you have to approve it via the console. You can optionally configure Tailscale to allow this device to advertise its route without approval:

{
  // (other tailnet policy file entries here)

  "autoApprovers": {
    "routes": {
      "fdaa:3:d3ad:0:1::/80": ["tag:fly-router"],
    },
  }
}

That’s all the setup done! Now, for each app you want to access via Tailscale, you can perform the following steps from the app:

  1. For the app you wish to expose to your tailnet, use fly ips to release any public addresses your app may have assigned.
  2. Make sure your app has a service block that defines its services.
  3. Flycast your app in to your tailscale-router organisation and note the address:
    • fly ips allocate-v6 --private --org tailscale-router
    • nb: You currently need to be a member of both organisations to accomplish this

You’ve now Flycast’d a Fly.io app in to a subnet that is routable from Tailscale. It won’t work yet, but that’s all the Fly config required. From there, the following Tailscale configuration will need to be made by a Tailscale network admin:

  • Define an entry for the application in the hosts section of the ACL file pointing to the Flycast’d 6PN address.
  • Define an entry in the acls section of the ACL file to govern who may access the application.

Something like this:

{
  // (other tailnet policy file entries here)

  "hosts": {
    "fly-app-1": "fdaa:3:d3ad:0:1::2", // Flycast address from step 3
  },

  "acls": [
    // (other ACL entries here)
    {
      "src":    ["group:application-users"],
      "dst":    ["fly-app-1:443"],
      "action": "accept",
    },
  ],
}
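Since these two entries have to be repeated for every app, it may be handy to generate them. The following is a rough Python sketch, not an official tool: policy_fragment and its parameters are hypothetical names, and it only builds the hosts and acls entries described above after checking that the Flycast address actually falls inside the route the subnet router advertises.

```python
import ipaddress
import json


def policy_fragment(host_alias, flycast_addr, route, group, port=443):
    """Build the hosts/acls entries for one Flycast'd app.

    Raises ValueError if the address is outside the advertised route,
    because the subnet router would then not carry traffic for it.
    """
    addr = ipaddress.ip_address(flycast_addr)
    if addr not in ipaddress.ip_network(route):
        raise ValueError(f"{flycast_addr} is outside the advertised route {route}")
    return {
        "hosts": {host_alias: str(addr)},
        "acls": [
            {
                "src": [group],
                "dst": [f"{host_alias}:{port}"],
                "action": "accept",
            },
        ],
    }


if __name__ == "__main__":
    fragment = policy_fragment(
        host_alias="fly-app-1",
        flycast_addr="fdaa:3:d3ad:0:1::2",  # Flycast address from step 3
        route="fdaa:3:d3ad:0:1::/80",       # value of TS_ROUTES
        group="group:application-users",
    )
    print(json.dumps(fragment, indent=2))
```

The output is plain JSON, so it still needs to be merged by hand into the existing hosts and acls sections of your policy file.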

You should now be able to access your application from a Tailscale-connected device by using the address you were given in step 3. Set yourself a nice DNS record and you are good to go!

Extra credit

Failover and Load balancing

We currently have one Tailscale subnet router providing access to all of our apps. Fly machines are transient, so it would be nice to have some redundancy here. Easy enough!

fly scale count 2

You will now see two devices in Tailscale, fly-router and fly-router-1. Depending on your plan, these instances may act as failover or may be load balanced; the details are in Tailscale’s documentation.

Regional Routing

Wait, did that page say Regional routing? We’re freakin’ Fly.io, of course we want to do that! Simply scale out your Tailscale router app to the regions you want to cover:

fly scale count 4 -r syd,iad,lhr,sin

And before you know it, you’re covering the world. Tailscale will use its DERP subsystem to route a client’s traffic to the nearest subnet router, and Fly.io’s Flycast will route from that Tailscale router to the nearest application instance. Hot stuff.


Some fixes and features we want to allow for down the line:

  • Currently uses userspace networking, find out why TS_USERSPACE=0 fails on Fly machines
  • An easy way to provision DNS and TLS certificates - can’t use MagicDNS or tailscale cert for subnet-routed services.
  • Construct a Macaroon that would allow users of specific orgs (and only those users) to Flycast services in to the tailscale-router organisation.

Hope this comes in handy for those that heavily rely on Fly.io and Tailscale! Happy to answer any questions about this approach, or where the other ones fell short for us. Let us know if there’s any other Tailscale tips or tricks that have worked for you!

Thanks for taking the time to write this. Unfortunately, this line does not work:

[processes]
    app = "export PATH=$PATH:/usr/local/bin; containerboot\""

I tried different variations of this, but the Fly machine won’t start, with errors like containerboot: No such file or directory. The export does not work either: does 'export' exist and is it executable?

I was able to get it to work with:

[processes]
    app = "/usr/bin/env PATH=$PATH:/usr/local/bin /usr/local/bin/containerboot"

This worked! I love you @jtdowney, was very frustrated with this :grimacing:

Apologies, yes I was testing a few things (enabling IP forwarding via sysctl) in that command line and I didn’t copy it properly.

Aside from the syntax error in the post, obviously it will need to execute in a shell environment to set the environment variable with export. The correct line was:

[processes]
    app = "sh -c \"export PATH=$PATH:/usr/local/bin; containerboot\""

Or @jtdowney’s solution will work as well. I’ll fix up the main post once I figure out how to edit it :sweat_smile:
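For anyone curious why the two working forms behave the same, here is a small Python sketch that can be run locally. It is only an illustration under assumptions: printf stands in for containerboot, and it assumes sh and env are available on the PATH. It shows that export only works inside a shell, while env sets the variable and then executes the command directly.

```python
import os
import subprocess

EXTRA_DIR = "/usr/local/bin"
SHOW_PATH = 'printf %s "$PATH"'  # stand-in for containerboot


def path_via_shell() -> str:
    """Mirror: sh -c "export PATH=$PATH:/usr/local/bin; <command>"."""
    script = f'export PATH="$PATH:{EXTRA_DIR}"; {SHOW_PATH}'
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout


def path_via_env() -> str:
    """Mirror: env PATH=<current PATH>:/usr/local/bin <command>."""
    new_path = f'{os.environ.get("PATH", os.defpath)}:{EXTRA_DIR}'
    result = subprocess.run(
        ["env", f"PATH={new_path}", "sh", "-c", SHOW_PATH],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def export_without_shell_fails() -> bool:
    """Try to execute `export` directly, with no shell involved.

    `export` is a shell builtin, so on a typical Linux system there is no
    such program to execute and the attempt fails.
    """
    try:
        subprocess.run(["export", "FOO=bar"], check=True)
    except (FileNotFoundError, PermissionError):
        return True
    return False


if __name__ == "__main__":
    print("via shell:", path_via_shell())
    print("via env:  ", path_via_env())
    print("bare export fails:", export_without_shell_fails())
```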

Thanks, this works now @cs1! I’ve done all that’s instructed, though I didn’t remove the existing public IPv4/IPv6 addresses from the Fly machine. I can ping the machine from a different Tailscale-enabled device. How can I establish a connection from the tailscaled Fly machine (hasura) to another tailscaled machine (postgres on crunchybridge)? If I ping the crunchy machine from my Fly machine, I’m not getting anything back.

No worries!

I don’t want to get too far in the weeds with Tailscale stuff here as it can get complicated - and from your description it’s not exactly what we’ve set out to do here, but I would just check that you have an appropriate ACL allowing your hasura<>crunchybridge network communication.

Were you able to get this to work? I’m having trouble reasoning how that would be possible without running tailscale on the machine.

Right now we run tailscale on each machine but there are some downsides and using the subnet routing gets us closer to the ideal.

Namely I’d like to not have to bundle tailscale with each deployment. Or even better, I’d rather not rely on tailscale at all for critical infrastructure.

We can use magicdns but the machine name changes each time so password managers end up being a bit worthless. The ephemeral TS nodes make it unclear which one we should connect to. I know we can clean up those after but that adds complexity to an already complex set up. If we ran a single machine, allowed for downtime, and cleaned up the ephemeral node in TS, perhaps it could work but that’s far from ideal.

Once I found the Shortcuts guide in the tailscale docs I was able to make it a little less annoying on macos. Not so much on my android phone though. Edit: missed --state=mem: flag the first time around on this. This makes cleaning up ephemeral nodes via tailscale logout much easier.

We run a PWA so a stable domain and HTTPS would mean we wouldn’t have to reinstall the staging app on each deploy.

Crunchy bridge worked well for us for our MVP but looking forward to Supabase managed solution so we don’t have to use Tailscale for that connection.

@cs1 thanks for this. Works great. Certs + DNS would make this a killer feature for us. Would happily pay for a turn key solution. I noticed a recent post from fly staff for adding DNS resolution via an env var but unclear if it’s useful in this situation.

The addition of dnsproxy in start.sh on the main branch of the fly-apps/tailscale-router GitHub repository seems helpful here. With that I bet we could use split DNS with .flycast.

I played around with this a bit more. Setting --state=mem: flag on tailscaled and --hostname=${FLY_APP_NAME} creates a nice and stable hostname on the tailnet. However the TTL on magicDNS is 10 minutes so unfortunately it’s not a great solution if DNS is desired.

Here’s an interesting workaround: fly proxy 8080:8080 and tailscale serve http://localhost:8080 will give you a stable hostname with https. This could work for temporary scenarios and at least allows for the installation of the PWA.

FYI, this post appears to refer to the “Legacy” OAuth client permissions, so turning on the “Devices” permission doesn’t quite apply anymore. For those who are wondering, going by the list of scopes and the API endpoints they correspond to, the translation to the “New” permissions is auth_keys, devices:core and devices:posture_attributes, with read/write enabled for each. With that it works!