Fly KMS

For the past year or so, we’ve operated physically-segregated hardware to support security functions like Fly Macaroon tokens, and OIDC/OAuth tokens. We do this because we know that at any moment there could be a new AMD or Intel microarchitectural vulnerability, and we want customer secrets isolated as much as possible from customer workloads. We love you, everyone running code with Fly Machines, we really do. But we don’t trust you.

We’ve used that same hardware-segregated cluster to roll out a big internal change, one we should write about soon.

Since 2020, we’ve used Hashicorp Vault to store customer app secrets. That means that when you set a secret, like your database DSN or API key, we’ve hot-potatoed it off our API servers as quickly as possible to a segregated cluster of Vault servers, and arranged for those secrets to be available to our infrastructure only on physical servers your app actually runs on.

Vault is great. No notes. Recommend without reservations.

But we’ve pushed its limits. Vault is designed to manage secrets for a single enterprise, and we run half a million different applications, on six continents. So we replaced it, with a system we call Pet Sematary (all the other cool crypt, coffin, vault, and sepulcher names were taken).

For many months now, your secrets have been maintained both in our Vault cluster and in “Petsem”, which means that if Vault experiences disruption, we’re fine, because Petsem has its back. This is a change you haven’t needed to know about. But this next bit, you will.

Today, Petsem is available only to our internal tooling. But soon, you’ll have access to it too. This is a Fresh Produce post about Fly KMS.

If you’re familiar with AWS or GCP, you know where this is going, but still: hold on to your butts.

Fly KMS makes it easy to encrypt and decrypt arbitrary blobs of data. Obvious examples: columns in a Postgres database, or uploaded files from users. Think of it as PGP, but for applications, easy to use, and with actually good cryptography.

The keys for Fly KMS are stored on isolated hardware, inside of Pet Sematary. Once they’re set, they never leave Petsem. You encrypt something with Fly KMS, you get a ciphertext blob. It will never (say never) be possible to decrypt that blob outside of Fly.io.

Fly KMS exposes just a few simple operations. We did this deliberately, so you don’t have to think about algorithms and block cipher modes. When this rolls out (shortly), it’ll support:

  • Authenticating and verifying data with a private signing key (currently using NaCl’s “auth” primitives).
  • Encrypting and decrypting blobs with a private encryption key (currently using NaCl’s “secretbox” primitives).
  • Fly.io manages which primitives to use for encryption and signing keys and will pick the latest and greatest each time a new key is generated.

Authenticate or encrypt blobs of data. Just like every other cloud KMS (though: we have good taste in cryptography). Yadda yadda yadda.

Stop! Hold your yaddas. Here comes the fun bit.

What we don’t like about the idea of building a KMS is that it’s yet another API that you have to pull down libraries for and integrate into your app. If you were going to do that, why not just install and run Hashicorp Vault? We needed to do better. Here’s what we came up with:

Fly KMS is exposed directly as a Linux filesystem. You can drive it from a shell script. You can drive it from a shell script without installing any extra tooling. If KMS-style cryptography had been understood in 1976 when the Lions Commentary on 6th Edition Unix was published, this is what it would have looked like:

Your app starts, and /.fly/kms is mounted automatically, with a view of available keys. Now, Fly KMS is new, so you don’t have any of those. Create one with flyctl:

customer$ flyctl secrets keys gen encrypting myencrkey
Setting myencrkeyv0 encrypting (nacl_secretbox)

customer$ flyctl secrets keys ls
LABEL      	NAME     	VERSION	TYPE                        
myencrkeyv0	myencrkey	0      	encrypting (nacl_secretbox)	

Now, ls /.fly/kms:

appmachine# ls /.fly/kms
myencrkey

appmachine# ls /.fly/kms/myencrkey
decr  encr  info

appmachine# find /.fly/kms |xargs ls -ld
dr-xr-x--- 2 root root 0 Sep 13 08:50 /.fly/kms
dr-xr-x--- 2 root root 0 Sep 13 09:32 /.fly/kms/myencrkey
-rw-rw---- 1 root root 0 Sep 13 09:32 /.fly/kms/myencrkey/decr
-rw-rw---- 1 root root 0 Sep 13 09:32 /.fly/kms/myencrkey/encr
-rw-rw---- 1 root root 0 Sep 13 09:32 /.fly/kms/myencrkey/info

appmachine# cat /.fly/kms/myencrkey/info
label: myencrkey
type: encrypting
ops: decr encr info
latest version: 0
version 0: label=myencrkeyv0 secrettype=nacl_secretbox
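
If you want to read that info file from application code, here is a rough Python sketch. It assumes the file keeps the "field: value" layout shown in the transcript above, which may well change along with the rest of the API. The function name parse_kms_info and the keys of the returned dict are made up for illustration.

```python
def parse_kms_info(text: str) -> dict:
    """Parse the text of a /.fly/kms/<key>/info file into a dict.

    Assumes the "field: value" layout from the sample output; this is a
    guess at a format that is not formally specified.
    """
    info = {"versions": {}}
    for line in text.splitlines():
        if not line.strip():
            continue
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field.startswith("version "):
            # e.g. "version 0: label=myencrkeyv0 secrettype=nacl_secretbox"
            number = int(field.split()[1])
            pairs = (p.split("=", 1) for p in value.split() if "=" in p)
            info["versions"][number] = dict(pairs)
        elif field == "ops":
            info["ops"] = value.split()
        elif field == "latest version":
            info["latest_version"] = int(value)
        else:
            info[field] = value
    return info
```

With the sample above, info["versions"][info["latest_version"]]["label"] would give you the label of the key version that new encryptions use.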

Want to encrypt the string hello world? Write it (probably base64’d) to /.fly/kms/myencrkey/encr, then read the ciphertext back out. To decrypt, write the ciphertext to decr and read the plaintext out.
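
Here is roughly what that could look like from Python. This is a sketch under assumptions, not an official client: it assumes the write and the read happen on the same open file handle (as a later reply in this thread describes), that the service treats the input as opaque bytes, and that base64-encoding the plaintext is the right call (the post only says "probably"). The names kms_op, encrypt, decrypt, and the opener parameter are invented here; opener exists only so the helper can be exercised without a real /.fly/kms mount.

```python
import base64


def _open_unbuffered(path):
    # buffering=0 returns a raw file object, so every write() reaches the
    # filesystem before the first read().
    return open(path, "r+b", buffering=0)


def kms_op(path, data, opener=_open_unbuffered):
    """Write `data` to a kmsfs file, then read the result on the same handle."""
    with opener(path) as f:
        view = memoryview(data)
        while view:  # raw writes may be partial, so loop until all is written
            written = f.write(view)
            view = view[written:]
        chunks = []
        while True:  # read until EOF
            chunk = f.read(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


def encrypt(key_dir, plaintext, opener=_open_unbuffered):
    # Assumption: plaintext goes in base64-encoded.
    return kms_op(f"{key_dir}/encr", base64.b64encode(plaintext), opener)


def decrypt(key_dir, ciphertext, opener=_open_unbuffered):
    # Assumption: decr hands back exactly the bytes that were written to encr.
    return base64.b64decode(kms_op(f"{key_dir}/decr", ciphertext, opener))


# On a Machine this might be used as:
#   blob = encrypt("/.fly/kms/myencrkey", b"hello world")
#   assert decrypt("/.fly/kms/myencrkey", blob) == b"hello world"
```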

Behind the scenes, these filesystem endpoints proxy to calls, authenticated through flyd with Macaroon tokens, to Pet Sematary. But you don’t need to know anything about that.

All our keys, for all our operations, are versioned. Team member leaves the team? Rotate all your keys: just generate a new key with the same flyctl command:

customer$ flyctl secrets keys gen encrypting myencrkey
Setting myencrkeyv1 encrypting (nacl_secretbox)

customer$ flyctl secrets keys ls
LABEL      	NAME     	VERSION	TYPE                        
myencrkeyv0	myencrkey	0      	encrypting (nacl_secretbox)	
myencrkeyv1	myencrkey	1      	encrypting (nacl_secretbox)

The new key will be available for use in the machine within minutes of being created. Since the old key version is still available, decryption of data previously encrypted with it is still possible, but all new encryption operations will use the new key version. When you no longer need the old version, you just delete it:

customer$ flyctl secrets keys rm myencrkeyv0
? delete secrets key myencrkeyv0? Yes
Deleted myencrkeyv0

You can create lots of keys, for different purposes, and they’ll all show up dynamically on your Machine filesystem. Driving this API from Typescript, Elixir, Rails, or Python is a snap.

Some of the details of this API are going to change within the next week! We ran this design past people who have built other KMS schemes and got amazing feedback. We’ll be generating ciphertexts with key-ids, easy to grep for, that make it apparent from the ciphertext which keys and algorithms were used. We’re abstracting away a lot of the details of the algorithms we’re using (under the hood, this will let us ratchet up security without you having to know or care). We may simplify the operations we expose.

But all this stuff has been up and running in our staging environment for a couple weeks, and it’s past time for us to let you know about it. Have you ever needed to encrypt something in a Fly app? Tell us about it. If Fly KMS doesn’t work for it, we want to know.

If nobody tells us they ever want to encrypt anything from a Fly App we’re going to put this project on ice and lock Tim in a basement (after shooting me out of a cannon for encouraging the team to design and build this). Give us some feedback!

This sounds like one of those corporate team building exercises :slight_smile:

You mentioned signing keys but there doesn’t seem to be any API details for creating keys, extracting the public key, signing data, and verifying data.

Will importing keys be supported?

Hashicorp Vault supports generating exportable keys, will Fly KMS offer this?

Are there any plans to offer AWS KMS or Hashicorp Vault API endpoints so that existing tooling / third party tooling doesn’t need changes to support Fly KMS?

Will certificate generation be supported?

How do you allow multiple apps to use the key?

The filesystem abstraction seems really cool.:ok_hand:

I have integrated with AWS KMS for envelope encryption just a few days ago for my app (Skybear.NET). But, I moved part of my app to Fly earlier this week, so this might make certain things easier since I do encryption for most data flowing around.

Do the encryptions count against bandwidth?

I imagine if it’s within the same region no, according to the pricing docs, but it’s not clear where the KMS lives in.

Not if we can possibly avoid it. We’d like to hear more about real use cases, things actually likely to be deployed on Fly.io in the next (say) 6 months, that holding the line on this would constrain.

This depends on answers to the other questions.

Another question I forgot to include: what algorithms will be supported? This is especially important for signing keys.

We’ll document the algorithm/construction we use for each use case, but we don’t plan on making them selectable. I do want to hear about use cases this decision breaks, even given what I’m about to say, but I’ll say this here: right off the bat, if you’re working with a regs compliance requirement for encryption that has fussy algorithm requirements, KMS probably isn’t going to work for you. We’re neutral-to-hostile on FIPS, for instance.

This does look really nice overall, :black_cat:

Having this feature be root-only might be a little inconvenient, since many frameworks are running as UID 1000, lately.

Is this because of a race?

appmachine# cd /.fly/kms/myxaeskey
appmachine# echo squeamish ossifrage > encr
appmachine# cat encr  # what if another thread/process has done
                      #  an `echo` of its own in the interim?

(I think Plan 9 tends to use a clone file instead, so that each client gets a separate encr/1, encr/2, etc.)

(I forgot to say “thank you” for this by the way; thank you!)

i guess this applies to both people who precede me in this comment chain, i am bad at message boards

This sounds like a great feature. I’m running a document signing app on fly (heysignthis.com). I’m planning on adding a feature that lets users verify the digital signature for a signed document. This means that I need to either generate a signing certificate for each tenant or allow the user to upload their own, and then store it somewhere secure. Instead of AWS secrets manager I’d rather use this. It sounds like fly KMS would be a more convenient (and possibly more secure) option. Any thoughts on pricing, or is it too early yet?

It is not too early. I’m going to taunt @kurt about pricing until he shows up to talk about it. But like: KMS is not what makes this company viable, so I don’t think we’re looking to rip your face off with pricing. In particular: it is way more important to get people playing with this, so I think we’re willing to do just about any reasonable thing to de-risk exploration and early dev work with this feature.

The filesystem supports chmod and chown, so the admin can delegate. The permissions are not persistent across machine restarts, so any delegation would have to be part of the machine startup (at privilege). Pick your uid and gid and file permissions as fits your app.
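
As an illustration of that startup step, here is a small Python sketch that a privileged entrypoint could run before dropping to the app user. The function name delegate_kms and the default modes are assumptions made up for this example; the only thing taken from the thread is that chown and chmod work on the mount and do not survive a restart.

```python
import os


def delegate_kms(root, uid, gid, dir_mode=0o550, file_mode=0o660):
    """Hand every entry under `root` to uid:gid and return the paths touched.

    Meant to run at privilege on every Machine start, since the thread says
    ownership and permissions are not persistent across restarts.
    """
    touched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        entries = [(dirpath, dir_mode)]
        entries += [(os.path.join(dirpath, name), file_mode) for name in filenames]
        for path, mode in entries:
            os.chown(path, uid, gid)
            os.chmod(path, mode)
            touched.append(path)
    return touched


# Hypothetical use in an entrypoint, before switching to UID 1000:
#   delegate_kms("/.fly/kms", uid=1000, gid=1000)
```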

Having state maintained in the file descriptor means that we can clean it up as soon as the file descriptor is dropped. The actual process is fd = open("/.fly/kms/myencrkey/encr"); write(fd, plaintext); ciphertext = read(fd). So each client gets its own state and won’t interfere with concurrent operations (so long as they don’t share the same fd).

The tricky part with this is it limits use cases where more than one algorithm is used.

In particular, for signing algorithms, support for Ed25519 is a must but also for regulatory support P-256 (and even P-384) would probably be required for many others. We use Ed25519 and being forced to use something else by Fly KMS would be a non-starter.

We’re Ed25519 right now; I don’t think P-curves are in the cards for us, even in this post-complete-additional-laws-for-P-curves world we’re in today.

Have an immediate use case for this via Elixir - can we connect via email or DM @thomas @Tim_Newsham @kurt ?

Oh, awesome, I have an immediate use for this! (Keep track of tigris tokens for app-managed buckets.)

Two quick answers:

  • Sending data to the KMS is the equivalent of in-region egress, so $0.00/gb
  • When we start charging for this, it will be similar pricing to what AWS and Google Cloud charge for their KMS services. I’d like to simplify the pricing model a bit though, if possible.

I think the way to do this like Tim mentions in #12 goes like:

/.fly/kms/myxaeskey # exec 3<>/.fly/kms/myxaeskey/encr
/.fly/kms/myxaeskey # echo text-to-encrypt >&3
/.fly/kms/myxaeskey # cat <&3
fkms:myxaeskeyv0:k9cFWcrScffa0GqMgRLwPCnrJ/98EsGYV4WcVl7GdO87n26qbFTe/YZTooTjR2Tn5uA+5hBduaY=
/.fly/kms/myxaeskey # exec 3>&-

or you can do it in a single line with a subshell:

/.fly/kms/myxaeskey # ENCR=$( exec 3<>encr && echo text-to-encrypt >&3 && cat <&3 )
/.fly/kms/myxaeskey # echo $ENCR
fkms:myxaeskeyv0:RpIed8/06VtAqRNEdamPz+JTjwP7NY75Ywg1gpL17wLkehmx2Ixpc7LGph/cdna1kyZm7iMky9A=
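
Those outputs suggest a ciphertext layout of fkms:<key label>:<base64 payload>. That layout is inferred from the two samples above and nothing else, and the original post says the format is still changing, so treat the following Python as a sketch. The names FkmsBlob, parse_fkms, and needs_reencryption are hypothetical. The idea is to spot stored blobs that still use an old key version after a rotation.

```python
from typing import NamedTuple


class FkmsBlob(NamedTuple):
    key_label: str  # e.g. "myxaeskeyv0"
    payload: str    # base64 text, left undecoded


def parse_fkms(blob: str) -> FkmsBlob:
    """Split an 'fkms:<label>:<payload>' string; raise ValueError otherwise."""
    scheme, label, payload = blob.strip().split(":", 2)
    if scheme != "fkms":
        raise ValueError(f"not an fkms ciphertext: {scheme!r}")
    return FkmsBlob(label, payload)


def needs_reencryption(blob: str, current_label: str) -> bool:
    """True if the blob was produced under a different key version label."""
    return parse_fkms(blob).key_label != current_label
```

After generating myxaeskeyv1, a batch job could walk its stored ciphertexts, decrypt and re-encrypt the ones where needs_reencryption(blob, "myxaeskeyv1") is true, and only then delete the old key version.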

Thanks for the detailed clarifications, @dangra and @Tim_Newsham… Overall, I do like the idea of having a pseudo-filesystem API for this, but this specific one seems maybe more brittle than what I would personally prefer for encryption (in particular).

On the other hand, I’m surely no weighty authority and don’t have a pressing need for KMS on Fly myself, either way, and it sounds like you’re still in the middle of incorporating API feedback from people who actually are cryptography engineering experts.

Maybe there could be a return to the forum with the new API for everyone’s feedback before fully nailing everything down, though?

A few ideas of things that might be good to specify explicitly at that point…

  • Whether this is only intended for small blocks—or can handle huge streams.
  • How exacting the choreography between >&3 and <&3 (or the real programming language’s equivalents) is.
    • What sort of timeout assumptions there are between writes and matching reads.
    • Whether this plays nicely with split writes, EAGAIN, and the like.
    • How wedged does this get if someone forgets about stdio buffering.
  • Whether \n is magic in the plaintext and/or ciphertext.
    • How the encryption phase recognizes the definite, absolute end of the plaintext input.
    • Whether the decryption phase can detect truncated ciphertexts.
    • How we can detect truncated incoming ciphertexts, in the absence of keys.
  • How errors in general are reported.

(Some of these might fall under “obviously this is modern” authentication, etc., but few people know what that precise standard really is…)

In general, although it’s true that I don’t want to think about algorithms and cipher modes, I do want to think about error-handling branches, packet structure, :black_cat:, unknowingly shortened wrapped keys, …

It’s intended for small to medium size encryption. The ciphers we are using are not intended for bulk encryption and the kmsfs does not support streaming.

You open the file, make one or more writes to write the plaintext, and use one or more reads to read the ciphertext, until EOF. You can then do another write/read cycle for another request.

There is no timeout.

You can make as many writes as you wish to form your ciphertext input. You will not get EAGAIN.

You’ll need to flush all your writes before doing a read or you will get bad results.

Nope.

A read indicates that all inputs have been provided.

I don’t understand the question.

read will return an error with errno. After this, reading until EOF will return an error message. So errors are delivered as a pair (errno, errmsg).
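
One way to handle that (errno, errmsg) pair from Python, as a sketch: it assumes the failure shows up on the first read after the writes, and that the message is UTF-8 text. KmsError, read_result, and the injectable read parameter are names invented for this example; os.read is used so there is no stdio buffering between you and the file descriptor.

```python
import os


class KmsError(Exception):
    """Carries the errno from the failed read plus the follow-up message."""

    def __init__(self, errno_, message):
        super().__init__(f"[errno {errno_}] {message}")
        self.errno = errno_
        self.message = message


def read_result(fd, read=os.read, chunk_size=4096):
    """Read a kmsfs result until EOF, turning a failed read into KmsError."""

    def drain():
        parts = []
        while True:
            chunk = read(fd, chunk_size)
            if not chunk:  # empty read means EOF
                return b"".join(parts)
            parts.append(chunk)

    try:
        return drain()
    except OSError as exc:
        # Per the reply above: after the failing read, reading until EOF
        # returns the error message text.
        message = drain().decode("utf-8", "replace").strip()
        raise KmsError(exc.errno, message) from exc
```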
