RFC: Terraforming Rails on fly.io

Preface: the move from a higher level nomad CLI to a comparatively lower level machines API creates a need for scripting.

Most of the machines API is documented as curl commands, which map directly onto Ruby’s Net::HTTP. In other cases it may be worthwhile to shell out to CLI commands; this, too, is straightforward in Ruby scripts.
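
For instance, a minimal sketch in Ruby of listing an app’s machines, assuming the FLY_API_HOSTNAME and FLY_API_TOKEN conventions from the machines documentation (the app name is a placeholder):

require "net/http"
require "json"
require "uri"

# List machines for an app. FLY_API_HOSTNAME typically points at the machines
# API, either through flyctl's local proxy or the internal address.
uri = URI("#{ENV.fetch("FLY_API_HOSTNAME")}/v1/apps/appname/machines")
request = Net::HTTP::Get.new(uri)
request["Authorization"] = "Bearer #{ENV.fetch("FLY_API_TOKEN")}"

response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
JSON.parse(response.body).each do |machine|
  puts "#{machine["id"]} #{machine["state"]}"
end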

For those who don’t know me, I’m a big fan of Cunningham’s Law. I’m new to fly, and just got introduced to Terraform yesterday, so it is quite likely I am missing a number of obvious things. Don’t be shy if you spot something; the goal is to make the best Rails developer experience possible.

Finally, everything below is subject to change. Don’t take any of it as a product commitment.


Survey of the current state

The flyctl app scanner’s support for templates is rather primitive; some modest improvements can be made by making use of go’s text/template. flyctl’s support for machines is still experimental, at times producing results like incomplete fly.toml files.

There are also things one can’t do today. For example, running rails db:migrate at release time on a sqlite3 database mounted on a volume.

There are also things that are possible with the current flyctl, but not necessarily facilitated by it. For example, local docker builds, which would unlock the ability to have secrets available during assets:precompile, a common issue for people migrating from other hosting providers.

There are things that are possible, but hard, today. For example, dynamically spinning up a new machine to process background jobs. Ideally those jobs would use the same image, but that information is not available to the deployed machine.


A different approach

The Terraform machines approach is different. See Getting started with Machines and Terraform. A text file describes your entire network. This text file can be both read and updated by scripts.

A possible future would look something like this:

bundle add fly.io-rails
fly auth login
bin/rails generate terraform
bin/rails deploy

Note: generate terraform would (at least initially) also create a Dockerfile. Doing so could make use of RUBY_VERSION and Bundler::VERSION instead of the more indirect methods of parsing Gemfiles and the like.
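
A sketch of how such a generator might use those constants, rendering a hypothetical Dockerfile.erb template whose first line would be FROM ruby:<%= ruby_version %>-slim:

require "erb"
require "bundler"

# Render a hypothetical Dockerfile.erb using the versions of the running
# ruby and bundler, rather than parsing the Gemfile to deduce them.
template = ERB.new(File.read("Dockerfile.erb"))
File.write "Dockerfile", template.result_with_hash(
  ruby_version: RUBY_VERSION,        # e.g. "3.1.2"
  bundler_version: Bundler::VERSION  # e.g. "2.3.19"
)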

It probably should also generate a minimal fly.toml file consisting only of an app = line, as this is referenced by a number of flyctl commands. Over time, perhaps flyctl could be changed to parse main.tf when fly.toml is not present.
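
Generating that file is a one-liner (the app name is a placeholder):

# Emit the minimal fly.toml that flyctl commands look for.
File.write "fly.toml", %(app = "appname"\n)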

Getting this up and running could be done incrementally. A rough plan is sketched out below:


Stage 0 : local builds

bin/rails deploy could be a rake task.

Adapting Simon Willison’s excellent example, the steps such a task would perform would be (a rough sketch follows the list):

  • Generate image-label using ulid-ruby
  • docker build -t registry.fly.io/appname:deployment-image-label .
  • flyctl auth docker
  • docker push registry.fly.io/appname:deployment-image-label
  • Update main.tf with image_label
  • terraform apply
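
A rough sketch of those steps as a rake task; the app name and the assumption that main.tf contains a single image = line are placeholders:

desc "deploy to fly.io"
task :deploy do
  require "ulid" # ulid-ruby

  app   = "appname"              # placeholder app name
  label = ULID.generate.downcase # docker tags must be lowercase
  image = "registry.fly.io/#{app}:deployment-#{label}"

  sh "docker build -t #{image} ."
  sh "flyctl auth docker"
  sh "docker push #{image}"

  # assumes main.tf contains exactly one image = "..." line
  main_tf = IO.read("main.tf")
  IO.write "main.tf", main_tf.sub(/image\s*=\s*".*"/, "image = \"#{image}\"")

  sh "terraform apply -auto-approve"
end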

Note: even the most trivial rails app needs a master key, and this can be accommodated using the terraform file function in conjunction with Sensitive Input Variables, or with fly secrets. The former may be more resilient, as it would be done on every deploy using the current master.key.
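
One possible shape for the sensitive input variable route, assuming main.tf declares variable "master_key" { sensitive = true } and wires it into the machine’s env:

# Continuing the rake task sketch: terraform reads TF_VAR_* from the
# environment, so the current master.key is passed fresh on every deploy
# without ever landing in main.tf itself.
ENV["TF_VAR_master_key"] = IO.read("config/master.key").strip
sh "terraform apply -auto-approve"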

Stage 1: Release

Rails apps generally require a db:migrate step to be run before deployment.

This can be done by creating and starting a machine using the same image. The current instructions don’t show how to override the ENTRYPOINT; worst case, this can be done by shelling out to flyctl:

% fly machine run --help | grep ENTRYPOINT
      --entrypoint string    ENTRYPOINT replacement

This would be done after docker push and before terraform apply in the list of steps above.
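
Continuing the rake task sketch above, the release step could be as small as:

# Run migrations on a temporary machine, overriding the image's ENTRYPOINT
# as shown above; image and app come from the earlier deploy sketch.
sh %(fly machine run #{image} --app #{app} --entrypoint "bin/rails db:migrate")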

Stage 2: Moar s3cR3tS?

Not sure this would be necessary, but rails tasks can access credentials, so it would be possible to match credential names against sensitive input variables and use environment variables or other approaches to pass this information on to the deployed machines.
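
A sketch of that matching, assuming each exported credential is declared as a sensitive input variable in main.tf:

# Pass top-level string credentials through as TF_VAR_* environment
# variables; terraform picks up any that match declared input variables.
Rails.application.credentials.config.each do |name, value|
  ENV["TF_VAR_#{name}"] = value if value.is_a?(String)
end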

Stage 3: Remote builds

fly deploy supports --build-only, --build-secret, and --image / --image-label. The scripts can choose either option, possibly based on the contents of a configuration file placed in the config directory of your application. Everything else should remain the same, including the injection of the image tag into the main.tf file.

At this point we can decide which is the best default, and whether the choice should be an option on the deploy command. The key is to support both equally in terms of tracking image tags.

Stage 4: Multiple machines

Even applications that are not geographically distributed typically use multiple machines. Postgres and Redis are common examples. In the case where these machines may be shared by multiple applications, this reduces to the setting of a secret.

It may make sense to deploy sidekiq, for example, in a separate machine. If it is intended to use the same image, then having the deploy script do a global search and replace on the image tag would address this need.

Finally, it may make sense to dynamically create machines for infrequent background jobs. In this case it would be helpful if deploy set an environment variable containing the name of the current image.
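
Both ideas are small additions to the deploy sketch; the pattern below assumes every Rails machine in main.tf points at registry.fly.io, and the input variable name is hypothetical:

# Point every machine in main.tf at the image that was just pushed.
main_tf = IO.read("main.tf")
IO.write "main.tf",
  main_tf.gsub(%r{image\s*=\s*"registry\.fly\.io/[^"]*"}, "image = \"#{image}\"")

# Record the image name where running machines can see it, e.g. via a
# (hypothetical) input variable wired into each machine's env block.
ENV["TF_VAR_image"] = image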

In all, there may not be much needed to support this; the hardest problem might be getting the right ports exposed and wired together.

Conceptually, a lot of this is similar to docker-compose.yaml files. Perhaps it might make sense to have a rails task that imports such a file and converts it to a terraform file.

Stage N: fly DSL

The steps above may not be sufficient, experience will tell us more. But once things stabilize improvements may be possible.

Given the premise that main.tf and Dockerfile can be generated, it should be possible to generate such files as needed during the deploy script from configuration information, and remove them immediately after they are used.

The idea is to initially cover the 80% use case where the contents of these files are largely predictable and static, with perhaps minor tweaks. Over time this can be pushed to the 95% use case with more configuration options.

Two motivations for doing this:

  • Much of the contents of these files are boilerplate and utilize a syntax that Rails developers may not be familiar with. Configuration files that focus on necessary choices using Ruby syntax may be more approachable.
  • Rails apps may or may not have assets, may or may not use node, might add features like background jobs or change databases. Having to redundantly change both the Rails configuration and the terraform configuration every time you make a change is an unnecessary chore.

Borrowing a concept from Create React App, it should be possible to eject at any time and have the necessary configuration files produced for you so you can customize as you wish.


Summary

While we will inevitably discover a “there be dragons” situation in one or more steps, most of the above are small efforts that, barring surprises, can be completed in a few days.

The key advantage of building on a lower level, more granular API is that we are not limited by the choices made by higher level abstractions. An example of this was given up front: currently release machines can’t mount volumes, but machines that we create during deployment don’t have this limitation.

Additionally, there is another advantage to implementing Rails integration in Ruby. Not only can we take advantage of Thor and erb, it also lowers the barrier to contributions from members of the Rails community. Somebody out there will have a unique configuration giving them an itch to scratch, and that can result in pull requests.


Disclaimer: I have one day less experience with Terraform than Sam.

I assume this would also handle asset compilations for those who are not using importmaps?

Yes! Rails applications will be able to create more instances of themselves :smile_cat:! This will be great for getting us closer to “Scale to Zero”.

I’m curious how the production rails console UX would work with this configuration?

I’m a fan of keeping as many “unnecessary files” off of people’s machines as possible when they get started, then making it possible and obvious to bring them back when the higher-level abstractions start leaking. That requires a lot of thought around the failure modes so that super-clear error messaging can be displayed to users. TBH it probably doesn’t deserve much discussion at this point since the steps prior have to be proven out, and if they are, it’s still a lot of benefit.

Do you have example configuration files (or a project) folks could look at to see what the Rails Terraform file looks like that you put together?

The problem with asset compilations goes away with local builds. With a bit of work, it can be made to go away with remote builds using build secrets. We can make the latter transparent.

Unpacking this a bit: asset compilation typically involves concatenating and uglifying/compressing data, with no network access required. Unfortunately, bin/rails loads your configuration, which may access AWS. It is worth reading https://github.com/rails/rails/issues/32947, which was closed as WONTFIX. This leads to a number of workarounds, such as placing ENV SECRET_KEY_BASE=1 in the Dockerfile.

If we front end the Docker build and run it locally, we can make available any secrets we want. Exporting secrets to build machines needs more care and consideration to make sure they don’t leak.

Me too! If you are running multiple machines with an app, selecting the right one may be important. For the short term fly ssh console --select might be the answer, but perhaps we can do better. A higher level script running over a lower level ssh interface could perhaps do a better job of picking the right machine.

I was expecting the Terraform file to be more complicated, but it turns out that nothing in that file will end up being specific to Rails. It is worth running through Getting started with Machines and Terraform. Basically the entire file is concerned with provisioning a given docker image on both ipv4 and ipv6 and ports 80 and 443. That’s it! The contents of the docker image is where Rails comes in.

fly_machine contains more information on how you set cpus, memory, environment variables, mounts, etc. Again, none of this is Rails specific.

The key that makes this a big win for us is the image line. We can read it. We can update it. It is not something determined on our behalf, nor do we need an API to access or update it. It is right there in the file.
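
For example, reading the currently deployed image back out of main.tf is a one-liner (assuming a single image = line):

current_image = IO.read("main.tf")[/image\s*=\s*"([^"]*)"/, 1]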

Beta guide: Terraform · Fly Docs

Video: Terraforming Rails on fly.io - YouTube

Current status:

  • DOCKER_BUILDKIT=1 docker build . fails for me on an M1 Mac with “SEGV received in SEGV handler” during gem install bundler. This seems to be a known problem that nobody is working on; recommending remote builds for now.
  • Release works using fly machines API, using the SERVER_COMMAND specified in the Dockerfile.
  • Secrets are pretty orthogonal to terraform, so they are covered in the guide.
  • No special support is required to launch multiple machines - the developer simply configures them in main.tf.
  • DSL remains on the todo list.

I love seeing progress on this, thanks for putting this together @rubys.

Coincidentally I spent some time trying to get this to work on my own last weekend, but only found this post today. I have a few questions, if you don’t mind pointing me in the right direction.

  1. How can you bind certificates to this app? With fly_app being deprecated, is just using the same app name everywhere and the fly_cert hostname enough to bind it to the application?

  2. I would like to keep my infrastructure code separate from the Rails application, but it would make deployments much more challenging as I don’t know how I could build blue/green deployments with that separation.

  3. Looks like we would need more scripting to handle health checks and revert to the old image in case the deployment fails.

Maybe it’s just my background in k8s that is making me miss something here, but as much as I’ve tried to get this to work, I feel like the structure around flyctl deploy is still more robust for now. Having said that, I like the direction this is going, and I am happy to help with more testing if you need.

I’m not an expert on terraform, so I’ll leave it to others to address your comments.

The default deployment option for fly.io currently is based on nomad, which works fine for many, but not all.

To address a wider set of use cases, an alternative is being developed: machines. Current status is that there are:

  • a low level API which puts all orchestration decisions in your hands. This is still under active development, so it is perhaps best to consider it beta.
  • fly deploy, which will do rolling restarts, but is somewhere between alpha and beta status

I’m continuing to explore the low level API, and hope to have a demo based on that published in a week or so.