Rough edges for multi-app services

First note:

I hope this post helps the fly team understand two things: how badly I want to use fly, and how hard I find it to “do things right” while using fly for multi-app services.

I really like fly. A lot! I’ve written a few “single app” services on fly and they just work. It’s glorious.

But I find fly very (very!) hard to use for multi-app services. I’ve tried three separate times so far, and each time it’s been brutal.

I’ve written a lot of types of automation for my fly apps: shell scripts, cuelang, terraform, even a custom golang alternative to flyctl called ensure-fly-app.

Each time I try to set up a “multi app” service on fly, things get so painfully difficult that I contemplate either switching to another provider like Render, or giving up on the future entirely and going back to the dark ages of AWS.

Here are some themes that come up:

  • debugging containers when they die/restart
    A “container exit code” would help enormously here. As would a way to view logs that makes it very, very clear which lines came from which vms, if that vm started recently, and which release that vm is based on.
  • creating and initializing postgres users, databases, schemas, extensions
    Software like Zulip cares about the database being “ready” for migrations to run. This means initializing extensions, schemas, and other “one time” things. There’s no obvious way to make that happen with the fly postgres paradigm.
  • setting secrets for one app that another app needs to know
    Secrets are used for accessing things like redis, postgres, and rabbitmq. In each of these cases we need two different apps to agree on what the value of a secret is. It’s challenging to set this up and keep these values in sync. Rotating a secret should be trivial and automated. AFAICT, that’s not the case for general secrets or postgres credentials.
  • scripts that try to setup/deploy everything from scratch including secrets, volumes and postgres state
    Bugs like “Setting secrets before the first deployment does nothing” (#11 by tgautier) make this even harder.
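
To make the postgres item above concrete, here is a minimal Python sketch of the kind of “one time” initialization I mean. It only builds idempotent SQL, so it is safe to run before every migration. The function name and the example extension/schema names are hypothetical, and how the statements get piped into psql is left out because that depends on how you connect to your fly postgres app.

```python
import re

# Conservative allow-list for identifiers; this is an assumption of the sketch,
# not a rule imposed by postgres or fly.
_IDENT = re.compile(r"^[a-z_][a-z0-9_-]*$")


def init_statements(extensions, schemas):
    """Return idempotent SQL that makes a database "ready" for migrations."""
    for name in [*extensions, *schemas]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # IF NOT EXISTS makes each statement a no-op on the second and later runs.
    stmts = [f'CREATE EXTENSION IF NOT EXISTS "{e}";' for e in extensions]
    stmts += [f'CREATE SCHEMA IF NOT EXISTS "{s}";' for s in schemas]
    return stmts
```

For example, `init_statements(["pg_trgm"], ["app"])` yields two statements that can be run on every deploy without failing when the objects already exist.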

Closing the gap

I have a general goal of running a single (non-interactive) program to operate an entire set of apps that work together to provide some service. This program is for both creating ephemeral environments to develop/test against and deploying to the production environment when it’s time for that.

This doesn’t feel like a program I should be writing/testing/maintaining as a user. In fact, the act of writing it feels decidedly un-fly.


This is good feedback.

First, you can find this:

A “container exit code” would help enormously here.

If you run fly status --all, you’ll see a list of active and recently stopped/failed VMs. fly vm status <id> will show you exit codes.
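
If you want to script that lookup, a rough Python sketch could look like the following. The two commands are the ones mentioned above; passing `--app` explicitly (instead of relying on a fly.toml in the current directory) and the helper names are assumptions of this sketch. It does not try to parse the output, since the format is not specified here.

```python
import subprocess


def status_all_cmd(app):
    """argv for listing active and recently stopped/failed VMs of one app."""
    return ["fly", "status", "--all", "--app", app]


def vm_status_cmd(app, vm_id):
    """argv for showing details (including exit codes) of a single VM."""
    return ["fly", "vm", "status", vm_id, "--app", app]


def run(cmd):
    """Run a command and return its stdout as text; raises on non-zero exit."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Usage would be something like `print(run(status_all_cmd("my-app")))`, then `run(vm_status_cmd("my-app", vm_id))` for each VM ID you care about.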


I have a general goal of running a single (non-interactive) program to operate an entire set of apps that work together to provide some service. This program is for both creating ephemeral environments to develop/test against and deploying to the production environment when it’s time for that.

The fly apps platform is not designed for multi service apps. Fly Machines should get us closer, but they’re very low level.

If you’re up for doing a bunch of dirty work, you can get close to what you want with the Machines Terraform provider.

Right now, we’re a pretty good place to run a monolith close to your users. I think more advanced deployment is a ways off.

I haven’t shown this to anyone yet, but we’re working on an anti-roadmap. One thing you’ll see on there is “Heroku Pipelines Alternative”, which is a portion of what you’re describing.

We think the best way to solve this is to have a company solve this with us (like we’re doing with Upstash for Redis).

If there are companies doing what you’re looking for, send them my way!

I appreciate you struggling through this a few times. You’ve probably put more energy into this than we deserve. :slight_smile:


The machines terraform provider still seems a bit raw for my needs (e.g. no postgres support yet), but I’m definitely looking forward to it!

I don’t really want heroku pipelines as much as a stable set of primitives that make multi-app deployments as easy to reason about as single-app ones.

After doing this a few times, there is clearly a set of “missing primitives” that would get pretty far. They are the ability to:

  • set secrets that apply to multiple apps (and that cause multiple apps to restart when they’re changed/removed)
  • “generate” values for multi-app secrets
  • generate postgres user passwords and set a secret with the value
  • update postgres user passwords
  • “declare” a set of volumes that are expected to exist, along with minimum sizes
  • “declare” a “minimum” postgres configuration (users, database, access-grants)
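
As an illustration of the first two items, here is a small Python sketch of what I end up hand-rolling today: generate one value and build a `fly secrets set` invocation per app so they all agree. The helper names and the app names in the example are hypothetical. Note that putting the value on the command line makes it visible in the local process list, so treat this as a sketch rather than something hardened.

```python
import secrets


def generate_value(nbytes=32):
    """Generate a random URL-safe value suitable for a password or token."""
    return secrets.token_urlsafe(nbytes)


def shared_secret_cmds(name, value, apps):
    """Build one `fly secrets set` argv per app, all carrying the same value."""
    return [
        ["fly", "secrets", "set", f"{name}={value}", "--app", app]
        for app in apps
    ]
```

For example, `shared_secret_cmds("REDIS_PASSWORD", generate_value(), ["zulip-web", "zulip-redis"])` gives two commands to run, one per app. Rotation is the same call with a fresh value; keeping the apps in sync if one of the commands fails is exactly the part I’d like the platform to own.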

Here’s what I put together for deploying zulip on fly: fly-deploy.sh · GitHub

How it works:

  1. The fly-deploy.sh script evaluates expressions from a services.cue file.
  2. It runs various flyctl commands to determine what the state of the world is and then evaluates more services.cue expressions to create needed fly.toml files for flyctl deploy.
  3. It does this for a specific set of apps, though it could probably be modified to loop over the list of declared apps in services.cue.
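
The real script uses cue for this, but the “generate a fly.toml per declared app” step can be sketched in Python as below. This is a stand-in, not what fly-deploy.sh actually does: the `services` mapping, the function names, and the example app names are all made up for illustration, and only the `app` key and `[env]` table of fly.toml are rendered. Quoting via `json.dumps` is only meant for simple ASCII string values.

```python
import json


def render_fly_toml(app, env):
    """Render a minimal fly.toml with the app name and an [env] table."""
    lines = [f"app = {json.dumps(app)}"]
    if env:
        lines += ["", "[env]"]
        # Sort keys so repeated runs produce byte-identical files.
        lines += [f"  {key} = {json.dumps(val)}" for key, val in sorted(env.items())]
    return "\n".join(lines) + "\n"


def render_all(services):
    """Map each declared app name to the text of its fly.toml."""
    return {app: render_fly_toml(app, spec.get("env", {})) for app, spec in services.items()}
```

Looping over `render_all(services).items()` and writing each result to its own directory is the modification mentioned in step 3.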

I should also add that the reason I don’t need a Heroku pipelines type thing for fly is that I dynamically generate (and commit/push) a GitHub Actions workflow.yml file during CI. This provides enough flexibility and control to work for lots of end-to-end testing and “review app” type setups.
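
A stripped-down Python sketch of that idea, for anyone curious: emit one deploy job per app into a workflow file. This is not my actual generator. The one-directory-per-app layout, the job naming, the `on: [push]` trigger, and the `FLY_API_TOKEN` repository secret are assumptions of the sketch.

```python
def render_workflow(apps):
    """Render a GitHub Actions workflow with one flyctl deploy job per app."""
    lines = ["name: deploy", "on: [push]", "jobs:"]
    for app in apps:
        lines += [
            f"  deploy-{app}:",
            "    runs-on: ubuntu-latest",
            "    steps:",
            "      - uses: actions/checkout@v4",
            "      - uses: superfly/flyctl-actions/setup-flyctl@master",
            "      - run: flyctl deploy --remote-only",
            # Assumes each app keeps its fly.toml in a directory named after it.
            f"        working-directory: {app}",
            "        env:",
            "          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}",
        ]
    return "\n".join(lines) + "\n"
```

Writing the result of `render_workflow(["zulip-web", "zulip-worker"])` to `.github/workflows/deploy.yml` and committing it from CI is the general shape of what I do.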