Docker without Docker, now with containers

Introducing a new way to run programs in Machines that will look and act a lot like containers. In fact, we’re going to go ahead and call them containers because it’s the most commonly used term when discussing “process isolation”.

If you’ve ever had one (or many) of the following issues:

  • re-packaging every binary you needed into a single image and then deciding which fake supervisor to use to run multiple processes
  • not wanting one crashed process to restart the whole VM
  • needing to run as PID 1 because you’re a systemd zealot
  • bloating your image with a shell in case you needed to fly ssh console to debug issues

Our hope is that the feature set we’ll be rolling out for container-based Machines will make it even easier to deploy and manage your workloads on Fly.io.

We’re still working through how to update our existing tools (flyctl and the UI) to understand these new container-based Machines; for now, support is limited to the Machines API. It’s also good to note that parts of the config and UX are still a work in progress and not ready for production workloads, and we may need to force-update any Machines using containers at any point to pull in critical bug fixes.

Without making any promises on when, here’s what could be coming soon:

  • HTTP, TCP, and exec-based health checks used to restart unhealthy containers
  • ephemeral filesystem sharing without a Fly Volume
  • container hooks

Usage

You’ll first need to request access by contacting us through your support email so we can enable your Organization for Machines with containers.

When creating a Machine directly through our API, there’s a new field available, aptly named containers, that lets you opt in to the new system with an array of ContainerConfig objects. Below is a slightly contrived example of the kind you’d possibly see on the interwebs for running multiple images on other platforms.

{
  "region": "atl",
  "config": {
    "containers": [
      {
        "name": "nginx",
        "image": "nginx:stable-alpine-otel",
        "depends_on": [
          {
            "name": "blue",
            "condition": "started"
          }
        ],
        "files": [
          {
            "guest_path": "/etc/nginx/nginx.conf",
            "raw_value": "dXNlciAgbmdpbng7Cndvcmtlcl9wcm9jZXNzZXMgIGF1dG87CmxvYWRfbW9kdWxlIG1vZHVsZXMvbmd4X290ZWxfbW9kdWxlLnNvOwoKZXZlbnRzIHsKICAgIHdvcmtlcl9jb25uZWN0aW9ucyAgMTAyNDsKfQoKaHR0cCB7CiAgICBvdGVsX3NlcnZpY2VfbmFtZSBuZ2lueDsKICAgIG90ZWxfZXhwb3J0ZXIgewogICAgICAgIGVuZHBvaW50IGxvY2FsaG9zdDo0MzE3OwogICAgfQoKICAgIHNlcnZlciB7CiAgICAgICAgbGlzdGVuIDgwIGRlZmF1bHRfc2VydmVyOwogICAgICAgIGxpc3RlbiBbOjpdOjgwIGRlZmF1bHRfc2VydmVyOwogICAgICAgIG90ZWxfdHJhY2Ugb247CgogICAgICAgIGxvY2F0aW9uIC8gewogICAgICAgICAgICBvdGVsX3RyYWNlICAgICAgICAgb247CiAgICAgICAgICAgIG90ZWxfdHJhY2VfY29udGV4dCBpbmplY3Q7CgogICAgICAgICAgICBwcm94eV9wYXNzIGh0dHA6Ly9ibHVlOjgwODA7CiAgICAgICAgfQoKICAgICAgICBsb2NhdGlvbiA9IC9kYXNoYm9hcmQuaHRtbCB7CiAgICAgICAgICAgIHJvb3QgL3Vzci9zaGFyZS9uZ2lueC9odG1sOwogICAgICAgIH0KICAgIH0KfQo="
          }
        ]
      },
      {
        "name": "jaeger",
        "image": "jaegertracing/all-in-one:1.63.0",
        "env": {
          "COLLECTOR_OTLP_ENABLED": "true"
        }
      },
      {
        "name": "blue",
        "image": "ghcr.io/jipperinbham/kuard-amd64:blue",
        "env": {
          "HELLO": "WORLD"
        }
      }
    ],
    "guest": {
      "cpu_kind": "shared",
      "cpus": 1,
      "memory_mb": 1024
    }
  }
}
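
To make this concrete, here’s one way to create a Machine from that config with a plain HTTP call to the Machines API. This is a minimal sketch: it assumes the app (called my-app here) already exists, that the JSON above is saved as machine.json, and that FLY_API_TOKEN holds a valid API token.

curl -X POST "https://api.machines.dev/v1/apps/my-app/machines" \
  -H "Authorization: Bearer ${FLY_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @machine.json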

Once created, jack into your private network and you’ll be able to reach each container directly.

nginx - http://APP_NAME.internal/
jaeger - http://APP_NAME.internal:16686
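
As a quick sanity check, assuming you’ve already peered into the private network (for example with a WireGuard config from flyctl wireguard create) and substituted your real app name, requests like these should reach the two containers:

curl http://APP_NAME.internal/            # served by nginx, proxied to the blue container
curl -I http://APP_NAME.internal:16686/   # the Jaeger UI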

Below is the nginx.conf used to route requests to the blue container while also sending OpenTelemetry traces to the jaeger container:

user  nginx;
worker_processes  auto;
load_module modules/ngx_otel_module.so;

events {
    worker_connections  1024;
}

http {
    otel_service_name nginx;
    otel_exporter {
        endpoint jaeger:4317;
    }

    server {
        listen 80 default_server;
        listen [::]:80 default_server;
        otel_trace on;

        location / {
            otel_trace         on;
            otel_trace_context inject;

            proxy_pass http://blue:8080;
        }

        location = /dashboard.html {
            root /usr/share/nginx/html;
        }
    }
}

How are these containers?

As many decisions often wrongly (or rightly) do, ours began with Kubernetes (FKS, to be specific). We knew that to provide the best experience with FKS, our Machines product needed to be able to run more than one image, because it’s almost weird not to use sidecars in Pods now. We initially focused only on running more than one image, but it became apparent that we had an opportunity to rethink how we run processes in Machines.

Enter Pilot, our new init.

  • every user-defined process runs in a container. We use the term “container” somewhat loosely; more precisely, Pilot implements the OCI Runtime Spec. That means anything Docker, runc, crun, etc. can run, Pilot can run too, hopefully in the same way as far as the spec is concerned. The spec-compliant OCI runtime implementation is copied and modified from Kata Containers’ rustjail crate.
  • prior to starting any containers, Pilot does the following:
    • configures just enough Linux capabilities (everything that’s required to run systemd, for example)
    • sets up the containers’ Linux namespaces; for simplicity, the network namespace is currently always shared among containers
    • leaves seccomp disabled (though it could be enabled) and uses cgroups v2, although we don’t currently expose container-level resource limits (i.e. memory/CPU)
  • runs health checks and reacts to container events accordingly, deciding when to stop and start containers based on restart policies, stop signals, etc.
  • builds a dependency graph so it can figure out the exact order in which containers should be started (see the sketch below)
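
To sketch that last point, here’s a made-up fragment (hypothetical container names and images, using only the depends_on fields shown in the example above): Pilot would start cache before web, since web declares a started dependency on it, while containers with no dependencies start independently of each other.

"containers": [
  {
    "name": "web",
    "image": "registry.example.com/web:latest",
    "depends_on": [
      { "name": "cache", "condition": "started" }
    ]
  },
  {
    "name": "cache",
    "image": "redis:7-alpine"
  }
]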

This is an intriguing evolution of the transmogrification idea. :tiger:

If there is a volume, then all containers do see it simultaneously right? (Just as bind-mounts?)

Relatedly, it would be handy if FUSE mounts and similar could selectively be made visible to other containers…

(@rubys mentioned that this containers feature might eventually allow for more composable and modular LiteFS—as opposed to asking people to expertly hand-craft a litefs.yml and (pretty intrusively) reshape their usual entrypoint into a subprocess (of litefs)…)

[Things like background rsync replication, Tarsnap, and fly-atc might also be snap-in options someday.]

sweet, I love using PID 1 and writing my own shitty inits! (“there are many like it, but this one is mine”)

(did that sound sarcastic? it wasn’t sarcasm)

Correct, right now all containers see the same underlying volume.


This is great! I’ve wanted this capability on Fly for a while now. By contrast, Render does not have a way to run init containers or do anything ahead of running a container. I’ve been looking at modifying GCP’s konlet to let me run more than one container on micro VMs and use launch groups, etc. It’s simpler than managing k8s, in my opinion.

The one thing I would love to see is the ability to define an app spec, i.e. the containers to be deployed on the Machine, using something other than fly.toml. Even something like supporting a k8s PodSpec would be fine, I think. But essentially, allowing such a spec to be posted to your Machines API would be great! (I am working on a Pulumi provider based on just the Machines API, so I want to be able to do more with the API.)


This looks promising. Another Volume question: the Volume shared by all containers wouldn’t be ephemeral, especially in the case of it being the app’s database, correct?

Looking at container hooks, would this be a way to seed a DB at runtime, if the DB volume is shared in the same pod as the container w/ the hook?

Lastly, is the only way to request access to this through a paid support subscription? (I’m getting to that point, but haven’t released in prod yet.) I’m guessing that if I try to use the new Machines API container stuff, it won’t work without access.

Interesting update! I need to follow the news more.

Edit: I am a supporter of support now. I was pondering anyway. :slight_smile: