Fly Apps on machine prerelease

tvdfly · January 20, 2023, 8:08pm

The first post about this was Fly Apps on machines: prerelease of fly deploy, and we’ve added a lot more since then! This fairly long post goes into detail on how things work in Apps v2.

We’re working to migrate the Fly Apps Platform to Machines. We’re calling this Apps v2. Apps v1 are the apps that use Nomad.

The first big step will be making new apps use the machines platform instead of nomad. There’s a bunch of small steps to get there. Today we updated the prerelease and PR to support fly launch and fly deploy, making it possible to launch and manage apps built entirely on machines.

Install the prerelease or build the PR to try out the new fly deploy behavior:

github.com/superfly/flyctl

WIP: fly launch and fly deploy for machines for apps v2

superfly:master ← superfly:appsv2

opened 12:29AM - 06 Jan 23 UTC

tvdfly

+3543 -2040

We're pushing to migrate the Fly Apps platform to machines instead of nomad to improve developer experience. We're calling this Apps v2. The first big step will be making new apps use the machines platform instead of nomad. There's a bunch of small steps to get there, and this is one of them...

Update `fly deploy` when we're working on an app with machines to:

* Prompt developer to migrate the machines into the "Fly Apps platform". The machines being deployed will become the instances of the apps. Specifically, we put some metadata on those machines indicating they are part of the Fly Apps platform. Subsequent deploys will only operate on machines with that metadata.
* Support rolling deployments, release_command, and many of the other standard fly.toml configuration.

Update `fly launch` to use machines instead of nomad.

Things that are known not to work, at least for now, with Apps v2 `fly deploy` are:

* [x] ~~`[processes]` are not supported and will be ignored~~
* [x] ~~images get built twice~~
* [ ] `[statics]` are not supported with machines

To install the latest prerelease:

curl -L https://fly.io/install.sh | sh -s -- prerelease
fly version
fly v0.0.451-pre-4 darwin/arm64 Commit: a52f9a0c BuildDate: 2023-01-20T14:37:55Z

Try it out and let us know what you think! We’re looking for bugs we can squash, missing features, and things you’d like to see. Let us know in this thread.

Creating a v2 app

Launch apps with fly launch. It should work like it does for apps on Nomad. All the scanners and builders are supported, as usual.

To try out Apps v2 use an app that does not require statics. Apps v2 doesn’t support statics, yet. We’ll announce when that changes.

I’ll start an nginx app, and use that for the rest of the examples in this post.

fly launch --image nginx --internal-port 80
...
Created app dry-pond-1475 in organization tvd-testorg
Admin URL: https://fly.io/apps/dry-pond-1475
Hostname: dry-pond-1475.fly.dev
Wrote config file fly.toml
? Would you like to set up a Postgresql database now? No
? Would you like to set up an Upstash Redis database now? No
? Would you like to deploy now? Yes
? Will you use statics for this app (see https://fly.io/docs/reference/configuration/#the-statics-sections)? No
==> Building image
Searching for image 'nginx' remotely...
image found: img_wd57v5nge95v38o0
Provisioning ips for dry-pond-1475
  Dedicated ipv6: 2a09:8280:1::ce9b
  Shared ipv4: 66.241.124.2
  Add a dedicated ipv4 with: fly ips allocate-v4
No machines in dry-pond-1475 app, launching one new machine
  Machine 21781973f03e89 update finished: success
  Finished deploying

Once fly launch finishes, use fly open to open the app’s homepage in a browser. The “Welcome to nginx!” page will show if everything worked.

Deployments

fly deploy continues to work to update the app:

fly deploy
==> Building image
Searching for image 'nginx' remotely...
image found: img_wd57v5nge95v38o0
Deploying dry-pond-1475 app with rolling strategy
  Machine 21781973f03e89 update finished: success
  Finished deploying

release_command, rolling and immediate strategies, and the other deploy flags and settings are supported. There are some small variations in apps v2, and we’ve limited them as much as possible and updated flyctl to tell you about them.

Mounts and volumes

Volumes need to be created and manually attached to machines. The source setting in the [mounts] section is no longer supported in fly.toml. There is no enforcement around volume names.

Attach volumes with the Machines API for now. We’re working on making this easier with fly machine update and fly machine clone.

Once attached, the destination setting in fly.toml will be used to update the destination of the volumes. For example, the volumes will be mounted at /my/new/directory with this fly.toml config after running fly deploy:

[mounts]
destination = "/my/new/directory"

Scaling

Scaling an app is different with Apps v2. Use fly machine clone to horizontally scale the app, even across regions:

fly machine clone 21781973f03e89
fly machine clone --region syd 21781973f03e89
fly machine clone --region ams 21781973f03e89

Now 4 machines are running for this app: the original machine plus three new ones. Use fly machine stop and fly machine remove to scale down the app:

fly machine stop 9080524f610e87
fly machine remove 9080524f610e87
fly machine remove --force 0e286039f42e86
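The clone commands above can be wrapped in a small loop when scaling out to several regions at once. This is a minimal sketch, not something flyctl provides: the helper name clone_to_regions and the DRY_RUN variable are made up for illustration, and the only flyctl invocation assumed is the fly machine clone --region form shown above. With DRY_RUN=1 (the default here) the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: clone one machine into each listed region.
# DRY_RUN and clone_to_regions are illustration-only names, not flyctl features.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # print the command instead of running it
  else
    "$@"           # run the command for real
  fi
}

clone_to_regions() {
  machine_id="$1"
  shift
  for region in "$@"; do
    run fly machine clone --region "$region" "$machine_id"
  done
}

# Example (dry run): prints one clone command per region.
clone_to_regions 21781973f03e89 syd ams
```

Setting DRY_RUN=0 would execute the commands instead of printing them, so it is worth reading the dry-run output first.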

Scale memory and cpu with fly machine update:

fly machine update --memory 1024 21781973f03e89
fly machine update --cpus 2 21781973f03e89

Processes

Processes continue to be supported in fly.toml. The big difference with apps v2 is you need to specify which machines are assigned to which processes.

fly deploy will update each machine based on its process group, applying only the services, cmd, and checks for that process.

Use fly machine update to assign a process group to a machine with:

fly machine update --metadata fly_process_group=app 21781973f03e89
fly machine update --metadata fly_process_group=app 9e784925ad9683
fly machine update --metadata fly_process_group=worker 148ed21a031189
fly deploy
==> Building image
Searching for image 'nginx' remotely...
image found: img_wd57v5nge95v38o0
Deploying dry-pond-1475 app with rolling strategy
  Machine 21781973f03e89 [app] update finished: success
  Machine 148ed21a031189 [worker] update finished: success
  Machine 9e784925ad9683 [app] update finished: success
  Finished deploying

Make sure to run fly deploy after updating these groups to ensure each machine gets the appropriate services, checks, and cmd. These are the key pieces of the fly.toml that configure the processes, with the one service using the "app" process group:

[processes]
  app = "nginx -g 'daemon off;'"
  worker = "tail -F /dev/null" # not a very useful worker!

[[services]]
  processes = ["app"]

fly machine clone can then be used to build out multiple instances within a process group, or to clone a machine and put it in a different process group:

fly machine clone --region gru 21781973f03e89
fly machine clone --process-group worker 21781973f03e89
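
When several machines need a process group, it can help to write the assignment down as a table and generate the commands from it. The sketch below is only an illustration under that assumption: plan_process_groups is a made-up helper name, it prints commands rather than running them, and it relies on nothing beyond the fly machine update --metadata fly_process_group=... form and the follow-up fly deploy shown above.

```shell
#!/bin/sh
# Hypothetical helper: read "MACHINE_ID GROUP" pairs from stdin and print the
# matching fly machine update commands, followed by the fly deploy that applies
# each group's services, checks, and cmd. Nothing is executed here.
plan_process_groups() {
  while read -r machine_id group; do
    # Skip blank lines and comment lines in the mapping.
    case "$machine_id" in ''|'#'*) continue ;; esac
    echo "fly machine update --metadata fly_process_group=$group $machine_id"
  done
  echo "fly deploy"
}

# Example mapping, using the machine IDs from this post.
plan_process_groups <<'EOF'
21781973f03e89 app
9e784925ad9683 app
148ed21a031189 worker
EOF
```

The printed commands can be reviewed and then run by hand.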

Checks

Checks defined in fly.toml are translated to checks on each machine. We don’t restart or stop routing traffic to machines based on these health checks yet. We’re working on that!

Failing checks will cause the deployment to fail. Use --detach to skip waiting for health checks to pass during fly deploy.

Secrets

Secrets continue to work with machines in Apps v2. Setting or unsetting secrets will result in a deployment that calls the update API on each machine to change some metadata. This causes the machine to update the secrets it is using.

fly secrets set DB_PASSWORD=supersecret
INFO Using wait timeout: 2m0s and lease timeout: 30m0s
Deploying long-sun-1337 app with rolling strategy
  Machine 06e8297ef31587 update finished: success
  Finished deploying

Release commands

Release commands defined in fly.toml are run in a new machine. flyctl will wait for the machine to finish and check the exit code. If the exit code is 0, the machine will be automatically destroyed and the deployment will continue. Otherwise, the deployment fails and the release_command machine is put into a destroying state for about an hour after which it is destroyed.
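
The exit-code rule above can be illustrated with a local analogy. This is a sketch only: the real release_command runs in a temporary machine and flyctl inspects that machine's exit code, whereas the hypothetical function below runs the command in a local subshell just to show the decision.

```shell
#!/bin/sh
# Local analogy only: the real release_command runs in a temporary machine.
# deploy_with_release_command is a hypothetical name used for illustration.
deploy_with_release_command() {
  release_cmd="$1"
  status=0
  sh -c "$release_cmd" || status="$?"   # capture the exit code without aborting
  if [ "$status" -eq 0 ]; then
    echo "release_command exited 0: destroy its machine and continue the deployment"
    return 0
  fi
  echo "release_command exited $status: fail the deployment" >&2
  return 1
}

# Example: a release command that succeeds.
deploy_with_release_command "true"
```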

We keep the failed release_command machine for a little bit in case it’s useful for debugging issues. Unfortunately it’s not possible to access the machine once it’s in the destroying state. We can clone it, though, with:

fly machine clone --clear-auto-destroy --clear-cmd MACHINE_ID

We set --clear-auto-destroy so the new machine won’t destroy itself on exit. We also unset the CMD, so the release command won’t run again. The default cmd for the machine will run. Use --override-cmd, instead of --clear-cmd, to manually set a cmd to run.

After that, we can use fly machine list to get the ip address for the machine, then use fly ssh console -A <ip> to access the machine. Then we can debug!

We have some ideas about how to simplify debugging release_commands. Let us know what you’d like to see. We’ll announce those once they are available.

Restarting the app

fly apps restart APP_NAME will restart all machines in the app. This includes both machines that have the Fly Apps Platform metadata as well as other machines that may have been created.

Migrating existing apps

If you already have an app with machines, running fly deploy will convert it to an apps v2 app. You’ll be prompted to do the migration.

We don’t currently support migrating apps from Nomad to Machines. We’ll announce when that’s available.

Warning: this will overwrite the config for all these machines, based on the values set in fly.toml and the existing config on the machines. As an example, the services and environment values will come from fly.toml, replacing whatever was present before. Any mounts will not change, though fly deploy may change the mount path if the destination path under the [mounts] section in fly.toml is different from what’s currently on a machine.

containerops · January 21, 2023, 2:37am

Do the release commands respect the environment variable PRIMARY_REGION like apps v1? I mean, is the machine spawned on PRIMARY_REGION?

Is there an ETA for implementing canary deployments?

We considered rolling our own zero downtime strategy using the API but we don’t really want to do it right now.

kurt · January 21, 2023, 2:46am

The release command doesn’t pay attention to PRIMARY_REGION right now, that’s a good thing to bring up. Added an issue here: Apps v2: Release command should attempt PRIMARY_REGION env var · Issue #1608 · superfly/flyctl · GitHub

We aren’t sure when we’re going to do canary deployments. We’re focusing on things that don’t require creating new machines, vs just updating existing ones. This is much more resilient. Most problems we run into with our current infrastructure are caused by VM churn.

It’s worth trying a rolling deploy with machines to see if you really need canary deploys. Deploys with machines do some magic to make an update incredibly fast. We pull the new image and prep everything before restarting the VM. It happens so fast, Postgres doesn’t even failover.

If you’re running 2+ machines per app, there’s a good chance a machines rolling deploy will be zero downtime.

ignoramous · January 21, 2023, 9:25am

Nice.

I want to use this new release.

I submitted a bunch of fixes for flyctl wrt to Machines back in the day that I needed (and since then continue to maintain my very old fork) but couldn’t be merged: Improvements to deploy and run commands for machines by ignoramous · Pull Request #1327 · superfly/flyctl · GitHub

If they have been fixed in main, I’d move to it.

The blocking issues for me are:

leases for apps with 40+ Machines just didn’t work as expected with rolling deploys. Always had (and still have) to do an immediate deploy.
flyctl deploy -a <app-name> ... didn’t really work.

The PR fixed (or tried to fix) the underlying issues.

It’d be nice if we could use Machine names instead of IDs that increasingly have a lot of hex chars common for some reason. I always have to double check the ID before execing some of the more onerous commands (remove -f for example): [FR] flyctl m update with machine-name · Issue #1293 · superfly/flyctl · GitHub (I volunteer to impl if this is something that folks at Fly are open to merging).

containerops · January 21, 2023, 9:42pm

Would be awesome to have the machine metadata available to the instance as environment variables.

containerops · January 24, 2023, 1:35pm

Is there any way for us to get this information? I played a bit with the GraphQL API but checks are empty for my app (I tried digging into app.healthChecks, app.allocations, app.machines. ...checkState, etc.).

I’m wondering how we can do a sanity check once in a while to ensure the desired state. Do you have a suggestion? Any input is valuable, we are willing to do something on our side using the API.

I see it’s possible to get a list of running machines (machines(state: "started")), and we can use it to detect issues with the underlying host/orchestrator. But I suppose the state will still be started in case of problems on our side (e.g. app exited with an error).

By the way, is there a way to get the desired state using the API? As in, “wants 4 machines in iad, 2 machines fra, 1 machine syd, 1 machine in gru”.

catflydotio · January 29, 2023, 4:44pm

Update 29 Jan: If you want to play with this, use prerelease 5, not the latest prerelease 6. Here’s how to get prerelease 5, which has Apps V2 features:

curl -L https://fly.io/install.sh | sh -s 0.0.451-pre-5

Prerelease 6 is good if you want to test out the new Rails Dockerfile generator: Cut over to Rails Dockerfile Generator on Sunday 29 Jan 2023

Brad · February 1, 2023, 7:22pm

@catflydotio any chance you could cut a pre-7 release that’s rebased on top of @samruby’s Rails pre-6 changes? I’ve got some testing to do for Rails apps. Thanks!

catflydotio · February 1, 2023, 7:23pm

Ah, I was going to post here as soon as I’ve checked, but I believe 0.0.452-pre-1 is Apps V2 again

Hypermind · February 2, 2023, 6:36am

This breaks my mental model. It’s so convenient when things revolve around the apps. But the notion of a machine breaks that powerful simplicity. Why do we have to think about machines when everything most businesses care about is apps? Instance resources, regions, number of instances. That’s almost all we need to be happy.

As a side note, this is one of the main reasons we are migrating from Azure to Fly. The incremental complexities added to Azure through the years made working with it unbearable. The amount of configuration and lock-in is through the roof. It was a very good day when we deleted convoluted Azure resources after the migration to Fly.

Now it seems that Fly is going the same route, trying to make things more complex than they need to be for the end users. I always thought that people love Fly because it has a simple mental model, but maybe I was mistaken. Maybe people love playing with machines… If so, then what makes Fly different from offerings like Digital Ocean? Those relics are painful to manage in this day and age, and this is why we chose Fly (Apps v1) in the first place. Because we do not want machines! Those who want them use VPS or self-host.

I can demonstrate an analogy from the world of programming. When a program is written in functional style and consists of mostly stateless functions without side effects, such a program is quite easy to develop, manage, scale and debug. However, the same program can be written in a different way. For instance, it can have tons of shared state. The presence of that state makes the progress of development slow, code changes become brittle, the algorithms written in that style are prone to errors. Bug fixes are hard because every fix tends to have a side effect that breaks another seemingly unrelated thing. The excessive presence of state takes a toll on multi-threading, etc.

Now, how does it all relate to Fly? In my opinion, a machine is the ultimate form of state, and any state is the ultimate form of pain, which makes the machine = pain analogy quite vivid. Cloning a machine for scaling means cloning a state. And we know that state is pain, so our pain multiplies. Scaling becomes brittle, complex and dangerous. It becomes something I would never ever touch again (hello Azure).

In contrast, Fly Apps v1 have no observable state. The Dockerfile defines all the state the app should have (there is a bit of additional state for Fly scaling parameters, DNS, volumes, and that is it). This simple fact gives a monster power to the whole paradigm, because all I have to worry about is just the Dockerfile. I can put it in a Git repo and upgrade/rollback as I see fit. Every change is self-documented in the history of the Dockerfile. The absence of pesky state means productive work and happy life.

kurt · February 2, 2023, 2:47pm

Instructions that involve fly machine ... commands are temporary. What’s going away are these commands. They, or something similar, will come back when we figure out the right UX:

fly scale count
fly regions ...
fly autoscale

These cause endless pain, because they don’t behave the way people expect. fly machine clone doesn’t surprise people, and is a decent stand in for now.

Hypermind · February 2, 2023, 3:08pm

Thank you for paying attention. I am glad to hear it.

pollux · February 2, 2023, 9:35pm

I’d argue against deprecating it. I love these commands: add a region, increase the scale count and autoscale ratio. The main reason I came to fly was because of these commands; you get the hang of it after tinkering with the commands for a while.

Absolutely loathe the fly machine commands!

charsleysa · February 2, 2023, 11:01pm

I would be fine with something like fly apps scale count [number] [region] that simply does fly machine clone under the hood for me as we have plenty of apps that are ephemeral and don’t care which VMs get created / deleted.

It would be a negative experience to take away fly scale count and have no replacement for it as it’s quite useful for quickly scaling by running a single command. Having to run multiple commands many times in order to get the same functionality is, in my opinion, a step back in developer experience.

kurt · February 3, 2023, 3:20pm

Yeah this is roughly what we’re after. The awkward part is multi process apps. Right now we have:

fly scale count web=3 worker=10

We can probably make that work just by checking the region of existing machines. If they’re all the same, we can infer that’s the right region.

Does this make sense to do when we can’t infer the region?

fly scale count web=2 worker=10 --region=syd
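
The inference described above (use the region of the existing machines when they all agree) could look roughly like this. It is a sketch of the idea only, not flyctl code: infer_region is a hypothetical function that takes the regions of the app's existing machines as arguments.

```shell
#!/bin/sh
# Hypothetical sketch: given the regions of an app's existing machines
# (one per argument), print the region if they all match, else fail.
infer_region() {
  [ "$#" -gt 0 ] || return 1               # no machines: nothing to infer
  first="$1"
  for region in "$@"; do
    [ "$region" = "$first" ] || return 1   # mixed regions: cannot infer
  done
  echo "$first"
}

infer_region syd syd syd                                    # prints "syd"
infer_region syd ams || echo "cannot infer; pass --region"  # prints the fallback message
```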

kurt · February 3, 2023, 3:40pm

@ignoramous when’s the last time you had a lease issue? Are you doing deploys that take longer than 30m? This stuff has all basically been rewritten since December, so pre December behavior has changed.

We’re planning to make fly machine stop <name> work, but not until we get this first release out. Machine IDs actually include routing information, so we know exactly which hardware to send a stop command to. Using names is going to take a little bit of infrastructure work.

charsleysa · February 3, 2023, 7:57pm

That makes sense, though inferring the region will have to work consistently otherwise it could be a jarring experience. I can’t speak too much for multi process apps as we don’t use them (mainly due to limitations around things like internal DNS).

Maybe the distribution of vms across regions could be a config. By default it’s even distribution across all specified regions (e.g. regions: [syd, sin, nrt] and region_placement: balanced results in a scale count of 9 placing 3 vms in each region), but there’s also the option of setting counts per region (e.g. regions: [syd, sin, nrt] and region_placement: per_region results in being able to set the scale count to 5 for syd, 3 for sin, and 1 for nrt).
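
The balanced case can be worked through with a little arithmetic. The sketch below is purely illustrative: balanced_placement is a hypothetical function, and handing any remainder to the first regions in the list is an assumption, since the proposal above only covers counts that divide evenly.

```shell
#!/bin/sh
# Hypothetical sketch of "balanced" placement: split a scale count evenly
# across regions. Assumption: leftover vms go to the first regions listed.
balanced_placement() {
  count="$1"
  shift
  n="$#"
  [ "$n" -gt 0 ] || return 1        # need at least one region
  base=$((count / n))
  extra=$((count % n))
  i=0
  for region in "$@"; do
    vms="$base"
    if [ "$i" -lt "$extra" ]; then
      vms=$((base + 1))             # this region takes one of the leftovers
    fi
    echo "$region=$vms"
    i=$((i + 1))
  done
}

# Example from the post: a scale count of 9 over three regions gives 3 each.
balanced_placement 9 syd sin nrt
```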

ignoramous · February 4, 2023, 8:19am

The last time I tried rolling was in Sep/Oct, 2022. Haven’t tried after.

~~Thanks for the heads-up. I’ll try again with the latest flyctl and the default strat (rolling) once the builders are up (having trouble deploying right now).~~

~~Error failed to fetch an image or build from source: error connecting to docker: You hit a Fly API error with request ID: 01GRDQDGTJMF9QFPXXFNH3EQ61-maa~~

Gotcha. I was wondering if flyctl can loop through machine names and substitute them for IDs. Back when I looked at the code last year, it seemed doable, but the flyctl code base has been through a lot since then.

Update: Kurt, the rewrite has made things better! rolling deploy works just fine for an app with 30+ machines using flyctl-0.0.451.

pier · February 15, 2023, 2:03am

Waaaait a minute…

So you’re saying…

I can finally have an app with 8 shared cores and 2GB of RAM?

exciting

pier · March 1, 2023, 2:42pm

So I created a new app from scratch using fly launch and presumably this should have created a v2 app?

Then tried to add 8 shared cores via the dashboard but got this error:

I contacted billing@fly.io yesterday but didn’t get a response.

So I tried with the CLI but fly machines list says No machines are available on this app so I can’t use fly machines update. Although from the first post on this thread using fly launch should have created a new machine?

Yesterday I used fly v0.0.464 to create the app. I’ve just updated to fly v0.0.473 and will try again.

Edit:

Same result with fly v0.0.473.