Feature Request: add machines count and regions in `fly.toml`, make `fly.toml` stateless and declarative

Currently `fly deploy` is not completely declarative, because the machine count and regions are controlled with `fly scale count`. This was also the case previously with the machine size, which has now thankfully been moved into `fly.toml`.

Moving the scale count and regions options into `fly.toml` would make the configuration stateless and declarative. Right now you have to run `fly scale count x` separately, after checking what the current machine count and regions are.

Also, if you manually stop some machines and then run `fly deploy`, the stopped machines will remain stopped, which is never what you want: you would want your infra to scale back to the desired machine count, which could be specified in `fly.toml`.

Adding machine count and regions values to `fly.toml` could make `fly deploy` stateless and declarative, removing countless bugs caused by hidden state from `fly scale count` and other imperative commands.
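
As a sketch of the idea, a hypothetical `fly.toml` section might look like this (the `[scale]` table and its `count` and `regions` keys are my invention, not current `fly.toml` syntax):

```toml
app = "my-app"
primary_region = "syd"

# Hypothetical: desired machine count and regions declared in the
# config, so `fly deploy` could reconcile running machines against
# this state instead of relying on `fly scale count`.
[scale]
count = 3
regions = ["syd", "iad"]
```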

Another thing you should do, in my opinion, is move all the orchestration from the CLI to your servers, so you don't accidentally kill an in-progress `fly deploy` command and leave your infra in a broken state. You could then decide what happens when a deployment is stopped while in progress: complete it or roll it back.

One nice thing about fly.io as a platform is that you can write your own orchestration. That may sound like a brush-off, but I truly mean it in the "let's work together" sense. What you get out of it is that you are up and running faster; what we get out of it is that once a feature ships there is an expectation that it is supported, so we are motivated to ensure the feature is right and complete before it ships.

So, let's iterate on this together. I'd suggest focusing on your last comment first; then we can add in scaling, and once that's complete we can explore baking the result into flyctl.

One nice feature of `fly deploy` is that if you are running it in Sydney, Australia, the orchestration occurs in Sydney rather than in northern Virginia. So let's keep that.

Another nice feature of fly.io is that our machines start quickly, so you can start an orchestrator whenever you need one. An example of that can be found here:

That example injects build secrets into the build process. If you don't need that, it can be removed. It also suggests running with `fly console`, but you can substitute `fly machine run`, which has a handy `--detach` flag.

Taken together, you can have a one-line command, runnable today, that runs your deployment on our servers. Put that command in a one-line script, and the result will be no harder than running `fly deploy` today.
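
That wrapper could be sketched like this (the app and image names are placeholders, and the script only assembles and prints the command so the sketch is self-contained; in real use you would exec it):

```shell
#!/bin/sh
# Hypothetical wrapper: run the deploy orchestration on a Fly machine
# instead of on your laptop. APP and IMAGE are placeholders.
APP="my-app"
IMAGE="registry.fly.io/${APP}-deployer:latest"

# --detach returns as soon as the machine is started, so killing this
# local script does not interrupt the server-side deployment.
CMD="fly machine run $IMAGE --app $APP --detach"
echo "$CMD"   # in real use: exec $CMD
```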

Now that we have a client-side script running a server-side script that runs your deploy on our servers, the server-side script can be enhanced with whatever scaling behavior you like. The general case is more complicated than it might seem, because you can have a different number of machines in different regions, and perhaps even different sizes. But you might not need all of that; something far simpler may suffice.
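
The simple case could be sketched as a per-region reconcile step like this (region names and counts are illustrative, and the current count is stubbed in as a parameter so the sketch is self-contained; a real script would derive it from `fly machines list --json`):

```shell
#!/bin/sh
# Reconcile sketch: compare desired vs. actual machine count per region
# and report what scaling action would be taken.
reconcile() {
  region=$1; want=$2; have=$3
  if [ "$have" -lt "$want" ]; then
    # a real script would create machines here
    echo "scale up $region: $have -> $want"
  elif [ "$have" -gt "$want" ]; then
    # a real script would destroy surplus machines here
    echo "scale down $region: $have -> $want"
  else
    echo "ok $region: $want machines"
  fi
}

reconcile syd 3 1
reconcile iad 2 2
```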

Once we have some experience with this, we can come back with a concrete proposal: perhaps additional fields or sections in `fly.toml`, and perhaps even a `--detach` flag on `fly deploy`.

Hi… As a small side note, I disagree with how strongly this is phrased, if only because someone with a lot of machines explicitly chimed in last year to say that they did want this behavior:

This was in response to…

Maybe the proposed `fly.toml` setting helps establish a middle ground, but more care would have to be taken with backward compatibility. :dragon:

It hadn't occurred to me, but I benefit from this behavior now. I have print-on-demand servers that are mostly idle, and `fly deploy` of new versions is merely a matter of building images that will be loaded the next time one of those machines wakes up.

Spec’ing out correct behavior is likely to be harder than implementing that behavior.

I liked that blog post! Also Annie Sexton’s “pickle jar” analogy:

https://fly.io/blog/delegate-tasks-to-fly-machines/

(There are so many pickle jars in the analyst world, but it often all revolves around what seems—on the outside—like personal exploratory whim.)

Running fly deploy inside a machine has some problems:

  • you will need a database to ensure you don't run two deployments concurrently for the same app
  • the orchestrator machine would need a special role in the dashboard, similar to builder machines, adding complexity
  • there is no reason to create a separate orchestrator machine for each deploy command; you could share one orchestrator across all deployments and users and put it behind an API
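
For illustration, the mutual-exclusion concern in the first point can be sketched with an atomic lock (this uses a local `mkdir` lock as a stand-in; a real multi-host setup would need the shared database the list mentions):

```shell
#!/bin/sh
# Sketch of a per-app deploy lock. mkdir either creates the directory
# atomically or fails, so only one process can hold the lock at a time.
LOCK="${TMPDIR:-/tmp}/deploy-my-app.lock"

if mkdir "$LOCK" 2>/dev/null; then
  echo "lock acquired, deploying"
  # ... run the deployment here ...
  rmdir "$LOCK"
else
  echo "another deployment is in progress"
fi
```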

My proposal is to add an endpoint to the Machines API, for example `POST /v1/apps/{app_name}/deploy`. This endpoint would receive the `fly.toml` configuration as JSON and run the deployment. If you kill the command, the deployment continues. Another user cannot deploy an app while a deployment is already in progress.
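
Concretely, the hypothetical request might look something like this (the path, body shape, and field names are all assumptions for illustration, not an existing Machines API endpoint):

```
POST /v1/apps/my-app/deploy HTTP/1.1
Content-Type: application/json

{
  "image": "registry.fly.io/my-app:deployment-0001",
  "config": { "...": "fly.toml rendered as JSON" }
}
```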

`fly deploy` would simply call the Machines API endpoint after publishing the Docker images. In the API backend you can reuse the flyctl code as it is today. This change could initially be implemented behind a flag to see how the system behaves, for example `--remote-orchestration` (`--detach` is already taken; it doesn't wait for the machine health checks before returning).

Another benefit of this approach is that if you ship a bug in the flyctl code, you can quickly fix it in your API backend without waiting for users to update their CLI.

One nice feature of fly deploy is that if you are running it in Sydney, Australia the orchestration occurs in Sydney rather than northern Virginia.

In my opinion this doesn't add much value: the orchestrator calls the Machines API, so it makes more sense to put the orchestrator on the same machine, or in the same location, where the API is hosted, to get lower latency.

Our API is hosted on every host. A brief overview can be found in Our Stack. We even host our API on edge servers that have no machines.

If you have a design in mind that involves a central database, that would be a new database, one specific for that application. That database could be sqlite3 (with or without litefs), postgres, or any of a myriad other options.

At a minimum, that could be a useful blueprint that others can use and build upon (that page is sparse at the moment because it is literally only days old). And the pieces that make sense can be migrated to flyctl (which runs on your laptop) or to flaps (not mentioned in the overview I linked above), which runs on every fly.io machine and edge server.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.