I am interested in keeping machines alive if they have an active websocket connection, possibly lasting 6 hours. This would mean that they are not affected by scaling or deploys.
My thought is that the scale/deploy system could tell the machine to shut down (or just stop sending it traffic), and the machine could report when it is ready to shut down (that is, when it no longer has traffic). In the meantime, the scale/deploy system would just deploy new machines as needed (assuming my limit is high enough).
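A minimal sketch of the machine-side half of that idea, assuming the app itself counts its open websocket connections. The class name `DrainTracker` and its methods are hypothetical, not part of any Fly.io API; how the drain request arrives (a signal, an HTTP endpoint, etc.) is left open.

```python
import threading


class DrainTracker:
    """Hypothetical helper: tracks open connections and a drain request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self._draining = False

    def connection_opened(self):
        """Register a new connection; returns False if we are draining."""
        with self._lock:
            if self._draining:
                return False  # assumption: refuse new work once draining
            self._active += 1
            return True

    def connection_closed(self):
        """Register that a connection has ended."""
        with self._lock:
            self._active = max(0, self._active - 1)

    def request_drain(self):
        """Called when the scale/deploy system asks this machine to wind down."""
        with self._lock:
            self._draining = True

    def ready_to_stop(self):
        """True only when a drain was requested and no connections remain."""
        with self._lock:
            return self._draining and self._active == 0
```

The app would poll `ready_to_stop()` (or check it whenever a connection closes) and shut itself down once it returns True.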
Hi… People who want specialized orchestration like this seem to gravitate toward writing their own, using the Machines API. There’s a new official doc that introduces this topic, as Fly.io sees it. (I don’t speak for them, myself, in any way.)
It starts out talking about declarativity primarily, but then gets into the more general kind of question that you were asking…
You can use it to:
Spin up new Machines in specific regions
Roll out changes a Machine at a time
Build custom scaling logic
Tear down and rebuild environments on demand
This is what Fly.io’s own orchestration tools use under the hood. You can spin up Machines, wire them together, and orchestrate them across regions, just like we do. You’ve got all the same knobs and levers we use internally, and you can build exactly the kind of workflow your setup needs.
I do this! I have one machine that is always stopped and is managed by fly deploy. I then have another app that, when a request comes in, clones that machine (removing the flyctl metadata so the clone isn’t replaced by fly deploy) and fly-replays the request there.
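A rough sketch of the two small pieces involved, not the poster's actual code. It assumes the cloned machine's config is a plain dict with a `metadata` mapping, and that the flyctl-owned keys are the ones starting with `fly_`; that prefix is an assumption, so check the metadata on your own machines first. Both function names are hypothetical. The `fly-replay` response header with an `instance=` target is how the request gets handed to a specific machine.

```python
def strip_flyctl_metadata(config):
    """Return a copy of a machine config without flyctl-owned metadata.

    Assumption: flyctl's metadata keys start with "fly_".
    """
    cleaned = dict(config)
    metadata = config.get("metadata") or {}
    cleaned["metadata"] = {
        key: value for key, value in metadata.items()
        if not key.startswith("fly_")
    }
    return cleaned


def replay_headers(machine_id):
    """Response headers asking the Fly proxy to replay the request on one machine."""
    return {"fly-replay": f"instance={machine_id}"}
```

The cleaned config would then be used as the config for the new machine created through the Machines API, and the router app would answer the incoming request with the replay headers.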
Additional features that might be worth considering:
fly deploy --build-only --push will create a new image but not restart any machines
fly machine cordon will prevent additional requests from being routed to a running machine.
exiting with a status code of 0 will stop a machine
With these primitives (which you can script using flyctl or the Machines API - your choice), you can build a new image, start new machine(s), destroy existing stopped machines, and periodically check back later and destroy the rest.
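The decision step in that loop could be sketched as below. This is only the planning logic, with no network calls: it assumes each machine is a dict with an `id`, a `state`, and its image under `config.image`, roughly what a machine listing gives you, so adjust the field names to whatever you actually get back. The function name `plan_rollout` is hypothetical.

```python
def plan_rollout(machines, new_image):
    """Sort machines into what to do with each during a slow rollout.

    machines: list of dicts with "id", "state", and "config" -> "image"
              (assumed shape; adapt to your real data).
    new_image: the image reference produced by the build-only deploy.
    """
    plan = {"keep": [], "destroy_now": [], "check_later": []}
    for machine in machines:
        image = machine.get("config", {}).get("image")
        if image == new_image:
            # Already on the new image: leave it alone.
            plan["keep"].append(machine["id"])
        elif machine.get("state") == "stopped":
            # Old image and already stopped: safe to destroy right away.
            plan["destroy_now"].append(machine["id"])
        else:
            # Old image but still running (maybe holding connections):
            # cordon it and look again on the next pass.
            plan["check_later"].append(machine["id"])
    return plan
```

Run it periodically: each pass destroys whatever has stopped since the last one, until `check_later` comes back empty.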