Hi there,
I have some fly machines that are managing inference workloads that take a very long time (5-10+ minutes) to complete. I’d like to be able to upgrade these machines, so that the old machines can continue to interact with clients that have in-progress inferences, but new requests will go to the new machines.
Is there a recommended deploy pattern here? Maybe somehow mark old machines as “old” and have them start fly-replaying to new machines once the upgrade has started, then stop themselves when they’re done with their last request?
Rolling deploys seem to kill the machines after a maximum of five minutes, which is too short. Plus I would need the new machine to be taking new requests in that time.