I want to support long-running requests for streaming AI responses. Requests can take minutes. I currently prevent fly.io from scaling down the machines by capturing SIGINT signals and waiting until requests are done before stopping the machine.
When doing a bluegreen deployment, will the fly deploy command wait for machines to exit? This could take up to 5 minutes in my case.
Is there a way to prevent fly deploy from waiting for old machines to stop?
--detach doesn’t change the semantics of a deployment strategy at all, so it’s not a solution for the problem you stated.
Bluegreen will cordon old machines before destroying them, but I don’t believe it will wait for minutes for those machines to shut down gracefully.
If your app cannot tolerate killing a streaming response and restarting it for a user when a deploy happens, then you should probably write your own custom deployment code. The basic gist would be to create a bunch of new machines, cordon the old ones, somehow tell the cordoned machines that they need to shut down, and then poll all the cordoned machines, destroying each one as it moves into the stopped state. The deploy is complete once all the old machines are destroyed.
Or you can do the really naive thing: cordon the old machines, wait 30 minutes, then destroy them, under the assumption that all streaming will have finished after 30 minutes of being cordoned. This won’t work if you deploy really frequently, because bluegreen doubles the number of machines and the old ones would still be around when the next deploy starts.
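To make the two approaches above concrete, here is a minimal sketch of the drain step in Python. It is not how flyctl does it. MachineClient and its four methods are hypothetical names for whatever you use to talk to your machines (the Machines API, flyctl, your own endpoint), and the poll interval and deadline are placeholder values. The loop destroys each cordoned machine as soon as it reports the stopped state, and falls back to the naive "destroy after a fixed wait" behaviour for anything still running at the deadline.

```python
import time
from typing import Callable, Iterable, Protocol


class MachineClient(Protocol):
    """Hypothetical interface; implement it with whatever API access you have."""

    def cordon(self, machine_id: str) -> None: ...
    def request_shutdown(self, machine_id: str) -> None: ...
    def state(self, machine_id: str) -> str: ...
    def destroy(self, machine_id: str) -> None: ...


def drain_old_machines(
    client: MachineClient,
    old_ids: Iterable[str],
    poll_interval: float = 5.0,
    deadline: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Cordon old machines, ask them to stop, destroy each once it has stopped.

    Returns the IDs that were still running at the deadline and were
    destroyed anyway.
    """
    pending = list(old_ids)
    for mid in pending:
        client.cordon(mid)  # no new requests should be routed here
    for mid in pending:
        client.request_shutdown(mid)  # app finishes in-flight streams, then exits

    start = clock()
    while pending and clock() - start < deadline:
        still_running = []
        for mid in pending:
            if client.state(mid) == "stopped":
                client.destroy(mid)
            else:
                still_running.append(mid)
        pending = still_running
        if pending:
            sleep(poll_interval)

    # Naive fallback: whatever is left at the deadline is destroyed regardless.
    for mid in pending:
        client.destroy(mid)
    return pending
```

Creating the new machines and waiting for them to pass health checks would happen before calling drain_old_machines. The sleep and clock parameters are only there so the loop can be exercised without real waiting.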
I believe all the code for bluegreen is actually in the flyctl repo, so you can use that as the basis of your own strategy.