Extended maximum kill_timeout option

nrf · June 6, 2024, 6:07am

I want to use Fly.io to run a conversational AI agent, but it might be in the middle of a conversation with a user when I deploy. Is there a way to extend kill_timeout to 10 minutes?

roadmr · June 6, 2024, 2:01pm

Hi @nrf,

The maximum value of kill_timeout is 300 seconds (5 minutes). This is because its purpose is not to “allow a machine to finish arbitrarily long-running processes”, but to “give it reasonable time to pack things up, because a new app-wide deployment was explicitly requested by the operator and we have to complete it as soon as possible”.

If the 5-minute maximum kill_timeout is not enough for your application, we recommend changing your deployment and worker-spawning strategy so that long-running machines can finish their work undisturbed, while changes to other kinds of machines can still be deployed without interruption.

As an example, a typical pattern is a web or API service that starts long-running worker processes. Instead of starting the worker in the same machine as the web or API service, you can start worker machines on demand and have them grab work from a queue (for example RabbitMQ or Redis). Once a worker machine has finished its work, it can shut down entirely. That way, worker machines are not affected by a deploy and keep running even if you fly deploy a new version of the web/API service.
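To make the worker side of that pattern concrete, here is a minimal sketch in Python. It is not taken from any Fly.io project: `run_worker` and `handle` are hypothetical names, and the standard-library `queue.Queue` stands in for a real broker such as Redis or RabbitMQ. The idea it illustrates is only that the worker drains the queue and returns once it has been idle for a while, so the process can exit and the machine can stop on its own schedule rather than on the deploy's.

```python
import queue
from typing import Callable


def run_worker(jobs: "queue.Queue[str]",
               handle: Callable[[str], None],
               idle_timeout: float = 5.0) -> int:
    """Process jobs until the queue stays empty for idle_timeout seconds.

    Returns the number of jobs handled. Once this returns, the worker
    process can exit, which lets its machine shut down.
    """
    handled = 0
    while True:
        try:
            # Block until a job arrives or the idle timeout expires.
            job = jobs.get(timeout=idle_timeout)
        except queue.Empty:
            # Nothing left to do: stop instead of waiting forever.
            return handled
        handle(job)
        jobs.task_done()
        handled += 1


if __name__ == "__main__":
    # In-process stand-in for a real queue; the job names are made up.
    pending: "queue.Queue[str]" = queue.Queue()
    for conversation_id in ("conv-1", "conv-2"):
        pending.put(conversation_id)
    count = run_worker(pending, lambda job: print("handling", job),
                       idle_timeout=0.1)
    print("handled", count, "jobs, exiting")
```

With a real broker, the `jobs.get(...)` call would be replaced by that broker's blocking fetch, and the web/API service would be the one enqueueing work and starting worker machines.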

Here’s an example of a small project that implements this pattern in case you want to have a look:

If spawning machines on demand does not fit your use case, the alternative is to maintain a list of machines that need to be updated once you have a new image. Then, instead of running fly deploy, wait for each machine to finish its long-running task, and once you know it’s idle, run fly machine update --image to move it to your new image version. In essence you’re driving the deployment “manually”, with a lot more control over when and how a machine gets stopped. This process probably suits an application that handles long-running requests itself, without handing them over to separate worker processes or machines.
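The manual approach above could be scripted roughly as follows. This is a sketch under assumptions, not an official tool: `rolling_update`, `is_idle`, and `fly_update_image` are hypothetical names, and how you decide a machine is idle is entirely up to your application (for instance an internal endpoint that reports zero active conversations). The only flyctl call used is `fly machine update` with `--image`, `--app`, and `--yes`.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def fly_update_image(machine_id: str, image: str, app: str) -> None:
    """Update a single machine to a new image via flyctl."""
    subprocess.run(
        ["fly", "machine", "update", machine_id,
         "--image", image, "--app", app, "--yes"],
        check=True,  # raise if flyctl reports a failure
    )


def rolling_update(machine_ids: Iterable[str],
                   is_idle: Callable[[str], bool],
                   update: Callable[[str], None],
                   poll_seconds: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Update each machine only after is_idle says it has no work in progress.

    is_idle is application-specific and must be supplied by the caller.
    Returns the machine ids in the order they were updated.
    """
    pending = list(machine_ids)
    updated: List[str] = []
    while pending:
        ready = [m for m in pending if is_idle(m)]
        for m in ready:
            update(m)
            updated.append(m)
            pending.remove(m)
        if pending and not ready:
            # Every remaining machine is busy: wait before polling again.
            sleep(poll_seconds)
    return updated
```

To drive real machines you would pass something like `functools.partial(fly_update_image, image=..., app=...)` as `update`. Note that there is a gap between the idle check and the update, so the application should stop accepting new long-running work on a machine before reporting it as idle.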

One interesting thing here is that flyctl deploy is relatively opinionated, but as you’ve seen, it mostly builds on the lower-level fly machine command and the Fly Machines API. So if the deployment strategies offered by fly deploy don’t suit your needs, you can orchestrate deployments yourself using those primitives directly.

Daniel

