Quite often an HTTP request will kick off a long-running job. In my case it's GitHub webhooks, which I must answer with a 2xx within 10 seconds, or GitHub will time out the request.
So in my case, I simply create a work item on a queue and then immediately return a 2xx.
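To make the enqueue-then-ack pattern concrete, here's a minimal sketch. The names (`handle_webhook`, `job_queue`) and the in-process `queue.Queue` are illustrative, not from my actual setup; in production the queue could just as well be Redis, SQS, etc.:

```python
# Enqueue-and-ack: the handler puts the payload on a queue and returns
# 2xx right away; a background worker drains the queue separately.
import queue
import threading

job_queue: "queue.Queue[dict]" = queue.Queue()

def handle_webhook(payload: dict) -> int:
    """Enqueue the work item and acknowledge immediately."""
    job_queue.put(payload)
    return 202  # accepted: GitHub sees a 2xx well within its 10s limit

def worker() -> None:
    """Drain jobs in the background, independent of the HTTP response."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel used for shutdown
            break
        # ... do the long-running work here ...
        job_queue.task_done()
```

The key property is that `handle_webhook` returns in microseconds regardless of how long the job itself takes.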
However, it would be great if I could enable "scale to zero" on Fly.io for my service. Right now that isn't possible, because the machine would scale to zero well before my job queue is empty.
There are many other situations where this could be useful. Therefore I propose the following feature:
A busy endpoint/busy check, configurable in fly.toml (just like health check endpoints), which Fly could query to see whether an instance can be shut down. In the future this could also enable functionality such as persistent leaders and a dynamic number of workers/helper instances: keeping the leader alive would avoid a leader re-election.
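To sketch what I mean, a hypothetical fly.toml section modeled on the existing HTTP health-check syntax might look like the following. To be clear, `[busy_check]` and its keys are entirely invented for illustration; no such option exists in fly.toml today:

```toml
# HYPOTHETICAL -- this section does not exist in fly.toml; it is only
# a sketch of the proposed feature, modeled on http health checks.
[busy_check]
  port = 8080
  path = "/busy"
  interval = "30s"
  # Idea: the Machine is only eligible for scale-to-zero when this
  # endpoint reports "idle"; a "busy" response keeps it alive.
```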
We don't have plans to add anything like that at the moment, but here's something that might work:
By default, we send a SIGINT to stop your app, as documented here (search for kill_signal). You can also bump the timeout to a maximum of either 5 minutes or 1 day, depending on which VM type you're using.
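The relevant settings are top-level keys in fly.toml. Something like this (the exact value-format accepted for the timeout has varied between integer seconds and duration strings, so check the fly.toml reference for your platform version):

```toml
# fly.toml -- shutdown behavior
kill_signal  = "SIGINT"  # signal sent when Fly stops the Machine (the default)
kill_timeout = "5m"      # grace period before a hard kill
                         # (older configs use integer seconds, e.g. 300)
```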
What I'd do is bump the timeout and add a signal handler that sets a global flag (or some other globally accessible state) when SIGINT is received. Then, when your job queue is empty or at other appropriate points, check that flag and exit if it's set.
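Here's a sketch of that flag-based approach in Python, assuming an in-process `queue.Queue` for the jobs (all names here are illustrative). The handler only records that shutdown was requested; the worker loop decides when it's actually safe to exit:

```python
# Graceful shutdown via a signal-set flag: the SIGINT handler flips an
# event, and the worker loop exits only once the queue is drained.
import queue
import signal
import threading

shutdown_requested = threading.Event()
job_queue: "queue.Queue[dict]" = queue.Queue()

def _on_sigint(signum, frame):
    # Don't exit here -- just record that we've been asked to stop.
    shutdown_requested.set()

signal.signal(signal.SIGINT, _on_sigint)

def run_worker() -> None:
    """Process jobs until the queue is empty AND shutdown was requested."""
    while True:
        try:
            job = job_queue.get(timeout=1.0)
        except queue.Empty:
            if shutdown_requested.is_set():
                return  # queue drained and Fly wants us gone: exit cleanly
            continue
        # ... process job ...
        job_queue.task_done()
```

Because the flag is only checked between jobs, a job in flight always finishes before the process exits (as long as it finishes within kill_timeout).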
The correct way to do this is to disable auto_stop_machines and have your worker process exit when it’s done running jobs (or hasn’t been running jobs for x time, or whatever other criteria you need).
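A sketch of that exit-when-idle variant (with `auto_stop_machines` disabled in fly.toml, the process exiting on its own is what stops the Machine). The idle window and names below are illustrative, assuming the same in-process queue as above:

```python
# Worker that exits on its own after being idle for IDLE_LIMIT seconds.
import queue
import time

IDLE_LIMIT = 0.5  # seconds of emptiness before exiting (illustrative)
job_queue: "queue.Queue[dict]" = queue.Queue()

def run_until_idle() -> int:
    """Process jobs; return the number handled once idle for IDLE_LIMIT."""
    handled = 0
    idle_since = time.monotonic()
    while True:
        try:
            job = job_queue.get(timeout=0.1)
        except queue.Empty:
            if time.monotonic() - idle_since >= IDLE_LIMIT:
                return handled  # idle long enough: let the process exit
            continue
        # ... process job ...
        handled += 1
        idle_since = time.monotonic()  # reset the idle clock after work
```

Swap the idle-time criterion for whatever condition fits your workload (queue depth, a drain flag, a scheduled window, etc.).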