Feature Request: "busy" endpoint/busy check endpoint for fly.toml

gigurra · July 24, 2023, 3:23pm

Quite often an HTTP request will create a long-running job. For me it happens to be GitHub webhooks, which I must answer with a 2xx within 10 seconds or GitHub will time out the request.

So in my case, I simply create a work item on a queue and then immediately return 2xx.
However, it would be great if I could enable “scale-to-zero” on fly.io for my service. Right now that isn’t possible, because the service will scale to zero well before my job queue is empty.

There are many other situations where this could be useful. Therefore I propose the following feature:

A busy endpoint/busy check, which can be configured in fly.toml (just like health check endpoints), where fly can query an instance to see if it can be shut down. In the future this could also enable functionality such as persistent leaders and dynamic number of workers/helper instances - keeping the leader alive (avoiding a leader re-election)

nolan-fly · July 24, 2023, 5:34pm

We don’t have plans to add anything like that for the moment. However, here’s something that might work:

By default, we send a SIGINT to kill your app, as documented here (search for kill_signal). You can also bump the timeout to a maximum of either 5 minutes or 1 day, depending on what VM type you’re using.

What I’d do is bump the timeout and add a signal handler that sets a global flag, or some other globally accessible state, when SIGINT is received. Then, when your job queue is empty or at other appropriate points, check that flag and exit if it’s true.

Does that address your use case?

gigurra · July 24, 2023, 5:49pm

Cool. If the app processes the signal quicker than the timeout, will it be shut down as soon as the processing completes?

nolan-fly · July 24, 2023, 6:04pm

Just guessing, but I’d assume not. If you’re setting a timeout and adding a handler, the onus is on you to do something like:

If there’s no work in the job queue and the signal is received, exit.
If there’s work in the job queue, set some global state and return from the handler.
When processing work, check the value of this global state when the queue is empty. If it indicates signal receipt, exit.
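
The steps above can be sketched roughly as follows. This is only a minimal illustration, assuming a Python worker with an in-memory queue; the names `jobs`, `shutdown_requested`, `handle_sigint`, `install_handler` and `run_worker` are hypothetical and not part of any Fly.io API. The handler only records the signal, and the loop exits once the queue is empty and the flag is set, which covers both the "no work" and the "work remaining" cases.

```python
import queue
import signal
import threading

# Hypothetical in-memory job queue; a real app might use Redis, a database, etc.
jobs = queue.Queue()

# The "global state" from the steps above: set once SIGINT has been received.
shutdown_requested = threading.Event()


def handle_sigint(signum, frame):
    # Only record that shutdown was requested; the worker loop decides when to exit.
    shutdown_requested.set()


def install_handler():
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGINT, handle_sigint)


def run_worker(process, poll_seconds=1.0):
    """Process jobs until the queue is empty AND shutdown has been requested."""
    while True:
        try:
            job = jobs.get(timeout=poll_seconds)
        except queue.Empty:
            if shutdown_requested.is_set():
                return  # queue drained and signal received: safe to exit
            continue  # no signal yet: keep waiting for work
        process(job)
```

At startup the app would call `install_handler()` and then `run_worker(...)` with its own job function, exiting the process when `run_worker` returns. The kill timeout still needs to be long enough for the queue to drain.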

Once you’re handling signals, it’s your responsibility to do what the sender of those signals expects.

lillian · July 24, 2023, 6:44pm

The correct way to do this is to disable auto_stop_machines and have your worker process exit when it’s done running jobs (or hasn’t been running jobs for x time, or whatever other criteria you need).
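
A rough sketch of that approach, assuming a Python worker that exits after a period of inactivity. The function name `run_until_idle` and its parameters are hypothetical, and the idle criterion is just one example of the "whatever other criteria you need" mentioned above.

```python
import queue
import time


def run_until_idle(jobs, process, idle_timeout=60.0, poll_seconds=1.0):
    """Process jobs; return once no job has arrived for idle_timeout seconds."""
    last_activity = time.monotonic()
    while time.monotonic() - last_activity < idle_timeout:
        try:
            job = jobs.get(timeout=poll_seconds)
        except queue.Empty:
            continue  # nothing to do yet; the loop condition re-checks the idle time
        process(job)
        last_activity = time.monotonic()  # reset the idle clock after each job
```

The worker's entry point would call `run_until_idle(...)` and then let the process exit, on the assumption that the instance stops once its main process has exited.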

gigurra · July 24, 2023, 7:02pm

Thanks for both of your answers

They both solve different situations I have right now.

