Instance is downscaled while worker is running

Hello Fly Community! :wave:t3: I am using fly.io to deploy a Node.js application. At the moment I am running a performance-2x machine with the autoscaling option (scale up/down). In my Node.js app I use BullMQ with a Redis connection to handle job orchestration. One job that I have implemented downloads a video file from S3 and transcodes it with the help of fluent-ffmpeg.

When an instance with a job in progress does not receive traffic for about 2-5 minutes, it gets downscaled :smiling_face_with_tear: I get why that is and how fly.io handles things, but I am wondering if there is a way to keep the instance from being downscaled while a job is still running, with the autoscale option still on. I tried to set a higher kill_timeout in fly.toml, but I can only set it to a maximum of 5 minutes.

Would be happy to hear your recommendations! :rocket:

Hi @jivanovic, there are a couple of options:

  1. [preferred] Make it so that your app doesn’t need to be up at the end of each job.
  2. Add a request to the app that wakes it at the conclusion of the job, and block until the app is up again.
  3. Send a request every 5 minutes from the worker as it is processing the job. This would require you to make an endpoint in your app that keeps the process running every time the endpoint is hit, like a heartbeat.

For option 1, can you save the output of the video transcoding to S3 / do any of the post-processing within the job itself instead of in the main app? Can you track the status of jobs in a SQLite table instead of on the main app?
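For option 3, the endpoint itself can be tiny. Here is a minimal sketch using Node's built-in `http` module; the `/heartbeat` path and the `heartbeatStatus` helper are made-up names for illustration, and if you already use a framework you would add an equivalent route there instead.

```typescript
import { createServer } from "node:http";

// Hypothetical route name; any cheap route would do.
const HEARTBEAT_PATH = "/heartbeat";

// Decide the status code for a request: 204 for the heartbeat, 404 otherwise.
function heartbeatStatus(method: string | undefined, url: string | undefined): number {
  return method === "GET" && url === HEARTBEAT_PATH ? 204 : 404;
}

// The server is created but not started here; call server.listen(...) on
// whatever internal port your fly.toml points at.
const server = createServer((req, res) => {
  res.writeHead(heartbeatStatus(req.method, req.url));
  res.end();
});
```

The handler does no work on purpose: the idea is only that the request reaches the instance, so it counts as traffic.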

Hey @kd1! Thank you for the response, I really appreciate it :grin:

I agree with you that Option 1 is the preferred one, and I have it on the roadmap to refactor this part of the app logic. However, at the moment I am looking for a somewhat simpler solution, so I will go with Option 3 and send a heartbeat request with the fly-force-instance-id header to target the specific app instance that is running the job and keep it alive :heart:
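Roughly what I have in mind on the worker side, as a sketch only. The names `startHeartbeat`, `withHeartbeat`, and the `/heartbeat` URL are my own placeholders, and I am assuming Node 18+ for the global `fetch` and that the machine ID can be read from the `FLY_MACHINE_ID` environment variable.

```typescript
// Minimal fetch-like signature so a fake can be injected in tests.
type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<unknown>;

interface HeartbeatOptions {
  url: string;         // hypothetical keep-alive endpoint of the app
  machineId: string;   // ID of the instance running the job (assumed: FLY_MACHINE_ID)
  intervalMs: number;  // time between pings
  fetchFn?: FetchLike; // optional override; defaults to the global fetch (Node 18+)
}

// Header that asks the Fly proxy to route the request to one specific instance.
function buildHeartbeatHeaders(machineId: string): Record<string, string> {
  return { "fly-force-instance-id": machineId };
}

// Start pinging on a timer; returns a function that stops the pings.
function startHeartbeat(opts: HeartbeatOptions): () => void {
  const doFetch: FetchLike =
    opts.fetchFn ??
    ((url, init) => (globalThis as unknown as { fetch: FetchLike }).fetch(url, init));
  const timer = setInterval(() => {
    doFetch(opts.url, { headers: buildHeartbeatHeaders(opts.machineId) }).catch(() => {
      // A failed ping should not fail the job; the next tick simply tries again.
    });
  }, opts.intervalMs);
  return () => clearInterval(timer);
}

// Run a job with the heartbeat active, and always stop it afterwards.
async function withHeartbeat<T>(opts: HeartbeatOptions, job: () => Promise<T>): Promise<T> {
  const stop = startHeartbeat(opts);
  try {
    return await job();
  } finally {
    stop();
  }
}
```

In the BullMQ processor I would wrap the transcoding call in `withHeartbeat(...)`, so the pings stop as soon as the job finishes or throws and the instance can scale down normally afterwards.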

I read about how fly.io checks and decides when to downscale or upscale instances, but I never saw a specific interval at which these checks are made. Is it 5 minutes, or is it something we cannot really predict? :zap:

I got 5 minutes from your kill timeout. You’d want the interval between heartbeats to be shorter than the timeout so that the timeout is never reached.

Yeah, of course. Thank you for your responses @kd1 :grin: have a great day :wave:
