I’ve read that you can set an auto stop for an idle GPU, but it isn't clear how long the GPU has to be idle before it stops and billing for compute ends. On Runpod, for example, it takes about 30 seconds.
Hi @Mrka
We have found it takes around 5 minutes of idle time for the machine to shut down.
So, as I understand it, for inference Fly GPUs are not cost-effective compared to Runpod serverless GPUs: the hourly cost is about the same, but Fly GPUs take longer to start and stop.
It would depend on your workload.
If you are running custom services, you could have the service terminate itself after 30 seconds (or whatever time you choose) of idle time, so you don’t have to wait for Fly's autostop.
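Here is a minimal sketch of that idea in Python, using only the standard library's http.server. It assumes a plain HTTP service on a hypothetical port 8080 and a 30-second idle window. The names Handler, IdleServer, serve_until_idle, main and IDLE_SECONDS are illustrative, and the "ok" response stands in for real inference work. HTTPServer.handle_request() waits at most server.timeout seconds for a request and calls handle_timeout() when none arrives, so one timeout marks one full idle window.

```python
import http.server
import sys

# Hypothetical idle window (30 s, as in the example above); tune it to your workload.
IDLE_SECONDS = 30.0


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder response standing in for real inference work.
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class IdleServer(http.server.HTTPServer):
    # Set by handle_timeout() when handle_request() waited `timeout` seconds with no request.
    idle = False

    def handle_timeout(self):
        self.idle = True


def serve_until_idle(port, idle_seconds, host="0.0.0.0"):
    """Handle requests one at a time; return how many were served once a full idle window passes."""
    handled = 0
    with IdleServer((host, port), Handler) as server:
        server.timeout = idle_seconds  # each handle_request() call starts a fresh idle window
        while True:
            server.handle_request()
            if server.idle:
                return handled
            handled += 1


def main():
    serve_until_idle(8080, IDLE_SECONDS)
    # A successful exit lets the machine stop instead of waiting for the proxy's autostop.
    sys.exit(0)
```

Call main() from an `if __name__ == "__main__":` block. Because this server is single-threaded, a long-running request is never cut off; the idle window simply starts again after it finishes.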
Taking a quick look at Runpod pricing, their Flex option has a higher per-second price than Fly, so depending on your workload you could end up paying more with Runpod.
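To see why it depends on the workload, here is a rough model: every burst of traffic pays for startup, the actual work, and the idle tail before the machine stops, all at the provider's per-second rate. This sketch uses hypothetical rates in arbitrary units (not real Fly or Runpod prices) and assumes the same startup time on both sides. billed_cost is an illustrative helper.

```python
def billed_cost(rate_per_s, bursts, startup_s, active_s, idle_tail_s):
    """Rough cost model: each burst is billed for startup + active work + idle tail."""
    return rate_per_s * bursts * (startup_s + active_s + idle_tail_s)


# Hypothetical rates (arbitrary units): a cheaper option with a 5-minute idle tail
# versus a pricier one that stops after 30 s.
CHEAP_LONG_TAIL = dict(rate_per_s=1.0, idle_tail_s=300)
PRICEY_SHORT_TAIL = dict(rate_per_s=1.5, idle_tail_s=30)

# Many short bursts: the idle tail dominates, so the short tail wins.
short_cheap = billed_cost(bursts=20, startup_s=10, active_s=60, **CHEAP_LONG_TAIL)
short_pricey = billed_cost(bursts=20, startup_s=10, active_s=60, **PRICEY_SHORT_TAIL)

# One long session: the per-second rate dominates, so the cheaper rate wins.
long_cheap = billed_cost(bursts=1, startup_s=10, active_s=3600, **CHEAP_LONG_TAIL)
long_pricey = billed_cost(bursts=1, startup_s=10, active_s=3600, **PRICEY_SHORT_TAIL)
```

With these made-up numbers the short-burst case costs 7400 vs 3000 units and the long session costs 3910 vs 5460 units. Shortening the idle tail in your own app removes most of the first case's penalty.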
Also, it looks like Runpod serverless requires you to use their SDK and handler-function format, while on Fly you can run pretty much any code or any Docker container.
I’m asking about running Ollama on the GPUs. I want the GPU to run as little as possible to reduce costs: it needs to start as fast as possible for the first inference and stop as fast as possible when idle. In Runpod serverless, you load vLLM, set a minimum of two workers, and get an API endpoint. It scales down to one worker and, if idle, scales to zero after 30 seconds.
@Mrka you can control the idle shutdown time directly from your app. You can even shut it down as soon as it finishes serving a request.
Your app itself can exit with a successful exit code (e.g. sys.exit(0)), and then the machine will shut down and stay stopped until a request comes in (with auto_start_machines=true). So you can either have the app shut down immediately after finishing a request, or have it stay up after requests stop, detect idleness itself, and shut down after X seconds of inactivity. This is explained here.
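Here is a minimal sketch of the "shut down immediately after a request" option, again using only Python's standard library. handle_request() serves exactly one request, and then the process exits with status 0. Port 8080 and the names Handler, serve_one and main are hypothetical. The sketch assumes auto_start_machines is enabled so that the next request starts the machine again.

```python
import http.server
import sys


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Placeholder response standing in for real inference work.
        body = b"done\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_one(port, host="0.0.0.0"):
    """Block until exactly one request has been answered, then close the listening socket."""
    with http.server.HTTPServer((host, port), Handler) as server:
        server.handle_request()


def main():
    serve_one(8080)
    # Exit code 0: the machine stops and stays stopped until the next request auto-starts it.
    sys.exit(0)
```

This trades a cold start on every request for no idle time at all, so it probably suits infrequent, bursty traffic better than a steady stream of requests.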
Otherwise, the Fly Proxy will detect the lack of activity and send a shutdown signal after 5 minutes.
Cheers!
- Daniel