How is billing determined with scale to zero?

I use Fly.io with its scale-to-zero feature, which lets applications stay stopped when they are not in use. I recently came across an article about hosting an AI model with scale to zero. Since GPUs are costly compute, being charged by the hour could add up quickly, so I'm curious how the bill is calculated.

Hi,

It’s a good question.

For normal CPUs, it’s per second, and only when running. From the docs:

Started Machines are billed per second that they’re running (the time they spend in the started state), based on the price of a named CPU/RAM combination, plus the price of any additional RAM you specify.

But as you mention, their GPU pricing is listed per hour, not per second.

If you don’t get a reply here, maybe email billing@fly.io

The reason I asked is a use case like running Ollama to rewrite code, where you only need the computing power for a few minutes at a time. You do this a few times a day, so the machine does not need to be on all the time. I am currently testing this locally, but it takes a lot of computing power, which is why you really want to run it on a GPU. However, if usage is billed in whole hours, the costs may outweigh the benefits.
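To put rough numbers on that, here is a minimal sketch comparing the two billing models for short bursts. The hourly rate, the session length, and the `daily_cost` helper are all made up for illustration. They are not Fly.io's actual prices or API, and whether GPU time is really rounded up to whole hours is exactly the open question here.

```python
import math

def daily_cost(hourly_rate, minutes_per_session, sessions_per_day, per_second=True):
    """Estimate daily cost for short bursts of usage.

    per_second=True  -> pay only for the seconds the machine is started
    per_second=False -> each session is rounded up to a whole hour
    """
    seconds = minutes_per_session * 60
    if per_second:
        billed_hours = seconds / 3600
    else:
        billed_hours = math.ceil(seconds / 3600)
    return hourly_rate * billed_hours * sessions_per_day

# Placeholder rate, NOT a real Fly.io price.
HYPOTHETICAL_GPU_RATE = 2.50

# Four 5-minute sessions per day under each billing model.
print(daily_cost(HYPOTHETICAL_GPU_RATE, 5, 4, per_second=True))
print(daily_cost(HYPOTHETICAL_GPU_RATE, 5, 4, per_second=False))
```

With these assumed numbers, per-second billing comes to well under a dollar a day, while rounding every session up to an hour costs twelve times as much. That gap is why the billing granularity matters for this use case.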

Article that I found:

