Billed high amounts for auto start/stop machines

I’ve been migrating a high-traffic service from AWS Lambda to fly.io and have been running performance tests.

Each machine handles a single request at a time, so I spun up ~440 shared-cpu-4x machines, with auto_start_machines and auto_stop_machines set to true and hard_limit/soft_limit set to 1 in fly.toml.

I was under the impression that I was only billed while a VM is running. However, after a 20-minute performance test that ramped the load from 10 to 430 concurrent requests, I ended up with an unexpectedly large bill of $77.
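A rough sanity check on that number: if all ~440 machines had run for the entire 20-minute test, that is at most about 146.7 machine-hours, so the bill implies an effective rate of at least $0.525 per machine-hour. The sketch below does that arithmetic; comparing the result against Fly's published shared-cpu-4x rate would show whether machines ran well outside the test window or whether other line items are on the bill. It assumes the whole bill is machine run time, which may not be true.

```python
def implied_rate_per_machine_hour(bill_usd: float, machines: int, minutes: float) -> float:
    """Effective cost per machine-hour if every machine ran for the whole window.

    If machines ran no longer than the window, the real effective rate was
    at least this high, because actual machine-hours can only be smaller.
    """
    machine_hours = machines * minutes / 60
    return bill_usd / machine_hours


rate = implied_rate_per_machine_hour(77.0, 440, 20)
print(f"{rate:.3f} USD per machine-hour")  # 0.525 USD per machine-hour
```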

I suspect I’m being billed for suspended VMs as well, or that VMs aren’t actually marked as idle once they’ve been provisioned or have handled a request. However, the usage data on the dashboard isn’t granular enough for me to confirm whether this is the case or whether there is another billing issue.

Can anyone help me understand what happened here? Thanks in advance for your time.


Do you have the auto_stop feature set up in your fly.toml?

You have to exit the main process to let fly.io know the machine can shut down.
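One way to do that is to have the app exit on its own after a period with no traffic. Below is a minimal sketch using only the Python standard library; IDLE_TIMEOUT_S, IdleTracker, watch_idle, and serve are hypothetical names, and the 30-second timeout is an arbitrary example. It assumes the machine's restart policy does not restart it after a clean exit.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

IDLE_TIMEOUT_S = 30.0  # hypothetical: seconds with no traffic before exiting


class IdleTracker:
    """Tracks in-flight requests and when activity last happened."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._lock = threading.Lock()
        self._in_flight = 0
        self._last_activity = clock()

    def begin(self):
        with self._lock:
            self._in_flight += 1
            self._last_activity = self._clock()

    def end(self):
        with self._lock:
            self._in_flight -= 1
            self._last_activity = self._clock()

    def is_idle(self):
        # Idle only if nothing is in flight and the timeout has elapsed.
        with self._lock:
            return (self._in_flight == 0
                    and self._clock() - self._last_activity >= self.timeout_s)


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        tracker = self.server.tracker
        tracker.begin()
        try:
            body = b"ok\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        finally:
            tracker.end()


def watch_idle(server, poll_s=1.0):
    """Poll the tracker and stop serve_forever() once the process is idle."""
    while not server.tracker.is_idle():
        time.sleep(poll_s)
    server.shutdown()  # unblocks serve_forever() in the main thread


def serve(port=8080):
    server = ThreadingHTTPServer(("0.0.0.0", port), Handler)
    server.tracker = IdleTracker(IDLE_TIMEOUT_S)
    threading.Thread(target=watch_idle, args=(server,), daemon=True).start()
    with server:
        server.serve_forever()
    # Returning here lets the main process exit, which is the signal that
    # the machine can stop.
```

Calling serve(8080) as the process entry point runs the server until it has been idle for IDLE_TIMEOUT_S. A request arriving just as shutdown begins may be dropped, so a real service would want a short grace period.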

Do you have health checks set up? You may want to try turning them off for this test, though I don’t think they keep the machine alive.

These are VMs, not containers or Lambdas, so there is also overhead in spinning them up. Depending on how quickly they become ready, that startup time can eat into your billed time. A smaller, lighter Docker image may help here.

You may want to contact billing for more information.

Hi @Biswas

To clarify, you still pay for the server when it’s idle. You stop paying when the server stops.

With the auto stop feature, Fly doesn’t stop all the idle servers at once; it stops them gradually over time.
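To make the distinction concrete, here is a simplified model in which cost depends only on how long each machine was running, not on how many requests it served. It assumes per-second billing of run time and ignores other charges; MachineRun and billed_seconds are hypothetical names. A machine that served nothing but was never stopped costs more than a busy one that stopped early.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MachineRun:
    started_at: float            # seconds since the test started
    stopped_at: Optional[float]  # None if the machine is still running
    requests_served: int         # deliberately ignored by billed_seconds


def billed_seconds(runs: List[MachineRun], now: float) -> float:
    """Sum running time per machine, whether or not it was serving traffic."""
    total = 0.0
    for run in runs:
        end = run.stopped_at if run.stopped_at is not None else now
        total += max(0.0, end - run.started_at)
    return total


runs = [
    MachineRun(started_at=0, stopped_at=600, requests_served=500),
    MachineRun(started_at=0, stopped_at=None, requests_served=0),  # idle, never stopped
]
print(billed_seconds(runs, now=1200))  # 1800.0
```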

Fly provides a Grafana dashboard that you can use at https://fly-metrics.net/

You can see how many servers were running at any point in time using this link:
https://fly-metrics.net/explore?left={"datasource":"prometheus_on_fly","queries":[{"refId":"A","datasource":{"type":"prometheus","uid":"prometheus_on_fly"},"editorMode":"builder","expr":"sum(fly_instance_up)","legendFormat":"__auto","range":true,"instant":true}],"range":{"from":"now-1h","to":"now"}}

If you’ve got multiple apps, you’ll need to select app in the label filters and enter the desired app name as the value.
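The same data can be pulled outside Grafana. The sketch below builds the Grafana query with an app label filter and wraps it in a standard Prometheus query_range request. The base URL (a per-organization Prometheus API on api.fly.io) and bearer-token auth are assumptions to verify against Fly's metrics documentation; "my-org" and "my-app" are placeholders.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumption: Fly exposes a Prometheus-compatible API per organization here.
PROM_BASE = "https://api.fly.io/prometheus/{org}"


def running_machines_query(app: str) -> str:
    """The Grafana query above, restricted to a single app via its label."""
    return f'sum(fly_instance_up{{app="{app}"}})'


def query_range_url(org: str, app: str, start: int, end: int, step: str = "30s") -> str:
    """Standard Prometheus /api/v1/query_range URL for the query above."""
    params = urlencode({
        "query": running_machines_query(app),
        "start": start,  # Unix timestamps in seconds
        "end": end,
        "step": step,    # resolution of the returned series
    })
    return f"{PROM_BASE.format(org=quote(org))}/api/v1/query_range?{params}"


def fetch(url: str, token: str) -> dict:
    # Assumption: a Fly API token is accepted as a bearer token.
    req = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(req) as resp:
        return json.load(resp)
```

Setting start and end to the beginning and end of the load test shows how long machines kept running after traffic dropped off.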

The reply from @charsleysa suggests that machines are shut down eventually.

Are they shut down eventually if the app process keeps running, but shut down immediately if the app process exits?

Somewhat related to this, I got a few HTTP 502 errors while running the performance tests. Is this usually because Fly was unable to start a machine in time (after all, allocated machines aren’t started instantly), or because the app itself didn’t start up in time?

If it’s the latter, I’m wondering whether health checks would help here.
