Autoscaling based on concurrency

StevenNguyen · February 23, 2026, 12:17pm

I currently have 4 machines in total (performance-8x, 16 GB each), and my toml file is configured as follows.

[http_service]
internal_port = 3003
force_https = true
auto_stop_machines = 'suspend'
auto_start_machines = true
min_machines_running = 2
processes = ['app']

[http_service.concurrency]
type = 'connections'
soft_limit = 30

[[vm]]
size = 'performance-8x'
memory = "16384mb"

As I understand it, this means that 2 machines will always be running, while the other 2 stay suspended. When the total number of connections reaches 60, the third machine should start, and when it reaches 90, the fourth machine should start.
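
To make that expectation concrete, here is a minimal Python sketch of the arithmetic. It models only my assumption that one more machine starts once every running machine has reached soft_limit. It is not a description of how Fly Proxy actually decides, and the function name expected_running_machines is made up for illustration.

```python
def expected_running_machines(total_connections: int,
                              soft_limit: int = 30,
                              min_running: int = 2,
                              total_machines: int = 4) -> int:
    """Number of machines I would expect to be running (assumed model).

    Assumption: one extra machine is started each time all running
    machines are at soft_limit, never dropping below min_running and
    never exceeding the machines that exist.
    """
    # Machines completely filled to soft_limit, plus one with spare capacity.
    wanted = total_connections // soft_limit + 1
    # Clamp to the range [min_running, total_machines].
    return max(min_running, min(total_machines, wanted))


if __name__ == "__main__":
    # Print the expectation for a few total connection counts.
    for connections in (0, 40, 59, 60, 89, 90, 200):
        print(connections, expected_running_machines(connections))
```

Under this assumed model, a total of 40 to 59 connections would need only 2 machines, and the fourth machine would start only at 90 connections or more.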

However, when I look at the App Concurrency chart in Grafana, each machine stays at only around 10–15, and all 4 machines are always running. This is not what I expected.

What should I do to make autoscaling work as intended?

Additionally, besides autoscaling based on connections or requests, is there a way to autoscale based on memory or CPU usage on Fly.io?

system · March 2, 2026, 5:14pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.