Hi everyone,
I’m having trouble setting up autoscaling for my Fly.io app based on the number of requests. I’ve configured my fly.toml as follows:
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
max_machines_running = 2
processes = ['app']

[processes]
app = 'uvicorn index:app --host 0.0.0.0 --port 8080'

[[services]]
internal_port = 8080
protocol = 'tcp'
processes = ['app']

[services.concurrency]
type = "requests"
hard_limit = 3
soft_limit = 1

[[services.ports]]
handlers = ['http']
port = 80

[[services.ports]]
handlers = ['tls', 'http']
port = 443

[[vm]]
memory = '4gb'
cpu_kind = 'shared'
cpus = 4
Issue:
I’ve set the concurrency type to “requests” with a soft limit of 1 and a hard limit of 3, expecting Fly.io to automatically start additional machines once the number of concurrent requests exceeds these limits. However, under load my app reaches almost 100% CPU usage, and no additional machines are launched.
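The expectation above can be put into rough numbers. The sketch below is a simplified model, not Fly.io’s actual proxy logic: it assumes the proxy tries to keep each machine at or below `soft_limit` and can only use machines that already exist in the app (`max_machines` here is a hypothetical parameter standing for that count). The function names are illustrative only.

```python
import math


def machines_wanted(concurrent_requests: int, soft_limit: int, max_machines: int) -> int:
    """Simplified model (assumption): machines needed to keep each one at or
    below soft_limit, capped at the number of machines that exist."""
    if concurrent_requests <= 0:
        return 0  # with min_machines_running = 0, everything may be stopped
    return min(max_machines, math.ceil(concurrent_requests / soft_limit))


def total_hard_capacity(machines: int, hard_limit: int) -> int:
    """Upper bound on in-flight requests before every machine hits hard_limit."""
    return machines * hard_limit
```

With the settings above, three concurrent requests would call for three machines at `soft_limit = 1`, but only two machines are available, so `machines_wanted(3, 1, 2)` gives 2, and together they accept at most `total_hard_capacity(2, 3)`, i.e. 6 requests, before both reach `hard_limit`.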
I also set max_machines_running = 2, but new instances still aren’t spun up when the app hits the soft or hard limit.
What I’ve Tried:
- Setting min_machines_running to 0 and max_machines_running to 2.
- Adjusting the concurrency limits.
- Monitoring the app’s performance with flyctl monitor and checking the logs.
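To generate the kind of concurrent load that should push the app past its soft limit, a short script like the one below can help. It is a minimal sketch, not necessarily the exact test that was run: `run_load_test` and `fetch` are hypothetical helpers built on the standard library, and the target URL is up to you.

```python
import concurrent.futures
import urllib.error
import urllib.request
from typing import Callable, List


def fetch(url: str) -> int:
    """GET the URL and return the HTTP status code (4xx/5xx included)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on non-2xx responses; record the status instead.
        return err.code


def run_load_test(url: str, total: int, concurrency: int,
                  fetch_fn: Callable[[str], int] = fetch) -> List[int]:
    """Send `total` GET requests to `url`, with at most `concurrency` in flight."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(fetch_fn, [url] * total))
```

For example, `run_load_test("https://<your-app>.fly.dev/", total=50, concurrency=10)` keeps about ten requests in flight, well above `soft_limit = 1`, while you watch the machine list and logs.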
Question:
Is there something I’m missing in my configuration? How can I make sure the app scales up as the number of requests increases, instead of running at near-100% CPU without starting new machines?
Below is the load test I ran, expecting it to trigger scaling:
Any help or suggestions would be greatly appreciated. Thank you!