Issue with Autoscaling Based on Request Count in Fly.io

Hi everyone,

I’m having trouble setting up autoscaling for my Fly.io app based on the number of requests. I’ve configured my fly.toml as follows:

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  max_machines_running = 2
  processes = ['app']

[processes]
  app = 'uvicorn index:app --host 0.0.0.0 --port 8080'

[[services]]
  internal_port = 8080
  protocol = 'tcp'
  processes = ['app']

  [services.concurrency]
    type = "requests"
    hard_limit = 3
    soft_limit = 1

  [[services.ports]]
    handlers = ['http']
    port = 80

  [[services.ports]]
    handlers = ['tls', 'http']
    port = 443

[[vm]]
  memory = '4gb'
  cpu_kind = 'shared'
  cpus = 4

Issue:

I’ve set the concurrency type to “requests” with a soft limit of 1 and a hard limit of 3, expecting that Fly.io would automatically scale up machines when the number of requests exceeds these limits. However, my app is reaching high CPU usage (almost 100%) under load, but no additional machines are being launched.

I also set max_machines_running = 2, but it seems like new instances aren’t being spun up when the app hits the soft or hard limits.

What I’ve Tried:

  • Setting min_machines_running to 0 and max_machines_running to 2.
  • Adjusting the concurrency limits.
  • Monitoring the app’s performance using flyctl monitor and checking logs.

Question:

Is there something I’m missing in my configuration? How can I make sure the app scales up as the number of requests increases, instead of pinning the CPU near 100% without starting new machines?

Below is the test I ran, hoping the scaling would work.

Any help or suggestions would be greatly appreciated. Thank you!

Hi… Yes, although it’s not obvious at first. 🐉 You have overlapping definitions of port 8080: both the [http_service] and [[services]] blocks declare internal_port = 8080, and this tends to confuse the Fly.io infrastructure.

Try removing the [[services]] parts and putting your concurrency definitions under [http_service] instead, in an [http_service.concurrency] section.

Hope this helps a little!


I don’t think max_machines_running exists? I don’t see it in the docs. As far as I know, Fly won’t create a new machine for you. You have to run fly scale count N first, so that N machines exist (that is the max number of machines per region). Then it’ll scale between min_machines_running and N.
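To make the point above concrete, here is a toy model in Python of how many machines could end up running. It is only an illustration under simplifying assumptions (one extra machine is wanted each time the running ones reach soft_limit, and only machines that were already created can be started); it is not Fly's actual proxy logic, and machines_wanted is a hypothetical name.

```python
import math


def machines_wanted(concurrent_requests: int, soft_limit: int,
                    created_machines: int, min_running: int = 0) -> int:
    """Toy estimate of running machines; not Fly's real algorithm."""
    # Assumption: load is spread so each machine takes up to soft_limit requests.
    by_load = math.ceil(concurrent_requests / soft_limit)
    # Stopped machines can be started, but new ones are never created,
    # so the result is capped at the number of machines that already exist.
    return min(created_machines, max(min_running, by_load))


print(machines_wanted(5, soft_limit=1, created_machines=1))  # prints 1
print(machines_wanted(5, soft_limit=1, created_machines=2))  # prints 2
print(machines_wanted(0, soft_limit=1, created_machines=2))  # prints 0
```

With only one machine created, the model never goes above 1 no matter the load, which matches the symptom in the question; after fly scale count 2 the cap becomes 2.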


Thank you, that was exactly the problem!

This way it works perfectly.

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']
  [http_service.concurrency]
    type = "requests"
    soft_limit = 1
    hard_limit = 3
