I’m having issues with autoscaling.
I’m hosting tournament scheduling software. 90% of the time there are no issues, but as soon as a tournament weekend comes around, the machines slow down immensely (we’re talking minutes to respond) or crash.
This happens almost every time around the playoffs. Every player takes out their phone at the same time to see when they play next, and that overloads the machines. It feels like the autoscaling just isn’t “ready”, or isn’t working at all. The spike in traffic shouldn’t be much more than 300-500 people at once, so I’m struggling to figure out how to configure things appropriately.
I need machines to start waking up earlier and I need them to share the load faster so it feels seamless for the users.
I have two apps, one for the frontend and one for the backend. Each app runs in multiple regions; I want to keep one machine active per region and autoscale in whichever region the traffic is coming from.
Backend:
```toml
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "suspend"
auto_start_machines = true
min_machines_running = 3
processes = ['app']

[http_service.concurrency]
type = "connections"
soft_limit = 100
hard_limit = 300

[[http_service.ports]]
handlers = ["http"]
port = "80"

[[http_service.ports]]
handlers = ["tls", "http"]
port = "443"

[metrics]
port = 8080
path = "/metrics"

[deploy]
release_command = "npx sequelize db:migrate"

[[vm]]
memory = '1gb'
cpu_kind = 'shared'
cpus = 1
memory_mb = 1024
swap_size_mb = 512
```
Frontend:
```toml
[build]

[env]
NODE_ENV = "production"

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = "suspend"
auto_start_machines = true
min_machines_running = 3
processes = ['app']

[http_service.concurrency]
type = "connections"
soft_limit = 100
hard_limit = 200

[[vm]]
memory = '1gb'
cpu_kind = 'shared'
cpus = 1
memory_mb = 1024
swap_size_mb = 512
```
Would love some pointers or help with my config, or maybe my approach is wrong altogether.
Thank you!
Hm… There are some errors in your config files, although that’s not necessarily the source of the problem you were asking about. Still, it would be prudent to clean them up. The new fly config validate --strict will show various warnings that flyctl doesn’t mention otherwise.
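For example, run it from each app’s directory (I believe flyctl’s global `-c` flag also lets you point it at a specific fly.toml):

```bash
# validate the fly.toml in the current directory, with extra warnings
fly config validate --strict
```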
Also, type = "requests" is generally better for HTTP—unless you’re using WebSockets, or the like.
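For instance (the numbers here are placeholders, not recommendations):

```toml
[http_service.concurrency]
  type = "requests"
  soft_limit = 50   # placeholder; the proxy prefers machines below this
  hard_limit = 150  # placeholder; the proxy won't send more than this to one machine
```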
This part unfortunately isn’t possible, unless you implement your own downscaling, etc. The min_machines_running knob only affects the primary_region.
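To illustrate the scope of that knob (the region code is hypothetical):

```toml
primary_region = "ewr"  # hypothetical primary region

[http_service]
  # keeps one machine running in ewr only; other regions can still scale to zero
  min_machines_running = 1
```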
Hope this helps a little!
I wasn’t aware of the validate command; that’ll be super helpful!
I do use WebSockets for realtime score and standings updates. I’m wondering if there’s a way to specify two different service concurrency settings, to handle "requests" and "connections" independently? Or maybe I should stick with type = "requests" so the scaling is based on requests rather than connections, regardless.
This part unfortunately isn’t possible, unless you implement your own downscaling, etc. The min_machines_running knob only affects the primary_region.
Would the autoscaling still work per region, though? I.e., if I’m okay with the non-primary regions scaling down to 0, would a region’s load trigger the autoscaling for that region, and then scale back down to 0 when no one in the region is active? If that’s the case, I can live with it; if not, I’ll have to figure out a different way of scaling.
Thanks for the pointers!
Right, what you can’t do is guarantee that each outlying region will always have at least one started Machine. (That feature would be useful for LiteFS, too, actually.)
You could do that if you split WebSockets off into a different port, I believe, but that may not be worth the effort.
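A minimal sketch of that split, assuming the app could serve WebSockets on a second internal port (8081, and the public port 8443, are hypothetical):

```toml
# HTTP traffic keeps the existing service, counted by requests
[http_service]
  internal_port = 8080
  force_https = true

  [http_service.concurrency]
    type = "requests"
    soft_limit = 50
    hard_limit = 150

# WebSockets split onto their own (hypothetical) port, counted by connections
[[services]]
  internal_port = 8081
  protocol = "tcp"

  [services.concurrency]
    type = "connections"
    soft_limit = 100
    hard_limit = 200

  [[services.ports]]
    port = 8443
    handlers = ["tls", "http"]
```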
Since you do have WebSockets, type = "connections" is the way to go. It’s really not clear how requests are counted in that case…
If you’re using WebSockets mainly, then the concurrency type should mostly behave the same whether it is set to connections or requests.
Back to the main issue in this thread: starting or resuming a machine takes some time, and if your load is really spiky, connections can get sent to whichever machine happened to be running, or to have started faster, at the time. The proxy is allowed to send connections to a machine up until hard_limit is reached. If that number of connections is enough to overwhelm your machine, as observed in this case, then hard_limit should probably be reduced; that will force the proxy to wait until more machines are available. You can also reduce soft_limit so that machines are started earlier (we try to start machines when all running machines have exceeded soft_limit).
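For example, dropping the backend’s limits to something like this would spread load and wake machines much sooner (the numbers are placeholders to validate against what one machine can actually handle):

```toml
[http_service.concurrency]
  type = "connections"
  soft_limit = 25  # placeholder; machines start once all running ones exceed this
  hard_limit = 60  # placeholder; the proxy never sends more than this to one machine
```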
OK, so essentially I need to figure out what one machine can handle, to pinpoint the hard and soft limits appropriately, and then maybe see if I should force-start a few more machines during big tournaments so machines don’t need to take time “starting”.
Aside from doing it manually, I’m wondering if I could get something going in my infra that automatically starts machines, via `flyctl`, in regions where a tournament is set to happen. I.e., instead of me doing it manually, the app would start machines itself a few hours before the tournament is set to start and then stop them after the tournament is over.
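A minimal sketch of that idea, assuming you pre-create the extra Machines in the right regions and leave them stopped; the app name and Machine IDs are placeholders, and the scheduling (cron, CI, a worker in the app) is up to you:

```bash
#!/usr/bin/env bash
# warm-machines.sh start|stop
# Starts or stops pre-created Machines around a tournament window.
set -euo pipefail

APP="my-backend"                               # placeholder app name
MACHINES=("3d8d9014b32e85" "9185340f411d83")   # placeholder Machine IDs

for id in "${MACHINES[@]}"; do
  case "$1" in
    start) fly machine start "$id" --app "$APP" ;;
    stop)  fly machine stop  "$id" --app "$APP" ;;
  esac
done
```

A cron entry a few hours before the first match would call `warm-machines.sh start`, and another would call `warm-machines.sh stop` once the tournament is over.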
Thanks for your reply!
In fact, after taking a look at your app, I think you might only need to tune the soft/hard limits here; I don’t think the machine start delay is playing a role. Your machines never even reach their soft limits and yet are having problems, which indicates that even the current soft limit is too high. Setting these to appropriate values is essential for autoscaling to work.
By the way, the number of connections handled by your machines is exposed as a metric you can query for historical data. That might help you decide on good soft/hard limit values. See our docs for more information (it’s fly_app_concurrency, documented here).
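As a rough sketch of such a query against Fly’s Prometheus-compatible metrics API (the org slug, app name, and time window are assumptions; check the metrics docs for the exact endpoint for your org):

```bash
# peak concurrency per machine over the last two days (e.g. a tournament weekend)
curl -s "https://api.fly.io/prometheus/personal/api/v1/query" \
  -H "Authorization: Bearer $(fly auth token)" \
  --data-urlencode 'query=max_over_time(fly_app_concurrency{app="my-backend"}[2d])'
```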