I am running an experiment on fly.io for our future multiplayer backend at jitter.video. It is a Phoenix application that just tracks user presence for now, so the app only handles persistent WebSocket connections.
I kept the soft_limit and hard_limit default values, 20 and 25 respectively, to test autoscaling. I only have 2 servers:
App
Name = xxx
Owner = jitter
Version = 33
Status = running
Hostname = xxx
Deployment Status
ID = a9f5d1c6-b2b1-2a0f-6e63-652e9734a5ff
Version = v33
Status = successful
Description = Deployment completed successfully
Instances = 2 desired, 2 placed, 2 healthy, 0 unhealthy
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
bacbb76f app 33 lax run running 1 total, 1 passing 0 1h22m ago
157d39b2 app 33 cdg run running 1 total, 1 passing 0 1h23m ago
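For reference, the soft_limit and hard_limit above come from the [services.concurrency] block in fly.toml. The defaults I kept look roughly like this (the surrounding [[services]] values are illustrative, not copied from our actual config):

[[services]]
  internal_port = 4000
  protocol = "tcp"

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20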
With standard autoscaling and min=2 max=10:
Scale Mode: Standard
Min Count: 2
Max Count: 10
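(These settings were applied with flyctl, along the lines of the commands below; exact syntax may vary a bit between flyctl versions.)

$ fly autoscale standard
$ fly autoscale set min=2 max=10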
The hard limit is reached pretty fast:
2022-07-19T09:07:37Z proxy[157d39b2] cdg [warn]Instance reached connections hard limit of 25
The auto-scaling feature can take a while to react to increased load, which makes it hard to run tests against. While you might have already found these tips, I'd definitely recommend 1) running your load test for at least a few minutes and 2) making sure your [services.concurrency] block is set to type "requests" as an initial troubleshooting step.
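In fly.toml that change would look roughly like this (the limits shown are just your current defaults, kept for illustration):

[services.concurrency]
  type = "requests"
  hard_limit = 25
  soft_limit = 20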
We'd love if autoscale were a bit more responsive; we're currently working on making some changes to how we schedule instances which we think will help a great deal with autoscale performance.
On a somewhat related note, we're running some scheduled maintenance today which involves rebooting a significant number of hosts in our fleet. I don't think that's interfering with autoscaling, but if you're doing testing today you might notice a few minutes of downtime with your app's individual VMs.
My test is not a load-testing script: I have deployed a real Phoenix Presence integration on the live site (no UI, just logging present users in the console). It's been running since yesterday and the connection limit has been reached many times, for long periods, without triggering autoscaling.
I tried changing services.concurrency.type to requests but that doesn't work either, and anyway the app doesn't see any HTTP requests, just persistent WebSocket connections, so I think connections is closer to what I want (I want to size the service for a fixed number of connections per instance).
When the connection limit is reached, new connections are blocked and nothing happens.
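For context, the integration is essentially a minimal Presence setup like the sketch below; module, topic, and assign names are illustrative, not our actual code (the Presence module also has to be started in the application's supervision tree, and the channel declared in the user socket, both omitted here):

defmodule MyAppWeb.Presence do
  use Phoenix.Presence,
    otp_app: :my_app,
    pubsub_server: MyApp.PubSub
end

defmodule MyAppWeb.RoomChannel do
  use Phoenix.Channel
  require Logger
  alias MyAppWeb.Presence

  def join("presence:lobby", _params, socket) do
    # Defer tracking until after the join reply has been sent.
    send(self(), :after_join)
    {:ok, socket}
  end

  def handle_info(:after_join, socket) do
    # Track this client; each joined user holds a persistent WebSocket connection.
    # :user_id is assumed to have been assigned when the socket connected.
    {:ok, _ref} =
      Presence.track(socket, socket.assigns.user_id, %{
        joined_at: System.system_time(:second)
      })

    # No UI: just log who is currently present on this topic.
    Logger.info("present users: #{inspect(Map.keys(Presence.list(socket)))}")
    {:noreply, socket}
  end
end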
Ah, yeah, I can see why you'd go with "connections" -- I missed the parts where you mentioned it was only using WebSockets, sorry about that!
And thank you for that additional context. If your app has continuously had over 25 active connections for several hours, then we can definitely dig into this a bit further to see what seems to be holding the new placements up.
Does manual scaling work with your app?
What does your region pool look like (fly regions list -a <app-name>)?
$ fly scale count 3
Count changed to 3
$ fly status
App
Name = xxx
Owner = jitter
Version = 37
Status = running
Hostname = xxx
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
68853dcd app 37 cdg run running 1 total 0 9s ago
01fdbc50 app 37 lax run running 1 total, 1 passing 0 55m3s ago
c1105a0d app 37 cdg run running 1 total, 1 passing 1 55m23s ago
And I configured 2 regions in the pool:
$ fly regions list
Region Pool:
cdg
lax
Backup Region:
Hello! We just tracked down an internal bug in the autoscaling service that was causing scaling events to not get triggered properly; sorry for the inconvenience. We've deployed a fix, so it should be working correctly now. Please give it another try!
We've been investigating the scaling behavior in your WebSocket application since yesterday.
We identified a bug in the fly_app_concurrency metric (which is used to make autoscaling decisions) that incorrectly lowers the value to zero if there are no changes in concurrency for 60 (edit: 30) seconds. Any app that maintains many existing, long-lived connections without frequent connects/disconnects (which seems to be the case for your application's persistent WebSocket connections) will hit this edge case, which will cause scaling to not occur as expected.
We're working on a fix and I will let you know when it's deployed.
I double-checked and the metric-timeout that triggers the scaling bug is actually 30 seconds, not 60.
As long as the app has a steady stream of connects/disconnects, the concurrency metric will be accurate, but any gap longer than 30 seconds causes the metric to reset to zero and remain incorrectly low.
Though your app does generally have quite a few connections, looking at the metrics there were a few brief gaps (for instance, 2022-07-27 from 04:51:42 to 04:52:20 UTC) that caused the concurrency metric to drop lower than it should have, keeping your app scaled at 1 instance.
Update: a fix has now been deployed. The fly_app_concurrency metric should now remain accurate for applications with many long-lived connections such as WebSockets, so autoscaling will trigger more reliably.
I have started new experiments to size the memory consumption of our app. Initially hard_limit and soft_limit were too high (500 and 400), and I started getting OOM errors, which made VMs crash (I think) and in turn made autoscaling go from 2 to the max of 10 instances.
I tried lowering the limits in fly.toml to 250 / 200, then to 100 / 50, each time doing a new flyctl deploy, but that didn't help. I had to scale the VMs' memory up to stop them restarting over and over again. Then I retried a deployment and the limits finally seemed to settle at 100 / 50.
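Concretely, the end state was roughly the following; the memory value below is only an example of the kind of bump I made, not the exact size we settled on:

$ fly scale memory 512

[services.concurrency]
  type = "connections"
  hard_limit = 100
  soft_limit = 50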
If connections are being rerouted to other less-loaded instances once the soft_limit is reached on an instance, that's exactly how the proxy's load-balancing behavior is designed to work. Autoscaling adds and removes instances so that there's enough total capacity for the current number of connections to run within the soft_limit, and scales up when the total load exceeds that. In other words, when all of the instances reach the soft_limit I would expect a new instance to be added, and that's exactly what I'm seeing here, so it seems like it's working as expected.
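As a rough worked example, assuming the soft_limit of 20 you started with: two instances give a total soft capacity of 2 × 20 = 40 connections, so once the app holds more than about 40 connections overall a third instance should be added (up to your configured max of 10), while the proxy keeps steering new connections toward the least-loaded instances in the meantime.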
I see, our need is more to start instances where there are more users, rather than balance them globally. That is actually why I chose the "standard" autoscaling mode, thinking "balanced" would do the opposite. Thanks for your explanation, it is very clear now.