This is academic more than anything; I was playing with fly.io and set up the “hello world” example app for Go, along with a Grafana app. Then I started spamming it with ApacheBench (ab):
$ ab -n1000000 -c90 https://cool-tree-1724.fly.dev/a
I let this run for a while, and tried multiple scaling levels. The interesting thing I found is that count=2 actually has significantly worse performance than count=1 (a high value like count=10 does improve things though). Here is the set of graphs I’m looking at:
I’m curious as to why this would happen. I can show more details about the setup, but it’s fairly standard; I just tried to follow the setup in the wiki. The exact queries I’m using may be off (I’m a Prometheus noob), but they should be directionally relevant.
I think you should probably retry this with something like k6 instead of ab.
What you’re probably hitting is our TLS handshake throttling. ab doesn’t do a good job reusing connections, so each request is a new handshake. When you get throttled like that, tests get weird.
Basically, the single app server is a little slower to respond, which keeps your test under our TLS rate limit, so requests can run back-to-back-to-back. With two VMs, you hit our rate limit and there’s a cooldown period before we’ll allow more traffic through.
You can also try ab with just http:// for your URL, but it’s really not a very good load testing tool. Churning connections that way stresses the ability to accept TCP connections rather than the HTTP throughput rate.
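To make the "two VMs can be slower than one" effect concrete, here is a toy model of a handshake limit with a cooldown. It is only a sketch: the function name, the per-second bucketing, and every number in it (80 requests per second per VM, a limit of 100 handshakes per second, a 2 second cooldown) are made up for illustration and are not Fly.io's real limits or algorithm. It assumes every request opens a new TLS connection, which is roughly what happens when the load tool does not reuse connections.

```javascript
// Toy model of a per-second TLS handshake limit with a cooldown.
// Assumption: every request opens a new connection (one handshake per request).
// All numbers are invented for illustration; they are not Fly.io's real limits.
function simulate({ backendRate, handshakeLimit, cooldownSec, durationSec }) {
  let completed = 0;
  let second = 0;
  while (second < durationSec) {
    if (backendRate <= handshakeLimit) {
      // Under the limit: everything the backends can serve gets through.
      completed += backendRate;
      second += 1;
    } else {
      // Over the limit: only `handshakeLimit` handshakes succeed this second,
      // then nothing is admitted until the cooldown has passed.
      completed += handshakeLimit;
      second += 1 + cooldownSec;
    }
  }
  return completed;
}

const limits = { handshakeLimit: 100, cooldownSec: 2, durationSec: 60 };
const oneVm = simulate({ backendRate: 80, ...limits });   // one VM, stays under the limit
const twoVms = simulate({ backendRate: 160, ...limits }); // two VMs, trips the limit
console.log({ oneVm, twoVms }); // { oneVm: 4800, twoVms: 2000 }
```

In this model the slower single VM never trips the limit, so it finishes more requests over the minute than the faster pair that keeps getting throttled. It says nothing about higher counts or about tools that reuse connections, where there are far fewer handshakes to begin with.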
Thanks a lot for the answer @kurt! Yes, this makes sense. I reran the test with k6 run k6_load_test.js --vus 90 --duration 30m and the following script:
import http from 'k6/http';

// Each virtual user requests the endpoint in a tight loop, with no pause between iterations.
export default function () {
  http.get('https://cool-tree-1724.fly.dev/blah');
}
And then I tried various scaling options. Here’s what the graphs look like:
This was done locally; the primary app region is the same as my local region (London). Scaling behaved roughly as expected, with a few comments:
the edge tcp connects graph clearly shows ab spinning up plenty of TCP connections and k6 reusing as much as possible
I’m not sure if there’s a way to specify exactly which region a new instance spins up in; the best I could find is the volume anchoring mentioned in the wiki, but that sounds like a hack more than anything
related to the above, at count=2 and count=3, I got instances in the backup regions (ams and cdg), but those didn’t really help since the majority of requests continued to get routed to lhr (as expected, I guess)
at count=4, there was a clear bump in volume, since the 4th instance started in lhr
autoscaling did not seem to help much, since it insisted on scaling in regions other than lhr
changing the autoscaling from standard to balanced resulted in a temporary dip because it decided to kill the lhr instance before changing its mind and spinning it up again
at count=20, all the instances started in lhr, which maxed out my load testing; I’m not sure why it decided to only spin up in lhr this time. I’d be curious to see exactly how autoscaling works under the hood; maybe it needed a bit more time to register that most requests came from London
running fly regions remove ams cdg did not do anything; I think the minimum is two backup regions
So in summary, this is really cool to play with, and it’s great to see the scaling mechanism in action, especially given how fast it is to spin instances up and down! Thanks again @kurt for the insights. It would be nice to have a bit more control over the regions where instances get started, though it might not be such a big deal with a real world app.