I guess it’s because of internal caching - at the time of testing, flyctl logs wasn’t working for me at all. At this stage the app was not scaling - it was still a single micro-1x.
So then I set the rate to 1550 RPS and the app was still keeping up; however, the number of concurrent requests increased to 750.
At this point I had to wait a few minutes for the app to scale up. During this period, connections started to drop and I got some error responses too.
I then tried to increase the rate above 2000 RPS, but surprisingly the throughput kept fluctuating around 1900 RPS. Maybe fly.io rate-limited my requests? I tried to increase the number of concurrent requests (“in-flight”), but the results became even worse. So this seems to be the maximum throughput. I also observed that auto-scaling lagged a bit behind the actual demand.
Any comments or ideas on how to improve the throughput are welcome. Thanks.
These are interesting results. Would you mind sharing your app name? I can look at what happened with a bit more precision. For now, I’m assuming this is the “egoweb” app which seems to have had a traffic spike recently.
I suspect your application was “queuing”, which happens as soon as your hard limit is reached. We also have a queue limit, at which point we drop connections. Your hard limit is set to 25; if your app can handle more than 25 concurrent connections (1 request == 1 connection), you can bump that up significantly. We should make the default higher - most apps can handle more than that. It looks like your app is just static pages served by a Go server, so I’d try much higher limits for that kind of app.
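For reference, here’s a minimal sketch of what a higher limit could look like in fly.toml - the port and the exact numbers are illustrative, not taken from your actual config:

```toml
[[services]]
  internal_port = 8080   # hypothetical port, adjust to your app
  protocol = "tcp"

  # Connections queue once hard_limit is reached (and drop once the queue
  # limit is hit); autoscaling reacts to pressure around soft_limit.
  [services.concurrency]
    soft_limit = 200
    hard_limit = 250
```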
Scaling is definitely not instantaneous. It usually takes a few seconds, or a few minutes in worst-case scenarios. It depends on your image size and whether the cache is warm on the targeted servers. Scaling horizontally happens automatically when a threshold based on your concurrency limits is hit.
During your test, your hard limit was reached thousands of times per second.
As far as I can tell, your micro-1x wasn’t working very hard, though it’s difficult to be sure with such low concurrency limits.
I’m not seeing that here. Is it possible your deploy failed? I see the last 6 versions (they’re from the last 3-4 hours) all use a concurrency setting of 20 (soft) / 25 (hard).
You’re right - I deployed with 50 again. There’s a Deno backing service which creates the actual JSON result, and that’s where I had already set the hard limit to 50. I measured that separately and it tops out at the same ~2000 RPS.
I’m running the tests locally from my MBP. My thinking as well was that maybe I’m limited by my ISP, but then other load tests would hit a similar ceiling - which is not the case as far as I know, but I’m going to double-check.
Testing this stuff is tricky, as you’ve found. I would recommend manually scaling your app to do load testing like this just to keep things as simple as possible. Here are some things you can try:
Your local machine could bottleneck on HTTPS (vs. HTTP on example.com).
Connection pooling (especially with HTTPS) makes a big difference. If you’re trying to test SSL performance, you’ll want to tune pooling differently than if you’re trying to test HTTP performance.
One thing to know about our infrastructure is that each request creates a new connection to your actual process from the local host hardware. Some servers get slow trying to handle that many TCP connections. We have an experimental concurrency mode that does HTTP connection pooling between our proxy and your app. If you want to try that, add type = "requests" to the concurrency block in fly.toml (a sketch follows below). This will break autoscaling but should perform better for a blitz of tests.
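Something like this, as a sketch - the limit values here are placeholders, not a recommendation:

```toml
[services.concurrency]
  type = "requests"   # experimental: pools HTTP connections between our proxy and your app
  soft_limit = 200    # placeholder values; tune for your workload
  hard_limit = 250
```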
Is there a request rate you’re aiming for out of curiosity?
This is probably hitting the hard limit on one instance, since requests-based concurrency doesn’t trigger autoscaling yet. ~1500 RPS is about 50 concurrent requests that finish in 30 ms each. If you run flyctl scale set min=10 and then try it again, you might see a different result.
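To make that concrete - the command is the one quoted above, and the arithmetic is just back-of-the-envelope:

```sh
# 50 concurrent requests / 0.030 s per request ≈ 1,666 RPS, which lines up
# with the ~1500 RPS ceiling seen on a single instance at that hard limit.
# Pin the app at 10 instances so autoscaling lag doesn't skew the test:
flyctl scale set min=10
```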
That 2000-3000 RPS result you’re getting from example.org is probably the most you can expect from your laptop. Going beyond that will mean running tests concurrently from multiple hosts on multiple networks.
I just checked again with https://example.com and now it easily went up to 6k RPS, which shows that local bandwidth shouldn’t be the bottleneck.
Static serving scaled up to 10 micro-1x still tops out at 3000 RPS: a single micro-1x gives 1.5k RPS, while 10 micro-1x give 3k RPS. Changing the settings to type = "requests" and running flyctl scale set min=10, as advised, brings ~3300 RPS.
In other words, it’s a 10x increase in VM cost for little more than double the throughput. Any other ideas to try for increasing performance, @kurt? Thanks!