Some load testing results and a few questions

Hi! I’m doing some load testing on a simple static web server with a basic setup: standard 1-10 autoscaling, a micro-1x VM, FRA region.
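
For reference, the setup was configured roughly like this (flyctl syntax from around that time, written from memory, so treat it as approximate):

```sh
# approximate flyctl commands for the setup described above
flyctl regions set fra         # run in the Frankfurt region
flyctl scale vm micro-1x       # smallest VM size
flyctl scale set min=1 max=10  # standard 1-10 autoscaling
```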

Starting with 50 RPS, everything seems fine.

Increasing to 550 RPS, the results interestingly look even better.

I guess that’s because of internal caching; at the time of testing, flyctl logs was not working for me at all. At this stage the app was not scaling yet, still a single micro-1x.
So then I set the rate to 1550 RPS and the server was still keeping up; however, the number of concurrent requests increased to 750.

At this point I had to wait a few minutes for the app to scale up. During this period connections started to drop and I got some error responses too :slightly_frowning_face:

But then it scaled up to 10 VMs and became responsive again, though maybe not at the level I’d have expected.

I then tried to increase the rate above 2000 RPS, but surprisingly the throughput kept fluctuating around 1900 RPS. Maybe fly.io rate-limited my requests? I tried to increase the number of concurrent (“in-flight”) requests, but the results became even worse, so ~1900 RPS seems to be the maximum throughput. I also observed that auto-scaling lagged a bit behind the actual demand.

Any comments or ideas on how to improve the throughput are welcome. Thanks.

I almost forgot: I used my ego network visualization experience for the load testing,
available here: https://github.com/jveres/ego-ui
(currently served from https://ego.jveres.me)

Hey there!

These are interesting results. Would you mind sharing your app name? I can look at what happened with a bit more precision. For now, I’m assuming this is the “egoweb” app which seems to have had a traffic spike recently.

I suspect your application was “queuing”, which happens as soon as your hard limit is reached. We also have a queue limit, at which point we drop connections. Your hard limit is set to 25; if your app can handle more than 25 concurrent connections (1 request == 1 connection), then you can bump that up significantly. We should make the default higher, since most apps can handle more than that. It looks like your app is just static pages served by a Go server, so I’d try much higher limits for that kind of app.
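
For example, the concurrency block in fly.toml could look something like this with higher limits (the numbers here are only illustrative, pick what your app can actually handle):

```toml
# illustrative fly.toml concurrency settings for a static Go server
[services.concurrency]
  type = "connections"
  soft_limit = 150   # above this, traffic prefers other instances and scaling is considered
  hard_limit = 200   # above this, connections queue (and eventually drop)
```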

Scaling is definitely not instantaneous. It usually takes a few seconds, or a few minutes in the worst-case scenarios, depending on your image size and whether the cache is warm on the targeted servers. Scaling horizontally happens automatically when a threshold based on your concurrency limits is hit.

During your test, your hard limit was reached thousands of times per second :slight_smile:.

As far as I can tell, your micro-1x didn’t work too hard, though it’s difficult to judge with such low concurrency limits.

We’re working on exposing more of these metrics.


Hi @jerome!

Yes, it is “egoweb”, a static Go server. The hard limit during load testing was set to 50.
I’m curious about your findings.

Thanks.

I’m not seeing that here. Is it possible your deploy failed? I see the last 6 versions (they’re from the last 3-4 hours) all use a concurrency setting of 20,25.

You’re right, I’ve deployed with 50 again. There’s a Deno backing service which creates the actual JSON result, and that’s where I had already set the hard limit to 50. I measured that separately and it tops out at the same ~2000 RPS.

I don’t really see much traffic at all in the past hour for your app.

This could be limited by your test machine. Are you running this locally or from a VM somewhere?

I’m running the tests locally from my MBP. I was also thinking that maybe I’m limited by my ISP, but then other load tests would top out similarly, which is not the case as far as I know. I’m going to double-check.

I don’t think it’s my test machine’s limitations; I’m able to load test http://example.com at 5000 RPS and beyond.

I tested that endpoint earlier, roughly between 2020-09-25 11:17 and 11:40.

What’s the actual command you’re running to test?

Testing this stuff is tricky, as you’ve found. I would recommend manually scaling your app to do load testing like this just to keep things as simple as possible. Here are some things you can try:

  1. Your local machine could be bottlenecking on HTTPS (vs plain HTTP on example.com).
  2. Connection pooling (especially with HTTPS) makes a big difference. If you’re trying to test SSL performance, you’ll want to tune pooling differently than if you’re trying to test HTTP performance.

One thing to know about our infrastructure is that each request creates a new connection to your actual process from the local host hardware. Some servers get slow trying to handle that many TCP connections. We have an experimental concurrency mode that does HTTP connection pooling between our proxy and your app. If you want to try that, add type = "requests" to the concurrency block in fly.toml. This will break autoscaling but should perform better for a blitz of tests.
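
In fly.toml that would look roughly like this (limits again just illustrative):

```toml
# experimental requests-based concurrency: the proxy pools HTTP connections to your app
[services.concurrency]
  type = "requests"
  soft_limit = 150
  hard_limit = 200   # note: autoscaling doesn't trigger on this mode yet
```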

Is there a request rate you’re aiming for out of curiosity?

I’m using slapper.
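
Something like this (exact flags may differ between slapper versions):

```sh
# targets.txt contains one request per line, e.g.:
#   GET https://ego.jveres.me/
slapper -targets targets.txt -rate 1550 -workers 8
```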

Good catch! Testing https://example.com gives me only 2000-3000 RPS.

I just tried type = "requests", but with that, at around 1500 RPS all requests get dropped.

I’m just exploring fly.io and preparing for an internal presentation for our devs.

This is probably hitting the hard limit on one instance, since the requests-based concurrency doesn’t trigger autoscaling yet. ~1500 RPS works out to about 50 concurrent requests that finish in 30 ms each. If you run flyctl scale set min=10 and then try again, you might see a different result.
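
(Roughly: throughput ≈ concurrent requests / request duration, so 50 / 0.03 s ≈ 1,700 requests per second, which lines up with the ceiling you’re seeing.)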

That 2000-3000 RPS result you’re getting from example.com is probably the most you can expect from your laptop. Going beyond that will mean running tests concurrently from multiple hosts on multiple networks.

You are most probably right, thanks for looking into it.

:+1: Thanks for the notes. And slapper looks really neat.

Indeed. Btw what would you recommend for distributed load testing?

We just run burn (an equivalent of slapper) from a bunch of VMs in multiple regions. I know of a few people who’ve used Locust against their apps.

I just checked again with https://example.com and now it easily went up to 6k RPS, which shows that local bandwidth shouldn’t be the bottleneck.
Against the static server:

  1. a single micro-1x: ~1.5k RPS
  2. 10x micro-1x: ~3k RPS
  3. 10x micro-1x with type = "requests" and flyctl scale set min=10 as advised: ~3.3k RPS

In other words, it’s a 10x increase in VM cost for a little more than double the throughput. Any other ideas for increasing the performance, @kurt? Thanks!