Some load testing results and a few questions

There’s a known issue with distributing our “load” numbers between servers quickly. Since we have servers all over the world, there’s some lag in propagating those counts. At first, requests pile onto the same few instances because they appear to be the least loaded, and those instances briefly receive too many requests. It evens out over time.

Higher concurrency limits would help a lot here. I bet example.com allows more than 50 concurrent requests. That’s the variable I’d tweak first. Maybe soft: 400 and hard: 800? We also use the soft limit for load balancing decisions. If you’re running a static server, it should be able to handle that; maybe then you’ll max out the CPU on your micro-1x, we’ll see.
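
For reference, those limits live under your app’s services section in fly.toml. A rough sketch with the numbers above (the port, protocol and type here are placeholders, adjust them to your actual service definition):

[[services]]
  internal_port = 8080      # placeholder
  protocol = "tcp"

  [services.concurrency]
    type = "connections"    # "requests" is the other option
    soft_limit = 400        # also used for load balancing decisions
    hard_limit = 800        # per-instance cap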

What RPS are you setting your tool to? And is that really “per second”, or is it a concurrency setting?

I was curious so I did some testing from Frankfurt. With one VM running I got this result:

$ ./burn https://<app>.fly.dev -c 100 --resume-tls

Burning https://<app>.fly.dev for 60s

Statistical Analysis:
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+
|     Metric      |   p50    |   p95    |    p99    |   Min    |   Mean   |    Max    | Std. Dev |
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+
| Connect         | 944µs    | 8.775ms  | 14.128ms  | 213µs    | 2.377ms  | 16.59ms   | 3.098ms  |
| TLS Handshake   | 53.438ms | 65.577ms | 70.964ms  | 17.305ms | 45.675ms | 72.535ms  | 16.377ms |
| Headers written | 17µs     | 38µs     | 113µs     | 6µs      | 23µs     | 6.997ms   | 68µs     |
| Request written | 19µs     | 42µs     | 118µs     | 6µs      | 26µs     | 6.998ms   | 68µs     |
| TTFB            | 35.239ms | 44.069ms | 191.691ms | 17.314ms | 39.889ms | 321.332ms | 27.394ms |
| Response read   | 35.609ms | 44.695ms | 192.511ms | 17.581ms | 40.456ms | 422.221ms | 28.245ms |
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+

Meta:
  Requests Count |        149341
  Time spent     | 59.999911605s
  RPS            |   2489.020334
  TLS Resumed    |             0

That was with the load test concurrency set to 100. In theory, if the app has a hard limit of 50, a concurrency of 100 would overload it (and it does look like it caused requests to queue).

Then I scaled the app to 4 VMs and set concurrency to 200:

$ ./burn https://<app>.fly.dev -c 200 --resume-tls

Burning https://<app>.fly.dev for 60s

Statistical Analysis:
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+
|     Metric      |   p50    |   p95    |    p99    |   Min    |   Mean   |    Max    | Std. Dev |
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+
| Connect         | 752µs    | 20.379ms | 30.597ms  | 116µs    | 3.16ms   | 31.551ms  | 6.184ms  |
| TLS Handshake   | 58.877ms | 89.374ms | 105.295ms | 10.454ms | 62.096ms | 115.799ms | 18.913ms |
| Headers written | 15µs     | 52µs     | 407µs     | 6µs      | 30µs     | 5.864ms   | 118µs    |
| Request written | 16µs     | 54µs     | 410µs     | 6µs      | 32µs     | 5.866ms   | 118µs    |
| TTFB            | 38.567ms | 49.91ms  | 55.966ms  | 1.027ms  | 36.476ms | 200.243ms | 10.914ms |
| Response read   | 38.859ms | 50.335ms | 57.348ms  | 1.193ms  | 36.979ms | 242.264ms | 11.671ms |
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+

Meta:
  Requests Count |        329322
  Time spent     | 59.999872638s
  RPS            |   5488.711651
  TLS Resumed    |             0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 250

That should be pretty close to what 50 concurrent requests per VM can do over the course of a minute.

Your VMs can almost certainly handle higher concurrency; I don’t think they’re the bottleneck.

I went ahead and switched them to request-based concurrency with a hard limit of 100, and it did better:

$ ./burn https://<app>.fly.dev -c 350 --resume-tls

Burning https://<app>.fly.dev for 60s

Statistical Analysis:
+-----------------+----------+-----------+-----------+---------+----------+-----------+----------+
|     Metric      |   p50    |    p95    |    p99    |   Min   |   Mean   |    Max    | Std. Dev |
+-----------------+----------+-----------+-----------+---------+----------+-----------+----------+
| Connect         | 2.993ms  | 86.441ms  | 101.253ms | 123µs   | 22.679ms | 122.842ms | 30.729ms |
| TLS Handshake   | 46.51ms  | 120.935ms | 147.743ms | 8.907ms | 60.543ms | 240.271ms | 36.38ms  |
| Headers written | 14µs     | 56µs      | 1.362ms   | 5µs     | 72µs     | 114.342ms | 893µs    |
| Request written | 16µs     | 58µs      | 1.364ms   | 6µs     | 73µs     | 114.343ms | 892µs    |
| TTFB            | 37.999ms | 107.702ms | 173.678ms | 575µs   | 46.434ms | 514.486ms | 33.177ms |
| Response read   | 41.271ms | 115.132ms | 177.057ms | 731µs   | 49.069ms | 542.484ms | 34.852ms |
+-----------------+----------+-----------+-----------+---------+----------+-----------+----------+

Meta:
  Requests Count |        435556
  Time spent     | 59.999875312s
  RPS            |   7259.281752
  TLS Resumed    |             0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 958

Status code:
200 => 435568

Our burn tool reuses connections; I’m not sure whether slapper does.
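
For what it’s worth, here’s a minimal sketch (not burn’s actual source; the URL and worker count are placeholders) of what connection reuse looks like in a Go load generator: all workers share one http.Client whose Transport keeps idle connections alive, so every request after the first on a connection skips the TCP and TLS handshakes.

package main

import (
    "fmt"
    "io"
    "net/http"
    "sync"
    "sync/atomic"
    "time"
)

func main() {
    const (
        target      = "https://example.fly.dev" // placeholder app URL
        concurrency = 100
        duration    = 60 * time.Second
    )

    // One shared client: idle connections are pooled and reused by all workers.
    client := &http.Client{
        Transport: &http.Transport{
            MaxIdleConnsPerHost: concurrency, // the default of 2 is far too low for a load test
            IdleConnTimeout:     90 * time.Second,
        },
        Timeout: 10 * time.Second,
    }

    var total int64
    deadline := time.Now().Add(duration)

    var wg sync.WaitGroup
    for i := 0; i < concurrency; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for time.Now().Before(deadline) {
                resp, err := client.Get(target)
                if err != nil {
                    continue
                }
                io.Copy(io.Discard, resp.Body) // drain the body so the connection returns to the pool
                resp.Body.Close()
                atomic.AddInt64(&total, 1)
            }
        }()
    }
    wg.Wait()

    fmt.Printf("requests: %d, rps: %.1f\n", total, float64(total)/duration.Seconds())
}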

Our default concurrency settings are designed for dynamic apps. You can probably set them very high for a well-tuned static site. It wouldn’t surprise me if goStatic can handle ~200 concurrent requests per VM.

Thanks to both @kurt and @jerome for your feedback!

So I ran burn just as above, and this is what I got:

./burn https://ego.jveres.me -c 100 --resume-tls

Burning https://ego.jveres.me for 60s

Statistical Analysis:
+-----------------+-----------+-----------+----------+----------+-----------+-----------+-----------+
|     Metric      |    p50    |    p95    |   p99    |   Min    |   Mean    |    Max    | Std. Dev  |
+-----------------+-----------+-----------+----------+----------+-----------+-----------+-----------+
| Connect         | 62.248ms  | 85.072ms  | 87.471ms | 22.013ms | 61.799ms  | 88.683ms  | 13.821ms  |
| TLS Handshake   | 164.389ms | 3.055717s | 3.06494s | 66.776ms | 1.142049s | 3.06582s  | 1.388113s |
| Headers written | 15µs      | 65µs      | 267µs    | 5µs      | 29µs      | 18.515ms  | 113µs     |
| Request written | 16µs      | 70µs      | 269µs    | 5µs      | 31µs      | 18.516ms  | 113µs     |
| TTFB            | 51.871ms  | 73.056ms  | 90.6ms   | 37.095ms | 54.54ms   | 289.943ms | 12.995ms  |
| Response read   | 53.146ms  | 74.781ms  | 96.878ms | 37.579ms | 56.467ms  | 536.11ms  | 19.525ms  |
+-----------------+-----------+-----------+----------+----------+-----------+-----------+-----------+

Meta:
  Requests Count |        106792
  Time spent     | 1m0.00047593s
  RPS            |   1779.852549
  TLS Resumed    |             0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 160
 => 160

Status code:
200 => 106776

Errors:
16 errors.
EOF => 16

I’d say it’s about the same as with slapper, plus some errors.

Following @jerome’s recommendation, I then used soft=400 and hard=800.
This is the result:

./burn https://ego.jveres.me -c 100 --resume-tls

Burning https://ego.jveres.me for 60s

Statistical Analysis:
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+----------+
|     Metric      |    p50    |    p95    |    p99    |   Min    |   Mean    |    Max    | Std. Dev |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+----------+
| Connect         | 52.052ms  | 75.452ms  | 79.045ms  | 41.353ms | 54.27ms   | 79.208ms  | 10.614ms |
| TLS Handshake   | 110.779ms | 120.889ms | 124.669ms | 93.814ms | 109.961ms | 125.508ms | 7.034ms  |
| Headers written | 16µs      | 62µs      | 307µs     | 5µs      | 30µs      | 25.624ms  | 127µs    |
| Request written | 18µs      | 68µs      | 309µs     | 6µs      | 32µs      | 25.624ms  | 128µs    |
| TTFB            | 53.637ms  | 88.605ms  | 102.394ms | 36.859ms | 57.635ms  | 650.042ms | 16.842ms |
| Response read   | 54.736ms  | 90.415ms  | 104.968ms | 37.857ms | 59.161ms  | 650.447ms | 19.848ms |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+----------+

Meta:
  Requests Count |        101929
  Time spent     | 59.999766564s
  RPS            |   1698.823276
  TLS Resumed    |             0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 100

Status code:
200 => 101929

Errors:
0 errors.

I noticed that during redeployment with the new concurrency settings, cdg(B) was used instead of the primary fra, but I still got roughly the same results as before, this time without errors. I also noticed that --resume-tls may have no effect at all, since the number of resumed TLS sessions is always 0 in the stats.
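
To document what I’d expect resumption to need on the client side, here’s a minimal Go sketch (this isn’t burn’s code; it just uses the app URL from this thread): without a ClientSessionCache on the tls.Config, every connection does a full handshake, and a “TLS Resumed” counter would stay at 0 no matter which flag is passed.

package main

import (
    "crypto/tls"
    "fmt"
    "io"
    "net/http"
)

func main() {
    transport := &http.Transport{
        TLSClientConfig: &tls.Config{
            // Cache session tickets so later connections can resume the handshake.
            ClientSessionCache: tls.NewLRUClientSessionCache(64),
        },
    }
    client := &http.Client{Transport: transport}

    for i := 0; i < 2; i++ {
        resp, err := client.Get("https://ego.jveres.me")
        if err != nil {
            panic(err)
        }
        io.Copy(io.Discard, resp.Body)
        resp.Body.Close()
        fmt.Println("resumed:", resp.TLS.DidResume) // first connection: false; a later one may resume
        // Drop the pooled connection so the next request opens a new one
        // (which can then resume) instead of reusing this connection.
        transport.CloseIdleConnections()
    }
}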

These are the results with using concurrency 200:

./burn https://ego.jveres.me -c 200 --resume-tls

Burning https://ego.jveres.me for 60s

Statistical Analysis:
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+
|     Metric      |    p50    |    p95    |    p99    |   Min    |   Mean    |    Max    | Std. Dev  |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+
| Connect         | 119.983ms | 792.669ms | 858.084ms | 101µs    | 156.209ms | 870.497ms | 181.332ms |
| TLS Handshake   | 231.471ms | 951.613ms | 1.264892s | 65.207ms | 451.569ms | 3.574207s | 376.997ms |
| Headers written | 13µs      | 69µs      | 240µs     | 5µs      | 27µs      | 27.828ms  | 142µs     |
| Request written | 14µs      | 74µs      | 243µs     | 6µs      | 28µs      | 27.829ms  | 144µs     |
| TTFB            | 82.041ms  | 106.567ms | 141.763ms | 37.336ms | 84.464ms  | 1.172475s | 36.611ms  |
| Response read   | 82.911ms  | 107.977ms | 154.318ms | 37.581ms | 86.307ms  | 2.485734s | 42.062ms  |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+

Meta:
  Requests Count |         139946
  Time spent     | 1m0.003105912s
  RPS            |    2332.312601
  TLS Resumed    |              0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 651

Status code:
200 => 139946

Errors:
0 errors.

I have a certificate installed by fly.io for the ego.jveres.me domain, so I thought I’d also give https://egoweb.fly.dev a try:

./burn https://egoweb.fly.dev -c 100 --resume-tls

Burning https://egoweb.fly.dev for 60s

Statistical Analysis:
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+----------+
|     Metric      |    p50    |    p95    |    p99    |   Min    |   Mean    |    Max    | Std. Dev |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+----------+
| Connect         | 49.761ms  | 72.48ms   | 77.615ms  | 2.355ms  | 46.795ms  | 83.105ms  | 13.793ms |
| TLS Handshake   | 109.627ms | 174.191ms | 185.481ms | 57.363ms | 118.849ms | 188.577ms | 41.597ms |
| Headers written | 16µs      | 64µs      | 266µs     | 5µs      | 30µs      | 83.835ms  | 374µs    |
| Request written | 18µs      | 69µs      | 269µs     | 6µs      | 32µs      | 83.836ms  | 375µs    |
| TTFB            | 54.325ms  | 87.974ms  | 104.013ms | 37.105ms | 58.097ms  | 257.937ms | 14.575ms |
| Response read   | 55.447ms  | 89.949ms  | 107.202ms | 38.15ms  | 59.687ms  | 435.028ms | 18.513ms |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+----------+

Meta:
  Requests Count |         100999
  Time spent     | 1m0.000268214s
  RPS            |    1683.309142
  TLS Resumed    |              0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 162

Status code:
200 => 100999

Errors:
0 errors.

I cannot really reach or go above 3000 RPS.

FYI, I repeated this from a FRA DO droplet and got similar results (1906 RPS).

OK, I switched to a Rust-based static server, and I see an improvement in the test results:

./burn https://ego.jveres.me -c 200 --resume-tls

Burning https://ego.jveres.me for 60s

Statistical Analysis:
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+
|     Metric      |    p50    |    p95    |    p99    |   Min    |   Mean    |    Max    | Std. Dev  |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+
| Connect         | 51.896ms  | 1.59297s  | 1.628326s | 79µs     | 240.676ms | 1.649805s | 462.01ms  |
| TLS Handshake   | 428.181ms | 2.433003s | 5.976942s | 126.73ms | 830.464ms | 6.850112s | 898.717ms |
| Headers written | 11µs      | 75µs      | 262µs     | 5µs      | 28µs      | 18.731ms  | 174µs     |
| Request written | 12µs      | 77µs      | 265µs     | 5µs      | 29µs      | 18.732ms  | 176µs     |
| TTFB            | 47.469ms  | 70.896ms  | 91.753ms  | 23.921ms | 50.388ms  | 5.10318s  | 26.231ms  |
| Response read   | 48.062ms  | 72.296ms  | 103.138ms | 24.137ms | 51.414ms  | 5.153172s | 28.213ms  |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+

Meta:
  Requests Count |         235855
  Time spent     | 1m0.001367943s
  RPS            |    3930.827048
  TLS Resumed    |              0

Ciphers:
ECDHE ECDSA w/ AES_128_GCM_SHA256 => 913
 => 2

Status code:
200 => 235855

Errors:
0 errors.

Increasing concurrency above 300-500 pushes it a bit higher, to around 4600 RPS, but that’s the ceiling.
Still far from the numbers @kurt posted above.

The main difference here is that @kurt was testing from one of our own beefy (48 logical cores) servers, meaning connection latency was very low:

+-----------------+----------+----------+-----------+----------+----------+-----------+----------+
|     Metric      |   p50    |   p95    |    p99    |   Min    |   Mean   |    Max    | Std. Dev |
+-----------------+----------+----------+-----------+----------+----------+-----------+----------+
| Connect         | 752µs    | 20.379ms | 30.597ms  | 116µs    | 3.16ms   | 31.551ms  | 6.184ms  |

compared to your latest one:

+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+
|     Metric      |    p50    |    p95    |    p99    |   Min    |   Mean    |    Max    | Std. Dev  |
+-----------------+-----------+-----------+-----------+----------+-----------+-----------+-----------+
| Connect         | 51.896ms  | 1.59297s  | 1.628326s | 79µs     | 240.676ms | 1.649805s | 462.01ms  |

This affects the RPS greatly: there’s a large “base” latency added to every request.

Anyway, that’s where benchmarking becomes questionable. We can’t really compare our results with yours and vice versa. Too many factors at play.

There are also a lot of Fly settings that can affect this:

  • concurrency limits (we’ve tweaked these a lot for our own apps to find the sweet spot)
  • limits on connections vs requests (the latter should create fewer connections to your app, but comes with its own caveats)
  • where your app is deployed

If you’re benchmarking from your home connection or DO:

  • the number of threads available on the system makes a big difference
  • connect latency makes a big difference
  • network link stability (shouldn’t make a big difference here)

All that said, what’s your use case for Fly? If you’re evaluating providers in order to compare them, RPS might not be the best metric.

Just because I need to write this down somewhere: the concurrency setting on the benchmark tool limits the total RPS. The basic relationship is:

rps = concurrency * (1000 / request_time)

So with 100 concurrent test threads/workers/channels, 2000 requests per second is “correct” when requests take 50ms each.

Increasing the request rate means decreasing request time or increasing concurrency. The problem with increasing concurrency is it’ll use a bunch of CPU time on the test machine, which could itself become a bottleneck! My test was saturating 4 CPUs, if I remember right.
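
A tiny Go sketch of that relationship, using the numbers from this thread (nothing Fly-specific, just the arithmetic):

package main

import "fmt"

// maxRPS is the ceiling a load generator can reach when every request takes
// requestTimeMs milliseconds and `concurrency` requests are in flight at once.
func maxRPS(concurrency int, requestTimeMs float64) float64 {
    return float64(concurrency) * (1000.0 / requestTimeMs)
}

func main() {
    fmt.Println(maxRPS(100, 50)) // 2000: the worked example above
    fmt.Println(maxRPS(200, 50)) // 4000: doubling concurrency doubles the ceiling
    fmt.Println(maxRPS(100, 25)) // 4000: so does halving the request time
}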

@kurt Yes, that’s correct. I don’t see local concurrency being an issue in my tests.

@jerome Although RPS is not the only aspect of measuring throughput, I often have to work with it as a non-functional requirement, i.e. a scalability measure. Regarding my use case: I measured the upper limits of a low-end auto-scaling setup with a simple static use case and got nice results using both Go and Rust static servers. My other Deno-based dynamic setup performs similarly well. I was also curious how auto-scaling handles spikes. I think I’ve now shared all my results here. Thanks for the fine-tuning support!
