gRPC 4s delay while benchmarking

electwix · February 24, 2024, 3:05pm

Hi everyone,

I’m new to Fly.io and excited about its potential! I’ve been using ghz to test my gRPC service, but I’m running into some latency issues.

Out of 100 requests, 4 are taking significantly longer (4.83s) compared to the average response time of around 340ms. My machine is running smoothly during the tests, so I’m hoping you might have some insights into what could be causing this behavior.

For further investigation, I’ve set up a private test instance accessible at https://electwix-grpc-ping-test.fly.dev/. This service uses reflection, so feel free to test it without needing proto files.

Thanks in advance for any help!

jerome · February 24, 2024, 3:26pm

When did you conduct the test? I can’t see many metrics for your app.

You haven’t set a concurrency limit, so it uses the default hard limit of 25. Once this threshold is reached, connections or requests get queued. Docs: Fly Launch configuration (fly.toml) · Fly Docs

I’d recommend setting a much higher limit.

electwix · February 24, 2024, 3:50pm

Hi, thanks for the info about the concurrency limit. I updated it to 100 and tested again just now.

The command I’m using for the tests is:

ghz --proto=grpc_ping/message.proto --call=User.UserService/Ping -d '{"text": "hello"}' -n 30 -c 2 --rps 3 electwix-grpc-ping-test.fly.dev:443

message.proto

syntax = "proto3";

option go_package = "github.com/ElecTwix/grpc-ping";
// import "google/protobuf/empty.proto";
package User;

service UserService { rpc Ping(Request) returns (Request); }

message Request { string text = 1; }

result

  Count:        100
  Total:        36.51 s
  Slowest:      5.14 s
  Fastest:      29.31 ms
  Average:      756.96 ms
  Requests/sec: 2.74

Response time histogram:
  29.310   [1]  |
  540.727  [83] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  1052.144 [0]  |
  1563.561 [0]  |
  2074.977 [0]  |
  2586.394 [0]  |
  3097.811 [0]  |
  3609.228 [0]  |
  4120.644 [1]  |
  4632.061 [8]  |∎∎∎∎
  5143.478 [7]  |∎∎∎

Latency distribution:
  10 % in 29.56 ms 
  25 % in 29.71 ms 
  50 % in 29.85 ms 
  75 % in 31.45 ms 
  90 % in 4.48 s 
  95 % in 4.81 s 
  99 % in 5.14 s 

Status code distribution:
  [OK]   100 responses

Maybe it’s because of my network. I’m not sure, so I will test on another network as well and report back if that turns out to be the problem.
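One way to read the histogram above is to split it into a fast group and a slow group. The sketch below does that with the bucket values copied from the ghz output. It assumes each bucket label is the bucket's upper bound in milliseconds, and the 1000 ms threshold and the function name are arbitrary choices for illustration, not anything ghz provides.

```python
# Histogram buckets copied from the ghz output above: (label in ms, count).
# Assumption: each label is treated as the upper bound of its bucket.
BUCKETS = [
    (29.310, 1), (540.727, 83), (1052.144, 0), (1563.561, 0),
    (2074.977, 0), (2586.394, 0), (3097.811, 0), (3609.228, 0),
    (4120.644, 1), (4632.061, 8), (5143.478, 7),
]


def split_by_threshold(buckets, threshold_ms):
    """Return (fast, slow): request counts at or below / above threshold_ms."""
    fast = sum(count for upper, count in buckets if upper <= threshold_ms)
    slow = sum(count for upper, count in buckets if upper > threshold_ms)
    return fast, slow


# The 1000 ms cut-off is a hypothetical threshold picked for this example.
fast, slow = split_by_threshold(BUCKETS, threshold_ms=1000.0)
print(f"fast: {fast}, slow: {slow}, slow share: {slow / (fast + slow):.0%}")
# prints "fast: 84, slow: 16, slow share: 16%"
```

With these numbers, 84 requests finish within roughly half a second and 16 take more than 4 seconds, with nothing in between. A gap like that may point at a fixed delay hitting some requests rather than a service that slows down gradually, but the histogram alone does not show where the delay comes from.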

jerome · February 24, 2024, 5:54pm

Thanks for the info.

Are you starting the benchmark cold? It appears your machine is set to autostop and autostart. If you’re launching the benchmark while the machine is not in a started state, then the proxy will queue connections while the machine is starting.

You could also try using our http handler with the h2_backend setting. This would give you (and us) more visibility into your app metrics. It would also allow you to use request-based concurrency, which gives a small speed boost by pooling connections.

[[services]]
  protocol = "tcp"
  internal_port = 54321

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]
    tls_options.alpn = [ "h2" ]
    http_options.h2_backend =  true

  [services.concurrency]
    type = "requests"
    soft_limit = 75
    hard_limit = 100

Another thing I noticed was that the connections are coming from multiple regions. Your machine is deployed in OTP (Bucharest). If you’re running the benchmark from your computer, can you check which region you’re reaching: curl https://debug.fly.dev (look for Fly-Region in the body)
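If you want to repeat the region check from several machines or networks, a small script may be easier than reading the curl output by hand. The sketch below is only an illustration: the function names are made up, and it assumes the response body lists headers one per line in `Name: value` form, which may not match the actual output of debug.fly.dev exactly.

```python
import urllib.request


def parse_fly_region(body):
    """Return the value of the first 'Fly-Region' line in body, or None.

    Assumption: the body contains lines shaped like 'Fly-Region: xyz'.
    """
    for line in body.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "fly-region":
            return value.strip()
    return None


def fetch_fly_region(url="https://debug.fly.dev", timeout=10.0):
    """Fetch the debug page and extract the region (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return parse_fly_region(body)
```

Calling `fetch_fly_region()` from each network you benchmark from, and comparing the result with the region the machine is deployed in, would show whether requests enter Fly's network somewhere unexpected.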

This is making me think we need to write some docs on how to properly fine tune and benchmark Fly apps.

electwix · February 24, 2024, 7:00pm

It seems that the issue I was experiencing was related to my network. After testing with 5-6 different machines, using other ISPs and connecting from different countries, it appears that the problem is with the ISP. I apologize for not testing with multiple devices before reaching out. Thank you for your assistance.

system · March 2, 2024, 7:01pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
