I’m new to Fly.io and excited about its potential! I’ve been using ghz to test my gRPC service, but I’m running into some latency issues.
Out of 100 requests, 4 are taking significantly longer (4.83s) compared to the average response time of around 340ms. My machine is running smoothly during the tests, so I’m hoping you might have some insights into what could be causing this behavior.
For further investigation, I’ve set up a test instance at https://electwix-grpc-ping-test.fly.dev. The service has gRPC reflection enabled, so feel free to test it without needing the proto files.
When did you conduct the test? I can’t see many metrics for your app.
You haven’t set a concurrency limit, so your app uses the default hard limit of 25. Once this threshold is reached, connections or requests get queued. Docs: Fly Launch configuration (fly.toml) · Fly Docs
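For reference, the limit is set in the services.concurrency section of fly.toml. This is a sketch only; the port numbers and limit values below are illustrative, not a recommendation:

```toml
[[services]]
  internal_port = 9000
  protocol = "tcp"

  [services.concurrency]
    type = "connections"   # or "requests" when using the http handler
    soft_limit = 80        # above this, the proxy prefers other machines
    hard_limit = 100       # above this, the proxy queues new connections
```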
Hi, thanks for the info about the concurrency limit; I’ve updated it to 100.
I tested again just now.
The command I’m using for tests is ghz --proto=grpc_ping/message.proto --call=User.UserService/Ping -d '{"text": "hello"}' -n 30 -c 2 --rps 3 electwix-grpc-ping-test.fly.dev:443
Count: 100
Total: 36.51 s
Slowest: 5.14 s
Fastest: 29.31 ms
Average: 756.96 ms
Requests/sec: 2.74
Response time histogram:
29.310 [1] |
540.727 [83] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1052.144 [0] |
1563.561 [0] |
2074.977 [0] |
2586.394 [0] |
3097.811 [0] |
3609.228 [0] |
4120.644 [1] |
4632.061 [8] |∎∎∎∎
5143.478 [7] |∎∎∎
Latency distribution:
10 % in 29.56 ms
25 % in 29.71 ms
50 % in 29.85 ms
75 % in 31.45 ms
90 % in 4.48 s
95 % in 4.81 s
99 % in 5.14 s
Status code distribution:
[OK] 100 responses
Maybe it’s my network; I’m not sure. I’ll test from another network to be sure, and I’ll report back if that turns out to be the problem.
Are you starting the benchmark cold? It appears your machine is set to autostop and autostart. If you’re launching the benchmark while the machine is not in a started state, then the proxy will queue connections while the machine is starting.
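To rule out cold starts during a benchmark, the autostop/autostart behavior can be tuned in fly.toml. A sketch, assuming the app uses an http_service section (the exact values are illustrative):

```toml
[http_service]
  auto_stop_machines = "off"   # or "stop"/"suspend" to allow stopping
  auto_start_machines = true
  min_machines_running = 1     # keep one machine warm in the primary region
```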
You could also try using our http handler with the h2_backend setting. This would give you (and us) more visibility into your app’s metrics. It would also allow you to use request concurrency, which gives a small speed boost by pooling connections.
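A sketch of what that could look like in fly.toml, routing gRPC through the http handler with an HTTP/2 backend. The port numbers are assumptions for illustration:

```toml
[[services]]
  internal_port = 9000
  protocol = "tcp"

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]

    # Tell the http handler to speak HTTP/2 (h2c) to the app,
    # which gRPC requires end to end.
    [services.ports.http_options]
      h2_backend = true
```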
Another thing I noticed was that the connections are coming from multiple regions. Your machine is deployed in OTP (Bucharest). If you’re running the benchmark from your computer, can you check which region you’re reaching? Run curl https://debug.fly.dev and look for Fly-Region in the response body.
This is making me think we need to write some docs on how to properly fine tune and benchmark Fly apps.
It seems the issue I was experiencing was related to my network. After testing with 5–6 different machines, on other ISPs and from different countries, it appears the problem is with my original ISP. I apologize for not testing with multiple devices before reaching out. Thank you for your assistance.