My app is running a bit slowly, so I am looking into my server on Fly.io, which I simply use as a poke/pull endpoint.
On the Fly.io dashboard, latency looks like about 130ms, which seems very high when I expected single-digit milliseconds. When I click into the Grafana metrics, though, it looks like about 1-5ms.
What is the difference between the two and what should I be considering as the true latency of pinging my Fly.io server?
fly_edge_http_response_time_seconds is measured on the edge. It’s the total time it took for a request to get processed including edge <-> worker communication (possibly forwarding to a different region).
HTTP Response Times on fly-metrics.net shows fly_app_http_response_time_seconds. It’s measured on a worker and shows your app response time.
Hmm. The request that you made with the flyio-debug: doit header set took ~6ms to complete. The proxy received it at 22:26:56.798843000 and responded at 22:26:56.804672000.
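For anyone checking the arithmetic, here is a minimal Python sketch that subtracts the two proxy timestamps quoted above. The helper name `proxy_duration_ms` is made up for illustration, and it assumes both timestamps fall on the same day.

```python
def proxy_duration_ms(received: str, responded: str) -> float:
    """Elapsed milliseconds between two HH:MM:SS.fraction timestamps.

    Assumes both timestamps are on the same day (hypothetical helper).
    """
    def to_ns(stamp: str) -> int:
        clock, frac = stamp.split(".")
        hours, minutes, seconds = (int(part) for part in clock.split(":"))
        whole_seconds = (hours * 60 + minutes) * 60 + seconds
        # Pad the fractional part to nanosecond precision before converting.
        return whole_seconds * 1_000_000_000 + int(frac.ljust(9, "0"))

    return (to_ns(responded) - to_ns(received)) / 1_000_000


# Timestamps quoted in the post above.
elapsed = proxy_duration_ms("22:26:56.798843000", "22:26:56.804672000")
print(f"{elapsed:.3f} ms")  # prints "5.829 ms"
```

That is where the ~6ms figure comes from.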
Could you try something else, please?
Save the following data to a file, for example, /tmp/curlformat:
The file instructs curl to dump various timings and response headers. Once you see a request that took too much time (time_total), please post the fly-request-id response header value here, so I can look it up in the logs to see what took so long.
Here’s the output from the curl command you provided, with fly-request-id 01JHNX6S1BW8JVCCYG28CGQBWN-chi. I just ran this once and seem to get back 450ms for time_total. Am I supposed to keep running this command until I get a slower response, or is this what you need?
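A rough Python sketch of the same kind of check, in case a script is easier to rerun than a format file. It is not the file referred to above: the list of fields is an assumption built from curl's standard --write-out timing variables, and the function names (`build_write_out`, `parse_output`, `measure`) are made up for illustration.

```python
import subprocess

# Standard curl --write-out timing variables. Which ones the original
# format file used is an assumption.
TIMING_FIELDS = (
    "time_namelookup",
    "time_connect",
    "time_appconnect",
    "time_starttransfer",
    "time_total",
)


def build_write_out(fields=TIMING_FIELDS):
    """Build a -w format string with one "name=%{name}" pair per line."""
    # curl expands %{name} and turns the two characters "\n" into a newline.
    return "".join(f"{name}=%{{{name}}}\\n" for name in fields)


def parse_output(text):
    """Split curl's stdout into a timings dict and the fly-request-id value."""
    timings = {}
    request_id = None
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in TIMING_FIELDS:
            timings[key] = float(value)
        elif line.lower().startswith("fly-request-id:"):
            request_id = line.split(":", 1)[1].strip()
    return timings, request_id


def measure(url):
    """Run curl once: discard the body, dump headers and timings to stdout."""
    result = subprocess.run(
        ["curl", "-s", "-o", "/dev/null", "-D", "-", "-w", build_write_out(), url],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_output(result.stdout)
```

Calling `measure()` with your app's .fly.dev URL in a loop and keeping the request id of any run with a large `time_total` gives the value asked for above.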
This one actually got routed to an edge server in chi and later forwarded to your app in sea.
From the proxy’s point of view it took ~106ms, which is expected. RTT between chi and sea is ~50ms. If the proxy doesn’t already have an established connection between the edge server and the worker server for your app, it needs to establish one (that’s one RTT) and then needs another RTT to send the request.
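As a back-of-the-envelope check of that reasoning, here is a small Python sketch. The function name is hypothetical, and the ~6ms of in-region handling time is an assumption borrowed from the earlier request in this thread, not something measured for this one.

```python
def forwarded_request_estimate_ms(rtt_ms: float, app_ms: float,
                                  connection_established: bool) -> float:
    """Estimate proxy-side time for a request forwarded between regions.

    One RTT carries the request and response; a second RTT is added when
    the edge -> worker connection has to be established first.
    """
    round_trips = 1 if connection_established else 2
    return round_trips * rtt_ms + app_ms


RTT_CHI_SEA_MS = 50.0  # approximate figure quoted above
APP_MS = 6.0           # assumption: taken from the earlier ~6ms request

cold = forwarded_request_estimate_ms(RTT_CHI_SEA_MS, APP_MS, connection_established=False)
warm = forwarded_request_estimate_ms(RTT_CHI_SEA_MS, APP_MS, connection_established=True)
print(cold, warm)  # prints "106.0 56.0"
```

Under these assumptions the cold-connection case lands close to the ~106ms observed, while a request that stays in sea would skip the cross-region round trips entirely.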
Given that your previous request got routed to sea, I assume chi is unexpected. The previous one was over IPv6, this one is over IPv4. Is this a different internet provider?
Did we find any actionable information? Let me know if you need anything else from me. Our production usage from users this weekend still shows 150ms+ queries. I would love to get back to the expected low latencies.
Not trying to hijack this thread, but I’m seeing extremely high latency in bos: 4sec+ of added latency above what my instance is reporting via Sentry.
If you have access to a client that can reproduce high end-to-end latencies, sending us a traceroute (ideally using mtr -bzo "LSD NABWV" -r -n [ip] for the most helpful formatting/info) could help us identify any sub-optimal routing and potentially improve things.
The AS* field gives Fly.io a company to contact in case one of the hops looks wrong, I believe. (Like sending Seattle → Seattle traffic over to Chicago first.)
If you could find one of those Wi-Fi spots that was giving bad results again and then invoke mtr from your laptop, while pointing it at your .fly.dev address (i.e., as [ip]), that would provide similar details.
The second traceroute gave the info we needed to confirm a routing issue: traffic from Shaw ISP is being incorrectly routed through Chicago edges instead of directly to Seattle. We’re looking into this and I’ll let you know when we manage to apply any routing fixes for this issue.
I ran the commands on the premises of our production users’ venue, on their Wi-Fi network, where our performance matters. Would you please be able to check for any inefficient routing and then contact the ISP to resolve it, as you are doing now? It appears it may also be incorrectly routing through Chicago.