Metrics for debugging connection losses

mario1 · November 27, 2024, 11:23am

I have an app that serves as the gRPC endpoint for long-running processes. From time to time, I see connection losses that can last quite a long time. In the Grafana metrics, the instance is up and apparently working fine (metrics like fly_instance_memory_mem_free look okay), but the metrics relating to the app and the edge (e.g., fly_app_concurrency, fly_edge_data_in) have a gap. For example, I recently had such a gap lasting about 15 minutes (on 2024-11-12 in the AMS region). This is much longer than the disconnection timeouts that I want to set in my app.
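
To pin down exactly when such gaps start and end, one option is to export the sample timestamps of an affected series and scan them for holes. The following is only a minimal sketch, assuming the timestamps have already been fetched by some other means. The helper name find_gaps, the one-minute sample spacing, and the sample times are all hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_interval):
    """Return (start, end) pairs where consecutive samples are
    further apart than max_interval."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_interval:
            gaps.append((prev, cur))
    return gaps


# Hypothetical sample times for a series such as fly_app_concurrency:
# one sample per minute, with a hole in the middle (made-up data).
start = datetime(2024, 11, 12, 10, 0)
minutes = list(range(0, 5)) + list(range(20, 25))
samples = [start + timedelta(minutes=m) for m in minutes]

# Treat anything longer than two minutes between samples as a gap.
gaps = find_gaps(samples, max_interval=timedelta(minutes=2))
for gap_start, gap_end in gaps:
    print(gap_start.isoformat(), "->", gap_end.isoformat(), gap_end - gap_start)
```

Running the same check on an instance-level series (e.g., fly_instance_memory_mem_free) and on an edge-level series (e.g., fly_edge_data_in) would show whether the gap is confined to the app and edge metrics, as described above.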

Is anyone else seeing such connection losses? What kind of metrics are you looking at to help debug the issue? If the problem is not in my configuration, I guess this is an internal reliability issue, and there is not much I can do.

system · December 4, 2024, 2:54pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
