Metrics Dashboard not populating

I have a setup where an HAProxy instance proxies to my real backend server, which uses actix-web (Rust). Accordingly, my main API app instance is bound with .bind(("::", 8080)). As a side effect of this, many of the graphs in my Fly.io Metrics dashboard aren't getting populated (e.g. HTTP Status Code, HTTP Response Times, etc.). I've tried setting up a metrics-dedicated server in the same instance, bound with .bind(("0.0.0.0", 9091)), but no dice, even though the /metrics endpoint works as expected.

Where could I be going wrong here?

fly.toml:

app = '<redacted>'
primary_region = '<redacted>'

[build]

[env]
  PORT = '8080'

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 1
  processes = ['app']

  [[http_service.ports]]
    handlers = ["http"]
    port = 80

  [[http_service.ports]]
    handlers = ["tls", "http"]
    port = 443

[metrics]
  port = 9091
  path = "/metrics"

[[vm]]
  memory = '512mb'
  cpu_kind = 'shared'
  cpus = 1

My actix-web main() method:

use actix_web::{get, App, HttpServer, Responder};
use actix_web_prom::PrometheusMetricsBuilder;

#[get("/")]
async fn hello() -> impl Responder {
    "Hello from fly.io!"
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {

    // One PrometheusMetrics instance shared between both servers; it
    // implements Clone, and clones share the same underlying registry,
    // so no Arc is needed.
    let prometheus = PrometheusMetricsBuilder::new("app")
        .endpoint("/metrics")
        .build()
        .unwrap();

    // Metrics-only server on the port the [metrics] section of fly.toml points at.
    let metrics_server = HttpServer::new({
        let prometheus = prometheus.clone();
        move || App::new().wrap(prometheus.clone())
    })
    .bind(("0.0.0.0", 9091))?
    .run();

    // Main API server, bound on IPv6 so HAProxy can reach it over 6PN.
    let api_server = HttpServer::new(move || {
        App::new()
            .wrap(prometheus.clone())
            .service(hello)
    })
    .bind(("::", 8080))?
    .run();

    tokio::try_join!(api_server, metrics_server)?;
    Ok(())
}

Sample curl-ing:

% curl "https://<redacted>/v1/"
Hello from fly.io!
% curl "https://<redacted>/v1/"
Hello from fly.io!
% curl "https://<redacted>/v1/metrics"
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.005"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.01"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.025"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.05"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.1"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.25"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="0.5"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="1"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="2.5"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="5"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="10"} 5
app_http_requests_duration_seconds_bucket{endpoint="/",method="GET",status="200",le="+Inf"} 5
app_http_requests_duration_seconds_sum{endpoint="/",method="GET",status="200"} 0.00016875
app_http_requests_duration_seconds_count{endpoint="/",method="GET",status="200"} 5
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.005"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.01"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.025"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.05"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.1"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.25"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="0.5"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="1"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="2.5"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="5"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="10"} 44
app_http_requests_duration_seconds_bucket{endpoint="/metrics",method="GET",status="200",le="+Inf"} 44
app_http_requests_duration_seconds_sum{endpoint="/metrics",method="GET",status="200"} 0.0026970429999999997
app_http_requests_duration_seconds_count{endpoint="/metrics",method="GET",status="200"} 44
# HELP app_http_requests_total Total number of HTTP requests
# TYPE app_http_requests_total counter
app_http_requests_total{endpoint="/",method="GET",status="200"} 5
app_http_requests_total{endpoint="/metrics",method="GET",status="200"} 44

It looks like you're accessing /v1/metrics rather than /metrics (the path defined in your fly.toml) in your test; maybe that's the issue.
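One way to sanity-check what the Fly metrics collector actually scrapes is to hit the [metrics] endpoint from inside the Machine itself, e.g. (assuming curl is available in your image):

% fly ssh console -C "curl -s http://localhost:9091/metrics"

If that returns your app_* series, the endpoint itself is fine and the problem is somewhere else in the pipeline.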

In this sample, I've temporarily added the /v1/metrics endpoint to my regular server for the sake of easier debugging. metrics_server is a separate server running in the same process, and the URL for that is just a plain /metrics on port 9091.

That said, I've made progress and have learned a lot about metrics on Fly.io in the process! It turns out the immediate issue here is that the middleware library I'm using doesn't automatically emit the various Fly.io-specific metrics that are displayed in the Metrics dashboard (e.g. fly_edge_http_responses_count). I've manually added code to do that, and now I'm finally seeing data!
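For anyone landing here later, here's a rough sketch of what that manual step can look like with the prometheus crate: register an extra series under the dashboard's metric name on the same registry the middleware already serves. The "status" label is my assumption, not a documented contract, so match it to whatever the dashboard query expects:

use prometheus::{IntCounterVec, Opts};

// Hypothetical: a counter named after the dashboard's fly_edge series.
let edge_responses = IntCounterVec::new(
    Opts::new("fly_edge_http_responses_count", "HTTP responses by status"),
    &["status"], // assumed label set
)
.unwrap();

// PrometheusMetrics exposes its registry publicly, so the new series
// shows up on the same /metrics endpoint the middleware serves.
prometheus
    .registry
    .register(Box::new(edge_responses.clone()))
    .unwrap();

// Increment it wherever responses are produced:
edge_responses.with_label_values(&["200"]).inc();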

However, this still raises the question: why aren't these metrics generated automatically by Fly.io itself when my backend serves requests over IPv6 (by way of HAProxy and 6PN)?

Ah, in that case, I think the behavior you're seeing can happen when HAProxy sends requests to your Machine directly over its 6PN address (.internal) rather than through the fly-proxy service. (In fact, the http_service config isn't used at all for direct 6PN traffic.) If you use a Flycast address (.flycast) instead, requests will go through the fly-proxy layer and will then be tracked by the fly_edge_* metrics. In many cases, using Flycast makes the HAProxy layer redundant altogether. Hopefully this helps clarify things a bit further for you.
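Concretely, that switch might look something like this (a sketch; the backend/server names are placeholders). First allocate a private (Flycast) IPv6 address for the app:

% fly ips allocate-v6 --private

Then point HAProxy's backend at the .flycast hostname on one of the service ports from fly.toml instead of .internal:8080:

backend fly_app
    # port 80 matches the http handler in [[http_service.ports]];
    # note that with force_https = true, plain HTTP on port 80 gets a
    # redirect, so you may need to adjust that for Flycast traffic.
    server app1 <redacted>.flycast:80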

That did the trick! Thank you, @wjordan!
