@jerome Picking up my work on Grafana and Prometheus where I left off last year.
Quick question: I’m running 2 apps with volumes attached but don’t seem to be able to find fly_volume_size_bytes as documented here: Metrics on Fly. Any ideas?
proxy_id - this makes sure the counters reset when we deploy; otherwise it leads to some weird metrics. It’s possible for 2 proxies to be running concurrently while we gracefully shut down connections during a deploy.
fly_app_concurrency - this is the current number of connections (or requests, depending on your concurrency config) established to your app instance
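To make those semantics concrete, here’s a sketch of how you might query them. The gauge name is from above; the counter name and label set are placeholders for whatever counter you’re actually charting:

```promql
# Current connections/requests per app (fly_app_concurrency is a gauge):
sum by(app) (fly_app_concurrency{app="my-app"})

# For counters, aggregate rate() without grouping by proxy_id: rate()
# absorbs the per-series resets, and summing over proxy_id covers the
# window where two proxies run concurrently during a deploy.
# (some_counter_total is a placeholder metric name.)
sum by(app) (rate(some_counter_total{app="my-app"}[5m]))
```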
Last question (I hope) related to metrics: does the fly-cache-status Cache Hits panel in Metrics require the HTTP handler, or does it work regardless? I have a custom cache status header I can change, which would let me easily expose a cache hit ratio in Grafana.
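For reference, once the status ends up as a label on a counter, the hit ratio panel is just a quotient of rates. The metric and label names here are placeholders for whatever your exporter emits:

```promql
sum(rate(http_responses_total{cache_status="hit"}[5m]))
  /
sum(rate(http_responses_total[5m]))
```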
Edit: I have two apps, one staging single-instance and one production with 2 instances. I’m getting fly_volume_used_pct data for the single-instance app, but not for the multi-instance app. The query is basic:
Works:
max by(region) (fly_volume_used_pct{app=~"jt-web-staging", region=~".*", host=~".*"})
Fails:
max by(region) (fly_volume_used_pct{app=~"jt-web-production", region=~".*", host=~".*"})
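One way to narrow this down is to drop the aggregation and the wildcard matchers and look at the raw series, which shows whether any fly_volume_used_pct series exist for the app at all and which labels they carry:

```promql
fly_volume_used_pct{app="jt-web-production"}
```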
Hi, I am having an issue where my custom metrics are not appearing in managed Grafana or Prometheus.
/metrics on port 9091 as defined in fly.toml is definitely getting hit every 15s, but no new metrics have appeared in the metrics browser, nor when I query the Prometheus API at https://api.fly.io/prometheus/ORG_NAME/api/v1/label/__name__/values.
This is an example line from the output at the app metrics endpoint: my_metric_name{process="myAppName"} 525447.
The app in question is a multi-process app, where the metrics endpoint is only exposed for one process out of two.
Would appreciate any advice you can offer for debugging this issue. Many thanks.
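For anyone reproducing this, here is a minimal hand-rolled sketch of a /metrics endpoint using only Python’s standard library (in practice you’d use an official client library). The metric name, label, and value are just the examples from the post above, and port 9091 matches the fly.toml config mentioned there:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Exposition text matching the example line from the post.
EXPOSITION = (
    '# HELP my_metric_name An example app metric\n'
    '# TYPE my_metric_name counter\n'
    'my_metric_name{process="myAppName"} 525447\n'
)

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != '/metrics':
            self.send_response(404)
            self.end_headers()
            return
        body = EXPOSITION.encode()
        self.send_response(200)
        # Content type for the classic Prometheus text format.
        self.send_header('Content-Type', 'text/plain; version=0.0.4; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep scrape logging quiet
        pass

# Bind the port the fly.toml metrics section declares, then serve in the
# background (in a real app this just runs for the process lifetime).
server = HTTPServer(('0.0.0.0', 9091), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Self-check, fetching the endpoint the same way the scraper would:
print(urllib.request.urlopen('http://127.0.0.1:9091/metrics').read().decode())
```

If the metric shows up here but not in the hosted Grafana, the exposition format itself is the next thing to check.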
Thanks for the reply! The server was bound correctly; there was just an issue with my custom exporter implementation, which I have now resolved. It would be cool if scraping errors could be exposed somehow. If anyone else hits this, promtool, which ships with Prometheus, is really useful for diagnosing these sorts of issues, e.g. curl -s http://0.0.0.0:9091/metrics | promtool check metrics
hihi not sure if this is the right place to spam but here goes: I wanted to try Fly.io metrics out. I think I’ve got a service correctly deployed + barfing metrics and my fly.toml is configured correctly.
My problem is this link doesn’t work: Sign In · Fly I created this org less than an hour ago.
There’s a known issue where new/updated orgs aren’t synchronized with the hosted Grafana instance (the sync only happens on a session refresh), so if you’re already signed in it can take a couple of hours to get updated. We’re working on a fix, but in the meantime you can manually refresh your session by going to fly-metrics.net/logout, which will force an update.
It looks recovered at this point; Fly Metrics’ own metrics show an impact between 2022-10-28T17:14:00Z and 2022-10-28T19:58:00Z.
We plan to eventually make this service more reliable with a replicated database cluster but haven’t gotten to that point yet, which is why it was affected by this single-host issue. Sorry for the inconvenience!
Currently, it seems the Prometheus collector does not support OpenMetrics, and more specifically its info type. When using this type (defined as Info in the official prometheus_client Rust crate), no metrics appear on the hosted Grafana instance (the scrape fails). When any reference to the Info type is removed from the returned metrics, scraping succeeds.
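Until the scraper understands the info type, the usual workaround is to expose the same data as a constant gauge. The two exposition forms look roughly like this (the build_info metric name and version label are placeholders):

```
# OpenMetrics: an `info` metric family named `build`
# TYPE build info
build_info{version="1.2.3"} 1
# EOF

# Classic Prometheus text format: no info type, so a constant gauge instead
# TYPE build_info gauge
build_info{version="1.2.3"} 1
```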
It would be nice for Fly to:
1. somehow convey metrics-scraping errors to the user, either through fly logs or on the dashboard; and/or
2. add support for OpenMetrics / the info type to the metrics scraper.
As for 2: from my testing, the latest version of VictoriaMetrics supports OpenMetrics and this type, provided the content type is returned as application/openmetrics-text.