Prometheus / Fly metrics getting dropped

bekit · November 16, 2021, 7:56pm

I know the Prometheus metrics are in early access, but I was wondering if there are any known ongoing issues with them. I’m noticing that a lot of data points seem to be getting dropped when displaying the data in Grafana, and a lot of my graphs end up with visible gaps.

This seems to be the case both for metrics generated by apps and for the default Fly metrics. If I had to guess, the Prometheus nodes aren’t keeping up and are missing scrapes, sometimes for several minutes at a time. Looking back at the history, there seem to be occasional blips (which makes sense for an early access service), but it seems to have gotten significantly worse in all regions starting around 11am PST yesterday (November 15th).
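When a scrape is missed, the series simply has no sample for that interval, so the hole shows up as consecutive sample timestamps that are further apart than the scrape interval. Below is a minimal sketch of spotting those holes, assuming you can get the raw sample timestamps for one series (for example from an instant query of a range selector such as `up[1h]`) and know the scrape interval. The `find_gaps` helper, the 1.5x tolerance and the 15-second interval in the example are hypothetical choices, not Fly's actual configuration.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float],
    scrape_interval: float,
    tolerance: float = 1.5,
) -> List[Tuple[float, float, int]]:
    """Return (last_seen, next_seen, missed) for each suspected gap.

    A gap is any pair of consecutive samples further apart than
    tolerance * scrape_interval; `missed` estimates how many scrapes
    fell in between.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, curr in zip(ordered, ordered[1:]):
        delta = curr - prev
        if delta > tolerance * scrape_interval:
            # e.g. a 90 s hole at a 15 s interval means 5 scrapes never landed
            missed = round(delta / scrape_interval) - 1
            gaps.append((prev, curr, missed))
    return gaps


if __name__ == "__main__":
    # Hypothetical series scraped every 15 s, with a 90 s hole.
    print(find_gaps([0, 15, 30, 120, 135], 15))  # [(30, 120, 5)]
```

The tolerance keeps ordinary scrape jitter from being counted as a missed scrape.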

charsleysa · November 16, 2021, 9:59pm

It does make keeping track of live nodes a bit difficult.

avinashbot · November 17, 2021, 12:19am

Also checking in with the same issue since about 24 hours ago (varying by a few hours depending on the app). In my case, it looks like the default fly metrics are flaky too, but custom metrics are just not showing up.

All my instances are in FRA, if that helps.

kurt · November 17, 2021, 1:00am

We have been furiously expanding this metrics cluster. The short answer on gaps like this is: Our metrics cluster is growing quickly and we aren’t ahead of it. We will charge for it when we’re comfy we know how to keep up, which will slow growth, but we’re not comfortable charging money for metrics at the current level of reliability.

For the moment the best advice I have is “hold tight”.

bekit · November 17, 2021, 6:36pm

That makes sense. If we were to bring up our own prometheus instance on fly, is there a way to scrape the fly metrics?

sudhir.j · November 17, 2021, 6:42pm

~~The data should be available via the API: Hooking Up Fly Metrics · Fly — that doesn’t replace the Fly system, though, which I think is actually what you’re asking?~~

~~If you wanted to completely bypass Fly you could configure your applications to send data straight out to an external service or to an internal app using the app.internal endpoint.~~

kurt · November 17, 2021, 7:37pm

You can scrape your own metrics, assuming you’re using custom metrics and your exporter is listening on the private IPv6 addresses. We don’t have Prometheus endpoints available for scraping, though.
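For the self-scraping route, the exporter has to be reachable over IPv6, not just IPv4. Below is a minimal standard-library-only sketch of an exporter that binds to `::` (all IPv6 addresses on the machine, which should include the private one) and serves a counter in the Prometheus text exposition format. The metric name `app_requests_total` and port 9091 are hypothetical placeholders; a real app would more likely use an official Prometheus client library.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical counter; a real app would increment it from its own code paths.
REQUEST_COUNT = 0
LOCK = threading.Lock()


def render_metrics() -> str:
    """Render the current metrics in the Prometheus text exposition format."""
    with LOCK:
        value = REQUEST_COUNT
    return (
        "# HELP app_requests_total Total requests handled (hypothetical metric).\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class IPv6Server(ThreadingHTTPServer):
    # Use an IPv6 socket so the private IPv6 address can reach the exporter.
    address_family = socket.AF_INET6


def make_server(port: int = 9091) -> IPv6Server:
    # "::" binds every IPv6 address on the machine.
    return IPv6Server(("::", port), MetricsHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Your own Prometheus instance would then scrape `http://[<private IPv6 address>]:9091/metrics` for each instance.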

That might be a good feature to add. One way we can improve the reliability of metrics for paid users would be to just run dedicated Victoria Metrics clusters.

charsleysa · November 17, 2021, 11:45pm

Seems to be working much better today!

bekit · November 18, 2021, 12:14am

Yeah, if it’s going to be a paid feature, I’d be fine with it being treated similarly to Postgres, where we just run our own instance as a Fly app under our own organization.

As @charsleysa said, it does seem to be much more stable as of this morning. Glad to see it catching up to the demand.

bekit · December 29, 2021, 7:44pm

@kurt Not a huge priority, but I wanted to mention that it looks like the metrics endpoints aren’t keeping up with data again. It looks like we started getting bits of missing data on the 27th, continuing through today.
