Is there a 1MB limit for metrics processing?

I’m observing a pattern where, if the Prometheus scraping response size goes beyond approximately 1MB, the metrics don’t show up in the dashboards (custom ones). Is there a soft limit on scraping?

It doesn’t seem to be tied to specific metric pivots or labels, since everything works up to a certain point, but I’m not certain yet.

I’ve also validated that the metrics URL works fine and responds without much delay. I recently added more pivots and that’s when I started seeing this issue, so it looks like it has something to do with the response size.

I tried to debug this a bit further by just dumping the /metrics endpoint result to disk every 30s and checking what’s happening at the time the dashboard stops populating data. I did this twice at different times of the day and the results were nearly identical. When data stops flowing into the dashboard, the metrics details are:

  1. Compressed size on disk (output of a curl request with accept-encoding: gzip, trying to replicate the request that the Prometheus scraper would send) is about 885KB
  2. Uncompressed size is around 33MB
  3. Total metrics count (roughly the output of grep -v '# ' -c) is around 172180
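
For anyone who wants to reproduce the three numbers above without curl and grep, here is a rough Python sketch. The function names are made up for illustration, and the gzip size is computed locally, so it may differ a little from what the server actually sends. Unlike the grep above, it also skips blank lines when counting samples.

```python
import gzip
import urllib.request


def summarize_payload(body: bytes) -> dict:
    """Return size and sample-count figures for one text-format metrics payload."""
    lines = body.decode("utf-8", errors="replace").splitlines()
    # Sample lines are the non-blank lines that are not "# HELP" / "# TYPE" comments.
    samples = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    return {
        "uncompressed_bytes": len(body),
        "gzip_bytes": len(gzip.compress(body)),  # local estimate only
        "samples": len(samples),
    }


def fetch_metrics(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a metrics endpoint; urllib does not ask for gzip by default."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Usage would be something like `summarize_payload(fetch_metrics(url))`, where `url` is your app’s own /metrics address.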

So right now this looks like a reproducible problem, probably caused by some limitation internal to the default Fly metrics setup. In hindsight the number of metrics is a bit exorbitant and needs some optimisation. I’m currently reducing the labels to avoid the explosion.
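
To see which metrics contribute most to the explosion, one option is to count sample lines per metric name in a saved dump. This is only a sketch: `series_per_metric` is a hypothetical helper, and it assumes the dump is in the standard Prometheus text format.

```python
import re
from collections import Counter

# Metric name at the start of a sample line in the Prometheus text format.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def series_per_metric(text: str) -> Counter:
    """Count sample lines per metric name, skipping comments and blank lines."""
    counts = Counter()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = _NAME.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `series_per_metric(dump_text).most_common(10)` lists the ten names with the most series. Histogram names ending in `_bucket`, `_sum` and `_count` are counted separately.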

Hi folks,
The max scrape size is 16 MB for the payload from scraped endpoints.

One more thing: you do have visibility on how your metrics are scraped. Go into an app’s “metrics” tab, and click on the “grafana” icon at the top right. In Grafana, go to “Explore” and check a set of metrics that start with scrape_. These have e.g. “app” or “instance” labels to filter by. There’s one that tells you the size of the scraped payloads over time so you can corroborate at which point they stop appearing, and another that tells you how many samples your payload contained. If this latter one drops off to zero, it means Victoria Metrics is dropping all samples in the payload, most likely because it was too big. Let me know if exploring those metrics gives more clarity!
When in doubt, grab a copy of “Prometheus: Up & Running”; it has a lot of best-practice advice, including some on how to control payload size by avoiding high cardinality in your metrics.