Problem
Fly Prometheus stopped scraping metrics from our apps approximately 2 hours ago. Custom metrics and fly_instance_up return empty results in instant queries, while the metrics endpoints are accessible and working correctly.
Environment
- Apps affected:
  - Main NestJS app (port 3000)
  - postgres-exporter (port 9187)
-
Configuration
fly.toml (main app):

```toml
[[metrics]]
port = 3000
path = "/metrics"
processes = ["app"]
```

postgres-exporter fly.toml:

```toml
[metrics]
port = 9187
path = "/metrics"
```
What works
- Metrics endpoints are accessible directly via HTTPS
- Metrics are accessible via SSH from inside the machine
- Health checks are passing (1 total, 1 passing)
- fly_edge_* metrics are available (these don't require scraping from apps)
What doesn’t work
- Instant queries return empty:

  ```shell
  curl "https://api.fly.io/prometheus/<org>/api/v1/query" \
    --data-urlencode 'query=<custom_metric>' \
    -H "Authorization: FlyV1 $TOKEN"
  # Returns: {"data":{"result":[]}}
  ```

- Even fly_instance_up returns empty, and that is a built-in Fly metric
- Range queries show data stopped ~2 hours ago
Expected behavior
Prometheus should scrape /metrics endpoints every 15 seconds and data should be available via API queries.
Steps already tried
- Redeployed both apps
- Added health checks
- Changed [metrics] to [[metrics]] with the processes parameter
- Verified the metrics output is in valid Prometheus format
Additional info: Metrics were working fine before. We have historical data in Prometheus showing metrics were being scraped until ~08:27 UTC today. After that, scraping completely stopped.
Range query confirms this:

```
# Last data point: 2026-01-27T08:27:00Z
# No new data since then
```
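To put the gap in perspective: at the expected 15-second interval, a two-hour gap means roughly 480 consecutive missed scrapes. A quick sanity check (the "now" timestamp is hypothetical, two hours after our last data point):

```python
from datetime import datetime, timezone

SCRAPE_INTERVAL_S = 15  # expected scrape interval

# Last data point reported by the range query
last_point = datetime(2026, 1, 27, 8, 27, tzinfo=timezone.utc)
# Hypothetical query time, two hours later
now = datetime(2026, 1, 27, 10, 27, tzinfo=timezone.utc)

missed_scrapes = int((now - last_point).total_seconds() // SCRAPE_INTERVAL_S)
print(missed_scrapes)  # 480
```

So this is not a handful of dropped samples; scraping has been fully stopped for hundreds of cycles.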
The same setup was working for weeks/months.
Question: Are there any cardinality limits or quotas for custom metrics? Could we have hit some limit that caused scraping to stop?
Our apps expose relatively few metrics (~50-100 unique series), so this shouldn’t be the issue, but wanted to confirm.
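For reference, this is roughly how we estimated our series count: a minimal sketch that counts unique series (metric name plus label set) in a Prometheus text-exposition payload. The sample payload below is illustrative, not our real /metrics output:

```python
def count_series(payload: str) -> int:
    """Count unique time series (metric name + label set) in a
    Prometheus text-exposition payload. Ignores optional timestamps."""
    series = set()
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and # HELP / # TYPE comment lines
        if "}" in line:
            ident = line[: line.index("}") + 1]  # name{labels}
        else:
            ident = line.split()[0]  # bare metric name, no labels
        series.add(ident)
    return len(series)

# Illustrative payload with 3 unique series
SAMPLE = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
http_requests_total{method="post",code="200"} 3
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
"""
print(count_series(SAMPLE))  # 3
```

Run against our real endpoints, this confirms both apps stay well under a few hundred series.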