Grafana Cloud Fly dashboard/Prometheus issue?

Has anyone else’s Grafana Cloud Fly dashboard(s) stopped working when trying to open/load them (note: if the dashboard is already loaded in your browser, it’s probably still working ok)? And are you able to Explore your Fly Prometheus source(s) from Grafana Cloud?

Putting aside the specifics of the dashboard problem, where it seems to fail to fetch the region/host/app details, I’ve also noticed that Explore doesn’t appear to work ("! Unexpected Error") :frowning: .

As far as I’m aware, my (Fly) Prometheus sources haven’t changed within Grafana Cloud and have been working well for quite some time. Another existing (non-Fly) dashboard I have in Grafana Cloud that uses a Fly Prometheus source is still working ok.


I believe I’m having the same issue!

Same, I set mine up last week and now it’s not working anymore… just get a bunch of errors popping up.

Facing the same issue

Can you all screenshot the errors you’re seeing? My test dashboards aren’t showing this problem, or I’m not looking in the right place.

kurt - are you able to Explore your Fly Prometheus source(s) from within Grafana Cloud? Also, are you loading your dashboard(s) afresh (e.g. in a new tab), as opposed to looking at dashboards you already have open (which I suspect are still ok)?

Yes, I see this from a fresh, no cache load:

It took a little while for the region/host to load when I first logged in. Maybe 10s of spinners. But it worked.

I can also see all the available series in the explore view:

I don’t use Grafana cloud though. This is a Grafana app I run on Fly.io.

This is the error that appears when going to Explore on Grafana Cloud:

I don’t know enough about Grafana, but can you see if the browser console shows any errors?

Also it’d be worth trying Grafana on Fly.io to see if you can replicate. This should get you going: GitHub - fly-apps/grafana: Run Grafana on Fly

Hey @kurt . These are the errors I’m seeing. Note: I also can’t curl the Prometheus API resources (with a working API token) either.

Will you paste the curl output? What response are you actually getting?

curl 'https://api.fly.io/prometheus/avantgarde-finance/api/v1/status/buildinfo' --header 'Authorization: Bearer <ACCESS TOKEN>'
remoteAddr: "10.123.11.76:53346", X-Forwarded-For: "84.115.209.22, 77.83.143.220, 213.188.208.17, 205.234.149.66"; requestURI: /select/36393/prometheus/api/v1/status/buildinfo; unsupported path requested: "/select/36393/prometheus/api/v1/status/buildinfo"

Status code is 400

That buildinfo path is not supported in VictoriaMetrics (what we use under the covers), so it’s normal for that to error. Try this one instead; it should return something:

curl 'https://api.fly.io/prometheus/fly/api/v1/label/region/values' --header "Authorization: Bearer $(fly auth token)"
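
For anyone who wants to script this check, here is a minimal sketch that wraps the same request. `fly_prom_url` and `fly_label_values` are hypothetical helper names (not part of flyctl), `FLY_TOKEN` is an assumed environment variable, and the URL layout is simply copied from the curl commands in this thread. Swap in your own org slug.

```shell
# Hypothetical helper: build a Fly Prometheus API URL.
# $1 = org slug, $2 = API path below /api/v1/
fly_prom_url() {
  org="$1"
  path="$2"
  printf 'https://api.fly.io/prometheus/%s/api/v1/%s\n' "$org" "$path"
}

# Hypothetical helper: fetch the values of one label and print the JSON body.
# Assumes FLY_TOKEN is set, e.g. FLY_TOKEN="$(fly auth token)".
# $1 = org slug, $2 = label name (e.g. region)
fly_label_values() {
  org="$1"
  label="$2"
  curl --silent --show-error \
    --header "Authorization: Bearer ${FLY_TOKEN}" \
    "$(fly_prom_url "$org" "label/${label}/values")"
}
```

With `FLY_TOKEN` exported, `fly_label_values fly region` should issue the same request as the curl above, and `fly_label_values <your-org> __name__` would ask for the list of metric names.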

That works, but it doesn’t seem to be the path that Grafana is trying to call. I’m fairly confident that it worked a few days / weeks ago.

That’s the path to the first error. Grafana is showing a 401, though, which is what you get when you’re not authenticated. Can you try re-entering your auth token in the Grafana source?

Just did. The metrics and my dashboards work, but the query builder & explorer don’t work (same error). So the auth token works for scraping the metrics but not for the other endpoints?

Btw, the endpoint it’s trying (and failing) to call when loading the list of available metrics in the Explorer / Inspector is

https://avantgarde.grafana.net/api/datasources/32/resources/api/v1/label/__name__/values?start=1655222080&end=1655225680

And that returns a 401 despite the rest of the metrics working via the same datasource.

See if this works for you over curl?

curl 'https://api.fly.io/prometheus/avantgarde-finance/api/v1/label/__name__/values' --header "Authorization: Bearer $(fly auth token)" -D -
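
The failing Grafana request also carried `start` and `end` parameters one hour apart (1655225680 - 1655222080 = 3600 seconds). If you want the curl test to look more like what Grafana sends, a rough sketch follows. `window_query` and `metric_names_last_hour` are hypothetical helper names, `FLY_TOKEN` is an assumed environment variable, and whether the time window changes the result on Fly's side is not something this thread confirms.

```shell
# Hypothetical helper: print a "start=...&end=..." query string covering
# the $2 seconds that end at Unix timestamp $1.
window_query() {
  end="$1"
  span="$2"
  start=$((end - span))
  printf 'start=%s&end=%s\n' "$start" "$end"
}

# Hypothetical helper: request the metric-name list for the last hour and
# dump response headers plus body (-D - as in the curl above).
# Assumes FLY_TOKEN holds a valid token. $1 = org slug.
metric_names_last_hour() {
  org="$1"
  query="$(window_query "$(date +%s)" 3600)"
  curl --silent --show-error -D - \
    --header "Authorization: Bearer ${FLY_TOKEN}" \
    "https://api.fly.io/prometheus/${org}/api/v1/label/__name__/values?${query}"
}
```

Calling `metric_names_last_hour avantgarde-finance` with `FLY_TOKEN="$(fly auth token)"` exported would print the status line and headers first, which makes a 401 easy to spot.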

Not sure if related, but I stopped getting metrics from AMS and SJC at 9:20am PT today although the instances are still healthy and serving traffic. My project’s other region (HKG) is still getting metrics in Grafana. I have a separate project in AMS (same organization) that still has metrics.

Edit: fly restart fixed it, but a bit worrisome that metrics just stopped from some instances.

Having now reverse proxied the Grafana Cloud > Fly API Prometheus requests, I believe I can see the problem… Grafana Cloud does send the authorization: header when clicking Save & Test on the Prometheus source. It doesn’t, however, send the authorization: header when fetching the list of Metrics with Explore (and as a result Fly responds with an HTTP 401).

If, on the reverse proxy, I force the authorization: Bearer <fly auth token> into the requests - it then works (HTTP 200 from Fly).
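
The with/without-header comparison can also be reproduced without a reverse proxy. This is only a sketch: `probe_status` and `compare_auth` are hypothetical helper names, and it talks to Fly directly, so it shows how Fly treats a request missing the header, not what Grafana Cloud actually sends.

```shell
# Hypothetical helper: print only the HTTP status code of a GET request.
# $1 = URL, $2 = optional bearer token (omit to send no Authorization header).
probe_status() {
  url="$1"
  token="${2:-}"
  if [ -n "$token" ]; then
    curl --silent --output /dev/null --write-out '%{http_code}' \
      --header "Authorization: Bearer ${token}" "$url"
  else
    curl --silent --output /dev/null --write-out '%{http_code}' "$url"
  fi
}

# Hypothetical helper: print the status for the same URL with and
# without the Authorization header. $1 = URL, $2 = bearer token.
compare_auth() {
  url="$1"
  token="$2"
  printf 'with header:    %s\n' "$(probe_status "$url" "$token")"
  printf 'without header: %s\n' "$(probe_status "$url")"
}
```

For example, `compare_auth 'https://api.fly.io/prometheus/<orgname>/api/v1/label/__name__/values' "$(fly auth token)"`. Going by the observations above, the second line would be expected to show 401, but that is exactly the thing to verify.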

With no knowledge of what is “normal” for Prometheus requests (i.e. are some available without authentication?) I don’t know if a request (without the authorization: header) for:

GET /prometheus/<orgname>/api/v1/rules

would normally be answered with an HTTP 200.

If Prometheus requests, or at least some paths, are usually allowed (or have previously been allowed by Fly, i.e. prior to late last week) - without an authorization: header - then this may be resolvable by Fly.

If however authorization: has always been required by Fly for all Prometheus paths - then I can only assume Grafana Cloud have made a breaking change :cry: .