Yep. Looks like a breaking change by Grafana Cloud. I’ve submitted a support ticket with Grafana Labs and will report back here once they reply.
Confirmed by Grafana Labs support:
Thank you for contacting Grafana Labs Support.
My name is Jay, and I am the support engineer assigned to assist you with your Ticket.
Based on the 401 auth error, this matches a bug from our recent Grafana 9.0.0 upgrade that causes sudden 401 auth errors for Prometheus.
Our engineering team is working on the fix. If you would like, we can roll back to 8.5.5 as a temporary solution.
Can you see anywhere that the “recent Grafana 9.0.0 upgrade” was announced/documented?
I note in the Support page it says:
NOTE: Before you open a support ticket for a service problem, check status.grafana.com to see if there are any known issues.
Looks at https://status.grafana.com/ … yep, nothing mentioned.
Welcome to the Cloud; have a status page but don’t update it (a recurring theme).
I mean… It’s surprising that such a breaking change / bug happened in the first place, especially for a software product that is about metrics, logging and monitoring. During the rollout of this upgrade they should’ve seen a massive spike in error logs and reverted the rollout immediately. I’d guess nearly everyone using Grafana Cloud consumes one or more Prometheus data sources. So yeah, it’s definitely surprising that such an obvious bug wasn’t caught during testing and then also made it past the initial rollout without being reverted.
EDIT: I’ve asked Grafana Labs to comment on this too (how this happened in the first place, how it went unnoticed for so long, and why it didn’t cause them to roll back to the previous version).
If they do roll it back to 8.5.5, there’s a catch for any user(s) who, during the problem period on 9.0, had (understandably, as attempted above) tried updating their Prometheus data source’s Authorization details. From Release notes for Grafana 9.0.0 | Grafana documentation:
Any secret (data sources credential, alert manager credential, etc, etc) created or modified with Grafana v9.0 won’t be decryptable from any previous version (by default) because the way encrypted secrets are stored into the database has changed. Although secrets created or modified with previous versions will still be decryptable by Grafana v9.0.
They may (TBC) need to update them again post-downgrade.
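To make the rule from the release notes concrete, here is a minimal sketch of it as a check. The function name and the version tuples are my own illustration, not anything from Grafana, and it only models the default behaviour described in the quote: it says nothing about non-default settings or other version combinations.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Assumption: 9.0.0 is where the secret storage format changed,
# as stated in the release note quoted above.
NEW_SECRET_FORMAT_SINCE: Version = (9, 0, 0)


def secret_readable(written_by: Version, running: Version) -> bool:
    """Return False for the one case the release note warns about.

    A secret created or modified on v9.0 or later is not decryptable
    (by default) by an earlier version. Secrets written by earlier
    versions remain decryptable by v9.0.
    """
    written_in_new_format = written_by >= NEW_SECRET_FORMAT_SINCE
    running_old_version = running < NEW_SECRET_FORMAT_SINCE
    return not (written_in_new_format and running_old_version)


# Credentials edited during the 9.0.0 period, then a rollback to 8.5.5:
print(secret_readable(written_by=(9, 0, 0), running=(8, 5, 5)))  # False
# Credentials left untouched since 8.5.5, read by 9.0.0:
print(secret_readable(written_by=(8, 5, 5), running=(9, 0, 0)))  # True
```

So only credentials that were touched while on 9.0 would be candidates for re-entry after a downgrade; anything left alone should be unaffected.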
@Whistler so is everyone waiting for said patch to fix this issue? Can we manually downgrade Grafana Cloud to 8.5.5?
I’ve assumed any roll back (per tenant?) would require intervention by Grafana, as I’m not sure if it’s possible for a Grafana Cloud user to do this(?).
As with Fly’s Prometheus metrics issue (with vector), I guess it’ll eventually be fixed and I’ll just wait for the resolution.
Looks like Grafana Cloud has been updated to v9.0.1-b253e87pre (b253e87d7) but the error still persists.
It appears that the latest update to Grafana Cloud, v9.0.2-83956baf (c29f1c44c), has fixed it.
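For anyone wanting to check their own instance against the versions reported in this thread, here is a rough sketch. The affected range (9.0.0 up to, but not including, 9.0.2) is taken only from what was observed above, not from an official statement, and the function names are hypothetical.

```python
import re
from typing import Tuple

Version = Tuple[int, int, int]

# Assumptions drawn from this thread: the 401 errors started with 9.0.0,
# persisted on 9.0.1, and were gone on 9.0.2.
FIRST_BROKEN: Version = (9, 0, 0)
FIRST_FIXED: Version = (9, 0, 2)


def parse_grafana_version(text: str) -> Version:
    """Extract (major, minor, patch) from a string like 'v9.0.1-b253e87pre'."""
    match = re.match(r"\s*v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_affected(text: str) -> bool:
    """True if the version falls inside the range reported as broken here."""
    return FIRST_BROKEN <= parse_grafana_version(text) < FIRST_FIXED


print(is_affected("v9.0.1-b253e87pre (b253e87d7)"))  # True
print(is_affected("v9.0.2-83956baf (c29f1c44c)"))    # False
print(is_affected("8.5.5"))                          # False
```

The build suffix and commit hash are ignored on purpose, since only the major.minor.patch part mattered for this bug.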
Yep. Also fixed for me.