Prometheus Federation - worked for 2-3 days, now it's failing

I got Prometheus federation to work by using api.fly.io/prometheus/<< org >>/federate as the endpoint to scrape and using my org token for authorization credentials.
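
For anyone wanting to reproduce this, here is a rough Python sketch of the kind of request such a scrape sends. It only builds the request and does not send it. The `Bearer` authorization scheme, the `my-org` slug, the `TOKEN` placeholder, and the helper name `build_federate_request` are assumptions for illustration and are not taken from my actual config; the `match[]` parameter is the standard Prometheus federation selector.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_federate_request(org, token, matchers,
                           base="https://api.fly.io/prometheus"):
    """Build (but do not send) a GET request for the /federate endpoint.

    `org`, `token`, and `matchers` are caller-supplied; the helper itself
    is hypothetical and only illustrates the shape of the request.
    """
    # /federate expects one or more match[] selectors as repeated query params.
    query = urlencode([("match[]", m) for m in matchers])
    url = f"{base}/{org}/federate?{query}"
    # Assumption: the org token is passed as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Example: select the fly_* and pg_* metric families.
req = build_federate_request(
    "my-org",  # placeholder org slug
    "TOKEN",   # placeholder org token
    ['{__name__=~"fly_.*"}', '{__name__=~"pg_.*"}'],
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual scrape; a 400 response there surfaces as an `HTTPError`.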

This worked well for the last 2-3 days but started to fail about 2-3h ago. It now returns a "400 Bad Request".

I saw other topics suggesting that federation wasn’t supported in the past but figured I’d try anyway and was positively surprised that it worked.
Did I just get lucky, or did I run into a test that was later reverted?

Being able to alert on fly_instances_up from my existing setup and getting all the metrics from my local prom server were both very convenient.

Hi @oliver1

I am not sure of the specifics of what you’re trying to do with Prometheus, but another customer previously wrote a Prometheus exporter for fly which you might find useful.

Thanks Rahmat. I’m trying to pull all metrics out of the fly.io Prometheus instance, including the pg_* metrics, the fly_* metrics (which cover a lot more than whether an instance is up, such as CPU), and all the custom app metrics that fly scrapes for me.
The exporter only solves a tiny part of that; I can’t get the majority of the interesting metrics that way.

Hi @oliver1,
The /federate endpoint should work! For a general reference, we currently expose all of the Prometheus querying API endpoints supported by VictoriaMetrics, which includes /federate.
The issue you saw was caused by some metrics-cluster changes yesterday (adding/removing storage nodes) that unintentionally caused this to stop working. I’ve fixed the issue so this endpoint should be working again now.


Can confirm, it’s working again, thanks for fixing this!

@wjordan - is this broken again? Seeing errors since around 8am UTC

hey @oliver1 are you still seeing these error messages?

We had an Anycast UDP outage yesterday that may have been causing the errors you were seeing, but that has been resolved.

Nope, still down as of right now.