Prometheus Federation - worked for 2-3 days, now it's failing

I got Prometheus federation to work by using << org >>/federate as the endpoint to scrape and my org token as the authorization credentials.
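
For anyone trying to reproduce this outside of a Prometheus scrape config, here is a minimal Python sketch of the kind of request a federation scrape sends. The base URL, the org name, the token value and the helper name build_federate_request are all placeholders made up for illustration, and sending the token as a Bearer credential is an assumption on my part. The match[] query parameter is the standard way to tell /federate which series to return.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_federate_request(base_url, token, matchers):
    """Build (but do not send) a GET request for a /federate endpoint.

    base_url and token are placeholders; the Bearer scheme is an assumption.
    """
    # Each series selector is passed as its own match[] parameter.
    query = urlencode([("match[]", m) for m in matchers])
    url = f"{base_url.rstrip('/')}/federate?{query}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical values: replace with your own endpoint, org and token.
req = build_federate_request(
    "https://metrics.example.invalid/my-org",
    "TOKEN",
    ['{__name__=~"fly_.*"}', '{__name__=~"pg_.*"}'],
)
print(req.full_url)
```

Passing req to urllib.request.urlopen would perform the actual scrape; printing the URL first makes it easy to check the selectors before sending anything.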

This worked well for the last 2-3 days but started to fail about 2-3 hours ago; it now returns a "400 Bad Request".

I saw other topics suggesting that federation wasn’t supported in the past, but figured I’d try anyway and was positively surprised that it worked.
Did I just get lucky, or did I run into a test that was reverted?

Being able to alert on fly_instances_up from my existing setup and getting all the metrics from my local prom server were both very convenient.

Hi @oliver1

I am not sure of the specifics of what you’re trying to do with Prometheus, but another customer previously wrote a Prometheus exporter for Fly which you might find useful.

Thanks Rahmat. I’m trying to pull all metrics out of the Prometheus instance, including the pg_* metrics, the fly_* metrics (a lot more than just whether an instance is up: also CPU, etc.), plus all the custom app metrics that Fly scrapes for me.
The exporter only solves a tiny part of that; I can’t get the majority of the interesting metrics that way.

Hi @oliver1,
The /federate endpoint should work! For a general reference, we currently expose all of the Prometheus querying API endpoints supported by VictoriaMetrics, which includes /federate.
The issue you saw was caused by some metrics-cluster changes yesterday (adding/removing storage nodes) that unintentionally caused this to stop working. I’ve fixed the issue so this endpoint should be working again now.


Can confirm, it’s working again, thanks for fixing this!

@wjordan - is this broken again? Seeing errors since around 8am UTC

hey @oliver1 are you still seeing these error messages?

We had an Anycast UDP outage yesterday that may have been causing the errors you were seeing, but that has been resolved.

Nope, still down as of right now.