It looks like metrics (within the Fly dashboard) have been down for the last few hours. I’m also unable to retrieve metrics from the Fly Prometheus endpoint (it returns Bad request).
Is this a known issue? I noticed that the Fly status page doesn’t cover metrics. Is that because metrics are still considered beta?
Can you give it a try now? The metrics API wasn’t authenticating people properly, hence the bad responses.
It’s still pretty beta and we’re working on getting alerts in place for this type of thing. We use metrics pretty heavily, but not necessarily through our proxy.
We’d been planning on using metrics for alerting internally, but this issue obviously raises concerns. Any idea when you guys will have time to add some monitoring internally?
Oh, we add alerts pretty much continuously. This particular outage was a first: it wasn’t really down, it was just returning errors for some people’s auth. Which is really down, just in a new way.
I think it’s safe to use for alerting, especially if you’re coupling it with external checks, Sensu, or the like.
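For anyone who wants the "external check" part, here is a rough sketch in Python, not anything official. It assumes the metrics endpoint speaks the standard Prometheus HTTP API (`/api/v1/query`, JSON with `"status": "success"`) and accepts a bearer token. The base URL and token are placeholders you supply yourself, and `classify_response` and `check_metrics` are made-up names. The idea is to treat auth failures and bad responses as "down" too, since that is how this outage showed up.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def classify_response(status: int, body: str) -> str:
    """Map an HTTP status and body to 'ok', 'auth_error', or 'error'."""
    if status in (401, 403):
        return "auth_error"
    if status != 200:
        # Covers the "Bad request" (400) case from this thread.
        return "error"
    try:
        payload = json.loads(body)
    except ValueError:
        return "error"
    # The Prometheus HTTP API wraps results as {"status": "success", ...}.
    if isinstance(payload, dict) and payload.get("status") == "success":
        return "ok"
    return "error"


def check_metrics(base_url: str, token: str, timeout: float = 10.0) -> str:
    """Run a trivial query against a Prometheus-compatible endpoint.

    base_url and token are assumptions: pass whatever your setup uses.
    """
    query = urllib.parse.urlencode({"query": "up"})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return classify_response(response.status, body)
    except urllib.error.HTTPError as exc:
        # Non-2xx responses arrive here; classify them by status code.
        return classify_response(exc.code, "")
    except OSError:
        # DNS failures, refused connections, timeouts.
        return "error"
```

Run `check_metrics` on a schedule from somewhere outside the platform and page on anything other than `"ok"`. That way an auth regression like this one alerts you instead of looking like missing data.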