Feature preview: Custom metrics

We’ve just “soft” launched our new custom metrics features :tada:

Here are some docs if you want to get all technical right away.

In essence: we’re starting to collect and store custom prometheus metrics for apps hosted on Fly.

Inserting metrics

  • Instrument your app with a prometheus library
  • Expose metrics (make sure it binds to 0.0.0.0)
  • Configure your fly.toml:
[metrics]
port = 9091
path = "/metrics"

We’ll pull from your instances a few times per minute.
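As a rough illustration of the steps above, here is a minimal sketch of an endpoint that serves one counter in the Prometheus text format on 0.0.0.0:9091 at /metrics. It uses only the Python standard library so it stays self-contained; a real app would normally use a Prometheus client library instead. The metric name myapp_requests_total and the helper names are made up for this example.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import threading

# Hypothetical counter for illustration only; a Prometheus client library
# would normally manage this state and the text rendering for you.
_lock = threading.Lock()
_requests_total = 0


def record_request():
    """Increment the example counter (call this from your request path)."""
    global _requests_total
    with _lock:
        _requests_total += 1


def render_metrics():
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP myapp_requests_total Requests handled by the app.\n"
        "# TYPE myapp_requests_total counter\n"
        f"myapp_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve only the path configured under [metrics] in fly.toml.
        if self.path.split("?", 1)[0] != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep scrape requests out of the app's stderr


def make_server(port=9091):
    # Bind to 0.0.0.0 (not 127.0.0.1) so the scraper can reach the port.
    return ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler)
```

Calling make_server().serve_forever() in a background thread at startup would expose the endpoint on the port and path from the [metrics] block.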

Querying metrics

With this new feature, we’re slowly deprecating our current prometheus API and introducing a new one with a few differences:

  • Base URL: https://api.fly.io/prometheus/{org_slug}
    • Find your {org_slug} by listing orgs with flyctl orgs list
  • Headers: Authorization: Bearer {token}
    • Find {token} with flyctl auth token
  • All apps for the same organization are accessible from the same endpoint
  • Custom app metrics and Fly metrics (proxy, instance, volumes, prefixed with fly_) are all available through the same endpoint.
  • Your own metrics are not namespaced, but we will automatically add the app, instance, host and region labels.
  • Full list of Fly metrics available in the docs
  • pg_ series are not prefixed with fly_ because they work just like custom metrics (its fly.toml is configured for it)

This new API works just like the prometheus API:

curl "https://api.fly.io/prometheus/{org_slug}/api/v1/query_range?step=30" \
	--data-urlencode 'query=sum(rate(fly_edge_http_responses_count{app="{app}"}[5m])) by (status)' \
	-H "Authorization: Bearer {token}"
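The same request can be sketched in Python with only the standard library. This is an illustration under assumptions, not an official client: the function names are hypothetical, and start and end are passed explicitly because the standard Prometheus query_range API expects them (whether this endpoint also accepts requests without them is not stated here).

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.fly.io/prometheus"


def build_query_range_request(org_slug, token, query, start, end, step=30):
    """Build (but do not send) a query_range request for an organization.

    start and end are Unix timestamps; step is in seconds.
    """
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return urllib.request.Request(
        f"{BASE}/{org_slug}/api/v1/query_range",
        data=params.encode("ascii"),  # form-encoded POST body, like curl --data-urlencode
        headers={"Authorization": f"Bearer {token}"},
    )


def run_query(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes it easy to inspect the URL and body before any network call happens.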

A grafana setup looks like this:

“Normal” grafana features for discovering labels and series should work as-is. The old API did not support that.

Pricing

We’re still working on this. Keeping your series count under 10,000 should always be free.

You’ve probably noted that we’re including our own fly-specific metrics in there too. We’re hoping they don’t account for too many series! However, with the number of regions we have and if you have a lot of instances in different regions, that could grow fast. We’ll decide on pricing and limits based on what we observe here.

During this “beta” phase, we won’t be enforcing any specific limits, but we will be on the lookout for excessive usage.

Wow :astonished:, this came in faster than I expected. I will deploy a couple of my apps tomorrow to test; I just need to modify them a little. The prometheus endpoint is currently exposed on the same port as the main app API.

That works too. Set the port to the same internal port for your app and it should be fine.

That’s amazing! :star_struck:

One step further that would be fantastic is to ship logs to the hosted Grafana instance using Loki. Seeing that you’re already using Vector for log collection (nice blog post btw), which supports Loki as a sink, this would enable inspecting metrics and logs combined. Maybe it could be opt-in for using it instead of Elasticsearch, if that helps.

We’re working on this yes. Vector is great, but it’s not realistic to configure it with, potentially, thousands of log sinks :slight_smile:

We’re going to be using a different model for this, but it will allow us to send app logs to whatever service, eventually.

What’s the difference between host and instance?
Is host the physical server while instance is the vm?

That’s right

Nice work. :+1: Got my Grafana dashboard up and humming along nicely.

I’ve updated a metric to add a label. That metric no longer shows up. Is it possible to relabel metrics through Fly?

How do you mean they don’t show up anymore?

Can you give us an example of the metric and the query you’re using?

Yes. I’ve got a metric create_offer_total. It is a counter which prior to yesterday did not have a label. When I added a label, it disappeared and was no longer visible in the Grafana metric browser. I believe once a metric is created it cannot be modified.

This post appears to confirm this.

I just created a new metric, exactly as the original one, with a label.

@jerome I have the default setup and Grafana dashboard working great. I’m now trying to expose Apache metrics and set up a Grafana dashboard for monitoring them. I think I have everything running, but I’m struggling with the dashboard part.

I’m using the Apache Exporter for Prometheus, using:

[metrics]
  port = 9117
  path = "/metrics"

This seems to be working fine:

  1. curl "0.0.0.0:9117/metrics" shows the metrics and apache_up is 1, which indicates that it’s connected.

  2. curl "https://api.fly.io/prometheus/joomlatools/api/v1/series?match%5B%5D=apache_up" -H 'Authorization: Bearer {token}' returns the following:

{"status":"success","isPartial":false,"data":[{"__name__":"apache_up","instance":"xxxx","host":"xxx","app":"xxx","region":"xxx"}]}

So it seems the data is being pulled in.

  3. Trying to get the data to show in Grafana using: Apache dashboard for Grafana | Grafana Labs, but that’s not working.

Am I right in thinking that default dashboards will not just work and need to be modified to handle the extra labels (region, host, app) that you are adding? Or is there anything else I am missing?

Thanks for the help!

Looking at the dashboard’s JSON quickly, it seems like it expects the instance label to be in the form of $host:$port. Our instance label is just your instance ID. So you’ll probably have to modify all the queries and even the dashboard variables.
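One way to check which labels a dashboard’s queries have to match is to inspect a response from the series endpoint. The sketch below parses the example response quoted earlier in this thread and lists the labels attached to the series; the function name is hypothetical.

```python
import json

# The series response quoted earlier in the thread, with placeholder values.
SERIES_RESPONSE = (
    '{"status":"success","isPartial":false,"data":[{"__name__":"apache_up",'
    '"instance":"xxxx","host":"xxx","app":"xxx","region":"xxx"}]}'
)


def label_names(response_text):
    """Return the sorted label names found on the series in a response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('status')!r}")
    names = set()
    for series in payload["data"]:
        # __name__ is the metric name itself, not a label to filter on.
        names.update(key for key in series if key != "__name__")
    return sorted(names)


print(label_names(SERIES_RESPONSE))  # ['app', 'host', 'instance', 'region']
```

Dashboard variables that assume an instance value of the form host:port would then need to be rewritten to use the instance ID, or to filter on app and region instead.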

Thanks that helps to put me on the right track.

@jerome One more question for you: how would I handle 2 metric exporters? I now have both the Apache and the PHP-FPM exporter running. One uses port 9117, the other port 9253, and it seems that fly.toml only accepts a single [metrics] block?

It adds (further) complexity, but probably the best option is to run a third process to merge the exporters’ metrics. Vector, for example, could ingest metrics from both exporters as a prometheus scrape source, and then have the prometheus exporter sink configured to output the merged metrics. You’d then use this URL/port in the metrics block.
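To make the merging idea concrete, here is a minimal sketch of the core step only: combining the text output of several exporters into one exposition. It is not how Vector works internally, just an illustration under simple assumptions. When the same metric family (for example go_goroutines, which several Go-based exporters emit) appears in more than one source, only the first copy is kept. The function name and the phpfpm_up metric in the usage example are hypothetical.

```python
import re

_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def merge_expositions(texts):
    """Concatenate Prometheus text expositions, keeping each family once.

    texts is a list of /metrics bodies, one per exporter. Families already
    emitted by an earlier source are dropped from later sources.
    """
    owner = {}  # family name -> index of the source that first provided it
    merged = []
    for index, text in enumerate(texts):
        current = None  # family named by the most recent HELP/TYPE line
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(("# HELP ", "# TYPE ")):
                current = line.split(" ", 3)[2]
                family = current
            elif line.startswith("#"):
                continue  # plain comment, not needed in the merged output
            else:
                match = _NAME.match(line)
                if match is None:
                    continue  # skip lines that are not valid samples
                name = match.group(0)
                # Samples such as foo_bucket or foo_sum belong to family foo.
                family = current if current and name.startswith(current) else name
            if owner.setdefault(family, index) == index:
                merged.append(line)
    return "\n".join(merged) + "\n"


apache = "# TYPE apache_up gauge\napache_up 1\n# TYPE go_goroutines gauge\ngo_goroutines 8\n"
phpfpm = "# TYPE phpfpm_up gauge\nphpfpm_up 1\n# TYPE go_goroutines gauge\ngo_goroutines 12\n"
print(merge_expositions([apache, phpfpm]))
```

A small HTTP server would fetch both exporters, pass the bodies through this function, and serve the result on the port named in the [metrics] block. Dropping the duplicate families loses one exporter’s runtime metrics, which is a trade-off a dedicated merger may handle differently.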

Thanks @steveberryman! I found: GitHub - rebuy-de/exporter-merger: Merges Prometheus metrics from multiple sources and that seems to work great. Been able to merge both apache and php-fpm and output them.

Vector would probably be the better choice, but the merger binary is a little smaller and I’m trying to keep the size of the VM down.

Resurrecting this thread…

Any thoughts on using ChaosSearch as the back-end for logs/metrics?

@jerome Picking up my work on Grafana and Prometheus where I left off last year.

Quick question: I’m running 2 apps with volumes attached but don’t seem to be able to find fly_volume_size_bytes as documented here: Metrics on Fly. Any ideas?

This seems related to a bug that’s been happening for a while. Sounds like you might be the only user of this metric!

I’m working on a fix, it might take a little bit.