Feature preview: Custom metrics

We’ve just “soft” launched our new custom metrics features :tada:

Here are some docs if you want to get all technical right away.

In essence: we’re starting to collect and store custom Prometheus metrics for apps hosted on Fly.

Inserting metrics

  • Instrument your app with a Prometheus client library
  • Expose the metrics endpoint (make sure it binds to 0.0.0.0)
  • Configure your fly.toml:
[metrics]
port = 9091
path = "/metrics"

We’ll pull from your instances a few times per minute.
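For anyone who wants to see the shape of the thing before picking a client library, here is a minimal sketch using only the Python standard library. The metric name `myapp_requests_total` and the helpers `record_request`, `render_metrics` and `serve_metrics` are made up for illustration; a real app would normally let a Prometheus client library produce the text format. The port and path match the fly.toml snippet above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical counter for illustration only; a Prometheus client
# library would normally manage this for you.
_requests_total = 0
_lock = threading.Lock()


def record_request() -> None:
    """Increment the example counter (call this from your app code)."""
    global _requests_total
    with _lock:
        _requests_total += 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    with _lock:
        value = _requests_total
    return (
        "# HELP myapp_requests_total Requests handled by the app.\n"
        "# TYPE myapp_requests_total counter\n"
        f"myapp_requests_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only the path configured in fly.toml is served.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve_metrics(port: int = 9091) -> HTTPServer:
    """Bind to 0.0.0.0, as the post asks, rather than to localhost only."""
    server = HTTPServer(("0.0.0.0", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    serve_metrics()
    threading.Event().wait()  # keep the main thread alive
```

Running the file and requesting `/metrics` on port 9091 should return the three-line text payload built by `render_metrics`.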

Querying metrics

With this new feature, we’re slowly deprecating our current Prometheus API and introducing a new one with a few differences:

  • Base URL: https://api.fly.io/prometheus/{org_slug}
    • Find your {org_slug} by listing orgs with flyctl orgs list
  • Headers: Authorization: Bearer {token}
    • Find {token} with flyctl auth token
  • All apps for the same organization are accessible from the same endpoint
  • Custom app metrics and Fly metrics (proxy, instance, volumes, prefixed with fly_) are all available through the same endpoint.
  • Your own metrics are not namespaced, but we will automatically add the app, instance, host and region labels.
  • Full list of Fly metrics available in the docs
  • pg_ series are not prefixed with fly_ because they work just like custom metrics (the Postgres app’s fly.toml is configured for it).

This new API works just like the Prometheus API:

curl "https://api.fly.io/prometheus/{org_slug}/api/v1/query_range?step=30" \
	--data-urlencode 'query=sum(rate(fly_edge_http_responses_count{app="{app}"}[5m])) by (status)' \
	-H "Authorization: Bearer {token}"
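The same request can be sketched in Python with just the standard library. The function names `build_query_range_request` and `query_range` are hypothetical. The `start` and `end` parameters are an assumption based on the standard Prometheus `query_range` endpoint, which expects them; the post itself only shows `step`.

```python
import json
import time
import urllib.parse
import urllib.request


def build_query_range_request(org_slug, token, promql, start, end, step=30):
    """Build (but do not send) a query_range request for one organization."""
    base = f"https://api.fly.io/prometheus/{org_slug}/api/v1/query_range"
    # start/end are assumed here, following the standard Prometheus HTTP API.
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    return urllib.request.Request(
        f"{base}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )


def query_range(org_slug, token, promql, minutes=15, step=30):
    """Send the query for the last `minutes` minutes and decode the JSON body."""
    end = int(time.time())
    start = end - minutes * 60
    req = build_query_range_request(org_slug, token, promql, start, end, step)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Something like `query_range(org_slug, token, 'sum(rate(fly_edge_http_responses_count{app="my-app"}[5m])) by (status)')` would then mirror the curl call, with `my-app` standing in for your own app name.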

A Grafana setup looks like this:

“Normal” Grafana features for discovering labels and series should work as-is. The old API did not support that.

Pricing

We’re still working on this. Keeping your series count under 10,000 should always be free.

You’ve probably noticed that we’re including our own Fly-specific metrics in there too. We’re hoping they don’t account for too many series! However, given the number of regions we have, the series count could grow fast if you run a lot of instances in different regions. We’ll decide on pricing and limits based on what we observe here.
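To get a feel for how quickly series add up, here is a rough back-of-the-envelope sketch. In Prometheus every distinct label combination is its own series, and since an instance label is added to each metric, every instance multiplies the count. The function `estimate_series` and the label counts below are hypothetical, and the result is an upper bound because not every combination necessarily occurs.

```python
from math import prod


def estimate_series(label_cardinalities: dict[str, int], instances: int) -> int:
    """Upper bound on series for one metric: every label combination on every instance."""
    return prod(label_cardinalities.values()) * instances


# Hypothetical metric with 5 status values and 4 HTTP methods.
labels = {"status": 5, "method": 4}
print(estimate_series(labels, instances=3))   # 60
print(estimate_series(labels, instances=30))  # 600
```

Multiply that by the number of metrics your app exports to see roughly where you sit relative to the 10,000 figure.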

During this “beta” phase, we won’t be enforcing any specific limits, but we will be on the lookout for excessive usage.

Wow :astonished:, this came in faster than I expected. I will deploy a couple of my apps tomorrow to test; I just need to modify them a little. The Prometheus endpoint is currently exposed on the same port as the main app API.

That works too. Set the port to the same internal port for your app and it should be fine.

That’s amazing! :star_struck:

One step further that would be fantastic is to ship logs to the hosted Grafana instance using Loki. Seeing that you’re already using Vector for log collection (nice blog post btw), which supports Loki as a sink, this would enable inspecting metrics and logs combined. Maybe it could be opt-in for using it instead of Elasticsearch, if that helps.

We’re working on this, yes. Vector is great, but it’s not realistic to configure it with, potentially, thousands of log sinks :slight_smile:

We’re going to be using a different model for this, but it will allow us to send app logs to whatever service, eventually.

What’s the difference between host and instance?
Is host the physical server while instance is the vm?

That’s right