We just launched fly-metrics.net, a preview of a managed Grafana service for Fly.io that you can use for setting up advanced metrics queries and detailed dashboards.
The service is linked to your Fly.io account and is automatically integrated with the built-in Prometheus data sources in your Fly.io Organizations. It comes provisioned with three new detailed dashboards that visualize all of the built-in metrics published from your apps (except Postgres for now). You can also use the Explore tab and create new dashboards to visualize custom metrics or customize things further.
You can switch orgs using the “Switch organization” menu item in the profile flyout at the bottom-left of the screen.
Though you’ve always been able to set up your own Grafana instance, we hope this managed service will be a convenient way to get started quickly with Prometheus metrics without the friction of hooking everything up yourself. The built-in dashboards help you dig into what’s happening with your apps in more detail than we can fit on the Metrics tab of the Fly.io dashboard.
Please try it out, let us know what you think, and send feedback and suggestions! We’d love to keep improving this service and make it even more useful to you all.
Nice! Any plans to support multiple metrics endpoints per app? I know most apps should “only be one thing”, but sometimes there are two services that make sense to run together and that both expose some metrics.
This looks awesome! I’m always impressed with the speed at which all you folks at Fly release these amazing features/services.
We’re running fly-log-shipper in our account, connected to an instance of Grafana Loki. I’d love to be able to add our Loki instance as a datasource here. (Or have Fly manage that as well. You do already have all of our logs…)
I did notice that when I select “Switch organization” I see one of my organizations repeated 5 times. It’s an org that I created and then deleted a handful of times, and I’m guessing that has something to do with it.
Thanks for the request! No ETA, but this is definitely something we can look into. We do have some support for multi-process apps for these kinds of edge cases, and I could imagine the need to scrape from multiple metrics endpoints in those scenarios.
Just pushed a quick fix, thanks for catching this!
Though I agree it would be useful, allowing users to set up datasources for external instances would make the service more complicated to manage. For now we still recommend using your own Grafana instance for external datasources or further customization along those lines, if only to keep the managed service simple and narrowly focused.
We only added Prometheus to offer a simple service at first, but we’re looking into ways we might eventually expand this to add a built-in datasource for app logs, or maybe even our GraphQL API.
On the “fly-app” dashboard: the network-io graph labels each series with “instance + region”. That is more useful than the bare “instance” label shown on all the other graphs, since it can be used to tune a region.
On the “fly-edge” dashboard, I made a table to show where the edge traffic was going out vs. where the instances were. The heatmap and data-in/data-out graphs indicated that there was interesting data, but it was hard to put together. I used the two queries below to find out that I didn’t pick the optimal regions: for instance, I have AMS as one of my regions, but the majority of the traffic is exiting out of FRA. Something like this might be handy on the fly-app dashboard too; I think the “heatmap” on fly-app could perhaps be more useful as a table.
label_uppercase(sum(rate(fly_edge_data_out{app="appname-prod"}[$__range]))by(region), "region") or 0
label_uppercase(sum(rate(fly_instance_net_sent_bytes{app="appname-prod"}[$__range]))by(region), "region") or 0
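The comparison those two queries make can also be sketched outside Grafana. Below is a minimal Python sketch, assuming each query’s per-region values have already been collected into a dict; the helper names (compare_regions, regions_without_instances) are hypothetical, and the numbers are made up to mirror the AMS/FRA situation described above. Each side is normalized to its share of its own total so that edge egress and instance egress can be compared region by region, with a region missing from one side counted as zero.

```python
def compare_regions(
    edge_out: dict[str, float], instance_out: dict[str, float]
) -> list[tuple[str, float, float]]:
    """Return (region, edge share, instance share) rows, busiest edge region first."""
    # Fall back to 1.0 so an empty result does not cause a division by zero.
    edge_total = sum(edge_out.values()) or 1.0
    inst_total = sum(instance_out.values()) or 1.0
    rows = [
        (
            region,
            edge_out.get(region, 0.0) / edge_total,
            instance_out.get(region, 0.0) / inst_total,
        )
        for region in set(edge_out) | set(instance_out)
    ]
    # Sort by edge share descending; break ties alphabetically for stable output.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def regions_without_instances(rows: list[tuple[str, float, float]]) -> list[str]:
    """Regions where traffic leaves the edge but no instance of the app runs."""
    return [region for region, edge_share, inst_share in rows
            if edge_share > 0 and inst_share == 0]


# Made-up per-region rates (hypothetical), one dict per query above.
edge = {"FRA": 700.0, "AMS": 200.0, "LHR": 100.0}
instances = {"AMS": 900.0, "LHR": 100.0}

for region, edge_share, inst_share in compare_regions(edge, instances):
    print(f"{region}: edge {edge_share:.0%}, instances {inst_share:.0%}")
```

With these made-up numbers, FRA tops the edge column while running no instances, which is the kind of mismatch the table above was built to surface.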
OK, the heatmap on the fly-app dashboard actually is that, but I didn’t understand how to read it. The circles are the instances, and the heatmap itself shows where the data is exiting from the edges.
There’s now a tooltip on the Data Out map for better contextual info (and to help distinguish between instance and edge traffic; I agree it’s a bit hard to understand). I’ve also squeezed in a table showing the same data.
Not exactly sure what happened, but there’s something going on. For a while, none of the pre-configured dashboards were available, and now there seems to be an old version without the region labels on some of the graphs.
@wjordan do you have any resources on how to configure/connect to the Prometheus on Fly datasource for our custom Grafana instance we’re already running on Fly?
The Metrics on Fly · Fly Docs page should cover connecting a custom Grafana instance to the built-in Prometheus datasource. In short, you connect to https://api.fly.io/prometheus/<org-slug>/, passing your access token in an Authorization: Bearer <token> request header.
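For anyone scripting against the same datasource outside Grafana, here is a minimal Python sketch using only the standard library. It assumes the endpoint speaks the standard Prometheus HTTP API, so an instant query goes to api/v1/query under the base URL; that path, the helper names (build_query_request, run_query), and the org slug and token used in the test are illustrative assumptions rather than documented Fly.io details.

```python
import json
import urllib.parse
import urllib.request

# Base URL as described above; {org_slug} is filled in per organization.
BASE_URL = "https://api.fly.io/prometheus/{org_slug}/"


def build_query_request(org_slug: str, token: str, promql: str) -> urllib.request.Request:
    """Build (but do not send) an instant-query request carrying a bearer token."""
    base = BASE_URL.format(org_slug=urllib.parse.quote(org_slug, safe=""))
    # Assumption: standard Prometheus HTTP API layout, so instant queries
    # live at api/v1/query relative to the base URL.
    url = urllib.parse.urljoin(base, "api/v1/query")
    url += "?" + urllib.parse.urlencode({"query": promql})
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def run_query(req: urllib.request.Request) -> list:
    """Send the request and return the "result" list from the JSON response."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]
```

Building the request separately from sending it keeps the URL and header easy to inspect before any network call is made.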
I created a new organization, and it was impossible to switch to that organization until I followed these secret, forum-only instructions to log out and log back in.
Thanks for reporting this issue! Organizations are currently only synchronized with the Grafana service when an access token is created/updated (every 2 hours), so there is an unfortunate delay if you add an org while already signed into Grafana. We should be able to handle this edge case better with a bit of work.
A sign-out link in Grafana is also still in the works; hitting the logout URL manually is just a workaround until then.
We’ve had some discussions. I understand how useful it would be, and it’s something we would love to add eventually, but it adds an extra layer of complexity to the service, and it will take some time to sort out all the details. So no promises or ETAs, but we’ll see how it goes!
Will there be an SLO for durability of the dashboards when this leaves preview? I had set up a few dashboards at one point and they got deleted (org ID 31932, dashboard slugs HdfNhGW4z and TdXRjVZ4z, created ~August 21 and lost ~August 23?).