Successful forwarding of metrics *and* logs to DataDog?

Hi there,

I’ve browsed various threads here around DataDog, and I now have log forwarding working via fly-log-shipper and metrics working via a deployed DD Agent that my backend service reports profiling/metrics to. This technically works, but it’s clunky for a couple of reasons:

  1. It requires two fly applications, one for fly-log-shipper and one for the datadog agent. This could technically be solved by deploying both into a single VM, probably…
  2. Because the logs aren’t being read by a DD agent and shipped that way, they don’t naturally carry any metadata such as the service name. You can inject this yourself for logs you emit (I use Go with logrus, so I can use WithFields for that; a sketch follows this list), but other logs written to stdout/stderr by the software or the host in general won’t carry this metadata and so arrive untagged.
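
For illustration, roughly what that field injection looks like with logrus (the service and field names here are just examples):

package main

import log "github.com/sirupsen/logrus"

func main() {
    // Emit JSON so the injected fields survive the trip through the shipper.
    log.SetFormatter(&log.JSONFormatter{})
    log.WithFields(log.Fields{
        "service": "my-backend", // example value
        "env":     "production", // example value
    }).Info("handled request")
}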

The main issue with forwarding logs via fly-log-shipper is that it doesn’t augment the logs with the metadata DataDog needs to e.g. tie APM together with log lines and enable additional features: specifically the ddsource, hostname, service, and environment fields. You can tweak the log_json transform to add hostname, environment, and service, since you can rely on .fly.host, .fly.app.name, and an env variable for the environment, like so:

  [transforms.log_json]
  type = "remap"
  inputs = ["fly_socket"]
  source = '''
  . = parse_json!(.message)
  .hostname = .fly.host
  .environment = get_env_var!("ENV")
  .service = .fly.app.name
  '''

Even with the above, I’m struggling to find a way to define specific metadata on a per-app basis, since the logs are sourced from all apps in the org. If I wanted, for example, to run both a Go app and a Python app, there is no real way to set .ddsource appropriately in Vector.

Has anyone found a more adequate way to do this? Has Fly perhaps been in touch with DataDog about an integration? I’m quite keen to see this work, as DataDog is hugely valuable, and being stuck with Prometheus/Grafana is a bit of a blocker for me on Fly :confused:

Hello!

  1. To use just a single app to send both logs and metrics, we could try also sending metrics through fly-log-shipper by adding a datadog_metrics sink.
  2. Augmenting the datadog sink with the relevant metadata in Datadog’s standard reserved attributes makes sense; we could add a datadog_json transform that takes fly_socket as its input and set the tags there as you described (a sketch follows this list).
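
Roughly, a sketch of what that wiring could look like in vector.toml (the transform and sink names here are just examples, and DATADOG_API_KEY is assumed to be set as a secret on the log-shipper app):

[transforms.datadog_json]
type = "remap"
inputs = ["fly_socket"]
source = '''
. = parse_json!(.message)
.hostname = .fly.host
.service = .fly.app.name
# environment as a standard Datadog tag
.ddtags = "env:" + get_env_var!("ENV")
'''

[sinks.datadog]
type = "datadog_logs"
inputs = ["datadog_json"]
default_api_key = "${DATADOG_API_KEY}"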

> Even with the above, I’m struggling to find a way to define specific metadata on a per-app basis, since the logs are sourced from all apps in the org. If I wanted, for example, to run both a Go app and a Python app, there is no real way to set .ddsource appropriately in Vector.

To inject specific per-app metadata, you could try adding extra VRL to the remap transform to set ddsource based on fly.app.name:

if (.fly.app.name == "my_go_app") {
  .ddsource = "go"
} else if (.fly.app.name == "my_python_app") {
  .ddsource = "python"
}

A bunch of conditionals like this is a bit clunky; maybe we could define the .fly.app.name → metadata mappings using an enrichment table CSV or something, though it will still need to be configured somewhere.
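
A sketch of that enrichment-table version, assuming a CSV with app_name and ddsource columns (one row per app, e.g. my_go_app,go) shipped into the image at a hypothetical path:

[enrichment_tables.app_metadata]
type = "file"
file.path = "/etc/vector/app_metadata.csv"
file.encoding.type = "csv"

and then, in the remap’s source, instead of the chain of conditionals:

row, err = get_enrichment_table_record("app_metadata", { "app_name": .fly.app.name })
if err == null {
  .ddsource = row.ddsource
}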

What does this mean? Where do we put the configs for the sink?

I have a running instance of log-shipper but no idea how to get the logs into Datadog.

Hey! It’s a little late for this answer, but maybe it will help someone else.
This is my personal experience trying this. I’m terrible with infra, but it may be helpful if you have to send metrics from the Prometheus instance provided by Fly.io to Datadog.

I tried multiple options and these are my results:


Datadog - Prometheus (Legacy) Integration

  1. I was using Datadog Agent 7.
  2. This solution didn’t work for me: even though the docs say we are able to define a bearer_token_path, my Datadog Agent was not able to pick it up, and we got 401s from Prometheus all the time. This isn’t very clear, because while part of the logic behind that attribute is visible here: Prometheus/OpenMetrics V1 - Agent Integrations, it doesn’t appear to be described here: https://github.com/DataDog/integrations-core/blob/master/prometheus/datadog_checks/prometheus/data/conf.yaml.example, so maybe I am mixing versions of the same Datadog Agent. In my opinion, though, Datadog is not super well documented.
  3. This is my example of prometheus.yaml; of course, I’ve created a Dockerfile that copies this prometheus.yaml and the bearer token file to the proper places (a sketch of that follows the config):
init_config:

instances:
  - prometheus_url: "https://api.fly.io/prometheus/personal/federate?match[]=%7Bapp%3D%22my-nextjs-app%22%7D"
    namespace: "martin"
    bearer_token_auth: true
    bearer_token_path: "/var/run/secrets/kubernetes.io/serviceaccount/mytoken"
    metrics:
      - "*"

Links that can be helpful:

  1. https://docs.datadoghq.com/integrations/prometheus/
  2. Prometheus and OpenMetrics metrics collection from a host

Datadog - OpenMetrics Integration

  1. I was using Datadog Agent 7.
  2. This solution works, but the metrics didn’t look good. The information looked incomplete and was not tagged, which means we were just getting raw data. We can probably do something else here; it’s a good beginning (see the note after the config below).
  3. This is my example of openmetrics.yaml:
init_config:

instances:
  - openmetrics_endpoint: "https://api.fly.io/prometheus/personal/federate?match[]=%7Bapp%3D%22my-nextjs-app%22%7D"
    headers:
      Authorization: "Bearer <<TOKEN>>"
      Accept: text/plain
    namespace: "martin"
    metrics:
      - ".*"

Links that can be helpful:

  1. OpenMetrics
  2. https://github.com/DataDog/integrations-core/blob/master/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example

Vector

  1. Like many of you, I am using fly-log-shipper, which is built on Vector. In my project I am using a couple of the sinks from there, but I am extending them to do some additional things, so I decided to combine the prometheus_scrape source, which pulls all the information from Prometheus, with the datadog_metrics sink to move all that data to Datadog.
  2. This is the solution we decided to implement. It’s complete, transparent, easy to implement, and well documented. Also, it returns the data properly and amazingly tagged, without any transformation at all!!
  3. With this solution, you don’t need a Datadog Agent installed to send metrics.
  4. This is my example extension of the vector.toml file:
[sources.my_flyio_nextjs_app]
type = "prometheus_scrape"
endpoints = [ "https://api.fly.io/prometheus/personal/federate" ]
auth.strategy = 'bearer'
auth.token = '<<TOKEN>>'
query.match = ['{app="my-nextjs-app"}'] # Name of your Flyio app.

[sinks.datadog_metrics]
type = "datadog_metrics"
inputs = [ "my_flyio_nextjs_app" ]
default_api_key = "${DATADOG_API_KEY}"
default_namespace = "martin" # Helps you find your metrics easily; not strictly necessary, since the metrics arrive in Datadog tagged with the app name anyway!
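
Note that ${DATADOG_API_KEY} is read from the environment, so on Fly you would typically set it as a secret on the log-shipper app (the app name here is an example):

fly secrets set DATADOG_API_KEY=<your-api-key> --app my-log-shipper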

Links that can be helpful:

  1. https://vector.dev/docs/reference/configuration/sources/prometheus_scrape/
  2. Datadog metrics | Vector documentation
  3. GitHub - superfly/fly-log-shipper: Ship logs from fly to other providers

Good luck!


Hi Martin

Could you provide a little more information on how you’re incorporating the prometheus_scrape source and datadog_metrics sink? Are you building a custom docker image on top of fly-log-shipper? Or does this vector config go in your app?

Hey @wobbleburger!

In my case, I just cloned the fly-log-shipper project and added and removed what I needed there. The project already contains a Dockerfile that calls the script start-fly-log-transporter.sh, and that script takes its definitions from this folder: https://github.com/superfly/fly-log-shipper/tree/main/vector-configs. Once there, you can adapt vector-configs/vector.toml to have a prometheus_scrape source, a datadog_metrics sink, or whatever you want.

I don’t know if there is a more elegant solution, like running fly launch --image ghcr.io/superfly/fly-log-shipper:latest and extending what the image defines internally to have more sources and sinks, but in my case I had some things to change, and the image code base suited me well.

I hope this can help you!


This is working great! Thank you for the help!
