OpenTelemetry/Tracing

Scott · May 6, 2021, 10:11pm

Are there any plans to offer edge-to-app traces that could help us get the full picture? I am looking to use OpenTelemetry for apps within Fly as well as for tracking query times in Postgres.

I’m just curious whether there are plans to start a trace at the edge so we can track the entire routing process, from connection origination through to the app that actually served the request. It’s more for visibility into latency, timing, and health as I look to PoC a SaaS offering solely on Fly.

This isn’t really a requirement, more of a pipe dream. But since Fly already has metrics, logs and tracing are the next two things I’m looking to attack as part of a PoC for a distributed app on Fly.

Edit for clarification:

For this to truly work we’d probably need the ability to collect traces on Fly itself, similar to how metrics are collected. Maybe something like Tempo from Grafana?

jerome · May 6, 2021, 11:18pm

Definitely interested in offering that.

We’ve already instrumented most of our code for internal usage. We’re sending trace samples to Honeycomb.

For metrics, we collect and store everything. It would make sense to do something similar for traces. I have not looked into it much, but Tempo does look pretty good.

Another, less interesting option would be for us to push traces to a user-provided endpoint. These would have to be sampled. I much prefer the option of hosting traces for our users.

Scott · May 6, 2021, 11:32pm

Honestly, in the interim, I’d take you pushing traces to a user-defined endpoint so long as we could get some good data on the lifecycle of a request. All of the apps I’m looking to deploy are instrumented already and having the additional context of the Fly edge would be amazing.

I’d even consider just running a Tempo server in Fly, with its data stored either in MinIO on Fly or in an external S3 bucket.

Sure, I’d love to have that hosted like metrics, but I care much more about having the data, so I wouldn’t be opposed to self-hosting Tempo or a similar solution.

jerome · May 7, 2021, 1:45am

Cleaning up our traces and adding that functionality is not an insignificant amount of work. That means it’s probably not going to be a priority unless there’s more demand for it.

Traces are incredibly useful to debug complex issues. We’ve used them in the past, but ever since we fixed the issues, we’ve stopped looking at them entirely. I think I should get back into the habit of using traces, but I’ve gotten used to correlating logs and metrics to figure out what’s going on.

A small thing I could do fast is add a Fly-Connection-Id header to supplement our Fly-Request-Id header.

It might also be more realistic in the short/medium term to offer basic traces via request headers. This would only cover what happens before we forward the request to your app, but it would be something. It would also have to be pretty brief. You could then use it to prepend some spans to your traces.

Scott · May 7, 2021, 2:00am

No need to prioritize it. Like I said, pipe dream.

I wouldn’t, however, be opposed to headers that we could pull information out of, such as the time the request started.

This could be provided as a base64-encoded JSON object, for example, if there’s enough information. That would allow us to create spans for the start of the request and correlate the connection ID.
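To make the idea concrete, here is a minimal Python sketch of how an app could read such a header. Fly does not send a header like this; the payload shape and its request_start_us and connection_id fields are hypothetical, chosen only to illustrate the base64/JSON round trip.

```python
import base64
import json


def encode_edge_info(info: dict) -> str:
    """Serialize edge metadata as base64-encoded JSON (what a proxy might send)."""
    raw = json.dumps(info, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_edge_info(header_value: str) -> dict:
    """Parse a base64-encoded JSON header value back into a dict.

    validate=True rejects characters outside the base64 alphabet instead of
    silently discarding them.
    """
    raw = base64.b64decode(header_value, validate=True)
    return json.loads(raw)


# Hypothetical payload: these field names are NOT something Fly provides.
value = encode_edge_info({"request_start_us": 1621891333864626, "connection_id": "conn-123"})
info = decode_edge_info(value)
print(info["request_start_us"], info["connection_id"])
```

Standard base64 output only uses characters that are legal in an HTTP header value, which is why it is a reasonable fit for packing structured data into a single header.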

Great idea, and it solves the problem, since I’m sure not everyone cares about tracing.

jerome · May 7, 2021, 3:09pm

For now I’ve added X-Request-Start: t=<microseconds since epoch>.
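A minimal Python sketch for reading this header, assuming only the format above (t= followed by microseconds since the epoch). It reports how long ago the edge received the request, which you could use as the start of an edge span. The helper names are made up for illustration, and the result assumes the edge and your app VM have synchronized clocks, so small or even negative values are possible.

```python
import re
import time
from typing import Optional

# Matches the format "t=<microseconds since epoch>" (ASCII digits only).
_REQUEST_START = re.compile(r"t=(\d+)", re.ASCII)


def parse_request_start(header_value: str) -> Optional[int]:
    """Return the edge start time in microseconds since the epoch, or None if malformed."""
    match = _REQUEST_START.fullmatch(header_value.strip())
    return int(match.group(1)) if match else None


def edge_delay_ms(header_value: str, now_us: Optional[int] = None) -> Optional[float]:
    """Milliseconds between the edge receiving the request and now_us (default: current time)."""
    start_us = parse_request_start(header_value)
    if start_us is None:
        return None
    if now_us is None:
        now_us = time.time_ns() // 1_000  # wall-clock time in microseconds
    return (now_us - start_us) / 1_000


# Header value taken from the access log later in this thread.
print(edge_delay_ms("t=1621891333864626", now_us=1621891333870626))  # 6.0
```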

Scott · May 7, 2021, 3:32pm

Thanks! This will help a lot right now.

charsleysa · May 11, 2021, 12:06am

We currently use Grafana Cloud with hosted Prometheus, Loki, and Tempo.

If you were to offer a Fly-hosted, managed version of Loki and Tempo alongside your existing Prometheus, we’d happily switch over, especially if it meant we no longer needed to manage the Grafana Cloud Agent in our Docker images.

jerome · May 11, 2021, 2:35pm

There might be licensing issues providing a hosted Tempo, Loki or Grafana.

That said, I’ll probably be trying Tempo soon, for myself, to see what’s feasible.

We already manage Prometheus if you want to use that! You can query it from any Grafana installation.

rugwiro · May 25, 2021, 2:14pm

Are any of these headers available yet?

jerome · May 25, 2021, 2:23pm

Only X-Request-Start, with a timestamp of when the request started. Everything else is still at the “thinking about it” stage.

rugwiro · May 25, 2021, 2:30pm

I was examining this log entry for a remote address and a request ID (this is a Googlebot visit):

{
  "level": "error",
  "ts": 1621891338.921659,
  "logger": "http.log.access.log0",
  "msg": "handled request",
  "request": {
    "remote_addr": "185.40.232.114:41908",
    "proto": "HTTP/1.1",
    "method": "GET",
    "host": "paypack.rw",
    "uri": "/robots.txt",
    "headers": {
      "Fly-Region": [
        "iad"
      ],
      "Accept": [
        "text/plain,text/html,*/*"
      ],
      "X-Forwarded-Proto": [
        "https"
      ],
      "X-Forwarded-Ssl": [
        "on"
      ],
      "User-Agent": [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
      ],
      "X-Forwarded-Port": [
        "443"
      ],
      "X-Forwarded-For": [
        "66.249.66.47, 213.188.209.3"
      ],
      "Fly-Forwarded-Proto": [
        "https"
      ],
      "Fly-Forwarded-Port": [
        "443"
      ],
      "Fly-Request-Id": [
        "01F6G43FQ84TDBGQXDHAD7VZEP"
      ],
      "X-Request-Start": [
        "t=1621891333864626"
      ],
      "Fly-Client-Ip": [
        "66.249.66.47"
      ],
      "Accept-Encoding": [
        "gzip, deflate, br"
      ],
      "Fly-Forwarded-Ssl": [
        "on"
      ],
      "Via": [
        "1.1 fly.io"
      ]
    }
  },
  "common_log": "185.40.232.114 - - [24/May/2021:21:22:18 +0000] \"GET /robots.txt HTTP/1.1\" 404 0",
  "duration": 5.019394281,
  "size": 0,
  "status": 404,
  "resp_headers": {
    "Server": [
      "Caddy",
      "Caddy"
    ],
    "Date": [
      "Mon, 24 May 2021 21:22:18 GMT"
    ],
    "Content-Length": [
      "0"
    ]
  }
}

Can I use Fly-Request-Id as my trace ID? And can I use either X-Forwarded-For or Fly-Client-Ip to obtain the remote address?

jerome · May 25, 2021, 2:34pm

Yes, this is unique per request.
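The sample Fly-Request-Id above, 01F6G43FQ84TDBGQXDHAD7VZEP, looks like a ULID: 26 Crockford base32 characters encoding 128 bits, the first 48 of which are a millisecond timestamp. That is an observation from this one value, not documented behavior. If it holds, this Python sketch (hypothetical helper names) converts the ID into a 32-character lowercase hex string, the width of a W3C/OpenTelemetry trace-id, so it could double as a trace ID or at least a correlation attribute.

```python
# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_int(ulid: str) -> int:
    """Decode a 26-character ULID string into its 128-bit integer value."""
    if len(ulid) != 26:
        raise ValueError("expected 26 characters")
    value = 0
    for ch in ulid.upper():
        digit = CROCKFORD.find(ch)
        if digit < 0:
            raise ValueError(f"invalid character {ch!r}")
        value = value * 32 + digit  # each character carries 5 bits
    if value >> 128:
        raise ValueError("value exceeds 128 bits")
    return value


def request_id_to_trace_id(request_id: str) -> str:
    """Format the 128-bit value as 32 hex chars (a valid trace-id must not be all zeros)."""
    return format(ulid_to_int(request_id), "032x")


def ulid_timestamp_ms(ulid: str) -> int:
    """The top 48 bits of a ULID are milliseconds since the Unix epoch."""
    return ulid_to_int(ulid) >> 80


print(request_id_to_trace_id("01F6G43FQ84TDBGQXDHAD7VZEP"))
print(ulid_timestamp_ms("01F6G43FQ84TDBGQXDHAD7VZEP"))  # 1621891333864
```

For this request the decoded timestamp, 1621891333864 ms, matches the X-Request-Start value t=1621891333864626 at millisecond precision, which supports the ULID reading.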

Yes, the first address would be the remote address.

However, we also set Fly-Client-Ip, which is the “trusted” way of doing this. Anybody can send an X-Forwarded-For header that does not represent the user’s true remote address. We reset Fly-Client-Ip ourselves, whether or not it was set in the original request.
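A minimal Python sketch of that advice, using a hypothetical client_ip helper: prefer Fly-Client-Ip and only fall back to the first X-Forwarded-For entry when you explicitly opt in (for example when running without Fly's proxy in front). Note that the remote_addr Caddy logged above, 185.40.232.114, is not the Googlebot address at all, so the socket address is not useful for this.

```python
from typing import Mapping, Optional


def client_ip(headers: Mapping[str, str], trust_forwarded_for: bool = False) -> Optional[str]:
    """Return the client address for a request that came through Fly's proxy.

    Fly-Client-Ip is reset by the proxy, so it is preferred. X-Forwarded-For can be
    forged by the client, so it is only consulted when trust_forwarded_for is True.
    """
    # HTTP header names are case-insensitive; normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    fly_ip = lowered.get("fly-client-ip", "").strip()
    if fly_ip:
        return fly_ip
    if trust_forwarded_for:
        # The first entry is the original client as reported by upstream proxies.
        first = lowered.get("x-forwarded-for", "").split(",")[0].strip()
        if first:
            return first
    return None


headers = {"X-Forwarded-For": "66.249.66.47, 213.188.209.3", "Fly-Client-Ip": "66.249.66.47"}
print(client_ip(headers))  # 66.249.66.47
```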
