OpenTelemetry/Tracing

Are there any plans to offer edge-to-app traces that could help us get the full picture? I am looking to use OpenTelemetry for apps within Fly as well as for tracking query times in Postgres.

I’m just curious if there were plans to start a trace from the edge so we can track the entire routing process, such as connection origination to the app that actually served the request. It’s more for visibility into latency, timing, and health as I look to PoC a SaaS offering solely on Fly.

This isn’t really a requirement, more of a pipe dream. But since Fly already has metrics, logs and tracing are the next two things I’m looking to attack as part of a PoC for a distributed app on Fly.


Edit for clarification:

For this to truly work we’d probably need the ability to collect traces on Fly itself, similarly to metrics. Maybe something like Tempo from Grafana?

Definitely interested in offering that.

We’ve already instrumented most of our code for internal usage. We’re sending trace samples to Honeycomb.

The way metrics work, we collect and store everything. It would make sense to do something similar for traces. I have not looked into it much but Tempo does look pretty good.

Another, less interesting option would be for us to push traces to a user-provided endpoint. These would have to be sampled. I much prefer the option of hosting traces for our users.

Honestly, in the interim, I’d take you pushing traces to a user-defined endpoint so long as we could get some good data on the lifecycle of a request. All of the apps I’m looking to deploy are instrumented already and having the additional context of the Fly edge would be amazing.

I’d even consider just running a Tempo server in Fly, with its storage backed by either MinIO on Fly or an external S3 bucket.

Sure, I’d love to have that hosted like metrics, but I care much more about having the data, so I wouldn’t be opposed to self-hosting Tempo or a similar solution.

Cleaning up our traces and adding that functionality is not an insignificant amount of work. That means it’s probably not going to be a priority unless there’s more demand for it.

Traces are incredibly useful to debug complex issues. We’ve used them in the past, but ever since we fixed the issues, we’ve stopped looking at them entirely. I think I should get back into the habit of using traces, but I’ve gotten used to correlating logs and metrics to figure out what’s going on.

A small thing I could do fast is add a Fly-Connection-Id header to supplement our Fly-Request-Id header.

It might also be more realistic short/medium term to offer basic traces via request headers as well. This would only cover what happens before forwarding the request to your app, but it would be something. It would also have to be pretty brief. You could then use that and prepend some spans to your traces.

No need to prioritize it. Like I said, pipe dream.

I wouldn’t, however, be opposed to headers that we could pull information out of, such as the time the request started.

This could be provided as a base64-encoded JSON object, for example, if there’s enough information. That would allow us to create spans for the start of the request and correlate the connection ID.
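
To make the suggestion concrete, here is a minimal sketch of how an app could decode such a header value. No header like this exists on Fly today; the payload shape and the field names (`request_start_us`, `connection_id`) are purely hypothetical and only illustrate the base64-encoded JSON idea.

```python
import base64
import json


def decode_edge_context(header_value: str) -> dict:
    """Decode a base64-encoded JSON object carried in a request header.

    The header and its fields are hypothetical; nothing here reflects
    an actual Fly API.
    """
    # Proxies and clients often strip "=" padding, so restore it first.
    padded = header_value + "=" * (-len(header_value) % 4)
    raw = base64.urlsafe_b64decode(padded)
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object in the header payload")
    return data
```

An app could then read something like `decode_edge_context(value)["request_start_us"]` to backdate the first span of its trace, assuming a field of that kind were provided.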

Great idea, and it solves the problem, since I’m sure not everyone cares about tracing. :slightly_smiling_face:

For now I’ve added X-Request-Start: t=<microseconds since epoch>.
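
For anyone consuming this from an app, a minimal sketch of parsing the header might look like the following. The function names are mine, not part of any Fly library, and the elapsed-time figure is only approximate because the edge and the app run on different hosts whose clocks may disagree.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def request_start_micros(header_value: str) -> int:
    """Extract the integer microseconds from a value like 't=1621891333864626'."""
    key, sep, raw = header_value.strip().partition("=")
    if key != "t" or not sep or not raw.isdigit():
        raise ValueError(f"unexpected X-Request-Start value: {header_value!r}")
    return int(raw)


def parse_request_start(header_value: str) -> datetime:
    """Return the request start as a timezone-aware UTC datetime."""
    # timedelta arithmetic keeps full microsecond precision, unlike a float division.
    return EPOCH + timedelta(microseconds=request_start_micros(header_value))


def elapsed_since_edge_us(header_value: str, now_us: Optional[int] = None) -> int:
    """Microseconds between the edge timestamp and now (or a supplied timestamp)."""
    if now_us is None:
        now_us = time.time_ns() // 1_000
    return now_us - request_start_micros(header_value)
```

The `datetime` returned by `parse_request_start` can serve as the start time of a span that you prepend to your own trace to represent the time spent before the request reached your app.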

Thanks! This will help a lot right now.

We currently use Grafana Cloud with hosted Prometheus, Loki, and Tempo.

If you were to offer a Fly-hosted and managed version of Loki and Tempo alongside your existing Prometheus, we’d happily switch over, especially if that means we no longer need to manage the Grafana Cloud Agent in our Docker images.

There might be licensing issues with providing a hosted Tempo, Loki, or Grafana.

That said, I’ll probably be trying Tempo soon, for myself, to see what’s feasible.

We already manage Prometheus if you want to use that! You can query it from any Grafana installation.

Are any of these headers available yet?

Only X-Request-Start with a timestamp of when the request started. Everything else is still at a “thinking about it” stage :slight_smile:

I was examining this log entry for a remote address and a request ID (this is a Googlebot visit):

{
  "level": "error",
  "ts": 1621891338.921659,
  "logger": "http.log.access.log0",
  "msg": "handled request",
  "request": {
    "remote_addr": "185.40.232.114:41908",
    "proto": "HTTP/1.1",
    "method": "GET",
    "host": "paypack.rw",
    "uri": "/robots.txt",
    "headers": {
      "Fly-Region": [
        "iad"
      ],
      "Accept": [
        "text/plain,text/html,*/*"
      ],
      "X-Forwarded-Proto": [
        "https"
      ],
      "X-Forwarded-Ssl": [
        "on"
      ],
      "User-Agent": [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
      ],
      "X-Forwarded-Port": [
        "443"
      ],
      "X-Forwarded-For": [
        "66.249.66.47, 213.188.209.3"
      ],
      "Fly-Forwarded-Proto": [
        "https"
      ],
      "Fly-Forwarded-Port": [
        "443"
      ],
      "Fly-Request-Id": [
        "01F6G43FQ84TDBGQXDHAD7VZEP"
      ],
      "X-Request-Start": [
        "t=1621891333864626"
      ],
      "Fly-Client-Ip": [
        "66.249.66.47"
      ],
      "Accept-Encoding": [
        "gzip, deflate, br"
      ],
      "Fly-Forwarded-Ssl": [
        "on"
      ],
      "Via": [
        "1.1 fly.io"
      ]
    }
  },
  "common_log": "185.40.232.114 - - [24/May/2021:21:22:18 +0000] \"GET /robots.txt HTTP/1.1\" 404 0",
  "duration": 5.019394281,
  "size": 0,
  "status": 404,
  "resp_headers": {
    "Server": [
      "Caddy",
      "Caddy"
    ],
    "Date": [
      "Mon, 24 May 2021 21:22:18 GMT"
    ],
    "Content-Length": [
      "0"
    ]
  }
}

Can I use Fly-Request-Id as my trace ID? And can I use either X-Forwarded-For or Fly-Client-Ip to obtain the remote address?

Yes, this is unique per request.
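
One practical wrinkle: an OpenTelemetry trace ID is 16 bytes, normally written as 32 lowercase hex characters, so the header value cannot be used verbatim. The sample value above is 26 characters of Crockford base32, which is the shape of a ULID (128 bits). Assuming that format holds for every request (an inference from the sample, not something confirmed in this thread), a minimal conversion sketch could be:

```python
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def request_id_to_trace_id(request_id: str) -> str:
    """Convert a ULID-shaped request ID into a 32-character hex trace ID."""
    if len(request_id) != 26:
        raise ValueError("expected a 26-character ULID-shaped ID")
    value = 0
    for ch in request_id.upper():
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 32 + CROCKFORD_ALPHABET.index(ch)
    if value >= 1 << 128:
        raise ValueError("value does not fit in 128 bits")
    return f"{value:032x}"


def request_id_timestamp_ms(request_id: str) -> int:
    """Return the top 48 bits, which in a ULID are milliseconds since the epoch."""
    return int(request_id_to_trace_id(request_id), 16) >> 80
```

For the sample ID `01F6G43FQ84TDBGQXDHAD7VZEP`, the embedded timestamp works out to 1621891333864 ms, which lines up with the `X-Request-Start` value in the same log entry and supports the ULID assumption.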

Yes, the first address would be the remote address.

However, we also set Fly-Client-Ip, which is the “trusted” way of doing this. Anybody can send an X-Forwarded-For that does not represent the true remote address for the user. Fly-Client-Ip is reset by us, whether or not it was set in the original request.
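
In app code, that advice could be sketched roughly as follows. The helper name is hypothetical, and the fallback to X-Forwarded-For is an assumption for running outside Fly (local development, for example), where the value should not be treated as trustworthy.

```python
from typing import Mapping, Optional


def client_ip(headers: Mapping[str, str]) -> Optional[str]:
    """Pick the client address, preferring the proxy-set Fly-Client-Ip header."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    trusted = lowered.get("fly-client-ip", "").strip()
    if trusted:
        return trusted

    # Fallback only: the first X-Forwarded-For entry can be spoofed by the client.
    first = lowered.get("x-forwarded-for", "").split(",")[0].strip()
    return first or None
```

With the headers from the log entry above, this returns `66.249.66.47` from Fly-Client-Ip without consulting X-Forwarded-For at all.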
