I am curious if anyone here is using fly_postgres_elixir with opentelemetry_ecto. Using fly_postgres has been absolutely everything I hoped it would be, and I don’t really want to bail on it just to get my DB spans into Honeycomb. Unfortunately, it doesn’t seem to work with the default settings. My hope is that it’s a simple config change from the default. If anyone has any examples or suggestions, I would love to see or hear them.
I think this might just be a matter of teaching opentelemetry_ecto how to instrument the inner Postgres Repo that fly_postgres_elixir wraps.
Maybe ask them how to instrument the MyApp.Repo.Local repo?
Hi @smashedtoatoms!
Yes, the full, “real” repo will be your MyApp.Repo.Local. Not having tried it myself, based on the opentelemetry_ecto readme, I would try this:
OpentelemetryEcto.setup([:my_app, MyApp.Repo.Local])
Sorry, I should have put more detail. It was kind of late and I was calling it a night.
I tried setting it up against the repo and the local repo as you suggested. Neither worked. If you think it should be able to work though, I will spend some more time on it and see if I can figure it out. When I opened up the fly_postgres code and saw the macros, I hoped it would just work, but then wondered if the RPC calls might cause some weirdness, so I kinda lobbed the question up there without providing enough of what I had and hadn’t tried. I’ll try it again this evening and let you know what I find. Thanks so much.
That’s interesting. High level, the RPC calls are only used on modifying ecto calls. However, those are just forwarded to the primary and executed there. So I’m not sure how that would present a problem. The local read queries happen against the Local repo which opentelemetry_ecto should theoretically observe. Then the RPC’d queries are executed on the primary against the Local… so they should be observed there as well.
I wonder if there’s more going on internally or it’s making assumptions I’m not aware of.
Ok, I think I see what it’s doing. Sorry it took me so long to get back to this. I had to learn a little about elixir opentelemetry.
The opentelemetry library config is pretty slick. It hooks into the Elixir telemetry data. When you set up the config for your app, you can set the telemetry_prefix. For example:
config :my_app, MyApp.Repo.Local,
  url: database_url,
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
  socket_options: maybe_ipv6,
  telemetry_prefix: [:my_app, :repo]
The telemetry_prefix is implicitly derived from the module name if you don’t explicitly set it, so without that value it defaults to [:my_app, :repo, :local], which is why nothing shows up in the telemetry data if you set it up with OpentelemetryEcto.setup([:my_app, :repo]).
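To make the default concrete, here is a minimal sketch of how that prefix falls out of the module name. PrefixSketch is a hypothetical helper written only for illustration. It mirrors what Ecto appears to do internally (split the module name, underscore each part, convert to atoms) rather than calling Ecto itself, so treat it as an assumption about the derivation, not as Ecto's API.

```elixir
defmodule PrefixSketch do
  # Hypothetical helper (not part of Ecto or opentelemetry_ecto).
  # Derives a telemetry prefix from a repo module name the way the
  # default described above works: MyApp.Repo.Local -> [:my_app, :repo, :local].
  def default_prefix(repo) when is_atom(repo) do
    repo
    |> Module.split()
    |> Enum.map(fn part -> part |> Macro.underscore() |> String.to_atom() end)
  end
end

# The list passed to OpentelemetryEcto.setup/1 has to match either this
# derived prefix or the telemetry_prefix set explicitly in the repo config:
#
#   OpentelemetryEcto.setup(PrefixSketch.default_prefix(MyApp.Repo.Local))
```

In practice you would pick one of the two: either leave telemetry_prefix unset and call setup with [:my_app, :repo, :local], or set telemetry_prefix: [:my_app, :repo] as in the config above and call setup with that same list.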
Now that I figured that out, it almost works. The problem I am running into now is it doesn’t associate the cross-region spans. If I write to a non-writer elixir node, the write works correctly, but it seems like there is context not getting passed via the RPC call in a way that lets the writer node know it’s a child of a span on the initiating node, and I end up with orphan spans. I haven’t figured out how to pass that context through yet. I suspect it’s related to this, but I have some more experimenting to do.
Hi @smashedtoatoms,
That’s some top-notch sleuthing!
Yes, the spans issue makes sense now because there is no process-level context being passed through the RPC calls.
In the past, I’ve copied metadata to spawned processes to maintain request ids, etc. It’s possible something similar could be implemented if we knew what context to copy. That’s assuming the context needed was stored in the process dictionary. However, since their context appears to be :"$callers", I don’t know if that can be “forwarded” in any meaningful way. This is where I don’t know enough about how the library works.
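For reference, the "copy metadata to a spawned process" pattern mentioned here can be sketched with Logger metadata, which lives in the calling process and is not inherited by a Task. MetadataSketch and the request_id key are hypothetical names chosen for illustration; this only shows the general pattern and says nothing about how opentelemetry stores its own context.

```elixir
defmodule MetadataSketch do
  # Hypothetical helper: runs `fun` in a Task that starts with a copy of
  # the caller's Logger metadata (e.g. a request id).
  def async_with_metadata(fun) when is_function(fun, 0) do
    # Read the metadata in the parent process, before spawning.
    metadata = Logger.metadata()

    Task.async(fn ->
      # Re-apply it inside the child process, then run the work.
      Logger.metadata(metadata)
      fun.()
    end)
  end
end

# Usage: the child sees the request id that was set in the parent.
Logger.metadata(request_id: "abc123")
task = MetadataSketch.async_with_metadata(fn -> Logger.metadata()[:request_id] end)
Task.await(task)
```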
This sounds like a general problem of, “How do I use opentelemetry to measure spans with distributed computing/operations?”
I know this is old but wanted to chime in to say that the context is meant to be copied, and $callers is only used because there is no way to tell Ecto to pass the context to sub-processes.
An example with Task can be found in the Otel docs (Instrumentation | OpenTelemetry), and it would be the same with an RPC call.
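A rough sketch of what that could look like for an RPC, following the same capture-then-attach pattern as the Task example. MyApp.TracedRPC is a hypothetical module, and :erpc.call/4 stands in for whatever RPC mechanism is actually in use; how this would hook into the RPC path inside fly_postgres is not shown, and whether the captured context behaves correctly across nodes in a given setup is an assumption that would need testing.

```elixir
defmodule MyApp.TracedRPC do
  # Hypothetical helper, not part of fly_postgres, fly_rpc, or opentelemetry.

  # Runs on the calling node: capture the current OpenTelemetry context
  # and ship it along with the function call.
  def call(node, module, fun, args) do
    ctx = OpenTelemetry.Ctx.get_current()
    :erpc.call(node, __MODULE__, :run_with_ctx, [ctx, module, fun, args])
  end

  # Runs on the remote node: attach the caller's context so spans started
  # while `fun` runs are recorded as children of the caller's span.
  def run_with_ctx(ctx, module, fun, args) do
    token = OpenTelemetry.Ctx.attach(ctx)

    try do
      apply(module, fun, args)
    after
      # Restore whatever context the remote process had before.
      OpenTelemetry.Ctx.detach(token)
    end
  end
end
```

A call would then look something like MyApp.TracedRPC.call(primary_node, MyApp.Repo.Local, :insert, [changeset]), where primary_node and changeset are placeholders.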
Thanks so much for replying to such an old thread! I will give this a shot and see if I can get it working. If I do manage to get it working, I will post the results back here. Thanks again!