Experimental OpenTelemetry Tracing Support #1632
Conversation
Hey Ben, this probably isn't the best place to make this comment, but I've noticed that rebuilds of the same job get grouped under the same root span in Datadog. This ends up showing the one span as taking as long as the first build, plus the delay before I click the rebuild button, plus the second build. It would be nice if there were a way to have a root span per build instead of per pipeline+sha.
@KevinGreen this is definitely an issue that I'm interested in fixing - could you open an issue on the repo and include a screenshot or two so that I know what I'm looking for?
Looks great! I'm so excited this is happening.
I don't suppose there's anything in `tracing.go` that's worth unit testing? I guess there aren't that many branches, so if it works at all it's probably all working. /shrug
Also, clean up the wording a little in the flocks experiment description
🤔 Problem:
The times, they are a-changing, and so are distributed tracing standards.
The CNCF, in their infinite wisdom, have archived the OpenTracing project, upon which our existing tracing support in the agent was built. We'll continue to support OpenTracing-based tracing through Datadog for the foreseeable future, but people are rightly migrating to the newer standard, OpenTelemetry.
OpenTelemetry has a well-defined tracing provider that supports multiple backends (Datadog, Jaeger, etc.) and is all-around a good and useful piece of software. We should provide functionality within the agent to trace CI runs using OpenTelemetry tracing.
There's a bit of an issue in that while OpenTracing and OpenTelemetry are similar, they're just different enough that having them both in the same place is a bit of a cognitive hassle. In this PR you'll see a bunch of places where we do things with OpenTelemetry spans that are just slightly different from what we'd do with OpenTracing spans. It doesn't hurt (or, more accurately, it does hurt) that they share an acronym and have similar names.
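To give a flavour of the near-but-not-quite overlap, here's a rough sketch (not code from this PR) of what starting, tagging, and ending a span looks like in each library's Go API. The span name and tag key are purely illustrative:

```go
package tracingsketch

import (
	"context"

	"github.com/opentracing/opentracing-go"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// OpenTracing: start a span from the global tracer, SetTag() it, Finish() it.
// The tag key below is illustrative, not something the agent necessarily emits.
func runJobOpenTracing(ctx context.Context) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "job.run")
	defer span.Finish()

	span.SetTag("buildkite.job_id", "abc123")
	doWork(ctx) // the returned ctx carries the span, so child spans nest under it
}

// OpenTelemetry: get a named Tracer, Start() a span, SetAttributes() on it, End() it.
func runJobOpenTelemetry(ctx context.Context) {
	ctx, span := otel.Tracer("buildkite-agent").Start(ctx, "job.run")
	defer span.End()

	span.SetAttributes(attribute.String("buildkite.job_id", "abc123"))
	doWork(ctx)
}

func doWork(context.Context) {} // stand-in for the actual job execution
```

Small differences all the way down: tags vs attributes, `Finish()` vs `End()`, global tracer vs named tracers, and so on.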
💬 Cool, so what does this PR do?
Building on the work done in #1631, we add a new tracing backend, `opentelemetry`, and associated utilities in the `tracetools` and `bootstrap` packages in order to be able to use them. This new tracing backend is behind an experiment called `opentelemetry-tracing`, and is not considered stable - we may change the implementation without warning.
I've confirmed that this implementation can egest tracing data into a local Jaeger instance, and that those traces look relatively normal (to my admittedly untrained eye, anyway).
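For anyone who wants to reproduce that local Jaeger check, the OpenTelemetry Go SDK wiring for exporting spans over OTLP looks roughly like the sketch below. This is a minimal, hedged example using the upstream exporter packages, not necessarily the exact setup in `tracetools`; the endpoint and service name are assumptions:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func main() {
	ctx := context.Background()

	// Export spans over OTLP/gRPC to a collector or Jaeger's OTLP endpoint
	// (e.g. a local all-in-one Jaeger listening on localhost:4317 - assumed here).
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("creating OTLP exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("buildkite-agent"), // assumed service name
		)),
	)
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// Anything that calls otel.Tracer(...).Start(...) now ends up in Jaeger.
	_, span := otel.Tracer("example").Start(ctx, "hello")
	span.End()
}
```

The batch span processor (`WithBatcher`) is generally the right choice for anything beyond debugging, since it buffers and exports spans asynchronously rather than blocking on every span end.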
🙅‍♂️ What doesn't it do?
Notably missing from the OpenTelemetry implementation are equivalents of the methods in `tracetools/propagate.go`. A large part of this is that, after looking into it for quite a while, I'm still not exactly sure of:
With this in mind, I'm open to bundling this lack of functionality into "well, it's experimental" and only implementing it if it gets asked for, but this is also kind of a cop-out, so if you have more information on this, please let me know!
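For reference, if/when we do add an equivalent, OpenTelemetry's context propagation typically looks something like the sketch below: injecting the current span context into a plain string map (which the agent could, hypothetically, pass to child jobs via environment variables) and extracting it on the other side. This is a sketch against the upstream `propagation` package, not something implemented in this PR, and the env-var idea is just an assumption:

```go
package propagatesketch

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func init() {
	// Use the W3C Trace Context format for carrying span context between processes.
	otel.SetTextMapPropagator(propagation.TraceContext{})
}

// Inject serialises the span context from ctx into a plain string map,
// which could (hypothetically) be handed to a child job as env vars.
func Inject(ctx context.Context) map[string]string {
	carrier := propagation.MapCarrier{}
	otel.GetTextMapPropagator().Inject(ctx, carrier)
	return carrier
}

// Extract rebuilds a context containing the remote span context from the map,
// so spans started from it parent onto the original trace.
func Extract(ctx context.Context, m map[string]string) context.Context {
	return otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(m))
}
```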
🏴‍☠️ Acknowledgements:
Huge thanks to @rajatvig and their awesome work on #1548 - this PR would be in much, much rougher shape without all their code that ~~I stole~~ inspired me.