Experimental OpenTelemetry Tracing Support #1632
Conversation
Hey Ben, this probably isn't the best place to make this comment, but I've noticed that rebuilds of the same job get grouped under the same root span in Datadog. This ends up showing the one span as taking as long as the first build, plus the delay before I click the rebuild button, plus the second build. It would be nice if there were a way to have a root span per build instead of per pipeline+sha.
@KevinGreen this is definitely an issue that I'm interested in fixing - could you open an issue on the repo and include a screenshot or two so that I know what I'm looking for?
Looks great! I'm so excited this is happening.
I don't suppose there's anything in `tracing.go` that's worth unit testing? I guess there aren't that many branches, so if it works at all it's probably all working. /shrug
Also, clean up the wording a little in the flocks experiment description
🤔 Problem:
The times, they are a-changing, and so are distributed tracing standards.
The CNCF, in their infinite wisdom, have archived the OpenTracing project, upon which our existing tracing support in the agent was built. We'll continue to support OpenTracing-based tracing through Datadog for the foreseeable future, but people are rightly migrating to the newer standard, OpenTelemetry.
OpenTelemetry has a well-defined tracing provider that supports multiple backends (Datadog, Jaeger, etc.) and is all-around a good and useful piece of software. We should provide functionality within the agent to trace CI runs using OpenTelemetry tracing.
There's a bit of an issue in that while OpenTracing and OpenTelemetry are similar, they're just different enough that having them both in the same place is a bit of a cognitive hassle. In this PR you'll see a bunch of places where we do things with OpenTelemetry spans that are just slightly different from what we'd do with OpenTracing spans. It doesn't hurt (or, more accurately, it does hurt) that they share an acronym and have similar names.
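To give a flavour of the near-but-not-quite overlap, here's a rough sketch (not code from this PR) of what starting, tagging, and ending a span looks like in each library's Go API. The span name and tag key are purely illustrative:

```go
package tracingsketch

import (
	"context"

	"github.com/opentracing/opentracing-go"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// OpenTracing: start a span from the global tracer, SetTag() it, Finish() it.
// The tag key below is illustrative, not something the agent necessarily emits.
func runJobOpenTracing(ctx context.Context) {
	span, ctx := opentracing.StartSpanFromContext(ctx, "job.run")
	defer span.Finish()

	span.SetTag("buildkite.job_id", "abc123")
	doWork(ctx) // the returned ctx carries the span, so child spans nest under it
}

// OpenTelemetry: get a named Tracer, Start() a span, SetAttributes() on it, End() it.
func runJobOpenTelemetry(ctx context.Context) {
	ctx, span := otel.Tracer("buildkite-agent").Start(ctx, "job.run")
	defer span.End()

	span.SetAttributes(attribute.String("buildkite.job_id", "abc123"))
	doWork(ctx)
}

func doWork(context.Context) {} // stand-in for the actual job execution
```

Small differences all the way down: tags vs attributes, `Finish()` vs `End()`, global tracer vs named tracers, and so on.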
💬 Cool, so what does this PR do?
Building on the work done in #1631, we add a new tracing backend, `opentelemetry`, and associated utilities in the `tracetools` and `bootstrap` packages in order to be able to use them. This new tracing backend is behind an experiment called `opentelemetry-tracing`, and is not considered stable - we may change the implementation without warning.
I've confirmed that this implementation can egest tracing data into a local Jaeger instance, and that those traces look relatively normal (to my admittedly untrained eye, anyway).
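For anyone who wants to reproduce that local Jaeger check, the OpenTelemetry Go SDK wiring for exporting spans over OTLP looks roughly like the sketch below. This is a minimal, hedged example using the upstream exporter packages, not necessarily the exact setup in `tracetools`; the endpoint and service name are assumptions:

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func main() {
	ctx := context.Background()

	// Export spans over OTLP/gRPC to a collector or Jaeger's OTLP endpoint
	// (e.g. a local all-in-one Jaeger listening on localhost:4317 - assumed here).
	exporter, err := otlptracegrpc.New(ctx,
		otlptracegrpc.WithEndpoint("localhost:4317"),
		otlptracegrpc.WithInsecure(),
	)
	if err != nil {
		log.Fatalf("creating OTLP exporter: %v", err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("buildkite-agent"), // assumed service name
		)),
	)
	defer func() { _ = tp.Shutdown(ctx) }()
	otel.SetTracerProvider(tp)

	// Anything that calls otel.Tracer(...).Start(...) now ends up in Jaeger.
	_, span := otel.Tracer("example").Start(ctx, "hello")
	span.End()
}
```

The batch span processor (`WithBatcher`) is generally the right choice for anything beyond debugging, since it buffers and exports spans asynchronously rather than blocking on every span end.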
🙅‍♂️ What doesn't it do?
Notably missing from the OpenTelemetry implementation are equivalents of the methods in `tracetools/propagate.go`. A large part of this is that, after looking into it for quite a while, I'm still not exactly sure of:
With this in mind, I'm open to bundling this lack of functionality into "well, it's experimental" and only implementing it if it gets asked for, but this is also kind of a cop-out, so if you have more information on this, please let me know!
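For reference, if/when we do add an equivalent, OpenTelemetry's context propagation typically looks something like the sketch below: injecting the current span context into a plain string map (which the agent could, hypothetically, pass to child jobs via environment variables) and extracting it on the other side. This is a sketch against the upstream `propagation` package, not something implemented in this PR, and the env-var idea is just an assumption:

```go
package propagatesketch

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func init() {
	// Use the W3C Trace Context format for carrying span context between processes.
	otel.SetTextMapPropagator(propagation.TraceContext{})
}

// Inject serialises the span context from ctx into a plain string map,
// which could (hypothetically) be handed to a child job as env vars.
func Inject(ctx context.Context) map[string]string {
	carrier := propagation.MapCarrier{}
	otel.GetTextMapPropagator().Inject(ctx, carrier)
	return carrier
}

// Extract rebuilds a context containing the remote span context from the map,
// so spans started from it parent onto the original trace.
func Extract(ctx context.Context, m map[string]string) context.Context {
	return otel.GetTextMapPropagator().Extract(ctx, propagation.MapCarrier(m))
}
```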
🏴‍☠️ Acknowledgements:
Huge thanks to @rajatvig and their awesome work on #1548 - this PR would be in much, much rougher shape without all their code that ~~I stole~~ inspired me.