-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pekko: fix spans on server timeout #13435
Conversation
@@ -69,7 +63,7 @@ public GraphStageLogic createLogic(Attributes attributes) { | |||
} | |||
|
|||
private class TracingLogic extends GraphStageLogic { | |||
private final Deque<TracingRequest> requests = new ArrayDeque<>(); | |||
private final Queue<PekkoTracingRequest> requests = new ArrayDeque<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both here and in PekkoHttpServerTracer I'm using a queue (LIFO) because that's what the approach taken in Pekko by the request-timeout stage here.
The previous approach used the Deque as FIFO. Changing it here for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the previous usage was a bug since pipelined requests should complete in the oder they are received. Currently with our tests it seems that pekko does not have multiple requests in this queue, probably some extra configuration is needed. If we could figure out how to actually enable pipelining we could consider improving the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've enabled the http-pipelining test for the async tests, and managed to get it to work (see the comment on how). Nevertheless switching both deques from LIFO to FIFO has no effect - the tests pass in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking at this. If it doesn't break then I'd guess that perhaps there isn't more than 1 request in the queue at the same time. Maybe the requests are processed too fast, idk. I think if there is interest we can tackle this in a follow up PR.
Looking at https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/13435/files#diff-dc7412c25079c85f7462b02768d4445cda72f0934517132253e0b9140f7d69dcR147-R153 this really isn't a good idea. The scope from makeCurrent
must be closed. Leaving the scope open is almost never the best solution as not closing a scope will prevent closing all other scopes that were open for that thread, it could lead to a memory leak or the thread ending in a state where all subsequent spans are added to one trace. If this is just for pipelining I'd suggest you to revert this part for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I can revert it. This PR didn't break it so yeah a follow up should look into it.
A thought - how bad would it be to use the Context.root()
as the parent context when creating the server span? i.e. change this to:
Context parentContext = Context.root();
The tests pass if I do that and remove the tracingRequest.parentContext.makeCurrent()
bit, since the previous request's context is ignored when handling the next request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually it is better to reset the scope where the unwanted context propagation happens. We usually fail the tests when there are leaked scopes but this is disabled in
opentelemetry-java-instrumentation/instrumentation/pekko/pekko-http-1.0/javaagent/build.gradle.kts
Line 50 in dfe392e
jvmArgs("-Dio.opentelemetry.javaagent.shaded.io.opentelemetry.context.enableStrictContext=false") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I've removed my dodgy workarounds - we can consider better solutions for http-pipelining in another issue.
I've also removed that system property and the tests still pass 👍
.../java/io/opentelemetry/javaagent/instrumentation/pekkohttp/v1_0/server/PekkoFlowWrapper.java
Outdated
Show resolved
Hide resolved
...etry/javaagent/instrumentation/pekkohttp/v1_0/server/HttpServerBluePrintInstrumentation.java
Outdated
Show resolved
Hide resolved
...etry/javaagent/instrumentation/pekkohttp/v1_0/server/HttpServerBluePrintInstrumentation.java
Outdated
Show resolved
Hide resolved
.../java/io/opentelemetry/javaagent/instrumentation/pekkohttp/v1_0/server/PekkoFlowWrapper.java
Show resolved
Hide resolved
.../io/opentelemetry/javaagent/instrumentation/pekkohttp/v1_0/server/PekkoHttpServerTracer.java
Show resolved
Hide resolved
.../io/opentelemetry/javaagent/instrumentation/pekkohttp/v1_0/server/PekkoHttpServerTracer.java
Outdated
Show resolved
Hide resolved
.../java/io/opentelemetry/javaagent/instrumentation/pekkohttp/v1_0/server/PekkoFlowWrapper.java
Show resolved
Hide resolved
.../io/opentelemetry/javaagent/instrumentation/pekkohttp/v1_0/server/PekkoHttpServerTracer.java
Show resolved
Hide resolved
@@ -69,7 +63,7 @@ public GraphStageLogic createLogic(Attributes attributes) { | |||
} | |||
|
|||
private class TracingLogic extends GraphStageLogic { | |||
private final Deque<TracingRequest> requests = new ArrayDeque<>(); | |||
private final Queue<PekkoTracingRequest> requests = new ArrayDeque<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've enabled the http-pipelining test for the async tests, and managed to get it to work (see the comment on how). Nevertheless switching both deques from LIFO to FIFO has no effect - the tests pass in both cases.
@@ -62,7 +64,7 @@ object PekkoHttpTestAsyncWebServer { | |||
if (null == binding) { | |||
import scala.concurrent.duration._ | |||
binding = Await.result( | |||
Http().bindAndHandleAsync(asyncHandler, "localhost", port), | |||
Http().bindAndHandleAsync(asyncHandler, "localhost", port, parallelism = 2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change wasn't necessary, but I wanted to make sure the tests pass with parallelism enabled. I can revert though if preferred.
382a324
to
e97a1df
Compare
When pekko's HTTP server is configured with a request-timeout (default 20s) the span is never ended, and its response status code is never set. This fixes that, by:
PekkoFlowWrapper
toPekkoHttpServerTracer
, which is injected higher up the stack so it sees theHttpResponse
after the timeout logic has runPekkoHttpServerTracer
passes the otel context toPekkoFlowWrapper
via theHttpRequest
's attributes