Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xdsclient: release lock before attempting to close underlying transport #8011

Merged
merged 2 commits into from
Jan 16, 2025

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Jan 15, 2025

This PR fixes a rare deadlock where multiple channels are created and closed to the same target URI (with an xds scheme) at the same time. This deadlock leads to xDS client creation being stalled and therefore everything on the gRPC channel is stalled.

RELEASE NOTES:

  • xds: fixes a possible deadlock that happens when both the client application and the xDS management server (responsible for configuring the client) were using the xds:/// scheme in their target URIs

@easwars easwars requested a review from dfawley January 15, 2025 23:40
@easwars easwars added this to the 1.70 Release milestone Jan 15, 2025
@easwars
Copy link
Contributor Author

easwars commented Jan 15, 2025

@purnesh42H : FYI

Comment on lines 56 to 57
// This attempts to close the transport to the management server and could
// take a while to complete. So, call it without holding the lock.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rewrite this a little. The reason to do it without the lock is not because it will "take a while" (because that implies it's purely for performance), but because the things it does could theoretically call into the xdsclient package again and deadlock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks.

Copy link

codecov bot commented Jan 16, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.03%. Comparing base (eb1adde) to head (3fdb7c5).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
xds/internal/xdsclient/client_refcounted.go 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8011      +/-   ##
==========================================
- Coverage   82.16%   82.03%   -0.14%     
==========================================
  Files         384      384              
  Lines       38745    38750       +5     
==========================================
- Hits        31836    31788      -48     
- Misses       5598     5636      +38     
- Partials     1311     1326      +15     
Files with missing lines Coverage Δ
xds/internal/xdsclient/client_refcounted.go 81.81% <87.50%> (-0.24%) ⬇️

... and 22 files with indirect coverage changes

@easwars easwars merged commit 9dc22c0 into grpc:master Jan 16, 2025
15 checks passed
arjan-bal pushed a commit to arjan-bal/grpc-go that referenced this pull request Jan 22, 2025
arjan-bal added a commit that referenced this pull request Jan 22, 2025
* server: fix buffer release timing in processUnaryRPC (#7998)

* xdsclient: release lock before attempting to close underlying transport (#8011)

* github: Run deps workflow against PR target branch and improve dir names (#8010)
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Jan 26, 2025
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Feb 13, 2025
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Feb 13, 2025
janardhanvissa pushed a commit to janardhanvissa/grpc-go that referenced this pull request Feb 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants