-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: Parameter pack workaround fails with nvcc #1629
Comments
The updated code is designed to: - Be branch-predictor-friendly - Be cache-friendly - Minimize the lengths of critical paths - Minimize slow operations (particularly multiplications) - Minimize binary/codegen bloat The most notable performance trick here is perhaps the precomputation & caching of the number of digits, so that we can reuse/exploit it when writing the output. This precomputation of the exact length enables 2 further performance benefits: - It makes `StrCat` and `StrAppend` zero-copy when only integers are passed, by avoiding intermediate `AlphaNum` entirely in those cases. If needed in the future, we can probably also make many other mixtures of non-integer types zero-copy as well. - It avoids over-reservation of the string buffer, allowing for more strings to fit inside SSO, which will likely have further performance benefits. There is also a side benefit of preventing `FastIntToBuffer` from writing beyond the end of the buffer, which has caused buffer overflows in the past. The new code continues to use & extend some of the existing core tricks (such as the division-by-100 trick), as those are already efficient. PiperOrigin-RevId: 595785531 Change-Id: Id6920e7e038fec10b2c45f213de75dc7e2cbddd1
Thanks for following up, for some reason it seems I never received the notification for the initial comment. I'll take a look. |
I seem to be unable to reproduce the compile error with a minimal example. Could you please do the following?
Thanks! |
I am quite unsure that It looks like nvc++ is a part of HPC SDK, which is a totally diferent product It looks like godbolt does not provide nvcc compiler at all. |
Ah. I tried this locally with an nvcc though and still didn't see an issue. So I'm still stuck. |
Interestingly, I've failed to reproduce the error my collaborator encountered as well (I've tried the godbolt example from above with both @georgthegreat, can you reproduce the issue on your end? |
OK, it has been reproduced on our end now as well, but so far only in the context of building tensorflow. Full log can be found here. |
Thanks for the log. Unfortunately this isn't enough for me to try to investigate it. I really need a reproduction I can run locally without significant effort -- which means (a) a compiler I can access easily (i.e. something on Godbolt, or available in a typical OS's package manager), and (b) a minimal reproduction (1 file) rather than an entire docker image. Having to set up a build environment and run Docker, etc. (even if I knew the command-lines for doing so, which I don't right now) is not something I can realistically do in the foreseeable future. So, if that's all we have available -- you may need to investigate this on your end. Here's what I can help you with, though. I believe I've seen this error before. It arose from an expression like Unfortunately, we don't have that here... we just have Since I have no way to reproduce this, I have no way to test it. But if you can reproduce this on your end, I would suggest trying out variations of this, such as using Ultimately, the compiler is buggy, so perhaps the answer ought to be to please upgrade your compiler, but it's hard to even suggest that without being able to test whether newer versions of nvcc break on this. |
A repro with godbolt here |
Awesome, thanks a lot! So it looks like nvcc can't parse use of the comma operator in expansions:
I could probably find a workaround, but it would hurt code readability even more... let me look into it and see if I can do anything. I would suggest reporting this upstream to NVIDIA in any case. |
Ah, here's a workaround. Apparently this works, I'll take care of it. |
Thank you very much! :) |
Thanks a lot! |
Yup, thanks for the help! |
@derekmauro, is it possible to backport this into 20240116 release branch? |
If there is another patch release (and there likely will be) then I will include this change. |
compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629
* mgard: don't restrict protobuf version more than necessary successfully built: mgard@2022-11-18 ^protobuf@3.{4,21,25} mgard@2023-01-10 ^protobuf@3.{4,25} mgard@2023-03-31 ^protobuf@3.{4,25} compile failures: mgard@2022-11-18 ^[email protected] mgard@2023-01-10 ^[email protected] mgard@2023-03-31 ^[email protected] * mgard: add conflicts to address CI errors * mgard: conflict between cuda and [email protected] compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629
* mgard: don't restrict protobuf version more than necessary successfully built: mgard@2022-11-18 ^protobuf@3.{4,21,25} mgard@2023-01-10 ^protobuf@3.{4,25} mgard@2023-03-31 ^protobuf@3.{4,25} compile failures: mgard@2022-11-18 ^[email protected] mgard@2023-01-10 ^[email protected] mgard@2023-03-31 ^[email protected] * mgard: add conflicts to address CI errors * mgard: conflict between cuda and [email protected] compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629
…ansions in range of range-based for loop Fixes: abseil#1629 PiperOrigin-RevId: 611131201 Change-Id: I787731e00207b544ee16055e6e0d323a5094a433
* Prevent overflow in absl::CEscape() Strings larger than 1 GiB on a platform with a 32-bit size_t could potentially overflow size_t in `CEscapedLength()`, resulting in an undersized allocation. The resulting write in `CEscapeAndAppendInternal()` would then write beyond the bounds of the output buffer. A second overflow, where the calculated escaped length is added to the size of the string being appended to, is also fixed. In both cases the program will now abort prior to the overflow. Credit goes to Ronald Crane (Zippenhop LLC) for reporting this issue. PiperOrigin-RevId: 607019573 Change-Id: I97bf246cde96102a793d2db49446cccae08abf59 * Workaround for NVIDIA C++ compiler being unable to parse variadic expansions in range of range-based for loop Fixes: abseil#1629 PiperOrigin-RevId: 611131201 Change-Id: I787731e00207b544ee16055e6e0d323a5094a433 * Fix OSX support with CocoaPods and Xcode 15 PiperOrigin-RevId: 615090942 Change-Id: I7cc20a0129dcfbbddedd9e6d816bb6234bff14b3 * PR abseil#1643: add xcprivacy to all subspecs Imported from GitHub PR abseil#1643 Addressing comments at abseil#1604 Add a xcprivacy subspec and have all other subspecs depend on it (option 1) Didn't going with option 3 because there are several levels of subspecs in abseil podspec, it's difficult to track whether all of them directly or indirectly depends on abseil/base/config or ensure they will continue to depend on it. Example of generated podsped: https://gist.github.com/HannahShiSFB/15d8fb6aa637f2781b7be4218d080f11 Merge 4405cdf into 4539c54 Merging this change closes abseil#1643 COPYBARA_INTEGRATE_REVIEW=abseil#1643 from HannahShiSFB:privacy-manifests 4405cdf PiperOrigin-RevId: 616914674 Change-Id: If56d5a4f1a7cc6f9fac7a2d8e95b55d140e645fc
* Prevent overflow in absl::CEscape() Strings larger than 1 GiB on a platform with a 32-bit size_t could potentially overflow size_t in `CEscapedLength()`, resulting in an undersized allocation. The resulting write in `CEscapeAndAppendInternal()` would then write beyond the bounds of the output buffer. A second overflow, where the calculated escaped length is added to the size of the string being appended to, is also fixed. In both cases the program will now abort prior to the overflow. Credit goes to Ronald Crane (Zippenhop LLC) for reporting this issue. PiperOrigin-RevId: 607019573 Change-Id: I97bf246cde96102a793d2db49446cccae08abf59 * Workaround for NVIDIA C++ compiler being unable to parse variadic expansions in range of range-based for loop Fixes: #1629 PiperOrigin-RevId: 611131201 Change-Id: I787731e00207b544ee16055e6e0d323a5094a433 * Fix OSX support with CocoaPods and Xcode 15 PiperOrigin-RevId: 615090942 Change-Id: I7cc20a0129dcfbbddedd9e6d816bb6234bff14b3 * PR #1643: add xcprivacy to all subspecs Imported from GitHub PR #1643 Addressing comments at #1604 Add a xcprivacy subspec and have all other subspecs depend on it (option 1) Didn't going with option 3 because there are several levels of subspecs in abseil podspec, it's difficult to track whether all of them directly or indirectly depends on abseil/base/config or ensure they will continue to depend on it. Example of generated podsped: https://gist.github.com/HannahShiSFB/15d8fb6aa637f2781b7be4218d080f11 Merge 4405cdf into 4539c54 Merging this change closes #1643 COPYBARA_INTEGRATE_REVIEW=#1643 from HannahShiSFB:privacy-manifests 4405cdf PiperOrigin-RevId: 616914674 Change-Id: If56d5a4f1a7cc6f9fac7a2d8e95b55d140e645fc
https://github.com/abseil/abseil-cpp/releases/tag/20240116.2 contains this patch. |
Thanks a lot! |
* mgard: don't restrict protobuf version more than necessary successfully built: mgard@2022-11-18 ^protobuf@3.{4,21,25} mgard@2023-01-10 ^protobuf@3.{4,25} mgard@2023-03-31 ^protobuf@3.{4,25} compile failures: mgard@2022-11-18 ^[email protected] mgard@2023-01-10 ^[email protected] mgard@2023-03-31 ^[email protected] * mgard: add conflicts to address CI errors * mgard: conflict between cuda and [email protected] compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629
Describe the issue
Was looking into failures with abseil 20240116.1 and tensorflow today, which fail with
The relevant chunk of code
abseil-cpp/absl/strings/str_cat.h
Lines 606 to 614 in 2f9e432
got introduced recently (d5a2cec), and as it turns out, there already was a comment on that commit noting exactly this problem:
I think it effectively boils down to a compiler shortcoming of nvcc, but it would be good to be able to use abseil with CUDA.
Steps to reproduce the problem
Build CUDA-enabled tensorflow with newest abseil.
What version of Abseil are you using?
20240116.1
What operating system and version are you using?
Linux
What compiler and version are you using?
GCC + nvcc
What build system are you using?
bazel
Additional context
No response
The text was updated successfully, but these errors were encountered: