Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Parameter pack workaround fails with nvcc #1629

Closed
h-vetinari opened this issue Feb 26, 2024 · 17 comments
Closed

[Bug]: Parameter pack workaround fails with nvcc #1629

h-vetinari opened this issue Feb 26, 2024 · 17 comments
Assignees

Comments

@h-vetinari
Copy link
Contributor

h-vetinari commented Feb 26, 2024

Describe the issue

Was looking into failures with abseil 20240116.1 and tensorflow today, which fail with

# Execution platform: @local_execution_config_platform//:platform
$PREFIX/include/absl/strings/str_cat.h: In function 'std::enable_if_t<(sizeof... (T) > 1), typename std::common_type<typename std::conditional<true, void, absl::lts_20240116::strings_internal::EnableIfFastCase<T, typename std::enable_if<((std::is_arithmetic<T>::value && (! std::is_same<T, char>::value)) && (! std::is_same<T, bool>::value)), void>::type> >::type ...>::type> absl::lts_20240116::StrAppend(absl::Nonnull<T*>, T ...)':
$PREFIX/include/absl/strings/str_cat.h:606:316: error: expected ';' before '...' token
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^  

$PREFIX/include/absl/strings/str_cat.h:606:43: error: parameter packs not expanded with '...': 
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^                         

The relevant chunk of code

for (const SomeTrivialEmptyType& dummy2 :
{(/* Comma expressions are poor man's C++17 fold expression for C++14 */
(void)(n = lengths[i]),
(void)(n < 0 ? (void)(*pos++ = '-'), (n = ~n) : 0),
(void)absl::numbers_internal::FastIntToBufferBackward(
absl::numbers_internal::UnsignedAbsoluteValue(std::move(args)),
pos += n, static_cast<uint32_t>(n)),
(void)++i, dummy1)...}) {
(void)dummy2; // Remove & migrate to fold expressions in C++17

got introduced recently (d5a2cec), and as it turns out, there already was a comment on that commit noting exactly this problem:

@georgthegreat: @derekmauro, this pattern breaks nvcc (tested with nvcc11.4 + c++14, nvcc11.4 + c++17, nvcc12.1 + c++17 modes) with the following error:

absl/strings/str_cat.h:609:316: error: expected ';' after expression
for (const SomeTrivialEmptyType &dummy2 : ({((((((void)(n = (lengths[i]))), ((void)((n < (0)) ? ((void)((*(pos++)) = '-')), (n = (~n)) : (0)))), ((void)numbers_internal::FastIntToBufferBackward(numbers_internal::UnsignedAbsoluteValue(std::move(args)), pos += n, static_cast< uint32_t>(n)))), ((void)(++i))), dummy1)...}))

Can we do anything about this? I do not quite understand the meaning of this code, so I can not truly refactor it to make it compilable.

I do understand that nvcc is quite weird when it comes to such complex c++ code, so I need a workaround of some kind.

@derekmauro: @higher-performance - Can you take a look at this?

I think it effectively boils down to a compiler shortcoming of nvcc, but it would be good to be able to use abseil with CUDA.

Steps to reproduce the problem

Build CUDA-enabled tensorflow with newest abseil.

What version of Abseil are you using?

20240116.1

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC + nvcc

What build system are you using?

bazel

Additional context

No response

h-vetinari referenced this issue Feb 26, 2024
The updated code is designed to:
- Be branch-predictor-friendly
- Be cache-friendly
- Minimize the lengths of critical paths
- Minimize slow operations (particularly multiplications)
- Minimize binary/codegen bloat

The most notable performance trick here is perhaps the precomputation & caching of the number of digits, so that we can reuse/exploit it when writing the output.

This precomputation of the exact length enables 2 further performance benefits:
- It makes `StrCat` and `StrAppend` zero-copy when only integers are passed, by avoiding intermediate `AlphaNum` entirely in those cases. If needed in the future, we can probably also make many other mixtures of non-integer types zero-copy as well.
- It avoids over-reservation of the string buffer, allowing for more strings to fit inside SSO, which will likely have further performance benefits.

There is also a side benefit of preventing `FastIntToBuffer` from writing beyond the end of the buffer, which has caused buffer overflows in the past.

The new code continues to use & extend some of the existing core tricks (such as the division-by-100 trick), as those are already efficient.

PiperOrigin-RevId: 595785531
Change-Id: Id6920e7e038fec10b2c45f213de75dc7e2cbddd1
@higher-performance
Copy link

Thanks for following up, for some reason it seems I never received the notification for the initial comment. I'll take a look.

@higher-performance
Copy link

I seem to be unable to reproduce the compile error with a minimal example. Could you please do the following?

  1. Specify what compiler version & flags you are using

  2. Try to reproduce the compile error with a minimal example in Godbot

Thanks!

@georgthegreat
Copy link
Contributor

I am quite unsure that nvc++ provided by godbolt actually is an NVidia / Cuda compiler

It looks like nvc++ is a part of HPC SDK, which is a totally diferent product
https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/

It looks like godbolt does not provide nvcc compiler at all.

@higher-performance
Copy link

Ah. I tried this locally with an nvcc though and still didn't see an issue. So I'm still stuck.

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Feb 27, 2024

Interestingly, I've failed to reproduce the error my collaborator encountered as well (I've tried the godbolt example from above with both nvcc 11.2 & 12.0 and --std=c++17)

@georgthegreat, can you reproduce the issue on your end?

@h-vetinari
Copy link
Contributor Author

OK, it has been reproduced on our end now as well, but so far only in the context of building tensorflow. Full log can be found here.

@higher-performance
Copy link

Thanks for the log. Unfortunately this isn't enough for me to try to investigate it. I really need a reproduction I can run locally without significant effort -- which means (a) a compiler I can access easily (i.e. something on Godbolt, or available in a typical OS's package manager), and (b) a minimal reproduction (1 file) rather than an entire docker image. Having to set up a build environment and run Docker, etc. (even if I knew the command-lines for doing so, which I don't right now) is not something I can realistically do in the foreseeable future. So, if that's all we have available -- you may need to investigate this on your end.

Here's what I can help you with, though. I believe I've seen this error before. It arose from an expression like std::forward<T>(args)..., where the (buggy) compiler didn't like the fact that args and T were two seemingly unrelated parameter packs. In that case, using std::forward<decltype(args)>(args)... fixed the problem.

Unfortunately, we don't have that here... we just have std::move(args), which shouldn't have this problem, I would think, but perhaps it somehow still does for nvcc?

Since I have no way to reproduce this, I have no way to test it. But if you can reproduce this on your end, I would suggest trying out variations of this, such as using std::forward<decltype(args)>(args)... in lieu of std::move(args), and seeing if that works? You can try passing template arguments to the other functions as well, and see if that makes any difference.

Ultimately, the compiler is buggy, so perhaps the answer ought to be to please upgrade your compiler, but it's hard to even suggest that without being able to test whether newer versions of nvcc break on this.

@anakinxc
Copy link

Hi @higher-performance

A repro with godbolt here

@higher-performance
Copy link

Awesome, thanks a lot! So it looks like nvcc can't parse use of the comma operator in expansions:

template<class... T>
void StrAppend(T... args) {
  for (int _ : {((void)args, 0)...}) {
  }
}

I could probably find a workaround, but it would hurt code readability even more... let me look into it and see if I can do anything. I would suggest reporting this upstream to NVIDIA in any case.

@higher-performance
Copy link

Ah, here's a workaround. Apparently this works, I'll take care of it.

@h-vetinari
Copy link
Contributor Author

Thank you very much! :)

@georgthegreat
Copy link
Contributor

Thanks a lot!

@higher-performance
Copy link

Yup, thanks for the help!

@georgthegreat
Copy link
Contributor

@derekmauro, is it possible to backport this into 20240116 release branch?

@derekmauro
Copy link
Member

@derekmauro, is it possible to backport this into 20240116 release branch?

If there is another patch release (and there likely will be) then I will include this change.

aumuell added a commit to aumuell/spack that referenced this issue Mar 18, 2024
compiling mgard+cuda with [email protected] and nvcc from [email protected] against
protobuf pulling in [email protected] results in the errors reported
here: abseil/abseil-cpp#1629
alalazo pushed a commit to spack/spack that referenced this issue Mar 19, 2024
* mgard: don't restrict protobuf version more than necessary

successfully built:
mgard@2022-11-18 ^protobuf@3.{4,21,25}
mgard@2023-01-10 ^protobuf@3.{4,25}
mgard@2023-03-31 ^protobuf@3.{4,25}

compile failures:
mgard@2022-11-18 ^[email protected]
mgard@2023-01-10 ^[email protected]
mgard@2023-03-31 ^[email protected]

* mgard: add conflicts to address CI errors

* mgard: conflict between cuda and [email protected]

compiling mgard+cuda with [email protected] and nvcc from [email protected] against
protobuf pulling in [email protected] results in the errors reported
here: abseil/abseil-cpp#1629
mathomp4 pushed a commit to mathomp4/spack that referenced this issue Mar 27, 2024
* mgard: don't restrict protobuf version more than necessary

successfully built:
mgard@2022-11-18 ^protobuf@3.{4,21,25}
mgard@2023-01-10 ^protobuf@3.{4,25}
mgard@2023-03-31 ^protobuf@3.{4,25}

compile failures:
mgard@2022-11-18 ^[email protected]
mgard@2023-01-10 ^[email protected]
mgard@2023-03-31 ^[email protected]

* mgard: add conflicts to address CI errors

* mgard: conflict between cuda and [email protected]

compiling mgard+cuda with [email protected] and nvcc from [email protected] against
protobuf pulling in [email protected] results in the errors reported
here: abseil/abseil-cpp#1629
netkex pushed a commit to netkex/abseil-cpp that referenced this issue Apr 3, 2024
…ansions in range of range-based for loop

Fixes: abseil#1629
PiperOrigin-RevId: 611131201
Change-Id: I787731e00207b544ee16055e6e0d323a5094a433
derekmauro added a commit to derekmauro/abseil-cpp that referenced this issue Apr 8, 2024
* Prevent overflow in absl::CEscape()
Strings larger than 1 GiB on a platform with a 32-bit size_t could
potentially overflow size_t in `CEscapedLength()`, resulting in an
undersized allocation. The resulting write in
`CEscapeAndAppendInternal()` would then write beyond the bounds of the
output buffer.

A second overflow, where the calculated escaped length is added to the
size of the string being appended to, is also fixed.

In both cases the program will now abort prior to the overflow.

Credit goes to Ronald Crane (Zippenhop LLC) for reporting this issue.

PiperOrigin-RevId: 607019573
Change-Id: I97bf246cde96102a793d2db49446cccae08abf59

* Workaround for NVIDIA C++ compiler being unable to parse variadic
expansions in range of range-based for loop

Fixes: abseil#1629
PiperOrigin-RevId: 611131201
Change-Id: I787731e00207b544ee16055e6e0d323a5094a433

* Fix OSX support with CocoaPods and Xcode 15

PiperOrigin-RevId: 615090942
Change-Id: I7cc20a0129dcfbbddedd9e6d816bb6234bff14b3

* PR abseil#1643: add xcprivacy to all subspecs
Imported from GitHub PR abseil#1643

Addressing comments at abseil#1604
Add a xcprivacy subspec and have all other subspecs depend on it (option 1)

Didn't going with option 3 because there are several levels of subspecs in abseil podspec, it's difficult to track whether all of them directly or indirectly depends on abseil/base/config or ensure they will continue to depend on it.

Example of generated podsped: https://gist.github.com/HannahShiSFB/15d8fb6aa637f2781b7be4218d080f11
Merge 4405cdf into 4539c54

Merging this change closes abseil#1643

COPYBARA_INTEGRATE_REVIEW=abseil#1643 from HannahShiSFB:privacy-manifests 4405cdf
PiperOrigin-RevId: 616914674
Change-Id: If56d5a4f1a7cc6f9fac7a2d8e95b55d140e645fc
derekmauro added a commit that referenced this issue Apr 8, 2024
* Prevent overflow in absl::CEscape()
Strings larger than 1 GiB on a platform with a 32-bit size_t could
potentially overflow size_t in `CEscapedLength()`, resulting in an
undersized allocation. The resulting write in
`CEscapeAndAppendInternal()` would then write beyond the bounds of the
output buffer.

A second overflow, where the calculated escaped length is added to the
size of the string being appended to, is also fixed.

In both cases the program will now abort prior to the overflow.

Credit goes to Ronald Crane (Zippenhop LLC) for reporting this issue.

PiperOrigin-RevId: 607019573
Change-Id: I97bf246cde96102a793d2db49446cccae08abf59

* Workaround for NVIDIA C++ compiler being unable to parse variadic
expansions in range of range-based for loop

Fixes: #1629
PiperOrigin-RevId: 611131201
Change-Id: I787731e00207b544ee16055e6e0d323a5094a433

* Fix OSX support with CocoaPods and Xcode 15

PiperOrigin-RevId: 615090942
Change-Id: I7cc20a0129dcfbbddedd9e6d816bb6234bff14b3

* PR #1643: add xcprivacy to all subspecs
Imported from GitHub PR #1643

Addressing comments at #1604
Add a xcprivacy subspec and have all other subspecs depend on it (option 1)

Didn't going with option 3 because there are several levels of subspecs in abseil podspec, it's difficult to track whether all of them directly or indirectly depends on abseil/base/config or ensure they will continue to depend on it.

Example of generated podsped: https://gist.github.com/HannahShiSFB/15d8fb6aa637f2781b7be4218d080f11
Merge 4405cdf into 4539c54

Merging this change closes #1643

COPYBARA_INTEGRATE_REVIEW=#1643 from HannahShiSFB:privacy-manifests 4405cdf
PiperOrigin-RevId: 616914674
Change-Id: If56d5a4f1a7cc6f9fac7a2d8e95b55d140e645fc
@derekmauro
Copy link
Member

@georgthegreat
Copy link
Contributor

Thanks a lot!

teaguesterling pushed a commit to teaguesterling/spack that referenced this issue Jun 15, 2024
* mgard: don't restrict protobuf version more than necessary

successfully built:
mgard@2022-11-18 ^protobuf@3.{4,21,25}
mgard@2023-01-10 ^protobuf@3.{4,25}
mgard@2023-03-31 ^protobuf@3.{4,25}

compile failures:
mgard@2022-11-18 ^[email protected]
mgard@2023-01-10 ^[email protected]
mgard@2023-03-31 ^[email protected]

* mgard: add conflicts to address CI errors

* mgard: conflict between cuda and [email protected]

compiling mgard+cuda with [email protected] and nvcc from [email protected] against
protobuf pulling in [email protected] results in the errors reported
here: abseil/abseil-cpp#1629
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants