[Bug]: Parameter pack workaround fails with nvcc #1629

h-vetinari · 2024-02-26T04:04:27Z

Describe the issue

Was looking into failures with abseil 20240116.1 and tensorflow today, which fail with

# Execution platform: @local_execution_config_platform//:platform
$PREFIX/include/absl/strings/str_cat.h: In function 'std::enable_if_t<(sizeof... (T) > 1), typename std::common_type<typename std::conditional<true, void, absl::lts_20240116::strings_internal::EnableIfFastCase<T, typename std::enable_if<((std::is_arithmetic<T>::value && (! std::is_same<T, char>::value)) && (! std::is_same<T, bool>::value)), void>::type> >::type ...>::type> absl::lts_20240116::StrAppend(absl::Nonnull<T*>, T ...)':
$PREFIX/include/absl/strings/str_cat.h:606:316: error: expected ';' before '...' token
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^  

$PREFIX/include/absl/strings/str_cat.h:606:43: error: parameter packs not expanded with '...': 
  606 |   for (const SomeTrivialEmptyType& dummy2 :
      |                                           ^

The relevant chunk of code

abseil-cpp/absl/strings/str_cat.h

Lines 606 to 614 in 2f9e432

    
           for (const SomeTrivialEmptyType& dummy2 : 
        
                {(/* Comma expressions are poor man's C++17 fold expression for C++14 */ 
        
                  (void)(n = lengths[i]), 
        
                  (void)(n < 0 ? (void)(*pos++ = '-'), (n = ~n) : 0), 
        
                  (void)absl::numbers_internal::FastIntToBufferBackward( 
        
                      absl::numbers_internal::UnsignedAbsoluteValue(std::move(args)), 
        
                      pos += n, static_cast<uint32_t>(n)), 
        
                  (void)++i, dummy1)...}) { 
        
             (void)dummy2;  // Remove & migrate to fold expressions in C++17

got introduced recently (d5a2cec), and as it turns out, there already was a comment on that commit noting exactly this problem:

@georgthegreat: @derekmauro, this pattern breaks nvcc (tested with nvcc11.4 + c++14, nvcc11.4 + c++17, nvcc12.1 + c++17 modes) with the following error:
absl/strings/str_cat.h:609:316: error: expected ';' after expression
for (const SomeTrivialEmptyType &dummy2 : ({((((((void)(n = (lengths[i]))), ((void)((n < (0)) ? ((void)((*(pos++)) = '-')), (n = (~n)) : (0)))), ((void)numbers_internal::FastIntToBufferBackward(numbers_internal::UnsignedAbsoluteValue(std::move(args)), pos += n, static_cast< uint32_t>(n)))), ((void)(++i))), dummy1)...}))
Can we do anything about this? I do not quite understand the meaning of this code, so I can not truly refactor it to make it compilable.

I do understand that nvcc is quite weird when it comes to such complex c++ code, so I need a workaround of some kind.

@derekmauro: @higher-performance - Can you take a look at this?

I think it effectively boils down to a compiler shortcoming of nvcc, but it would be good to be able to use abseil with CUDA.

Steps to reproduce the problem

Build CUDA-enabled tensorflow with newest abseil.

What version of Abseil are you using?

20240116.1

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC + nvcc

What build system are you using?

bazel

Additional context

No response

The text was updated successfully, but these errors were encountered:

The updated code is designed to: - Be branch-predictor-friendly - Be cache-friendly - Minimize the lengths of critical paths - Minimize slow operations (particularly multiplications) - Minimize binary/codegen bloat The most notable performance trick here is perhaps the precomputation & caching of the number of digits, so that we can reuse/exploit it when writing the output. This precomputation of the exact length enables 2 further performance benefits: - It makes `StrCat` and `StrAppend` zero-copy when only integers are passed, by avoiding intermediate `AlphaNum` entirely in those cases. If needed in the future, we can probably also make many other mixtures of non-integer types zero-copy as well. - It avoids over-reservation of the string buffer, allowing for more strings to fit inside SSO, which will likely have further performance benefits. There is also a side benefit of preventing `FastIntToBuffer` from writing beyond the end of the buffer, which has caused buffer overflows in the past. The new code continues to use & extend some of the existing core tricks (such as the division-by-100 trick), as those are already efficient. PiperOrigin-RevId: 595785531 Change-Id: Id6920e7e038fec10b2c45f213de75dc7e2cbddd1

higher-performance · 2024-02-26T15:47:40Z

Thanks for following up, for some reason it seems I never received the notification for the initial comment. I'll take a look.

higher-performance · 2024-02-26T16:58:38Z

I seem to be unable to reproduce the compile error with a minimal example. Could you please do the following?

Specify what compiler version & flags you are using
Try to reproduce the compile error with a minimal example in Godbot

Thanks!

georgthegreat · 2024-02-26T21:57:00Z

I am quite unsure that nvc++ provided by godbolt actually is an NVidia / Cuda compiler

It looks like nvc++ is a part of HPC SDK, which is a totally diferent product
https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-ref-guide/

It looks like godbolt does not provide nvcc compiler at all.

higher-performance · 2024-02-26T21:58:25Z

Ah. I tried this locally with an nvcc though and still didn't see an issue. So I'm still stuck.

h-vetinari · 2024-02-27T10:46:11Z

Interestingly, I've failed to reproduce the error my collaborator encountered as well (I've tried the godbolt example from above with both nvcc 11.2 & 12.0 and --std=c++17)

@georgthegreat, can you reproduce the issue on your end?

h-vetinari · 2024-02-28T01:43:57Z

OK, it has been reproduced on our end now as well, but so far only in the context of building tensorflow. Full log can be found here.

higher-performance · 2024-02-28T05:20:13Z

Thanks for the log. Unfortunately this isn't enough for me to try to investigate it. I really need a reproduction I can run locally without significant effort -- which means (a) a compiler I can access easily (i.e. something on Godbolt, or available in a typical OS's package manager), and (b) a minimal reproduction (1 file) rather than an entire docker image. Having to set up a build environment and run Docker, etc. (even if I knew the command-lines for doing so, which I don't right now) is not something I can realistically do in the foreseeable future. So, if that's all we have available -- you may need to investigate this on your end.

Here's what I can help you with, though. I believe I've seen this error before. It arose from an expression like std::forward<T>(args)..., where the (buggy) compiler didn't like the fact that args and T were two seemingly unrelated parameter packs. In that case, using std::forward<decltype(args)>(args)... fixed the problem.

Unfortunately, we don't have that here... we just have std::move(args), which shouldn't have this problem, I would think, but perhaps it somehow still does for nvcc?

Since I have no way to reproduce this, I have no way to test it. But if you can reproduce this on your end, I would suggest trying out variations of this, such as using std::forward<decltype(args)>(args)... in lieu of std::move(args), and seeing if that works? You can try passing template arguments to the other functions as well, and see if that makes any difference.

Ultimately, the compiler is buggy, so perhaps the answer ought to be to please upgrade your compiler, but it's hard to even suggest that without being able to test whether newer versions of nvcc break on this.

anakinxc · 2024-02-28T07:05:32Z

Hi @higher-performance

A repro with godbolt here

higher-performance · 2024-02-28T15:19:53Z

Awesome, thanks a lot! So it looks like nvcc can't parse use of the comma operator in expansions:

template<class... T>
void StrAppend(T... args) {
  for (int _ : {((void)args, 0)...}) {
  }
}

I could probably find a workaround, but it would hurt code readability even more... let me look into it and see if I can do anything. I would suggest reporting this upstream to NVIDIA in any case.

higher-performance · 2024-02-28T15:29:09Z

Ah, here's a workaround. Apparently this works, I'll take care of it.

h-vetinari · 2024-02-28T20:34:19Z

Thank you very much! :)

georgthegreat · 2024-02-28T21:23:17Z

Thanks a lot!

higher-performance · 2024-02-28T21:44:31Z

Yup, thanks for the help!

georgthegreat · 2024-03-07T20:43:08Z

@derekmauro, is it possible to backport this into 20240116 release branch?

derekmauro · 2024-03-11T16:50:59Z

@derekmauro, is it possible to backport this into 20240116 release branch?

If there is another patch release (and there likely will be) then I will include this change.

compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629

* mgard: don't restrict protobuf version more than necessary successfully built: mgard@2022-11-18 ^protobuf@3.{4,21,25} mgard@2023-01-10 ^protobuf@3.{4,25} mgard@2023-03-31 ^protobuf@3.{4,25} compile failures: mgard@2022-11-18 ^[email protected] mgard@2023-01-10 ^[email protected] mgard@2023-03-31 ^[email protected] * mgard: add conflicts to address CI errors * mgard: conflict between cuda and [email protected] compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629

…ansions in range of range-based for loop Fixes: abseil#1629 PiperOrigin-RevId: 611131201 Change-Id: I787731e00207b544ee16055e6e0d323a5094a433

* Prevent overflow in absl::CEscape() Strings larger than 1 GiB on a platform with a 32-bit size_t could potentially overflow size_t in `CEscapedLength()`, resulting in an undersized allocation. The resulting write in `CEscapeAndAppendInternal()` would then write beyond the bounds of the output buffer. A second overflow, where the calculated escaped length is added to the size of the string being appended to, is also fixed. In both cases the program will now abort prior to the overflow. Credit goes to Ronald Crane (Zippenhop LLC) for reporting this issue. PiperOrigin-RevId: 607019573 Change-Id: I97bf246cde96102a793d2db49446cccae08abf59 * Workaround for NVIDIA C++ compiler being unable to parse variadic expansions in range of range-based for loop Fixes: abseil#1629 PiperOrigin-RevId: 611131201 Change-Id: I787731e00207b544ee16055e6e0d323a5094a433 * Fix OSX support with CocoaPods and Xcode 15 PiperOrigin-RevId: 615090942 Change-Id: I7cc20a0129dcfbbddedd9e6d816bb6234bff14b3 * PR abseil#1643: add xcprivacy to all subspecs Imported from GitHub PR abseil#1643 Addressing comments at abseil#1604 Add a xcprivacy subspec and have all other subspecs depend on it (option 1) Didn't going with option 3 because there are several levels of subspecs in abseil podspec, it's difficult to track whether all of them directly or indirectly depends on abseil/base/config or ensure they will continue to depend on it. Example of generated podsped: https://gist.github.com/HannahShiSFB/15d8fb6aa637f2781b7be4218d080f11 Merge 4405cdf into 4539c54 Merging this change closes abseil#1643 COPYBARA_INTEGRATE_REVIEW=abseil#1643 from HannahShiSFB:privacy-manifests 4405cdf PiperOrigin-RevId: 616914674 Change-Id: If56d5a4f1a7cc6f9fac7a2d8e95b55d140e645fc

* Prevent overflow in absl::CEscape() Strings larger than 1 GiB on a platform with a 32-bit size_t could potentially overflow size_t in `CEscapedLength()`, resulting in an undersized allocation. The resulting write in `CEscapeAndAppendInternal()` would then write beyond the bounds of the output buffer. A second overflow, where the calculated escaped length is added to the size of the string being appended to, is also fixed. In both cases the program will now abort prior to the overflow. Credit goes to Ronald Crane (Zippenhop LLC) for reporting this issue. PiperOrigin-RevId: 607019573 Change-Id: I97bf246cde96102a793d2db49446cccae08abf59 * Workaround for NVIDIA C++ compiler being unable to parse variadic expansions in range of range-based for loop Fixes: #1629 PiperOrigin-RevId: 611131201 Change-Id: I787731e00207b544ee16055e6e0d323a5094a433 * Fix OSX support with CocoaPods and Xcode 15 PiperOrigin-RevId: 615090942 Change-Id: I7cc20a0129dcfbbddedd9e6d816bb6234bff14b3 * PR #1643: add xcprivacy to all subspecs Imported from GitHub PR #1643 Addressing comments at #1604 Add a xcprivacy subspec and have all other subspecs depend on it (option 1) Didn't going with option 3 because there are several levels of subspecs in abseil podspec, it's difficult to track whether all of them directly or indirectly depends on abseil/base/config or ensure they will continue to depend on it. Example of generated podsped: https://gist.github.com/HannahShiSFB/15d8fb6aa637f2781b7be4218d080f11 Merge 4405cdf into 4539c54 Merging this change closes #1643 COPYBARA_INTEGRATE_REVIEW=#1643 from HannahShiSFB:privacy-manifests 4405cdf PiperOrigin-RevId: 616914674 Change-Id: If56d5a4f1a7cc6f9fac7a2d8e95b55d140e645fc

derekmauro · 2024-04-08T20:39:40Z

https://github.com/abseil/abseil-cpp/releases/tag/20240116.2 contains this patch.

georgthegreat · 2024-04-09T09:27:27Z

Thanks a lot!

* mgard: don't restrict protobuf version more than necessary successfully built: mgard@2022-11-18 ^protobuf@3.{4,21,25} mgard@2023-01-10 ^protobuf@3.{4,25} mgard@2023-03-31 ^protobuf@3.{4,25} compile failures: mgard@2022-11-18 ^[email protected] mgard@2023-01-10 ^[email protected] mgard@2023-03-31 ^[email protected] * mgard: add conflicts to address CI errors * mgard: conflict between cuda and [email protected] compiling mgard+cuda with [email protected] and nvcc from [email protected] against protobuf pulling in [email protected] results in the errors reported here: abseil/abseil-cpp#1629

h-vetinari mentioned this issue Feb 26, 2024

Rebuild for libabseil 20240116, libgrp 1.61, libprotobuf 4.25.2 conda-forge/tensorflow-feedstock#372

Closed

anakinxc mentioned this issue Feb 26, 2024

[Bug]: Experiment to build with GPU support in WSL2 with RTX 1050 secretflow/spu#576

Closed

derekmauro assigned higher-performance Feb 26, 2024

copybara-service bot closed this as completed in cfde5f7 Feb 28, 2024

h-vetinari mentioned this issue Feb 28, 2024

backport patch for nvcc workaround conda-forge/abseil-cpp-feedstock#77

Merged

aumuell mentioned this issue Mar 18, 2024

mgard: don't restrict protobuf version more than necessary spack/spack#43172

Merged

derekmauro mentioned this issue Apr 8, 2024

Abseil LTS Branch, Jan 2024, Patch 2 #1650

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Parameter pack workaround fails with nvcc #1629

[Bug]: Parameter pack workaround fails with nvcc #1629

h-vetinari commented Feb 26, 2024 •

edited

Loading

higher-performance commented Feb 26, 2024

higher-performance commented Feb 26, 2024

georgthegreat commented Feb 26, 2024

higher-performance commented Feb 26, 2024

h-vetinari commented Feb 27, 2024 •

edited

Loading

h-vetinari commented Feb 28, 2024

higher-performance commented Feb 28, 2024

anakinxc commented Feb 28, 2024

higher-performance commented Feb 28, 2024

higher-performance commented Feb 28, 2024

h-vetinari commented Feb 28, 2024

georgthegreat commented Feb 28, 2024

higher-performance commented Feb 28, 2024

georgthegreat commented Mar 7, 2024

derekmauro commented Mar 11, 2024

derekmauro commented Apr 8, 2024

georgthegreat commented Apr 9, 2024

[Bug]: Parameter pack workaround fails with nvcc #1629

[Bug]: Parameter pack workaround fails with nvcc #1629

Comments

h-vetinari commented Feb 26, 2024 • edited Loading

Describe the issue

Steps to reproduce the problem

What version of Abseil are you using?

What operating system and version are you using?

What compiler and version are you using?

What build system are you using?

Additional context

higher-performance commented Feb 26, 2024

higher-performance commented Feb 26, 2024

georgthegreat commented Feb 26, 2024

higher-performance commented Feb 26, 2024

h-vetinari commented Feb 27, 2024 • edited Loading

h-vetinari commented Feb 28, 2024

higher-performance commented Feb 28, 2024

anakinxc commented Feb 28, 2024

higher-performance commented Feb 28, 2024

higher-performance commented Feb 28, 2024

h-vetinari commented Feb 28, 2024

georgthegreat commented Feb 28, 2024

higher-performance commented Feb 28, 2024

georgthegreat commented Mar 7, 2024

derekmauro commented Mar 11, 2024

derekmauro commented Apr 8, 2024

georgthegreat commented Apr 9, 2024

h-vetinari commented Feb 26, 2024 •

edited

Loading

h-vetinari commented Feb 27, 2024 •

edited

Loading