-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Read1To3 on Big Endian for 2 byte inputs #1388
Conversation
@@ -1083,7 +1083,7 @@ class ABSL_DLL MixingHashState : public HashStateBase<MixingHashState> { | |||
unsigned char significant0 = mem0; | |||
#else | |||
unsigned char significant2 = mem0; | |||
unsigned char significant1 = mem1; | |||
unsigned char significant1 = len == 2 ? mem0 : mem1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think we don't need any of endian complications here. We should be fine returning (mem0 << 16) | (mem1 << 8) | mem2
.
This only needs to be deterministic and use all the bytes at least once. It can duplicate bytes as long as it does so consistently.
See also
abseil-cpp/absl/hash/internal/low_level_hash.cc
Lines 97 to 98 in b540445
a = static_cast<uint64_t>((ptr[0] << 16) | (ptr[len >> 1] << 8) | | |
ptr[len - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then the returned value won't match this on BE which will cause a number of test failures:
abseil-cpp/absl/hash/internal/hash.h
Line 1105 in b540445
return static_cast<uint64_t>(m ^ (m >> (sizeof(m) * 8 / 2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests fail and why does the return value have to match? Abseil hash is randomized and has no stability guarantees at all across platforms or even across runs on the same platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This recently added test case fails:
abseil-cpp/absl/hash/hash_test.cc
Line 1259 in 2d4c687
EXPECT_EQ(absl::HashOf(i16), absl::Hash<int16_t>{}(i16)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to realize that there is an IntegralFastPath
above that this has to match, so this does have to be a precise read, it can't just use all the bytes. LowLevelHash has no such need.
I've sent your change to another colleague for review in case there is some other trick that can be used, but otherwise looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, thanks for reviewing it.
Current implementation does not work with 2 byte inputs as mem1 and mem2 will both point to the least significant byte.