Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THRIFT-5337 Go set fields write improvement #2307

Closed
wants to merge 1 commit into from

Conversation

BytedanceRPC
Copy link

There is a duplicate elements check for set in writeFields* function, and it compares elements using reflect.DeepEqual which is expensive.

It's much faster that generates a "DeepEqual" function for set elements and apply it in duplicate elements check, especially for nested struct element.

  • Did you create an Apache Jira ticket? (not required for trivial changes)
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through all the changes yet, but wanted to give you some early feedback.

@@ -152,6 +152,10 @@ class t_go_generator : public t_generator {
const string& tstruct_name,
bool is_result = false,
bool uses_countsetfields = false);
void generate_go_struct_deepequal(std::ostream& out,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename all *deepequal* into *equals*? I don't think it's accurate to call them "deepequal".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review, equals sounds better.


string field_name;
string publicize_field_name;
int32_t field_id = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unused and it will cause certain compilers to fail:

src/thrift/generate/t_go_generator.cc:1879:11: error: unused variable ‘field_id’ [-Werror=unused-variable]
 1879 |   int32_t field_id = -1;
      |           ^~~~~~~~
cc1plus: all warnings being treated as errors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed latest commit

if (is_pointer_field(*f_iter)
&& (ttype->is_base_type() || ttype->is_enum() || ttype->is_container())) {
tgt = "(*" + tgt + ")";
src = "(*" + src + ")";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. wrong indentation
  2. This generates code like this for optional fields:
if (*p.FieldN) != (*other.FieldN) { return false }

this will cause panic when the field is not set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I add a pointer check to avoid panic last commit

@dcelasun
Copy link
Member

Looks reasonable at a first glance, will do a proper review later.

I wonder if there are any cases covered by reflect.DeepEqual that aren't covered by this method. If there are, that would be a BC break and we'd have to mention it in the changelog.

@BytedanceRPC
Copy link
Author

BytedanceRPC commented Jan 20, 2021

@dcelasun The process of Equals is pretty like Write (serialize). Since write can cover all cases, I believe so can Equals.

@BytedanceRPC BytedanceRPC force-pushed the deepequal_for_set branch 4 times, most recently from eaa91ae to 83c8d4a Compare January 20, 2021 13:28
Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like where this is going, but left some small comments.

Also I believe with this change there's no other way the reflect import is used, so we can totally remove it from compiled code now.

@fishy
Copy link
Member

fishy commented Jan 21, 2021

I believe currently there's some bug in your implementation. The result is different from reflect.DeepEqual, Here's the code I used:

test.thrift:

namespace go test

enum Enum {
  One = 1,
  Two = 2,
}

struct Struct {
  1: optional string Str,
  2: optional i32 I32,
  3: optional set<Enum> Enums,
}

bench_test.go:

package thriftsetbench

import (
        "reflect"
        "testing"

        "github.com/apache/thrift/lib/go/thrift"
        "github.com/fishy/thriftsetbench/gen-go/test"
)

func BenchmarkEqual(b *testing.B) {
        equalGetter := func() (*test.Struct, *test.Struct) {
                s1 := &test.Struct{
                        Str: thrift.StringPtr("foobar"),
                        Enums: []test.Enum{
                                test.Enum_One,
                                test.Enum_Two,
                        },
                }
                s2 := &test.Struct{
                        Str: thrift.StringPtr("foobar"),
                        Enums: []test.Enum{
                                test.Enum_One,
                                test.Enum_Two,
                        },
                }
                return s1, s2
        }
        nonEqualGetter := func() (*test.Struct, *test.Struct) {
                s1 := &test.Struct{
                        Str: thrift.StringPtr("foobar"),
                        Enums: []test.Enum{
                                test.Enum_One,
                        },
                }
                s2 := &test.Struct{
                        Str: thrift.StringPtr("foobar"),
                        Enums: []test.Enum{
                                test.Enum_Two,
                        },
                }
                return s1, s2
        }

        for label, getter := range map[string]func() (*test.Struct, *test.Struct){
                "equal":    equalGetter,
                "nonequal": nonEqualGetter,
        } {
                b.Run(label, func(b *testing.B) {
                        s1, s2 := getter()
                        reflectResult := reflect.DeepEqual(s1, s2)
                        equalsResult := s1.Equals(s2)
                        if reflectResult != equalsResult {
                                b.Errorf("Reflect result: %v, Equals result: %v", reflectResult, equalsResult)
                        }

                        b.Run("reflect", func(b *testing.B) {
                                b.RunParallel(func(pb *testing.PB) {
                                        for pb.Next() {
                                                reflect.DeepEqual(s1, s2)
                                        }
                                })
                        })

                        b.Run("equals", func(b *testing.B) {
                                b.RunParallel(func(pb *testing.PB) {
                                        for pb.Next() {
                                                s1.Equals(s2)
                                        }
                                })
                        })
                })
        }
}

result:

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/fishy/thriftsetbench
BenchmarkEqual/equal/reflect-12         10597189               113 ns/op              64 B/op          6 allocs/op
BenchmarkEqual/equal/equals-12          1000000000               0.680 ns/op           0 B/op          0 allocs/op
BenchmarkEqual/nonequal/reflect-12              12201334                99.9 ns/op            48 B/op          4 allocs/op
BenchmarkEqual/nonequal/equals-12               1000000000               0.712 ns/op           0 B/op          0 allocs/op
--- FAIL: BenchmarkEqual/nonequal
    bench_test.go:54: Reflect result: false, Equals result: true
--- FAIL: BenchmarkEqual
FAIL
exit status 1
FAIL    github.com/fishy/thriftsetbench 4.183s

We likely would need to have some thorough test for the Equals implementation with this PR.

@BytedanceRPC
Copy link
Author

@fishy It's a bug in generate_go_equals_guard, I fixed it.
Yes, some thorough test is pretty necessary. I will also do more test.

@fishy
Copy link
Member

fishy commented Jan 26, 2021

@BytedanceRPC there's another go compiler change merged, so can you rebase your PR on latest master branch?

Also please add some tests for the Equals implementation. This change looks promising, but we need the confidence that it's implemented correctly.

@BytedanceRPC
Copy link
Author

@fishy rebased PR on latest master branch, and added Equals test under lib/go/test

Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm, but there are a few things can be improved in the test code.

Also you can remove

system_packages.push_back("reflect");
and
"var _ = reflect.DeepEqual\n"
. Since this change eliminated the usage of reflect.DeepEqual, the reflect package is not used any more in compiled go code.

bool is_args_or_result) {
if (is_args_or_result) {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we should do this check on the call site instead, and currently there's only one caller of it (Line 1497), so there should be:

if !is_result && !is_args {
  generate_go_struct_equals(out, tstruct, tstruct_name);
}

Semantically this function is called "generate_*", but when one of its arg is passed in true it doesn't generate anything, which feels wrong to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks reasonable, thanks for suggestion.

* Compares any type
*/
void t_go_generator::generate_go_equals(std::ostream& out,
t_type* ori_type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this indentation looks weird. usually you either align with (, or use fixed 2/4/6 spaces. this is neither.

I don't believe clang-format is used to generate this indentation either, at least it's not enforced by travis. when I run clang-format -i --style=file compiler/cpp/src/thrift/generate/t_go_generator.cc it made a ton of changes in this file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure is the std:: making indentation looks weird. Indentations looks proper after I remove std:: and apply clang-format.
To avoid unnecessary change, I only apply the output of clang-format on my changes in compiler/cpp/src/thrift/generate/t_go_generator.cc.

func TestEquals(t *testing.T) {
basicTgt, basicSrc := genBasicFoo(), genBasicFoo()
if !basicTgt.Equals(basicSrc) {
t.Fatal("BasicEqualsFoo.Equals() test failed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test failures should be t.Error instead of t.Fatal. Fatal stops the test from running, so if someone made a mistake in the future and failed multiple cases inside this test, they will only see the first failure in the output.

Fatal should only be used when the following tests are depending on this one (e.g. used for nil checks).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggestion

@BytedanceRPC BytedanceRPC force-pushed the deepequal_for_set branch 2 times, most recently from b1c928f to a194c06 Compare February 2, 2021 09:43
t_type* ttype,
string tgt,
string src) {
(void)ttype;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this line do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to keep generate_go_equals* with the same arguments, but ttype not really used here.

Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good. I'll merge it once the tests pass.

@BytedanceRPC BytedanceRPC force-pushed the deepequal_for_set branch 2 times, most recently from 10008fb to f2d353d Compare February 4, 2021 03:00
@fishy
Copy link
Member

fishy commented Feb 4, 2021

Can you squash your PR to a single commit, with proper commit message (what you used in the PR title and description)?

There is a duplicate elements check for set in writeFields* function, and it compares elements using reflect.DeepEqual which is expensive.

It's much faster that generates a *Equals* function for set elements and call it in duplicate elements check, especially for nested struct element.
Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis should finish in 20min or so and I'll merge it after that.

@fishy fishy closed this in 4aaef75 Feb 4, 2021
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 18, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 19, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit to fishy/thrift that referenced this pull request Feb 19, 2021
Client: go

Also add missing copyright header for files added in
apache#2307.
fishy added a commit that referenced this pull request Feb 22, 2021
Client: go

Also add missing copyright header for files added in
#2307.
fishy added a commit that referenced this pull request Feb 22, 2021
Client: go

Also add missing copyright header for files added in
#2307.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants