Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use type-safe discriminated union for Annotation #5018

Merged
merged 8 commits into from
Nov 19, 2024
Merged

Conversation

asl
Copy link
Contributor

@asl asl commented Nov 13, 2024

This reduces size of Annotation from 576 down to 368 bytes. Some simplifications while there.

@asl asl added core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run. breaking-change This change may break assumptions of compiler back ends. labels Nov 13, 2024
@asl asl requested review from vlstill, ChrisDodd and fruffy November 13, 2024 21:51
@asl
Copy link
Contributor Author

asl commented Nov 13, 2024

This saves ~100 Mb of allocations on P4CParserUnroll.switch_20160512 testcase.

Before:
Screenshot 2024-11-14 at 00 53 04
After:
Screenshot 2024-11-14 at 00 53 17

@asl asl linked an issue Nov 13, 2024 that may be closed by this pull request

// FIXME: Support variant of IR node pointers
// const IrClass *cls = type->resolve(clss ? clss->containedIn : nullptr);
// if (cls != nullptr) out << "const ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to create a node pointer in the variant by using explicitly const Node * in the type list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The problem is "inline" feature of fields that changes the types. So, in the sense variants are always "inline". However, if someone would want to have default short-cut logic at some point, this is the placeholder to implement. Likely this would require some syntax extension (e.g. allowing "inline" on variant types)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline feature of fields is there to disable the automatic conversion of them to (const) pointers that normally happens in the ir-generator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: consider one would want variant of several node pointers. Then one would need to spell out variant<const IR::Expression *, const IR::Argument *> explicitly. as variant<Expression, Argument> will be translated into variant<IR::Expression, IR::Argument> storing the nodes inline. Is this desired? Or we can have e.g. variant<inline Expression, inline Argument> to be translated into variant<IR::Expression, IR::Argument> and variant<Expression, Argument> into variant<const IR::Expression *, const IR::Argument *> correspondingly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can now explicitly write variant<const IR::Expression *, const IR::Argument *> and that should work.

We could add support for inline to variants (currently, you'll just get a syntax error), but then where you currently have variant<Expression, Argument> would need to change to variant<inline Expression, inline Argument> to have the behavior you currently have.

Copy link
Contributor Author

@asl asl Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vlstill optional does not generate std::optional. It just generates a version of constructor that does not initialize this field. And the field is default-initialized.

The transformation is only for IR types, yes, and as far as I can see, only in the "outer" position, e.g. for field types but not for template parameters, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant explicitly using std::optional<T> in the .def file, not the optional specifier :-). Just as another example similar to variant. std::optional can be useful for inline fields that hold non-IR types and are optional so I've noticed the transformation does not occur there. I can't think of a reason to use an IR type in std::optional, so if the "pointer-adding" only works for IR types then this example is not that important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes. Note that for Vector<Node> the transformation is implicit as the storage is always std::vector<const Node*>. So, maybe we'll just keep the things as-is, and return when the real usecase will appear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I think that it makes sense to state it like this:

Given a field definition Type fldName if the type of the field (Type) is an IR node type and the inline specifier is not used, then the real generated field is const Type *fldName, otherwise it is Type fldName.

This should be the current behavior and I think it is rather consistent and easy to use. If anyone has a use case to use IR types in nested position and wants to use pointers there, they will need to use the pointer explicitly. I think that is not too much cost. Also, if we tried to make the transformation implicit for IR types appearing anywhere, we would need to make a exception for IR::Vector and IR::IndexedVector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it seems the IR types are actually transformed in nested positions in some cases :-/. I.e. Util::Enumerator<IDeclaration> *getDeclarations() const override actually means Util::Enumerator<const IR::IDeclaration *> *IR::Function::getDeclarations() const. Anyway, this is beyond this PR, I am fine with types nested inside variant not being lifted now.


class IrVariantField : public IrField {
public:
std::vector<const Type *> *types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you made a VariantType subclass of Type, you wouldn't need this separate vector here. Might not even need IrVariantField at all -- just an IrField whose type is VariantType.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented with this, but this turned out to be awkward – variant is type, but we need to generate extra boilerplate for variant fields. So instead of duplication of things (type & field) I decided to keep it in the one place. And null type in IrField appeared to be very handy with implementations as for variants we just segfault :)

@asl
Copy link
Contributor Author

asl commented Nov 14, 2024

@vlstill This also fixes debug printing of annotations

Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just a first look at the .def file and directly code under ir/.

Annotation { if (!srcInfo) srcInfo = name.srcInfo; }

/// For annotations parsed from P4-16 source.
Annotation(Util::SourceInfo si, ID n, const Vector<AnnotationToken> &a)
: Node(si), name(n), body(a), needsParsing(true), structured(false) {}
inline Annotation(Util::SourceInfo si, ID n, const Vector<AnnotationToken> &a)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does inline mean on a constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline has special meaning for .def files. Essentially all inline methods are emitted in ir-generated.h, otherwise – ir-generated.cpp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline on a function or method (including ctor) has the same meaning it does in C++ -- the function should be inlined. The main effect as far as the ir-generator is concerned is that the body must be output in the .h file (if it was in the .cpp file, the C++ compiler could not inline it).

The extension of the .def files is that you can use inline on a field as well, and it means the field is NOT a pointer. This comes from the .def/IR history being derived from java, where all object references are pointers under the hood.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline on a function or method (including ctor) has the same meaning it does in C++ -- the function should be inlined.

That is not actually true, at least in modern C++ compiler. When it comes to inlining, inline is just a hint, and I am not sure if compilers even take it into account. The main thing inline does in C++ is that it basically says the symbol can occur in multiple translation units, and the linker may assume that the definitions are equivalent and leave just one. So if there is an inline freestanding function in .h, it results in multiple versions in multiple .cpp files, but linker takes care of them. If the function is not inline, there is a linker error. Similarly for variables now (the main difference is that while member functions defined inside class definition are implicitly inline, static member variables are not implicitly inline and that is why one has to have inline static int foo = ?? to be able to provide the value in class definition).

asl added 7 commits November 18, 2024 09:59
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I skipped over the target-specific and P4-14-related code.

Signed-off-by: Anton Korobeynikov <[email protected]>
@asl asl enabled auto-merge November 19, 2024 18:39
@asl asl added this pull request to the merge queue Nov 19, 2024
Merged via the queue into p4lang:main with commit e66525a Nov 19, 2024
19 checks passed
@asl asl deleted the ann-variant branch November 19, 2024 20:47
@fruffy
Copy link
Collaborator

fruffy commented Nov 21, 2024

I missed this PR last week but I am fairly concerned about the impact on maintainability for just supporting a variant in annotations.

Is there a way we can factor out some of this code into helper functions without having to rely on the generator?

The current implemementation makes the generator code even harder to work with than before (and to read!). Now you have to reason about variants for every single generator method you may want to add. I agree with Chris' suggestion that adding a new type might be the more natural approach.

@asl
Copy link
Contributor Author

asl commented Nov 22, 2024

@fruffy

I agree with Chris' suggestion that adding a new type might be the more natural approach.
I tried to follow this way, but it turned out that it was impossible to reflect the required semantics. The issue is that ir-generator design essentially assumes two possible types of fields: nodes and simple types w/o additional semantics. As a result, all non-trivial "compound" types would be implemented manually.

The existing approach for methods is essentially to assume that in order to implement method foo we'd mostly need to decompose it into separate fields and then call foo on the pieces (with possible trivial tuning as e.g. we should not call validate on non-Node's). However, this would not work on slightly more complex things. It just happened that variant is the first example here, there are no optionals, no maps, etc. All vectors are essentially instances of Vector that implement the required heavy lifting. Arrays are supported, yes, but via "type suffix" trick, and still they are special cased in many cases. One way to solve this in the similar way is to wrap variant<Types...> into some external class, that would essentially implement the necessary semantics for all methods and forward into underlying implementation. But does this make sense? Do we want to follow this approach for every non-trivial type, say, if tomorrow we'd want to have std::optional there? This would also affect the IR consumers as instead of standard type they would need to deal with the custom wrapper instead of standard type.

I'm all open for possible improvements here and if you're having suggestions, let's discuss them. I tried to minimize the changes, but still, we really need to treat variant field as set of separate fields, we cannot emit operations based just on the type. Let's take equiv implementation for example:

  • We need to check first that LHS and RHS hold the same variant type
  • Then we'd need to visit all possible fields calling either equiv or operator== depending if the variant type is node or not
  • Therefore we first emit visitor for the variant fields

So we end with the code like this:

bool IR::Annotation::equiv(IR::Node const & a_) const {
        if (static_cast<const Node *>(this) == &a_) return true;
        if (this->typeId() != a_.typeId()) return false;
        auto &a = static_cast<const Annotation &>(a_);
        auto body_equivVisitor = [&](auto &&variant) -> bool {
            using T = std::decay_t<decltype(variant)>;
            if constexpr (std::is_same_v<T, Vector<AnnotationToken>>) {return variant.equiv(std::get<IR::Vector<IR::AnnotationToken>>(a.body)); }
            else if constexpr (std::is_same_v<T,Vector<Expression>>) {return variant.equiv(std::get<IR::Vector<IR::Expression>>(a.body)); }
            else if constexpr (std::is_same_v<T,IndexedVector<NamedExpression>>) {return variant.equiv(std::get<IR::IndexedVector<IR::NamedExpression>>(a.body)); }
            else { BUG("Unexpected variant field"); } };
        return name == a.name
        && body.index() == a.body.index() && std::visit(body_equivVisitor, body)
        && structured == a.structured;

Before equiv already had 3 cases (IR node pointers, possibly optional; IR nodes inline; not nodes), I just added 4th one :)

It is possible not to emit visitors in the generator, but instead implement everything free functions, however:

  • We still need to do special case everywhere as we need to call functions, not methods
  • We need to do some template tricks in order to distinguish between nodes and everything else inside variant types (ir-generator just statically knows this)

@fruffy
Copy link
Collaborator

fruffy commented Nov 22, 2024

Thanks for the detailed answer. At this point we can leave it as is. Unfortunately, I do not have the time to think through a simplified approach. As you pointed out the problem is that the variant is specific to each class and requires bespoke visitors.

One possible alternative solution could have been to special-case annotations and just support variant there, if there is no need in other IR nodes. A lot of IR nodes are implement this way.

Wish we had a better way to generate this C++ code. I have been working with the generator a bunch and making changes is quite difficult.

@asl
Copy link
Contributor Author

asl commented Nov 22, 2024

One possible alternative solution could have been to special-case annotations and just support variant there, if there is no need in other IR nodes. A lot of IR nodes are implement this way.

I had this special case implementation before adding to IR generator. It is a bit cumbersome as we essentially need to implement all the methods manually there. Moreover, should new method added, we need to add this as well...

Wish we had a better way to generate this C++ code. I have been working with the generator a bunch and making changes is quite difficult.

Yeah, unfortunately, it's just a bit of pattern substitution... Maybe some other approaches would've work better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This change may break assumptions of compiler back ends. core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Annotations incur huge overhead
4 participants