Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved GPU Copy #1976

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

philip-paul-mueller
Copy link
Collaborator

Before some 2D copies (especially if they had FORTRAN order) were turned into Maps, see issue#1953.
This PR modifies the code generator in such a way that such copies are now handled.

There is some legacy stuff that should also be looked at.

philip-paul-mueller and others added 16 commits March 14, 2025 10:18
But I have to run the unit tests.
…D copy to a 1d copy, because it happens to be continiously allocated.
…date_state()` function.

The reason is that in some cases this is valid, for example if an edge connects an AccessNode and a MapEntry, because, in that case the map might not be executed.
Since the Memlet does not have access to its source and destination node it can not check that, so the test was moved to a location that can do this check.
However, it only does the check for AN to AN connections, which is a bit restrictive, but this is something for later.
philip-paul-mueller added a commit to philip-paul-mueller/gt4py that referenced this pull request Mar 17, 2025
Once [PR#1976](spcl/dace#1976) is merged in DaCe the code generator is able to handle more Memlets directly as Cuda `memcpy()` calls.
This PR modifies the GPU transformation of GT4Py in such a way that these Memlets are no longer transformed into Maps.

However, it can only be merged if the DaCe dependency was bumped to a version that includes PR#1976!
@phschaad
Copy link
Collaborator

This looks good to me, but would be great if @alexnick83 could quickly sanity check - GPU codegen is not my forté

Copy link
Contributor

@alexnick83 alexnick83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions/comments, namely the following:

  • I share the concern that, for ndim > 2, the conditions tested are insufficient. I would suggest making an issue to revisit in the future.
  • I am seeing an issue in _emit_copy, for ndim == 2, around lines 1014-1015. Maybe have a look and confirm that this is intended.

I see that we were already referring to these copies as "continuous," but the correct term is "contiguous." Could you do a find and replace-all?

if src_strides[-1] != 1 or dst_strides[-1] != 1:
# Currently we only support ND copies when they can be represented
# as a 1D copy or as a 2D strided copy
# NOTE: Not sure if this test is enough, it should also be tested that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what are you agreeing on?

  • That you are not sure what this code is doing.
  • That you agree that it is meaningless and can be removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the test is not enough. it doesn't consider, e.g., Views where you can have multiple discontinuities, and I guess it is like that because the original code predates Views. I also think it is fine to leave it as-is for now, but it should be noted down as a possible source of errors.

if src_strides[-1] != 1 or dst_strides[-1] != 1:
# Currently we only support ND copies when they can be represented
# as a 1D copy or as a 2D strided copy
# NOTE: Not sure if this test is enough, it should also be tested that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That the test is not enough. it doesn't consider, e.g., Views where you can have multiple discontinuities, and I guess it is like that because the original code predates Views. I also think it is fine to leave it as-is for now, but it should be noted down as a possible source of errors.

Copy link
Collaborator

@tbennun tbennun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I wrote the original code, I want to make sure I review it properly before it’s in

Comment on lines +997 to 998
if dims == 2 and (is_fortran_order or is_c_order):
try:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbennun Since you want to review it and write the code you can probably tell me what this operation, turning a 2d copy into a 1d copy if the memory is contiguous.
I mean I understand what it is doing but I have no idea why it is doing it, especially since memlet_copy_to_absolute_strides() does the same thing already.
My guess is that this code is older than memlet_copy_to_absolute_strides() and it was forgotten and I would propose to remove it, do you agree?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants