-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved GPU Copy #1976
base: main
Are you sure you want to change the base?
Improved GPU Copy #1976
Conversation
But I have to run the unit tests.
…D copy to a 1d copy, because it happens to be continiously allocated.
…not where I though I found it.
…date_state()` function. The reason is that in some cases this is valid, for example if an edge connects an AccessNode and a MapEntry, because, in that case the map might not be executed. Since the Memlet does not have access to its source and destination node it can not check that, so the test was moved to a location that can do this check. However, it only does the check for AN to AN connections, which is a bit restrictive, but this is something for later.
Once [PR#1976](spcl/dace#1976) is merged in DaCe the code generator is able to handle more Memlets directly as Cuda `memcpy()` calls. This PR modifies the GPU transformation of GT4Py in such a way that these Memlets are no longer transformed into Maps. However, it can only be merged if the DaCe dependency was bumped to a version that includes PR#1976!
This looks good to me, but would be great if @alexnick83 could quickly sanity check - GPU codegen is not my forté |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions/comments, namely the following:
- I share the concern that, for ndim > 2, the conditions tested are insufficient. I would suggest making an issue to revisit in the future.
- I am seeing an issue in
_emit_copy
, for ndim == 2, around lines 1014-1015. Maybe have a look and confirm that this is intended.
I see that we were already referring to these copies as "continuous," but the correct term is "contiguous." Could you do a find and replace-all?
if src_strides[-1] != 1 or dst_strides[-1] != 1: | ||
# Currently we only support ND copies when they can be represented | ||
# as a 1D copy or as a 2D strided copy | ||
# NOTE: Not sure if this test is enough, it should also be tested that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To what are you agreeing on?
- That you are not sure what this code is doing.
- That you agree that it is meaningless and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the test is not enough. it doesn't consider, e.g., Views where you can have multiple discontinuities, and I guess it is like that because the original code predates Views. I also think it is fine to leave it as-is for now, but it should be noted down as a possible source of errors.
if src_strides[-1] != 1 or dst_strides[-1] != 1: | ||
# Currently we only support ND copies when they can be represented | ||
# as a 1D copy or as a 2D strided copy | ||
# NOTE: Not sure if this test is enough, it should also be tested that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the test is not enough. it doesn't consider, e.g., Views where you can have multiple discontinuities, and I guess it is like that because the original code predates Views. I also think it is fine to leave it as-is for now, but it should be noted down as a possible source of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I wrote the original code, I want to make sure I review it properly before it’s in
if dims == 2 and (is_fortran_order or is_c_order): | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbennun Since you want to review it and write the code you can probably tell me what this operation, turning a 2d copy into a 1d copy if the memory is contiguous.
I mean I understand what it is doing but I have no idea why it is doing it, especially since memlet_copy_to_absolute_strides()
does the same thing already.
My guess is that this code is older than memlet_copy_to_absolute_strides()
and it was forgotten and I would propose to remove it, do you agree?
Before some 2D copies (especially if they had FORTRAN order) were turned into Maps, see issue#1953.
This PR modifies the code generator in such a way that such copies are now handled.
There is some legacy stuff that should also be looked at.