Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add device regions to non-banded routines #140

Merged
merged 35 commits into from
Dec 4, 2023

Conversation

neil-lindquist
Copy link
Contributor

@neil-lindquist neil-lindquist commented Oct 31, 2023

This adds arbitrary regions for the all of the batched device functions, except the banded norms. Handling the bandwidth for non-uniform tile sizes isn't totally obvious and is probably broken in the other banded routines too. So, I'm leaving that for the future.

The core logic for building the regions is refactored into a couple functions in src/internal/internal_batch.hh. That's probably the place to start, in order to understand the rest of the changes. device_regions_build is the key function there, which builds the regions as a vector of device_regions_params objects. This function and struct are both templated on several parameters in order to avoid too large of an overhead relative to manually inlining this logic everywhere. This function has several moving parts to address all of the ways we organize our batch kernels. Please let me know if anything needs more comments/explanations.

On guyot, the performance seems to be the same or maybe a touch better. The norms seem to be a little slower. But, it's a little hard to tell with the system's noise.

Edit: I noticed that I missed trsmA and the he2hb variants of gemm, trmm, and her2k. I've addressed these routines.

Copy link
Collaborator

@mgates3 mgates3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet done reviewing files (there are lots), but giving some feedback now.

src/he2hb.cc Show resolved Hide resolved
src/internal/internal_util.hh Show resolved Hide resolved
src/internal/internal_batch.hh Outdated Show resolved Hide resolved
src/internal/internal_batch.hh Show resolved Hide resolved
src/internal/internal_batch.hh Outdated Show resolved Hide resolved
src/internal/internal_batch.hh Show resolved Hide resolved
src/internal/internal_batch.hh Outdated Show resolved Hide resolved
src/internal/internal_batch.hh Outdated Show resolved Hide resolved
src/internal/internal_geset.cc Outdated Show resolved Hide resolved
src/internal/internal_genorm.cc Show resolved Hide resolved
src/internal/internal_syr2k.cc Show resolved Hide resolved
src/internal/internal_genorm.cc Show resolved Hide resolved
src/internal/internal_gescale_row_col.cc Show resolved Hide resolved
src/internal/internal_trsmA.cc Outdated Show resolved Hide resolved
src/internal/internal_he2hb_gemm.cc Outdated Show resolved Hide resolved
src/internal/internal_he2hb_trmm.cc Show resolved Hide resolved
src/internal/internal_he2hb_trmm.cc Outdated Show resolved Hide resolved
@mgates3
Copy link
Collaborator

mgates3 commented Dec 3, 2023

As discussed, either rebase or merge to fix the conflicts. If merge, I like to fast-forward master to avoid having 2 consecutive merges, with the 2nd doing nothing. Unfortunately, github doesn't have a way to do that through their web interface, so I do it locally and push. Just let me know when it's ready and I can do that.

@neil-lindquist neil-lindquist merged commit d8a3527 into icl-utk-edu:master Dec 4, 2023
@neil-lindquist
Copy link
Contributor Author

I did the merge + push-to-master approach, which appears to have worked correctly.

@neil-lindquist neil-lindquist deleted the device-regions branch December 4, 2023 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants