-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[kokkos] Work around performance issue by using only 'unsigned long' in AtomicPairCounter #333
base: master
Are you sure you want to change the base?
Conversation
Mhm, this doesn't seem to be working as intended, at least in my test on a GTX 1080 Ti: master
master + #309
master + #333
Though I'm still at my first coffee, so I cannot guarantee there weren't any mistakes... |
I ran similar tests (1 thread with 1k events, and 16 threads with 20k events) on RTX 2080 SUPERmaster
master + #309
master + #333
GTX 1050 Timaster
master + #309
master + #333
Earlier I had tested only on RTX 2080 and was happy with the #333 giving similar performance as #309. But my GTX 1050 Ti test reproduces the GTX 1080 result in #333 (comment), so this appears to be a real effect. |
Here is a plot on V100 (~2 minutes running for each point, on the same CoriGPU node) So on Volta both fixes work, but disabling the "new atomics" yields a bit higher throughput for >= 3 concurrent events. Perhaps it would be best to go with #309 for now, and rebase this PR on top of that and leave it open for time being. |
This reverts commit 1097fee.
…in AtomicPairCounter See kokkos/kokkos#4780
b9755a1
to
0882c02
Compare
Rebased following the merge of #309. |
Better workaround than #309, see kokkos/kokkos#4780 for more details.
In addition, this PR adds support for using Kokkos' profiling tools via the
KOKKOS_PROFILE_LIBRARY
environment variable. (functionality that we were missing because of heavily customized initialization of Kokkos).