
Rewrite the getValidWorkDivForKernel 1D and 2D tests #2331

Merged

Conversation

fwyzard commented Jul 30, 2024

Fix the getValidWorkDivForKernel tests for the CUDA backend and merge them with the tests for the other parallel backends.

Fix the getValidWorkDivForKernel tests for the SYCL CPU backend and move them to the parallel backends.

Rewrite the getValidWorkDivForKernel tests: test more valid and invalid work divisions, merge the serial and parallel backends, and use traits to run tests specific to the serial backends.
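For context, a minimal sketch of the pattern these tests exercise, assuming the alpaka 1.x kernel-bundle API (Acc, dev, kernelBundle, extent and elemsPerThread stand in for the objects the real test builds; exact signatures may differ between alpaka versions):

// Ask alpaka for a work division that should be valid for this kernel on this device...
auto const workDiv = alpaka::getValidWorkDivForKernel<Acc>(dev, kernelBundle, extent, elemsPerThread);
// ...and check that alpaka's own validator agrees.
CHECK(alpaka::isValidWorkDivKernel<Acc>(dev, kernelBundle, workDiv));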

fwyzard requested a review from psychocoderHPC on July 30, 2024 at 17:55
fwyzard commented Jul 30, 2024

@mehmetyusufoglu can you double-check the tests?

mehmetyusufoglu commented:

> @mehmetyusufoglu can you double-check the tests?

Ok, doing. Thanks.

mehmetyusufoglu commented Jul 30, 2024

> @mehmetyusufoglu can you double-check the tests?

Locally everything worked for me, as shown below; I am trying to understand which accelerator is generating the error. I pushed a copy branch to test on CI with logs, and I will also test on HAL.

/projects/alpaka-dir/alpaka/build/test/unit/workDiv$ ./workDivTest 
Randomness seeded to: 3752370894
- Acc: AccCpuSerial<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 536870911), blockThreadExtent: (2, 1), threadElemExtent: (1, 1)}
- Acc: AccCpuThreads<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 4329604), blockThreadExtent: (2, 124), threadElemExtent: (1, 1)}
- Acc: AccCpuTbbBlocks<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 536870911), blockThreadExtent: (2, 1), threadElemExtent: (1, 1)}
- Acc: AccCpuOmp2Blocks<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 536870911), blockThreadExtent: (2, 1), threadElemExtent: (1, 1)}
- Acc: AccCpuOmp2Threads<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 35791394), blockThreadExtent: (2, 15), threadElemExtent: (1, 1)}
- Acc: AccGpuCudaRt<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 729444), blockThreadExtent: (2, 736), threadElemExtent: (1, 1)}
===============================================================================
All tests passed (425 assertions in 85 test cases)

// Check that using too many threads per block is not valid.
auto const invalidWorkDiv
= WorkDiv{Vec{8, threadsPerGridTestValue / threadsPerBlock / 8}, Vec{2, threadsPerBlock}, Vec{1, 1}};
CHECK(not alpaka::isValidWorkDivKernel<Acc>(dev, kernelBundle, invalidWorkDiv));
mehmetyusufoglu commented:

My initial observation: the check assumes that 2 * threadsPerBlock will exceed the allowed block size. But threadsPerBlock is not the maximum block size; it is a value found by getValidWorkDiv. Why should 2 * threadsPerBlock exceed the block-size limit on every accelerator? Anyway, I am trying to understand which accelerator it is. Thanks.

mehmetyusufoglu commented Jul 30, 2024

It seems to be happening with AccCpuOmp2Threads. The work division below is valid because AccCpuOmp2Threads allows 64 threads per block. The test expects an invalid work division, but it is valid, which in my opinion is the normal result. We should not run this test for AccCpuOmp2Threads. [Tested at HAL]

- Acc: AccCpuOmp2Threads<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 22369621), blockThreadExtent: (2, 24), threadElemExtent: (1, 1)}
threadsPerBlock: 24, threadsPerBlockLimit: 64
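(Concretely: the blockThreadExtent (2, 24) requests 2 × 24 = 48 threads per block, which is below the 64-thread limit of AccCpuOmp2Threads, so the block-size part of the check is satisfied and the work division is valid.)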


fwyzard commented:

My idea was that

  • threadsPerGridTestValue is the maximum number of threads per grid (props.m_blockThreadCountMax * props.m_gridBlockCountMax);
  • a grid of Vec{8, threadsPerGridTestValue / threadsPerBlock / 8} blocks of Vec{2, threadsPerBlock} threads each should request a total of roughly 2 * threadsPerGridTestValue threads (up to integer-division rounding), which I expected to be bigger than threadsPerGridTestValue.
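(With the AccCpuOmp2Threads numbers from above this works out: 8 × 22369621 = 178956968 blocks of 2 × 24 = 48 threads request 178956968 × 48 = 8589934464 threads, exactly twice the 4294967232 computed for threadsPerGridTestValue; why the check nevertheless passes is worked out below.)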

fwyzard commented:

That is, I don't expect to exceed the block size, but the total grid size.

fwyzard commented:

Ah, I see!

Let me first update the PR to test what we actually check in Alpaka.

Before adding the new check, we should verify whether it actually fails on the various backends, like CUDA.

mehmetyusufoglu commented:

I added it, but the accDevProps of the Omp2Threads backend are too large:

  • Acc: AccCpuOmp2Threads<2,unsigned int>, invalidWorkDiv: {gridBlockExtent: (8, 22369621), blockThreadExtent: (2, 24), threadElemExtent: (1, 1)}
    threadsPerBlock: 24, threadsPerBlockLimit: 64
    calculation: if(178956968 * 48 > 4294967295 * 64) return false;
    check: 4294967168 > 4294967232 is false, so the work division is reported as valid, because accDevProps.m_gridBlockCountMax is too large for Omp2Threads.
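Both products in that comparison have already wrapped around in 32-bit unsigned arithmetic (Idx is unsigned int here). A minimal, self-contained illustration of the wraparound (not alpaka code):

#include <cstdint>
#include <cstdio>

int main()
{
    // Exact values: 178956968 * 48 = 8589934464 and 4294967295 * 64 = 274877906880,
    // but both exceed 2^32 - 1 and wrap when computed in 32-bit unsigned arithmetic.
    std::uint32_t const requestedThreads = 178956968u * 48u;   // wraps to 4294967168
    std::uint32_t const threadsPerGridMax = 4294967295u * 64u; // wraps to 4294967232
    // The wrapped comparison is false, so the "too large" work division is accepted.
    std::printf("%u > %u -> %s\n", requestedThreads, threadsPerGridMax,
                requestedThreads > threadsPerGridMax ? "true" : "false");
}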

fwyzard commented:

OK, also on the CpuSerial backend, it seems that the maximum grid size is not validated properly.

props.m_gridBlockCountMax is 2147483647 (the maximum value for a signed 32-bit integer), but launching a kernel with 4294967294 (twice that) blocks still works:

[+] QueueGenericThreadsBlocking
[-] QueueGenericThreadsBlocking
createTaskKernel workDiv: {gridBlockExtent: (4294967294), blockThreadExtent: (1), threadElemExtent: (1)}, kernelFnObj: TestKernelWithManyRegisters
[+] operator()
operator() blockSharedMemDynSizeBytes: 0 B
The sum is 1492.00, the argument is 200
The sum is 1492.00, the argument is 200
The sum is 1492.00, the argument is 200
The sum is 1492.00, the argument is 200
The sum is 1492.00, the argument is 200
The sum is 1492.00, the argument is 200
The sum is 1492.00, the argument is 200
...
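A sketch of such a launch, assuming the usual alpaka 1.x API (the queue setup, the TestKernelWithManyRegisters functor and its argument are assumptions modelled on the log above):

using Dim = alpaka::DimInt<1u>;
using Idx = std::uint32_t;
using Acc = alpaka::AccCpuSerial<Dim, Idx>;
using Vec = alpaka::Vec<Dim, Idx>;
using WorkDiv = alpaka::WorkDivMembers<Dim, Idx>;

auto const platform = alpaka::Platform<Acc>{};
auto const dev = alpaka::getDevByIdx(platform, 0);
auto queue = alpaka::Queue<Acc, alpaka::Blocking>{dev};

auto const props = alpaka::getAccDevProps<Acc>(dev);
// props.m_gridBlockCountMax == 2147483647, yet requesting twice as many blocks
// is not rejected: the kernel runs anyway.
auto const workDiv = WorkDiv{Vec{2u * props.m_gridBlockCountMax}, Vec{Idx{1}}, Vec{Idx{1}}};
alpaka::exec<Acc>(queue, workDiv, TestKernelWithManyRegisters{}, 200ul);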

fwyzard commented:

For the time being I'll update the test to check only the block size, not the grid size.

We can follow up on the grid-size check in #2334.

mehmetyusufoglu commented:

Great info, thanks.

fwyzard force-pushed the fix_WorkDivForKernelTest branch from f52bd9b to 6206326 on July 31, 2024 at 07:59
fwyzard requested a review from mehmetyusufoglu on July 31, 2024 at 07:59
psychocoderHPC added this to the 1.2.0 milestone on Aug 1, 2024
psychocoderHPC merged commit ba169cd into alpaka-group:develop on Aug 1, 2024
22 checks passed
fwyzard deleted the fix_WorkDivForKernelTest branch on August 1, 2024 at 12:39