Look into using nvrtc for just-in-time cuda compilation #169

Open
ddemidov opened this issue Mar 23, 2015 · 2 comments
@ddemidov
Owner

NVRTC is a runtime compilation library for CUDA C++. It accepts CUDA C++ source code in character string form and creates handles that can be used to obtain the PTX:
http://docs.nvidia.com/cuda/nvrtc/index.html

ddemidov added the todo label Mar 23, 2015
ddemidov added a commit that referenced this issue Mar 24, 2015
see #169

This mostly works, but fails some tests.

* It looks like the compiler backend (or compiler settings) that is used in
  nvrtc is a bit different from the one in nvcc. Some kernels that nvcc
  compiles just fine are not accepted by nvrtc.

  A notable example is the use of anonymous structs in a shared-memory union
  (used in the sort algorithms):

      union Shared
      {
        struct
        {
          int keys0[3072];
        };
        struct
        {
          float vals0[2816];
        };
      };
      __shared__ union Shared shared;

  This results in the following compilation error when compiled with nvrtc:

      warning: declaration does not declare anything
      error: union "Shared" has no member "keys0"

* The compilation of FFT tests just seems to hang.

* Caching does not work with nvrtc either, so subsequent compiles take as
  much time as the first one.
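
For context on the caching point: the goal is that identical source and options compile only once. Below is a minimal sketch of that idea. It assumes a hypothetical `ProgramCache` class and a caller-supplied compile function, neither of which is part of VexCL or NVRTC. It keeps results in memory only, whereas a real cache would likely persist to disk so that later runs benefit too.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache of compiled programs. Not VexCL's actual cache.
class ProgramCache
{
public:
    // Caller-supplied compiler: takes source and options, returns the
    // compiled output (for example PTX text).
    using Compiler = std::function<std::string(const std::string &source,
                                               const std::string &options)>;

    explicit ProgramCache(Compiler compile) : compile_(std::move(compile)) {}

    // Returns the compiled output, invoking the compiler only on a miss.
    const std::string &get(const std::string &source, const std::string &options)
    {
        // Options are part of the key: the same source built with
        // different flags must not share an entry.
        const std::pair<std::string, std::string> key(source, options);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++misses_;
            it = cache_.emplace(key, compile_(source, options)).first;
        }
        return it->second;
    }

    // Number of times the compiler was actually invoked.
    std::size_t misses() const { return misses_; }

private:
    Compiler compile_;
    std::map<std::pair<std::string, std::string>, std::string> cache_;
    std::size_t misses_ = 0;
};
```

With a cache like this in front of the compiler, a second request for the same kernel source and options returns the stored result instead of recompiling.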
@rosenrodt

Curious whether these compilation errors still exist with CUDA 9 or even CUDA 10?

@ddemidov
Owner Author

ddemidov commented Dec 2, 2018

Yes, I am still getting the errors described in 35a9f30 with CUDA 9.1:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

To test this, check out the nvrtc branch and run cmake . -DVEXCL_BACKEND=CUDA -Bbuild && cmake --build build && ./build/examples/benchmark. Here is what I get:

----------------------------------------------------------
Profiling "double" performance
----------------------------------------------------------
1. Tesla V100-SXM2-16GB

Vector SAXPY (double)
  OpenCL
    GFLOPS:    62.8019
    Bandwidth: 753.623
  C++
    GFLOPS:    2.0774
    Bandwidth: 24.9288
  res = 1.1856e-26

Vector arithmetic (double)
  OpenCL
    GFLOPS:    57.3009
    Bandwidth: 764.012
  C++
    GFLOPS:    1.53007
    Bandwidth: 20.401
  res = 5.09807e-19

Reduction (double)
  OpenCL
    GFLOPS:    103.171
    Bandwidth: 825.371
  C++
    GFLOPS:    1.04469
    Bandwidth: 8.35756
  res = 6.74923e-14

Stencil convolution (double)
  OpenCL
    GFLOPS:    1181.18
    Bandwidth: 9449.44
  C++
    GFLOPS:    1.34248
    Bandwidth: 10.7398
  res = 3.33067e-16

SpMV (double)
  OpenCL
    GFLOPS:    104.44
    Bandwidth: 1397.87
  C++
    GFLOPS:    1.41486
    Bandwidth: 18.9371
  res = 9.74447e-15

SpMV (CCSR) (double)
  OpenCL
    GFLOPS:    315.465
    Bandwidth: 3635.06
  C++
    GFLOPS:    1.57218
    Bandwidth: 18.116
  res = 9.74447e-15

Random numbers per second (double)
    OpenCL (threefry): 1.05625e+11
    OpenCL (philox):   1.8714e+11
    C++    (mt19937):  1.05787e+08

./benchmark(_ZN3vex6detail15print_backtraceEv+0x1a) [0x4462ea]
./benchmark(_ZN3vex7backend4cuda13build_sourcesERKNS1_13command_queueERKSsS6_+0x21b) [0x44c70b]
./benchmark() [0x43ffa3]
./benchmark(_ZN3vex6detail17block_sort_kernelILi256ELi7EN5boost3mpl6vectorImN4mpl_2naES6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EENS4_IS6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_S6_EENS_4lessImE19vex_function_deviceEEERNS_7backend4cuda6kernelERKNSD_13command_queueE+0xb9d) [0x4bbecd]
./benchmark(_ZN3vex6detail4sortIN5boost6fusion14transform_viewIKNS3_6vectorIIRNS_6vectorImEEEEENS0_21extract_device_vectorENS3_5void_EEENS_4lessImE19vex_function_deviceEEEvRKNS_7backend4cuda13command_queueERT_T0_+0x9f) [0x4be38f]
./benchmark(_ZN3vex6detail9sort_sinkIN5boost6fusion6vectorIIRNS_6vectorImEEEEENS_4lessImEEEEvOT_T0_+0x80) [0x4becd0]
./benchmark(_Z14benchmark_sortIdEvRKN3vex7ContextERNS0_8profilerINSt6chrono3_V212system_clockEEE+0x189) [0x4bf049]
./benchmark(_Z9run_testsIdEvRKN3vex7ContextERNS0_8profilerINSt6chrono3_V212system_clockEEE+0x4be) [0x4c8efe]
./benchmark(main+0x745) [0x43d525]
/usr/lib64/libc.so.6(__libc_start_main+0xf5) [0x2af97c343c05]
./benchmark() [0x43db52]

default_program(646): warning: declaration does not declare anything

default_program(657): error: union "Shared" has no member "keys0"

default_program(658): error: union "Shared" has no member "keys0"

default_program(716): error: union "Shared" has no member "keys0"

default_program(717): error: union "Shared" has no member "keys0"

4 errors detected in the compilation of "default_program".

*/vexcl/vexcl/backend/cuda/compiler_nvrtc.hpp:104
	NVRTC Error (6 - NVRTC_ERROR_COMPILATION)
^C
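
The errors above all trace back to the anonymous structs inside `union Shared`. One possible workaround, not confirmed in this thread, is to give each struct a member name so the code no longer relies on the anonymous-struct extension. The sketch below is plain host C++ so that it can be checked without a GPU. The names `Keys`, `Vals`, `k`, `v`, and `shared_layout_ok` are hypothetical, and kernel accesses would have to change from `shared.keys0` to `shared.k.keys0`.

```cpp
#include <cstddef>

// Hypothetical rewrite of the union from the failing kernel: each struct
// gets a member name (k, v), so no anonymous-struct extension is needed.
union Shared
{
    struct Keys
    {
        int keys0[3072];
    } k;
    struct Vals
    {
        float vals0[2816];
    } v;
};

// Host-side sanity check: the union should be exactly as large as its
// largest member, which is what the shared-memory budget depends on.
inline bool shared_layout_ok()
{
    return sizeof(Shared) == sizeof(Shared::Keys)
        && sizeof(Shared::Keys) >= sizeof(Shared::Vals);
}
```

In device code the declaration would stay `__shared__ union Shared shared;` and only the member accesses would gain the extra `.k` or `.v` step. Whether this is practical depends on how the sort kernels are generated, which this thread does not show.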
