Enable code for dynamic parallelism #96
thedodd commented on Nov 15, 2022 (edited)
- Closes Dynamic Parallelism | implementation strategy #94
So, interestingly, I'm running into an issue where the generated code cannot be loaded by [...]. Some background on current testing:
Now, what is quite strange is that if I copy the PTX from the working C++ program over to the Rust program (disabling PTX gen in the Rust program to ensure the C++ PTX is not overwritten), the Rust program aborts with that same error.
So, I am wondering:
|
Perhaps we need to be manually constructing a linker, linking the PTX and the |
Yea, that was it. I needed to create a linker, add the PTX, and add libcudadevrt (right now the path is hard-coded, but I need to create a dynamic search mechanism, as I don't think the CUDA linker will do this on its own ... we'll see). From there, I was able to successfully execute the PTX from my sample C++ app. The generated Rust PTX still has an invalid memory access, and it looks like it comes from how the buffer is being populated. This is still a step forward, as the code gen is much easier to fix: I at least know what I'm dealing with, instead of some opaque "JIT compilation failed" error. |
Yea, that did it. Code gen for loading the param buffer is far from optimal, but it works, and I am able to successfully use dynamic parallelism from the Rust-generated PTX end to end, with the expected output and behavior. The macro codegen for populating the buffer can be optimized further; I'll focus on that later. |
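The comments above do not say what exactly was wrong with the buffer population, so the following is only an illustration of what laying out a launch parameter buffer involves, not the macro codegen from this PR. It is a minimal host-side sketch in Rust, assuming each parameter must start at an offset aligned to its own alignment (the rule the CUDA programming guide describes for device-side launches from PTX). The names `align_up` and `param_layout` are hypothetical.

```rust
/// Round `offset` up to the next multiple of `align`.
/// Assumption: `align` is a power of two, as it is for CUDA scalar types.
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

/// Given `(size, align)` in bytes for each kernel parameter, in declaration
/// order, return the byte offset of each parameter and the total buffer size.
/// Hypothetical helper for illustration; not part of `cust` or this PR.
fn param_layout(params: &[(usize, usize)]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(params.len());
    let mut cursor = 0;
    for &(size, align) in params {
        // Skip padding so this parameter starts on an aligned offset.
        cursor = align_up(cursor, align);
        offsets.push(cursor);
        cursor += size;
    }
    (offsets, cursor)
}
```

For example, a kernel taking a `u32`, a 64-bit pointer, a `u8`, and another `u32` would be described as `[(4, 4), (8, 8), (1, 1), (4, 4)]`. Writing a parameter at an unpadded offset is one way a populated buffer can lead to a bad access, though that may not be what happened here.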
crates/cust/src/link.rs
Outdated
@@ -114,6 +114,28 @@ impl Linker {
        }
    }

    /// Link device runtime lib.
    pub fn add_libcudadevrt(&mut self) -> CudaResult<()> {
        let mut bytes = std::fs::read("/usr/local/cuda-11/lib64/libcudadevrt.a")
When this PR is finalized, this should maybe be replaced by searching CUDA_PATH? Not sure what the proper way is.
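One possible shape for such a search, sketched in Rust with only the standard library. This is an assumption-laden illustration, not the approach the PR settled on. The function names, the extra `CUDA_HOME` variable, the `/usr/local/cuda` fallback, and the `lib64` / `lib/x64` sub-paths are guesses at common install layouts and would need checking against the platforms the crate supports.

```rust
use std::env;
use std::path::PathBuf;

/// Locations of the device runtime library relative to a CUDA install root.
/// Assumption: `lib64/libcudadevrt.a` on Linux, `lib/x64/cudadevrt.lib` on Windows.
const LIB_SUBPATHS: &[&str] = &["lib64/libcudadevrt.a", "lib/x64/cudadevrt.lib"];

/// Expand each candidate root into the full paths that should be probed.
fn candidate_paths(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots
        .iter()
        .flat_map(|root| LIB_SUBPATHS.iter().map(move |sub| root.join(sub)))
        .collect()
}

/// Collect CUDA roots: environment variables first, then a conventional default.
/// Assumption: `CUDA_PATH` and `CUDA_HOME` are the variables worth checking.
fn cuda_roots() -> Vec<PathBuf> {
    let mut roots: Vec<PathBuf> = ["CUDA_PATH", "CUDA_HOME"]
        .iter()
        .filter_map(|var| env::var_os(var))
        .map(PathBuf::from)
        .collect();
    roots.push(PathBuf::from("/usr/local/cuda"));
    roots
}

/// Return the first candidate that exists on disk, if any.
fn find_libcudadevrt() -> Option<PathBuf> {
    candidate_paths(&cuda_roots()).into_iter().find(|p| p.is_file())
}
```

`add_libcudadevrt` could then read the bytes from whatever `find_libcudadevrt()` returns and surface an error when it yields `None`, instead of reading a fixed path. Keeping `candidate_paths` separate from the environment lookup makes the search order testable without a CUDA install.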