That's correct: we can currently only generate CUDA binaries for targets that match the host system (i.e., x86_64). In fact, I recently investigated exactly this for libxc, but didn't manage to get it working. I'll copy my conclusions here before Slack swallows them:
I took a brief look at libxc for aarch64, and there are a bunch of issues preventing us from moving forward:
CUDA_SDK_jll is currently installed as a BuildDependency, so its executables are built for the target and can't be run on the host arch. We could switch it to a HostBuildDependency; however, the host environment is musl, while the CUDA SDK is glibc-only. Mixing the two often works out OK-ish in practice, but Pkg refuses to download the glibc artifact when instantiating the musl environment.
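For context, here's roughly how the two declarations look in a build_tarballs.jl recipe (a minimal sketch; everything except the dependency types and the package name is elided):

```julia
using BinaryBuilder

dependencies = [
    # BuildDependency: installed for the *target* platform, so headers and
    # libraries are usable, but executables like nvcc/ptxas are target-arch
    # binaries the x86_64 host can't run.
    BuildDependency("CUDA_SDK_jll"),

    # HostBuildDependency would install host-runnable executables instead,
    # but the build host is musl while CUDA_SDK_jll only ships glibc
    # artifacts, so Pkg refuses to instantiate the host environment.
    # HostBuildDependency("CUDA_SDK_jll"),
]
```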
I tried switching the compiler to Clang, which is easy enough with -DCMAKE_CUDA_COMPILER=clang, but that exposes a couple of issues. One: somehow --target= leaks into the command-line flags while CMake identifies the compiler, breaking all sorts of things. After fixing that to say --target=aarch64-..., a header (__config_site) isn't found. This seems to be caused by an LLVM bug that makes it look in the wrong locations, as noted here: https://github.com/JuliaPackaging/BinaryBuilderBase.jl/blob/ac6831078a4241d85ff891e6067a06a9e6dc1052/src/Runner.jl#L431-L442. Apparently that workaround needs to be generalized to all Clang-based platforms, which I verified works by jerry-rigging the invocation to include -nostdinc++. The header location added in the linked change doesn't seem to exist on aarch64, which may be problematic later down the line, but I didn't get that far because:
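For reference, the attempt looked roughly like this in the recipe's build script (bash embedded in the Julia recipe as usual; the libxc ENABLE_CUDA option and the exact flag set are from memory, so treat this as a sketch):

```julia
script = raw"""
cd ${WORKSPACE}/srcdir/libxc*
cmake -B build \
    -DCMAKE_INSTALL_PREFIX=${prefix} \
    -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TARGET_TOOLCHAIN} \
    -DCMAKE_BUILD_TYPE=Release \
    -DENABLE_CUDA=ON \
    -DCMAKE_CUDA_COMPILER=clang  # instead of the default nvcc
cmake --build build --parallel ${nproc}
cmake --install build
"""
```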
Even with Clang as the CUDA compiler, it still wants to execute ptxas, which brings us back to the initial issue of CUDA_SDK_jll not being executable. So we would probably need to fix that anyway, i.e., either support overriding the platform to allow using glibc binaries on musl so that HostBuildDependency works, or make sure foreign binaries are executable.
I decided to try the former using qemu-user-static; however, our Qemu_static_jll isn't built for musl, so it can't be installed as a HostBuildDependency either. I started fixing that by attempting a rebuild of Qemu for musl, but we're using musl 1.2.2 in the musl rootfs, which doesn't yet define MAP_FIXED_NOREPLACE, which Qemu uses.
It should also be said that even with qemu-user-static as a HostBuildDependency in the container, not everything is fixed: the current version of the sandbox doesn't grant access to binfmt_misc under /proc, so you can't register qemu-user-static as an interpreter for foreign binaries. Instead, you would need to replace tools like nvcc and ptxas with wrappers that invoke them under qemu-user-static, as sketched below. But I didn't get to that part, because I didn't manage to upgrade Qemu.
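The wrapper idea, for concreteness (a sketch only: the install path and the set of tools are illustrative, and it presumes a working qemu-aarch64-static on the host):

```julia
script = raw"""
# Shadow each foreign CUDA tool with a shim that runs the real aarch64
# binary under qemu-user-static. Paths are illustrative, not the actual
# sandbox layout.
for tool in nvcc ptxas fatbinary; do
    real="${prefix}/cuda/bin/${tool}"
    mv "${real}" "${real}.aarch64"
    cat > "${real}" <<EOF
#!/bin/sh
exec qemu-aarch64-static "${real}.aarch64" "\$@"
EOF
    chmod +x "${real}"
done
"""
```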
With BinaryBuilder2.jl, the qemu/binfmt solution will be integrated, and we should be able to automatically execute foreign binaries and depend on the target-specific CUDA SDK. Given the amount of work it would require to get it working right now, I decided to wait for BinaryBuilder2.jl.
> CUDA_SDK_jll is currently installed as a BuildDependency, so its executables are built for the target and can't be run on the host arch. We could switch it to a HostBuildDependency; however, the host environment is musl, while the CUDA SDK is glibc-only. Mixing the two often works out OK-ish in practice, but Pkg refuses to download the glibc artifact when instantiating the musl environment.
We are already running glibc-based programs in the current compilation flow, so I think the main problem is the platform-tag matching in Pkg. Perhaps we could make a "fake" musl version of CUDA_SDK_jll that just repackages all the glibc files. Then the HostBuildDependency would have a package matching its specification. The mismatch boils down to the platform comparison sketched below.
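To make the tag-matching problem concrete, artifact selection reduces to a platform comparison like this (runnable in any recent Julia; Base.BinaryPlatforms is what Pkg uses under the hood):

```julia
using Base.BinaryPlatforms

host = Platform("x86_64", "linux"; libc = "musl")   # BinaryBuilder host rootfs
sdk  = Platform("x86_64", "linux"; libc = "glibc")  # only flavor CUDA_SDK_jll ships

platforms_match(host, sdk)  # false: libc tags differ, so Pkg selects nothing
```

A "fake" musl build would republish the same glibc payload under a platform whose libc tag is musl, making this comparison succeed.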
Yes, but even then I'm not sure the x86_64 version of the CUDA SDK knows how to target ARM; NVIDIA is pretty clear it's not a cross-compiler. I think it's best to wait until we can execute the ARM version under QEMU (and the same for Windows, using Wine).
Originally posted by @maleadt in #10217 (comment)