Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refresh CUDA 12 migrator #34

Merged
merged 3 commits into from
Jan 15, 2024

Conversation

conda-forge-admin
Copy link
Contributor

@conda-forge-admin conda-forge-admin commented Jan 12, 2024

Recently the CUDA 11.8 migrator was closed out ( conda-forge/conda-forge-pinning-feedstock#5340 ). As part of that it was determined a small updated was needed to the CUDA 12 migrator ( conda-forge/conda-forge-pinning-feedstock#5340 (comment) ). While this largely works for most feedstocks without changes, those that have use_local: true will have issues with re-rendering until the CUDA 12 migrator they have is updated. This PR pulls over those updates to fix re-rendering on the feedstock. Please let us know if you have any other questions.


Hi! This is the friendly automated conda-forge-webservice.

I've started rerendering the recipe as instructed in #33.

If I find any needed changes to the recipe, I'll push them to this PR shortly. Thank you for waiting!

Here's a checklist to do before merging.

  • Bump the build number if needed.

Fixes #33

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the latest rerendering GitHub actions workflow run for errors. You can also ping conda-forge/core for further assistance or try re-rendering locally.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/nnpops-feedstock/actions/runs/7498754771.

@jakirkham
Copy link
Member

@conda-forge-admin , please re-render

@jakirkham jakirkham changed the title MNT: rerender Refresh CUDA 12 migrator Jan 12, 2024
@jakirkham
Copy link
Member

This is passing! 🎉

@conda-forge/nnpops this is ready for review 🙂

@h-vetinari
Copy link
Member

What's the purpose of this PR? There are no additional CI jobs because (checking Azure...) we already have CUDA 12 builds here.

Also, turning a rerender PR into a migration PR is confusing at best - the migration machinery needs to mark things as completed as it marches through the DAG, but not even that is necessary here because it was started by hand?

@jakirkham
Copy link
Member

There's a description in the OP

@h-vetinari
Copy link
Member

There's a description in the OP

Which was edited in, so not exactly visible... But anyway, there's no use_local: true here, so this PR is pointless.

@h-vetinari
Copy link
Member

But anyway, there's no use_local: true here, so this PR is pointless.

Sorry, I was just looking at the diff, I see it's there but not being removed

Copy link
Member

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should be a single-line change to remove use_local: true, not manually update the migrator file and still leave it out of sync with eventual updates in the global pinning

@jakirkham
Copy link
Member

Yep it's here

# Vendor CUDA 12 migrator to use GCC 11.
# This is needed to workaround a pybind + GCC 12 + nvcc 12 bug
# xref: https://github.com/pybind/pybind11/issues/4606
use_local: true

Also would appreciate moderating the tone here. It is coming off a bit harsh

@h-vetinari
Copy link
Member

Yep it's here

See follow-up comments.

Also would appreciate moderating the tone here. It is coming off a bit harsh

Sorry for that. My point is: use_local: true should only ever be used as a stopgap and then removed asap. Working around its presence rather than removing it is just wrong.

@jakirkham
Copy link
Member

Apology accepted

Above the use_local: true line there is a comment with more context on why it was added. Please let me know if you have more questions after reading that

@h-vetinari
Copy link
Member

OK, sorry for missing that (I was on a phone, which didn't help...).

IMO we're just layering bandaids on top of each other here, but I just tried

commit 48e30057a1c52821ba3e313ea7ceb83d1eeb7bfb
Author: H. Vetinari <[email protected]>
Date:   Sat Jan 13 07:57:39 2024 +1100

    replace use_local in CUDA 12 migrator with local compiler override

diff --git a/.ci_support/migrations/cuda120.yaml b/.ci_support/migrations/cuda120.yaml
index aa2b2a2..ab54105 100644
--- a/.ci_support/migrations/cuda120.yaml
+++ b/.ci_support/migrations/cuda120.yaml
@@ -2,10 +2,6 @@ migrator_ts: 1682985063
 __migrator:
   kind:
     version
-  # Vendor CUDA 12 migrator to use GCC 11.
-  # This is needed to workaround a pybind + GCC 12 + nvcc 12 bug
-  # xref: https://github.com/pybind/pybind11/issues/4606
-  use_local: true
   migration_number:
     3
   build_number:
diff --git a/recipe/conda_build_config.yaml b/recipe/conda_build_config.yaml
new file mode 100644
index 0000000..ff0c248
--- /dev/null
+++ b/recipe/conda_build_config.yaml
@@ -0,0 +1,6 @@
+# Work around a bug in CUDA 12.0, see
+# https://github.com/pybind/pybind11/issues/4606
+c_compiler_version:     # [linux]
+  - 11                  # [linux]
+cxx_compiler_version:   # [linux]
+  - 11                  # [linux]

and to my surprise, this does not work. Local conda_build_config.yaml should always "win", but it doesn't in this case. IMO this is broken, but there doesn't seem to be another good work around for now.

@h-vetinari
Copy link
Member

This is passing! 🎉

TBH, this is what threw me, because there are no relevant changes in this PR (no builds added; all were passing before).

It's proactively ensuring things keep working after the closing of the CUDA 11.8 migration (which is laudable!), but celebrating that this clerical PR passes made me expect an entirely different caliber of changes.

@jakirkham
Copy link
Member

jakirkham commented Jan 12, 2024

No worries

Recall trying something similar when the bug surfaced, but wasn't able to get it to work either. Maybe we are missing something about how to get that to work. Or maybe there is a build tooling bug


Yeah the pybind11 + CUDA 12.0 + GCC 12 bug is an unfortunate cross-section of things

AIUI there is a fix in pybind11 ( pybind/pybind11#4893 ), but that is unreleased. Not sure what their release plan is. Even with a release, there are some libraries that vendor pybind11 (like PyTorch). So we would still need libraries to update their vendored copy (though that at least could happen without a pybind11 release)

There was also a fix on the NVCC side as part of CUDA 12.2 ( pybind/pybind11#4606 (comment) ), but currently the migrator is pinned to CUDA 12.0. Until very recently CUDA 12.0 was the only version in conda-forge, but we do now have 12.2 ( conda-forge/cuda-feedstock#13 ). We could revisit how we are building for CUDA 12.x in conda-forge (and we briefly discussed this at the recent core meeting), but it probably deserves an issue and some more discussion (just to make sure we are considering everything we should be). Am planning to do that next week

Given the options at the time, GCC 11 was the only realistic and least disruptive solution. So that's what we went with


This is passing! 🎉

TBH, this is what threw me, because there are no relevant changes in this PR (no builds added; all were passing before).

It's proactively ensuring things keep working after the closing of the CUDA 11.8 migration (which is laudable!), but celebrating that this clerical PR passes made me expect an entirely different caliber of changes.

Sorry this comment didn't show up until I posted

Noted thanks for the feedback. Will keep this in mind for the future

@h-vetinari
Copy link
Member

Given the options at the time, GCC 11 was the only realistic and least disruptive solution. So that's what we went with

Yeah, I get that that's the only good choice right now, but even then, there should be a solution that doesn't require use_local: true (with CBC). Very unfortunate that this doesn't work as it should.

In any case, sorry for the friction here and thanks for taking care of this. Looking forward to unblocking CUDA 12.x builds. :)

@jakirkham
Copy link
Member

Gotcha yeah agreed

No worries. Understand this is a confusing issue. Sorry for not communicating it clearer. Happy to help. Likewise 🙂

@RaulPPelaez
Copy link
Contributor

Thanks for the good work guys, always refreshing to see you around.
The dreaded GCC12+CUDA12+pybind11 pops up in my radar quite often, it is a fact of life at this point :P. I am guilty of abusing the use_local: true thingy to work around it. Will go around replacing it when I learn of a better solution.
I am going to merge this in then.

@RaulPPelaez RaulPPelaez merged commit a466cc1 into conda-forge:main Jan 15, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@conda-forge-admin , please re-render
4 participants