ECAL RecHit producer Alpaka migration #46453

Merged

thomreis
Contributor

PR description:

Migration of the ECAL RecHit producer from CUDA to Alpaka, including the required portable data and conditions formats and an extension of the DQM module to compare RecHits produced on the CPU or GPU.
While it is for the most part a direct replacement of the existing CUDA RecHit producer, the migrated Alpaka version adds the RecHit time variable, which the CUDA version did not calculate. In addition, the Alpaka version adds support for Phase 2, where there will no longer be any inputs from the endcaps.

Compared with the legacy CPU producer, the Alpaka algorithm still lacks the recovery of dead channels and therefore cannot yet replace the legacy producer in production.

PR validation:

Comparing the legacy CPU vs. CUDA comparison (12834.513) with the legacy CPU vs. Alpaka (on GPU) comparison (13834.413) on 9k TTbar events shows almost identical results between CUDA and Alpaka (with the exception of the time variables, as mentioned above) and very good agreement with the legacy CPU version for both implementations.
In addition, the Alpaka module running on CPU gives almost identical results to the module running on GPU (NVIDIA).
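
For illustration only, the kind of per-channel check behind such a comparison can be sketched as below. This is a minimal sketch and not the code of the DQM module: `RecHitSketch`, `ComparisonSummary`, `compareRecHits` and the relative tolerance are hypothetical names and parameters introduced here, and only the energy is compared, since the time variable is not available in the CUDA version.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical minimal RecHit record; the real RecHit formats carry more fields.
struct RecHitSketch {
  std::uint32_t id;  // detector channel identifier
  float energy;
};

// Hypothetical summary of one reference-vs-candidate comparison.
struct ComparisonSummary {
  std::size_t matched = 0;           // reference hits with a partner in the candidate
  std::size_t unmatched = 0;         // reference hits without a partner
  std::size_t energyMismatches = 0;  // matched hits outside the tolerance
};

// Match hits by channel id and count energies that differ by more than
// relTolerance relative to the reference energy.
ComparisonSummary compareRecHits(const std::vector<RecHitSketch>& reference,
                                 const std::vector<RecHitSketch>& candidate,
                                 float relTolerance) {
  assert(relTolerance >= 0.f);

  // Index the candidate collection by channel id for constant-time lookup.
  std::unordered_map<std::uint32_t, float> energyById;
  energyById.reserve(candidate.size());
  for (const auto& hit : candidate) {
    energyById.emplace(hit.id, hit.energy);
  }

  ComparisonSummary summary;
  for (const auto& ref : reference) {
    const auto it = energyById.find(ref.id);
    if (it == energyById.end()) {
      ++summary.unmatched;
      continue;
    }
    ++summary.matched;
    const float diff = std::abs(ref.energy - it->second);
    // Guard against a zero reference energy when forming the relative scale.
    const float scale = std::max(std::abs(ref.energy), 1e-6f);
    if (diff > relTolerance * scale) {
      ++summary.energyMismatches;
    }
  }
  return summary;
}
```

Calling `compareRecHits(legacyHits, alpakaHits, 1e-3f)` on two collections from the same event would then give the number of matched channels and the number of matched channels whose energies disagree beyond the chosen tolerance.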

@cmsbuild
Contributor

cmsbuild commented Oct 19, 2024

cms-bot internal usage

@thomreis
Contributor Author

type ecal

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46453/42315

@thomreis
Contributor Author

enable gpu

@cmsbuild
Contributor

A new Pull Request was created by @thomreis for master.

It involves the following packages:

  • CondFormats/DataRecord (db, alca)
  • CondFormats/EcalObjects (db, alca)
  • Configuration/ProcessModifiers (operations)
  • Configuration/PyReleaseValidation (pdmv, upgrade)
  • DQM/EcalMonitorTasks (dqm)
  • DataFormats/EcalRecHit (reconstruction)
  • EventFilter/EcalRawToDigi (reconstruction)
  • RecoLocalCalo/EcalRecProducers (reconstruction)

@AdrianoDee, @Moanwar, @antoniovagnerini, @antoniovilela, @atpathak, @cmsbuild, @consuegs, @davidlange6, @fabiocos, @francescobrivio, @jfernan2, @kskovpen, @mandrenguyen, @miquork, @nothingface0, @perrotta, @rappoccio, @rvenditti, @srimanob, @subirsarkar, @sunilUIET, @syuvivida, @tjavaid can you please review it and eventually sign? Thanks.
@JanChyczynski, @Martin-Grunewald, @PonIlya, @ReyerBand, @apsallid, @argiro, @fabiocos, @makortel, @missirol, @mmusich, @rchatter, @rovere, @rsreds, @seemasharmafnal, @slomeo, @thomreis, @tocheng, @wang0jin, @youyingli, @yuanchao this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@thomreis
Contributor Author

please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 132KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-891ac5/42309/summary.html
COMMIT: b693b23
CMSSW: CMSSW_14_2_X_2024-10-19-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46453/42309/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

@mmusich
Contributor

mmusich commented Oct 21, 2024

Hello, just out of curiosity, I imagine this development will eventually enter the HLT menu for 2025.
Is the plan to introduce a customization function in this PR to make use of the Alpaka ECAL rechit producer in the HLT menu, or will ECAL just open a ticket with the confDB configuration for integration?

@thomreis
Contributor Author

Hi @mmusich, the portable rechit producer in this PR does not have the full functionality of the CPU producer currently used in the HLT menu. Just like the CUDA version, it lacks the algorithms for recovery of dead channels. However, the current pp HLT menu does not seem to do any energy recovery either, so it may be possible to use this in 2025 already. This needs to be checked in more detail.

We can add a customization function to this PR, but I think we should only activate it once it is confirmed that the portable producer gives the same results as the one currently used at the HLT. Of course we could also do this in a separate PR if that is preferred.

@missirol
Contributor

assign heterogeneous ?

@thomreis
Contributor Author

Hi @cms-sw/alca-l2 @cms-sw/db-l2 @cms-sw/pdmv-l2 @cms-sw/dqm-l2 do you have any comments on this PR?

@AdrianoDee
Contributor

AdrianoDee commented Dec 19, 2024 via email

@antoniovagnerini

+dqm

@thomreis
Contributor Author

Hi @cms-sw/alca-l2 @cms-sw/db-l2 please take a look at this PR and let us know if you have comments.

@perrotta
Contributor

perrotta commented Jan 2, 2025

+1

@cmsbuild
Contributor

cmsbuild commented Jan 2, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1

@cmsbuild cmsbuild merged commit f55e6a5 into cms-sw:master Jan 3, 2025
14 checks passed
@dan131riley

This broke gcc13 builds: CondFormats/EcalObjects/interface/EcalRecHitParameters.h needs

#include <cstdint>

as the compiler helpfully tells us:

In file included from src/CondFormats/EcalObjects/interface/EcalRecHitParametersHost.h:4,
                 from src/CondFormats/EcalObjects/src/ES_EcalRecHitParametersHost.cc:1:
  src/CondFormats/EcalObjects/interface/EcalRecHitParameters.h:9:16: error: 'uint32_t' was not declared in this scope
     9 |     std::array<uint32_t, kNEcalChannelStatusCodes>;  // associate recoFlagBits to all channel status codes
      |                ^~~~~~~~
src/CondFormats/EcalObjects/interface/EcalRecHitParameters.h:6:1: note: 'uint32_t' is defined in header '<cstdint>'; did you forget to '#include <cstdint>'?
    5 | #include <array>
  +++ |+#include <cstdint>
    6 | 
  src/CondFormats/EcalObjects/interface/EcalRecHitParameters.h:9:50: error: template argument 1 is invalid
     9 |     std::array<uint32_t, kNEcalChannelStatusCodes>;  // associate recoFlagBits to all channel status codes
      |                                                  ^
  src/CondFormats/EcalObjects/interface/EcalRecHitParameters.h:9:10: error: '<expression error>' in namespace 'std' does not name a type
     9 |     std::array<uint32_t, kNEcalChannelStatusCodes>;  // associate recoFlagBits to all channel status codes
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  src/CondFormats/EcalObjects/interface/EcalRecHitParameters.h:12:3: error: 'RecoFlagBitsArray' does not name a type
    12 |   RecoFlagBitsArray recoFlagBits;
      |   ^~~~~~~~~~~~~~~~~

With the include, std::uint32_t is mildly preferred over bare uint32_t.
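
A minimal sketch of what the header could look like with that fix applied is shown below. Only `RecoFlagBitsArray`, `recoFlagBits`, `kNEcalChannelStatusCodes` and the comment on the alias are taken from the compiler output above; the value given to `kNEcalChannelStatusCodes`, the struct name `RecHitParametersSketch` and the helper `setRecoFlagBits` are assumptions made here so that the example compiles on its own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>  // declares std::uint32_t; do not rely on <array> pulling it in

// Placeholder value for illustration only; the real constant is defined elsewhere in CMSSW.
constexpr std::size_t kNEcalChannelStatusCodes = 16;

// associate recoFlagBits to all channel status codes
using RecoFlagBitsArray = std::array<std::uint32_t, kNEcalChannelStatusCodes>;

// Hypothetical stand-in for the parameter struct declared in the header.
struct RecHitParametersSketch {
  RecoFlagBitsArray recoFlagBits;
};

// Hypothetical helper: store the reco flag bits for one channel status code.
inline void setRecoFlagBits(RecHitParametersSketch& params, std::size_t statusCode, std::uint32_t bits) {
  assert(statusCode < params.recoFlagBits.size());
  params.recoFlagBits[statusCode] = bits;
}
```

Older toolchains often compiled the original header because `<array>` happened to pull in the fixed-width integer types transitively; the standard does not guarantee this, so the explicit include is needed for the header to be portable.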

@thomreis thomreis deleted the ecalrechitproducer-alpaka-migration branch January 6, 2025 11:19
@smuzaffar
Contributor

this PR also adds the following dicts in DataFormats/EcalRecHit/src/classes_def.xml file.

  <class name="EcalRecHitSoA"/>
  <class name="EcalRecHitSoA::View"/>

The lost dictionary checker thinks that these should be defined in the CUDADataFormats/EcalRecHitSoA package (as the package name matches the dict names). So should we move these dicts (along with the headers) to CUDADataFormats/EcalRecHitSoA, or should we add an exception for these dicts?

@fwyzard
Contributor

fwyzard commented Jan 7, 2025

Note that all CUDADataFormats packages are going to be removed soon.

@smuzaffar
Contributor

ok, so for now I will update https://github.com/cms-sw/cmssw/blob/master/Utilities/ReleaseScripts/scripts/duplicateReflexLibrarySearch.py#L31-L80 to not complain about these dicts
