cms-2016-simulated-datasets: add folder, file list, categorisation #163
@tiborsimko @jmhogan Maybe just merge the folder structure, the dataset lists in inputs (although they will be updated closer to the release), interface.py and categorisation.py.
@nancyhamdan and @joudmas will work on updating the other Python scripts and then upload them.
Yes, I can do that. A quick diff shows:
$ git checkout pr-163
$ cd cms-2016-simulated-datasets
$ colordiff -ru code/ ../cms-YYYY-simulated-datasets/code
Only in code/: categorisation.py~
diff -ru code/printer.py ../cms-YYYY-simulated-datasets/code/printer.py
--- code/printer.py 2023-10-26 15:59:00.896461652 +0200
+++ ../cms-YYYY-simulated-datasets/code/printer.py 2022-11-02 17:39:55.955236462 +0100
@@ -49,7 +49,9 @@
' the rules in the categorisation script should be adjusted and'
' the script rerun.')
print('')
- print('See [#157](https://github.com/cernopendata/data-curation/issues/157) for more context.')
+ print('See [#1229](https://github.com/cernopendata/opendata.cern.ch/issues/1229)'
+ ' and [this page](https://demo.codimd.org/s/BkoBknkqQ#)'
+ ' for more context.')
print('')
print('Generated on', datetime.datetime.now().strftime("%d-%m-%Y %H:%M:%S"))
print('')
Only in ../cms-YYYY-simulated-datasets/code: __pycache__
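For anyone without colordiff at hand, a similar "which files differ" summary can be sketched in Python with the standard-library filecmp module. This is only an illustration, not part of the repository: the `summarise` helper and the throwaway directories below are hypothetical, and it reports names only, not line-level changes.

```python
import filecmp
import tempfile
from pathlib import Path


def summarise(left, right):
    """Recursively collect names that differ between two directory trees."""
    # ignore=[] disables filecmp's default ignore list, which would
    # otherwise skip entries such as __pycache__.
    # Note: dircmp compares files shallowly by default (type, size, mtime
    # first), so it is a quick check rather than a byte-exact guarantee.
    dc = filecmp.dircmp(left, right, ignore=[])
    report = {
        "only_left": [str(Path(left) / n) for n in dc.left_only],
        "only_right": [str(Path(right) / n) for n in dc.right_only],
        "differ": [str(Path(left) / n) for n in dc.diff_files],
    }
    for name in dc.common_dirs:
        sub = summarise(str(Path(left) / name), str(Path(right) / name))
        for key in report:
            report[key].extend(sub[key])
    return report


# Throwaway directories standing in for the two code/ folders.
with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp, "a"), Path(tmp, "b")
    a.mkdir()
    b.mkdir()
    (a / "printer.py").write_text("print('old')\n")
    (b / "printer.py").write_text("print('newer text')\n")
    (a / "categorisation.py~").write_text("")
    (b / "__pycache__").mkdir()
    report = summarise(str(a), str(b))
    print(report)
```

The line-level changes still need `diff -u` (or `difflib.unified_diff`) on the files listed under `differ`.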
@jmhogan BTW your starting point for the categorisation changes was the
$ git checkout master
$ colordiff -uw cms-YYYY-simulated-datasets/code/categorisation.py cms-2015-simulated-datasets/code/categorisation.py
I'm appending them for convenience below:
--- cms-YYYY-simulated-datasets/code/categorisation.py 2023-10-26 16:02:53.268938768 +0200
+++ cms-2015-simulated-datasets/code/categorisation.py 2023-08-18 15:13:13.661327989 +0200
@@ -74,7 +74,6 @@
re.search(r'/branon', title_lower) or # extra-dimensions, brane models
re.search(r'/stringball', title_lower) or
re.search(r'/qbh', title_lower) or # Quantum Black Hole
- re.search(r'/unpart', title_lower) or
re.search(r'blackhole', title_lower)): # Quantum Black Hole also??
return 'Exotica/Extra Dimensions'
@@ -84,7 +83,6 @@
re.search(r'/dmz', title_lower) or # darkmatter Z?
re.search(r'/dms', title_lower) or # darkmatter scalar
re.search(r'/dmv', title_lower) or # darkmatter vector
- re.search(r'/Monotop', title) or
re.search(r'DMJets', title)): # darkmatter Jets?
return 'Exotica/Dark Matter'
@@ -136,9 +134,7 @@
re.search(r'/wrto', title_lower) or
re.search(r'/monolepton', title_lower) or
re.search(r'/spin0plus', title_lower) or
- re.search(r'/spin2ph', title_lower) or
- re.search(r'/extendedweakisospin', title_lower) or
- re.search(r'/hscp', title_lower)):
+ re.search(r'/spin2ph', title_lower)):
return 'Exotica/Miscellaneous'
elif ('susy' in title_lower or
@@ -179,8 +175,6 @@
re.search(r'primejettoth', title_lower) or # TprimeJetToTH FIXME: SM Higgs from T' is here?
re.search(r'hminus', title_lower) or
re.search(r'sms[-]?higgs', title_lower) or # sms higgs
- re.search(r'spin0to', title_lower) or
- re.search(r'xxto', title_lower) or
re.search(r'hplus', title_lower)):
return 'Higgs Physics/Beyond Standard Model'
@@ -200,12 +194,6 @@
return 'Higgs Physics/Standard Model'
# FIXME gravitino going to SM Higgs ctegory.
- elif ('_HInt_' in title or
- 'ttHJetTo' in title or
- 'Hincl' in title or
- 'GluGluHTo' in title):
- return 'Higgs Physics/Standard Model'
-
elif (re.search('GammaGammaTo(E|Mu|Tau)*_(Inel|Elastic|SingleDiss)', title) or # gamma gamma -> mu+ mu- etc reactions which involve elastically scattered protons
# SingleDiss, means Single Diffractive Dissociation
re.search('/singlediffractive[zw]?', title_lower) or
@@ -217,13 +205,7 @@
elif (re.search('/minbias', title_lower)):
return 'Standard Model Physics/Minimum Bias'
- elif (re.search(r'gun', title_lower) or # particle gun
- re.search(r'/single', title_lower) or
- re.search(r'/double', title_lower) or
- re.search(r'/muminus', title_lower) or
- re.search(r'/muplus', title_lower) or
- re.search(r'/doubleelectron', title_lower) or # Is this an electron gun?
- re.search(r'/singlepi', title_lower)):
+ elif (re.search(r'gun', title_lower)): # particle gun
return 'Physics Modelling'
elif re.search(r'/dy', title_lower):
@@ -259,21 +241,6 @@
re.search(r'/wminusto', title_lower) or # W- to
re.search(r'/wmto', title_lower) or # W- to
re.search(r'/z*to', title_lower) or # ZZ To
- re.search(r'/eeg', title_lower) or
- re.search(r'/photonindbkg', title_lower) or
- re.search(r'/wgjjto', title_lower) or
- re.search(r'/wzjto', title_lower) or
- re.search(r'/zzj', title_lower) or
- re.search(r'/wlljjto', title_lower) or
- re.search(r'/wzjj', title_lower) or
- re.search(r'/glugluwwto', title_lower) or
- re.search(r'/mumug', title_lower) or
- re.search(r'/vvto', title_lower) or
- re.search(r'/wpwp', title_lower) or
- re.search(r'/wmwm', title_lower) or
- re.search(r'/wbjets', title_lower) or
- re.search(r'/zllg', title_lower) or
- re.search(r'/znunug', title_lower) or
re.search(r'/[wz]to[emunu]*', title_lower)): # W/Z to E,Mu,Nu
return 'Standard Model Physics/ElectroWeak'
@@ -293,15 +260,15 @@
'tt_mtt-1000' in title_lower or
'tt_mtt-700' in title_lower or
re.search(r'/[tbar]*_.+_[stuw]+-channel', title_lower) or # T_bla_s/t/u/w-channel
- re.search(r'/ST_', title) or
- re.search(r'/TTZTo', title) or
- re.search(r'/tZq', title) or
- re.search(r'/TTTo', title) or
- re.search(r'/ttwjets', title_lower) or
- re.search(r'/ttbb', title) or
re.search(r'/t+_', title_lower)):
return 'Standard Model Physics/Top physics'
+ elif (re.search(r'/muminus', title_lower) or
+ re.search(r'/muplus', title_lower) or
+ re.search(r'/doubleelectron', title_lower) or # Is this an electron gun?
+ re.search(r'/singlepi', title_lower)): # is this right? FIXME
+ return 'Standard Model Physics/Miscellaneous'
+
elif ('Heavy-Ion Physics' in title or
re.search('reggegribov_', title_lower)):
return 'Heavy-Ion Physics'
@@ -316,10 +283,6 @@
re.search(r'/bsto', title_lower) or
re.search(r'/chib0', title_lower) or
'etabto' in title_lower or # Eta_b To
- 'InclusivebtoMu' in title or
- 'InclusivectoMu' in title or
- 'DsToTau' in title or
- 'DStar' in title or
'xibstar0' in title_lower):
return 'B physics and Quarkonia'
Do you think some of the above changes may be interesting to "replay" on top of your branch? Or have you covered everything already?
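To make the effect of such differences concrete, here is a minimal sketch of how a first-match rule chain assigns categories. It assumes a table-driven rewrite of a small part of the if/elif chain: the names `RULES_YYYY`, `RULES_2015` and `categorise` are hypothetical, only the particle-gun patterns from the diff above are included, and the dataset titles are made up for illustration.

```python
import re

# Hypothetical, heavily reduced rule tables. The real script is a long
# if/elif chain with many more patterns and categories, and earlier rules
# in that chain may match before these ones are reached.
RULES_YYYY = [
    ("Physics Modelling",
     [r"gun", r"/single", r"/double", r"/muminus", r"/muplus",
      r"/doubleelectron", r"/singlepi"]),
]

RULES_2015 = [
    ("Physics Modelling", [r"gun"]),
    ("Standard Model Physics/Miscellaneous",
     [r"/muminus", r"/muplus", r"/doubleelectron", r"/singlepi"]),
]


def categorise(title, rules):
    """Return the category of the first rule with a matching pattern."""
    title_lower = title.lower()
    for category, patterns in rules:
        if any(re.search(p, title_lower) for p in patterns):
            return category
    return None  # no rule matched in this reduced table


# Made-up dataset titles, for illustration only.
TITLES = (
    "/SinglePiPt10/Example/AODSIM",
    "/DoubleElectron_Example/Example/AODSIM",
    "/ExampleParticleGun/Example/AODSIM",
)

for title in TITLES:
    print(title, "|", categorise(title, RULES_YYYY), "|", categorise(title, RULES_2015))
```

In this reduced sketch the first two titles land in 'Physics Modelling' under the cms-YYYY rules and in 'Standard Model Physics/Miscellaneous' under the cms-2015 rules, which is the kind of difference that replaying the changes would carry over or avoid.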
Fixes several miscellaneous cases in the dataset categorisation.
Updates categorisation for 2016 simulated data.
Adds initial structure for CMS 2016 simulated dataset curation work. Adds categorisation scripts.
Actually this was not part of the pull request files, so I'm merging without inputs, just the categorisation scripts and 2016 MD results...
Thanks, rebased and amended as discussed to keep only categorisation changes. I have also added you to the list of authors with a link to your ORCID profile.
Adding the cms-2016-simulated-datasets folder. The inputs subfolder contains the dataset listing as of late March 2023, and the categorisation script has been extensively updated to handle everything.