Pineparser for compatibility with new theories #63

Open
wants to merge 28 commits into base: main


comane
Member

@comane comane commented Apr 14, 2024

The scope of this PR is to allow SIMUnet to use new theories generated with pineappl.

Example of how to use this for the moment:

  1. from theory_700/fastkernel copy NMC_NC_NOTFIXED_P_EM-SIGMARED.pineappl.lz4 into theory_270/fastkernel

  2. from nnpdf_data/new_commondata/NMC_NC_NOTFIXED_P copy the metadata.yaml as NMC_NC_NOTFIXED_P_EM-SIGMARED_metadata.yaml into theory_270/fastkernel

  3. copy DATA_NMC.dat into DATA_NMC_NC_NOTFIXED_P_EM-SIGMARED.dat within data/commondata

Note: make sure that in the metadata.yaml file you change the name of the fktable to the name of the pineappl grid.
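
The three copy steps above can be sketched as a small helper. This is only an illustration: the function name and all of its arguments are hypothetical, and it assumes the metadata file belongs next to the grid in the fastkernel folder of the theory used for the fit.

```python
import shutil
from pathlib import Path


def stage_new_dataset(setname, grid_src, metadata_src, commondata_src,
                      theory_dir, commondata_dir):
    """Copy grid, metadata and commondata into place (hypothetical helper)."""
    fk_dir = Path(theory_dir) / "fastkernel"
    fk_dir.mkdir(parents=True, exist_ok=True)
    Path(commondata_dir).mkdir(parents=True, exist_ok=True)

    targets = {
        # step 1: the pineappl grid keeps its name
        Path(grid_src): fk_dir / f"{setname}.pineappl.lz4",
        # step 2: metadata.yaml is renamed to <setname>_metadata.yaml
        Path(metadata_src): fk_dir / f"{setname}_metadata.yaml",
        # step 3: the commondata file is renamed to DATA_<setname>.dat
        Path(commondata_src): Path(commondata_dir) / f"DATA_{setname}.dat",
    }
    for src, dst in targets.items():
        shutil.copyfile(src, dst)
    return sorted(str(dst) for dst in targets.values())
```

The helper does not edit the metadata file, so the fktable name inside it still has to be changed by hand as described in the note.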

Now, it should be possible to run a fit with the following dataset_inputs:

dataset_inputs:
- {dataset: NMC_NC_NOTFIXED_P_EM-SIGMARED, new_commondata: true}  

TODO

  • Run the new fk parser with theory 270
  • Benchmark against previous fits done with theory 270
  • Support normalisation and shifts in TheoryMeta
  • Support vp-setupfit for closure tests (_filter_closure_data)

@comane comane added the enhancement New feature or request label Apr 14, 2024
@ElieHammou
Collaborator

ElieHammou commented Jun 24, 2024

Hi @comane ,
Thanks for starting the PR. I am testing it at the moment with this data:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_NC_EPD_88_PES, frac: 0.75} # New FK table

I am working with theory 270. I have manually added this FK table:

simunet-dev/share/NNPDF/data/theory_270/fastkernel/EIC_NC_EPD_88_PES.pineappl.lz4 

and this compound file (using the old way):

simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat

Here is the content of the compound file:

# COMPOUND FK
FK: EIC_NC_EPD_88_PES
OP: NULL

When I run vp-setupfit, it seems to look for the wrong name:

(simunet-dev) ~/Projects/Low_E_PDF/low-energy/Fits/ - (main) > vp-setupfit test_simunet_EIC.yaml
[WARNING]: Output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC Overwriting contents
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
[ERROR]: Bad configuration encountered:
Incorrect COMPOUND file '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat'. Searching for non-existing FKTable:
Could not find FKTable for set '_NC_EPD_88'. File '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/fastkernel/FK__NC_EPD_88.dat' not found

It looks like it is mangling both the prefix and the suffix: three characters are dropped from the start of the name and four from the end. This is because the old format used the following naming convention for FK tables:

FK_EIC_NC_EPD_88_PES.dat
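
A minimal sketch of why the name comes out wrong, assuming the legacy COMPOUND parser strips a fixed-width `FK_` prefix and `.dat` suffix by slicing. The function `legacy_fk_name` is a hypothetical stand-in, not the actual loader code, but the slicing reproduces the name in the error message above.

```python
def legacy_fk_name(entry):
    """Assumed legacy behaviour: drop 'FK_' (3 chars) and '.dat' (4 chars)."""
    return entry[3:-4]


# An entry written in the old convention survives the slicing intact.
old_style = legacy_fk_name("FK_EIC_NC_EPD_88_PES.dat")

# An entry without prefix and suffix loses real characters instead.
new_style = legacy_fk_name("EIC_NC_EPD_88_PES")

print(old_style)
print(new_style)
```

Under this assumption, the `FK:` line of the compound file would need the full `FK_<name>.dat` form for the old parser to recover the set name.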

@ElieHammou
Collaborator

For the record, is this PR relying on the old compound files to link commondata and FK tables or is it expecting the info to be stored in the yamldb folder of the theory, like nnpdf does currently?

@comane
Member Author

comane commented Jun 25, 2024

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_NC_EPD_88_PES, frac: 0.75} # New FK table

Can you try adding the new_commondata: true flag to the dataset that uses the FK table in the pineappl format?

For the record, is this PR relying on the old compound files to link commondata and FK tables or is it expecting the info to be stored in the yamldb folder of the theory, like nnpdf does currently?

I don't think that this PR supports compounds yet.

@ElieHammou
Collaborator

Sure thing.

I have just tried vp-setupfit with:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_NC_EPD_88_PES, frac: 0.75, new_commondata: true} # New FK table

I have also removed the compound file I had initially added. It gives me the following error:

(simunet-dev) ~/Projects/Low_E_PDF/low-energy/Fits/ - (main) > vp-setupfit test_simunet_EIC.yaml
[WARNING]: Output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC Overwriting contents
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
[CRITICAL]: Bug in setup-fit ocurred. Please report it.
Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 405, in check_compound
    with compound_spec_path.open() as f:
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 590, in check_dataset
    fkspec, op = self.check_compound(theoryno, name, cfac)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 412, in check_compound
    raise CompoundNotFound(msg)
validphys.loader.CompoundNotFound: Could not find COMPOUND set 'EIC_NC_EPD_88_PES' for theory 270: [Errno 2] No such file or directory: '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/compound/FK_EIC_NC_EPD_88_PES-COMPOUND.dat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/n3fit/src/n3fit/scripts/vp_setupfit.py", line 197, in run
    super().run()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/app.py", line 158, in run
    super().run()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/app.py", line 358, in run
    rb.resolve_fuzzytargets()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 370, in resolve_fuzzytargets
    self.resolve_fuzzytarget(target)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 379, in resolve_fuzzytarget
    self.process_targetspec(fuzzytarget.name, spec, fuzzytarget.extraargs)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 388, in process_targetspec
    gen.send(None)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 450, in _process_requirement
    yield from self._make_node(name, nsspec, extraargs, parents)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 466, in _make_node
    yield from self._make_callspec(f, name, nsspec, extraargs, parents)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 499, in _make_callspec
    index, _ = gen.send(None)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 417, in _process_requirement
    put_index, val = self.input_parser.resolve_key(name, ns, parents=parents, currspec=nsspec)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 429, in resolve_key
    return self._resolve_key(key=key, ns=ns, input_params=input_params,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 491, in _resolve_key
    val = produce_func(**kwargs)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/config.py", line 1492, in produce_data
    datasets.append(self.parse_from_(None, "dataset", write=False)[1])
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 133, in f_
    return f(self, val, *args, **kwargs)
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 735, in parse_from_
    return self.resolve_key(element, ns, input_params=input_params,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 429, in resolve_key
    return self._resolve_key(key=key, ns=ns, input_params=input_params,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/configparser.py", line 491, in _resolve_key
    val = produce_func(**kwargs)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/config.py", line 754, in produce_dataset
    ds = self.loader.check_dataset(
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 592, in check_dataset
    fkspec = self.check_fktable(theoryno, name, cfac, use_fixed_predictions=use_fixed_predictions, new_commondata=new_commondata)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/loader.py", line 386, in check_fktable
    with open(path_metadata, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/fastkernel/EIC_NC_EPD_88_PES_metadata.yaml'

It appears to complain about the absence of the compound file and the metadata file. The metadata file makes sense since I am using an old commondata implementation with a new FK table.

I will implement the metadata or try with a dataset which has it already implemented and come back to you. I am confused about the compound error though.
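
On the compound error: the traceback reads like an ordinary try/except fallback, where the COMPOUND lookup fails first and the loader then falls back to a single FK table. The sketch below uses simplified stand-ins that borrow the function names from the traceback (the bodies are assumptions, not the validphys code) to show how Python reports such a chain.

```python
class CompoundNotFound(Exception):
    """Stand-in for validphys.loader.CompoundNotFound."""


def check_compound(name):
    # No COMPOUND file exists for a plain (non-compound) dataset.
    raise CompoundNotFound(f"Could not find COMPOUND set '{name}'")


def check_fktable(name):
    # With new_commondata the loader also wants <name>_metadata.yaml.
    raise FileNotFoundError(f"{name}_metadata.yaml")


def check_dataset(name):
    try:
        return check_compound(name)
    except CompoundNotFound:
        # Handled case: fall back to a single FK table.
        return check_fktable(name)


try:
    check_dataset("EIC_NC_EPD_88_PES")
except FileNotFoundError as err:
    final_error = err

print(type(final_error).__name__)              # error that aborted the run
print(type(final_error.__context__).__name__)  # earlier, already handled error
```

If the loader works this way, the compound message is only context attached to the later exception, and the missing metadata file is the failure that actually stops vp-setupfit.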

@ElieHammou
Collaborator

Hi @comane ,
I think I have found a bug: it appears that the new FK tables cannot be read if another dataset is being contaminated. For example, the following runcard works well:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_CC_EMP_140_OPT, frac: 0.75, new_commondata: true} # New FK table

But if I add another dataset to be contaminated, the vp-setupfit step fails:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: HLLHC_HMDY_NC_EL_FINAL, frac: 0.75, cfac: ['QCD', 'EWK'], contamination: 'EFT_LO'}
- {dataset: EIC_CC_EMP_140_OPT, frac: 0.75, new_commondata: true} # New FK table

I then get the following error:

(simunet-dev) ~/Projects/Low_E_PDF/low-energy/Fits/ - (main) > vp-setupfit test_simunet_EIC.yaml
[WARNING]: Output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC Overwriting contents
[WARNING]: Using q2min from runcard
[WARNING]: Using w2min from runcard
Using Keras backend
[INFO]: All requirements processed and checked successfully. Executing actions.
[WARNING]: Importing libNNPDF
[INFO]: Initialising RNG
- Random Generator allocated: ranlux
[INFO]: NNPDF40_nnlo_as_01180 T0 checked.
[INFO]: Verifying positivity tables:
[INFO]: POSF2U checked.
[INFO]: POSF2DW checked.
[INFO]: POSF2S checked.
[INFO]: POSFLL checked.
[INFO]: POSDYU checked.
[INFO]: POSDYD checked.
[INFO]: POSDYS checked.
[INFO]: POSF2C checked.
[INFO]: POSXUQ checked.
[INFO]: POSXUB checked.
[INFO]: POSXDQ checked.
[INFO]: POSXDB checked.
[INFO]: POSXSQ checked.
[INFO]: POSXSB checked.
[INFO]: POSXGL checked.
-- Generating closure data for DEUTERON
-- Generating replica data for DEUTERON
[WARNING]: Dataset output folder exists: /Users/eliehammou/Projects/Low_E_PDF/low-energy/Fits/test_simunet_EIC/filter/NMCPD_dw_ite Overwriting contents
[INFO]: 121/260 datapoints in NMCPD_dw_ite passed kinematic cuts.
-- Generating closure data for HLLHC
-- Generating replica data for HLLHC
[INFO]: 12/12 datapoints in HLLHC_HMDY_NC_EL_FINAL passed kinematic cuts.
[CRITICAL]: Bug in setup-fit ocurred. Please report it.
Traceback (most recent call last):
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/n3fit/src/n3fit/scripts/vp_setupfit.py", line 197, in run
    super().run()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/app.py", line 158, in run
    super().run()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/app.py", line 380, in run
    rb.execute_sequential()
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 166, in execute_sequential
    result = self.get_result(callspec.function,
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/reportengine/resourcebuilder.py", line 175, in get_result
    fres =  function(**kwdict)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/filters.py", line 122, in filter_closure_data_by_experiment
    return [
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/filters.py", line 123, in <listcomp>
    _filter_closure_data(filter_path, exp, t0pdfset, fakenoise, errorsize)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/filters.py", line 177, in _filter_closure_data
    loaded_data = data.load.__wrapped__(data)
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/core.py", line 774, in load
    loaded_data = dataset.load()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/core.py", line 584, in load
    fktable = p.load()
  File "/Users/eliehammou/Software/simunet_git/SIMUnet/validphys2/src/validphys/core.py", line 702, in load
    return FKTable(str(self.fkpath), [str(factor) for factor in self.cfactors])
  File "/Users/eliehammou/miniconda3/envs/simunet-dev/lib/python3.10/site-packages/NNPDF/nnpdf.py", line 3042, in __init__
    _nnpdf.FKTable_swiginit(self, _nnpdf.new_FKTable(*args))
RuntimeError: [utils] error: Could not open (PosixPath('/Users/eliehammou/miniconda3/envs/simunet-dev/share/NNPDF/data/theory_270/fastkernel/EIC_CC_EMP_140_OPT.pineappl.lz4'),)

I have similar problems with validphys runcards.
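
One detail in the last line of the traceback: the path is printed as `(PosixPath('...'),)`, which is what `str()` of a one-element tuple looks like. A minimal sketch, assuming `fkpath` is a tuple for the new tables and is passed through `str()` as in core.py line 702 of the traceback (the relative path is shortened for illustration, and `PurePosixPath` is used so the output is the same on every platform):

```python
from pathlib import PurePosixPath

# For new commondata, fkpath is assumed to be a tuple of paths.
fkpath = (PurePosixPath("theory_270/fastkernel/EIC_CC_EMP_140_OPT.pineappl.lz4"),)

as_passed = str(fkpath)       # what str(self.fkpath) would hand to the legacy FKTable
as_intended = str(fkpath[0])  # a plain file path

print(as_passed)
print(as_intended)
```

So the legacy loader is handed the textual form of a tuple rather than a file name. Whether it could read a pineappl grid even with a correct path is a separate question.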

@ElieHammou
Collaborator

I have no idea what the problem could be, to be honest.

@ElieHammou
Collaborator

I think I understand the issue. The contamination itself is not the problem; the new FK tables do not work in a closure test.

This runcard produces a bug for instance:

dataset_inputs:
- {dataset: NMCPD_dw_ite, frac: 0.75} # Old FK table
- {dataset: EIC_CC_EMP_140_OPT, frac: 0.75, new_commondata: true} # New FK table

###########################################################
# The closure test namespace tells us the settings for the
# (possibly contaminated) closure test.
############################################################
closuretest:
    filterseed: 0 # Random seed to be used in filtering data partitions
    fakedata: true     # true = to use FAKEPDF to generate pseudo-data
    fakepdf: NNPDF40_nnlo_as_01180      # Theory input for pseudo-data
    errorsize: 1.0    # uncertainties rescaling
    fakenoise: true    # true = to add random fluctuations to pseudo-data
    rancutprob: 1.0   # Fraction of data to be included in the fit
    rancutmethod: 0   # Method to select rancutprob data fraction
    rancuttrnval: false # 0(1) to output training(validation) chi2 in report
    printpdf4gen: false # To print info on PDFs during minimization
#     contamination_parameters:
#       - name: 'W'
#         value: 0.00008
#         linear_combination:
#             'Olq3': -15.94

seed: 0
rngalgo: 0

The bug disappears if I comment out the closure test key.


# use different file name for the FK table if the commondata is new
if new_commondata:
    fkpath = tuple([theopath / 'fastkernel' / (f'{setname}.pineappl.lz4')])
Collaborator


This does not work for double-differential measurements, e.g. ATLAS_1JET_13TEV_DIF, which is double differential in |y| and pT. To make it work, one could extract the file names from the <DATASET>_<OBS>_metadata.yaml file. I suggest moving the loading of the .yaml file from the next if clause to this one.

Collaborator


In my case the dataset name is ATLAS_1JET_13TEV_DIF_PT-Y, but the FK files are named ATLAS_1JET_13TEV_DIF_PT-Y_BIN#.pineappl.lz4, one per bin.
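
A rough sketch of the suggestion, assuming the metadata has already been parsed and lists its grids as a list of operands, each a list of grid names. Both that layout and the helper `fk_paths_from_metadata` are assumptions for illustration, and the bin numbers are made up.

```python
from pathlib import PurePosixPath


def fk_paths_from_metadata(theory_dir, fk_tables):
    """Build one path per grid name listed in the (already parsed) metadata."""
    fk_dir = PurePosixPath(theory_dir) / "fastkernel"
    return tuple(
        fk_dir / f"{name}.pineappl.lz4"
        for operand in fk_tables  # one entry per operand
        for name in operand       # one or more grids per operand
    )


# Hypothetical metadata content for a measurement split into two bins.
fk_tables = [["ATLAS_1JET_13TEV_DIF_PT-Y_BIN1", "ATLAS_1JET_13TEV_DIF_PT-Y_BIN2"]]
paths = fk_paths_from_metadata("theory_270", fk_tables)
for path in paths:
    print(path)
```

Taking the names from the metadata instead of from `setname` would let a single dataset map to any number of grid files.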

@comane
Member Author

comane commented Jul 8, 2024

I think I understand the issue. The contamination itself is not the problem; the new FK tables do not work in a closure test.

Yes, exactly. As I noted in the description above, this PR does not yet support the filtering of closure test data when using the new pine parser.
It's in the TODO list above.
Thanks for pointing this out again.

3 participants