diff --git a/doc/sphinx/source/buildmaster.md b/doc/sphinx/source/buildmaster.md index 9fde1acb1..44d3c6fcb 100644 --- a/doc/sphinx/source/buildmaster.md +++ b/doc/sphinx/source/buildmaster.md @@ -1,3 +1,7 @@ +```eval_rst +.. _buildmaster: +``` + ## Handling experimental data: Buildmaster Buildmaster is the code that allows the user to generate the ``DATA`` and diff --git a/doc/sphinx/source/get-started/cite.rst b/doc/sphinx/source/get-started/cite.rst index df9472c40..cb090abda 100644 --- a/doc/sphinx/source/get-started/cite.rst +++ b/doc/sphinx/source/get-started/cite.rst @@ -1,7 +1,7 @@ .. _cite: References -======== +========== If you use this code please consider citing at least the following papers: * NNPDF4.0 :cite:p:`nnpdf40` diff --git a/doc/sphinx/source/get-started/index.rst b/doc/sphinx/source/get-started/index.rst index 73d26f019..ee9b382c1 100644 --- a/doc/sphinx/source/get-started/index.rst +++ b/doc/sphinx/source/get-started/index.rst @@ -1,24 +1,18 @@ -.. _getstarted: - Getting started =============== -This section provides an introduction to the NNPDF code. The workflow -for an NNPDF fit is displayed below. -.. image:: diagram.png - :width: 300 - :alt: Alternative text - -To get started, find below the instructions on how to download and -install the code. +This section provides an introduction to the NNPDF code. +To get started, find below the instructions on how to download and +install the code, as well as a description of the different modules that compose +the NNPDF code. If you use the public NNPDF code, please cite all references listed on the :ref:`cite` page. .. 
toctree:: - :maxdepth: 1 - - ./git.md - ./installation.rst - ./cite.rst + :maxdepth: 1 + + ./git.md + ./installation.rst + ./nnpdfmodules.rst + ./cite.rst diff --git a/doc/sphinx/source/get-started/nnpdfmodules.rst b/doc/sphinx/source/get-started/nnpdfmodules.rst new file mode 100644 index 000000000..fca040430 --- /dev/null +++ b/doc/sphinx/source/get-started/nnpdfmodules.rst @@ -0,0 +1,99 @@ +Code structure +================ +Here, we describe the structure of the NNPDF code and we present a +high-level description of its functionalities. The workflow +for an NNPDF fit is displayed in the figure below. + +.. figure:: diagram.png + :align: center + :alt: Code structure diagram + + Code structure diagram + + +The :ref:`APFELcomb ` interpolation table generator +-------------------------------------------------------------------------------- +This code takes hard-scattering partonic matrix element interpolators +from `APPLgrid `_ and +`FastNLO `_ (for hadronic processes) and +`APFEL `_ (for DIS structure functions) and +combines them with the DGLAP evolution kernels provided by ``APFEL`` to +construct the fast interpolation grids called +FK-tables (for further instructions +see :ref:`tutorialfktables`). In this way, physical +observables can be evaluated in a highly efficient manner as a tensor sum of +FK-tables with a grid of PDFs at an initial parametrisation scale :math:`Q_0`. +``APFELcomb`` also handles NNLO QCD and/or NLO electroweak +K-factors when needed. + +Theory predictions can be generated by configuring a variety of options, +such as the perturbative order (currently up to NNLO), the values of the +heavy quark masses, the electroweak parameters, the maximum number of +active flavours, and the variable-flavour-number scheme used to account +for the effects of the heavy quark masses in the DIS structure functions. +The FK-tables resulting from each choice are associated with a +database entry through a theory id, which allows them to be quickly +identified.
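Schematically, using notation chosen here only for illustration (:math:`n` labels the data points, :math:`i,j` the PDF flavours, and :math:`\alpha,\beta` the nodes :math:`x_\alpha` of the :math:`x`-grid), the tensor sum of FK-tables with the PDFs at :math:`Q_0` takes the form

.. math::

   T_n = \sum_{i}\sum_{\alpha} \mathrm{FK}_{n,i\alpha}\, f_i(x_\alpha, Q_0)

for DIS observables, which involve a single PDF, and

.. math::

   T_n = \sum_{i,j}\sum_{\alpha,\beta} \mathrm{FK}_{n,i\alpha j\beta}\, f_i(x_\alpha, Q_0)\, f_j(x_\beta, Q_0)

for hadronic observables, which involve two. Both the DGLAP evolution from :math:`Q_0` and the partonic matrix elements are absorbed into the FK-table entries, which is why evaluating a prediction reduces to these sums.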
+ + +The :ref:`buildmaster ` experimental data formatter +-------------------------------------------------------------------------------- +A C++ code that transforms the original measurements provided +by the experimental collaborations, +e.g. via `HepData `_, +into a standard format that is tailored for PDF fitting. + +In particular, the code allows for flexible handling of experimental +systematic uncertainties, including different treatments of the correlated +systematic uncertainties. + + +The :ref:`n3fit ` fitting code +-------------------------------------------------------------------------------- +This module implements the core fitting methodology using +the ``TensorFlow`` framework. The ``n3fit`` library allows +for a flexible specification of the neural network model adopted to +parametrise the PDFs, whose settings can be selected automatically via +the built-in :ref:`hyperoptimization algorithm `. These +include the neural network type and architecture, the activation +functions, and the initialization strategy; the choice of optimizer and +of its corresponding parameters; and hyperparameters related to the +implementation in the fit of theoretical constraints such as PDF +positivity and integrability. The settings for a +PDF fit are specified in a declarative runcard. Using these +settings, ``n3fit`` finds the values of the neural network parameters +(which determine the PDF at the initial scale) that best describe the input data. +Following a post-fit selection (using the ``postfit`` tool implemented +in validphys) and a PDF evolution step, the final output +consists of an `LHAPDF `_ grid corresponding to +the best-fit PDF as well as metadata on the fit performance. + + +The :ref:`validphys ` analysis framework +-------------------------------------------------------------------------------- +Built on the +`reportengine `_ framework, ``validphys`` enables a workflow +focused on declarative and reproducible runcards.
The code implements data +structures that can interact with those of ``libnnpdf`` and are accessible from +the runcard. The analysis code makes heavy use of common Python Data Science +libraries such as ``NumPy``, ``SciPy``, ``Matplotlib`` and ``Pandas``, and +through its use of ``Pandoc`` it is capable of outputting the final results to +HTML reports. These can be composed directly by the user or be generated by more +specialised, downstream applications. The package includes tools to interact +with online resources such as the results of fits or PDF grids, which, for +example, are automatically downloaded when they are required by a runcard. + + +The libnnpdf C++ legacy code +-------------------------------------------------------------------------------- +A C++ library that contains common data structures together with +the fitting code used to produce the NNPDF3.0 and NNPDF3.1 analyses. + +The availability of ``libnnpdf`` guarantees strict backwards +compatibility of the NNPDF framework and the ability to benchmark the +current methodology against the previous one. + +To facilitate the interaction between the NNPDF C++ and Python +codebases, we have developed Python wrappers using the ``SWIG`` library. + +For instructions on how to run a legacy fit, see :ref:`nnfit-usage`. diff --git a/doc/sphinx/source/index.rst b/doc/sphinx/source/index.rst index 36bd17725..c96737de9 100644 --- a/doc/sphinx/source/index.rst +++ b/doc/sphinx/source/index.rst @@ -4,7 +4,7 @@ contain the root `toctree` directive. The NNPDF collaboration -================= +======================= The `NNPDF collaboration `_ performs research in the field of high-energy physics. The NNPDF collaboration determines the structure of the proton using contemporary methods of artificial intelligence. A precise knowledge @@ -15,7 +15,7 @@ the Large Hadron Collider of CERN.
The NNPDF code -============= +============== The scientific output of the collaboration is freely available to the publi through the arXiv, journal repositories, and software repositories. Along with this online documentation, we release the @@ -24,9 +24,9 @@ repositories. Along with this online documentation, we release the The code can be used to produce the ingredients needed for PDF fits, to run the fits themselves, and to analyse the results. This is the first framework used to produce a global PDF fit made publicly available, enabling for a detailed external validation and reproducibility of the NNPDF4.0 analysis. Moreover, the code enables the user to explore a number of phenomenological applications, such as the assessment of the impact of new experimental data on PDFs, the effect of changes in theory settings on the resulting PDFs and a fast quantitative comparison between theoretical predictions and experimental data over a broad range of observables. If you are a new user head along to :ref:`getstarted` and check out the :ref:`tutorials`. - + The NNPDF team -============= +============== The NNPDF collaboration is currently composed by the following members: @@ -43,7 +43,7 @@ members: * Rabah Abdul Khalek - Nikhef Theory Group and VU University * José Ignacio Latorre - Quantum Research Centre, Technology Innovation Institute, Abu Dhabi, United Arab Emirates and Center for Quantum Technologies, National University of Singapore -* Emanuele R. Nocera - University of Edinburgh +* Emanuele R. Nocera - University of Edinburgh * Rosalyn Pearson - University of Edinburgh * Juan Rojo - Nikhef Theory Group and VU University * Roy Stegeman - Tif Lab, Dipartimento di Fisica, Università di @@ -68,7 +68,7 @@ Former members of the NNPDF collaboration include The NNPDF publications -===================== +====================== * *"Future tests of parton distributions"*, Juan Cruz-Martinez, Stefano Forte, Emanuele R. 
Nocera :cite:p:`Cruz-Martinez:2021rgy` * *"Deuteron Uncertainties in the Determination of Proton PDFs"*, @@ -95,7 +95,7 @@ The NNPDF publications Voisey and Michael Wilson :cite:p:`AbdulKhalek:2019bux` * *"Nuclear Parton Distributions from Lepton-Nucleus Scattering and the Impact of an Electron-Ion Collider"*, Rabah Abdul Khalek, - Jacob J. Ethier, Juan Rojo, :cite:p:`AbdulKhalek:2019mzd` + Jacob J. Ethier, Juan Rojo, :cite:p:`AbdulKhalek:2019mzd` * *"A First Determination of Parton Distributions with Theoretical Uncertainties"*, Rabah Abdul Khalek, Richard D. Ball, Stefano Carrazza, Stefano Forte, Tommaso Giani, Zahari Kassabov, @@ -110,7 +110,7 @@ The NNPDF publications Nathan P. Hartland, Zahari Kassabov, Jose I. Latorre, Emanuele R. Nocera, Juan Rojo, Luca Rottoli, Emma Slade, and Maria Ubiali :cite:p:`Ball:2017nwa` - + Contents ======== .. toctree:: diff --git a/doc/sphinx/source/n3fit/hyperopt.rst b/doc/sphinx/source/n3fit/hyperopt.rst index b4307cf10..d78dd5cd9 100644 --- a/doc/sphinx/source/n3fit/hyperopt.rst +++ b/doc/sphinx/source/n3fit/hyperopt.rst @@ -1,4 +1,6 @@ -================================ +.. _hyperoptimization: + +================================ Hyperoptimization algorithm ================================ @@ -77,10 +79,10 @@ An example of a DIS fit using this loss function can be found here: [`best worst \begin{array}{lr} std(\{\chi^{2}\}) & \text{ if } avg(\chi^2) < \text{ threshold } \\ \infty & \text{otherwise} - \end{array} + \end{array} \right. - -An example of a DIS fit using this loss function with the threshold :math:`\chi^2` set to 2.0 + +An example of a DIS fit using this loss function with the threshold :math:`\chi^2` set to 2.0 can be found here: [`best std `_]. It can be selected in the runcard using the target ``std``. 
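The ``std`` loss defined above can be sketched in a few lines of Python. This is a toy illustration, not the ``n3fit`` implementation. It takes a plain list of :math:`\chi^2` values. The function name ``std_loss`` is hypothetical. It uses the population standard deviation, which is an assumption because the docs do not name an estimator. The threshold defaults to 2.0, matching the example fit.

```python
import math
import statistics


def std_loss(chi2_values, threshold=2.0):
    """Toy version of the ``std`` hyperopt loss (hypothetical helper).

    Returns the standard deviation of the chi2 values if their average is
    below `threshold`, and infinity otherwise, so that poorly fitting
    configurations are always rejected.
    """
    chi2 = list(chi2_values)
    if statistics.mean(chi2) < threshold:
        # Population standard deviation: an assumption, since the
        # documentation does not specify which estimator is used.
        return statistics.pstdev(chi2)
    return math.inf


print(std_loss([1.0, 2.0]))  # 0.5
print(std_loss([2.5, 3.0]))  # inf
```

Returning infinity rather than a large finite number makes the rejection explicit: any configuration whose average :math:`\chi^2` exceeds the threshold loses against every configuration that passes it.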
diff --git a/doc/sphinx/source/n3fit/index.rst b/doc/sphinx/source/n3fit/index.rst index dbbcc44bc..8e5501b49 100644 --- a/doc/sphinx/source/n3fit/index.rst +++ b/doc/sphinx/source/n3fit/index.rst @@ -1,15 +1,17 @@ +.. _n3fitindex: + Fitting code: ``n3fit`` -=================== +======================= -- ``n3fit`` is the next generation fitting code for NNPDF developed by the +- ``n3fit`` is the next-generation fitting code for NNPDF developed by the N3PDF team :cite:p:`Carrazza:2019mzf` -- ``n3fit`` is responsible for fitting PDFs from NNPDF4.0 onwards. +- ``n3fit`` is responsible for fitting PDFs from NNPDF4.0 onwards. - The code is implemented in python using `Tensorflow `_ and `Keras `_. - The sections below are an overview of the ``n3fit`` design. ``n3fit`` design ------------- +---------------- .. toctree:: :maxdepth: 1 @@ -17,7 +19,7 @@ Fitting code: ``n3fit`` hyperopt runcard_detailed -.. important:: +.. important:: If you just want to know how to run a fit using ``n3fit``, head to :ref:`n3fit-usage`. diff --git a/doc/sphinx/source/tutorials/apfelcomb.md b/doc/sphinx/source/tutorials/apfelcomb.md index 471aa5a7e..c19660714 100644 --- a/doc/sphinx/source/tutorials/apfelcomb.md +++ b/doc/sphinx/source/tutorials/apfelcomb.md @@ -1,11 +1,15 @@ +```eval_rst +.. _tutorialfktables: +``` + # How to generate and implement FK tables APFELcomb is the project that allows the user to generate `FK` tables. -These are lookup tables that contain the relevant information to compute -theoretical predicitons in the NNPDF framework. Broadly speaking, this is -achieved by taking DGLAP evolution kernels from ``APFEL`` and combining them +These are lookup tables that contain the relevant information to compute +theoretical predictions in the NNPDF framework.
Broadly speaking, this is +achieved by taking DGLAP evolution kernels from ``APFEL`` and combining them with interpolated parton-level observable kernels in the APPLgrid or -FastNLO format +FastNLO format (see [How to generate APPLgrid and fastNLO tables](../tutorials/APPLgrids.md)). The various data formats used in APFELcomb are described in [Experimental data files](../data/exp-data-files.html#exp-data-files). @@ -23,22 +27,22 @@ The generation of each subgrid can by achieved with the following command ``` ./apfel_comb ``` -where `` specifies whether the subgrid is in the APP, DIS or +where `` specifies whether the subgrid is in the APP, DIS or DYP subgrid categories in the database (`db/apfelcomb.dat`), where: - APP: refers to applgrids, partonic cross sections produced externally by a MonteCarlo generator. - DIS: Deep Inelastic Scatting, coefficient fucnctions computed by `APFEL`. - DYP: Drell-Yan, partonic cross sections computed by `APFEL`. -`` is the corresponding ID in that database (visible in the `disp\_grids` script) +`` is the corresponding ID in that database (visible in the `disp_grids` script) -and `` specifies the desired NNPDF theory index (the entry in +and `` specifies the desired NNPDF theory index (the entry in nnpdf/nnpdfcpp/data/theory.db). As an example: ```Shell -./apfel_comb app 500 53 +./apfel_comb app 500 53 ``` -will generate the subgrid for CDFZRAP and theory 53 +will generate the subgrid for CDFZRAP and theory 53 -(NNPDF3.1 NNLO fitted charm). +(NNPDF3.1 NNLO fitted charm). The resulting FK subgrid -will be written out to +will be written out to ``` $RESULTS_PATH/theory_/subgrids/FK__.dat. @@ -47,21 +51,21 @@ $RESULTS_PATH/theory_/subgrids/FK__.dat. APPLgrids and FastNLO tables should be properly stored in the `applgrids` folder by means of [Git LFS](https://git-lfs.github.com/) (see [here](storage) for details).
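When many subgrids have to be produced, the `apfel_comb` call can be wrapped in a small script. The sketch below is a hypothetical Python helper and is not part of APFELcomb. It assumes only the three positional arguments used in the `./apfel_comb app 500 53` example: subgrid category, subgrid ID, and theory ID. It also assumes that the lower-case `dis` and `dyp` are accepted in the same way as `app`.

```python
import shlex
import subprocess

# Subgrid categories from the documentation; lower-case spelling for all
# three is an assumption based on the `app` example.
CATEGORIES = ("app", "dis", "dyp")


def apfel_comb_command(category, subgrid_id, theory_id, executable="./apfel_comb"):
    """Build the argument list for one apfel_comb subgrid run (hypothetical helper)."""
    category = category.lower()
    if category not in CATEGORIES:
        raise ValueError(
            f"unknown subgrid category {category!r}; expected one of {CATEGORIES}"
        )
    return [executable, category, str(int(subgrid_id)), str(int(theory_id))]


def run_subgrid(category, subgrid_id, theory_id):
    """Run apfel_comb, raising CalledProcessError if it exits with an error."""
    subprocess.run(apfel_comb_command(category, subgrid_id, theory_id), check=True)


if __name__ == "__main__":
    # Same call as the CDFZRAP / theory 53 example; only prints the command.
    print(shlex.join(apfel_comb_command("APP", 500, 53)))  # ./apfel_comb app 500 53
```

Passing an argument list to `subprocess.run`, rather than a shell string, avoids any quoting issues. `check=True` makes a failing subgrid stop the script immediately.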
-Once all the relevant subgrids for the desired dataset(s) are generated, +Once all the relevant subgrids for the desired dataset(s) are generated, one should run ``` ./merge_allgrids.py ``` -which will loop over all datasets and attempt to merge their subgrids into a +which will loop over all datasets and attempt to merge their subgrids into a complete `FK` table. The resulting final `FK` table should be stored at ``` $RESULTS_PATH/theory_/fastkernel/FK_.dat. ``` ## Implement a new FK table -Whenever a new dataset is implemented, it should be accompanied by the +Whenever a new dataset is implemented, it should be accompanied by the corresponding `FK` table. To implement a new `FK` table, one must first add -a corresponding entry into the apfelcomb database (by editing the +a corresponding entry into the apfelcomb database (by editing the `./db/apfelcomb.dat` file) under the `grids` table. These entries are comprised of the following fields. - **id** - The primary key identifier of the FK table. @@ -72,25 +76,25 @@ These entries are comprised of the following fields. - **positivity** - A flag specifying if the FK table is a positivity set. - **source** - Specifies if the corresponding subgrids are [APP/DIS/DYP]. -Note that **setname** and **name** may be different in the case of compound -observables such as ratios, where multiple FK tables are required to compute -predictions for a single dataset. The `nx` parameter specifies the -interpolation accuracy of the dataset (this must currently be tuned by hand, +Note that **setname** and **name** may be different in the case of compound +observables such as ratios, where multiple FK tables are required to compute +predictions for a single dataset. The `nx` parameter specifies the +interpolation accuracy of the dataset (this must currently be tuned by hand, e.g. by making sure that the native applgrid and the generated FK tables lead -to numerically equivalent results once they are convolved with the same PDF -set). 
The `positivity` parameter restricts the observable to NLO matrix +to numerically equivalent results once they are convolved with the same PDF +set). The `positivity` parameter restricts the observable to NLO matrix elements and disables target-mass corrections. -Once this entry is complete, one must move on to adding entries in the +Once this entry is complete, one must move on to adding entries in the corresponding subgrid table. -### Implementing a new APPLgrid/FastNLO subgrid +### Implementing a new APPLgrid/FastNLO subgrid -To add a new APPLgrid- or FastNLO--based subgrid, one must add a corresponding entry into -the `app\_subgrids` table of the apfelcomb database. One entry should be added +To add a new APPLgrid- or FastNLO-based subgrid, one must add a corresponding entry into +the `app_subgrids` table of the apfelcomb database. One entry should be added for each APPLgrid making up the final target `FK` table. The entries have the following fields: -- **id** - The primary key identifier of the subgrid. -- **fktarget** - The name of the FK table this subgrid belongs to. +- **id** - The primary key identifier of the subgrid. +- **fktarget** - The name of the FK table this subgrid belongs to. - **applgrid** - The filename of the corresponding APPLgrid. - **fnlobin** - The fastNLO index if the table is a fastNLO grid, or -1 if not. - **ptmin** - The minimum perturbative order (1 when the LO is zero, 0 if not). @@ -98,8 +102,8 @@ The entries have the following fields: - **ppbar** - A boolean flag, 1 if the APPLgrid should be transformed to *ppbar* beams, 0 if not. - **mask** - A boolean mask, specifying which APPLgrid entries should be considered data points. - **operators** - A list of operators to handle certain special cases (see below). -The mask should have as many entries as APPLgrid bins and each boolean value -should be separated by a space.
For example, for an applgrid with five bins +The mask should have as many entries as APPLgrid bins and each boolean value +should be separated by a space. For example, for an applgrid with five bins where we want to exclude the penultimate bin, the mask would be: ``` 1 1 1 0 1 @@ -113,15 +117,15 @@ The applgrid filename assumes that the grid can be found at ``` $APPL_PATH// ``` -where `APPL_PATH` is defined in Makefile.am, `` is the corresponding +where `APPL_PATH` is defined in Makefile.am, `` is the corresponding `COMMONDATA` set name specified in the grids table (that should match the name -used in the [buildmaster](../tutorials/buildmaster.md) implementation), and `` +used in the [buildmaster](../tutorials/buildmaster.md) implementation), and `` is specified in the field described above. -### Implementing a new DIS or DYP subgrid -New DIS or DYP subgrids should be entered respectively into the +### Implementing a new DIS or DYP subgrid +New DIS or DYP subgrids should be entered respectively into the `dis_subgrids` or `dyp_subgrids` tables of the apfelcomb database. -Typically only one subgrid is needed per DIS or DYP FK table. +Typically only one subgrid is needed per DIS or DYP FK table. Each subgrid entry has the following fields: - **id** - The primary key identifier of the subgrid - **fktarget** - The name of the FK table this subgrid belongs to @@ -209,83 +213,83 @@ The list of processes below can be found in `apfel/src/DIS/FKObservables.f` in t ### Subgrid operators -Subgrid operators are used to provide certain subgrid-wide transformations that -can be useful in certain circumstances. They are formed by a key-value pair +Subgrid operators are used to provide certain subgrid-wide transformations that +can be useful in certain circumstances. They are formed by a key-value pair with syntax: ``` : ``` -If using multiple operators, they should be comma-separated. Currently these +If using multiple operators, they should be comma-separated. 
Currently these operators are implemented: - \*:*V* - Duplicate the subgrid data point (there must be only one for this operator) *V* times. - +:*V* - Increment the starting data point index of this subgrid by *V*. - N:*V* - Normalise all data points in this subgrid by *V*. -The \* operator is typically used for normalised cross-sections, where the -total cross-section computation (a single data point) must be duplicated -*N\_dat* times to correspond to the size of the `COMMONDATA` file. -The + operator is typically used to compensate for missing subgrids, -for example when a `COMMONDATA` file begins with several data points that -cannot yet be computed from theory, the + operator can be used to skip those +The \* operator is typically used for normalised cross-sections, where the +total cross-section computation (a single data point) must be duplicated +*N\_dat* times to correspond to the size of the `COMMONDATA` file. +The + operator is typically used to compensate for missing subgrids, +for example when a `COMMONDATA` file begins with several data points that +cannot yet be computed from theory, the + operator can be used to skip those points. The N operator is used to perform unit conversions or the like. ### Compound files and C-factors -If the new dataset is a compound observable (that is, theory predictions are a -function of more than one FK-product), then one should write a corresponding -`COMPOUND` file as described in [Theory data files](../data/th-data-files.html#compound-file-format). This compound file should be stored +If the new dataset is a compound observable (that is, theory predictions are a +function of more than one FK-product), then one should write a corresponding +`COMPOUND` file as described in [Theory data files](../data/th-data-files.html#compound-file-format). This compound file should be stored in the APFELcomb repository under the `compound` directory. 
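To make the `*`, `+` and `N` subgrid operators concrete, here is a toy Python model. It is a simplification and not APFELcomb code. A subgrid is represented by a hypothetical `ToySubgrid` holding one plain number per data point plus its starting index, whereas real subgrids are FK tables. The sketch makes three further assumptions: operators are applied in the order listed, "normalise by *V*" means multiplying by *V*, and unknown keys are rejected. Check the APFELcomb source before relying on any of these.

```python
from dataclasses import dataclass


@dataclass
class ToySubgrid:
    """Hypothetical stand-in for a subgrid: one value per data point."""
    values: list
    start: int = 0  # index of the first data point in the final FK table


def parse_operators(spec):
    """Split a comma-separated operator list, e.g. '*:3,N:0.5'."""
    ops = []
    for item in filter(None, (part.strip() for part in spec.split(","))):
        key, sep, value = item.partition(":")
        if key not in ("*", "+", "N") or not sep or not value:
            raise ValueError(f"malformed subgrid operator: {item!r}")
        ops.append((key, float(value)))
    return ops


def apply_operators(subgrid, spec):
    """Apply the operators in the order they are listed (an assumption)."""
    values, start = list(subgrid.values), subgrid.start
    for key, value in parse_operators(spec):
        if key == "*":
            # Duplicate the single data point V times.
            if len(values) != 1:
                raise ValueError("'*' requires a subgrid with exactly one data point")
            values = values * int(value)
        elif key == "+":
            # Shift the starting data-point index by V.
            start += int(value)
        else:
            # 'N': assumed to mean "multiply every data point by V".
            values = [v * value for v in values]
    return ToySubgrid(values, start)


print(apply_operators(ToySubgrid([2.0]), "*:3,N:0.5"))
# ToySubgrid(values=[1.0, 1.0, 1.0], start=0)
```

The `*:3` example mirrors the normalised cross-section use case described above. A single total cross-section point is replicated to match the number of data points in `COMMONDATA`.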
-C-factors should be in the format specified in [Theory data files](../data/th-data-files.html#cfactor-file-format) and stored in the nnpdfcpp +C-factors should be in the format specified in [Theory data files](../data/th-data-files.html#cfactor-file-format) and stored in the nnpdfcpp repository under ``` nnpdf/nnpdfcpp/data/N*LOCFAC/ -``` +``` directory. ### Important note on subgrid ordering -If the FK table consists of more than one subgrid to be merged into a single -table, then the ordering of the subgrids in their subgrid **id** is vital. +If the FK table consists of more than one subgrid to be merged into a single +table, then the ordering of the subgrids in their subgrid **id** is vital. The `merge_allgrids.py` script will merge the subgrids -in order of their **id**. So if one is constructing an FK table for a merged -W+/W-/Z dataset, it is crucial that the ordering of the corresponding W+/W-/Z +in order of their **id**. So if one is constructing an FK table for a merged +W+/W-/Z dataset, it is crucial that the ordering of the corresponding W+/W-/Z subgrids in id matches the ordering in `COMMONDATA`. ### Important note on committing changes -If one makes a modification to the `apfelcomb.db` database, once he is happy -with it one *must* export it to the plain-text dump file at `db/apfelcomb.dat`. +If one makes a modification to the `apfelcomb.db` database, once satisfied +with it, one *must* export it to the plain-text dump file at `db/apfelcomb.dat`. This file must then be committed. It is important to note that the binary sqlite database is not stored in the repository. -A helper script is provided to do this. If you want to convert your binary -database to the text dump, run `db/generate_dump.sh` and then commit the +A helper script is provided to do this. If you want to convert your binary +database to the text dump, run `db/generate_dump.sh` and then commit the resulting `apfelcomb.dat` file.
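The round trip between the binary `apfelcomb.db` and the text dump `apfelcomb.dat` can be sketched with Python's standard `sqlite3` module. This is a hypothetical stand-in for `db/generate_dump.sh` and `db/generate_database.sh`, whose contents are not shown here. It relies only on `Connection.iterdump()` and `Connection.executescript()`, and the function names are illustrative.

```python
import os
import sqlite3
from contextlib import closing


def dump_to_text(conn):
    """Return a plain-text SQL dump (schema and data) of an open connection."""
    return "\n".join(conn.iterdump()) + "\n"


def export_dump(db_path, dump_path):
    """Binary database -> text dump (the role played by db/generate_dump.sh)."""
    with closing(sqlite3.connect(db_path)) as conn, open(dump_path, "w") as fh:
        fh.write(dump_to_text(conn))


def regenerate_database(dump_path, db_path):
    """Text dump -> fresh binary database (the role of db/generate_database.sh)."""
    if os.path.exists(db_path):
        # The old database must be deleted first, otherwise the
        # CREATE TABLE statements in the dump would fail.
        os.remove(db_path)
    with open(dump_path) as fh:
        script = fh.read()
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executescript(script)
        conn.commit()
```

Only the text dump is meant to be committed, because it produces readable line-by-line diffs, while the binary database is regenerated locally from it.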
Also, note that, if one conversely modifies the `apfelcomb.dat` file, one has to delete and re-generate the sqlite database `apfelcomb.db` This is easily -done by running `db/generate_database.sh`. +done by running `db/generate_database.sh`. ## Helper scripts -Several helper scripts are provided to make using APFELcomb easier +Several helper scripts are provided to make using APFELcomb easier (particularly when generating a full set of FK tables for a particular theory). -- `scripts/disp_grids.py` displays a full list of APPLgrid/FastNLO, DIS or DYP subgrids +- `scripts/disp_grids.py` displays a full list of APPLgrid/FastNLO, DIS or DYP subgrids implemented in APFELcomb. -- `run_allgrids.py [theoryID] [job script]` scans the results directory and +- `run_allgrids.py [theoryID] [job script]` scans the results directory and submits jobs for all missing subgrids for the specified theory. - `test_submit.py` is an example [job script] to be used for `run\_allgrids.py`. These scripts specify how jobs are launched on a given cluster. - `hydra_submit.py` is the [job script] for the HYDRA cluster in Oxford. -- `merge_allgrids.py [theoryID]` merges all subgrids in the results directory +- `merge_allgrids.py [theoryID]` merges all subgrids in the results directory for a specified theory into final FK tables. This does not delete subgrids. -- `finalise.sh [theoryID]` runs C-factor scaling, copies `COMPOUND` files, -deletes the subgrids, and finally compresses the result into a theory.tgz file +- `finalise.sh [theoryID]` runs C-factor scaling, copies `COMPOUND` files, +deletes the subgrids, and finally compresses the result into a theory.tgz file ready for upload. -- `results/upload_theories` automatically upload to the server all the +- `results/upload_theories` automatically uploads to the server all the theory.tgz files that have been generated.
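The ordering rule applied by `merge_allgrids.py` (subgrids are merged in order of their **id**) can be illustrated with a toy sketch. Subgrids are represented here as hypothetical `(id, points)` pairs, where each `points` list stands in for the data points of one subgrid. The sketch does not reproduce the script's file handling or FK-table format.

```python
def merge_subgrids(subgrids):
    """Concatenate subgrid data points in ascending order of subgrid id.

    `subgrids` is an iterable of (subgrid_id, points) pairs; this is a toy
    model of the ordering rule, not the merge_allgrids.py implementation.
    """
    subgrids = list(subgrids)
    ids = [sid for sid, _ in subgrids]
    if len(ids) != len(set(ids)):
        raise ValueError("subgrid ids must be unique")
    merged = []
    for _, points in sorted(subgrids, key=lambda item: item[0]):
        merged.extend(points)
    return merged


# W+/W-/Z example: the ids must follow the COMMONDATA ordering.
grids = [(3, ["Z_1", "Z_2"]), (1, ["Wp_1"]), (2, ["Wm_1"])]
print(merge_subgrids(grids))  # ['Wp_1', 'Wm_1', 'Z_1', 'Z_2']
```

If the W- subgrid had been given a smaller id than the W+ one, its points would come first and no longer line up with `COMMONDATA`. This is exactly the failure the note on subgrid ordering warns about.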
## Generating a complete theory -The general workflow for generating a complete version of a given theory (on +The general workflow for generating a complete version of a given theory (on a cluster) cluster is then: ``` ./run_allgrids.py ./hydra_submit.sh # Submit all APFELcomb subgrid-jobs