Merge pull request #1069 from NNPDF/final_feature_scaling
Final feature scaling
RoyStegeman authored Mar 11, 2021
2 parents 3abbc6a + 9632a80 commit 7739359
Showing 10 changed files with 406 additions and 109 deletions.
19 changes: 19 additions & 0 deletions doc/sphinx/source/n3fit/methodology.md
@@ -56,6 +56,8 @@ In the following table we list some of the differences between both codes:
+--------------------+---------------------------------+--------------------------------------------------+
| Model selection | closure test | closure test, **hyper optimization** |
+--------------------+---------------------------------+--------------------------------------------------+
| Input scaling | (x,log(x)) | **feature scaling** |
+--------------------+---------------------------------+--------------------------------------------------+
```

In `nnfit` there is a single `seed` variable stored in the fit runcard which is used to initialize an instance of the `RandomGenerator` class which provides random numbers sequentially. The `nnfit` user has no independent control over the random number sequences used for the neural network initialization, the training-validation split and the MC replica generation. On the other hand, in `n3fit` we introduce 3 new seed variables in the fit runcard: `trvlseed` for the random numbers used in training-validation, `nnseed` for the neural network initialization and `mcseed` which controls the MC replica generation.
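
For illustration only (a hypothetical standalone sketch, not `n3fit` code), the idea is that each source of randomness is driven by its own independently seeded generator, so changing one seed leaves the other random sequences untouched:

```python
import numpy as np

# Hypothetical sketch mirroring the three runcard seeds of the example runcard
trvl_rng = np.random.default_rng(seed=1)  # training-validation split (trvlseed)
nn_rng = np.random.default_rng(seed=2)    # neural network initialization (nnseed)
mc_rng = np.random.default_rng(seed=3)    # MC replica generation (mcseed)
```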
@@ -176,3 +178,20 @@ in the `postfit` section.
``` important:: The positivity and integrability multipliers are hyper-parameters of the fit which require specific fine tuning through hyper-optimization.
```

Feature Scaling
---------------

Up to NNPDF4.0 the input to the neural network consisted of a single input node `(x)`, which in the first layer is transformed to `(x,log(x))` before being connected to the trainable layers of the network. The choice of the `(x,log(x))` split is motivated by the fact that the PDFs appear to scale linearly in the large-x region, roughly `[1e-2,1]`, while the scaling is logarithmic in the small-x region below `x=1e-2`. However, gradient descent based optimizers cannot distinguish features spread across many orders of magnitude of `x`, so this choice of input scaling limits the algorithm to learning features on a linear and a logarithmic scale.
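
As a minimal illustration (not the actual `n3fit` layer; the function name is hypothetical), the historical `(x,log(x))` split simply expands the single input node into two features:

```python
import numpy as np

def x_logx_features(x):
    """Expand a 1-D array of x values into the (x, log(x)) feature pair."""
    x = np.asarray(x, dtype=float)
    return np.stack([x, np.log(x)], axis=-1)

print(x_logx_features([1e-5, 1e-3, 1e-1, 0.5]))  # columns: x, log(x)
```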

To solve this problem, a different feature scaling can be applied to the input by adding an `interpolation_points: [number of points]` flag to the `n3fit` runcard. With this flag, the `(x,log(x))` scaling is replaced by a scaling under which all input `x` values are evenly distributed on the domain `[-1,1]`, and the input node is no longer split in two.

This scaling is discrete, while the PDFs must be continuous, so a monotonically increasing cubic spline is used to interpolate after the scaling has been applied. To this end the [PchipInterpolator](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PchipInterpolator.html) class from the scipy library is used. However, the neural network is agnostic to the existence of this interpolation function, meaning it can no longer learn the true underlying law. To account for this, the interpolation function has to be probed as well: only the `[number of points]` set by the `interpolation_points` flag are used to define the interpolation function after the scaling has been applied. With this methodology the points used in the interpolation are again evenly distributed.


``` image:: figures/feature_scaling.png
```

The figure above provides a schematic representation of this feature scaling methodology (a code sketch of the three steps follows the list):
1. The input `x` values are mapped onto the `[-1,1]` domain such that they are evenly distributed.
2. `[number of points]` points are kept (dark blue), while the remaining points are discarded (light blue).
3. A cubic spline is used to interpolate between the points that have not been discarded.
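
The following is a minimal sketch of these three steps on a toy `x` grid (the grid and all variable names are illustrative, and the `interpolation_points: 40` value is taken from the example runcard; the actual implementation resides in `n3fit`'s `ModelTrainer`):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Toy x grid standing in for the union of all input x values
xgrid = np.unique(np.random.default_rng(0).uniform(1e-5, 1.0, size=200))

# 1. Map the sorted x values onto equally spaced targets in [-1,1]
targets = np.linspace(-1.0, 1.0, xgrid.size)

# 2. Keep only `interpolation_points` of them (the dark blue points)
interpolation_points = 40
kept = np.linspace(0, xgrid.size - 1, interpolation_points).astype(int)

# 3. Monotonically increasing cubic (PCHIP) spline between the kept points,
#    built in log(x) so the small-x region is well resolved
scaler = PchipInterpolator(np.log(xgrid[kept]), targets[kept])

scaled_input = scaler(np.log(xgrid))  # continuous scaled input for the network
```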
113 changes: 113 additions & 0 deletions n3fit/runcards/Basic_feature_scaling.yml
@@ -0,0 +1,113 @@
#
# Configuration file for n3fit
#

############################################################
description: Basic feature scaling

############################################################
# frac: training fraction
# ewk: apply ewk k-factors
# sys: systematics treatment (see systypes)
dataset_inputs:
- { dataset: SLACP, frac: 0.5}
- { dataset: NMCPD, frac: 0.5 }
- { dataset: CMSJETS11, frac: 0.5, sys: 10 }

############################################################
datacuts:
t0pdfset : NNPDF31_nlo_as_0118 # PDF set to generate t0 covmat
q2min : 3.49 # Q2 minimum
w2min : 12.5 # W2 minimum
combocuts : NNPDF31 # NNPDF3.0 final kin. cuts
jetptcut_tev : 0 # jet pt cut for tevatron
jetptcut_lhc : 0 # jet pt cut for lhc
wptcut_lhc : 30.0 # Minimum pT for W pT diff distributions
jetycut_tev : 1e30 # jet rap. cut for tevatron
jetycut_lhc : 1e30 # jet rap. cut for lhc
dymasscut_min: 0 # dy inv.mass. min cut
dymasscut_max: 1e30 # dy inv.mass. max cut
jetcfactcut : 1e30 # jet cfact. cut

############################################################
theory:
theoryid: 53 # database id

############################################################
fitting:
trvlseed: 1
nnseed: 2
mcseed: 3
epochs: 900
save: 'weights.h5'
# load: '/path/to/weights.h5/file'


tensorboard:
weight_freq: 100
profiling: False

genrep : True # true = generate MC replicas, false = use real data

parameters: # This defines the parameter dictionary that is passed to the Model Trainer
nodes_per_layer: [15, 10, 8]
activation_per_layer: ['sigmoid', 'sigmoid', 'linear']
initializer: 'glorot_normal'
optimizer:
optimizer_name: 'RMSprop'
learning_rate: 0.01
clipnorm: 1.0
epochs: 900
positivity:
multiplier: 1.05 # When any of the multiplier and/or the initial is not set
initial: # the poslambda will be used instead to compute these values per dataset
threshold: 1e-5
stopping_patience: 0.30 # percentage of the number of epochs
layer_type: 'dense'
dropout: 0.0
interpolation_points: 40
threshold_chi2: 5.0

# NN23(QED) = sng=0,g=1,v=2,t3=3,ds=4,sp=5,sm=6,(pht=7)
# EVOL(QED) = sng=0,g=1,v=2,v3=3,v8=4,t3=5,t8=6,(pht=7)
# EVOLS(QED)= sng=0,g=1,v=2,v8=4,t3=4,t8=5,ds=6,(pht=7)
# FLVR(QED) = g=0, u=1, ubar=2, d=3, dbar=4, s=5, sbar=6, (pht=7)
fitbasis: NN31IC # EVOL (7), EVOLQED (8), etc.
basis:
# remember to change the name of PDF accordingly with fitbasis
# pos: True for NN squared
# mutsize: mutation size
# mutprob: mutation probability
# smallx, largex: preprocessing ranges
- { fl: sng, pos: False, mutsize: [15], mutprob: [0.05], smallx: [1.05,1.19], largex: [1.47,2.70], trainable: False }
- { fl: g, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.94,1.25], largex: [0.11,5.87], trainable: False }
- { fl: v, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.54,0.75], largex: [1.15,2.76], trainable: False }
- { fl: v3, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.21,0.57], largex: [1.35,3.08] }
- { fl: v8, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.52,0.76], largex: [0.77,3.56], trainable: True }
- { fl: t3, pos: False, mutsize: [15], mutprob: [0.05], smallx: [-0.37,1.52], largex: [1.74,3.39] }
- { fl: t8, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.56,1.29], largex: [1.45,3.03] }
- { fl: cp, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.12,1.19], largex: [1.83,6.70] }

############################################################
positivity:
posdatasets:
- { dataset: POSF2U, poslambda: 1e6 } # Positivity Lagrange Multiplier
- { dataset: POSFLL, poslambda: 1e4 }

############################################################
integrability:
integdatasets:
- {dataset: INTEGXT3, poslambda: 1e2}

############################################################
lhagrid:
nx : 150
xmin: 1e-9
xmed: 0.1
xmax: 1.0
nq : 50
qmax: 1e5

############################################################
debug: True
maxcores: 8
81 changes: 70 additions & 11 deletions n3fit/src/n3fit/ModelTrainer.py
@@ -10,6 +10,7 @@
"""
import logging
import numpy as np
from scipy.interpolate import PchipInterpolator
import n3fit.model_gen as model_gen
from n3fit.backends import MetaModel, clear_backend_state, operations, callbacks
from n3fit.stopping import Stopping
@@ -213,6 +214,7 @@ def __init__(
self.debug = debug
self.save_weights_each = save_weights_each
self.all_datasets = []
self._scaler = None

# Initialise internal variables which define behaviour
if debug:
@@ -394,18 +396,20 @@ def _model_generation(self, pdf_model, partition):
"""
log.info("Generating the Model")

# Compute the input array that will be given to the pdf
input_arr = np.concatenate(self.input_list, axis=1)
input_layer = operations.numpy_to_input(input_arr.T)
# Construct the input array that will be given to the pdf
input_arr = np.concatenate(self.input_list, axis=1).T
if self._scaler:
# Apply feature scaling if given
input_arr = self._scaler(input_arr)
input_layer = operations.numpy_to_input(input_arr)

# The input to the full model is expected to be the input to the PDF
# by reusing `pdf_model.parse_input` we ensure any auxiliary input is also accounted for
full_model_input_dict = pdf_model._parse_input([input_layer], pass_content=False)
# The input to the full model also works as the input to the PDF model
# The PDF model is then applied as a layer (receiving the full_pdf layer)
# we also need to receive full_model_input_dict, a dictionary with the full input to the PDF
full_model_input_dict, full_pdf = pdf_model.apply_as_layer([input_layer])

# The output of the pdf on input_layer will be thus a concatenation
# of the PDF values for all experiments
full_pdf = pdf_model.apply_as_layer([input_layer])
# The input layer is a concatenation of all experiments
# The input layer was a concatenation of all experiments
# the output of the pdf on input_layer will be thus a concatenation
# we need now to split the output on a different array per experiment
sp_ar = [self.input_sizes]
sp_kw = {"axis": 1}
@@ -471,7 +475,13 @@ def _reset_observables(self):
# wrapper around all of these #
############################################################################
def _generate_observables(
self, all_pos_multiplier, all_pos_initial, all_integ_multiplier, all_integ_initial, epochs
self,
all_pos_multiplier,
all_pos_initial,
all_integ_multiplier,
all_integ_initial,
epochs,
interpolation_points,
):
"""
This functions fills the 3 dictionaries (training, validation, experimental)
@@ -567,6 +577,53 @@ def _generate_observables(
self.training["integmultipliers"].append(integ_multiplier)
self.training["integinitials"].append(integ_initial)

# Store a reference to the interpolator as self._scaler
if interpolation_points:
input_arr = np.concatenate(self.input_list, axis=1)
input_arr = np.sort(input_arr)
input_arr_size = input_arr.size

# Define an evenly spaced grid in the domain [0,1]
# force_set_smallest is used to make sure the smallest point included in the scaling is
# 1e-9, to prevent trouble when saving it to the LHAPDF grid
force_set_smallest = input_arr.min() > 1e-9
if force_set_smallest:
new_xgrid = np.linspace(start=1/input_arr_size, stop=1.0, endpoint=False, num=input_arr_size)
else:
new_xgrid = np.linspace(start=0, stop=1.0, endpoint=False, num=input_arr_size)

# When mapping the FK xgrids onto our new grid, we need to consider degeneracies among
# the x-values in the FK grids
unique, counts = np.unique(input_arr, return_counts=True)
map_to_complete = []
for cumsum_ in np.cumsum(counts):
# Make sure to include the smallest new_xgrid value, such that we have a point at
# x<=1e-9
map_to_complete.append(new_xgrid[cumsum_ - counts[0]])
map_to_complete = np.array(map_to_complete)
map_from_complete = unique

# If needed, set feature_scaling(x=1e-9)=0
if force_set_smallest:
map_from_complete = np.insert(map_from_complete, 0, 1e-9)
map_to_complete = np.insert(map_to_complete, 0, 0.0)

# Select the indices of the points that will be used by the interpolator
onein = map_from_complete.size / (int(interpolation_points) - 1)
selected_points = [round(i * onein - 1) for i in range(1, int(interpolation_points))]
if selected_points[0] != 0:
selected_points = [0] + selected_points
map_from = map_from_complete[selected_points]
map_from = np.log(map_from)
map_to = map_to_complete[selected_points]

try:
scaler = PchipInterpolator(map_from, map_to)
except ValueError:
raise ValueError("interpolation_points is larger than the number of unique \
input x-values")
self._scaler = lambda x: np.concatenate([scaler(np.log(x)), x], axis=-1)

def _generate_pdf(
self,
nodes_per_layer,
@@ -626,6 +683,7 @@ def _generate_pdf(
regularizer=regularizer,
regularizer_args=regularizer_args,
impose_sumrule=self.impose_sumrule,
scaler=self._scaler,
)
return pdf_model

@@ -795,6 +853,7 @@ def hyperparametrizable(self, params):
integrability_dict.get("multiplier"),
integrability_dict.get("initial"),
epochs,
params.get("interpolation_points"),
)
threshold_pos = positivity_dict.get("threshold", 1e-6)
threshold_chi2 = params.get("threshold_chi2", CHI2_THRESHOLD)