Merge pull request #1069 from NNPDF/final_feature_scaling
Final feature scaling
RoyStegeman authored Mar 11, 2021
2 parents 3abbc6a + 9632a80 commit 7739359
Showing 10 changed files with 406 additions and 109 deletions.
19 changes: 19 additions & 0 deletions doc/sphinx/source/n3fit/methodology.md
@@ -56,6 +56,8 @@ In the following table we list some of the differences between both codes:
+--------------------+---------------------------------+--------------------------------------------------+
| Model selection | closure test | closure test, **hyper optimization** |
+--------------------+---------------------------------+--------------------------------------------------+
| Input scaling | (x,log(x)) | **feature scaling** |
+--------------------+---------------------------------+--------------------------------------------------+
```

In `nnfit` there is a single `seed` variable stored in the fit runcard which is used to initialize an instance of the `RandomGenerator` class which provides random numbers sequentially. The `nnfit` user has no independent control over the random number sequences used for the neural network initialization, the training-validation split and the MC replica generation. On the other hand, in `n3fit` we introduce 3 new seed variables in the fit runcard: `trvlseed` for the random numbers used in training-validation, `nnseed` for the neural network initialization and `mcseed` which controls the MC replica generation.
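
For illustration only (a hypothetical standalone sketch, not `n3fit` code), the idea is that each source of randomness is driven by its own independently seeded generator, so changing one seed leaves the other random sequences untouched:

```python
import numpy as np

# Hypothetical sketch mirroring the three runcard seeds of the example runcard
trvl_rng = np.random.default_rng(seed=1)  # training-validation split (trvlseed)
nn_rng = np.random.default_rng(seed=2)    # neural network initialization (nnseed)
mc_rng = np.random.default_rng(seed=3)    # MC replica generation (mcseed)
```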
@@ -176,3 +178,20 @@ in the `postfit` section.
``` important:: The positivity and integrability multipliers are hyper-parameters of the fit which require specific fine tuning through hyper-optimization.
```

Feature Scaling
---------------

Up to NNPDF4.0 the input to the neural network consisted of a single input node `(x)`, which in the first layer is transformed to `(x,log(x))` before being connected to the trainable layers of the network. The choice of the `(x,log(x))` split is motivated by the fact that the PDFs appear to scale linearly in the large-x region, roughly `[1e-2,1]`, while the scaling is logarithmic in the small-x region below `x=1e-2`. However, gradient descent based optimizers cannot distinguish features spread across many orders of magnitude of `x`, so this choice of input scaling limits the algorithm to learning features on a linear and a logarithmic scale.
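
As a minimal illustration (not the actual `n3fit` layer; the function name is hypothetical), the historical `(x,log(x))` split simply expands the single input node into two features:

```python
import numpy as np

def x_logx_features(x):
    """Expand a 1-D array of x values into the (x, log(x)) feature pair."""
    x = np.asarray(x, dtype=float)
    return np.stack([x, np.log(x)], axis=-1)

print(x_logx_features([1e-5, 1e-3, 1e-1, 0.5]))  # columns: x, log(x)
```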

To solve this problem, a different feature scaling can be applied to the input by adding an `interpolation_points: [number of points]` flag to the `n3fit` runcard. With this flag, the `(x,log(x))` scaling is replaced by a scaling under which all input `x` values are evenly distributed on the domain `[-1,1]`, and the input node is no longer split in two.

This scaling is discrete, while the PDFs must be continuous, so a monotonically increasing cubic spline is used to interpolate after the scaling has been applied. To this end the [PchipInterpolator](https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PchipInterpolator.html) class from the scipy library is used. However, the neural network is agnostic to the existence of this interpolation function, meaning it can no longer learn the true underlying law. To account for this, the interpolation function has to be probed as well: only the `[number of points]` set by the `interpolation_points` flag are used to define the interpolation function after the scaling has been applied. With this methodology the points used in the interpolation are again evenly distributed.


``` image:: figures/feature_scaling.png
```

The figure above provides a schematic representation of this feature scaling methodology (a code sketch of the three steps follows the list):
1. The input `x` values are mapped onto the `[-1,1]` domain such that they are evenly distributed.
2. `[number of points]` points are kept (dark blue), while the remaining points are discarded (light blue).
3. A cubic spline is used to interpolate between the points that have not been discarded.
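
The following is a minimal sketch of these three steps on a toy `x` grid (the grid and all variable names are illustrative, and the `interpolation_points: 40` value is taken from the example runcard; the actual implementation resides in `n3fit`'s `ModelTrainer`):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Toy x grid standing in for the union of all input x values
xgrid = np.unique(np.random.default_rng(0).uniform(1e-5, 1.0, size=200))

# 1. Map the sorted x values onto equally spaced targets in [-1,1]
targets = np.linspace(-1.0, 1.0, xgrid.size)

# 2. Keep only `interpolation_points` of them (the dark blue points)
interpolation_points = 40
kept = np.linspace(0, xgrid.size - 1, interpolation_points).astype(int)

# 3. Monotonically increasing cubic (PCHIP) spline between the kept points,
#    built in log(x) so the small-x region is well resolved
scaler = PchipInterpolator(np.log(xgrid[kept]), targets[kept])

scaled_input = scaler(np.log(xgrid))  # continuous scaled input for the network
```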
113 changes: 113 additions & 0 deletions n3fit/runcards/Basic_feature_scaling.yml
@@ -0,0 +1,113 @@
#
# Configuration file for n3fit
#

############################################################
description: Basic feature scaling

############################################################
# frac: training fraction
# ewk: apply ewk k-factors
# sys: systematics treatment (see systypes)
dataset_inputs:
- { dataset: SLACP, frac: 0.5}
- { dataset: NMCPD, frac: 0.5 }
- { dataset: CMSJETS11, frac: 0.5, sys: 10 }

############################################################
datacuts:
t0pdfset : NNPDF31_nlo_as_0118 # PDF set to generate t0 covmat
q2min : 3.49 # Q2 minimum
w2min : 12.5 # W2 minimum
combocuts : NNPDF31 # NNPDF3.0 final kin. cuts
jetptcut_tev : 0 # jet pt cut for tevatron
jetptcut_lhc : 0 # jet pt cut for lhc
wptcut_lhc : 30.0 # Minimum pT for W pT diff distributions
jetycut_tev : 1e30 # jet rap. cut for tevatron
jetycut_lhc : 1e30 # jet rap. cut for lhc
dymasscut_min: 0 # dy inv.mass. min cut
dymasscut_max: 1e30 # dy inv.mass. max cut
jetcfactcut : 1e30 # jet cfact. cut

############################################################
theory:
theoryid: 53 # database id

############################################################
fitting:
trvlseed: 1
nnseed: 2
mcseed: 3
epochs: 900
save: 'weights.h5'
# load: '/path/to/weights.h5/file'


tensorboard:
weight_freq: 100
profiling: False

genrep : True # true = generate MC replicas, false = use real data

parameters: # This defines the parameter dictionary that is passed to the Model Trainer
nodes_per_layer: [15, 10, 8]
activation_per_layer: ['sigmoid', 'sigmoid', 'linear']
initializer: 'glorot_normal'
optimizer:
optimizer_name: 'RMSprop'
learning_rate: 0.01
clipnorm: 1.0
epochs: 900
positivity:
multiplier: 1.05 # When any of the multiplier and/or the initial is not set
initial: # the poslambda will be used instead to compute these values per dataset
threshold: 1e-5
stopping_patience: 0.30 # percentage of the number of epochs
layer_type: 'dense'
dropout: 0.0
interpolation_points: 40
threshold_chi2: 5.0

# NN23(QED) = sng=0,g=1,v=2,t3=3,ds=4,sp=5,sm=6,(pht=7)
# EVOL(QED) = sng=0,g=1,v=2,v3=3,v8=4,t3=5,t8=6,(pht=7)
# EVOLS(QED)= sng=0,g=1,v=2,v8=4,t3=4,t8=5,ds=6,(pht=7)
# FLVR(QED) = g=0, u=1, ubar=2, d=3, dbar=4, s=5, sbar=6, (pht=7)
fitbasis: NN31IC # EVOL (7), EVOLQED (8), etc.
basis:
# remember to change the name of PDF accordingly with fitbasis
# pos: True for NN squared
# mutsize: mutation size
# mutprob: mutation probability
# smallx, largex: preprocessing ranges
- { fl: sng, pos: False, mutsize: [15], mutprob: [0.05], smallx: [1.05,1.19], largex: [1.47,2.70], trainable: False }
- { fl: g, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.94,1.25], largex: [0.11,5.87], trainable: False }
- { fl: v, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.54,0.75], largex: [1.15,2.76], trainable: False }
- { fl: v3, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.21,0.57], largex: [1.35,3.08] }
- { fl: v8, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.52,0.76], largex: [0.77,3.56], trainable: True }
- { fl: t3, pos: False, mutsize: [15], mutprob: [0.05], smallx: [-0.37,1.52], largex: [1.74,3.39] }
- { fl: t8, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.56,1.29], largex: [1.45,3.03] }
- { fl: cp, pos: False, mutsize: [15], mutprob: [0.05], smallx: [0.12,1.19], largex: [1.83,6.70] }

############################################################
positivity:
posdatasets:
- { dataset: POSF2U, poslambda: 1e6 } # Positivity Lagrange Multiplier
- { dataset: POSFLL, poslambda: 1e4 }

############################################################
integrability:
integdatasets:
- {dataset: INTEGXT3, poslambda: 1e2}

############################################################
lhagrid:
nx : 150
xmin: 1e-9
xmed: 0.1
xmax: 1.0
nq : 50
qmax: 1e5

############################################################
debug: True
maxcores: 8
81 changes: 70 additions & 11 deletions n3fit/src/n3fit/ModelTrainer.py
@@ -10,6 +10,7 @@
"""
import logging
import numpy as np
from scipy.interpolate import PchipInterpolator
import n3fit.model_gen as model_gen
from n3fit.backends import MetaModel, clear_backend_state, operations, callbacks
from n3fit.stopping import Stopping
@@ -213,6 +214,7 @@ def __init__(
self.debug = debug
self.save_weights_each = save_weights_each
self.all_datasets = []
self._scaler = None

# Initialise internal variables which define behaviour
if debug:
@@ -394,18 +396,20 @@ def _model_generation(self, pdf_model, partition):
"""
log.info("Generating the Model")

# Compute the input array that will be given to the pdf
input_arr = np.concatenate(self.input_list, axis=1)
input_layer = operations.numpy_to_input(input_arr.T)
# Construct the input array that will be given to the pdf
input_arr = np.concatenate(self.input_list, axis=1).T
if self._scaler:
# Apply feature scaling if given
input_arr = self._scaler(input_arr)
input_layer = operations.numpy_to_input(input_arr)

# The input to the full model is expected to be the input to the PDF
# by reusing `pdf_model.parse_input` we ensure any auxiliary input is also accounted for
full_model_input_dict = pdf_model._parse_input([input_layer], pass_content=False)
# The input to the full model also works as the input to the PDF model
# The PDF model is then applied as a layer (receiving the full_pdf layer)
# we also need to receive full_model_input_dict, a dictionary with the full input to the PDF
full_model_input_dict, full_pdf = pdf_model.apply_as_layer([input_layer])

# The output of the pdf on input_layer will be thus a concatenation
# of the PDF values for all experiments
full_pdf = pdf_model.apply_as_layer([input_layer])
# The input layer is a concatenation of all experiments
# The input layer was a concatenation of all experiments
# the output of the pdf on input_layer will be thus a concatenation
# we need now to split the output on a different array per experiment
sp_ar = [self.input_sizes]
sp_kw = {"axis": 1}
@@ -471,7 +475,13 @@ def _reset_observables(self):
# wrapper around all of these #
############################################################################
def _generate_observables(
self, all_pos_multiplier, all_pos_initial, all_integ_multiplier, all_integ_initial, epochs
self,
all_pos_multiplier,
all_pos_initial,
all_integ_multiplier,
all_integ_initial,
epochs,
interpolation_points,
):
"""
This functions fills the 3 dictionaries (training, validation, experimental)
@@ -567,6 +577,53 @@ def _generate_observables(
self.training["integmultipliers"].append(integ_multiplier)
self.training["integinitials"].append(integ_initial)

# Store a reference to the interpolator as self._scaler
if interpolation_points:
input_arr = np.concatenate(self.input_list, axis=1)
input_arr = np.sort(input_arr)
input_arr_size = input_arr.size

# Define an evenly spaced grid in the domain [0,1]
# force_set_smallest is used to make sure the smallest point included in the scaling is
# 1e-9, to prevent trouble when saving it to the LHAPDF grid
force_set_smallest = input_arr.min() > 1e-9
if force_set_smallest:
new_xgrid = np.linspace(start=1/input_arr_size, stop=1.0, endpoint=False, num=input_arr_size)
else:
new_xgrid = np.linspace(start=0, stop=1.0, endpoint=False, num=input_arr_size)

# When mapping the FK xgrids onto our new grid, we need to consider degeneracies among
# the x-values in the FK grids
unique, counts = np.unique(input_arr, return_counts=True)
map_to_complete = []
for cumsum_ in np.cumsum(counts):
# Make sure to include the smallest new_xgrid value, such that we have a point at
# x<=1e-9
map_to_complete.append(new_xgrid[cumsum_ - counts[0]])
map_to_complete = np.array(map_to_complete)
map_from_complete = unique

# If needed, set feature_scaling(x=1e-9)=0
if force_set_smallest:
map_from_complete = np.insert(map_from_complete, 0, 1e-9)
map_to_complete = np.insert(map_to_complete, 0, 0.0)

# Select the indices of the points that will be used by the interpolator
onein = map_from_complete.size / (int(interpolation_points) - 1)
selected_points = [round(i * onein - 1) for i in range(1, int(interpolation_points))]
if selected_points[0] != 0:
selected_points = [0] + selected_points
map_from = map_from_complete[selected_points]
map_from = np.log(map_from)
map_to = map_to_complete[selected_points]

try:
scaler = PchipInterpolator(map_from, map_to)
except ValueError:
raise ValueError("interpolation_points is larger than the number of unique \
input x-values")
self._scaler = lambda x: np.concatenate([scaler(np.log(x)), x], axis=-1)

def _generate_pdf(
self,
nodes_per_layer,
@@ -626,6 +683,7 @@ def _generate_pdf(
regularizer=regularizer,
regularizer_args=regularizer_args,
impose_sumrule=self.impose_sumrule,
scaler=self._scaler,
)
return pdf_model

@@ -795,6 +853,7 @@ def hyperparametrizable(self, params):
integrability_dict.get("multiplier"),
integrability_dict.get("initial"),
epochs,
params.get("interpolation_points"),
)
threshold_pos = positivity_dict.get("threshold", 1e-6)
threshold_chi2 = params.get("threshold_chi2", CHI2_THRESHOLD)