Commit
* add template for twcrps
* add crps for logistic distribution
* add numba functionality to compute outcome-weighted crps
* tidy documentation of owcrps_ensemble
* add numba functionality to compute threshold-weighted crps
* add equations to documentation of weighted crps
* add axis argument to owcrps_ensemble and twcrps_ensemble numba functions
* add vertically re-scaled crps numba functionality
* add vrcrps_ensemble to scoringrules init
* add weighted crps for api backends
* add weighted energy scores numba functionality
* add weighted variogram scores numba functionality
* add api functionality for threshold-weighted energy and variogram scores
* add markdown files for weighted scoring rule documentation
* change gufuncs to avoid numba warnings in weighted scores
* change indicator function latex code in weighted crps docstrings
* add tests for weighted crps
* add tests for vertically re-scaled crps
* add tests for weighted energy score
* add tests for weighted variogram scores
* add documentation for energy and variogram scores, and weighted versions
* add api functionality to compute outcome-weighted and vertically re-scaled energy score
* add api functionality to compute outcome-weighted and vertically re-scaled variogram scores
* fix bug with weight function in outcome weighted and vertically rescaled crps
* change order of dimension inputs in variogram score to match energy score
* fix bugs in weighted multivariate scores with numpy backend
Showing 25 changed files with 2,002 additions and 76 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
::: scoringrules.owcrps_ensemble | ||
::: scoringrules.twcrps_ensemble | ||
::: scoringrules.vrcrps_ensemble |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: scoringrules.energy_score |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Energy Score | ||
|
||
The energy score (ES) is a scoring rule for evaluating multivariate probabilistic forecasts. | ||
It is defined as | ||
|
||
$$\text{ES}(F, \mathbf{y})= \mathbb{E} \| \mathbf{X} - \mathbf{y} \| - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \|, $$ | ||
|
||
where $\mathbf{y} \in \mathbb{R}^{d}$ is the multivariate observation ($d > 1$), and | ||
$\mathbf{X}$ and $\mathbf{X}^{\prime}$ are independent random variables that follow the | ||
multivariate forecast distribution $F$ (Gneiting and Raftery, 2007)[@gneiting_strictly_2007]. | ||
If the dimension $d$ were equal to one, the energy score would reduce to the continuous ranked probability score (CRPS). | ||
|
||
<br/><br/> | ||
|
||
## Ensemble forecasts | ||
|
||
While multivariate probabilistic forecasts could belong to a parametric family of | ||
distributions, such as a multivariate normal distribution, it is more common in practice | ||
that these forecasts are ensemble forecasts; that is, the forecast consists of a | ||
predictive sample $\mathbf{x}_{1}, \dots, \mathbf{x}_{M}$, | ||
where each ensemble member $\mathbf{x}_{m} \in \mathbb{R}^{d}$ for $m = 1, \dots, M$. | ||
|
||
In this case, the expectations in the definition of the energy score can be replaced by | ||
sample means over the ensemble members, yielding the following representation of the energy | ||
score when evaluating an ensemble forecast $F_{ens}$ with $M$ members: | ||
|
||
$$\text{ES}(F_{ens}, \mathbf{y})= \frac{1}{M} \sum_{m=1}^{M} \| \mathbf{x}_{m} - \mathbf{y} \| - \frac{1}{2 M^{2}} \sum_{m=1}^{M} \sum_{j=1}^{M} \| \mathbf{x}_{m} - \mathbf{x}_{j} \|. $$ | ||
|
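The ensemble formula above can be sketched in a few lines of plain Python. This is only an illustration of the equation, not the `scoringrules` implementation; the function name `energy_score_ensemble` and the toy data are assumptions made for this example.

```python
import math


def energy_score_ensemble(members, obs):
    """Ensemble energy score (hypothetical helper, not the library API)."""
    m = len(members)
    # First term: mean Euclidean distance between each member and the observation.
    term_obs = sum(math.dist(x, obs) for x in members) / m
    # Second term: half the mean distance over all M^2 ordered pairs of members.
    term_spread = sum(math.dist(x, z) for x in members for z in members) / (2 * m**2)
    return term_obs - term_spread


# Toy example: two members in d = 2 dimensions.
members = [(0.0, 0.0), (2.0, 0.0)]
obs = (1.0, 0.0)
score = energy_score_ensemble(members, obs)
print(score)
```

For this toy ensemble the first term is 1 and the second term is 0.5, so the score is 0.5.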
||
<br/><br/> | ||
|
||
## Weighted energy scores | ||
|
||
The energy score provides a measure of overall forecast performance. However, it is often | ||
the case that certain outcomes are of more interest than others, making it desirable to | ||
assign more weight to these outcomes when evaluating forecast performance. This can be | ||
achieved using weighted scoring rules. Weighted scoring rules typically introduce a | ||
weight function into conventional scoring rules, and users can choose the weight function | ||
depending on what outcomes they want to emphasise. Allen et al. (2022)[@allen2022evaluating] | ||
discuss three weighted versions of the energy score. These are all available in `scoringrules`. | ||
|
||
Firstly, the outcome-weighted energy score (originally introduced by Holzmann and Klar (2014)[@holzmann2017focusing]) | ||
is defined as | ||
|
||
$$\text{owES}(F, \mathbf{y}; w)= \frac{1}{\bar{w}} \mathbb{E} \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) - \frac{1}{2 \bar{w}^{2}} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X})w(\mathbf{X}^{\prime})w(\mathbf{y}), $$ | ||
|
||
where $w : \mathbb{R}^{d} \to [0, \infty)$ is the non-negative weight function used to | ||
target particular multivariate outcomes, and $\bar{w} = \mathbb{E}[w(\mathbf{X})]$. | ||
As before, $\mathbf{X}, \mathbf{X}^{\prime} \sim F$ are independent. | ||
|
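As a rough sketch, the outcome-weighted energy score of an ensemble can be obtained by replacing each expectation in the definition above with a sample mean. The helper names `ow_energy_score` and `exceeds_one`, and the decision to raise an error when $\bar{w} = 0$, are assumptions for this example and are not taken from the library.

```python
import math


def ow_energy_score(members, obs, w):
    """Outcome-weighted energy score for an ensemble (hypothetical helper)."""
    m = len(members)
    wx = [w(x) for x in members]  # weights of the ensemble members
    wy = w(obs)                   # weight of the observation
    w_bar = sum(wx) / m           # sample estimate of E[w(X)]
    if w_bar == 0:
        # Assumption of this sketch: treat the all-zero-weight case as an error.
        raise ValueError("all ensemble members have zero weight")
    term_obs = sum(math.dist(x, obs) * wi for x, wi in zip(members, wx))
    term_spread = sum(
        math.dist(xi, xj) * wi * wj
        for xi, wi in zip(members, wx)
        for xj, wj in zip(members, wx)
    )
    return term_obs * wy / (m * w_bar) - term_spread * wy / (2 * m**2 * w_bar**2)


def exceeds_one(z):
    """Illustrative weight: 1 if the first component exceeds 1, else 0."""
    return 1.0 if z[0] > 1.0 else 0.0


members = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
obs = (3.0, 0.0)
unweighted = ow_energy_score(members, obs, lambda z: 1.0)
weighted = ow_energy_score(members, obs, exceeds_one)
```

With a constant weight of one the score is the ordinary ensemble energy score. With the indicator weight, the member at the origin gets zero weight, and the result is the energy score of the remaining two members.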
||
Secondly, Allen et al. (2022) introduced the threshold-weighted energy score as | ||
|
||
$$\text{twES}(F, \mathbf{y}; v)= \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{y}) \| - \frac{1}{2} \mathbb{E} \| v(\mathbf{X}) - v(\mathbf{X}^{\prime}) \|, $$ | ||
|
||
where $v : \mathbb{R}^{d} \to \mathbb{R}^{d}$ is a so-called chaining function. | ||
The threshold-weighted energy score transforms the forecasts and observations according | ||
to the chaining function $v$, prior to calculating the unweighted energy score. Choosing | ||
a chaining function is generally more difficult than choosing a weight function when | ||
emphasising particular outcomes. | ||
|
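Because the threshold-weighted energy score is the unweighted score applied to transformed data, a sketch only needs a chaining function and the plain ensemble energy score. The names `tw_energy_score` and `make_max_chaining` are hypothetical, and the component-wise maximum used here is one illustrative choice of $v$, assumed for this example.

```python
import math


def energy_score(members, obs):
    """Plain ensemble energy score, with sample means in place of expectations."""
    m = len(members)
    term_obs = sum(math.dist(x, obs) for x in members) / m
    term_spread = sum(math.dist(x, z) for x in members for z in members) / (2 * m**2)
    return term_obs - term_spread


def tw_energy_score(members, obs, v):
    """Apply the chaining function v to members and observation, then score."""
    return energy_score([v(x) for x in members], v(obs))


def make_max_chaining(thresholds):
    """Illustrative chaining function: v(z)_i = max(z_i, t_i)."""
    def v(z):
        return tuple(max(zi, ti) for zi, ti in zip(z, thresholds))
    return v


members = [(0.0, 0.0), (2.0, 0.0)]
obs = (1.0, 0.0)
plain = tw_energy_score(members, obs, lambda z: z)  # identity chaining
chained = tw_energy_score(members, obs, make_max_chaining((1.0, 0.0)))
```

With the identity chaining function the result is the unweighted energy score. The maximum-based chaining function moves every component below its threshold up to the threshold, so differences below the threshold no longer contribute.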
||
As an alternative, the vertically re-scaled energy score is defined as | ||
|
||
$$\text{vrES}(F, \mathbf{y}; w, \mathbf{x}_{0})= \mathbb{E} \| \mathbf{X} - \mathbf{y} \| w(\mathbf{X}) w(\mathbf{y}) - \frac{1}{2} \mathbb{E} \| \mathbf{X} - \mathbf{X}^{\prime} \| w(\mathbf{X})w(\mathbf{X}^{\prime}) + \left( \mathbb{E} \| \mathbf{X} - \mathbf{x}_{0} \| w(\mathbf{X}) - \| \mathbf{y} - \mathbf{x}_{0} \| w(\mathbf{y}) \right) \left(\mathbb{E}[w(\mathbf{X})] - w(\mathbf{y}) \right), $$ | ||
|
||
where $w : \mathbb{R}^{d} \to [0, \infty)$ is the non-negative weight function used to | ||
target particular multivariate outcomes, and $\mathbf{x}_{0} \in \mathbb{R}^{d}$. Typically, | ||
$\mathbf{x}_{0}$ is chosen to be the zero vector. | ||
|
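The vertically re-scaled energy score can be sketched for an ensemble in the same way, term by term from the definition above. The name `vr_energy_score`, the default of a zero vector for `x0`, and the toy weight function are assumptions for illustration only.

```python
import math


def vr_energy_score(members, obs, w, x0=None):
    """Vertically re-scaled energy score for an ensemble (hypothetical helper)."""
    m = len(members)
    if x0 is None:
        x0 = (0.0,) * len(obs)  # assumed default: the zero vector
    wx = [w(x) for x in members]
    wy = w(obs)
    w_bar = sum(wx) / m  # sample estimate of E[w(X)]
    # E[ ||X - y|| w(X) w(y) ]
    term_obs = sum(math.dist(x, obs) * wi for x, wi in zip(members, wx)) * wy / m
    # (1/2) E[ ||X - X'|| w(X) w(X') ]
    term_spread = sum(
        math.dist(xi, xj) * wi * wj
        for xi, wi in zip(members, wx)
        for xj, wj in zip(members, wx)
    ) / (2 * m**2)
    # E[ ||X - x0|| w(X) ] - ||y - x0|| w(y)
    centre = sum(math.dist(x, x0) * wi for x, wi in zip(members, wx)) / m
    centre -= math.dist(obs, x0) * wy
    return term_obs - term_spread + centre * (w_bar - wy)


def exceeds_one(z):
    """Illustrative weight: 1 if the first component exceeds 1, else 0."""
    return 1.0 if z[0] > 1.0 else 0.0


members = [(0.0, 0.0), (2.0, 0.0)]
constant = vr_energy_score(members, (1.0, 0.0), lambda z: 1.0)
weighted = vr_energy_score(members, (3.0, 0.0), exceeds_one)
```

When the weight is constant and equal to one, the final product vanishes and the score equals the unweighted energy score.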
||
Each of these weighted energy scores targets particular outcomes in a different way. | ||
Further details regarding the differences between these scoring rules, as well as choices | ||
for the weight and chaining functions, can be found in Allen et al. (2022). The weighted | ||
energy scores can easily be computed for ensemble forecasts by | ||
replacing the expectations with sample means over the ensemble members. | ||
|
||
<br/><br/> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
::: scoringrules.owenergy_score | ||
::: scoringrules.twenergy_score | ||
::: scoringrules.vrenergy_score |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: scoringrules.variogram_score |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
# Variogram Score | ||
|
||
The variogram score (VS) is a scoring rule for evaluating multivariate probabilistic forecasts. | ||
It is defined as | ||
|
||
$$\text{VS}_{p}(F, \mathbf{y})= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \mathbb{E} | X_{i} - X_{j} |^{p} - | y_{i} - y_{j} |^{p} \right)^{2}, $$ | ||
|
||
where $p > 0$, $\mathbf{y} = (y_{1}, \dots, y_{d}) \in \mathbb{R}^{d}$ is the multivariate observation ($d > 1$), and | ||
$\mathbf{X} = (X_{1}, \dots, X_{d})$ is a random vector that follows the | ||
multivariate forecast distribution $F$ (Scheuerer and Hamill, 2015)[@scheuerer_variogram-based_2015]. | ||
The exponent $p$ is typically chosen to be 0.5 or 1. | ||
|
||
The variogram score is less sensitive to marginal forecast performance than the energy score, | ||
and Scheuerer and Hamill (2015) argue that it should therefore be more sensitive to errors in the | ||
forecast's dependence structure. | ||
|
||
<br/><br/> | ||
|
||
## Ensemble forecasts | ||
|
||
While multivariate probabilistic forecasts could belong to a parametric family of | ||
distributions, such as a multivariate normal distribution, it is more common in practice | ||
that these forecasts are ensemble forecasts; that is, the forecast consists of a | ||
predictive sample $\mathbf{x}_{1}, \dots, \mathbf{x}_{M}$, | ||
where each ensemble member $\mathbf{x}_{m} = (x_{m, 1}, \dots, x_{m, d}) \in \mathbb{R}^{d}$ for | ||
$m = 1, \dots, M$. | ||
|
||
In this case, the expectation in the definition of the variogram score can be replaced by | ||
a sample mean over the ensemble members, yielding the following representation of the variogram | ||
score when evaluating an ensemble forecast $F_{ens}$ with $M$ members: | ||
|
||
$$\text{VS}_{p}(F_{ens}, \mathbf{y})= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \frac{1}{M} \sum_{m=1}^{M} | x_{m,i} - x_{m,j} |^{p} - | y_{i} - y_{j} |^{p} \right)^{2}. $$ | ||
|
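A direct, unoptimised sketch of the ensemble variogram score follows. It loops over all pairs of dimensions exactly as in the formula above; the function name `variogram_score_ensemble` and the default `p=0.5` are choices made for this example rather than the library's interface.

```python
def variogram_score_ensemble(members, obs, p=0.5):
    """Ensemble variogram score of order p (hypothetical helper)."""
    m = len(members)
    d = len(obs)
    score = 0.0
    for i in range(d):
        for j in range(d):
            # Sample mean of |x_{m,i} - x_{m,j}|^p over the ensemble members.
            mean_diff = sum(abs(x[i] - x[j]) ** p for x in members) / m
            # Squared difference to the observed variogram term.
            score += (mean_diff - abs(obs[i] - obs[j]) ** p) ** 2
    return score


# Toy example: two members in d = 2 dimensions.
members = [(0.0, 1.0), (0.0, 3.0)]
obs = (0.0, 1.0)
vs_one = variogram_score_ensemble(members, obs, p=1.0)
vs_half = variogram_score_ensemble(members, obs, p=0.5)
```

For `p=1.0` the mean absolute difference between the two components is 2, while the observed difference is 1, and the pair is counted twice in the double sum, which gives a score of 2.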
||
<br/><br/> | ||
|
||
## Weighted variogram scores | ||
|
||
It is often the case that certain outcomes are of more interest than others when evaluating | ||
forecast performance. These outcomes can be emphasised by employing weighted scoring rules. | ||
Weighted scoring rules typically introduce a weight function into conventional scoring rules, | ||
and users can choose the weight function depending on what outcomes they want to emphasise. | ||
Allen et al. (2022)[@allen2022evaluating] introduced three weighted versions of the variogram score. | ||
These are all available in `scoringrules`. | ||
|
||
Firstly, the outcome-weighted variogram score (see also Holzmann and Klar (2014)[@holzmann2017focusing]) | ||
is defined as | ||
|
||
$$\text{owVS}_{p}(F, \mathbf{y}; w) = \frac{1}{\bar{w}} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{y}) w(\mathbf{X}) w(\mathbf{y}) ] - \frac{1}{2 \bar{w}^{2}} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{X}^{\prime}) w(\mathbf{X}) w(\mathbf{X}^{\prime}) w(\mathbf{y}) ], $$ | ||
|
||
where | ||
|
||
$$ \rho_{p}(\mathbf{x}, \mathbf{z}) = \sum_{i=1}^{d} \sum_{j=1}^{d} \left( |x_{i} - x_{j}|^{p} - |z_{i} - z_{j}|^{p} \right)^{2}, $$ | ||
|
||
for $\mathbf{x} = (x_{1}, \dots, x_{d}) \in \mathbb{R}^{d}$ and $\mathbf{z} = (z_{1}, \dots, z_{d}) \in \mathbb{R}^{d}$. | ||
|
||
Here, $w : \mathbb{R}^{d} \to [0, \infty)$ is the non-negative weight function used to | ||
target particular multivariate outcomes, and $\bar{w} = \mathbb{E}[w(\mathbf{X})]$. | ||
As before, $\mathbf{X}, \mathbf{X}^{\prime} \sim F$ are independent. | ||
|
||
Secondly, Allen et al. (2022) introduced the threshold-weighted variogram score as | ||
|
||
$$\text{twVS}_{p}(F, \mathbf{y}; v)= \sum_{i=1}^{d} \sum_{j=1}^{d} \left( \mathbb{E} | v(\mathbf{X})_{i} - v(\mathbf{X})_{j} |^{p} - | v(\mathbf{y})_{i} - v(\mathbf{y})_{j} |^{p} \right)^{2}, $$ | ||
|
||
where $v : \mathbb{R}^{d} \to \mathbb{R}^{d}$ is a so-called chaining function, so that | ||
$v(\mathbf{X}) = (v(\mathbf{X})_{1}, \dots, v(\mathbf{X})_{d}) \in \mathbb{R}^{d}$. | ||
The threshold-weighted variogram score transforms the forecasts and observations according | ||
to the chaining function $v$, prior to calculating the unweighted variogram score. Choosing | ||
a chaining function is generally more difficult than choosing a weight function when | ||
emphasising particular outcomes. | ||
|
||
As an alternative, the vertically re-scaled variogram score is defined as | ||
|
||
$$\text{vrVS}_{p}(F, \mathbf{y}; w, \mathbf{x}_{0}) = \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{y}) w(\mathbf{X}) w(\mathbf{y}) ] - \frac{1}{2} \mathbb{E} [ \rho_{p}(\mathbf{X}, \mathbf{X}^{\prime}) w(\mathbf{X}) w(\mathbf{X}^{\prime}) ] + \left( \mathbb{E} [ \rho_{p} ( \mathbf{X}, \mathbf{x}_{0} ) w(\mathbf{X}) ] - \rho_{p} ( \mathbf{y}, \mathbf{x}_{0}) w(\mathbf{y}) \right) \left(\mathbb{E}[w(\mathbf{X})] - w(\mathbf{y}) \right), $$ | ||
|
||
where $w$ and $\rho_{p}$ are as defined above, and $\mathbf{x}_{0} \in \mathbb{R}^{d}$. | ||
Typically, $\mathbf{x}_{0}$ is chosen to be the zero vector. | ||
|
||
Each of these weighted variogram scores targets particular outcomes in a different way. | ||
Further details regarding the differences between these scoring rules, as well as choices | ||
for the weight and chaining functions, can be found in Allen et al. (2022). The weighted | ||
variogram scores can easily be computed for ensemble forecasts by | ||
replacing the expectations with sample means over the ensemble members. | ||
|
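The three weighted variogram scores can be sketched for an ensemble by replacing expectations with sample means, as described above. All names here (`rho`, `ow_variogram_score`, `tw_variogram_score`, `vr_variogram_score`, `second_exceeds_two`) are hypothetical, and the default `x0` of a zero vector is an assumption of this example.

```python
def rho(x, z, p):
    """Kernel rho_p(x, z) from the definition of the weighted variogram scores."""
    d = len(x)
    return sum(
        (abs(x[i] - x[j]) ** p - abs(z[i] - z[j]) ** p) ** 2
        for i in range(d)
        for j in range(d)
    )


def _means(members, obs, w, p):
    """Sample means of rho(X, y) w(X) and rho(X, X') w(X) w(X'), plus weights."""
    m = len(members)
    wx = [w(x) for x in members]
    e_obs = sum(rho(x, obs, p) * wi for x, wi in zip(members, wx)) / m
    e_pair = sum(
        rho(xi, xj, p) * wi * wj
        for xi, wi in zip(members, wx)
        for xj, wj in zip(members, wx)
    ) / m**2
    return wx, w(obs), sum(wx) / m, e_obs, e_pair


def ow_variogram_score(members, obs, w, p=0.5):
    _, wy, w_bar, e_obs, e_pair = _means(members, obs, w, p)
    # Note: w_bar == 0 would divide by zero; this sketch does not handle it.
    return e_obs * wy / w_bar - e_pair * wy / (2 * w_bar**2)


def tw_variogram_score(members, obs, v, p=0.5):
    chained, vy = [v(x) for x in members], v(obs)
    m, d = len(chained), len(vy)
    return sum(
        (sum(abs(x[i] - x[j]) ** p for x in chained) / m - abs(vy[i] - vy[j]) ** p) ** 2
        for i in range(d)
        for j in range(d)
    )


def vr_variogram_score(members, obs, w, p=0.5, x0=None):
    if x0 is None:
        x0 = (0.0,) * len(obs)  # assumed default: the zero vector
    wx, wy, w_bar, e_obs, e_pair = _means(members, obs, w, p)
    centre = sum(rho(x, x0, p) * wi for x, wi in zip(members, wx)) / len(members)
    centre -= rho(obs, x0, p) * wy
    return e_obs * wy - e_pair / 2 + centre * (w_bar - wy)


def second_exceeds_two(z):
    """Illustrative weight: 1 if the second component exceeds 2, else 0."""
    return 1.0 if z[1] > 2.0 else 0.0


members = [(0.0, 1.0), (0.0, 3.0)]
obs = (0.0, 1.0)
ow_const = ow_variogram_score(members, obs, lambda z: 1.0, p=1.0)
tw_ident = tw_variogram_score(members, obs, lambda z: z, p=1.0)
vr_const = vr_variogram_score(members, obs, lambda z: 1.0, p=1.0)
vr_weighted = vr_variogram_score(members, obs, second_exceeds_two, p=1.0)
```

With a constant weight of one, or the identity chaining function, all three reduce to the unweighted variogram score of this toy ensemble.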
||
<br/><br/> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
::: scoringrules.owvariogram_score | ||
::: scoringrules.twvariogram_score | ||
::: scoringrules.vrvariogram_score |