Merge pull request #235 from stan-dev/new-pareto-k-threshold

stan-dev · Feb 15, 2024 · ce0f540
2 parents b855083 + 3c87b7f
commit ce0f540

Showing 67 changed files with 1,103 additions and 630 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,13 +1,43 @@
 # loo 2.6.0.9000
 
-### New features
+### Major changes
+
+* Use a new sample-size-specific diagnostic threshold for Pareto `k`. The pre-2022 version
+of the [PSIS paper](https://arxiv.org/abs/1507.02646) recommended diagnostic
+thresholds of
+`k < 0.5 "good"`, `0.5 <= k < 0.7 "ok"`,
+`0.7 <= k < 1 "bad"`, `k >= 1 "very bad"`.
+The 2022 revision of the PSIS paper now recommends
+`k < min(1 - 1/log10(S), 0.7) "good"`, `min(1 - 1/log10(S), 0.7) <= k < 1 "bad"`,
+`k >= 1 "very bad"`, where `S` is the sample size.
+There is now one fewer diagnostic threshold (`"ok"` has been removed), and the
+most important threshold now depends on the sample size `S`. With sample sizes
+`100`, `320`, `1000`, `2200`, `10000` the sample size specific part 
+`1 - 1/log10(S)` corresponds to thresholds of `0.5`, `0.6`, `0.67`, `0.7`, `0.75`.
+Even if the sample size grows, the bias in the PSIS estimate dominates if
+`0.7 <= k < 1`, and thus the diagnostic threshold for good is capped at
+`0.7` (if `k >= 1`, the mean does not exist and bias is not a valid measure).
+The new recommended thresholds are based on more careful bias-variance analysis
+of PSIS based on truncated Pareto sums theory. For those who use the Stan
+default of 4000 posterior draws, the threshold remains `0.7`, but
+there will be fewer warnings because no diagnostic message is issued for
+`0.5 <= k < 0.7`. Those who use smaller sample sizes may see diagnostic messages with a
+threshold less than `0.7`, and they can simply increase the sample size to about
+`2200` to raise the threshold to `0.7`.
+
+* There are no more warnings if the `r_eff` argument is not provided, and the
+default is now `r_eff = 1`. The summary print output showing MCSE and ESS now
+shows diagnostic information on the range of `r_eff`. The change was made to
+reduce unnecessary warnings. The use of `r_eff` does not change the expected
+value of `elpd_loo`, `p_loo`, and Pareto `k`, and is needed only to estimate
+MCSE and ESS. Thus it is better to show the diagnostic information about `r_eff`
+only when MCSE and ESS values are shown.
+
+### Other changes
 
 * `E_loo` now allows `type="sd"`. 
-
-
-### Bug fixes
-
 * Fix bug in `E_loo` when `type="variance"`.
+
 
 # loo 2.6.0
 

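The threshold rule described in the NEWS.md entry above can be sketched in a few lines of R. This is only an illustration of the formula `min(1 - 1/log10(S), 0.7)` and of the three-way labelling; the helper names `k_threshold_sketch` and `k_label_sketch` are hypothetical and are not claimed to match anything in the loo package, and the extra sample size `4000` is added here to show the Stan default case.

```r
# Hypothetical helper (not part of the loo API): sample-size-specific
# threshold for Pareto k, computed as min(1 - 1/log10(S), 0.7).
k_threshold_sketch <- function(S) {
  pmin(1 - 1 / log10(S), 0.7)
}

# Hypothetical helper: label k values with the 2022 categories.
# Values at or above 1 are treated as "very bad".
k_label_sketch <- function(k, S) {
  threshold <- k_threshold_sketch(S)
  ifelse(k < threshold, "good", ifelse(k < 1, "bad", "very bad"))
}

# Thresholds for the sample sizes listed in NEWS.md, plus 4000 draws.
S <- c(100, 320, 1000, 2200, 4000, 10000)
round(k_threshold_sketch(S), 2)
# values: 0.50 0.60 0.67 0.70 0.70 0.70 (the cap applies from about S = 2200)

# Labels for a few k values with S = 1000 (threshold of about 0.67).
k_label_sketch(c(0.3, 0.65, 0.8, 1.2), S = 1000)
# values: "good" "good" "bad" "very bad"
```

Note that the uncapped part `1 - 1/log10(S)` gives `0.75` at `S = 10000`, so the function returns `0.7` there because of the cap.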
diff --git a/R/crps.R b/R/crps.R
@@ -112,7 +112,7 @@ loo_crps.matrix <-
            log_lik,
            ...,
            permutations = 1,
-           r_eff = NULL,
+           r_eff = 1,
            cores = getOption("mc.cores", 1)) {
   validate_crps_input(x, x2, y, log_lik)
   repeats <- replicate(permutations,
@@ -154,7 +154,7 @@ loo_scrps.matrix <-
     log_lik,
     ...,
     permutations = 1,
-    r_eff = NULL,
+    r_eff = 1,
     cores = getOption("mc.cores", 1)) {
   validate_crps_input(x, x2, y, log_lik)
   repeats <- replicate(permutations,
@@ -175,7 +175,7 @@ EXX_compute <- function(x, x2) {
 }
 
 
-EXX_loo_compute <- function(x, x2, log_lik, r_eff = NULL, ...) {
+EXX_loo_compute <- function(x, x2, log_lik, r_eff = 1, ...) {
   S <- nrow(x)
   shuffle <- sample (1:S)
   x2 <- x2[shuffle,]