Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

shane-kercheval · 2020-12-23T20:32:40Z

Figures 13.10 and 14.3 (a) show the probability of switching wells as a function of distance and a given level of arsenic.

The values of arsenic used are 0.5 and 1.0.

The minimum value of arsenic in the data is 0.51, so perhaps 0.5 is close enough.

But the value of 1 is something like the 35% percentile in the data. I'm not sure this is a good representation, and seems arbitrary, but perhaps I'm missing something.

The median value of arsenic is 1.3.

Plotting the min/median/max values of arsenic (or even something like ~95% percent, which is ~3.79), I think provide better insight.

The values 0.5-1 of arsenic used in the graph is on the lower levels and so to say, in the book, "the interaction is small in the range of most of the data" seem misleading because we're only looking at lower levels of arsenic.

(graph b uses a distance of 0 to 50, which seems more reasonable because its covering the "closest" 65% of data (median dist is 36.8), and 50 meters is a nice conceptual value; although seems like a min/medium/max approach could benefit this graph as well)

Here's a screenshot at different levels.

code below, along with the non-interaction model

library(rstan)
library(rstanarm)

set.seed(2)
model_interaction <- stan_glm(switch ~ dist100*arsenic, family=binomial(link="logit"), data=wells, refresh=0)
model_no_interaction <- stan_glm(switch ~ dist100+arsenic, family=binomial(link="logit"), data=wells, refresh=0)

jitter_binary <- function(a, jitt=.05){
  a + (1-2*a)*runif(length(a),0,jitt)
}
predict_switch <- function(.model, .dist100, .arsenic) {
    predict(.model,
            newdata = data.frame(dist100=.dist100, arsenic=.arsenic),
            type='response')
}

plot_prob_swiching_given_distance <- function(.model) {

    plot(wells$dist, jitter_binary(wells$switch), xlim=c(0, max(wells$dist)))
    for(arsenic_quantile in c(0, 0.5, 0.95, 1)) {
        arsenic_value <- quantile(wells$arsenic, arsenic_quantile)
        curve(predict_switch(.model, .dist100=x/100, .arsenic = arsenic_value), add=TRUE, col='red')
        
        text_y <- predict_switch(.model, .dist100=0, .arsenic = arsenic_value)
        text(.25, text_y, paste("if arsenic = ", arsenic_value), adj=0, cex=.8, col = 'red')
    }
    
}
plot_prob_swiching_given_distance(model_no_interaction)
plot_prob_swiching_given_distance(model_interaction)

The text was updated successfully, but these errors were encountered:

shane-kercheval · 2020-12-23T20:35:24Z

Also, just wanted to say, this is a fantastic book; I'm getting a lot of practical insights out of it. Thank you.

andrewgelman · 2021-02-21T21:53:14Z

Hi, thanks for the kind words. Regarding the arsenic level: yes, I see your point. Something like the median and the 90th percentile, or the 25th and 75th percentile, would make more sense. I don't know if I have the energy to re-make all these graphs, but it would make sense to add a couple of sentences to make this point.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

shane-kercheval commented Dec 23, 2020 •

edited

Loading

shane-kercheval commented Dec 23, 2020

andrewgelman commented Feb 21, 2021

Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

Comments

shane-kercheval commented Dec 23, 2020 • edited Loading

shane-kercheval commented Dec 23, 2020

andrewgelman commented Feb 21, 2021

shane-kercheval commented Dec 23, 2020 •

edited

Loading