Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure 13.10 & 14.3 (a) potentially misleading; given values of arsenic used #69

Open
shane-kercheval opened this issue Dec 23, 2020 · 2 comments

Comments

@shane-kercheval
Copy link

shane-kercheval commented Dec 23, 2020

Figures 13.10 and 14.3 (a) show the probability of switching wells as a function of distance and a given level of arsenic.

The values of arsenic used are 0.5 and 1.0.

The minimum value of arsenic in the data is 0.51, so perhaps 0.5 is close enough.

But the value of 1 is something like the 35% percentile in the data. I'm not sure this is a good representation, and seems arbitrary, but perhaps I'm missing something.

The median value of arsenic is 1.3.

Plotting the min/median/max values of arsenic (or even something like ~95% percent, which is ~3.79), I think provide better insight.

The values 0.5-1 of arsenic used in the graph is on the lower levels and so to say, in the book, "the interaction is small in the range of most of the data" seem misleading because we're only looking at lower levels of arsenic.

(graph b uses a distance of 0 to 50, which seems more reasonable because its covering the "closest" 65% of data (median dist is 36.8), and 50 meters is a nice conceptual value; although seems like a min/medium/max approach could benefit this graph as well)

Here's a screenshot at different levels.

Screen Shot 2020-12-23 at 12 24 38 PM

code below, along with the non-interaction model

library(rstan)
library(rstanarm)

set.seed(2)
model_interaction <- stan_glm(switch ~ dist100*arsenic, family=binomial(link="logit"), data=wells, refresh=0)
model_no_interaction <- stan_glm(switch ~ dist100+arsenic, family=binomial(link="logit"), data=wells, refresh=0)

jitter_binary <- function(a, jitt=.05){
  a + (1-2*a)*runif(length(a),0,jitt)
}
predict_switch <- function(.model, .dist100, .arsenic) {
    predict(.model,
            newdata = data.frame(dist100=.dist100, arsenic=.arsenic),
            type='response')
}

plot_prob_swiching_given_distance <- function(.model) {

    plot(wells$dist, jitter_binary(wells$switch), xlim=c(0, max(wells$dist)))
    for(arsenic_quantile in c(0, 0.5, 0.95, 1)) {
        arsenic_value <- quantile(wells$arsenic, arsenic_quantile)
        curve(predict_switch(.model, .dist100=x/100, .arsenic = arsenic_value), add=TRUE, col='red')
        
        text_y <- predict_switch(.model, .dist100=0, .arsenic = arsenic_value)
        text(.25, text_y, paste("if arsenic = ", arsenic_value), adj=0, cex=.8, col = 'red')
    }
    
}
plot_prob_swiching_given_distance(model_no_interaction)
plot_prob_swiching_given_distance(model_interaction)
@shane-kercheval
Copy link
Author

Also, just wanted to say, this is a fantastic book; I'm getting a lot of practical insights out of it. Thank you.

@andrewgelman
Copy link

Hi, thanks for the kind words. Regarding the arsenic level: yes, I see your point. Something like the median and the 90th percentile, or the 25th and 75th percentile, would make more sense. I don't know if I have the energy to re-make all these graphs, but it would make sense to add a couple of sentences to make this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants