Merge pull request #224 from avehtari/update-aaltobda-install
Update aaltobda install instructions in templates
andrjohns authored Oct 23, 2023
2 parents 379c801 + ff5be1e commit 964c5e6
Showing 9 changed files with 241 additions and 257 deletions.
6 changes: 2 additions & 4 deletions FAQ.Rmd
@@ -120,11 +120,9 @@ install.packages(c("MASS", "bayesplot", "brms", "cmdstanr ", "dplyr", "gganimate

The course has its own R package `aaltobda` with data and
functionality to simplify coding. `aaltobda` has been pre-installed in
- Aalto JupyterHub. To install the package to your own computer just run the following
- (upgrade=”never” skips question about updating other packages):
+ Aalto JupyterHub. To install the package to your own computer just run the following:

- 1. `install.packages("remotes")`
- 2. `remotes::install_github("avehtari/BDA_course_Aalto", subdir = "rpackage", upgrade="never", dependencies=TRUE)`
+ 1. `install.packages("aaltobda", repos = c("https://avehtari.github.io/BDA_course_Aalto/", getOption("repos")))`
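
The `repos` argument in the new install command combines the course repository with whatever repositories are already configured, so `aaltobda` can be found at the course URL while its dependencies can still come from the existing repositories. A minimal sketch of how that vector is built, without installing anything (the variable names `course_repo` and `repos` are illustrative and not part of the course material):

```r
# URL taken from the install command above.
course_repo <- "https://avehtari.github.io/BDA_course_Aalto/"

# getOption("repos") returns the repositories already configured for this
# session (typically a named character vector with a "CRAN" entry).
repos <- c(course_repo, getOption("repos"))

# The course repository is listed first, followed by the existing ones.
print(repos)

# Check whether aaltobda is already available before installing.
if (!requireNamespace("aaltobda", quietly = TRUE)) {
  message("aaltobda is not installed; run install.packages(\"aaltobda\", repos = repos)")
}
```

Printing `repos` before installing is a quick way to confirm which repositories `install.packages()` will query.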

If during the course there is announcement that `aaltobda` has been
updated (e.g. some error has been fixed), you can get the latest
83 changes: 41 additions & 42 deletions assignments/assignment6.qmd
@@ -1,33 +1,33 @@
---
title: "Assignment 6"
author: "Aki Vehtari et al."
format:
format:
html:
toc: true
code-tools: true
code-line-numbers: true
code-line-numbers: true
number-sections: true
mainfont: Georgia, serif
page-layout: article
editor: source
filters:
- includes/assignments.lua
- includes/include-code-files.lua
- includes/assignments.lua
- includes/include-code-files.lua
---

# General information

**The maximum amount of points from this assignment is 6.**

We have prepared a **quarto template specific to this assignment ([html](template6.html), [qmd](https://avehtari.github.io/BDA_course_Aalto/assignments/template6.qmd), [pdf](template6.pdf))** to help you get started.
We have prepared a **quarto template specific to this assignment ([html](template6.html), [qmd](https://avehtari.github.io/BDA_course_Aalto/assignments/template6.qmd), [pdf](template6.pdf))** to help you get started.


::: {.aalto}
We recommend you use [jupyter.cs.aalto.fi](https://jupyter.cs.aalto.fi) or the [docker container](docker.html).
:::

::: {.hint}
**Reading instructions:**
::: {.hint}
**Reading instructions:**

- [**The reading instructions for BDA3 Chapter 10**](../BDA3_notes.html#ch10).
- [**The reading instructions for BDA3 Chapter 11**](../BDA3_notes.html#ch11).
@@ -44,16 +44,15 @@ We recommend you use [jupyter.cs.aalto.fi](https://jupyter.cs.aalto.fi) or the [

:::: {.content-hidden when-format="pdf"}
::: {.callout-tip collapse=true}
## Setup

## Setup

JupyterHub has all the needed packages pre-installed.

The following installs and loads the `aaltobda` package:
```{r}
if(!require(aaltobda)){
- install.packages("remotes")
- remotes::install_github("avehtari/BDA_course_Aalto", subdir = "rpackage", upgrade="never")
+ install.packages("aaltobda", repos = c("https://avehtari.github.io/BDA_course_Aalto/", getOption("repos")))
library(aaltobda)
}
```
@@ -129,7 +128,7 @@ p(y|x,\alpha,\beta,\sigma) &= p_\mathrm{normal}(y|\alpha + \beta x, \sigma)
& \text{(normal likelihood)} &\text{.}
\end{aligned}
$$
In both the statistical model above and in the Stan model below, $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$ are vectors of the covariates / predictors (the assignment number) and vectors of the observation (proportions of students who have handed in the respective assignment). $\alpha \in \mathbb{R}$ is the unknown scalar intercept, $\beta \in \mathbb{R}$ is the unknown scalar slope and $\sigma \in \mathbb{R}_{>0}$ is the unknown scalar observation standard deviation. The statistical model further implies
In both the statistical model above and in the Stan model below, $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$ are vectors of the covariates / predictors (the assignment number) and vectors of the observation (proportions of students who have handed in the respective assignment). $\alpha \in \mathbb{R}$ is the unknown scalar intercept, $\beta \in \mathbb{R}$ is the unknown scalar slope and $\sigma \in \mathbb{R}_{>0}$ is the unknown scalar observation standard deviation. The statistical model further implies
$$
p(y_\mathrm{pred.}|x_\mathrm{pred.},\alpha,\beta,\sigma) = p_\mathrm{normal}(y_\mathrm{pred.}|\alpha + \beta x_\mathrm{pred.}, \sigma)
$$
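
The predictive distribution above can be sketched in a few lines of base R. The parameter values below are hypothetical and chosen only for illustration; in the assignment they would come from posterior draws, not from fixed numbers.

```r
set.seed(1)  # for reproducibility of this illustration only

# Hypothetical parameter values (assumptions, not estimates from the data).
alpha <- 1.0
beta <- -0.04
sigma <- 0.05

# Covariate values at which to predict (assignment numbers 1 to 9).
x_pred <- 1:9

# Mean of the predictive distribution at each covariate value.
mu_pred <- alpha + beta * x_pred

# One draw of y_pred per covariate value from normal(mu_pred, sigma).
y_pred <- rnorm(length(x_pred), mean = mu_pred, sd = sigma)

print(data.frame(x = x_pred, mu = mu_pred, y = y_pred))
```

Here `mu_pred` is deterministic given the parameters, while `y_pred` adds observation noise with standard deviation `sigma`.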
@@ -140,31 +139,31 @@ You can download [the broken stan file from github](./additional_files/assignmen
```{.stan}
data { #<1>
// number of data points
int<lower=0> N;
int<lower=0> N;
// covariate / predictor
vector[N] x;
vector[N] x;
// observations
vector[N] y;
vector[N] y;
// number of covariate values to make predictions at
int<lower=0> no_predictions;
// covariate values to make predictions at
vector[no_predictions] x_predictions;
vector[no_predictions] x_predictions;
} #<1>
parameters { #<2>
// intercept
real alpha;
real alpha;
// slope
real beta;
real beta;
// the standard deviation should be constrained to be positive
real<upper=0> sigma;
real<upper=0> sigma;
} #<2>
transformed parameters { #<3>
// deterministic transformation of parameters and data
vector[N] mu = alpha + beta * x // linear model
} #<3>
model { #<4>
// observation model / likelihood
y ~ normal(mu, sigma);
y ~ normal(mu, sigma);
} #<4>
generated quantities { #<5>
// compute the means for the covariate values at which to make predictions
@@ -200,7 +199,7 @@ Find the ***three mistakes*** in the code and fix them. Report the original mist

::: {.hint}
You may find some of the mistakes in the code using Stan syntax checker. If you copy the Stan code to a file ending `.stan` and open it in RStudio (you can also choose from RStudio menu File$\rightarrow$New File$\rightarrow$Stan file to create a new Stan file), the editor will show you some syntax errors. More syntax errors might be detected by clicking `Check' in the bar just above the Stan file in the RStudio editor. Note that some of the errors in the presented Stan code may not be syntax errors.
:::
:::
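
Outside RStudio, a syntax check can also be scripted. The following is a minimal sketch, assuming the `cmdstanr` package and a working CmdStan installation are available; the helper name `check_stan_syntax` and the file name `my_model.stan` are hypothetical.

```r
# Returns TRUE if the file passes the syntax check, FALSE if the check
# fails for any reason, and NA if cmdstanr is not installed.
check_stan_syntax <- function(path) {
  if (!requireNamespace("cmdstanr", quietly = TRUE)) {
    return(NA)
  }
  tryCatch({
    # compile = FALSE creates the model object without compiling it.
    model <- cmdstanr::cmdstan_model(path, compile = FALSE)
    # check_syntax() signals an error when the Stan compiler rejects the code.
    model$check_syntax()
    TRUE
  }, error = function(e) {
    message("Syntax check failed: ", conditionMessage(e))
    FALSE
  })
}

# Hypothetical file name; replace with the path to your own .stan file.
result <- check_stan_syntax("my_model.stan")
print(result)
```

As the hint notes, passing the syntax check does not mean the model is correct, since some mistakes are not syntax errors.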



@@ -218,7 +217,7 @@ The author runs the corrected Stan file using the following R code and plots the
#| warning: false
# These are our observations y: the proportion of students handing in each assignment (1-8),
# sorted by year (row-wise) and assignment (column-wise).
# While the code suggest a matrix structure,
# While the code suggest a matrix structure,
# the result will actually be a vector of length N = no_years * no_assignments
propstudents<-c(c(176, 174, 158, 135, 138, 129, 126, 123)/176,
c(242, 212, 184, 177, 174, 172, 163, 156)/242,
@@ -228,7 +227,7 @@ propstudents<-c(c(176, 174, 158, 135, 138, 129, 126, 123)/176,
# These are our predictors x: for each observation, the corresponding assignment number.
assignment <- rep(1:8, 5)
# These are in some sense our test data: the proportion of students handing in the last assignment (9),
# sorted by year.
# sorted by year.
# Usually, we would not want to split our data like that and instead
# use e.g. Leave-One-Out Cross-Validation (LOO-CV, see e.g. http://mc-stan.org/loo/index.html)
# to evaluate model performance.
@@ -246,8 +245,8 @@ model_data = list(N=length(assignment),
```
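
The row-wise vector layout described in the code comments above can be checked with a small sketch. It uses only the first two years of counts visible in this diff, so `no_years` is set to 2 here rather than the 5 used in the assignment; the names `props`, `x` and `props_matrix` are illustrative.

```r
# The first two years of submission counts shown above, as proportions.
props <- c(c(176, 174, 158, 135, 138, 129, 126, 123) / 176,
           c(242, 212, 184, 177, 174, 172, 163, 156) / 242)

no_years <- 2        # only the two years used in this sketch
no_assignments <- 8

# rep(1:8, no_years) repeats the assignment numbers once per year, so
# element i of the predictor lines up with element i of the observations.
x <- rep(1:no_assignments, no_years)
stopifnot(length(x) == length(props))

# The same vector viewed as a matrix: one row per year, one column per assignment.
props_matrix <- matrix(props, nrow = no_years, byrow = TRUE)
print(round(props_matrix, 2))
```

The first column is 1 in every row because each year's counts are divided by the number of students who handed in the first assignment.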
**Sampling from the posterior distribution happens here**:
```{r}
#| warning: false
# This reads the file at the specified path and tries to compile it.
#| warning: false
# This reads the file at the specified path and tries to compile it.
# If it fails, an error is thrown.
retention_model = cmdstan_model("./additional_files/assignment6_linear_model.stan")
# This "out <- capture.output(...)" construction suppresses output from cmdstanr
@@ -263,21 +262,21 @@ out <- capture.output(
# This extracts the draws from the sampling result as a data.frame.
draws_df = fit$draws(format="draws_df")
# This does some data/draws wrangling to compute the 5, 50 and 95 percentiles of
# the mean at the specified covariate values (x_predictions).
# This does some data/draws wrangling to compute the 5, 50 and 95 percentiles of
# the mean at the specified covariate values (x_predictions).
# It can be instructive to play around with each of the data processing steps
# to find out what each step does, e.g. by removing parts from the back like "|> gather(pct,y,-x)"
# and printing the resulting data.frame.
mu_quantiles_df = draws_df |>
subset_draws(variable = c("mu_pred")) |>
summarise_draws(~quantile2(.x, probs = c(0.05, .5, 0.95))) |>
mutate(x = 1:9) |>
mu_quantiles_df = draws_df |>
subset_draws(variable = c("mu_pred")) |>
summarise_draws(~quantile2(.x, probs = c(0.05, .5, 0.95))) |>
mutate(x = 1:9) |>
pivot_longer(c(q5, q50, q95), names_to = c("pct"))
# Same as above, but for the predictions.
y_quantiles_df = draws_df |>
subset_draws(variable = c("y_pred")) |>
summarise_draws(~quantile2(.x, probs = c(0.05, .5, 0.95))) |>
mutate(x = 1:9) |>
y_quantiles_df = draws_df |>
subset_draws(variable = c("y_pred")) |>
summarise_draws(~quantile2(.x, probs = c(0.05, .5, 0.95))) |>
mutate(x = 1:9) |>
pivot_longer(c(q5, q50, q95), names_to = c("pct"))
```
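
The quantile computation above can be mimicked in base R to see what the wrangling produces. This sketch uses simulated draws in place of Stan output and base `quantile()` as a stand-in for `quantile2()`; the draw values and the names `draws` and `quantiles_df` are assumptions for illustration only.

```r
set.seed(2)

# Hypothetical draws: 1000 rows (draws) by 9 columns (covariate values).
# In the assignment these would be the mu_pred or y_pred draws from Stan.
x_pred <- 1:9
draws <- sapply(x_pred, function(x) rnorm(1000, mean = 1 - 0.04 * x, sd = 0.02))

# 5%, 50% and 95% quantiles per column; the result has one row per quantile.
q <- apply(draws, 2, quantile, probs = c(0.05, 0.5, 0.95))

# Reshape to one row per covariate value, similar in spirit to the
# q5/q50/q95 columns produced by summarise_draws() above.
quantiles_df <- data.frame(x = x_pred, q5 = q[1, ], q50 = q[2, ], q95 = q[3, ])
print(quantiles_df)
```

Each row of `quantiles_df` describes a 90% central interval and the median at one covariate value, which is the wide-format counterpart of the data frame before `pivot_longer()` is applied.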

@@ -290,14 +289,14 @@ y_quantiles_df = draws_df |>
#| label: fig-posterior
#| fig-cap: Describe me in your submission!
ggplot() +
# scatter plot of the training data:
# scatter plot of the training data:
geom_point(
aes(x, y, color=assignment),
aes(x, y, color=assignment),
data=data.frame(x=assignment, y=propstudents, assignment="1-8")
) +
# scatter plot of the test data:
geom_point(
aes(x, y, color=assignment),
aes(x, y, color=assignment),
data=data.frame(x=no_assignments, y=propstudents9, assignment="9")
) +
# you have to tell us what this plots:
@@ -340,7 +339,7 @@ Based on the above plot, answer the following questions:
- What is the general trend of student retention as measured by assignment submissions?
- Given a model fitted to the submission data for assignments 1-8, does it do a good job predicting the proportion of students who submit the final 9th assignment?
- Name one different modeling choice you could make to improve the prediction.
:::
:::


::: {.rubric}
@@ -379,7 +378,7 @@ Based on the above plot, answer the following questions:
* Has at least one way to improve the model been mentioned (E.g. **...** or **...**)?
:::

# Generalized linear model: Bioassay with Stan (4 points)
# Generalized linear model: Bioassay with Stan (4 points)

Replicate the computations for the bioassay example of section 3.7
(BDA3) using Stan.
@@ -460,13 +459,13 @@ any problems in setting it up or using it. Please report,
:::




::: {.rubric}
* Is the Stan model code included?
* No
* Yes
* Does the implemented Stan-model seem to be working?
* Does the implemented Stan-model seem to be working?
* No implementation
* Model implemented but results not visualized/reported
* Model implemented, but the results seem weird
3 changes: 1 addition & 2 deletions assignments/includes/_general_info.md
@@ -20,8 +20,7 @@
To install the package on your own system, run
the following code (upgrade=\"never\" skips question about updating other packages):
```{.r}
- install.packages("remotes")
- remotes::install_github("avehtari/BDA_course_Aalto", subdir = "rpackage", upgrade="never")
+ install.packages("aaltobda", repos = c("https://avehtari.github.io/BDA_course_Aalto/", getOption("repos")))
```
- Many of the exercises can be checked automatically using the R
package `markmyassignment` (pre-installed in JupyterHub).