High school performance

Dataset

The data were collected from three high schools in the US. They record each student's performance on three continuous outcome variables (math, reading, and writing scores) together with five predictors: race/ethnicity, parental level of education, gender, lunch type, and test preparation course.

The R code for data visualization, descriptive statistics, and multiple linear regression was written in R Markdown and knitted to HTML.
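As a rough illustration of the regression step, the sketch below fits a multiple linear regression of the math score on the five predictors. The data frame is randomly generated stand-in data, and the column names (`math`, `race_ethnicity`, `parental_education`, `gender`, `lunch`, `test_prep`) are assumptions for this example; the actual R Markdown file may name and model things differently.

```r
# Synthetic stand-in data: values are randomly generated, NOT taken from the real dataset.
# Column names are hypothetical and chosen only for this sketch.
set.seed(1)
n <- 200

education_levels <- c("some high school", "high school", "some college",
                      "associate's degree", "bachelor's degree", "master's degree")

students <- data.frame(
  race_ethnicity     = factor(sample(paste0("group_", letters[1:5]), n, replace = TRUE)),
  parental_education = factor(sample(education_levels, n, replace = TRUE),
                              levels = education_levels),
  gender             = factor(sample(c("male", "female"), n, replace = TRUE)),
  lunch              = factor(sample(c("free/reduced", "standard"), n, replace = TRUE)),
  test_prep          = factor(sample(c("completed", "none"), n, replace = TRUE))
)
students$math <- round(rnorm(n, mean = 65, sd = 15))

# Multiple linear regression: one continuous outcome, five categorical predictors.
fit <- lm(math ~ race_ethnicity + parental_education + gender + lunch + test_prep,
          data = students)

# Coefficient table, R-squared, and overall F-test.
summary(fit)
```

The same call could be repeated with `reading` or `writing` as the outcome. Because the stand-in scores are pure noise, the coefficients here carry no meaning; only the structure of the call is of interest.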

Variables:

  • Math: The student's score on a standardized mathematics test, a continuous variable
  • Reading: The student's score on a standardized reading test, a continuous variable
  • Writing: The student's score on a standardized writing test, a continuous variable
  • Race/ethnicity: The student's racial or ethnic background (Asian, African-American, Hispanic, etc.)
  • Parental level of education: The highest level of education attained by the student's parent(s) or guardian(s)
  • Gender: The gender of the student (male/female)
  • Lunch: The type of lunch the student receives (free/reduced-price or standard)
  • Test preparation course: Whether the student completed a test preparation course (completed/none)

Levels of categorical variables:

| variable | level_1 | level_2 | level_3 | level_4 | level_5 | level_6 |
| --- | --- | --- | --- | --- | --- | --- |
| race/ethnicity | group_a | group_b | group_c | group_d | group_e | |
| parental level of education | some high school | high school | some college | associate's degree | bachelor's degree | master's degree |
| gender | male | female | | | | |
| lunch | free/reduced | standard | | | | |
| test_prep course | completed | none | | | | |
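
The levels listed above can be made explicit when the categorical columns are encoded as factors in R. The sketch below is a minimal example under assumed column names (`parental_education`, `lunch`, `test_prep`) with a few hand-typed illustrative rows; it is not taken from the original analysis.

```r
# Hypothetical column names and hand-typed rows, used purely for illustration.
students <- data.frame(
  parental_education = c("some college", "master's degree", "high school"),
  lunch              = c("standard", "free/reduced", "standard"),
  test_prep          = c("none", "completed", "none"),
  stringsAsFactors   = FALSE
)

# Levels in the order shown for each variable.
education_levels <- c("some high school", "high school", "some college",
                      "associate's degree", "bachelor's degree", "master's degree")

students$parental_education <- factor(students$parental_education,
                                      levels = education_levels)
students$lunch     <- factor(students$lunch, levels = c("free/reduced", "standard"))
students$test_prep <- factor(students$test_prep, levels = c("completed", "none"))

# Inspect the encoding: level order and counts per level.
levels(students$parental_education)
table(students$lunch)
```

Setting `levels` explicitly matters for modelling: with R's default treatment contrasts, the first level of an unordered factor serves as the reference category in `lm()`, so the order chosen here determines what each coefficient is compared against.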

Data source

High school performance