---
title: "Lab 04: Tidy Summaries with Gapminder Data"
subtitle: "CS631"
author: "Alison Hill"
output:
  html_document:
    theme: flatly
    number_sections: TRUE
---
# Overview
Both challenges are due by noon on Wednesday, May 2, via Sakai. For the first (tidy) challenge, you'll want to refer back to our [slides](slides/04-slides.html). For the second challenge, you'll want to refer to the [reference lab](04-distributions.html).
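
As a warm-up for the tidy challenge, here is a minimal sketch of the clean-then-reshape pattern on a toy table. It assumes the CSV is "wide" (one row per country, one column per year), which you should confirm after import. The toy values, the `Country Name` column, and the `broadband_per_100` name are all made up for illustration, and `pivot_longer()` is one option among several (`gather()` is the older `tidyr` equivalent).

```{r eval = FALSE}
library(dplyr)
library(tidyr)
library(janitor)

# Toy stand-in for a wide file: values and column names are hypothetical
broadband_wide <- tibble::tribble(
  ~`Country Name`, ~`2009`, ~`2010`,
  "Country A",     1.5,     2.0,
  "Country B",     NA,      0.7
)

broadband_tidy <- broadband_wide %>%
  clean_names() %>%                 # `Country Name` -> country_name, `2009` -> x2009
  pivot_longer(
    cols = -country_name,           # every column except the country identifier
    names_to = "year",
    names_prefix = "x",             # strip the "x" that clean_names() added
    values_to = "broadband_per_100",
    values_drop_na = TRUE           # drop country-years with no measurement
  ) %>%
  mutate(year = as.integer(year))   # years arrive as text after reshaping

broadband_tidy
```

The result has one row per country-year, which is the shape `dplyr` and `ggplot2` expect.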

1. Tidy challenge: Tidy the `gapminder_broadband_per_100.csv` file *(Tip: use `janitor::clean_names()` immediately after import to make life easier)*
    - The data are the fixed broadband internet subscribers (per 100 people) for different countries by year: "Fixed broadband subscribers are users of the Internet who subscribe to paid high-speed access to the public Internet. High-speed access is at least 256 kilobits per second in one or both directions. Source: International Telecommunication Union, World Telecommunication Development Report and database, and World Bank estimates. Note: Please cite the International Telecommunication Union for third-party use of these data."
    - Read more about the numbers [here](https://docs.google.com/spreadsheets/d/1MgJAijU4I4WnB8JDu6uPmS9lGYaPUkCt1k-RYTZ4nSE/pub#)
1. Gapminder challenge: Read on...

Install and load the `gapminder` data package. This is the same data that was used in your DataCamp "Introduction to the Tidyverse" course, and a tidied version of the larger gapminder dataset from challenge 1!
```{r eval = FALSE}
install.packages("gapminder")  # one-time install from CRAN
library(gapminder)             # load the package and its data
?gapminder                     # open the help page for the dataset
```
Pick at least __two__ of the tasks below from the task menu and approach each with a table and figure.
* `dplyr` should be your main data manipulation tool
* `ggplot2` should be your main visualization tool
Make observations about what your tables/figures show and about the process. If you want to do something comparable but different (e.g., swap one quantitative variable for another), go for it!
You do not have to use `tidyr` or otherwise worry about reshaping your tables. Many of your tables may not be formatted perfectly in the report. Simply printing `dplyr` tabular output is fine. For all things, graphical and tabular, if you're dissatisfied with a result, discuss the problem, what you tried to do to fix it, and move on.
## Task menu
* Get the maximum and minimum of GDP per capita for all continents.
* Look at the spread of GDP per capita across countries within the continents.
* How does life expectancy vary across different continents?
* Report the absolute and/or relative abundance of countries with low life expectancy over time by continent: Compute some measure of worldwide life expectancy - you decide - a mean or median or some other quantile or perhaps your current age. Then determine how many countries on each continent have a life expectancy less than this benchmark, for each year.
* Make up your own!
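
To make the low life expectancy task more concrete, here is one possible sketch of the table step. The benchmark used here (the median of `lifeExp` over all rows) is an assumption made for illustration, and you are free to choose a different one. The names `benchmark`, `n_low`, and `prop_low` are also just illustrative.

```{r eval = FALSE}
library(gapminder)
library(dplyr)

# Assumption: use the overall median life expectancy as the benchmark
benchmark <- median(gapminder$lifeExp)

low_life_exp <- gapminder %>%
  group_by(continent, year) %>%
  summarize(
    n_countries = n(),                  # one row per country in each continent-year
    n_low = sum(lifeExp < benchmark),   # absolute abundance
    prop_low = n_low / n_countries,     # relative abundance
    .groups = "drop"
  )

low_life_exp
```

Reporting both the count and the proportion helps because continents differ a lot in how many countries they contain.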
## Companion graphs
For each table, make sure to include a relevant figure. One tip for starting is to draw out on paper what you want your x- and y-axis to be first and what your `geom` is; that is, start by drawing the plot you want `ggplot` to give you. Your figure does not have to depict every single number present in the table. Use your judgement. It just needs to complement the table, add context, and allow for some sanity checking.
Notice which figures are easy/hard to make, and whether the visualization adds clarity, detracts from, or is completely redundant (and therefore probably unnecessary) with respect to the tabular display.
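
As an illustration of the "draw it first" advice, suppose your paper sketch has continent on the x-axis, GDP per capita on the y-axis, and a boxplot as the `geom`. A minimal translation into `ggplot2` might look like the following. The log scale and the axis labels are choices made for this sketch, not requirements.

```{r eval = FALSE}
library(gapminder)
library(ggplot2)

# x = continent, y = GDP per capita, geom = boxplot
gdp_spread_plot <- ggplot(gapminder, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  scale_y_log10() +   # GDP per capita is strongly right-skewed
  labs(x = "Continent", y = "GDP per capita (log scale)")

gdp_spread_plot
```

A figure like this does not repeat every number from a summary table, but it lets you check that the minimums and maximums in the table look plausible.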
## Report your process
You're encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
<div class="jumbotron">
Gapminder EDA ideas from [Jenny Bryan](http://stat545.com/hw03_dplyr-and-more-ggplot2.html), author and creator of the [Gapminder package](https://cran.r-project.org/web/packages/gapminder/index.html).
</div>