diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..9386a01 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,5 @@ +# Change Log + +8/5/2024 + +* Initial Module 3 Uploaded diff --git a/_sources/content/Module02/Module02_walkthrough_book.ipynb b/_sources/content/Module02/Module02_walkthrough_book.ipynb index 5d7b147..fa764d0 100644 --- a/_sources/content/Module02/Module02_walkthrough_book.ipynb +++ b/_sources/content/Module02/Module02_walkthrough_book.ipynb @@ -3,7 +3,9 @@ { "cell_type": "markdown", "id": "a10e9828", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "# Walkthrough\n", "\n", diff --git a/_sources/content/Module03/Module03_book.md b/_sources/content/Module03/Module03_book.md new file mode 100644 index 0000000..59d5e88 --- /dev/null +++ b/_sources/content/Module03/Module03_book.md @@ -0,0 +1,3 @@ +# Module 3: DataFrames + +This chapter will discuss how to use Python and Pandas to load and summarize spreadsheet style data. \ No newline at end of file diff --git a/_sources/content/Module03/Module03_walkthrough_book.ipynb b/_sources/content/Module03/Module03_walkthrough_book.ipynb new file mode 100644 index 0000000..c6516a8 --- /dev/null +++ b/_sources/content/Module03/Module03_walkthrough_book.ipynb @@ -0,0 +1,2996 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "da4cbf41", + "metadata": {}, + "source": [ + "# Module 03 Walkthrough\n", + "\n", + "Remember, all assignments are due before the synchronous session.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "d1b24089-7033-4965-ae63-de76bdf935a9", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "Get ready to dive into some data analysis as we explore the effectiveness of a hypothetical HIV treatment trial.\n", + "In this walkthrough, we have a dataset containing information from 30 people living with HIV (PLWH) who were randomly assigned to a treatment or control group.\n", + "After receiving the treatment, they stopped their ART and 
were monitored weekly for the number of weeks until their first \"detectable\" viral load was found.\n", + "We will use `Pandas` to analyze this data and evaluate the treatment's effectiveness.\n", + "By the end of this activity, you will be proficient in loading spreadsheet data into Python, creating derived columns in `DataFrames`, and using summary methods like sum, mean, and max.\n", + "Let's get started!" + ] + }, + { + "cell_type": "markdown", + "id": "d728e12b", + "metadata": {}, + "source": [ + "## Learning Objectives\n", + "At the end of this learning activity you will be able to:\n", + " - Practice loading spreadsheet data into Python using `pandas`.\n", + " - Use Python methods to create derived columns in `pd.DataFrames`.\n", + " - Use `Pandas` summary methods like sum, mean, and max.\n", + " - Employ basic filtering and data extraction from `pandas`." + ] + }, + { + "cell_type": "markdown", + "id": "28d532d9", + "metadata": {}, + "source": [ + "## Dataset Reference\n", + "\n", + "_File_: `trial_data.csv`\n", + "\n", + "_Columns_:\n", + "\n", + " - `age` : (years) Current age during the study. 
\n", + " - `age_initial_infection` : (years) Age at which the participant was initially infected.\n", + " - `initial_viral_load` : (copies/ul) The level of infection at the start of the study.\n", + " - `treatment` : (boolean) `True` for participants in the treatment group, `False` for those in the control group.\n", + " - `weeks_to_failure` : (weeks) Time from the treatment to the first week of uncontrolled viral load.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "621cd2ef", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "markdown", + "id": "917b592b", + "metadata": {}, + "source": [ + "While _basic_ Python can do a lot, you have to do everything yourself.\n", + "The **real** power of Python is that you can `import` code that is written by others.\n", + "\n", + "For this course, we will use a common data science stack of interoperable tools centered around [Numpy](https://numpy.org/).\n", + "\n", + "There are four that we will use regularly, two of which we'll cover today." + ] + }, + { + "cell_type": "markdown", + "id": "cfb0afb0-6fe4-47c7-b044-75144973797c", + "metadata": {}, + "source": [ + "### Numpy\n", + "\n", + "[Numpy](https://numpy.org/)\n", + "\n", + "A numerical Python library that contains incredibly fast arrays, mathematical functions, and other useful utilities.\n", + "\n", + "By convention, the community tends to _alias_ the long `numpy` as `np`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d5cc7c1d-b078-4555-a578-f862584233c4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "id": "8152b253-6408-4f55-9505-672f597e23e7", + "metadata": {}, + "source": [ + "### Pandas\n", + "\n", + "[Pandas](https://pandas.pydata.org/)\n", + "\n", + "A library that sits atop `numpy` and provides a _spreadsheet_ style object called a `DataFrame` along with a plethora of data science utilities.\n", + "This is the main tool we will be using for data exploration.\n", + "\n", + "By convention, the community tends to _alias_ the long `pandas` as `pd`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a223d0-d0d2-471a-b5b8-a63a700eda75", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "id": "519ff9d5", + "metadata": {}, + "source": [ + "Nicely, it can read `csv` files for us." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "4492bb2c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "10 55 21 94 False \n", + "11 53 42 85 True \n", + "12 40 34 27 True \n", + "13 48 41 99 False \n", + "14 56 41 59 False \n", + "15 53 47 38 True \n", + "16 57 41 42 True \n", + "17 48 33 57 False \n", + "18 51 42 25 False \n", + "19 55 46 45 False \n", + "20 43 24 46 False \n", + "21 48 37 99 True \n", + "22 51 27 36 False \n", + "23 43 34 48 True \n", + "24 51 43 88 False \n", + "25 49 20 76 False \n", + "26 54 47 74 False \n", + "27 45 25 87 True \n", + "28 59 40 49 False \n", + "29 51 43 38 True \n", + "\n", + " weeks_to_failure \n", + "0 3 \n", + "1 4 \n", + "2 6 \n", + "3 5 \n", + "4 5 \n", + "5 9 \n", + "6 4 \n", + "7 4 \n", + "8 5 \n", + "9 7 \n", + "10 3 \n", + "11 5 \n", + "12 8 \n", + "13 3 \n", + "14 6 \n", + "15 7 \n", + "16 8 \n", + "17 4 \n", + "18 2 \n", + "19 1 \n", + "20 1 \n", + "21 8 \n", + "22 2 \n", + "23 7 \n", + "24 2 \n", + "25 5 \n", + "26 5 \n", + "27 5 \n", + "28 5 \n", + "29 8 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df = pd.read_csv('trial_data.csv')\n", + "\n", + "# If a `DataFrame` is the last line, it will display a nice summary\n", + "trial_df" + ] + }, + { + "cell_type": "markdown", + "id": "31664b42", + "metadata": {}, + "source": [ + "And we should see that this exactly matches the table we saw in Excel." + ] + }, + { + "cell_type": "markdown", + "id": "1e653c16-ca8d-4641-ac5c-d81e549657ae", + "metadata": {}, + "source": [ + "The object we got back is called a `DataFrame`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b8e9d2ba-70fa-4614-8dae-e70f0a2f0db1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pandas.core.frame.DataFrame" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(trial_df)" + ] + }, + { + "cell_type": "markdown", + "id": "9124b71d-4468-42ad-b98a-4412d553f369", + "metadata": {}, + "source": [ + "If we only want to see a small version of the `DataFrame` we can use the `.head()` _method_." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "075dbccc-d1cf-4127-bd61-e15ddab0f2ce", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment weeks_to_failure\n", + "0 55 26 66 False 3\n", + "1 48 26 66 False 4\n", + "2 45 36 32 True 6\n", + "3 43 31 23 False 5\n", + "4 40 20 45 True 5" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "197cc06c-7490-4ea2-ab98-652c64a37c50", + "metadata": {}, + "source": [ + "## Acting on Columns" + ] + }, + { + "cell_type": "markdown", + "id": "75de8d1e", + "metadata": {}, + "source": [ + "We can reference each column by name using square brackets `[]`.\n", + "For example, we can extract the `age` column like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "cacc125e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 55\n", + "1 48\n", + "2 45\n", + "3 43\n", + "4 40\n", + "5 42\n", + "6 55\n", + "7 56\n", + "8 59\n", + "9 51\n", + "10 55\n", + "11 53\n", + "12 40\n", + "13 48\n", + "14 56\n", + "15 53\n", + "16 57\n", + "17 48\n", + "18 51\n", + "19 55\n", + "20 43\n", + "21 48\n", + "22 51\n", + "23 43\n", + "24 51\n", + "25 49\n", + "26 54\n", + "27 45\n", + "28 59\n", + "29 51\n", + "Name: age, dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df['age']" + ] + }, + { + "cell_type": "markdown", + "id": "c8d83ab1", + "metadata": {}, + "source": [ + "### Q1: Extract the `initial_viral_load` column?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f99e62ac", + "metadata": {}, + "outputs": [], + "source": [ + "init_vl = trial_df['initial_viral_load'] # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "68ab0587-f8db-46fa-b758-2320a9ec6858", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "init_vl is a `pd.Series`: True\n" + ] + } + ], + "source": [ + "print('init_vl is a `pd.Series`:', isinstance(init_vl, pd.Series))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "640c9a7e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "init_vl_sum = 1628\n" + ] + } + ], + "source": [ + "print(f'init_vl_sum = {init_vl.sum()}')" + ] + }, + { + "cell_type": "markdown", + "id": "2cac446b", + "metadata": {}, + "source": [ + "Once we can extract columns, we can start summarizing them." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "48ce947a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The mean age of the population is 50.1 yrs.\n" + ] + } + ], + "source": [ + "age_col = trial_df['age']\n", + "age_mean = age_col.mean()\n", + "print(f'The mean age of the population is {age_mean:0.1f} yrs.')" + ] + }, + { + "cell_type": "markdown", + "id": "35eb614c", + "metadata": {}, + "source": [ + "Expressions can also be _chained_. \n", + "Both forms are functionally the same; the only difference is aesthetic. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "1be80170", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The mean age of the population is 50.1 yrs, even when done on a single line.\n" + ] + } + ], + "source": [ + "age_mean_short = trial_df['age'].mean()\n", + "print(f'The mean age of the population is {age_mean_short:0.1f} yrs, even when done on a single line.')" + ] + }, + { + "cell_type": "markdown", + "id": "73927199", + "metadata": {}, + "source": [ + "### Q2: Calculate the average `weeks_to_failure` for the whole population?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ba3fa20b", + "metadata": {}, + "outputs": [], + "source": [ + "average_weeks = trial_df['weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "e6176369", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "average_weeks = 4.9\n" + ] + } + ], + "source": [ + "print(f'average_weeks = {average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "8f948c7a-4e79-4083-9a76-ba7a040240c7", + "metadata": {}, + "source": [ + "We can also summarize an entire `DataFrame` with a single command." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "bd9ca277-1b6d-4d66-b000-b9c0e1973e38", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "age 50.133333\n", + "age_initial_infection 34.366667\n", + "initial_viral_load 54.266667\n", + "treatment 0.400000\n", + "weeks_to_failure 4.900000\n", + "dtype: float64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.mean()" + ] + }, + { + "cell_type": "markdown", + "id": "bac8ff72-1f06-4ad1-a484-fac4126cf4a3", + "metadata": {}, + "source": [ + "In this case the summary went _down_ the columns and calculated a mean for each." 
+ ] + }, + { + "cell_type": "markdown", + "id": "679c12f1-e5bb-42d5-9dcb-7e7634853f67", + "metadata": {}, + "source": [ + "There are a number of other summarization _methods_.\n", + " - `max()`\n", + " - `min()`\n", + " - `mode()`\n", + " - `median()`\n", + " - `var()`\n", + " - `std()`\n", + " - `nunique()`" + ] + }, + { + "cell_type": "markdown", + "id": "11f7825b-21ba-4a24-9316-b91047dc17b6", + "metadata": {}, + "source": [ + "```{note}\n", + ":class: dropdown\n", + "Methods are functions that are attached to an `object`.\n", + "They usually act on the object to provide a summary, perform a transformation, or otherwise utilize the information within the object.\n", + "In this case, these summarization methods utilize the information within the dataframe to summarize each column.\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "142a4c50-7ac3-4db8-b08b-b0ad4a075b14", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load weeks_to_failure\n", + "count 30.000000 30.000000 30.000000 30.000000\n", + "mean 50.133333 34.366667 54.266667 4.900000\n", + "std 5.569209 9.041984 24.070204 2.202663\n", + "min 40.000000 20.000000 22.000000 1.000000\n", + "25% 45.750000 26.250000 36.500000 3.250000\n", + "50% 51.000000 34.000000 48.500000 5.000000\n", + "75% 55.000000 41.750000 72.000000 6.750000\n", + "max 59.000000 50.000000 99.000000 9.000000" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.describe()" + ] + }, + { + "cell_type": "markdown", + "id": "e5b40fdd", + "metadata": {}, + "source": [ + "Selecting columns is nice.\n", + "We can also add a new column based on another one.\n", + "\n", + "In HIV research it is often important to know how long someone has been living with HIV.\n", + "However, this dataset contains their current age, and their age at infection.\n", + "We can use these two to calculate the length." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "7c162199", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "2 6 9 \n", + "3 5 12 \n", + "4 5 20 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# first make a new `Series`\n", + "years_infected = trial_df['age'] - trial_df['age_initial_infection']\n", + "\n", + "# Then add that series into the table\n", + "trial_df['years_infected'] = years_infected\n", + "trial_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "69cd190f-6c41-48e5-805c-5d3bde23a510", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "2 6 9 \n", + "3 5 12 \n", + "4 5 20 " + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Alternatively\n", + "trial_df['years_infected'] = trial_df['age'] - trial_df['age_initial_infection']\n", + "trial_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "3d5dc837-650c-44db-8aab-8675491b8049", + "metadata": {}, + "source": [ + "## Acting on Rows" + ] + }, + { + "cell_type": "markdown", + "id": "3f2dac9e-00f7-4442-8a36-20631e73f8f6", + "metadata": {}, + "source": [ + "### Indexing" + ] + }, + { + "cell_type": "markdown", + "id": "c38315cd", + "metadata": {}, + "source": [ + "When selecting rows, or rows and columns, we need to use the `.loc` attribute of the `DataFrame`.\n", + "\n", + "We can select by row label, which for this `DataFrame` is simply the row number." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "85d1364b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "age 55\n", + "age_initial_infection 26\n", + "initial_viral_load 66\n", + "treatment False\n", + "weeks_to_failure 3\n", + "years_infected 29\n", + "Name: 0, dtype: object" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.loc[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "39614ebd-ee4b-46ab-9619-40b14ac66418", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "10 55 21 94 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "2 6 9 \n", + "3 5 12 \n", + "4 5 20 \n", + "5 9 22 \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 \n", + "10 3 34 " + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We can use a : to indicate a range.\n", + "trial_df.loc[0:10]" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "87473eee-abd9-4ca7-9f85-420103cf22c0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "5 42 20 57 True \n", + "7 56 50 22 False \n", + "13 48 41 99 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "5 9 22 \n", + "7 4 6 \n", + "13 3 7 " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We can provide an arbitrary list\n", + "trial_df.loc[[0, 5, 7, 13]]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "24b190cc-d554-46ea-b7c3-ada85f89ede5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " initial_viral_load age\n", + "0 66 55\n", + "5 57 42\n", + "7 22 56\n", + "13 99 48" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We can also select columns at the same time.\n", + "trial_df.loc[[0, 5, 7, 13], ['initial_viral_load', 'age']]" + ] + }, + { + "cell_type": "markdown", + "id": "7110e944-753f-4e99-a6d4-c1a62c84ce40", + "metadata": {}, + "source": [ + "### Boolean Indexing" + ] + }, + { + "cell_type": "markdown", + "id": "2327261a-e23e-4f80-ab9c-61d9d593769a", + "metadata": {}, + "source": [ + "If we do not know the row number ahead of time, but instead want to select rows based on their values, we can use boolean indexing.\n", + "In this strategy, we create a new `pd.Series` of True/False values where True corresponds to the rows we want." + ] + }, + { + "cell_type": "markdown", + "id": "04de3217-aaea-445d-91c1-f55329913752", + "metadata": {}, + "source": [ + "Start by finding all people over 50 years old." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "0ddab93f-c9b6-4caa-b47d-37325c748b76", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "age_mask = trial_df['age'] > 50\n", + "aged_samples = trial_df.loc[age_mask]\n", + "aged_samples.head()" + ] + }, + { + "cell_type": "markdown", + "id": "31e45076-4226-4f85-9731-21502452138f", + "metadata": {}, + "source": [ + "```{note}\n", + ":class: dropdown\n", + "I often use the suffix `_mask` when I create boolean indexes.\n", + "It is not required, but utilizing naming conventions makes your code easier to understand by yourself and others.\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "f82b318d-5429-4c16-aff9-1b8fb32d35db", + "metadata": {}, + "source": [ + "Now, if we also wanted to split by the initial_viral_load we might do:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e3948cff-b118-4fa2-bb49-541567d43404", + "metadata": {}, + "outputs": [], + "source": [ + "high_vl_mask = trial_df['initial_viral_load'] > 50" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "467ede0c-c706-456e-8cb8-f6b257cdbf86", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "10 55 21 94 False \n", + "11 53 42 85 True \n", + "14 56 41 59 False \n", + "24 51 43 88 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "10 3 34 \n", + "11 5 11 \n", + "14 6 15 \n", + "24 2 8 " + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "aged_high_vl = trial_df.loc[age_mask & high_vl_mask]\n", + "aged_high_vl.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "7e70c0ea-31ae-4315-81e7-080229ed1b6e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
6553123False424
7565022False46
8593333False526
9513049True721
15534738True76
\n", + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "15 53 47 38 True \n", + "\n", + " weeks_to_failure years_infected \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 \n", + "15 7 6 " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# ~ can be used to say \"not\"\n", + "aged_low_vl = trial_df.loc[age_mask & ~high_vl_mask]\n", + "aged_low_vl.head()" + ] + }, + { + "cell_type": "markdown", + "id": "45027741-6266-4ea5-86ce-fe20ef65baa3", + "metadata": {}, + "source": [ + "### Q3: Calculate the average weeks to failure for the treated population?" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "5e318b06-4063-4d00-b42a-c13c0b9b55d4", + "metadata": {}, + "outputs": [], + "source": [ + "treated_mask = trial_df['treatment'] == True # SOLUTION NO PROMPT\n", + "treated_average_weeks = trial_df.loc[treated_mask, 'weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "dd141501-a148-4168-b9bc-3448e7e60028", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "treated_average_weeks = 6.9\n" + ] + } + ], + "source": [ + "print(f'treated_average_weeks = {treated_average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "0673dd84-d614-4c05-afcd-941163c57608", + "metadata": {}, + "source": [ + "Utilizing boolean indexing you can express _any_ algorithmic row selecting strategy.\n", + "This can even include comparisons between rows, for example if there were multiple rows of the same sample.\n", + "We will cover these strategies later in the course." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c212312f-58e6-4354-8bb3-94df1bc669f2", + "metadata": {}, + "source": [ + "Sometimes, our searches are simple.\n", + "Pandas also includes another method for indexing rows called `.query()` for these purposes." + ] + }, + { + "cell_type": "markdown", + "id": "5b87435e-b89c-4cf5-be6b-526faf8469fd", + "metadata": {}, + "source": [ + "### Querying" + ] + }, + { + "cell_type": "markdown", + "id": "e1abf62a", + "metadata": {}, + "source": [ + "`.query()` is an interface that facilitates simple queries with a few specific limitations:\n", + " - It can only use the information present in the row.\n", + " - It can only work on one row at a time.\n", + " - Column headers cannot contain spaces, dots, dashes, commas, or emoji." + ] + }, + { + "cell_type": "markdown", + "id": "e6ea84aa-edc6-40e8-aebd-e8eedbfd59e8", + "metadata": {}, + "source": [ + "Our questions on this dataset easily fit within those constraints." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "19e82c53", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
2453632True69
4402045True520
5422057True922
9513049True721
11534285True511
\n", + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "2 45 36 32 True \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "9 51 30 49 True \n", + "11 53 42 85 True \n", + "\n", + " weeks_to_failure years_infected \n", + "2 6 9 \n", + "4 5 20 \n", + "5 9 22 \n", + "9 7 21 \n", + "11 5 11 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# All treatment rows\n", + "trial_df.query('treatment == True').head()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "c2ac06a0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
0552666False329
1482666False422
3433123False512
6553123False424
7565022False46
\n", + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "3 43 31 23 False \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "3 5 12 \n", + "6 4 24 \n", + "7 4 6 " + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.query('treatment == False').head()" + ] + }, + { + "cell_type": "markdown", + "id": "2d7c3caf", + "metadata": {}, + "source": [ + "You can also make them more complex." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "7a4fa71b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
2453632True69
4402045True520
5422057True922
9513049True721
11534285True511
12403427True86
15534738True76
16574142True816
21483799True811
23433448True79
27452587True520
29514338True88
\n", + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "2 45 36 32 True \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "9 51 30 49 True \n", + "11 53 42 85 True \n", + "12 40 34 27 True \n", + "15 53 47 38 True \n", + "16 57 41 42 True \n", + "21 48 37 99 True \n", + "23 43 34 48 True \n", + "27 45 25 87 True \n", + "29 51 43 38 True \n", + "\n", + " weeks_to_failure years_infected \n", + "2 6 9 \n", + "4 5 20 \n", + "5 9 22 \n", + "9 7 21 \n", + "11 5 11 \n", + "12 8 6 \n", + "15 7 6 \n", + "16 8 16 \n", + "21 8 11 \n", + "23 7 9 \n", + "27 5 20 \n", + "29 8 8 " + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.query('age > 33 & treatment == True')" + ] + }, + { + "cell_type": "markdown", + "id": "8b2af46a", + "metadata": {}, + "source": [ + "This statement doesn't make a \"biological sense\", but it is an example of a valid comparison." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "af1fd110", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
2453632True69
3433123False512
6553123False424
7565022False46
8593333False526
9513049True721
12403427True86
15534738True76
16574142True816
18514225False29
19554645False19
22512736False224
28594049False519
29514338True88
\n", + "
" + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "12 40 34 27 True \n", + "15 53 47 38 True \n", + "16 57 41 42 True \n", + "18 51 42 25 False \n", + "19 55 46 45 False \n", + "22 51 27 36 False \n", + "28 59 40 49 False \n", + "29 51 43 38 True \n", + "\n", + " weeks_to_failure years_infected \n", + "2 6 9 \n", + "3 5 12 \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 \n", + "12 8 6 \n", + "15 7 6 \n", + "16 8 16 \n", + "18 2 9 \n", + "19 1 9 \n", + "22 2 24 \n", + "28 5 19 \n", + "29 8 8 " + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.query('age >= initial_viral_load')" + ] + }, + { + "cell_type": "markdown", + "id": "e68592de", + "metadata": {}, + "source": [ + "### Q4: Calculate the average `weeks_to_failure` for the untreated population?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "cfcdf2f7", + "metadata": {}, + "outputs": [], + "source": [ + "# BEGIN SOLUTION NO PROMPT\n", + "\n", + "wanted_samples = trial_df.query('treatment == False')\n", + "\n", + "# END SOLUTION\n", + "\n", + "untreated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "5ea72ed6-01ee-48d4-800b-5d145f3517ad", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Untreated participants took 3.6 weeks to rebound.\n" + ] + } + ], + "source": [ + "print(f'Untreated participants took {untreated_average_weeks:0.1f} weeks to rebound.')" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "7f804ed1-a9c2-41a1-bc31-58035c91e967", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"untreated_average_weeks is a `float`: True\n" + ] + } + ], + "source": [ + "print('untreated_average_weeks is a `float`:', isinstance(untreated_average_weeks, float))" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "8f8d7324", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "untreated_average_weeks = 3.6\n" + ] + } + ], + "source": [ + "print(f'untreated_average_weeks = {untreated_average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "e87cce47", + "metadata": {}, + "source": [ + "### Q4: Calculate the average `weeks_to_failure` for the treated population?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "2742d0bb", + "metadata": {}, + "outputs": [], + "source": [ + "# BEGIN SOLUTION NO PROMPT\n", + "\n", + "wanted_samples = trial_df.query('treatment == True')\n", + "\n", + "# END SOLUTION\n", + "\n", + "treated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "c6b2bfa0-8673-4666-adcd-31a1a574fd79", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Treated patients took 6.9 weeks to rebound.\n" + ] + } + ], + "source": [ + "print(f'Treated patients took {treated_average_weeks:0.1f} weeks to rebound.')" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "79786fec-0e7c-461e-9a65-0fac65d6870a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "treated_average_weeks is a `float`: True\n" + ] + } + ], + "source": [ + "print('treated_average_weeks is a `float`:', isinstance(treated_average_weeks, float))" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "ea73783c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "treated_average_weeks = 6.9\n" + ] + } + ], + "source": [ + 
"print(f'treated_average_weeks = {treated_average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "5af4b494", + "metadata": {}, + "source": [ + "# Conclusion" + ] + }, + { + "cell_type": "markdown", + "id": "a2d1c3b8", + "metadata": {}, + "source": [ + "We can see that this treatment extended the average time off ART from ~3 weeks to ~7 weeks.\n", + "While not a complete cure, any incremental step is useful progress in the elimination of HIV.\n", + "\n", + "In the lab you will use similar techniques to explore whether other factors in this dataset impact the results.\n", + "In future weeks we will explore statistical techniques to understand whether this difference is due to chance, or due to the effect of the treatment." + ] + }, + { + "cell_type": "markdown", + "id": "493f93a7", + "metadata": {}, + "source": [ + "---------------------------------------------" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/content/Module01/Module01_book.html b/content/Module01/Module01_book.html index 94b7e7f..82e684d 100644 --- a/content/Module01/Module01_book.html +++ b/content/Module01/Module01_book.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/Module01/Module01_walkthrough_book.html b/content/Module01/Module01_walkthrough_book.html index fe0cf47..b75b590 100644 --- a/content/Module01/Module01_walkthrough_book.html +++ b/content/Module01/Module01_walkthrough_book.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/Module01/notebook_actions.html b/content/Module01/notebook_actions.html index b23ab5c..a86da7c 100644 --- a/content/Module01/notebook_actions.html +++ b/content/Module01/notebook_actions.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/Module02/Module02_book.html b/content/Module02/Module02_book.html index ced0d9c..5fdfba3 100644 --- a/content/Module02/Module02_book.html +++ b/content/Module02/Module02_book.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/Module02/Module02_walkthrough_book.html b/content/Module02/Module02_walkthrough_book.html index d9ff590..41dd307 100644 --- a/content/Module02/Module02_walkthrough_book.html +++ b/content/Module02/Module02_walkthrough_book.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/Module02/dilution_calculations.html b/content/Module02/dilution_calculations.html index a2d3bf2..7a19ed2 100644 --- a/content/Module02/dilution_calculations.html +++ b/content/Module02/dilution_calculations.html @@ -62,6 +62,7 @@ + @@ -193,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • @@ -390,6 +396,15 @@

    Dilution calculationsNanopore Sequencing

    + +
    +

    next

    +

    Module 3: DataFrames

    +
    + +
    diff --git a/content/Module02/nanopore_description.html b/content/Module02/nanopore_description.html index 8b86d9d..13b3893 100644 --- a/content/Module02/nanopore_description.html +++ b/content/Module02/nanopore_description.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/Module03/Module03_book.html b/content/Module03/Module03_book.html new file mode 100644 index 0000000..c2c1f4d --- /dev/null +++ b/content/Module03/Module03_book.html @@ -0,0 +1,467 @@ + + + + + + + + + + + Module 3: DataFrames — Quantitative Reasoning in Biology + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + +
    +
    +
    +
    +
    + + +
    +
    Work in progress!
    +
    + + + + + +
    +
    + + + +
    + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    +
    + +
    +
    + +
    + +
    + +
    + + +
    + +
    + +
    + + + + + + + + + + + + + + + + + + + +
    + +
    + +
    +
    + + + +
    +

    Module 3: DataFrames

    + +
    +
    + +
    +
    +
    + + + + +
    + +
    +

    Module 3: DataFrames#

    +

    This chapter will discuss how to use Python and Pandas to load and summarize spreadsheet style data.

    +
    +
    +
    + + + + +
    + + + + + + + + +
    + + + +
    + + +
    +
    + + +
    + + +
    +
    +
    + + + + + + + + \ No newline at end of file diff --git a/content/Module03/Module03_walkthrough_book.html b/content/Module03/Module03_walkthrough_book.html new file mode 100644 index 0000000..596e46f --- /dev/null +++ b/content/Module03/Module03_walkthrough_book.html @@ -0,0 +1,2602 @@ + + + + + + + + + + + Module 03 Walkthrough — Quantitative Reasoning in Biology + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + +
    +
    +
    +
    +
    + + +
    +
    Work in progress!
    +
    + + + + + +
    +
    + + + +
    + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    +
    + +
    +
    + +
    + +
    + +
    + + +
    + +
    + +
    + + + + + + + + + + + + + + + + + + + +
    + +
    + +
    +
    + + + + + + + + +
    + +
    +

    Module 03 Walkthrough#

    +

    Remember, all assignments are due before the synchronous session.

    +
    +

    Introduction#

    +

    Get ready to dive into some data analysis as we explore the effectiveness of a hypothetical HIV treatment trial. +In this walkthrough, we have a dataset containing information from 30 people living with HIV (PLWH) who were randomly assigned to a treatment or control group. +After receiving the treatment, they stopped their ART and were monitored weekly for the number of weeks until their first “detectable” viral load was found. +We will use Pandas to analyze this data and evaluate the treatment’s effectiveness. +By the end of this activity, you will be proficient in loading spreadsheet data into Python, creating derived columns in DataFrames, and using summary methods like sum, mean, and max. +Let’s get started!

    +
    +
    +

    Learning Objectives#

    +

    At the end of this learning activity you will be able to:

    +
      +
    • Practice loading spreadsheet data into Python using pandas.

    • +
    • Use Python methods to create derived columns in pd.DataFrames.

    • +
    • Use Pandas summary methods like sum, mean, and max.

    • +
    • Employ basic filtering and data extraction from pandas.

    • +
    +
    +
    +

    Dataset Reference#

    +

    File: trial_data.csv

    +

    Columns:

    +
      +
    • age : (years) Current age during the study.

    • +
    • age_initial_infection : (years) Age at which the participant was initially infected.

    • +
    • initial_viral_load : (copies/ul) The level of infection at the start of the study.

    • +
    • treatment : (boolean) True for participant in the treatment group, False for those in the control group.

    • +
    • weeks_to_failure : (weeks) Time from the treatment to the first week of uncontrolled viral load.

    • +
    +
    +
    +

    Imports#

    +

    While basic Python can do a lot, you have to do everything yourself. +The real power of Python is that you can import code that is written by others.

    +

    For this course, we will use a common data science stack of interoperable tools centered around NumPy.

    +

    There are four that we will use regularly, two of which we’ll cover today.

    +
    +

    Numpy#

    +

    Numpy

    +

    A numerical Python library that contains incredibly fast arrays, mathematical functions, and other useful utilities.

    +

    By convention, the community tends to alias the long numpy as np.

    +
    +
    +
    import numpy as np
    +
    +
    +
    +
    +
    +
    +

    Pandas#

    +

    Pandas

    +

    A library that sits atop numpy and provides a spreadsheet-style object called a DataFrame along with a plethora of data science utilities. +This is the main tool we will be using for data exploration.

    +

    By convention, the community tends to alias the long pandas as pd.

    +
    +
    +
    import pandas as pd
    +
    +
    +
    +
    +

    Nicely, it can read csv files for us.

    +
    +
    +
    trial_df = pd.read_csv('trial_data.csv')
    +
    +# If a `DataFrame` is the last line, it will display a nice summary
    +trial_df
    +
    +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failure
    0552666False3
    1482666False4
    2453632True6
    3433123False5
    4402045True5
    5422057True9
    6553123False4
    7565022False4
    8593333False5
    9513049True7
    10552194False3
    11534285True5
    12403427True8
    13484199False3
    14564159False6
    15534738True7
    16574142True8
    17483357False4
    18514225False2
    19554645False1
    20432446False1
    21483799True8
    22512736False2
    23433448True7
    24514388False2
    25492076False5
    26544774False5
    27452587True5
    28594049False5
    29514338True8
    +
    +
    +

    And we should see that this exactly matches the table we saw in Excel.
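
    A quick way to confirm that a load went as expected is to check the shape and the inferred column types. The sketch below is only an illustration: it reads a made-up three-row excerpt from an in-memory string rather than the real trial_data.csv, and the names `csv_text` and `demo_df` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt using the same column layout as trial_data.csv.
csv_text = """age,age_initial_infection,initial_viral_load,treatment,weeks_to_failure
55,26,66,False,3
48,26,66,False,4
45,36,32,True,6
"""

# pd.read_csv accepts a file path or any file-like object such as StringIO.
demo_df = pd.read_csv(io.StringIO(csv_text))

# shape is (rows, columns); dtypes lists the type pandas inferred for each column.
print(demo_df.shape)
print(demo_df.dtypes)
```

    With the real file, the same two checks should report 30 rows and 5 columns.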

    +

    The object we got back is called a DataFrame.

    +
    +
    +
    type(trial_df)
    +
    +
    +
    +
    +
    pandas.core.frame.DataFrame
    +
    +
    +
    +
    +

    If we only want to see a small version of the DataFrame we can use the .head() method.

    +
    +
    +
    trial_df.head()
    +
    +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failure
    0552666False3
    1482666False4
    2453632True6
    3433123False5
    4402045True5
    +
    +
    +
    +
    +
    +

    Acting on Columns#

    +

    We can reference each column by name using square brackets []. +For example, we can extract the age column like so:

    +
    +
    +
    trial_df['age']
    +
    +
    +
    +
    +
    0     55
    +1     48
    +2     45
    +3     43
    +4     40
    +5     42
    +6     55
    +7     56
    +8     59
    +9     51
    +10    55
    +11    53
    +12    40
    +13    48
    +14    56
    +15    53
    +16    57
    +17    48
    +18    51
    +19    55
    +20    43
    +21    48
    +22    51
    +23    43
    +24    51
    +25    49
    +26    54
    +27    45
    +28    59
    +29    51
    +Name: age, dtype: int64
    +
    +
    +
    +
    +
    +

    Q1: Extract the initial_viral_load column ?#

    +
    +
    +
    init_vl = trial_df['initial_viral_load']  # SOLUTION
    +
    +
    +
    +
    +
    +
    +
    print('init_vl is a `pd.Series`:', isinstance(init_vl, pd.Series))
    +
    +
    +
    +
    +
    init_vl is a `pd.Series`: True
    +
    +
    +
    +
    +
    +
    +
    print(f'init_vl_sum = {init_vl.sum()}')
    +
    +
    +
    +
    +
    init_vl_sum = 1628
    +
    +
    +
    +
    +

    Once we can extract columns, we can start summarizing them.

    +
    +
    +
    age_col = trial_df['age']
    +age_mean = age_col.mean()
    +print(f'The mean age of the population is {age_mean:0.1f} yrs.')
    +
    +
    +
    +
    +
    The mean age of the population is 50.1 yrs.
    +
    +
    +
    +
    +

    Expressions can also be chained. +The two forms are functionally the same; the only difference is aesthetic.

    +
    +
    +
    age_mean_short = trial_df['age'].mean()
    +print(f'The mean age of the population is {age_mean_short:0.1f} yrs, even when done on a single line.')
    +
    +
    +
    +
    +
    The mean age of the population is 50.1 yrs, even when done on a single line.
    +
    +
    +
    +
    +
    +
    +

    Q2: Calculate the average weeks_to_failure for the whole population?#

    +
    +
    +
    average_weeks = trial_df['weeks_to_failure'].mean()  # SOLUTION
    +
    +
    +
    +
    +
    +
    +
    print(f'average_weeks = {average_weeks:0.1f}')
    +
    +
    +
    +
    +
    average_weeks = 4.9
    +
    +
    +
    +
    +

    We can also summarize an entire DataFrame with a single command.

    +
    +
    +
    trial_df.mean()
    +
    +
    +
    +
    +
    age                      50.133333
    +age_initial_infection    34.366667
    +initial_viral_load       54.266667
    +treatment                 0.400000
    +weeks_to_failure          4.900000
    +dtype: float64
    +
    +
    +
    +
    +

    In this case the summary went down the columns and calculated a mean for each.
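
    The treatment row in that output is worth a second look: a boolean column is averaged by counting True as 1 and False as 0, so its mean is the fraction of rows that are True. Here is a minimal sketch on made-up values (`toy_df` is a hypothetical stand-in, not the trial data):

```python
import pandas as pd

# Made-up values; only the column names mirror the trial dataset.
toy_df = pd.DataFrame({
    'weeks_to_failure': [3, 4, 6, 5, 5],
    'treatment': [False, False, True, False, True],
})

# One mean per column, calculated down the rows.
col_means = toy_df.mean()
print(col_means)

# True counts as 1 and False as 0, so this is the fraction of treated rows.
treated_fraction = toy_df['treatment'].mean()
print(f'{treated_fraction:0.0%} of rows are in the treatment group')
```

    Read this way, the 0.4 reported for treatment in the real dataset means that 40% of the 30 participants were in the treatment group.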

    +

    There are a number of other summarization methods.

    +
      +
    • max()

    • +
    • min()

    • +
    • mode()

    • +
    • median()

    • +
    • var()

    • +
    • std()

    • +
    • nunique()

    • +
    + +
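
    As a rough illustration of the methods listed above, the sketch below applies a few of them to a single made-up column. The values and the `weeks` and `summary` names are hypothetical.

```python
import pandas as pd

# Made-up weeks_to_failure values for six participants.
weeks = pd.Series([3, 4, 6, 5, 5, 9], name='weeks_to_failure')

summary = {
    'min': weeks.min(),
    'max': weeks.max(),
    'median': weeks.median(),
    'nunique': weeks.nunique(),  # number of distinct values
}
print(summary)

# mode() returns a Series because several values can tie for most common.
modes = weeks.mode().tolist()
print(modes)
```

    Each of these methods can also be called on a whole DataFrame, in which case it returns one result per column.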
    +
    +
    trial_df.describe()
    +
    +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    ageage_initial_infectioninitial_viral_loadweeks_to_failure
    count30.00000030.00000030.00000030.000000
    mean50.13333334.36666754.2666674.900000
    std5.5692099.04198424.0702042.202663
    min40.00000020.00000022.0000001.000000
    25%45.75000026.25000036.5000003.250000
    50%51.00000034.00000048.5000005.000000
    75%55.00000041.75000072.0000006.750000
    max59.00000050.00000099.0000009.000000
    +
    +
    +

    Selecting columns is nice. +We can also add a new column based on another one.

    +

    In HIV research it is often important to know how long someone has been living with HIV. +However, this dataset contains their current age, and their age at infection. +We can use these two to calculate the number of years since infection.

    +
    +
    +
    # first make a new `Series`
    +years_infected = trial_df['age'] - trial_df['age_initial_infection']
    +
    +# Then add that series into the table
    +trial_df['years_infected'] = years_infected
    +trial_df.head()
    +
    +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    2453632True69
    3433123False512
    4402045True520
    +
    +
    +
    +
    +
    # Alternatively
    +trial_df['years_infected'] = trial_df['age'] - trial_df['age_initial_infection']
    +trial_df.head()
    +
    +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    2453632True69
    3433123False512
    4402045True520
    +
    +
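
    The subtraction used for years_infected works element-wise: each row's age is paired with the same row's age_initial_infection. If you would rather not modify the original table, `DataFrame.assign` returns a new copy with the extra column. The sketch below uses a made-up three-row frame, and the `toy_df` and `with_years` names are hypothetical.

```python
import pandas as pd

# Made-up rows; only the column names mirror the trial dataset.
toy_df = pd.DataFrame({
    'age': [55, 48, 45],
    'age_initial_infection': [26, 26, 36],
})

# assign returns a new DataFrame and leaves toy_df untouched.
with_years = toy_df.assign(
    years_infected=toy_df['age'] - toy_df['age_initial_infection']
)
print(with_years)
```

    Both styles give the same values; the choice depends on whether you want to keep the original table unchanged.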
    +
    +
    +
    +

    Acting on Rows#

    +
    +

    Indexing#

    +

    When selecting rows, or rows and columns, we need to use the .loc attribute of the DataFrame.

    +

    We can select by row label, which for this DataFrame matches the row number.

    +
    +
    +
    trial_df.loc[0]
    +
    +
    +
    +
    +
    age                         55
    +age_initial_infection       26
    +initial_viral_load          66
    +treatment                False
    +weeks_to_failure             3
    +years_infected              29
    +Name: 0, dtype: object
    +
    +
    +
    +
    +
    +
    +
    # We can use a : to indicate a range (with .loc, both ends are included).
    +trial_df.loc[0:10]
    +
    +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    2453632True69
    3433123False512
    4402045True520
    5422057True922
    6553123False424
    7565022False46
    8593333False526
    9513049True721
    10552194False334
    +
    +
    +
    +
    +
    # We can provide an arbitrary list
    +trial_df.loc[[0, 5, 7, 13]]
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     0   55                     26                  66      False                 3              29
     5   42                     20                  57       True                 9              22
     7   56                     50                  22      False                 4               6
    13   48                     41                  99      False                 3               7
    +
    +
    +
    +
    +
    # We can also select columns at the same time.
    +trial_df.loc[[0, 5, 7, 13], ['initial_viral_load', 'age']]
    +
    +
    +
    +
    +
            initial_viral_load  age
     0                  66   55
     5                  57   42
     7                  22   56
    13                  99   48
    +
    +
    +
    +
    +
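As a rough illustration of what comes back from `.loc` when rows and columns are given together, the sketch below uses a hypothetical three-row `toy_df`. The shape of the result depends on whether each selector is a single label or a list of labels.

```python
import pandas as pd

# Hypothetical values, not the trial data.
toy_df = pd.DataFrame({"age": [55, 48, 45], "initial_viral_load": [66, 66, 32]})

cell = toy_df.loc[1, "age"]                                  # one label each: a single value
column = toy_df.loc[[0, 2], "age"]                           # list of rows, one column: a pd.Series
table = toy_df.loc[[0, 2], ["initial_viral_load", "age"]]    # two lists: a pd.DataFrame

print(cell)
print(type(column).__name__, type(table).__name__)
print(table.columns.tolist())   # columns come back in the order requested
```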

    Boolean Indexing

    +

    If we do not know the row number ahead of time, but instead want to select rows based on their values, we can use boolean indexing. +In this strategy we create a new pd.Series of True/False values where True corresponds to the rows we want.

    +

    Start by finding all people over 50 years old.

    +
    +
    +
    age_mask = trial_df['age'] > 50
    +aged_samples = trial_df.loc[age_mask]
    +aged_samples.head()
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     0   55                     26                  66      False                 3              29
     6   55                     31                  23      False                 4              24
     7   56                     50                  22      False                 4               6
     8   59                     33                  33      False                 5              26
     9   51                     30                  49       True                 7              21
    +
    +
    + +
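To make the mask itself less abstract, here is a small sketch with hypothetical ages. It shows that the comparison produces one True/False per row, and that summing the mask counts the matching rows because True is treated as 1.

```python
import pandas as pd

toy_df = pd.DataFrame({"age": [55, 48, 45, 56, 51]})  # hypothetical ages

age_mask = toy_df["age"] > 50

print(age_mask.tolist())   # one True/False value per row
print(age_mask.sum())      # number of rows where the mask is True
```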

    Now, if we also wanted to split by the initial_viral_load we might do:

    +
    +
    +
    high_vl_mask = trial_df['initial_viral_load'] > 50
    +
    +
    +
    +
    +
    +
    +
    aged_high_vl = trial_df.loc[age_mask & high_vl_mask]
    +aged_high_vl.head()
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     0   55                     26                  66      False                 3              29
    10   55                     21                  94      False                 3              34
    11   53                     42                  85       True                 5              11
    14   56                     41                  59      False                 6              15
    24   51                     43                  88      False                 2               8
    +
    +
    +
    +
    +
    # ~ can be used to say "not"
    +aged_low_vl = trial_df.loc[age_mask & ~high_vl_mask]
    +aged_low_vl.head()
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     6   55                     31                  23      False                 4              24
     7   56                     50                  22      False                 4               6
     8   59                     33                  33      False                 5              26
     9   51                     30                  49       True                 7              21
    15   53                     47                  38       True                 7               6
    +
    +
    +
    +
    +
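The masks above were stored in variables before being combined. If the comparisons are written inline instead, each one needs its own parentheses, because `&`, `|` and `~` bind more tightly than `>` in Python. A sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical values, not the trial data.
toy_df = pd.DataFrame(
    {
        "age": [55, 48, 56, 51, 40],
        "initial_viral_load": [66, 66, 22, 49, 45],
    }
)

# & means "and", | means "or", ~ means "not"; note the parentheses.
both = toy_df.loc[(toy_df["age"] > 50) & (toy_df["initial_viral_load"] > 50)]
either = toy_df.loc[(toy_df["age"] > 50) | (toy_df["initial_viral_load"] > 50)]
neither = toy_df.loc[~((toy_df["age"] > 50) | (toy_df["initial_viral_load"] > 50))]

print(both.index.tolist(), either.index.tolist(), neither.index.tolist())
```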

    Q3: Calculate the average weeks to failure for the treated population

    +
    +
    +
    treated_mask = trial_df['treatment'] == True  # SOLUTION NO PROMPT
    +treated_average_weeks = trial_df.loc[treated_mask, 'weeks_to_failure'].mean()  # SOLUTION
    +
    +
    +
    +
    +
    +
    +
    print(f'treated_average_weeks = {treated_average_weeks:0.1f}')
    +
    +
    +
    +
    +
    treated_average_weeks = 6.9
    +
    +
    +
    +
    +
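Because the `treatment` column already holds True/False values, it can also serve as a mask directly. The sketch below, on a hypothetical four-row table, compares the explicit `== True` form with the direct form and uses `~` to get the control group.

```python
import pandas as pd

# Hypothetical values, not the trial data.
toy_df = pd.DataFrame(
    {"treatment": [False, True, True, False], "weeks_to_failure": [3, 6, 8, 4]}
)

explicit = toy_df.loc[toy_df["treatment"] == True, "weeks_to_failure"].mean()
direct = toy_df.loc[toy_df["treatment"], "weeks_to_failure"].mean()
untreated = toy_df.loc[~toy_df["treatment"], "weeks_to_failure"].mean()

print(f"treated = {direct:0.1f}, untreated = {untreated:0.1f}")
```

Both forms select the same rows; which one reads more clearly is a matter of taste.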

    Using boolean indexing, you can express any algorithmic row-selection strategy. +This can even include comparisons between rows, for example when there are multiple rows for the same sample. +We will cover these strategies later in the course.

    +

    Sometimes, our searches are simple. +Pandas also includes another method for indexing rows called .query() for these purposes.

    +
    +
    +

    Querying

    +

    .query() is an interface that facilitates simple queries with a few specific limitations:

    +
    • It can only use the information present in the row.
    • It can only work on one row at a time.
    • Column headers cannot contain spaces, dots, dashes, commas, or emoji.
    +

    Our questions on this dataset easily fit within those constraints.
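As a quick sanity check on how `.query()` relates to boolean indexing, the sketch below (hypothetical `toy_df`) selects the treated rows both ways and confirms the results match.

```python
import pandas as pd

# Hypothetical values, not the trial data.
toy_df = pd.DataFrame(
    {"age": [55, 48, 45, 40], "treatment": [False, False, True, True]}
)

via_mask = toy_df.loc[toy_df["treatment"] == True]
via_query = toy_df.query("treatment == True")

print(via_mask.equals(via_query))
```

The query string refers to columns by bare name, which is why the column-name restrictions listed above matter.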

    +
    +
    +
    # All treatment rows
    +trial_df.query('treatment == True').head()
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     2   45                     36                  32       True                 6               9
     4   40                     20                  45       True                 5              20
     5   42                     20                  57       True                 9              22
     9   51                     30                  49       True                 7              21
    11   53                     42                  85       True                 5              11
    +
    +
    +
    +
    +
    trial_df.query('treatment == False').head()
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     0   55                     26                  66      False                 3              29
     1   48                     26                  66      False                 4              22
     3   43                     31                  23      False                 5              12
     6   55                     31                  23      False                 4              24
     7   56                     50                  22      False                 4               6
    +
    +
    +

    You can also make them more complex.

    +
    +
    +
    trial_df.query('age > 33 & treatment == True')
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     2   45                     36                  32       True                 6               9
     4   40                     20                  45       True                 5              20
     5   42                     20                  57       True                 9              22
     9   51                     30                  49       True                 7              21
    11   53                     42                  85       True                 5              11
    12   40                     34                  27       True                 8               6
    15   53                     47                  38       True                 7               6
    16   57                     41                  42       True                 8              16
    21   48                     37                  99       True                 8              11
    23   43                     34                  48       True                 7               9
    27   45                     25                  87       True                 5              20
    29   51                     43                  38       True                 8               8
    +
    +
    +

    This statement doesn’t make biological sense, but it is an example of a valid comparison between two columns.

    +
    +
    +
    trial_df.query('age >= initial_viral_load')
    +
    +
    +
    +
    +
            age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
     2   45                     36                  32       True                 6               9
     3   43                     31                  23      False                 5              12
     6   55                     31                  23      False                 4              24
     7   56                     50                  22      False                 4               6
     8   59                     33                  33      False                 5              26
     9   51                     30                  49       True                 7              21
    12   40                     34                  27       True                 8               6
    15   53                     47                  38       True                 7               6
    16   57                     41                  42       True                 8              16
    18   51                     42                  25      False                 2               9
    19   55                     46                  45      False                 1               9
    22   51                     27                  36      False                 2              24
    28   59                     40                  49      False                 5              19
    29   51                     43                  38       True                 8               8
    +
    +
    +
    +
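A column-to-column comparison in `.query()` corresponds to comparing two Series in a boolean mask. The sketch below uses hypothetical values and the keyword `and` inside the query string, then checks that it selects the same rows as the mask-based version.

```python
import pandas as pd

# Hypothetical values, not the trial data.
toy_df = pd.DataFrame(
    {
        "age": [45, 43, 40, 42],
        "initial_viral_load": [32, 23, 45, 57],
        "treatment": [True, False, True, True],
    }
)

# Query version: columns are compared by name inside the string.
via_query = toy_df.query("age >= initial_viral_load and treatment == True")

# Mask version: the same logic written with Series comparisons.
column_mask = toy_df["age"] >= toy_df["initial_viral_load"]
via_mask = toy_df.loc[column_mask & toy_df["treatment"]]

print(via_query.index.tolist())
```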
    +

    Q4: Calculate the average weeks_to_failure for the untreated population

    +
    +
    +
    # BEGIN SOLUTION NO PROMPT
    +
    +wanted_samples = trial_df.query('treatment == False')
    +
    +# END SOLUTION
    +
    +untreated_average_weeks = wanted_samples['weeks_to_failure'].mean()  # SOLUTION
    +
    +
    +
    +
    +
    +
    +
    print(f'Untreated participants took {untreated_average_weeks:0.1f} weeks to rebound.')
    +
    +
    +
    +
    +
    Untreated participants took 3.6 weeks to rebound.
    +
    +
    +
    +
    +
    +
    +
    print('untreated_average_weeks is a `float`:', isinstance(untreated_average_weeks, float))
    +
    +
    +
    +
    +
    untreated_average_weeks is a `float`: True
    +
    +
    +
    +
    +
    +
    +
    print(f'untreated_average_weeks = {untreated_average_weeks:0.1f}')
    +
    +
    +
    +
    +
    untreated_average_weeks = 3.6
    +
    +
    +
    +
    +
    +
    +

    Q5: Calculate the average weeks_to_failure for the treated population

    +
    +
    +
    # BEGIN SOLUTION NO PROMPT
    +
    +wanted_samples = trial_df.query('treatment == True')
    +
    +# END SOLUTION
    +
    +treated_average_weeks = wanted_samples['weeks_to_failure'].mean()  # SOLUTION
    +
    +
    +
    +
    +
    +
    +
    print(f'Treated patients took {treated_average_weeks:0.1f} weeks to rebound.')
    +
    +
    +
    +
    +
    Treated patients took 6.9 weeks to rebound.
    +
    +
    +
    +
    +
    +
    +
    print('treated_average_weeks is a `float`:', isinstance(treated_average_weeks, float))
    +
    +
    +
    +
    +
    treated_average_weeks is a `float`: True
    +
    +
    +
    +
    +
    +
    +
    print(f'treated_average_weeks = {treated_average_weeks:0.1f}')
    +
    +
    +
    +
    +
    treated_average_weeks = 6.9
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Conclusion

    +

    We can see that this treatment extended the average time off ART from about 3.6 weeks to about 6.9 weeks. +While not a complete cure, any incremental step is useful progress in the elimination of HIV.

    +

    In the lab you will use similar techniques to explore whether other factors in this dataset impact the results. +In future weeks we will explore statistical techniques to understand whether this difference is due to chance, or due to the effect of the treatment.

    +
    +
    + + + + +
    + + + + + + + + +
    + + + + + + +
    +
    + + +
    + + +
    +
    +
    + + + + + + + + \ No newline at end of file diff --git a/content/book_index.html b/content/book_index.html index 6d3bae2..4a7789c 100644 --- a/content/book_index.html +++ b/content/book_index.html @@ -195,6 +195,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/misc/about_this_book.html b/content/misc/about_this_book.html index f90c8f3..f879c8e 100644 --- a/content/misc/about_this_book.html +++ b/content/misc/about_this_book.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/content/misc/book_intro.html b/content/misc/book_intro.html index a7762bc..0880fd2 100644 --- a/content/misc/book_intro.html +++ b/content/misc/book_intro.html @@ -194,6 +194,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/genindex.html b/genindex.html index 87e9743..923c428 100644 --- a/genindex.html +++ b/genindex.html @@ -193,6 +193,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb b/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb index 51335f0..96bc54c 100644 --- a/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb +++ b/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb @@ -416,9 +416,9 @@ " import otter\n", "\n", "if not os.path.exists('walkthrough-tests'):\n", - " zip_files = [f for f in os.listdir()]\n", + " zip_files = [f for f in os.listdir() if f.endswith('.zip')]\n", " assert len(zip_files)>0, 'Could not find any zip files!'\n", - " assert len(zip_files)>1, 'Found multiple zip files!'\n", + " assert len(zip_files)==1, 'Found multiple zip files!'\n", " ! unzip {zip_files[0]}\n", "\n", "grader = otter.Notebook(colab=True,\n", diff --git a/jupyter_execute/content/Module02/Module02_walkthrough_book.ipynb b/jupyter_execute/content/Module02/Module02_walkthrough_book.ipynb new file mode 100644 index 0000000..d879fc2 --- /dev/null +++ b/jupyter_execute/content/Module02/Module02_walkthrough_book.ipynb @@ -0,0 +1,744 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a10e9828", + "metadata": { + "tags": [] + }, + "source": [ + "# Walkthrough\n", + "\n", + "Remember, all assignments are due before the weekly synchronous session." + ] + }, + { + "cell_type": "markdown", + "id": "719610dc", + "metadata": {}, + "source": [ + "## Learning Objectives\n", + "At the end of this learning activity you will be able to:\n", + "\n", + " - Use basic arithmetic operations in Python.\n", + " - Summarize the basic expression syntax in Python.\n", + " - Write an equation that uses the result of one variable to calculate the value of another. \n", + " - Create basic `f-strings` in Python to display dynamically created data.\n", + " - Summarize a general strategy for using Python to calculate dilutions." 
+ ] + }, + { + "cell_type": "markdown", + "id": "38ad9ba4", + "metadata": {}, + "source": [ + "## Programmatic Arithmetic in Python" + ] + }, + { + "cell_type": "markdown", + "id": "2528718d", + "metadata": {}, + "source": [ + "Often times in the lab we have common tasks that we repeat over and over again. \n", + "This can be anything from counting the number of cells on a plate, to normalizing values with a reference, to calculating dilutions for stock chemicals.\n", + "Automating these types of tasks can lead to drastic speedups in the time it takes to get common tasks done. \n", + "This week we'll use a common problem from molecular biology as our jumping off point into Python.\n", + "\n", + "Recently, my lab obtained a Nanopore MinION.\n", + "It is a 1000 dollar, USB-key sized DNA sequencer that reads millions of bases for about 100 dollars per sample.\n", + "As part of a Senior Design Project we used the device to track the COVID outbreak in the Drexel community using rapid sequencing.\n", + "Watch the video explaining the project in the Recommended Materials for more context.\n", + "This protocol requires numerous tedious calculations relating mass, moles, and concentrations.\n", + "This week we will explore how to use Python to automate these calculations." + ] + }, + { + "cell_type": "markdown", + "id": "bd70cef6", + "metadata": {}, + "source": [ + "The Nanopore sequencing protocol requires the operator to perform 3 enzymatic reactions:\n", + " 1. `End-Prep`: Prepare the 3' and 5' ends of the DNA by removing single-basepair overhangs and add a single `A` at the end of the molecule.\n", + " 2. `Barcode ligation`: Attach unique barcodes to each sample using a `T` overhang so each sample has an individual *key* at the start of the sequence.\n", + " 3. 
`Adapter ligation`: After pooling each sample, another DNA molecule (called an *adapter*) needs to be added so it can attach to the motor protein inside the Nanopore device.\n", + " \n", + "Refer to the online textbook for more detail." + ] + }, + { + "cell_type": "markdown", + "id": "280e4fda", + "metadata": {}, + "source": [ + "## The Problem" + ] + }, + { + "cell_type": "markdown", + "id": "b2b7eead", + "metadata": {}, + "source": [ + "Just like baking, when performing enzymatic reactions it is critical that we use the right amount of each ingredient.\n", + "The Nanopore enzymatic reagents come in prescribed amounts and it is up to the operator to ensure that the correct initial amount of template DNA is added to each reaction.\n", + "\n", + "The amount of template DNA needed for each reaction is listed in the protocol in [*moles*](https://en.wikipedia.org/wiki/Mole_(unit)).\n", + "Moles are a unit of \"amount\" such as the number of molecules of DNA, there are 6.022 × 10^(23) items in a mole.\n", + "However, we can't *count* the amount of DNA we have in a test-tube.\n", + "But, we can *weight* the DNA by looking at the amount of light absorbed by the sample using a device called a [Qubit](https://www.youtube.com/watch?v=RRKZN--7jqg).\n", + "Then, if we know the number of nucleotides in the strand, we can convert the weight of the DNA into a number of *moles*.\n", + "Refer to the course book for a in-depth review of math.\n", + "\n", + "Doing this calculation manually is tedious and prone to error. The perfect thing to automate." + ] + }, + { + "cell_type": "markdown", + "id": "35d4169c", + "metadata": {}, + "source": [ + "## Walkthrough" + ] + }, + { + "cell_type": "markdown", + "id": "c516dbb5", + "metadata": {}, + "source": [ + "We do this through a series of *expressions*.\n", + "Remember, the computer is not 'space limited' we should write code so WE understand it.\n", + "Not, try to make everything as short and compact as possible." 
+ ] + }, + { + "cell_type": "markdown", + "id": "add539e5", + "metadata": {}, + "source": [ + "Assume you have a 25 ul of a 280 bp double-stranded template at that you measured to be a concentration of 50.6 ng/ul." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9ff40df4", + "metadata": {}, + "outputs": [], + "source": [ + "# It is often useful to define all of your variables at the beginning.\n", + "amplicon_length = 280 # bp\n", + "dna_weight = 650 # g/mole/bp\n", + "dna_conc = 50.6 # ng/ul\n", + "volume = 25 # ul" + ] + }, + { + "cell_type": "markdown", + "id": "2f78b8ae", + "metadata": {}, + "source": [ + "## What is the template weight?" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "2ac5ff60", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The template weighs 182000 g/mole\n" + ] + } + ], + "source": [ + "template_weight = amplicon_length*dna_weight\n", + "print(f'The template weighs {template_weight} g/mole')" + ] + }, + { + "cell_type": "markdown", + "id": "aaa997e6", + "metadata": {}, + "source": [ + "## Q1: Calculate the molarity of the sample" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "b387af00", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The DNA molarity is 278.02197802197804 fmoles/ul\n" + ] + } + ], + "source": [ + "# Answer in fmoles/ul\n", + "\n", + "dna_molarity = dna_conc * 1E-9 / template_weight / 1E-15 # SOLUTION\n", + "print(f'The DNA molarity is {dna_molarity} fmoles/ul')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "7468753b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Is dna_molarity a float: True\n" + ] + } + ], + "source": [ + "print('Is dna_molarity a float:', isinstance(dna_molarity, float))" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "2b1265bc", + "metadata": 
{}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "dna_molarity = 278.0\n" + ] + } + ], + "source": [ + "print(f'dna_molarity = {dna_molarity:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "0538443b", + "metadata": {}, + "source": [ + "Some things to notice above:\n", + " 1. There's an `f` immediately before the `'`. This makes it a \"formatted\" string. Or `f-string`.\n", + " 2. There's a lot of different colors changing." + ] + }, + { + "cell_type": "markdown", + "id": "2310a821", + "metadata": {}, + "source": [ + "### `f-strings`\n", + "\n", + "These are a new (circa 2016) addition to Python that makes adding data into strings.\n", + "Representing our results as dynamically changing explanatory statements helps make our analysis more transparent and reproducible.\n", + "`f-strings` make this much easier.\n", + "\n", + "Take a look at this post from [The Python Guru](https://thepythonguru.com/python-string-formatting/) for an indepth explanation of the formatting." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6f7942e2", + "metadata": {}, + "source": [ + "### Linting through color\n", + "\n", + "If we look around our notebook, we can see that there are a lot of different text colors.\n", + "Those are hints at what Python thinks we're trying to tell it.\n", + "Understanding the code can really help with debugging.\n", + "\n", + "\n", + "Numbers are green.\n", + "```python\n", + "1231231\n", + "```\n", + "\n", + "Variables are black.\n", + "```python\n", + "val = 1231231\n", + "other = val\n", + "```\n", + "\n", + "Strings are orange.\n", + "```python\n", + "val = '1231231'\n", + "```\n", + "_Even if they are strings of numbers._\n", + "\n", + "`f-strings` are orange.\n", + "```python\n", + "val = f'1231231'\n", + "```\n", + "\n", + "\n", + "`f-strings` are orange, unless it is between `{` `}`.\n", + "```python\n", + "age = 12\n", + "val = f'This book is {age} years old.'\n", + "```\n", + "\n", + "The parts between curly braces are replaced by the value in the code.\n", + "\n", + "\n", + "Notice how imbalanced braces alters the color.\n", + "```python\n", + "age = 12\n", + "val = f'This book is {age years old.'\n", + "```\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "6d7c3cd8", + "metadata": {}, + "source": [ + "## Q2: Calculate the amount of sample to add.\n", + "\n", + "The protocol requires us to start with 200 fmoles of template DNA.\n", + "How many mircoliters of our stock do we need to start with?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "077c05a7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "You should add 0.72 ul of sample to your reaction.\n" + ] + } + ], + "source": [ + "# Answer in ul\n", + "\n", + "wanted_dna = 200 # fmoles\n", + "\n", + "volume_to_add = wanted_dna / dna_molarity # SOLUTION\n", + "\n", + "print(f'You should add {volume_to_add:0.2f} ul of sample to your reaction.')" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "999d7794", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Is volume_to_add a float: True\n" + ] + } + ], + "source": [ + "print('Is volume_to_add a float:', isinstance(volume_to_add, float))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "23f4c888", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "volume_to_add = 0.72\n" + ] + } + ], + "source": [ + "print(f'volume_to_add = {volume_to_add:0.2f}')" + ] + }, + { + "cell_type": "markdown", + "id": "a2e6b58e", + "metadata": { + "tags": [] + }, + "source": [ + "## Q3: Describing the reaction yield\n", + "\n", + "Calculating how much **total** amount of DNA we created during the PCR is called the _yield_ of the reaction.\n", + "\n", + "Create an `f-string` that renders the yield in femtomoles of this reaction. Round your answer to the nearest integer." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "419fd114", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The experiment yielded 6951 fmoles of DNA.\n" + ] + } + ], + "source": [ + "# Calculate the amount of DNA in the entire reaction\n", + "# Answer in fmoles\n", + "dna_yield = dna_molarity*volume # SOLUTION\n", + "\n", + "# Create an f-string that uses the dna_yield variable\n", + "# and describes the result in a short sentence\n", + "dna_yield_description = f'The experiment yielded {dna_yield:0.0f} fmoles of DNA.' # SOLUTION\n", + "\n", + "print(dna_yield_description)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "e6576ec2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Is dna_yield_description a str: True\n" + ] + } + ], + "source": [ + "print('Is dna_yield_description a str:', isinstance(dna_yield_description, str))" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "c7f9a1c7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Is the correct number in the description: True\n" + ] + } + ], + "source": [ + "print('Is the correct number in the description:', '6951' in dna_yield_description)" + ] + }, + { + "cell_type": "markdown", + "id": "5af9a301-0a1e-47ed-8360-8cba5370ae6d", + "metadata": {}, + "source": [ + "## Functions" + ] + }, + { + "cell_type": "markdown", + "id": "b04917b2-c287-45ca-950a-ed84d8a524a3", + "metadata": {}, + "source": [ + "Functions are self contained blocks of code created for a reusable purpose.\n", + "\n", + "**Purpose:**\n", + "* Modularity: Breaks down complex processes into smaller, manageable parts.\n", + "* Reusability: Allows the same code to be used multiple times without repetition.\n", + "* Organization: Makes the code more organized and easier to understand.\n", + "\n", + "\n", + "```python\n", + "def 
function_name(arg1, arg2, kwarg1=1, kwarg2='a'):\n", + " \"A brief function description\"\n", + "\n", + " # do something with inputs\n", + " result = arg1 + 2*arg2\n", + "\n", + " return result\n", + "\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "09b1a9c9-ef18-48be-a11c-29b3ce7f19ab", + "metadata": {}, + "source": [ + "Instead of continually copy-paste-and-change, we should write a function.\n", + "\n", + "We've been using something like this to calculate the molarity from the concentration.\n", + "\n", + "```python\n", + "dna_molarity = dna_conc * 1E-9 / template_weight / 1E-15 \n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "0338098a-34d3-4fb1-a150-7177effdb519", + "metadata": {}, + "outputs": [], + "source": [ + "def calc_molarity(sample_concentration, sample_length, base_weight=650):\n", + " \"\"\"Calculate molarity of samples.\n", + "\n", + " sample_concentration : ng/ul\n", + " sample_length : bases\n", + " base_weight : g/mole/bp\n", + "\n", + " returns molarity fmols/ul\n", + " \"\"\"\n", + "\n", + " nano = 1E-9\n", + " fempto = 1E-15\n", + "\n", + " amplicon_weight = sample_length*base_weight\n", + " molarity = sample_concentration * nano / amplicon_weight / fempto\n", + "\n", + " return molarity\n" + ] + }, + { + "cell_type": "markdown", + "id": "ecc78848-963b-4775-a5e4-1c25774af6ab", + "metadata": {}, + "source": [ + "Once created, we can use this function anywhere." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "bbd13b0d-ec1b-462a-9989-118dfc1fe04e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Function calculated paragon molarity 278.0 fmols/ul\n" + ] + } + ], + "source": [ + "paragon_molarity = calc_molarity(50.6, 280)\n", + "print(f'Function calculated paragon molarity {paragon_molarity:0.1f} fmols/ul')" + ] + }, + { + "cell_type": "markdown", + "id": "1c4aca50-9a5c-4f89-85a8-5abf3a6ca9d5", + "metadata": {}, + "source": [ + "Now, if we had another sample with a different concentration." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "d3f2afaf-186b-4647-b674-c5db087feb63", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Function calculated new molarity 827.5 fmols/ul\n" + ] + } + ], + "source": [ + "new_concentration = 150.6 # ng/ul\n", + "\n", + "new_paragon_molarity = calc_molarity(new_concentration, 280)\n", + "print(f'Function calculated new molarity {new_paragon_molarity:0.1f} fmols/ul')" + ] + }, + { + "cell_type": "markdown", + "id": "b36d47c0-9b06-4a18-9cba-f2167f5a4049", + "metadata": {}, + "source": [ + "Or, if *for some reason* you were making RNA, the `base_weight` would be different." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "3754737d-7d32-4dec-9e6a-1aa9df64ac9f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Function calculated rna molarity 1680.8 fmols/ul\n" + ] + } + ], + "source": [ + "rna_paragon_molarity = calc_molarity(new_concentration, 280, base_weight=320)\n", + "print(f'Function calculated rna molarity {rna_paragon_molarity:0.1f} fmols/ul')" + ] + }, + { + "cell_type": "markdown", + "id": "e0dcc246-60c5-49e0-8f67-5f392010d7b1", + "metadata": { + "tags": [] + }, + "source": [ + "## Q4: Write a function which calculates the reaction yield\n", + "\n", + "Use the function above as a template to create on that further calculates the reaction yield in `fmols`." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "de147208-c0ef-4383-a394-3340e64865e5", + "metadata": {}, + "outputs": [], + "source": [ + "def calc_yield(sample_concentration, sample_length, sample_volume, base_weight=650):\n", + " \"\"\"Calculate molarity of samples.\n", + "\n", + " sample_concentration : ng/ul\n", + " sample_length : bases\n", + " base_weight : g/mole/bp\n", + "\n", + " returns sample_yield in fmols\n", + " \"\"\"\n", + " # BEGIN SOLUTION NO PROMPT\n", + "\n", + " molarity = calc_molarity(sample_concentration, sample_length, base_weight=base_weight)\n", + " sample_yield = molarity*sample_volume\n", + "\n", + " return sample_yield\n", + "\n", + " # END SOLUTION\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "03d9d1c8-f1d4-4da2-a78d-145134ab5f59", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current reaction yield is 6950.5 fmols\n" + ] + } + ], + "source": [ + "current_yield = calc_yield(50.6, 280, 25)\n", + "print(f'Current reaction yield is {current_yield:0.1f} fmols')" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "1cef68ee-529b-4c25-b2a9-233d69939ddf", + 
"metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Testing calc_yield(50.6, 280, 25) = 6950.5\n" + ] + } + ], + "source": [ + "print(f'Testing calc_yield(50.6, 280, 25) = {calc_yield(50.6, 280, 25):0.1f}')" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "409348d9-f122-4d5e-8257-d0ad492c1d42", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Testing calc_yield(35, 263, 20, base_weight=320) = 26988.6\n" + ] + } + ], + "source": [ + "print(f'Testing calc_yield(35, 263, 20, base_weight=320) = {calc_yield(35, 77, 19, base_weight=320):0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "c7e25054-fc34-4393-975f-de23c2500122", + "metadata": {}, + "source": [ + "## Conclusion" + ] + }, + { + "cell_type": "markdown", + "id": "22b14f97-975d-466b-a595-16c6866f86e5", + "metadata": {}, + "source": [ + "In this walkthrough we have discussed a number of ways to perform basic math in Python.\n", + "We also covered strategies to modularize processes into reusable functions.\n", + "This week we worked with a 'one number at a time' strategy, in the next module we will explore using tables to work with multiple samples at the same time." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/jupyter_execute/content/Module03/Module03_walkthrough_book.ipynb b/jupyter_execute/content/Module03/Module03_walkthrough_book.ipynb new file mode 100644 index 0000000..b5ff137 --- /dev/null +++ b/jupyter_execute/content/Module03/Module03_walkthrough_book.ipynb @@ -0,0 +1,2996 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "da4cbf41", + "metadata": {}, + "source": [ + "# Module 03 Walkthrough\n", + "\n", + "Remember, all assignments are due before the synchronous session.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "d1b24089-7033-4965-ae63-de76bdf935a9", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "Get ready to dive into some data analysis as we explore the effectiveness of a hypothetical HIV treatment trial.\n", + "In this walkthrough, we have a dataset containing information from 30 people living with HIV (PLWH) who were randomly assigned to a treatment or control group.\n", + "After receiving the treatment, they stopped their ART and were monitored weekly for the number of weeks until their first \"detectable\" viral load was found.\n", + "We will use `Pandas` to analyze this data and evaluate the treatment's effectiveness.\n", + "By the end of this activity, you will be proficient in loading spreadsheet data into Python, creating derived columns in `DataFrames`, and using summary methods like sum, mean, and max.\n", + "Let's get started!" 
+ ] + }, + { + "cell_type": "markdown", + "id": "d728e12b", + "metadata": {}, + "source": [ + "## Learning Objectives\n", + "At the end of this learning activity you will be able to:\n", + " - Practice loading spreadsheet data into Python using `pandas`.\n", + " - Use Python methods to create derived columns in `pd.DataFrames`.\n", + " - Use `Pandas` summary methods like sum, mean, and max.\n", + " - Employ basic filtering and data extraction from `pandas`." + ] + }, + { + "cell_type": "markdown", + "id": "28d532d9", + "metadata": {}, + "source": [ + "## Dataset Reference\n", + "\n", + "_File_: `trial_data.csv`\n", + "\n", + "_Columns_:\n", + "\n", + " - `age` : (years) Current age during the study. \n", + " - `age_initial_infection` : (years) Age at which the participant was initially infected.\n", + " - `initial_viral_load` : (copies/ul) The level of infection at the start of the study.\n", + " - `treatment` : (boolean) `True` for participant in the treatment group, `False` for those in the control group.\n", + " - `weeks_to_failure` : (weeks) Time from the treatment to the first week of uncontrolled viral load.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "621cd2ef", + "metadata": {}, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "markdown", + "id": "917b592b", + "metadata": {}, + "source": [ + "While _basic_ Python can do a lot, you have to do everything yourself.\n", + "The **real** power of Python is that you can `import` code that is written by others.\n", + "\n", + "For this course, we will use a common data science stack of interoperable tools centered around the [Numpy](https://numpy.org/).\n", + "\n", + "There are four that we will use regularly, two of which we'll cover today." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cfb0afb0-6fe4-47c7-b044-75144973797c", + "metadata": {}, + "source": [ + "### Numpy\n", + "\n", + "[Numpy](https://numpy.org/)\n", + "\n", + "A numerical Python library that contains incredibly fast arrays, mathematical functions, and other useful utilities.\n", + "\n", + "By convention, the community tends to _alias_ the long `numpy` as `np`." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d5cc7c1d-b078-4555-a578-f862584233c4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np" + ] + }, + { + "cell_type": "markdown", + "id": "8152b253-6408-4f55-9505-672f597e23e7", + "metadata": {}, + "source": [ + "### Pandas\n", + "\n", + "[Pandas](https://pandas.pydata.org/)\n", + "\n", + "A library that sits atop `numpy` and provides a _spreadsheet_ style object called a `DataFrame` along with a plethora of data science utilities.\n", + "This is the main tool we will be using for data exploration.\n", + "\n", + "By convention, the community tends to _alias_ the long `pandas` as `pd`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a223d0-d0d2-471a-b5b8-a63a700eda75", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd" + ] + }, + { + "cell_type": "markdown", + "id": "519ff9d5", + "metadata": {}, + "source": [ + "Nicely, it can read `csv` files for us." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "4492bb2c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failure
    0552666False3
    1482666False4
    2453632True6
    3433123False5
    4402045True5
    5422057True9
    6553123False4
    7565022False4
    8593333False5
    9513049True7
    10552194False3
    11534285True5
    12403427True8
    13484199False3
    14564159False6
    15534738True7
    16574142True8
    17483357False4
    18514225False2
    19554645False1
    20432446False1
    21483799True8
    22512736False2
    23433448True7
    24514388False2
    25492076False5
    26544774False5
    27452587True5
    28594049False5
    29514338True8
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "10 55 21 94 False \n", + "11 53 42 85 True \n", + "12 40 34 27 True \n", + "13 48 41 99 False \n", + "14 56 41 59 False \n", + "15 53 47 38 True \n", + "16 57 41 42 True \n", + "17 48 33 57 False \n", + "18 51 42 25 False \n", + "19 55 46 45 False \n", + "20 43 24 46 False \n", + "21 48 37 99 True \n", + "22 51 27 36 False \n", + "23 43 34 48 True \n", + "24 51 43 88 False \n", + "25 49 20 76 False \n", + "26 54 47 74 False \n", + "27 45 25 87 True \n", + "28 59 40 49 False \n", + "29 51 43 38 True \n", + "\n", + " weeks_to_failure \n", + "0 3 \n", + "1 4 \n", + "2 6 \n", + "3 5 \n", + "4 5 \n", + "5 9 \n", + "6 4 \n", + "7 4 \n", + "8 5 \n", + "9 7 \n", + "10 3 \n", + "11 5 \n", + "12 8 \n", + "13 3 \n", + "14 6 \n", + "15 7 \n", + "16 8 \n", + "17 4 \n", + "18 2 \n", + "19 1 \n", + "20 1 \n", + "21 8 \n", + "22 2 \n", + "23 7 \n", + "24 2 \n", + "25 5 \n", + "26 5 \n", + "27 5 \n", + "28 5 \n", + "29 8 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df = pd.read_csv('trial_data.csv')\n", + "\n", + "# If a `DataFrame` is the last line, it will display a nice summary\n", + "trial_df" + ] + }, + { + "cell_type": "markdown", + "id": "31664b42", + "metadata": {}, + "source": [ + "And we should see that this exactly matches the table we saw in Excel." + ] + }, + { + "cell_type": "markdown", + "id": "1e653c16-ca8d-4641-ac5c-d81e549657ae", + "metadata": {}, + "source": [ + "The object we got back is called a `DataFrame`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "b8e9d2ba-70fa-4614-8dae-e70f0a2f0db1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pandas.core.frame.DataFrame" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(trial_df)" + ] + }, + { + "cell_type": "markdown", + "id": "9124b71d-4468-42ad-b98a-4412d553f369", + "metadata": {}, + "source": [ + "If we only want to see a small version of the `DataFrame` we can use the `.head()` _method_." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "075dbccc-d1cf-4127-bd61-e15ddab0f2ce", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failure
    0552666False3
    1482666False4
    2453632True6
    3433123False5
    4402045True5
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment weeks_to_failure\n", + "0 55 26 66 False 3\n", + "1 48 26 66 False 4\n", + "2 45 36 32 True 6\n", + "3 43 31 23 False 5\n", + "4 40 20 45 True 5" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "197cc06c-7490-4ea2-ab98-652c64a37c50", + "metadata": {}, + "source": [ + "## Acting on Columns" + ] + }, + { + "cell_type": "markdown", + "id": "75de8d1e", + "metadata": {}, + "source": [ + "We can reference each column by name using square brackets `[]`.\n", + "For example, we can extract the `age` column like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "cacc125e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 55\n", + "1 48\n", + "2 45\n", + "3 43\n", + "4 40\n", + "5 42\n", + "6 55\n", + "7 56\n", + "8 59\n", + "9 51\n", + "10 55\n", + "11 53\n", + "12 40\n", + "13 48\n", + "14 56\n", + "15 53\n", + "16 57\n", + "17 48\n", + "18 51\n", + "19 55\n", + "20 43\n", + "21 48\n", + "22 51\n", + "23 43\n", + "24 51\n", + "25 49\n", + "26 54\n", + "27 45\n", + "28 59\n", + "29 51\n", + "Name: age, dtype: int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df['age']" + ] + }, + { + "cell_type": "markdown", + "id": "c8d83ab1", + "metadata": {}, + "source": [ + "### Q1: Extract the `initial_viral_load` column?"
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f99e62ac", + "metadata": {}, + "outputs": [], + "source": [ + "init_vl = trial_df['initial_viral_load'] # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "68ab0587-f8db-46fa-b758-2320a9ec6858", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "init_vl is a `pd.Series`: True\n" + ] + } + ], + "source": [ + "print('init_vl is a `pd.Series`:', isinstance(init_vl, pd.Series))" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "640c9a7e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "init_vl_sum = 1628\n" + ] + } + ], + "source": [ + "print(f'init_vl_sum = {init_vl.sum()}')" + ] + }, + { + "cell_type": "markdown", + "id": "2cac446b", + "metadata": {}, + "source": [ + "Once we can extract columns, we can start summarizing them." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "48ce947a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The mean age of the population is 50.1 yrs.\n" + ] + } + ], + "source": [ + "age_col = trial_df['age']\n", + "age_mean = age_col.mean()\n", + "print(f'The mean age of the population is {age_mean:0.1f} yrs.')" + ] + }, + { + "cell_type": "markdown", + "id": "35eb614c", + "metadata": {}, + "source": [ + "Expressions can also be _chained_. \n", + "They are functionally the same, the only difference is aesthetic. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "1be80170", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The mean age of the population is 50.1 yrs, even when done on a single line.\n" + ] + } + ], + "source": [ + "age_mean_short = trial_df['age'].mean()\n", + "print(f'The mean age of the population is {age_mean_short:0.1f} yrs, even when done on a single line.')" + ] + }, + { + "cell_type": "markdown", + "id": "73927199", + "metadata": {}, + "source": [ + "### Q2: Calculate the average `weeks_to_failure` for the whole population?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "ba3fa20b", + "metadata": {}, + "outputs": [], + "source": [ + "average_weeks = trial_df['weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "e6176369", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "average_weeks = 4.9\n" + ] + } + ], + "source": [ + "print(f'average_weeks = {average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "8f948c7a-4e79-4083-9a76-ba7a040240c7", + "metadata": {}, + "source": [ + "We can also summarize an entire `DataFrame` with a single command." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "bd9ca277-1b6d-4d66-b000-b9c0e1973e38", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "age 50.133333\n", + "age_initial_infection 34.366667\n", + "initial_viral_load 54.266667\n", + "treatment 0.400000\n", + "weeks_to_failure 4.900000\n", + "dtype: float64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.mean()" + ] + }, + { + "cell_type": "markdown", + "id": "bac8ff72-1f06-4ad1-a484-fac4126cf4a3", + "metadata": {}, + "source": [ + "In this case the summary went _down_ the columns and calculated a mean for each." 
+ ] + }, + { + "cell_type": "markdown", + "id": "679c12f1-e5bb-42d5-9dcb-7e7634853f67", + "metadata": {}, + "source": [ + "There are a number of other summarization _methods_.\n", + " - `max()`\n", + " - `min()`\n", + " - `mode()`\n", + " - `median()`\n", + " - `var()`\n", + " - `std()`\n", + " - `nunique()`" + ] + }, + { + "cell_type": "markdown", + "id": "11f7825b-21ba-4a24-9316-b91047dc17b6", + "metadata": {}, + "source": [ + "```{note}\n", + ":class: dropdown\n", + "Methods are functions that are attached to an `object`.\n", + "They usually act on the object to provide a summary, perform a transformation, or otherwise utilize the information within the object.\n", + "In this case, these summarization methods utilize the information within the `DataFrame` to summarize each column.\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "142a4c50-7ac3-4db8-b08b-b0ad4a075b14", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadweeks_to_failure
    count30.00000030.00000030.00000030.000000
    mean50.13333334.36666754.2666674.900000
    std5.5692099.04198424.0702042.202663
    min40.00000020.00000022.0000001.000000
    25%45.75000026.25000036.5000003.250000
    50%51.00000034.00000048.5000005.000000
    75%55.00000041.75000072.0000006.750000
    max59.00000050.00000099.0000009.000000
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load weeks_to_failure\n", + "count 30.000000 30.000000 30.000000 30.000000\n", + "mean 50.133333 34.366667 54.266667 4.900000\n", + "std 5.569209 9.041984 24.070204 2.202663\n", + "min 40.000000 20.000000 22.000000 1.000000\n", + "25% 45.750000 26.250000 36.500000 3.250000\n", + "50% 51.000000 34.000000 48.500000 5.000000\n", + "75% 55.000000 41.750000 72.000000 6.750000\n", + "max 59.000000 50.000000 99.000000 9.000000" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.describe()" + ] + }, + { + "cell_type": "markdown", + "id": "e5b40fdd", + "metadata": {}, + "source": [ + "Selecting columns is nice.\n", + "We can also add a new column based on another one.\n", + "\n", + "In HIV research it is often important to know how long someone has been living with HIV.\n", + "However, this dataset contains their current age, and their age at infection.\n", + "We can use these two to calculate the length." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "7c162199", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    2453632True69
    3433123False512
    4402045True520
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "2 6 9 \n", + "3 5 12 \n", + "4 5 20 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# first make a new `Series`\n", + "years_infected = trial_df['age'] - trial_df['age_initial_infection']\n", + "\n", + "# Then add that series into the table\n", + "trial_df['years_infected'] = years_infected\n", + "trial_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "69cd190f-6c41-48e5-805c-5d3bde23a510", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    2453632True69
    3433123False512
    4402045True520
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "2 6 9 \n", + "3 5 12 \n", + "4 5 20 " + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Alternatively\n", + "trial_df['years_infected'] = trial_df['age'] - trial_df['age_initial_infection']\n", + "trial_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "3d5dc837-650c-44db-8aab-8675491b8049", + "metadata": {}, + "source": [ + "## Acting on Rows" + ] + }, + { + "cell_type": "markdown", + "id": "3f2dac9e-00f7-4442-8a36-20631e73f8f6", + "metadata": {}, + "source": [ + "### Indexing" + ] + }, + { + "cell_type": "markdown", + "id": "c38315cd", + "metadata": {}, + "source": [ + "When selecting rows, or rows and columns, we need to use the `.loc` attribute of the `DataFrame`.\n", + "\n", + "We can select by row number." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "85d1364b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "age 55\n", + "age_initial_infection 26\n", + "initial_viral_load 66\n", + "treatment False\n", + "weeks_to_failure 3\n", + "years_infected 29\n", + "Name: 0, dtype: object" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.loc[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "39614ebd-ee4b-46ab-9619-40b14ac66418", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    2453632True69
    3433123False512
    4402045True520
    5422057True922
    6553123False424
    7565022False46
    8593333False526
    9513049True721
    10552194False334
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "10 55 21 94 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "2 6 9 \n", + "3 5 12 \n", + "4 5 20 \n", + "5 9 22 \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 \n", + "10 3 34 " + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We can use a : to indicate a range.\n", + "trial_df.loc[0:10]" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "87473eee-abd9-4ca7-9f85-420103cf22c0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    5422057True922
    7565022False46
    13484199False37
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "5 42 20 57 True \n", + "7 56 50 22 False \n", + "13 48 41 99 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "5 9 22 \n", + "7 4 6 \n", + "13 3 7 " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We can provide an arbitrary list\n", + "trial_df.loc[[0, 5, 7, 13]]" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "24b190cc-d554-46ea-b7c3-ada85f89ede5", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    initial_viral_loadage
    06655
    55742
    72256
    139948
    \n", + "
    " + ], + "text/plain": [ + " initial_viral_load age\n", + "0 66 55\n", + "5 57 42\n", + "7 22 56\n", + "13 99 48" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We can also select columns at the same time.\n", + "trial_df.loc[[0, 5, 7, 13], ['initial_viral_load', 'age']]" + ] + }, + { + "cell_type": "markdown", + "id": "7110e944-753f-4e99-a6d4-c1a62c84ce40", + "metadata": {}, + "source": [ + "### Boolean Indexing" + ] + }, + { + "cell_type": "markdown", + "id": "2327261a-e23e-4f80-ab9c-61d9d593769a", + "metadata": {}, + "source": [ + "If we do not know the row number ahead of time, but instead want to select rows based on their values, we can use boolean indexing.\n", + "In this strategy, we create a new `pd.Series` of True/False values where True corresponds to the rows we want." + ] + }, + { + "cell_type": "markdown", + "id": "04de3217-aaea-445d-91c1-f55329913752", + "metadata": {}, + "source": [ + "Start by finding all people over 50 years old." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "0ddab93f-c9b6-4caa-b47d-37325c748b76", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    6553123False424
    7565022False46
    8593333False526
    9513049True721
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 " + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "age_mask = trial_df['age'] > 50\n", + "aged_samples = trial_df.loc[age_mask]\n", + "aged_samples.head()" + ] + }, + { + "cell_type": "markdown", + "id": "31e45076-4226-4f85-9731-21502452138f", + "metadata": {}, + "source": [ + "```{note}\n", + ":class: dropdown\n", + "I often use the suffix `_mask` when I create boolean indexes.\n", + "It is not required, but utilizing naming conventions makes your code easier to understand by yourself and others.\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "f82b318d-5429-4c16-aff9-1b8fb32d35db", + "metadata": {}, + "source": [ + "Now, if we also wanted to split by the initial_viral_load we might do:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "e3948cff-b118-4fa2-bb49-541567d43404", + "metadata": {}, + "outputs": [], + "source": [ + "high_vl_mask = trial_df['initial_viral_load'] > 50" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "467ede0c-c706-456e-8cb8-f6b257cdbf86", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    10552194False334
    11534285True511
    14564159False615
    24514388False28
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "10 55 21 94 False \n", + "11 53 42 85 True \n", + "14 56 41 59 False \n", + "24 51 43 88 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "10 3 34 \n", + "11 5 11 \n", + "14 6 15 \n", + "24 2 8 " + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "aged_high_vl = trial_df.loc[age_mask & high_vl_mask]\n", + "aged_high_vl.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "7e70c0ea-31ae-4315-81e7-080229ed1b6e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    6553123False424
    7565022False46
    8593333False526
    9513049True721
    15534738True76
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "15 53 47 38 True \n", + "\n", + " weeks_to_failure years_infected \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 \n", + "15 7 6 " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# ~ can be used to say \"not\"\n", + "aged_low_vl = trial_df.loc[age_mask & ~high_vl_mask]\n", + "aged_low_vl.head()" + ] + }, + { + "cell_type": "markdown", + "id": "45027741-6266-4ea5-86ce-fe20ef65baa3", + "metadata": {}, + "source": [ + "### Q3: Calculate the average weeks to failure for the treated population?" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "5e318b06-4063-4d00-b42a-c13c0b9b55d4", + "metadata": {}, + "outputs": [], + "source": [ + "treated_mask = trial_df['treatment'] == True # SOLUTION NO PROMPT\n", + "treated_average_weeks = trial_df.loc[treated_mask, 'weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "dd141501-a148-4168-b9bc-3448e7e60028", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "treated_average_weeks = 6.9\n" + ] + } + ], + "source": [ + "print(f'treated_average_weeks = {treated_average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "0673dd84-d614-4c05-afcd-941163c57608", + "metadata": {}, + "source": [ + "Utilizing boolean indexing you can express _any_ algorithmic row selecting strategy.\n", + "This can even include comparisons between rows, for example if there were multiple rows of the same sample.\n", + "We will cover these strategies later in the course." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c212312f-58e6-4354-8bb3-94df1bc669f2", + "metadata": {}, + "source": [ + "Sometimes, our searches are simple.\n", + "Pandas also includes another method for indexing rows called `.query()` for these purposes." + ] + }, + { + "cell_type": "markdown", + "id": "5b87435e-b89c-4cf5-be6b-526faf8469fd", + "metadata": {}, + "source": [ + "### Querying" + ] + }, + { + "cell_type": "markdown", + "id": "e1abf62a", + "metadata": {}, + "source": [ + "`.query()` is an interface that facilitates simple queries with a few specific limitations:\n", + " - It can only use the information present in the row.\n", + " - It can only work on one row at a time.\n", + " - Column headers cannot contain spaces, dots, dashes, commas, or emoji." + ] + }, + { + "cell_type": "markdown", + "id": "e6ea84aa-edc6-40e8-aebd-e8eedbfd59e8", + "metadata": {}, + "source": [ + "Our questions on this dataset easily fit within those constraints." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "19e82c53", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    2453632True69
    4402045True520
    5422057True922
    9513049True721
    11534285True511
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "2 45 36 32 True \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "9 51 30 49 True \n", + "11 53 42 85 True \n", + "\n", + " weeks_to_failure years_infected \n", + "2 6 9 \n", + "4 5 20 \n", + "5 9 22 \n", + "9 7 21 \n", + "11 5 11 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# All treatment rows\n", + "trial_df.query('treatment == True').head()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "c2ac06a0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    0552666False329
    1482666False422
    3433123False512
    6553123False424
    7565022False46
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "0 55 26 66 False \n", + "1 48 26 66 False \n", + "3 43 31 23 False \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "\n", + " weeks_to_failure years_infected \n", + "0 3 29 \n", + "1 4 22 \n", + "3 5 12 \n", + "6 4 24 \n", + "7 4 6 " + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.query('treatment == False').head()" + ] + }, + { + "cell_type": "markdown", + "id": "2d7c3caf", + "metadata": {}, + "source": [ + "You can also make them more complex." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "7a4fa71b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    2453632True69
    4402045True520
    5422057True922
    9513049True721
    11534285True511
    12403427True86
    15534738True76
    16574142True816
    21483799True811
    23433448True79
    27452587True520
    29514338True88
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "2 45 36 32 True \n", + "4 40 20 45 True \n", + "5 42 20 57 True \n", + "9 51 30 49 True \n", + "11 53 42 85 True \n", + "12 40 34 27 True \n", + "15 53 47 38 True \n", + "16 57 41 42 True \n", + "21 48 37 99 True \n", + "23 43 34 48 True \n", + "27 45 25 87 True \n", + "29 51 43 38 True \n", + "\n", + " weeks_to_failure years_infected \n", + "2 6 9 \n", + "4 5 20 \n", + "5 9 22 \n", + "9 7 21 \n", + "11 5 11 \n", + "12 8 6 \n", + "15 7 6 \n", + "16 8 16 \n", + "21 8 11 \n", + "23 7 9 \n", + "27 5 20 \n", + "29 8 8 " + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.query('age > 33 & treatment == True')" + ] + }, + { + "cell_type": "markdown", + "id": "8b2af46a", + "metadata": {}, + "source": [ + "The next query doesn't make \"biological sense\", but it shows that two columns can be compared to each other." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "af1fd110", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
    \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    ageage_initial_infectioninitial_viral_loadtreatmentweeks_to_failureyears_infected
    2453632True69
    3433123False512
    6553123False424
    7565022False46
    8593333False526
    9513049True721
    12403427True86
    15534738True76
    16574142True816
    18514225False29
    19554645False19
    22512736False224
    28594049False519
    29514338True88
    \n", + "
    " + ], + "text/plain": [ + " age age_initial_infection initial_viral_load treatment \\\n", + "2 45 36 32 True \n", + "3 43 31 23 False \n", + "6 55 31 23 False \n", + "7 56 50 22 False \n", + "8 59 33 33 False \n", + "9 51 30 49 True \n", + "12 40 34 27 True \n", + "15 53 47 38 True \n", + "16 57 41 42 True \n", + "18 51 42 25 False \n", + "19 55 46 45 False \n", + "22 51 27 36 False \n", + "28 59 40 49 False \n", + "29 51 43 38 True \n", + "\n", + " weeks_to_failure years_infected \n", + "2 6 9 \n", + "3 5 12 \n", + "6 4 24 \n", + "7 4 6 \n", + "8 5 26 \n", + "9 7 21 \n", + "12 8 6 \n", + "15 7 6 \n", + "16 8 16 \n", + "18 2 9 \n", + "19 1 9 \n", + "22 2 24 \n", + "28 5 19 \n", + "29 8 8 " + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trial_df.query('age >= initial_viral_load')" + ] + }, + { + "cell_type": "markdown", + "id": "e68592de", + "metadata": {}, + "source": [ + "### Q4: Calculate the average `weeks_to_failure` for the untreated population?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "cfcdf2f7", + "metadata": {}, + "outputs": [], + "source": [ + "# BEGIN SOLUTION NO PROMPT\n", + "\n", + "wanted_samples = trial_df.query('treatment == False')\n", + "\n", + "# END SOLUTION\n", + "\n", + "untreated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "5ea72ed6-01ee-48d4-800b-5d145f3517ad", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Untreated participants took 3.6 weeks to rebound.\n" + ] + } + ], + "source": [ + "print(f'Untreated participants took {untreated_average_weeks:0.1f} weeks to rebound.')" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "7f804ed1-a9c2-41a1-bc31-58035c91e967", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"untreated_average_weeks is a `float`: True\n" + ] + } + ], + "source": [ + "print('untreated_average_weeks is a `float`:', isinstance(untreated_average_weeks, float))" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "8f8d7324", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "untreated_average_weeks = 3.6\n" + ] + } + ], + "source": [ + "print(f'untreated_average_weeks = {untreated_average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "e87cce47", + "metadata": {}, + "source": [ + "### Q4: Calculate the average `weeks_to_failure` for the treated population?\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "2742d0bb", + "metadata": {}, + "outputs": [], + "source": [ + "# BEGIN SOLUTION NO PROMPT\n", + "\n", + "wanted_samples = trial_df.query('treatment == True')\n", + "\n", + "# END SOLUTION\n", + "\n", + "treated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "c6b2bfa0-8673-4666-adcd-31a1a574fd79", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Treated patients took 6.9 weeks to rebound.\n" + ] + } + ], + "source": [ + "print(f'Treated patients took {treated_average_weeks:0.1f} weeks to rebound.')" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "79786fec-0e7c-461e-9a65-0fac65d6870a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "treated_average_weeks is a `float`: True\n" + ] + } + ], + "source": [ + "print('treated_average_weeks is a `float`:', isinstance(treated_average_weeks, float))" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "ea73783c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "treated_average_weeks = 6.9\n" + ] + } + ], + "source": [ + 
"print(f'treated_average_weeks = {treated_average_weeks:0.1f}')" + ] + }, + { + "cell_type": "markdown", + "id": "5af4b494", + "metadata": {}, + "source": [ + "# Conclusion" + ] + }, + { + "cell_type": "markdown", + "id": "a2d1c3b8", + "metadata": {}, + "source": [ + "We can see that this treatment extended the average time off ART from ~3.6 weeks to ~6.9 weeks.\n", + "While not a complete cure, any incremental step is useful progress in the elimination of HIV.\n", + "\n", + "In the lab you will use similar techniques to explore whether other factors in this dataset impact the results.\n", + "In future weeks we will explore statistical techniques to understand whether this difference is due to chance or to the effect of the treatment." + ] + }, + { + "cell_type": "markdown", + "id": "493f93a7", + "metadata": {}, + "source": [ + "---------------------------------------------" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/makefile b/makefile new file mode 100644 index 0000000..4f50e3c --- /dev/null +++ b/makefile @@ -0,0 +1,7 @@ + +update_book: + cp -r ../applied_biostats/_book/book/_build/html/* . + cp -r ../applied_biostats/_book/book/_build/jupyter_execute . + +#deploy_book: + diff --git a/objects.inv b/objects.inv index 557396d..e46bdb3 100644 Binary files a/objects.inv and b/objects.inv differ diff --git a/search.html b/search.html index b979e3f..f5b4d53 100644 --- a/search.html +++ b/search.html @@ -195,6 +195,11 @@
  • Walkthrough
  • Nanopore Sequencing
  • Dilution calculations
  • + + +
  • Module 3: DataFrames
  • diff --git a/searchindex.js b/searchindex.js index c5e4a54..3c045aa 100644 --- a/searchindex.js +++ b/searchindex.js @@ -1 +1 @@ -Search.setIndex({"alltitles": {"About this book": [[8, "about-this-book"]], "Calculate a aerobic target heart rate?": [[1, "calculate-a-aerobic-target-heart-rate"]], "Cells": [[2, "cells"]], "Coding expectations": [[1, "coding-expectations"]], "Conclusion": [[4, "conclusion"]], "Dilution calculations": [[5, "dilution-calculations"]], "Don\u2019t be afraid to Restart & Run all": [[2, null]], "Functions": [[4, "functions"]], "Introduction": [[1, "introduction"], [9, "introduction"]], "Jupyter Notebooks": [[2, "jupyter-notebooks"]], "Learning Objectives": [[4, "learning-objectives"]], "Linting through color": [[4, "linting-through-color"]], "Markdown": [[1, "markdown"]], "Module 1: Hello World": [[0, "module-1-hello-world"]], "Module 2: Simple calculations": [[3, "module-2-simple-calculations"]], "Nanopore Sequencing": [[6, "nanopore-sequencing"]], "Notebook basics": [[2, "notebook-basics"]], "Otter Grader": [[1, "otter-grader"]], "Programmatic Arithmetic in Python": [[4, "programmatic-arithmetic-in-python"]], "Q1: Calculate the molarity of the sample": [[4, "q1-calculate-the-molarity-of-the-sample"]], "Q1: Using the information above, calculate the subject\u2019s heart rate reserve.": [[1, "q1-using-the-information-above-calculate-the-subject-s-heart-rate-reserve"]], "Q2: Calculate the amount of sample to add.": [[4, "q2-calculate-the-amount-of-sample-to-add"]], "Q3: Describing the reaction yield": [[4, "q3-describing-the-reaction-yield"]], "Q3: Using the information above, calculate the upper limit of the subject\u2019s target heart rate zone.": [[1, "q3-using-the-information-above-calculate-the-upper-limit-of-the-subject-s-target-heart-rate-zone"]], "Q4: Write a function which calculates the reaction yield": [[4, "q4-write-a-function-which-calculates-the-reaction-yield"]], "Quantitative Reasoning in Biology": [[7, 
"quantitative-reasoning-in-biology"]], "Quick introduction on cells and blocks": [[1, "quick-introduction-on-cells-and-blocks"]], "Session": [[2, "session"]], "Submissions": [[1, "submissions"]], "The Problem": [[4, "the-problem"]], "Try me": [[1, "try-me"]], "Walkthrough": [[1, "walkthrough"], [4, "walkthrough"], [4, "id1"]], "What is the template weight?": [[4, "what-is-the-template-weight"]], "Why Google Colab": [[1, "why-google-colab"]], "Why Python": [[1, "why-python"]], "f-strings": [[4, "f-strings"]]}, "docnames": ["content/Module01/Module01_book", "content/Module01/Module01_walkthrough_book", "content/Module01/notebook_actions", "content/Module02/Module02_book", "content/Module02/Module02_walkthrough_book", "content/Module02/dilution_calculations", "content/Module02/nanopore_description", "content/book_index", "content/misc/about_this_book", "content/misc/book_intro"], "envversion": {"sphinx": 61, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1}, "filenames": ["content/Module01/Module01_book.md", "content/Module01/Module01_walkthrough_book.ipynb", "content/Module01/notebook_actions.md", "content/Module02/Module02_book.md", "content/Module02/Module02_walkthrough_book.ipynb", "content/Module02/dilution_calculations.md", "content/Module02/nanopore_description.md", "content/book_index.md", "content/misc/about_this_book.md", "content/misc/book_intro.md"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [2, 4, 6], "0": [1, 4, 7], "02197802197804": 4, "022": 4, "03": 1, "0f": 4, "1": [1, 4], "10": 4, "100": [1, 4], "1000": 4, "12": [1, 2, 4], "1231231": 4, "13": 1, "15": 4, "150": 4, "1680": 4, "182000": 4, "19": 4, "1e": 4, "1f": 4, "1st": 7, "2": [1, 4], "20": 4, "200": 4, "2007": 
1, "2016": 4, "21": 2, "220": 1, "23": [1, 4], "25": 4, "263": 4, "26988": 4, "278": 4, "280": 4, "2f": 4, "3": [1, 4], "320": 4, "34": 1, "35": 4, "4": [1, 7], "5": [1, 4], "50": 4, "517": 8, "6": 4, "60": 1, "650": 4, "6950": 4, "6951": 4, "7": 1, "70": 1, "72": 4, "77": 4, "8": 4, "822714681440445": 1, "827": 4, "85": 1, "86": 1, "9": [1, 4], "A": [1, 2, 4], "And": [1, 5], "As": [2, 4], "At": [1, 4], "BY": 7, "But": 4, "By": 1, "For": [1, 2], "If": [1, 4, 5, 7], "In": [1, 2, 4], "It": [1, 2, 4, 8], "NO": 4, "NOT": 1, "Not": 4, "On": 2, "Or": [1, 4], "That": 1, "The": [1, 7], "Then": [1, 4], "There": [1, 2, 4], "These": [1, 2, 4], "Will": 7, "abl": [1, 4], "about": [1, 2, 4], "abov": 4, "abreast": 1, "absorb": 4, "abstract": 1, "access": 2, "accomplish": 1, "accordingli": 2, "across": 2, "act": 8, "action": 2, "activ": [1, 4], "ad": 4, "adapt": 4, "add": 1, "addit": 4, "administr": 1, "adult": 1, "advantag": 1, "after": [1, 4], "ag": [1, 4], "again": 4, "aim": 1, "algorithm": 1, "all": [1, 4], "allow": [1, 2, 4, 8], "alreadi": 1, "also": [1, 4, 8], "alter": 4, "alwai": 2, "amplicon_length": 4, "amplicon_weight": 4, "an": [1, 2, 4, 8], "anaconda": 1, "analysi": [1, 2, 4], "analyz": 1, "ani": [1, 2], "annot": 8, "anoth": [1, 4], "answer": [1, 4], "anyth": [1, 4], "anywher": 4, "appli": [7, 8], "ar": [1, 2, 4, 7], "arduou": 1, "arg1": 4, "arg2": 4, "around": 4, "assert": 1, "assign": [1, 4], "assum": 4, "attach": 4, "attribut": 7, "autom": 4, "avail": 1, "averag": 1, "awai": 1, "await": 2, "back": 1, "background": [2, 6, 8], "bake": 4, "barcod": 4, "base": [1, 2, 4], "base_weight": 4, "basepair": 4, "basic": [0, 1, 3, 4], "batteri": 1, "bblearn": 1, "beat": 1, "becaus": 1, "becom": [1, 2], "been": [1, 4, 5], "befor": [1, 2, 4], "begin": 4, "being": [1, 2], "below": 1, "berklei": 1, "better": 1, "between": [1, 4], "biolog": [1, 8], "biologi": [1, 4], "biostatist": [7, 8], "black": 4, "block": 4, "bmi": 1, "bold": 1, "book": [4, 7, 9], "both": [1, 2], "bp": [1, 4], 
"brace": 4, "break": 4, "brief": [1, 4], "briefli": 6, "browser": [1, 2], "bullet": 1, "button": 2, "bypass": 1, "calc_molar": 4, "calc_yield": 4, "call": [1, 2, 4], "can": [1, 2, 4, 5], "cannot": [1, 2], "captur": 1, "carri": 1, "case": 2, "cc": 7, "cell": 4, "chang": [1, 4], "chapter": [0, 1, 3], "check": 1, "check_al": 1, "chemic": 4, "circa": 4, "class": [1, 2], "click": 1, "clinic": 1, "cloud": 2, "code": [2, 4], "colab": [0, 2], "collect": 1, "colleg": 7, "com": [1, 6], "come": [1, 2, 4], "command": 2, "common": [1, 4, 7], "commun": 4, "compact": 4, "compani": 2, "companion": [7, 8], "compat": 2, "complet": [1, 2], "complex": [1, 4], "compris": 1, "comput": [1, 2, 4], "concentr": 4, "concept": [1, 2], "condit": 1, "connect": 2, "consid": 1, "consumpt": 7, "contact": 7, "contain": [2, 4], "content": [2, 5, 7, 8], "context": [4, 8], "continu": 4, "convers": 5, "convert": 4, "copi": 4, "corner": 1, "correct": 4, "correctli": 1, "could": 1, "count": [1, 4], "cours": [1, 2, 4, 7, 8], "cover": [1, 2, 4], "covid": 4, "creat": [1, 2, 4], "creativ": 7, "critic": [1, 4], "ctrl": 2, "curli": 4, "current": 4, "current_yield": 4, "dai": 1, "dampier": 7, "data": [1, 2, 4], "dataset": [1, 2, 8], "debug": 4, "decis": 1, "def": 4, "defin": 4, "delet": 2, "depth": 4, "describ": [1, 6], "descript": 4, "design": 4, "desir": 1, "detail": 4, "determin": 1, "develop": [1, 7], "devic": 4, "didn": 1, "differ": 4, "difficult": [1, 2], "difficulti": 1, "dilut": 4, "disconnect": 2, "discuss": [3, 4], "displai": 4, "dna": [4, 5], "dna_conc": 4, "dna_molar": 4, "dna_weight": 4, "dna_yield": 4, "dna_yield_descript": 4, "do": [1, 3, 4], "dollar": 4, "done": [2, 4, 5], "doubl": [1, 4], "down": [1, 4], "download": [1, 2], "dozen": 1, "drastic": 4, "drexel": [4, 7, 8], "dropdown": 1, "due": [1, 4], "dure": [1, 4], "dynam": 4, "each": [1, 2, 4], "easi": 1, "easier": 4, "edit": [1, 2, 7], "educ": 1, "effect": 2, "either": 2, "emerg": 1, "empti": 2, "encod": 2, "end": [1, 4], "endswith": 1, 
"ensur": [1, 2, 4], "enter": 2, "entir": 4, "environ": 1, "enzymat": 4, "equat": 4, "error": 4, "estim": 1, "even": [1, 4], "everyon": 1, "everyth": [2, 4], "evolv": 8, "exampl": 1, "excel": 1, "except": 1, "execut": [1, 2], "exercis": 1, "exist": 1, "expand": 8, "experi": [1, 4], "explain": 4, "explan": [4, 5], "explanatori": 4, "explor": 4, "explos": 1, "express": [1, 4], "extens": 2, "extra": 1, "f": 1, "face": 1, "fact": 1, "familiar": 1, "featur": 1, "fempto": 4, "femtomol": 4, "field": 1, "figur": 1, "file": [1, 2], "filterwarn": 1, "find": 1, "finish": 1, "first": [1, 2], "fix": [1, 2], "flavor": 1, "float": 4, "fmol": 4, "fmole": 4, "focus": 5, "follow": [1, 7], "footnot": 1, "form": 1, "format": [1, 2, 4], "found": 1, "frame": 1, "free": [1, 2, 7], "freeli": 1, "fresh": 2, "freshli": 2, "from": [1, 2, 4], "full": 1, "function": [1, 8], "function_nam": 4, "further": 4, "futur": 1, "g": 4, "gener": [1, 4], "get": [1, 2, 4], "give": [1, 2], "go": 1, "googl": [0, 2], "goolg": 1, "grace": 1, "grade": 1, "green": 4, "guru": 4, "ha": [1, 4, 5], "had": 4, "hand": 1, "have": [1, 2, 4, 6, 7], "healthi": 1, "heart_rate_reserv": 1, "height": 1, "hello": 1, "help": [1, 4], "her": 1, "here": 1, "hint": 4, "hipaa": 2, "hit": 1, "hold": 1, "hour": 2, "how": [1, 3, 4], "howev": [1, 2, 4], "hrr": 1, "html": 1, "http": [1, 6], "hundr": 1, "hurdl": 1, "hyperlink": 1, "hypothesi": 8, "i": [1, 2, 5, 7], "ideal": 1, "ignor": 1, "imag": 2, "imbalanc": 4, "immedi": 4, "immunologi": 8, "import": [1, 2], "importerror": 1, "includ": 1, "incorrect": 2, "incred": 1, "independ": 2, "indepth": 4, "individu": 4, "inferenti": 1, "inferentialthink": 1, "inform": 2, "ingredi": 4, "initi": [1, 4], "input": 4, "insid": 4, "instal": [1, 2], "instead": [1, 4], "instruct": 1, "insurmount": 1, "integ": 4, "intens": 1, "interact": [1, 2, 8], "interfac": 1, "intern": 7, "interpret": 2, "introduc": 0, "ipynb": 1, "isinst": 4, "isn": 1, "issu": [1, 2], "italic": 1, "item": 4, "itself": 2, "julia": 2, 
"jump": 4, "jupyt": 1, "jupyterlab": 1, "just": [1, 2, 4], "kei": 4, "kernel": 1, "kg": 1, "know": [2, 4], "kwarg1": 4, "kwarg2": 4, "lab": 4, "languag": [1, 2], "larg": [1, 2], "larger": 1, "last": 1, "lastli": [1, 8], "later": 1, "launch": 1, "lead": 4, "learn": 1, "left": 1, "len": 1, "less": 2, "let": 1, "licens": 7, "ligat": 4, "light": 4, "like": [1, 2, 4, 7], "limit": [2, 4], "line": 1, "link": [1, 2, 5], "list": [1, 4], "listdir": 1, "ll": [1, 2, 4], "load": [1, 2], "log": 2, "look": [1, 4, 5], "loop": 1, "lot": 4, "m": 2, "make": 4, "manag": 4, "mani": [1, 2, 4], "manual": 4, "markdown": 2, "mass": 4, "materi": 4, "math": [1, 3, 4], "maximum": 1, "mayb": 2, "mayo": 1, "me": 7, "measur": 4, "medicin": 7, "menu": [1, 2], "meter": 1, "method": 1, "microbiologi": 8, "miim": 8, "million": 4, "minion": 4, "minut": 1, "mircolit": 4, "mistak": [1, 2], "modif": 1, "modul": 4, "modular": 4, "mole": 4, "molecul": 4, "molecular": 4, "more": [1, 2, 4, 5], "morn": 1, "most": [1, 2], "motor": 4, "move": 1, "much": 4, "multipl": [1, 4], "multipli": 1, "must": 1, "my": 4, "name": 1, "nano": 4, "nanopor": 4, "natur": 1, "nc": 7, "nd": 7, "nearest": 4, "neb": 5, "need": [0, 1, 2, 4, 5], "never": 2, "new": [2, 4], "new_concentr": 4, "new_paragon_molar": 4, "newest": 1, "next": [1, 4], "ng": 4, "nice": 1, "noderiv": 7, "noncommerci": 7, "normal": [1, 4], "notebook": [1, 4], "notepad": [1, 2], "notic": [1, 4], "now": [1, 4], "nucleotid": 4, "number": [1, 4], "numer": 4, "o": 1, "obtain": 4, "ocassion": 2, "off": [1, 4], "often": [2, 4], "oftentim": 2, "okai": 2, "old": [1, 4], "onc": [1, 2, 4], "one": [1, 2, 4], "onli": [1, 2], "onlin": [1, 4], "open": [1, 2], "oper": 4, "option": 2, "orang": 4, "order": [1, 2], "organ": 4, "origin": 2, "other": [1, 2, 4], "our": [1, 2, 4], "out": 1, "outbreak": 4, "output": 1, "over": 4, "overhang": 4, "overwrit": 2, "own": [1, 2], "packag": 1, "page": 6, "paragon": 4, "paragon_molar": 4, "part": 4, "particular": 1, "past": [2, 4, 8], "path": 
1, "pcr": 4, "peopl": 1, "per": [1, 4], "perfect": 4, "perfectli": 1, "perform": 4, "phrase": 1, "pip": 1, "place": 8, "plai": 2, "plain": [1, 2], "plan": 2, "plate": 4, "plu": 1, "point": 4, "pool": 4, "pose": 1, "possibl": [2, 4], "post": 4, "potenti": 2, "power": [1, 2], "precis": 1, "preload": 1, "prep": 4, "prepar": 4, "prescrib": 4, "previou": 1, "print": [1, 4], "prism": 1, "problem": [1, 2, 8], "process": [1, 4, 6], "program": [1, 2], "progress": 1, "project": 4, "prompt": 4, "prone": 4, "proper": 8, "protect": 2, "protein": 4, "protocol": 4, "provid": 1, "purpos": [1, 2, 4], "put": 1, "python": [2, 3], "q": 1, "qubit": 4, "question": 1, "quick": 5, "r": 2, "rang": 1, "rapid": 4, "rcp85jhlmni": 6, "re": [1, 2, 4], "read": 4, "readi": 2, "reagent": 4, "realli": 4, "reason": 4, "recent": [1, 2, 4], "recommend": 4, "refer": [4, 8], "refresh": 5, "relat": 4, "relev": 2, "rememb": [1, 2, 4], "remov": 4, "render": [2, 4], "repeat": 4, "repetit": 4, "replac": 4, "repres": 4, "reproduc": 4, "requir": [1, 4], "research": 1, "respond": 2, "rest": 1, "restart": 1, "resting_heart_r": 1, "result": [1, 4], "return": [1, 4], "reusabl": 4, "review": [4, 5], "right": [1, 4], "rigor": 1, "rna": 4, "rna_paragon_molar": 4, "round": 4, "run": 1, "runtim": 2, "said": 1, "same": [1, 4], "sample_concentr": 4, "sample_length": 4, "sample_volum": 4, "sample_yield": 4, "save": 1, "scienc": 1, "screen": 1, "searchabl": 8, "second": [1, 2], "secreti": 2, "section": 2, "secur": 2, "see": [1, 4], "self": 4, "send": 2, "senior": 4, "sensit": 2, "sent": 2, "sentenc": 4, "sequenc": 4, "seri": [1, 2, 4], "servic": 2, "session": [1, 4], "set": 1, "setup": 1, "share": 2, "shift": [1, 2], "short": 4, "shortcut": 2, "should": [1, 2, 4], "similar": 2, "simpl": 1, "sinc": [1, 5], "singl": 4, "size": 4, "skeleton": 1, "skill": 1, "small": 1, "smaller": 4, "so": [2, 4], "softwar": [1, 2], "solut": [1, 4], "solv": 1, "some": [1, 2, 4, 5, 6], "someth": [1, 4], "sometim": 2, "somewher": 1, "space": 4, 
"spawn": 1, "special": 2, "speedup": 4, "spin": 1, "spreadsheet": 1, "stai": 1, "start": [1, 2, 4], "statement": 4, "statist": 1, "step": 1, "still": 1, "stock": 4, "str": 4, "strand": 4, "strategi": 4, "structur": 1, "studi": 1, "stumbl": 1, "sublist": 1, "submiss": 2, "submit": 1, "subtract": 1, "success": 0, "suggest": 1, "summar": [1, 4], "synchron": [1, 4], "syntax": [1, 4], "system": [1, 2], "t": [1, 4], "tabl": [1, 2, 4], "tag": 2, "take": [1, 2, 4], "talk": [1, 2], "task": [1, 4], "taught": 1, "teach": 1, "techniqu": 1, "technologi": 1, "tediou": 4, "tell": [1, 4], "template_weight": 4, "tend": 5, "test": [1, 2, 4], "tests_dir": 1, "text": [1, 2, 4], "textbook": [1, 4, 7], "than": 2, "thei": 4, "them": [1, 2], "themselv": 2, "therebi": 1, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 9], "thing": [1, 2, 4], "think": [1, 4], "those": [1, 4], "through": 1, "throughout": 1, "time": [1, 4], "too": [1, 2], "tool": [0, 1], "top": 1, "topic": 1, "total": 4, "track": 4, "transpar": 4, "troubl": 1, "true": [1, 4], "try": 4, "tube": 4, "twice": 1, "two": [1, 2], "type": [1, 2, 4], "u": [1, 4], "uc": 1, "ul": 4, "under": 7, "underneath": 1, "understand": [2, 4], "undo": 2, "uniqu": 4, "unit": [4, 5], "univers": 7, "unless": 4, "unwieldi": 1, "unzip": 1, "up": [1, 4], "upload": [1, 2], "upon": 8, "upper_target_zon": 1, "us": [2, 3, 4, 5, 7], "usb": 4, "usual": 1, "v": 6, "val": 4, "valid": 1, "valu": [1, 4], "variabl": [1, 4], "ve": [1, 4, 5], "veri": 1, "version": 2, "video": [4, 6], "vigor": 1, "virtual": 2, "visual": 1, "volum": 4, "volume_to_add": 4, "wa": 1, "wai": [1, 4], "want": 2, "wanted_dna": 4, "warn": 1, "watch": [4, 6], "we": [1, 2, 4], "week": [1, 4], "weekli": [1, 4], "weigh": 4, "weight": 1, "were": 4, "what": 1, "when": [1, 2, 4, 5], "which": [1, 2], "while": [2, 5], "within": [2, 8], "without": [2, 4], "woman": 1, "word": 1, "wordpad": 2, "work": [1, 2, 4], "world": 1, "would": [1, 4, 7], "write": 1, "written": 2, "www": [1, 6], "x": [1, 2], "y": 1, "year": [1, 4], 
"you": [0, 1, 2, 4, 5, 7, 8], "young": 1, "your": [1, 2, 4, 7], "yourself": [1, 2], "youtub": 6, "z": 1, "zip": 1, "zip_fil": 1}, "titles": ["Module 1: Hello World", "Walkthrough", "Notebook basics", "Module 2: Simple calculations", "Walkthrough", "Dilution calculations", "Nanopore Sequencing", "Quantitative Reasoning in Biology", "About this book", "Introduction"], "titleterms": {"": 1, "1": 0, "2": 3, "The": 4, "about": 8, "abov": 1, "add": 4, "aerob": 1, "afraid": 2, "all": 2, "amount": 4, "arithmet": 4, "basic": 2, "biologi": 7, "block": 1, "book": 8, "calcul": [1, 3, 4, 5], "cell": [1, 2], "code": 1, "colab": 1, "color": 4, "conclus": 4, "describ": 4, "dilut": 5, "don": 2, "expect": 1, "f": 4, "function": 4, "googl": 1, "grader": 1, "heart": 1, "hello": 0, "i": 4, "inform": 1, "introduct": [1, 9], "jupyt": 2, "learn": 4, "limit": 1, "lint": 4, "markdown": 1, "me": 1, "modul": [0, 3], "molar": 4, "nanopor": 6, "notebook": 2, "object": 4, "otter": 1, "problem": 4, "programmat": 4, "python": [1, 4], "q1": [1, 4], "q2": 4, "q3": [1, 4], "q4": 4, "quantit": 7, "quick": 1, "rate": 1, "reaction": 4, "reason": 7, "reserv": 1, "restart": 2, "run": 2, "sampl": 4, "sequenc": 6, "session": 2, "simpl": 3, "string": 4, "subject": 1, "submiss": 1, "t": 2, "target": 1, "templat": 4, "thi": 8, "through": 4, "try": 1, "upper": 1, "us": 1, "walkthrough": [1, 4], "weight": 4, "what": 4, "which": 4, "why": 1, "world": 0, "write": 4, "yield": 4, "zone": 1}}) \ No newline at end of file +Search.setIndex({"alltitles": {"About this book": [[10, "about-this-book"]], "Acting on Columns": [[8, "acting-on-columns"]], "Acting on Rows": [[8, "acting-on-rows"]], "Boolean Indexing": [[8, "boolean-indexing"]], "Calculate a aerobic target heart rate?": [[1, "calculate-a-aerobic-target-heart-rate"]], "Cells": [[2, "cells"]], "Coding expectations": [[1, "coding-expectations"]], "Conclusion": [[4, "conclusion"], [8, "conclusion"]], "Dataset Reference": [[8, "dataset-reference"]], "Dilution 
calculations": [[5, "dilution-calculations"]], "Don\u2019t be afraid to Restart & Run all": [[2, null]], "Functions": [[4, "functions"]], "Imports": [[8, "imports"]], "Indexing": [[8, "indexing"]], "Introduction": [[1, "introduction"], [8, "introduction"], [11, "introduction"]], "Jupyter Notebooks": [[2, "jupyter-notebooks"]], "Learning Objectives": [[4, "learning-objectives"], [8, "learning-objectives"]], "Linting through color": [[4, "linting-through-color"]], "Markdown": [[1, "markdown"]], "Module 03 Walkthrough": [[8, "module-03-walkthrough"]], "Module 1: Hello World": [[0, "module-1-hello-world"]], "Module 2: Simple calculations": [[3, "module-2-simple-calculations"]], "Module 3: DataFrames": [[7, "module-3-dataframes"]], "Nanopore Sequencing": [[6, "nanopore-sequencing"]], "Notebook basics": [[2, "notebook-basics"]], "Numpy": [[8, "numpy"]], "Otter Grader": [[1, "otter-grader"]], "Pandas": [[8, "pandas"]], "Programmatic Arithmetic in Python": [[4, "programmatic-arithmetic-in-python"]], "Q1: Calculate the molarity of the sample": [[4, "q1-calculate-the-molarity-of-the-sample"]], "Q1: Extract the initial_viral_load column ?": [[8, "q1-extract-the-initial-viral-load-column"]], "Q1: Using the information above, calculate the subject\u2019s heart rate reserve.": [[1, "q1-using-the-information-above-calculate-the-subject-s-heart-rate-reserve"]], "Q2: Calculate the amount of sample to add.": [[4, "q2-calculate-the-amount-of-sample-to-add"]], "Q2: Calculate the average weeks_to_failure for the whole population?": [[8, "q2-calculate-the-average-weeks-to-failure-for-the-whole-population"]], "Q3: Calculate the average weeks to failure for the treated population?": [[8, "q3-calculate-the-average-weeks-to-failure-for-the-treated-population"]], "Q3: Describing the reaction yield": [[4, "q3-describing-the-reaction-yield"]], "Q3: Using the information above, calculate the upper limit of the subject\u2019s target heart rate zone.": [[1, 
"q3-using-the-information-above-calculate-the-upper-limit-of-the-subject-s-target-heart-rate-zone"]], "Q4: Calculate the average weeks_to_failure for the treated population?": [[8, "q4-calculate-the-average-weeks-to-failure-for-the-treated-population"]], "Q4: Calculate the average weeks_to_failure for the untreated population?": [[8, "q4-calculate-the-average-weeks-to-failure-for-the-untreated-population"]], "Q4: Write a function which calculates the reaction yield": [[4, "q4-write-a-function-which-calculates-the-reaction-yield"]], "Quantitative Reasoning in Biology": [[9, "quantitative-reasoning-in-biology"]], "Querying": [[8, "querying"]], "Quick introduction on cells and blocks": [[1, "quick-introduction-on-cells-and-blocks"]], "Session": [[2, "session"]], "Submissions": [[1, "submissions"]], "The Problem": [[4, "the-problem"]], "Try me": [[1, "try-me"]], "Walkthrough": [[1, "walkthrough"], [4, "walkthrough"], [4, "id1"]], "What is the template weight?": [[4, "what-is-the-template-weight"]], "Why Google Colab": [[1, "why-google-colab"]], "Why Python": [[1, "why-python"]], "f-strings": [[4, "f-strings"]]}, "docnames": ["content/Module01/Module01_book", "content/Module01/Module01_walkthrough_book", "content/Module01/notebook_actions", "content/Module02/Module02_book", "content/Module02/Module02_walkthrough_book", "content/Module02/dilution_calculations", "content/Module02/nanopore_description", "content/Module03/Module03_book", "content/Module03/Module03_walkthrough_book", "content/book_index", "content/misc/about_this_book", "content/misc/book_intro"], "envversion": {"sphinx": 61, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1}, "filenames": ["content/Module01/Module01_book.md", 
"content/Module01/Module01_walkthrough_book.ipynb", "content/Module01/notebook_actions.md", "content/Module02/Module02_book.md", "content/Module02/Module02_walkthrough_book.ipynb", "content/Module02/dilution_calculations.md", "content/Module02/nanopore_description.md", "content/Module03/Module03_book.md", "content/Module03/Module03_walkthrough_book.ipynb", "content/book_index.md", "content/misc/about_this_book.md", "content/misc/book_intro.md"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [2, 4, 6, 8], "0": [1, 4, 8, 9], "000000": 8, "02197802197804": 4, "022": 4, "03": 1, "041984": 8, "070204": 8, "0f": 4, "1": [1, 4, 8], "10": [4, 8], "100": [1, 4], "1000": 4, "11": 8, "12": [1, 2, 4, 8], "1231231": 4, "13": [1, 8], "133333": 8, "14": 8, "15": [4, 8], "150": 4, "16": 8, "1628": 8, "1680": 4, "17": 8, "18": 8, "182000": 4, "19": [4, 8], "1e": 4, "1f": [4, 8], "1st": 9, "2": [1, 4, 8], "20": [4, 8], "200": 4, "2007": 1, "2016": 4, "202663": 8, "21": [2, 8], "22": 8, "220": 1, "23": [1, 4, 8], "24": 8, "25": [4, 8], "250000": 8, "26": 8, "263": 4, "266667": 8, "26988": 4, "27": 8, "278": 4, "28": 8, "280": 4, "29": 8, "2f": 4, "3": [1, 4, 8], "30": 8, "31": 8, "32": 8, "320": 4, "33": 8, "34": [1, 8], "35": 4, "36": 8, "366667": 8, "37": 8, "38": 8, "4": [1, 8, 9], "40": 8, "400000": 8, "41": 8, "42": 8, "43": 8, "45": 8, "46": 8, "47": 8, "48": 8, "49": 8, "5": [1, 4, 8], "50": [4, 8], "500000": 8, "51": 8, "517": 10, "53": 8, "54": 8, "55": 8, "56": 8, "569209": 8, "57": 8, "59": 8, "6": [4, 8], "60": 1, "650": 4, "66": 8, "6950": 4, "6951": 4, "7": [1, 8], "70": 1, "72": [4, 8], "74": 8, "75": 8, "750000": 8, "76": 8, "77": 4, "8": [4, 8], "822714681440445": 1, "827": 4, "85": [1, 8], "86": 1, "87": 8, "88": 8, "9": [1, 4, 8], "900000": 8, "94": 8, "99": 8, "A": [1, 2, 4, 8], "And": [1, 5, 8], "As": [2, 4], "At": [1, 4, 8], "BY": 9, "But": 4, "By": [1, 8], "For": [1, 2, 8], "If": [1, 4, 5, 8, 9], "In": [1, 2, 4, 8], "It": [1, 
2, 4, 8, 10], "NO": [4, 8], "NOT": 1, "Not": 4, "On": 2, "Or": [1, 4], "That": 1, "The": [1, 8, 9], "Then": [1, 4, 8], "There": [1, 2, 4, 8], "These": [1, 2, 4], "Will": 9, "_mask": 8, "abl": [1, 4, 8], "about": [1, 2, 4], "abov": 4, "abreast": 1, "absorb": 4, "abstract": 1, "access": 2, "accomplish": 1, "accordingli": 2, "across": 2, "act": 10, "action": 2, "activ": [1, 4, 8], "ad": 4, "adapt": 4, "add": [1, 8], "addit": 4, "administr": 1, "adult": 1, "advantag": 1, "aesthet": 8, "after": [1, 4, 8], "ag": [1, 4, 8], "again": 4, "age_col": 8, "age_initial_infect": 8, "age_mask": 8, "age_mean": 8, "age_mean_short": 8, "aged_high_vl": 8, "aged_low_vl": 8, "aged_sampl": 8, "ahead": 8, "aim": 1, "algorithm": [1, 8], "alia": 8, "all": [1, 4, 8], "allow": [1, 2, 4, 10], "along": 8, "alreadi": 1, "also": [1, 4, 8, 10], "alter": 4, "altern": 8, "alwai": 2, "amplicon_length": 4, "amplicon_weight": 4, "an": [1, 2, 4, 8, 10], "anaconda": 1, "analysi": [1, 2, 4, 8], "analyz": [1, 8], "ani": [1, 2, 8], "annot": 10, "anoth": [1, 4, 8], "answer": [1, 4], "anyth": [1, 4], "anywher": 4, "appli": [9, 10], "ar": [1, 2, 4, 8, 9], "arbitrari": 8, "arduou": 1, "arg1": 4, "arg2": 4, "around": [4, 8], "arrai": 8, "art": 8, "assert": 1, "assign": [1, 4, 8], "assum": 4, "atop": 8, "attach": [4, 8], "attribut": [8, 9], "autom": 4, "avail": 1, "averag": 1, "average_week": 8, "awai": 1, "await": 2, "back": [1, 8], "background": [2, 6, 10], "bake": 4, "barcod": 4, "base": [1, 2, 4, 8], "base_weight": 4, "basepair": 4, "basic": [0, 1, 3, 4, 8], "batteri": 1, "bblearn": 1, "beat": 1, "becaus": 1, "becom": [1, 2], "been": [1, 4, 5, 8], "befor": [1, 2, 4, 8], "begin": [4, 8], "being": [1, 2], "below": 1, "berklei": 1, "better": 1, "between": [1, 4, 8], "biolog": [1, 8, 10], "biologi": [1, 4], "biostatist": [9, 10], "black": 4, "block": 4, "bmi": 1, "bold": 1, "book": [4, 9, 11], "both": [1, 2], "bp": [1, 4], "brace": 4, "bracket": 8, "break": 4, "brief": [1, 4], "briefli": 6, "browser": [1, 2], 
"bullet": 1, "button": 2, "bypass": 1, "calc_molar": 4, "calc_yield": 4, "call": [1, 2, 4, 8], "can": [1, 2, 4, 5, 8], "cannot": [1, 2, 8], "captur": 1, "carri": 1, "case": [2, 8], "cc": 9, "cell": 4, "center": 8, "chain": 8, "chanc": 8, "chang": [1, 4], "chapter": [0, 1, 3, 7], "check": 1, "check_al": 1, "chemic": 4, "circa": 4, "class": [1, 2], "click": 1, "clinic": 1, "cloud": 2, "code": [2, 4, 8], "colab": [0, 2], "collect": 1, "colleg": 9, "com": [1, 6], "come": [1, 2, 4], "comma": 8, "command": [2, 8], "common": [1, 4, 8, 9], "commun": [4, 8], "compact": 4, "compani": 2, "companion": [9, 10], "comparison": 8, "compat": 2, "complet": [1, 2, 8], "complex": [1, 4, 8], "compris": 1, "comput": [1, 2, 4], "concentr": 4, "concept": [1, 2], "condit": 1, "connect": 2, "consid": 1, "constraint": 8, "consumpt": 9, "contact": 9, "contain": [2, 4, 8], "content": [2, 5, 9, 10], "context": [4, 10], "continu": 4, "control": 8, "convent": 8, "convers": 5, "convert": 4, "copi": [4, 8], "core": 8, "corner": 1, "correct": 4, "correctli": 1, "correspond": 8, "could": 1, "count": [1, 4, 8], "cours": [1, 2, 4, 8, 9, 10], "cover": [1, 2, 4, 8], "covid": 4, "creat": [1, 2, 4, 8], "creativ": 9, "critic": [1, 4], "csv": 8, "ctrl": 2, "cure": 8, "curli": 4, "current": [4, 8], "current_yield": 4, "dai": 1, "dampier": 9, "dash": 8, "data": [1, 2, 4, 7, 8], "datafram": 8, "dataset": [1, 2, 10], "debug": 4, "decis": 1, "def": 4, "defin": 4, "delet": 2, "depth": 4, "deriv": 8, "describ": [1, 6, 8], "descript": 4, "design": 4, "desir": 1, "detail": 4, "detect": 8, "determin": 1, "develop": [1, 9], "devic": 4, "didn": 1, "differ": [4, 8], "difficult": [1, 2], "difficulti": 1, "dilut": 4, "disconnect": 2, "discuss": [3, 4, 7], "displai": [4, 8], "dive": 8, "dna": [4, 5], "dna_conc": 4, "dna_molar": 4, "dna_weight": 4, "dna_yield": 4, "dna_yield_descript": 4, "do": [1, 3, 4, 8], "doesn": 8, "dollar": 4, "done": [2, 4, 5, 8], "dot": 8, "doubl": [1, 4], "down": [1, 4, 8], "download": [1, 2], 
"dozen": 1, "drastic": 4, "drexel": [4, 9, 10], "dropdown": 1, "dtype": 8, "due": [1, 4, 8], "dure": [1, 4, 8], "dynam": 4, "each": [1, 2, 4, 8], "easi": 1, "easier": [4, 8], "easili": 8, "edit": [1, 2, 9], "educ": 1, "effect": [2, 8], "either": 2, "elimin": 8, "emerg": 1, "emoji": 8, "emploi": 8, "empti": 2, "encod": 2, "end": [1, 4, 8], "endswith": 1, "ensur": [1, 2, 4], "enter": 2, "entir": [4, 8], "environ": 1, "enzymat": 4, "equat": 4, "error": 4, "estim": 1, "evalu": 8, "even": [1, 4, 8], "everyon": 1, "everyth": [2, 4, 8], "evolv": 10, "exactli": 8, "exampl": [1, 8], "excel": [1, 8], "except": 1, "execut": [1, 2], "exercis": 1, "exist": 1, "expand": 10, "experi": [1, 4], "explain": 4, "explan": [4, 5], "explanatori": 4, "explor": [4, 8], "explos": 1, "express": [1, 4, 8], "extend": 8, "extens": 2, "extra": 1, "f": [1, 8], "face": 1, "facilit": 8, "fact": 1, "factor": 8, "fals": 8, "familiar": 1, "fast": 8, "featur": 1, "fempto": 4, "femtomol": 4, "few": 8, "field": 1, "figur": 1, "file": [1, 2, 8], "filter": 8, "filterwarn": 1, "find": [1, 8], "finish": 1, "first": [1, 2, 8], "fit": 8, "fix": [1, 2], "flavor": 1, "float": [4, 8], "float64": 8, "fmol": 4, "fmole": 4, "focus": 5, "follow": [1, 9], "footnot": 1, "form": 1, "format": [1, 2, 4], "found": [1, 8], "four": 8, "frame": [1, 8], "free": [1, 2, 9], "freeli": 1, "fresh": 2, "freshli": 2, "from": [1, 2, 4, 8], "full": 1, "function": [1, 8, 10], "function_nam": 4, "further": 4, "futur": [1, 8], "g": 4, "gener": [1, 4], "get": [1, 2, 4, 8], "give": [1, 2], "go": 1, "googl": [0, 2], "goolg": 1, "got": 8, "grace": 1, "grade": 1, "green": 4, "group": 8, "guru": 4, "ha": [1, 4, 5, 8], "had": 4, "hand": 1, "have": [1, 2, 4, 6, 8, 9], "head": 8, "header": 8, "healthi": 1, "heart_rate_reserv": 1, "height": 1, "hello": 1, "help": [1, 4], "her": 1, "here": 1, "high_vl_mask": 8, "hint": 4, "hipaa": 2, "hit": 1, "hiv": 8, "hold": 1, "hour": 2, "how": [1, 3, 4, 7, 8], "howev": [1, 2, 4, 8], "hrr": 1, "html": 1, "http": 
[1, 6], "hundr": 1, "hurdl": 1, "hyperlink": 1, "hypothesi": 10, "hypothet": 8, "i": [1, 2, 5, 8, 9], "ideal": 1, "ignor": 1, "imag": 2, "imbalanc": 4, "immedi": 4, "immunologi": 10, "impact": 8, "import": [1, 2], "importerror": 1, "includ": [1, 8], "incorrect": 2, "incred": 1, "incredibli": 8, "increment": 8, "independ": 2, "indepth": 4, "indic": 8, "individu": 4, "infect": 8, "inferenti": 1, "inferentialthink": 1, "inform": [2, 8], "ingredi": 4, "init_vl": 8, "init_vl_sum": 8, "initi": [1, 4, 8], "input": 4, "insid": 4, "instal": [1, 2], "instead": [1, 4, 8], "instruct": 1, "insurmount": 1, "int64": 8, "integ": 4, "intens": 1, "interact": [1, 2, 10], "interfac": [1, 8], "intern": 9, "interoper": 8, "interpret": 2, "introduc": 0, "ipynb": 1, "isinst": [4, 8], "isn": 1, "issu": [1, 2], "italic": 1, "item": 4, "itself": 2, "julia": 2, "jump": 4, "jupyt": 1, "jupyterlab": 1, "just": [1, 2, 4], "kei": 4, "kernel": 1, "kg": 1, "know": [2, 4, 8], "kwarg1": 4, "kwarg2": 4, "lab": [4, 8], "languag": [1, 2], "larg": [1, 2], "larger": 1, "last": [1, 8], "lastli": [1, 10], "later": [1, 8], "launch": 1, "lead": 4, "learn": 1, "left": 1, "len": 1, "length": 8, "less": 2, "let": [1, 8], "level": 8, "libari": 8, "librari": 8, "licens": 9, "ligat": 4, "light": 4, "like": [1, 2, 4, 8, 9], "limit": [2, 4, 8], "line": [1, 8], "link": [1, 2, 5], "list": [1, 4, 8], "listdir": 1, "live": 8, "ll": [1, 2, 4, 8], "load": [1, 2, 7, 8], "loc": 8, "log": 2, "long": 8, "look": [1, 4, 5], "loop": 1, "lot": [4, 8], "m": 2, "main": 8, "make": [4, 8], "manag": 4, "mani": [1, 2, 4], "manual": 4, "markdown": 2, "mass": 4, "match": 8, "materi": 4, "math": [1, 3, 4], "mathemat": 8, "max": 8, "maximum": 1, "mayb": 2, "mayo": 1, "me": 9, "mean": 8, "measur": 4, "median": 8, "medicin": 9, "menu": [1, 2], "meter": 1, "method": [1, 8], "microbiologi": 10, "might": 8, "miim": 10, "million": 4, "min": 8, "minion": 4, "minut": 1, "mircolit": 4, "mistak": [1, 2], "mode": 8, "modif": 1, "modul": 4, "modular": 
4, "mole": 4, "molecul": 4, "molecular": 4, "monitor": 8, "more": [1, 2, 4, 5, 8], "morn": 1, "most": [1, 2], "motor": 4, "move": 1, "much": 4, "multipl": [1, 4, 8], "multipli": 1, "must": 1, "my": 4, "name": [1, 8], "nano": 4, "nanopor": 4, "natur": 1, "nc": 9, "nd": 9, "nearest": 4, "neb": 5, "need": [0, 1, 2, 4, 5, 8], "never": 2, "new": [2, 4, 8], "new_concentr": 4, "new_paragon_molar": 4, "newest": 1, "next": [1, 4], "ng": 4, "nice": [1, 8], "noderiv": 9, "noncommerci": 9, "normal": [1, 4], "notebook": [1, 4], "notepad": [1, 2], "notic": [1, 4], "now": [1, 4, 8], "np": 8, "nucleotid": 4, "number": [1, 4, 8], "numer": [4, 8], "nuniqu": 8, "o": 1, "obtain": 4, "ocassion": 2, "off": [1, 4, 8], "often": [2, 4, 8], "oftentim": 2, "okai": 2, "old": [1, 4, 8], "onc": [1, 2, 4, 8], "one": [1, 2, 4, 8], "ones": 8, "onli": [1, 2, 8], "onlin": [1, 4], "open": [1, 2], "oper": 4, "option": 2, "orang": 4, "order": [1, 2], "organ": 4, "origin": 2, "other": [1, 2, 4, 8], "otherwis": 8, "our": [1, 2, 4, 8], "out": 1, "outbreak": 4, "output": 1, "over": [4, 8], "overhang": 4, "overwrit": 2, "own": [1, 2], "packag": 1, "page": 6, "panda": 7, "paragon": 4, "paragon_molar": 4, "part": 4, "particip": 8, "particular": 1, "past": [2, 4, 10], "path": 1, "patient": 8, "pcr": 4, "pd": 8, "peopl": [1, 8], "per": [1, 4], "perfect": 4, "perfectli": 1, "perform": [4, 8], "phrase": 1, "pip": 1, "place": 10, "plai": 2, "plain": [1, 2], "plan": 2, "plate": 4, "plethora": 8, "plu": 1, "plwh": 8, "point": 4, "pool": 4, "pose": 1, "possibl": [2, 4], "post": 4, "potenti": 2, "power": [1, 2, 8], "practic": 8, "precis": 1, "preload": 1, "prep": 4, "prepar": 4, "prescrib": 4, "present": 8, "previou": 1, "print": [1, 4, 8], "prism": 1, "problem": [1, 2, 10], "process": [1, 4, 6], "profici": 8, "program": [1, 2], "progress": [1, 8], "project": 4, "prompt": [4, 8], "prone": 4, "proper": 10, "protect": 2, "protein": 4, "protocol": 4, "provid": [1, 8], "purpos": [1, 2, 4, 8], "put": 1, "python": [2, 3, 7, 
8], "q": 1, "qith": 8, "qubit": 4, "question": [1, 8], "quick": 5, "r": 2, "randomli": 8, "rang": [1, 8], "rapid": 4, "rcp85jhlmni": 6, "re": [1, 2, 4], "read": [4, 8], "read_csv": 8, "readi": [2, 8], "reagent": 4, "real": 8, "realli": 4, "reason": 4, "rebound": 8, "receiv": 8, "recent": [1, 2, 4], "recommend": 4, "refer": [4, 10], "refresh": 5, "regularli": 8, "relat": 4, "relev": 2, "rememb": [1, 2, 4, 8], "remov": 4, "render": [2, 4], "repeat": 4, "repetit": 4, "replac": 4, "repres": 4, "reproduc": 4, "requir": [1, 4, 8], "research": [1, 8], "respond": 2, "rest": 1, "restart": 1, "resting_heart_r": 1, "result": [1, 4, 8], "return": [1, 4], "reusabl": 4, "review": [4, 5], "right": [1, 4], "rigor": 1, "rna": 4, "rna_paragon_molar": 4, "round": 4, "run": 1, "runtim": 2, "sai": 8, "said": 1, "same": [1, 4, 8], "sampl": 8, "sample_concentr": 4, "sample_length": 4, "sample_volum": 4, "sample_yield": 4, "save": 1, "saw": 8, "sciecn": 8, "scienc": [1, 8], "screen": 1, "search": 8, "searchabl": 10, "second": [1, 2], "secreti": 2, "section": 2, "secur": 2, "see": [1, 4, 8], "select": 8, "self": 4, "send": 2, "senior": 4, "sens": 8, "sensit": 2, "sent": 2, "sentenc": 4, "sequenc": 4, "seri": [1, 2, 4, 8], "servic": 2, "session": [1, 4, 8], "set": 1, "setup": 1, "share": 2, "shift": [1, 2], "short": 4, "shortcut": 2, "should": [1, 2, 4, 8], "similar": [2, 8], "simpl": [1, 8], "sinc": [1, 5], "singl": [4, 8], "sit": 8, "size": 4, "skeleton": 1, "skill": 1, "small": [1, 8], "smaller": 4, "so": [2, 4, 8], "softwar": [1, 2], "solut": [1, 4, 8], "solv": 1, "some": [1, 2, 4, 5, 6, 8], "someon": 8, "someth": [1, 4], "sometim": [2, 8], "somewher": 1, "space": [4, 8], "spawn": 1, "special": 2, "specif": 8, "speedup": 4, "spin": 1, "split": 8, "spreadsheet": [1, 7, 8], "squar": 8, "stack": 8, "stai": 1, "start": [1, 2, 4, 8], "statement": [4, 8], "statist": [1, 8], "std": 8, "step": [1, 8], "still": 1, "stock": 4, "stop": 8, "str": 4, "stragei": 8, "strand": 4, "strategi": [4, 8], 
"structur": 1, "studi": [1, 8], "stumbl": 1, "style": [7, 8], "sublist": 1, "submiss": 2, "submit": 1, "subtract": 1, "success": 0, "suffix": 8, "suggest": 1, "sum": 8, "summar": [1, 4, 7, 8], "summari": 8, "synchron": [1, 4, 8], "syntax": [1, 4], "system": [1, 2], "t": [1, 4, 8], "tabl": [1, 2, 4, 8], "tag": 2, "take": [1, 2, 4], "talk": [1, 2], "task": [1, 4], "taught": 1, "teach": 1, "techniqu": [1, 8], "technologi": 1, "tediou": 4, "tell": [1, 4], "template_weight": 4, "tend": [5, 8], "test": [1, 2, 4], "tests_dir": 1, "text": [1, 2, 4], "textbook": [1, 4, 9], "than": 2, "thei": [4, 8], "them": [1, 2, 8], "themselv": 2, "therebi": 1, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11], "thing": [1, 2, 4], "think": [1, 4], "those": [1, 4, 8], "through": 1, "throughout": 1, "time": [1, 4, 8], "todai": 8, "too": [1, 2], "took": 8, "tool": [0, 1, 8], "top": 1, "topic": 1, "total": 4, "track": 4, "transform": 8, "transpar": 4, "treated_average_week": 8, "treated_mask": 8, "treatment": 8, "trial": 8, "trial_data": 8, "trial_df": 8, "troubl": 1, "true": [1, 4, 8], "try": 4, "tube": 4, "twice": 1, "two": [1, 2, 8], "type": [1, 2, 4, 8], "u": [1, 4, 8], "uc": 1, "ul": [4, 8], "uncontrol": 8, "under": 9, "underneath": 1, "understand": [2, 4, 8], "undo": 2, "uniqu": 4, "unit": [4, 5], "univers": 9, "unless": 4, "until": 8, "untreated_average_week": 8, "unwieldi": 1, "unzip": 1, "up": [1, 4], "upload": [1, 2], "upon": 10, "upper_target_zon": 1, "us": [2, 3, 4, 5, 7, 8, 9], "usb": 4, "usual": [1, 8], "util": 8, "v": 6, "val": 4, "valid": [1, 8], "valu": [1, 4, 8], "var": 8, "variabl": [1, 4], "ve": [1, 4, 5], "veri": 1, "version": [2, 8], "video": [4, 6], "vigor": 1, "viral": 8, "virtual": 2, "visual": 1, "volum": 4, "volume_to_add": 4, "wa": [1, 8], "wai": [1, 4], "want": [2, 8], "wanted_dna": 4, "wanted_sampl": 8, "warn": 1, "watch": [4, 6], "we": [1, 2, 4, 8], "week": [1, 4], "weekli": [1, 4, 8], "weigh": 4, "weight": 1, "went": 8, "were": [4, 8], "what": 1, "when": [1, 2, 4, 5, 
8], "where": 8, "whether": 8, "which": [1, 2, 8], "while": [2, 5, 8], "who": 8, "within": [2, 8, 10], "without": [2, 4], "woman": 1, "word": 1, "wordpad": 2, "work": [1, 2, 4, 8], "world": 1, "would": [1, 4, 9], "write": 1, "written": [2, 8], "www": [1, 6], "x": [1, 2], "y": 1, "year": [1, 4, 8], "years_infect": 8, "you": [0, 1, 2, 4, 5, 8, 9, 10], "young": 1, "your": [1, 2, 4, 8, 9], "yourself": [1, 2, 8], "youtub": 6, "yr": 8, "z": 1, "zip": 1, "zip_fil": 1}, "titles": ["Module 1: Hello World", "Walkthrough", "Notebook basics", "Module 2: Simple calculations", "Walkthrough", "Dilution calculations", "Nanopore Sequencing", "Module 3: DataFrames", "Module 03 Walkthrough", "Quantitative Reasoning in Biology", "About this book", "Introduction"], "titleterms": {"": 1, "03": 8, "1": 0, "2": 3, "3": 7, "The": 4, "about": 10, "abov": 1, "act": 8, "add": 4, "aerob": 1, "afraid": 2, "all": 2, "amount": 4, "arithmet": 4, "averag": 8, "basic": 2, "biologi": 9, "block": 1, "book": 10, "boolean": 8, "calcul": [1, 3, 4, 5, 8], "cell": [1, 2], "code": 1, "colab": 1, "color": 4, "column": 8, "conclus": [4, 8], "datafram": 7, "dataset": 8, "describ": 4, "dilut": 5, "don": 2, "expect": 1, "extract": 8, "f": 4, "failur": 8, "function": 4, "googl": 1, "grader": 1, "heart": 1, "hello": 0, "i": 4, "import": 8, "index": 8, "inform": 1, "initial_viral_load": 8, "introduct": [1, 8, 11], "jupyt": 2, "learn": [4, 8], "limit": 1, "lint": 4, "markdown": 1, "me": 1, "modul": [0, 3, 7, 8], "molar": 4, "nanopor": 6, "notebook": 2, "numpi": 8, "object": [4, 8], "otter": 1, "panda": 8, "popul": 8, "problem": 4, "programmat": 4, "python": [1, 4], "q1": [1, 4, 8], "q2": [4, 8], "q3": [1, 4, 8], "q4": [4, 8], "quantit": 9, "queri": 8, "quick": 1, "rate": 1, "reaction": 4, "reason": 9, "refer": 8, "reserv": 1, "restart": 2, "row": 8, "run": 2, "sampl": 4, "sequenc": 6, "session": 2, "simpl": 3, "string": 4, "subject": 1, "submiss": 1, "t": 2, "target": 1, "templat": 4, "thi": 10, "through": 4, 
"treat": 8, "try": 1, "untreat": 8, "upper": 1, "us": 1, "walkthrough": [1, 4, 8], "week": 8, "weeks_to_failur": 8, "weight": 4, "what": 4, "which": 4, "whole": 8, "why": 1, "world": 0, "write": 4, "yield": 4, "zone": 1}}) \ No newline at end of file