diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..9386a01
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,5 @@
+# Change Log
+
+8/5/2024
+
+* Initial Module 3 Uploaded
diff --git a/_sources/content/Module02/Module02_walkthrough_book.ipynb b/_sources/content/Module02/Module02_walkthrough_book.ipynb
index 5d7b147..fa764d0 100644
--- a/_sources/content/Module02/Module02_walkthrough_book.ipynb
+++ b/_sources/content/Module02/Module02_walkthrough_book.ipynb
@@ -3,7 +3,9 @@
{
"cell_type": "markdown",
"id": "a10e9828",
- "metadata": {},
+ "metadata": {
+ "tags": []
+ },
"source": [
"# Walkthrough\n",
"\n",
diff --git a/_sources/content/Module03/Module03_book.md b/_sources/content/Module03/Module03_book.md
new file mode 100644
index 0000000..59d5e88
--- /dev/null
+++ b/_sources/content/Module03/Module03_book.md
@@ -0,0 +1,3 @@
+# Module 3: DataFrames
+
+This chapter will discuss how to use Python and Pandas to load and summarize spreadsheet style data.
\ No newline at end of file
diff --git a/_sources/content/Module03/Module03_walkthrough_book.ipynb b/_sources/content/Module03/Module03_walkthrough_book.ipynb
new file mode 100644
index 0000000..c6516a8
--- /dev/null
+++ b/_sources/content/Module03/Module03_walkthrough_book.ipynb
@@ -0,0 +1,2996 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "da4cbf41",
+ "metadata": {},
+ "source": [
+ "# Module 03 Walkthrough\n",
+ "\n",
+ "Remember, all assignments are due before the synchronous session.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1b24089-7033-4965-ae63-de76bdf935a9",
+ "metadata": {},
+ "source": [
+ "## Introduction\n",
+ "\n",
+ "Get ready to dive into some data analysis as we explore the effectiveness of a hypothetical HIV treatment trial.\n",
+ "In this walkthrough, we have a dataset containing information from 30 people living with HIV (PLWH) who were randomly assigned to a treatment or control group.\n",
+ "After receiving the treatment, they stopped their ART and were monitored weekly for the number of weeks until their first \"detectable\" viral load was found.\n",
+ "We will use `Pandas` to analyze this data and evaluate the treatment's effectiveness.\n",
+ "By the end of this activity, you will be proficient in loading spreadsheet data into Python, creating derived columns in `DataFrames`, and using summary methods like sum, mean, and max.\n",
+ "Let's get started!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d728e12b",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives\n",
+ "At the end of this learning activity you will be able to:\n",
+ " - Practice loading spreadsheet data into Python using `pandas`.\n",
+ " - Use Python methods to create derived columns in `pd.DataFrames`.\n",
+ " - Use `Pandas` summary methods like sum, mean, and max.\n",
+ " - Employ basic filtering and data extraction from `pandas`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28d532d9",
+ "metadata": {},
+ "source": [
+ "## Dataset Reference\n",
+ "\n",
+ "_File_: `trial_data.csv`\n",
+ "\n",
+ "_Columns_:\n",
+ "\n",
+ " - `age` : (years) Current age during the study. \n",
+ " - `age_initial_infection` : (years) Age at which the participant was initially infected.\n",
+ " - `initial_viral_load` : (copies/ul) The level of infection at the start of the study.\n",
+ " - `treatment` : (boolean) `True` for participants in the treatment group, `False` for those in the control group.\n",
+ " - `weeks_to_failure` : (weeks) Time from the treatment to the first week of uncontrolled viral load.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "621cd2ef",
+ "metadata": {},
+ "source": [
+ "## Imports"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "917b592b",
+ "metadata": {},
+ "source": [
+ "While _basic_ Python can do a lot, you have to do everything yourself.\n",
+ "The **real** power of Python is that you can `import` code that is written by others.\n",
+ "\n",
+ "For this course, we will use a common data science stack of interoperable tools centered around [Numpy](https://numpy.org/).\n",
+ "\n",
+ "There are four that we will use regularly, two of which we'll cover today."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cfb0afb0-6fe4-47c7-b044-75144973797c",
+ "metadata": {},
+ "source": [
+ "### Numpy\n",
+ "\n",
+ "[Numpy](https://numpy.org/)\n",
+ "\n",
+ "A numerical Python library that contains incredibly fast arrays, mathematical functions, and other useful utilities.\n",
+ "\n",
+ "By convention, the community tends to _alias_ the long `numpy` as `np`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "d5cc7c1d-b078-4555-a578-f862584233c4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8152b253-6408-4f55-9505-672f597e23e7",
+ "metadata": {},
+ "source": [
+ "### Pandas\n",
+ "\n",
+ "[Pandas](https://pandas.pydata.org/)\n",
+ "\n",
+ "A library that sits atop `numpy` and provides a _spreadsheet_ style object called a `DataFrame` along with a plethora of data science utilities.\n",
+ "This is the main tool we will be using for data exploration.\n",
+ "\n",
+ "By convention, the community tends to _alias_ the long `pandas` as `pd`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "d5a223d0-d0d2-471a-b5b8-a63a700eda75",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "519ff9d5",
+ "metadata": {},
+ "source": [
+ "Conveniently, `pandas` can read `csv` files for us."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "4492bb2c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "10 55 21 94 False \n",
+ "11 53 42 85 True \n",
+ "12 40 34 27 True \n",
+ "13 48 41 99 False \n",
+ "14 56 41 59 False \n",
+ "15 53 47 38 True \n",
+ "16 57 41 42 True \n",
+ "17 48 33 57 False \n",
+ "18 51 42 25 False \n",
+ "19 55 46 45 False \n",
+ "20 43 24 46 False \n",
+ "21 48 37 99 True \n",
+ "22 51 27 36 False \n",
+ "23 43 34 48 True \n",
+ "24 51 43 88 False \n",
+ "25 49 20 76 False \n",
+ "26 54 47 74 False \n",
+ "27 45 25 87 True \n",
+ "28 59 40 49 False \n",
+ "29 51 43 38 True \n",
+ "\n",
+ " weeks_to_failure \n",
+ "0 3 \n",
+ "1 4 \n",
+ "2 6 \n",
+ "3 5 \n",
+ "4 5 \n",
+ "5 9 \n",
+ "6 4 \n",
+ "7 4 \n",
+ "8 5 \n",
+ "9 7 \n",
+ "10 3 \n",
+ "11 5 \n",
+ "12 8 \n",
+ "13 3 \n",
+ "14 6 \n",
+ "15 7 \n",
+ "16 8 \n",
+ "17 4 \n",
+ "18 2 \n",
+ "19 1 \n",
+ "20 1 \n",
+ "21 8 \n",
+ "22 2 \n",
+ "23 7 \n",
+ "24 2 \n",
+ "25 5 \n",
+ "26 5 \n",
+ "27 5 \n",
+ "28 5 \n",
+ "29 8 "
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df = pd.read_csv('trial_data.csv')\n",
+ "\n",
+ "# If a `DataFrame` is the last line, it will display a nice summary\n",
+ "trial_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31664b42",
+ "metadata": {},
+ "source": [
+ "And we should see that this exactly matches the table we saw in Excel."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1e653c16-ca8d-4641-ac5c-d81e549657ae",
+ "metadata": {},
+ "source": [
+ "The object we got back is called a `DataFrame`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "b8e9d2ba-70fa-4614-8dae-e70f0a2f0db1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.frame.DataFrame"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(trial_df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9124b71d-4468-42ad-b98a-4412d553f369",
+ "metadata": {},
+ "source": [
+ "If we only want to see the first few rows of the `DataFrame`, we can use the `.head()` _method_ (five rows by default)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "075dbccc-d1cf-4127-bd61-e15ddab0f2ce",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment weeks_to_failure\n",
+ "0 55 26 66 False 3\n",
+ "1 48 26 66 False 4\n",
+ "2 45 36 32 True 6\n",
+ "3 43 31 23 False 5\n",
+ "4 40 20 45 True 5"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "197cc06c-7490-4ea2-ab98-652c64a37c50",
+ "metadata": {},
+ "source": [
+ "## Acting on Columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "75de8d1e",
+ "metadata": {},
+ "source": [
+ "We can reference each column by name using square brackets `[]`.\n",
+ "For example, we can extract the `age` column like so:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "cacc125e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 55\n",
+ "1 48\n",
+ "2 45\n",
+ "3 43\n",
+ "4 40\n",
+ "5 42\n",
+ "6 55\n",
+ "7 56\n",
+ "8 59\n",
+ "9 51\n",
+ "10 55\n",
+ "11 53\n",
+ "12 40\n",
+ "13 48\n",
+ "14 56\n",
+ "15 53\n",
+ "16 57\n",
+ "17 48\n",
+ "18 51\n",
+ "19 55\n",
+ "20 43\n",
+ "21 48\n",
+ "22 51\n",
+ "23 43\n",
+ "24 51\n",
+ "25 49\n",
+ "26 54\n",
+ "27 45\n",
+ "28 59\n",
+ "29 51\n",
+ "Name: age, dtype: int64"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df['age']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c8d83ab1",
+ "metadata": {},
+ "source": [
+ "### Q1: Extract the `initial_viral_load` column"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "f99e62ac",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "init_vl = trial_df['initial_viral_load'] # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "68ab0587-f8db-46fa-b758-2320a9ec6858",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "init_vl is a `pd.Series`: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('init_vl is a `pd.Series`:', isinstance(init_vl, pd.Series))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "640c9a7e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "init_vl_sum = 1628\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'init_vl_sum = {init_vl.sum()}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2cac446b",
+ "metadata": {},
+ "source": [
+ "Once we can extract columns, we can start summarizing them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "48ce947a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The mean age of the population is 50.1 yrs.\n"
+ ]
+ }
+ ],
+ "source": [
+ "age_col = trial_df['age']\n",
+ "age_mean = age_col.mean()\n",
+ "print(f'The mean age of the population is {age_mean:0.1f} yrs.')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35eb614c",
+ "metadata": {},
+ "source": [
+ "Expressions can also be _chained_. \n",
+ "The two approaches are functionally the same; the only difference is aesthetic. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "1be80170",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The mean age of the population is 50.1 yrs, even when done on a single line.\n"
+ ]
+ }
+ ],
+ "source": [
+ "age_mean_short = trial_df['age'].mean()\n",
+ "print(f'The mean age of the population is {age_mean_short:0.1f} yrs, even when done on a single line.')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "73927199",
+ "metadata": {},
+ "source": [
+ "### Q2: Calculate the average `weeks_to_failure` for the whole population\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "ba3fa20b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "average_weeks = trial_df['weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "e6176369",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "average_weeks = 4.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'average_weeks = {average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f948c7a-4e79-4083-9a76-ba7a040240c7",
+ "metadata": {},
+ "source": [
+ "We can also summarize an entire `DataFrame` with a single command."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "bd9ca277-1b6d-4d66-b000-b9c0e1973e38",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "age 50.133333\n",
+ "age_initial_infection 34.366667\n",
+ "initial_viral_load 54.266667\n",
+ "treatment 0.400000\n",
+ "weeks_to_failure 4.900000\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bac8ff72-1f06-4ad1-a484-fac4126cf4a3",
+ "metadata": {},
+ "source": [
+ "In this case the summary went _down_ the columns and calculated a mean for each."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "679c12f1-e5bb-42d5-9dcb-7e7634853f67",
+ "metadata": {},
+ "source": [
+ "There are a number of other summarization _methods_.\n",
+ " - `max()`\n",
+ " - `min()`\n",
+ " - `mode()`\n",
+ " - `median()`\n",
+ " - `var()`\n",
+ " - `std()`\n",
+ " - `nunique()`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "11f7825b-21ba-4a24-9316-b91047dc17b6",
+ "metadata": {},
+ "source": [
+ "```{note}\n",
+ ":class: dropdown\n",
+ "Methods are functions that are attached to an `object`.\n",
+ "They usually act on the object to provide a summary, perform a transformation, or otherwise utilize the information within the object.\n",
+ "In this case, these summarization methods utilize the information within the dataframe to summarize each column.\n",
+ "```"
+ ]
+ },
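To make the summarization methods listed above concrete, here is a minimal sketch that applies a few of them to a small, made-up table. The `demo_df` frame and its values are hypothetical stand-ins (not the trial data), so the numbers are for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of a trial-style table.
demo_df = pd.DataFrame({
    'age': [55, 48, 45, 43, 40],
    'weeks_to_failure': [3, 4, 6, 5, 5],
})

# Summary methods can be called on a single column (a `pd.Series`) ...
oldest = demo_df['age'].max()
youngest = demo_df['age'].min()
median_weeks = demo_df['weeks_to_failure'].median()
distinct_weeks = demo_df['weeks_to_failure'].nunique()

# ... or on the whole `DataFrame`, which goes down each column.
column_std = demo_df.std()

print(f'Oldest: {oldest}, youngest: {youngest}')
print(f'Median weeks to failure: {median_weeks}')
print(f'Distinct weeks_to_failure values: {distinct_weeks}')
print(column_std)
```

Calling a method on one column returns a single value, while calling it on the `DataFrame` returns a `pd.Series` with one entry per column.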
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "142a4c50-7ac3-4db8-b08b-b0ad4a075b14",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load weeks_to_failure\n",
+ "count 30.000000 30.000000 30.000000 30.000000\n",
+ "mean 50.133333 34.366667 54.266667 4.900000\n",
+ "std 5.569209 9.041984 24.070204 2.202663\n",
+ "min 40.000000 20.000000 22.000000 1.000000\n",
+ "25% 45.750000 26.250000 36.500000 3.250000\n",
+ "50% 51.000000 34.000000 48.500000 5.000000\n",
+ "75% 55.000000 41.750000 72.000000 6.750000\n",
+ "max 59.000000 50.000000 99.000000 9.000000"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5b40fdd",
+ "metadata": {},
+ "source": [
+ "Selecting columns is nice.\n",
+ "We can also add a new column based on another one.\n",
+ "\n",
+ "In HIV research it is often important to know how long someone has been living with HIV.\n",
+ "This dataset does not record that directly, but it does contain each participant's current age and their age at infection.\n",
+ "We can subtract one from the other to calculate how long each participant has been living with HIV."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "7c162199",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "4 5 20 "
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# first make a new `Series`\n",
+ "years_infected = trial_df['age'] - trial_df['age_initial_infection']\n",
+ "\n",
+ "# Then add that series into the table\n",
+ "trial_df['years_infected'] = years_infected\n",
+ "trial_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "69cd190f-6c41-48e5-805c-5d3bde23a510",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "4 5 20 "
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Alternatively\n",
+ "trial_df['years_infected'] = trial_df['age'] - trial_df['age_initial_infection']\n",
+ "trial_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3d5dc837-650c-44db-8aab-8675491b8049",
+ "metadata": {},
+ "source": [
+ "## Acting on Rows"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f2dac9e-00f7-4442-8a36-20631e73f8f6",
+ "metadata": {},
+ "source": [
+ "### Indexing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c38315cd",
+ "metadata": {},
+ "source": [
+ "When selecting rows, or rows and columns, we need to use the `.loc` attribute of the `DataFrame`.\n",
+ "\n",
+ "We can select by row label; with the default index, the label is simply the row number."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "85d1364b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "age 55\n",
+ "age_initial_infection 26\n",
+ "initial_viral_load 66\n",
+ "treatment False\n",
+ "weeks_to_failure 3\n",
+ "years_infected 29\n",
+ "Name: 0, dtype: object"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.loc[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "39614ebd-ee4b-46ab-9619-40b14ac66418",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "10 55 21 94 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "4 5 20 \n",
+ "5 9 22 \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 \n",
+ "10 3 34 "
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We can use a : to indicate a range. With `.loc`, both endpoints are included.\n",
+ "trial_df.loc[0:10]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "87473eee-abd9-4ca7-9f85-420103cf22c0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "5 42 20 57 True \n",
+ "7 56 50 22 False \n",
+ "13 48 41 99 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "5 9 22 \n",
+ "7 4 6 \n",
+ "13 3 7 "
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We can provide an arbitrary list\n",
+ "trial_df.loc[[0, 5, 7, 13]]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "24b190cc-d554-46ea-b7c3-ada85f89ede5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " initial_viral_load age\n",
+ "0 66 55\n",
+ "5 57 42\n",
+ "7 22 56\n",
+ "13 99 48"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We can also select columns at the same time.\n",
+ "trial_df.loc[[0, 5, 7, 13], ['initial_viral_load', 'age']]"
+ ]
+ },
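One detail worth checking when using `.loc`: it selects by index *label*, and a slice such as `0:10` includes both endpoints, which is why the range example above returns 11 rows. The sketch below uses a small hypothetical `demo_df` (not the trial data) to contrast this with position-based `.iloc`, which follows the usual Python convention of excluding the end of a slice.

```python
import pandas as pd

# Hypothetical table with the default 0..5 integer index.
demo_df = pd.DataFrame({'age': [55, 48, 45, 43, 40, 42]})

# `.loc` slices by label and includes both endpoints: labels 0, 1, 2, 3.
by_label = demo_df.loc[0:3]

# `.iloc` slices by position and excludes the end: positions 0, 1, 2.
by_position = demo_df.iloc[0:3]

# After taking a subset, labels and positions no longer line up.
subset = demo_df.loc[[1, 3, 5]]
row_by_label = subset.loc[3]       # the row whose label is 3
row_by_position = subset.iloc[1]   # the second row of the subset (the same row)

print(len(by_label), len(by_position))
print(row_by_label['age'], row_by_position['age'])
```

With the default index the two behave almost identically, so the difference tends to surface only after filtering or reordering rows.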
+ {
+ "cell_type": "markdown",
+ "id": "7110e944-753f-4e99-a6d4-c1a62c84ce40",
+ "metadata": {},
+ "source": [
+ "### Boolean Indexing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2327261a-e23e-4f80-ab9c-61d9d593769a",
+ "metadata": {},
+ "source": [
+ "If we do not know the row number ahead of time, but instead want to select rows based on their values, we can use boolean indexing.\n",
+ "In this strategy, we create a new `pd.Series` of True/False values where True marks the rows we want."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04de3217-aaea-445d-91c1-f55329913752",
+ "metadata": {},
+ "source": [
+ "Start by finding all people over 50 years old."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "0ddab93f-c9b6-4caa-b47d-37325c748b76",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 "
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "age_mask = trial_df['age'] > 50\n",
+ "aged_samples = trial_df.loc[age_mask]\n",
+ "aged_samples.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31e45076-4226-4f85-9731-21502452138f",
+ "metadata": {},
+ "source": [
+ "```{note}\n",
+ ":class: dropdown\n",
+ "I often use the suffix `_mask` when I create boolean indexes.\n",
+ "It is not required, but utilizing naming conventions makes your code easier to understand by yourself and others.\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f82b318d-5429-4c16-aff9-1b8fb32d35db",
+ "metadata": {},
+ "source": [
+ "Now, if we also want to filter by `initial_viral_load`, we can build a second mask:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "e3948cff-b118-4fa2-bb49-541567d43404",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "high_vl_mask = trial_df['initial_viral_load'] > 50"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "467ede0c-c706-456e-8cb8-f6b257cdbf86",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "10 55 21 94 False \n",
+ "11 53 42 85 True \n",
+ "14 56 41 59 False \n",
+ "24 51 43 88 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "10 3 34 \n",
+ "11 5 11 \n",
+ "14 6 15 \n",
+ "24 2 8 "
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "aged_high_vl = trial_df.loc[age_mask & high_vl_mask]\n",
+ "aged_high_vl.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "7e70c0ea-31ae-4315-81e7-080229ed1b6e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " age \n",
+ " age_initial_infection \n",
+ " initial_viral_load \n",
+ " treatment \n",
+ " weeks_to_failure \n",
+ " years_infected \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 55 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 4 \n",
+ " 24 \n",
+ " \n",
+ " \n",
+ " 7 \n",
+ " 56 \n",
+ " 50 \n",
+ " 22 \n",
+ " False \n",
+ " 4 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 8 \n",
+ " 59 \n",
+ " 33 \n",
+ " 33 \n",
+ " False \n",
+ " 5 \n",
+ " 26 \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " 51 \n",
+ " 30 \n",
+ " 49 \n",
+ " True \n",
+ " 7 \n",
+ " 21 \n",
+ " \n",
+ " \n",
+ " 15 \n",
+ " 53 \n",
+ " 47 \n",
+ " 38 \n",
+ " True \n",
+ " 7 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "15 53 47 38 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 \n",
+ "15 7 6 "
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# ~ can be used to say \"not\"\n",
+ "aged_low_vl = trial_df.loc[age_mask & ~high_vl_mask]\n",
+ "aged_low_vl.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45027741-6266-4ea5-86ce-fe20ef65baa3",
+ "metadata": {},
+ "source": [
+ "### Q3: Calculate the average weeks to failure for the treated population?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "5e318b06-4063-4d00-b42a-c13c0b9b55d4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "treated_mask = trial_df['treatment'] == True # SOLUTION NO PROMPT\n",
+ "treated_average_weeks = trial_df.loc[treated_mask, 'weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "dd141501-a148-4168-b9bc-3448e7e60028",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "treated_average_weeks = 6.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'treated_average_weeks = {treated_average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0673dd84-d614-4c05-afcd-941163c57608",
+ "metadata": {},
+ "source": [
+ "Utilizing boolean indexing, you can express _any_ algorithmic row-selection strategy.\n",
+ "This can even include comparisons between rows, for example if there were multiple rows of the same sample.\n",
+ "We will cover these strategies later in the course."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c212312f-58e6-4354-8bb3-94df1bc669f2",
+ "metadata": {},
+ "source": [
+ "Sometimes, our searches are simple.\n",
+ "Pandas also includes another method for indexing rows called `.query()` for these purposes."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5b87435e-b89c-4cf5-be6b-526faf8469fd",
+ "metadata": {},
+ "source": [
+ "### Querying"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e1abf62a",
+ "metadata": {},
+ "source": [
+ "`.query()` is an interface that facilitates simple queries with a few specific limitations:\n",
+ " - It can only use the information present in the row.\n",
+ " - It can only work on one row at a time.\n",
+ " - Column headers that contain spaces, dots, dashes, commas, or emoji must be wrapped in backticks inside the query string, so simple names are easiest."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e6ea84aa-edc6-40e8-aebd-e8eedbfd59e8",
+ "metadata": {},
+ "source": [
+ "Our questions on this dataset easily fit within those constraints."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "19e82c53",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " age \n",
+ " age_initial_infection \n",
+ " initial_viral_load \n",
+ " treatment \n",
+ " weeks_to_failure \n",
+ " years_infected \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 45 \n",
+ " 36 \n",
+ " 32 \n",
+ " True \n",
+ " 6 \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 40 \n",
+ " 20 \n",
+ " 45 \n",
+ " True \n",
+ " 5 \n",
+ " 20 \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " 42 \n",
+ " 20 \n",
+ " 57 \n",
+ " True \n",
+ " 9 \n",
+ " 22 \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " 51 \n",
+ " 30 \n",
+ " 49 \n",
+ " True \n",
+ " 7 \n",
+ " 21 \n",
+ " \n",
+ " \n",
+ " 11 \n",
+ " 53 \n",
+ " 42 \n",
+ " 85 \n",
+ " True \n",
+ " 5 \n",
+ " 11 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "2 45 36 32 True \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "9 51 30 49 True \n",
+ "11 53 42 85 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "2 6 9 \n",
+ "4 5 20 \n",
+ "5 9 22 \n",
+ "9 7 21 \n",
+ "11 5 11 "
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# All treatment rows\n",
+ "trial_df.query('treatment == True').head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "c2ac06a0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " age \n",
+ " age_initial_infection \n",
+ " initial_viral_load \n",
+ " treatment \n",
+ " weeks_to_failure \n",
+ " years_infected \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 55 \n",
+ " 26 \n",
+ " 66 \n",
+ " False \n",
+ " 3 \n",
+ " 29 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 48 \n",
+ " 26 \n",
+ " 66 \n",
+ " False \n",
+ " 4 \n",
+ " 22 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 43 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 5 \n",
+ " 12 \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 55 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 4 \n",
+ " 24 \n",
+ " \n",
+ " \n",
+ " 7 \n",
+ " 56 \n",
+ " 50 \n",
+ " 22 \n",
+ " False \n",
+ " 4 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "3 43 31 23 False \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "3 5 12 \n",
+ "6 4 24 \n",
+ "7 4 6 "
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.query('treatment == False').head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d7c3caf",
+ "metadata": {},
+ "source": [
+ "You can also make them more complex."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "id": "7a4fa71b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " age \n",
+ " age_initial_infection \n",
+ " initial_viral_load \n",
+ " treatment \n",
+ " weeks_to_failure \n",
+ " years_infected \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 45 \n",
+ " 36 \n",
+ " 32 \n",
+ " True \n",
+ " 6 \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 40 \n",
+ " 20 \n",
+ " 45 \n",
+ " True \n",
+ " 5 \n",
+ " 20 \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " 42 \n",
+ " 20 \n",
+ " 57 \n",
+ " True \n",
+ " 9 \n",
+ " 22 \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " 51 \n",
+ " 30 \n",
+ " 49 \n",
+ " True \n",
+ " 7 \n",
+ " 21 \n",
+ " \n",
+ " \n",
+ " 11 \n",
+ " 53 \n",
+ " 42 \n",
+ " 85 \n",
+ " True \n",
+ " 5 \n",
+ " 11 \n",
+ " \n",
+ " \n",
+ " 12 \n",
+ " 40 \n",
+ " 34 \n",
+ " 27 \n",
+ " True \n",
+ " 8 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 15 \n",
+ " 53 \n",
+ " 47 \n",
+ " 38 \n",
+ " True \n",
+ " 7 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 16 \n",
+ " 57 \n",
+ " 41 \n",
+ " 42 \n",
+ " True \n",
+ " 8 \n",
+ " 16 \n",
+ " \n",
+ " \n",
+ " 21 \n",
+ " 48 \n",
+ " 37 \n",
+ " 99 \n",
+ " True \n",
+ " 8 \n",
+ " 11 \n",
+ " \n",
+ " \n",
+ " 23 \n",
+ " 43 \n",
+ " 34 \n",
+ " 48 \n",
+ " True \n",
+ " 7 \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 27 \n",
+ " 45 \n",
+ " 25 \n",
+ " 87 \n",
+ " True \n",
+ " 5 \n",
+ " 20 \n",
+ " \n",
+ " \n",
+ " 29 \n",
+ " 51 \n",
+ " 43 \n",
+ " 38 \n",
+ " True \n",
+ " 8 \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "2 45 36 32 True \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "9 51 30 49 True \n",
+ "11 53 42 85 True \n",
+ "12 40 34 27 True \n",
+ "15 53 47 38 True \n",
+ "16 57 41 42 True \n",
+ "21 48 37 99 True \n",
+ "23 43 34 48 True \n",
+ "27 45 25 87 True \n",
+ "29 51 43 38 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "2 6 9 \n",
+ "4 5 20 \n",
+ "5 9 22 \n",
+ "9 7 21 \n",
+ "11 5 11 \n",
+ "12 8 6 \n",
+ "15 7 6 \n",
+ "16 8 16 \n",
+ "21 8 11 \n",
+ "23 7 9 \n",
+ "27 5 20 \n",
+ "29 8 8 "
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.query('age > 33 & treatment == True')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8b2af46a",
+ "metadata": {},
+ "source": [
+ "This statement doesn't make \"biological sense\", but it is an example of a valid comparison."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "id": "af1fd110",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " age \n",
+ " age_initial_infection \n",
+ " initial_viral_load \n",
+ " treatment \n",
+ " weeks_to_failure \n",
+ " years_infected \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 45 \n",
+ " 36 \n",
+ " 32 \n",
+ " True \n",
+ " 6 \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 43 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 5 \n",
+ " 12 \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 55 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 4 \n",
+ " 24 \n",
+ " \n",
+ " \n",
+ " 7 \n",
+ " 56 \n",
+ " 50 \n",
+ " 22 \n",
+ " False \n",
+ " 4 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 8 \n",
+ " 59 \n",
+ " 33 \n",
+ " 33 \n",
+ " False \n",
+ " 5 \n",
+ " 26 \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " 51 \n",
+ " 30 \n",
+ " 49 \n",
+ " True \n",
+ " 7 \n",
+ " 21 \n",
+ " \n",
+ " \n",
+ " 12 \n",
+ " 40 \n",
+ " 34 \n",
+ " 27 \n",
+ " True \n",
+ " 8 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 15 \n",
+ " 53 \n",
+ " 47 \n",
+ " 38 \n",
+ " True \n",
+ " 7 \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 16 \n",
+ " 57 \n",
+ " 41 \n",
+ " 42 \n",
+ " True \n",
+ " 8 \n",
+ " 16 \n",
+ " \n",
+ " \n",
+ " 18 \n",
+ " 51 \n",
+ " 42 \n",
+ " 25 \n",
+ " False \n",
+ " 2 \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 19 \n",
+ " 55 \n",
+ " 46 \n",
+ " 45 \n",
+ " False \n",
+ " 1 \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 22 \n",
+ " 51 \n",
+ " 27 \n",
+ " 36 \n",
+ " False \n",
+ " 2 \n",
+ " 24 \n",
+ " \n",
+ " \n",
+ " 28 \n",
+ " 59 \n",
+ " 40 \n",
+ " 49 \n",
+ " False \n",
+ " 5 \n",
+ " 19 \n",
+ " \n",
+ " \n",
+ " 29 \n",
+ " 51 \n",
+ " 43 \n",
+ " 38 \n",
+ " True \n",
+ " 8 \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "12 40 34 27 True \n",
+ "15 53 47 38 True \n",
+ "16 57 41 42 True \n",
+ "18 51 42 25 False \n",
+ "19 55 46 45 False \n",
+ "22 51 27 36 False \n",
+ "28 59 40 49 False \n",
+ "29 51 43 38 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 \n",
+ "12 8 6 \n",
+ "15 7 6 \n",
+ "16 8 16 \n",
+ "18 2 9 \n",
+ "19 1 9 \n",
+ "22 2 24 \n",
+ "28 5 19 \n",
+ "29 8 8 "
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.query('age >= initial_viral_load')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e68592de",
+ "metadata": {},
+ "source": [
+ "### Q4: Calculate the average `weeks_to_failure` for the untreated population?\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "id": "cfcdf2f7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# BEGIN SOLUTION NO PROMPT\n",
+ "\n",
+ "wanted_samples = trial_df.query('treatment == False')\n",
+ "\n",
+ "# END SOLUTION\n",
+ "\n",
+ "untreated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "5ea72ed6-01ee-48d4-800b-5d145f3517ad",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Untreated participants took 3.6 weeks to rebound.\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'Untreated participants took {untreated_average_weeks:0.1f} weeks to rebound.')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "7f804ed1-a9c2-41a1-bc31-58035c91e967",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "untreated_average_weeks is a `float`: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('untreated_average_weeks is a `float`:', isinstance(untreated_average_weeks, float))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "8f8d7324",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "untreated_average_weeks = 3.6\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'untreated_average_weeks = {untreated_average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e87cce47",
+ "metadata": {},
+ "source": [
+ "### Q5: Calculate the average `weeks_to_failure` for the treated population?\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "2742d0bb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# BEGIN SOLUTION NO PROMPT\n",
+ "\n",
+ "wanted_samples = trial_df.query('treatment == True')\n",
+ "\n",
+ "# END SOLUTION\n",
+ "\n",
+ "treated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "c6b2bfa0-8673-4666-adcd-31a1a574fd79",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Treated patients took 6.9 weeks to rebound.\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'Treated patients took {treated_average_weeks:0.1f} weeks to rebound.')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "79786fec-0e7c-461e-9a65-0fac65d6870a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "treated_average_weeks is a `float`: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('treated_average_weeks is a `float`:', isinstance(treated_average_weeks, float))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "id": "ea73783c",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "treated_average_weeks = 6.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'treated_average_weeks = {treated_average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5af4b494",
+ "metadata": {},
+ "source": [
+ "# Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2d1c3b8",
+ "metadata": {},
+ "source": [
+ "We can see that this treatment extended the average time off ART from ~3.6 weeks to ~6.9 weeks.\n",
+ "While not a complete cure, any incremental step is useful progress in the elimination of HIV.\n",
+ "\n",
+ "In the lab you will use similar techniques to explore whether other factors in this dataset impact the results.\n",
+ "In future weeks we will explore statistical techniques to understand whether this difference is due to chance, or due to the effect of the treatment."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "493f93a7",
+ "metadata": {},
+ "source": [
+ "---------------------------------------------"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/content/Module01/Module01_book.html b/content/Module01/Module01_book.html
index 94b7e7f..82e684d 100644
--- a/content/Module01/Module01_book.html
+++ b/content/Module01/Module01_book.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/Module01/Module01_walkthrough_book.html b/content/Module01/Module01_walkthrough_book.html
index fe0cf47..b75b590 100644
--- a/content/Module01/Module01_walkthrough_book.html
+++ b/content/Module01/Module01_walkthrough_book.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/Module01/notebook_actions.html b/content/Module01/notebook_actions.html
index b23ab5c..a86da7c 100644
--- a/content/Module01/notebook_actions.html
+++ b/content/Module01/notebook_actions.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/Module02/Module02_book.html b/content/Module02/Module02_book.html
index ced0d9c..5fdfba3 100644
--- a/content/Module02/Module02_book.html
+++ b/content/Module02/Module02_book.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/Module02/Module02_walkthrough_book.html b/content/Module02/Module02_walkthrough_book.html
index d9ff590..41dd307 100644
--- a/content/Module02/Module02_walkthrough_book.html
+++ b/content/Module02/Module02_walkthrough_book.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/Module02/dilution_calculations.html b/content/Module02/dilution_calculations.html
index a2d3bf2..7a19ed2 100644
--- a/content/Module02/dilution_calculations.html
+++ b/content/Module02/dilution_calculations.html
@@ -62,6 +62,7 @@
+
@@ -193,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
@@ -390,6 +396,15 @@ Dilution calculations
+
+
+
next
+
Module 3: DataFrames
+
+
+
diff --git a/content/Module02/nanopore_description.html b/content/Module02/nanopore_description.html
index 8b86d9d..13b3893 100644
--- a/content/Module02/nanopore_description.html
+++ b/content/Module02/nanopore_description.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/Module03/Module03_book.html b/content/Module03/Module03_book.html
new file mode 100644
index 0000000..c2c1f4d
--- /dev/null
+++ b/content/Module03/Module03_book.html
@@ -0,0 +1,467 @@
+
+
+
+
+
+
+
+
+
+
+ Module 3: DataFrames — Quantitative Reasoning in Biology
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Module 3: DataFrames
+
+
+
+
+
+
+
+
+
+
+Module 3: DataFrames
+This chapter will discuss how to use Python and Pandas to load and summarize spreadsheet style data.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/content/Module03/Module03_walkthrough_book.html b/content/Module03/Module03_walkthrough_book.html
new file mode 100644
index 0000000..596e46f
--- /dev/null
+++ b/content/Module03/Module03_walkthrough_book.html
@@ -0,0 +1,2602 @@
+
+
+
+
+
+
+
+
+
+
+ Module 03 Walkthrough — Quantitative Reasoning in Biology
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Module 03 Walkthrough
+
+
+
+
+
+
Contents
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Module 03 Walkthrough
+Remember, all assignments are due before the synchronous session.
+
+Introduction
+Get ready to dive into some data analysis as we explore the effectiveness of a hypothetical HIV treatment trial.
+In this walkthrough, we have a dataset containing information from 30 people living with HIV (PLWH) who were randomly assigned to a treatment or control group.
+After receiving the treatment, they stopped their ART and were monitored weekly for the number of weeks until their first “detectable” viral load was found.
+We will use Pandas
to analyze this data and evaluate the treatment’s effectiveness.
+By the end of this activity, you will be proficient in loading spreadsheet data into Python, creating derived columns in DataFrames
, and using summary methods like sum, mean, and max.
+Let’s get started!
+
+
+Learning Objectives
+At the end of this learning activity you will be able to:
+
+Practice loading spreadsheet data into Python using pandas
.
+Use Python methods to create derived columns in pd.DataFrames
.
+Use Pandas
summary methods like sum, mean, and max.
+Employ basic filtering and data extraction from pandas
.
+
+
+
+Dataset Reference
+File : trial_data.csv
+Columns :
+
+age
: (years) Current age during the study.
+age_initial_infection
: (years) Age at which the participant was initially infected.
+initial_viral_load
: (copies/ul) The level of infection at the start of the study.
+treatment
: (boolean) True
for participant in the treatment group, False
for those in the control group.
+weeks_to_failure
: (weeks) Time from the treatment to the first week of uncontrolled viral load.
+
+
+
+Imports
+While basic Python can do a lot, you have to do everything yourself.
+The real power of Python is that you can import
code that is written by others.
+For this course, we will use a common data science stack of interoperable tools centered around Numpy.
+There are four that we will use regularly, two of which we’ll cover today.
+
+Numpy
+Numpy
+A numerical Python library that contains incredibly fast arrays, mathematical functions, and other useful utilities.
+By convention, the community tends to alias the long numpy
as np
.
+
+
+
+Pandas
+Pandas
+A library that sits atop numpy
and provides a spreadsheet style object called a DataFrame
 along with a plethora of data science utilities.
+This is the main tool we will be using for data exploration.
+By convention, the community tends to alias the long pandas
as pd
.
+
+Nicely, it can read csv
files for us.
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+
+
+ 1
+ 48
+ 26
+ 66
+ False
+ 4
+
+
+ 2
+ 45
+ 36
+ 32
+ True
+ 6
+
+
+ 3
+ 43
+ 31
+ 23
+ False
+ 5
+
+
+ 4
+ 40
+ 20
+ 45
+ True
+ 5
+
+
+ 5
+ 42
+ 20
+ 57
+ True
+ 9
+
+
+ 6
+ 55
+ 31
+ 23
+ False
+ 4
+
+
+ 7
+ 56
+ 50
+ 22
+ False
+ 4
+
+
+ 8
+ 59
+ 33
+ 33
+ False
+ 5
+
+
+ 9
+ 51
+ 30
+ 49
+ True
+ 7
+
+
+ 10
+ 55
+ 21
+ 94
+ False
+ 3
+
+
+ 11
+ 53
+ 42
+ 85
+ True
+ 5
+
+
+ 12
+ 40
+ 34
+ 27
+ True
+ 8
+
+
+ 13
+ 48
+ 41
+ 99
+ False
+ 3
+
+
+ 14
+ 56
+ 41
+ 59
+ False
+ 6
+
+
+ 15
+ 53
+ 47
+ 38
+ True
+ 7
+
+
+ 16
+ 57
+ 41
+ 42
+ True
+ 8
+
+
+ 17
+ 48
+ 33
+ 57
+ False
+ 4
+
+
+ 18
+ 51
+ 42
+ 25
+ False
+ 2
+
+
+ 19
+ 55
+ 46
+ 45
+ False
+ 1
+
+
+ 20
+ 43
+ 24
+ 46
+ False
+ 1
+
+
+ 21
+ 48
+ 37
+ 99
+ True
+ 8
+
+
+ 22
+ 51
+ 27
+ 36
+ False
+ 2
+
+
+ 23
+ 43
+ 34
+ 48
+ True
+ 7
+
+
+ 24
+ 51
+ 43
+ 88
+ False
+ 2
+
+
+ 25
+ 49
+ 20
+ 76
+ False
+ 5
+
+
+ 26
+ 54
+ 47
+ 74
+ False
+ 5
+
+
+ 27
+ 45
+ 25
+ 87
+ True
+ 5
+
+
+ 28
+ 59
+ 40
+ 49
+ False
+ 5
+
+
+ 29
+ 51
+ 43
+ 38
+ True
+ 8
+
+
+
+
+
+And we should see that this exactly matches the table we saw in Excel.
+The object we got back is called a DataFrame
.
+
+
+
+
pandas.core.frame.DataFrame
+
+
+
+
+If we only want to see a small version of the DataFrame
we can use the .head()
method .
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+
+
+ 1
+ 48
+ 26
+ 66
+ False
+ 4
+
+
+ 2
+ 45
+ 36
+ 32
+ True
+ 6
+
+
+ 3
+ 43
+ 31
+ 23
+ False
+ 5
+
+
+ 4
+ 40
+ 20
+ 45
+ True
+ 5
+
+
+
+
+
+
+
+
+Acting on Columns
+We can reference each column by name using square brackets []
.
+For example: Extracting the age
column like so:
+
+
+
+
0 55
+1 48
+2 45
+3 43
+4 40
+5 42
+6 55
+7 56
+8 59
+9 51
+10 55
+11 53
+12 40
+13 48
+14 56
+15 53
+16 57
+17 48
+18 51
+19 55
+20 43
+21 48
+22 51
+23 43
+24 51
+25 49
+26 54
+27 45
+28 59
+29 51
+Name: age, dtype: int64
+
+
+
+
+
+Q1: Extract the initial_viral_load
column ?
+
+
+
+
+
init_vl is a `pd.Series`: True
+
+
+
+
+
+Once we can extract columns, we can start summarizing them.
+
+
+
+
The mean age of the population is 50.1 yrs.
+
+
+
+
+Expressions can also be chained .
+They are functionally the same; the only difference is aesthetic.
+
+
+
+
The mean age of the population is 50.1 yrs, even when done on a single line.
+
+
+
+
+
+
+Q2: Calculate the average weeks_to_failure
for the whole population?
+
+
+We can also summarize an entire DataFrame
with a single command.
+
+
+
+
age 50.133333
+age_initial_infection 34.366667
+initial_viral_load 54.266667
+treatment 0.400000
+weeks_to_failure 4.900000
+dtype: float64
+
+
+
+
+In this case the summary went down the columns and calculated a mean for each.
+There are a number of other summarization methods .
+
+max()
+min()
+mode()
+median()
+var()
+std()
+nunique()
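As a minimal sketch of how these methods are called, the block below builds a toy table from the first five rows of the trial data shown earlier. The name `toy_df` is only an illustration and is not part of the walkthrough's own code.

```python
import pandas as pd

# Toy subset: two columns from the first five rows of the trial table.
toy_df = pd.DataFrame({
    'age': [55, 48, 45, 43, 40],
    'weeks_to_failure': [3, 4, 6, 5, 5],
})

# A summary method on a single column returns a single value.
oldest = toy_df['age'].max()

# The same kind of method on the whole DataFrame returns one value per column.
medians = toy_df.median()

# nunique() counts how many distinct values a column holds.
distinct_weeks = toy_df['weeks_to_failure'].nunique()

print(oldest)
print(medians)
print(distinct_weeks)
```

Each method follows the same pattern: select a column (or the whole table), then call the method with parentheses.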
+
+
+
Note
+
 Methods are functions that are attached to an object
.
+They usually act on the object to provide a summary, perform a transformation, or otherwise utilize the information within the object.
+In this case, these summarization methods utilize the information within the dataframe to summarize each column.
+
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ weeks_to_failure
+
+
+
+
+ count
+ 30.000000
+ 30.000000
+ 30.000000
+ 30.000000
+
+
+ mean
+ 50.133333
+ 34.366667
+ 54.266667
+ 4.900000
+
+
+ std
+ 5.569209
+ 9.041984
+ 24.070204
+ 2.202663
+
+
+ min
+ 40.000000
+ 20.000000
+ 22.000000
+ 1.000000
+
+
+ 25%
+ 45.750000
+ 26.250000
+ 36.500000
+ 3.250000
+
+
+ 50%
+ 51.000000
+ 34.000000
+ 48.500000
+ 5.000000
+
+
+ 75%
+ 55.000000
+ 41.750000
+ 72.000000
+ 6.750000
+
+
+ max
+ 59.000000
+ 50.000000
+ 99.000000
+ 9.000000
+
+
+
+
+
+Selecting columns is nice.
+We can also add a new column based on another one.
+In HIV research it is often important to know how long someone has been living with HIV.
+However, this dataset contains their current age, and their age at infection.
+We can use these two to calculate the length.
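One way this calculation might look is sketched below on a toy table built from the first five rows of the trial data. `toy_df` is an illustrative name, and the new column name `years_infected` follows the header shown in the output table.

```python
import pandas as pd

# Toy subset: the two age columns from the first five rows of the trial table.
toy_df = pd.DataFrame({
    'age': [55, 48, 45, 43, 40],
    'age_initial_infection': [26, 26, 36, 31, 20],
})

# Subtracting one column from another works element-wise, row by row.
# Assigning the result to a new name in square brackets creates the column.
toy_df['years_infected'] = toy_df['age'] - toy_df['age_initial_infection']

print(toy_df)
```

No loop is needed: pandas lines the two columns up by row and performs the subtraction for every participant at once.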
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+ 29
+
+
+ 1
+ 48
+ 26
+ 66
+ False
+ 4
+ 22
+
+
+ 2
+ 45
+ 36
+ 32
+ True
+ 6
+ 9
+
+
+ 3
+ 43
+ 31
+ 23
+ False
+ 5
+ 12
+
+
+ 4
+ 40
+ 20
+ 45
+ True
+ 5
+ 20
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+ 29
+
+
+ 1
+ 48
+ 26
+ 66
+ False
+ 4
+ 22
+
+
+ 2
+ 45
+ 36
+ 32
+ True
+ 6
+ 9
+
+
+ 3
+ 43
+ 31
+ 23
+ False
+ 5
+ 12
+
+
+ 4
+ 40
+ 20
+ 45
+ True
+ 5
+ 20
+
+
+
+
+
+
+
+
+Acting on Rows
+
+Indexing
+When selecting rows, or rows and columns, we need to use the .loc
attribute of the DataFrame
.
+We can select by row number.
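The sketch below shows a few common forms of `.loc` on a toy table built from the first five rows of the trial data (`toy_df` is an illustrative name). Strictly speaking, `.loc` selects by index label; with the default index those labels happen to be the row numbers.

```python
import pandas as pd

# Toy subset: two columns from the first five rows of the trial table.
toy_df = pd.DataFrame({
    'age': [55, 48, 45, 43, 40],
    'initial_viral_load': [66, 66, 32, 23, 45],
})

# A single label returns that row as a Series.
first_row = toy_df.loc[0]

# A slice of labels includes BOTH endpoints, so 0:2 gives three rows.
first_three = toy_df.loc[0:2]

# Lists pick specific rows and columns, in the order given.
picked = toy_df.loc[[0, 3], ['initial_viral_load', 'age']]

print(first_row)
print(first_three)
print(picked)
```

Note that the inclusive slice differs from ordinary Python list slicing, where the end position is excluded.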
+
+
+
+
age 55
+age_initial_infection 26
+initial_viral_load 66
+treatment False
+weeks_to_failure 3
+years_infected 29
+Name: 0, dtype: object
+
+
+
+
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+ 29
+
+
+ 1
+ 48
+ 26
+ 66
+ False
+ 4
+ 22
+
+
+ 2
+ 45
+ 36
+ 32
+ True
+ 6
+ 9
+
+
+ 3
+ 43
+ 31
+ 23
+ False
+ 5
+ 12
+
+
+ 4
+ 40
+ 20
+ 45
+ True
+ 5
+ 20
+
+
+ 5
+ 42
+ 20
+ 57
+ True
+ 9
+ 22
+
+
+ 6
+ 55
+ 31
+ 23
+ False
+ 4
+ 24
+
+
+ 7
+ 56
+ 50
+ 22
+ False
+ 4
+ 6
+
+
+ 8
+ 59
+ 33
+ 33
+ False
+ 5
+ 26
+
+
+ 9
+ 51
+ 30
+ 49
+ True
+ 7
+ 21
+
+
+ 10
+ 55
+ 21
+ 94
+ False
+ 3
+ 34
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+ 29
+
+
+ 5
+ 42
+ 20
+ 57
+ True
+ 9
+ 22
+
+
+ 7
+ 56
+ 50
+ 22
+ False
+ 4
+ 6
+
+
+ 13
+ 48
+ 41
+ 99
+ False
+ 3
+ 7
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ initial_viral_load
+ age
+
+
+
+
+ 0
+ 66
+ 55
+
+
+ 5
+ 57
+ 42
+
+
+ 7
+ 22
+ 56
+
+
+ 13
+ 99
+ 48
+
+
+
+
+
+
+
+Boolean Indexing
+If we do not know the row number ahead of time, but instead want to select rows based on their values, we can use boolean indexing.
+In this strategy we create a new pd.Series
of True/False values where True corresponds to the ones we want.
+Start by finding all people over 50 years old.
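A minimal sketch of this idea, using a toy table built from the first five rows of the trial data (`toy_df` and `older` are illustrative names):

```python
import pandas as pd

# Toy subset: two columns from the first five rows of the trial table.
toy_df = pd.DataFrame({
    'age': [55, 48, 45, 43, 40],
    'weeks_to_failure': [3, 4, 6, 5, 5],
})

# Comparing a column to a value yields a Series of True/False, one per row.
age_mask = toy_df['age'] > 50
print(age_mask)

# Passing the mask to .loc keeps only the rows where the mask is True.
older = toy_df.loc[age_mask]
print(older)
```

In this toy table only the first participant is over 50, so the mask holds a single `True` and the result has one row.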
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+ 29
+
+
+ 6
+ 55
+ 31
+ 23
+ False
+ 4
+ 24
+
+
+ 7
+ 56
+ 50
+ 22
+ False
+ 4
+ 6
+
+
+ 8
+ 59
+ 33
+ 33
+ False
+ 5
+ 26
+
+
+ 9
+ 51
+ 30
+ 49
+ True
+ 7
+ 21
+
+
+
+
+
+
+
Note
+
I often use the suffix _mask
when I create boolean indexes.
+It is not required, but consistent naming conventions make your code easier for you and others to understand.
+
+Now, if we also wanted to split by the initial_viral_load we might do:
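The sketch below illustrates combining two masks. The toy table copies the `age` and `initial_viral_load` values of five rows from the trial data (original rows 0, 6, 9, 10, and 2), so its row labels differ from the full table; `toy_df` and `inline` are illustrative names.

```python
import pandas as pd

# Toy table: values copied from five rows of the trial data.
toy_df = pd.DataFrame({
    'age': [55, 55, 51, 55, 45],
    'initial_viral_load': [66, 23, 49, 94, 32],
})

age_mask = toy_df['age'] > 50
high_vl_mask = toy_df['initial_viral_load'] > 50

# & means "and": both masks must be True for a row to be kept.
aged_high_vl = toy_df.loc[age_mask & high_vl_mask]

# ~ means "not": it flips every True/False in the mask.
aged_low_vl = toy_df.loc[age_mask & ~high_vl_mask]

# When comparisons are written inline, each needs its own parentheses,
# because & binds more tightly than > in Python.
inline = toy_df.loc[(toy_df['age'] > 50) & (toy_df['initial_viral_load'] > 50)]

print(aged_high_vl)
print(aged_low_vl)
```

Naming each mask first, as above, avoids the parentheses and makes the final selection read almost like a sentence.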
+
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 0
+ 55
+ 26
+ 66
+ False
+ 3
+ 29
+
+
+ 10
+ 55
+ 21
+ 94
+ False
+ 3
+ 34
+
+
+ 11
+ 53
+ 42
+ 85
+ True
+ 5
+ 11
+
+
+ 14
+ 56
+ 41
+ 59
+ False
+ 6
+ 15
+
+
+ 24
+ 51
+ 43
+ 88
+ False
+ 2
+ 8
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ age
+ age_initial_infection
+ initial_viral_load
+ treatment
+ weeks_to_failure
+ years_infected
+
+
+
+
+ 6
+ 55
+ 31
+ 23
+ False
+ 4
+ 24
+
+
+ 7
+ 56
+ 50
+ 22
+ False
+ 4
+ 6
+
+
+ 8
+ 59
+ 33
+ 33
+ False
+ 5
+ 26
+
+
+ 9
+ 51
+ 30
+ 49
+ True
+ 7
+ 21
+
+
+ 15
+ 53
+ 47
+ 38
+ True
+ 7
+ 6
+
+
+
+
+
+
+
+Q3: Calculate the average weeks to failure for the treated population?
+
+
+
+
+
treated_average_weeks = 6.9
+
+
+
+
+Utilizing boolean indexing, you can express any algorithmic row-selection strategy.
+This can even include comparisons between rows, for example if there were multiple rows of the same sample.
+We will cover these strategies later in the course.
+Sometimes, our searches are simple.
+Pandas also includes another method for indexing rows called .query()
for these purposes.
+
+
+Querying
+.query() is an interface that facilitates simple queries, with a few specific limitations:
+
+It can only use the information present in the row.
+It can only work on one row at a time.
+Column headers should not contain spaces, dots, dashes, commas, or emoji (such names must be wrapped in backticks inside the query).
+
+Our questions on this dataset easily fit within those constraints.
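As a sketch of what such queries can look like, the example below splits a hand-built copy of the first twelve rows of the trial table by the treatment column. The names treated_df and untreated_df are illustrative.

```python
import pandas as pd

# First twelve rows of the trial table, typed in by hand for this sketch.
trial_df = pd.DataFrame({
    "age": [55, 48, 45, 43, 40, 42, 55, 56, 59, 51, 55, 53],
    "treatment": [False, False, True, False, True, True,
                  False, False, False, True, False, True],
    "weeks_to_failure": [3, 4, 6, 5, 5, 9, 4, 4, 5, 7, 3, 5],
})

# The query is a string; column names are used as if they were variables.
treated_df = trial_df.query("treatment == True")
untreated_df = trial_df.query("treatment == False")

# The same selection written with a boolean mask gives an identical result.
same_as_mask = treated_df.equals(trial_df.loc[trial_df["treatment"]])

print(treated_df)
print(untreated_df)
```

Both approaches return the same rows, so the choice between .query() and a mask is mostly about which one reads more clearly for the question at hand.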
+
+
+
+
+
+
+
+
+
+    age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
+2    45                     36                  32       True                 6               9
+4    40                     20                  45       True                 5              20
+5    42                     20                  57       True                 9              22
+9    51                     30                  49       True                 7              21
+11   53                     42                  85       True                 5              11
+
+
+
+
+
+
+
+
+
+
+
+
+
+    age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
+0    55                     26                  66      False                 3              29
+1    48                     26                  66      False                 4              22
+3    43                     31                  23      False                 5              12
+6    55                     31                  23      False                 4              24
+7    56                     50                  22      False                 4               6
+
+
+
+
+You can also make them more complex.
+
+
+
+
+
+
+
+
+
+    age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
+2    45                     36                  32       True                 6               9
+4    40                     20                  45       True                 5              20
+5    42                     20                  57       True                 9              22
+9    51                     30                  49       True                 7              21
+11   53                     42                  85       True                 5              11
+12   40                     34                  27       True                 8               6
+15   53                     47                  38       True                 7               6
+16   57                     41                  42       True                 8              16
+21   48                     37                  99       True                 8              11
+23   43                     34                  48       True                 7               9
+27   45                     25                  87       True                 5              20
+29   51                     43                  38       True                 8               8
+
+
+
+
+This statement doesn’t make biological sense, but it is an example of a valid comparison.
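One query that is consistent with the rows in the table that follows compares two columns against each other. This is a reconstruction from the output, not necessarily the exact statement in the original cell, and the DataFrame is rebuilt by hand from the first twelve rows of the trial table.

```python
import pandas as pd

# First twelve rows of the trial table, typed in by hand for this sketch.
trial_df = pd.DataFrame({
    "age": [55, 48, 45, 43, 40, 42, 55, 56, 59, 51, 55, 53],
    "initial_viral_load": [66, 66, 32, 23, 45, 57, 23, 22, 33, 49, 94, 85],
})

# Both sides of the comparison are columns, evaluated row by row.
vl_below_age_df = trial_df.query("initial_viral_load < age")
print(vl_below_age_df)
```

Comparing a viral load to an age mixes units, which is why the result has no biological meaning; the point is only that .query() accepts column-to-column comparisons.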
+
+
+
+
+
+
+
+
+
+    age  age_initial_infection  initial_viral_load  treatment  weeks_to_failure  years_infected
+2    45                     36                  32       True                 6               9
+3    43                     31                  23      False                 5              12
+6    55                     31                  23      False                 4              24
+7    56                     50                  22      False                 4               6
+8    59                     33                  33      False                 5              26
+9    51                     30                  49       True                 7              21
+12   40                     34                  27       True                 8               6
+15   53                     47                  38       True                 7               6
+16   57                     41                  42       True                 8              16
+18   51                     42                  25      False                 2               9
+19   55                     46                  45      False                 1               9
+22   51                     27                  36      False                 2              24
+28   59                     40                  49      False                 5              19
+29   51                     43                  38       True                 8               8
+
+
+
+
+
+
+Q4: Calculate the average weeks_to_failure for the untreated population.
+
+
+
+
+
Untreated participants took 3.6 weeks to rebound.
+
+
+
+
+
+
+
+
untreated_average_weeks is a `float`: True
+
+
+
+
+
+
+
+
untreated_average_weeks = 3.6
+
+
+
+
+
+
+Q4: Calculate the average weeks_to_failure for the treated population.
+
+
+
+
+
Treated patients took 6.9 weeks to rebound.
+
+
+
+
+
+
+
+
treated_average_weeks is a `float`: True
+
+
+
+
+
+
+
+
treated_average_weeks = 6.9
+
+
+
+
+
+
+
+
+Conclusion
+We can see that this treatment extended the average time off ART from ~3 weeks to ~7 weeks.
+While not a complete cure, any incremental step is useful progress in the elimination of HIV.
+In the lab you will use similar techniques to explore whether other factors in this dataset impact the results.
+In future weeks we will explore statistical techniques to understand whether this difference is due to chance, or due to the effect of the treatment.
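The two averages behind this conclusion can be reproduced with a short sketch. It assumes the treatment and weeks_to_failure columns of the 30-row trial table, typed in by hand here instead of read from trial_data.csv, and it uses a boolean mask, which is only one of several ways to get the same numbers.

```python
import pandas as pd

# treatment and weeks_to_failure for all 30 participants, in row order.
trial_df = pd.DataFrame({
    "treatment": [False, False, True, False, True, True, False, False, False, True,
                  False, True, True, False, False, True, True, False, False, False,
                  False, True, False, True, False, False, False, True, False, True],
    "weeks_to_failure": [3, 4, 6, 5, 5, 9, 4, 4, 5, 7,
                         3, 5, 8, 3, 6, 7, 8, 4, 2, 1,
                         1, 8, 2, 7, 2, 5, 5, 5, 5, 8],
})

treated_mask = trial_df["treatment"]

# .loc[rows, column] selects the masked rows and a single column at once.
treated_average_weeks = trial_df.loc[treated_mask, "weeks_to_failure"].mean()
untreated_average_weeks = trial_df.loc[~treated_mask, "weeks_to_failure"].mean()

print(f"Treated participants took {treated_average_weeks:0.1f} weeks to rebound.")
print(f"Untreated participants took {untreated_average_weeks:0.1f} weeks to rebound.")
```

Rounded to one decimal place, these match the 6.9 and 3.6 weeks reported in the question outputs.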
\ No newline at end of file
diff --git a/content/book_index.html b/content/book_index.html
index 6d3bae2..4a7789c 100644
--- a/content/book_index.html
+++ b/content/book_index.html
@@ -195,6 +195,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/misc/about_this_book.html b/content/misc/about_this_book.html
index f90c8f3..f879c8e 100644
--- a/content/misc/about_this_book.html
+++ b/content/misc/about_this_book.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/content/misc/book_intro.html b/content/misc/book_intro.html
index a7762bc..0880fd2 100644
--- a/content/misc/book_intro.html
+++ b/content/misc/book_intro.html
@@ -194,6 +194,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/genindex.html b/genindex.html
index 87e9743..923c428 100644
--- a/genindex.html
+++ b/genindex.html
@@ -193,6 +193,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb b/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb
index 51335f0..96bc54c 100644
--- a/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb
+++ b/jupyter_execute/content/Module01/Module01_walkthrough_book.ipynb
@@ -416,9 +416,9 @@
" import otter\n",
"\n",
"if not os.path.exists('walkthrough-tests'):\n",
- " zip_files = [f for f in os.listdir()]\n",
+ " zip_files = [f for f in os.listdir() if f.endswith('.zip')]\n",
" assert len(zip_files)>0, 'Could not find any zip files!'\n",
- " assert len(zip_files)>1, 'Found multiple zip files!'\n",
+ " assert len(zip_files)==1, 'Found multiple zip files!'\n",
" ! unzip {zip_files[0]}\n",
"\n",
"grader = otter.Notebook(colab=True,\n",
diff --git a/jupyter_execute/content/Module02/Module02_walkthrough_book.ipynb b/jupyter_execute/content/Module02/Module02_walkthrough_book.ipynb
new file mode 100644
index 0000000..d879fc2
--- /dev/null
+++ b/jupyter_execute/content/Module02/Module02_walkthrough_book.ipynb
@@ -0,0 +1,744 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "a10e9828",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "# Walkthrough\n",
+ "\n",
+ "Remember, all assignments are due before the weekly synchronous session."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "719610dc",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives\n",
+ "At the end of this learning activity you will be able to:\n",
+ "\n",
+ " - Use basic arithmetic operations in Python.\n",
+ " - Summarize the basic expression syntax in Python.\n",
+ " - Write an equation that uses the result of one variable to calculate the value of another. \n",
+ " - Create basic `f-strings` in Python to display dynamically created data.\n",
+ " - Summarize a general strategy for using Python to calculate dilutions."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38ad9ba4",
+ "metadata": {},
+ "source": [
+ "## Programmatic Arithmetic in Python"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2528718d",
+ "metadata": {},
+ "source": [
+ "Often times in the lab we have common tasks that we repeat over and over again. \n",
+ "This can be anything from counting the number of cells on a plate, to normalizing values with a reference, to calculating dilutions for stock chemicals.\n",
+ "Automating these types of tasks can lead to drastic speedups in the time it takes to get common tasks done. \n",
+ "This week we'll use a common problem from molecular biology as our jumping off point into Python.\n",
+ "\n",
+ "Recently, my lab obtained a Nanopore MinION.\n",
+ "It is a 1000 dollar, USB-key sized DNA sequencer that reads millions of bases for about 100 dollars per sample.\n",
+ "As part of a Senior Design Project we used the device to track the COVID outbreak in the Drexel community using rapid sequencing.\n",
+ "Watch the video explaining the project in the Recommended Materials for more context.\n",
+ "This protocol requires numerous tedious calculations relating mass, moles, and concentrations.\n",
+ "This week we will explore how to use Python to automate these calculations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bd70cef6",
+ "metadata": {},
+ "source": [
+ "The Nanopore sequencing protocol requires the operator to perform 3 enzymatic reactions:\n",
+ " 1. `End-Prep`: Prepare the 3' and 5' ends of the DNA by removing single-basepair overhangs and add a single `A` at the end of the molecule.\n",
+ " 2. `Barcode ligation`: Attach unique barcodes to each sample using a `T` overhang so each sample has an individual *key* at the start of the sequence.\n",
+ " 3. `Adapter ligation`: After pooling each sample, another DNA molecule (called an *adapter*) needs to be added so it can attach to the motor protein inside the Nanopore device.\n",
+ " \n",
+ "Refer to the online textbook for more detail."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "280e4fda",
+ "metadata": {},
+ "source": [
+ "## The Problem"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b2b7eead",
+ "metadata": {},
+ "source": [
+ "Just like baking, when performing enzymatic reactions it is critical that we use the right amount of each ingredient.\n",
+ "The Nanopore enzymatic reagents come in prescribed amounts and it is up to the operator to ensure that the correct initial amount of template DNA is added to each reaction.\n",
+ "\n",
+ "The amount of template DNA needed for each reaction is listed in the protocol in [*moles*](https://en.wikipedia.org/wiki/Mole_(unit)).\n",
+ "Moles are a unit of \"amount\", such as the number of molecules of DNA; there are 6.022 × 10^(23) items in a mole.\n",
+ "However, we can't *count* the amount of DNA we have in a test-tube.\n",
+ "But, we can *weigh* the DNA by measuring the fluorescence of a DNA-binding dye in the sample using a device called a [Qubit](https://www.youtube.com/watch?v=RRKZN--7jqg).\n",
+ "Then, if we know the number of nucleotides in the strand, we can convert the weight of the DNA into a number of *moles*.\n",
+ "Refer to the course book for an in-depth review of the math.\n",
+ "\n",
+ "Doing this calculation manually is tedious and prone to error. The perfect thing to automate."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35d4169c",
+ "metadata": {},
+ "source": [
+ "## Walkthrough"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c516dbb5",
+ "metadata": {},
+ "source": [
+ "We do this through a series of *expressions*.\n",
+ "Remember, the computer is not 'space limited'; we should write code so that WE understand it,\n",
+ "rather than trying to make everything as short and compact as possible."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "add539e5",
+ "metadata": {},
+ "source": [
+ "Assume you have 25 ul of a 280 bp double-stranded template that you measured at a concentration of 50.6 ng/ul."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "9ff40df4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# It is often useful to define all of your variables at the beginning.\n",
+ "amplicon_length = 280 # bp\n",
+ "dna_weight = 650 # g/mole/bp\n",
+ "dna_conc = 50.6 # ng/ul\n",
+ "volume = 25 # ul"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f78b8ae",
+ "metadata": {},
+ "source": [
+ "## What is the template weight?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "2ac5ff60",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The template weighs 182000 g/mole\n"
+ ]
+ }
+ ],
+ "source": [
+ "template_weight = amplicon_length*dna_weight\n",
+ "print(f'The template weighs {template_weight} g/mole')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aaa997e6",
+ "metadata": {},
+ "source": [
+ "## Q1: Calculate the molarity of the sample"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "b387af00",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The DNA molarity is 278.02197802197804 fmoles/ul\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Answer in fmoles/ul\n",
+ "\n",
+ "dna_molarity = dna_conc * 1E-9 / template_weight / 1E-15 # SOLUTION\n",
+ "print(f'The DNA molarity is {dna_molarity} fmoles/ul')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "7468753b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Is dna_molarity a float: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('Is dna_molarity a float:', isinstance(dna_molarity, float))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "2b1265bc",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "dna_molarity = 278.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'dna_molarity = {dna_molarity:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0538443b",
+ "metadata": {},
+ "source": [
+ "Some things to notice above:\n",
+ " 1. There's an `f` immediately before the `'`. This makes it a \"formatted\" string. Or `f-string`.\n",
+ " 2. There's a lot of different colors changing."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2310a821",
+ "metadata": {},
+ "source": [
+ "### `f-strings`\n",
+ "\n",
+ "These are a relatively new (circa 2016) addition to Python for inserting data into strings.\n",
+ "Representing our results as dynamically changing explanatory statements helps make our analysis more transparent and reproducible.\n",
+ "`f-strings` make this much easier.\n",
+ "\n",
+ "Take a look at this post from [The Python Guru](https://thepythonguru.com/python-string-formatting/) for an in-depth explanation of the formatting."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f7942e2",
+ "metadata": {},
+ "source": [
+ "### Linting through color\n",
+ "\n",
+ "If we look around our notebook, we can see that there are a lot of different text colors.\n",
+ "Those are hints at what Python thinks we're trying to tell it.\n",
+ "Understanding the code can really help with debugging.\n",
+ "\n",
+ "\n",
+ "Numbers are green.\n",
+ "```python\n",
+ "1231231\n",
+ "```\n",
+ "\n",
+ "Variables are black.\n",
+ "```python\n",
+ "val = 1231231\n",
+ "other = val\n",
+ "```\n",
+ "\n",
+ "Strings are orange.\n",
+ "```python\n",
+ "val = '1231231'\n",
+ "```\n",
+ "_Even if they are strings of numbers._\n",
+ "\n",
+ "`f-strings` are orange.\n",
+ "```python\n",
+ "val = f'1231231'\n",
+ "```\n",
+ "\n",
+ "\n",
+ "`f-strings` are orange, unless it is between `{` `}`.\n",
+ "```python\n",
+ "age = 12\n",
+ "val = f'This book is {age} years old.'\n",
+ "```\n",
+ "\n",
+ "The parts between curly braces are replaced by the value in the code.\n",
+ "\n",
+ "\n",
+ "Notice how imbalanced braces alter the color.\n",
+ "```python\n",
+ "age = 12\n",
+ "val = f'This book is {age years old.'\n",
+ "```\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6d7c3cd8",
+ "metadata": {},
+ "source": [
+ "## Q2: Calculate the amount of sample to add.\n",
+ "\n",
+ "The protocol requires us to start with 200 fmoles of template DNA.\n",
+ "How many microliters of our stock do we need to start with?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "077c05a7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "You should add 0.72 ul of sample to your reaction.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Answer in ul\n",
+ "\n",
+ "wanted_dna = 200 # fmoles\n",
+ "\n",
+ "volume_to_add = wanted_dna / dna_molarity # SOLUTION\n",
+ "\n",
+ "print(f'You should add {volume_to_add:0.2f} ul of sample to your reaction.')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "999d7794",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Is volume_to_add a float: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('Is volume_to_add a float:', isinstance(volume_to_add, float))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "23f4c888",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "volume_to_add = 0.72\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'volume_to_add = {volume_to_add:0.2f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2e6b58e",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## Q3: Describing the reaction yield\n",
+ "\n",
+ "The **total** amount of DNA we created during the PCR is called the _yield_ of the reaction.\n",
+ "\n",
+ "Create an `f-string` that renders the yield in femtomoles of this reaction. Round your answer to the nearest integer."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "419fd114",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The experiment yielded 6951 fmoles of DNA.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Calculate the amount of DNA in the entire reaction\n",
+ "# Answer in fmoles\n",
+ "dna_yield = dna_molarity*volume # SOLUTION\n",
+ "\n",
+ "# Create an f-string that uses the dna_yield variable\n",
+ "# and describes the result in a short sentence\n",
+ "dna_yield_description = f'The experiment yielded {dna_yield:0.0f} fmoles of DNA.' # SOLUTION\n",
+ "\n",
+ "print(dna_yield_description)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "e6576ec2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Is dna_yield_description a str: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('Is dna_yield_description a str:', isinstance(dna_yield_description, str))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "c7f9a1c7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Is the correct number in the description: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('Is the correct number in the description:', '6951' in dna_yield_description)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5af9a301-0a1e-47ed-8360-8cba5370ae6d",
+ "metadata": {},
+ "source": [
+ "## Functions"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b04917b2-c287-45ca-950a-ed84d8a524a3",
+ "metadata": {},
+ "source": [
+ "Functions are self-contained blocks of code created for a reusable purpose.\n",
+ "\n",
+ "**Purpose:**\n",
+ "* Modularity: Breaks down complex processes into smaller, manageable parts.\n",
+ "* Reusability: Allows the same code to be used multiple times without repetition.\n",
+ "* Organization: Makes the code more organized and easier to understand.\n",
+ "\n",
+ "\n",
+ "```python\n",
+ "def function_name(arg1, arg2, kwarg1=1, kwarg2='a'):\n",
+ " \"A brief function description\"\n",
+ "\n",
+ " # do something with inputs\n",
+ " result = arg1 + 2*arg2\n",
+ "\n",
+ " return result\n",
+ "\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "09b1a9c9-ef18-48be-a11c-29b3ce7f19ab",
+ "metadata": {},
+ "source": [
+ "Instead of continually copy-paste-and-change, we should write a function.\n",
+ "\n",
+ "We've been using something like this to calculate the molarity from the concentration.\n",
+ "\n",
+ "```python\n",
+ "dna_molarity = dna_conc * 1E-9 / template_weight / 1E-15 \n",
+ "\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "0338098a-34d3-4fb1-a150-7177effdb519",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def calc_molarity(sample_concentration, sample_length, base_weight=650):\n",
+ " \"\"\"Calculate molarity of samples.\n",
+ "\n",
+ " sample_concentration : ng/ul\n",
+ " sample_length : bases\n",
+ " base_weight : g/mole/bp\n",
+ "\n",
+ " returns molarity fmols/ul\n",
+ " \"\"\"\n",
+ "\n",
+ " nano = 1E-9\n",
+ " femto = 1E-15\n",
+ "\n",
+ " amplicon_weight = sample_length*base_weight\n",
+ " molarity = sample_concentration * nano / amplicon_weight / femto\n",
+ "\n",
+ " return molarity\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ecc78848-963b-4775-a5e4-1c25774af6ab",
+ "metadata": {},
+ "source": [
+ "Once created, we can use this function anywhere."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "bbd13b0d-ec1b-462a-9989-118dfc1fe04e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Function calculated paragon molarity 278.0 fmols/ul\n"
+ ]
+ }
+ ],
+ "source": [
+ "paragon_molarity = calc_molarity(50.6, 280)\n",
+ "print(f'Function calculated paragon molarity {paragon_molarity:0.1f} fmols/ul')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1c4aca50-9a5c-4f89-85a8-5abf3a6ca9d5",
+ "metadata": {},
+ "source": [
+ "Now, if we had another sample with a different concentration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "d3f2afaf-186b-4647-b674-c5db087feb63",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Function calculated new molarity 827.5 fmols/ul\n"
+ ]
+ }
+ ],
+ "source": [
+ "new_concentration = 150.6 # ng/ul\n",
+ "\n",
+ "new_paragon_molarity = calc_molarity(new_concentration, 280)\n",
+ "print(f'Function calculated new molarity {new_paragon_molarity:0.1f} fmols/ul')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b36d47c0-9b06-4a18-9cba-f2167f5a4049",
+ "metadata": {},
+ "source": [
+ "Or, if *for some reason* you were making RNA, the `base_weight` would be different."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "3754737d-7d32-4dec-9e6a-1aa9df64ac9f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Function calculated rna molarity 1680.8 fmols/ul\n"
+ ]
+ }
+ ],
+ "source": [
+ "rna_paragon_molarity = calc_molarity(new_concentration, 280, base_weight=320)\n",
+ "print(f'Function calculated rna molarity {rna_paragon_molarity:0.1f} fmols/ul')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0dcc246-60c5-49e0-8f67-5f392010d7b1",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## Q4: Write a function which calculates the reaction yield\n",
+ "\n",
+ "Use the function above as a template to create one that further calculates the reaction yield in `fmols`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "de147208-c0ef-4383-a394-3340e64865e5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def calc_yield(sample_concentration, sample_length, sample_volume, base_weight=650):\n",
+ " \"\"\"Calculate the yield of a sample.\n",
+ "\n",
+ " sample_concentration : ng/ul\n",
+ " sample_length : bases\n",
+ " sample_volume : ul\n",
+ " base_weight : g/mole/bp\n",
+ "\n",
+ " returns sample_yield in fmols\n",
+ " \"\"\"\n",
+ " # BEGIN SOLUTION NO PROMPT\n",
+ "\n",
+ " molarity = calc_molarity(sample_concentration, sample_length, base_weight=base_weight)\n",
+ " sample_yield = molarity*sample_volume\n",
+ "\n",
+ " return sample_yield\n",
+ "\n",
+ " # END SOLUTION\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "03d9d1c8-f1d4-4da2-a78d-145134ab5f59",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Current reaction yield is 6950.5 fmols\n"
+ ]
+ }
+ ],
+ "source": [
+ "current_yield = calc_yield(50.6, 280, 25)\n",
+ "print(f'Current reaction yield is {current_yield:0.1f} fmols')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "1cef68ee-529b-4c25-b2a9-233d69939ddf",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Testing calc_yield(50.6, 280, 25) = 6950.5\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'Testing calc_yield(50.6, 280, 25) = {calc_yield(50.6, 280, 25):0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "409348d9-f122-4d5e-8257-d0ad492c1d42",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Testing calc_yield(35, 77, 19, base_weight=320) = 26988.6\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'Testing calc_yield(35, 77, 19, base_weight=320) = {calc_yield(35, 77, 19, base_weight=320):0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c7e25054-fc34-4393-975f-de23c2500122",
+ "metadata": {},
+ "source": [
+ "## Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22b14f97-975d-466b-a595-16c6866f86e5",
+ "metadata": {},
+ "source": [
+ "In this walkthrough we have discussed a number of ways to perform basic math in Python.\n",
+ "We also covered strategies to modularize processes into reusable functions.\n",
+ "This week we worked with a 'one number at a time' strategy, in the next module we will explore using tables to work with multiple samples at the same time."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/jupyter_execute/content/Module03/Module03_walkthrough_book.ipynb b/jupyter_execute/content/Module03/Module03_walkthrough_book.ipynb
new file mode 100644
index 0000000..b5ff137
--- /dev/null
+++ b/jupyter_execute/content/Module03/Module03_walkthrough_book.ipynb
@@ -0,0 +1,2996 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "da4cbf41",
+ "metadata": {},
+ "source": [
+ "# Module 03 Walkthrough\n",
+ "\n",
+ "Remember, all assignments are due before the synchronous session.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1b24089-7033-4965-ae63-de76bdf935a9",
+ "metadata": {},
+ "source": [
+ "## Introduction\n",
+ "\n",
+ "Get ready to dive into some data analysis as we explore the effectiveness of a hypothetical HIV treatment trial.\n",
+ "In this walkthrough, we have a dataset containing information from 30 people living with HIV (PLWH) who were randomly assigned to a treatment or control group.\n",
+ "After receiving the treatment, they stopped their ART and were monitored weekly for the number of weeks until their first \"detectable\" viral load was found.\n",
+ "We will use `Pandas` to analyze this data and evaluate the treatment's effectiveness.\n",
+ "By the end of this activity, you will be proficient in loading spreadsheet data into Python, creating derived columns in `DataFrames`, and using summary methods like sum, mean, and max.\n",
+ "Let's get started!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d728e12b",
+ "metadata": {},
+ "source": [
+ "## Learning Objectives\n",
+ "At the end of this learning activity you will be able to:\n",
+ " - Practice loading spreadsheet data into Python using `pandas`.\n",
+ " - Use Python methods to create derived columns in `pd.DataFrames`.\n",
+ " - Use `Pandas` summary methods like sum, mean, and max.\n",
+ " - Employ basic filtering and data extraction from `pandas`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28d532d9",
+ "metadata": {},
+ "source": [
+ "## Dataset Reference\n",
+ "\n",
+ "_File_: `trial_data.csv`\n",
+ "\n",
+ "_Columns_:\n",
+ "\n",
+ " - `age` : (years) Current age during the study. \n",
+ " - `age_initial_infection` : (years) Age at which the participant was initially infected.\n",
+ " - `initial_viral_load` : (copies/ul) The level of infection at the start of the study.\n",
+ " - `treatment` : (boolean) `True` for participant in the treatment group, `False` for those in the control group.\n",
+ " - `weeks_to_failure` : (weeks) Time from the treatment to the first week of uncontrolled viral load.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "621cd2ef",
+ "metadata": {},
+ "source": [
+ "## Imports"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "917b592b",
+ "metadata": {},
+ "source": [
+ "While _basic_ Python can do a lot, you have to do everything yourself.\n",
+ "The **real** power of Python is that you can `import` code that is written by others.\n",
+ "\n",
+ "For this course, we will use a common data science stack of interoperable tools centered around [Numpy](https://numpy.org/).\n",
+ "\n",
+ "There are four that we will use regularly, two of which we'll cover today."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cfb0afb0-6fe4-47c7-b044-75144973797c",
+ "metadata": {},
+ "source": [
+ "### Numpy\n",
+ "\n",
+ "[Numpy](https://numpy.org/)\n",
+ "\n",
+ "A numerical Python library that contains incredibly fast arrays, mathematical functions, and other useful utilities.\n",
+ "\n",
+ "By convention, the community tends to _alias_ the long `numpy` as `np`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "d5cc7c1d-b078-4555-a578-f862584233c4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8152b253-6408-4f55-9505-672f597e23e7",
+ "metadata": {},
+ "source": [
+ "### Pandas\n",
+ "\n",
+ "[Pandas](https://pandas.pydata.org/)\n",
+ "\n",
+ "A library that sits atop `numpy` and provides a _spreadsheet_ style object called a `DataFrame` along with a plethora of data science utilities.\n",
+ "This is the main tool we will be using for data exploration.\n",
+ "\n",
+ "By convention, the community tends to _alias_ the long `pandas` as `pd`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "d5a223d0-d0d2-471a-b5b8-a63a700eda75",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "519ff9d5",
+ "metadata": {},
+ "source": [
+ "Nicely, it can read `csv` files for us."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "4492bb2c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " age \n",
+ " age_initial_infection \n",
+ " initial_viral_load \n",
+ " treatment \n",
+ " weeks_to_failure \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " 55 \n",
+ " 26 \n",
+ " 66 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " 48 \n",
+ " 26 \n",
+ " 66 \n",
+ " False \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 45 \n",
+ " 36 \n",
+ " 32 \n",
+ " True \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 43 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 40 \n",
+ " 20 \n",
+ " 45 \n",
+ " True \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " 42 \n",
+ " 20 \n",
+ " 57 \n",
+ " True \n",
+ " 9 \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 55 \n",
+ " 31 \n",
+ " 23 \n",
+ " False \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " 7 \n",
+ " 56 \n",
+ " 50 \n",
+ " 22 \n",
+ " False \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " 8 \n",
+ " 59 \n",
+ " 33 \n",
+ " 33 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 9 \n",
+ " 51 \n",
+ " 30 \n",
+ " 49 \n",
+ " True \n",
+ " 7 \n",
+ " \n",
+ " \n",
+ " 10 \n",
+ " 55 \n",
+ " 21 \n",
+ " 94 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " 11 \n",
+ " 53 \n",
+ " 42 \n",
+ " 85 \n",
+ " True \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 12 \n",
+ " 40 \n",
+ " 34 \n",
+ " 27 \n",
+ " True \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ " 13 \n",
+ " 48 \n",
+ " 41 \n",
+ " 99 \n",
+ " False \n",
+ " 3 \n",
+ " \n",
+ " \n",
+ " 14 \n",
+ " 56 \n",
+ " 41 \n",
+ " 59 \n",
+ " False \n",
+ " 6 \n",
+ " \n",
+ " \n",
+ " 15 \n",
+ " 53 \n",
+ " 47 \n",
+ " 38 \n",
+ " True \n",
+ " 7 \n",
+ " \n",
+ " \n",
+ " 16 \n",
+ " 57 \n",
+ " 41 \n",
+ " 42 \n",
+ " True \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ " 17 \n",
+ " 48 \n",
+ " 33 \n",
+ " 57 \n",
+ " False \n",
+ " 4 \n",
+ " \n",
+ " \n",
+ " 18 \n",
+ " 51 \n",
+ " 42 \n",
+ " 25 \n",
+ " False \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " 19 \n",
+ " 55 \n",
+ " 46 \n",
+ " 45 \n",
+ " False \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 20 \n",
+ " 43 \n",
+ " 24 \n",
+ " 46 \n",
+ " False \n",
+ " 1 \n",
+ " \n",
+ " \n",
+ " 21 \n",
+ " 48 \n",
+ " 37 \n",
+ " 99 \n",
+ " True \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ " 22 \n",
+ " 51 \n",
+ " 27 \n",
+ " 36 \n",
+ " False \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " 23 \n",
+ " 43 \n",
+ " 34 \n",
+ " 48 \n",
+ " True \n",
+ " 7 \n",
+ " \n",
+ " \n",
+ " 24 \n",
+ " 51 \n",
+ " 43 \n",
+ " 88 \n",
+ " False \n",
+ " 2 \n",
+ " \n",
+ " \n",
+ " 25 \n",
+ " 49 \n",
+ " 20 \n",
+ " 76 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 26 \n",
+ " 54 \n",
+ " 47 \n",
+ " 74 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 27 \n",
+ " 45 \n",
+ " 25 \n",
+ " 87 \n",
+ " True \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 28 \n",
+ " 59 \n",
+ " 40 \n",
+ " 49 \n",
+ " False \n",
+ " 5 \n",
+ " \n",
+ " \n",
+ " 29 \n",
+ " 51 \n",
+ " 43 \n",
+ " 38 \n",
+ " True \n",
+ " 8 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "10 55 21 94 False \n",
+ "11 53 42 85 True \n",
+ "12 40 34 27 True \n",
+ "13 48 41 99 False \n",
+ "14 56 41 59 False \n",
+ "15 53 47 38 True \n",
+ "16 57 41 42 True \n",
+ "17 48 33 57 False \n",
+ "18 51 42 25 False \n",
+ "19 55 46 45 False \n",
+ "20 43 24 46 False \n",
+ "21 48 37 99 True \n",
+ "22 51 27 36 False \n",
+ "23 43 34 48 True \n",
+ "24 51 43 88 False \n",
+ "25 49 20 76 False \n",
+ "26 54 47 74 False \n",
+ "27 45 25 87 True \n",
+ "28 59 40 49 False \n",
+ "29 51 43 38 True \n",
+ "\n",
+ " weeks_to_failure \n",
+ "0 3 \n",
+ "1 4 \n",
+ "2 6 \n",
+ "3 5 \n",
+ "4 5 \n",
+ "5 9 \n",
+ "6 4 \n",
+ "7 4 \n",
+ "8 5 \n",
+ "9 7 \n",
+ "10 3 \n",
+ "11 5 \n",
+ "12 8 \n",
+ "13 3 \n",
+ "14 6 \n",
+ "15 7 \n",
+ "16 8 \n",
+ "17 4 \n",
+ "18 2 \n",
+ "19 1 \n",
+ "20 1 \n",
+ "21 8 \n",
+ "22 2 \n",
+ "23 7 \n",
+ "24 2 \n",
+ "25 5 \n",
+ "26 5 \n",
+ "27 5 \n",
+ "28 5 \n",
+ "29 8 "
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df = pd.read_csv('trial_data.csv')\n",
+ "\n",
+ "# If a `DataFrame` is the last line, it will display a nice summary\n",
+ "trial_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31664b42",
+ "metadata": {},
+ "source": [
+ "And we should see that this exactly matches the table we saw in Excel."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1e653c16-ca8d-4641-ac5c-d81e549657ae",
+ "metadata": {},
+ "source": [
+ "The object we got back is called a `DataFrame`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "b8e9d2ba-70fa-4614-8dae-e70f0a2f0db1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.frame.DataFrame"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(trial_df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9124b71d-4468-42ad-b98a-4412d553f369",
+ "metadata": {},
+ "source": [
+ "If we only want to see the first few rows of the `DataFrame`, we can use the `.head()` _method_."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "075dbccc-d1cf-4127-bd61-e15ddab0f2ce",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment weeks_to_failure\n",
+ "0 55 26 66 False 3\n",
+ "1 48 26 66 False 4\n",
+ "2 45 36 32 True 6\n",
+ "3 43 31 23 False 5\n",
+ "4 40 20 45 True 5"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "197cc06c-7490-4ea2-ab98-652c64a37c50",
+ "metadata": {},
+ "source": [
+ "## Acting on Columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "75de8d1e",
+ "metadata": {},
+ "source": [
+ "We can reference each column by name using square brackets `[]`.\n",
+ "For example, we can extract the `age` column like so:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "cacc125e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 55\n",
+ "1 48\n",
+ "2 45\n",
+ "3 43\n",
+ "4 40\n",
+ "5 42\n",
+ "6 55\n",
+ "7 56\n",
+ "8 59\n",
+ "9 51\n",
+ "10 55\n",
+ "11 53\n",
+ "12 40\n",
+ "13 48\n",
+ "14 56\n",
+ "15 53\n",
+ "16 57\n",
+ "17 48\n",
+ "18 51\n",
+ "19 55\n",
+ "20 43\n",
+ "21 48\n",
+ "22 51\n",
+ "23 43\n",
+ "24 51\n",
+ "25 49\n",
+ "26 54\n",
+ "27 45\n",
+ "28 59\n",
+ "29 51\n",
+ "Name: age, dtype: int64"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df['age']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c8d83ab1",
+ "metadata": {},
+ "source": [
+ "### Q1: Extract the `initial_viral_load` column?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "f99e62ac",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "init_vl = trial_df['initial_viral_load'] # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "68ab0587-f8db-46fa-b758-2320a9ec6858",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "init_vl is a `pd.Series`: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('init_vl is a `pd.Series`:', isinstance(init_vl, pd.Series))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "640c9a7e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "init_vl_sum = 1628\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'init_vl_sum = {init_vl.sum()}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2cac446b",
+ "metadata": {},
+ "source": [
+ "Once we can extract columns, we can start summarizing them."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "48ce947a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The mean age of the population is 50.1 yrs.\n"
+ ]
+ }
+ ],
+ "source": [
+ "age_col = trial_df['age']\n",
+ "age_mean = age_col.mean()\n",
+ "print(f'The mean age of the population is {age_mean:0.1f} yrs.')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35eb614c",
+ "metadata": {},
+ "source": [
+ "Expressions can also be _chained_.\n",
+ "The two versions are functionally the same; the only difference is aesthetic."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "1be80170",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The mean age of the population is 50.1 yrs, even when done on a single line.\n"
+ ]
+ }
+ ],
+ "source": [
+ "age_mean_short = trial_df['age'].mean()\n",
+ "print(f'The mean age of the population is {age_mean_short:0.1f} yrs, even when done on a single line.')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "73927199",
+ "metadata": {},
+ "source": [
+ "### Q2: Calculate the average `weeks_to_failure` for the whole population?\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "ba3fa20b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "average_weeks = trial_df['weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "e6176369",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "average_weeks = 4.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'average_weeks = {average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f948c7a-4e79-4083-9a76-ba7a040240c7",
+ "metadata": {},
+ "source": [
+ "We can also summarize an entire `DataFrame` with a single command."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "bd9ca277-1b6d-4d66-b000-b9c0e1973e38",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "age 50.133333\n",
+ "age_initial_infection 34.366667\n",
+ "initial_viral_load 54.266667\n",
+ "treatment 0.400000\n",
+ "weeks_to_failure 4.900000\n",
+ "dtype: float64"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.mean()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bac8ff72-1f06-4ad1-a484-fac4126cf4a3",
+ "metadata": {},
+ "source": [
+ "In this case the summary went _down_ the columns and calculated a mean for each."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "679c12f1-e5bb-42d5-9dcb-7e7634853f67",
+ "metadata": {},
+ "source": [
+ "There are a number of other summarization _methods_.\n",
+ " - `max()`\n",
+ " - `min()`\n",
+ " - `mode()`\n",
+ " - `median()`\n",
+ " - `var()`\n",
+ " - `std()`\n",
+ " - `nunique()`"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "11f7825b-21ba-4a24-9316-b91047dc17b6",
+ "metadata": {},
+ "source": [
+ "```{note}\n",
+ ":class: dropdown\n",
+ "Methods are functions that are attached to an `object`.\n",
+ "They usually act on the object to provide a summary, perform a transformation, or otherwise utilize the information within the object.\n",
+ "In this case, these summarization methods utilize the information within the dataframe to summarize each column.\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "142a4c50-7ac3-4db8-b08b-b0ad4a075b14",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load weeks_to_failure\n",
+ "count 30.000000 30.000000 30.000000 30.000000\n",
+ "mean 50.133333 34.366667 54.266667 4.900000\n",
+ "std 5.569209 9.041984 24.070204 2.202663\n",
+ "min 40.000000 20.000000 22.000000 1.000000\n",
+ "25% 45.750000 26.250000 36.500000 3.250000\n",
+ "50% 51.000000 34.000000 48.500000 5.000000\n",
+ "75% 55.000000 41.750000 72.000000 6.750000\n",
+ "max 59.000000 50.000000 99.000000 9.000000"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5b40fdd",
+ "metadata": {},
+ "source": [
+ "Selecting columns is nice.\n",
+ "We can also add a new column based on existing ones.\n",
+ "\n",
+ "In HIV research it is often important to know how long someone has been living with HIV.\n",
+ "This dataset does not record that directly, but it does contain each person's current age and their age at infection.\n",
+ "We can subtract the second from the first to calculate the number of years since infection."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "7c162199",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "4 5 20 "
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# First make a new `Series`\n",
+ "years_infected = trial_df['age'] - trial_df['age_initial_infection']\n",
+ "\n",
+ "# Then add that series into the table\n",
+ "trial_df['years_infected'] = years_infected\n",
+ "trial_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "69cd190f-6c41-48e5-805c-5d3bde23a510",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "4 5 20 "
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Alternatively\n",
+ "trial_df['years_infected'] = trial_df['age'] - trial_df['age_initial_infection']\n",
+ "trial_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3d5dc837-650c-44db-8aab-8675491b8049",
+ "metadata": {},
+ "source": [
+ "## Acting on Rows"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f2dac9e-00f7-4442-8a36-20631e73f8f6",
+ "metadata": {},
+ "source": [
+ "### Indexing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c38315cd",
+ "metadata": {},
+ "source": [
+ "When selecting rows, or rows and columns, we need to use the `.loc` attribute of the `DataFrame`.\n",
+ "\n",
+ "We can select by row label, which for this `DataFrame` is the same as the row number."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "85d1364b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "age 55\n",
+ "age_initial_infection 26\n",
+ "initial_viral_load 66\n",
+ "treatment False\n",
+ "weeks_to_failure 3\n",
+ "years_infected 29\n",
+ "Name: 0, dtype: object"
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.loc[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "39614ebd-ee4b-46ab-9619-40b14ac66418",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "10 55 21 94 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "4 5 20 \n",
+ "5 9 22 \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 \n",
+ "10 3 34 "
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We can use a : to indicate a range; with `.loc`, both endpoints are included.\n",
+ "trial_df.loc[0:10]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "87473eee-abd9-4ca7-9f85-420103cf22c0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "5 42 20 57 True \n",
+ "7 56 50 22 False \n",
+ "13 48 41 99 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "5 9 22 \n",
+ "7 4 6 \n",
+ "13 3 7 "
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We can provide an arbitrary list\n",
+ "trial_df.loc[[0, 5, 7, 13]]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "24b190cc-d554-46ea-b7c3-ada85f89ede5",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " initial_viral_load age\n",
+ "0 66 55\n",
+ "5 57 42\n",
+ "7 22 56\n",
+ "13 99 48"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# We can also select columns at the same time.\n",
+ "trial_df.loc[[0, 5, 7, 13], ['initial_viral_load', 'age']]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7110e944-753f-4e99-a6d4-c1a62c84ce40",
+ "metadata": {},
+ "source": [
+ "### Boolean Indexing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2327261a-e23e-4f80-ab9c-61d9d593769a",
+ "metadata": {},
+ "source": [
+ "If we do not know the row number ahead of time, but instead want to select rows based on their values, we can use boolean indexing.\n",
+ "In this strategy we create a new `pd.Series` of True/False values where True corresponds to the rows we want."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04de3217-aaea-445d-91c1-f55329913752",
+ "metadata": {},
+ "source": [
+ "Start by finding all people over 50 years old."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "0ddab93f-c9b6-4caa-b47d-37325c748b76",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 "
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "age_mask = trial_df['age'] > 50\n",
+ "aged_samples = trial_df.loc[age_mask]\n",
+ "aged_samples.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31e45076-4226-4f85-9731-21502452138f",
+ "metadata": {},
+ "source": [
+ "```{note}\n",
+ ":class: dropdown\n",
+ "I often use the suffix `_mask` when I create boolean indexes.\n",
+ "It is not required, but using naming conventions makes your code easier for you and others to understand.\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f82b318d-5429-4c16-aff9-1b8fb32d35db",
+ "metadata": {},
+ "source": [
+ "Now, if we also wanted to split by `initial_viral_load`, we might do:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "e3948cff-b118-4fa2-bb49-541567d43404",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "high_vl_mask = trial_df['initial_viral_load'] > 50"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "467ede0c-c706-456e-8cb8-f6b257cdbf86",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "10 55 21 94 False \n",
+ "11 53 42 85 True \n",
+ "14 56 41 59 False \n",
+ "24 51 43 88 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "10 3 34 \n",
+ "11 5 11 \n",
+ "14 6 15 \n",
+ "24 2 8 "
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "aged_high_vl = trial_df.loc[age_mask & high_vl_mask]\n",
+ "aged_high_vl.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "7e70c0ea-31ae-4315-81e7-080229ed1b6e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "15 53 47 38 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 \n",
+ "15 7 6 "
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# ~ can be used to say \"not\"\n",
+ "aged_low_vl = trial_df.loc[age_mask & ~high_vl_mask]\n",
+ "aged_low_vl.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "45027741-6266-4ea5-86ce-fe20ef65baa3",
+ "metadata": {},
+ "source": [
+ "### Q3: Calculate the average weeks to failure for the treated population?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "5e318b06-4063-4d00-b42a-c13c0b9b55d4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "treated_mask = trial_df['treatment'] == True # SOLUTION NO PROMPT\n",
+ "treated_average_weeks = trial_df.loc[treated_mask, 'weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "dd141501-a148-4168-b9bc-3448e7e60028",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "treated_average_weeks = 6.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'treated_average_weeks = {treated_average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0673dd84-d614-4c05-afcd-941163c57608",
+ "metadata": {},
+ "source": [
+ "Using boolean indexing, you can express _any_ algorithmic row selection strategy.\n",
+ "This can even include comparisons between rows, for example if there were multiple rows of the same sample.\n",
+ "We will cover these strategies later in the course."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c212312f-58e6-4354-8bb3-94df1bc669f2",
+ "metadata": {},
+ "source": [
+ "Sometimes, our searches are simple.\n",
+ "Pandas also includes another method for indexing rows called `.query()` for these purposes."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5b87435e-b89c-4cf5-be6b-526faf8469fd",
+ "metadata": {},
+ "source": [
+ "### Querying"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e1abf62a",
+ "metadata": {},
+ "source": [
+ "`.query()` is an interface that facilitates simple queries with a few specific limitations:\n",
+ " - It can only use the information present in the row.\n",
+ " - It can only work on one row at a time.\n",
+ " - Column headers cannot contain spaces, dots, dashes, commas, or emoji."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e6ea84aa-edc6-40e8-aebd-e8eedbfd59e8",
+ "metadata": {},
+ "source": [
+ "Our questions on this dataset easily fit within those constraints."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "19e82c53",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "2 45 36 32 True \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "9 51 30 49 True \n",
+ "11 53 42 85 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "2 6 9 \n",
+ "4 5 20 \n",
+ "5 9 22 \n",
+ "9 7 21 \n",
+ "11 5 11 "
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# All treatment rows\n",
+ "trial_df.query('treatment == True').head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "c2ac06a0",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "0 55 26 66 False \n",
+ "1 48 26 66 False \n",
+ "3 43 31 23 False \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "0 3 29 \n",
+ "1 4 22 \n",
+ "3 5 12 \n",
+ "6 4 24 \n",
+ "7 4 6 "
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.query('treatment == False').head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d7c3caf",
+ "metadata": {},
+ "source": [
+ "You can also make them more complex."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "id": "7a4fa71b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "2 45 36 32 True \n",
+ "4 40 20 45 True \n",
+ "5 42 20 57 True \n",
+ "9 51 30 49 True \n",
+ "11 53 42 85 True \n",
+ "12 40 34 27 True \n",
+ "15 53 47 38 True \n",
+ "16 57 41 42 True \n",
+ "21 48 37 99 True \n",
+ "23 43 34 48 True \n",
+ "27 45 25 87 True \n",
+ "29 51 43 38 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "2 6 9 \n",
+ "4 5 20 \n",
+ "5 9 22 \n",
+ "9 7 21 \n",
+ "11 5 11 \n",
+ "12 8 6 \n",
+ "15 7 6 \n",
+ "16 8 16 \n",
+ "21 8 11 \n",
+ "23 7 9 \n",
+ "27 5 20 \n",
+ "29 8 8 "
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.query('age > 33 & treatment == True')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8b2af46a",
+ "metadata": {},
+ "source": [
+ "This comparison makes no biological sense, but it shows that a query can compare two columns directly."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "id": "af1fd110",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " age age_initial_infection initial_viral_load treatment \\\n",
+ "2 45 36 32 True \n",
+ "3 43 31 23 False \n",
+ "6 55 31 23 False \n",
+ "7 56 50 22 False \n",
+ "8 59 33 33 False \n",
+ "9 51 30 49 True \n",
+ "12 40 34 27 True \n",
+ "15 53 47 38 True \n",
+ "16 57 41 42 True \n",
+ "18 51 42 25 False \n",
+ "19 55 46 45 False \n",
+ "22 51 27 36 False \n",
+ "28 59 40 49 False \n",
+ "29 51 43 38 True \n",
+ "\n",
+ " weeks_to_failure years_infected \n",
+ "2 6 9 \n",
+ "3 5 12 \n",
+ "6 4 24 \n",
+ "7 4 6 \n",
+ "8 5 26 \n",
+ "9 7 21 \n",
+ "12 8 6 \n",
+ "15 7 6 \n",
+ "16 8 16 \n",
+ "18 2 9 \n",
+ "19 1 9 \n",
+ "22 2 24 \n",
+ "28 5 19 \n",
+ "29 8 8 "
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "trial_df.query('age >= initial_viral_load')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e68592de",
+ "metadata": {},
+ "source": [
+ "### Q4: Calculate the average `weeks_to_failure` for the untreated population?\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "id": "cfcdf2f7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# BEGIN SOLUTION NO PROMPT\n",
+ "\n",
+ "wanted_samples = trial_df.query('treatment == False')\n",
+ "\n",
+ "# END SOLUTION\n",
+ "\n",
+ "untreated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "5ea72ed6-01ee-48d4-800b-5d145f3517ad",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Untreated participants took 3.6 weeks to rebound.\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'Untreated participants took {untreated_average_weeks:0.1f} weeks to rebound.')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "7f804ed1-a9c2-41a1-bc31-58035c91e967",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "untreated_average_weeks is a `float`: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('untreated_average_weeks is a `float`:', isinstance(untreated_average_weeks, float))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "8f8d7324",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "untreated_average_weeks = 3.6\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'untreated_average_weeks = {untreated_average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e87cce47",
+ "metadata": {},
+ "source": [
+ "### Q4: Calculate the average `weeks_to_failure` for the treated population?\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "2742d0bb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# BEGIN SOLUTION NO PROMPT\n",
+ "\n",
+ "wanted_samples = trial_df.query('treatment == True')\n",
+ "\n",
+ "# END SOLUTION\n",
+ "\n",
+ "treated_average_weeks = wanted_samples['weeks_to_failure'].mean() # SOLUTION"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "c6b2bfa0-8673-4666-adcd-31a1a574fd79",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Treated participants took 6.9 weeks to rebound.\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'Treated participants took {treated_average_weeks:0.1f} weeks to rebound.')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "79786fec-0e7c-461e-9a65-0fac65d6870a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "treated_average_weeks is a `float`: True\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('treated_average_weeks is a `float`:', isinstance(treated_average_weeks, float))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "id": "ea73783c",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "treated_average_weeks = 6.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'treated_average_weeks = {treated_average_weeks:0.1f}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5af4b494",
+ "metadata": {},
+ "source": [
+ "# Conclusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2d1c3b8",
+ "metadata": {},
+ "source": [
+ "We can see that this treatment extended the average time off ART from about 3.6 weeks to about 6.9 weeks.\n",
+ "While this is not a cure, any incremental step is useful progress toward the elimination of HIV.\n",
+ "\n",
+ "In the lab you will use similar techniques to explore whether other factors in this dataset affect the results.\n",
+ "In future weeks we will explore statistical techniques to assess whether this difference reflects a real treatment effect or is due to chance."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "493f93a7",
+ "metadata": {},
+ "source": [
+ "---------------------------------------------"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/makefile b/makefile
new file mode 100644
index 0000000..4f50e3c
--- /dev/null
+++ b/makefile
@@ -0,0 +1,7 @@
+
+update_book:
+ cp -r ../applied_biostats/_book/book/_build/html/* .
+ cp -r ../applied_biostats/_book/book/_build/jupyter_execute .
+
+#deploy_book:
+
diff --git a/objects.inv b/objects.inv
index 557396d..e46bdb3 100644
Binary files a/objects.inv and b/objects.inv differ
diff --git a/search.html b/search.html
index b979e3f..f5b4d53 100644
--- a/search.html
+++ b/search.html
@@ -195,6 +195,11 @@
Walkthrough
Nanopore Sequencing
Dilution calculations
+
+
+Module 3: DataFrames
diff --git a/searchindex.js b/searchindex.js
index c5e4a54..3c045aa 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"alltitles": {"About this book": [[8, "about-this-book"]], "Calculate a aerobic target heart rate?": [[1, "calculate-a-aerobic-target-heart-rate"]], "Cells": [[2, "cells"]], "Coding expectations": [[1, "coding-expectations"]], "Conclusion": [[4, "conclusion"]], "Dilution calculations": [[5, "dilution-calculations"]], "Don\u2019t be afraid to Restart & Run all": [[2, null]], "Functions": [[4, "functions"]], "Introduction": [[1, "introduction"], [9, "introduction"]], "Jupyter Notebooks": [[2, "jupyter-notebooks"]], "Learning Objectives": [[4, "learning-objectives"]], "Linting through color": [[4, "linting-through-color"]], "Markdown": [[1, "markdown"]], "Module 1: Hello World": [[0, "module-1-hello-world"]], "Module 2: Simple calculations": [[3, "module-2-simple-calculations"]], "Nanopore Sequencing": [[6, "nanopore-sequencing"]], "Notebook basics": [[2, "notebook-basics"]], "Otter Grader": [[1, "otter-grader"]], "Programmatic Arithmetic in Python": [[4, "programmatic-arithmetic-in-python"]], "Q1: Calculate the molarity of the sample": [[4, "q1-calculate-the-molarity-of-the-sample"]], "Q1: Using the information above, calculate the subject\u2019s heart rate reserve.": [[1, "q1-using-the-information-above-calculate-the-subject-s-heart-rate-reserve"]], "Q2: Calculate the amount of sample to add.": [[4, "q2-calculate-the-amount-of-sample-to-add"]], "Q3: Describing the reaction yield": [[4, "q3-describing-the-reaction-yield"]], "Q3: Using the information above, calculate the upper limit of the subject\u2019s target heart rate zone.": [[1, "q3-using-the-information-above-calculate-the-upper-limit-of-the-subject-s-target-heart-rate-zone"]], "Q4: Write a function which calculates the reaction yield": [[4, "q4-write-a-function-which-calculates-the-reaction-yield"]], "Quantitative Reasoning in Biology": [[7, "quantitative-reasoning-in-biology"]], "Quick introduction on cells and blocks": [[1, "quick-introduction-on-cells-and-blocks"]], "Session": [[2, 
"session"]], "Submissions": [[1, "submissions"]], "The Problem": [[4, "the-problem"]], "Try me": [[1, "try-me"]], "Walkthrough": [[1, "walkthrough"], [4, "walkthrough"], [4, "id1"]], "What is the template weight?": [[4, "what-is-the-template-weight"]], "Why Google Colab": [[1, "why-google-colab"]], "Why Python": [[1, "why-python"]], "f-strings": [[4, "f-strings"]]}, "docnames": ["content/Module01/Module01_book", "content/Module01/Module01_walkthrough_book", "content/Module01/notebook_actions", "content/Module02/Module02_book", "content/Module02/Module02_walkthrough_book", "content/Module02/dilution_calculations", "content/Module02/nanopore_description", "content/book_index", "content/misc/about_this_book", "content/misc/book_intro"], "envversion": {"sphinx": 61, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1}, "filenames": ["content/Module01/Module01_book.md", "content/Module01/Module01_walkthrough_book.ipynb", "content/Module01/notebook_actions.md", "content/Module02/Module02_book.md", "content/Module02/Module02_walkthrough_book.ipynb", "content/Module02/dilution_calculations.md", "content/Module02/nanopore_description.md", "content/book_index.md", "content/misc/about_this_book.md", "content/misc/book_intro.md"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [2, 4, 6], "0": [1, 4, 7], "02197802197804": 4, "022": 4, "03": 1, "0f": 4, "1": [1, 4], "10": 4, "100": [1, 4], "1000": 4, "12": [1, 2, 4], "1231231": 4, "13": 1, "15": 4, "150": 4, "1680": 4, "182000": 4, "19": 4, "1e": 4, "1f": 4, "1st": 7, "2": [1, 4], "20": 4, "200": 4, "2007": 1, "2016": 4, "21": 2, "220": 1, "23": [1, 4], "25": 4, "263": 4, "26988": 4, "278": 4, "280": 4, "2f": 4, "3": [1, 4], "320": 4, "34": 1, "35": 4, 
"4": [1, 7], "5": [1, 4], "50": 4, "517": 8, "6": 4, "60": 1, "650": 4, "6950": 4, "6951": 4, "7": 1, "70": 1, "72": 4, "77": 4, "8": 4, "822714681440445": 1, "827": 4, "85": 1, "86": 1, "9": [1, 4], "A": [1, 2, 4], "And": [1, 5], "As": [2, 4], "At": [1, 4], "BY": 7, "But": 4, "By": 1, "For": [1, 2], "If": [1, 4, 5, 7], "In": [1, 2, 4], "It": [1, 2, 4, 8], "NO": 4, "NOT": 1, "Not": 4, "On": 2, "Or": [1, 4], "That": 1, "The": [1, 7], "Then": [1, 4], "There": [1, 2, 4], "These": [1, 2, 4], "Will": 7, "abl": [1, 4], "about": [1, 2, 4], "abov": 4, "abreast": 1, "absorb": 4, "abstract": 1, "access": 2, "accomplish": 1, "accordingli": 2, "across": 2, "act": 8, "action": 2, "activ": [1, 4], "ad": 4, "adapt": 4, "add": 1, "addit": 4, "administr": 1, "adult": 1, "advantag": 1, "after": [1, 4], "ag": [1, 4], "again": 4, "aim": 1, "algorithm": 1, "all": [1, 4], "allow": [1, 2, 4, 8], "alreadi": 1, "also": [1, 4, 8], "alter": 4, "alwai": 2, "amplicon_length": 4, "amplicon_weight": 4, "an": [1, 2, 4, 8], "anaconda": 1, "analysi": [1, 2, 4], "analyz": 1, "ani": [1, 2], "annot": 8, "anoth": [1, 4], "answer": [1, 4], "anyth": [1, 4], "anywher": 4, "appli": [7, 8], "ar": [1, 2, 4, 7], "arduou": 1, "arg1": 4, "arg2": 4, "around": 4, "assert": 1, "assign": [1, 4], "assum": 4, "attach": 4, "attribut": 7, "autom": 4, "avail": 1, "averag": 1, "awai": 1, "await": 2, "back": 1, "background": [2, 6, 8], "bake": 4, "barcod": 4, "base": [1, 2, 4], "base_weight": 4, "basepair": 4, "basic": [0, 1, 3, 4], "batteri": 1, "bblearn": 1, "beat": 1, "becaus": 1, "becom": [1, 2], "been": [1, 4, 5], "befor": [1, 2, 4], "begin": 4, "being": [1, 2], "below": 1, "berklei": 1, "better": 1, "between": [1, 4], "biolog": [1, 8], "biologi": [1, 4], "biostatist": [7, 8], "black": 4, "block": 4, "bmi": 1, "bold": 1, "book": [4, 7, 9], "both": [1, 2], "bp": [1, 4], "brace": 4, "break": 4, "brief": [1, 4], "briefli": 6, "browser": [1, 2], "bullet": 1, "button": 2, "bypass": 1, "calc_molar": 4, "calc_yield": 4, 
"call": [1, 2, 4], "can": [1, 2, 4, 5], "cannot": [1, 2], "captur": 1, "carri": 1, "case": 2, "cc": 7, "cell": 4, "chang": [1, 4], "chapter": [0, 1, 3], "check": 1, "check_al": 1, "chemic": 4, "circa": 4, "class": [1, 2], "click": 1, "clinic": 1, "cloud": 2, "code": [2, 4], "colab": [0, 2], "collect": 1, "colleg": 7, "com": [1, 6], "come": [1, 2, 4], "command": 2, "common": [1, 4, 7], "commun": 4, "compact": 4, "compani": 2, "companion": [7, 8], "compat": 2, "complet": [1, 2], "complex": [1, 4], "compris": 1, "comput": [1, 2, 4], "concentr": 4, "concept": [1, 2], "condit": 1, "connect": 2, "consid": 1, "consumpt": 7, "contact": 7, "contain": [2, 4], "content": [2, 5, 7, 8], "context": [4, 8], "continu": 4, "convers": 5, "convert": 4, "copi": 4, "corner": 1, "correct": 4, "correctli": 1, "could": 1, "count": [1, 4], "cours": [1, 2, 4, 7, 8], "cover": [1, 2, 4], "covid": 4, "creat": [1, 2, 4], "creativ": 7, "critic": [1, 4], "ctrl": 2, "curli": 4, "current": 4, "current_yield": 4, "dai": 1, "dampier": 7, "data": [1, 2, 4], "dataset": [1, 2, 8], "debug": 4, "decis": 1, "def": 4, "defin": 4, "delet": 2, "depth": 4, "describ": [1, 6], "descript": 4, "design": 4, "desir": 1, "detail": 4, "determin": 1, "develop": [1, 7], "devic": 4, "didn": 1, "differ": 4, "difficult": [1, 2], "difficulti": 1, "dilut": 4, "disconnect": 2, "discuss": [3, 4], "displai": 4, "dna": [4, 5], "dna_conc": 4, "dna_molar": 4, "dna_weight": 4, "dna_yield": 4, "dna_yield_descript": 4, "do": [1, 3, 4], "dollar": 4, "done": [2, 4, 5], "doubl": [1, 4], "down": [1, 4], "download": [1, 2], "dozen": 1, "drastic": 4, "drexel": [4, 7, 8], "dropdown": 1, "due": [1, 4], "dure": [1, 4], "dynam": 4, "each": [1, 2, 4], "easi": 1, "easier": 4, "edit": [1, 2, 7], "educ": 1, "effect": 2, "either": 2, "emerg": 1, "empti": 2, "encod": 2, "end": [1, 4], "endswith": 1, "ensur": [1, 2, 4], "enter": 2, "entir": 4, "environ": 1, "enzymat": 4, "equat": 4, "error": 4, "estim": 1, "even": [1, 4], "everyon": 1, "everyth": [2, 
4], "evolv": 8, "exampl": 1, "excel": 1, "except": 1, "execut": [1, 2], "exercis": 1, "exist": 1, "expand": 8, "experi": [1, 4], "explain": 4, "explan": [4, 5], "explanatori": 4, "explor": 4, "explos": 1, "express": [1, 4], "extens": 2, "extra": 1, "f": 1, "face": 1, "fact": 1, "familiar": 1, "featur": 1, "fempto": 4, "femtomol": 4, "field": 1, "figur": 1, "file": [1, 2], "filterwarn": 1, "find": 1, "finish": 1, "first": [1, 2], "fix": [1, 2], "flavor": 1, "float": 4, "fmol": 4, "fmole": 4, "focus": 5, "follow": [1, 7], "footnot": 1, "form": 1, "format": [1, 2, 4], "found": 1, "frame": 1, "free": [1, 2, 7], "freeli": 1, "fresh": 2, "freshli": 2, "from": [1, 2, 4], "full": 1, "function": [1, 8], "function_nam": 4, "further": 4, "futur": 1, "g": 4, "gener": [1, 4], "get": [1, 2, 4], "give": [1, 2], "go": 1, "googl": [0, 2], "goolg": 1, "grace": 1, "grade": 1, "green": 4, "guru": 4, "ha": [1, 4, 5], "had": 4, "hand": 1, "have": [1, 2, 4, 6, 7], "healthi": 1, "heart_rate_reserv": 1, "height": 1, "hello": 1, "help": [1, 4], "her": 1, "here": 1, "hint": 4, "hipaa": 2, "hit": 1, "hold": 1, "hour": 2, "how": [1, 3, 4], "howev": [1, 2, 4], "hrr": 1, "html": 1, "http": [1, 6], "hundr": 1, "hurdl": 1, "hyperlink": 1, "hypothesi": 8, "i": [1, 2, 5, 7], "ideal": 1, "ignor": 1, "imag": 2, "imbalanc": 4, "immedi": 4, "immunologi": 8, "import": [1, 2], "importerror": 1, "includ": 1, "incorrect": 2, "incred": 1, "independ": 2, "indepth": 4, "individu": 4, "inferenti": 1, "inferentialthink": 1, "inform": 2, "ingredi": 4, "initi": [1, 4], "input": 4, "insid": 4, "instal": [1, 2], "instead": [1, 4], "instruct": 1, "insurmount": 1, "integ": 4, "intens": 1, "interact": [1, 2, 8], "interfac": 1, "intern": 7, "interpret": 2, "introduc": 0, "ipynb": 1, "isinst": 4, "isn": 1, "issu": [1, 2], "italic": 1, "item": 4, "itself": 2, "julia": 2, "jump": 4, "jupyt": 1, "jupyterlab": 1, "just": [1, 2, 4], "kei": 4, "kernel": 1, "kg": 1, "know": [2, 4], "kwarg1": 4, "kwarg2": 4, "lab": 4, "languag": 
[1, 2], "larg": [1, 2], "larger": 1, "last": 1, "lastli": [1, 8], "later": 1, "launch": 1, "lead": 4, "learn": 1, "left": 1, "len": 1, "less": 2, "let": 1, "licens": 7, "ligat": 4, "light": 4, "like": [1, 2, 4, 7], "limit": [2, 4], "line": 1, "link": [1, 2, 5], "list": [1, 4], "listdir": 1, "ll": [1, 2, 4], "load": [1, 2], "log": 2, "look": [1, 4, 5], "loop": 1, "lot": 4, "m": 2, "make": 4, "manag": 4, "mani": [1, 2, 4], "manual": 4, "markdown": 2, "mass": 4, "materi": 4, "math": [1, 3, 4], "maximum": 1, "mayb": 2, "mayo": 1, "me": 7, "measur": 4, "medicin": 7, "menu": [1, 2], "meter": 1, "method": 1, "microbiologi": 8, "miim": 8, "million": 4, "minion": 4, "minut": 1, "mircolit": 4, "mistak": [1, 2], "modif": 1, "modul": 4, "modular": 4, "mole": 4, "molecul": 4, "molecular": 4, "more": [1, 2, 4, 5], "morn": 1, "most": [1, 2], "motor": 4, "move": 1, "much": 4, "multipl": [1, 4], "multipli": 1, "must": 1, "my": 4, "name": 1, "nano": 4, "nanopor": 4, "natur": 1, "nc": 7, "nd": 7, "nearest": 4, "neb": 5, "need": [0, 1, 2, 4, 5], "never": 2, "new": [2, 4], "new_concentr": 4, "new_paragon_molar": 4, "newest": 1, "next": [1, 4], "ng": 4, "nice": 1, "noderiv": 7, "noncommerci": 7, "normal": [1, 4], "notebook": [1, 4], "notepad": [1, 2], "notic": [1, 4], "now": [1, 4], "nucleotid": 4, "number": [1, 4], "numer": 4, "o": 1, "obtain": 4, "ocassion": 2, "off": [1, 4], "often": [2, 4], "oftentim": 2, "okai": 2, "old": [1, 4], "onc": [1, 2, 4], "one": [1, 2, 4], "onli": [1, 2], "onlin": [1, 4], "open": [1, 2], "oper": 4, "option": 2, "orang": 4, "order": [1, 2], "organ": 4, "origin": 2, "other": [1, 2, 4], "our": [1, 2, 4], "out": 1, "outbreak": 4, "output": 1, "over": 4, "overhang": 4, "overwrit": 2, "own": [1, 2], "packag": 1, "page": 6, "paragon": 4, "paragon_molar": 4, "part": 4, "particular": 1, "past": [2, 4, 8], "path": 1, "pcr": 4, "peopl": 1, "per": [1, 4], "perfect": 4, "perfectli": 1, "perform": 4, "phrase": 1, "pip": 1, "place": 8, "plai": 2, "plain": [1, 2], "plan": 
2, "plate": 4, "plu": 1, "point": 4, "pool": 4, "pose": 1, "possibl": [2, 4], "post": 4, "potenti": 2, "power": [1, 2], "precis": 1, "preload": 1, "prep": 4, "prepar": 4, "prescrib": 4, "previou": 1, "print": [1, 4], "prism": 1, "problem": [1, 2, 8], "process": [1, 4, 6], "program": [1, 2], "progress": 1, "project": 4, "prompt": 4, "prone": 4, "proper": 8, "protect": 2, "protein": 4, "protocol": 4, "provid": 1, "purpos": [1, 2, 4], "put": 1, "python": [2, 3], "q": 1, "qubit": 4, "question": 1, "quick": 5, "r": 2, "rang": 1, "rapid": 4, "rcp85jhlmni": 6, "re": [1, 2, 4], "read": 4, "readi": 2, "reagent": 4, "realli": 4, "reason": 4, "recent": [1, 2, 4], "recommend": 4, "refer": [4, 8], "refresh": 5, "relat": 4, "relev": 2, "rememb": [1, 2, 4], "remov": 4, "render": [2, 4], "repeat": 4, "repetit": 4, "replac": 4, "repres": 4, "reproduc": 4, "requir": [1, 4], "research": 1, "respond": 2, "rest": 1, "restart": 1, "resting_heart_r": 1, "result": [1, 4], "return": [1, 4], "reusabl": 4, "review": [4, 5], "right": [1, 4], "rigor": 1, "rna": 4, "rna_paragon_molar": 4, "round": 4, "run": 1, "runtim": 2, "said": 1, "same": [1, 4], "sample_concentr": 4, "sample_length": 4, "sample_volum": 4, "sample_yield": 4, "save": 1, "scienc": 1, "screen": 1, "searchabl": 8, "second": [1, 2], "secreti": 2, "section": 2, "secur": 2, "see": [1, 4], "self": 4, "send": 2, "senior": 4, "sensit": 2, "sent": 2, "sentenc": 4, "sequenc": 4, "seri": [1, 2, 4], "servic": 2, "session": [1, 4], "set": 1, "setup": 1, "share": 2, "shift": [1, 2], "short": 4, "shortcut": 2, "should": [1, 2, 4], "similar": 2, "simpl": 1, "sinc": [1, 5], "singl": 4, "size": 4, "skeleton": 1, "skill": 1, "small": 1, "smaller": 4, "so": [2, 4], "softwar": [1, 2], "solut": [1, 4], "solv": 1, "some": [1, 2, 4, 5, 6], "someth": [1, 4], "sometim": 2, "somewher": 1, "space": 4, "spawn": 1, "special": 2, "speedup": 4, "spin": 1, "spreadsheet": 1, "stai": 1, "start": [1, 2, 4], "statement": 4, "statist": 1, "step": 1, "still": 1, 
"stock": 4, "str": 4, "strand": 4, "strategi": 4, "structur": 1, "studi": 1, "stumbl": 1, "sublist": 1, "submiss": 2, "submit": 1, "subtract": 1, "success": 0, "suggest": 1, "summar": [1, 4], "synchron": [1, 4], "syntax": [1, 4], "system": [1, 2], "t": [1, 4], "tabl": [1, 2, 4], "tag": 2, "take": [1, 2, 4], "talk": [1, 2], "task": [1, 4], "taught": 1, "teach": 1, "techniqu": 1, "technologi": 1, "tediou": 4, "tell": [1, 4], "template_weight": 4, "tend": 5, "test": [1, 2, 4], "tests_dir": 1, "text": [1, 2, 4], "textbook": [1, 4, 7], "than": 2, "thei": 4, "them": [1, 2], "themselv": 2, "therebi": 1, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 9], "thing": [1, 2, 4], "think": [1, 4], "those": [1, 4], "through": 1, "throughout": 1, "time": [1, 4], "too": [1, 2], "tool": [0, 1], "top": 1, "topic": 1, "total": 4, "track": 4, "transpar": 4, "troubl": 1, "true": [1, 4], "try": 4, "tube": 4, "twice": 1, "two": [1, 2], "type": [1, 2, 4], "u": [1, 4], "uc": 1, "ul": 4, "under": 7, "underneath": 1, "understand": [2, 4], "undo": 2, "uniqu": 4, "unit": [4, 5], "univers": 7, "unless": 4, "unwieldi": 1, "unzip": 1, "up": [1, 4], "upload": [1, 2], "upon": 8, "upper_target_zon": 1, "us": [2, 3, 4, 5, 7], "usb": 4, "usual": 1, "v": 6, "val": 4, "valid": 1, "valu": [1, 4], "variabl": [1, 4], "ve": [1, 4, 5], "veri": 1, "version": 2, "video": [4, 6], "vigor": 1, "virtual": 2, "visual": 1, "volum": 4, "volume_to_add": 4, "wa": 1, "wai": [1, 4], "want": 2, "wanted_dna": 4, "warn": 1, "watch": [4, 6], "we": [1, 2, 4], "week": [1, 4], "weekli": [1, 4], "weigh": 4, "weight": 1, "were": 4, "what": 1, "when": [1, 2, 4, 5], "which": [1, 2], "while": [2, 5], "within": [2, 8], "without": [2, 4], "woman": 1, "word": 1, "wordpad": 2, "work": [1, 2, 4], "world": 1, "would": [1, 4, 7], "write": 1, "written": 2, "www": [1, 6], "x": [1, 2], "y": 1, "year": [1, 4], "you": [0, 1, 2, 4, 5, 7, 8], "young": 1, "your": [1, 2, 4, 7], "yourself": [1, 2], "youtub": 6, "z": 1, "zip": 1, "zip_fil": 1}, "titles": ["Module 1: 
Hello World", "Walkthrough", "Notebook basics", "Module 2: Simple calculations", "Walkthrough", "Dilution calculations", "Nanopore Sequencing", "Quantitative Reasoning in Biology", "About this book", "Introduction"], "titleterms": {"": 1, "1": 0, "2": 3, "The": 4, "about": 8, "abov": 1, "add": 4, "aerob": 1, "afraid": 2, "all": 2, "amount": 4, "arithmet": 4, "basic": 2, "biologi": 7, "block": 1, "book": 8, "calcul": [1, 3, 4, 5], "cell": [1, 2], "code": 1, "colab": 1, "color": 4, "conclus": 4, "describ": 4, "dilut": 5, "don": 2, "expect": 1, "f": 4, "function": 4, "googl": 1, "grader": 1, "heart": 1, "hello": 0, "i": 4, "inform": 1, "introduct": [1, 9], "jupyt": 2, "learn": 4, "limit": 1, "lint": 4, "markdown": 1, "me": 1, "modul": [0, 3], "molar": 4, "nanopor": 6, "notebook": 2, "object": 4, "otter": 1, "problem": 4, "programmat": 4, "python": [1, 4], "q1": [1, 4], "q2": 4, "q3": [1, 4], "q4": 4, "quantit": 7, "quick": 1, "rate": 1, "reaction": 4, "reason": 7, "reserv": 1, "restart": 2, "run": 2, "sampl": 4, "sequenc": 6, "session": 2, "simpl": 3, "string": 4, "subject": 1, "submiss": 1, "t": 2, "target": 1, "templat": 4, "thi": 8, "through": 4, "try": 1, "upper": 1, "us": 1, "walkthrough": [1, 4], "weight": 4, "what": 4, "which": 4, "why": 1, "world": 0, "write": 4, "yield": 4, "zone": 1}})
\ No newline at end of file
+Search.setIndex({"alltitles": {"About this book": [[10, "about-this-book"]], "Acting on Columns": [[8, "acting-on-columns"]], "Acting on Rows": [[8, "acting-on-rows"]], "Boolean Indexing": [[8, "boolean-indexing"]], "Calculate a aerobic target heart rate?": [[1, "calculate-a-aerobic-target-heart-rate"]], "Cells": [[2, "cells"]], "Coding expectations": [[1, "coding-expectations"]], "Conclusion": [[4, "conclusion"], [8, "conclusion"]], "Dataset Reference": [[8, "dataset-reference"]], "Dilution calculations": [[5, "dilution-calculations"]], "Don\u2019t be afraid to Restart & Run all": [[2, null]], "Functions": [[4, "functions"]], "Imports": [[8, "imports"]], "Indexing": [[8, "indexing"]], "Introduction": [[1, "introduction"], [8, "introduction"], [11, "introduction"]], "Jupyter Notebooks": [[2, "jupyter-notebooks"]], "Learning Objectives": [[4, "learning-objectives"], [8, "learning-objectives"]], "Linting through color": [[4, "linting-through-color"]], "Markdown": [[1, "markdown"]], "Module 03 Walkthrough": [[8, "module-03-walkthrough"]], "Module 1: Hello World": [[0, "module-1-hello-world"]], "Module 2: Simple calculations": [[3, "module-2-simple-calculations"]], "Module 3: DataFrames": [[7, "module-3-dataframes"]], "Nanopore Sequencing": [[6, "nanopore-sequencing"]], "Notebook basics": [[2, "notebook-basics"]], "Numpy": [[8, "numpy"]], "Otter Grader": [[1, "otter-grader"]], "Pandas": [[8, "pandas"]], "Programmatic Arithmetic in Python": [[4, "programmatic-arithmetic-in-python"]], "Q1: Calculate the molarity of the sample": [[4, "q1-calculate-the-molarity-of-the-sample"]], "Q1: Extract the initial_viral_load column ?": [[8, "q1-extract-the-initial-viral-load-column"]], "Q1: Using the information above, calculate the subject\u2019s heart rate reserve.": [[1, "q1-using-the-information-above-calculate-the-subject-s-heart-rate-reserve"]], "Q2: Calculate the amount of sample to add.": [[4, "q2-calculate-the-amount-of-sample-to-add"]], "Q2: Calculate the average 
weeks_to_failure for the whole population?": [[8, "q2-calculate-the-average-weeks-to-failure-for-the-whole-population"]], "Q3: Calculate the average weeks to failure for the treated population?": [[8, "q3-calculate-the-average-weeks-to-failure-for-the-treated-population"]], "Q3: Describing the reaction yield": [[4, "q3-describing-the-reaction-yield"]], "Q3: Using the information above, calculate the upper limit of the subject\u2019s target heart rate zone.": [[1, "q3-using-the-information-above-calculate-the-upper-limit-of-the-subject-s-target-heart-rate-zone"]], "Q4: Calculate the average weeks_to_failure for the treated population?": [[8, "q4-calculate-the-average-weeks-to-failure-for-the-treated-population"]], "Q4: Calculate the average weeks_to_failure for the untreated population?": [[8, "q4-calculate-the-average-weeks-to-failure-for-the-untreated-population"]], "Q4: Write a function which calculates the reaction yield": [[4, "q4-write-a-function-which-calculates-the-reaction-yield"]], "Quantitative Reasoning in Biology": [[9, "quantitative-reasoning-in-biology"]], "Querying": [[8, "querying"]], "Quick introduction on cells and blocks": [[1, "quick-introduction-on-cells-and-blocks"]], "Session": [[2, "session"]], "Submissions": [[1, "submissions"]], "The Problem": [[4, "the-problem"]], "Try me": [[1, "try-me"]], "Walkthrough": [[1, "walkthrough"], [4, "walkthrough"], [4, "id1"]], "What is the template weight?": [[4, "what-is-the-template-weight"]], "Why Google Colab": [[1, "why-google-colab"]], "Why Python": [[1, "why-python"]], "f-strings": [[4, "f-strings"]]}, "docnames": ["content/Module01/Module01_book", "content/Module01/Module01_walkthrough_book", "content/Module01/notebook_actions", "content/Module02/Module02_book", "content/Module02/Module02_walkthrough_book", "content/Module02/dilution_calculations", "content/Module02/nanopore_description", "content/Module03/Module03_book", "content/Module03/Module03_walkthrough_book", "content/book_index", 
"content/misc/about_this_book", "content/misc/book_intro"], "envversion": {"sphinx": 61, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, "sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1}, "filenames": ["content/Module01/Module01_book.md", "content/Module01/Module01_walkthrough_book.ipynb", "content/Module01/notebook_actions.md", "content/Module02/Module02_book.md", "content/Module02/Module02_walkthrough_book.ipynb", "content/Module02/dilution_calculations.md", "content/Module02/nanopore_description.md", "content/Module03/Module03_book.md", "content/Module03/Module03_walkthrough_book.ipynb", "content/book_index.md", "content/misc/about_this_book.md", "content/misc/book_intro.md"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [2, 4, 6, 8], "0": [1, 4, 8, 9], "000000": 8, "02197802197804": 4, "022": 4, "03": 1, "041984": 8, "070204": 8, "0f": 4, "1": [1, 4, 8], "10": [4, 8], "100": [1, 4], "1000": 4, "11": 8, "12": [1, 2, 4, 8], "1231231": 4, "13": [1, 8], "133333": 8, "14": 8, "15": [4, 8], "150": 4, "16": 8, "1628": 8, "1680": 4, "17": 8, "18": 8, "182000": 4, "19": [4, 8], "1e": 4, "1f": [4, 8], "1st": 9, "2": [1, 4, 8], "20": [4, 8], "200": 4, "2007": 1, "2016": 4, "202663": 8, "21": [2, 8], "22": 8, "220": 1, "23": [1, 4, 8], "24": 8, "25": [4, 8], "250000": 8, "26": 8, "263": 4, "266667": 8, "26988": 4, "27": 8, "278": 4, "28": 8, "280": 4, "29": 8, "2f": 4, "3": [1, 4, 8], "30": 8, "31": 8, "32": 8, "320": 4, "33": 8, "34": [1, 8], "35": 4, "36": 8, "366667": 8, "37": 8, "38": 8, "4": [1, 8, 9], "40": 8, "400000": 8, "41": 8, "42": 8, "43": 8, "45": 8, "46": 8, "47": 8, "48": 8, "49": 8, "5": [1, 4, 8], "50": [4, 8], "500000": 8, "51": 8, "517": 10, "53": 8, "54": 8, "55": 8, "56": 8, "569209": 8, "57": 8, "59": 8, "6": [4, 
8], "60": 1, "650": 4, "66": 8, "6950": 4, "6951": 4, "7": [1, 8], "70": 1, "72": [4, 8], "74": 8, "75": 8, "750000": 8, "76": 8, "77": 4, "8": [4, 8], "822714681440445": 1, "827": 4, "85": [1, 8], "86": 1, "87": 8, "88": 8, "9": [1, 4, 8], "900000": 8, "94": 8, "99": 8, "A": [1, 2, 4, 8], "And": [1, 5, 8], "As": [2, 4], "At": [1, 4, 8], "BY": 9, "But": 4, "By": [1, 8], "For": [1, 2, 8], "If": [1, 4, 5, 8, 9], "In": [1, 2, 4, 8], "It": [1, 2, 4, 8, 10], "NO": [4, 8], "NOT": 1, "Not": 4, "On": 2, "Or": [1, 4], "That": 1, "The": [1, 8, 9], "Then": [1, 4, 8], "There": [1, 2, 4, 8], "These": [1, 2, 4], "Will": 9, "_mask": 8, "abl": [1, 4, 8], "about": [1, 2, 4], "abov": 4, "abreast": 1, "absorb": 4, "abstract": 1, "access": 2, "accomplish": 1, "accordingli": 2, "across": 2, "act": 10, "action": 2, "activ": [1, 4, 8], "ad": 4, "adapt": 4, "add": [1, 8], "addit": 4, "administr": 1, "adult": 1, "advantag": 1, "aesthet": 8, "after": [1, 4, 8], "ag": [1, 4, 8], "again": 4, "age_col": 8, "age_initial_infect": 8, "age_mask": 8, "age_mean": 8, "age_mean_short": 8, "aged_high_vl": 8, "aged_low_vl": 8, "aged_sampl": 8, "ahead": 8, "aim": 1, "algorithm": [1, 8], "alia": 8, "all": [1, 4, 8], "allow": [1, 2, 4, 10], "along": 8, "alreadi": 1, "also": [1, 4, 8, 10], "alter": 4, "altern": 8, "alwai": 2, "amplicon_length": 4, "amplicon_weight": 4, "an": [1, 2, 4, 8, 10], "anaconda": 1, "analysi": [1, 2, 4, 8], "analyz": [1, 8], "ani": [1, 2, 8], "annot": 10, "anoth": [1, 4, 8], "answer": [1, 4], "anyth": [1, 4], "anywher": 4, "appli": [9, 10], "ar": [1, 2, 4, 8, 9], "arbitrari": 8, "arduou": 1, "arg1": 4, "arg2": 4, "around": [4, 8], "arrai": 8, "art": 8, "assert": 1, "assign": [1, 4, 8], "assum": 4, "atop": 8, "attach": [4, 8], "attribut": [8, 9], "autom": 4, "avail": 1, "averag": 1, "average_week": 8, "awai": 1, "await": 2, "back": [1, 8], "background": [2, 6, 10], "bake": 4, "barcod": 4, "base": [1, 2, 4, 8], "base_weight": 4, "basepair": 4, "basic": [0, 1, 3, 4, 8], "batteri": 1, 
"bblearn": 1, "beat": 1, "becaus": 1, "becom": [1, 2], "been": [1, 4, 5, 8], "befor": [1, 2, 4, 8], "begin": [4, 8], "being": [1, 2], "below": 1, "berklei": 1, "better": 1, "between": [1, 4, 8], "biolog": [1, 8, 10], "biologi": [1, 4], "biostatist": [9, 10], "black": 4, "block": 4, "bmi": 1, "bold": 1, "book": [4, 9, 11], "both": [1, 2], "bp": [1, 4], "brace": 4, "bracket": 8, "break": 4, "brief": [1, 4], "briefli": 6, "browser": [1, 2], "bullet": 1, "button": 2, "bypass": 1, "calc_molar": 4, "calc_yield": 4, "call": [1, 2, 4, 8], "can": [1, 2, 4, 5, 8], "cannot": [1, 2, 8], "captur": 1, "carri": 1, "case": [2, 8], "cc": 9, "cell": 4, "center": 8, "chain": 8, "chanc": 8, "chang": [1, 4], "chapter": [0, 1, 3, 7], "check": 1, "check_al": 1, "chemic": 4, "circa": 4, "class": [1, 2], "click": 1, "clinic": 1, "cloud": 2, "code": [2, 4, 8], "colab": [0, 2], "collect": 1, "colleg": 9, "com": [1, 6], "come": [1, 2, 4], "comma": 8, "command": [2, 8], "common": [1, 4, 8, 9], "commun": [4, 8], "compact": 4, "compani": 2, "companion": [9, 10], "comparison": 8, "compat": 2, "complet": [1, 2, 8], "complex": [1, 4, 8], "compris": 1, "comput": [1, 2, 4], "concentr": 4, "concept": [1, 2], "condit": 1, "connect": 2, "consid": 1, "constraint": 8, "consumpt": 9, "contact": 9, "contain": [2, 4, 8], "content": [2, 5, 9, 10], "context": [4, 10], "continu": 4, "control": 8, "convent": 8, "convers": 5, "convert": 4, "copi": [4, 8], "core": 8, "corner": 1, "correct": 4, "correctli": 1, "correspond": 8, "could": 1, "count": [1, 4, 8], "cours": [1, 2, 4, 8, 9, 10], "cover": [1, 2, 4, 8], "covid": 4, "creat": [1, 2, 4, 8], "creativ": 9, "critic": [1, 4], "csv": 8, "ctrl": 2, "cure": 8, "curli": 4, "current": [4, 8], "current_yield": 4, "dai": 1, "dampier": 9, "dash": 8, "data": [1, 2, 4, 7, 8], "datafram": 8, "dataset": [1, 2, 10], "debug": 4, "decis": 1, "def": 4, "defin": 4, "delet": 2, "depth": 4, "deriv": 8, "describ": [1, 6, 8], "descript": 4, "design": 4, "desir": 1, "detail": 4, 
"detect": 8, "determin": 1, "develop": [1, 9], "devic": 4, "didn": 1, "differ": [4, 8], "difficult": [1, 2], "difficulti": 1, "dilut": 4, "disconnect": 2, "discuss": [3, 4, 7], "displai": [4, 8], "dive": 8, "dna": [4, 5], "dna_conc": 4, "dna_molar": 4, "dna_weight": 4, "dna_yield": 4, "dna_yield_descript": 4, "do": [1, 3, 4, 8], "doesn": 8, "dollar": 4, "done": [2, 4, 5, 8], "dot": 8, "doubl": [1, 4], "down": [1, 4, 8], "download": [1, 2], "dozen": 1, "drastic": 4, "drexel": [4, 9, 10], "dropdown": 1, "dtype": 8, "due": [1, 4, 8], "dure": [1, 4, 8], "dynam": 4, "each": [1, 2, 4, 8], "easi": 1, "easier": [4, 8], "easili": 8, "edit": [1, 2, 9], "educ": 1, "effect": [2, 8], "either": 2, "elimin": 8, "emerg": 1, "emoji": 8, "emploi": 8, "empti": 2, "encod": 2, "end": [1, 4, 8], "endswith": 1, "ensur": [1, 2, 4], "enter": 2, "entir": [4, 8], "environ": 1, "enzymat": 4, "equat": 4, "error": 4, "estim": 1, "evalu": 8, "even": [1, 4, 8], "everyon": 1, "everyth": [2, 4, 8], "evolv": 10, "exactli": 8, "exampl": [1, 8], "excel": [1, 8], "except": 1, "execut": [1, 2], "exercis": 1, "exist": 1, "expand": 10, "experi": [1, 4], "explain": 4, "explan": [4, 5], "explanatori": 4, "explor": [4, 8], "explos": 1, "express": [1, 4, 8], "extend": 8, "extens": 2, "extra": 1, "f": [1, 8], "face": 1, "facilit": 8, "fact": 1, "factor": 8, "fals": 8, "familiar": 1, "fast": 8, "featur": 1, "fempto": 4, "femtomol": 4, "few": 8, "field": 1, "figur": 1, "file": [1, 2, 8], "filter": 8, "filterwarn": 1, "find": [1, 8], "finish": 1, "first": [1, 2, 8], "fit": 8, "fix": [1, 2], "flavor": 1, "float": [4, 8], "float64": 8, "fmol": 4, "fmole": 4, "focus": 5, "follow": [1, 9], "footnot": 1, "form": 1, "format": [1, 2, 4], "found": [1, 8], "four": 8, "frame": [1, 8], "free": [1, 2, 9], "freeli": 1, "fresh": 2, "freshli": 2, "from": [1, 2, 4, 8], "full": 1, "function": [1, 8, 10], "function_nam": 4, "further": 4, "futur": [1, 8], "g": 4, "gener": [1, 4], "get": [1, 2, 4, 8], "give": [1, 2], "go": 1, 
"googl": [0, 2], "goolg": 1, "got": 8, "grace": 1, "grade": 1, "green": 4, "group": 8, "guru": 4, "ha": [1, 4, 5, 8], "had": 4, "hand": 1, "have": [1, 2, 4, 6, 8, 9], "head": 8, "header": 8, "healthi": 1, "heart_rate_reserv": 1, "height": 1, "hello": 1, "help": [1, 4], "her": 1, "here": 1, "high_vl_mask": 8, "hint": 4, "hipaa": 2, "hit": 1, "hiv": 8, "hold": 1, "hour": 2, "how": [1, 3, 4, 7, 8], "howev": [1, 2, 4, 8], "hrr": 1, "html": 1, "http": [1, 6], "hundr": 1, "hurdl": 1, "hyperlink": 1, "hypothesi": 10, "hypothet": 8, "i": [1, 2, 5, 8, 9], "ideal": 1, "ignor": 1, "imag": 2, "imbalanc": 4, "immedi": 4, "immunologi": 10, "impact": 8, "import": [1, 2], "importerror": 1, "includ": [1, 8], "incorrect": 2, "incred": 1, "incredibli": 8, "increment": 8, "independ": 2, "indepth": 4, "indic": 8, "individu": 4, "infect": 8, "inferenti": 1, "inferentialthink": 1, "inform": [2, 8], "ingredi": 4, "init_vl": 8, "init_vl_sum": 8, "initi": [1, 4, 8], "input": 4, "insid": 4, "instal": [1, 2], "instead": [1, 4, 8], "instruct": 1, "insurmount": 1, "int64": 8, "integ": 4, "intens": 1, "interact": [1, 2, 10], "interfac": [1, 8], "intern": 9, "interoper": 8, "interpret": 2, "introduc": 0, "ipynb": 1, "isinst": [4, 8], "isn": 1, "issu": [1, 2], "italic": 1, "item": 4, "itself": 2, "julia": 2, "jump": 4, "jupyt": 1, "jupyterlab": 1, "just": [1, 2, 4], "kei": 4, "kernel": 1, "kg": 1, "know": [2, 4, 8], "kwarg1": 4, "kwarg2": 4, "lab": [4, 8], "languag": [1, 2], "larg": [1, 2], "larger": 1, "last": [1, 8], "lastli": [1, 10], "later": [1, 8], "launch": 1, "lead": 4, "learn": 1, "left": 1, "len": 1, "length": 8, "less": 2, "let": [1, 8], "level": 8, "libari": 8, "librari": 8, "licens": 9, "ligat": 4, "light": 4, "like": [1, 2, 4, 8, 9], "limit": [2, 4, 8], "line": [1, 8], "link": [1, 2, 5], "list": [1, 4, 8], "listdir": 1, "live": 8, "ll": [1, 2, 4, 8], "load": [1, 2, 7, 8], "loc": 8, "log": 2, "long": 8, "look": [1, 4, 5], "loop": 1, "lot": [4, 8], "m": 2, "main": 8, "make": [4, 8], 
"manag": 4, "mani": [1, 2, 4], "manual": 4, "markdown": 2, "mass": 4, "match": 8, "materi": 4, "math": [1, 3, 4], "mathemat": 8, "max": 8, "maximum": 1, "mayb": 2, "mayo": 1, "me": 9, "mean": 8, "measur": 4, "median": 8, "medicin": 9, "menu": [1, 2], "meter": 1, "method": [1, 8], "microbiologi": 10, "might": 8, "miim": 10, "million": 4, "min": 8, "minion": 4, "minut": 1, "mircolit": 4, "mistak": [1, 2], "mode": 8, "modif": 1, "modul": 4, "modular": 4, "mole": 4, "molecul": 4, "molecular": 4, "monitor": 8, "more": [1, 2, 4, 5, 8], "morn": 1, "most": [1, 2], "motor": 4, "move": 1, "much": 4, "multipl": [1, 4, 8], "multipli": 1, "must": 1, "my": 4, "name": [1, 8], "nano": 4, "nanopor": 4, "natur": 1, "nc": 9, "nd": 9, "nearest": 4, "neb": 5, "need": [0, 1, 2, 4, 5, 8], "never": 2, "new": [2, 4, 8], "new_concentr": 4, "new_paragon_molar": 4, "newest": 1, "next": [1, 4], "ng": 4, "nice": [1, 8], "noderiv": 9, "noncommerci": 9, "normal": [1, 4], "notebook": [1, 4], "notepad": [1, 2], "notic": [1, 4], "now": [1, 4, 8], "np": 8, "nucleotid": 4, "number": [1, 4, 8], "numer": [4, 8], "nuniqu": 8, "o": 1, "obtain": 4, "ocassion": 2, "off": [1, 4, 8], "often": [2, 4, 8], "oftentim": 2, "okai": 2, "old": [1, 4, 8], "onc": [1, 2, 4, 8], "one": [1, 2, 4, 8], "ones": 8, "onli": [1, 2, 8], "onlin": [1, 4], "open": [1, 2], "oper": 4, "option": 2, "orang": 4, "order": [1, 2], "organ": 4, "origin": 2, "other": [1, 2, 4, 8], "otherwis": 8, "our": [1, 2, 4, 8], "out": 1, "outbreak": 4, "output": 1, "over": [4, 8], "overhang": 4, "overwrit": 2, "own": [1, 2], "packag": 1, "page": 6, "panda": 7, "paragon": 4, "paragon_molar": 4, "part": 4, "particip": 8, "particular": 1, "past": [2, 4, 10], "path": 1, "patient": 8, "pcr": 4, "pd": 8, "peopl": [1, 8], "per": [1, 4], "perfect": 4, "perfectli": 1, "perform": [4, 8], "phrase": 1, "pip": 1, "place": 10, "plai": 2, "plain": [1, 2], "plan": 2, "plate": 4, "plethora": 8, "plu": 1, "plwh": 8, "point": 4, "pool": 4, "pose": 1, "possibl": [2, 4], 
"post": 4, "potenti": 2, "power": [1, 2, 8], "practic": 8, "precis": 1, "preload": 1, "prep": 4, "prepar": 4, "prescrib": 4, "present": 8, "previou": 1, "print": [1, 4, 8], "prism": 1, "problem": [1, 2, 10], "process": [1, 4, 6], "profici": 8, "program": [1, 2], "progress": [1, 8], "project": 4, "prompt": [4, 8], "prone": 4, "proper": 10, "protect": 2, "protein": 4, "protocol": 4, "provid": [1, 8], "purpos": [1, 2, 4, 8], "put": 1, "python": [2, 3, 7, 8], "q": 1, "qith": 8, "qubit": 4, "question": [1, 8], "quick": 5, "r": 2, "randomli": 8, "rang": [1, 8], "rapid": 4, "rcp85jhlmni": 6, "re": [1, 2, 4], "read": [4, 8], "read_csv": 8, "readi": [2, 8], "reagent": 4, "real": 8, "realli": 4, "reason": 4, "rebound": 8, "receiv": 8, "recent": [1, 2, 4], "recommend": 4, "refer": [4, 10], "refresh": 5, "regularli": 8, "relat": 4, "relev": 2, "rememb": [1, 2, 4, 8], "remov": 4, "render": [2, 4], "repeat": 4, "repetit": 4, "replac": 4, "repres": 4, "reproduc": 4, "requir": [1, 4, 8], "research": [1, 8], "respond": 2, "rest": 1, "restart": 1, "resting_heart_r": 1, "result": [1, 4, 8], "return": [1, 4], "reusabl": 4, "review": [4, 5], "right": [1, 4], "rigor": 1, "rna": 4, "rna_paragon_molar": 4, "round": 4, "run": 1, "runtim": 2, "sai": 8, "said": 1, "same": [1, 4, 8], "sampl": 8, "sample_concentr": 4, "sample_length": 4, "sample_volum": 4, "sample_yield": 4, "save": 1, "saw": 8, "sciecn": 8, "scienc": [1, 8], "screen": 1, "search": 8, "searchabl": 10, "second": [1, 2], "secreti": 2, "section": 2, "secur": 2, "see": [1, 4, 8], "select": 8, "self": 4, "send": 2, "senior": 4, "sens": 8, "sensit": 2, "sent": 2, "sentenc": 4, "sequenc": 4, "seri": [1, 2, 4, 8], "servic": 2, "session": [1, 4, 8], "set": 1, "setup": 1, "share": 2, "shift": [1, 2], "short": 4, "shortcut": 2, "should": [1, 2, 4, 8], "similar": [2, 8], "simpl": [1, 8], "sinc": [1, 5], "singl": [4, 8], "sit": 8, "size": 4, "skeleton": 1, "skill": 1, "small": [1, 8], "smaller": 4, "so": [2, 4, 8], "softwar": [1, 2], 
"solut": [1, 4, 8], "solv": 1, "some": [1, 2, 4, 5, 6, 8], "someon": 8, "someth": [1, 4], "sometim": [2, 8], "somewher": 1, "space": [4, 8], "spawn": 1, "special": 2, "specif": 8, "speedup": 4, "spin": 1, "split": 8, "spreadsheet": [1, 7, 8], "squar": 8, "stack": 8, "stai": 1, "start": [1, 2, 4, 8], "statement": [4, 8], "statist": [1, 8], "std": 8, "step": [1, 8], "still": 1, "stock": 4, "stop": 8, "str": 4, "stragei": 8, "strand": 4, "strategi": [4, 8], "structur": 1, "studi": [1, 8], "stumbl": 1, "style": [7, 8], "sublist": 1, "submiss": 2, "submit": 1, "subtract": 1, "success": 0, "suffix": 8, "suggest": 1, "sum": 8, "summar": [1, 4, 7, 8], "summari": 8, "synchron": [1, 4, 8], "syntax": [1, 4], "system": [1, 2], "t": [1, 4, 8], "tabl": [1, 2, 4, 8], "tag": 2, "take": [1, 2, 4], "talk": [1, 2], "task": [1, 4], "taught": 1, "teach": 1, "techniqu": [1, 8], "technologi": 1, "tediou": 4, "tell": [1, 4], "template_weight": 4, "tend": [5, 8], "test": [1, 2, 4], "tests_dir": 1, "text": [1, 2, 4], "textbook": [1, 4, 9], "than": 2, "thei": [4, 8], "them": [1, 2, 8], "themselv": 2, "therebi": 1, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11], "thing": [1, 2, 4], "think": [1, 4], "those": [1, 4, 8], "through": 1, "throughout": 1, "time": [1, 4, 8], "todai": 8, "too": [1, 2], "took": 8, "tool": [0, 1, 8], "top": 1, "topic": 1, "total": 4, "track": 4, "transform": 8, "transpar": 4, "treated_average_week": 8, "treated_mask": 8, "treatment": 8, "trial": 8, "trial_data": 8, "trial_df": 8, "troubl": 1, "true": [1, 4, 8], "try": 4, "tube": 4, "twice": 1, "two": [1, 2, 8], "type": [1, 2, 4, 8], "u": [1, 4, 8], "uc": 1, "ul": [4, 8], "uncontrol": 8, "under": 9, "underneath": 1, "understand": [2, 4, 8], "undo": 2, "uniqu": 4, "unit": [4, 5], "univers": 9, "unless": 4, "until": 8, "untreated_average_week": 8, "unwieldi": 1, "unzip": 1, "up": [1, 4], "upload": [1, 2], "upon": 10, "upper_target_zon": 1, "us": [2, 3, 4, 5, 7, 8, 9], "usb": 4, "usual": [1, 8], "util": 8, "v": 6, "val": 4, 
"valid": [1, 8], "valu": [1, 4, 8], "var": 8, "variabl": [1, 4], "ve": [1, 4, 5], "veri": 1, "version": [2, 8], "video": [4, 6], "vigor": 1, "viral": 8, "virtual": 2, "visual": 1, "volum": 4, "volume_to_add": 4, "wa": [1, 8], "wai": [1, 4], "want": [2, 8], "wanted_dna": 4, "wanted_sampl": 8, "warn": 1, "watch": [4, 6], "we": [1, 2, 4, 8], "week": [1, 4], "weekli": [1, 4, 8], "weigh": 4, "weight": 1, "went": 8, "were": [4, 8], "what": 1, "when": [1, 2, 4, 5, 8], "where": 8, "whether": 8, "which": [1, 2, 8], "while": [2, 5, 8], "who": 8, "within": [2, 8, 10], "without": [2, 4], "woman": 1, "word": 1, "wordpad": 2, "work": [1, 2, 4, 8], "world": 1, "would": [1, 4, 9], "write": 1, "written": [2, 8], "www": [1, 6], "x": [1, 2], "y": 1, "year": [1, 4, 8], "years_infect": 8, "you": [0, 1, 2, 4, 5, 8, 9, 10], "young": 1, "your": [1, 2, 4, 8, 9], "yourself": [1, 2, 8], "youtub": 6, "yr": 8, "z": 1, "zip": 1, "zip_fil": 1}, "titles": ["Module 1: Hello World", "Walkthrough", "Notebook basics", "Module 2: Simple calculations", "Walkthrough", "Dilution calculations", "Nanopore Sequencing", "Module 3: DataFrames", "Module 03 Walkthrough", "Quantitative Reasoning in Biology", "About this book", "Introduction"], "titleterms": {"": 1, "03": 8, "1": 0, "2": 3, "3": 7, "The": 4, "about": 10, "abov": 1, "act": 8, "add": 4, "aerob": 1, "afraid": 2, "all": 2, "amount": 4, "arithmet": 4, "averag": 8, "basic": 2, "biologi": 9, "block": 1, "book": 10, "boolean": 8, "calcul": [1, 3, 4, 5, 8], "cell": [1, 2], "code": 1, "colab": 1, "color": 4, "column": 8, "conclus": [4, 8], "datafram": 7, "dataset": 8, "describ": 4, "dilut": 5, "don": 2, "expect": 1, "extract": 8, "f": 4, "failur": 8, "function": 4, "googl": 1, "grader": 1, "heart": 1, "hello": 0, "i": 4, "import": 8, "index": 8, "inform": 1, "initial_viral_load": 8, "introduct": [1, 8, 11], "jupyt": 2, "learn": [4, 8], "limit": 1, "lint": 4, "markdown": 1, "me": 1, "modul": [0, 3, 7, 8], "molar": 4, "nanopor": 6, "notebook": 2, "numpi": 8, 
"object": [4, 8], "otter": 1, "panda": 8, "popul": 8, "problem": 4, "programmat": 4, "python": [1, 4], "q1": [1, 4, 8], "q2": [4, 8], "q3": [1, 4, 8], "q4": [4, 8], "quantit": 9, "queri": 8, "quick": 1, "rate": 1, "reaction": 4, "reason": 9, "refer": 8, "reserv": 1, "restart": 2, "row": 8, "run": 2, "sampl": 4, "sequenc": 6, "session": 2, "simpl": 3, "string": 4, "subject": 1, "submiss": 1, "t": 2, "target": 1, "templat": 4, "thi": 10, "through": 4, "treat": 8, "try": 1, "untreat": 8, "upper": 1, "us": 1, "walkthrough": [1, 4, 8], "week": 8, "weeks_to_failur": 8, "weight": 4, "what": 4, "which": 4, "whole": 8, "why": 1, "world": 0, "write": 4, "yield": 4, "zone": 1}})
\ No newline at end of file