Many interventions in healthcare are still not based on hard evidence, and care may differ between races, especially in the Intensive Care Unit (ICU).
The goal of this project is to investigate racial disparities among critically ill sepsis patients with regard to in-hospital mortality, renal replacement therapy (RRT), vasopressor use (VP), and mechanical ventilation (MV), in cohorts curated from MIMIC-IV (2008-2019).
Run the following command in your terminal.
git clone https://github.com/joamats/mit-tmle.git
Run the following command in R:
source('src/r_scripts/setup/install_packages.R')
Run the following command in your terminal:
pip install -r src/py_scripts/setup/requirements.txt
MIMIC data can be found on PhysioNet, a repository of freely available medical research data managed by the MIT Laboratory for Computational Physiology. Because the data is sensitive, credentialing is required to access it.
Documentation for MIMIC-IV can be found here.
In this section, we explain how to set up GCP and your environment so that you can run SQL queries through GCP directly from your local Python environment. Follow these steps:
- Create a Google account if you don't have one and go to Google Cloud Platform
- Enable the BigQuery API
- Create a Service Account, where you can download your JSON keys
- Place your JSON keys in, for example, the parent folder of your project
- Create a .env file with the command
cp env.example .env
- Update your .env file with your JSON keys path and the id of your project in BigQuery
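
As a rough illustration of the last two steps, the sketch below reads a `.env` file with the Python standard library and checks that the key file it points to exists. The variable names `KEYS_FILE` and `PROJECT_ID` and both helper functions are assumptions made for this example; use whatever names `env.example` actually defines.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_gcp_settings(env, key_var="KEYS_FILE", project_var="PROJECT_ID"):
    """Return a list of problems found in the settings (empty if none).

    key_var and project_var are hypothetical variable names.
    """
    problems = []
    key_path = env.get(key_var, "")
    if not key_path:
        problems.append(f"{key_var} is not set")
    elif not Path(key_path).is_file():
        problems.append(f"{key_var} points to a missing file: {key_path}")
    if not env.get(project_var):
        problems.append(f"{project_var} is not set")
    return problems
```

Calling `check_gcp_settings(read_env_file())` from the project root returns an empty list when both values are present and the key file exists.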
After being credentialed on PhysioNet, you must sign the data use agreement and connect the database with GCP, either by requesting access or by uploading the data to your project.
Once all the tables needed by the cohort generation query are in your project, run the following command to fetch the data as a dataframe, which is saved as a CSV file in your local project. Make sure all required files and folders exist.
python3 src/py_scripts/get_data.py --sql "src/sql_queries/mimic_table.sql" --destination "data/MIMIC_data.csv"
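
For orientation, a script with this command-line interface could look roughly like the sketch below. This is not the repository's `get_data.py`: the function names and the `keys_file` and `project_id` parameters are hypothetical, and only the `--sql` and `--destination` flags are taken from the command above. It assumes the `google-cloud-bigquery`, `pandas` and `db-dtypes` packages are installed.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Run a SQL file on BigQuery and save the result as CSV."
    )
    parser.add_argument("--sql", required=True, help="Path to the .sql query file")
    parser.add_argument("--destination", required=True, help="Path of the CSV to write")
    return parser.parse_args(argv)


def fetch_to_csv(sql_path, destination, keys_file, project_id):
    """Run the query in sql_path and write the result to destination.

    keys_file and project_id are hypothetical parameters standing in for
    the values stored in your .env file.
    """
    # Imported here so argument parsing works without the GCP libraries.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(keys_file, project=project_id)
    query = Path(sql_path).read_text(encoding="utf-8")
    df = client.query(query).to_dataframe()  # requires pandas and db-dtypes
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(destination, index=False)
    return len(df)  # number of rows written
```

Creating the destination folder inside `fetch_to_csv` is a convenience of this sketch; the actual script may instead expect the `data/` folder to exist already.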
Then, in R, transform it into a ready-to-use dataframe:
source("src/r_scripts/utils/load_data.R")
The ICD-9 to ICD-10 translation is based on this GitHub repo.
Targeted Maximum Likelihood Estimation (TMLE) was used to estimate the average treatment effect for one of the interventions. Data was stratified by race and by predicted probability of mortality based on the OASIS score. Running the following commands in R replicates the obtained results.
source("src/r_scripts/tmle_bin.R")
# for binary outcomes
source("src/r_scripts/tmle_cont.R")
# for continuous outcomes
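
For readers new to the method, the textbook form of the estimand and of the TMLE update is sketched below for a binary treatment $A$, a binary outcome $Y$ and covariates $W$. It is given as background only; that the scripts above follow this formulation in every detail is an assumption.

```latex
\begin{aligned}
\text{ATE} &= \mathbb{E}\big[\, Y(1) - Y(0) \,\big] \\
H(A, W) &= \frac{A}{g(W)} - \frac{1 - A}{1 - g(W)}, \qquad g(W) = P(A = 1 \mid W) \\
\operatorname{logit} \bar{Q}^{*}(A, W) &= \operatorname{logit} \bar{Q}(A, W) + \hat{\varepsilon}\, H(A, W) \\
\widehat{\text{ATE}} &= \frac{1}{n} \sum_{i=1}^{n} \big[\, \bar{Q}^{*}(1, W_i) - \bar{Q}^{*}(0, W_i) \,\big]
\end{aligned}
```

Here $\bar{Q}(A, W)$ is an initial estimate of $\mathbb{E}[Y \mid A, W]$, $g(W)$ is the estimated propensity score, and $\hat{\varepsilon}$ is obtained by a logistic regression of $Y$ on $H(A, W)$ with $\operatorname{logit} \bar{Q}(A, W)$ as an offset. With stratified data, this estimate would be computed separately within each stratum.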
We are actively working on this project. Feel free to raise questions by opening an issue, or to fork this project and submit pull requests!