This repository implements the framework described in *Data-Driven Modeling of 4D Ocean and Coastal Acidification from Surface Measurements* to predict Aragonite Saturation State.
- Clone the repository and move into its directory:

```sh
git clone https://github.com/becklabs/aragonite-opendap.git && cd aragonite-opendap
```

- From this directory, install the `oadap` pip package in editable mode:

```sh
pip install -e .
```
- **Create an Earthdata Login**: To access satellite data sources, you will need to create an Earthdata Login.
- **Set Environment Variables**: After creating an account, set the following environment variables:

```sh
export EARTHDATA_USERNAME=<your_username>
export EARTHDATA_PASSWORD=<your_password>
```
- **Alternative Option: Add Credentials to `.netrc`**: Instead of setting environment variables, you can add the following lines to your `~/.netrc` file:

```
machine urs.earthdata.nasa.gov
login <your_username>
password <your_password>
```
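Either credential option can be sanity-checked locally. The sketch below (standard library only) is a hypothetical helper, not part of the repo, and the resolution order shown here, environment variables first and then `~/.netrc`, is an assumption rather than documented framework behavior:

```python
import os
import netrc

def earthdata_credentials(netrc_path=None):
    """Resolve Earthdata credentials: env vars first, then a .netrc file."""
    user = os.environ.get("EARTHDATA_USERNAME")
    pwd = os.environ.get("EARTHDATA_PASSWORD")
    if user and pwd:
        return user, pwd
    try:
        # netrc.netrc(None) reads ~/.netrc by default
        auth = netrc.netrc(netrc_path).authenticators("urs.earthdata.nasa.gov")
    except (FileNotFoundError, netrc.NetrcParseError):
        auth = None
    if auth:
        login, _, password = auth
        return login, password
    raise RuntimeError("No Earthdata credentials found")
```

Calling `earthdata_credentials()` before a long run surfaces missing credentials early instead of mid-download.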
To run framework training and inference, you will need to download the provided static, preprocessed data artifacts:
- Download the `data_artifacts.zip` archive (~500MB) from the Google Drive Link.
- Extract the contents of `data_artifacts.zip` to the `data/` directory at the root of the repository:

```sh
unzip /path/to/data_artifacts.zip -d data/
```

The `data_artifacts.zip` archive includes preprocessed data and artifacts from FVCOM, MWRA, and Earthdata sources.
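After extraction, a quick layout check can catch a misplaced unzip. This is a hypothetical sketch; the `FVCOM` and `MWRA` subdirectory names are assumptions based on the sources named above and the dataset paths used elsewhere in this README:

```python
from pathlib import Path

def missing_data_dirs(root="data"):
    """Return expected source subdirectories that are missing under root."""
    expected = ["FVCOM", "MWRA"]  # assumed layout, not verified against the archive
    base = Path(root)
    return [name for name in expected if not (base / name).is_dir()]
```

An empty return value suggests the archive landed in the right place; otherwise the listed directories need to be located.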
You will need a Weights and Biases account to track training runs. Log in to Weights and Biases:

```sh
wandb login
```
To make daily aragonite saturation state predictions, run the `scripts/run_framework.py` script:

```
python -m scripts.run_framework \
  --start [Required] [str] Start date in YYYY-MM-DD format \
  --end [Required] [str] End date in YYYY-MM-DD format \
  --cache_dir [Optional] [str] Cache directory for joblib memory (default: intermediate_cache/) \
  --output_nc [Optional] [str] Output NetCDF file path (default: aragonite_field.nc)
```
Example:

```sh
python -m scripts.run_framework \
  --start 2018-01-01 \
  --end 2018-01-31 \
  --output_nc data/aragonite_field_jan2018.nc
```
The `aragonite_field_jan2018.nc` file will contain the predicted aragonite saturation state field for the specified date range.
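For longer periods, one pattern is to invoke the script once per calendar month and write one NetCDF file per chunk. The sketch below is a hypothetical helper introduced here for illustration, not part of the repo:

```python
from datetime import date, timedelta

def month_ranges(start, end):
    """Split [start, end] into per-calendar-month (start, end) date pairs."""
    ranges = []
    cur = start
    while cur <= end:
        if cur.month == 12:
            month_end = date(cur.year, 12, 31)
        else:
            # last day of cur's month = day before the 1st of the next month
            month_end = date(cur.year, cur.month + 1, 1) - timedelta(days=1)
        ranges.append((cur, min(month_end, end)))
        cur = month_end + timedelta(days=1)
    return ranges

# Print one CLI invocation per month:
for s, e in month_ranges(date(2018, 1, 1), date(2018, 3, 15)):
    print(f"python -m scripts.run_framework --start {s} --end {e} "
          f"--output_nc data/aragonite_field_{s:%Y%m}.nc")
```

Smaller date ranges also let the joblib cache in `--cache_dir` be reused across partial reruns.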
To train the Temporal Convolutional Network (TCN) to predict time-varying PCA coefficients for temperature and salinity:
- Update `config/v0.yaml` to point to the correct training dataset:

  For Temperature:

```yaml
train_file: "FVCOM/preprocessed/temperature/all/train/X.npy"
label_file: "FVCOM/preprocessed/temperature/all/train/y.npy"
```

  For Salinity:

```yaml
train_file: "FVCOM/preprocessed/salinity/all/train/X.npy"
label_file: "FVCOM/preprocessed/salinity/all/train/y.npy"
```
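The two configurations differ only in the variable segment of the paths. A small hypothetical helper (not part of the repo) makes the pattern explicit:

```python
def dataset_paths(variable):
    """Build train/label paths for an FVCOM variable ('temperature' or 'salinity')."""
    base = f"FVCOM/preprocessed/{variable}/all/train"
    return {"train_file": f"{base}/X.npy", "label_file": f"{base}/y.npy"}
```

This mirrors the edit you make by hand in `config/v0.yaml` when switching between the temperature and salinity datasets.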
- Run the `scripts/tcn/train.py` script:

```
python -m scripts.tcn.train \
  --config_file [Optional] [str] Path to the configuration file (default: config/v0.yaml)
```
Example:

```sh
python -m scripts.tcn.train --config_file config/v0.yaml
```
To fit the Bayesian Ridge Regression for TAlk, run the `scripts/regression/fit_talk.py` script:

```
python -m scripts.regression.fit_talk \
  --csv_file [Optional] [str] Path to the raw MWRA CSV file (default: data/MWRA/MWRA.csv) \
  --checkpoint_path [Optional] [str] Path to save the model checkpoint (default: checkpoints/TAlk_regression/model.pkl)
```
Example:

```sh
python -m scripts.regression.fit_talk \
  --csv_file data/MWRA/MWRA.csv \
  --checkpoint_path checkpoints/TAlk_regression/model.pkl
```
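A Bayesian Ridge model places Gaussian priors on the regression weights and returns a predictive standard deviation alongside the mean. The sketch below uses scikit-learn's `BayesianRidge` on synthetic data purely for illustration; the features, the TAlk-like target, and the choice of estimator are assumptions, not the repo's actual MWRA feature set or implementation:

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # stand-in predictors (hypothetical)
y = (2300.0 + 50.0 * X[:, 0] - 10.0 * X[:, 1]
     + rng.normal(scale=1.0, size=200))          # synthetic TAlk-like target

model = BayesianRidge()
model.fit(X, y)
mean, std = model.predict(X[:3], return_std=True)  # predictive mean and std
```

The `return_std=True` output is what makes this family of models useful when downstream steps need uncertainty, not just point estimates.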
To fit the Gaussian Process Regression for DIC, run the `scripts/regression/fit_dic.py` script:

```
python -m scripts.regression.fit_dic \
  --csv_file [Optional] [str] Path to the cleaned MWRA CSV file (default: data/MWRA/MWRA_clean.csv) \
  --checkpoint_path [Optional] [str] Path to save the model checkpoint (default: checkpoints/DIC_regression/model.pkl)
```
Example:

```sh
python -m scripts.regression.fit_dic \
  --csv_file data/MWRA/MWRA_clean.csv \
  --checkpoint_path checkpoints/DIC_regression/model.pkl
```
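Gaussian Process Regression fits a nonparametric function with calibrated uncertainty. The sketch below uses scikit-learn's `GaussianProcessRegressor` on synthetic 1-D data; the kernel choice, data, and estimator are illustrative assumptions and may differ from the repo's DIC model:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(40, 1))          # stand-in predictor (hypothetical)
y = np.sin(X[:, 0]) + rng.normal(scale=0.05, size=40)

# RBF for smooth structure plus WhiteKernel for observation noise
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X, y)
mean, std = gpr.predict(X[:5], return_std=True)   # predictive mean and uncertainty
```

As with the TAlk model, the predictive standard deviation is available per point, which is the practical reason to reach for a GP here.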