Update docs (#542)
* update ref_envvars.rst

* consolidate information from ref_settings into pod_settings doc and delete ref_settings.rst

* add a section on generating catalogs to start_config.rst

* fix formatting in start_install.rst

* fix formatting in dev_git_intro and dev_start docs
wrongkindofdoctor authored Apr 16, 2024
1 parent e872f44 commit 112ec89
Showing 7 changed files with 198 additions and 565 deletions.
3 changes: 0 additions & 3 deletions doc/sphinx/dev_git_intro.rst
@@ -177,9 +177,6 @@ This method requires adding the *NOAA-GFDL/MDTF-diagnostics* repo to the *.git/c
and is described in the GitHub discussion post
`Working with multiple remote repositories in your git config file <https://github.com/NOAA-GFDL/MDTF-diagnostics/discussions/96>`__.


.. (TODO: `pip install -v .`, other installation instructions...)
.. _ref-rebase:

Updating your POD branch by rebasing it onto the main branch (a bit more difficult than merging, but cleaner)
21 changes: 12 additions & 9 deletions doc/sphinx/dev_start.rst
@@ -15,6 +15,7 @@ Obtaining the source code

POD developers should create their branches from the
`main branch <https://github.com/NOAA-GFDL/MDTF-diagnostics/tree/main>`__ of the framework code

.. code-block:: console

   git checkout -b [POD branch name] main

@@ -37,6 +38,7 @@ scripting languages, including, `R <https://anaconda.org/conda-forge/r-base>`__,
Python-based PODs should be written in Python 3.11 or newer. We provide a developer version of the python3_base environment (described below) that includes Jupyter and other developer-specific tools. This is not installed by default, and must be requested by passing the ``--all`` flag to the conda setup script:

If you are using Anaconda or miniconda to manage the conda environments, run:

.. code-block:: console

   % cd $CODE_ROOT

@@ -45,6 +47,7 @@ If you are using Anaconda or miniconda to manage the conda environments, run:
Installing dependencies with Micromamba
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Micromamba is a lightweight version of Anaconda. It is required to install the base and python3_base conda environments
on macOS machines with Apple M-series chips. Installation instructions are available in the
`Micromamba Documentation <https://mamba.readthedocs.io/en/latest/micromamba-installation.html>`__,
@@ -146,7 +149,7 @@ list (referred to as $POD_NAME):
a). If purely Python 3, the framework will look for ``src/conda/env_python3_base.yml`` and check its content to
determine whether the POD's requirements are met, and then switch to ``_MDTF_python3_base`` and run the POD.

b). Similarly, if NCL or R is used, then the ``NCL_base`` or ``R_base`` environment will be activated at runtime.

Note that for the 6 existing PODs depending on NCL (EOF_500hPa, MJO_prop_amp, MJO_suite, MJO_teleconnection,
precip_diurnal_cycle, and Wheeler_Kiladis), Python is also used but merely as a wrapper. Thus the framework will
@@ -157,10 +160,10 @@ to selectively install conda environments using the ``--env`` flag (:ref:`ref-co
the environments needed for the PODs you're interested in, and that ``_MDTF_base`` is mandatory for the framework's
operation.

For instance, the minimal installation for running the ``EOF_500hPa`` and ``convective_transition_diag`` PODs
requires ``_MDTF_base`` (mandatory), ``_MDTF_NCL_base`` (because of b), and ``_MDTF_convective_transition_diag``
(because of 1). These can be installed by passing ``base``, ``NCL_base``, and ``convective_transition_diag``
to the ``--env`` flag one at a time (:ref:`ref-conda-install`).
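
As a rough illustration of passing environment names to ``--env`` one at a time, the Python sketch below builds one setup command per environment. The helper name ``env_setup_commands`` is hypothetical, and any other flags your installation needs (for example the micromamba-related flags) are deliberately left out.

```python
def env_setup_commands(env_names):
    """Build one conda_env_setup.sh call per environment name."""
    script = "./src/conda/conda_env_setup.sh"
    # One invocation per environment, because --env takes a single name here.
    return [f"{script} --env {name}" for name in env_names]


# Minimal set for the EOF_500hPa and convective_transition_diag PODs.
needed = ["base", "NCL_base", "convective_transition_diag"]

for command in env_setup_commands(needed):
    print(command)
```

The commands are only printed, not executed, so the list can be reviewed before running them from ``$CODE_ROOT``.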


Testing with a new Conda environment
@@ -187,14 +190,14 @@ correctly lists the language(s) (in case of updating an existing environment).
Or, if using micromamba:

.. code-block:: console

   % cd $CODE_ROOT
   % ./src/conda/conda_env_setup.sh --env $your_POD_short_name --micromamba_root $MICROMAMBA_ROOT --env_dir $CONDA_ENV_DIR

Have the framework run your POD on suitable test data.

1. Add your POD's short name to the ``pod_list`` section of the configuration input file
   (template: ``templates/runtime_config.[jsonc | yml]``).

2. Prepare the test data as described in :doc:`start_config`.
113 changes: 101 additions & 12 deletions doc/sphinx/pod_settings.rst
@@ -106,6 +106,13 @@ This is where you describe your diagnostic and list the programs it needs to run
`math <https://docs.python.org/3/library/math.html>`__. If no third-party libraries are needed,
the value should be an empty list.

``pod_env_vars``:
:ref:`object<object>`, optional. Names and values of shell environment variables used by your diagnostic,
*in addition* to those supplied by the framework. The user can't change these at runtime, but this can be used to set
site-specific installation settings for your diagnostic (e.g., switching between low- and high-resolution observational
data depending on what the user has chosen to download). Note that environment variable values must be provided as
strings.
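
As a minimal sketch of the string requirement, the Python snippet below checks a ``pod_env_vars`` mapping before it is written to a settings file. The helper name ``check_pod_env_vars`` and the example variable names are hypothetical and are not part of the framework.

```python
def check_pod_env_vars(pod_env_vars):
    """Return the names of entries whose values are not strings."""
    return [name for name, value in pod_env_vars.items()
            if not isinstance(value, str)]


# Hypothetical settings: numbers and booleans must be quoted as strings.
good = {"RES": "1.00", "USE_HIRES_OBS": "true"}
bad = {"RES": 1.00, "USE_HIRES_OBS": True}

print(check_pod_env_vars(good))  # []
print(check_pod_env_vars(bad))   # ['RES', 'USE_HIRES_OBS']
```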

Data section
------------

@@ -127,22 +134,105 @@ This section is where you list the dimensions (coordinate axes) your variables a
key-value pair, where the key is the name your diagnostic uses for that dimension internally, and the value is a list of
settings describing that dimension. In order to be unambiguous, all dimensions must specify at least:

``standard_name``:
The CF `standard name <http://cfconventions.org/Data/cf-standard-names/72/build/cf-standard-name-table.html>`__ for
that coordinate.
Latitude and Longitude
^^^^^^^^^^^^^^^^^^^^^^

``standard_name``:
**Required**, string. Must be ``"latitude"`` and ``"longitude"``, respectively.

``units``:
Optional, a :ref:`CFunit<cfunit>`. Units the diagnostic expects the dimension to be in. Currently the framework only
supports decimal ``degrees_north`` and ``degrees_east``, respectively.

``range``:
:ref:`Array<array>` (list) of two numbers. Optional. If given, specifies the range of values the diagnostic expects
this dimension to take. For example, ``"range": [-180, 180]`` for longitude will have the first entry of the longitude
variable in each data file be near -180 degrees (not exactly -180, because dimension values are cell midpoints), and
the last entry near +180 degrees.

``need_bounds``:
Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied
for this dimension, in addition to its midpoint values, following the
`CF conventions <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#cell-boundaries>`__:
the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds
information.

``axis``:
String, optional. Assumed to be ``Y`` and ``X`` respectively if omitted, or if ``standard_name`` is
``"latitude"`` or ``"longitude"``. Included here to enable future support for non-lat-lon horizontal coordinates.
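
To illustrate why a ``range`` of ``[-180, 180]`` yields first and last longitude values that are near, but not equal to, the end points, the Python sketch below computes cell midpoints and cell bounds for a regular grid. The 1-degree grid and the helper names are assumptions made for this example only.

```python
def cell_bounds(lower, upper, n_cells):
    """Lower and upper edges of n_cells equal-width cells on [lower, upper]."""
    width = (upper - lower) / n_cells
    return [(lower + i * width, lower + (i + 1) * width)
            for i in range(n_cells)]


def cell_midpoints(lower, upper, n_cells):
    """Midpoints of the same cells; these are the dimension values."""
    return [(lo + hi) / 2 for lo, hi in cell_bounds(lower, upper, n_cells)]


# Assumed example: a 1-degree longitude grid spanning [-180, 180].
lon = cell_midpoints(-180.0, 180.0, 360)
lon_bnds = cell_bounds(-180.0, 180.0, 360)

print(lon[0], lon[-1])            # -179.5 179.5
print(lon_bnds[0], lon_bnds[-1])  # (-180.0, -179.0) (179.0, 180.0)
```

The bounds pairs correspond to what a separate bounds variable would hold when ``need_bounds`` is ``true``.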

Time
^^^^

``standard_name``:
**Required**. Must be ``"time"``.

``units``:
String. Optional, defaults to "day". Units the diagnostic expects the dimension to be in. Currently the diagnostic
only supports time axes of the form "<units> since <reference date>", and the value given here is interpreted in this
sense (e.g. setting this to "day" would accommodate a dimension of the form "[decimal] days since 1850-01-01").

``calendar``:
String, Optional. One of the CF convention
`calendars <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#calendar>`__ or
the string ``"any"``. **Defaults to "any" if not given**. Calendar convention used by your diagnostic. Only affects
the number of days per month.

``need_bounds``:
Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied
for this dimension, in addition to its midpoint values, following the
`CF conventions <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#cell-boundaries>`__: the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds information.

``axis``:
String, optional. Assumed to be ``T``, whether omitted or provided.

Z axis (height/depth, pressure, ...)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``standard_name``:
**Required**, string.
`Standard name <http://cfconventions.org/Data/cf-standard-names/72/build/cf-standard-name-table.html>`__ of the
variable as defined by the `CF conventions <http://cfconventions.org/>`__, or a commonly used synonym as employed in
the CMIP6 MIP tables.

``units``:
Optional, a :ref:`CFunit<cfunit>` (using the syntax of the
`UDUnits library <https://www.unidata.ucar.edu/software/udunits/udunits-2.0.4/udunits2lib.html#Syntax>`__). Units the
diagnostic expects the dimension to be in. **If not provided, the framework will assume CF convention**
`canonical units <http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html>`__.

In addition, any vertical (Z axis) dimension must specify:
``positive``:
String, **required**. Must be ``"up"`` or ``"down"``, according to the
`CF conventions <http://cfconventions.org/faq.html#vertical_coords_positive_attribute>`__.
A pressure axis is always ``"down"`` (increasing values are closer to the center of the earth), but this is not set
automatically.

``need_bounds``:
Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied
for this dimension, in addition to its midpoint values, following the
`CF conventions <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#cell-boundaries>`__:
the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds
information.

``axis``:
String, optional. Assumed to be ``Z``, whether omitted or provided.

Other dimensions (wavelength, ...)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``standard_name``:
**Required**, string. `Standard name <http://cfconventions.org/Data/cf-standard-names/72/build/cf-standard-name-table.html>`__
of the variable as defined by the `CF conventions <http://cfconventions.org/>`__, or a commonly used synonym as
employed in the CMIP6 MIP tables.

``units``:
Optional, a :ref:`CFunit<cfunit>`. Units the diagnostic expects the dimension to be in. **If not provided, the framework will assume CF convention** `canonical units <http://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html>`__.

``need_bounds``:
Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied
for this dimension, in addition to its midpoint values, following the
`CF conventions <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#cell-boundaries>`__:
the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds
information.
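
The required and optional settings listed above can be checked with a small script before a settings file is submitted. The following Python sketch is illustrative only: ``check_dimension`` is a hypothetical helper, the example dimension names are assumptions, and the sketch treats an entry as vertical only when it explicitly sets ``axis`` to ``Z``, which is a simplification.

```python
VALID_POSITIVE = {"up", "down"}


def check_dimension(name, settings):
    """Return a list of problems found in one dimension entry."""
    problems = []
    if "standard_name" not in settings:
        problems.append(f"{name}: standard_name is required")
    # Simplification: only entries that explicitly set axis to Z are
    # treated as vertical here.
    if settings.get("axis") == "Z" and settings.get("positive") not in VALID_POSITIVE:
        problems.append(f"{name}: positive must be 'up' or 'down'")
    if not isinstance(settings.get("need_bounds", False), bool):
        problems.append(f"{name}: need_bounds must be a boolean")
    return problems


# Assumed example entries; "lev" deliberately omits the positive setting.
dims = {
    "lat": {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    "time": {"standard_name": "time", "units": "day", "calendar": "any"},
    "lev": {"standard_name": "air_pressure", "units": "hPa", "axis": "Z"},
}

for dim_name, dim_settings in dims.items():
    for problem in check_dimension(dim_name, dim_settings):
        print(problem)  # lev: positive must be 'up' or 'down'
```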

.. _sec_varlist:

@@ -227,4 +317,3 @@ variable. Most settings here are optional, but the main ones are:

In order to request multiple slices (e.g. wind velocity on multiple pressure levels, with each level saved to a
different file), create one varlist entry per slice.

85 changes: 51 additions & 34 deletions doc/sphinx/ref_envvars.rst
@@ -1,19 +1,19 @@
MDTF-diagnostics Environment variables
======================================

This page describes the environment variables that the framework will set for your diagnostic when it's run.

Overview
--------

The MDTF-diagnostics framework can be viewed as a "wrapper" for your code that handles data fetching and munging.
Your code communicates with this wrapper in two ways:

- The :doc:`settings file <./pod_settings>` is where your code talks to the framework: when you write your code,
you document what model data your code uses (not covered on this page, follow the link for details).
- The framework "talks" to a POD through a combination of shell environment variables passed directly to the subprocess
via the `env` parameter, and by defining a `case_info.yml` file in the `$WORK_DIR` with case-specific environment
variables. The framework communicates **all** runtime information this way: this is in order to 1) pass information
in a language-independent way, and 2) to make writing diagnostics easier (i.e., the POD does not need to parse
command-line settings).

@@ -25,22 +25,26 @@ Also note that names of environment variables are case-sensitive.

Paths
-----

The following variables are accessed using the ``os.environ`` method:
``OBS_DATA``:
Path to the top-level directory containing any observational or reference data you've provided as the author of your
diagnostic. Any data your diagnostic uses that doesn't come from the model being analyzed should go here
(i.e., you supply it to the framework maintainers, they host it, and the user downloads it when they install the
framework). The framework will ensure this is copied to a local filesystem when your diagnostic is run, but this
directory should be treated as **read-only**.

``POD_HOME``:
Path to the top-level directory containing your diagnostic's source code. This will be of the form
``.../MDTF-diagnostics/diagnostics/<your POD's name>``. This can be used to call sub-scripts from your diagnostic's
driver script. This directory should be treated as **read-only**.

``DATA_DIR``:
(retained for backwards compatibility with v3.5 and earlier PODs) location of the model
input data directory.

``WORK_DIR``:
Path to your diagnostic's *working directory*, which is where all output data should be written
(as well as any temporary files).

The framework creates the following subdirectories within this directory:

@@ -51,27 +55,39 @@ Paths

Model run information
---------------------

``CASENAME``:
User-provided label describing the run of model data being analyzed.

``startdate``, ``enddate``:
Four-digit years describing the analysis period.

``case_env_file``:
Location of the YAML file with case-specific environment variables, accessed by calling
``os.environ['case_env_file']``. The following environment variables are loaded into a dictionary
from the case environment file:

``CATALOG_FILE``:
Path to the intake-esm catalog header JSON file used to access the data catalog of
processed data files generated by the framework. If ``no_pp`` is specified at runtime, and no custom
preprocessing scripts are run on the input dataset, ``CATALOG_FILE`` is the path to the input data catalog
specified with the ``DATA_CATALOG`` parameter in the runtime configuration file.

``CASENAME``:
User-provided label describing each run of model data being analyzed. Single-run PODs submitted to version 3.5 and
earlier of the framework directly access this variable with ``os.environ['CASENAME']``.

``startdate``, ``enddate``:
Strings in the format <yyyymmdd> or <yyyymmddHHMMSS> describing the start and end dates of the
analysis period for a case associated with ``CASENAME``. Single-run PODs submitted to version 3.5 and
earlier of the framework directly access these variables with ``os.environ['startdate']`` and ``os.environ['enddate']``.
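
Because ``startdate`` and ``enddate`` arrive as strings in one of two layouts, a POD will usually want to convert them to date objects. The Python sketch below shows one possible way to do that with the standard library; ``parse_mdtf_date`` is a hypothetical helper name and the literal date strings are made-up examples.

```python
from datetime import datetime


def parse_mdtf_date(text):
    """Parse a yyyymmdd or yyyymmddHHMMSS string into a datetime."""
    formats = {8: "%Y%m%d", 14: "%Y%m%d%H%M%S"}
    try:
        fmt = formats[len(text)]
    except KeyError:
        raise ValueError(f"unexpected date string: {text!r}") from None
    return datetime.strptime(text, fmt)


# Made-up example values in the two supported layouts.
start = parse_mdtf_date("19800101")
end = parse_mdtf_date("19841231235959")

print(start.year, end.year)  # 1980 1984
```

In a legacy single-run POD the input strings would come from ``os.environ['startdate']`` and ``os.environ['enddate']`` rather than from literals.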

Locations of model data files
-----------------------------

These are set depending on the data your diagnostic requests in its :doc:`settings file <./pod_settings>`. Refer to the
examples below if you're unfamiliar with how that file is organized.
The processed model data files are written to `$WORK_DIR` and accessed via the intake-esm catalog output by the
framework or, if no preprocessing is performed, via the original catalog passed to the framework at runtime.
In both cases the catalog path is given by the ``CATALOG_FILE`` environment variable in the ``case_env_file``.
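
A POD might locate its catalog roughly as in the Python sketch below. This is a sketch under stated assumptions, not the framework's own code: the helper names are hypothetical, PyYAML is assumed to be installed, and the layout of the case file (one top-level key per case, each holding that case's variables) is an assumption to verify against a real ``case_info.yml``.

```python
import os


def load_case_info(environ=os.environ):
    """Load the case environment file into a dictionary."""
    # PyYAML is a third-party package, imported lazily so that the rest of
    # this sketch runs without it.
    import yaml
    with open(environ["case_env_file"], "r") as stream:
        return yaml.safe_load(stream)


def catalog_paths(case_info):
    """Map each case name to its CATALOG_FILE entry."""
    return {case: settings["CATALOG_FILE"]
            for case, settings in case_info.items()}


# Hypothetical contents of a loaded case file (assumed layout and paths).
example_case_info = {
    "case_a": {"CATALOG_FILE": "/work/case_a/catalog.json",
               "startdate": "19800101", "enddate": "19841231"},
}

print(catalog_paths(example_case_info))
```

Inside a running POD, ``catalog_paths(load_case_info())`` would replace the hard-coded example dictionary.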

Names of variables and dimensions
---------------------------------

These are set depending on the data your diagnostic requests in its :doc:`settings file <./pod_settings>`. Refer to
the examples below if you're unfamiliar with how that file is organized.


Simple example
--------------

@@ -98,8 +114,7 @@ We only give the relevant parts of the :doc:`settings file <ref_settings>` below
}
}
The framework will set the following environment variables in the ``case_env_file``:

#. ``lat_coord``: Name of the latitude dimension in the model's native format
#. ``lon_coord``: Name of the longitude dimension in the model's native format
@@ -108,3 +123,5 @@ The framework will set the following environment variables:
#. ``PR_FILE`` (retained for backwards compatibility): Absolute path to the file containing
``pr`` data, e.g. ``/dir/precip.nc``.

As with ``CASENAME``, ``startdate``, and ``enddate``, the variable-specific environment variables are
accessed with the ``os.environ`` method in single-run PODs from framework versions older than v4.0.
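
One way to support both access styles in the same POD is a small lookup helper that prefers the per-case dictionary and falls back to the process environment. The Python sketch below is an illustration only: ``lookup`` is a hypothetical helper, and the dimension names ``lat``, ``lon``, and ``latitude`` are assumed example values.

```python
import os


def lookup(name, case_settings=None, environ=os.environ):
    """Return a framework-provided value by name.

    The per-case dictionary is checked first; the process environment is
    the fallback, as used by legacy single-run PODs.
    """
    if case_settings is not None and name in case_settings:
        return case_settings[name]
    return environ[name]


# Assumed example values for one case loaded from the case_env_file.
case_settings = {"lat_coord": "lat", "lon_coord": "lon",
                 "PR_FILE": "/dir/precip.nc"}
# Stand-in for os.environ in a legacy single-run POD.
legacy_environ = {"lat_coord": "latitude"}

print(lookup("PR_FILE", case_settings))           # /dir/precip.nc
print(lookup("lat_coord", None, legacy_environ))  # latitude
```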