diff --git a/doc/sphinx/dev_git_intro.rst b/doc/sphinx/dev_git_intro.rst index 3ffd54b19..b166f5710 100644 --- a/doc/sphinx/dev_git_intro.rst +++ b/doc/sphinx/dev_git_intro.rst @@ -177,9 +177,6 @@ This method requires adding the *NOAA-GFDL/MDTF-diagnostics* repo to the *.git/c and is described in the GitHub discussion post `Working with multiple remote repositories in your git config file `__. - -.. (TODO: `pip install -v .`, other installation instructions...) - .. _ref-rebase: Updating your POD branch by rebasing it onto the main branch (a bit more difficult than merging, but cleaner) diff --git a/doc/sphinx/dev_start.rst b/doc/sphinx/dev_start.rst index 081bf5920..52fda7007 100644 --- a/doc/sphinx/dev_start.rst +++ b/doc/sphinx/dev_start.rst @@ -15,6 +15,7 @@ Obtaining the source code POD developers should create their branches from the `main branch `__ of the framework code + .. code-block:: console git checkout -b [POD branch name] main @@ -37,6 +38,7 @@ scripting languages, including, `R `__, Python-based PODs should be written in Python 3.11 or newer. We provide a developer version of the python3_base environment (described below) that includes Jupyter and other developer-specific tools. This is not installed by default, and must be requested by passing the ``--all`` flag to the conda setup script: If you are using Anaconda or miniconda to manage the conda environments, run: + .. code-block:: console % cd $CODE_ROOT @@ -45,6 +47,7 @@ If you are using Anaconda or miniconda to manage the conda environments, run: Installing dependencies with Micromamba ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Micromamba is a lightweight version of Anaconda. It is required to install the base and python3_base conda enviroments on macOS machines with Apple M-series chips. Installation instructions are available in the `Micromamba Documentation `__, @@ -146,7 +149,7 @@ list (referred to as $POD_NAME): a). 
If purely Python 3, the framework will look for ``src/conda/env_python3_base.yml`` and check its content to determine whether the POD's requirements are met, and then switch to ``_MDTF_python3_base`` and run the POD. - b). Similarly, if NCL or R is used, then ``NCL_base`` or ``R_base`` environment will be activated at runtime. + b). Similarly, if NCL or R is used, then ``NCL_base`` or ``R_base`` environment will be activated at runtime. Note that for the 6 existing PODs depending on NCL (EOF_500hPa, MJO_prop_amp, MJO_suite, MJO_teleconnection, precip_diurnal_cycle, and Wheeler_Kiladis), Python is also used but merely as a wrapper. Thus the framework will @@ -157,10 +160,10 @@ to selectively install conda environments using the ``--env`` flag (:ref:`ref-co the environments needed for the PODs you're interested in, and that ``_MDTF_base`` is mandatory for the framework's operation. -- For instance, the minimal installation for running the ``EOF_500hPa`` and ``convective_transition_diag PODs`` - requres ``_MDTF_base`` (mandatory), ``_MDTF_NCL_base`` (because of b), and ``_MDTF_convective_transition_diag`` - (because of 1). These can be installed by passing ``base``, ``NCL_base``, and ``convective_transition_diag`` - to the ``--env`` flag one at a time (:ref:`ref-conda-install`). +For instance, the minimal installation for running the ``EOF_500hPa`` and ``convective_transition_diag`` PODs +requires ``_MDTF_base`` (mandatory), ``_MDTF_NCL_base`` (because of b), and ``_MDTF_convective_transition_diag`` +(because of 1). These can be installed by passing ``base``, ``NCL_base``, and ``convective_transition_diag`` +to the ``--env`` flag one at a time (:ref:`ref-conda-install`). Testing with a new Conda environment @@ -187,14 +190,14 @@ correctly lists the language(s) (in case of updating an existing environment). Or, if using micromamba: - .. code-block:: console + .. 
code-block:: console - % cd $CODE_ROOT - % ./src/conda/conda_env_setup.sh --env $your_POD_short_name --micromamba_root $MICROMAMBA_ROOT --env_dir $CONDA_ENV_DIR + % cd $CODE_ROOT + % ./src/conda/conda_env_setup.sh --env $your_POD_short_name --micromamba_root $MICROMAMBA_ROOT --env_dir $CONDA_ENV_DIR Have the framework run your POD on suitable test data. 1. Add your POD's short name to the ``pod_list`` section of the configuration input file - (template: ``templates/runtime_config.[jsonc | yml]``). + (template: ``templates/runtime_config.[jsonc | yml]``). 2. Prepare the test data as described in :doc:`start_config`. diff --git a/doc/sphinx/pod_settings.rst b/doc/sphinx/pod_settings.rst index 1bb0c3247..3935e0698 100644 --- a/doc/sphinx/pod_settings.rst +++ b/doc/sphinx/pod_settings.rst @@ -106,6 +106,13 @@ This is where you describe your diagnostic and list the programs it needs to run `math `__. If no third-party libraries are needed, the value should be an empty list. +``pod_env_vars``: + :ref:`object`, optional. Names and values of shell environment variables used by your diagnostic, + *in addition* to those supplied by the framework. The user can't change these at runtime, but this can be used to set + site-specific installation settings for your diagnostic (eg, switching between low- and high-resolution observational + data depending on what the user has chosen to download). Note that environment variable values must be provided as + strings. + Data section ------------ @@ -127,22 +134,105 @@ This section is where you list the dimensions (coordinate axes) your variables a key-value pair, where the key is the name your diagnostic uses for that dimension internally, and the value is a list of settings describing that dimension. In order to be unambiguous, all dimensions must specify at least: -``standard_name``: - The CF `standard name `__ for - that coordinate. +Latitude and Longitude +^^^^^^^^^^^^^^^^^^^^^^ + +``standard_name``: + **Required**, string. 
Must be ``"latitude"`` and ``"longitude"``, respectively. + +``units``: + Optional, a :ref:`CFunit`. Units the diagnostic expects the dimension to be in. Currently the framework only + supports decimal ``degrees_north`` and ``degrees_east``, respectively. + +``range``: + :ref:`Array` (list) of two numbers. Optional. If given, specifies the range of values the diagnostic expects + this dimension to take. For example, ``"range": [-180, 180]`` for longitude will have the first entry of the longitude + variable in each data file be near -180 degrees (not exactly -180, because dimension values are cell midpoints), and + the last entry near +180 degrees. + +``need_bounds``: + Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied + for this dimension, in addition to its midpoint values, following the + `CF conventions `__: + the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds + information. + +``axis``: + String, optional. Assumed to be ``Y`` and ``X`` respectively if omitted, or if ``standard_name`` is + ``"latitude"`` or ``"longitude"``. Included here to enable future support for non-lat-lon horizontal coordinates. + +Time +^^^^ + +``standard_name``: + **Required**. Must be ``"time"``. + +``units``: + String. Optional, defaults to "day". Units the diagnostic expects the dimension to be in. Currently the diagnostic + only supports time axes of the form " since ", and the value given here is interpreted in this + sense (e.g. setting this to "day" would accommodate a dimension of the form "[decimal] days since 1850-01-01"). + +``calendar``: + String, optional. One of the CF convention + `calendars `__ or + the string ``"any"``. **Defaults to "any" if not given**. Calendar convention used by your diagnostic. Only affects + the number of days per month. + +``need_bounds``: + Boolean. Optional: assumed ``false`` if not specified. 
If ``true``, the framework will ensure that bounds are supplied + for this dimension, in addition to its midpoint values, following the + `CF conventions `__: the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds information. + +``axis``: + String, optional. Assumed to be ``T`` if omitted or provided. + +Z axis (height/depth, pressure, ...) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``standard_name``: + **Required**, string. + `Standard name `__ of the + variable as defined by the `CF conventions `__, or a commonly used synonym as employed in + the CMIP6 MIP tables. ``units``: - The units the diagnostic expects that coordinate to be in (using the syntax of the - `UDUnits library `__). This is - optional: if not given, the framework will assume you want CF convention + Optional, a :ref:`CFunit`. Units the diagnostic expects the dimension to be in. **If not provided, the + framework will assume CF convention** `canonical units `__. -In addition, any vertical (Z axis) dimension must specify: +``positive``: + String, **required**. Must be ``"up"`` or ``"down"``, according to the + `CF conventions `__. + A pressure axis is always ``"down"`` (increasing values are closer to the center of the earth), but this is not set + automatically. + +``need_bounds``: + Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied + for this dimension, in addition to its midpoint values, following the + `CF conventions `__: + the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds + information. -``positive``: - Either ``"up"`` or ``"down"``, according to the - `CF conventions `__. A pressure axis is always - ``"down"`` (increasing values are closer to the center of the earth). +``axis``: + String, optional. Assumed to be ``Z`` if omitted or provided. + +Other dimensions (wavelength, ...) 
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``standard_name``: + **Required**, string. `Standard name `__ + of the variable as defined by the `CF conventions `__, or a commonly used synonym as + employed in the CMIP6 MIP tables. + +``units``: + Optional, a :ref:`CFunit`. Units the diagnostic expects the dimension to be in. **If not provided, the framework will assume CF convention** `canonical units `__. + +``need_bounds``: + Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied + for this dimension, in addition to its midpoint values, following the + `CF conventions `__: + the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds + information. .. _sec_varlist: @@ -227,4 +317,3 @@ variable. Most settings here are optional, but the main ones are: In order to request multiple slices (e.g. wind velocity on multiple pressure levels, with each level saved to a different file), create one varlist entry per slice. - diff --git a/doc/sphinx/ref_envvars.rst b/doc/sphinx/ref_envvars.rst index 5435e6f56..ed4fc4f84 100644 --- a/doc/sphinx/ref_envvars.rst +++ b/doc/sphinx/ref_envvars.rst @@ -1,19 +1,19 @@ -MDTF Environment variables -========================== +MDTF-diagnostics Environment variables +====================================== -This page describes the environment variables that the framework will set for your diagnostic when it's run. +This page describes the environment variables that the framework will set for your diagnostic when it's run. Overview -------- -The MDTF framework can be viewed as a "wrapper" for your code that handles data fetching and munging. +The MDTF-diagnostics framework can be viewed as a "wrapper" for your code that handles data fetching and munging. 
Your code communicates with this wrapper in two ways: - The :doc:`settings file <./pod_settings>` is where your code talks to the framework: when you write your code, you document what model data your code uses (not covered on this page, follow the link for details). - The framework "talks" to a POD through a combination of shell environment variables passed directly to the subprocess - via the `env` parameter, and by defining a `case_info.yml` file in the case workdir with case-specific environment - variables.The framework communicates **all** runtime information this way: this is in order to 1) pass information + via the `env` parameter, and by defining a `case_info.yml` file in the `$WORK_DIR` with case-specific environment + variables. The framework communicates **all** runtime information this way: this is in order to 1) pass information in a language-independent way, and 2) to make writing diagnostics easier (i.e., the POD does not need to parse command-line settings). @@ -25,22 +25,26 @@ Also note that names of environment variables are case-sensitive. Paths ----- - -``OBS_DATA``: - Path to the top-level directory containing any observational or reference data you've provided as the author of your - diagnostic. Any data your diagnostic uses that doesn't come from the model being analyzed should go here - (i.e., you supply it to the framework maintainers, they host it, and the user downloads it when they install the - framework). The framework will ensure this is copied to a local filesystem when your diagnostic is run, but this - directory should be treated as **read-only**. - -``POD_HOME``: - Path to the top-level directory containing your diagnostic's source code. This will be of the form - ``.../MDTF-diagnostics/diagnostics/``. This can be used to call sub-scripts from your diagnostic's - driver script. This directory should be treated as **read-only**. 
- -``WORK_DIR``: - Path to your diagnostic's *working directory*, which is where all output data should be written - (as well as any temporary files). +The following variables are read from the shell environment through ``os.environ``: + ``OBS_DATA``: + Path to the top-level directory containing any observational or reference data you've provided as the author of your + diagnostic. Any data your diagnostic uses that doesn't come from the model being analyzed should go here + (i.e., you supply it to the framework maintainers, they host it, and the user downloads it when they install the + framework). The framework will ensure this is copied to a local filesystem when your diagnostic is run, but this + directory should be treated as **read-only**. + + ``POD_HOME``: + Path to the top-level directory containing your diagnostic's source code. This will be of the form + ``.../MDTF-diagnostics/diagnostics/``. This can be used to call sub-scripts from your diagnostic's + driver script. This directory should be treated as **read-only**. + + ``DATA_DIR``: + (Retained for backwards compatibility with v3.5 and earlier PODs.) Location of the model + input data directory. + + ``WORK_DIR``: + Path to your diagnostic's *working directory*, which is where all output data should be written + (as well as any temporary files). The framework creates the following subdirectories within this directory: @@ -51,19 +55,32 @@ Paths Model run information --------------------- - -``CASENAME``: - User-provided label describing the run of model data being analyzed. - -``startdate``, ``enddate``: - Four-digit years describing the analysis period. - +``case_env_file``: + Location of the YAML file with case-specific environment variables, obtained with + ``os.environ['case_env_file']``. 
The following environment variables are loaded into a dictionary + from the case environment file: + + ``CATALOG_FILE``: + Path to the esm-intake catalog header JSON file used to access the data catalog of + processed data files generated by the framework. If ``no_pp`` is specified at runtime, and no custom + preprocessing scripts are run on the input dataset, ``CATALOG_FILE`` is the path to the input data catalog + specified with the ``DATA_CATALOG`` parameter in the runtime configuration file. + + ``CASENAME``: + User-provided label describing each run of model data being analyzed. Single-run PODs submitted to version 3.5 and + earlier of the framework directly access this variable with ``os.environ['CASENAME']``. + + ``startdate``, ``enddate``: + Strings in the format or describing the start and end dates of the + analysis period for a case associated with ``CASENAME``. Single-run PODs submitted to version 3.5 and + earlier of the framework directly access these variables with ``os.environ['startdate']`` and ``os.environ['enddate']``. Locations of model data files ----------------------------- -These are set depending on the data your diagnostic requests in its :doc:`settings file <./pod_settings>`. Refer to the -examples below if you're unfamiliar with how that file is organized. +The processed model data files are written to `$WORK_DIR` and accessed through the esm-intake catalog +output by the framework or, if no preprocessing is performed, through the original catalog passed to the framework +at runtime. In both cases the catalog path is given by the ``CATALOG_FILE`` variable in the ``case_env_file``. Names of variables and dimensions --------------------------------- @@ -71,7 +88,6 @@ Names of variables and dimensions These are set depending on the data your diagnostic requests in its :doc:`settings file <./pod_settings>`. Refer to the examples below if you're unfamiliar with how that file is organized. 
- Simple example -------------- @@ -98,8 +114,7 @@ We only give the relevant parts of the :doc:`settings file ` below } } - -The framework will set the following environment variables: +The framework will set the following environment variables in the ``case_env_file``: #. ``lat_coord``: Name of the latitude dimension in the model's native format #. ``lon_coord``: Name of the longitude dimension in the model's native format @@ -108,3 +123,5 @@ The framework will set the following environment variables: #. ``PR_FILE`` (retained for backwards compatibility): Absolute path to the file containing ``pr`` data, e.g. ``/dir/precip.nc``. +As with ``CASENAME``, ``startdate``, and ``enddate``, the variable-specific environment variables are +read directly from ``os.environ`` in single-run PODs from framework versions older than v4.0. diff --git a/doc/sphinx/ref_settings.rst b/doc/sphinx/ref_settings.rst deleted file mode 100644 index 4174886c8..000000000 --- a/doc/sphinx/ref_settings.rst +++ /dev/null @@ -1,483 +0,0 @@ -Diagnostic settings file format -=============================== - -The settings file is how your diagnostic tells the framework what it needs to run, in terms of software and model data. - -Each diagnostic must contain a text file named ``settings.jsonc`` in the -`JSON `__ -format, with the addition that any text to the right of ``//`` is treated as a comment and ignored -(sometimes called the "JSONC" format). - -Brief summary of JSON ---------------------- - -We'll briefly summarize subset of JSON syntax used in this configuration file. The file's JSON expressions are built -up out of *items*, which may be either - -1. a boolean, taking one of the values ``true`` or ``false`` (lower-case, with no quotes). -2. a number (integer or floating-point). -3. a case-sensitive string, which must be delimited by double quotes. - -In addition, for the purposes of the configuration file we define - -.. 
_time_duration: - - - - **In addition**, the string ``"any"`` may be used to signify that any value is acceptable. - -.. _cfunit: - -5. a "CF unit": this is a string describing the units of a physical quantity, following the -`syntax `__ of the -`UDUNITS2 `__ library. -``1`` should be used for dimensionless quantities. - -Items are combined in compound expressions of two types: - -.. _array: - -6. *arrays*, which are one-dimensional ordered lists delimited with square brackets. Entries can be of any type, -e.g. ``[true, 1, "two"]``. - -.. _object: - -7. *objects*, which are *un*-ordered lists of key:value pairs separated by colons and delimited with curly brackets. -Keys must be strings and must all be unique within the object, while values may be any expression, e.g. -``{"red": 0, "green": false, "blue": "bagels"}``. - -Compound expressions may be nested within each other to an arbitrary depth. - -File organization ------------------ - -.. code-block:: js - - { - "settings" : { - <...properties describing the diagnostic..> - }, - "data" : { - <...properties for all requested model data...> - }, - "dimensions" : { - "my_first_dimension": { - <...properties describing this dimension...> - }, - "my_second_dimension": { - <...properties describing this dimension...> - }, - ... - }, - "varlist" : { - "my_first_variable": { - <...properties describing this variable...> - }, - "my_second_variable": { - <...properties describing this variable...> - }, - ... - } - } - - -At the top level, the settings file is an :ref:`object` containing four required entries, described in detail -below. - -- :ref:`settings`: properties that label the diagnostic and describe its runtime requirements. -- :ref:`data`: properties that apply to all the data your diagnostic is requesting. -- :ref:`dimensions`: properties that apply to the dimensions -(in `netCDF `__ terminology) -of the model data. 
Each distinct dimension (coordinate axis) of the data being requested should be listed as a separate -entry here. -- :ref:`varlist`: properties that describe the individual variables your diagnostic operates on. -Each variable should be listed as a separate entry here. - - -.. _sec_settings: - -Settings section ----------------- - -This section is an :ref:`object` containing properties that label the diagnostic and describe its runtime -requirements. - -Example -^^^^^^^ - -.. code-block:: js - - "settings" : { - "long_name" : "Effect of X on Y diagnostic", - "driver" : "my_script.py", - "runtime_requirements": { - "python": ["numpy", "matplotlib", "netCDF4", "cartopy"], - "ncl": ["contributed", "gsn_code", "gsn_csm"] - }, - "pod_env_vars" : { - // RES: Spatial Resolution (degree) for Obs Data (0.25, 0.50, 1.00). - "RES": "1.00" - } - } - - -Diagnostic description -^^^^^^^^^^^^^^^^^^^^^^ - -``long_name``: - String, **required**. Human-readable display name of your diagnostic. This is the text used to describe your diagnostic - on the top-level index.html page. It should be in sentence case (capitalize first word and proper nouns only) and omit - any punctuation at the end. - -``driver``: - String, **required**. Filename of the top-level driver script the framework should call to run your diagnostic's - analysis. - - -Diagnostic runtime -^^^^^^^^^^^^^^^^^^ - -``runtime_requirements``: - :ref:`object`, **required**. Programs your diagnostic needs to run (for example, scripting language - interpreters) and any third-party libraries needed in those languages. Each executable should be listed in a separate - key-value pair: - - - The *key* is the name of the required executable, e.g. languages such as "`python `__" or - "`ncl `__" etc. but also any utilities such as "`ncks `__", - "`cdo `__", etc. - - The *value* corresponding to each key is an :ref:`array` (list) of strings, which are names of third-party - libraries in that language that your diagnostic needs. 
You do *not* need to list standard libraries or scripts that - are provided in a standard installation of your language: eg, in python, you need to list - `numpy `__ but not `math `__. If no third-party - libraries are needed, the value should be an empty list. - - -``pod_env_vars``: - :ref:`object`, optional. Names and values of shell environment variables used by your diagnostic, - *in addition* to those supplied by the framework. The user can't change these at runtime, but this can be used to set - site-specific installation settings for your diagnostic (eg, switching between low- and high-resolution observational - data depending on what the user has chosen to download). Note that environment variable values must be provided as - strings. - - -.. _sec_data: - -Data section ------------- - -This section is an :ref:`object` containing properties that apply to all the data your diagnostic is requesting. - -Example -^^^^^^^ - -.. code-block:: js - - "data": { - "format": "netcdf4_classic", - "realm": "atmos", - "frequency": "3hr", - "min_frequency": "1hr", - "max_frequency": "6hr", - "min_duration": "5yr", - "max_duration": "any" - } - - -Example -^^^^^^^ - -``format``: - String. Optional: assumed ``"any_netcdf_classic"`` if not specified. Specifies the format(s) of *model* data your - diagnostic is able to read. As of this writing, the framework only supports retrieval of netCDF or Zarr formats, so - only the following values are allowed: - - - ``"any_netcdf"`` includes all of: - - - ``"any_netcdf3"`` includes all of: - - - ``"netcdf3_classic"`` (CDF-1, files restricted to < 2 Gb) - - ``"netcdf3_64bit_offset"`` (CDF-2) - - ``"netcdf3_64bit_data"`` (CDF-5) - - - ``"any_netcdf4"`` includes all of: - - - ``"netcdf4_classic"`` - - ``"netcdf4"`` - - - ``"any_netcdf_classic"`` includes all the above *except* ``"netcdf4"`` (classic data model only). - - See the `netCDF FAQ `__ for information on the distinctions. 
Any recent version of a supported language for diagnostics with netCDF support will be able to read all of these. However, the extended features of the ``"netcdf4"`` data model are not commonly used in practice and currently only supported at a beta level in NCL, which is why we've chosen ``"any_netcdf_classic"`` as the default. - - -``realm``: - String or :ref:`array` (list) of strings, **required**. One of the eight CMIP6 modeling realms (aerosol, atmos, - atmosChem, land, landIce, ocean, ocnBgchem, seaIce) describing what data your diagnostic uses. If your diagnostic uses - data from multiple realms, list them in an array (e.g. ``["atmos", "ocean"]``). This is used as part of the data - catalog query to help determine which file(s) match the POD's requirements - -``min_duration``, ``max_duration``: - :ref:`Time durations`. Optional: assumed ``"any"`` if not specified. Set minimum and maximum length of - the analysis period for which the diagnostic should be run: this overrides any choices the user makes at runtime. - Some example uses of this setting are: - - - If your diagnostic uses low-frequency (e.g. seasonal) data, you may want to set ``min_duration`` to ensure the - sample size will be large enough for your results to be statistically meaningful. - - On the other hand, if your diagnostic uses high-frequency (e.g. hourly) data, you may want to set ``max_duration`` - to prevent the framework from attempting to download a large volume of data for your diagnostic if the framework is - called with a multi-decadal analysis period. - -The following properties can optionally be set individually for each variable in the varlist -:ref:`section`. If so, they will override the global settings given here. - -.. _dims_ordered: - -``dimensions_ordered``: - Boolean. Optional: assumed ``false`` if not specified. If set to ``true``, the framework will ensure that the - dimensions of each variable's array are given in the same order as listed in ``dimensions``. 
**If set to false, your - diagnostic is responsible for handling arbitrary dimension orders**: e.g. it should *not* assume that 3D data will be - presented as (time, lat, lon). - -.. _freq_target: - -``frequency``, ``min_frequency``, ``max_frequency``: - :ref:`Time durations`. Time frequency at which the data is provided. Either ``frequency`` or the - min/max pair, or both, is required: - - - If only ``frequency`` is provided, the framework will attempt to obtain data at that frequency. If that's not - available from the data source, your diagnostic will not run. - - If the min/max pair is provided, the diagnostic must be capable of using data at any frequency within that range - (inclusive). **The diagnostic is responsible for determining the frequency** from the data file itself if this option - is used. - - If all three properties are set, the framework will first attempt to find data at ``frequency``. If that's not - available, it will try data within the min/max range, so your code must be able to handle this possibility. - - -.. _sec_dimensions: - -Dimensions section ------------------- - -This section is an :ref:`object` contains properties that apply to the dimensions of model data. "Dimensions" -are meant in the sense of the netCDF -`data model `__, -and "coordinate dimensions" in the CF conventions: informally, they are "coordinate axes" holding the values of -independent variables that the dependent variables are sampled at. - -All :ref:`dimensions` and :ref:`scalar coordinates` referenced by variables in the -varlist section must have an entry in this section. If two variables reference the same dimension, they will be sampled -on the same set of *spatial* values. Different time values are specified with the ``frequency`` attribute on varlist -entries. - -**Note** that the framework currently *only* supports the (simplest and most common) "independent axes" case of the -`CF conventions `__. 
-In particular, the framework only deals with data on lat-lon grids. - -Example -^^^^^^^ - -.. code-block:: js - - "dimensions": { - "lat": { - "standard_name": "latitude", - "units": "degrees_N", - "range": [-90, 90], - "need_bounds": false - }, - "lon": { - "standard_name": "longitude", - "units": "degrees_E", - "range": [-180, 180], - "need_bounds": false - }, - "plev": { - "standard_name": "air_pressure", - "units": "hPa", - "positive": "down", - "need_bounds": false - }, - "time": { - "standard_name": "time", - "units": "days", - "calendar": "noleap", - "need_bounds": false - } - } - - -Latitude and Longitude -^^^^^^^^^^^^^^^^^^^^^^ - -``standard_name``: - **Required**, string. Must be ``"latitude"`` and ``"longitude"``, respectively. - -``units``: - Optional, a :ref:`CFunit`. Units the diagnostic expects the dimension to be in. Currently the framework only - supports decimal ``degrees_north`` and ``degrees_east``, respectively. - -``range``: - :ref:`Array` (list) of two numbers. Optional. If given, specifies the range of values the diagnostic expects - this dimension to take. For example, ``"range": [-180, 180]`` for longitude will have the first entry of the longitude - variable in each data file be near -180 degrees (not exactly -180, because dimension values are cell midpoints), and - the last entry near +180 degrees. - -``need_bounds``: - Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied - for this dimension, in addition to its midpoint values, following the - `CF conventions `__: - the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds - information. - -``axis``: - String, optional. Assumed to be ``Y`` and ``X`` respectively if omitted, or if ``standard_name`` is - ``"latitude"`` or ``"longitude"``. Included here to enable future support for non-lat-lon horizontal coordinates. 
- -Time -^^^^ - -``standard_name``: - **Required**. Must be ``"time"``. - -``units``: - String. Optional, defaults to "day". Units the diagnostic expects the dimension to be in. Currently the diagnostic - only supports time axes of the form " since ", and the value given here is interpreted in this - sense (e.g. settings this to "day" would accommodate a dimension of the form "[decimal] days since 1850-01-01".) - -``calendar``: - String, Optional. One of the CF convention - `calendars `__ or - the string ``"any"``. **Defaults to "any" if not given**. Calendar convention used by your diagnostic. Only affects - the number of days per month. - -``need_bounds``: - Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied - for this dimension, in addition to its midpoint values, following the - `CF conventions `__: the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds information. - -``axis``: - String, optional. Assumed to be ``T`` if omitted or provided. - -Z axis (height/depth, pressure, ...) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``standard_name``: - **Required**, string. - `Standard name `__ of the - variable as defined by the `CF conventions `__, or a commonly used synonym as employed in - the CMIP6 MIP tables. - -``units``: - Optional, a :ref:`CFunit`. Units the diagnostic expects the dimension to be in. **If not provided, the - framework will assume CF convention** - `canonical units `__. - -``positive``: - String, **required**. Must be ``"up"`` or ``"down"``, according to the - `CF conventions `__. - A pressure axis is always ``"down"`` (increasing values are closer to the center of the earth), but this is not set - automatically. - -``need_bounds``: - Boolean. Optional: assumed ``false`` if not specified. 
If ``true``, the framework will ensure that bounds are supplied - for this dimension, in addition to its midpoint values, following the - `CF conventions `__: - the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds - information. - -``axis``: - String, optional. Assumed to be ``Z`` if omitted or provided. - -Other dimensions (wavelength, ...) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``standard_name``: - **Required**, string. `Standard name `__ - of the variable as defined by the `CF conventions `__, or a commonly used synonym as - employed in the CMIP6 MIP tables. - -``units``: - Optional, a :ref:`CFunit`. Units the diagnostic expects the dimension to be in. **If not provided, the framework will assume CF convention** `canonical units `__. - -``need_bounds``: - Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that bounds are supplied - for this dimension, in addition to its midpoint values, following the - `CF conventions `__: - the ``bounds`` attribute of this dimension will be set to the name of another netCDF variable containing the bounds - information. - -``modifier`` (backward: -String, Optional. Used to distinguish variables that are defined on a vertical level that is not a pressure level -(e.g., 2-meter temperature) from variables that are defined on pressure levels. Modfiers are defined in data/modifiers.jsonc. -MDTF-diagnostics currently supports `atmos_height`. - -.. _sec_varlist: - -Varlist section ---------------- - -This section is an :ref:`object` contains properties that apply to the model variables your diagnostic needs for its analysis. "Dimensions" are meant in the sense of the netCDF `data model `__: informally, they are the "dependent variables" whose values are being computed as a function of the values stored in the dimensions. 
- -**Note** that this includes "auxiliary coordinates" in the CF conventions terminology and similar ancillary information. If your diagnostic needs, eg, cell areas or volumes, orography data, etc., each piece of data should be listed as a separate entry here, *even if* their use is conventionally implied by the use of other variables. - -Each entry corresponds to a distinct data file (or set of files, if ``multi_file_ok`` is ``true``) downloaded by the framework. If your framework needs the same physical quantity sampled with different properties (e.g. slices of a variable at multiple pressure levels), specify them as multiple entries. - -Varlist entry example -^^^^^^^^^^^^^^^^^^^^^ - -.. code-block:: js - - "u500": { - "standard_name": "eastward_wind", - "units": "m s-1", - "realm": "atmos", - "dimensions" : ["time", "lat", "lon"], - "dimensions_ordered": true, - "scalar_coordinates": {"pressure": 500}, - "requirement": "optional", - "alternates": ["another_variable_name", "a_third_variable_name"] - } - - -Varlist entry properties -^^^^^^^^^^^^^^^^^^^^^^^^ - -The *key* in a varlist key-value pair is the name your diagnostic uses to refer to this variable (and must be unique). -The value of the key-value pair is an :ref:`object` containing properties specific to that variable: - -``standard_name``: - String, **required**. `Standard name `__ - of the variable as defined by the `CF conventions `__, or a commonly used synonym as - employed in the CMIP6 MIP tables (e.g. "ua" instead of "eastward_wind"). - - -``units``: - Optional, a :ref:`CFunit`. Units the diagnostic expects the variable to be in. **If not provided, the - framework will assume CF convention** - `canonical units `__. - -``realm": - String, **required**. The CMIP model realm(s) (e.g., atmos, ocean, ice) that the variable belongs to. ``realm`` can be - defined for each variable, or in the `data` section if all POD variables are part of the same model realm(s). - -.. 
_item_var_dims: - -``dimensions``: - **Required**. List of strings, which must be selected the keys of entries in the :ref:`dimensions` - section. Dimensions of the array containing the variable's data. **Note** that the framework will not reorder - dimensions (transpose) unless ``dimensions_ordered`` is additionally set to ``true``. - -``dimensions_ordered``: - Boolean. Optional: assumed ``false`` if not specified. If ``true``, the framework will ensure that the dimensions of - this variable's array are given in the same order as listed in ``dimensions``. **If set to false, your diagnostic is - responsible for handling arbitrary dimension orders**: e.g. it should *not* assume that 3D data will be presented as - (time, lat, lon). If given here, overrides the values set globally in the ``data`` - section (see :ref:`description` there). - -.. _item_var_coords: - - - -``frequency``, ``min_frequency``, ``max_frequency``: - :ref:`Time durations`. Optional. Time frequency at which the variable's data is provided. - If given here, overrides the values set globally in the ``data`` section (see :ref:`description` there). - diff --git a/doc/sphinx/start_config.rst b/doc/sphinx/start_config.rst index 445cbfcdf..fda73674c 100644 --- a/doc/sphinx/start_config.rst +++ b/doc/sphinx/start_config.rst @@ -20,20 +20,20 @@ data source requires that model data follow one of several recognized variable n the package. The currently recognized conventions are: * ``CMIP``: Variable names and units as used in the -`CMIP6 `__ `data request `__. -There is a `web interface `__ to the request. -Data from any model that has been published as part of CMIP6 -(e.g., made available via `ESGF `__) should follow this convention. + `CMIP6 `__ `data request `__. + There is a `web interface `__ to the request. + Data from any model that has been published as part of CMIP6 + (e.g., made available via `ESGF `__) should follow this convention. 
* ``CESM``: Variable names and units used in the default output of models developed at the -`National Center for Atmospheric Research `__, such as -`CAM `__ (all versions) and -`CESM2 `__. + `National Center for Atmospheric Research `__, such as + `CAM `__ (all versions) and + `CESM2 `__. * ``GFDL``: Variable names and units used in the default output of models developed at the -`Geophysical Fluid Dynamics Laboratory `__, such as -`AM4 `__, `CM4 `__ and -`SPEAR `__. + `Geophysical Fluid Dynamics Laboratory `__, such as + `AM4 `__, `CM4 `__ and + `SPEAR `__. The names and units for the variables in the model data you're adding need to conform to one of the above conventions in order to be recognized by the LocalFile data source. For models that aren't currently supported, the workaround we @@ -41,13 +41,19 @@ recommend is to generate ``CMIP``-compliant data by postprocessing model output `CMOR `__ tool. We hope to offer support for the naming conventions of a wider range of models in the future. -Adding your model data files -++++++++++++++++++++++++++++ +Generating an intake-ESM catalog of your model dataset ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +The MDTF-diagnostics package uses `intake-ESM `__ catalogs and APIs to access +model datasets and verify POD data requirements. The package provides a basic +`catalog_builder script `__ +that uses `ecgtools `__ APIs to generate data catalogs. +The NOAA-GFDL workflow team also maintains an `intake-ESM catalog builder +`__ that uses the directory structure to generate data catalogs. +It is optimized for files stored on GFDL systems, but can be configured to generate catalogs on a local file system. -* <*dataset_name*> is any string uniquely identifying the dataset, -* <*frequency*> is a string describing the frequency at which the data is sampled, e.g. - ``1hr``, ``3hr``, ``6hr``, ``day``, ``mon`` or ``year``.
-* <*variable_name*> is the name of the variable in the convention chosen in the previous section. +Adding your observational data files +++++++++++++++++++++++++++++++++++++ If you have observational data you want to analyze available on a locally mounted disk, we recommend creating `symlinks `__ that have the needed filenames, rather than making copies @@ -172,4 +178,3 @@ The output of the package will be saved as a series of web pages in a directory If you run the package multiple times with the same configuration values and **overwrite** set to *false, the suffixes ".v1", ".v2", etc. will be added to duplicate `MDTF_output` directory names. - diff --git a/doc/sphinx/start_install.rst b/doc/sphinx/start_install.rst index b65884671..f81baf18d 100644 --- a/doc/sphinx/start_install.rst +++ b/doc/sphinx/start_install.rst @@ -351,21 +351,26 @@ You can customize either template depending on your preferences; save a copy of <*config_file_path*> and open it in a text editor. The following paths need to be configured before running the framework: -- ``DATA_CATALOG``: set to the path of the ESM-intake data catalog with model input data +- ``DATA_CATALOG``: + set to the path of the ESM-intake data catalog with model input data -- ``OBS_DATA_ROOT``: set to the location of input observational data if you are running PODs that require observational - datasets (e.g., ../inputdata/obs_data). +- ``OBS_DATA_ROOT``: + set to the location of input observational data if you are running PODs that require observational + datasets (e.g., ../inputdata/obs_data). 
-- ``conda_root`` should be set to the location of your conda installation: the value of <*CONDA_ROOT*> +- ``conda_root``: + set to the location of your conda installation: the value of <*CONDA_ROOT*> that was used in :numref:`ref-conda-install` -- ``conda_env_root`` set to the location of the conda environments (should be the same as <*CONDA_ENV_DIR*> in - :numref:`ref-conda-install`) +- ``conda_env_root``: + set to the location of the conda environments (should be the same as <*CONDA_ENV_DIR*> in + :numref:`ref-conda-install`) - ``micromamba_exe``: Set to the full path to micromamba executable on your system if you are using micromamba to manage the conda environments -- Finally, ``OUTPUT_DIR`` should be set to the location you want the output files to be written to +- ``OUTPUT_DIR``: + set to the location where the output files should be written (default: ``mdtf/wkdir/``; will be created by the framework). The output of each run of the framework will be saved in a different subdirectory in this location.