Include nco-yaml representations of datasets (and script)
As an alternative to the nccsv source-of-truth datasets proposed
in #51, store text yaml representations of the gold standard
datasets based on nco-json output (nco-yaml, if you will, which is
more readable than json because it has no brackets).

Also included is a script to easily (re-)generate the yaml
representations from the source of truth NetCDF files, and
to check if the yaml representations are up to date
(possibly used in a future pre-commit hook, GitHub action, etc).
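
The pre-commit idea mentioned above could look roughly like the sketch below. It is not part of this commit: the hook itself, the `check_nco_yaml` function name, and the `CHECK_SCRIPT` override variable are all assumptions made for illustration. It relies only on the documented behavior that `datasets.yml.sh -c` exits non-zero when a yaml file is missing or stale.

```shell
#!/bin/bash
# Hypothetical .git/hooks/pre-commit sketch (not part of this commit).
# Git runs hooks from the repository root, so the relative path below resolves.

# Path to the check script added in this commit; the override is only for testing.
CHECK_SCRIPT="${CHECK_SCRIPT:-./datasets-yml/datasets.yml.sh}"

check_nco_yaml() {
    # -c selects check mode: the script exits non-zero when a yaml file
    # is missing or differs from freshly generated output.
    if ! bash "$CHECK_SCRIPT" -c; then
        echo "nco-yaml files are out of date; run $CHECK_SCRIPT and stage the results" >&2
        return 1
    fi
}

# In a real hook, the last line would be:
#   check_nco_yaml || exit 1
```

The same function could be called from a CI job, since both only need the script's exit status.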
srstsavage committed Dec 13, 2023
1 parent 479a1dc commit 6f16e58
Showing 7 changed files with 1,504 additions and 3 deletions.
4 changes: 2 additions & 2 deletions DasDds.sh
@@ -3,6 +3,6 @@ docker run --rm -it \
-v "$(pwd)/datasets:/datasets" \
-v "$(pwd)/logs:/erddapData/logs" \
-v "$(pwd)/erddap/content:/usr/local/tomcat/content/erddap" \
- axiom/docker-erddap:latest \
+ -e ERDDAP_flagKeyKey=flag-key-not-needed-for-dasdds \
+ axiom/docker-erddap:2.23-jdk17-openjdk \
bash -c "cd webapps/erddap/WEB-INF/ && bash DasDds.sh"

10 changes: 10 additions & 0 deletions README.md
@@ -9,3 +9,13 @@ Uses the [ERDDAP Docker image](https://github.com/axiom-data-science/docker-erdd
You can view this setup live at <https://standards.sensors.ioos.us/erddap/index.html>.

Documentation can be found at <https://ioos.github.io/erddap-gold-standard/>.

# Datasets

The gold standard datasets are
[IOOS Metadata Profile](https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html)
compliant NetCDF files stored in the `./datasets` directory.

YAML text representations of the NetCDF metadata (based on nco-json output) are also
included in the `./datasets-yml` directory for easy discoverability and access.
See the `./datasets-yml/datasets.yml.sh` script for more details.
62 changes: 62 additions & 0 deletions datasets-yml/datasets.yml.sh
@@ -0,0 +1,62 @@
#!/bin/bash
# generate nco-yaml (readable yaml representation of nco-json)
# representations of gold standard datasets
# usage:
#   generate nco-yaml files from datasets/*.nc files:
#     ./datasets-yml/datasets.yml.sh
#   check that nco-yaml files are up-to-date:
#     ./datasets-yml/datasets.yml.sh -c
#
# requirements:
#   ncks/nco (https://nco.sourceforge.net/nco.html)
#   yq v4 (https://mikefarah.gitbook.io/yq/)

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
# stop if the script directory can't be entered; relative paths below depend on it
cd "$DIR" || exit 1

CHECK=0
while getopts ":c" opt; do
  case ${opt} in
    c )
      CHECK=1
      ;;
    \? )
      echo "Invalid option: $OPTARG" 1>&2
      ;;
    : )
      echo "Invalid option: $OPTARG requires an argument" 1>&2
      ;;
  esac
done
shift $((OPTIND -1))

# check for required commands
for c in ncks yq; do
  if ! command -v "$c" &> /dev/null; then
    echo "Command $c is required but missing" >&2
    exit 1
  fi
done

# write nco-yaml for each dataset
for d in ../datasets/*.nc; do
  # quote paths so dataset names containing spaces are handled correctly
  YML="$(basename "$d").yml"
  NCO_YAML="$(ncks -mM --json "$d" | yq -p=json)"
  if [ "$CHECK" == "1" ]; then
    # if in check mode, make sure yaml files exist for all datasets
    # and don't differ from generated output
    if [ ! -f "$YML" ]; then
      echo "$d doesn't have an existing yml file" >&2
      exit 1
    fi
    NC_CHECKSUM=$(echo -n "$NCO_YAML" | md5sum | cut -f1 -d' ')
    YAML_CHECKSUM=$(md5sum "$YML" | cut -f1 -d' ')
    if [ "$NC_CHECKSUM" != "$YAML_CHECKSUM" ]; then
      echo "$d metadata checksum differs from existing yaml file" >&2
      exit 1
    fi
  else
    # if not in check mode, write nco-yaml
    echo -n "$NCO_YAML" > "$YML"
  fi
done
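
One detail of the check mode above is easy to miss: `$(...)` strips trailing newlines from the generated text and `echo -n` writes it back without one, so the stored yaml files have no final newline and the checksums only match while that stays true. The sketch below isolates the comparison into a helper to illustrate this. The `matches_file` name is hypothetical and not part of the commit; it uses `printf '%s'` in place of `echo -n`, which writes the same bytes for this purpose.

```shell
#!/bin/bash
# Sketch of the check-mode comparison, isolated into a hypothetical helper.

# matches_file TEXT FILE
# Succeeds when TEXT, written without a trailing newline, has the same
# md5 checksum as the contents of FILE.
matches_file() {
    local text_sum file_sum
    # printf '%s' emits TEXT exactly, with no newline appended
    text_sum=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
    # reading from stdin keeps the file name out of md5sum's output
    file_sum=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$text_sum" = "$file_sum" ]
}
```

With this helper, a file written by `printf '%s' "$text" > file` matches, while the same text saved by an editor that appends a final newline does not. That would make the `-c` check fail even though the metadata is unchanged.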