OceanGliders 1.0 format - Terms of References Navigation: GitHub repository Terms of reference Vocabulary collections |
---|
A working group on harmonization of the glider data format was set up in September 2020 at the request of the OceanGliders steering team. All members of the OceanGliders data management team were invited to join. The three existing formats (at the time) were analysed and compared by the group. Regular technical meetings occurred and the format harmonization process was managed via a Google doc. In late 2021 the group decided to move the document to GitHub to better manage the issues, work asynchronously and contribute to the open source community. The GitHub repository is open to any comments from the community and validation rules were created by the working group (see the governance document). From there the working group organized regular meetings to validate the evolution of the format, discuss issues and reach a consensus on the OG1.0 format. It is envisaged that the format will continually evolve with contributions from the community.
-
Justin Buck, National Oceanography Centre, British Oceanographic Data Centre, UK 0000-0002-3587-4889
-
Guilherme Castelão, Scripps Institution of Oceanography - UC San Diego, USA 0000-0002-6765-0708
-
Emma Gardner, National Oceanography Centre, British Oceanographic Data Centre, UK 0009-0007-9640-7268
-
Daniel Hayes, [OGDMTT co-chair], Cyprus Subsea Consulting and Services C.S.C.S. Ltd, Cyprus 0000-0003-0141-9973
-
John Kerfoot, Institute of Marine & Coastal Sciences, Rutgers University, New Jersey, USA 0000-0003-1405-4248
-
Victor Turpin, [OGDMTT co-chair], OceanGliders Technical Coordinator at OceanOPS, Brest, France 0000-0002-1662-4358
-
Callum Rollo, Voice of the Ocean Foundation, Sweden 0000-0002-5134-7886
-
Jennifer Sevadjian, Scripps Institution of Oceanography - UC San Diego, USA 0000-0001-9286-2043
-
Sören Thomsen, LOCEAN, ISPL, Sorbonne University, Paris, France, 0000-0002-0598-8340
-
Pierre Testor, National Centre for Scientific Research, France 0000-0002-8038-9479
-
Juan Gabriel Fernández Pineda, SOCIB, Mallorca, Spain 0000-0002-8732-7887
-
Thierry Carval, IFREMER, Brest, France 0000-0003-2700-4020
The OceanGliders program brings together marine scientists and glider operators from all over the world who observe the long-term physical, biogeochemical, biological ocean processes, and phenomena relevant for societal applications. It allows active coordination and strengthening the roles of gliders in the ocean observation programs worldwide and contributes to the present international efforts for ocean observation for climate, ocean health and real-time services.
The program oversees the monitoring of global glider activity, a prerequisite for active coordination. By sharing requirements, efforts and scientific knowledge needed for glider data collection OceanGliders aims to continuously develop the network by supporting the dissemination of glider data in global databases, in real-time and delayed mode, for a wider community.
The OceanGliders program was created about 10 years after the popularization of the use of gliders by ocean scientists. With no common rules on format, data managers from Australia (IMOS), the USA (IOOS), and Europe (Coriolis) processed 3 regional formats that are not interoperable.
Harmonization toward interoperability within the existing formats and many national/regional glider networks is a recommendation from the OceanGliders steering team to strengthen the program and reach the FAIR principles (Findable, Accessible, Interoperable, Reusable) adopted by GOOS (Global Ocean Observing System), and better monitor the program activity.
-
The required granularity of the data set is the glider mission, starting from deployment at sea to recovery.
-
Data are recorded as a trajectory Discrete Geometry, using NetCDF[1] system and following CF 1.10[2] (Climate and Forecast) specifications. Each data file contains a series of dive cycles representing the mission of the glider. It can be produced in near real time after every glider transmission and revised later into a recovery-mode (when glider on shore and any data gaps filled in) or a delayed-mode (after rigorous QC) version.
-
Format follows the ACDD 1.3[3] convention where possible.
-
Vocabulary collections will be hosted in different places (NERC Vocabulary Server - NVS, OceanOPS, ICES, etc). The OceanGliders data management team will manage (additions, updates, etc.) the collections. Treatment of vocabularies is detailed in the vocabularies document.
-
OG1.0 defines some variables and definitions, as described in this document. Other types of measurements (e.g. intermediate parameters, technical measurements, other variables) not framed by OG1.0 can be included in OG1.0 data files. No control will be applied to those measurements.
-
Geospatial variables and along-track positioning variables are mandatory.
-
Variables use the standard_name attribute following CF 1.10 taken from the CF standard names table[4].
-
Methodologies used to interpolate data, compute along-track positioning variables, apply QC etc. Should be linked and, where possible, follow community best practices. It is encouraged to use unique resource identifiers (uris) to link to these resources.
-
3 recommendations level have been defined for OG1.0:
-
Mandatory: Minimum metadata set to be compliant with OG1.0 requirement.
-
Highly desirable: Worth having for complete use of the data set.
-
Suggested: If the information is available.
-
-
A DOI can be minted at any level (PI, Reference Data Center, Data Assembly Center, Global Data Assembly Center) following the internal policy of data curation.
-
A DOI can be minted for a single glider mission or multiple glider missions (i.e. project, reference lines).
-
A DOI if included in OG1.0 files needs to be preserved. The DOI must remain unchanged if there is no valuable modification. If valuable information is aggregated/added or a new product produced, a new DOI should be created and the new DOI should link to the original DOI to acknowledge as the source
-
The most effective way of preserving the integrity of the source citation is to preserve the initial DOI added in the OG1.0 file.
-
Data files should be named as follows:
-
"<id>.nc" (ex : "sp065_20210616T143025_R.nc")
-
Where <id> = "<platform_serial>_<start_date>_<data_mode>" (ex : "sp065_20210616T143025_R")
In this case:
-
<platform_serial> = "sp065" a Spray glider number 065
-
<start_date> = "20210616T143025" The datetime in seconds precision 2021-06-16 14:30:25 formatted following ISO 8601
-
<data_mode> = "R" for near real time data
The global attribute section is used for data discovery. The following global attributes should appear in the global section. The NetCDF Climate and Forecast (CF) Metadata Conventions are available from: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#trajectory-data. It is recommended that extra global attributes follow ACDD convention".
Global attribute | Definition | Requirement status | Format, fixed value or example |
---|---|---|---|
title |
A short phrase or sentence describing the dataset. |
mandatory |
ex.: “OceanGliders trajectory file” |
platform |
Name of the platform(s) that supported the sensors data used to create this data set or product. |
mandatory |
ex.: "sub-surface gliders" |
platform_vocabulary |
Controlled vocabulary for the names used in the "platform" attribute. |
mandatory |
|
id |
Formatted mission name: <platform_serial>_<start_date>_<data_mode> |
mandatory |
ex.:
|
naming_authority |
A unique name that identifies the institution who provided the id. ACDD-1.3 recommends using reverse-DNS naming. |
highly desirable |
ex.:
|
institution |
The name of the institution where the original data was produced. |
highly desirable |
ex.:
|
internal_mission_identifier |
The mission identifier used by the institution principally responsible for originating this data |
highly desirable |
ex.:
|
geospatial_lat_min |
Describes a simple lower latitude limit |
suggested |
format: decimal degree |
geospatial_lat_max |
Describes a simple upper latitude limit |
suggested |
format: decimal degree |
geospatial_lon_min |
Describes a simple longitude limit |
suggested |
format: decimal degree |
geospatial_lon_max |
Describes a simple longitude limit |
suggested |
format: decimal degree |
geospatial_vertical_min |
Describes the numerically smaller vertical limit. |
suggested |
format: meter depth |
geospatial_ vertical_max |
Describes the numerically larger vertical limit |
suggested |
format: meter depth |
time_coverage_start |
Datetime of the first measurement in this dataset |
highly desirable |
format: Formatted string: YYYYmmddTHHMMss ex.: 20240425T145805 |
time_coverage_end |
Datetime of the last measurement in this dataset |
highly desirable |
format: Formatted string: YYYYmmddTHHMMss ex.: 20240425T145805 |
site |
The name of the regular sample line or area. |
highly desirable |
ex.: MOOSE_T00, MOOSE_02 |
site_vocabulary |
Controlled vocabulary of the names used in the “site” attribute |
highly desirable |
To be defined |
program |
The overarching program(s) of which the dataset is a part. A program consists of a set (or portfolio) of related and possibly interdependent projects that meet an overarching objective. |
Highly desirable |
ex.: MOOSE glider program |
program_vocabulary |
Controlled vocabulary of the program attribute |
highly desirable |
To be defined |
project |
The name of the project(s) principally responsible for originating this data. Multiple projects can be separated by commas |
suggested |
ex.: SAMBA |
network |
A network is a group of platforms crossing the boundaries of a single program. It can represent a mutual scientific objective, a geographical focus, an array and/or a project. Multiple networks shall be separated by commas. |
suggested |
ex.: Integrated Ocean Observing System (IOOS), Southern California Coastal Ocean Observing System (SCCOOS), Boundary Ocean Observing Network (BOON) |
contributor_name |
Name of the contributors to the glider mission. Multiple contributors are separated by commas. |
PI name is mandatory |
ex.: "Victor Turpin, Callum Rollo" |
contributor_email |
Email of the contributors to the glider mission. Multiple contributors' emails are separated by commas. |
PI email is mandatory |
ex.: "[email protected], [email protected]" |
contributor_id |
Unique id of the contributors to the glider mission. Multiple contributors’ ids are separated by commas. |
highly desirable |
|
contributor_role |
Role of the contributors to the glider mission. Multiple contributors’ roles are separated by commas. |
PI vocabulary is mandatory |
|
contributor_role_vocabulary |
Controlled vocabulary for the roles used in the "contributors_role". Multiple contributors’ roles and vocabularies are separated by commas. |
PI vocabulary is mandatory |
|
contributing_institutions |
Names of institutions involved in the glider mission. Multiple institutions are separated by commas. |
Operator is mandatory |
|
contributing_institutions_vocabulary |
url to the repository of the id |
highly desirable |
|
contributing_institutions_role |
Role of the institutions involved in the glider mission. Multiple institutions' roles are separated by a comma. |
Operator role is mandatory |
|
contributing_institutions_role_vocabulary |
The controlled vocabulary of the role used in the institution’s role. Multiple vocabularies are separated by commas. |
Operator vocabulary is mandatory |
|
uri |
Other universal resource identifiers relevant to be linked to this dataset. Multiple uris are separated by a comma. |
suggested |
ex.: EDIOS, CSR, EDMERP, EDMED, CDI, ICES, etc. |
data_url |
url link to where the OG1.0 data file is hosted |
highly desirable |
|
doi |
The digital object identifier of the OG1.0 data file |
highly desirable |
|
rtqc_method |
The method used by DAC to apply real-time quality control to the data set |
mandatory |
ex.: "Quality control performed with IOOS QARTOD toolbox v2.1", "No QC applied" |
rtqc_method_doi |
The digital object identifier of the methodology used to apply real-time quality control to the data set. |
highly desirable |
ex.: "n/a" |
web_link |
url that provides useful information about anything related to the glider mission. Multiple urls are separated by commas. |
suggested |
|
comment |
Miscellaneous information about the glider mission, data or methods used to produce it. |
suggested |
ex.: "glider recovered after 5 days due to leak" |
start_date |
Datetime of glider deployment |
mandatory |
format: Formatted string: YYYYmmddTHHMMss ex.: 20240425T145805 |
date_created |
date of creation of this data set |
mandatory |
format: Formatted string: YYYYmmddTHHMMss ex.: 20240425T145805 |
featureType |
Description of a single feature with this discrete sampling geometry |
mandatory |
fixed value: "trajectory" |
Conventions |
A comma-separated list of the conventions that are followed by the dataset. |
mandatory |
ex.: "CF-1.10, ACDD-1.3, OG-1.0" |
Name | Definition | Comment |
---|---|---|
N_MEASUREMENTS |
N_MEASUREMENTS = unlimited; |
Number of recorded measurements. |
OG1.0 requirements cover positioning variables and geolocation of any scientific measurements made by the glider during its mission, described in the table below.
VARIABLE NAME | variable attributes | requirement status |
---|---|---|
TIME
|
|
mandatory |
LONGITUDE
|
|
mandatory |
LATITUDE
|
|
mandatory |
DEPTH
|
|
mandatory |
See parameters section for guidance on attributes and conventions on _QC channels.
OG1.0 requirements cover the GPS variables delivered by the glider when at the sea surface, described in the table below.
VARIABLE NAME | variable attributes | requirement status |
---|---|---|
TIME_GPS
|
|
mandatory |
LONGITUDE_GPS
|
|
mandatory |
LATITUDE_GPS
|
|
mandatory |
VARIABLE NAME | variable attributes | requirement status | Format, fixed value or example |
---|---|---|---|
TRAJECTORY
|
TRAJECTORY:cf_role = "trajectory_id" TRAJECTORY:long_name = “trajectory name”; Where the format is <platform_serial>_<start_date> The format for platform_serial is described in the PLATFORM_SERIAL variable documentation. The format for start_date is the ISO 8601 representation of the deployment start UTC datetime with seconds precision. |
mandatory |
ex.: sp041_20210909T160556 |
VARIABLE NAME | variable attributes | requirement status | Format, fixed value or example |
---|---|---|---|
WMO_IDENTIFIER
|
WMO_IDENTIFIER:long_name = “wmo id”; |
mandatory |
ex.: '6801706' |
PLATFORM_MODEL
|
PLATFORM_MODEL:long_name: “model of the glider”; PLATFORM_MODEL:platform_model_vocabulary = “”; |
mandatory |
ex.: "Kongsberg Maritime Seaglider M1 glider" PLATFORM_MODEL: platform_model_vocabulary = “https://vocab.nerc.ac.uk/ collection/B76/current/B7600002/”; |
PLATFORM_SERIAL_NUMBER
|
PLATFORM_SERIAL_NUMBER:long_name = “glider serial number”; |
mandatory |
The platform serial should be constructed from the manufacturer prefix and the platform serial number ex.: "sg001" (seaglider), "unit_001" (slocum), "sea001" (seaexplorer), "sp001" (spray). Where the serial number of the platform is not known, a local nickname can be used e.g. "orca", "sverdrup", "ammonite". |
PLATFORM_NAME
|
PLATFORM_NAME:long_name = “Local or nickname of the glider”; |
highly desirable |
a local nickname for the glider ex.: "orca", "sverdrup", "ammonite". |
PLATFORM_DEPTH_RATING
|
PLATFORM_DEPTH_RATING:long_name = “depth limit in meters of the glider for this mission”; PLATFORM_DEPTH_RATING:convention = “positive value expected - e.g. 100m depth = 100”; |
highly desirable |
ex.: 1000 |
ICES_CODE
|
ICES_CODE:long_name = “ICES platform code of the glider”; ICES_CODE:ices_code_vocabulary = “”; |
highly desirable |
ex.: "7460" ICES_CODE:ices_code_vocabulary = “https://vocab.ices.dk/? codeguid=690d82e2 -b4e2-4e30-9772-7499c66144c6”; |
PLATFORM_MAKER
|
PLATFORM_MAKER:long_name = “glider manufacturer”; PLATFORM_MAKER:platform_maker_vocabulary = “”; |
suggested |
ex.: "Teledyne Webb Research" PLATFORM_MAKER: platform_maker_vocabulary = “https://vocab.nerc.ac.uk/ collection/B75/current/ORG01077/”; |
VARIABLE NAME | variable attributes | requirement status |
---|---|---|
DEPLOYMENT_TIME
|
DEPLOYMENT_TIME:long_name = “date of deployment”; DEPLOYMENT_TIME:standard_name = "time"; DEPLOYMENT_TIME:calendar = "gregorian"; DEPLOYMENT_TIME:units = "seconds since 1970-01-01T00:00:00Z"; |
mandatory |
DEPLOYMENT_LATITUDE
|
DEPLOYMENT_LATITUDE:long_name = “latitude of deployment”; |
mandatory |
DEPLOYMENT_LONGITUDE
|
long_name = “longitude of deployment”; |
mandatory |
VARIABLE NAME | variable attributes | requirement status | Format, fixed value or example |
---|---|---|---|
FIELD_COMPARISON_REFERENCE
|
FIELD_COMPARISON_REFERENCE:long_name = “links (uri or url) to supplementary data that can provide field comparison for platform sensors.”; FIELD_COMPARISON_REFERENCE:comment = “multiple links are separated by a comma” |
highly desirable |
Note: FIELD_COMPARISON_REFERENCE is applicable to deployment, recovery, and delayed versions.
VARIABLE NAME | variable attributes | requirement status | Format, fixed value or example |
---|---|---|---|
GLIDER_FIRMWARE_VERSION
|
GLIDER_FIRMWARE_VERSION:long_name = “version of the internal glider firmware”; |
highly desirable |
ex.: v1.3.22 |
LANDSTATION_VERSION
|
LANDSTATION_VERSION:long_name = “version of the server onshore”; |
highly desirable |
ex.: "dockserver v3.42" |
BATTERY_TYPE
|
BATTERY_TYPE:long_name = “type of the battery”; BATTERY_TYPE:battery_type_vocabulary = “https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/battery_type.md”; |
suggested |
ex.: "lithium rechargeable" BATTERY_TYPE:battery_type_vocabulary = “https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/battery_type.md” |
BATTERY_PACK
|
BATTERY_PACK:long_name = “battery packaging”; |
suggested |
ex.: "2X24 12V battery" |
VARIABLE NAME | variable attributes | requirement status | Format, fixed value or example |
---|---|---|---|
TELECOM_TYPE
|
TELECOM_TYPE:long_name = “types of telecommunication systems used by the glider, multiple telecom type are separated by a comma”; TELECOM_TYPE:telecom_type_vocabulary = “https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/telecom_type.md”; |
highly desirable |
ex.: iridium TELECOM_TYPE:telecom_type_vocabulary = “https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/telecom_type.md” |
TRACKING_SYSTEM
|
TRACKING_SYSTEM:long_name = “type of tracking systems used by the glider, multiple tracking system are separated by a comma”; TRACKING_SYSTEM:tracking_system_vocabulary = “https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/tracking_system.md”; |
highly desirable |
ex.: "gps, accoustic" TRACKING_SYSTEM:tracking_system_vocabulary = "https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/tracking_system.md, https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/tracking_system.md" |
PHASE describes the glider behaviors when at sea. The different behaviors are described in the phase vocabulary (ascent, descent, surfacing, parking, inflection, etc.)
Note that the vocabulary will be fully described and implemented in the control vocabulary tool during the implementation phase.
Phase calculation methodologies need publishing as a best practice document separately to the OG1.0 terms of reference.
VARIABLES NAME | variable attributes | requirement status |
---|---|---|
PHASE
|
PHASE:long_name = “behavior of the glider at sea”; PHASE:phase_vocabulary: “https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/phase.md”; PHASE:_FillValue = 0b ; PHASE:phase_calculation_method = “”; PHASE:phase_calculation_method_vocabulary = “”; PHASE: ancillary_variables = "PHASE_QC" |
Highly desirable |
PHASE_QC
|
PHASE_QC:long_name = "quality flag"; PHASE_QC:_FillValue = " "; PHASE_QC:valid_range = 0b, 1b, 2b, 3b, 4b; PHASE_QC:flag_values = 0b, 1b, 2b, 3b, 4b; PHASE_QC:flag_meanings = "No QC has been applied, Good data, Probably good data, Probably bad data, Bad data" ; |
Highly desirable |
PHASE is used to derive data product (profile, trajectory profiles, gridded product) from OG1.0 data sets. It is recommended to include PHASE when possible.
Each geophysical variable to be recorded must be described with its sensor. Sensor information are stored in an empty and dimensionless variable filled with variable attributes.
Sensor variables are named as follows: SENSOR_<sensor_type>_<sensor_serial_number>
Where <sensor_type> is a controlled vocabulary: see vocabulary guidelines
VARIABLE NAME | variable attributes and requirement status | example |
---|---|---|
SENSOR_<sensor_type>_ <sensor_serial_number> |
string SENSOR_<sensor_type>_<sensor_serial_number> attributes: sensor_type_vocabulary: “” - mandatory long_name: “” - mandatory sensor_model: “” - mandatory sensor_model_vocabulary: “” - mandatory sensor_maker: “” - highly desirable sensor_maker_vocabulary: “” - highly desirable sensor_serial_number: “” - highly desirable sensor_calibration_date: “” - highly desirable - Format iso8601 ("YYYY-MM-DD") |
SENSOR_CTD_206523 attributes: sensor_type_vocabulary: "https://vocab.nerc.ac.uk/collection/L05/current/130” long_name: “RBR Legato3 CTD” sensor_model: “RBR Legato3 CTD” sensor_model_vocabulary: “https://vocab.nerc.ac.uk/collection/L22/current/TOOL1745” sensor_maker: “RBR” sensor_maker_vocabulary: “https://vocab.nerc.ac.uk/collection/L35/current/MAN0049” sensor_serial_number: “206523” sensor_calibration_date: “2021-03-01” |
Geophysical variables are measurements of a physical phenomenon (or an intermediate variable needed to measure a physical phenomenon); It is provided directly by a sensor (in sensor counts or in physical units) or computed (derived) from other parameters (i.e. intermediate parameters).
The fill value should have the same data type as the variable and be outside the range of possible data values.
VARIABLE NAME | variable attributes | requirement status | example |
---|---|---|---|
<PARAM>
|
<PARAM>:long_name = "<X>"; <PARAM>:standard_name = "<X>"; <PARAM>:vocabulary = ""; <PARAM>:_FillValue = <X>; Note: The fill value should have the same data type as the variable and be outside the range of possible data values. <PARAM>:units = "<X>"; <PARAM>:ancillary_variables = "<PARAM>_QC"; <PARAM>:coordinates = "TIME, LONGITUDE, LATITUDE, DEPTH" <PARAM>:sensor = "SENSOR_<sensor_type>_<sensor_serial_number>" |
mandatory <PARAM> contains the values of a parameter listed in the control vocabulary related to OceanGliders parameters. <X>: these fields are specified in the control vocabularies. |
CNDC(N_MEASUREMENT); attributes: long_name = "Electrical conductivity of the water body by CTD" standard_name = "<X>"; vocabulary = "https://vocab.nerc.ac.uk/collection/OG1/current/CNDC/"; _FillValue = NaNf units = "mS cm-1" ancillary_variables = "CNDC_QC"; coordinates = "TIME, LONGITUDE, LATITUDE, DEPTH" sensor = "SENSOR_CTD_206523" |
<PARAM>_QC
|
<PARAM>_QC:long_name = "quality flag"; <PARAM>_QC:_FillValue = " "; <PARAM>_QC:valid_range = 0b, 1b, 2b, 3b, 4b; <PARAM>_QC:flag_values = 0b, 1b, 2b, 3b, 4b; <PARAM>_QC:flag_meanings = "No QC has been applied, Good data, Probably good data, Probably bad data, Bad data" ; <PARAM>_QC:RTQC_methodology = “”; <PARAM>_QC:RTQC_methodology_vocabulary = “”; <PARAM>_QC:RTQC_methodology_doi = “”; |
higly desirable |
Methodologies used to produce files meeting the OG format should be published in the IODE Ocean Best Practice (OBP) repository (https://repository.oceanbestpractices.org/) under the community “OceanGliders” and the collection “data management”. The following methodologies are covered, among others:
-
Interpolation
-
PHASE computation
-
RTQC
Methodologies should describe the computation methods used (typically by Data Assembly Centers) to produce the data set. Ocean Best Practices are assigned a DOI and should be tagged as "OceanGliders" practices by the submitter. Additionally, OBP endorsed by the OceanGliders data management task team will be marked as such.
Management of the evolution of the format will be organized by the OceanGliders data management team.