Emmanuel Blondel edited this page Mar 11, 2015 · 41 revisions

Welcome to the GEMS wiki!

GEMS (GIS Enforcing Metadata & Semantics) is a Java application currently developed by the FAO Fishery & Aquaculture Department for the batch creation and web publication of GIS layers and their associated metadata.

Table of Contents

1. Context & Objectives
2. Overview
3. Success stories
3.1 GIS Data collections disseminated with GEMS
3.2 Introducing a methodology for enabling a Geo Linked Open Data
4. Promoting & Implementing standards
4.1 Today
4.2 Tomorrow
5. Understanding GEMS
5.1 Concepts
5.2 How to use it
5.3 Business logic
6. Developing GEMS
6.1 Software architecture
6.2 Third-party Technologies
6.3 Limitations / Perspectives
6.4 Source code
6.5 Issue reporting

### 1. Context & Objectives
***

The need to properly disseminate and describe GIS data on the web is growing and is becoming part of institutional objectives. Best practices are reinforced by the implementation of internationally recognized standards such as ISO and OGC, and of regulations such as the EC INSPIRE directive. To reach these objectives and guarantee that standards are implemented and best practices followed, tools are needed, with data managers as the main target audience.

GEMS is one of these tools. It intends to facilitate the co-management of GIS web data (layers) and their metadata description, in an operational, iterative & cost-efficient approach. GEMS was born from an initial brainstorming in the FIGIS team that highlighted the need to improve the public dissemination of GIS products and to guarantee their data provenance. Its implementation started within the i-Marine Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources.

The GEMS initiative is directly inspired by the objective to promote and implement international standards (OGC, ISO) and follow other approaches like INSPIRE. In this sense, GEMS has benefited from the fruitful interactions within the i-Marine geospatial working group, especially with the French IRD Institute with which guidelines were jointly prepared.

GEMS also builds its strength by interacting with:

Initially developed for publishing GIS metadata for the FAO FI GIS data collections (e.g. aquatic species distributions, RFB competence areas, etc.), GEMS is now used beyond FAO, as external partners use or plan to use it.

The objective is to make its development as collaborative as possible. Hence, the tool aims to be generic, so that it can be used by other institutions that have established a partnership with the FAO Fishery & Aquaculture Department. Among the institutions that might benefit from this tool and possibly contribute to its development is the VLIZ Flanders Marine Institute, where the tool is currently applied to the GIS collection of Exclusive Economic Zones (EEZ).

### 2. Overview
***

GEMS is a tool that makes it easy to disseminate GIS data in a standard way. It provides:

  • a framework that establishes a strong association between GIS data (in a geographic server) and its metadata description (in a metadata catalogue), so that the two resources are not managed separately but unified through the notion of a layer/metadata pair.
  • a simple command line tool for the batch publication of GIS layer/metadata pairs. This simplicity highlights the operational and cost-efficient nature of the GEMS initiative.
  • a single configuration file in which to specify both the data/collection characteristics and the metadata content template.

GEMS goes further by providing bridges to related web information systems, with:

  • the capacity to annotate GIS resources with external resource identifiers, such as Linked Open Data (LOD) URIs, to establish bridges with external information systems and hence open new paths for discovering geographic data.
  • the capacity to link each published GIS product to multiple external web resources in existing information systems and applications, including: geographic server (for map visualization & data download), factsheet resources, thematic FIGIS map viewers, and Linked Open Data.
  • the guarantee of compliance with standards such as OGC and ISO, and with the INSPIRE directive.
### 3. Success stories
***
#### 3.1 GIS Data collections disseminated with GEMS

This section gives an inventory of GIS data collections managed with GEMS, with a strong association established between geographic server and metadata catalogue web-resources:

| Collection | Organization | Status | Nb of Records |
|---|---|---|---|
| FAO Aquatic species distributions | FAO Fishery & Aquaculture Department | publicly available | 791 |
| RFB geographic competence areas | FAO Fishery & Aquaculture Department | publicly available | 49 |
| Exclusive Economic Zones | VLIZ Flanders Marine Institute | publicly available | 249 |
| Vulnerable Marine Ecosystems | FAO Fishery & Aquaculture Department | publicly available | 244 |
#### 3.2 Introducing a methodology for enabling a Geo Linked Open Data

So far, Linked Open Data systems such as the FAO Fishery Linked Open Data (FLOD) have been handling and disseminating domain data graphs. Several factors call for thinking about how Linked Open Data systems can be interconnected with OGC-based information systems: the geographic nature of some domain data (e.g. FAO areas, countries), the need to associate specific geographic distributions with domain information (e.g. distributions of species), the important and growing role of OGC standards in disseminating geographic information, and the need to interconnect existing information systems (with their respective roles).

Geographic information is today disseminated on the web through well-known, internationally recognized standards such as OGC and ISO. This mainly involves instances of a geographic server and a metadata catalogue. GIS metadata is commonly the product discovered and browsed by the user, and acts as the data provider in the sense that the GIS data is accessed through the metadata, which provides all the information required to query the geographic server.

To fit within this well-known and commonly used GIS product dissemination strategy, a methodology has been drafted to connect Linked Open Data and provide a new path to discover geographic data.

This methodology relies on the fact that a GIS resource can be identified with a Uniform Resource Identifier (URI), and introduces a 2-step communication process between GIS information systems (through GEMS) and a Linked Open Data system:

  • GEMS publishes each GIS product, based on some domain information (e.g. a species), and asks the LOD system for the corresponding identifier. This is done by interacting with web service(s). With the LOD identifiers returned, GEMS annotates its GIS products.
  • FLOD asks the GIS information systems for the list of GIS products of a particular collection (e.g. list of species distributions). This list is a mapping that binds GIS metadata to LOD identifiers. Hence, FLOD can easily associate geographic references with existing domain graphs.
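
The two steps can be illustrated with a minimal Java sketch. It is not taken from the GEMS code base: the class name, the method name and the use of plain functions to stand in for the LOD web service and the metadata catalogue are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of the two-step exchange; none of these names come from GEMS. */
final class LodAnnotationSketch {

    /**
     * Step 1: for each entity code, ask the LOD system (modelled as a function)
     * for its URI and bind it to the URL of the published metadata.
     * Step 2: the returned mapping is what a LOD system could request per collection.
     */
    static Map<String, String> bindLodToMetadata(List<String> entityCodes,
                                                 Function<String, String> lodLookup,
                                                 Function<String, String> metadataUrlFor) {
        Map<String, String> binding = new LinkedHashMap<>();
        for (String code : entityCodes) {
            String lodUri = lodLookup.apply(code);
            if (lodUri == null) {
                continue; // entity unknown to the LOD system: nothing to bind
            }
            binding.put(lodUri, metadataUrlFor.apply(code));
        }
        return binding;
    }
}
```

Called with a lookup that knows only some of the codes, the method returns one entry per known code, keyed by LOD URI, in the order of the codelist.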

With LOD domain graphs associated with geographic references through OGC standards, applications based on Linked Open Data can easily inherit geographic information, display maps and query data, without any a priori knowledge of which data to query or where to find it.

Such methodology has been successfully introduced and applied in the SmartFish Chimaera data portal, enabling discovery and display of geographic distributions of species, based on the indexing of documents describing species information.

### 4. Promoting & Implementing standards
***
#### 4.1 Today
  • OGC standards: GEMS intends to align with internationally recognized standards, including OGC and ISO
  • INSPIRE: Development also aims at compliance with the INSPIRE directive in terms of metadata validation
  • Codelists: No standard is currently used for handling entity codelists, given the heterogeneity of data sources, protocols & formats.
  • Linked Open Data annotation: The capacity to annotate GIS resources with LOD identifiers still relies on scattered collection-specific services. No standard is currently implemented in GEMS for the exploitation of LOD information, but the annotation itself is compliant with OGC standards.
#### 4.2 Tomorrow
  • GEMS is expected to gain the capacity to handle standard codelists. LOD annotation is also expected to be unified through a single collection-independent GEMS middleware that inherits LOD information and annotates GEMS products. In both cases, the Virtual-Repository format abstraction libraries and the Grade outcomes could be an asset: the Virtual-Repository, through its plugins, makes codelists easier to consume, and Grade may provide a single, standard (SPARQL) service to supply LOD identifiers for annotating GIS resources.

  • It is expected to implement standard revisions for geographic information, e.g. ISO 19115:2014

### 5. Understanding GEMS
***
#### 5.1 Concepts

The key concept of GEMS is the GeographicEntity. An entity is an object with which a geographic reference can be associated. A single GIS layer/metadata pair is created for one or more geographic entities. For example, for FAO aquatic species distributions, one GIS layer and its associated metadata are created and published for each species: the entity considered here is the species. Until now, the tool has mainly been applied to collections based on a single GeographicEntity. Enabling the publication of a GIS layer & associated metadata for a combination of entities (e.g. 1 species / 1 vessel type / 1 flag state) is under consideration (a typical example is an intersection).

The tool strongly ties data publication to metadata publication. For this, it relies on two different information systems: the geographic server for data, and the metadata catalogue for metadata. Both information systems must implement the OGC web-service standards: mainly WMS (Web Map Service) and WFS (Web Feature Service) for data, and CSW (Catalogue Service for the Web) for metadata. The metadata description follows the ISO 19115/19139 metadata standards (OGC approved).
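
To make the three OGC services more concrete, the following Java sketch builds the standard key-value-pair request URLs for a WMS GetCapabilities, a WFS GetFeature and a CSW GetRecordById call. It is only an illustration: the class and method names are hypothetical, the endpoints and identifiers used with it are placeholders, and it does not show how GEMS itself issues these requests.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper building OGC key-value-pair (KVP) request URLs. */
final class OgcRequestSketch {

    /** Appends parameters to an endpoint, URL-encoding each value. */
    static String kvp(String endpoint, String... keysAndValues) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keysAndValues.length; i += 2) {
            params.put(keysAndValues[i], keysAndValues[i + 1]);
        }
        return endpoint + "?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** WMS: describes the layers that can be rendered as maps. */
    static String wmsGetCapabilities(String wmsEndpoint) {
        return kvp(wmsEndpoint, "service", "WMS", "version", "1.3.0", "request", "GetCapabilities");
    }

    /** WFS 1.1.0: downloads the features of one layer (feature type). */
    static String wfsGetFeature(String wfsEndpoint, String typeName) {
        return kvp(wfsEndpoint, "service", "WFS", "version", "1.1.0",
                "request", "GetFeature", "typeName", typeName);
    }

    /** CSW 2.0.2: retrieves one metadata record in its ISO 19139 (gmd) encoding. */
    static String cswGetRecordById(String cswEndpoint, String metadataId) {
        return kvp(cswEndpoint, "service", "CSW", "version", "2.0.2",
                "request", "GetRecordById", "id", metadataId,
                "elementSetName", "full",
                "outputSchema", "http://www.isotc211.org/2005/gmd");
    }
}
```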

The tool has been developed in close linkage with the FAO Fishery Linked Open Data (FLOD), which aims to set up a semantic repository (or warehouse). Hence, the FIGIS team has been establishing a mapping between the GeographicEntity concept described above and the concept of semantic coded entity. GEMS also includes a process for annotating the GIS metadata documents (through ISO 19115-19139 XML metadata) and the GIS layer (through OGC WxS GetCapabilities Layer properties) with FLOD semantic coded entities, as one instance of external identifiers. The result of this annotation process is a mapping between semantic coded entity and GIS metadata URL (that is, a unique identifier of the GIS resource). This mapping can then be used by semantic repositories such as FLOD, moving towards the dissemination of geographic information through OGC metadata and the discovery of geographic references through a geospatial-enabled Linked Open Data.

#### 5.2 How to use it
Pre-requisites

Currently, GEMS requires:

  • a GeoServer instance (>= 2.1.x)
  • a GeoNetwork instance
  • a spatial database configured as a geographic server datastore, as layer publication is based on virtual SQL view layers (GeoServer). Layer publication is intended to be extended to shapefiles.
  • a reference codelist of the entity collection. Currently, the tool uses specific parsing for each codelist, as reference lists differ and do not follow standards. Hence, depending on the data collection, the Java library must be enriched with a specific codelist parser.
Command Line tool

GEMS includes a simple Java application that can be run from the command line to manage data collections. The application accepts a single input, i.e. an XML configuration file that holds the different sets of settings required to configure the GEMS publication.

Settings

Set of settings required to configure the GEMS publication. The settings are split into 3 parts:

Geographic Server settings

  • Connection: Url, user, password
  • Source info: workspace, layer, attribute(s)
  • Target info: workspace, datastore, layer prefix, publication type
  • Base layers (for metadata thumbnail): list of workspace/layer pairs

Metadata Catalogue settings

  • Connection: URL, user, password

Publication settings

  • Actions: publish/unpublish, force (data, metadata)
  • Codelist: URL, parser (class name)

Metadata content (template)

Set of textual information to be used for all metadata being generated

  • collection name & URL
  • Base title, abstract, purpose, methodology, supplementary info
  • Thesaurus, topic categories
  • License, disclaimer
  • Organization & individual contacts

See the configuration example for details on how to configure GEMS.
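
As a reading aid, the settings listed above can be pictured as a plain Java object. This is a hypothetical sketch only: the actual GEMS configuration model lives in the XML file and its own model classes, and every field name below, as well as the choice of which settings are treated as mandatory, is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory view of the settings; field names are illustrative only. */
final class GemsSettingsSketch {
    // Geographic server settings
    String geoserverUrl, geoserverUser, geoserverPassword;
    String sourceWorkspace, sourceLayer;
    List<String> sourceAttributes = new ArrayList<>();
    String targetWorkspace, targetDatastore, targetLayerPrefix;

    // Metadata catalogue settings
    String catalogueUrl, catalogueUser, cataloguePassword;

    // Publication settings
    boolean publish = true;          // false stands for "unpublish"
    boolean forceData, forceMetadata;
    String codelistUrl, codelistParserClass;

    /** Lists the settings assumed mandatory here that are still empty. */
    List<String> missingSettings() {
        List<String> missing = new ArrayList<>();
        require(missing, "geoserverUrl", geoserverUrl);
        require(missing, "sourceLayer", sourceLayer);
        require(missing, "catalogueUrl", catalogueUrl);
        require(missing, "codelistUrl", codelistUrl);
        require(missing, "codelistParserClass", codelistParserClass);
        return missing;
    }

    private static void require(List<String> missing, String name, String value) {
        if (value == null || value.trim().isEmpty()) {
            missing.add(name);
        }
    }
}
```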

#### 5.3 Business logic

The configuration file is read by the application, and used to configure the batch GEMS publication process.

The reference entity codelist specified in the configuration file is parsed, and the application iterates over its entries using the codelist parser.

For each entity code:

  • the source GIS data collection layer is filtered on the attribute(s) specified in the configuration file.
  • geo-properties are computed using a dedicated Feature Client. Properties include: CRS (Coordinate Reference System), feature count, envelope (actual data & preview), time extent. These properties are used both as layer properties (in GeoServer) and as metadata elements (in GeoNetwork)
  • the metadata is built using the metadata content template retrieved from the configuration file, and published in the metadata catalogue. At this point of the run, the OGC WxS resources referenced in the metadata are still 'offline'
  • the filtered GIS layer is published in GeoServer, which brings online the OGC resources referenced in the metadata
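
The per-entity steps above can be sketched as a small Java loop. This is an illustration under stated assumptions, not the GEMS implementation: the two interfaces, the layer naming scheme and the filter syntax are hypothetical, and the geo-property computation is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the batch loop; interfaces and naming are assumptions. */
final class BatchPublicationSketch {

    /** Stand-in for the metadata catalogue; returns the identifier of the published record. */
    interface MetadataCatalogue { String publish(String entityCode, String layerName); }

    /** Stand-in for the geographic server. */
    interface GeographicServer { void publish(String layerName, String filter, String metadataId); }

    /** Assumed naming scheme: layer prefix plus lower-cased entity code. */
    static String layerName(String prefix, String code) {
        return prefix + "_" + code.toLowerCase(Locale.ROOT);
    }

    /** Attribute filter selecting the features of one entity (quotes are escaped). */
    static String filter(String attribute, String code) {
        return attribute + " = '" + code.replace("'", "''") + "'";
    }

    static List<String> run(List<String> codes, String attribute, String prefix,
                            MetadataCatalogue catalogue, GeographicServer server) {
        List<String> published = new ArrayList<>();
        for (String code : codes) {
            String layer = layerName(prefix, code);
            String entityFilter = filter(attribute, code);
            // Metadata first: its OGC links do not resolve yet ("offline").
            String metadataId = catalogue.publish(code, layer);
            // Layer second: publishing it brings those links online.
            server.publish(layer, entityFilter, metadataId);
            published.add(layer);
        }
        return published;
    }
}
```

Publishing the metadata before the layer mirrors the order described in the list; both interfaces have a single method, so they can be supplied as lambdas when experimenting.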
### 6. Developing GEMS
***
#### 6.1 Software Architecture

GEMS is a Java application. It is built with Maven and is made of 6 modules:

| module name | description |
|---|---|
| gems-main | Defines the types of objects used by the application. Relies mainly on the concepts of GeographicEntity and GeographicMetaObject |
| gems-model | Defines the GEMS configuration model (including the settings and metadata template), together with a simple XML representation of this model |
| gems-feature | Provides the functionalities to retrieve data and compute geoproperties including: feature count, envelope (actual & preview), time extent, CRS. This module relies on a feature client framework and currently provides a GeoAPI-compliant WFS client that enables gzip compression to retrieve data from the web. |
| gems-publisher | Provides the functionalities to publish the data (in GeoServer) and metadata (in GeoNetwork) |
| gems-collection | Handles the collection-specific code (currently a codelist parser and some business code to retrieve LOD information for annotation purposes) |
| gems-application | Main application (shaded JAR) for using GEMS as a command line tool |
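
To illustrate two of the geoproperties handled by gems-feature, the sketch below computes an envelope from a list of coordinates in plain Java. It does not use GeoAPI, Apache SIS or the GEMS feature client. The class is hypothetical, and deriving the preview envelope by expanding the actual one by a margin is purely an assumption, since this page does not say how the preview envelope is obtained.

```java
import java.util.List;

/** Hypothetical sketch; envelopes are arrays {minX, minY, maxX, maxY} in degrees. */
final class GeoPropertiesSketch {

    /** Smallest envelope containing all {x, y} coordinates. */
    static double[] envelope(List<double[]> coordinates) {
        if (coordinates.isEmpty()) {
            throw new IllegalArgumentException("at least one coordinate is required");
        }
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] c : coordinates) {
            minX = Math.min(minX, c[0]);
            minY = Math.min(minY, c[1]);
            maxX = Math.max(maxX, c[0]);
            maxY = Math.max(maxY, c[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    /** Assumed preview rule: expand by a margin, clamped to longitude/latitude bounds. */
    static double[] expand(double[] env, double margin) {
        return new double[] {
            Math.max(-180.0, env[0] - margin),
            Math.max(-90.0, env[1] - margin),
            Math.min(180.0, env[2] + margin),
            Math.min(90.0, env[3] + margin)
        };
    }
}
```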
#### 6.2 Third-party Technologies

GEMS is currently tailored for publishing data in GeoServer and metadata in GeoNetwork.

Geospatial features of GEMS are based on Apache SIS and GeoToolKit (for those features that have not yet been integrated in Apache SIS)

The GEMS WFS Feature client relies on the Java API for RESTful Services (JAX-RS) and Jersey, and GeoToolKit for implementing the GeoAPI
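
The benefit of gzip compression for WFS responses can be illustrated with the JDK alone. The sketch below deliberately does not use JAX-RS or Jersey and is not the GEMS client: it only shows how a response body is decoded depending on its `Content-Encoding` header. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of gzip handling for an HTTP response body. */
final class GzipBodySketch {

    /** Compresses text the way a server would when the client sends "Accept-Encoding: gzip". */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Decodes a body as UTF-8 text, decompressing it when Content-Encoding mentions gzip. */
    static String decode(byte[] body, String contentEncoding) {
        boolean gzipped = contentEncoding != null
                && contentEncoding.toLowerCase(Locale.ROOT).contains("gzip");
        try (InputStream in = gzipped
                ? new GZIPInputStream(new ByteArrayInputStream(body))
                : new ByteArrayInputStream(body)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

GML responses are verbose XML, so they tend to compress well, which is why enabling gzip matters when retrieving features over the web.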

Publication client libraries used are geoserver-manager (for GeoServer) and geonetwork-manager (for GeoNetwork).

The XML Configuration model is based on XStream

#### 6.3 Known limitations / Perspectives

Modelling

For each new entity collection for which metadata has to be created/published, a class implementing the CodelistParser interface has to be written. This is due to the heterogeneity of codelist data sources, protocols and formats. An additional class is often implemented alongside it to retrieve information from FLOD. Future work should simplify these implementation requirements by fully mapping the entity concept to a well-known semantic entity concept, modelled in a semantic ontology. Exploiting the iMarine Top Level Ontology might be considered.
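
As an illustration of what a collection-specific parser involves, here is a minimal Java sketch. Only the name CodelistParser comes from this page: the single-method shape of the interface, the CSV layout ("code,label" with a header row) and the implementing class are assumptions, and the real interface may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; only the name CodelistParser is taken from the text above. */
final class CodelistSketch {

    /** Stand-in for the GEMS CodelistParser interface; this method signature is assumed. */
    interface CodelistParser {
        /** Returns one {code, label} pair per entity found in the codelist content. */
        List<String[]> parse(String content);
    }

    /** Parser for an assumed "code,label" CSV codelist whose first line is a header. */
    static final class CsvCodelistParser implements CodelistParser {
        @Override
        public List<String[]> parse(String content) {
            List<String[]> entities = new ArrayList<>();
            String[] lines = content.split("\\R");
            for (int i = 1; i < lines.length; i++) { // skip the header row
                String line = lines[i].trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split(",", 2);
                String label = parts.length > 1 ? parts[1].trim() : "";
                entities.add(new String[] {parts[0].trim(), label});
            }
            return entities;
        }
    }
}
```

Each new codelist format would need its own implementation of the interface, which is the per-collection cost this section describes.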

Data / Metadata publication

The tool requires the GIS layers to be published in a spatial database datastore (successfully tested on Oracle and PostGIS datastores). Future work could support shapefile as a target datastore (successfully tested locally, but this requires collaborating with the geoserver-manager project to make the enhancements publicly available).

Currently, the tool only supports data publication in GeoServer and metadata publication in GeoNetwork, relying on the Java libraries geoserver-manager and geonetwork-manager. Future work could include other OGC-compliant software with different publication requirements. A possible use case would be to support the Constellation geographic server & the MDWeb metadata catalogue.

#### 6.4 Source code

The GEMS source code is hosted and managed in the Openfigis github repository: https://github.com/openfigis/gems.

#### 6.5 Issue reporting

Issues can be reported at https://github.com/openfigis/gems/issues.
