Proposal #1: custom metadata mapping
The CKAN CSW harvester (implemented in the ckanext-spatial extension) extracts information from ISO 19139 records using a wide but fixed set of XPath expressions.
Site developers and admins may need to extract other information that is not already mapped.
We need a way to add new field mappings without editing the Python code.
It shall be possible to import any field from the harvested ISO record into the dataset extras fields.
Two things are needed for each mapping: the name of the extra field that should be created, and the XPath of the information to extract from the metadata record.
The harvester configuration will optionally contain the extras_mappings field.
It will be a map with these contents:
- key: the name of the extra field that will be created
- value: the XPath of the data that will be extracted
An XPath expression selects a node-set from an XML document, so we need to set some constraints and encoding rules:
- Text nodes only: We only want to handle XPath expressions that select text nodes. If the XPath does not select text, the harvester may throw an error.
- Multiple values: If more than one text node is selected by a single XPath, the corresponding extras field will contain a list of strings encoded as a JSON array.
- Empty values: If the XPath does not select anything, the extras field will be created as an empty string.
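The three rules above can be sketched as a small encoding function. This is only an illustration, not the harvester's code: the name encode_extra_value is hypothetical, and the input is assumed to be the list an XPath evaluation returns. The proposal does not say how a single selected text node is stored; the sketch assumes it is kept as a plain string.

```python
import json


def encode_extra_value(selected):
    """Turn the result of one XPath evaluation into an extras value.

    `selected` is assumed to be a list of the selected nodes, where
    text nodes are represented as Python strings.
    """
    # Text nodes only: anything that is not a string is rejected.
    for item in selected:
        if not isinstance(item, str):
            raise ValueError("XPath must select text nodes only")

    # Empty values: nothing selected gives an empty string.
    if not selected:
        return ""

    # Assumption: a single text node is stored as a plain string.
    if len(selected) == 1:
        return selected[0]

    # Multiple values: a list of strings encoded as a JSON array.
    return json.dumps(list(selected))
```

Raising an error for non-text results matches the "may throw an error" wording; a real implementation might prefer to log the problem and skip the field instead.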
Example:
{
  ... other configuration fields ... ,
  "extras_mappings": {
    "servicepurpose": "//gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:purpose/gco:CharacterString/text()",
    "mytitle": "//gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString/text()"
  }
}
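Since the field is optional, code reading the configuration has to cope with its absence. A minimal sketch, assuming the harvest source configuration arrives as a JSON string; the function name read_extras_mappings and the validation rules are illustrative assumptions, not part of the proposal:

```python
import json


def read_extras_mappings(config_str):
    """Return the extras_mappings dict from a JSON config string.

    A missing field (or an empty configuration) yields an empty dict,
    because the field is optional.
    """
    config = json.loads(config_str) if config_str else {}
    if not isinstance(config, dict):
        raise ValueError("configuration must be a JSON object")

    mappings = config.get("extras_mappings", {})
    if not isinstance(mappings, dict):
        raise ValueError("extras_mappings must be a JSON object")

    # Each value must be a non-empty XPath string (assumed rule).
    for name, xpath in mappings.items():
        if not isinstance(xpath, str) or not xpath.strip():
            raise ValueError("invalid XPath for extra field %r" % name)

    return mappings
```

Validating the mappings when the configuration is saved, rather than during harvesting, would surface mistakes before any record is fetched.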
The existing implementation that extracts data from the ISO records is split into two steps:
- The first step extracts the bare information from the record; you can find it here:
http://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/model/harvested_metadata.py
At the end of the same file there are some infer_* methods that try to extract single-valued data from a set of nodes.
- The second step is performed in the import stage: the method get_package_dict() in file base.py
http://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L154
maps the extracted values into static and extras fields in the CKAN dataset.
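The custom mappings would most naturally be merged in during the second step. The sketch below is an assumption about how that could look, not the existing code: add_mapped_extras is a hypothetical helper, mapped_values is assumed to hold the already extracted and encoded values keyed by extra field name, and extras are assumed to be a list of key/value dicts in the package dict. The proposal does not say what happens when a custom name collides with an existing extra; here the custom value replaces it.

```python
def add_mapped_extras(package_dict, mapped_values):
    """Merge custom extras into a CKAN-style package dict.

    `mapped_values` maps each extra field name to its encoded value.
    """
    extras = package_dict.setdefault("extras", [])
    # Index the existing extras so that name collisions can be detected.
    by_key = {extra["key"]: extra for extra in extras}

    for key, value in mapped_values.items():
        if key in by_key:
            # Assumption: a custom mapping overrides an existing extra.
            by_key[key]["value"] = value
        else:
            extras.append({"key": key, "value": value})

    return package_dict
```

For example, calling it with {"servicepurpose": "Map viewing"} on a package dict without that extra appends {"key": "servicepurpose", "value": "Map viewing"} to its extras list.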