Documentation / Improve harvesters documentation
josegar74 committed Jan 2, 2024
1 parent 6376582 commit 8f554bc
Showing 2 changed files with 44 additions and 24 deletions.
32 changes: 20 additions & 12 deletions docs/manual/docs/user-guide/harvesting/harvesting-csw.md
@@ -4,16 +4,24 @@ This harvester will connect to a remote CSW server and retrieve metadata records

## Adding a CSW harvester

The figure above shows the options available:
Configuration options:

- **Site** - Options about the remote site.
- *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the CSW harvester.
- *Service URL* - The URL of the capabilities document of the CSW server to be harvested, e.g. <http://geonetwork-site.com/srv/eng/csw?service=CSW&request=GetCapabilities&version=2.0.2>. This document is used to discover the location of the services to call to query and retrieve metadata.
- *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results.
- *Use account* - Account credentials for basic HTTP authentication on the CSW server.
- **Search criteria** - Using the Add button, you can add several search criteria. You can query only the fields recognised by the CSW protocol.
- **Options** - Scheduling options.
- **Options** - Specific harvesting options for this harvester.
- *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
- **Privileges** - Assign privileges to harvested metadata.
- **Categories**
- **Identification**:
- *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the harvester.
- *Group* - Group that owns the harvested metadata.
- *User* - User that owns the harvested metadata.
- *Action on UUID collision* - The action to take when the harvester finds a record whose UUID matches a record collected by another method (another harvester, importer, dashboard editor, ...).
- skipped (default)
- overridden
- generate a new UUID
- **Schedule** - Schedule configuration to execute the harvester.
- **Configuration for protocol OGC CSW 2.0.2**:
- *Service URL* - The URL of the capabilities document of the CSW server to be harvested, e.g. `http://geonetwork-site.com/srv/eng/csw?service=CSW&request=GetCapabilities&version=2.0.2`. This document is used to discover the location of the services to call to query and retrieve metadata.
- *Search filter* - Using the Add button, you can add several search criteria.
- *XPath filter* - When a record is retrieved from the remote server, an XPath expression is evaluated to accept or discard it. The XPath must use the namespaces of the record's schema (e.g. `gmd`, `gco`, `srv` for ISO19139) and must return a boolean value. For example, to keep only records with status = completed: `count(.//gmd:status/*[@codeListValue = 'completed']) > 0`.
- **Advanced options for protocol CSW**:
- *Remote authentication* - Account credentials for basic HTTP authentication on the CSW server.
- *Check for duplicate resources based on the resource identifier* - Checks whether another metadata record with the same resource identifier already exists and, if so, discards the harvested record. The comparison is made on the element `gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:identifier/*/gmd:code/gco:CharacterString`. It only applies to records in ISO19139 or ISO profiles.
- *Category for harvested records* - The harvested metadata will be assigned to the selected category.
- *Validate records before import* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
- **Privileges** - Assign privileges to harvested metadata.
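
The *XPath filter* expression described above can be tried against a sample record before configuring a harvester. The following is a minimal sketch and not GeoNetwork's implementation: it uses Python's standard `xml.etree.ElementTree`, which does not support `count()`, so the test is approximated with `findall`. The function name `accept_record` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes commonly used by ISO19139 records.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical, heavily trimmed ISO19139 record used only for illustration.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:status>
        <gmd:MD_ProgressCode codeListValue="completed"/>
      </gmd:status>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def accept_record(xml_text: str) -> bool:
    """Approximate count(.//gmd:status/*[@codeListValue = 'completed']) > 0."""
    root = ET.fromstring(xml_text)
    # ElementTree supports the [@attrib='value'] predicate but not count(),
    # so collect the matching elements and test whether any were found.
    matches = root.findall(".//gmd:status/*[@codeListValue='completed']", NS)
    return len(matches) > 0
```

Calling `accept_record(SAMPLE)` returns `True` for the sample; a record whose status code list value differs would be discarded by the same check.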
36 changes: 24 additions & 12 deletions docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md
@@ -6,18 +6,30 @@ This harvester will harvest metadata as XML files from a filesystem available on

The figure above shows the options available:

- **Site** - Options about the remote site.
- *Name* - This is a short description of the filesystem harvester. It will be shown in the harvesting main page as the name for this instance of the Local Filesystem harvester.
- *Directory* - The path name of the directory containing the metadata (as XML files) to be harvested.
- *Recurse* - If checked and the *Directory* path contains other directories, then the harvester will traverse the entire file system tree in that directory and add all metadata files found.
- *Keep local if deleted at source* - If checked then metadata records that have already been harvested will be kept even if they have been deleted from the *Directory* specified.
- *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results.
- **Options** - Scheduling options.
- **Harvested Content** - Options that are applied to harvested content.
- *Apply this XSLT to harvested records* - Choose an XSLT here that will convert harvested records to a different format.
- *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
- **Privileges** - Assign privileges to harvested metadata.
- **Categories**
- **Identification**:
- *Name* - This is a short description of the filesystem harvester. It will be shown in the harvesting main page as the name for this instance of the Local Filesystem harvester.
- *Group* - Group that owns the harvested metadata.
- *User* - User that owns the harvested metadata.
- **Schedule** - Scheduling options.
- **Configure connection to Directory**:
- *Directory* - The path name of the directory containing the metadata (as XML files) to be harvested.
- *Also search in subfolders* - If checked and the *Directory* path contains other directories, then the harvester will traverse the entire file system tree in that directory and add all metadata files found.
- *Script to run before harvesting*
- *Type of record*
- **Configure response processing for filesystem**:
- *Action on UUID collision* - The action to take when the harvester finds a record whose UUID matches a record collected by another method (another harvester, importer, dashboard editor, ...).
- skipped (default)
- overridden
- generate a new UUID
- **Filtering and processing response** - Options that are applied to harvested content.
- *Update catalog record only if file was updated*
- *Keep local if deleted at source* - If checked then metadata records that have already been harvested will be kept even if they have been deleted from the *Directory* specified.
- *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
- *Apply this XSLT to harvested records* - Choose an XSLT here that will convert harvested records to a different format.
- *Batch edits*
- *Category for harvested records* - The harvested metadata will be assigned to the selected category.

- **Privileges** - Assign privileges to harvested metadata.
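
To preview which files the *Directory* and *Also search in subfolders* options would pick up, a rough sketch in Python follows. It assumes only that metadata files end in `.xml`; the function name `find_metadata_files` is hypothetical, and this is not how the harvester itself is implemented.

```python
from pathlib import Path
from typing import List


def find_metadata_files(directory: str, recurse: bool) -> List[Path]:
    """List the XML files under `directory`, optionally including subfolders."""
    root = Path(directory)
    # "**/*.xml" walks the whole tree (top level included); "*.xml" stays at the top level.
    pattern = "**/*.xml" if recurse else "*.xml"
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

With `recurse=False` only the XML files directly inside the directory are listed, which mirrors leaving *Also search in subfolders* unchecked.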

!!! Notes
