Documentation / Improve harvesters documentation

GeoCat · Jan 2, 2024 · 8f554bc
1 parent 6376582
Showing 2 changed files with 44 additions and 24 deletions.
diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-csw.md b/docs/manual/docs/user-guide/harvesting/harvesting-csw.md
@@ -4,16 +4,24 @@ This harvester will connect to a remote CSW server and retrieve metadata records
 
 ## Adding a CSW harvester
 
-The figure above shows the options available:
+Configuration options:
 
--   **Site** - Options about the remote site.
-    -   *Name* - This is a short description of the remote site. It will be shown in the harvesting main page as the name for this instance of the CSW harvester.
-    -   *Service URL* - The URL of the capabilities document of the CSW server to be harvested. eg. <http://geonetwork-site.com/srv/eng/csw?service=CSW&request=GetCabilities&version=2.0.2>. This document is used to discover the location of the services to call to query and retrieve metadata.
-    -   *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results.
-    -   *Use account* - Account credentials for basic HTTP authentication on the CSW server.
--   **Search criteria** - Using the Add button, you can add several search criteria. You can query only the fields recognised by the CSW protocol.
--   **Options** - Scheduling options.
--   **Options** - Specific harvesting options for this harvester.
-    -   *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
--   **Privileges** - Assign privileges to harvested metadata.
--   **Categories**
+- **Identification**:
+    - *Name* - A short description of the remote site. It is shown on the harvesting main page as the name of this harvester instance.
+    - *Group* - Group that owns the harvested metadata.
+    - *User* - User that owns the harvested metadata.
+    - *Action on UUID collision* - The action to take when the harvester finds a record whose UUID matches a record collected by another method (another harvester, importer, dashboard editor, ...).
+        - skipped (default)
+        - overridden
+        - generate a new UUID
+- **Schedule** - Schedule configuration to execute the harvester.
+- **Configuration for protocol OGC CSW 2.0.2**:
+    - *Service URL* - The URL of the capabilities document of the CSW server to be harvested, e.g. `http://geonetwork-site.com/srv/eng/csw?service=CSW&request=GetCapabilities&version=2.0.2`. This document is used to discover the location of the services to call to query and retrieve metadata.
+    - *Search filter* - Using the Add button, you can add several search criteria.
+    - *XPath filter* - When a record is retrieved from the remote server, an XPath expression is evaluated to accept or discard it. The XPath must use the namespaces of the record's schema (e.g. `gmd`, `gco`, `srv` for ISO19139) and must return a boolean value. For example, to keep only records with status = completed: `count(.//gmd:status/*[@codeListValue = 'completed']) > 0`.
+- **Advanced options for protocol CSW**:
+    - *Remote authentication* - Account credentials for basic HTTP authentication on the CSW server.
+    - *Check for duplicate resources based on the resource identifier* - Checks whether another metadata record with the same resource identifier already exists and, if so, discards the harvested record. The comparison is made on the element `gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:identifier/*/gmd:code/gco:CharacterString`. It only applies to records in ISO19139 or ISO profiles.
+    - *Category for harvested records* - The harvested metadata will be assigned to the selected category.
+    - *Validate records before import* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
+- **Privileges** - Assign privileges to harvested metadata.
diff --git a/docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md b/docs/manual/docs/user-guide/harvesting/harvesting-filesystem.md
@@ -6,18 +6,30 @@ This harvester will harvest metadata as XML files from a filesystem available on
 
 The figure above shows the options available:
 
--   **Site** - Options about the remote site.
-    -   *Name* - This is a short description of the filesystem harvester. It will be shown in the harvesting main page as the name for this instance of the Local Filesystem harvester.
-    -   *Directory* - The path name of the directory containing the metadata (as XML files) to be harvested.
-    -   *Recurse* - If checked and the *Directory* path contains other directories, then the harvester will traverse the entire file system tree in that directory and add all metadata files found.
-    -   *Keep local if deleted at source* - If checked then metadata records that have already been harvested will be kept even if they have been deleted from the *Directory* specified.
-    -   *Icon* - An icon to assign to harvested metadata. The icon will be used when showing harvested metadata records in the search results.
--   **Options** - Scheduling options.
--   **Harvested Content** - Options that are applied to harvested content.
-    -   *Apply this XSLT to harvested records* - Choose an XSLT here that will convert harvested records to a different format.
-    -   *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
--   **Privileges** - Assign privileges to harvested metadata.
--   **Categories**
+- **Identification**:
+    - *Name* - A short description of the filesystem harvester. It is shown on the harvesting main page as the name of this instance of the Local Filesystem harvester.
+    - *Group* - Group that owns the harvested metadata.
+    - *User* - User that owns the harvested metadata.
+- **Schedule** - Scheduling options. 
+- **Configure connection to Directory**:  
+    - *Directory* - The path name of the directory containing the metadata (as XML files) to be harvested.
+    - *Also search in subfolders* - If checked and the *Directory* path contains other directories, then the harvester will traverse the entire file system tree in that directory and add all metadata files found.
+    - *Script to run before harvesting*
+    - *Type of record*
+- **Configure response processing for filesystem**:
+    - *Action on UUID collision* - The action to take when the harvester finds a record whose UUID matches a record collected by another method (another harvester, importer, dashboard editor, ...).
+        - skipped (default)
+        - overridden
+        - generate a new UUID
+- **Filtering and processing response** - Options that are applied to harvested content.
+    - *Update catalog record only if file was updated*
+    - *Keep local if deleted at source* - If checked, metadata records that have already been harvested are kept even if they have been deleted from the specified *Directory*.
+    - *Validate* - If checked, the metadata will be validated after retrieval. If the validation does not pass, the metadata will be skipped.
+    - *Apply this XSLT to harvested records* - Choose an XSLT here that will convert harvested records to a different format.
+    - *Batch edits*
+      - *Category for harvested records* - The harvested metadata will be assigned to the selected category.
+
+- **Privileges** - Assign privileges to harvested metadata.
 
 !!! Notes