Implements a pipeline to retrieve BSO data from the Smapshot platform.
- Copy the provided `.env.example` file: `cp .env.example .env`
- Edit the `.env` file:
  - `ENDPOINT`: SPARQL Graph Store endpoint for data ingest
  - `GRAPH`: Named graph used for ingesting data
  - `USERNAME`: Username of endpoint with permissions to edit specified graph
  - `PASSWORD`: Password of endpoint with permissions to edit specified graph
- Run with `docker-compose up -d`
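Before starting the container, it can help to confirm that all four variables are present. The following is a minimal sketch, assuming the `.env` file uses plain `KEY=value` lines; `check_env_file` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper (not part of this repository): report which of the
# four documented keys are missing or empty in a dotenv-style file.
# The body runs in a subshell so its variables do not leak to the caller.
check_env_file() (
  file=$1
  status=0
  for key in ENDPOINT GRAPH USERNAME PASSWORD; do
    # A key counts as set when a line starts with KEY= followed by
    # at least one character.
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: ${key}" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Calling `check_env_file .env` returns a non-zero status if any key is missing, so it can be chained as `check_env_file .env && docker-compose up -d`. Note that a quoted empty value such as `GRAPH=""` would still pass this simple check.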
To execute the pipeline, run:
bash run.sh
To execute individual steps of the pipeline, use the Task runner. Display a list of available tasks by running:
docker exec bso_smapshot task --list
Note: Replace `bso_smapshot` with your container name if necessary.
The command will output the list of tasks:
task: Available tasks for this project:
* ingest: Ingest the data into a named graph via the specified SPARQL Graph Store endpoint
* perform-mapping: Map Smapshot XML data in the temporary folder to CIDOC/RDF
* prepare-mapping: Compare the retrieved XML files with the mapped TTL files and copy all unconverted XML files to a temporary folder.
* retrieve-data: Download the detailed data for all validated images in the SARI/BSO collection into the `/data` folder. The data is converted to XML for later mapping using the X3ML Mapping Engine.
* run: Run entire pipeline
Run a given task as follows:
docker exec bso_smapshot task <task name>
e.g.
docker exec bso_smapshot task retrieve-data
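To run several tasks in a row without the final ingest, a small wrapper around `docker exec` may be convenient. This is only a sketch: `run_steps` is a hypothetical helper, and the order used in the usage comment is an assumption inferred from the task descriptions, not something the project states.

```shell
# Hypothetical helper: run the given tasks one after another inside the
# container and stop at the first task that fails.
# Usage (task order assumed from the task descriptions):
#   run_steps bso_smapshot retrieve-data prepare-mapping perform-mapping
run_steps() (
  container=$1
  shift
  for step in "$@"; do
    echo "Running task: $step"
    docker exec "$container" task "$step" || {
      echo "Task failed: $step" >&2
      exit 1
    }
  done
)
```

Stopping at the first failure avoids mapping or ingesting data from an incomplete earlier step.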
The ingest step sends the data to an RDF Graph Store backend. The Metaphacts and ResearchSpace platforms both implement the Graph Store protocol.
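For orientation, a Graph Store upload is an HTTP request to the endpoint with the target graph passed as a `graph` query parameter. The sketch below shows what such a request could look like with `curl`; it is not taken from the pipeline's own ingest task. The helper names are hypothetical, and the use of `POST` (which merges into the graph, whereas `PUT` replaces it) and of Turtle as the payload format are assumptions.

```shell
# Percent-encode a string for use as a query parameter value.
# Intended for ASCII input such as graph IRIs.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s' "$out"
)

# Build the request URL: <endpoint>?graph=<encoded graph IRI>
graph_store_url() {
  printf '%s?graph=%s' "$1" "$(urlencode "$2")"
}

# Hypothetical helper: upload one Turtle file ($1) to the named graph.
# Reads ENDPOINT, GRAPH, USERNAME and PASSWORD from the environment.
push_turtle() {
  curl --fail --silent --show-error \
    --user "${USERNAME}:${PASSWORD}" \
    --request POST \
    --header 'Content-Type: text/turtle' \
    --data-binary "@$1" \
    "$(graph_store_url "$ENDPOINT" "$GRAPH")"
}
```

With the variables from `.env` exported, `push_turtle some-file.ttl` would send that file to the configured graph; the file name here is a placeholder.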
For security reasons, it's best to create a new user on the platform that has access only to the required operation and named graph. For Metaphacts and ResearchSpace, create or edit the `shiro-roles.ini` file and add a role with permissions for the named graph to which the data will be pushed. The role name can be chosen freely.
For example, to create a role `push-smapshot` with permissions for the named graph `https://resource.swissartresearch.net/graph/smapshot`, add:
[roles]
push-smapshot=sparql:graphstore:*:<https://resource.swissartresearch.net/graph/smapshot>
Then create a new user that has (only) the newly created role.