docs: Explain the why
ddeboer committed May 14, 2024
1 parent 8fa7e07 commit c6c30b1
Showing 1 changed file with 12 additions and 8 deletions.
20 changes: 12 additions & 8 deletions README.md
@@ -1,20 +1,24 @@
# LD Workbench

LD Workbench is a transformation tool for linked data that is designed to use SPARQL as its main configuration language.
LD Workbench is a command-line tool for transforming large RDF datasets using pure SPARQL.

LD Workbench is a Command Line Interface (CLI) application, tested in Linux Bash, macOS Z shell, and Windows PowerShell.

This project is currently in a Proof-of-Concept phase. Feel free to watch our progress, but please do not use this project in a production setting.
This project is currently in a Proof-of-Concept phase.

## Approach

A *pipeline* is a sequence of *stages*.
The main design principles are scalability and extensibility.

### Scalability

LD Workbench is **scalable** due to its iterator/generator approach:

Each *stage* consists of two components: an *iterator* and a *generator*.
* the **iterator** component fetches URIs using a SPARQL SELECT query, paginating results using SPARQL `OFFSET` and `LIMIT` (binding each URI to a `$this` variable)
* the **generator** component then runs a SPARQL CONSTRUCT query for each URI ([pre-binding](https://www.w3.org/TR/shacl/#pre-binding) `$this` to the URI), which returns the transformed result.
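
The two components above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not LD Workbench's actual implementation: the function names (`iterate`, `pre_bind`, `generate`), the example queries, and the in-memory `fake_select` / `fake_construct` stand-ins for a SPARQL endpoint are all hypothetical. Pre-binding is approximated here by textual substitution of `$this`, which is a simplification of the SHACL definition.

```python
import re
from typing import Callable, Iterable, Iterator, List

# Hypothetical example queries; the README does not show real ones.
# Note: on a real endpoint, stable paging generally also needs ORDER BY.
SELECT_QUERY = "SELECT $this WHERE { $this a <http://example.org/Item> }"
CONSTRUCT_QUERY = (
    "CONSTRUCT { $this a <http://schema.org/Thing> } WHERE { $this ?p ?o }"
)

QueryFn = Callable[[str], List[str]]  # takes a query string, returns result rows


def iterate(select: QueryFn, query: str, page_size: int) -> Iterator[str]:
    """Iterator: yield URIs page by page using LIMIT and OFFSET."""
    offset = 0
    while True:
        page = select(f"{query} LIMIT {page_size} OFFSET {offset}")
        yield from page
        if len(page) < page_size:  # a short (or empty) page means we are done
            return
        offset += page_size


def pre_bind(query: str, uri: str) -> str:
    """Simplified pre-binding: substitute $this with the URI as text."""
    return query.replace("$this", f"<{uri}>")


def generate(construct: QueryFn, query: str, uris: Iterable[str]) -> Iterator[str]:
    """Generator: run one CONSTRUCT query per URI and yield its triples."""
    for uri in uris:
        yield from construct(pre_bind(query, uri))


# In-memory stand-ins for a SPARQL endpoint (assumptions, for illustration only).
DATA = [f"http://example.org/item/{i}" for i in range(5)]


def fake_select(query: str) -> List[str]:
    limit = int(re.search(r"LIMIT (\d+)", query).group(1))
    offset = int(re.search(r"OFFSET (\d+)", query).group(1))
    return DATA[offset:offset + limit]


def fake_construct(query: str) -> List[str]:
    subject = re.search(r"<(http://example\.org/item/\d+)>", query).group(1)
    return [f"<{subject}> a <http://schema.org/Thing> ."]


if __name__ == "__main__":
    uris = iterate(fake_select, SELECT_QUERY, page_size=2)
    for triple in generate(fake_construct, CONSTRUCT_QUERY, uris):
        print(triple)
```

Because both `iterate` and `generate` are lazy, only one page of URIs needs to be held in memory at a time, which is the property the scalability claim rests on.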

The *iterator* component is configured by a SPARQL Select query. This query binds a sequence of RDF terms to a variable called `$this`. This sequence forms an iterator over a potentially large data collection. In the absence of a good approach for streaming through large data collections, the SPARQL standard allows us to apply 'pagination' through a large collection by using the Offset and Limit keywords.
### Extensibility

Every binding for variable `$this` is used to parameterize a SPARQL Construct query; this is the *generator* component. Parameterization follows [SPARQL pre-binding](https://www.w3.org/TR/shacl/#pre-binding) according to the SHACL standard. Each SPARQL Construct query returns RDF triples that are part of the transformed result.
LD Workbench is **extensible** because it uses pure SPARQL queries (instead of code) for configuring transformation pipelines.
Each pipeline is a sequence of stages; each stage consists of an iterator and a generator.
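
As a rough illustration of that structure, a pipeline can be modelled as plain data: a list of stages, each holding two SPARQL query strings. The class and field names below (`Stage`, `Pipeline`, `iterator_query`, `generator_query`, `check`) are assumptions made for this sketch and do not reflect LD Workbench's actual configuration format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One stage: a SELECT query (iterator) and a CONSTRUCT query (generator)."""
    name: str
    iterator_query: str
    generator_query: str

    def check(self) -> None:
        # Lightweight sanity checks on the query text; not a SPARQL parser.
        if "SELECT" not in self.iterator_query.upper():
            raise ValueError(f"{self.name}: iterator must be a SELECT query")
        if "CONSTRUCT" not in self.generator_query.upper():
            raise ValueError(f"{self.name}: generator must be a CONSTRUCT query")
        for query in (self.iterator_query, self.generator_query):
            if "$this" not in query:
                raise ValueError(f"{self.name}: query does not use $this")


@dataclass(frozen=True)
class Pipeline:
    """A pipeline is an ordered sequence of stages."""
    name: str
    stages: List[Stage]

    def check(self) -> None:
        if not self.stages:
            raise ValueError(f"{self.name}: pipeline has no stages")
        for stage in self.stages:
            stage.check()


example = Pipeline(
    name="example",
    stages=[
        Stage(
            name="things",
            iterator_query="SELECT $this WHERE { $this a <http://example.org/Item> }",
            generator_query="CONSTRUCT { $this a <http://schema.org/Thing> } "
                            "WHERE { $this ?p ?o }",
        )
    ],
)
example.check()  # raises ValueError if a stage is malformed
```

Keeping the pipeline as data with queries as strings mirrors the point made above: extending a transformation means writing another SPARQL query, not more program code.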

## Configuration
