spec: docs: logicalSource: add required charter reference formulations #152

Open
wants to merge 2 commits into
base: main
60 changes: 59 additions & 1 deletion spec/docs/logicalSource.md
@@ -11,5 +11,63 @@ A <dfn data-lt="iteration">logical iteration</dfn> is an item in the sequence pr
A <dfn>data source</dfn> is an abstract concept that represents a source of data that can be accessed via a [=logical source=]. A [=data source=] can be a file, a database, a web service, or any other source of data.

<aside class="note">
There can be many different types of [=reference formulation=]. The known types, and the details of how a reference formulation is handled and implemented for each data format, are specified in [[RML-IO-Registry]].
There can be many different types of [=reference formulation=]. RML-Core covers CSV and JSONPath; RML-IO extends this with XML and SQL relational databases. Other known types, and the details of how a reference formulation is handled and implemented for each data format, are specified in [[RML-IO-Registry]]. This way, implementations do not need to support every defined reference formulation to be compliant with RML-Core.
</aside>

## Reference Formulations

RML-Core covers `rml:CSV` and `rml:JSONPath` as the minimum reference formulations to be supported by any implementation for referencing data in CSV and JSON documents.

Each Logical Source has a reference formulation that defines how to reference
elements of the data of the input source.
The following reference formulations (`rml:ReferenceFormulation`)
are defined in this specification:

- `rml:CSV`: CSV documents
- `rml:JSONPath`: JSON documents

See [[RML-IO-Registry]] for a detailed specification of these reference formulations.

## Access descriptions

RML-Core requires any implementation to support the `rml:FilePath` access description, which accesses files through an absolute or relative path.
Relative paths are resolved against the root given by `rml:root`, which is one of:

- `rml:CurrentWorkingDirectory`: relative to the current working directory of the RML processor.
- `rml:MappingDirectory`: relative to the location of the RML mapping.
- A string Literal: a string describing an absolute path against which relative paths are resolved, similar to the Base URI in [RFC3986](https://www.rfc-editor.org/rfc/rfc3986).

If `rml:root` is not specified, it defaults to `rml:CurrentWorkingDirectory`.

| Property | Domain | Range |
| ----------- | ------------------------- | ------------------------------------------------------------------ |
| `rml:root` | `rml:FilePath` | `rml:CurrentWorkingDirectory`, `rml:MappingDirectory` or `Literal` |
| `rml:path` | `rml:FilePath` | `Literal` |
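
The resolution rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not normative behaviour: the function name `resolve_file_path`, the `mapping_file` parameter, and the two string constants standing in for the `rml:CurrentWorkingDirectory` and `rml:MappingDirectory` IRIs are hypothetical, and treating an absolute `rml:path` as overriding the root is an assumption.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical stand-ins for the IRIs used as values of rml:root.
CURRENT_WORKING_DIRECTORY = "rml:CurrentWorkingDirectory"
MAPPING_DIRECTORY = "rml:MappingDirectory"


def resolve_file_path(path: str, mapping_file: str, root: Optional[str] = None) -> Path:
    """Resolve the value of rml:path against the value of rml:root."""
    if root is None or root == CURRENT_WORKING_DIRECTORY:
        # A missing rml:root defaults to the current working directory.
        base = Path.cwd()
    elif root == MAPPING_DIRECTORY:
        # Directory that contains the RML mapping document.
        base = Path(os.path.abspath(mapping_file)).parent
    else:
        # A string Literal: an absolute base path.
        base = Path(root)
    # Joining with an absolute `path` returns that path unchanged (assumption);
    # normpath then collapses "." and ".." segments without touching the disk.
    return Path(os.path.normpath(base / path))
```

For instance, under these assumptions `resolve_file_path("./file.json", "/maps/mapping.ttl", MAPPING_DIRECTORY)` resolves to `/maps/file.json` on a POSIX system.
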

Example of accessing a CSV file relative to the current working directory.
The file's absolute path is `$CURRENT_WORKING_DIR/file.csv`, where `$CURRENT_WORKING_DIR` is
the current working directory of the RML processor.

<pre class="ex-source">
&lt;#RelativePathCWD&gt; a rml:LogicalSource;
  rml:source [ a rml:FilePath;
    rml:root rml:CurrentWorkingDirectory;
    rml:path "./file.csv";
  ];
.
</pre>

Example of accessing a JSON file relative to the path of the mapping.
The file's absolute path is `$MAPPING_DIR/file.json` where `$MAPPING_DIR` is
the location of the RML mapping.

<pre class="ex-source">
&lt;#RelativePathMapping&gt; a rml:LogicalSource;
  rml:source [ a rml:FilePath;
    rml:root rml:MappingDirectory;
    rml:path "./file.json";
  ];
  rml:referenceFormulation rml:JSONPath;
  rml:iterator "$";
.
</pre>
Comment on lines +31 to +73
Collaborator


I'm not convinced the access description should be in core.
I think it should stay in IO, since that is where we define access descriptions.

Yes, it is used in core tests, but that is not a problem IMO. It is normal to have a core module that needs other modules before it can lead to something that works. And therefore it is also normal to include some of those dependencies to run tests.

In fact, I would prefer for CSV and JSON to also be introduced in IO. I think it makes the specs easier to follow. But since reference formulation is introduced in core, I'm not as strongly opposed.

Collaborator Author


It was discussed during the physical meeting that the access description should also move: without it, you cannot run any test cases or have a stand-alone RML mapping if you only support Core. Therefore, the simplest access description (the next generation of `rml:source "/path/to/file.csv"`) is moved here. You mention you don't see this as an issue, but having something standalone is IMO the best. Requiring people to implement some features of another module just to get Core working is not great IMO.

AFAIK, RML-Core would actually be the original RML, where these things were included...
