AtesComp edited this page Nov 22, 2024 · 3 revisions

Lucene Document Database

Lucene is a "document" database that RDF Transform uses to manage the ontology information needed to construct the RDF transforms for projects.

Lucene Storage

RDF Transform's cache directory

.../openrefine/cache/rdf-transform

contains the Lucene storage data files that RDF Transform uses to store ontology information.

The cache directory (above) can be deleted between runs; it is recreated on the next initialization. That first initialization takes longer because RDF Transform must retrieve the ontologies and repopulate the store. Once the cache exists, subsequent runs initialize faster.

If the Lucene store becomes corrupted, the suggested recovery is to delete the cache directory; it is rebuilt on the next initialization.

Vocabularies Meta

In RDF Transform's cache directory, the "VocabulariesMeta.json" file holds the working copy of the RDF Transform global vocabularies. When RDF Transform initializes during startup, it creates this file if it is not present. It also creates a "luceneIndex" directory to hold the Lucene data store for the ontologies and adds the "default" vocabularies to that store. When you add ontologies, both this file and the Lucene store are updated.
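To see whether initialization has already produced the cache layout described above, a quick look at the directory can help. The sketch below is a hypothetical helper (rdf_transform_cache_status is not part of RDF Transform); it only tests for the presence of "VocabulariesMeta.json" and the "luceneIndex" directory and does not validate their contents. The caller passes the full cache directory path, because the leading part of .../openrefine/cache/rdf-transform depends on the installation.

```python
from pathlib import Path


def rdf_transform_cache_status(cache_dir: Path) -> dict[str, bool]:
    """Report which RDF Transform cache artifacts exist under cache_dir.

    Hypothetical helper: checks presence only, not whether the files are valid.
    cache_dir is the full path to the RDF Transform cache directory
    (.../openrefine/cache/rdf-transform); the prefix varies per installation.
    """
    meta_file = cache_dir / "VocabulariesMeta.json"  # working copy of the global vocabularies
    lucene_dir = cache_dir / "luceneIndex"           # Lucene data store for the ontologies
    return {
        "cache_dir": cache_dir.is_dir(),
        "vocabularies_meta": meta_file.is_file(),
        "lucene_index": lucene_dir.is_dir(),
    }
```

For example, rdf_transform_cache_status(Path("/path/to/openrefine/cache/rdf-transform")) returns all False before the first initialization; after a successful startup all three entries should be True.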

Predefined Namespaces

The "VocabulariesMeta.json" file (above) is generated from the Predefined Namespaces file located at:

.../openrefine/extensions/rdf-transform/module/MOD-INF/classes/files/PredefinedVocabs

The "PredefinedVocabs" file contains the Prefix, Namespace, and URL of the Namespace. You can change the default vocabularies for all projects by modifying the "PredefinedVocabs" file.
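If you edit "PredefinedVocabs", it can help to sanity-check the file before regenerating the store. The following is a minimal sketch that assumes one vocabulary per line with whitespace-separated Prefix, Namespace, and URL fields, and that blank lines and lines starting with "#" are ignored; the shipped file's actual format may differ, so compare against it first. VocabEntry, parse_predefined_vocabs, and format_vocab_entry are hypothetical names.

```python
from typing import NamedTuple


class VocabEntry(NamedTuple):
    prefix: str
    namespace: str
    url: str  # URL for the namespace, as listed in PredefinedVocabs


def parse_predefined_vocabs(text: str) -> list[VocabEntry]:
    """Parse PredefinedVocabs text.

    ASSUMPTION: one entry per line, fields separated by whitespace
    (tabs or spaces) in the order Prefix, Namespace, URL.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed: blank and '#' lines are skipped
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        entries.append(VocabEntry(*fields))
    return entries


def format_vocab_entry(entry: VocabEntry) -> str:
    """Render an entry as one tab-separated line (assumed layout)."""
    return "\t".join(entry)
```

Parsing the file before restarting catches malformed lines early; format_vocab_entry produces a line in the same assumed layout when adding a vocabulary.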

To regenerate RDF Transform's Lucene store, delete the ".../openrefine/cache/rdf-transform" directory and restart OpenRefine.
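The deletion step can be scripted. This sketch is a hypothetical helper (reset_rdf_transform_cache is not part of RDF Transform or OpenRefine); it takes the path to your "openrefine" directory, because the leading "..." portion varies by installation, and removes only the cache/rdf-transform subdirectory. Run it between runs, while OpenRefine is stopped, so the Lucene files are not in use; then start OpenRefine to rebuild the store.

```python
import shutil
from pathlib import Path


def reset_rdf_transform_cache(openrefine_dir: Path) -> bool:
    """Delete the RDF Transform cache so it is rebuilt on the next startup.

    openrefine_dir is the path ending in "openrefine" (the ".../openrefine"
    part of .../openrefine/cache/rdf-transform); hypothetical helper.
    Returns True if a cache directory was removed, False if none existed.
    """
    cache_dir = openrefine_dir / "cache" / "rdf-transform"
    if not cache_dir.is_dir():
        return False  # nothing to delete; the next startup creates it anyway
    shutil.rmtree(cache_dir)  # removes VocabulariesMeta.json and luceneIndex
    return True
```

Only the rdf-transform subdirectory is removed; the rest of OpenRefine's cache directory is left alone.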
