Document Markings in gerbil.nif.transfer
A NIF document object can contain 'Markings' that hold further information about individual parts of the document, e.g., the position and URL of a named entity. This article presents the interfaces and classes that the gerbil.nif.transfer library offers. Where possible, we added the RDF triples into which an instance of an interface or a class would be translated.
The following RDF prefixes are used in this article:
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
The following UML diagram shows the interfaces, classes and their relations described in this article.
The interfaces described in this section can be found in the package org.aksw.gerbil.transfer.nif.
Marking is the broadest interface and defines no special methods. Implementing only this interface, without one of the more specific interfaces, does not really make sense.
A Meaning is a marking that carries a certain meaning, expressed as a single URI or a set of URIs. Note that if a single meaning contains a set of URIs, GERBIL assumes that these URIs can be connected with owl:sameAs.
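As a rough illustration of the owl:sameAs assumption, one way to compare two meanings is to treat them as equal when their URI sets share at least one URI. This is only a sketch: the class MeaningSketch and the method sameMeaning are hypothetical and not part of the library, and GERBIL's actual matching logic may differ.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class MeaningSketch {
    // Hypothetical helper: two URI sets denote the same meaning if they overlap,
    // because all URIs inside one set are assumed to be owl:sameAs each other.
    static boolean sameMeaning(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }

    public static void main(String[] args) {
        // One meaning described by two equivalent URIs.
        Set<String> expected = new HashSet<>();
        expected.add("http://dbpedia.org/resource/Berlin");
        expected.add("http://www.wikidata.org/entity/Q64");

        // Another meaning that uses only one of these URIs.
        Set<String> returned = new HashSet<>();
        returned.add("http://www.wikidata.org/entity/Q64");

        System.out.println(sameMeaning(expected, returned));
    }
}
```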
Classes implementing the Span interface mark a certain part of the text of a document. They have a start position and a length, both measured in Java characters (char values, i.e., UTF-16 code units). Following Java conventions, the position end = start + length is the position of the first character after the span.
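The position arithmetic can be sketched in a few lines of Java. The class SpanSketch and its members are hypothetical stand-ins, not the library's Span implementation. The point is only that the exclusive end position matches the convention of String.substring.

```java
class SpanSketch {
    final int start;   // index of the first character of the span
    final int length;  // number of Java chars (UTF-16 code units) covered

    SpanSketch(int start, int length) {
        this.start = start;
        this.length = length;
    }

    // Exclusive end: the position of the first character after the span.
    int end() {
        return start + length;
    }

    // Uses the same begin-inclusive, end-exclusive convention as String.substring.
    String coveredText(String documentText) {
        return documentText.substring(start, end());
    }

    public static void main(String[] args) {
        String text = "Berlin is the capital of Germany.";
        SpanSketch span = new SpanSketch(25, 7);
        System.out.println(span.end());             // prints 32
        System.out.println(span.coveredText(text)); // prints Germany
    }
}
```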
Instances of this interface are translated into RDF nodes with their own URI. They are connected to the RDF node of the document using the nif:referenceContext property. Following the NIF standard, the start and end positions are encoded in the node URI and added to the node itself using the nif:beginIndex and nif:endIndex properties, respectively.
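To make the translation more concrete, the following sketch builds a span's node URI and the corresponding Turtle snippet by hand. It assumes the offset-based NIF 2.0 URI scheme with a #char=begin,end fragment, and it assumes that the document node is identified as #char=0,textLength. The class NifSpanUriSketch, its methods and the example.org URI are hypothetical, and the output of the library's own writer may differ in detail.

```java
class NifSpanUriSketch {
    // Appends the offset-based fragment "#char=begin,end" to the document URI.
    static String spanUri(String documentUri, int start, int length) {
        int end = start + length; // exclusive end position
        return documentUri + "#char=" + start + "," + end;
    }

    // Renders the span node with its context link and both index properties.
    static String toTurtle(String documentUri, int textLength, int start, int length) {
        String context = spanUri(documentUri, 0, textLength); // assumed document node URI
        String node = spanUri(documentUri, start, length);
        return "<" + node + ">\n"
             + "    nif:referenceContext <" + context + "> ;\n"
             + "    nif:beginIndex \"" + start + "\"^^xsd:nonNegativeInteger ;\n"
             + "    nif:endIndex \"" + (start + length) + "\"^^xsd:nonNegativeInteger .";
    }

    public static void main(String[] args) {
        // A span of length 7 starting at position 25 in a text of 33 characters.
        System.out.println(toTurtle("http://example.org/doc", 33, 25, 7));
    }
}
```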
A typed marking (TypedMarking) contains a set of types. Note that no class implements this interface without also implementing one of the other interfaces.
A scored marking (ScoredMarking) contains a confidence score assigned by the annotator. The higher this score, the more confident the annotator is that the marking is correct. Note that no class implements this interface without also implementing one of the other interfaces.
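One typical use of such a score is to keep only the markings whose confidence reaches a chosen threshold. The sketch below illustrates this with a hypothetical stand-in type. Neither ConfidenceFilterSketch nor its nested Scored class nor the atLeast method belong to the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConfidenceFilterSketch {
    // Hypothetical stand-in for a scored marking: a label plus a confidence score.
    static final class Scored {
        final String label;
        final double confidence;

        Scored(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    // Keeps the markings whose confidence is at least the given threshold.
    static List<Scored> atLeast(List<Scored> markings, double threshold) {
        List<Scored> kept = new ArrayList<>();
        for (Scored marking : markings) {
            if (marking.confidence >= threshold) {
                kept.add(marking);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Scored> markings = Arrays.asList(
                new Scored("Berlin", 0.92),
                new Scored("capital", 0.31));
        for (Scored marking : atLeast(markings, 0.5)) {
            System.out.println(marking.label + " " + marking.confidence);
        }
    }
}
```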
The MeaningSpan interface is the combination of the Meaning and the Span interfaces.
This interface is the combination of the TypedMarking and the Span interfaces.
The ScoredSpan interface is the combination of the ScoredMarking and the Span interfaces.
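The relations between the interfaces can be summarized as a small, simplified re-declaration in Java. This is a sketch rather than the library's source code: the interface names follow the text where it names them, while the name TypedSpan, all method names, and the helper class InterfaceSketch are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Set;

class InterfaceSketch {
    // Hypothetical factory: an anonymous marking that is both a Meaning and a Span.
    static MeaningSpan entity(final int start, final int length, final String uri) {
        return new MeaningSpan() {
            public Set<String> getUris() { return Collections.singleton(uri); }
            public int getStartPosition() { return start; }
            public int getLength() { return length; }
        };
    }

    public static void main(String[] args) {
        Marking marking = entity(0, 6, "http://dbpedia.org/resource/Berlin");
        System.out.println(marking instanceof Span);          // has a position
        System.out.println(marking instanceof Meaning);       // has URIs
        System.out.println(marking instanceof ScoredMarking); // has no score
    }
}

// Broadest interface, no methods of its own.
interface Marking {}

// Basic interfaces (method names are assumed).
interface Meaning extends Marking { Set<String> getUris(); }
interface Span extends Marking { int getStartPosition(); int getLength(); }
interface TypedMarking extends Marking { Set<String> getTypes(); }
interface ScoredMarking extends Marking { double getConfidence(); }

// Combined interfaces add no methods, they only join two basic ones.
interface MeaningSpan extends Meaning, Span {}
interface TypedSpan extends TypedMarking, Span {}
interface ScoredSpan extends ScoredMarking, Span {}
```

Because the combined interfaces declare nothing themselves, code that only needs positions can accept a Span, while code that only needs URIs can accept a Meaning.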
The classes described in this section can be found in the package org.aksw.gerbil.transfer.nif.data.
The Annotation class implements the Meaning interface. It is used to add a general topic to a document, e.g., for the C2KB task. Instances of this class are added as RDF nodes with their own URI. The document references them using the nif:topic property.
This class extends the Annotation class by implementing the ScoredMarking interface and adding a confidence score to the annotation. In the RDF graph, the confidence score is added to the annotation using the itsrdf:taConfidence property.
The SpanImpl class implements the Span interface.
This class extends the SpanImpl class and implements the ScoredSpan interface.
The NamedEntity class represents a named entity inside the text. It extends the SpanImpl class and implements the MeaningSpan interface. In the RDF graph, the URI(s) of the named entity are added to the Span RDF node using the itsrdf:taIdentRef property.
This class extends the NamedEntity class by implementing the ScoredMarking interface and adding a confidence score to the named entity. In the RDF graph, the confidence score is added to the named entity's RDF node using the itsrdf:taConfidence property.
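The inheritance chain from a plain span to a scored named entity, together with the RDF properties each level contributes, can be sketched as follows. All class names ending in Sketch, their fields and the method toTurtleProperties are hypothetical simplifications and do not reproduce the library's classes or its RDF writer. The xsd:double datatype for the confidence value is an assumption as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ClassHierarchySketch {
    public static void main(String[] args) {
        ScoredNamedEntitySketch entity = new ScoredNamedEntitySketch(
                0, 6, 0.87, "http://dbpedia.org/resource/Berlin");
        System.out.println(entity.toTurtleProperties());
    }
}

// Plain span: only a position inside the text.
class SpanImplSketch {
    final int start;
    final int length;

    SpanImplSketch(int start, int length) {
        this.start = start;
        this.length = length;
    }
}

// Named entity: a span plus one or more URIs (its meaning).
class NamedEntitySketch extends SpanImplSketch {
    final Set<String> uris;

    NamedEntitySketch(int start, int length, String... uris) {
        super(start, length);
        this.uris = new LinkedHashSet<>(Arrays.asList(uris));
    }

    // One itsrdf:taIdentRef statement per URI of the entity.
    String toTurtleProperties() {
        List<String> parts = new ArrayList<>();
        for (String uri : uris) {
            parts.add("itsrdf:taIdentRef <" + uri + ">");
        }
        return String.join(" ;\n    ", parts);
    }
}

// Scored named entity: additionally carries the annotator's confidence.
class ScoredNamedEntitySketch extends NamedEntitySketch {
    final double confidence;

    ScoredNamedEntitySketch(int start, int length, double confidence, String... uris) {
        super(start, length, uris);
        this.confidence = confidence;
    }

    @Override
    String toTurtleProperties() {
        return super.toTurtleProperties()
                + " ;\n    itsrdf:taConfidence \"" + confidence + "\"^^xsd:double";
    }
}
```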
This class extends the NamedEntity class by implementing the TypedMarking interface and adding a set of types to the named entity. In the RDF graph, the type URI(s) are added to the Span RDF node using the itsrdf:taClassRef property.
The ScoredTypedNamedEntity class extends the TypedNamedEntity class by implementing the ScoredMarking interface and adding a confidence score to the named entity. In the RDF graph, the confidence score is added to the named entity's RDF node using the itsrdf:taConfidence property.
This class represents a part of the text for which no meaning (in terms of one or more URIs) but a list of types is available. In the RDF graph, the type URI(s) are added to the Span RDF node using the itsrdf:taClassRef property.