Document language
GERBIL is language agnostic (it even ignores language information). This has advantages but also disadvantages. The main advantage is that GERBIL works for all languages (as long as they can be expressed in text form). The main disadvantage is that the user has to make sure that an annotation system is able to process the documents of the dataset the user is using for benchmarking.
GERBIL currently cannot retrieve the language of a document. It has to be known beforehand, or the annotation system has to be able to detect it.
This might change in the future (see #35).
How a different language is handled depends heavily on the annotation system. In general, it is the task of the user to make sure that the system they are benchmarking is able to understand the language of the dataset.
However, using the example of DBpedia Spotlight, we will briefly show how an annotation system can be adapted.
The documentation of DBpedia Spotlight shows the list of supported languages. We simply have to adapt the URL of the web service in the annotators.properties file. While by default GERBIL will use
http://model.dbpedia-spotlight.org/en/
this can easily be changed to the French endpoint
http://model.dbpedia-spotlight.org/fr/
You could even copy the definition of the DBpedia Spotlight annotation system. In that case, please make sure that you use a different parameter key than spotlight and a different name than "DBpedia Spotlight". In the following, we define an annotation system DBpedia Spotlight (FR) with the parameter key spotlightFR:
org.aksw.gerbil.annotators.definition.spotlightFR.name=DBpedia Spotlight (FR)
org.aksw.gerbil.annotators.definition.spotlightFR.experimentType=OKE_Task1
org.aksw.gerbil.annotators.definition.spotlightFR.cacheable=true
org.aksw.gerbil.annotators.definition.spotlightFR.class=org.aksw.gerbil.annotator.impl.spotlight.SpotlightAnnotator
org.aksw.gerbil.annotators.definition.spotlightFR.constructorArgs=http://model.dbpedia-spotlight.org/fr/
As can be seen, the URI of the French endpoint is passed as the argument for the constructor (https://github.com/dice-group/gerbil/blob/master/src/main/java/org/aksw/gerbil/annotator/impl/spotlight/SpotlightAnnotator.java#L60).
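The same pattern should carry over to other languages offered by the service. As a minimal sketch, the hypothetical helper below (it is not part of GERBIL) builds the five property lines for a given language code. It assumes that the endpoint follows the http://model.dbpedia-spotlight.org/<language>/ scheme and that the language appears in the DBpedia Spotlight list of supported languages. The class and method names are purely illustrative.

```java
import java.util.Locale;

// Hypothetical helper, not part of GERBIL: builds the property lines for a
// language-specific copy of the DBpedia Spotlight annotator definition.
class SpotlightDefinitionSketch {

    private static final String PREFIX = "org.aksw.gerbil.annotators.definition.";

    // langCode is assumed to be a code the Spotlight service actually offers, e.g. "fr".
    static String definitionFor(String langCode) {
        String lower = langCode.toLowerCase(Locale.ROOT);
        String upper = langCode.toUpperCase(Locale.ROOT);
        // Unique parameter key per language, e.g. "spotlightFR".
        String key = PREFIX + "spotlight" + upper;
        return String.join("\n",
                key + ".name=DBpedia Spotlight (" + upper + ")",
                key + ".experimentType=OKE_Task1",
                key + ".cacheable=true",
                key + ".class=org.aksw.gerbil.annotator.impl.spotlight.SpotlightAnnotator",
                key + ".constructorArgs=http://model.dbpedia-spotlight.org/" + lower + "/");
    }

    public static void main(String[] args) {
        // Prints the definition shown above for the French endpoint.
        System.out.println(definitionFor("fr"));
    }
}
```

The output can be pasted into the annotators.properties file. Each language gets its own key and name, so the copies do not clash with the default spotlight definition.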
Note that not all annotation systems can be adapted this easily. Again, it depends on a) whether an annotation system offers the desired language and b) how its API accepts the language parameter. In many cases, it can be submitted as part of the URL (as described above). However, other annotators may expect the parameter in a different way, which might make it necessary to adapt the system adapter implementation itself.
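To illustrate point b), the following sketch contrasts two ways a web service might expect the language: as a path segment (as with the Spotlight URLs above) or as a query parameter. It is not GERBIL code. The class, the example.org URL, and the parameter name lang are assumptions made for illustration only, and the sketch needs Java 10 or newer for the URLEncoder overload it uses.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not GERBIL code: two common ways a web service
// may expect the document language.
class LanguageParameterSketch {

    // Variant 1: language as a path segment, as with the Spotlight URLs.
    static URI asPathSegment(String baseUrl, String langCode) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(base + langCode + "/");
    }

    // Variant 2: language as a query parameter.
    // The parameter name "lang" is an assumption, not a real API.
    static URI asQueryParameter(String serviceUrl, String langCode) {
        String separator = serviceUrl.contains("?") ? "&" : "?";
        return URI.create(serviceUrl + separator + "lang="
                + URLEncoder.encode(langCode, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(asPathSegment("http://model.dbpedia-spotlight.org", "fr"));
        System.out.println(asQueryParameter("http://example.org/annotate", "fr"));
    }
}
```

Both variants only change the URL, so they can usually be handled through the constructor argument in the properties file. If a service expects the language in the request body or in a header instead, the adapter class itself has to be changed.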