kba-stanford-corenlp

Wrappers for generating one-word-per-line output representing all the goodies from Stanford CoreNLP, so we can include it in the KBA stream corpus.

Dependency

This wrapper supports a varity version of CoreNLP. We use the version version 1.3.4 for testing.

Installation

After checking out the repository, you need to get all dependencies by run the lib/get.sh script:

$ cd lib
$ sh get.sh

This script will download a lib.zip file, and unzip it to get all .jar files we need.

After that, you can compile the code using

$ ant

NER, Parsing, and Coref

To get the .jar file for NER (including parsing and coref), run

$ ant ner

This will produce a .jar file called runNER.jar.

$ java -jar runNER.jar <INPUT> <OUTPUT>

NER and Parsing

The file src/nlp/runNER.java takes as input a corpus in format

<FILENAME id="DOCUMENTID">
content...
</FILENAME>
...
<FILENAME id="DOCUMENTID">
content...
</FILENAME>

and output the one-token-per-line TSV format as

<FILENAME id="DOCUMENTID">
<SENT id="SENT-NUM">
TOK-NUM    TOKEN    BEGIN:END_OFFSET    POS    NER    LEMMA    DEP-PATH-TO-PARENT    PARENT-ID    COREF-CLUSTER-ID
...
</SENT>
</FILENAME>

SENT-NUM is zero-based index to sentence within the document. TOK-NUM is zero-based index to token within the sentence.

BEGIN_OFFSET and END_OFFSET count number of bytes not characters.... or not?!

You can see the input.txt and ner-output.txt examples in the test/ folder.

Name		Name	Last commit message	Last commit date
Latest commit History 52 Commits
lib		lib
src		src
test		test
.gitignore		.gitignore
README.md		README.md
assemble_ner.py		assemble_ner.py
build.xml		build.xml
test-out.xml		test-out.xml
test.xml		test.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

kba-stanford-corenlp

Dependency

Installation

NER, Parsing, and Coref

NER and Parsing

About

Releases

Packages

Languages

trec-kba/kba-stanford-corenlp

Folders and files

Latest commit

History

Repository files navigation

kba-stanford-corenlp

Dependency

Installation

NER, Parsing, and Coref

NER and Parsing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages