This version has been trimmed of all licensed and copyrighted materials. As a result, the test data is not included in this repository, and commands may not run as expected without additional setup.
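Convert the DNB test data for 2018 (YEARS=18) to TEI I5:
make -j $(nproc) target/dnb18.i5.xml SRC_DIR=test/resources/DNB YEARS=18
or build the i5 target from the test data:
make -j $(nproc) i5 SRC_DIR=test/resources/DNB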
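In the 2018 example above, the target name encodes the two-digit year (target/dnb18.i5.xml for YEARS=18). Assuming that pattern also holds for other years, which only the 2018 example confirms, a small wrapper could build several annual volumes one after another. The helper names `i5_targets` and `build_i5_years` and the `DRY_RUN` and `JOBS` variables are hypothetical and not part of the repository's Makefile.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.
# Assumption: each annual volume is built as target/dnb<YY>.i5.xml with YEARS=<YY>,
# following the 2018 example.

# Print the I5 target file names for the given two-digit years.
i5_targets() {
  local year
  local -a targets=()
  for year in "$@"; do
    targets+=("target/dnb${year}.i5.xml")
  done
  echo "${targets[*]}"
}

# Build one annual volume per make call; DRY_RUN=1 only prints the commands.
# JOBS defaults to the number of cores (nproc), falling back to 1 if nproc is missing.
build_i5_years() {
  local src_dir=$1
  shift
  local jobs=${JOBS:-$(nproc 2>/dev/null || echo 1)}
  local year
  for year in "$@"; do
    local -a cmd=(make -j "$jobs" "target/dnb${year}.i5.xml" "SRC_DIR=${src_dir}" "YEARS=${year}")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return   # stop at the first failing year
    fi
  done
}
```

For example, `DRY_RUN=1 build_i5_years test/resources/DNB 18 19` prints the two make calls without running them. One year per call mirrors the documented command; whether YEARS also accepts a list is not stated here.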
Prerequisite: KorAP-XML-CoNLL-U
make -j $(nproc) target/ SRC_DIR=test/resources/DNB YEARS=18
make -j $(nproc) index
The index will be written to target/dnb.index.
Adjust the following line in your korap4dnb-compose.yml
to point to your index (by default target/dnb.index, but it is better to copy it to a safe place first):
- "${PWD}/target/dnb.index:/kustvakt/index:z"
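In the line above, the part before :/kustvakt/index is the index location on the host, /kustvakt/index is where the container expects it, and the z suffix asks Docker to relabel the bind-mounted content for SELinux so that it can be shared between containers. The following sketch copies a freshly built index to a more permanent location and points the compose file at it; the function names and the example destination /srv/korap/dnb.index are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the repository.

# Copy the built index (target/dnb.index by default) to a safer location.
# Refuses to overwrite an existing destination.
copy_index() {
  local dest=$1 src=${2:-target/dnb.index}
  if [[ -e $dest ]]; then
    echo "error: '$dest' already exists, not overwriting" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")" && cp -a "$src" "$dest"
}

# Rewrite the host side of the "...:/kustvakt/index:z" volume entry in place,
# keeping the previous file as <compose_file>.bak.
# Assumes the new path contains no '#', '&' or backslash (special in this sed expression).
set_index_path() {
  local compose_file=$1 index_dir=$2
  sed -i.bak -E "s#\"[^\"]*:/kustvakt/index:z\"#\"${index_dir}:/kustvakt/index:z\"#" "$compose_file"
}
```

A typical sequence would be `copy_index /srv/korap/dnb.index && set_index_path korap4dnb-compose.yml /srv/korap/dnb.index`. Using an absolute path instead of ${PWD} also makes the compose file independent of the directory it is started from.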
and start the Docker containers:
docker compose -p korap4dnb --profile=lite -f korap4dnb-compose.yml up -d
To stop the instance again:
docker compose -p korap4dnb down
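With the short volume syntax, Docker usually creates a missing host path as an empty directory, so a wrong index path can start Kustvakt without any data. A wrapper could check the path before calling the two commands above. `start_korap`, `stop_korap`, `run` and `DRY_RUN` are hypothetical names; the check only looks at the path you pass, so it has to match the one configured in korap4dnb-compose.yml.

```bash
#!/usr/bin/env bash
# Hypothetical wrappers around the docker compose calls, not part of the repository.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Refuse to start if the index path (must match korap4dnb-compose.yml) does not exist.
start_korap() {
  local index_dir=${1:-target/dnb.index}
  if [[ ! -e $index_dir ]]; then
    echo "error: index '$index_dir' not found; run 'make index' first" >&2
    return 1
  fi
  run docker compose -p korap4dnb --profile=lite -f korap4dnb-compose.yml up -d
}

# Stop and remove the containers of the korap4dnb project.
stop_korap() {
  run docker compose -p korap4dnb down
}
```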
Install the prerequisite korap/conllu2treetagger and korap/conllu2spacy Docker images if they are not present:
docker image inspect korap/conllu2treetagger:latest || curl -Ls '' | docker load
docker image inspect korap/conllu2spacy:latest || curl -Ls | docker load
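Both lines use the same idiom: `docker image inspect` succeeds only if the image already exists locally, so the download to the right of `||` runs only when the image is missing. Since the download URLs are not reproduced here, the hypothetical function below takes the URL as an argument; the added `-f` makes curl fail on HTTP errors instead of piping an error page into `docker load`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository:
# load a Docker image from a tarball URL unless it is already present.
ensure_image() {
  local image=$1 url=$2
  if docker image inspect "$image" >/dev/null 2>&1; then
    return 0  # already available locally, nothing to do
  fi
  # -f: fail on HTTP errors, -L: follow redirects, -s: no progress output
  curl -fLs "$url" | docker load
}
```

Usage would be, for instance, `ensure_image korap/conllu2spacy:latest "<download URL>"`, with the URL taken from the project documentation.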
Make annotations for dnb20:
make -j $(nproc) target/ target/ target/
Build everything, up to the deployable index:
make -j $(nproc) all
- extended genre classification based on metadata keywords
- Saxon XSLT processor and license updated from 9 to 12.4
- added
elements with all ids given by the DNB SRU API
- fixed bug with ambiguous (dnb-id/isbn) ids
- basic genre classification based on metadata keywords
- added
- SRC_DIR now defaults to the production sample!
- ISBN recognition should now be fixed
- ignore faulty xhtml input files and conversion errors – just issue a warning
- added pass2 and pass3 to xslt conversion to …
  - fix div, p, hi, ref … nestings
  - remove empty elements
  - join subsequent hi elements
- improved korapxml2krill performance by using all cores (-1 does not work here)
- sanitized the Makefile and dropped the YY variable; use YEARS instead
- added pass2 and pass3 to xslt conversion to …
- multiple authors (and non-authors) are now correctly handled
- some more .(x)html files are now dropped (toc, cover, etc.)
- PRELIMINARY support for splitting everything into annual volumes
- use make YY=22 to select 2022 (does not yet work for the index!)
- use
- slow udpipe2 dropped
- added marmot POS and morpho-syntactic annotations
- added malt dependency annotations
- added make deploy to install the new index and restart the local KorAP@DNB instance (also available as a CI target)
- added make targets to monitor the local KorAP@DNB instance
- added
- added make all to build all targets, including the index
- added
- CI/CD pipeline added
- first working pipeline for EPub ⮕ TEI I5 ⮕ KorAP-XML ⮕ (UDPipe+TreeTagger+Spacy) ⮕ Krill ⮕ KorAP-JSON
2024-03-15: DNB test data added