-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathINSTALL
139 lines (108 loc) · 6.24 KB
/
INSTALL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
You will need
-------------
* CMake (version >= 2.8)
Needed to build and install the project. Either the curses interface 'ccmake'
or the Qt version is also recommended for easier configuration. A build
system for which a CMake generator exists (e.g. make) and the compiler tools
for C++ are of course a necessity as well (tested on gcc 4.5.3, 4.7.2).
Debian packages: g++, make, cmake, cmake-curses-gui
* Python
Needed to run the analyze script.
Debian packages: python
* Maxent Modeling Toolkit for Python and C++ (Le Zhang)
I recommend that you have a Fortran compiler installed so that the L-BFGS
procedure is also compiled. The zlib libraries and headers will also allow
you to save the trained model in a compressed binary format (not required).
The project is hosted at https://github.com/lzhang10/maxent
Debian packages: gfortran (for the Fortran compiler), zlib1g, zlib1g-dev (for
the optional zlib support)
* GNU Fortran libraries
When on a UNIX-like system which links shared libraries the way the gold
linker does (Debian Wheezy, Fedora), some library dependencies required by
Maxent and Boost.Thread have to be linked by us and so trtok always tries to
link with the GNU Fortran libraries on a UNIX-like system. This is because we
assume most users will have Maxent compiled with L-BFGS and this is one of
the cases where we have to explicitly link the libraries. The other
explicitly linked libraries which are presumed to be present are pthread and
rt.
Debian packages: libgfortran3 (installed as a dependency of gfortran)
* Quex (version 0.63.1)
The lexical analyzer generator used during both compile time and runtime.
Quex can change during its development. This release was tested with Quex
version 0.63.1. If you use other versions of Quex, you're gonna have a bad
time.
See the official releases at http://quex.org.
* libtool dynamic loading library (with headers)
A platform-independent wrapper for loading shared modules during runtime.
Debian packages: libltdl-dev
* PCRE (Perl Compliant Regular Expression) (with headers)
The library built with the C++ wrapper and support for UTF-8 is needed. The
source can be downloaded at http://pcre.org.
Debian packages: libpcre3, libpcre3-dev, libpcrecpp0
* Boost libraries (version >= 1.46, with headers)
The following compiled libraries are needed: system, filesystem,
program_options, thread. A lot more headers-only libraries are required so I
recommend installing the whole lot.
Debian packages: libboost-all-dev
* Threading Building Blocks (with headers)
This library allows us to harness multiple threads with great ease. It should
be readily available for all platforms at http://threadingbuildingblocks.org.
Debian packages: libtbb2, libtbb-dev
* ICU or iconv (with headers)
Any of these two libraries will do. These libraries are needed by Quex and by
trtok himself when decoding and encoding the character stream. On my Debian
machine iconv is already built-in (to the point where I can't even choose
whether I want to link it) and I suppose this will be the case with a lot of
GNU/Linux distributions. On the other hand, ICU should be present on Windows
machines. Sources for ICU are at http://site.icu-project.org.
Debian packages: libicu44, libicu-dev (optional, as Debian comes with iconv
already)
Installing dependencies
----------------------
On Debian, simply issue
aptitude install g++ make cmake cmake-curses-gui python3 gfortran zlib1g-dev\
libltdl-dev libpcre3-dev libpcrecpp0 libboost-all-dev\
libtbb-dev
as root. When using a distribution older than Debian Wheezy, you might need to
install Boost 1.46 from upstream due to changes in the FileSystem API.
For maxent, git clone the repository at git://github.com/jirkamarsik/maxent.git
or download the tarred source from the web interface at
https://github.com/jirkamarsik/maxent and follow the installation directions.
You should end up with the compiled library in one of your lib directories and
the headers somewhere in your include directories.
For Quex, download the latest version from http://quex.org (or at least version
0.59.1) and unpack it to any location. After you set the QUEX_PATH environment
variable to that directory, you are set.
Building and installing trtok
-----------------------------
Navigate to the cloned/downloaded-and-untarred directory where this file is
located. I suggest making a build directory next to the src directory and
navigating to it. Issue
cmake ../src
to start an out-of-source (or in-source if you omitted creating the build
directory and simply went for the src directory) build. If CMake failed to find
any required components, you can point it in the right directory by editing
the CMake cache. For that I recommend ccmake, where you simply run
ccmake .
from the build directory and you can setup all the parameters in a nice way.
Besides helping CMake find the required libraries, you can use ccmake to setup
the installation directory INSTALL_DIR, whether iconv or ICU is used for
character decoding and encoding (USE_ICONV, USE_ICU), whether static libraries
should be preferred (PREFER_STATIC_LIBRARIES), the path to Quex (QUEX), the
build type (CMAKE_BUILD_TYPE=Debug, Release...) and tune several program
parameters (the most interesting being WORK_UNIT_COUNT, which sets the number of
threads running in the pipeline at the same time and CHUNK_SIZE, which sets the
size of work units being passed through the pipeline).
After setting the parameters, press 'c' and then 'g' to configure the cache and
generate the Makefiles (or other files for your build system of choice).
An alternative to setting the CMake parameters interactively using ccmake is
to use the -D option of cmake, e.g.:
cmake . -DCMAKE_BUILD_TYPE=Release
Now you can use make or whatever build system you use to compile and install
the application. For example with make you could type
make -j 4 # run 4 jobs simultaneously
make install
After installing, you need to set the environment variable TRTOK_PATH to your
installation directory, so trtok always knows where to look for your
tokenization schemes, where to build and train them and where the code for the
dynamically compiled rough tokenizer is stored.