\chapter*{Afterword}
\markboth{\textsc{AFTERWORD}}{}
\addcontentsline{toc}{chapter}{Afterword}
It is hard to believe, but at this point we have finally reached the
home stretch of our \thepage-page-long marathon and, preparing for
the final spurt, we should first recall the main milestones that we
have passed along the way:
\begin{itemize}
\item As you might remember, we started off by summarizing the history
  of sentiment analysis, going back to its very origins in ancient
  Greek philosophy and tracing its development to the present day;
\item Afterwards, to see what the current state of the art in opinion
mining would yield on German Twitter, we created a corpus of
$\approx8,000$ German tweets, collecting these messages for four
different topics (federal elections, papal conclave, general
political discussions, and casual everyday conversations). To
ensure a good recall of opinionated statements in the resulting
dataset, we grouped all microblogs into three formal categories
(tweets with a polar term from the SentiWS lexicon, messages
containing a smiley, and all remaining microblogs) and sampled an
equal number of tweets (666) for each of the four topics from each
of these three categories. After annotating the corpus in three
steps (initial, adjudication, and final), we attained a reliable
level of inter-annotator agreement for all elements (sentiments,
sources, targets, polar terms, downtoners, negations, and
intensifiers), finding that both selection criteria (topics and
formal traits) significantly affected the distribution of sentiments
and polar terms and the reliability of their annotation;
\item Then, at the first checkpoint, we compared existing German
sentiment lexicons, which were translated from English resources and
revised by human experts, with lexicons that were generated
automatically from scratch with the help of state-of-the-art
dictionary\mbox{-,} corpus\mbox{-,} and word-embedding--based
methods. An evaluation of these approaches on our corpus showed
that semi-automatically translated polarity lists were generally
  better than the automatically induced ones, reaching 0.587
  macro-\F{} and 0.955 micro-\F{} on the prediction of polar terms.
  Furthermore, among fully automatic methods,
dictionary-based systems showed stronger results than their corpus-
and word-embedding--based competitors, yielding 0.479 macro-\F{} and
0.962 micro-\F{}. We could, however, improve on the latter metric
(pushing it to 0.963) with our proposed linear projection solution,
in which we first found a line that maximized the mutual distance
between the projections of seed vectors with opposite semantic
orientations and then projected the embeddings of all remaining
  words on that line, taking the distance of each projection to the
  median as the polarity score of the respective term (a minimal code
  sketch of this idea follows this list);
\item In Chapter~\ref{chap:fgsa}, we turned our attention to
  fine-grained sentiment analysis, in which we tried to predict the
  spans of sentiments, targets, and holders of opinions using the two
  most popular approaches to this task: conditional random fields and
  recurrent neural networks. We obtained our best results (0.287
  macro-\F{}) with first-order linear-chain CRFs, and could increase
  these scores further by using alternative CRF topologies
  (second-order linear-chain and semi-Markov CRFs); we could also
  boost the macro-averaged \F{} to 0.38 by adopting a narrower
  interpretation of sentiment spans (in which only polar terms were
  assigned the \textsc{Sentiment} tag). Further evaluation of these
  methods proved the utility of the text-normalization step (which
  raised the macro-\F{} of the CRF method by almost 3\%) and of
  task-specific word embeddings with the least-squares fallback
  (which improved the macro-\F{} of the GRU system by 1.4\%);
\item Afterwards, in Chapter~\ref{chap:cgsa}, we addressed one of the
  most popular objectives in contemporary sentiment
  analysis---message-level sentiment analysis (MLSA). To get a better
  overview of the numerous existing systems, we compared three major
  families of MLSA methods: dictionary-, machine-learning--, and
  deep-learning--based ones, finding that the last two groups
  performed significantly better than the lexicon-based approaches
  (the best macro-\F{}--scores of machine- and deep-learning methods
  ran up to 0.677 and 0.69, respectively, whereas the best
  lexicon-based solution [\citeauthor{Hu:04}, \citeyear{Hu:04}] only
  reached 0.641 macro-\F{}). Apart from this, we improved the results
  of many reimplemented approaches by changing their default
  configuration (\eg{} abandoning the polarity-changing rules of
  lexicon-based systems, using alternative classifiers in ML-based
  systems, or taking the least-squares embeddings for DL-based
  methods). In addition to the numerous reimplementations of popular
  existing algorithms, we also proposed our own
  solution---lexicon-based attention (LBA), in which we tried to unite
  the lexicon and deep-learning paradigms by taking a bidirectional
  LSTM network and explicitly pointing its attention to polar terms
  that appeared in the analyzed messages (a sketch of this mechanism
  also follows this list). With this solution, we not only
  outperformed all alternative DL systems but also improved on the
  scores of ML-based classifiers, attaining 0.69 macro-\F{} and 0.73
  micro-\F{} on the PotTS corpus. In line with our findings from the
  previous chapter, we observed a strong positive effect of text
  normalization and of task-specific embeddings with the
  least-squares approximation;
\item Finally, in the last part, we tried to improve the results of
  the proposed LBA method by making it aware of the discourse
  structure. For this purpose, we segmented all microblogs from the
  PotTS and SB10k corpora into elementary discourse units (EDUs),
  analyzed each of these segments individually with our MLSA
  classifier, and then estimated the overall polarity of a tweet by
  joining the polarity scores of its EDUs over the RST tree. We
  proposed three different ways of performing this joining: latent
  CRFs, latent-marginalized CRFs, and the Recursive Dirichlet Process
  (a much-simplified sketch of such a joining step closes the
  examples below), obtaining better results than existing
  discourse-aware sentiment methods and also outperforming the
  original discourse-unaware baseline. In the concluding experiments,
  we further improved these scores by using manually annotated RST
  trees and richer subsets of discourse relations.
\end{itemize}
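To make these milestones more tangible, we close this recapitulation
with three short code sketches of the proposed methods. The first one
illustrates the linear-projection idea from the lexicon-generation
chapter. It is a minimal sketch rather than the original
implementation: it assumes plain \texttt{numpy} word vectors and uses
the difference of the seed means as the projection line---one simple
closed-form choice that, for a unit-norm direction, maximizes the
summed signed distance between the projections of opposite-polarity
seeds.
\begin{verbatim}
import numpy as np

def linear_projection_scores(embeddings, pos_seeds, neg_seeds):
    # embeddings: dict mapping words to numpy vectors;
    # pos_seeds/neg_seeds: words with known semantic orientation.
    pos_mean = np.mean([embeddings[w] for w in pos_seeds], axis=0)
    neg_mean = np.mean([embeddings[w] for w in neg_seeds], axis=0)
    # the line separating the projections of the two seed sets
    line = pos_mean - neg_mean
    line /= np.linalg.norm(line)
    # project the embeddings of all words on that line
    proj = {w: float(v @ line) for w, v in embeddings.items()}
    # the distance of a projection to the median serves as the
    # polarity score of the respective term
    median = np.median(list(proj.values()))
    return {w: p - median for w, p in proj.items()}
\end{verbatim}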
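The second sketch shows one possible reading of the lexicon-based
attention mechanism in PyTorch; the layer sizes and the exact way of
biasing the attention towards lexicon hits (a learned additive bonus)
are illustrative assumptions, not the configuration evaluated in
Chapter~\ref{chap:cgsa}.
\begin{verbatim}
import torch
import torch.nn as nn

class LexiconBasedAttention(nn.Module):
    # A simplified stand-in for LBA: a bidirectional LSTM whose
    # attention is explicitly pointed at polar terms.
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=64,
                 n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        # learned bonus added to attention scores of lexicon hits
        self.polar_bonus = nn.Parameter(torch.tensor(1.0))
        self.out = nn.Linear(2 * hidden_dim, n_classes)

    def forward(self, tokens, polar_mask):
        # tokens: (batch, seq) word ids; polar_mask: (batch, seq),
        # 1.0 where a token is a polar term, 0.0 elsewhere
        states, _ = self.lstm(self.emb(tokens))
        scores = self.attn(states).squeeze(-1)
        # point the attention to polar terms before normalizing
        weights = torch.softmax(
            scores + self.polar_bonus * polar_mask, dim=-1)
        context = (weights.unsqueeze(-1) * states).sum(dim=1)
        return self.out(context)
\end{verbatim}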
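The last sketch is a deliberately simplified stand-in for the
discourse-aware joining step: it merely propagates EDU polarity
distributions bottom-up over a binary RST tree, weighting the nucleus
higher than the satellite. The models actually proposed in the last
chapter (latent CRFs, latent-marginalized CRFs, and the Recursive
Dirichlet Process) are considerably more elaborate, and the nucleus
weight below is a made-up parameter.
\begin{verbatim}
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class RSTNode:
    # leaves carry the polarity distribution of a single EDU, as
    # predicted by the message-level classifier
    scores: Optional[np.ndarray] = None
    nucleus: Optional["RSTNode"] = None
    satellite: Optional["RSTNode"] = None

def join(node, nucleus_weight=0.7):
    # recursively merge the EDU scores over the RST tree,
    # favoring the nucleus over the satellite
    if node.scores is not None:  # leaf EDU
        return node.scores
    return (nucleus_weight * join(node.nucleus, nucleus_weight)
            + (1.0 - nucleus_weight)
            * join(node.satellite, nucleus_weight))
\end{verbatim}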
\section*{Conclusions}
%% \addcontentsline{toc}{section}{Conclusions}
Now that we have gone past all these landmarks, it is time to unpack
the questions that we asked ourselves at the beginning of this
endeavor and to answer them anew, equipped with all the knowledge
that we have acquired during our run. Here we go:
\begin{itemize}
\item\textbf{Can we apply opinion mining methods devised for
standard English to German Twitter?}
  Yes, we can, but the success of these approaches may vary
  significantly depending on the task, the size and reliability of
  the training data, and the evaluation metric used. For example,
  dictionary-based lexicon methods achieved fairly good results on
  their objective, but this success was mostly due to the high
  quality of the \textsc{GermaNet} annotation. On the other hand, our
  manually labeled PotTS corpus was evidently too small for
  fine-grained sentiment systems, which failed to generalize to
  unseen tweets despite their very high scores on the training set.
  Message-level sentiment approaches, in contrast, seemed quite
  content with the size of the training dataset, attaining good
  results on both corpora (PotTS and SB10k). Nevertheless, we again
  experienced a lack of data while working on discourse-aware
  enhancements, many of which hit the same macro-averaged \F{}
  ceiling.
Apart from these difficulties arising from insufficient data, we
also noticed a significant degradation of the scores for systems
whose original tasks and evaluation metrics were different from
ours. For example, the lexicon generation method of
\citet{Esuli:05} was originally designed to assign polarity scores
  to all \emph{synsets} found in \textsc{WordNet} and not to
produce a list of polar \emph{words}. Similarly, the RNTN
approach of \citet{Socher:13} was trained and evaluated on all
syntactic subtrees of a document and not only at the top text
  level. Likewise, the system of~\citet{Yessenalina:11} was devised
  for ordinal logistic regression and not for polarity
  classification, as in our case. As a result, all these approaches
showed lower scores than their competitors in our evaluation, even
though they are undoubtedly well suited for their original data
and tasks.
Due to the high diversity of methods, metrics, and tasks, it is
difficult to provide a general recipe for transferring existing
  English sentiment systems to German Twitter, but we would still
  like to formulate at least a few rules of thumb that emerged
  during our experiments:
\begin{itemize}
\item\textbf{Prefer methods that are closest to your training
objective} and that were trained under similar conditions
w.r.t.\ the amount of data, their class distribution and
domain;
  \item\textbf{Put every single setting of these methods into
      question}---bear in mind that things that work well in the
    original setting are not guaranteed to work in
    yours.\footnote{In this respect, it is important to
      realize that every classification task is merely an attempt
      to solve a system of equations, so that methods that are
      good at solving one system might completely fail at
      another.} The more options you try, the better your results
    will be;
\item\textbf{Try using manually labeled resources for your
target domain}, if they are available, but pay attention to
the quality of their annotation---it often matters more than
the corpus size;
  \item If manually annotated data are available, \textbf{prefer
      machine-learning methods to hard-coded rules}---they will
    penalize their bad components automatically;
\item\textbf{Do not use randomly initialized word embeddings for
deep-learning systems}---initialize them with language-model
vectors (which are cheap to obtain). Otherwise, your model
might get stuck in a very bad local optimum.
\end{itemize}
\item\textbf{Which groups of approaches are best suited for which
sentiment tasks?}
Based on our evaluation, we answer this question as follows:
\begin{itemize}
  \item\emph{Sentiment lexicon generation} is more amenable to
    dictionary-based solutions, provided that a sufficiently big,
    reliably annotated lexical taxonomy exists for these systems.
    In the absence of such a resource, it is better to resort to
    word-embedding--based algorithms;
\item With a limited amount of training data, \emph{fine-grained
sentiment analysis} can be better addressed with probabilistic
graphical models, such as conditional random fields with
hand-crafted features;
  \item On the other hand, plain \emph{message-level sentiment
    analysis} can be efficiently tackled with both machine- and
    deep-learning algorithms, such as SVMs, logistic regression, or
    RNNs;
  \item But probabilistic graphical models strike back in
    \emph{discourse-aware sentiment analysis}, where they might
    even outperform pure neural-network solutions, although the
    margin of these improvements is not that large.
\end{itemize}
  Thus, probabilistic models can still hold their ground when it
  comes to structured prediction, but both their difference from
  and their advantage over neural networks are gradually vanishing.
\item\textbf{How much do word- and discourse-level analyses affect
message-level sentiment classification?}
  Our evaluation in Section~\ref{cgsa:subsec:eval:lexicons} showed
  that the macro-averaged \F{}-scores of our proposed lexicon-based
  attention system varied by up to 14\% (from 0.64 to 0.69
  macro-\F{} on the PotTS corpus, and from 0.44 to 0.58 on SB10k)
  depending on the lexicon used by this approach. At the same time,
  discourse enhancements could only improve the results of LBA by at
  most 1.5\% (from 0.677 to 0.678 on PotTS, and from 0.557
  to 0.572 on SB10k). Although it might appear as if the lexicon
  component were more important to a sentiment system, we would like
  to forestall such an incorrect conclusion, because
\begin{inparaenum}[(a)]
\item a full-fledged sentiment solution should take into account
both linguistic levels (words and discourse) and
\item these relative results might look different if we expand
the analyzed domain to longer documents or apply
discourse-aware methods to complete discussion threads.
\end{inparaenum}
\item\textbf{Does text normalization help analyze sentiments?}
Yes, it definitely does. As we could see in
Chapters~\ref{chap:fgsa} and~\ref{chap:cgsa}, normalization
significantly improves the quality of fine-grained and
message-level sentiment analyses, boosting the results on the
former task by up to 4\% (see
Table~\ref{snt-fgsa:tbl:normalization}) and improving the
macro-averaged \F{}-measure of message-level sentiment methods by
up to 25\% (see Table~\ref{snt-cgsa:tbl:res-no-normalization}).
  The only question that remained unanswered in this context is
  exactly which normalization steps improve the scores of sentiment
  systems. To make up for this omission, we deactivated each step of
  our text-normalization pipeline (unification of Twitter phenomena,
  spelling correction, and normalization of slang terms) in turn and
  reran our message-level classification experiments with the
  lexicon-based attention system. As we can see from the results in
Table~\ref{afterword:tbl:lba-normalization-steps}, the
micro-averaged \F{}-scores on both datasets benefit most from the
unification of Twitter-specific phenomena, sinking by almost 19\%
  when this component is deactivated. This step is also the most
  useful one for the macro-\F{} on the SB10k corpus, whereas the
  macro-average on PotTS capitalizes mostly on the normalization of
  slang terms.
\begin{table}[htb!]
\begin{center}
\bgroup\setlength\tabcolsep{0.1\tabcolsep}\scriptsize
      \begin{tabular}{p{0.162\columnwidth} % first column
*{9}{>{\centering\arraybackslash}p{0.074\columnwidth}} % next nine columns
*{2}{>{\centering\arraybackslash}p{0.068\columnwidth}}} % last two columns
\toprule
\multirow{2}*{\bfseries Method} & %
\multicolumn{3}{c}{\bfseries Positive} & %
\multicolumn{3}{c}{\bfseries Negative} & %
\multicolumn{3}{c}{\bfseries Neutral} & %
\multirow{2}{0.068\columnwidth}{\bfseries\centering Macro\newline \F{}$^{+/-}$} & %
\multirow{2}{0.068\columnwidth}{\bfseries\centering Micro\newline \F{}}\\
\cmidrule(lr){2-4}\cmidrule(lr){5-7}\cmidrule(lr){8-10}
& Precision & Recall & \F{} & %
Precision & Recall & \F{} & %
Precision & Recall & \F{} & & \\\midrule
\multicolumn{12}{c}{\cellcolor{cellcolor}PotTS}\\
%% with normalization
with normalization & 0.76 & 0.84 & 0.79 & %
0.6 & 0.56 & 0.58 & %
0.75 & 0.68 & 0.72 & %
0.69 & 0.73\\
%% without replacement of Twitter-specific phenomena
%% General Statistics:
%% precision recall f1-score support
%% positive 0.51 0.87 0.64 679
%% negative 0.57 0.40 0.47 268
%% neutral 0.68 0.22 0.34 593
%% micro avg 0.54 0.54 0.54 1540
%% macro avg 0.59 0.50 0.48 1540
%% weighted avg 0.59 0.54 0.49 1540
%% Macro-Averaged F1-Score (Positive and Negative Classes): 55.53%
%% Micro-Averaged F1-Score (All Classes): 53.7662%
w/o unification of Twitter phenomena & 0.51\negdelta{0.25} & 0.87\posdelta{0.03} & 0.64\negdelta{0.05} & %
0.57\negdelta{0.03} & 0.4\negdelta{0.16} & 0.47\negdelta{0.11} & %
0.68\negdelta{0.07} & 0.22\negdelta{0.46} & 0.34\negdelta{0.38} & %
0.56\negdelta{0.13} & 0.54\negdelta{0.19}\\
%% without spelling correction
%% General Statistics:
%% precision recall f1-score support
%% positive 0.67 0.84 0.74 648
%% negative 0.61 0.34 0.44 268
%% neutral 0.74 0.68 0.71 626
%% micro avg 0.69 0.69 0.69 1542
%% macro avg 0.67 0.62 0.63 1542
%% weighted avg 0.69 0.69 0.68 1542
%% Macro-Averaged F1-Score (Positive and Negative Classes): 59.01%
%% Micro-Averaged F1-Score (All Classes): 68.9364%
w/o spelling correction & 0.67\negdelta{0.09} & 0.84 & 0.74\negdelta{0.05} & %
0.61\posdelta{0.01} & 0.34\negdelta{0.22} & 0.44\negdelta{0.14} & %
0.74\negdelta{0.01} & 0.68 & 0.71\negdelta{0.01} & %
0.59\negdelta{0.1} & 0.69\negdelta{0.04}\\
%% without slang normalization
%% General Statistics:
%% precision recall f1-score support
%% positive 0.59 0.87 0.70 650
%% negative 0.60 0.17 0.26 268
%% neutral 0.72 0.60 0.65 624
%% micro avg 0.64 0.64 0.64 1542
%% macro avg 0.64 0.54 0.54 1542
%% weighted avg 0.65 0.64 0.61 1542
%% Macro-Averaged F1-Score (Positive and Negative Classes): 48.26%
%% Micro-Averaged F1-Score (All Classes): 63.6187%
w/o slang normalization & 0.59\negdelta{0.17} & 0.87\posdelta{0.03} & 0.7\negdelta{0.09} & %
0.6 & 0.17\negdelta{0.39} & 0.26\negdelta{0.32} & %
0.72\negdelta{0.03} & 0.6\negdelta{0.08} & 0.65\negdelta{0.07} & %
0.48\negdelta{0.21} & 0.64\negdelta{0.09}\\
\multicolumn{12}{c}{\cellcolor{cellcolor}SB10k}\\
%% with normalization
with normalization & 0.6 & 0.72 & 0.66 & %
0.47 & 0.42 & 0.44 & %
0.84 & 0.8 & 0.82 & %
0.55 & 0.73\\
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% without replacement of Twitter-specific phenomena
%% cgsa_sentiment train -t lba --lstsq -l cgsa/data/lexicons/linproj.word2vec.kim_hovy_seedset.auto.txt ~/data/CGSA/SB10k/preprocessed-no-noise-cleaner/train/train.tsv ~/data/CGSA/SB10k/preprocessed-no-noise-cleaner/dev/dev.tsv
%% cgsa_sentiment test ~/data/CGSA/SB10k/preprocessed-no-noise-cleaner/test/test.tsv > ~/data/CGSA/SB10k/preprocessed-no-noise-cleaner/predicted/lba.test
%% General Statistics:
%% precision recall f1-score support
%% positive 0.36 0.85 0.50 354
%% negative 0.60 0.25 0.35 212
%% neutral 0.84 0.51 0.63 930
%% micro avg 0.55 0.55 0.55 1496
%% macro avg 0.60 0.53 0.49 1496
%% weighted avg 0.69 0.55 0.56 1496
%% Macro-Averaged F1-Score (Positive and Negative Classes): 42.52%
%% Micro-Averaged F1-Score (All Classes): 55.1471%
w/o unification of Twitter phenomena & 0.36\negdelta{0.24} & 0.85\posdelta{0.13} & 0.5\negdelta{0.16} &%
0.6\posdelta{0.13} & 0.25\negdelta{0.17} & 0.35\negdelta{0.09} & %
0.84 & 0.51\negdelta{0.29} & 0.63\negdelta{0.19} & %
0.43\negdelta{0.12} & 0.55\negdelta{0.18}\\
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% without spelling correction
%% Commands:
%% cgsa_sentiment train -t lba --lstsq -l cgsa/data/lexicons/linproj.word2vec.kim_hovy_seedset.auto.txt ~/data/CGSA/SB10k/preprocessed-no-misspelling-restorer/train/train.tsv ~/data/CGSA/SB10k/preprocessed-no-misspelling-restorer/dev/dev.tsv
%% cgsa_sentiment test ~/data/CGSA/SB10k/preprocessed-no-misspelling-restorer/test/test.tsv > ~/data/CGSA/SB10k/preprocessed-no-misspelling-restorer/predicted/lba.test
%% General Statistics:
%% precision recall f1-score support
%% positive 0.54 0.71 0.61 354
%% negative 0.54 0.26 0.35 212
%% neutral 0.79 0.79 0.79 930
%% micro avg 0.70 0.70 0.70 1496
%% macro avg 0.62 0.59 0.58 1496
%% weighted avg 0.70 0.70 0.69 1496
%% Macro-Averaged F1-Score (Positive and Negative Classes): 48.12%
%% Micro-Averaged F1-Score (All Classes): 69.5856%
w/o spelling correction & 0.54\negdelta{0.06} & 0.71\negdelta{0.01} & 0.61\negdelta{0.05} & %
0.54\posdelta{0.07} & 0.26\negdelta{0.16} & 0.35\negdelta{0.09} & %
0.79\negdelta{0.05} & 0.79\negdelta{0.01} & 0.79\negdelta{0.03} & %
0.48\negdelta{0.07} & 0.7\negdelta{0.03}\\
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% without slang normalization
%% Commands:
%% cgsa_sentiment train -t lba --lstsq -l cgsa/data/lexicons/linproj.word2vec.kim_hovy_seedset.auto.txt ~/data/CGSA/SB10k/preprocessed-no-slang-normalization/train/train.tsv ~/data/CGSA/SB10k/preprocessed-no-slang-normalization/dev/dev.tsv
%% cgsa_sentiment test ~/data/CGSA/SB10k/preprocessed-no-slang-normalization/test/test.tsv > ~/data/CGSA/SB10k/preprocessed-no-slang-normalization/predicted/lba.test
%% General Statistics:
%% precision recall f1-score support
%% positive 0.55 0.71 0.62 354
%% negative 0.64 0.20 0.30 212
%% neutral 0.78 0.82 0.80 930
%% micro avg 0.70 0.70 0.70 1496
%% macro avg 0.66 0.57 0.57 1496
%% weighted avg 0.71 0.70 0.69 1496
%% Macro-Averaged F1-Score (Positive and Negative Classes): 45.97%
%% Micro-Averaged F1-Score (All Classes): 70.3877%
w/o slang normalization & 0.55\negdelta{0.05} & 0.71\negdelta{0.01} & 0.62\negdelta{0.04} & %
0.64\posdelta{0.17} & 0.2\negdelta{0.22} & 0.3\negdelta{0.14} & %
0.78\negdelta{0.06} & 0.82\posdelta{0.02} & 0.8\negdelta{0.02} & %
0.46\negdelta{0.09} & 0.7\negdelta{0.03}\\\bottomrule
\end{tabular}
\egroup{}
\caption{LBA$^{(1)}$ results without single text normalization
steps}\label{afterword:tbl:lba-normalization-steps}
\end{center}
\end{table}
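  For reference, the two headline columns of
  Table~\ref{afterword:tbl:lba-normalization-steps} can be computed
  as in the following minimal sketch, which assumes that
  \texttt{gold} and \texttt{pred} are lists of labels drawn from
  \texttt{positive}, \texttt{negative}, and \texttt{neutral} and
  relies on \texttt{scikit-learn}:
\begin{verbatim}
from sklearn.metrics import f1_score

def macro_f1_pos_neg(gold, pred):
    # macro-average over the two polar classes only (macro F^{+/-})
    return f1_score(gold, pred, labels=["positive", "negative"],
                    average="macro")

def micro_f1_all(gold, pred):
    # micro-average over all three classes (micro F)
    return f1_score(gold, pred, average="micro")
\end{verbatim}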
\item\textbf{Can we do better than existing approaches?}
Yes, we can:
\begin{itemize}
\item we improved the macro-averaged results of existing
lexicon-generation methods with our proposed linear-projection
algorithm;
\item we increased the scores of fine-grained analysis by
redefining the topologies of CRFs;
\item our lexicon-based attention network outperformed many of
its competitors on message-level classification;
  \item and, finally, we surpassed the discourse-unaware baseline and
    other existing discourse-aware sentiment solutions with the
    proposed latent-marginalized CRFs and the Recursive Dirichlet
    Process.
\end{itemize}
\end{itemize}
\section*{Contributions}
%% \addcontentsline{toc}{section}{Contributions}
Apart from answering the above questions and pushing the state of the
art for several major sentiment tasks on the PotTS and SB10k corpora,
we have also paved the way for other researchers who want to work on
the same topics by releasing the data and the code that we used in our
experiments:
\begin{itemize}
\item the Potsdam Twitter Sentiment (PotTS) corpus is available at:\\
\url{https://github.com/WladimirSidorenko/PotTS};
\item scripts and executables used in our lexicon generation chapter
can be downloaded
from:\\ \url{https://github.com/WladimirSidorenko/SentiLex};
\item for our text normalization pipeline and fine-grained sentiment
methods, please refer to:\\
\url{https://github.com/WladimirSidorenko/TextNormalization};
\item furthermore, you can find our MLSA approaches
at:\\ \url{https://github.com/WladimirSidorenko/CGSA};
\item and get all discourse-aware solutions
from:\\ \url{https://github.com/WladimirSidorenko/DASA};
\item last but not least, we have released the discourse segmenter,
  the adapted DPLP parser, and our modified version of
  \textsc{RST-Tool}, adjusted for the annotation of multilogues, at:
\begin{itemize}
\item\url{https://github.com/WladimirSidorenko/DiscourseSegmenter},
\item\url{https://github.com/WladimirSidorenko/RSTParser}, and
\item\url{https://github.com/WladimirSidorenko/RSTTool}, respectively.
\end{itemize}
\end{itemize}
In addition to open-sourcing all projects, we have also made a few
attempts to increase the visibility of our research with the following
publications:
\begin{itemize}
\item the rule-based text normalization was described
in~\cite{Sidarenka:13};
\item the PotTS corpus was presented in~\cite{Sidarenka:16};
\item in~\cite{Sidarenka:16b}, we summarized the evaluation of
existing sentiment lexicons (a separate paper on the linear
projection algorithm was withdrawn due to a mistake in the initial
implementation);
\item in~\cite{Sidarenka:16a} and~\cite{Sidarenka:17}, we described
our initial experiments on message-level classification;
\item furthermore, we introduced the SVM-based discourse segmenter
in~\cite{Sidarenka:15};
\item and sketched our pilot study of discourse annotation in
Twitter in~\cite{Sidarenka:15a}.
\end{itemize}
Unfortunately, due to a lack of experience at the initial stage of
working on this dissertation and limited time at its concluding stage,
I\footnote{Throughout this work, we have been using the scientific
  ``we,'' considering the reader a companion in our marathon. But
  since at this point I start describing the limitations of this
  work, for which I alone am responsible, I would like to exclude
  the reader from this criticism by switching to the solitary
  ``I''.} was not able to publish more, or at more prestigious
venues. I apologize for that.
\section*{Limitations}
%% \addcontentsline{toc}{section}{Limitations}
Much to my regret, the initial lack of academic expertise has also
prevented me from running this scientific marathon faster, better,
and, most sadly, through more exciting places. Alas, in the ``Sentiment
Analysis of German Twitter'', I have concentrated more on the
``Sentiment Analysis'' part, much at the cost of the ``German Twitter''
one. I wish I had tried out more sophisticated cross-lingual methods
for adapting English approaches to German, elaborated more on the
linguistic traits of microblogs, and addressed the social aspect of
the Twitter network. The main reason why I have not done all these
things is that, six years ago, when I started working on this
dissertation completely from scratch, with neither code, nor data,
nor any proper plan in my head, I was so overwhelmed by the abundance
of works on opinion mining and text normalization that I decided to
first answer the questions of whether existing sentiment methods
work, whether text normalization helps, and whether I could improve
on these methods. Regrettably, these questions have preoccupied me
pretty much ever since. Another reason why I refrained from
addressing certain topics (why, for example, I did not extend
Rhetorical Structure Theory to multilogues) is that properly handling
these problems would require another dissertation and would certainly
first need to be done for English in order to attract any
international attention.
\section*{Final Remarks}
%% \addcontentsline{toc}{section}{Final Remarks}
Nevertheless, I believe that with this thesis I have essentially built
a theme park on the moon. Although this park is still missing a slot
machine and a few carousels, it does have a nice card table and many
other amusements. Another good thing about it is that new carousels
are now easier to build; the existing rides run for free and better
than in many other places; and, finally, the park itself might still
entertain its occasional guests. You might have enjoyed this theme
park as well, or you might not have---I appreciate either verdict and
would like to thank you in any case for your visit.
\begin{figure*}[htb]
\centering \includegraphics[height=5em]{img/bender.png}
\end{figure*}