\section{Introduction}\label{sec:introduction}
\subsection{Arabic Language}\label{sec:arabic-language}
\begin{table}[!t]
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{c c c c c c}
\toprule
\textbf{\small{Diacritics}} & {\small{\textit{without}}} & {\small{\textit{fat-ha}}} & {\small{\textit{dam-ma}}} & {\small{\textit{kas-ra}}} & {\small{\textit{sukun}}} \\
\midrule
\textbf{\small{writing}} & \textarabic{د} & \textarabic{دَ} & \textarabic{دُ} & \textarabic{دِ} & \textarabic{دْ}\\
\textbf{\small{short vowel}} & --- & /\textit{a}/ & /\textit{u}/ & /\textit{i}/ & /\textit{no vowel}/\\
\bottomrule
\end{tabular}
}
  \caption{\textit{The four diacritics of the Arabic language: transliterated names (1st row), writing
    style on the example letter \textarabic{د}} (2nd row), and the corresponding short
  vowel (3rd row).}\label{arabic:diacritics_dal}
\end{table}
Arabic is the fifth most widely spoken language~\cite{Simons201720thEditionEthnologue}. It is
written from right to left (RTL). Its alphabet consists of 28 primary letters and 8 further letters
derived from them, for a total of 36 letters. The writing system is cursive:
most letters are joined to their neighbors, and only a few remain disjoint.
Each Arabic letter represents a consonant, which means that short vowels are not represented by the
36 letters. Hence the need arises for \textit{diacritics}, which are symbols ``decorating'' the
original letters. Usually, a \textit{diacritic} is written above or below a letter to indicate the
short vowel accompanying that letter. There are 4 diacritics: \mbox{\textarabic{◌َ} \textarabic{◌ُ}
  \textarabic{◌ِ} \textarabic{◌ْ}}. Table~\ref{arabic:diacritics_dal} lists these 4 diacritics on an
example letter \textarabic{د}, their transliterated names, and the short vowels they
represent. Each of the three diacritics \mbox{\textarabic{◌َ} \textarabic{◌ُ} \textarabic{◌ِ}} is called a \textit{harakah}, whereas the fourth, \textarabic{◌ْ}, is called
\textit{sukun}. Diacritics only make short vowels explicit; writing them is not
compulsory, since they can mostly be inferred from the grammatical rules and the semantics of the
text. Moreover, a phrase with diacritics written on only some letters is still linguistically sound.
There are two more sub-diacritics built on the basic four. The first is known as \textit{shaddah}
\textarabic{◌ّ}, which must be combined with one of the three \textit{harakah} and is written as
\mbox{\textarabic{◌َّ} \textarabic{◌ُّ} \textarabic{◌ِّ}}. \textit{Shaddah} is a shorthand for the
case where a letter appears twice in a row, the first occurrence carrying
\textit{sukun} and the second carrying a \textit{harakah}. The pair is then
written as a single occurrence carrying \textit{shaddah} together with the corresponding
\textit{harakah}. E.g., \mbox{\textarabic{دْدَ}} is written as \textarabic{دَّ}. The second is known as
\textit{tanween}, which likewise corresponds to one of the three \textit{harakah} and is written as
\mbox{\textarabic{◌ٍ} \textarabic{◌ٌ} \textarabic{◌ً}}. \textit{Tanween} accompanies the last letter of
some words ending with a \textit{harakah}, according to Arabic grammar. It merely reminds
the reader to pronounce the word as if \textarabic{نْ} (sounding as \textit{/n/}) followed
that \textit{harakah}. However, it is only a phone and is not part of the word itself. E.g.,
\textarabic{رَجُلٌ} is pronounced \mbox{\textarabic{رَجُلُ + نْ}} and \textarabic{رَجُلٍ} is pronounced
\mbox{\textarabic{رَجُلِ + نْ}}.
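
The two shorthand rules above can be illustrated with a short Python sketch. It is only an
illustration under stated assumptions and is not the processing used in this paper: the function
name \texttt{expand\_marks} is hypothetical, the input is assumed to start with a letter, and
\textit{shaddah} is accepted either before or after its \textit{harakah}. The sketch rewrites
\textit{shaddah} as two occurrences of the letter (\textit{sukun}, then \textit{harakah}) and
\textit{tanween} as a \textit{harakah} followed by \textarabic{نْ}.

```python
# Unicode code points of the Arabic marks discussed above.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SUKUN, SHADDA = "\u0652", "\u0651"
NOON = "\u0646"
# Each tanween form maps to the harakah it stands for.
TANWEEN = {"\u064B": FATHA, "\u064C": DAMMA, "\u064D": KASRA}
MARKS = {FATHA, DAMMA, KASRA, SUKUN, SHADDA} | set(TANWEEN)


def expand_marks(text):
    """Rewrite shaddah and tanween using only harakah and sukun (hypothetical helper)."""
    out = []
    i = 0
    while i < len(text):
        base = text[i]
        i += 1
        # Collect the marks that directly follow this character.
        marks = []
        while i < len(text) and text[i] in MARKS:
            marks.append(text[i])
            i += 1
        if SHADDA in marks:
            # Shaddah: the letter occurs twice, first with sukun.
            out.append(base + SUKUN)
        out.append(base)
        for mark in marks:
            if mark == SHADDA:
                continue
            if mark in TANWEEN:
                # Tanween: the harakah followed by a noon carrying sukun.
                out.append(TANWEEN[mark] + NOON + SUKUN)
            else:
                out.append(mark)
    return "".join(out)


# Dal with shaddah and fatha becomes dal+sukun, dal+fatha.
DAL = "\u062F"
expanded = expand_marks(DAL + SHADDA + FATHA)
```

Here \texttt{expanded} holds the four code points of \textarabic{دْدَ}, matching the example in the text.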
\subsection{Arabic Poetry (\textarabic{الشعر العربى})}\label{sec:arab-poetry-text}
Arabic poetry is the earliest form of Arabic literature; it dates back to the sixth century. Poets
composed poems without knowing exactly which rules make a collection of words a poem. People recognized
poetry by instinct, but only the talented could compose it. This was the case until
\textit{Al-Farahidi} (718--786 CE) analyzed Arabic poems and recognized their patterns. He
observed that the succession of consonants and vowels, and hence of \textit{harakah} and \textit{sukun},
rather than the succession of the letters themselves, produces the patterns (\textit{meters}) that keep
the balanced music of a poem. He recognized fifteen meters. Later, one of his students,
\textit{Al-Akhfash}, discovered one more meter, bringing the total to sixteen. Arabs call meters
\textarabic{بحور}, which means ``seas''~\cite{Moustafa}.
A poem is a collection of verses. An example verse is:%
\begin{Arabic}
\begin{traditionalpoem*}
قفا نبك من ذِكرى حبيب ومنزل\quad & \quad بسِقطِ اللِّوى بينَ الدَّخول فحَوْملِ
\end{traditionalpoem*}
\end{Arabic}
A verse, known in Arabic as \textit{bayt} \textarabic{(بَيت)}, consists of two halves. Each
half is called a \textit{shatr} (\textarabic{شطر}). \textit{Al-Farahidi} introduced
\textit{al-'arud} (\textarabic{العروض}), which is often called the \textit{Knowledge of Poetry} or
the study of poetic meters. He laid down rigorous rules and measures with which we can determine
whether the meter of a poem is sound or broken. To keep the present article fairly self-contained,
a very brief introduction to \textit{al-'arud}, with many details omitted, is provided in
the following lines.
\begin{table}[!tb]
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}[t!]{ccccccccc}
\toprule
\textbf{Foot}& \textarabic{فَعُوْلُنْ} & \textarabic{فَاْعِلُنْ} & \textarabic{مُسْتَفْعِلُنْ} & \textarabic{مَفَاْعِيْلُنْ} &\textarabic{مَفْعُوْلَاْتُ} &\textarabic{فَاْعِلَاْتُنْ} &\textarabic{مُفَاْعَلَتُنْ} &\textarabic{مُتَفَاْعِلُنْ}\\
\midrule
\textbf{Scansion}&\texttt{0/0//}&\texttt{0//0/}&\texttt{0//0/0/}&\texttt{0/0/0//}&\texttt{/0/0/0/}&\texttt{0/0//0/}&\texttt{0///0//}&\texttt{0//0///}\\
\bottomrule
\end{tabular}}
  \caption{The eight feet of Arabic poetry. Every symbol (\texttt{/} or \texttt{0}) represents the
    corresponding diacritic over a letter of a foot: \textit{harakah} (\mbox{\textarabic{◌َ}
      \textarabic{◌ُ} \textarabic{◌ِ}}) or \textit{sukun} (\textarabic{◌ْ}), respectively. Any of the
    three letters \mbox{\textarabic{و ا ى}} (called \textit{mad}) is equivalent to \texttt{0};
    \textit{tanween} and \textit{shaddah} are equivalent to \texttt{0/} and \texttt{/0},
    respectively.}\label{arud:feet}
\end{table}
A meter is an ordered sequence of phonetic syllables (blocks or mnemonics) called \textit{feet}. A
foot is written with letters carrying only \textit{harakah} or \textit{sukun}, i.e., with neither
\textit{shaddah} nor \textit{tanween}; hence, each letter in a foot maps directly to either a
consonant or a vowel. Feet, called \textit{tafa'il} (\textarabic{تفاعيل}), are therefore phonetic
mnemonics of the pronounced poem. Table~\ref{arud:feet} lists the eight feet used
by Arabs and the pattern (scansion) of \textit{harakah} and \textit{sukun} of each foot, where a
\textit{harakah} is represented as \texttt{/} and a \textit{sukun} is represented as
\texttt{0}. Each scansion reads RTL to match the letters of the corresponding foot.
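
The mapping from a diacritized foot to its scansion can be sketched in a few lines of Python. This
is a minimal illustration, not the implementation used in this paper: the names \texttt{scan} and
\texttt{scan\_rtl} are hypothetical, and treating an unmarked \textit{mad} letter (including
\textarabic{ي}) as \texttt{0} is an assumption of the sketch.

```python
# Marks: the three harakah (fatha, damma, kasra) and sukun.
HARAKAT = {"\u064E", "\u064F", "\u0650"}
SUKUN = "\u0652"
# Mad letters: alif, waw, alif maqsura, and (as an assumption) yaa.
MAD = {"\u0627", "\u0648", "\u0649", "\u064A"}


def scan(foot):
    """Return the scansion of a foot in pronunciation (first-letter-first) order."""
    symbols = []
    for i, ch in enumerate(foot):
        if ch in HARAKAT or ch == SUKUN:
            continue  # marks are handled together with their base letter
        mark = foot[i + 1] if i + 1 < len(foot) else ""
        if mark in HARAKAT:
            symbols.append("/")
        elif mark == SUKUN or ch in MAD:
            symbols.append("0")
        else:
            raise ValueError(f"letter at index {i} has neither harakah nor sukun")
    return "".join(symbols)


def scan_rtl(foot):
    """Return the scansion written RTL, as displayed next to the Arabic foot."""
    return scan(foot)[::-1]


# The foot fa-'u-w-lu-n with explicit marks on every letter.
FAULUN = "\u0641\u064E\u0639\u064F\u0648\u0652\u0644\u064F\u0646\u0652"
pattern = scan_rtl(FAULUN)
```

For this foot, \texttt{pattern} is \texttt{0/0//}, the value listed for \textarabic{فَعُوْلُنْ} in Table~\ref{arud:feet}.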
\textit{Al-Farahidi} and his student discovered sixteen combinations of
\textit{tafa'il} in Arabic poems; they called each combination a \textit{meter}
(\textarabic{بحر}). (Theoretically speaking, there is no limit on either the number of
\textit{tafa'il} or their combinations; however, Arabs composed poems using only these sixteen combinations.) A
meter appears in a \textit{verse} twice, once in each \textit{shatr}. E.g., \mbox{\textarabic{وَيُسْأَلُ
    فِي الحَوادِثِ ذو صَواب}} is the first \textit{shatr} of a verse in the \textit{Al-Wafeer} meter
\mbox{\textarabic{مُفَاْعَلَتُنْ مُفَاْعَلَتُنْ فَعُوْلُنْ}}. The pattern of \textit{harakah} and \textit{sukun} of
this meter is \mbox{\texttt{0/0// 0///0// 0///0//}} (RTL), and is obtained directly by replacing
each of the three feet by the corresponding code in Table~\ref{arud:feet}. This pattern corresponds
exactly to the pattern of \textit{harakah} and \textit{sukun} of the pronounced (not written)
\textit{shatr}. E.g., the pronunciation of the first two words and the first two letters of the
third word \mbox{\textarabic{{\color{OliveGreen}وَيُسْأَلُ} {\color{fgred}فِي الْـ}}} has exactly the same
pattern as the first of the three \textit{{tafa'il}} of the meter
\mbox{\textarabic{{\color{OliveGreen}مُفَاعَلَـ}{\color{fgred}تُنْ}}}, and both have the scansion
\texttt{\color{fgred}{0/}\color{OliveGreen}{//0//}}. The colored parts have
matching pronunciation patterns, which shows that the start and end of a word do not have
to coincide with the start and end of a foot. The pronunciation of the rest of the
\textit{shatr} \mbox{\textarabic{حوادث ذو صواب}} maps to the rest of the meter
\mbox{\textarabic{مفاعلتن فعولن}}. Any other poem following
the same meter, i.e., following the same pattern of \textit{harakah} and \textit{sukun}, will have
the same phonetic pattern, regardless of its wording and semantics.
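
Composing the scansion of a meter from its feet, as done above for \textit{Al-Wafeer}, can be
sketched as follows. This is an illustrative Python sketch only: the dictionary keys are
hypothetical transliterations chosen for readability, and the function names are assumptions. The
scansion strings are copied from Table~\ref{arud:feet} in their RTL form.

```python
# RTL scansion of the eight feet, keyed by hypothetical transliterations.
FEET_RTL = {
    "fa'ulun": "0/0//",
    "fa'ilun": "0//0/",
    "mustaf'ilun": "0//0/0/",
    "mafa'ilun": "0/0/0//",
    "maf'ulatu": "/0/0/0/",
    "fa'ilatun": "0/0//0/",
    "mufa'alatun": "0///0//",
    "mutafa'ilun": "0//0///",
}


def meter_scansion_rtl(feet):
    """Join the feet of one shatr into an RTL scansion (first foot is rightmost)."""
    return " ".join(FEET_RTL[name] for name in reversed(feet))


def pronounced_order(scansion_rtl):
    """Drop the foot boundaries and flip the RTL scansion into pronunciation order."""
    return scansion_rtl.replace(" ", "")[::-1]


# Al-Wafeer: mufa'alatun, mufa'alatun, fa'ulun (in pronunciation order).
wafeer = meter_scansion_rtl(["mufa'alatun", "mufa'alatun", "fa'ulun"])
```

Here \texttt{wafeer} equals \texttt{0/0// 0///0// 0///0//}. Because foot boundaries need not
coincide with word boundaries, comparing a \textit{shatr} against a meter would use the
boundary-free string returned by \texttt{pronounced\_order}.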
\begin{table}[!tb]
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}[t!]{lrr}
\toprule
\textbf{Meter Name} & \textbf{Pattern} & \textbf{Scansion}\\
\midrule
\textit{al-Taweel} & \textarabic{فَعُوْلُنْ مَفَاْعِيْلُنْ فَعُوْلُنْ مَفَاْعِلُنْ} & \texttt{0//0// ~~0/0// 0/0/0// ~~0/0//} \\
\textit{al-Kamel} & \textarabic{مُتَفَاْعِلُنْ مُتَفَاْعِلُنْ مُتَفَاْعِلُنْ} & \texttt{0//0/// 0//0/// 0//0///}\\
\textit{al-Baseet} & \textarabic{مُسْتَفْعِلُنْ فَاْعِلُنْ مُسْتَفْعِلُنْ فَاْعِلُنْ} & \texttt{0//0/ 0//0/0/ ~~0//0/ 0//0/0/} \\
\textit{al-Khafeef} & \textarabic{فَاْعِلَاْتُنْ مُسْتَفْعِلُنْ فَاْعِلَاْتُنْ} & \texttt{0/0//0/ 0//0/0/ 0/0//0/}\\
\textit{al-Wafeer} & \textarabic{مُفَاْعَلَتُنْ مُفَاْعَلَتُنْ فَعُوْلُنْ} & \texttt{0/0// 0///0// 0///0//}\\
\textit{al-Rigz} & \textarabic{مُسْتَفْعِلُنْ مُسْتَفْعِلُنْ مُسْتَفْعِلُنْ} & \texttt{0//0/0/ 0//0/0/ 0//0/0/}\\
\textit{al-Raml} & \textarabic{فَاْعِلَاْتُنْ فَاْعِلَاْتُنْ فَاْعِلَاْتُنْ} & \texttt{0/0//0/ 0/0//0/ 0/0//0/}\\
\textit{al-Motakarib} & \textarabic{فَعُوْلُنْ فَعُوْلُنْ فَعُوْلُنْ فَعُوْلُنْ} & \texttt{0/0// ~~0/0// ~~0/0// ~~0/0//}\\
\textit{al-Sar'e} & \textarabic{مُسْتَفْعِلُنْ مُسْتَفْعِلُنْ مَفْعُوْلَاْتُ} & \texttt{/0/0/0/ 0//0/0/ 0//0/0/}\\
\textit{al-Monsareh} & \textarabic{مُسْتَفْعِلُنْ مَفْعُوْلَاْتُ مُسْتَفْعِلُنْ} & \texttt{0//0/0/ /0/0/0/ 0//0/0/}\\
\textit{al-Mogtath} & \textarabic{مُسْتَفْعِلُنْ فَاْعِلَاْتُنْ فَاْعِلَاْتُنْ} & \texttt{0/0//0/ 0/0//0/ 0//0/0/}\\
\textit{al-Madeed} & \textarabic{فَاْعِلَاْتُنْ فَاْعِلُنْ فَاْعِلَاْتُنْ} & \texttt{0/0//0/ ~~0//0/ 0/0//0/}\\
\textit{al-Hazg} & \textarabic{مَفَاْعِيْلُنْ مَفَاْعِيْلُنْ} & \texttt{0/0/0// 0/0/0//}\\
\textit{al-Motadarik} & \textarabic{فَاْعِلُنْ فَاْعِلُنْ فَاْعِلُنْ فَاْعِلُنْ} & \texttt{0//0/ ~~0//0/ ~~0//0/ ~~0//0/}\\
\textit{al-Moktadib} & \textarabic{مَفْعُوْلَاْتُ مُسْتَفْعِلُنْ مُسْتَفْعِلُنْ} & \texttt{0//0/0/ 0//0/0/ /0/0/0/}\\
\textit{al-Modar'e} & \textarabic{مَفَاْعِيْلُنْ فَاْعِلَاْتُنْ فَاْعِلَاْتُنْ} & \texttt{0/0//0/ 0/0//0/ 0/0/0//}\\
\bottomrule
\end{tabular}}
  \caption{The sixteen meters of Arabic poetry: transliterated names (1st col.), mnemonics or
    \textit{tafa'il} (2nd col.), and the corresponding pattern of \textit{harakah} and
    \textit{sukun} in \texttt{0/} format or scansion (3rd col.).}\label{arud:meters}
\end{table}
Table~\ref{arud:meters} lists the transliterated names of all sixteen meters together with their
patterns. Each pattern is written in two equivalent forms: the feet
style using the eight feet of Table~\ref{arud:feet} and the scansion pattern using the \texttt{0/}
symbols. The scansion is written in groups; each group corresponds to one foot, and all are read RTL\@.
\subsection{English poetry}\label{sec:english-poetry}
English poetry dates back to the seventh century. At that time poems were written in
\textit{Anglo-Saxon}, also known as \textit{Old English}. Many political changes have since shaped
the language into its present form. English prosody was not formalized rigorously as a
stand-alone discipline; instead, many tools of \textit{Greek} prosody were borrowed to describe it.
A \textit{syllable} is a unit of pronunciation having one vowel sound, with or without surrounding
consonants. English words consist of one or more syllables. For example, the word \mbox{``water''}
(pronounced as \mbox{\textipa{\sffamily /"wO:t@(r)/}}) consists of two phonetic syllables:
\mbox{\textipa{\sffamily /"wO:/}} and \mbox{\textipa{\sffamily /t@(r)/}}.
Syllables can be either stressed or unstressed, denoted by \textit{/} and
$\times$, respectively. In phonology, stress is a phonetic emphasis given to a syllable, which can
be produced by, e.g., increasing the loudness, lengthening the vowel, or changing the
pitch. In the previous ``water'' example, the first syllable is stressed, which here means it is
pronounced with a higher pitch, whereas the second syllable is unstressed, which means it is
pronounced with a lower pitch. Therefore, ``water'' is a stressed-unstressed word, which can be
denoted by \mbox{\textit{/}$\times$}. Stress is shown in phonetic script using the primary stress
symbol \textipa{\sffamily (")}.
\begin{table}[!tb]
\centering
\resizebox{\columnwidth}{!}{%
\begin{tabular}{cccccccc}
\toprule
\textbf{{Foot}} & \textit{Iamb} & \textit{Trochee} & \textit{Dactyl} & \textit{Anapest} & \textit{Pyrrhic} & \textit{Amphibrach} & \textit{Spondee} \\
\midrule
\textbf{Stresses}& $\times$\textit{/} & \textit{/}$\times$ & \textit{/}$\times\times$ & $\times\times$\textit{/} & $\times\times$ & $\times$\textit{/}$\times$ & \textit{/}\textit{/} \\
\bottomrule
\end{tabular}}
  \caption{The seven feet of English poetry. Every foot is a combination of stressed and unstressed
    syllables, denoted by \textit{/} and $\times$, respectively.}\label{feet}
\end{table}
There are seven different combinations of stressed and unstressed syllables that make up the seven
poetic \textit{feet}, shown in Table~\ref{feet}. Meters are described as sequences of
feet. English meters are \textit{qualitative} meters, in which stressed syllables come at regular
intervals. A meter is defined as the repetition of one of these seven feet one to eight
times. If the foot appears once, the verse is a \textit{monometer}; if it appears twice,
it is a \textit{dimeter}; and so on up to \textit{octameter}, in which the foot
appears eight times. For example, in the following verse the stressed syllables are bold: \mbox{``That
  \textbf{time} of \textbf{year} thou \textbf{mayst} in \textbf{me} be\textbf{hold}''}. This
verse is in \textit{iambic} meter, with the pattern \mbox{$\times$\textit{/}} repeated five times;
so it is an \textit{iambic pentameter} verse.
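
The naming scheme above (one foot repeated one to eight times) can be expressed as a small lookup.
The following Python sketch is illustrative only and is not the classifier studied in this paper:
the function name \texttt{name\_meter} is hypothetical, the ASCII letter \texttt{x} stands for
$\times$, and the adjective forms of the foot names are the conventional English ones.

```python
# The seven feet: stress pattern -> adjective used in the meter name.
FEET = {
    "x/": "iambic",
    "/x": "trochaic",
    "/xx": "dactylic",
    "xx/": "anapestic",
    "xx": "pyrrhic",
    "x/x": "amphibrachic",
    "//": "spondaic",
}
# Names for one to eight repetitions of a foot.
LENGTHS = [
    "monometer", "dimeter", "trimeter", "tetrameter",
    "pentameter", "hexameter", "heptameter", "octameter",
]


def name_meter(stresses):
    """Name a verse whose stress string is one foot repeated 1 to 8 times, else None."""
    for foot, adjective in FEET.items():
        for count, length in enumerate(LENGTHS, start=1):
            if stresses == foot * count:
                return f"{adjective} {length}"
    return None


# "That TIME of YEAR thou MAYST in ME beHOLD": unstressed/stressed, five times.
verse = name_meter("x/x/x/x/x/")
```

For the example verse, \texttt{verse} is \texttt{iambic pentameter}. A stress string that is not an
exact repetition of a single foot yields \texttt{None}, so real verses with metrical substitutions
would need a more tolerant comparison.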
\subsection{Paper Organization}\label{sec:paper-organization}
The rest of this paper is organized as follows. Sec.~\ref{sec:literature-review} reviews the literature
on meter detection in both languages and emphasizes the novelty of our approach and its point of
departure from the literature. Sec.~\ref{sec:datasets} explains the data acquisition steps
and the data repository created by this project and made publicly available for future research; in
addition, this section explains character encoding methods, including our new encoding method, and
how they are applied to Arabic letters in particular. Sec.~\ref{sec:model} explains how the experiments
are designed and conducted. Sec.~\ref{sec:results} presents and interprets the
results of these experiments. Sec.~\ref{sec:discussion} discusses some counter-intuitive results,
relates them to the size of the conducted experiments, and outlines
the remedy planned in future work that is currently under implementation.