.\" Copyright (c) 2001-2003, Marek -Marki- Podmaka <[email protected]>
.TH mls 1 "June 2003" Utils "User Manuals"
.SH NAME
mls \- Display useful statistics on email messages
.SH SYNOPSIS
.B mls [-hvq] [-l
.IB lang "] [-i"
.IB file "] [-o"
.IB file "] [-r|w|u"
.IB file "] [-t|T"
.IB text "] [-m"
.IB mode "] [-n"
.IB XX "] [-g"
.IB xxxx "]
.SH DESCRIPTION
.B MailListStat
is a program that prints some "useful" statistical information on email messages.
Its main use is for email conferences (mailing lists). Currently it
displays both
.I tables
and
.IR graphs .
You can select either TEXT or HTML output.
.SH OPTIONS
.TP
.B \-h
print help text and exit
.TP
.B \-q
be quiet (print only errors to stderr)
.TP
.B \-v
turn on verbose mode; in this mode more information is printed to
.IR stderr :
an indication of progress (every 10th, 20th, ..., 90th, 100th,
200th, ..., 900th, 1000th, 2000th ... message being processed) and warnings
about malformed headers found
.TP
.BI \-l " lang"
select output language; please note that this applies only to generated
statistics - program messages printed to
.I stderr
are always in English. These languages are currently supported:
.IR EN " (English),"
.IR SK " (Slovak),"
.IR IT " (Italian),"
.IR FR " (Francais),"
.IR DE " (Deutsch),"
.IR ES " (Spanish),"
.IR SR " (Serbian),"
.IR BR " (Portugues Brasil)."
.TP
.BI \-i " file"
name of input file (if not specified, use
.IR stdin ")."
This file should be in MBOX format. It should exist and be readable.
.TP
.BI \-o " file"
name of output file (if not specified, use
.IR stdout ")."
If it exists, it will be overwritten.
.TP
.BI \-r " file"
read input from cache file instead of mailbox. You can read input either
from mailbox or cache file, not both!
.TP
.BI \-w " file"
write cache file (no stats produced). You can either produce text output
or write cache file, not both! When writing cache file, output-related
options are ignored.
.TP
.BI \-u " file"
update cache file: read the cache, read the input, write the cache back. Intended for use with .procmailrc/.forward
.TP
.BI \-t " text"
name of the mailing list these statistics are computed for. If specified, it is
simply appended to the title of the statistics, so it will read "Statistics
from 16.8.2001 to 7.9.2001 for text", where
.I text
is whatever you pass as this parameter (it could be the name of the mailing list
or just its email address, e.g.
.IR "[email protected]" ")."
.TP
.BI \-T " text"
title text (only this will be printed as the title); this can be used to suppress
the normal title text (dates of the oldest/newest message) and completely replace it with
your text.
.TP
.BI \-m " mode"
select mode of output (text, html, html2).
.TP
.BI \-n " XX"
show TOP XX tables (default TOP 10). By default,
.B mls
displays tables of the TOP 10 people, subjects, quoting or whatever. Using this
parameter, you can define how many lines these tables should have.
.TP
.BI \-g " xxxx"
graphs to show (Day, Week, Month, Year, X = none); specify the first letters
(e.g. -g dmy).
.SH EXIT STATUS
.TP
.B 0
Everything went OK and no error occurred.
.TP
.B 1
Error in sscanf() while reading and parsing the cache file. This means that the format
of the cache file is invalid. Try creating the cache file again.
.TP
.B 2
Invalid command-line option/language. You have specified an invalid
command-line parameter.
.TP
.B 3
Cannot open input/output file. Please check that you have typed the correct
filename and that you have read permission for the input file and write
permission for the destination directory (because the output file must be created).
If the output file exists, it is overwritten.
.TP
.B 4
Not enough memory is available for dynamically allocated variables. This
could be caused by user limits, because
.B mls
requires only a few MB of memory (depending on the number of messages processed
and the number of different subjects and authors).
.TP
.B 5
Error compiling regex. This error should not occur in publicly released
versions.
.SH USAGE
.SS Input
The input should be a mailbox file in standard MBOX format. If the file
is in a different format, the results are unpredictable. There should be at
least one email message, otherwise no stats can be computed.
.B Warning:
Be sure that there are no special messages in the input files (such as the one
with the "DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA" subject), because
they will be analysed as well. Many programs (POP3/IMAP daemons, email readers)
put their special messages into the mailbox. Such a message is ignored only when
reporting the oldest message found.
.SS Output
Statistics are written to the output file (or stdout if unspecified) in the specified
language. All diagnostic messages are written to stderr and are in English.
The output consists of several kinds of statistical data: tables, graphs and summaries.
.PP
The title has two formats depending on the
.I "-t"
parameter. If it is not specified, the title looks like "Statistics from 16.8.2001 to
7.9.2001", where the first date is the date of the oldest message found in the input and
the second is the date of the newest one. If, for example, the
.I "-t [email protected]"
parameter is given, the title looks like "Statistics from 16.8.2001 to 7.9.2001 for
[email protected]". The problem is that the dates of the oldest and newest messages are often wrong
(due to bad date/time settings on the message author's PC), so you can specify
the entire title using the command-line option
.IR "-T" .
When it is used, only your text is printed as the title, nothing more. There you can
put, for example, something like "Statistics for [email protected]".
.PP
The
.I -g
option specifies which graphs to show: hours of Day, days of Week, days of Month,
months of Year. Use the first letters as the argument to the -g option (so
.I -g dw
will print just hours of Day and days of Week). Use
.I -g x
to disable printing of any graph. For example, you probably don't want to show the graph for months
of Year if you are presenting stats for one month, but for full-year stats you
probably want it.
.SS HTML output
You can choose between two kinds of output: TEXT and HTML. In HTML mode,
.B mls
produces the output as an HTML page. When you specify HTML2 mode, only
the body of the HTML document is produced (no header/footer), so that
you can supply a different HTML header/footer when calling
.B mls
as CGI or when using the
PHP wrapper. The output consists of HTML tables and bar graphs. Almost every
aspect of its appearance can be configured by modifying the CSS style sheet. Please
note that the files
.I "style_mls.css"
and
.I "bar.gif"
must be present in the same
directory as the produced HTML file. You can, however, modify both to best suit
your needs. Everything should be clear after reading the comments in the CSS file and
looking at the produced HTML source.
.PP
I was unsure what type of graphs to produce. I also tried horizontal
bar graphs; if you want to try them, just uncomment the relevant part of the code in
PrintGraphHtml() in mls_text.c.
.SS Cache file support
Instead of producing statistics in text format, you can save all the
generated values/results into a "cache" file. Retrieving information from this
file is very fast, so it is useful for integration with web pages. You
can update the cache file as soon as new mail is received. Users can view
the current stats using
.B mls
as a CGI script. The advantage over static stats is
that the user can choose the language and other options, and the output is generated
in a moment!
.PP
To update the cache file, use the
.I -u
option. It works like this: first, the stats
are loaded from the cache file (which doesn't have to exist), then the new message(s)
to be added are read from stdin (or from the -i file) and added to the stats.
Finally, the updated stats are written back to the cache file. The process
is really quick, because usually only one message is added at a time. This is
useful mainly for updating cache files upon receiving a new message. In the
"examples/" subdir, you can find examples of integration with your .forward
and .procmailrc files. By running MLS more than once, you can generate cache
files for individual months and also for whole years (see examples). Then use
a PHP script to present a list of these cache files to the user.
.PP
The format of cache files changed in version 1.3 because new stats were added.
It now contains version info, so mls can inform you that you have to
re-create the cache file with the new version. Unfortunately, you also have to
re-create cache files when you want new email clients to be recognized in
old (already processed) messages. Note that email client detection was buggy
in 1.2.2 (many clients were not recognized).
.SS PHP wrapper
I have also written a PHP wrapper for
.B mls
to make it more "interactive". It has
two major advantages over plain HTML output from
.BR mls :
the user can choose the output
language and the number of TOP items to show. It works by running
.B mls
with the appropriate command-line options. It is safe, because the only two items taken from the user
are the language and topXX, both of which are checked using a regexp, so running arbitrary
code is not possible. You can also alter
.B mls
output, for example change @ in email addresses to (at) to prevent spamming.
.PP
You can use a normal MBOX file as input, but I recommend using a cache file.
With a cache file, the stats are produced in a moment. To see how
long it took to generate the page, look at the last line of the HTML source. However,
there is a minor speed problem: it takes longer when you ask for a large
topXX (like 999). The problem is the regexp that searches for @. It has to search
the whole
.B mls
output at once, and when the output is large, this takes a while
(1.1 seconds on my 2.1 GHz Pentium 4). I have added an option that uses the
Perl-compatible regex function (preg_replace) instead of the POSIX one (ereg_replace),
if available. This results in MUCH faster execution (50 ms instead of 1.1 s).
.SH NOTES
.SS How is it all computed?
Let's start from the beginning: the format of an MBOX file. It is a plain text
file containing email messages separated by one empty line. Each
message starts with a line like this:
.IR "From [email protected] Thu Aug 16 15:48:58 2001" .
After this line come a few headers, one empty line and the message text.
Storing emails in this format is quite common: your incoming mail is usually
saved in MBOX format, and so are your folders in mail readers like
.BR elm (1),
.BR pine (1),
.BR mutt (1)...
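.PP
As a rough illustration of the layout described above, the following Python sketch
splits a list of mailbox lines into messages and separates the headers from the body.
It is not the code used by
.BR mls ;
the function names and the sample addresses are invented for this example, and it
makes the simplifying assumption that every line beginning with "From " starts a new message.
.PP
.nf
```python
def split_mbox(lines):
    """Group mailbox lines into messages; each starts at a 'From ' line."""
    messages = []
    current = None
    for line in lines:
        if line.startswith("From "):
            # A new envelope line closes the previous message, if any.
            if current is not None:
                messages.append(current)
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append(current)
    return messages


def split_headers_body(message_lines):
    """Headers run from after the envelope line to the first empty line."""
    rest = message_lines[1:]
    if "" in rest:
        cut = rest.index("")
        return rest[:cut], rest[cut + 1:]
    return rest, []


# Hypothetical two-message mailbox, one string per line.
sample = [
    "From alice@example.org Thu Aug 16 15:48:58 2001",
    "From: Alice <alice@example.org>",
    "Subject: Hello",
    "",
    "First body line",
    "",
    "From bob@example.org Fri Aug 17 09:00:00 2001",
    "From: bob@example.org",
    "Subject: Re: Hello",
    "",
    "> First body line",
    "Reply",
]

messages = split_mbox(sample)
headers, body = split_headers_body(messages[0])
print(len(messages), headers)
```
.fi
.PP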
Who is the author of an email message? The author is taken from the
.I From:
header field, and everything except the actual email address (such as your full
name) is stripped off using a fairly simple regular expression (regexp).
.PP
The subject is taken from the
.I Subject:
header field. If it contains any
.I "Re:"
prefixes, they are stripped off; up to 5 of them are handled. The counted format
.RI ( "Re[3]:" )
is also supported; The Bat! email client, for example, uses it. MIME decoding is
applied to subject lines (see below).
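.PP
The prefix stripping described above could look roughly like the following Python sketch.
It is only an illustration under stated assumptions, not the routine used by
.BR mls :
the function names are invented, and matching "Re" case-insensitively is a guess.
.PP
.nf
```python
MAX_PREFIXES = 5  # the text above says up to 5 prefixes are handled


def strip_one_reply_prefix(subject):
    """Return subject without one leading Re: or Re[N]: prefix, else None."""
    s = subject.lstrip()
    if s[:2].lower() != "re":  # case-insensitive match is an assumption
        return None
    rest = s[2:]
    if rest.startswith("["):
        close = rest.find("]")
        # Counted form: only digits between the brackets, e.g. Re[3]:
        if close < 2 or not rest[1:close].isdigit():
            return None
        rest = rest[close + 1:]
    if not rest.startswith(":"):
        return None
    return rest[1:].lstrip()


def normalize_subject(subject):
    """Strip at most MAX_PREFIXES reply prefixes from the subject."""
    for _ in range(MAX_PREFIXES):
        stripped = strip_one_reply_prefix(subject)
        if stripped is None:
            break
        subject = stripped
    return subject.strip()


print(normalize_subject("Re: Re[3]: Hello"))
```
.fi
.PP
With this sketch, a subject such as "Report: status" is left untouched, because "Re" is not followed by a colon or a counter.
.PP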
The date is simply everything in the
.I Date:
header. This header is generated by
the email client, so it is the date of message creation, and it does not have to
be present in every message. If it is missing, you are warned by a message like
"Warning: 1 message(s) not counted." in the output. Some clients don't put the
full date there (usually the day of the week is missing), in which case you are warned as well.
No timezones are considered; the date is taken as-is.
.PP
The message size is everything between the end of the message header and the beginning of
the next email (or the end of the file). So only the actual size of the message text (body) is
counted, not the headers.
.PP
Email clients are taken from the
.I X-Mailer:
or
.I User-Agent:
or
.I X-Newsreader:
headers, and some grouping is done to prevent different versions of the same
mailer from taking over the whole TOP 10. There is also a workaround for the Pine mailer
(MLS also searches the
.I Message-ID:
header).
.SS What is quoting? Why is mine 95%?
What is quoting? When you reply to a message, you can insert part of the
original message into your reply; you quote the author of the original message. Every line
of the original text is usually prefixed with
.I >
or
.IR "MP>" ","
where MP are the initials of the original sender's name (for example, The Bat! uses
this second format).
.PP
And what is the "quote ratio"? It is the size of the quoted text divided by the total size
of the message, expressed in percent. It is included in the stats because many
people reply to a message by adding one line of text and leaving, for
example, 10 pages of the original text in place, which pushes the quote ratio
above 90%! In the days of FIDONET, there were conferences where a quote
ratio higher than 50% was forbidden. Try to think about it when replying
to a message in a mailing list where more than 300 people will download and
read it.
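.PP
The ratio defined above can be sketched in Python as follows. This is an illustration only:
how
.B mls
actually recognizes quoted lines is not documented here, so the heuristic in is_quoted()
(a ">" preceded by at most three letters of initials) and both function names are assumptions,
and sizes are counted in characters with one extra for each line ending.
.PP
.nf
```python
def is_quoted(line):
    """Assumed heuristic: '>' at the start, or after up to 3 letters."""
    s = line.lstrip()
    marker = s.find(">")
    if marker < 0 or marker > 3:
        return False
    return marker == 0 or s[:marker].isalpha()


def quote_ratio(body_lines):
    """Size of quoted lines divided by total body size, in percent."""
    total = sum(len(line) + 1 for line in body_lines)  # +1 per line ending
    quoted = sum(len(line) + 1 for line in body_lines if is_quoted(line))
    return 100.0 * quoted / total if total else 0.0


# Three quoted lines and one new line, all of the same length.
body = ["> one", "> two", "MP> x", "reply"]
print(quote_ratio(body))
```
.fi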
.SS And now all the stats
First come the TOP 10 tables (or TOP XX when using the
.I "-n XX"
parameter). The first table shows the people who have written the most
messages, how many, and what percentage of the total message count that is. The last
row shows "other": the number of messages written by everyone not listed
above and what percentage that is. The second and third tables are similar to this
one: they also show the top authors, but not by the number of messages written.
Authors are sorted by the total (or average) size of all their messages,
excluding quoting (size of the message minus how much was quoted in that message).
The next table shows the most successful subjects and how many messages with
each subject have been posted. Another table shows the most used email clients.
The last table shows the people with the highest quote ratio. It is computed as the sum of
quoted text in all of a person's messages divided by the total size of those messages.
The last row shows an average: the sum of quoted text in all messages divided by
the total size of all messages.
.PP
The next part of the stats consists of graphs. They show how many messages have been
written during different hours of the day, days of the month and days of the week. From
these you can see, for example, when (and how much) people sleep :) or whether they
work during working hours or just write tons of messages...
.PP
The next part contains info about messages which are BEST in something: the message
with the maximum quote ratio, the longest message and some details about the most successful
subject.
.PP
At the end, there is a final summary: the total number of messages, their total
and average size, and the number of different authors and subjects.
.SS MIME (Multipurpose Internet Mail Extensions)
What is it? The original implementation of email permitted only 7-bit ASCII messages.
Over time, a need arose to send international text or even binary
files. MIME defines how these can be encoded into a 7-bit form suitable for
emailing and how to decode them back to human-readable form.
.PP
In an email message, the text (body of the message) can be MIME-encoded, and so can
some headers, for example the Subject and From fields.
.B MLS
tries to find out whether subject lines are MIME-encoded and, if so, tries to
decode them and present them to you in human-readable form. You can read more
about MIME in RFC 1521 and 1522.
.SS Inspiration
I was inspired by a similar DOS program used a few years ago in FIDONET and
the Slovak ULTRANET. It was created by Ivan Friedlander.
.SH BUGS/TODO
.IP \(bu
doesn't support header fields split across multiple lines (you can use
.BR formail (1)
to put them on one line before using MLS)
.IP \(bu
charset conversion in MIME-decoding
.IP \(bu
more stats
.SH VERSION
This man page is written for
.B mls
version
.IR "1.3" .
.SH AUTHOR
.B mls
(MailListStat) is written by Marek -Marki- Podmaka <[email protected]>.
.SH SEE ALSO
Visit
.UR http://freshmeat.net/projects/mls
http://freshmeat.net/projects/mls
.UE
for more information and the latest version of
.BR mls .
.SH COPYING
.B MailListStat
- print useful statistics on email messages
Copyright (C) 2001-2003 Marek Podmaka <[email protected]>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA