#============================================================================
# Enca v1.19-dev (2016-01-07) guess and convert encoding of text files
# Copyright (C) 2000-2003 David Necas (Yeti) <[email protected]>
# Copyright (C) 2009-2016 Michal Cihar <[email protected]>
#============================================================================
TO THE NEXT RELEASE:
(this list must be empty at the time of release)

IN FUTURE:
(should be done, but maybe not right now)
* LCUC check for Cyrillic charsets.
* Backups -- like cp, mv, etc. This will be hard to get right with all the
  silly converters.
* More tests
* Structured documentation (the manual page is ugly)
  - keep a reasonably brief manual page
  - put all the boring doc stuff somewhere else, there are possibilities:
    info: searchable, has links, partly portable, has console viewers
    HTML: poorly searchable, has links, most portable, has console viewers
    TeX (ps): not searchable, no links, portable, most pleasant to read,
              no console viewers
    => use SGML (or info itself?) and generate the others

MAYBE SOMEDAY:
(when I am in the mood for it; items are freely moved here and removed again)
* Detect all-caps texts correctly.
  After several experiments it seems we have to
  - use pair occurrences, at least, with specifically computed
    difference-maximising weights
  - guess in two steps
    - first with uncapitalization and pair weights, and check whether the
      sample looks like natural text (garbageness test, but better)
    - if the first approach fails, do it as we do it now
* design better levels of verbosity/warnings (or: remove the --verbose option,
  keep important messages and remove all others?)
  0: only messages followed by exit(EXIT_FAILURE) (or abort()) are printed
     plus `cannot convert...'
  1: all nonfatal errors/warnings
  2: what converters are tried, what language gets detected (do not duplicate
     --details)
  >2: debug
* _real_ paranoiac behaviour assuring that nothing gets lost and that
  conversion output is either correctly converted text or untouched original
  (requires major redesign of all the conversion stuff)
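The last item above asks that conversion output be either the correctly
converted text or the untouched original. One common way to get that property
when replacing a file in place is to write the new content to a temporary file
in the same directory and rename() it over the original only after everything
has been written. A minimal sketch, assuming POSIX; replace_file_atomically is
a hypothetical helper, not an enca function, and the sketch does not preserve
the original file's permissions or ownership:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of enca): replace the file at `path` with
 * `len` bytes from `data`.  Returns 0 on success, -1 on failure.  On failure
 * the original file is left as it was. */
int replace_file_atomically(const char *path, const void *data, size_t len)
{
    assert(path != NULL && (data != NULL || len == 0));

    /* Temporary name next to the target, so rename() stays within one
     * file system. */
    size_t n = strlen(path);
    char *tmp = malloc(n + sizeof(".XXXXXX"));
    if (tmp == NULL)
        return -1;
    memcpy(tmp, path, n);
    memcpy(tmp + n, ".XXXXXX", sizeof(".XXXXXX"));

    int fd = mkstemp(tmp);   /* creates the file with mode 0600 */
    if (fd == -1) {
        free(tmp);
        return -1;
    }

    const char *p = data;
    size_t left = len;
    int rc;

    /* Write everything, retrying short writes and EINTR. */
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0 && errno == EINTR)
            continue;
        if (w <= 0)
            goto fail;
        p += w;
        left -= (size_t)w;
    }

    /* Push the data to disk before the new name becomes visible. */
    if (fsync(fd) != 0)
        goto fail;
    rc = close(fd);
    fd = -1;
    if (rc != 0)
        goto fail;

    /* The switch itself: readers see either the old or the new file. */
    if (rename(tmp, path) != 0)
        goto fail;

    free(tmp);
    return 0;

fail:
    if (fd != -1)
        close(fd);
    unlink(tmp);             /* drop the partial output, keep the original */
    free(tmp);
    return -1;
}
```

This only covers the final replacement step. Output produced by an external
converter would still have to be validated before it is handed to such a
helper, which is presumably part of the redesign the item mentions.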
NEVER:
(you can do anything GNU GPL v2 allows, but I'll refrain)
* features that nobody needs (mm, well, ... ok, let it be)
* duplicate other tools' functionality more than necessary, use them instead
* dependency on anything that is not ISO C and/or POSIX (moreover do not use
  braindead features of both); important functionality must be present
  everywhere; nevertheless, enca can be smaller, faster or cleverer on some
  (GNU) systems
* localization; please correct my English instead ;->
* converter calling generalization (would require including the whole wordexp
  thing in enca, and: launching an external converter is a Bad Thing(TM)
  anyway)
* data in run-time files (needs parser (could live with) and disallows hooks
  (can't live without))
* loadable module support (it's not very portable)

-------------
KNOWN ISO C CONFLICTS:
(perhaps to be solved someday)
All constants and typedefs. They start with ENCA_ and Enca, but:
  Names beginning with a capital `E' followed a digit or uppercase
  letter may be used for additional error code names. [errno.h]
And additionally inside libenca (i.e. not so serious):
* libenca.h: #define EPSILON [errno.h]
* filters.c: isvbox[] [ctype.h]
* guess.c: #define isbinary [ctype.h]
* guess.c: #define istext [ctype.h]
* multibyte.c: is_valid_utf7() [ctype.h]
* multibyte.c: is_valid_utf8() [ctype.h]
Some probably can't conflict.