>>>> "SJT" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
SJT> Well, as I suppose you know, what Ben has in mind is a statistical
SJT> detector that (eg) can distinguish EUC-JP from EUC-TW or EUC-KR
SJT> (although the really important case is the ISO-8859-X mess and the
SJT> various non-conforming sets like KOI8 and the Windows 12xx sets).
I can say that there is an extremely good scheme for statistical detection
of the various Russian (specifically Russian, not Cyrillic in general)
encodings, devised by S. V. Znamensky. I tried it, and it works really well;
it even handles the "twice-encoded" text that one sees occasionally.
I thought of adding something like this to XEmacs. Now if there is a
common infrastructure for this, I'd be glad to help in that area.
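To make "statistical detection" concrete, here is a minimal sketch in
Python. It is not Znamensky's scheme and not XEmacs code; the function
name, the candidate list, and the set of "common letters" are all
assumptions made for illustration. The idea is only that the right
encoding turns the bytes into text full of frequent lowercase Russian
letters, while a wrong one mostly yields uppercase letters, rare letters,
or box-drawing characters.

```python
# Illustrative sketch only: names and the letter set are assumptions,
# not taken from any existing detector.

# Candidate 8-bit encodings, named as Python's codecs module knows them.
CANDIDATES = ("koi8_r", "cp1251", "cp866", "iso8859_5")

# A handful of the most frequent Russian letters, lowercase on purpose:
# ordinary text is mostly lowercase, and wrong decodings tend to flip case.
COMMON_LETTERS = set("оеаинтсрвл")


def score(data: bytes, encoding: str) -> int:
    """Count frequent lowercase Russian letters after decoding."""
    text = data.decode(encoding, errors="replace")
    return sum(1 for ch in text if ch in COMMON_LETTERS)


def guess_russian_encoding(data: bytes) -> str:
    """Return the candidate encoding with the highest score.

    Ties go to the earlier entry in CANDIDATES.
    """
    return max(CANDIDATES, key=lambda enc: score(data, enc))


if __name__ == "__main__":
    sample = "Привет, это простой тест определения кодировки русского текста."
    for enc in CANDIDATES:
        print(enc, "->", guess_russian_encoding(sample.encode(enc)))
```

A real detector would use proper letter or letter-pair frequencies and
would need far more care with short inputs, such as a mostly-ASCII file
with only a couple of Russian strings in it.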
SJT> But the design is completely new, so we need to retune it. Also
SJT> there seem to be some bugs in coding priorities.
Uhm, I'm now playing with the current XEmacs-beta. It recognizes my
~/.xemacs/init.el as UTF-16, and does not let me change the encoding
with "C-x RET f koi8-r RET" (but "C-x RET c koi8-r C-x C-f" works). The
file itself is mostly ASCII, with two strings in Russian near the end.
Are you interested in such bug reports, and if so, should I send the file
or what? Other files it at least detects as "Raw".
set-language-environment Cyrillic-KOI8 does not help at all.
--alexm