>>>>> "SJT" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
>>>>> "Nickolay" == Nickolay Pakoulin <npak(a)ispras.ru> writes:
Nickolay> XEmacs maps codepoints that correspond to the mentioned chars into
Nickolay> chinese and japanese charsets. I don't have ISO-2022 on my hands,
Nickolay> but I am pretty sure that the standard positions those chars in Far
Nickolay> East charsets.
SJT> Ugly, ain't it?
Yes, it is.
SJT> (I think the usual practice is to simply use
SJT> an 8-bit XEmacs with a KOI8-R font, which gives pleasant results with no
SJT> fuss).
Yes, but not under Windows, and not with MULE (because the decoding / encoding
tables are broken in the non-Cyrillic part of the table, see below).
SJT> First, check to see if GNU Emacs has this right. We should be able to
SJT> synch to their implementation for 21.4.
GNU Emacs 21.2 maps codepoints for box drawing to characters with the same
code. That is, when GNU Emacs comes across 130 ("BOX DRAWINGS LIGHT DOWN AND
RIGHT") in a koi8-r input file, it inserts \202 in the buffer. When GNU
Emacs encodes the buffer, it converts \202 back to 130. This is not the best
solution, because character \202 belongs to control-1 and it is NOT "BOX
DRAWINGS LIGHT DOWN AND RIGHT".
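
To show the mismatch outside of Emacs, here is a minimal Python sketch. It
assumes Python's standard koi8_r codec as a reference table; it says nothing
about how GNU Emacs or XEmacs implement the mapping internally. It compares
what byte 130 means in KOI8-R with what the character with the same code is.

```python
import unicodedata

KOI8_BYTE = 130  # 0x82, written \202 in octal

# Meaning of the byte in KOI8-R, according to Python's koi8_r codec
# (used here only as a reference table).
as_koi8 = bytes([KOI8_BYTE]).decode("koi8_r")
koi8_name = unicodedata.name(as_koi8)

# The character that simply has the same code: a C1 control character,
# not a box drawing character.
same_code = chr(KOI8_BYTE)
same_code_category = unicodedata.category(same_code)

print(koi8_name)
print(same_code_category)
```

The round trip 130 -> \202 -> 130 keeps the bytes intact, but the character
in the buffer is not the one the file meant.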
SJT> If not, I think the right way to deal with it in 21.4 is your #2,
SJT> assuming you have KOI8-R fonts. (Caveat, as I said before I don't know
SJT> how to do this on Windows.)
SJT> Create a private 96 charset ...
I have already finished the first draft. See
http://www.ispras.ru/~npak/koi8-pgraph.el
The TODO is to write a ccl program for the charset. But that makes no sense
under Windows, because Windows XEmacs does not use ccl programs (imho) for
charsets. It is not tested under X. Still, it helps to work with koi8-r files
under Windows.
SJT> That's the theory, I don't know if there's anybody left who has done
SJT> this in practice.
The tables for koi8-r are broken. There are two bugs, imho.
Firstly, `cyrillic-koi8-r-decode-table' maps some codepoints to 32 (space),
thus losing information and damaging the data. Just opening and saving a
file (without editing) alters the file. The effect is not recoverable.
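
The first bug can be sketched in Python as follows. The lossy table below is
hypothetical: it only mimics the behaviour described above (Cyrillic letters
kept, every other high byte turned into a space) and is not a copy of
`cyrillic-koi8-r-decode-table'. Python's koi8_r codec serves as the reference
for a correct table.

```python
def make_lossy_decode_table():
    """Hypothetical table: ASCII and Cyrillic kept, other high bytes -> space."""
    table = {}
    for b in range(256):
        ch = bytes([b]).decode("koi8_r")
        if b < 0x80 or "\u0400" <= ch <= "\u04ff":
            table[b] = ch
        else:
            table[b] = " "  # the original byte value is lost here
    return table


def lossy_decode(data, table):
    """Decode bytes to text with the given byte -> character table."""
    return "".join(table[b] for b in data)


# Three box drawing bytes, a real space, and a Cyrillic letter.
original = bytes([0x82, 0x80, 0x83, 0x20, 0xC1])

table = make_lossy_decode_table()
text = lossy_decode(original, table)

# "Open and save without editing": decode, then encode again.
round_tripped = text.encode("koi8_r")

# With a complete table, every byte survives the same round trip.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("koi8_r").encode("koi8_r")

print(original != round_tripped)
print(all_bytes == lossless)
```

After the lossy round trip the three box drawing bytes are ordinary spaces,
and nothing in the saved file tells them apart from the real space.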
Secondly, the ccl program `ccl-encode-koi8' takes care of Cyrillic characters
only. Still, `cyrillic-koi8-r-decode-table' maps certain codepoints into Far
East characters. When `ccl-encode-koi8' comes across bytes that represent such
characters, it just dumps them 'as is'! That is, in place of a single byte
there appear three bytes. This effect is recoverable, but recovery is tricky.
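
The second bug can be sketched in the same way. The encoder below is
hypothetical and only imitates the behaviour described above. UTF-8 stands in
for the internal multi-byte representation (it is NOT the MULE internal
encoding; it merely also happens to use three bytes for box drawing
characters).

```python
def naive_encode(text):
    """Encode ASCII and Cyrillic to KOI8-R; dump everything else 'as is'.

    Only characters that KOI8-R actually has are expected in the first
    branch; this is a sketch, not a complete encoder.
    """
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80 or "\u0400" <= ch <= "\u04ff":
            out += ch.encode("koi8_r")
        else:
            # Stand-in for writing out the raw internal bytes unchanged.
            out += ch.encode("utf-8")
    return bytes(out)


# BOX DRAWINGS LIGHT DOWN AND RIGHT followed by a Cyrillic small letter a.
text = "\u250c\u0430"

good = text.encode("koi8_r")  # what a correct encoder writes
bad = naive_encode(text)      # what the naive encoder writes

print(len(good), len(bad))
```

The Cyrillic letter comes out as one byte in both cases, while the box
drawing character grows from one byte to three. Recovery then means finding
those three-byte runs in the file and mapping them back.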
koi8-pgraph.el, mentioned above, fixes both bugs for koi8-r. I volunteer to
make a similar fix for the alternativny coding in both 21.4 and 21.5, and for
windows-1251 in 21.4. In 21.5 (at least under Windows) decoding / encoding for
windows-1251 is done internally and requires further investigation.
Comments / suggestions?
Nick.