I've recently had all the non-UTF8 non-ASCII mail in my folders corrupted,
irrecoverably so (short of searching through many days' backups, which
I can't do myself). The cause of the corruption is bugs in VM, exposed
by my switching all my coding system defaults to utf-8. The reason
it's irrecoverable is the putrid pile of dingos' kidneys that is
mule-ucs, and in particular the way it does no validity checking at
all when it decodes alleged utf-8 (rather than copying the invalid
bytes into the buffer as Latin1, as the ISO2022, SJIS and Big5 methods
do).

This caused me to observe:

(1) 21.4(.22) does have the necessary infrastructure to handle UTF8
itself for the BMP: it has UTF8 coding, it has mule-to-ucs-table
and ucs-to-mule-table and uses them in the C. So, with a fairly
small amount of work, plus the use of 9 private 2D charsets (for
which I had to lose chinese-isoir165 and ethiopic, which is frankly
no loss), one can implement UTF8 for the entire BMP in Lisp
without having to touch mule-ucs at all.
To me, this sounds like an improvement that could be shipped
with 21.4 to make it more robust. However, ...
(2) The C routine coding_decode_utf8 *also* doesn't do any validity
checking! Who's responsible for that, eh?
This should be fixed, which I will do instanter (I already wrote
the code for my (currently suspended) pure Unicode fork anyway).
Any interest in having these in 21.4? (It is still the advertised
stable branch!)
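
For illustration only, here is a minimal sketch of the kind of validity
checking at issue, with invalid bytes falling back to Latin-1 as the
other decoders do. It is not the code referred to above and not the
actual coding_decode_utf8. The function name and calling convention
are hypothetical, and it decodes one character at a time rather than
working on XEmacs streams.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an XEmacs function: decode one character from
   src[0..len).  Stores the code point in *cp and returns the number of
   bytes consumed (0 only when len is 0).  An invalid or truncated
   sequence consumes exactly one byte and yields that byte's own value,
   i.e. the byte is read as Latin-1. */
static size_t
decode_utf8_or_latin1 (const unsigned char *src, size_t len, uint32_t *cp)
{
  unsigned char b0;
  size_t need, i;
  uint32_t c, min;

  if (len == 0)
    return 0;
  b0 = src[0];
  if (b0 < 0x80)
    {
      *cp = b0;                 /* plain ASCII */
      return 1;
    }
  else if (b0 >= 0xC2 && b0 <= 0xDF)   /* 0xC0 and 0xC1 are always overlong */
    { need = 1; c = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0)
    { need = 2; c = b0 & 0x0F; min = 0x800; }
  else if (b0 >= 0xF0 && b0 <= 0xF4)
    { need = 3; c = b0 & 0x07; min = 0x10000; }
  else
    goto invalid;               /* stray continuation byte, or 0xF5..0xFF */

  if (len < need + 1)
    goto invalid;               /* truncated sequence */
  for (i = 1; i <= need; i++)
    {
      if ((src[i] & 0xC0) != 0x80)
        goto invalid;           /* not a continuation byte */
      c = (c << 6) | (src[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values past U+10FFFF. */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    goto invalid;
  *cp = c;
  return need + 1;

 invalid:
  *cp = b0;                     /* keep the raw byte, as Latin-1 */
  return 1;
}
```

With this shape, the bytes 0xC3 0xA9 decode to U+00E9 in two bytes,
while a lone Latin-1 0xE9 followed by ASCII also comes out as U+00E9
but consumes only one byte, so the text after it is not swallowed.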

Secondly, I also find it essential nowadays (if I could keep my mail
uncorrupted) to handle GB18030. So does anybody in China. So I
implemented that in C, using a mapping table to Unicode.
Do you want that? (It should be almost the same in 21.5.)

On that topic, it's a sad truth that PRC-locale software
(especially that made by Microsoft) advertises text as GB2312 when in
fact it's GBK or even GB18030. This is just too big a fact to
ignore. So what I would like to do is arrange that my "gb2312" coding
system actually decodes GB18030 on read, but puts out only real
GB2312 on write. I can't see any easy way to arrange this in
Lisp. Is there one?
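
To make the asymmetry concrete, here is a rough C sketch (it does not
answer the Lisp question). It classifies a sequence by byte ranges
alone: the read side would accept every valid class, while the write
side would refuse anything outside the GB2312 (EUC-CN) range. The
names gb_classify and gb2312_may_write are hypothetical. A real
implementation would also need the mapping tables, since a range
check cannot tell whether a code point is actually assigned.

```c
#include <stddef.h>

enum gb_kind
{
  GB_INVALID,      /* not a well-formed sequence */
  GB_ASCII,        /* single byte below 0x80 */
  GB_2312,         /* two bytes, inside the GB2312 (EUC-CN) range */
  GB_GBK_ONLY,     /* two bytes, valid in GBK but outside GB2312 */
  GB_18030_FOUR    /* four-byte GB18030 form */
};

/* Hypothetical helper: classify the sequence starting at src[0..len)
   by byte ranges only, storing its length in *seqlen.  Invalid input
   is reported with a length of 1 so the caller can resynchronise
   (0 when len is 0). */
static enum gb_kind
gb_classify (const unsigned char *src, size_t len, size_t *seqlen)
{
  if (len == 0)
    {
      *seqlen = 0;
      return GB_INVALID;
    }
  *seqlen = 1;
  if (src[0] < 0x80)
    return GB_ASCII;
  if (src[0] < 0x81 || src[0] > 0xFE || len < 2)
    return GB_INVALID;

  if (src[1] >= 0x30 && src[1] <= 0x39)
    {
      /* Four-byte form: 81..FE, 30..39, 81..FE, 30..39. */
      if (len < 4
          || src[2] < 0x81 || src[2] > 0xFE
          || src[3] < 0x30 || src[3] > 0x39)
        return GB_INVALID;
      *seqlen = 4;
      return GB_18030_FOUR;
    }

  /* Two-byte form: the trail byte is 40..FE, excluding 7F. */
  if (src[1] < 0x40 || src[1] == 0x7F || src[1] > 0xFE)
    return GB_INVALID;
  *seqlen = 2;

  /* GB2312 as EUC-CN: lead byte A1..F7, trail byte A1..FE. */
  if (src[0] >= 0xA1 && src[0] <= 0xF7 && src[1] >= 0xA1)
    return GB_2312;
  return GB_GBK_ONLY;
}

/* Write side: only ASCII and GB2312-range sequences may be emitted. */
static int
gb2312_may_write (enum gb_kind kind)
{
  return kind == GB_ASCII || kind == GB_2312;
}
```

Here 0xB0 0xA1 classifies as GB_2312 and may be written, whereas
0x81 0x40 is accepted on read as GB_GBK_ONLY but would be refused on
write.
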

Julian.
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta