Julian Bradfield writes:

> This caused me to observe:
>
> (1) 21.4(.22) does have the necessary infrastructure to handle UTF8
> itself for the BMP: it has UTF8 coding, it has mule-to-ucs-table
> and ucs-to-mule-table and uses them in the C. So, with a fairly
> small amount of work, plus the use of 9 private 2D charsets (for
> which I had to lose chinese-isoir165 and ethiopic, which is frankly
> no loss), one can implement UTF8 for the entire BMP in Lisp
> without having to touch mule-ucs at all.

To me, this sounds like an improvement that could be shipped with
21.4 to make it more robust. However, ...

> (2) The C routine coding_decode_utf8 *also* doesn't do any validity
> checking! Who's responsible for that, eh?

In 21.4, it's the same people who wrote mule-ucs, more or less. That
all came out of the Mule Lab and associated researchers here in
Tsukuba, Japan, in the late 1990s. All of the Japanese code I've
worked with (jperl, the jGNU utilities like jsed, Ghostscript, and
Mule) has the "feature" that errors are caught late and repaired. (In
fact, all of Japanese society has that feature, but this is not the
place for that rant. :-)
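
For concreteness, here is roughly what "validity checking" means for a
UTF-8 decoder. This is a minimal sketch, not the actual
coding_decode_utf8 code from either branch; the function name
utf8_decode_checked and its return convention are made up for
illustration. It rejects stray continuation bytes, truncated
sequences, overlong forms, surrogates, and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not XEmacs code): decode one UTF-8 sequence
   from s[0..len) into *cp.  Returns the number of bytes consumed,
   or 0 if the sequence is malformed. */
size_t utf8_decode_checked(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t min, c;

    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                    /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 2-byte sequence */
        need = 1; min = 0x80;    c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 3-byte sequence */
        need = 2; min = 0x800;   c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 4-byte sequence */
        need = 3; min = 0x10000; c = s[0] & 0x07;
    } else {
        return 0;                         /* stray continuation or bad lead byte */
    }

    if (len < need + 1)
        return 0;                         /* truncated input */

    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                     /* not a continuation byte */
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (c < min)
        return 0;                         /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;                         /* UTF-16 surrogate */
    if (c > 0x10FFFF)
        return 0;                         /* beyond the Unicode range */

    *cp = c;
    return need + 1;
}
```

A caller that gets 0 back can decide on the spot whether to signal an
error or substitute a replacement character, rather than passing the
bad bytes along to be repaired later.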

> Any interest in having these in 21.4? (It is still the advertised
> stable branch!)

That's up to Vin. However, for all that 21.5 is not ready for a
public release IMO, I really can't recommend putting much effort into
21.4; it will not be actively maintained, only patched if a bug
occurs. As you've noticed, the 21.4 infrastructure is very weak. If
you want a 21.4-based, actively developed XEmacs I have to recommend
SXEmacs instead, but that may not work well for you (e.g., if you
want support for the Windows platform, which has long since been
entirely removed from SXEmacs).

> Secondly, I also find it essential nowadays (if I could keep my mail
> uncorrupted) to handle GB18030. So does anybody in China. So I
> implemented that in C, using a mapping table to Unicode.
> Do you want that? (It should be almost the same in 21.5.)

I definitely want it for 21.5. You'd have to ask Vin about 21.4.

> ignore. So what I would like to do is arrange that my "gb2312" coding
> system actually decodes GB18030 on read, but correctly only puts out
> real GB2312 on write. I can't see any easy way to arrange this in
> Lisp. Is there one?

Not in 21.4, and there never will be if I have anything to say about
it. That would mean deleting coding system objects, and maybe charset
objects, and that would be hairy to implement because they're used
implicitly in so many places.
We could do something about it in 21.5, but it would have to be an
#ifdef'd feature for a long time, I think.
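
To make the read-wide, write-narrow distinction concrete, here is a
minimal sketch of telling the two encodings apart by byte structure
alone. It is not code from XEmacs or from Julian's patch; the names
gb_kind and gb_classify are invented for illustration. It assumes
GB2312 in its EUC-CN form (both bytes in 0xA1-0xFE) and the usual
GB18030 two-byte and four-byte layouts. It checks ranges only, so a
real encoder would still need the mapping table to reject cells in the
94x94 grid that GB2312 leaves unassigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one character's byte sequence. */
enum gb_kind {
    GB_INVALID,        /* not a well-formed GB18030 sequence */
    GB_ASCII,          /* single byte below 0x80 */
    GB_GB2312_RANGE,   /* two bytes, both in 0xA1-0xFE (EUC-CN grid) */
    GB_GB18030_ONLY    /* well-formed, but outside the GB2312 grid */
};

/* Classify the sequence starting at s[0]; *used gets its length in
   bytes, or 0 if the sequence is invalid or truncated. */
enum gb_kind gb_classify(const unsigned char *s, size_t len, size_t *used)
{
    assert(s != NULL && used != NULL);
    *used = 0;
    if (len == 0)
        return GB_INVALID;

    if (s[0] < 0x80) {
        *used = 1;
        return GB_ASCII;
    }
    if (s[0] < 0x81 || s[0] > 0xFE || len < 2)
        return GB_INVALID;

    if (s[1] >= 0x30 && s[1] <= 0x39) {
        /* Four-byte form: lead, digit, lead, digit. */
        if (len < 4 || s[2] < 0x81 || s[2] > 0xFE ||
            s[3] < 0x30 || s[3] > 0x39)
            return GB_INVALID;
        *used = 4;
        return GB_GB18030_ONLY;
    }

    /* Two-byte form: trail byte in 0x40-0x7E or 0x80-0xFE. */
    if (s[1] < 0x40 || s[1] == 0x7F || s[1] > 0xFE)
        return GB_INVALID;
    *used = 2;
    return (s[0] >= 0xA1 && s[1] >= 0xA1) ? GB_GB2312_RANGE
                                          : GB_GB18030_ONLY;
}
```

A writer for a strict "gb2312" coding system would accept GB_ASCII and
GB_GB2312_RANGE and refuse GB_GB18030_ONLY, while the reader would
accept all three.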