a couple of thoughts about codings in 21.4

Saturday, 18 July 2009

        Julian Bradfield writes:

...
 This caused me to observe:
 (1) 21.4(.22) does have the necessary infrastructure to handle UTF8
     itself for the BMP: it has UTF8 coding, it has mule-to-ucs-table
     and ucs-to-mule-table and uses them in the C. So, with a fairly
     small amount of work, plus the use of 9 private 2D charsets (for
     which I had to lose chinese-isoir165 and ethiopic, which is frankly
     no loss), one can implement UTF8 for the entire BMP in Lisp
     without having to touch mule-ucs at all.
     To me, this sounds like an improvement, that could be shipped
     with 21.4 to make it more robust. However, ...
 (2) The C routine coding_decode_utf8 *also* doesn't do any validity
     checking! Who's responsible for that, eh? 
In 21.4, it's the same people who wrote mule-ucs, more or less.  That
all came out of the Mule Lab and associated researchers here in
Tsukuba Japan in the late 1990s.  All of the Japanese code I've worked
with (jperl, the jGNU utilities like jsed, Ghostscript, and Mule) has
that "feature" that errors are caught late and repaired.  (In fact,
all of Japanese society has that feature, but this is not the place
for that rant. :-)
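
For illustration only, here is a minimal sketch in Python of the kind of validity checking that is missing, assuming a decoder restricted to the BMP as in point (1). It is not the XEmacs C code; the function name decode_utf8_bmp is hypothetical. It rejects stray continuation bytes, overlong forms, surrogates, truncated sequences, and anything outside the BMP at decode time instead of repairing them later.

```python
def decode_utf8_bmp(data: bytes) -> str:
    """Decode UTF-8 limited to the BMP; raise ValueError on invalid input."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # Single-byte (ASCII) character.
            cp, need, minimum = b0, 0, 0
        elif 0xC2 <= b0 <= 0xDF:
            # Lead byte of a two-byte sequence (0xC0/0xC1 would be overlong).
            cp, need, minimum = b0 & 0x1F, 1, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            # Lead byte of a three-byte sequence; covers the rest of the BMP.
            cp, need, minimum = b0 & 0x0F, 2, 0x800
        else:
            # Stray continuation byte, overlong lead, or a lead byte for
            # code points beyond the BMP.
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + need >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, need + 1):
            b = data[i + k]
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate code point at offset {i}")
        out.append(chr(cp))
        i += need + 1
    return "".join(out)
```

Whether a real decoder should raise, substitute a replacement character, or preserve the offending bytes is a separate design choice; the sketch only shows where the checks would go.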

...
 Any interest in having these in 21.4? (It is still the advertised
 stable branch!) 
That's up to Vin.  However, for all that 21.5 is not ready for a
public release IMO, I really can't recommend putting much effort into
21.4; it will not be actively maintained, only patched if a bug
occurs.  As you've noticed, the 21.4 infrastructure is very weak.  If
you want a 21.4-based, actively developed XEmacs I have to recommend
SXEmacs instead, but that may not work well for you (e.g., if you want
support for the Windows platform, which has long since been entirely
removed from SXEmacs).

...
 Secondly, I also find it essential nowadays (if I could keep my mail
 uncorrupted) to handle GB18030. So does anybody in China. So I
 implemented that in C, using a mapping table to Unicode.
 Do you want that? (It should be almost the same in 21.5.) 
I definitely want it for 21.5.  You'd have to ask Vin about 21.4.

...
 ignore. So what I would like to do is arrange that my "gb2312" coding
 system actually decodes GB18030 on read, but correctly only puts out
 real GB2312 on write. I can't see any easy way to arrange this in
 Lisp. Is there one? 
Not in 21.4, and there never will be if I have anything to say about
it.  That would mean deleting coding system objects, and maybe charset
objects, and that would be hairy to implement because they're used
implicitly in so many places.

We could do something about it in 21.5, but it would have to be an
#ifdef'd feature for a long time, I think.
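
To make the requested behaviour concrete, here is a small sketch in Python rather than XEmacs Lisp, using Python's standard "gb18030" and "gb2312" codecs. The function names read_gb and write_gb2312 are hypothetical, and this says nothing about how it could be wired into the XEmacs coding system machinery. It only illustrates the asymmetry: be liberal on read, strict on write.

```python
def read_gb(data: bytes) -> str:
    """Decode leniently: accept anything that is valid GB18030."""
    return data.decode("gb18030")


def write_gb2312(text: str) -> bytes:
    """Encode strictly: emit real GB2312 only.

    Raises UnicodeEncodeError for characters outside the GB2312
    repertoire instead of silently producing GB18030 extensions.
    """
    return text.encode("gb2312")
```

With this split, text containing only GB2312 characters round-trips, while a character that exists only in the larger GB18030 repertoire can be read but causes an error on write rather than leaking into output labelled as GB2312.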

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta