>>>> "Ben" == Ben Wing <ben(a)666.com> writes:
Ben> why is it a waste of time? took me half an hour or so.
Because other people (e.g., me) could do it in 45 minutes the first
time, and in the same half-hour it takes you for each one after that.
Ben> mbcs is used for implementing things like the koi8-r coding
Ben> system under unicode-internal.
The KOI8 coded character sets are unibyte and have no mode shifts, and
at most 256 characters. Why not simply have one table?
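
To make the "one table" point concrete, here is a minimal sketch in Python (not XEmacs code). It assumes Python's bundled koi8_r codec as the source of the mapping; the names KOI8_R_TABLE, decode_koi8_r and encode_koi8_r are made up for illustration.

```python
# One 256-entry table is the whole decoder for a unibyte charset.
# Python's bundled "koi8_r" codec is used only to fill the table once.
KOI8_R_TABLE = bytes(range(256)).decode("koi8_r")

# The encoder is the same data, inverted.
KOI8_R_REVERSE = {ch: b for b, ch in enumerate(KOI8_R_TABLE)}


def decode_koi8_r(data: bytes) -> str:
    """Decode by plain table lookup: no state, no mode shifts."""
    return "".join(KOI8_R_TABLE[b] for b in data)


def encode_koi8_r(text: str) -> bytes:
    """Encode by reverse lookup; raises KeyError for characters outside KOI8-R."""
    return bytes(KOI8_R_REVERSE[ch] for ch in text)
```

Nothing here needs a multibyte state machine; the same shape should work for any unibyte coded character set.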
Ben> it can also replace the hand-coded big5 and shift-jis coding
Ben> systems, in unicode-internal.
Sure. But as you say, we've already got implementations of those, and
they're going to go away over time.
Ben> any ideas? (if this doesn't work, i'm sure there are gpl-ed
Ben> utf-7 implementations available.)
I doubt the implementation in the Unicode book is efficient or robust,
and there's no error handling in it. I'm sure there's one in Emacs
and another in gconv (glibc's implementation of iconv). Python has
one. Surely Perl and Ruby do.
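
For what it's worth, a small sketch of what error handling looks like in Python's bundled utf-7 codec. It only shows the behaviour one would want from any implementation we adopt; the helper name decode_utf7 is made up.

```python
def decode_utf7(data: bytes, lenient: bool = False) -> str:
    """Decode UTF-7 with Python's standard codec.

    Strict mode raises UnicodeDecodeError on malformed input;
    lenient mode substitutes U+FFFD and keeps going.
    """
    return data.decode("utf-7", errors="replace" if lenient else "strict")


# Round trip through the codec for text that needs a shifted sequence.
sample = "Hi Mom \u263a!"
round_tripped = decode_utf7(sample.encode("utf-7"))

# An octet with the high bit set is never valid in UTF-7.
try:
    decode_utf7(b"abc\xffdef")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```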
Ben> i can implement this if you can tell me the names and
Ben> encodings that are typically used in these segments. the x
Ben> standard only defines the general format of extended segments
Ben> and doesn't say what is actually encoded in them.
According to the standard, anything with an agreed name that isn't in
the list (ie, iso8859-14 and iso8859-15 violate XF86's own standard,
and UTF-8 should be in there).
The elegant way to implement it would be to treat it as a buffer and
translate it using a new lstream, parsing the name out of the extended
segment header and using that to determine the coding system.
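
A rough sketch of the header parsing, in Python rather than as an lstream. It assumes the Compound Text layout ESC 0x25 0x2F F M L, then the encoding name, STX, and the data, with the length taken as (M - 128) * 128 + (L - 128). The CODECS table and both function names are hypothetical, and Python codecs stand in for coding systems.

```python
STX = b"\x02"

# Hypothetical map from segment encoding names to Python codecs,
# standing in for a lookup of the corresponding coding system.
CODECS = {"iso8859-14": "iso8859_14", "iso8859-15": "iso8859_15"}


def parse_extended_segment(buf: bytes, pos: int = 0):
    """Parse one extended segment starting at buf[pos].

    Returns (encoding_name, octets_per_char, data, next_pos), where
    octets_per_char is 0 for "variable" and 1..4 otherwise.
    """
    if buf[pos:pos + 3] != b"\x1b%/" or len(buf) < pos + 6:
        raise ValueError("not an extended segment")
    f, m, l = buf[pos + 3], buf[pos + 4], buf[pos + 5]
    if not 0x30 <= f <= 0x34:
        raise ValueError("bad octets-per-character byte")
    if m < 0x80 or l < 0x80:
        raise ValueError("length bytes must have the high bit set")
    length = (m - 128) * 128 + (l - 128)  # counts name + STX + data
    start = pos + 6
    body = buf[start:start + length]
    if len(body) != length:
        raise ValueError("truncated segment")
    name, sep, data = body.partition(STX)
    if not sep:
        raise ValueError("missing STX after encoding name")
    return name.decode("latin-1"), f - 0x30, data, start + length


def decode_extended_segment(buf: bytes, pos: int = 0):
    """Decode the segment's data using the codec named in its header."""
    name, _, data, next_pos = parse_extended_segment(buf, pos)
    codec = CODECS.get(name.lower())
    if codec is None:
        raise LookupError("no coding system for " + name)
    return data.decode(codec), next_pos
```

The caller would resume ordinary Compound Text decoding at next_pos.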
Ben> we already have a gzip coding system. we also have base64
Ben> functions but not yet converted to a coding system (not too
Ben> hard to do, though). internally, i already generalized
Ben> coding systems (some time ago, in fact) to be typed for
Ben> either bytes or characters at either end; there's also a
Ben> `chain' coding system for stringing multiple coding systems
Ben> together.
Yeah, I'm aware of all that, but again it's mostly stuff that somebody
else can do, except that it would be really nice if the lstreams and
chain coding systems were exposed to LISP somehow.
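
To illustrate the kind of interface I mean, here is a toy sketch in Python of chaining stages that are typed by what they take at each end. This is not the XEmacs API; Stage, chain and the three stage objects are invented names, built on Python's standard gzip and base64 modules.

```python
import base64
import gzip
from typing import Callable, NamedTuple


class Stage(NamedTuple):
    """One coding step: encode goes toward the external form, decode back."""
    name: str
    encode: Callable
    decode: Callable


# characters -> bytes
UTF8 = Stage("utf-8", lambda s: s.encode("utf-8"), lambda b: b.decode("utf-8"))
# bytes -> bytes
GZIP = Stage("gzip", gzip.compress, gzip.decompress)
BASE64 = Stage("base64", base64.b64encode, base64.b64decode)


def chain(*stages: Stage) -> Stage:
    """Compose stages; decoding runs the same stages in reverse order."""
    def encode(x):
        for stage in stages:
            x = stage.encode(x)
        return x

    def decode(x):
        for stage in reversed(stages):
            x = stage.decode(x)
        return x

    return Stage("+".join(s.name for s in stages), encode, decode)


# Usage: text -> UTF-8 bytes -> gzip -> base64, and back again.
packed = chain(UTF8, GZIP, BASE64)
wire = packed.encode("KOI8: \u043f\u0440\u0438\u0432\u0435\u0442")
text = packed.decode(wire)
```

If something of this shape were callable from Lisp, most of the odd jobs above could be done without touching C.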
Ben> keep in mind that i've already done most of the work you're
Ben> describing here. i think we're talking past each other; at
Ben> any rate, you seem to think i'm more confused than i am.
I'm sure you know exactly what you're doing, in the small. I will
look at the code asap, but your verbal descriptions do not inspire
confidence that what I will find is going to be a GNU-beater in
practice. GNU has more than one person seriously working on their
Mule implementation, and any of the senior developers is reasonably
comfortable trying to diagnose and even fix bugs. That's simply not
true for current XEmacs, and you're emphasizing backward
compatibility. Call it back-seat driving if you like, but somebody
needs to tell you about that tree looming in front of the windshield.
Ben> actually, what would really help is if you could take a look
Ben> at emacs-unicode-2, figure out what their api is, and
Ben> summarize it. this would be extremely useful to me.
I was afraid you'd say something like that. Maybe somebody left some
notes in Japanese....
--
School of Systems and Information Engineering
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.