>>>> "Ben" == Ben Wing <ben(a)666.com> writes:
Ben> [1] what do we do about translating the stuff into the
Ben> internal format when we read it in?
We don't worry about it. As far as no-mule XEmacs is concerned, it's
binary. Anything that is not ISO 8859/1 (sorry, world, I am not
willing to go as far as /2 or /15) is simply unrecognized, and should
be protected by (featurep 'mule). Note that this doesn't mean that
you can't treat a buffer as ISO 8859/2 (or get "free" Euro support) by
declaring your font to be an iso8859-2 (resp. iso8859-15) font, just
that XEmacs doesn't translate Unicodes to ISO 8859 codes.
AFAIK the Mule->Unicode map is single-valued; if so, mess with
buffer-file-coding-system-for-write at your own peril, but if you
don't, the garbage XEmacs writes to disk will be the same garbage it
read in.
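To make the "same garbage out as in" point concrete, here is a minimal
sketch in Python (not XEmacs code). It assumes the no-mule build treats
each byte as one ISO 8859/1 character, so the byte value is the character
code; read_as_binary and write_as_binary are hypothetical names used only
for illustration.

```python
def read_as_binary(raw: bytes) -> str:
    """Treat every byte as one ISO 8859/1 character (code == byte value)."""
    return raw.decode("latin-1")


def write_as_binary(text: str) -> bytes:
    """Inverse of read_as_binary: one character back to one byte."""
    return text.encode("latin-1")


# A UTF-8 file looks like garbage in such a buffer, but nothing is lost.
utf8_on_disk = "日本語 €".encode("utf-8")
in_buffer = read_as_binary(utf8_on_disk)

# One buffer character per byte on disk, no translation applied.
assert len(in_buffer) == len(utf8_on_disk)
# Writing the buffer back reproduces the original bytes exactly.
assert write_as_binary(in_buffer) == utf8_on_disk
```

The round trip only holds while the read and write mappings are inverses
of each other, which is the reason for the warning about changing the
coding system used for writing.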
Sure, we need to do something sane about language handling here, but
that's always been true.
AFAICT, once we've dealt with the literal character syntax problem
(see below), all we need to do is put (require 'mule) at the top of
all Mule files. Compared to the current system, this just turns a
"file-not-found" error into a "feature-not-found" error.
Ben> [2] What about character objects? Strings are no problem but
Ben> a character object is listed in a .ELC file as a ? plus a
Ben> series of bytes that resolves to a single Mule character. To
Ben> handle this we'd have to [a] add a UTF-8 decoder to the
Ben> non-Mule build [not a big deal];
Yup and yup.
Ben> and [b] extend characters in non-Mule to be big enough to
Ben> hold a whole Mule character
Nope. They're already Lisp objects, plenty big. However, they won't
be legal characters. I would hope that they'll show up as Ebola if
they're ever accessed. But they should be hidden behind (featurep
'mule).
Am I missing something?
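For scale, here is a rough Python sketch (not the actual XEmacs reader)
of the kind of UTF-8 decoder [a] asks for. It assumes a character literal
is a ? followed by one UTF-8 sequence, and that a non-Mule build accepts
only codes up to 0xFF as legal characters; decode_utf8_char and
is_legal_without_mule are hypothetical names.

```python
from typing import Tuple


def decode_utf8_char(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one UTF-8 sequence at pos; return (code_point, next_pos)."""
    first = data[pos]
    if first < 0x80:                      # single-byte (ASCII) case
        return first, pos + 1
    if 0xC2 <= first <= 0xDF:
        length, cp = 2, first & 0x1F
    elif 0xE0 <= first <= 0xEF:
        length, cp = 3, first & 0x0F
    elif 0xF0 <= first <= 0xF4:
        length, cp = 4, first & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte")
    if pos + length > len(data):
        raise ValueError("truncated UTF-8 sequence")
    for byte in data[pos + 1:pos + length]:
        if byte & 0xC0 != 0x80:           # continuation bytes are 10xxxxxx
            raise ValueError("invalid UTF-8 continuation byte")
        cp = (cp << 6) | (byte & 0x3F)
    # Reject overlong forms, surrogates, and values past U+10FFFF.
    minimum = {2: 0x80, 3: 0x800, 4: 0x10000}[length]
    if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError("ill-formed UTF-8 sequence")
    return cp, pos + length


def is_legal_without_mule(code_point: int) -> bool:
    """Assumption: a non-Mule build has only 8-bit characters."""
    return code_point <= 0xFF


# A literal as it might appear in a .elc file: '?' plus the bytes of U+3042.
literal = b"?\xe3\x81\x82"
code_point, _ = decode_utf8_char(literal, 1)
assert code_point == 0x3042
assert not is_legal_without_mule(code_point)
```

The decoded value fits easily in an integer; the only question is what
the reader does with a code that is not a legal character in that build.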
Ben> comments? Sounds like it might just be better to go ahead
Ben> and switch to UTF-8 internal and be done with it.
I'd like that, but there will be many little problems with it. It
took Morioka about a half year to work out many of the problems, and
he only cares about Japanese (in real life). We can borrow his
experience, and also from emacs-unicode, I expect. I just don't think
it's a priority, unless somebody like Aidan wants to work on it.
Ben> Stephen [once again], what in your opinion are the big issues
Ben> connected to this, including but not limited to the CJK
Ben> language-preservation issue?
The main issues I see are
(1) CJK language-preservation---but that's really the wrong way to put
it. It's a special case of the general problem we have that coding is
not robust to errors in choice of coding system.
(2) We'll have to work out details of character->coding system mapping
priorities again in the short run (these are the kinds of problems
Tomo ran into) and in the long run we want to think about repertoire-
covering algorithms. I.e., if we see a bunch of characters not in
Japanese, we should guess that's not Japanese and pick an appropriate
language == font for the whole run; OTOH, if we see the characteristic
kanji-kana-kana kanji-kana-kana pattern of Japanese, we should not
choose a Chinese font!
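A toy Python sketch of what a repertoire-covering guess could look like.
Everything here is an assumption for illustration: the tiny character
sets stand in for real tables such as JIS X 0208 and GB 2312, the
priority order is arbitrary, and guess_language is a hypothetical name.
The one real-world detail it relies on is that GB 2312 also contains
kana, so coverage alone cannot rule out Chinese for Japanese text.

```python
def is_kana(ch: str) -> bool:
    """True for the Unicode Hiragana and Katakana blocks."""
    return 0x3040 <= ord(ch) <= 0x30FF


KANA = {chr(c) for c in range(0x3040, 0x3100)}

# (language, repertoire) in priority order. Toy stand-ins, not real tables.
TOY_REPERTOIRES = [
    ("chinese", set("日本语汉字学这") | KANA),
    ("japanese", set("日本語漢字学") | KANA),
]


def guess_language(run, repertoires=TOY_REPERTOIRES):
    """Pick one language (and hence one font) for the whole run."""
    chars = {ch for ch in run if not ch.isspace()}
    covering = [name for name, rep in repertoires if chars <= rep]
    if not covering:
        return None                       # no single repertoire covers it
    # Kana mixed into the run is treated as the Japanese signature.
    if "japanese" in covering and any(is_kana(ch) for ch in chars):
        return "japanese"
    return covering[0]                    # otherwise fall back to priority


assert guess_language("汉语这") == "chinese"    # not coverable as Japanese
assert guess_language("日本の字") == "japanese"  # both cover; kana decides
assert guess_language("日本") == "chinese"      # ambiguous; priority decides
```

The last case is exactly where the priority details matter: a run that
every repertoire covers gives no evidence either way.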
Other than that, I just don't see a problem. I dunno about GNU Emacs
Mule (it's probably the same), but our Mule is pretty well abstracted
from the notion of charset already; a charset is treated as a
convenient way (actually, the only way) to identify a repertoire, but
we should be able to adapt most things to do with charsets to
repertoire-oriented versions. (Lots of details, but the design should
be straightforward.)
Ben> What specifically are they [GNU Emacs] planning,
I don't know specifics.
Ben> and how far are they along?
As Eli wrote, they're going to merge their emacs-unicode branch
soon-ish; I'll take a look and see.
Ben> Is anyone in communication with them to try and ensure that
Ben> their API's look like ours,
Not me. With you out of touch the gain didn't seem worth the pain.
Ben> or are they just going to be incompatible AGAIN?
I'll need to look at the emacs-unicode branch before I'm willing to
say "just be incompatible". It's quite possible that the
"philosophical" underpinnings are quite different.
*****
I'm kinda in a crunch; I can rearrange my XEmacs work, but I'm going
to have to do that; I can't just add tasks. How badly and how quickly do
you want to know this stuff? (No, I don't think you should just check
out emacs-unicode and look for yourself if we can avoid it. There's
work that you can do that I sure can't.)
--
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.