how to add support for more Unicode characters?
Stephen J. Turnbull
stephen at xemacs.org
Mon Jun 20 21:33:26 EDT 2005
>>>>> "Aidan" == Aidan Kehoe <kehoea at parhasard.net> writes:
Aidan> Visible warnings, on the other hand, could be very

They're ~~~~ing hard, though, because of the Ben-only-knows design of
the coding systems. I've tried, and discovered that it's just really,
really hard---even in _binary_.

Again, you should do what you think is appropriate here, but my
feeling is that trying to fix problems under the current design is
going to be a lot of work. You know how much latin-unity helps, and
also how much it doesn't help.

What I've been thinking about is a design based on Unicode TR #17's
4-level architecture (five, actually, since we handle transfer
encoding syntaxes like BASE64 and gzip). The bottom level is the
abstract character set (which we can take to be the Unicode
repertoire, and we should represent as Unicode code points). The next
level is the internal encoding as a map from the Unicode repertoire to
(abstract) integers; for Unicode of course this is the identity, but
for Mule it's non-trivial (in fact, set-valued :-( ). The next level
is a map from character integers to sequences of code units (bytes or
shorts or ints). Finally we have a map from sequences of code units
to sequences of bytes. The important thing about this architecture is
that only at the top level is modality permitted, so everything else
has identifiable character units, and only characters are present in
the stream. This should make it possible to attach information about
ambiguous constructs to the buffer contents, and also give precise

By contrast, in current coding systems everything is mixed together so
that it's very difficult even to safely interrupt a coding process.
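
To make the layering concrete, here is a rough sketch in Python. It is
only an illustration, not XEmacs code: every function name is
hypothetical, and it assumes UTF-16 as the code-unit form, big-endian as
the byte serialization, and BASE64 as the transfer encoding syntax.

```python
import base64
import struct
from typing import List


def to_internal(ch: str) -> int:
    """Character -> abstract integer. For Unicode this is the identity
    on the code point; a Mule-style table would go here instead."""
    return ord(ch)


def to_code_units(n: int) -> List[int]:
    """Character integer -> sequence of 16-bit code units (UTF-16 assumed)."""
    if n < 0x10000:
        return [n]
    n -= 0x10000
    return [0xD800 + (n >> 10), 0xDC00 + (n & 0x3FF)]


def to_bytes(units: List[int]) -> bytes:
    """Code units -> bytes (big-endian serialization assumed)."""
    return struct.pack(">%dH" % len(units), *units)


def transfer_encode(data: bytes) -> bytes:
    """Transfer encoding syntax, applied to the byte stream as a whole."""
    return base64.b64encode(data)


def encode(text: str) -> bytes:
    """Run one string through all the levels, one character at a time."""
    out = bytearray()
    for ch in text:
        out += to_bytes(to_code_units(to_internal(ch)))
    return transfer_encode(bytes(out))
```

In this sketch the first three steps are stateless per-character maps,
so character boundaries stay identifiable right up to the final transfer
step. For example, encode("A") goes through 0x41, then [0x0041], then
the two bytes 00 41, and comes out as b"AEE=".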
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.