From: Ben Wing [mailto:ben@666.com]
there are numerous reasons we want our own internal char points -- handling of
precomposed chars, non-char objects in the buffer, etc. not only eudc's.
Hrm. Those are certainly possibilities. I'm afraid I don't know enough about
how the current small crop of complex-script Unicode rendering engines work
to say whether this (precomposed chars) makes any sense. I would tend to
lean towards not, though. There are so many composable character
combinations that fall out of Unicode that it's not even funny. In fact, the
rendered appearance of some character combinations in Unicode may depend on
their surrounding context, as opposed to just the sequence of composable
Unicode code points. Arabic and the Indian family of languages are
especially nasty in this regard.
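
A minimal sketch of the precomposed-versus-composable distinction, using
Python's standard unicodedata module. The two sample sequences are
illustrative choices, not taken from this thread: one has a precomposed
equivalent in Unicode, the other does not, so no fixed table of precomposed
characters can cover every base-plus-mark combination.

```python
import unicodedata

# "e" + COMBINING ACUTE ACCENT: a precomposed form exists (U+00E9).
composable = "e\u0301"
# "q" + COMBINING ACUTE ACCENT: Unicode defines no precomposed form.
no_precomposed = "q\u0301"

def nfc_length(text):
    """Return how many code points remain after NFC composition."""
    return len(unicodedata.normalize("NFC", text))

composed_len = nfc_length(composable)        # collapses to one code point
uncomposed_len = nfc_length(no_precomposed)  # stays as base + mark

print(composed_len, uncomposed_len)
```

An internal character set that wanted one code per visible character would
have to allocate its own values for sequences like the second one.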
Examples of nasty rendering problems:
Character shaping (Arabic):
http://www-4.ibm.com/software/developer/library/hindi-thai/sld005.htm
Required use of ligatures (Arabic and Devanagari):
http://www-4.ibm.com/software/developer/library/hindi-thai/sld006.htm
Bizarre repositioning of multiple combining marks over a single base
character:
http://www-4.ibm.com/software/developer/library/hindi-thai/sld007.htm
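
To illustrate the character-shaping item above, here is a small Python
sketch. It relies on the Unicode compatibility "presentation forms" for the
Arabic letter BEH as a stand-in for the glyph choices a shaping engine makes;
the choice of letter is arbitrary, and a real renderer selects glyphs from
the font rather than from these code points.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH as it is normally stored in text

# Compatibility code points for the four contextual shapes of BEH.
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def describe(ch):
    """Return (shape tag, underlying character) for a presentation form."""
    tag = unicodedata.decomposition(ch).split()[0]  # e.g. "<initial>"
    base = unicodedata.normalize("NFKC", ch)        # folds back to the letter
    return tag, base

forms = [describe(ch) for ch in PRESENTATION_FORMS]
for tag, base in forms:
    print(tag, "U+%04X" % ord(base))
```

All four shapes fold back to the same stored character, which is why the
right shape can only be picked by looking at the neighbouring characters.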
There are also a couple of other problems that make rendering generic Unicode
text a pain in the ass.
The beginning of the slide deck (Arabic, Hebrew, Hindi, and Thai support in
IBM's Java 2) is at:
http://www-4.ibm.com/software/developer/library/hindi-thai/sld001.htm
Non-character objects in the buffer: Hrm. This seems like the best reason
I've heard so far. (Remember that I have no idea how XEmacs currently
exposes these beasts to ELisp code scrounging through the buffer.) I think,
though, that I'd rather have some other way to access non-character objects
in a buffer than walking through the internal representation. Of course,
there might be a need to keep these bits of info hidden from ELisp
completely, which wouldn't be so bad.
Bill
use of an internal representation mostly compatible with unicode is *purely*
for convenience. there is no real necessity in doing it, and we certainly
don't need to contort things to agree with some outside interfaces, because
the internal representation never escapes. we could just as easily define a
rot13 transformation of unicode values for our representation!
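
A rough Python sketch of the point above: if conversion happens only at the
boundary, any reversible mapping of Unicode values works as the internal
code. The XOR mask and the function names are hypothetical stand-ins for the
"rot13 transformation"; nothing here reflects actual XEmacs code.

```python
# Arbitrary illustrative constant (an assumption). XOR only touches the low
# 16 bits, so internal values stay within the 0..0x10FFFF range.
MASK = 0x5A5A

def to_internal(text):
    """Convert external Unicode text to the private internal codes."""
    return [ord(ch) ^ MASK for ch in text]

def from_internal(codes):
    """Convert internal codes back to Unicode text at the boundary."""
    return "".join(chr(code ^ MASK) for code in codes)

sample = "na\u00efve \u03a9"
internal = to_internal(sample)
round_trip = from_internal(internal)
print(round_trip == sample)
```

Code outside the boundary only ever sees the round-tripped text, so the
internal values are free to differ from Unicode wherever that is convenient.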
Bill Tutt wrote:
> > From: Ben Wing [mailto:ben@666.com]
> >
> > portability?
> >
>
> I'm not quite sure what you mean here. Why isn't the concept/mapping that
> the registry setting values embody portable to some other configuration
> state on other OSes?
> Heck, you could even combine the concept with the Plane 14 language tags to
> tell you which codeset mapping to use for a certain run of characters.
>
> Bill
>
> > Bill Tutt wrote:
> >
> > > > From: Ben Wing [mailto:ben@666.com]
> > >
> > > [cool mappings yanked for space]
> > >
> > > Ah, ok, then it's most likely a simple 1-1 mapping based on the EUDC
> > > CodeRange data in the registry. In which case you can't depend on a
> > > hardwired mapping.
> > >
> > > (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\EUDCCodeRange)
> > >
> > > Is there any particular reason this method of encoding isn't a good
> > > enough extension for unassigned UTF-16 safe ranges, or do you just not
> > > want to bother touching the code when we discover the loads of alien
> > > races that are out there?
> > >
> > > Bill
> >
> > --
> > Ben
> >
> > In order to save my hands, I am cutting back on my mail. I also write
> > as succinctly as possible -- please don't be offended. If you send me
> > mail, you _will_ get a response, but please be patient, especially for
> > XEmacs-related mail. If you need an immediate response and it is not
> > apparent in your message, please say so. Thanks for your
> > understanding.
> >
> > See also
> > http://www.666.com/ben/typing.html.
> >
> >
--
Ben