From: Bill Tutt
> From: Stephen J. Turnbull [mailto:turnbull@sk.tsukuba.ac.jp]
>
> See my other message for further discussion.
>
I beg your pardon; I thought you meant one you'd written previously, not one
you wrote today.
Please ignore my writing on why you want more than one internal
representation.
From: Stephen J. Turnbull
However, there are uses for the huge UCS-4 range. Tomo and his
buddies are already playing with the so-called "konjaku-mojikyo"
pseudo-charsets, which have over 70,000 code points already assigned.
These are quite popular with Japanese Windows users, too. Anyway,
they will eat up most of the UTF-16 private space if used in the
obvious way.
Nor do we want to encourage "Japanese exceptionalists"
to borrow not-yet-standardized parts of the UTF-16 space.
So why not just do this in two phases? Use the UTF-16 private space until more
CJK glyphs get added into UTF-16 properly; you can then alter the encoding
transforms from Asian encodings to pick the new CJK glyphs.
The only annoying thing here is that ELisp programming that cares about
those kinds of characters would break down.
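
To make the two phases concrete, here is a rough sketch in Python rather than
ELisp. Everything in it is hypothetical: the legacy identifiers ("m1", "m2"),
the function names, and the idea of a simple dictionary as the encoding
transform are assumptions for illustration, not how any real coding system is
implemented.

```python
# Hypothetical sketch: legacy identifiers and table layout are made up.
PUA_START = 0xE000  # first BMP private-use code point


def build_phase1_table(legacy_ids):
    """Phase 1: give each legacy character the next private-use code point."""
    return {legacy: PUA_START + i for i, legacy in enumerate(legacy_ids)}


def upgrade_table(phase1, standardized):
    """Phase 2: prefer a standardized code point wherever one now exists."""
    return {legacy: standardized.get(legacy, cp) for legacy, cp in phase1.items()}


def migrate_text(text, phase1, phase2):
    """Rewrite text stored under phase 1 to use the phase 2 code points."""
    remap = {phase1[k]: phase2[k] for k in phase1 if phase1[k] != phase2[k]}
    return text.translate(remap)
```

Anything that hard-codes the phase 1 private-use values, rather than going
through the table, is exactly the code that would break after the switch.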
Another possible alternative might be: (from the Unicode FAQ)
"For a particular implementation, if someone really, really wanted a
representation that encoded more characters in a series of 16-bit code units
then a series of private-use characters would work. For example, suppose you
use a representation that consisted of one BMP private-use character followed
by one private-use surrogate pair (e.g. three 16-bit units). With such a
representation, you can encode 6400 x 131,072 ( = 838,860,800) private use
code points."
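
As a rough illustration of the FAQ's arithmetic, the sketch below packs an
index into three 16-bit units: one BMP private-use character (U+E000..U+F8FF,
6400 values) followed by a private-use surrogate pair (high surrogates
U+DB80..U+DBFF, low surrogates U+DC00..U+DFFF, 128 x 1024 = 131,072 pairs).
The function names and the choice of index ordering are my own assumptions;
the FAQ does not specify a layout.

```python
BMP_PUA_START = 0xE000
BMP_PUA_COUNT = 0xF8FF - 0xE000 + 1      # 6400 BMP private-use characters
HIGH_START = 0xDB80                      # first private-use high surrogate
HIGH_COUNT = 0xDBFF - 0xDB80 + 1         # 128
LOW_START = 0xDC00                       # first low surrogate
LOW_COUNT = 0xDFFF - 0xDC00 + 1          # 1024
PAIR_COUNT = HIGH_COUNT * LOW_COUNT      # 131,072 private-use surrogate pairs
TOTAL = BMP_PUA_COUNT * PAIR_COUNT       # 838,860,800 encodable code points


def encode_private(index):
    """Map an index in [0, TOTAL) to three 16-bit code units."""
    if not 0 <= index < TOTAL:
        raise ValueError("index out of range")
    prefix, pair = divmod(index, PAIR_COUNT)
    high, low = divmod(pair, LOW_COUNT)
    return (BMP_PUA_START + prefix, HIGH_START + high, LOW_START + low)


def decode_private(units):
    """Inverse of encode_private; expects exactly three code units."""
    lead, high, low = units
    prefix = lead - BMP_PUA_START
    pair = (high - HIGH_START) * LOW_COUNT + (low - LOW_START)
    if not (0 <= prefix < BMP_PUA_COUNT and 0 <= pair < PAIR_COUNT):
        raise ValueError("not a private-use triple")
    return prefix * PAIR_COUNT + pair
```

The cost is that every such character takes three code units, so code that
assumes one or two units per character has to be taught about the triple.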
Also, although _we_ should not support "language-tagged character"
encodings (pace Olivier) by default, we should permit third-party
libraries and extension modules to do so. This could be easily done
using UCS-4 private space.
This doesn't sound like the Unicode technical report I mentioned in my other
email.
Could you explain what you mean here?
Thanks,
Bill