>>>> "Ben" == Ben Wing <ben(a)666.com> writes:

Ben> NOTE: One possible default internal representation that was
Ben> compatible with UTF16 but allowed all possible chars in UCS4
Ben> would be to take an unused range of 2048 chars (not from the

BMP space is too precious; I can't imagine that a block of 2000
characters will stay unstandardized forever. Or very long. Even
currently: are you working from Unicode 3.0? I'm not following the
new characters soap opera (except for Hiura's amusing anecdotes about
the "Canadian minority"), but I vaguely recall reading that the BMP is
basically full: there aren't any sensible places to put alphabets or
syllabaries, let alone an ideograph set-sized block.

Since this is an internal-only encoding, why not make the range
user-selectable? Of course, we'd have to add a check for occurrences
of the range in external data sources. But you have to do that anyway
unless you put it in the private area, so it's not a special loss for
the user-configurable scheme.
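
A rough sketch of that check, in Python purely for illustration (this
is not XEmacs code). The names ReservedRange and find_reserved and the
example block U+E0800..U+E0FFF are placeholders of mine; the real range
would be whatever the user configured.

```python
from typing import List, Tuple


class ReservedRange:
    """A user-selected block of code points reserved for internal use.

    Hypothetical helper: the 2048-code-point size follows the proposal
    under discussion, but the class itself is only an illustration.
    """

    def __init__(self, start: int, size: int = 2048) -> None:
        if start < 0 or size <= 0:
            raise ValueError("start must be >= 0 and size must be > 0")
        self.start = start
        self.end = start + size - 1  # inclusive upper bound

    def contains(self, code_point: int) -> bool:
        return self.start <= code_point <= self.end


def find_reserved(text: str, rng: ReservedRange) -> List[Tuple[int, int]]:
    """Return (index, code point) for every character of externally
    supplied text that falls inside the reserved range."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if rng.contains(ord(ch))]


# Example: an arbitrary placeholder block, not a recommendation.
rng = ReservedRange(0xE0800)
hits = find_reserved("plain text" + chr(0xE0800), rng)
```

A non-empty result means the external data collides with the chosen
range, so the reader would have to reject the input, escape it, or ask
the user to pick a different range.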

Ben> private area because Microsoft actually uses up most or all
Ben> of it with EUDC chars).

Oh, `echo "fsck -y /dev/Microsoft" >> /etc/init.d/rcS.d', and pray for
some real fs errors. Maybe we can lose that whole partition.

[Do you have a URL for the MS spec? This is too big to be ignored.]

But I'm against the 2G surrogate proposal. People who want full MS
compatibility will have no truck with characters outside of the UTF-16
range, and those who want 2G characters will generally not care about
MS compatibility. Those who want both should build a UCS-4 or UTF-8
XEmacs. This isn't really a big interoperability loss, since the 2G
character space will not contain any standard characters, so people
using them are supposed to negotiate their use with other
applications.

Also, if we do go this way, we must be very careful to prevent _any_
leakage of internal 2G surrogates anywhere that a non-XEmacs app might
look at them.
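
To illustrate the kind of guard I have in mind on the output side,
here is a minimal sketch in Python (again, not XEmacs code). It assumes
a single choke point through which all text bound for the outside world
passes; encode_external, InternalLeakError, and the U+E0800..U+E0FFF
block are hypothetical names and values chosen only for the example.

```python
# Assumed internal-only block (placeholder values, 2048 code points).
INTERNAL_START = 0xE0800
INTERNAL_END = 0xE0FFF


class InternalLeakError(ValueError):
    """Raised when internal-only code points are about to be exported."""


def encode_external(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text for an external consumer, refusing to emit any
    code point from the internal-only block."""
    for i, ch in enumerate(text):
        if INTERNAL_START <= ord(ch) <= INTERNAL_END:
            raise InternalLeakError(
                f"internal code point U+{ord(ch):04X} at index {i}"
            )
    return text.encode(encoding)
```

The point of raising rather than silently dropping the character is
that a leak is a bug on our side: whatever produced the text should
have translated the internal surrogates back before export.
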
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
_________________ _________________ _________________ _________________
What are those straight lines for? "XEmacs rules."