>>>> "OG" == Olivier Galibert
<galibert(a)pobox.com> writes:
On Thu, Apr 30, 1998 at 01:41:39PM +0900, Stephen J. Turnbull wrote:
> Although extremely few users will want to use UCS-4 for itself,
> the least buggy quick path to a wide-char Mule in multilingual
> contexts uses UCS-4 in the implementation. UCS-4 characters,
> by a strange coincidence, just barely fit into a 31-bit
> integer.
OG> Errr, didn't we agree that 0-FFFFFF (aka, 24 bits) was enough
OG> ? Maximal tagbits have 24bits characters, minimal tagbits
OG> 30bits ones (and not 31, that's only for integers).
OK, I retract that, then. I forgot about "Ebola". I was thinking
that we could avoid a bunch of masking operations by using 31-bit
characters. But even if only one tag bit has to be masked off, we
still have to do the masking.
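To make the bit arithmetic concrete, here is a rough sketch in C. It
assumes a 32-bit word with the type tag in the high bits and the
character code in the low bits. That layout, the tag values, and all
of the names (char_bits, char_mask, pack_char, unpack_char) are made
up for illustration and are not the real XEmacs definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 32-bit word, tag in the high bits, character
   code in the low bits.  Not the actual XEmacs object format. */
enum { WORD_BITS = 32 };

/* Bits left over for the character once tag_bits are reserved. */
static inline unsigned char_bits(unsigned tag_bits)
{
    assert(tag_bits >= 1 && tag_bits < WORD_BITS);
    return WORD_BITS - tag_bits;
}

/* Mask selecting only the character bits of a tagged word. */
static inline uint32_t char_mask(unsigned tag_bits)
{
    return (UINT32_C(1) << char_bits(tag_bits)) - 1;
}

/* Combine a tag and a character code into one word. */
static inline uint32_t pack_char(uint32_t code, uint32_t tag,
                                 unsigned tag_bits)
{
    assert(code <= char_mask(tag_bits));      /* code must fit */
    assert(tag < (UINT32_C(1) << tag_bits));  /* tag must fit  */
    return (tag << char_bits(tag_bits)) | code;
}

/* The masking step: needed however narrow the tag is. */
static inline uint32_t unpack_char(uint32_t word, unsigned tag_bits)
{
    return word & char_mask(tag_bits);
}
```

Under these assumptions an 8-bit tag leaves 24 bits (0-FFFFFF), a
2-bit tag leaves 30 bits, and a 1-bit tag leaves the 31 bits that are
available only to integers. In each case unpack_char still needs the
mask.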
> Unifying the Han ideographs through UCS-2/Unicode is possible,
> but will surely introduce new coding-system I/O bugs.
OG> Sorry but I don't parse that.
The point is that putting the external coding system tag on an extent
property (as I have suggested) requires the higher-level Mule code to
test the extent property. Since the extent is different from the
character, the access operation introduces a potential for confusion
and bugs. Doing it efficiently suggests that functions that operate
on regions or strings should remember the relevant extents.
The current implementation of Mule looks at bufchars one by one, as
far as I can remember. Your UCS-4 scheme lends itself to direct
integration into the current higher-level interface more readily than
the UCS-2 + extents scheme I've advocated.
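A rough sketch of the difference in access pattern, in C. Everything
here is hypothetical: the Extent and ExtentCursor types, the function
names, and in particular the assumption that a wide character carries
an 8-bit source tag above a 16-bit code. None of it is taken from the
Mule or XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent covering [start, end) with a coding-system tag. */
typedef struct { size_t start, end; int coding_tag; } Extent;

/* UCS-2 + extents: the tag lives outside the character, so every
   lookup has to find the covering extent.  Returns -1 if none. */
static inline int tag_from_extents(const Extent *ext, size_t n, size_t pos)
{
    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (pos >= ext[i].start && pos < ext[i].end)
            return ext[i].coding_tag;
    return -1;
}

/* "Remembering the relevant extent": a cursor caches the last hit so
   that a scan over a region usually skips the search. */
typedef struct { const Extent *ext; size_t n; size_t last; } ExtentCursor;

static inline int tag_cached(ExtentCursor *c, size_t pos)
{
    if (c->last < c->n) {
        const Extent *e = &c->ext[c->last];
        if (pos >= e->start && pos < e->end)
            return e->coding_tag;           /* cache hit */
    }
    for (size_t i = 0; i < c->n; i++) {
        if (pos >= c->ext[i].start && pos < c->ext[i].end) {
            c->last = i;                    /* remember for next call */
            return c->ext[i].coding_tag;
        }
    }
    return -1;
}

/* Wide characters (assumed layout: 8-bit tag above a 16-bit code):
   the tag comes straight out of the character, one char at a time. */
static inline int tag_from_char(uint32_t wc)
{
    return (int)((wc >> 16) & 0xFF);
}
```

The cursor is the extra state that every region or string function
would have to carry in the extents scheme. The per-character version
needs none, which is why it fits code that already walks bufchars one
by one.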
> I think there's a pretty good argument here
OG> I don't want to see implementation choices done solely on the
OG> basis of things that may exist someday.
You're right.