>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)srce.hr> writes:
Hrvoje> Note that no editor on 32-bit systems can grok a >2G
Hrvoje> buffer because of limited pointer size. Thus a 1G limit
Hrvoje> is just one step below the "theoretical" limit. You can
Hrvoje> argue that being two steps below it is not much worse, but
Hrvoje> I am not really convinced.
I do so argue, and I see no reason why you should be convinced. But
maybe others will agree with me.
Hrvoje> What does direct UCS-4 support in characters buy me
Hrvoje> anyway? Will it be another case of all-encompassing Mule
Hrvoje> priorities breaking the neck of the rest of us?
It buys you direct UCS-4 support in characters. What did you think?
:-)
o XEmacs will never again (well, for a future longer than the
history of electronic computers) need to change the abstract
format for a Lisp character. (Of course it will get reimplemented
as 64, 128, ... bits as time goes on.)
o Mule backwards compatibility. Packages that expect Mule
character formats will be able to get it by putting the entire
Mule character set into a private UCS-4 group. All such groups
live above the 30-bit boundary; single planes are available with
30-bit code points, but they are too small. This is the obvious
migration path. We definitely do not want the arrays of runes
passed to redisplay methods to contain lrecords! (They won't, see
below, but we will have to sacrifice transparency and half the
UCS-4 code space.)
o Source separation. For Asian languages, it may be important to
separate Chinese-origin characters from Japanese-origin
characters, etc. Unicode advocates disagree, but there may be
some diehards out there. Olivier thinks this is important, and
plans to implement it. The sensible place to do this is in a
private UCS-4 group.
o UCS-4 will come, maybe not immediately, but within a couple of
years. The basic multilingual plane (Unicode) is basically full;
they're already filling up planes 1 and 2 (or was that 2 and 3?).
You could fake that with < 31-bit characters but people who want
to use large character sets (there is at least one font for more
than 80,000 Asian ideographs already in the making) will need full
UCS-4 (until they are standardized they will be private-group
applications).
o You get to Do The Right Thing.
All of those except the last could be done with some inefficient or
non-standard-conformant hacks, but why complexify things?
Sure, those may not excite you. But one thing that worries me is
precisely the fact that the applications that absolutely positively
require 31-bit characters are probably a few years off. So suppose we
do a half-assed hack for the purpose of UCS-4 compliance while
maintaining Mule backwards compatibility.
Eg, fitting the private groups into 30 bits can be done pretty easily
by surgically removing bit 29 from the representation: bits 29-30 of
the UCS-4 code collapse into a single bit 29, with 00 -> 0, 11 -> 1,
and 01 and 10 -> na, where na means not available in our
representation. Finally
four years from now somebody actually gets around to implementing a
library that requires private space characters, and boom! XEmacs
crashes. But the code has bitrotted and nobody knows how to fix it.
Or something weird happens: the ISO assigns something to space in
Groups 20-5F, and Boom!
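To make the hack above concrete, here is a minimal C sketch of the
bit-squeezing, assuming a 31-bit UCS-4 code held in a uint32_t. The
names pack30, unpack30 and LOW29_MASK are made up for illustration and
are not XEmacs code. It shows how Groups 20-5F simply fall out of the
representation.

```c
#include <stdint.h>

/* Hypothetical helper names; this is an illustration, not XEmacs code. */
#define LOW29_MASK 0x1FFFFFFFu          /* bits 0-28, kept unchanged */

/* Squeeze a 31-bit UCS-4 code into 30 bits.
   Bits 29-30 == 00 -> packed bit 29 = 0
   Bits 29-30 == 11 -> packed bit 29 = 1 (the private groups)
   Bits 29-30 == 01 or 10 -> not available (Groups 20-5F)
   Returns 0 on success, -1 if the code cannot be represented. */
static int pack30(uint32_t ucs4, uint32_t *packed)
{
    uint32_t top;

    if (ucs4 > 0x7FFFFFFFu)             /* not a 31-bit code at all */
        return -1;

    top = (ucs4 >> 29) & 0x3u;          /* bits 29-30 as a 2-bit value */
    if (top == 0x0u) {
        *packed = ucs4 & LOW29_MASK;
        return 0;
    }
    if (top == 0x3u) {
        *packed = (ucs4 & LOW29_MASK) | (1u << 29);
        return 0;
    }
    return -1;                          /* 01 or 10: "na" */
}

/* Inverse mapping: expand packed bit 29 back into bits 29-30. */
static uint32_t unpack30(uint32_t packed)
{
    uint32_t low = packed & LOW29_MASK;

    return (packed & (1u << 29)) ? (low | (0x3u << 29)) : low;
}
```

Every caller has to handle the -1 case, which is exactly the code path
that nobody exercises until something lands in Groups 20-5F.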
Breaking necks? No, I think not:
Hrvoje> Implementing bignums is hard because if you want to do it
Agreed. But...
Hrvoje> right, you have to modify all the C code that relies on
Hrvoje> Lisp_Object integers fitting in an integral type called
Hrvoje> "EMACS_INT" (int or long). And there's a *lot* of such
Hrvoje> code, with possibly long integers propagating all over the
Hrvoje> place.
...I don't see that it has to be done all at once. You can
(partially) implement a bignum type, and implement it where it helps
somebody. Places where it has not yet been implemented will throw a
type error automatically if you try to pass a bignum in; this will
identify the problem. It won't even be ambiguous, like
the Ebola warnings often are, eg in
(mapcar #'(lambda (x) (if (equal x ?A) (smile) (puke))) '(?A 1))
People who don't need BigBuffers[tm] or plan to die before 2004 can
leave the relevant bignum-check-and-convert code out with a compiler
switch.
Yes, this is going to be a big project, like de-ebolification. But it
can be done more gradually, albeit with attendant risks of bignums
escaping from converted code to unconverted code somehow. So we add
Ebola-like checks and regression tests.
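A rough C sketch of the gradual approach described above, under the
assumption of a toy tagged number type. All names here (lisp_num,
check_fixnum, legacy_add1, WITH_BIGNUMS) are hypothetical and do not
reflect the actual Lisp_Object layout. It shows an unconverted
primitive rejecting a bignum with a type error, and the check being
compiled out by a switch.

```c
#include <stdio.h>

/* Hypothetical compile-time switch, standing in for --with-bignums. */
#ifndef WITH_BIGNUMS
#define WITH_BIGNUMS 1
#endif

typedef enum { T_FIXNUM, T_BIGNUM } num_tag;

/* Toy stand-in for a Lisp number; the bignum payload is omitted. */
typedef struct {
    num_tag tag;
    long fix;               /* meaningful only when tag == T_FIXNUM */
} lisp_num;

/* Check done at the entry of not-yet-converted code.
   Returns 0 and stores the value, or -1 to signal a type error. */
static int check_fixnum(lisp_num n, long *out)
{
#if WITH_BIGNUMS
    if (n.tag != T_FIXNUM) {
        fprintf(stderr, "wrong type argument: fixnum expected\n");
        return -1;
    }
#endif
    *out = n.fix;
    return 0;
}

/* An "unconverted" primitive: it only understands fixnums, so a
   bignum escaping into it is caught instead of silently misread. */
static int legacy_add1(lisp_num n, long *result)
{
    long v;

    if (check_fixnum(n, &v) != 0)
        return -1;
    *result = v + 1;
    return 0;
}
```

Building with WITH_BIGNUMS set to 0 removes the check entirely, which
is the case for people who never create a bignum in the first place.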
Hrvoje> Also, with bignums available, you would probably want to
Hrvoje> provide compiler declarations so that reasonably efficient
Hrvoje> code can be written, i.e. (declare (fixnum x)).
Sure. But again, that's a project that can be put off until it
matters to somebody. (Maybe; I realize that it's possible that checks
for bignums will slow everything down and everybody will care. But
then the --with-bignums=no option wins.)
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for? "Free software rules."