>>>> "ecm" == Eric Marsden <emarsden(a)mail.dotcom.fr> writes:
>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)iskon.hr> writes:
Hrvoje> char= is all nice, but then we'd also need char<, char>,
Hrvoje> char<= etc. That sounds ugly. How does CL collate
Hrvoje> characters?
ecm> A character in Common Lisp has an associated code (which in
ecm> most implementations is an ASCII value), and the #'char<
ecm> ordering is consistent with #'< on characters'
ecm> codes. Alphanumeric characters are guaranteed to respect a
ecm> sensible partial ordering.
True. And useless in a Mule world. CL is just not specific enough
for the purposes of a multilingual text processor. Unless there are
conventions beyond what's available in the HyperSpec, we're on our own
with this stuff.
We can easily have an arbitrary default collation sequence that
satisfies the minimal restrictions CLtL2 puts on characters: just use
the Unicode encoding or the Mule encoding. (IIRC with MINIMAL_TAGBITS
you can just compare the internal representations as unsigned
integers.) The Unicode standard (2.1) specifies the Unicode order as
the default, so that's a reasonable way to go.
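A minimal sketch of that default, in Python only for brevity. The names
char_lt and obeys_cltl2_ordering are made up for illustration and are
not part of XEmacs or CL. It compares code points as integers and
checks the partial ordering CLtL2 requires of the standard
alphanumerics:

```python
import string


def char_lt(a: str, b: str) -> bool:
    """Default collation: compare code points as unsigned integers."""
    return ord(a) < ord(b)


def obeys_cltl2_ordering(lt) -> bool:
    """Check the partial ordering required of the standard alphanumerics:
    A<...<Z, a<...<z, 0<...<9, and digits not interleaved with letters."""
    runs = (string.ascii_uppercase, string.ascii_lowercase, string.digits)
    in_order = all(lt(x, y) for run in runs for x, y in zip(run, run[1:]))
    digits_vs_upper = lt("9", "A") or lt("Z", "0")
    digits_vs_lower = lt("9", "a") or lt("z", "0")
    return in_order and digits_vs_upper and digits_vs_lower


print(obeys_cltl2_ordering(char_lt))
```

Any encoding that keeps ASCII in its usual positions passes this check,
which is why the choice of default is otherwise arbitrary.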
But in the long run we want character collation to depend on the
language environment. (IIRC the Scandinavian languages do not agree on
the collation sequence for Latin-1, or maybe they agree on Latin-1 but
disagree about whether those characters are interleaved with ASCII or
all come after ASCII in ISO-8859-1.) That means POSIX-like LC_COLLATE
tables are the most straightforward way to implement it.
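To make the table idea concrete, here is a rough sketch, again in
Python for brevity. Everything in it (COLLATION_TABLES, tail_after_z,
weight, collate, the "sv"/"da" keys) is hypothetical and only loosely
modelled on LC_COLLATE; it is not a proposal for the actual data
structure. It uses one disagreement I am sure of: Swedish ends its
alphabet with a-ring, a-umlaut, o-umlaut, while Danish ends with ae,
o-slash, a-ring, and neither matches raw Latin-1 code order.

```python
# Hypothetical per-language tables, loosely modelled on LC_COLLATE.
# Only the lowercase letters that sort after "z" are listed; every other
# character falls back to its code point.

def tail_after_z(letters: str) -> dict:
    """Give each letter a weight strictly between "z" and the next code point."""
    step = 1.0 / (len(letters) + 1)
    return {ch: ord("z") + (i + 1) * step for i, ch in enumerate(letters)}


COLLATION_TABLES = {
    "default": {},                             # plain code point order
    "sv": tail_after_z("\u00e5\u00e4\u00f6"),  # Swedish: a-ring, a-umlaut, o-umlaut
    "da": tail_after_z("\u00e6\u00f8\u00e5"),  # Danish: ae, o-slash, a-ring
}


def weight(ch: str, lang: str) -> float:
    """Look up the collation weight, falling back to the code point."""
    return COLLATION_TABLES[lang].get(ch, ord(ch))


def char_lt(a: str, b: str, lang: str = "default") -> bool:
    """Language-dependent analogue of char<."""
    return weight(a, lang) < weight(b, lang)


def collate(chars: str, lang: str = "default") -> str:
    """Sort the characters of a string by the table for the given language."""
    return "".join(sorted(chars, key=lambda ch: weight(ch, lang)))
```

With these tables, a-ring sorts before a-umlaut under "sv" but after it
under "default", since the code points are U+00E5 and U+00E4. A real
table would also need to cover uppercase and multi-level weights, which
this sketch ignores.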
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
_________________ _________________ _________________ _________________
What are those straight lines for? "XEmacs rules."