Regexps, for one. Font-handling, for another; OOo and Firefox do a
crummy job with mixed Han texts. I don't know how we're going to
emulate Mule in that respect with Unicode as the primary internal
character set.
I think the Unicode-ish party line on that one would be:
"XEmacs is a text editor, not a typesetting program. It is sufficient
that plain text be readable; it is not necessary for plain text in an
editor to be typographically perfect. Han unification has been
designed so that text is always readable, even if viewed with fonts
from the wrong locale. Correct typography is the responsibility of
higher-level markup and applications."
(Apart from the first sentence, that is a brief paraphrase of the
Unibook section on the topic.)
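A small Python sketch of what the party line means in practice. U+9AA8 (the "bone" ideograph) is a commonly cited case where typical Chinese and Japanese fonts draw the glyph differently, yet Han unification gives it a single code point. Once GB2312 or Shift JIS bytes are decoded, the source locale is gone, so any language information has to be carried alongside the text. The TaggedRun class below is a hypothetical stand-in for that "higher-level markup", not any real XEmacs or Unicode API.

```python
import unicodedata
from dataclasses import dataclass

BONE = "\u9aa8"  # one code point for both the Chinese and Japanese glyph forms

# The legacy national encodings use different byte sequences for it...
gb_bytes = BONE.encode("gb2312")       # Simplified Chinese (EUC-CN)
sjis_bytes = BONE.encode("shift_jis")  # Japanese

# ...but after decoding, both become the identical Unicode string.
from_chinese = gb_bytes.decode("gb2312")
from_japanese = sjis_bytes.decode("shift_jis")


@dataclass(frozen=True)
class TaggedRun:
    """Hypothetical markup: text plus a language tag chosen by the application."""
    text: str
    lang: str  # e.g. a BCP 47 tag such as "zh-CN" or "ja"


runs = [TaggedRun(from_chinese, "zh-CN"), TaggedRun(from_japanese, "ja")]

print(gb_bytes != sjis_bytes)            # True: different legacy bytes
print(from_chinese == from_japanese)     # True: same code point
print(unicodedata.name(from_chinese))    # CJK UNIFIED IDEOGRAPH-9AA8
print(runs[0] != runs[1])                # True: only the tag tells them apart
```

Font selection would then key off `lang` rather than off the characters themselves, which is exactly the information a plain-text buffer does not have.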
Now I don't know how true that is. *My* problem is that, as someone who
doesn't speak or read any East Asian language, I see that Han unification
does unify glyphs that look quite distinct to me; but if they don't look
distinct to CJK speakers, then I have to agree with the party line.
Seriously f*cked-up file systems are another (try
reading a file whose name contains UTF-8, KOI8-R, and Shift JIS in
different segments with other apps ... yes, I've seen such!) In
fact, Python just went through a big debate on coding systems for the
file system, which ended when Guido declared that designing a system
to do it right was too hard, so they went with a 99% proposal. And
there's a huge amount of tweaking in the detection code and other
places.
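To make the mixed-segment filename concrete, here is a minimal Python sketch that builds one (the segment texts and the "-" separator are made up for illustration) and then tries to read it with a single codec. Strict UTF-8 rejects the bytes outright, while an 8-bit codec like KOI8-R, which assigns a character to every byte value, "succeeds" but produces mojibake. Only decoding each segment with its own codec recovers the name, and that requires knowing the boundaries and codecs in advance.

```python
# Hypothetical filename built from three segments, each in a different
# legacy encoding; the texts and the "-" separator are illustrative only.
SEGMENTS = [("Привет", "koi8_r"), ("日本", "shift_jis"), ("café", "utf-8")]

raw_name = b"-".join(text.encode(codec) for text, codec in SEGMENTS)
intended = "-".join(text for text, _ in SEGMENTS)


def try_decode(data: bytes, codec: str):
    """Return data decoded with codec, or None if the codec rejects it."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# Guessing one codec for the whole name never yields the intended text.
guesses = {codec: try_decode(raw_name, codec)
           for codec in ("utf-8", "koi8_r", "shift_jis")}

# Per-segment decoding works only because we already know the layout.
# (Splitting on b"-" is safe here: no encoded segment contains byte 0x2D.)
recovered = "-".join(part.decode(codec)
                     for part, (_, codec) in zip(raw_name.split(b"-"), SEGMENTS))

for codec, result in guesses.items():
    print(f"{codec:10} -> {result!r}")
print("per-segment ->", recovered)
```

Real detection code has to guess the boundaries and codecs heuristically, which is where all that tweaking comes in.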
How do systems deal with the problem that in some encodings (any
ISO 2022 variant that allows general character sets) there are many
octet strings that encode the same abstract text string?
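Even Python's restricted ISO-2022-JP codec shows the problem. Redundant escape sequences give different octet strings for the same text, and the decoder accepts all of them. One possible policy (an assumption here, not a description of what any particular file system does) is to compare names after decoding and re-encoding with a trusted encoder, as the hypothetical canonicalize() helper below does.

```python
TEXT = "日本"

# The usual encoding: switch into JIS X 0208 once, then back to ASCII once.
canonical = TEXT.encode("iso2022_jp")

# Encoding character by character adds a redundant pair of escape sequences
# (back to ASCII, then into JIS X 0208 again) between the two characters.
per_char = b"".join(ch.encode("iso2022_jp") for ch in TEXT)

# A redundant designation of ASCII (ESC ( B) in front of text that is
# already in the initial ASCII state.
plain = "abc".encode("iso2022_jp")
plain_redundant = b"\x1b(B" + plain


def canonicalize(data: bytes, codec: str = "iso2022_jp") -> bytes:
    """Pick one representative octet string by decoding, then re-encoding.

    Assumes the decoder accepts every variant and that the encoder's output
    is the form we want to treat as canonical.
    """
    return data.decode(codec).encode(codec)


print(canonical, per_char)
print(canonicalize(per_char) == canonical)     # True
print(canonicalize(plain_redundant) == plain)  # True
```

For fuller ISO 2022 variants, where one character can be reached through more than one designated character set, choosing the canonical encoder becomes a policy decision rather than a mechanical one.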
Come to that, how do Unicode-based filesystems (Windows, Mac) behave
when faced with a filename that is invalid, or are the OSes
sufficiently well written to validate filenames on creation?
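This says nothing about what the Windows or Mac kernels accept, but on the application side Python's PEP 383 "surrogateescape" error handler (used by os.fsdecode()/os.fsencode() on POSIX systems) is one answer. Undecodable bytes become lone surrogates U+DC80..U+DCFF, so an invalid name can still be displayed and then written back byte for byte. The helper names below are illustrative only.

```python
def is_valid_utf8(name: bytes) -> bool:
    """True if the raw filename bytes are well-formed UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_text(name: bytes) -> str:
    # Invalid bytes are smuggled through as lone surrogates (PEP 383).
    return name.decode("utf-8", "surrogateescape")


def to_bytes(name: str) -> bytes:
    # The inverse mapping restores the original bytes exactly.
    return name.encode("utf-8", "surrogateescape")


good = "café".encode("utf-8")   # b"caf\xc3\xa9"
bad = "café".encode("latin-1")  # b"caf\xe9", not valid UTF-8

print(is_valid_utf8(good), is_valid_utf8(bad))  # True False
print(repr(to_text(bad)))                       # 'caf\udce9'
print(to_bytes(to_text(bad)) == bad)            # True
```

The catch is that such a string cannot be encoded with a strict codec (for example, printed to a UTF-8 terminal), so the "invalid" name survives the round trip but still has to be handled specially for display.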
(Many years ago, we had a Pyramid Unix system, which had a network file
system interface to the Vaxen. This interface did so little checking
of filenames that it was possible, from a Vax, to create a Unix file
on the Pyramid with a '/' in its name! Of course, the only way to
remove it, or access it in any way, was from a Vax.)
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta