Julian Bradfield writes:
I think the Unicode-ish party line on that one would be:
"XEmacs is a text editor, not a typesetting program."
I see no reason why it shouldn't be both. However, I believe that
Unicode is sufficient to distinguish different characters, and that
language information should be included in the markup, not in the
character code. On the other hand, I think that language structure
should be reflected in the document structure, not implied by the
characters contained.
Now I don't know how true that is. *My* problem is that, as someone
who doesn't speak or read any East Asian language, I see Han unification
unifying glyphs that look quite distinct to me; but if they don't look
distinct to any CJK speaker, then I have to agree with the party line.
There is a minority of CJK speakers who disagree. AFAIK the most
vocal ones are mostly Japanese. As a non-speaker, though, you should
not trust your eyes. The number and order in which the strokes are
written is very important, more so than the orientation in many cases.
For example, are you aware that in Han characters, rectangles are
usually triangles (it only takes three strokes to write the character
for "mouth", which is a square)?
However, the less extreme among those folks basically go so far as
to say that "A" in Lucida Typewriter is not the same as "A"
in Fraktur! (Extremists even take exception to systematic differences
similar to the use of serifs in Latin glyphs.) There really is no
consensus as to where to draw the line among them, either, whereas the
Unicode advocates have a set of rules that are easy to apply in many
thousands of common cases and ambiguous in very few cases, even for
rare glyphs (except for the case of "lost" glyphs whose meaning is
uncertain). Even the fanatics agree that the unified characters are
closely related, though the Japanese ones insist that "Japanese"
characters have an ineffable "Japanese spirit" not present in Chinese
versions ....
How do systems deal with the problem that in some encodings (any
ISO2022 that allows general character sets) there are many
octet-strings that encode the same abstract text string?
They invariably treat those strings as different strings, just as Mule
does (except as modified by latin-unity and similar GNU features).
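To make the problem concrete, here is a minimal sketch in Python (not Mule or Emacs Lisp), assuming CPython's built-in `iso2022_jp` codec. It shows several ISO 2022 octet strings that decode to the same abstract text, and contrasts a byte-wise comparison, which treats them as different strings, with a comparison of the decoded text. The helper names `same_octets` and `same_text` are hypothetical and chosen only for illustration.

```python
# Minimal sketch: different ISO 2022 octet strings that encode the same text.
# Assumes CPython's built-in "iso2022_jp" codec, where ESC ( B designates
# ASCII and ESC $ B designates JIS X 0208-1983.

ESC = b"\x1b"

# Three octet strings for the abstract text "abc":
plain = b"abc"                               # no escape sequences at all
designated = ESC + b"(Babc"                  # redundant explicit ASCII designation
redesignated = ESC + b"(Ba" + ESC + b"(Bbc"  # designation repeated mid-string

# Two octet strings for the kanji U+65E5 (JIS X 0208 code 0x467C = "F|"):
kanji = ESC + b"$BF|" + ESC + b"(B"
kanji_alt = ESC + b"(B" + kanji              # extra leading ASCII designation


def same_octets(a: bytes, b: bytes) -> bool:
    """Naive comparison: equal only if the raw octets are identical."""
    return a == b


def same_text(a: bytes, b: bytes, encoding: str = "iso2022_jp") -> bool:
    """Compare the decoded character sequences instead of the raw octets."""
    return a.decode(encoding) == b.decode(encoding)


if __name__ == "__main__":
    for v in (plain, designated, redesignated, kanji, kanji_alt):
        print(repr(v), "->", repr(v.decode("iso2022_jp")))
    print(same_octets(plain, designated))  # False: different octet strings
    print(same_text(plain, designated))    # True: same abstract text
```

Byte-wise comparison is what a system gets "for free"; treating these strings as equal requires decoding (or normalizing to a canonical encoding) first, which is the extra work that latin-unity does for a limited set of Latin character sets.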
(Many years ago, we had a Pyramid Unix system, which had a network
file system interface to the Vaxen.
Oops! That was your example, not Glynn's, I see. Mea maxima culpa!
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta