Stephen writes:
Julian Bradfield writes:
> I think the Unicode-ish party line on that one would be:
> "XEmacs is a text editor, not a typesetting program."
I see no reason why it shouldn't be both. However, I believe that
> Urgh. Typesetting is a hard task. Have you ever read the TeX source
> code? (It is, incidentally, verging on a tragedy that TeX reached
> stability before Unicode was really on a roll. If Unicode had been
> five years earlier, or TeX five years later, we would be in a much
> better position than we are.)
> There is a minority of CJK speakers who disagree. AFAIK the most
> vocal ones are mostly Japanese. As a non-speaker, though, you should
> not trust your eyes. The number of strokes and the order in which
> they are written are very important, more so than the orientation in
> many cases. For example, are you aware that in Han characters,
> rectangles are usually triangles (it only takes three strokes to
> write the character for "mouth", which is a square)?
Oh yes. I have a basic knowledge, sufficient to use a dictionary.
The kind of thing Unicode does that is really annoying is exemplified
by one of the characters I use most often, namely U+5C06 将. If you're
seeing this in a PRC Chinese font, you'll see the top right component
being an Evening radical 夕. If you're seeing it in a Japanese or Taiwan-CNS
font, you'll see the top right component as a Claw radical 爪. This isn't
a traditional/simplified distinction; the traditional version is
U+5C07 將, which has the Evening radical in all fonts (in the 4-stroke
rather than 3-stroke version), as well as the fancy 4-stroke
Half-tree-trunk radical rather than the simplified 3-stroke one.
(I'm relying on mule-ucs and VM to get these out as the right Unicode
values, assuming I have utf-8 as my primary charset, which I think I
do. I haven't yet started using my own Unicode XEmacs for my mail!)
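
One way to confirm that the two characters above arrive as the
intended code points is to inspect them outside the mail client. The
following is only a minimal sketch in Python, which is my choice of
tool here and not something mule-ucs or VM provides; the helper name
`describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return (code point, UTF-8 bytes as hex, Unicode character name)."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex()
    name = unicodedata.name(ch)
    return (code_point, utf8_hex, name)


# Escapes are used instead of literal characters so that a mis-set
# charset in the editor or mailer cannot silently change the input.
for ch in ("\u5c06", "\u5c07"):
    print(describe(ch))
```

Note that this only identifies the code point; it says nothing about
which glyph variant a given font will draw for it.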
The difference between the reference glyph for U+5C06 and its
rendition in CNS and Japanese is a clear example of a difference that
should prevent unification: a different radical in a component.
Nonetheless, the IRG went ahead and unified them anyway, on the
grounds that no existing character set distinguished them, and that
they are in fact stylistic variants of the same abstract character,
even though they might not be.
If you are faced with the Japanese/CNS variant of U+5C06, and try to
look it up in the Unihan radical/stroke index, you will fail, because
it isn't there. And there's no general rule that Evening/Claw radicals
are interchangeable in some positions, even though the variation
presumably arose because an old form of the claw radical, U+355A 㕚
(so obscure it's on CNS plane 5), is very similar to the evening
radical; you just have to know it for this character.
> Unicode advocates have a set of rules that are easy to apply in many
> thousands of common cases and ambiguous in very few cases, even for
> rare glyphs (except for the case of "lost" glyphs whose meaning is
> uncertain).
I wouldn't mind, if they actually applied their own criteria
rigorously.