On the fifth of November, Julian Bradfield wrote:
> >What you’ve done and what you propose appears, to me, to be less
> >maintainable and less coherent (more horrible) than the 21.5
> >approach. As
> That's the kind of response I was looking for ;-) Can you explain why?
> What I'm doing seems ultimately not all that different from the 21.5
> approach, except that I'm taking native UCS and UTF-8 to be the immediate
> goal, and leaving the Mule stuff alone for legacy compatibility; whereas
> 21.5 takes Mule as basic, and treats UCS as a (partly arbitrary)
> collection of Mule charsets, aiming ultimately for a big-bang abolition
> of Mule.
You’re losing unification (thus, often, data), even more than the Mule-UCS
approach does. Your Latin-2, Latin-3, Latin-4, Latin-10 ü code points are
distinct, and you have this extra Unicode encoding for it that the
corresponding coding systems won’t know about. You lose data when you encode
a Latin-10 ü using the iso-8859-1 coding system; you lose data when you
encode a Unicode ü using the Latin-10 coding system.
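To make the loss concrete, here is a minimal Python sketch, not XEmacs code. It models a non-unified character as a (charset, byte) pair. The names MuleChar, encode_naive and encode_unified are made up for illustration, and Python's own codecs stand in for the coding systems.

```python
from typing import NamedTuple


class MuleChar(NamedTuple):
    """Hypothetical stand-in for a non-unified character: charset plus byte."""
    charset: str  # a Python codec name, e.g. "iso8859_16" for Latin-10
    byte: int


def encode_naive(ch: MuleChar, target: str) -> bytes:
    """Encode only if the character already belongs to the target charset."""
    if ch.charset == target:
        return bytes([ch.byte])
    return b"?"  # data loss: the character is trashed


def encode_unified(ch: MuleChar, target: str) -> bytes:
    """Go through the Unicode code point, so identical characters unify."""
    unicode_char = bytes([ch.byte]).decode(ch.charset)
    return unicode_char.encode(target, errors="replace")


latin10_u_umlaut = MuleChar("iso8859_16", 0xFC)  # ü in Latin-10
latin2_u_umlaut = MuleChar("iso8859_2", 0xFC)    # ü in Latin-2

# Without unification the two ü characters compare as different objects,
# and the Latin-1 encoder cannot handle either of them.
naive = encode_naive(latin10_u_umlaut, "iso8859_1")
unified = encode_unified(latin10_u_umlaut, "iso8859_1")
same_char = (bytes([0xFC]).decode("iso8859_16")
             == bytes([0xFC]).decode("iso8859_2"))
```

The naive encoder trashes the character, while the one that looks at the Unicode code point writes the expected single byte.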
We’ve fixed this in 21.5 to an extent, in that we don’t normally use the ISO
2022 infrastructure for those coding systems where ISO 2022 encoding is not
the norm. And, in those coding systems, we do look at the Unicode encoding
of a character when deciding whether to trash it or not. We don’t have
robust checks for whether a region is encodable or not--see
http://mid.gmane.org/18509.34907.513276.700290@parhasard.net for some work
in that direction, and for me severely underestimating the amount of work I
have to do this year--but I can’t see that your approach makes it easier.
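As a rough illustration of what such a check has to answer, here is a small Python sketch. It is only a model of the idea, not the 21.5 implementation; the function name unencodable_positions is hypothetical, and Python codecs stand in for XEmacs coding systems.

```python
from typing import List


def unencodable_positions(text: str, start: int, end: int, codec: str) -> List[int]:
    """Return the offsets in text[start:end] that `codec` cannot represent."""
    bad = []
    for pos in range(start, end):
        try:
            text[pos].encode(codec)
        except UnicodeEncodeError:
            bad.append(pos)
    return bad


sample = "Dvořák and Müller"
latin1_problems = unencodable_positions(sample, 0, len(sample), "iso8859_1")
latin2_problems = unencodable_positions(sample, 0, len(sample), "iso8859_2")
```

With the sample above, only the ř is reported for Latin-1, and nothing is reported for Latin-2. A caller could use the returned offsets to warn before saving instead of silently trashing characters.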
Some questions that you may or may not have thought about:
How are you encoding your Unicode characters in the escape-quoted external
encoding? Since this encoding is used for auto-saves and for byte-compiled
ELC files, it needs to be capable of encoding every possible XEmacs
character.
What will Lisp code see when it calls #'split-char on your Unicode
characters?
Given the lack of unification in what you described, what unification rules
do you have in place or do you intend to put in place?
Given that the Mule code points in 21.4 are 19 bits wide, with only #x80000
possible code points, and that Unicode’s code points go up to #x10FFFF,
how do you encode the excess of #x90000 code points?
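The arithmetic behind that last question, as a short Python sketch. The only inputs are the 19-bit width and the #x10FFFF ceiling mentioned above; the variable names are illustrative. Counting inclusively, the code points that do not fit are #x80000 through #x10FFFF.

```python
MULE_CHAR_BITS = 19                     # width stated above for 21.4
mule_capacity = 1 << MULE_CHAR_BITS     # number of distinct 19-bit values
unicode_max = 0x10FFFF                  # highest Unicode code point
unicode_count = unicode_max + 1         # code points 0 through unicode_max

excess = unicode_count - mule_capacity  # code points that do not fit
bits_needed = unicode_max.bit_length()  # bits for a flat Unicode integer

print(hex(mule_capacity), hex(unicode_count), hex(excess), bits_needed)
```

So a flat integer representation of Unicode needs 21 bits, two more than are available.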
> [...] Sooner or later, the de-unification problem will be sorted out, by
> a combination of language identification (maybe even using those strongly
> discouraged language tag characters) and the definition of suitable sets
> of ideographic variation sequences; but until a consensus is reached on
> how this is to be done in plain text, we will still have documents all
> over the place in all the legacy encodings, with people feeling strongly
> about getting the right glyph for the right character.
Within XEmacs, the approach that seems correct to me is to create extents
with language tags at the level of the various legacy coding systems, and to
pick a default for the Unicode coding systems depending on the current
language environment.
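A minimal Python sketch of that idea, under loud assumptions: LangExtent, LEGACY_LANGUAGE, decode_with_tag and language_at are hypothetical names, the coding-system-to-language table is invented for the example, and none of this is the XEmacs extent API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LangExtent:
    """Hypothetical extent: a text range carrying a language tag."""
    start: int      # inclusive offset
    end: int        # exclusive offset
    language: str


# Assumed mapping from legacy coding system to language tag (illustrative only).
LEGACY_LANGUAGE = {"shift_jis": "ja", "euc_kr": "ko", "big5": "zh-TW"}


def decode_with_tag(data: bytes, coding_system: str,
                    default_language: str, offset: int = 0) -> Tuple[str, LangExtent]:
    """Decode bytes and tag the resulting range with a language.

    Legacy coding systems imply a language; Unicode coding systems fall
    back to the default taken from the language environment.
    """
    text = data.decode(coding_system)
    language = LEGACY_LANGUAGE.get(coding_system, default_language)
    return text, LangExtent(offset, offset + len(text), language)


def language_at(extents: List[LangExtent], pos: int, default_language: str) -> str:
    """Find the language tag covering a position, for glyph selection."""
    for ext in extents:
        if ext.start <= pos < ext.end:
            return ext.language
    return default_language


legacy_text, legacy_ext = decode_with_tag("日本語".encode("shift_jis"), "shift_jis", "en")
utf8_text, utf8_ext = decode_with_tag("日本語".encode("utf_8"), "utf_8", "zh-CN")
```

The same three ideographs end up tagged "ja" when they arrive as Shift_JIS, and tagged with the language environment's default when they arrive as UTF-8, which is the information a redisplay layer would need to pick a glyph.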
> >you describe where you’re coming from, to me it seems to be a better
> >idea to fix VM on 21.5 and to improve 21.5’s speed where it matters to
> >you.
> You're probably right, really...but meeting non-functional
> requirements is much harder than fiddling with functional
> requirements! I'm kind of hoping the real coders will deal with that.
OK. I hope you’re not holding your breath, but you seem to have thought
about most of what I had to say.
The idea has been floating around that it would be worthwhile converting
SXEmacs to Unicode internally and essentially dumping Mule, using GNU iconv
to handle all external encodings (which it ably can, with the fixable
exception of the ISO IR 196 sequences that we use in escape-quoted
byte-compiled files in 21.5). We can’t easily do that on 21.5 given the need
for Windows support, something SXEmacs has dropped. Had you seen that idea?
> (The really real motivation for this exercise is that I like to have
> some moderately intricate but basically easy task to distract
> me from the hard stuff I do for a living, and a project that makes me
> look hard at the XEmacs sources increases the chance I might one day
> make a real contribution to XEmacs.)
--
Where might my nephew Yoghurtu Nghé be now, who had to flee the village
in a hurry because of the shortage of rhinoceroses?
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta