"Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> writes:
Hrvoje> No, the idea is fine. But is it the best thing to infest
Hrvoje> random primitives with CODING-SYSTEM arguments? Isn't
Hrvoje> there a more general variable that can be set? Please
Hrvoje> bear with me -- I know very little about coding systems
Hrvoje> and stuff, so maybe I'm just raving.
Sigh. You _should_ rave. Heaven knows, Erik Naggum does.
:-)
In the current implementation of Mule, coding systems are a property
of ... you can't really be sure. And their semantics are ... rather
ad hoc.
This looks like a major problem, in major need of fixing.
In the particular case of Fmd5 and the uses of it that I know about,
it probably doesn't need it; proper care in implementation should
have Fmd5 use the output coding-system (but this must be specified
by the mailer, and most mailers do the translation to standard
Internet mail encodings as a post-processing step). But then, the
original implementor of md5 didn't consider that Steve Baur was
going to use XEmacs to md5 checksum packages' external
representation; he thought it would be OK as long as the buffer
representation would be checksummed correctly. So I wouldn't bet on
it.
For the record: I *do* note that md5 checksums are useful for things
other than the internal buffer representation. But I don't think that
`md5' *function* should have that additional CODING-SYSTEM argument.
Couldn't it pick the coding system from a generally useful variable,
like coding-system-for-write? Am I making any sense?
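To make the point concrete, here is a minimal sketch in Python (not
XEmacs Lisp, and not how Fmd5 is implemented). It shows that the digest
depends on the external encoding, and that the encoding could come from
one general setting rather than from an argument. The names
`coding_system_for_write` and `md5_of_text` are hypothetical stand-ins
invented for this illustration.

```python
import hashlib

# Hypothetical stand-in for a general variable such as
# coding-system-for-write; an assumption for this sketch only.
coding_system_for_write = "iso-8859-2"


def md5_of_text(text: str) -> str:
    """Encode TEXT with the current general setting, then digest the bytes."""
    encoded = text.encode(coding_system_for_write)
    return hashlib.md5(encoded).hexdigest()


text = "Šibenik"

digest_latin2 = md5_of_text(text)

# Changing the general setting changes the bytes, and so the digest.
coding_system_for_write = "utf-8"
digest_utf8 = md5_of_text(text)

print(digest_latin2 != digest_utf8)
```

The caller never passes a coding system; it only changes the setting
that writing a file would consult anyway.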
The problem is that often the textual sources which you would like to
combine in a buffer have varied encodings to start with. So some
choice of buffer representation and mechanisms for translation are
needed. Note that "textual source" means not only files that are
encoded differently; you may also have process and user input
in yet other encodings.
Well, this is true, but the case of `md5' calculating a digest is
surely no harder than the case of `C-x C-s' saving the buffer, if you
see what I mean...
The problems that we are currently discussing stem, in my opinion,
from the lack of a good extent abstraction in Mule. This means that
source and target encodings must be defined either as properties of
characters or as a property of a whole stream. The former is ugly,
time-consuming, and mistake-prone; the latter is simply not
fine-grained enough. So you end up with coding-system specifications
in operators. Yuck.
Yes, I think you have a point.
Concretely, this would be like the current situation with no-mule
XEmacs where you can change a buffer displayed as junk to legible
Croatian by changing the face from a *-iso8859-1 to a *-iso8859-2
font.[4]
That would be cool. Even cooler would be to be able to specify the
default latin* character set, so that when I encounter Š anywhere in
a file, I want to see it as Š, not as the copyright symbol. Sort of
"trust me pal, I know it's latin2."
This is one big advantage of non-Mule XEmacs over Mule XEmacs for
Croatian (and similar languages.)
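
The ambiguity is easy to see outside Emacs, too. A minimal sketch in
Python, using only the standard codecs: the same byte, 0xA9, decodes to
the copyright symbol under latin1 and to Š under latin2. The variable
names are made up for the illustration.

```python
# One byte, two readings: nothing in the byte itself says which is right.
raw = bytes([0xA9])

as_latin1 = raw.decode("iso-8859-1")  # read as ISO 8859-1
as_latin2 = raw.decode("iso-8859-2")  # read as ISO 8859-2

print(as_latin1, as_latin2)
```

A default latin* character set would amount to choosing which of these
two decodings applies when the file itself carries no hint.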
--
Hrvoje Niksic <hniksic(a)srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
Personifiers Unite! You have nothing to lose but Mr. Dignity!