Joachim Schrod <jschrod@acm.org> writes:
> Well, speaking of UTF-8: Since XEmacs is very happy to destroy lots
> of my files with its supposedly smart encoding detection -- and does
> so WITHOUT any warning -- your demand for more thought about error
> recovery is spot on. But there seem to be other areas that are in
> more severe need of that error handling than recovery from wrong
> coding cookies (namely, automatic encoding sniffing). I have had
> literally dozens of UTF-8 files with a single Latin-1 char in them
> that got re-encoded by XEmacs when I opened and saved them. (I.e.,
> when I didn't pay enough attention to the modeline while quickly
> modifying one or two words.) In all these cases, reliance on a
> user-supplied coding cookie would have saved me untold hours of work
> redoing the result of XEmacs' automatic encoding detection, which
> Really Really Really Sucks.
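The failure mode described there can be reproduced in miniature. A hedged Python sketch (a stand-in, not XEmacs internals) of why sniffing such a file goes wrong:

```python
# A file that is valid UTF-8 except for one stray Latin-1 byte.
data = "naïve text ".encode("utf-8") + b"\xe9"   # 0xE9 = Latin-1 'é'

# Strict UTF-8 decoding rejects the whole file ...
try:
    data.decode("utf-8")
    print("valid UTF-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# ... while Latin-1 "succeeds" on any byte sequence whatsoever, so a
# sniffer falling back to Latin-1 silently reads the legitimate UTF-8
# multibyte sequences as mojibake -- and a subsequent save re-encodes
# the entire file.
print(data.decode("latin-1"))
```

Since every byte sequence is "valid" Latin-1, one stray byte is enough to push a detector off UTF-8 for the whole file.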
In this case, I think a solution from the coding-system implementation
angle should eliminate the problem more reliably than approaching it
from the detection angle. All of the Latin-X codings as well as UTF-8
have the property that _valid_ characters have exactly one valid
external representation. If Emacs' internal buffer encoding offers a
way to encode "invalid byte with code xx", then loading and saving a
file without changing the encoding will preserve its contents. Indeed,
an "invalid byte with code xx" kind of character can't be saved as
Latin-1, since Latin-1 normally has no invalid bytes that such a
character could map back to.
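Python's "surrogateescape" error handler (PEP 383) implements essentially this "invalid byte with code xx" idea; a sketch of the round-trip property (a Python stand-in, not Emacs' actual raw-byte characters):

```python
# Each undecodable byte 0xXX maps to the lone surrogate U+DC00+0xXX,
# i.e. an "invalid byte with code xx" character.
raw = "UTF-8 text ".encode("utf-8") + b"\xe9"   # one stray Latin-1 byte

text = raw.decode("utf-8", errors="surrogateescape")

# The stray byte survives inside the string as U+DCE9 ...
assert "\udce9" in text

# ... and re-encoding reproduces the original file byte for byte.
assert text.encode("utf-8", errors="surrogateescape") == raw

# Encoding that character into plain Latin-1 fails, because Latin-1
# has no representation for "invalid byte 0xE9":
try:
    text.encode("latin-1")
except UnicodeEncodeError:
    print("cannot be saved as Latin-1 without losing information")
```

Load and save without changing the encoding preserve the contents exactly, while an attempt to save in a coding system that cannot represent the invalid byte is caught as an error.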
So there are a lot of possibilities for detecting incurable
inconsistencies and maintaining file coherency even across wrong
detections. I don't think the potential in that area is used to its
full extent. Automatic detection can't _always_ work right (random
bit patterns _always_ have the potential to look like a coding
cookie), but one can make pretty sure that the most-used encodings
will preserve all the information present in the source file.
Throwing an error if the coding system is not sufficient is much
preferable to the current state of affairs (arbitrarily choosing a
coding system that XEmacs thinks is right).
As long as the buffer can be uniquely encoded in the chosen coding
system, and the buffer uniquely represents the input encoding, the
damage is at least reversible.
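As an illustration of that preferred behaviour -- erroring out rather than silently substituting -- a hedged Python sketch, with strict encoding standing in for the save operation:

```python
# Buffer containing characters Latin-1 cannot represent.
buffer_text = "Grüße, 日本語"

# Saving with an insufficient coding system: a strict encoder raises
# instead of silently substituting characters, which is the behaviour
# argued for above.
try:
    buffer_text.encode("latin-1")
    print("saved as Latin-1")
except UnicodeEncodeError:
    print("refusing to save: Latin-1 cannot encode this buffer")

# A sufficient coding system round-trips losslessly, so even a wrong
# but unique choice leaves the damage reversible.
assert buffer_text.encode("utf-8").decode("utf-8") == buffer_text
```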
Personally, I find that Emacs 22 does a remarkable job of not
corrupting files, even though there are times (quite rare for me,
though as a Western European one is somewhat favored) when the
initial detection goes wrong.
I suppose that if escape-code-based encodings (where several
different legal input sequences are possible for the same sequence of
characters) were in my normal working set of encodings, I might get
hit by problems more often, but as things stand, I can't actually
complain.
--
David Kastrup, Kriemhildstr. 15, 44793 Bochum
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta@xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta