On the sixteenth of January, Stephen J. Turnbull wrote:
> Aidan Kehoe writes:
> > The attached file is UTF-16 with byte order mark, and has an invalid
> > sequence after the first ". ". Current GNU Emacs deals with it badly.
> > XEmacs deals with it a little better, but not in a particularly
> > stellar way right now either.
> There have been a couple of long threads on Python-Dev (or maybe
> Python-3000) about how to deal with these issues. It's not obvious to
> me that there is a "stellar" way to deal with the problem.
> My personal feeling is that normally an attempt to use a Unicode
> coding system to decode something that is invalid according to that
> system should be a fatal error, unless the user requests otherwise.
> (Yes, we *must* provide for that request before taking this step,
> though.) This is pragmatic, not dogmatic; I've seen too many reports
> of irreversible data corruption from "DTRT" codecs.
Right, I agree. This question is about the API for that. F1 f
query-coding-region RET --- should there be a preserve-invalid-sequence
argument? Where?
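As a rough illustration of the strict-by-default policy discussed above, here is a minimal Python sketch. It is Python rather than anything in XEmacs, and the `decode` helper and its `lenient` flag are hypothetical names made up for this example, not a proposed API. The input imitates the attached file: UTF-16 with a byte order mark and a lone low surrogate after the first ". ".

```python
# UTF-16 (little-endian) with BOM: "a. ", then a lone low surrogate
# (code unit 0xDC00), which is invalid on its own, then "b".
data = (b"\xff\xfe"
        + "a. ".encode("utf-16-le")
        + b"\x00\xdc"
        + "b".encode("utf-16-le"))


def decode(raw, lenient=False):
    """Hypothetical helper: fail on invalid input unless the caller opts out."""
    errors = "replace" if lenient else "strict"
    return raw.decode("utf-16", errors=errors)


# Default: the invalid sequence is a fatal error.
try:
    decode(data)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Explicit opt-in: decoding succeeds, but the invalid code unit becomes
# U+FFFD, so the original bytes can no longer be recovered from the text.
lossy = decode(data, lenient=True)
```

The lenient branch shows the kind of silent, irreversible change that a "DTRT" codec makes; whether the opt-in belongs in an argument like this is exactly the open API question.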
> The Unicode consortium folks have thought about this for a long time,
> and the standard specifies that it's fatal. They didn't do that
> lightly. The Python folks, whom I accord lots of respect for the
> pragmatics of this kind of thing, came to no firm conclusion. Neither
> Guido's crystal ball nor his time machine came to the rescue, either.
David Kastrup’s use case in
http://mid.gmane.org/85fyvc3efj.fsf@lola.goethe.zz convinced me that keeping
the invalid sequences around in some form is preferable to not doing so. I
also think the simple use case of opening a Latin-1 file as utf-8, and making
changes to ASCII text, should not involve an either-or between discarding
your changes and trashing the Latin-1.
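For what it's worth, the Latin-1-opened-as-UTF-8 case can be sketched with Python 3's `surrogateescape` error handler, which is one way of "keeping the invalid sequences around in some form". This is only an illustration of the idea in Python, not a description of how XEmacs does or should represent such bytes; the sample bytes are made up.

```python
# A Latin-1 file: 0xE9 is e-acute in Latin-1 but is not valid UTF-8 here.
latin1_bytes = b"caf\xe9, hello"

# Decode as UTF-8, smuggling each undecodable byte through as a lone
# surrogate (0xE9 becomes U+DCE9) instead of failing or replacing it.
text = latin1_bytes.decode("utf-8", errors="surrogateescape")

# Edit only the ASCII part of the text.
edited = text.replace("hello", "world")

# Encode with the same handler: the escaped byte is written back
# unchanged, so the edit is kept and the Latin-1 content is not trashed.
saved = edited.encode("utf-8", errors="surrogateescape")
```

Here `saved` differs from the input only in the edited ASCII word. The cost of this design is that the in-memory text contains lone surrogates, so anything that insists on strictly valid Unicode has to be prepared to see them.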
> Note that Python has a big advantage over us, too, in that codecs are
> written in Python. I really wish that Handa had made CCL a Lisp subset!
AIUI GNU have support for Lisp codecs; we could have ported that. I don’t
see a need for that right now.
--
¿Dónde estará ahora mi sobrino Yoghurtu Nghe, que tuvo que huir
precipitadamente de la aldea por culpa de la escasez de rinocerontes?