>>>> "Stephen" == Stephen J Turnbull
<stephen(a)xemacs.org> writes:
>>>> "Jan" == Jan Rychter <jan(a)rychter.com> writes:
Jan> It did this to me once, when I finished editing a huge HTML file
Jan> containing ISO-8859-2 characters, saved it and logged off. Much to
Jan> my surprise, all ISO-8859-2 characters were replaced by
Jan> tildes. Several hours of work gone.
Stephen> This still happens (in GNU Emacs as well) if you force the
Stephen> wrong coding system. It's much less likely to happen
Stephen> inadvertently now, but still possible.
Stephen> Unfortunately, the intelligent strategy (default to Unicode
Stephen> for any multilingual document) was howled down by
Stephen> byte-pinching Europeans and cultural nationalists (primarily
Stephen> Japanese), and so today we have the embarrassing situation
Stephen> where there is no released Emacs (neither GNU nor XEmacs) that
Stephen> speaks Unicode natively.
Jan> Now, surely there is an explanation -- I think I figured out why
Jan> this happens, but I do not remember the exact reasons.
Stephen> If you force buffer-file-coding-system to ISO-8859-1 and
Stephen> insert Latin 2 characters into the buffer by hand, you will
Stephen> get this result. Technically speaking, Mule isn't doing any
Stephen> conversions. It's dropping characters that can't be
Stephen> represented in the coding system you (perhaps implicitly)
Stephen> requested.
Jan> My XEmacs doesn't seem to do this now: it sometimes converts
Jan> non-7bit characters
Stephen> I believe this is because XEmacs now defaults to preferring a
Stephen> coding system (iso-2022-7) that can encode all characters.
Stephen> See? You don't like the safe setup very much, do you? You
Stephen> want XEmacs to do what you want, even though the attempt to do
Stephen> so is guaranteed to cause problems eventually. This is the
Stephen> problem we've always faced. People would like it to be safe,
Stephen> but even if they express this desire IN IMPERATIVE STYLE, what
Stephen> they really insist on is having their minds read.
Actually, I do like the safe setup -- and no, I do not expect XEmacs to
read my mind, at least not before version 30.0 or so. But I do want it
to scream bloody murder when it is about to lose my data. I certainly do
not expect it to lose data in a default setup, and that is exactly what
it does.
But I have just checked: I started "xemacs -vanilla" (that's
21.5-b12), opened a file, entered several ISO-8859-2 characters, and
wrote the file to disk. What I get in the file are tildes, with no
warning from XEmacs whatsoever.
Please notice that this is what people in 8859-2 countries get when they
try XEmacs.
To be exact:
Recent keystrokes:
C-x C-f t e s t - l a t i n 2 . t x t RET M-x s e t
- i n p TAB RET l a t TAB 2 TAB p o TAB RET e , o '
a , l / z ' RET C-x C-s C-h l
I therefore beg to differ with your assessment of the problem being
related to "forcing" anything. Perhaps the problem lies in the 8859-1
coding being "forced" quietly, which I would consider a bug.
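As a rough illustration of the failure mode described above, here is a
minimal sketch in Python. It is an analogy only, not XEmacs or Mule code:
the function name lossy_encode and the "~" placeholder are assumptions
made for this example. It encodes the characters produced by the
keystrokes above into a target coding system and silently substitutes a
tilde for anything that coding system cannot represent.

```python
def lossy_encode(text, encoding, placeholder=b"~"):
    """Encode text, silently replacing unencodable characters.

    Hypothetical helper mimicking the silent data loss discussed in
    this thread; it is not how Mule is implemented.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # The character does not exist in the target coding system,
            # so it is dropped and a placeholder is written instead.
            out += placeholder
    return bytes(out)


# Characters typed with the postfix input method: e, o' a, l/ z'
text = "ęóąłź"

# Wrong coding system: most characters are lost with no warning.
print(lossy_encode(text, "iso-8859-1"))

# Matching coding system: the text survives a round trip.
print(lossy_encode(text, "iso-8859-2").decode("iso-8859-2"))
```

Note that Python treats "ó" as one character shared by Latin-1 and
Latin-2, so it survives the first call. XEmacs tracks characters per
charset, so its results may differ, as the report above (all tildes)
suggests.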
Overall, the XEmacs experience is like walking through a minefield --
get file-coding-system-alist right for all names of files you'll be
working with, or get your files quietly mashed to pieces behind your
back. Very frustrating, especially for new users.
[...]
Jan> into the MULE coding,
Stephen> I hope not; you should never ever see true Mule coding in a
Stephen> file. What you probably mean is iso-2022-7, which is trivial
Stephen> to convert: read it into a buffer with C-x C-f, then save it
Stephen> with C-u C-x C-w FILENAME RET TARGET-CODING RET. Of course if
Stephen> you choose an encoding that can't represent all the characters
Stephen> in the buffer, you'll lose data.
I'm sorry, you are of course correct on all points in this
paragraph. Except one: this procedure used to be broken (I reported
it on Jun 30 2002 in a message titled "set-buffer-file-coding-system
doesn't") and it is still broken, as I have just checked:
-- open a file containing 2022-7 with xemacs -vanilla
-- C-u C-x C-w new-filename.txt RET iso-8859-2 RET
new-filename.txt still contains 2022-7. Or am I doing something wrong?
Jan> If XEmacs still does this (loses data by doing a one-way
Jan> conversion) for ANY REASON whatsoever, it has to be changed.
Stephen> Unfortunately, it cannot be, at least not yet.
[...]
Can we at least make XEmacs less shy about reporting that it's about to
lose data? The main problem that I have with the behavior described
above is that I have no way of knowing that something is wrong until I
reopen the file.
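The kind of check being asked for can be sketched as follows. This is
a Python illustration under stated assumptions, not a proposal for the
actual XEmacs implementation: the names unencodable and safe_save are
hypothetical. The idea is simply to test every character against the
target coding system before writing, and to refuse loudly instead of
substituting placeholders.

```python
def unencodable(text, encoding):
    """Return the distinct characters in text that encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def safe_save(path, text, encoding):
    """Write text to path, but refuse to save if any data would be lost.

    Hypothetical helper: raises ValueError naming the offending
    characters so the user can pick another coding system.
    """
    bad = unencodable(text, encoding)
    if bad:
        raise ValueError(
            "refusing to save: %s cannot represent %s"
            % (encoding, ", ".join(repr(c) for c in bad))
        )
    # newline="" writes the text as-is, without newline translation.
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


if __name__ == "__main__":
    try:
        safe_save("test-latin2.txt", "ęóąłź", "iso-8859-1")
    except ValueError as err:
        print(err)
```

The point of the design is that the warning comes before anything is
written, so the problem is visible at save time and not only when the
file is reopened.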
--J.