Re: Build Reports

Monday, 28 April 2003

        ...
>>>> "Stephen" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
>>>> "Jan" == Jan Rychter <jan(a)rychter.com> writes:

 Jan> It did this to me once, when I finished editing a huge HTML file
 Jan> containing ISO-8859-2 characters, saved it and logged off. Much to
 Jan> my surprise, all ISO-8859-2 characters were replaced by
 Jan> tildes. Several hours of work gone.

 Stephen> This still happens (in GNU Emacs as well) if you force the
 Stephen> wrong coding system.  It's much less likely to happen
 Stephen> inadvertently now, but still possible.

 Stephen> Unfortunately, the intelligent strategy (default to Unicode
 Stephen> for any multilingual document) was howled down by
 Stephen> byte-pinching Europeans and cultural nationalists (primarily
 Stephen> Japanese), and so today we have the embarrassing situation
 Stephen> where there is no released Emacs (neither GNU nor XEmacs) that
 Stephen> speaks Unicode natively.

 Jan> Now, surely there is an explanation -- I think I figured out why
 Jan> this happens, but I do not remember the exact reasons.

 Stephen> If you force buffer-file-coding-system to ISO-8859-1 and
 Stephen> insert Latin 2 characters into the buffer by hand, you will
 Stephen> get this result.  Technically speaking, Mule isn't doing any
 Stephen> conversions.  It's dropping characters that can't be
 Stephen> represented in the coding system you (perhaps implicitly)
 Stephen> requested.
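
The effect described above can be reproduced outside Emacs. What follows is a minimal Python sketch, not Mule's actual code: the "tilde" error handler and the sample string are assumptions made for illustration. It encodes five Latin 2 letters as ISO-8859-1, substituting a tilde for every character the coding system cannot represent, and then encodes the same text as ISO-8859-2 for comparison.

```python
import codecs


def tilde_replace(err):
    """Substitute '~' for each character the target codec cannot encode."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # The encoder may report a run of several bad characters at once,
    # so emit one tilde per character in the reported range.
    return ("~" * (err.end - err.start), err.end)


# "tilde" is a hypothetical handler name chosen for this sketch.
codecs.register_error("tilde", tilde_replace)

text = "ęóąłź"  # five Latin 2 letters; only ó also exists in Latin 1

# Lossy: everything ISO-8859-1 cannot represent is silently replaced.
latin1_bytes = text.encode("iso-8859-1", errors="tilde")

# Lossless: ISO-8859-2 represents all five letters.
latin2_bytes = text.encode("iso-8859-2")

print(latin1_bytes.decode("iso-8859-1"))         # ~ó~~~
print(latin2_bytes.decode("iso-8859-2") == text)  # True
```

Note that nothing is converted in the lossy case: the letter that ISO-8859-1 can represent survives, and the rest are simply gone.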

 Jan> My XEmacs doesn't seem to do this now: it sometimes converts
 Jan> non-7bit characters

 Stephen> I believe this is because XEmacs now defaults to preferring a
 Stephen> coding system (iso-2022-7) that can encode all characters.
 Stephen> See?  You don't like the safe setup very much, do you?  You
 Stephen> want XEmacs to do what you want, even though the attempt to do
 Stephen> so is guaranteed to cause problems eventually.  This is the
 Stephen> problem we've always faced.  People would like it to be safe,
 Stephen> but even if they express this desire IN IMPERATIVE STYLE, what
 Stephen> they really insist on is having their minds read.  

Actually, I do like the safe setup -- and no, I do not expect XEmacs to
read my mind, at least not before version 30.0 or so. But I do want it
to scream bloody murder when it is about to lose my data. I certainly do
not expect it to lose data in a default setup, and that is exactly what
it does.

But I have just checked: I started "xemacs -vanilla" (that's
21.5-b12), opened a file, entered several ISO-8859-2 characters, and
wrote the file to disk. What I get in the file are tildes, with no
warning from XEmacs whatsoever.

Please notice that this is what people in 8859-2 countries get when they
try XEmacs.

To be exact:
  Recent keystrokes:

  C-x C-f t e s t - l a t i n 2 . t x t RET M-x s e t 
  - i n p TAB RET l a t TAB 2 TAB p o TAB RET e , o ' 
  a , l / z ' RET C-x C-s C-h l

I therefore beg to differ with your assessment of the problem being
related to "forcing" anything. Perhaps the problem lies in the 8859-1
coding being "forced" quietly, which I would consider a bug.

Overall, the XEmacs experience is like walking through a minefield --
get file-coding-system-alist right for all names of files you'll be
working with, or get your files quietly mashed to pieces behind your
back. Very frustrating, especially for new users.

[...]
 Jan> into the MULE coding,

 Stephen> I hope not; you should never ever see true Mule coding in a
 Stephen> file.  What you probably mean is iso-2022-7, which is trivial
 Stephen> to convert: read it into a buffer with C-x C-f, then save it
 Stephen> with C-u C-x C-w FILENAME RET TARGET-CODING RET.  Of course if
 Stephen> you choose an encoding that can't represent all the characters
 Stephen> in the buffer, you'll lose data.

I'm sorry, you are of course correct on all points in this
paragraph. Except one: this procedure used to be broken (I reported
it on Jun 30 2002 in a message titled "set-buffer-file-coding-system
doesn't") and it is still broken. I have just checked:

  -- open a file containing 2022-7 with xemacs -vanilla
  -- C-u C-x C-w new-filename.txt RET iso-8859-2 RET

new-filename.txt still contains 2022-7. Or am I doing something wrong?
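
One rough way to confirm what actually landed on disk, without reopening the file in XEmacs, is to look at the raw bytes. The sketch below is a heuristic under stated assumptions, not a real coding-system detector: it relies only on the fact that ISO 2022 7-bit encodings switch character sets with escape sequences (byte 0x1B) and use no bytes above 0x7F, whereas an ISO-8859-2 file containing accented letters must contain such bytes. The function name classify is hypothetical, and Python's iso2022_jp codec is used purely as a stand-in for an escape-based 7-bit encoding.

```python
ESC = 0x1B  # ISO 2022 escape sequences start with this byte


def classify(data: bytes) -> str:
    """Very rough guess at what kind of encoding a byte string uses."""
    has_escape = ESC in data
    has_high = any(b >= 0x80 for b in data)
    if has_escape and not has_high:
        return "7-bit with escape sequences (ISO 2022 style)"
    if has_high:
        return "8-bit (e.g. an ISO-8859-x encoding)"
    return "plain ASCII"


# Stand-in sample for an ISO 2022 7-bit stream.
seven_bit = "日本語".encode("iso2022_jp")
# What a successful conversion to ISO-8859-2 should look like.
eight_bit = "ęóąłź".encode("iso-8859-2")

print(classify(seven_bit))
print(classify(eight_bit))
```

To apply this to the file from the recipe above, one would read it in binary mode, e.g. classify(open("new-filename.txt", "rb").read()).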

 Jan> If XEmacs still does this (loses data by doing a one-way
 Jan> conversion) for ANY REASON whatsoever, it has to be changed.

 Stephen> Unfortunately, it cannot be, at least not yet.
[...]

Can we at least make XEmacs less shy about reporting that it's about to
lose data? The main problem that I have with the behavior described
above is that I have no way of knowing that something is wrong until I
reopen the file.
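
To make the request concrete, here is a minimal Python sketch of the kind of check meant, under the assumption that the editor knows both the buffer text and the target coding system at save time. The names unencodable and safe_write are hypothetical, and this says nothing about how XEmacs would implement it; the point is only that the check is cheap and can run before a single byte is written.

```python
def unencodable(text: str, encoding: str) -> list:
    """Return (offset, character) pairs that `encoding` cannot represent."""
    bad = []
    for offset, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append((offset, ch))
    return bad


def safe_write(path: str, text: str, encoding: str) -> None:
    """Write text to path, but refuse loudly if the encoding would lose data."""
    bad = unencodable(text, encoding)
    if bad:
        sample = ", ".join(f"{ch!r} at {offset}" for offset, ch in bad[:5])
        raise ValueError(
            f"{len(bad)} character(s) cannot be saved as {encoding}: {sample}"
        )
    # Only reached when every character is representable.
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


buffer_text = "ęóąłź"
print(unencodable(buffer_text, "iso-8859-1"))  # four of the five letters
print(unencodable(buffer_text, "iso-8859-2"))  # nothing would be lost
```

With this in place, saving the text above as ISO-8859-1 raises an error naming the offending characters instead of writing tildes.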

--J.
