Re: Build Reports

Monday, 28 April 2003

        ...
>>>> "Stephen" == Stephen J Turnbull <stephen(a)xemacs.org>:
>>>> "Jan" == Jan Rychter <jan(a)rychter.com>:

 Jan> Please notice that this is what people in 8859-2 countries get
 Jan> when they try XEmacs.

 Stephen> No, in fact they don't.  At least not if they have LANG set or
 Stephen> use latin-unity.  What is happening to you, I would guess, is
 Stephen> that you have iso-8859-1 as default coding system.  If you are
 Stephen> using both that and iso-8859-2, you _will_ eventually lose
 Stephen> data, even if you switch to vi or enable latin-unity.  But
 Stephen> iso-8859-1 is a very special case (for historical reasons).

LANG! That is enlightening -- I somehow never thought XEmacs would use
the LANG setting to enforce coding systems for files. In fact, I do have
LANG set to en_US on this machine. I was somehow convinced that XEmacs
tried to stay away from locales as much as possible.

But that could explain where my default coding system came from.

 Jan> I therefore beg to differ with your assessment of the problem
 Jan> being related to "forcing" anything. Perhaps the problem lies in
 Jan> the 8859-1 coding being "forced" quietly, which I would consider a
 Jan> bug.

 Stephen> Unfortunately, almost nobody in the LANG=*_*.ISO8859-1 locales
 Stephen> agrees with you.  And those in other locales feel equally
 Stephen> strongly about forcing defaults for theirs (with the exception
 Stephen> of the Cyrillics, but they want it forced, too, only to KOI8
 Stephen> instead of ISO 8859/5, the Mule default).

 Jan> Overall, the XEmacs experience is like walking through a minefield
 Jan> -- get file-coding-system-alist right for all names of files
 Jan> you'll be working with, or get your files quietly mashed to pieces
 Jan> behind your back.

 Stephen> Or use latin-unity.  (This doesn't apply to non-Latin users
 Stephen> yet, but they mostly don't have these problems anyway.)

Does this solve all cases? I mean, are you sure that this will trap all
cases of data loss?

 Jan> paragraph. Except one: this procedure used to be broken (I've
 Jan> reported it on Jun 30 2002 in a message titled
 Jan> "set-buffer-file-coding-system doesn't") and it is still broken,
 Jan> having just checked:

 Stephen> XEmacs 21.5?  Quite possibly broken then and still broken now;
 Stephen> Ben has been screwing with the coding stuff, and I only
 Stephen> shifted to daily use of 21.5 when I passed 21.4 on to Vin.
 Stephen> Let me check....  Nope, not broken for me.  If I type in a few
 Stephen> characters of Latin 2, save as ISO 7-bit, then read the file
 Stephen> and save as ISO 8859/2, that's what I get in the file.
 Stephen> 21.5.11, CVS 2003-04-22 (ie, just before the beta release of
 Stephen> 21.5.12).  M-x set-buffer-file-coding-system also works as I
 Stephen> expect it to on my system.

 Stephen> Maybe you're referring to the fact that if you set the b-f-c-s
 Stephen> the buffer contents and display don't change?  But that is
 Stephen> correct behavior; the buffer and display representations are
 Stephen> independent of b-f-c-s.  b-f-c-s only affects the
 Stephen> representation in the output file.  If you want to change the
 Stephen> characters in the buffer, you can either use the low-level
 Stephen> APIs {en,de}code-coding-region or the more user-friendly UIs
 Stephen> in latin-unity.
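
The distinction drawn above, that the buffer contents are independent of
the coding system used when the file is written, can be illustrated with
a small sketch. This is Python, not XEmacs Lisp, and only an analogy: the
variable names and the sample word are made up for illustration, and
Python's codecs are not the Mule coding systems discussed in this thread.

```python
# In-memory text plays the role of the buffer contents.
text = "Łódź"

# Choosing an output coding only changes the bytes that would be written.
as_latin2 = text.encode("iso8859_2")   # one byte per character
as_utf8 = text.encode("utf-8")         # a different byte sequence

# The in-memory text is the same object before and after encoding.
unchanged = text == "Łódź"

# Reading the bytes back with the matching coding recovers the text.
round_trip = as_latin2.decode("iso8859_2")
```

Changing the characters themselves, rather than their on-disk
representation, is a separate operation, which is what the quoted
paragraph says about {en,de}code-coding-region.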

Here's *exactly* what I did (XEmacs 21.5-b12):
  -- open a file containing 2022-7 with xemacs -vanilla using C-x C-f
     filename.txt
  -- C-u C-x C-w new-filename.txt RET iso-8859-2 RET

new-filename.txt still contains ISO-2022-7 where the ISO-8859-2
characters should be. I'm not talking about on-screen representation,
I'm talking about the file contents as viewed with less or vi (to be
sure nothing messes with display).

Perhaps this functionality is also influenced by my LANG setting?

 Jan> new-filename.txt still contains 2022-7. Or am I doing something
 Jan> wrong?

 Stephen> If the old file contained any Latin 1 characters, then they
 Stephen> would continue to be coded using ISO 2022 escapes, at least
 Stephen> through XEmacs 21.4; this is the old Mule safety mechanism.

And that is exactly what I would expect! This is also exactly my point:
I would love it if XEmacs could encode characters which it thinks do not
make sense in the current environment with any reasonable coding. "Any
reasonable" means any coding that I can recover from. "Tilde coding"
does not belong to that family.

 Stephen> Only ASCII and Latin 2 characters would be encoded using ISO
 Stephen> 8859/2.  

Please notice that in my case, ISO-8859-2 characters do not come
back. All I have in the file is 2022-7. While it is possible that there
are some ISO-8859-1 characters (and I'd expect XEmacs not to touch
those, or leave them in 2022-7), most are ISO-8859-2, and those do not
come back.

[...]

Stephen, thanks for your explanations. They have been very
interesting. But one more question begs asking: what is the benefit of
having your characters reduced to tildes? I mean, what purpose does it
serve? I tried to think of one, but couldn't. Whatever I do, I'd *always*
prefer my characters to be coded in any way whatsoever, as long as it is
reversible. The worst that can happen is that somebody produces an e-mail
message containing characters illegal for the coding at hand.

So, while I understand your explanations about the complexity of the
issues involved, I still don't understand the opposition to just
changing or removing the evil piece of code that changes data to tildes.
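
To make the difference between a lossy and a recoverable fallback
concrete, here is a minimal sketch in Python (an analogy only, not XEmacs
code). The helper names are hypothetical, Python substitutes "?" where
the behaviour described in this thread produces tildes, and the
recoverable variant assumes the text contains no literal "&#...;"
sequences of its own.

```python
import html

def encode_lossy(text, coding):
    # Unencodable characters are replaced by "?"; the original is gone.
    return text.encode(coding, errors="replace")

def encode_recoverable(text, coding):
    # Unencodable characters become numeric references such as "&#321;".
    return text.encode(coding, errors="xmlcharrefreplace")

def decode_recoverable(data, coding):
    # Decode the bytes, then turn the numeric references back into characters.
    return html.unescape(data.decode(coding))

text = "Łódź"  # Latin 2 text forced through a Latin 1 coding
lossy = encode_lossy(text, "iso8859_1")
recoverable = encode_recoverable(text, "iso8859_1")
restored = decode_recoverable(recoverable, "iso8859_1")
```

In this sketch the lossy bytes cannot be mapped back to the original
word, while the recoverable bytes can, which is the property asked for
above.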

--J.
