>>>> "Stephen" == Stephen J Turnbull <stephen(a)xemacs.org>:
>>>> "Jan" == Jan Rychter <jan(a)rychter.com>:
Jan> Please notice that this is what people in 8859-2 countries get
Jan> when they try XEmacs.
Stephen> No, in fact they don't. At least not if they have LANG set or
Stephen> use latin-unity. What is happening to you, I would guess, is
Stephen> that you have iso-8859-1 as default coding system. If you are
Stephen> using both that and iso-8859-2, you _will_ eventually lose
Stephen> data, even if you switch to vi or enable latin-unity. But
Stephen> iso-8859-1 is a very special case (for historical reasons).
LANG! That is enlightening -- I somehow never thought XEmacs would use
the LANG setting to enforce coding systems for files. In fact, I do have
LANG set to en_US on this machine. I was somehow convinced that XEmacs
tried to stay away from locales as much as possible.
But that could explain where my default coding system came from.
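To make the point about LANG concrete, here is a minimal Python sketch that splits a POSIX-style locale name into its parts. The function name parse_locale is hypothetical, and this is not how XEmacs itself derives its default coding system; it only shows that a value such as en_US carries no codeset at all, so the application has to pick one.

```python
def parse_locale(lang):
    """Split a POSIX-style locale name into (language, territory, codeset).

    Missing parts come back as None. A modifier such as "@euro" is dropped.
    Hypothetical helper for illustration only, not XEmacs code.
    """
    name = lang.split("@", 1)[0]              # drop an optional @modifier
    name, _, codeset = name.partition(".")    # codeset follows the dot
    language, _, territory = name.partition("_")
    return (language or None, territory or None, codeset or None)


print(parse_locale("en_US"))            # no codeset named
print(parse_locale("pl_PL.ISO8859-2"))  # codeset named explicitly
```

With LANG=en_US the third element is None, which would be consistent with some default (here apparently iso-8859-1) being chosen on my behalf.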
Jan> I therefore beg to differ with your assessment of the problem
Jan> being related to "forcing" anything. Perhaps the problem lies in
Jan> the 8859-1 coding being "forced" quietly, which I would consider a
Jan> bug.
Stephen> Unfortunately, almost nobody in the LANG=*_*.ISO8859-1 locales
Stephen> agrees with you. And those in other locales feel equally
Stephen> strongly about forcing defaults for theirs (with the exception
Stephen> of the Cyrillics, but they want it forced, too, only to KOI8
Stephen> instead of ISO 8859/5, the Mule default).
Jan> Overall, the XEmacs experience is like walking through a minefield
Jan> -- get file-coding-system-alist right for all names of files
Jan> you'll be working with, or get your files quietly mashed to pieces
Jan> behind your back.
Stephen> Or use latin-unity. (This doesn't apply to non-Latin users
Stephen> yet, but they mostly don't have these problems anyway.)
Does this solve all cases? I mean, are you sure that this will trap all
cases of data loss?
Jan> paragraph. Except one: this procedure used to be broken (I've
Jan> reported it on Jun 30 2002 in a message titled
Jan> "set-buffer-file-coding-system doesn't") and it is still broken,
Jan> having just checked:
Stephen> XEmacs 21.5? Quite possibly broken then and still broken now;
Stephen> Ben has been screwing with the coding stuff, and I only
Stephen> shifted to daily use of 21.5 when I passed 21.4 on to Vin.
Stephen> Let me check.... Nope, not broken for me. If I type in a few
Stephen> characters of Latin 2, save as ISO 7-bit, then read the file
Stephen> and save as ISO 8859/2, that's what I get in the file.
Stephen> 21.5.11, CVS 2003-04-22 (ie, just before the beta release of
Stephen> 21.5.12). M-x set-buffer-file-coding-system also works as I
Stephen> expect it to on my system.
Stephen> Maybe you're referring to the fact that if you set the b-f-c-s
Stephen> the buffer contents and display don't change? But that is
Stephen> correct behavior; the buffer and display representations are
Stephen> independent of b-f-c-s. b-f-c-s only affects the
Stephen> representation in the output file. If you want to change the
Stephen> characters in the buffer, you can either use the low-level
Stephen> APIs {en,de}code-coding-region or the more user-friendly UIs
Stephen> in latin-unity.
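The distinction between the buffer representation and the file representation can be sketched outside XEmacs. The Python snippet below is only an analogy: the sample text is made up, and cp1250 stands in for "some other coding system" because Python has no codec for the 7-bit ISO 2022 variant discussed here. The in-memory string plays the role of the buffer; the choice of codec plays the role of b-f-c-s.

```python
# Sample Latin 2 text (an assumption for illustration, not from the thread).
text = "zażółć gęślą jaźń"

# Two different "file coding systems" applied to the same in-memory text.
as_latin2 = text.encode("iso8859_2")
as_cp1250 = text.encode("cp1250")   # stand-in for a second coding system

# The string itself never changes; only the bytes written out differ.
print(as_latin2 != as_cp1250)
print(as_latin2.decode("iso8859_2") == text)
print(as_cp1250.decode("cp1250") == text)
```

Changing the codec changes the bytes on disk and nothing else, which matches the description of b-f-c-s above.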
Here's *exactly* what I did (XEmacs 21.5-b12):
-- open a file containing 2022-7 with xemacs -vanilla using C-x C-f
filename.txt
-- C-u C-x C-w new-filename.txt RET iso-8859-2 RET
new-filename.txt still contains ISO-2022-7 where the ISO-8859-2
characters should be. I'm not talking about on-screen representation,
I'm talking about the file contents as viewed with less or vi (to be
sure nothing messes with display).
Perhaps this functionality is also influenced by my LANG setting?
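For anyone who wants to repeat the check without relying on less or vi, a rough Python sketch follows. It counts ESC bytes and bytes with the high bit set. The helper name summarize_bytes and the sample byte strings are made up for illustration. The assumption is that a file really saved as ISO-8859-2 has high-bit bytes and no ESC, while a 7-bit ISO 2022 file has ESC bytes and no high-bit bytes.

```python
ESC = 0x1B  # the escape byte that introduces ISO 2022 designations


def summarize_bytes(data):
    """Count ESC bytes and bytes with the high bit set in a bytes object.

    Hypothetical helper for illustration only.
    """
    return {
        "esc": sum(1 for b in data if b == ESC),
        "high_bit": sum(1 for b in data if b >= 0x80),
    }


# Artificial samples, not taken from the actual files in question.
print(summarize_bytes("ł".encode("iso8859_2")))  # one 8-bit byte, no ESC
print(summarize_bytes(b"a\x1bb"))                # one ESC, nothing 8-bit
```

To run it on a real file, read the file in binary mode (open(path, "rb").read()) and pass the result in.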
Jan> new-filename.txt still contains 2022-7. Or am I doing something
Jan> wrong?
Stephen> If the old file contained any Latin 1 characters, then they
Stephen> would continue to be coded using ISO 2022 escapes, at least
Stephen> through XEmacs 21.4; this is the old Mule safety mechanism.
And that is exactly what I would expect! This is also exactly my point:
I would love it if XEmacs could encode characters which it thinks do not
make sense in the current environment with any reasonable coding. "Any
reasonable" means any coding that I can recover from. "Tilde coding"
does not belong to that family.
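To show what I mean by "recoverable", here is a small Python sketch. It is not XEmacs code, and the sample word is made up. Python substitutes "?" where I am seeing tildes, but the effect is the same: one error policy throws the character away, the other writes an escape that can be turned back into the original.

```python
# Latin 2 sample (an assumption for illustration); "ł" and "ź" have no
# ISO-8859-1 code point, while "ó" and "d" do.
text = "łódź"

# Lossy: unencodable characters collapse into a placeholder.
lossy = text.encode("iso8859_1", errors="replace")

# Reversible: unencodable characters become \uXXXX escapes.
reversible = text.encode("iso8859_1", errors="backslashreplace")

print(lossy)
print(reversible)

# The escaped form can be decoded back to the original text (as long as
# the text contained no literal backslashes to begin with).
print(reversible.decode("unicode_escape") == text)
```

The second form is ugly, but nothing is lost, and that is all I am asking for.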
Stephen> Only ASCII and Latin 2 characters would be encoded using ISO
Stephen> 8859/2.
Please notice that in my case, ISO-8859-2 characters do not come
back. All I have in the file is 2022-7. While it is possible that there
are some ISO-8859-1 characters (and I'd expect XEmacs not to touch
those, or leave them in 2022-7), most are ISO-8859-2, and those do not
come back.
[...]
Stephen, thanks for your explanations. They have been very
interesting. But one more question begs asking: what is the benefit of
having your characters reduced to tildes? I mean, what purpose does it
serve? I tried to think of one, but couldn't. Whatever I do, I'd *always*
prefer my characters to be coded in any way whatsoever, as long as it is
reversible. The worst thing that will happen is that somebody will produce
an E-mail message containing characters illegal for the coding at hand.
So, while I understand your explanations about the complexity of the
issues involved, I still don't understand the opposition to just
changing or removing the evil piece of code that changes data to tildes.
--J.