>>>> "Jan" == Jan Rychter <jan(a)rychter.com> writes:
Jan> Please notice that this is what people in 8859-2 countries
Jan> get when they try XEmacs.
No, in fact they don't, at least not if they have LANG set or use
latin-unity. What is happening to you, I would guess, is that you
have iso-8859-1 as your default coding system. If you use both that
and iso-8859-2, you _will_ eventually lose data, even if you switch to
vi or enable latin-unity. But iso-8859-1 is a very special case (for
historical reasons).
Jan> I therefore beg to differ with your assessment of the problem
Jan> being related to "forcing" anything. Perhaps the problem lies
Jan> in the 8859-1 coding being "forced" quietly, which I would
Jan> consider a bug.
Unfortunately, almost nobody in the LANG=*_*.ISO8859-1 locales agrees
with you. And those in other locales feel equally strongly about
forcing the defaults for theirs. (Cyrillic users are the exception,
but they want a default forced, too, just KOI8 instead of ISO 8859/5,
the Mule default.)
Jan> Overall, the XEmacs experience is like walking through a
Jan> minefield -- get file-coding-system-alist right for all names
Jan> of files you'll be working with, or get your files quietly
Jan> mashed to pieces behind your back.
Or use latin-unity. (This doesn't apply to non-Latin users yet, but
they mostly don't have these problems anyway.)
Jan> paragraph. Except one: this procedure used to be broken (I've
Jan> reported it on Jun 30 2002 in a message titled
Jan> "set-buffer-file-coding-system doesn't") and it is still
Jan> broken, having just checked:
XEmacs 21.5? Quite possibly broken then and still broken now; Ben has
been screwing with the coding stuff, and I only shifted to daily use
of 21.5 when I passed 21.4 on to Vin. Let me check.... Nope, not
broken for me. If I type in a few characters of Latin 2, save as ISO
7-bit, then read the file back and save as ISO 8859/2, ISO 8859/2 is
what I get in the file. (This is 21.5.11, CVS 2003-04-22, i.e., just
before the beta release of 21.5.12.) M-x set-buffer-file-coding-system
also works as I expect it to on my system.

Maybe you're referring to the fact that if you set
buffer-file-coding-system (b-f-c-s) the buffer contents and display
don't change? But that is correct behavior; the buffer and display
representations are independent of b-f-c-s, which only affects the
representation in the output file. If you want to change the
characters in the buffer, you can use either the low-level APIs
{en,de}code-coding-region or the more user-friendly UIs in
latin-unity.
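
A minimal sketch of that distinction, assuming a Mule-enabled XEmacs
and a buffer containing only ASCII and Latin 2 text. The comments
restate the behavior described above; they are not output from a real
session, and the round trip in the last two forms is only expected to
be lossless for text that iso-8859-2 can represent.

```elisp
;; Sketch only; evaluate in a buffer whose text is ASCII plus Latin 2.

;; Changes how the buffer will be encoded when it is next written to
;; disk.  The buffer text and the display are not touched.
(set-buffer-file-coding-system 'iso-8859-2)

;; These two, by contrast, rewrite the buffer text itself:
;; first characters -> octets, then octets -> characters again.
(encode-coding-region (point-min) (point-max) 'iso-8859-2)
(decode-coding-region (point-min) (point-max) 'iso-8859-2)
```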
Jan> new-filename.txt still contains 2022-7. Or am I doing
Jan> something wrong?
If the old file contained any Latin 1 characters, then they would
continue to be coded using ISO 2022 escapes, at least through XEmacs
21.4; this is the old Mule safety mechanism. Only ASCII and Latin 2
characters would be encoded using ISO 8859/2. I believe this behavior
is compatible with the definition of ISO 8859 (since ISO 8859 is a
version of ISO 2022), but it's not what users expect. This doesn't
come into play in your data-loss case because of historical lossage:
iso-8859-1 and binary are considered identical. (I.e., in the
implementation XEmacs does not consider iso-8859-1 to be an ISO 8-bit
coding system. I'm not sure if GNU Emacs has fixed this silliness,
but we're still stuck with it.) Note that if instead of the POSIX or
.iso8859-1 locale you use a .iso8859-15 locale, you'll get the "safe
but confusing" behavior for Western European text with a few Central
European characters.

Obviously we don't want to change this until latin-unity is both
considered reliable (I have yet to hear complaints of data loss or
inconvenience even from latin-unity users, so that's probably OK) and
has been ported to 21.5. But at that point we will remove the ISO
2022 escape stuff from the ISO 8859 coding systems. (Experimentally;
if there's demand for GNU compatibility, we will implement the
bogus-coding-with-esc coding systems. But my guess is that nobody
except Japanese Mule programmers ever found those useful.)
Jan> Can we at least make XEmacs less shy about reporting that
Jan> it's about to lose data?
In 21.4, put (latin-unity-install) in your init file. This will not
only query before writing to disk if there's incompatible data in the
buffer, but also recommend a compatible ISO 8859 coding system (if
available) as well as universal coding systems like utf-8 or
iso-2022-7.

In 21.5, I don't know offhand. In my init file, I currently have
(unless (emacs-version>= 21.5) (latin-unity-install)), but I don't
recall why. My intention is to integrate the code into 21.5. The
problem is that although I'm reasonably happy with the UI (AFAICT it
gives you exactly the info you want, mostly only when you really want
it, and is configurable enough to default sensibly for your personal
use cases), the API still sucks. So I'm not happy about mandating it
for core yet.
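
For reference, a version-conditional install along those lines could
be spelled out as below. This is a sketch, not the actual init file.
It assumes the standard emacs-major-version and emacs-minor-version
variables, and the fboundp guard is an addition that assumes the
latin-unity package's autoloads define latin-unity-install, so that
the init file still loads when the package is missing.

```elisp
;; Sketch only: enable latin-unity on anything older than 21.5.
(unless (or (> emacs-major-version 21)
            (and (= emacs-major-version 21)
                 (>= emacs-minor-version 5)))
  ;; Skip quietly if the latin-unity package is not available.
  (when (fboundp 'latin-unity-install)
    (latin-unity-install)))
```

On 21.4 this reduces to the plain (latin-unity-install) call
recommended above.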

I guess a reasonable first step is to fix whatever caused me to
disable it in my init file for 21.5, then release a new latin-unity
package compatible with 21.5. I'll shoot for next weekend, and if I
don't make it, I'll follow up to let you know why.
--
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.