Re: Build Reports

Monday, 28 April 2003

        ...
>>>> "Jan" == Jan Rychter <jan(a)rychter.com> writes:
    Jan> It did this to me once, when I finished editing a huge HTML
    Jan> file containing ISO-8859-2 characters, saved it and logged
    Jan> off. Much to my surprise, all ISO-8859-2 characters were
    Jan> replaced by tildes. Several hours of work gone.

This still happens (in GNU Emacs as well) if you force the wrong
coding system.  It's much less likely to happen inadvertently now, but
still possible.

Unfortunately, the intelligent strategy (default to Unicode for any
multilingual document) was howled down by byte-pinching Europeans and
cultural nationalists (primarily Japanese), and so today we have the
embarrassing situation where there is no released Emacs (neither GNU
nor XEmacs) that speaks Unicode natively.

    Jan> Now, surely there is an explanation -- I think I figured out
    Jan> why this happens, but I do not remember the exact reasons.

If you force buffer-file-coding-system to ISO-8859-1 and insert Latin
2 characters into the buffer by hand, you will get this result.
Technically speaking, Mule isn't doing any conversions.  It's dropping
characters that can't be represented in the coding system you (perhaps
implicitly) requested.
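
The same kind of loss can be reproduced outside Emacs.  The following is
a minimal Python sketch, not Mule's actual code path: the helper name
save_and_reload is hypothetical, and Python substitutes "?" where the
report above saw tildes.  It only illustrates that forcing a coding
system which cannot represent a character discards that character,
while a coding system that covers the text round-trips cleanly.

```python
def save_and_reload(text, coding):
    """Encode text in the given coding system, then decode it again.

    Hypothetical helper imitating a save followed by a reload.
    Characters the coding system cannot represent are replaced
    (Python uses "?"), so the loss is one-way.
    """
    return text.encode(coding, errors="replace").decode(coding)


latin2_text = "zażółć"  # Polish letters covered by ISO-8859-2

# ISO-8859-2 covers every character here: nothing is lost.
kept = save_and_reload(latin2_text, "iso-8859-2")

# ISO-8859-1 covers only the o-acute; the other four non-ASCII
# letters are replaced and cannot be recovered.
lost = save_and_reload(latin2_text, "iso-8859-1")

print(kept)  # zażółć
print(lost)  # za?ó??
```

Once the replacement has been written to disk, nothing in the file
records which characters used to be there.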

    Jan> My XEmacs doesn't seem to do this now: it sometimes converts
    Jan> non-7bit characters

I believe this is because XEmacs now defaults to preferring a coding
system (iso-2022-7) that can encode all characters.  See?  You don't
like the safe setup very much, do you?  You want XEmacs to do what you
want, even though the attempt to do so is guaranteed to cause problems
eventually.  This is the problem we've always faced.  People would
like it to be safe, but even if they express this desire IN IMPERATIVE
STYLE, what they really insist on is having their minds read.
Otherwise, they'd simply use Unicode.  (Granted, Unicode alone will
lose information for a few multilingual Asian users.  However, that is
not a problem for European languages.)

    Jan> into the MULE coding,

I hope not; you should never ever see true Mule coding in a file.
What you probably mean is iso-2022-7, which is trivial to convert:
read it into a buffer with C-x C-f, then save it with C-u C-x C-w
FILENAME RET TARGET-CODING RET.  Of course if you choose an encoding
that can't represent all the characters in the buffer, you'll lose
data.

    Jan> If XEmacs still does this (loses data by doing a one-way
    Jan> conversion) for ANY REASON whatsoever, it has to be changed.

Unfortunately, it cannot be, at least not yet.

For ISO 8859 Latin users, things are made much safer (not to
mention convenient, in the sense of coalescing files that Emacs
normally needs to save as ISO 2022 or Unicode into a single ISO 8859
set where possible) by use of the latin-unity package.  However, it's
still not perfect.

If you really want safety, the answer is what it has always been:
convert your working files to Unicode.  (Or ISO 2022, but that's silly
if Unicode is usable.)  If they need to be distributed in some other
encoding, use XEmacs or iconv to convert the distribution versions.
Use Mule-UCS with existing Emacsen or the 21.5 branch of XEmacs or the
emacs-unicode branch of GNU Emacs.
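
As an illustration of that workflow (Unicode working copy, other
encoding only for distribution), here is a minimal Python sketch.  It
stands in for the XEmacs or iconv step and is not taken from either;
the function names unrepresentable and convert_for_distribution are
hypothetical.  The point is that the conversion refuses to proceed
when the target encoding cannot hold every character, instead of
silently dropping data.

```python
def unrepresentable(text, coding):
    """Return the distinct characters of text that coding cannot encode."""
    bad = []
    for ch in text:
        try:
            ch.encode(coding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def convert_for_distribution(utf8_bytes, target):
    """Convert a UTF-8 working copy to the target encoding.

    Raises ValueError rather than losing characters the target
    encoding cannot represent.
    """
    text = utf8_bytes.decode("utf-8")
    bad = unrepresentable(text, target)
    if bad:
        raise ValueError(
            "cannot encode %r in %s" % ("".join(bad), target)
        )
    return text.encode(target)


working_copy = "5 €".encode("utf-8")

# ISO-8859-15 has a euro sign, so this conversion succeeds.
dist = convert_for_distribution(working_copy, "iso-8859-15")

# ISO-8859-1 has no euro sign; the check reports it up front.
missing = unrepresentable("5 €", "iso-8859-1")
```

The working copy is never modified, so a failed or unsuitable
conversion costs nothing: pick another target encoding and convert
again.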

Or you can use no-mule XEmacs and live with the
one-charset-per-instance restriction.  That's the only safe way to go;
unfortunately that one charset can't be Unicode in a no-mule XEmacs.

    Jan> So, perhaps it is worth investigating why Adrian's XEmacs ate
    Jan> the euro.

Gnus did it.  ;-)  Seriously, Gnus is far too smart for its own good
about these things in some ways, and not smart enough in others.  I'm
willing to work to improve XEmacs's safety (viz latin-unity), but I'm
not willing to worry about any problem that involves a hairball like
Gnus.  If somebody wants to trace through the Gnus code until they've
made it back to XEmacs, I'll take a look at it then.  But it's a
better use of my time to just work directly on latin-unity and similar
code.

-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
