[ Added Cc to xemacs-mule ]
Jan Vroonhof <vroonhof(a)math.ethz.ch> writes:
> However I still think there is a problem with file coding
> there. This code presumably worked in latin-1 and Japanese
> environments. Given that he also has a problem writing the file out
> again I think this illustrates a general problem. IMHO, all coding
> systems should be such that when an arbitrary binary file is read in
> with them and written back out the things should be the same.

Even I admit this is impossible. I say "even I" because I've had
many requirements along similar lines, but this one really can't be
met in the current framework.
A little background: when we "read in" a file, we always convert it to
the internal representation. That's what the coding systems are for.
All the other manipulations are performed with the internal data.
Now, when we "write out" the file, we convert it to an external
representation, which may or may not be the same as the one we
read the file in with.
Even when reading and writing representations are the same, you can
lose data. As Stephen J. Turnbull illustrated: imagine that we are
reading a binary file with ISO-2022 and that it contains binary
sequences
<Ltn2><Ltn1><Ltn2>some-chars...
The switch from Latin 2 to Latin 1 and back to Latin 2 will be lost on
the input because it will have been "optimized away". And these
switching sequences are by no means rare. :-(
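
To make the point concrete, here is a minimal sketch in Python of such
a lossy round trip. It is a toy model, not Mule's actual code: the
internal representation as (charset, byte) pairs, the function names
decode and encode, the three-byte designation sequences, and the
Latin 2 default are all assumptions made for the illustration.

```python
# Toy model of an ISO-2022-style coding system (NOT Mule's implementation).
# The designation sequences below are stand-ins chosen for this sketch.
DESIGNATE = {b"\x1b-A": "latin-1", b"\x1b-B": "latin-2"}
ESCAPE_FOR = {name: seq for seq, name in DESIGNATE.items()}


def decode(data: bytes, default: str = "latin-2"):
    """Read external bytes into a toy internal form: (charset, byte) pairs."""
    current = default
    internal = []
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in DESIGNATE:
            # A switch only updates decoder state; nothing is stored for it.
            current = DESIGNATE[seq]
            i += 3
        else:
            internal.append((current, data[i]))
            i += 1
    return internal


def encode(internal, default: str = "latin-2") -> bytes:
    """Write the internal form back out, switching only when necessary."""
    current = default
    out = bytearray()
    for charset, byte in internal:
        if charset != current:
            out += ESCAPE_FOR[charset]
            current = charset
        out.append(byte)
    return bytes(out)


# <Ltn2><Ltn1><Ltn2>some-chars...
original = b"\x1b-B\x1b-A\x1b-Bsome-chars"
round_tripped = encode(decode(original))

print(original)
print(round_tripped)
print(round_tripped == original)
```

The internal form only records which charset each character ended up
in, so the three switches leave no trace and the writer has nothing to
reproduce them from. In this sketch the nine bytes of switching
sequences are simply gone from the output.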
If you consider that all of our Latin N coding systems are in fact
ISO-2022 with the appropriate default for the 160-255 range, you see
why Mule is in trouble.