Re: Mule bugs: misidentification (Latin-1 vs. Chinese), revert issues

Monday, 23 October 2006

 On the twenty-third day of October, Michael Sperber wrote:

...
 I open a file with UTF-8 coding-system.  I touch that file outside of
 XEmacs and then do M-x revert-buffer RET.  The non-ASCII characters
 get mangled.  From the looks of it, upon re-reading, the file is
 treated as Latin-1 (i.e. multibyte UTF-8 encodings get turned into the
 characters represented by their component bytes), and the result is
 then translated to UTF-8.  (The modeline still says "UTF-8", though.)
 For example, "Anfänger" gets turned into "AnfÃ¤nger" upon revert.  (I
 hope Gnus hasn't screwed this up on send.)

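A minimal C sketch of the mangling described above, assuming the failure is exactly as reported: the UTF-8 bytes on disk are decoded as Latin-1 (one byte, one character) and the resulting characters are then re-encoded as UTF-8. It only reproduces the symptom, not XEmacs's actual revert code, and `latin1_to_utf8` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat every input byte as a Latin-1 (ISO 8859-1)
   character, i.e. code point == byte value, and write that code point out
   as UTF-8.  Run on bytes that are already UTF-8, this "double encodes"
   them, which is the corruption seen after revert-buffer.
   Each Latin-1 byte needs at most two UTF-8 bytes, plus one for the NUL. */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    assert(out_cap >= 2 * in_len + 1);
    size_t o = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                                   /* ASCII is unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));    /* lead byte 110000xx */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));  /* continuation 10xxxxxx */
        }
    }
    out[o] = '\0';
    return o;
}
```

Feeding it the nine UTF-8 bytes of "Anfänger" yields the eleven bytes of "AnfÃ¤nger": the pair 0xC3 0xA4 (ä) becomes 0xC3 0x83 0xC2 0xA4 (Ã followed by ¤).
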
There’s a comment from Ben in the sources about this problem: 

  /* The replace-mode code is currently implemented by comparing the
     file on disk with the contents in the buffer, character by character.
     That works only if the characters on disk are exactly what will go into
     the buffer -- i.e. `binary' conversion.

     FSF tries to implement this in all situations, even the non-binary
     conversion, by (in that case) loading the whole converted file into a
     separate memory area, then doing the comparison.  I really don't see
     the point of this, and it will fail spectacularly if the file is many
     megabytes in size.  To try to get around this, we could certainly read
     from the beginning and decode as necessary before comparing, but doing
     the same at the end gets very difficult because of the possibility of
     modal coding systems -- trying to decode data from any point forward
     without decoding previous data might always give you different results
     from starting at the beginning.  We could try further tricks like
     keeping track of which coding systems are non-modal and providing some
     extra method for such coding systems to be given a chunk of data that
     came from a specified location in a specified file and ask the coding
     systems to return a "sync point" from which the data can be read
     forward and have results guaranteed to be the same as reading from the
     beginning to that point, but I really don't think it's worth it.  If
     we implemented the FSF "brute-force" method, we would have to put a
     reasonable maximum file size on the files.  Is any of this worth it?
     --ben

     */
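
A rough C sketch of the FSF-style "brute-force" approach the comment describes: decode the whole file first, then compare decoded characters (not raw bytes) against the buffer to find the common prefix and suffix, so only the region in between has to be replaced. It assumes the file is UTF-8 and the buffer text is available as an array of code points; `utf8_decode`, `find_changed_region` and `diff_region` are hypothetical names, not XEmacs or FSF Emacs functions, and the decoder does no real validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode UTF-8 bytes into code points.  Sketch only: overlong forms,
   surrogates and lead bytes above 0xF4 are not rejected; stray or
   truncated sequences become U+FFFD.  `out` needs room for `len`
   elements, since every code point takes at least one byte. */
size_t utf8_decode(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char c = in[i++];
        uint32_t cp;
        int extra;
        if (c < 0x80)      { cp = c;        extra = 0; }
        else if (c < 0xC0) { out[n++] = 0xFFFD; continue; } /* stray continuation byte */
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else               { cp = c & 0x07; extra = 3; }
        while (extra > 0 && i < len && (in[i] & 0xC0) == 0x80) {
            cp = (cp << 6) | (uint32_t)(in[i++] & 0x3F);
            extra--;
        }
        out[n++] = extra ? 0xFFFD : cp;  /* truncated sequence -> U+FFFD */
    }
    return n;
}

/* The part of the text that differs: [start, buf_end) in the buffer
   must be replaced by [start, file_end) of the decoded file. */
typedef struct {
    size_t start;
    size_t buf_end;
    size_t file_end;
} diff_region;

diff_region find_changed_region(const uint32_t *buf, size_t nbuf,
                                const uint32_t *file, size_t nfile)
{
    size_t shorter = nbuf < nfile ? nbuf : nfile;
    size_t pre = 0;
    while (pre < shorter && buf[pre] == file[pre])
        pre++;
    /* Scan from the end, never overlapping the common prefix. */
    size_t suf = 0;
    while (suf < shorter - pre && buf[nbuf - 1 - suf] == file[nfile - 1 - suf])
        suf++;
    diff_region r = { pre, nbuf - suf, nfile - suf };
    return r;
}
```

With the buffer holding "Anfänger" and the file now containing "Anfänger!", the decoded comparison reports only the trailing "!" as changed. Comparing the buffer against the raw bytes instead diverges at the ä (U+00E4 versus byte 0xC3), which is why the byte-for-byte method only works for `binary` conversion. The decoded copy needs up to one 32-bit element per file byte, which is the memory cost behind the worry about multi-megabyte files.
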

Now, I should have said OF COURSE IT’S WORTH IT a couple of years ago and
done something about it. But as it is, it’s a clear bug.

-- 
Santa Maradona, pray for me!

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta
