Re: [Bug: 21.5-b25] Problems with latin-unity and VM

Wednesday, 7 February 2007

 On the seventh of February, Joachim Schrod wrote: 

...
 [...] But I still want to address that either I don't understand the
 precondition under which latin-unity operates or that precondition has a
 flaw.

 This is a buffer where the buffer-file-coding-system has been
 explicitly set to binary. I would have expected that latin-unity does
 NOT attempt to change the encoding at all for such files -- after all,
 they are declared as binary and the notion of Latin characters in
 binary files makes no sense. 
That’s how FSF Emacs behaves. We (XEmacs) don’t distinguish iso-8859-1 from
binary in your sense; they do. Neither version distinguishes ASCII from
binary.

...
 By definition, binary files consist only of octets and should be written
 out as-is, without any attempt to change any octets. 
But if the corresponding buffer contains characters that have no clear
mapping to octets with the binary coding system, writing that buffer to disk
will lose data.

control-1 characters have a clear mapping to octets with the binary coding
system. latin-unity not knowing about that mapping is the bug.

In the correct course of events, VM will use MIME-encoding to generate a
buffer where every character is a member of either ascii, control-1 or
latin-iso8859-1, then write the buffer using 'binary. latin-unity will then
look through the contents of the buffer, see that it can be encoded using
'binary, and all will be well.
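
The check described above can be sketched roughly as follows. This is not
latin-unity's code, just a Python illustration: Unicode code points stand in
for XEmacs characters, and the three code ranges are my assumption about how
the named character sets line up with octets. The names charset_of and
binary_is_safe are hypothetical.

```python
# Sketch only: Unicode code points stand in for XEmacs characters, and the
# ranges below are an assumed model of the three named character sets.
ASSUMED_RANGES = {
    "ascii": range(0x00, 0x80),
    "control-1": range(0x80, 0xA0),
    "latin-iso8859-1": range(0xA0, 0x100),
}


def charset_of(ch):
    """Return the assumed charset name for ch, or None if it has no octet."""
    code = ord(ch)
    for name, codes in ASSUMED_RANGES.items():
        if code in codes:
            return name
    return None


def binary_is_safe(text):
    """True if every character maps to exactly one octet."""
    return all(charset_of(ch) is not None for ch in text)


# A MIME-encoded header is pure ASCII, so writing it as binary loses nothing.
mime_encoded = "Subject: =?iso-8859-1?Q?Gr=FC=DFe?=\n"
print(binary_is_safe(mime_encoded))        # True
print(binary_is_safe("caf\u00e9 \u0085"))  # True: latin-iso8859-1 and control-1
print(binary_is_safe("\u5357"))            # False: no octet for a Han character
```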

...
 Actually, I can augment my problem description: VM may not be involved
 at all. 

     If I open the file, set the buffer-file-coding-system explicitly
     to binary, modify it, and want to save it, latin-unity detects a
     coding system conflict. But it should not do so for binary files.

 Therefore I would have thought that #'latin-unity-sanity-check checks
 for the special case of binary files and returns that encoding as
 safe. (Or, maybe that at least binary is added to
 'latin-unity-ucs-list, because one could classify it as a universal
 encoding  
No, one could not. Consider: how can you interpret a sequence of octets on
disk as U+5357, the Han character for ‘southwards,’ without abandoning the
treatment as ‘binary’--a sequence of octets--and checking instead for
ISO-2022-1 or UTF-8 sequences? If that character is in an XEmacs buffer, and
that buffer is to be written using the binary coding system, how can you
represent that character on disk without resorting to an alternative
representation?
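
To make that concrete, here is a small Python sketch. Python's latin-1 codec
is used as a stand-in for a one-octet-per-character "binary" write; that
substitution is my assumption, not how XEmacs implements it, and the helper
name to_octets_one_per_char is hypothetical.

```python
han = "\u5357"  # the Han character discussed above


def to_octets_one_per_char(text):
    """Encode with exactly one octet per character (latin-1 as a stand-in)."""
    return text.encode("latin-1")


# There is no single octet for U+5357, so the one-to-one write fails.
try:
    to_octets_one_per_char(han)
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False

# An alternative representation such as UTF-8 works, but it is no longer
# "binary": read back octet by octet, the result is three characters.
utf8 = han.encode("utf-8")
print(utf8.hex())                   # e58d97
print(len(utf8.decode("latin-1")))  # 3
```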

...
 -- all octets in binary files represent themselves.) 
All octets in binary files, when those files are read by XEmacs as binary,
are represented by characters in the ascii, control-1 or latin-iso8859-1
character sets. This was a choice we made--an alternative would have been a
combination of latin-jisx0201, control-1 and latin-iso8859-2. There are
better reasons for the first choice, but it was still a choice. 
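
A rough Python illustration of why it was a choice: two different
octet-to-character tables can both round-trip every octet. The codecs
iso8859_1 and iso8859_2 are stand-ins I picked for the example; the
latin-jisx0201 lower half of the alternative is not modelled here.

```python
all_octets = bytes(range(256))

# Two possible ways of turning octets into characters on read.
as_latin1 = all_octets.decode("iso8859_1")
as_latin2 = all_octets.decode("iso8859_2")

# Both mappings are lossless: writing the characters back yields the octets.
print(as_latin1.encode("iso8859_1") == all_octets)  # True
print(as_latin2.encode("iso8859_2") == all_octets)  # True

# But the characters in the buffer differ, e.g. for octet 0xA1.
print(hex(ord(as_latin1[0xA1])), hex(ord(as_latin2[0xA1])))  # 0xa1 0x104
```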

...
 Of course, it may well be that I have again misunderstood the approach
 for coding systems, therefore I would appreciate an explanation. 
-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)

