[Bug: 21.5-b25] Problems with latin-unity and VM
kehoea at parhasard.net
Wed Feb 7 06:19:44 EST 2007
On the seventh day of the month of February, Joachim Schrod wrote:
> [...] But I still want to address that either I don't understand the
> precondition under which latin-unity operates or that precondition has a
> This is a buffer where the buffer-file-coding-system has been
> explicitly set to binary. I would have expected that latin-unity does
> NOT attempt to change the encoding at all for such files -- after all,
> they are declared as binary and the notion of Latin characters in
> binary files makes no sense.
That’s what the FSF have. We (XEmacs) don’t distinguish iso-8859-1 and
binary in your sense; they do. Neither version distinguishes ASCII and
binary.
> By definition, binary files consist only of octets and should be written
> out as-is, without any attempt to change any octets.
But if the corresponding buffer contains characters that have no clear
mapping to octets with the binary coding system, writing that buffer to disk
will lose data.
The control-1 characters do have a clear mapping to octets with the binary
coding system; latin-unity’s not knowing about that mapping is the bug.
In the correct course of events, VM will use MIME-encoding to generate a
buffer where every character is a member of either ascii, control-1 or
latin-iso8859-1, then write the buffer using 'binary. latin-unity will then
look through the contents of the buffer, see that it can be encoded using
'binary, and all will be well.
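A sketch of that mapping, in Python rather than Emacs Lisp (the function name here is mine, not part of latin-unity): Python's "latin-1" codec is a one-to-one map between the code points U+0000-U+00FF and the octets 0x00-0xFF, which mirrors how XEmacs's binary coding system covers ascii, control-1 and latin-iso8859-1.

```python
# Sketch (Python, not Emacs Lisp) of the character/octet mapping described
# above.  `representable_as_binary` is a hypothetical helper, not an API of
# latin-unity or VM.

def representable_as_binary(text: str) -> bool:
    """True if every character maps to a single octet 0x00-0xFF."""
    return all(ord(ch) <= 0xFF for ch in text)

# A string drawn from ascii ("A"), control-1 (U+0085) and
# latin-iso8859-1 (U+00E9, "é"):
msg = "A\x85\u00e9"
assert representable_as_binary(msg)
assert msg.encode("latin-1") == b"A\x85\xe9"   # lossless, octet for octet

# A Han character has no such single-octet mapping, so writing it with
# a binary-style coding system would lose data:
assert not representable_as_binary("\u5357")
```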
> Actually, I can augment my problem description: VM may not be involved
> at all.
> If I open the file, set the buffer-file-coding-system explicitly
> to binary, modify it, and want to save it, latin-unity detects a
> coding system conflict. But it should not do so for binary files.
> Therefore I would have thought that #'latin-unity-sanity-check checks
> for the special case of binary files and returns that encoding as
> safe. (Or, maybe that at least binary is added to
> 'latin-unity-ucs-list, because one could classify it as a universal
> coding system
No, one could not. Consider: how can you interpret a sequence of octets on
disk as U+5357, the Han character for ‘southwards’, without abandoning the
treatment as ‘binary’--a sequence of octets--and checking instead for
ISO-2022-1 or UTF-8 sequences? If that character is in an XEmacs buffer, and
that buffer is to be written using the binary coding system, how can you
represent that character on disk without resorting to an alternative coding
system?
> -- all octets in binary files represent themselves.)
All octets in binary files, when those files are read by XEmacs as binary,
are represented by characters in the ascii, control-1 or latin-iso8859-1
character sets. This was a choice we made--an alternative would have been a
combination of latin-jisx0201, control-1 and latin-iso8859-2. There are
better reasons for the first choice, but it was still a choice.
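The choice above can be illustrated in Python (again standing in for Emacs Lisp; the range boundaries used here are the standard ISO-8859-1 layout, assumed to match XEmacs's charsets): every one of the 256 octets, read as binary, lands in one of those three character sets and round-trips losslessly.

```python
# Illustration (Python, not XEmacs Lisp): reading every octet as "binary"
# yields a character in ascii, control-1 or latin-iso8859-1, and writing
# the buffer back out reproduces the octets exactly.  `charset` is a
# hypothetical helper; the boundaries (0x00-0x7F, 0x80-0x9F, 0xA0-0xFF)
# follow the standard ISO-8859-1 layout.

every_octet = bytes(range(256))
chars = every_octet.decode("latin-1")          # octet -> character, 1:1

def charset(ch: str) -> str:
    cp = ord(ch)
    if cp <= 0x7F:
        return "ascii"
    if cp <= 0x9F:
        return "control-1"
    return "latin-iso8859-1"

assert {charset(ch) for ch in chars} == {
    "ascii", "control-1", "latin-iso8859-1"}
assert chars.encode("latin-1") == every_octet  # writing back loses nothing
```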
> Of course, it may well be that I have again misunderstood the approach
> for coding systems, therefore I would appreciate an explanation.
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)