On the fourteenth of January, Julian Bradfield wrote:
 On 2009-01-14, Aidan Kehoe <kehoea(a)parhasard.net> wrote:
 >
 > At the moment, #'query-coding-region ignores invalid Unicode sequences,
 > it treats them as always encodable--which they are, it is clear what
 > they should correspond to when written to disk. But Unicode says they
 > are not encodable.
 
 What do you mean by "invalid Unicode sequence"? 
I mean the XEmacs characters that record that a Unicode coding system
encountered an invalid octet sequence on disk. E.g. the output of
(decode-coding-string "\xd8\x00\x00\x01" 'utf-16-be) ;; Invalid surrogates
or 
(decode-coding-string "\xe4" 'utf-8) ;; Attempt to decode Latin 1 as utf-8
We produce them so that loading, for example, a koi8-r file as utf-8, making
a single modification, and saving it does not necessarily trash the
non-ASCII content.
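As a rough analogy from outside XEmacs, the same round trip can be sketched
with Python's "surrogateescape" error handler. This is Python's mechanism, not
the XEmacs one, and it is only meant to illustrate the idea; the helper names
and the sample KOI8-R text are made up for the illustration.

```python
# Analogy only: Python's "surrogateescape" handler, not XEmacs internals.


def load_as_utf8(octets: bytes) -> str:
    """Decode as UTF-8, keeping each undecodable octet as a placeholder character."""
    return octets.decode("utf-8", errors="surrogateescape")


def save_as_utf8(text: str) -> bytes:
    """Encode as UTF-8, turning placeholder characters back into their octets."""
    return text.encode("utf-8", errors="surrogateescape")


def edit_and_save(octets: bytes, suffix: str) -> bytes:
    """Hypothetical helper: load, make a single modification (append), save."""
    return save_as_utf8(load_as_utf8(octets) + suffix)


def strictly_encodable(text: str) -> bool:
    """True if a strict UTF-8 encoder, following Unicode's rules, accepts the text."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    koi8_octets = "привет".encode("koi8-r")  # made-up sample content
    saved = edit_and_save(koi8_octets, "!")
    # The non-ASCII octets survive the load/modify/save cycle unchanged.
    print(saved == koi8_octets + b"!")
    # A strict encoder refuses the placeholder for the stray Latin 1 octet.
    print(strictly_encodable(load_as_utf8(b"\xe4")))
```

The second check mirrors the tension in the quoted text above: the placeholder
clearly maps back to one octet on disk, yet a strict Unicode encoder will not
accept it.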
 How do they get into the buffer in the first place? 
unicode.c:1743 and the code that uses that macro.
GNU Emacs versions before 23 have the eight-bit-graphic character set, which
they used for this situation with UTF-8. Version 23 doesn’t seem to deal with
this situation well; as far as I can tell, I don’t get the option to write the
invalid sequence to disk.
The attached file is UTF-16 with a byte order mark, and has an invalid
sequence after the first ". ". Current GNU Emacs deals with it badly. XEmacs
deals with it a little better, but not particularly well right now either.
-- 
Where might my nephew Yoghurtu Nghe be now, who had to flee the village
in haste because of the shortage of rhinoceroses?
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta