On the fourteenth of January, Julian Bradfield wrote:
> On 2009-01-14, Aidan Kehoe <kehoea(a)parhasard.net> wrote:
> >
> > At the moment, #'query-coding-region ignores invalid Unicode sequences,
> > it treats them as always encodable--which they are, it is clear what
> > they should correspond to when written to disk. But Unicode says they
> > are not encodable.
>
> What do you mean by "invalid Unicode sequence"?

XEmacs characters that record that a Unicode coding system encountered
an invalid octet sequence on disk. E.g. the output of
(decode-coding-string "\xd8\x00\x00\x01" 'utf-16-be) ;; Invalid surrogates
or
(decode-coding-string "\xe4" 'utf-8) ;; Attempt to decode Latin 1 as utf-8
We produce them so that loading, for example, a koi8-r file as utf-8, making
a single modification, and saving it does not necessarily trash the
non-ASCII content.
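
For anyone who wants to see the round-trip idea outside XEmacs, here is a rough sketch in Python. It uses Python's own "surrogateescape" error handler, which is an analogous mechanism and not what XEmacs does internally; the sample text and variable names are made up for illustration.

```python
# Illustration only: Python's "surrogateescape" handler, not XEmacs code.
# KOI8-R octets followed by some ASCII; the Cyrillic part is invalid as UTF-8.
raw = "привет".encode("koi8_r") + b" hello"

# Decode as UTF-8 anyway. Each undecodable octet 0xNN is kept as the
# lone surrogate U+DCNN instead of being dropped or replaced.
text = raw.decode("utf-8", errors="surrogateescape")

# Make a single modification, touching only the ASCII part.
edited = text.replace("hello", "world")

# Encoding with the same handler turns the surrogates back into the
# original octets, so the non-ASCII content survives the save.
saved = edited.encode("utf-8", errors="surrogateescape")

assert saved == raw.replace(b"hello", b"world")
```

Decoding `saved` as koi8-r afterwards still gives the original Cyrillic text, which is the property we want from the XEmacs behaviour too.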
> How do they get into the buffer in the first place?
unicode.c:1743 and the code that uses that macro.
Pre-v23 versions of GNU Emacs have the eight-bit-graphic character set,
which they use for this situation with UTF-8. Version 23 doesn't seem to
deal with this situation well; as far as I can tell, I don't get the option
to write the invalid sequence to disk.
The attached file is UTF-16 with byte order mark, and has an invalid
sequence after the first ". ". Current GNU Emacs deals with it badly. XEmacs
deals with it a little better, but not in a particularly stellar way right
now either.
--
Where might my nephew Yoghurtu Nghe be now, who had to flee the village
in a hurry because of the shortage of rhinoceroses?
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta