On the twenty-third of November, David Kastrup wrote:
> I have a tentative plan to add a charset to XEmacs, 256 characters of
> which reflect corrupt Unicode data. These 256 characters will be
> generated by Unicode-oriented coding systems when they encounter
> invalid data:
>
> (decode-coding-string "\x80\x80" 'utf-8)
> => "\200\200" ;; With funky redisplay properties once display tables
> ;; and char tables are integrated. Which, whee, is more
> ;; work.
Here is what Emacs 22 returns:
#("\xc2\x80\xc2\x80" 0 2 (display #("\\200" 0 4 (face escape-glyph))
help-echo utf-8-help-echo untranslated-utf-8 128) 2 4 (display #("\\200" 0 4
(face escape-glyph)) help-echo utf-8-help-echo untranslated-utf-8 128))
> And will be ignored by them when writing:
>
> (encode-coding-string (decode-coding-string "\x80\x80" 'utf-8) 'utf-8)
> => ""
Here is what Emacs 22 returns:
"\200\200"
Quite an old GNU Emacs 23.0.0 gives me this:
(encode-coding-string "\x80\x80" 'utf-8)
=> "\200\200"
(decode-coding-string "\x80\x80" 'utf-8)
=> "\200\200"
Savannah’s being unco-operative about allowing me to cvs update, otherwise
it would be worth reporting the former as a bug. Were it my implementation,
I would regard the latter as a bug too.
Of course, the internal coding for Emacs 22 is emacs-mule, not utf-8
based, so this is not completely relevant. But maybe it is
interesting, nevertheless.
It is, thank you.
> This will allow applications like David Kastrup’s
> reconstruct-utf-8-sequences-from-fragmentary-TeX-error-messages to be
> possible, while
> not contradicting the relevant Unicode standards. With Unicode as
> the internal encoding, there’s no need to have a separate Mule
> character set; we can stick their codes somewhere above the astral
> planes. But we should maintain the same syntax code for them. Note
> also that, as far as I can work out, these 256 codes will be
> sufficient for representing error data for all the other
> Unicode-oriented representations as well as UTF-8.
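The scheme in the quoted paragraph can be sketched outside Emacs. What follows is a rough illustration in Python, not XEmacs code: ERROR_BASE, the function names, and the choice of 0x110000 (the first value past the astral planes) as the start of the 256 error codes are all assumptions made for the example. Python strings cannot hold values above U+10FFFF, so decoded text is represented as a list of integers. Both writing policies mentioned in this thread are shown: dropping the error codes, and turning them back into the original bytes.

```python
# Assumed base for the 256 error codes: the first value above U+10FFFF.
ERROR_BASE = 0x110000


def decode_keeping_errors(data, encoding="utf-8"):
    """Decode bytes to a list of code points.

    Every byte the codec rejects becomes ERROR_BASE + byte instead of
    raising an error or being replaced by U+FFFD.
    """
    codes = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            codes.extend(ord(ch) for ch in chunk.decode(encoding))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into `chunk`.
            codes.extend(ord(ch) for ch in chunk[:err.start].decode(encoding))
            codes.extend(ERROR_BASE + b for b in chunk[err.start:err.end])
            pos += err.end
    return codes


def is_error_code(code):
    return ERROR_BASE <= code < ERROR_BASE + 256


def encode_dropping_errors(codes, encoding="utf-8"):
    """Policy 1: ignore the error codes when writing."""
    text = "".join(chr(c) for c in codes if not is_error_code(c))
    return text.encode(encoding)


def encode_restoring_errors(codes, encoding="utf-8"):
    """Policy 2: write each error code back as its original byte."""
    out = bytearray()
    for c in codes:
        if is_error_code(c):
            out.append(c - ERROR_BASE)
        else:
            out += chr(c).encode(encoding)
    return bytes(out)


codes = decode_keeping_errors(b"\x80\x80")
print([hex(c) for c in codes])
print(encode_dropping_errors(codes))
print(encode_restoring_errors(codes))
```

Because each error code carries exactly one byte value, the same 256 codes would serve any byte-oriented codec passed as `encoding`, which is the point made at the end of the quoted paragraph.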
Not just for "unicode-oriented". The recipe should be workable for
the iso-latin-* stuff as a file encoding, too, I think.
Hmm? _Are_ there invalid sequences for the ISO-8859-N file encodings?
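One way to probe that question is to ask an existing codec library which single bytes it refuses to decode. The sketch below uses Python's bundled codecs as a stand-in for the ISO-8859-N definitions, which is an assumption: another implementation may treat unassigned positions differently. The helper name undecodable_bytes is made up for the example. ISO-8859-1 assigns all 256 positions, while some other parts (ISO-8859-3, for instance) appear to leave a few positions unassigned, and those are the only candidates for "invalid data" in a single-byte encoding.

```python
def undecodable_bytes(encoding):
    """Return the byte values that the codec rejects under strict decoding."""
    rejected = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            rejected.append(value)
    return rejected


for name in ("iso8859_1", "iso8859_3", "iso8859_7"):
    print(name, [hex(v) for v in undecodable_bytes(name)])
```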
--
Santa Maradona, priez pour moi!