On 2009-01-16, David Kastrup <dak(a)gnu.org> wrote:
> There is a transparent way, however. Note that UTF-8 is an encoding
> scheme that can, even within 4-byte sequences, encode more than just
> legal Unicode. Pick a 256-code-point page from there (either beyond
> Unicode's upper limit of U+10FFFF, which 4-byte sequences can exceed up
> to 2^21 - 1, or, saving one byte but being more obscure, in the Unicode
> range reserved for UTF-16 surrogates and thus left free).
> Now this is our XEmacs-internal code page. _Any_ bytes that are not
> part of valid sequences in a particular encoding (and this _includes_
> non-minimal code sequences in UTF-8 and UTF-16) are encoded using this
> XEmacs-internal code page as "bad byte of value xxx" and are displayed
> as \xxx octal escapes, byte by byte. When writing out, such "byte" code
> points get encoded back into single bytes.
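Python's `surrogateescape` error handler (PEP 383) happens to use the same idea, with the surrogate-area variant of the code page. The sketch below uses it to show the round trip. It is an illustration, not XEmacs code. The helper names are hypothetical. One difference: Python only escapes bytes 0x80..0xFF, mapping them to U+DC80..U+DCFF, because ASCII bytes are always valid UTF-8. The proposal above reserves a full 256-entry page so that any byte, in any encoding, can be escaped.

```python
def decode_with_escapes(raw: bytes) -> str:
    # Bytes that are not part of a valid UTF-8 sequence (Python's decoder
    # also rejects overlong forms and encoded surrogates) each become one
    # lone surrogate in U+DC80..U+DCFF, i.e. U+DC00 + byte value.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_with_escapes(text: str) -> bytes:
    # On output, each escaped code point is written back as its single byte.
    return text.encode("utf-8", errors="surrogateescape")


def display(text: str) -> str:
    # Show every escaped "bad byte" as a backslash-octal escape, e.g. \300.
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("\\%03o" % (cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


# b"\xc0\xaf" is an overlong (non-minimal) encoding of "/"; b"\xff" is never valid.
raw = b"ok \xc0\xaf \xff"
text = decode_with_escapes(raw)
print(display(text))                      # ok \300\257 \377
assert encode_with_escapes(text) == raw   # the bytes survive unchanged
```

The round trip is lossless because valid UTF-8 input can never decode to a surrogate code point. An escaped byte therefore cannot be confused with a real character.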
Neat! I think I might add that to my native-Unicode 21.4 (which is on
hold while I do some work, sigh).
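The "saving one byte" remark in the quoted proposal comes from UTF-8's bit layout. A 4-byte sequence carries 3 + 6 + 6 + 6 = 21 payload bits, so it reaches up to 0x1FFFFF. Unicode stops at U+10FFFF. A page placed above that limit therefore costs 4 bytes per escaped byte, while a page in the surrogate range (U+D800..U+DFFF) costs 3. The sketch below is a minimal encoder for the raw UTF-8 bit pattern with Unicode's validity checks deliberately left out, so both candidate pages can be measured. The base code points 0xDC00 and 0x110000 are hypothetical choices, not values from the original post.

```python
def utf8_scheme_encode(cp: int) -> bytes:
    """Encode cp using the UTF-8 bit layout WITHOUT Unicode validity checks,
    so surrogates and values above U+10FFFF are accepted.
    Only the 1- to 4-byte forms are handled (cp < 2**21)."""
    if not 0 <= cp < 2 ** 21:
        raise ValueError("needs more than 4 bytes")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Hypothetical bases for a 256-entry "bad byte" page in each location.
SURROGATE_PAGE = 0xDC00      # inside the UTF-16 surrogate range
BEYOND_UNICODE = 0x110000    # just past U+10FFFF

for name, base in [("surrogate page", SURROGATE_PAGE),
                   ("beyond U+10FFFF", BEYOND_UNICODE)]:
    sizes = {len(utf8_scheme_encode(base + b)) for b in range(256)}
    print(name, "bytes per escaped byte:", sizes)
```

For ordinary scalar values the output matches a standard UTF-8 encoder. Neither page can appear in valid UTF-8 input, which is what makes either choice safe for escaping.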
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta