Michael Sperber writes:
Could you give a hint about detecting UTF-8? (I know what UTF-8 looks
like, but not enough about the other coding systems to be able to say
what distinguishes them.)
There are a lot of coding systems. But basically, if you have as many
as 3 non-ASCII characters, the chance that natural-language text in
some other encoding happens to "look like" UTF-8 is vanishingly small.
Except at the beginning and end of the string, a single byte >= 0xC0
gives you information about *at least* three other bytes: the
preceding one must *not* be >= 0xC0, the following N bytes (N is fixed
by the lead byte) must be in the range 0x80 to 0xBF, and the next one
after that must not be in the range 0x80 to 0xBF (another lead byte
>= 0xC0 is fine there).
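A minimal sketch of that byte-structure check, assuming Python and a
complete buffer (no partial sequence cut off at either end). The name
looks_like_utf8 and the min_multibyte threshold are hypothetical, and
this is not how XEmacs's 'undecided' coding system is implemented. It
only checks the lead/continuation rules above; it does not reject
overlong forms or surrogates, so it is looser than a real decoder.

```python
def looks_like_utf8(data: bytes, min_multibyte: int = 3) -> bool:
    """Heuristic sketch: True if `data` follows UTF-8 lead/continuation
    structure and contains at least `min_multibyte` multibyte sequences."""
    i, n, multibyte = 0, len(data), 0
    while i < n:
        b = data[i]
        if b < 0x80:              # ASCII byte stands alone
            i += 1
            continue
        if 0xC0 <= b <= 0xDF:     # lead byte: 1 continuation byte follows
            need = 1
        elif 0xE0 <= b <= 0xEF:   # lead byte: 2 continuation bytes follow
            need = 2
        elif 0xF0 <= b <= 0xF7:   # lead byte: 3 continuation bytes follow
            need = 3
        else:
            # 0x80-0xBF here is a continuation byte with no lead byte
            # (e.g. one too many after a sequence); 0xF8-0xFF is rejected.
            return False
        # The following `need` bytes must all be in 0x80-0xBF.
        for k in range(1, need + 1):
            if i + k >= n or not (0x80 <= data[i + k] <= 0xBF):
                return False
        multibyte += 1
        # The byte after the sequence is checked on the next pass:
        # ASCII or a new lead byte is fine, a stray 0x80-0xBF is not.
        i += need + 1
    return multibyte >= min_multibyte


if __name__ == "__main__":
    sample = "naïve café résumé"
    print(looks_like_utf8(sample.encode("utf-8")))    # True
    print(looks_like_utf8(sample.encode("latin-1")))  # False
```

In the Latin-1 bytes, "ï" is the single byte 0xEF, which would demand
two continuation bytes but is followed by an ASCII "v", so the check
fails immediately.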
However, this should all already be part of the 'undecided' coding
system. If it's not working, there's probably something tricky going
on with process buffers.
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta