On the fourteenth of January, Michael Sperber wrote:
"Stephen J. Turnbull" <stephen(a)xemacs.org> writes:
> Michael Sperber writes:
>
> > Could you give a hint about detecting UTF-8? (I know what UTF-8 looks
> > like, but not enough about the other coding systems to be able to say
> > what distinguishes them.)
>
> There are a lot of coding systems. But basically if you have as many
> as 3 non-ASCII characters, the chance that any natural language text
> "looks like" UTF-8 is vanishingly small. Except at the beginning and
> end of the string, a single byte >= 0xC0 gives you information about
> *at least* three other bytes: the preceding one may *not* be >= 0xC0,
> the following N bytes must be in the range 0x80 to 0xBF, and the next
> one after that must not be >= 0xC0.
I'm not sure I understand: These are conditions which must hold true for
UTF-8. Is the presence of a valid UTF-8 3-byte encoding in a byte
sequence enough to be able to say that it is UTF-8? What about typical
Latin-1 text, whose UTF-8 encodings will include only 2-byte encodings?
If there are three non-ASCII octets in a text, and they are positioned such
that the text can be interpreted as valid UTF-8, then the chance that the
text is anything but UTF-8 (or something like UTF-8 strings stored in a core
file) is vanishingly small. So Stephen’s statement also holds for Western
European text stored as UTF-8.
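To make that heuristic concrete, here is a minimal sketch in Python rather than in XEmacs Lisp. The function name looks_like_utf8 and its min_non_ascii parameter are made up for illustration; this is not what XEmacs's detection code does, only the rule of thumb described above: at least three non-ASCII octets, all positioned so that the whole text decodes as valid UTF-8.

```python
def looks_like_utf8(data: bytes, min_non_ascii: int = 3) -> bool:
    """Guess whether `data` is UTF-8 text (hypothetical helper)."""
    # Count octets outside the ASCII range. Below the threshold there is
    # too little evidence either way, so answer conservatively.
    non_ascii = sum(1 for octet in data if octet >= 0x80)
    if non_ascii < min_non_ascii:
        return False
    # A strict decode verifies that every lead byte is followed by the
    # right number of continuation bytes in the range 0x80 to 0xBF.
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    text = "Grüße aus Köln"
    print(looks_like_utf8(text.encode("utf-8")))    # True
    print(looks_like_utf8(text.encode("latin-1")))  # False
```

Note that the Western European sample contains only 2-byte sequences and is still accepted, while the same text encoded as Latin-1 is rejected because its non-ASCII octets are not followed by continuation bytes.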
On your original question: it’s laughably unlikely (outside of Cygwin, where
this code is not used) that ls will output file names in a coding system
that doesn’t reflect the octets stored in the directory entries. And on OS X
file-name-coding-system (and relatedly, the 'file-name coding system alias)
is unconditionally UTF-8, independent of the locale coding system. I would
suggest binding coding-system-for-read to 'file-name, not
(get-coding-system-from-locale (current-locale)). For future reference,
too, the coding system alias 'native is equivalent to and much faster than
that form.
--
Where can my nephew Yoghurtu Nghé be now, who had to flee the village
in haste because of the shortage of rhinoceroses?
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta