Re: Mule bugs: misidentification (Latin-1 vs. Chinese), revert issues

Monday, 14 January 2008

        Michael Sperber writes:

...
 Could you give a hint about detecting UTF-8?  (I know what UTF-8 looks
 like, but not enough about the other coding systems to be able to say
 what distinguishes them.)
There are a lot of coding systems.  But basically if you have as few
as 3 non-ASCII characters, the chance that natural language text in
some other coding system "looks like" UTF-8 is vanishingly small.
Except at the beginning and end of the string, a single byte >= 0xC0
gives you information about *at least* three other bytes: the
preceding one may *not* be >= 0xC0, the following N bytes (N is 1 to
3, determined by the lead byte) must be in the range 0x80 to 0xBF, and
the next one after that must *not* be in that range.
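
To make the byte rules above concrete, here is a minimal sketch in
Python.  It is only an illustration of the structural test, not what
the 'undecided' coding system actually does; the function name
utf8_structure_ok is made up for this example.  It deliberately checks
only the lead-byte/continuation-byte pattern, so it still accepts
overlong forms and surrogates that a strict decoder would reject.

```python
def utf8_structure_ok(data: bytes) -> bool:
    """Check only the lead/continuation byte pattern of UTF-8.

    Hypothetical helper for illustration: it does not reject overlong
    encodings or surrogate code points.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:            # ASCII byte, tells us nothing
            i += 1
            continue
        if b < 0xC0:            # continuation byte with no lead byte
            return False
        if b < 0xE0:            # lead byte of a 2-byte sequence
            need = 1
        elif b < 0xF0:          # lead byte of a 3-byte sequence
            need = 2
        elif b < 0xF8:          # lead byte of a 4-byte sequence
            need = 3
        else:                   # 0xF8..0xFF never start a sequence
            return False
        tail = data[i + 1:i + 1 + need]
        # The following N bytes must all be in 0x80..0xBF.
        if len(tail) != need or any(not 0x80 <= t <= 0xBF for t in tail):
            return False
        i += 1 + need           # a stray continuation byte here fails next pass
    return True


text = "caf\u00e9 na\u00efve r\u00e9sum\u00e9"
print(utf8_structure_ok(text.encode("utf-8")))    # real UTF-8
print(utf8_structure_ok(text.encode("latin-1")))  # Latin-1 bytes
print(utf8_structure_ok(b"\xd6\xd0\xce\xc4"))     # two GB2312-style double bytes
```

For a strict answer in Python, calling data.decode("utf-8") and
catching UnicodeDecodeError is the simpler route; the loop is spelled
out here only to show how much each byte >= 0xC0 constrains its
neighbours.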

However, this should all already be part of the 'undecided' coding
system.  If it's not working, there's probably something tricky going
on with process buffers.

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta
