Re: Mule autodetection is crap

Thursday, 2 January 2003

        ...
>>>> "SJT" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
 SJT> Well, as I suppose you know, what Ben has in mind is a statistical
 SJT> detector that (eg) can distinguish EUC-JP from EUC-TW or EUC-KR
 SJT> (although the really important case is the ISO-8859-X mess and the
 SJT> various non-conforming sets like KOI8 and the Windows 12xx sets).

I can say that there is an extremely good scheme for statistical detection
of various Russian (really Russian, not Cyrillic in general) encodings, by
S. V. Znamensky.  I tried it, and it works really well, even handling
"twice-encoded" text, which is seen occasionally.
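
To give a rough idea of what statistical detection means here, below is a
minimal sketch in Python.  It is not Znamensky's scheme, only an
illustration: it decodes the bytes with each candidate encoding and prefers
the decoding whose letters look most like ordinary Russian text.  The
letter ranking, the weights, the penalty and the names RANK, score and
guess_encoding are all assumptions made up for this example.

```python
# Approximate ranking of Russian letters from most to least frequent.
# This ordering is an assumption for illustration; a real detector would
# use measured statistics. The letter "ё" is omitted for simplicity.
RANK = "оеаинтсрвлкмдпуяыьгзбчйхжшюцщэфъ"
WEIGHT = {ch: len(RANK) - i for i, ch in enumerate(RANK)}

# Candidate 8-bit encodings, named as Python's codecs module knows them.
CANDIDATES = ("koi8_r", "cp1251", "cp866", "iso8859_5")


def score(text):
    """Return a plausibility score for text as ordinary Russian prose."""
    total = 0.0
    for ch in text:
        if ch in WEIGHT:
            # Lowercase Russian letter: reward by how common it is.
            total += WEIGHT[ch]
        elif ch.lower() in WEIGHT:
            # Uppercase letters are much rarer in running text.
            total += 0.25 * WEIGHT[ch.lower()]
        elif ord(ch) > 127:
            # Box drawing, other alphabets, replacement characters.
            total -= 8
    return total


def guess_encoding(data):
    """Pick the candidate whose decoding scores highest.

    Pure-ASCII input scores the same everywhere, so the first
    candidate is returned in that case.
    """
    return max(
        CANDIDATES,
        key=lambda enc: score(data.decode(enc, errors="replace")),
    )
```

Something like guess_encoding(open(path, "rb").read()) would then return
one of the candidate names.  A serious detector would look at letter pairs
rather than single letters, and would also have to cope with text that is
mostly ASCII with only a few Russian strings in it.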

I thought of adding something like this to XEmacs.  Now if there is a
common infrastructure for this, I'd be glad to help in that area.

 SJT> But the design is completely new, so we need to retune it.  Also
 SJT> there seem to be some bugs in coding priorities.

Uhm,

I'm now playing with current XEmacs-beta.  It recognizes my
~/.xemacs/init.el as UTF-16, and does not let me change the encoding
with "C-x RET f koi8-r RET" (but "C-x RET c koi8-r C-x C-f" works).  The
file itself is mostly ASCII, with two strings in Russian inside (near the
end of the file).

Are you interested in such bug reports, and if so, should I send the file
or what?  It at least detects other files as "Raw".

set-language-environment Cyrillic-KOI8 does not help at all.

--alexm
