Re: [Bug: 21.5-b24] Problems with coding systems autodetect

Sunday, 26 February 2006

        Joachim Schrod wrote:

[Cc to Ben because it's his code. Ben, this is about a bug report that I sent to 
xemacs-beta five days ago; coding autodetection for files with German texts does 
not work.]

...
 I have attached a file that has two lines (at the end). If I open that
 file, I get the coding system big5. I would expect to get the coding
 system iso-8859-1 or similar.

In private email, Lutz Euler pointed out that this was discussed already in January.

There, Steve mentioned that the problem does not occur if one sets the language 
environment. That's not the case here. If I set-language-environment to 
"Latin-1", the autodetection still does not work. ("German" doesn't work either, 
but I wouldn't want to use that anyhow as it changes XEmacs' idea of my locale.)

The problem at hand is that there are words with several local characters (from 
the GR plane) in a row. When iso2022_detect() in mule-coding.c sees that, it 
sets all ISO coding categories to `somewhat-unlikely'.

Lutz posted a three-line change to mule-coding.c: if odd runs outnumber even 
runs by more than two to one, coding category iso_8_1 is set to 
`somewhat-likely'. See 
http://list-archive.xemacs.org/xemacs-beta/200601/msg00083.html
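
To make the idea concrete, here is a minimal standalone sketch of such a 
heuristic. It is not Lutz's patch and not code from mule-coding.c. It assumes 
that a "run" is a maximal sequence of bytes with the high bit set (GR bytes) 
and that "odd" and "even" refer to the length of the run. The names 
run_counts, count_gr_runs and looks_like_8bit_single_byte are made up for 
illustration.

```c
#include <stddef.h>

/* Hypothetical helper type; not taken from mule-coding.c. */
struct run_counts {
    size_t odd;   /* runs of GR bytes with odd length  */
    size_t even;  /* runs of GR bytes with even length */
};

/* Count maximal runs of high-bit (GR) bytes, split by parity of length.
   Assumption: this is what "odd runs" and "even runs" mean. */
static struct run_counts count_gr_runs(const unsigned char *buf, size_t len)
{
    struct run_counts rc = { 0, 0 };
    size_t run = 0;
    size_t i;

    /* Iterate one step past the end so a trailing run is closed too. */
    for (i = 0; i <= len; i++) {
        if (i < len && (buf[i] & 0x80)) {
            run++;
        } else if (run > 0) {
            if (run % 2)
                rc.odd++;
            else
                rc.even++;
            run = 0;
        }
    }
    return rc;
}

/* Return 1 if odd runs outnumber even runs by more than two to one,
   which is the condition described above for preferring iso_8_1. */
static int looks_like_8bit_single_byte(const unsigned char *buf, size_t len)
{
    struct run_counts rc = count_gr_runs(buf, len);
    return rc.odd > 2 * rc.even;
}
```

The reasoning behind such a test would be that double-byte encodings tend to 
produce GR bytes in pairs, while Latin-1 text mostly has isolated accented 
letters, so a clear majority of odd-length runs hints at a single-byte 
encoding. A buffer without any GR bytes gives 0 > 0 and is left alone.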

This change works and makes auto-detection work for all German files that I 
tried. (Many of them were not correctly detected before.) The change has not 
been turned into a patch submission. Would you accept a patch with that change, 
or would it be dropped?

Cheers,
	Joachim

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod				Email: jschrod(a)acm.org
Roedermark, Germany
