Joachim Schrod wrote:
[Cc to Ben because it's his code. Ben, this is about a bug report that I sent to
xemacs-beta five days ago; coding autodetection for files with German texts does
not work.]
I have attached a file that has two lines (at the end). If I open that
file, I get the coding system big5. I would expect to get the coding
system iso-8859-1 or similar.
In private email, Lutz Euler pointed out that this was discussed already in January.
There, Steve mentioned that the problem does not occur if one sets the language
environment. That's not the case here. If I set-language-environment to
"Latin-1", the autodetection still does not work. ("German" doesn't work either;
but I wouldn't want to use that anyhow as it changes XEmacs' idea of my locale.)
The problem at hand is that there are words with several local characters (from
the GR plane) in a row. When iso2022_detect() in mule-coding.c sees that, it
sets all ISO coding categories to `somewhat-unlikely'.
Lutz posted a three-line change to mule-coding.c; if there are more than twice
as many odd runs as even runs, coding category iso_8_1 is set to
`somewhat-likely'. See
http://list-archive.xemacs.org/xemacs-beta/200601/msg00083.html
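
To illustrate the idea behind that heuristic, here is a minimal standalone
sketch in C. It is neither the code from mule-coding.c nor Lutz's actual
change. It assumes that a "run" means a sequence of consecutive GR bytes
(taken here as 0xA0..0xFF) and that "odd" and "even" refer to the length of
such a run. The names gr_run_counts, count_gr_runs and looks_like_iso_8_1 are
hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical helper type, not from mule-coding.c: tallies runs of
   consecutive GR bytes by the parity of their length. */
struct gr_run_counts {
    size_t odd;   /* runs with an odd number of GR bytes */
    size_t even;  /* runs with an even number of GR bytes */
};

/* Assumption: a GR byte is anything in 0xA0..0xFF. */
static struct gr_run_counts
count_gr_runs(const unsigned char *buf, size_t len)
{
    struct gr_run_counts c = {0, 0};
    size_t run = 0;

    /* Iterate one step past the end so a trailing run is also closed. */
    for (size_t i = 0; i <= len; i++) {
        int is_gr = (i < len) && buf[i] >= 0xA0;
        if (is_gr) {
            run++;
        } else if (run > 0) {
            if (run % 2)
                c.odd++;
            else
                c.even++;
            run = 0;
        }
    }
    return c;
}

/* Returns nonzero if there are more than twice as many odd runs as even
   runs, the condition under which the change described above rates
   iso_8_1 as somewhat likely. */
static int
looks_like_iso_8_1(const unsigned char *buf, size_t len)
{
    struct gr_run_counts c = count_gr_runs(buf, len);
    return c.odd > 2 * c.even;
}
```

The reasoning, as far as I understand it: text in a double-byte encoding
tends to produce even-length runs of high bytes, whereas Latin-1 text with
single umlauts mostly produces runs of length one, with only the occasional
pair such as "öß".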
This change works and makes auto-detection work for all German files that I
tried. (Many of them were not correctly detected before.) The change has not
been turned into a patch submission. Would you accept a patch with that change,
or would it be dropped?
Cheers,
Joachim
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Joachim Schrod Email: jschrod(a)acm.org
Roedermark, Germany