"Stephen J. Turnbull" <stephen(a)xemacs.org> さんは書きました:
>>>>> "Mike" == Mike FABIAN
<mfabian(a)suse.de> writes:
Mike> I think XEmacs should set useful defaults depending on the
Mike> locale used in the system when XEmacs was started.
Mike> I don't want that
Mike> (set-language-environment "Japanese")
Mike> sets ja_JP.eucJP locale again.
We'll have to have Ben explain that. I have no idea what the
rationale for that is. Given that most of that function seems to be
intended to deal with Windows variance, possibly the whole thing is
wrong-headed for Unix.
I'm sure this must be wrong. The LANG and the LC_* variables
should not be changed by XEmacs, just used as they are.
Mike> I have only LANG set, I have neither set any LC_* variables
Mike> nor LC_ALL, I had set these as well I probably would have to
Mike> remember and restore them as well.
No, LANG is the only variable touched by that code AFAICT. LC_*
should be safe from it.
OK. But that means that when starting xemacs for example like this
LANG=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8 LC_PAPER=en_US.UTF-8 xemacs
you would get a mixture of encodings in the LC_* variables after
XEmacs changed LANG, like this:
mfabian@magellan:~$ LANG=ja_JP.eucJP LC_COLLATE=ja_JP.UTF-8 LC_PAPER=en_US.UTF-8 locale
LANG=ja_JP.eucJP
LC_CTYPE="ja_JP.eucJP"
LC_NUMERIC="ja_JP.eucJP"
LC_TIME="ja_JP.eucJP"
LC_COLLATE=ja_JP.UTF-8
LC_MONETARY="ja_JP.eucJP"
LC_MESSAGES="ja_JP.eucJP"
LC_PAPER=en_US.UTF-8
LC_NAME="ja_JP.eucJP"
LC_ADDRESS="ja_JP.eucJP"
LC_TELEPHONE="ja_JP.eucJP"
LC_MEASUREMENT="ja_JP.eucJP"
LC_IDENTIFICATION="ja_JP.eucJP"
LC_ALL=
mfabian@magellan:~$
Such mixtures are not allowed. ``The Open Group Base Specifications
Issue 6'' says about this:
If different character sets are used by the locale categories, the
results achieved by an application utilizing these categories are
undefined.
(see
http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap07.html)
In the worst case, applications may even crash.
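A minimal sketch of how such a mixture could be detected before it causes trouble. This is only an illustration, not anything XEmacs does: the function names are made up, the category list is the one printed by `locale' above, and the codeset comparison is a rough simplification (lowercase, hyphens and underscores removed). A locale name without an explicit codeset, such as de_DE@euro, is reported as None because its real encoding depends on the system's locale definitions.

```python
# Hypothetical helper: group locale categories by the codeset named in
# their effective locale, following the usual precedence
# LC_ALL > LC_<category> > LANG (empty values count as unset).

CATEGORIES = [
    "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE", "LC_MONETARY",
    "LC_MESSAGES", "LC_PAPER", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE",
    "LC_MEASUREMENT", "LC_IDENTIFICATION",
]


def effective_locale(env, category):
    """Return the locale name that applies to one category."""
    return env.get("LC_ALL") or env.get(category) or env.get("LANG") or "POSIX"


def codeset_of(locale_name):
    """Extract the codeset from language[_territory][.codeset][@modifier]."""
    name = locale_name.split("@", 1)[0]
    if "." not in name:
        return None  # no explicit codeset; the real one is system-dependent
    codeset = name.split(".", 1)[1]
    # Simplified normalization so that "UTF-8" and "utf8" compare equal.
    return codeset.lower().replace("-", "").replace("_", "")


def codesets_by_category(env):
    """Map each codeset to the list of categories that use it."""
    groups = {}
    for category in CATEGORIES:
        codeset = codeset_of(effective_locale(env, category))
        groups.setdefault(codeset, []).append(category)
    return groups


if __name__ == "__main__":
    import os
    groups = codesets_by_category(dict(os.environ))
    if len(groups) > 1:
        print("mixed codesets:", groups)
```

With the environment from the example above (LANG=ja_JP.eucJP, LC_COLLATE=ja_JP.UTF-8, LC_PAPER=en_US.UTF-8) this reports two groups, one for eucjp and one for utf8.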
I remember a bug report concerning "sort" where "sort" aborted when
LC_TIME and LC_CTYPE were set to locales with different encodings:
(SuSE Bugzilla
http://bugzilla.suse.de/show_bug.cgi?id=26506)
When starting sort like this,
1. unset all LC_* variables. Unset LANG (or set to POSIX)
2. echo -e "l4\nl3" | LC_TIME=de_DE@euro LC_CTYPE=en_US.UTF-8 sort
it failed with a message like
sort: sort.c:717: inittables_mb: Assertion `mblength != (size_t)-1 && mblength != (size_t)-2' failed.
Aborted
This was a bug in sort which was fixed by Mitsuru Chinen:
Chinen> Additional Comment #8 From Mitsuru Chinen 2003-07-16 05:07
Chinen>
Chinen> Hello all,
Chinen>
Chinen> sort utility stores the multibyte character strings of months
Chinen> into buffers at first. This behavior is for `-M' option
Chinen> (i.e. compare according to month.) At that time, the month
Chinen> multibyte character strings are converted into a wide
Chinen> character string in order to ignore the different of uppercase
Chinen> and lowercase.
Chinen>
Chinen> When LC_TIME=de_DE@euro and LC_CTYPE=en_US.UTF-8, the
Chinen> multibyte character strings of month are not able to be
Chinen> converted into wide character. The reason why those strings
Chinen> are not is they have different encoding character from
Chinen> LC_CTYPE.
Chinen>
Chinen> I'll make a patch not to initialize the string of month when
Chinen> `-M' option is not specified. But I will not support the case
Chinen> where `-M' option is specified and the value of LC_TIME is
Chinen> different from the one of LC_CTYPE. I think supporting such a
Chinen> case is difficult by the above-mentioned reason. (The Single
Chinen> Unix specification also says the following:
→ see ``The Open Group Base Specifications Issue 6'' above.
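The conversion failure Chinen describes can be shown in miniature. The sketch below is not sort's code: Python's bytes.decode stands in for the C multibyte-to-wide conversion, and the function name is made up. It assumes a month name containing a non-ASCII character (German "März") stored in ISO-8859-15, which is then interpreted using the UTF-8 encoding implied by LC_CTYPE.

```python
# Illustration only: a month name encoded according to LC_TIME's codeset
# cannot necessarily be decoded with LC_CTYPE's codeset.

def can_convert(raw, ctype_codeset):
    """Return True if the bytes are valid in the LC_CTYPE codeset."""
    try:
        raw.decode(ctype_codeset)
        return True
    except UnicodeDecodeError:
        return False


month = "März"
latin9_bytes = month.encode("iso-8859-15")  # as a Latin-9 locale would store it
utf8_bytes = month.encode("utf-8")          # as a UTF-8 locale would store it

print(can_convert(latin9_bytes, "utf-8"))  # False: 0xE4 is not valid UTF-8 here
print(can_convert(utf8_bytes, "utf-8"))    # True
```

The first call fails for the same kind of reason the assertion in sort fired: the bytes are not a valid multibyte sequence in the encoding that LC_CTYPE selects.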
--
Mike FABIAN <mfabian(a)suse.de>
http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.