"Stephen J. Turnbull" <stephen(a)xemacs.org> wrote:
>>>>> "Mike" == Mike FABIAN <mfabian(a)suse.de> writes:
>> We'll have to have Ben explain that. I have no idea what the
>> rationale for that is. Given that most of that function seems
>> to be intended to deal with Windows variance, possibly the
>> whole thing is wrong-headed for Unix.
Mike> I'm sure this must be wrong. The LANG and the LC_* variables
Mike> should not be changed by XEmacs, just used as they are.
Wishful thinking. The POSIX locale concept is fundamentally broken in
a multilingual context, that's all there is to it.
I agree.
Even with the GNU "local locale" extensions. Minimize changes to the
environment and make sure they're internally consistent, yes. But as
long as a multilingual application may find itself calling
non-multilingual programs, the only way it has to deal with the issue
is to change the locale.
I don't understand this. If the system is running in a UTF-8 locale,
XEmacs should assume that the external applications handle
UTF-8. Actually most of them either do handle UTF-8 or ignore locales
altogether. There are not many external applications which *have* to
be called in ja_JP.eucJP (like kterm for example). I.e. switching
from ja_JP.UTF-8 to ja_JP.eucJP in
(set-language-environment "Japanese")
creates more problems than it solves.
Same when the system locale is set to ja_JP.eucJP. That is an
indication that most of the programs in that system are supposed to
work in ja_JP.eucJP, i.e. XEmacs should not change that either.
It would be nice if there were an option to easily call certain
external applications in locales different from the locale set when
XEmacs starts (currently one has to write a few lines of Emacs-Lisp to
do that), but the default should be the locale set in the environment
when XEmacs starts.
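
For comparison, outside XEmacs the same per-call override is very short in the shell. The following is only a minimal sketch and not XEmacs code: `run_in_locale` is a hypothetical helper name, and it assumes that LC_ALL and the relevant LC_* variables are unset, since those take precedence over LANG.

```shell
# Hypothetical helper: run one command with a different LANG, leaving the
# environment of the calling shell untouched.
run_in_locale() {
    ril_locale=$1
    shift
    # env(1) sets LANG only for the child process it starts.
    env LANG="$ril_locale" "$@"
}

# Example: the child sees ja_JP.eucJP, the caller keeps its own LANG.
run_in_locale ja_JP.eucJP sh -c 'echo "child LANG: $LANG"'
```

The point of the design is that the override is scoped to one child process, so the default stays whatever was set in the environment at startup.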
Even the character set: consider the persistent breakage of Dired when
people have LANG set.
Dired, too, has more problems when you change the locale.
>> No, LANG is the only variable touched by that code AFAICT.
>> LC_* should be safe from it.
Mike> OK. But that means that when starting xemacs for example
Mike> like this
Mike> LANG=ja_JP.UTF-8 LC_COLLATE=ja_JP.UTF-8
Mike> LC_PAPER=en_US.UTF-8 xemacs
Mike> you would get a mixture of encodings in the LC_* variables
Mike> after XEmacs changed LANG, like this:
Yes. That is another bug in POSIX, I think.
Yes. I think the only way to avoid this bug is to work consistently
only in UTF-8 locales and avoid using the legacy locales completely.
Then all the problems with the different encodings disappear and
almost everything becomes much easier.
I guess you could hack around it by parsing all LC_* variables found
in the environment.
That's not easily possible. What would you do if the following
variables are set
LANG=ja_JP.UTF-8
LC_PAPER=en_US.UTF-8
and you want to change to using "LANG=ja_JP.eucJP"? Then you cannot
keep the letter paper format because there is no such thing as
en_US.eucJP. Changing LANG from a non-UTF-8 locale to a UTF-8
locale is easier because there is a chance that UTF-8 locales exist
for all the legacy locales used in LC_* (UTF-8 locales don't
have to exist, but at least it is likely that they exist,
whereas something like en_US.eucJP will almost certainly not exist).
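
The renaming step discussed here can be sketched in plain shell. `swap_codeset` is a hypothetical helper; it only rewrites the name, assuming the usual language[_territory][.codeset][@modifier] layout, and says nothing about whether the resulting locale is installed. That would still have to be checked separately, for instance against the output of `locale -a`.

```shell
# Hypothetical helper: replace (or add) the codeset part of a locale name.
# Usage: swap_codeset LOCALE CODESET
swap_codeset() {
    sc_name=$1
    sc_codeset=$2
    # "C" and "POSIX" carry no codeset; pass them through unchanged.
    case $sc_name in
        C|POSIX) printf '%s\n' "$sc_name"; return 0 ;;
    esac
    # Split off an optional @modifier such as @euro.
    sc_modifier=
    case $sc_name in
        *@*) sc_modifier="@${sc_name#*@}"; sc_name=${sc_name%%@*} ;;
    esac
    # Drop any existing codeset, then append the requested one.
    sc_base=${sc_name%%.*}
    printf '%s\n' "$sc_base.$sc_codeset$sc_modifier"
}

swap_codeset ja_JP.eucJP UTF-8   # prints ja_JP.UTF-8
swap_codeset en_US.UTF-8 eucJP   # prints en_US.eucJP, unlikely to be installed
swap_codeset de_DE@euro UTF-8    # prints de_DE.UTF-8@euro
```

The second call is the problematic direction: the name can be built mechanically, but there is no reason to expect such a locale to exist on any system.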
I agree that the POSIX locale design appears to be very broken. If it
is not allowed to have different encodings in LC_*, then why do we
have to specify locales with encodings at all? Certainly this could
have been designed better but we cannot do anything about that.
Using only UTF-8 cleans up the mess a lot and makes it possible to
forget about the encoding most of the time.
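
To see whether a given environment is consistent in this sense, one could list the codesets named by LANG and the LC_* variables. The following is a rough sketch; `codeset_of` and `env_codesets` are hypothetical helper names, the input is expected in the `NAME=value` format printed by `env`, and locale names without an explicit codeset (such as de_DE@euro) are simply skipped rather than resolved to their implied encoding.

```shell
# Hypothetical helper: print the codeset part of a locale name,
# or nothing if the name has no explicit codeset.
codeset_of() {
    co_name=${1%%@*}                 # drop an @modifier, if any
    case $co_name in
        *.*) printf '%s\n' "${co_name#*.}" ;;
    esac
}

# Hypothetical helper: read NAME=value lines on stdin and print the
# distinct codesets used by LANG and LC_*.
env_codesets() {
    grep -E '^(LANG|LC_[A-Z_]+)=' |
    while IFS='=' read -r ec_var ec_val; do
        codeset_of "$ec_val"
    done |
    LC_ALL=C sort -u                 # fixed collation, so the order is stable
}

# More than one line of output means the environment mixes encodings.
env | env_codesets
```

Note that even this small script has to pin `LC_ALL=C` for `sort` to get a predictable order, which is a fair illustration of the problem under discussion.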
Mike> Such mixtures are not allowed. ``The Open Group Base
Mike> Specifications Issue 6'' says about this:
Like I said, POSIX locales are just plain broken. Consider the
sort(1) problem:
Chinen> When LC_TIME=de_DE@euro and LC_CTYPE=en_US.UTF-8, the
Chinen> multibyte character strings of month are not able to be
Chinen> converted into wide character. The reason why those
Chinen> strings are not is they have different encoding character
Chinen> from LC_CTYPE.
Note that Mule handles this kind of problem transparently.
I'm not saying we won't try to deal with this. I understand why the
Open Group found themselves forced into this position, and XEmacs is
too dependent on external processes for various services to take some
high-handed attitude that we are going to wait for the rest of the
world to do multilingual processing. But it's not easy to handle
POSIX rules without punting on "multilingual".
But switching the locale in XEmacs when calling external programs
creates a lot of problems and I cannot really think of an example
where it solves a problem.
I don't think anybody would start a kterm as an external program from
an XEmacs running in a UTF-8 locale. And if one really wants to do
that, what's wrong with
M-x shell RET
LANG=ja_JP.eucJP kterm
?
--
Mike FABIAN <mfabian(a)suse.de>
http://www.suse.de/~mfabian
Lack of sleep is the enemy of good work.