"Stephen J. Turnbull" <stephen(a)xemacs.org> writes:
by default. This means that Mule elisp files will not be displayed
correctly by default in the C locale. However, AFAIK all the Mule
elisp is maintained by Japanese, me, and Ben, so they'll have a
Japanese language environment by default or know what to do.
Or, emacs-lisp mode could simply recognize the Mule elisp files and do
the right thing. Catting them in a shell buffer wouldn't display
Japanese contents, sure, but that's a feature, not a bug.
Hrvoje> For example, in a UTF-8 environment, Mule would treat
Hrvoje> unknown input as UTF-8.
Bu-wha-ha-ha! That means depending on Mule-UCS, which _we don't know
how to maintain_ and Himi doesn't seem to be maintaining.
That was only an example, although a pretty important one, given that
Red Hat 8 defaults to UTF-8 European locales.
Doesn't Ben have a workspace with working Unicode? Or is it too
alpha?
Hrvoje> Needless to say, ISO 2022 autodetection should be turned
Hrvoje> off by default.
If that's needless to say, then you're definitely missing something.
First of all, it is ISO 2022 autodetection that detects all ISO 8859
coding systems.
I'm not sure what you mean by this. Why do "all ISO 8859 coding
systems" need special autodetection? I'm arguing for all such
autodetection to be turned off. Getting rid of ISO 2022 is orthogonal
to that effort, but is not contrary to it.
But that means that ISO-8859-X is also shut off, because the only
no-conversion coding system is iso-8859-1-unix and its aliases.
In an iso-8859-2 locale, input bytes between 160 and 255 should be
considered Latin 2.
Second, if (in an 8-bit locale) you don't need the escape sequences,
then you don't need Mule.
That's not quite true. I don't need escape sequences *by default*.
But for example, I want Latin 1 mail and news messages to be rendered
as Latin 1, and ditto for Japanese, etc. Mail (MIME) is only one
example of a format that carries charset information with the message
stream; there are others.
Hrvoje> c) Make especially sure that in single-byte language
Hrvoje> environments (e.g. the "C" locale and iso-8859-* locales,
Hrvoje> but not e.g. UTF-8) the conversions from external to
Hrvoje> internal format and vice versa are reversible.
This is already the case for binary == ISO-8859-1.
Yup. But that doesn't help non-ISO 8859-1 users.
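A rough sketch of what wish (c) could look like as a test-harness check for one non-Latin-1 locale. This is untested on any build. It assumes XEmacs's `int-char` for building the byte string and the harness's `Assert` macro, and it assumes `decode-coding-string` treats each character of its argument as one external byte.

```elisp
;; Sketch only: every byte value 0..255 should survive a
;; decode/encode round trip through iso-8859-2 unchanged.
(let ((bytes "")
      (i 0))
  ;; Build a 256-character string, one character per byte value.
  (while (< i 256)
    (setq bytes (concat bytes (char-to-string (int-char i))))
    (setq i (1+ i)))
  ;; External -> internal -> external must give back the input.
  (Assert (string= bytes
                   (encode-coding-string
                    (decode-coding-string bytes 'iso-8859-2)
                    'iso-8859-2))))
```

The same loop could be repeated for the other iso-8859-* coding systems. Byte 27 (ESC) is included on purpose, since escape processing is where reversibility is most likely to break.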
I'm not sure whether it can be done trivially for ISO-8859-X, X != 1.
(It _is_ also true for "true" ISO-8859-X files; as you know, the
problem is the escape processing.)
As far as I can tell, the problem is that it's not easy to turn off
the escape processing. A lot of internal stuff in Mule depends on
it. But the a/b/c thing I wrote was a wish-list anyway.
However, [using UTF-8 for internal representation] is completely
irrelevant to your concerns.
You're right. It was just a "would be cool to have" item, but in no
way a requirement.
Hrvoje> It can't work perfectly for everyone, but it could work
Hrvoje> much more reasonably by default for most people.
Sure, but for these purposes I'm Japanese, remember.
True. However, we needn't stick to the POSIX locale like crazy,
either. In Japanese locale, we could enable auto-detection. I have
no problem with "doing whatever the users expect, unless they expect
consistency." Most users don't change their locales every day, after
all.
God be thanked for you and Jamie, I have well-described bugs to work
with.
Yeah. If you need a *very* specific one for the test harness, here's
one from me:
(Assert (> (length
            (decode-coding-string
             (encode-coding-string
              (string (make-char 'japanese-jisx0208 56 108))
              'iso-2022-jp)
             'iso-8859-2))
           1))
In other words, decoding an ISO 2022 string as "iso-8859-2" should not
produce a Japanese char.