"Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> writes:
First, Hrvoje's example is with respect to binary _files_. A
user will have multiple backups, so in principle there need not be a
problem. But if you trust Mule, reading a binary file can, and often
does, result in a non-raw coding system due to autodetection. This
can definitely destroy data; I've seen it happen. If it can happen to
files, it will obviously be possible for volatile streams.
Yes, it's likely to happen.
Second, the point of having a shell-mode is that the behavior of the
shell is volatile; you cannot count on repeating it.
But why not save the data to a separate file if you can't
reproduce it? Or why not use the binary coding-system?
Third, given that all 8-bit ISO-2022 codes have the same space, it
is quite possible for an unsuspecting user to end up in a "strange coding
system". Happens all the time on the (Japanese) Web, because you
never know when an EUC-JP page will link an ISO-8859-1 page. The
former are rarely correctly announced by the server, and the latter is
(unfortunately) allowed not to announce because it is the default.
(Fortunately, web browsers by their nature must do the buffering I
As long as you are doing autodetection, this case can't be
avoided. The only way not to lose any data is to use the binary
coding-system. But I don't think that is what the user wants, as
you have written in the attached message.
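
To make the ambiguity concrete, here is a minimal sketch in Python (not XEmacs code; the sample string is only an illustration). It shows that bytes produced as EUC-JP also decode without error as ISO-8859-1, so no decoder can tell from the bytes alone that the result is wrong, while a byte-preserving reading keeps the data recoverable.

```python
# Illustration only: the same 8-bit bytes are acceptable to two coding systems.
text = "日本語"
raw = text.encode("euc_jp")

# Decoding with the coding system that produced the bytes gives the text back.
as_euc = raw.decode("euc_jp")

# ISO-8859-1 assigns a character to every byte value, so this decode never
# fails; it silently yields different (wrong) characters instead.
as_latin1 = raw.decode("latin-1")

# A byte-preserving ("binary"-like) reading loses nothing: the original
# bytes can always be recovered and decoded again later.
recovered = as_latin1.encode("latin-1")

print(as_euc == text, as_latin1 == text, recovered == raw)
```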
Yoshiki> What we need is automatic detection and explicit
Yoshiki> specification of what coding-system to use.
I don't understand this. Looks like a contradiction to me, but I'm
sure I'm just missing your point.
I guess I used too few words. What I meant is: autodetect
by default, and allow the user to specify a coding-system in case
>> Remember, you can't do the equivalent of `C-x C-k
RET C-u C-x
>> C-f "file" RET "the-right-encoding" RET' on a
Yoshiki> Now we are discussing how to do that sensibly, aren't we?
I thought we were discussing autodetection, not recovery from
I thought we were discussing what is a good default for
shell-mode. It includes whether we should do autodetection
Remember, the better the autodetection is, the more users trust it,
the less care they take, and the more surprised they are when it does
This is OK under the current regime, where Mule is an option. Ben
wants to make it a default. Then it is not OK. We need to think
about how to recover from failures.
I think you are right in general. However, using shell-mode is
not the same as editing files. You can see how it is broken in
shell-mode. I think its purpose is to keep some output,
not to save binary data. It seems your argument is based on
your proposal about autodetection. I like your idea and
would love to see that happen, but that will be XEmacs 21.4. I
want a shell-mode that works in the current framework until MULE
>> I think we should do something like buffer the first
>> do autodetect on it, and `C-x C-m c' should (optionally?) offer
>> a menu including coding systems and a line of sample text from
>> the buffer to show the user what they are getting.
Yoshiki> This will fail if user accidentally output some amount of
Yoshiki> binary data.
Yoshiki> And we need raw data to autodetect coding-system.
Then are you proposing that, until a screenful of data
arrives, users see raw data in shell-mode? Or save output
Yoshiki> 1. Try to autodetect every input/output by resetting
How do you define "every input/output"? Suppose the user does `cat
thisfile.euc thatfile.sjis' in a shell-mode?
I meant every command.
If you are using a localized OS such as Solaris 2.6, some
commands output localized text. Say you are using the Japanese
version of Solaris and type w RET. Its output contains
euc-jp text. Then you execute a command that outputs shift_jis.
If autodetection is not done, that will print garbage.
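
A rough sketch of what "autodetect on every command" could mean, written in Python rather than against the real XEmacs detection code. The function names, the candidate list and its order are all assumptions made for illustration: each command's output is tried against the candidates in turn, and the first one that decodes strictly wins.

```python
def detect_coding(chunk, candidates=("ascii", "euc_jp", "shift_jis")):
    """Return the first candidate that decodes `chunk` without error, else None."""
    for name in candidates:
        try:
            chunk.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


def decode_command_output(chunk, candidates=("ascii", "euc_jp", "shift_jis")):
    """Detect afresh for one command's output; fall back to a byte-preserving read."""
    name = detect_coding(chunk, candidates)
    if name is None:
        # Nothing matched: keep every byte rather than guess.
        return "latin-1", chunk.decode("latin-1")
    return name, chunk.decode(name)


# Two consecutive commands whose output uses different coding systems.
first = "日本語".encode("euc_jp")
second = "日本語".encode("shift_jis")
for output in (first, second):
    print(decode_command_output(output))
```

Detection restarts for each command, so the second command is not decoded with whatever the first one happened to use. A single command that mixes encodings, such as the `cat thisfile.euc thatfile.sjis' case, is not handled by this sketch.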
Yoshiki> 2. If the user explicitly specifies which coding-system to
Yoshiki> with C-x RET c, then use that. i.e. reset to that
Yoshiki> coding-system instead of auto-detection after every
Something more flexible is appropriate, I think. In particular, if
C-x C-m c is used to set the process coding system, then on
incompatible input (i.e., with a euc-jp default the process sends a
high-bit-set/high-bit-clear pair of bytes) the autodetect mechanism
should still be used, but rather than set the coding system it should
signal the user that the default is probably inappropriate (as less
does on encountering an apparently binary file).
That should be handled by lstream or some new general
command, not by shell-mode.
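
Wherever it ends up living, the "warn but do not switch" behavior described above might look roughly like this. It is a Python sketch under stated assumptions, not lstream code: `decode_with_default`, the candidate list and the warning text are invented for illustration.

```python
def decode_with_default(chunk, default, candidates=("euc_jp", "shift_jis")):
    """Decode with the user's chosen coding system; warn instead of switching."""
    try:
        return chunk.decode(default), None
    except UnicodeDecodeError:
        pass

    # The input is incompatible with the default. Run detection only to
    # give the user a hint; the default itself is left untouched.
    guesses = []
    for name in candidates:
        if name == default:
            continue
        try:
            chunk.decode(name)
        except UnicodeDecodeError:
            continue
        guesses.append(name)

    warning = "input does not look like %s; possible alternatives: %s" % (
        default, ", ".join(guesses) or "none found")
    # Still decode with the default, marking undecodable bytes visibly.
    return chunk.decode(default, errors="replace"), warning


text, warning = decode_with_default("日本語".encode("shift_jis"), "euc_jp")
print(warning)
```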
Yoshiki> 3. Implement a way to specify the coding-system used for
Yoshiki> the next command. This can be the already existing command
Yoshiki> set-buffer-process-coding-system, since it will be reset
Yoshiki> after one command execution.
Be careful about backward compatibility here.
OK. I'll create a new command.
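
The interplay of points 1 to 3 can be pictured as a small piece of state, sketched here in Python. The class and method names are hypothetical and do not correspond to any existing XEmacs command: `None` stands for "autodetect", an explicit default replaces autodetection, and a one-shot setting applies to the next command only.

```python
class ProcessCoding:
    """Hypothetical model of the per-command coding-system choice."""

    def __init__(self, default=None):
        self.default = default    # None means "autodetect" (point 1)
        self.next_only = None     # one-shot override (point 3)

    def set_default(self, name):
        """Explicit choice that every later command resets to (point 2)."""
        self.default = name

    def set_for_next_command(self, name):
        """Override used for exactly one command, then forgotten."""
        self.next_only = name

    def coding_for_command(self):
        """Return the coding system for the command about to run."""
        name = self.next_only if self.next_only is not None else self.default
        self.next_only = None     # reset after one command execution
        return name               # None tells the caller to autodetect


pc = ProcessCoding()
pc.set_for_next_command("shift_jis")
print(pc.coding_for_command(), pc.coding_for_command())
```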
Yoshiki> 4. (Optional) Implement a way to change
I don't understand this.
You start shell-mode with autodetection, but now you know
you only get iso-8859-1. Then you can set the coding-system for