"Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> writes:
First, Hrvoje's example is with respect to binary _files_. A paranoid
user will have multiple backups, so in principle there need not be a
problem. But if you trust Mule, reading a binary file can, and often
does, result in a non-raw coding system due to autodetection. This
can definitely destroy data; I've seen it happen. If it can happen to
files, it will obviously be possible for volatile streams.
Yes, it's likely to happen.
Second, the point of having a shell-mode is that the behavior of the
shell is volatile; you cannot count on repeating it.
But why don't you save the data to a separate file if you can't
repeat it? Or why not use the binary coding-system?
Third, given that all 8-bit ISO-2022 codes have the same space, it is
quite possible for an unsuspecting user to end up in a "strange coding
system". Happens all the time on the (Japanese) Web, because you
never know when an EUC-JP page will link an ISO-8859-1 page. The
former are rarely correctly announced by the server, and the latter is
(unfortunately) allowed not to announce because it is the default.
(Fortunately, web browsers by their nature must do the buffering I
suggest.)
As long as you are doing autodetection, this case can't be
avoided. The only way not to lose any data is to use the binary
coding-system. But I don't think that is what users want, as
you wrote in the attached message.
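To make the ambiguity concrete, here is a minimal Python sketch. It is
not XEmacs code: Python codecs stand in for Mule coding systems, and the
sample string is my own choice. It shows that bytes written as EUC-JP
also decode without error as ISO-8859-1, so a detector has nothing to
reject.

```python
# Illustration only: Python codecs stand in for Mule coding systems.
text = "日本語"                         # sample text, chosen for the example
raw = text.encode("euc_jp")            # bytes as an EUC-JP page would send them

as_euc = raw.decode("euc_jp")          # the intended reading
as_latin1 = raw.decode("iso-8859-1")   # also succeeds: every byte maps to a Latin-1 char

print(as_euc == text)                  # the EUC-JP reading round-trips
print(as_latin1 == text)               # the Latin-1 reading is wrong, yet raised no error
```

Only the binary (byte-preserving) reading is guaranteed lossless; any
other choice is a guess.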
Yoshiki> What we need is automatic detection and explicit
Yoshiki> specification of what coding-system to use.
I don't understand this. Looks like a contradiction to me, but I'm
sure I'm just missing your point.
I guess I saved too many words. What I meant is: autodetect
by default, and allow the user to specify a coding-system in
case autodetection fails.
>> Remember, you can't do the equivalent of `C-x C-k RET C-u C-x
>> C-f "file" RET "the-right-encoding" RET' on a terminal stream
>> yet.
Yoshiki> Now we are discussing how to do that sensibly, aren't we?
Yoshiki> :-)
I thought we were discussing autodetection, not recovery from
autodetection failures?
I thought we were discussing what is a good default for
shell-mode. It includes whether we should do autodetection
at all.
Remember, the better the autodetection is, the more users trust it,
the less care they take, and the more surprised they are when it does
(inevitably) fail.
This is OK under the current regime, where Mule is an option. Ben
wants to make it a default. Then it is not OK. We need to think
about how to recover from failures.
I think you are right in general. However, shell-mode is
not the same as editing files. You can see how it is broken in
shell-mode. I think its purpose is to keep some output,
not to save binary data. It seems your argument is based on
your proposal about autodetection. I like your idea and
would love to see that happen, but that will be XEmacs 21.4. I
want a shell-mode that works in the current framework until the
MULE rebuild comes.
>> I think we should do something like buffer the first screenful,
>> do autodetect on it, and `C-x C-m c' should (optionally?) offer
>> a menu including coding systems and a line of sample text from
>> the buffer to show the user what they are getting.
Yoshiki> This will fail if user accidentally output some amount of
Yoshiki> binary data.
Of course.
Yoshiki> And we need raw data to autodetect coding-system.
Of course.
Then are you proposing that until a screenful of data
arrives, users see raw data in shell-mode? Or that the output
is saved separately?
Yoshiki> 1. Try to autodetect every input/output by resetting
Yoshiki> coding-system.
How do you define "every input/output"? Suppose the user does `cat
thisfile.euc thatfile.sjis' in a shell-mode?
I meant every command.
If you are using a localized OS such as Solaris 2.6, some
commands output localized text. Say you are using the Japanese
version of Solaris and type w RET. Its output contains
euc-jp text. Then you execute a command that outputs shift_jis.
If autodetection is not done, that will print garbage.
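A rough Python sketch of what I mean by detecting per command. Strict
trial decoding stands in for the real Mule detector, and the `detect`
function and `CANDIDATES` list are hypothetical names for this example,
not existing APIs.

```python
# Illustration only: strict trial decoding stands in for Mule autodetection.
CANDIDATES = ["euc_jp", "shift_jis"]   # hypothetical priority list

def detect(output: bytes, candidates=CANDIDATES):
    """Return the first candidate codec that decodes `output` without error."""
    for name in candidates:
        try:
            output.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None  # nothing fits; treat the output as binary

# Two commands in one session, each emitting a different encoding.
first = "日本語".encode("euc_jp")
second = "日本語".encode("shift_jis")

for chunk in (first, second):
    codec = detect(chunk)              # detection restarts for every command
    if codec is None:
        print("undetected; keep raw bytes")
    else:
        print(codec, chunk.decode(codec))
```

If the coding system chosen for the first command were kept for the
second, the shift_jis bytes would fail to decode as euc_jp. Note that
first-match detection like this can still guess wrong when the bytes
happen to be valid in more than one candidate.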
Yoshiki> 2. If user specify explicitly what coding-system to use
Yoshiki> with C-x RET c, then use that. i.e. reset to that
Yoshiki> coding-system instead of auto-detection after every
Yoshiki> command.
Something more flexible is appropriate, I think. In particular, if
C-x C-m c is used to set the process coding system, then on
incompatible input (ie, with euc-jp default the process sends a
high-bit-set/high-bit-clear pair of bytes) the autodetect mechanism
should still be used, but rather than set the coding system it should
signal the user that the default is probably inappropriate (as less
does on encountering an apparently binary file).
That should be handled by lstream or some new general
commands, not shell-mode.
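Whichever layer ends up owning it, the behaviour described above could
look roughly like the following Python sketch. The function name
`decode_with_default` is hypothetical, and Python's codec errors stand in
for Mule's detection of incompatible input.

```python
import warnings

def decode_with_default(output: bytes, default: str) -> str:
    """Decode with the user's chosen codec; warn, but do not switch, on bad input."""
    try:
        return output.decode(default)
    except UnicodeDecodeError as err:
        warnings.warn(
            f"output does not look like {default} (byte offset {err.start})"
        )
        # Keep the user's choice; undecodable bytes are shown as U+FFFD.
        return output.decode(default, errors="replace")

# A high-bit-set byte followed by a high-bit-clear byte is not valid euc-jp.
print(decode_with_default(b"\xc6A", "euc_jp"))
```

The point is that the explicit setting is never overridden silently; the
user is only told that it looks wrong.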
Yoshiki> 3. Implement a way to specify coding-system used for only
Yoshiki> next command. This will be already existing command
Yoshiki> set-buffer-process-coding-system since it will be reset
Yoshiki> after one command execution.
Be careful about backward compatibility here.
OK. I'll create a new command.
Yoshiki> 4. (Optional) Implement a way to change coding-system
Yoshiki> permanently.
I don't understand this.
You start shell-mode with autodetection, but now you know
you will only get iso-8859-1. Then you can set the coding-system
for input/output permanently.
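Putting points 1 to 4 together, the policy I have in mind is roughly the
following Python sketch. Everything here is hypothetical: the class
`ProcessDecoder`, its attributes, and the candidate list are names made
up for illustration, not XEmacs functions. The `permanent` slot covers
both the explicit setting of point 2 and the permanent change of point 4.

```python
class ProcessDecoder:
    """Hypothetical sketch of points 1-4; none of these names are XEmacs APIs."""

    def __init__(self, candidates=("euc_jp", "shift_jis", "iso-8859-1")):
        self.candidates = candidates
        self.permanent = None   # points 2 and 4: user's choice, kept across commands
        self.one_shot = None    # point 3: applies to the next command only

    def _detect(self, output: bytes) -> str:
        # Point 1: strict trial decoding stands in for real autodetection.
        for name in self.candidates:
            try:
                output.decode(name)
                return name
            except UnicodeDecodeError:
                pass
        return "iso-8859-1"     # byte-preserving fallback

    def decode_command_output(self, output: bytes):
        codec = self.one_shot or self.permanent or self._detect(output)
        self.one_shot = None    # the one-shot setting is reset after every command
        return codec, output.decode(codec, errors="replace")


decoder = ProcessDecoder()
sjis = "日本語".encode("shift_jis")
print(decoder.decode_command_output(sjis)[0])   # autodetected
decoder.one_shot = "iso-8859-1"
print(decoder.decode_command_output(sjis)[0])   # one-shot override
print(decoder.decode_command_output(sjis)[0])   # back to autodetection
```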
--
Yoshiki Hayashi