Re: Don't set coding-system in shell-mode

Tuesday, 28 December 1999

        "Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> writes:

...
 First, Hrvoje's example is with respect to binary _files_.  A paranoid
 user will have multiple backups, in principle there need not be a
 problem.  But if you trust Mule, reading a binary file can, and often
 does, result in a non-raw coding system due to autodetection.  This
 can definitely destroy data; I've seen it happen.  If it can happen to
 files, it will obviously be possible for volatile streams. 
Yes, it's likely to happen.

...
 Second, the point of having a shell-mode is that the behavior of the
 shell is volatile; you cannot count on repeating it. 
But if you can't repeat it, why not save the data to a
separate file?  Or why not use the binary coding system?

...
 Third, given that all 8-bit ISO-2022 codes have the same space, it is
 quite possible for an unsuspecting user to end up in a "strange coding
 system".  Happens all the time on the (Japanese) Web, because you
 never know when an EUC-JP page will link an ISO-8859-1 page.  The
 former are rarely correctly announced by the server, and the latter is
 (unfortunately) allowed not to announce because it is the default.
 (Fortunately, web browsers by their nature must do the buffering I
 suggest.) 
As long as you are doing autodetection, this case can't be
avoided.  The only way not to lose any data is to use the
binary coding system.  But, as you wrote in the attached
message, I don't think that is what users want.

...
     Yoshiki> What we need is automatic detection and explicit
     Yoshiki> specification of what coding-system to use.

 I don't understand this.  Looks like a contradiction to me, but I'm
 sure I'm just missing your point. 
I guess I was too terse.  What I meant is: autodetect by
default, and allow the user to specify the coding system in
case autodetection fails.

...
     >> Remember, you can't do the equivalent of `C-x C-k RET C-u C-x
     >> C-f "file" RET "the-right-encoding" RET' on a terminal stream
     >> yet.

     Yoshiki> Now we are discussing how to do that sensibly, aren't we? 
     Yoshiki> :-)

 I thought we were discussing autodetection, not recovery from
 autodetection failures? 
I thought we were discussing what a good default for
shell-mode is.  That includes whether we should do
autodetection at all.

...
 Remember, the better the autodetection is, the more users trust it,
 the less care they take, and the more surprised they are when it does
 (inevitably) fail.

 This is OK under the current regime, where Mule is an option.  Ben
 wants to make it a default.  Then it is not OK.  We need to think
 about how to recover from failures. 
I think you are right in general.  However, shell-mode is
not the same as editing files.  In shell-mode you can see
when the output is broken.  I think its purpose is to keep
some output, not to save binary data.  It seems your
argument is based on your proposal about autodetection.  I
like your idea and would love to see it happen, but that
will be XEmacs 21.4.  I want a shell-mode that works in the
current framework until the MULE rebuilding comes.

...
     >> I think we should do something like buffer the first screenful,
     >> do autodetect on it, and `C-x C-m c' should (optionally?) offer
     >> a menu including coding systems and a line of sample text from
     >> the buffer to show the user what they are getting.

     Yoshiki> This will fail if user accidentally output some amount of
     Yoshiki> binary data.

 Of course.

     Yoshiki> And we need raw data to autodetect coding-system.

 Of course. 
Then are you proposing that, until a screenful of data
arrives, users see raw data in shell-mode?  Or that the
output is saved separately?
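
A minimal Python sketch of the "buffer the first screenful, then
autodetect" idea, to make the trade-off concrete.  It assumes a byte
threshold of one 80x24 screen and a naive trial-decoding detector
(not Mule's real detection); `ScreenfulBuffer`, `guess_encoding`,
`SCREENFUL`, and the candidate list are all hypothetical.

```python
import codecs
from typing import Optional

# Assumed threshold: one 80x24 screen of bytes (illustrative only).
SCREENFUL = 80 * 24
# Assumed candidate order; latin-1 is last because it accepts any bytes.
CANDIDATES = ("ascii", "euc_jp", "shift_jis", "latin-1")


def guess_encoding(data: bytes) -> str:
    """Return the first candidate that decodes data, allowing a cut-off tail."""
    for name in CANDIDATES:
        decoder = codecs.getincrementaldecoder(name)()
        try:
            decoder.decode(data, final=False)  # tolerate an incomplete last char
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # not reached while latin-1 is in CANDIDATES


class ScreenfulBuffer:
    """Hold back process output until SCREENFUL bytes arrive, then decode."""

    def __init__(self) -> None:
        self.pending = b""
        self.encoding: Optional[str] = None
        self._decoder = None  # incremental decoder, created on commit

    def _commit(self) -> bytes:
        # Guess from everything buffered so far, then release the buffer.
        self.encoding = guess_encoding(self.pending)
        self._decoder = codecs.getincrementaldecoder(self.encoding)(errors="replace")
        data, self.pending = self.pending, b""
        return data

    def feed(self, chunk: bytes) -> str:
        """Return text ready to insert; '' while still buffering."""
        if self._decoder is None:
            self.pending += chunk
            if len(self.pending) < SCREENFUL:
                return ""  # the user sees nothing yet
            chunk = self._commit()
        return self._decoder.decode(chunk)

    def finish(self) -> str:
        """Flush at process exit, committing a short buffer if necessary."""
        data = self._commit() if self._decoder is None else b""
        return self._decoder.decode(data, final=True)
```

Until `SCREENFUL` bytes arrive (or the process exits), `feed` returns
nothing, which is exactly the "users see raw data, or nothing" problem
raised above.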

...
     Yoshiki> 1. Try to autodetect every input/output by resetting
     Yoshiki> coding-system.

 How do you define "every input/output"?  Suppose the user does `cat
 thisfile.euc thatfile.sjis' in a shell-mode? 
I meant every command.
If you are using a localized OS such as Solaris 2.6, some
commands output localized text.  Say you are using the
Japanese version of Solaris and type w RET.  Its output
contains euc-jp text.  Then you execute a command that
outputs shift_jis.  If autodetection is not redone for that
command, it will print garbage.
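
A minimal Python sketch of per-command detection for the Solaris
example: each command's output is detected on its own, so EUC-JP
output from `w` followed by Shift_JIS output from another command
both decode, while reusing the first guess garbles the second.  The
trial-decoding `detect` and the function names are hypothetical, not
Mule's detector; real detection must also cope with byte strings that
decode validly under several candidates.

```python
from typing import List

# Naive trial-decoding detector; the candidate order is an assumption.
CANDIDATES = ("ascii", "euc_jp", "shift_jis")


def detect(output: bytes) -> str:
    """Return the first candidate that decodes one command's output."""
    for name in CANDIDATES:
        try:
            output.decode(name)
            return name
        except UnicodeDecodeError:
            pass
    return "latin-1"  # last resort: maps every byte to some character


def decode_each_command(outputs: List[bytes]) -> List[str]:
    """Re-run detection for every command instead of reusing one guess."""
    return [out.decode(detect(out)) for out in outputs]


def decode_with_sticky_guess(outputs: List[bytes]) -> List[str]:
    """Detect once on the first command and reuse that coding system."""
    coding = detect(outputs[0])
    return [out.decode(coding, errors="replace") for out in outputs]
```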

...
     Yoshiki> 2. If user specify explicitly what coding-system to use
     Yoshiki> with C-x RET c, then use that.  i.e. reset to that
     Yoshiki> coding-system instead of auto-detection after every
     Yoshiki> command.

 Something more flexible is appropriate, I think.  In particular, if
 C-x C-m c is used to set the process coding system, then on
 incompatible input (ie, with euc-jp default the process sends a
 high-bit-set/high-bit-clear pair of bytes) the autodetect mechanism
 should still be used, but rather than set the coding system it should
 signal the user that the default is probably inappropriate (as less
 does on encountering an apparently binary file). 
That should be handled by lstream or some new general
commands, not shell-mode.
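
A minimal Python sketch of the check Stephen describes: with EUC-JP as
the explicit process coding system, a multibyte sequence whose
trailing byte has the high bit clear cannot be EUC-JP, so the user is
warned instead of the coding system being switched.  Where such a
check should live (lstream or a general command) is the open question
above; the function names are hypothetical, and the check is
deliberately simplified to the high-bit rule rather than full EUC-JP
byte ranges.

```python
from typing import Optional


def first_euc_jp_violation(data: bytes) -> Optional[int]:
    """Return the offset of the first multibyte sequence whose trailing
    byte has the high bit clear, or None if none is found.

    An incomplete sequence at the end is not reported, since the rest
    may arrive in the next chunk of process output.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            i += 1  # ASCII byte
            continue
        width = 3 if lead == 0x8F else 2  # SS3 (0x8F) starts a 3-byte sequence
        for j in range(i + 1, min(i + width, len(data))):
            if data[j] < 0x80:
                return i
        i += width
    return None


def coding_warning(data: bytes, coding: str) -> Optional[str]:
    """Warn (as less does on binary input) instead of switching coding systems."""
    if coding != "euc_jp":
        return None  # this sketch only knows the EUC-JP rule
    offset = first_euc_jp_violation(data)
    if offset is None:
        return None
    return (f"byte {offset} does not look like {coding}; "
            "the process coding system is probably inappropriate")
```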

...
     Yoshiki> 3. Implement a way to specify coding-system used for only
     Yoshiki> next command.  This will be already existing command
     Yoshiki> set-buffer-process-coding-system since it will be reset
     Yoshiki> after one command execution.

 Be careful about backward compatibility here. 
OK.  I'll create a new command.

...
     Yoshiki> 4. (Optional) Implement a way to change coding-system
     Yoshiki> permanently.

 I don't understand this. 
Say you start shell-mode with autodetection, and then you
learn that you only ever get iso-8859-1.  Then you can set
the coding system for input/output permanently.
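
The four items amount to a precedence rule, sketched minimally in
Python below.  It assumes the shell can tell the policy when each
command's output begins; `ShellCodingPolicy` and its methods are
hypothetical names, not existing XEmacs commands, and items 2 and 4
collapse into one permanent setting in this sketch.

```python
from typing import Callable, Optional


class ShellCodingPolicy:
    """Choose a coding system for each command's output.

    Precedence: one-shot override (item 3), then a permanent setting
    (items 2 and 4), then autodetection (item 1).
    """

    def __init__(self, detect: Callable[[bytes], str]) -> None:
        self.detect = detect
        self.permanent: Optional[str] = None
        self.next_only: Optional[str] = None

    def set_permanently(self, coding: Optional[str]) -> None:
        """Fix the coding system; None goes back to autodetection."""
        self.permanent = coding

    def set_for_next_command(self, coding: str) -> None:
        """Override the coding system for the next command only."""
        self.next_only = coding

    def coding_for(self, output: bytes) -> str:
        """Called once per command; consumes any one-shot override."""
        if self.next_only is not None:
            coding, self.next_only = self.next_only, None
            return coding
        if self.permanent is not None:
            return self.permanent
        return self.detect(output)


def ascii_or_latin1(output: bytes) -> str:
    """Trivial stand-in detector used only for illustration."""
    return "ascii" if output.isascii() else "latin-1"
```

For example, after `set_for_next_command("shift_jis")` only the next
call to `coding_for` returns `"shift_jis"`; later commands fall back
to the permanent setting or to autodetection.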

-- 
Yoshiki Hayashi
