>>>> "Yoshiki" == Yoshiki Hayashi
<t90553@m.ecc.u-tokyo.ac.jp> writes:
Yoshiki> This way, most of the code which does (insert (1+ ?a)) or
Yoshiki> something continues working.
But only in the range of Latin-1. It _currently_ works in Mule
because the internal representation is organized by character set.
But it will stop working, in particular for all the other Latin-*
character sets and for kanji, if we go to a Unicode-based internal
encoding.
I guess most scripts will be OK, since they get their own ranges, and
those in general are contiguous to allow this kind of optimization.
But any scripts using lots of unified characters (accented characters
in Latin-* and kanji) are hosed.
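To see the breakage concretely, assume for the sake of a sketch that
characters become Unicode code points:

    ;; ASCII is contiguous in Unicode, so this keeps working:
    (insert (1+ ?a))      ; ?a is U+0061, inserts ?b
    ;; but Latin-1 ends at U+00FF (y-diaeresis); one past it falls
    ;; into Latin Extended-A, a different block entirely:
    (insert (1+ ?\xff))   ; inserts U+0100, A-macron

Code that steps through the accented letters of, say, Latin-2 by
incrementing code points fares even worse, since those letters are
scattered across several Unicode blocks.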
Yoshiki> Now internal representation is changed a little bit, so
Yoshiki> disabling > 256 characters will warn those who are
Yoshiki> dealing with internal representation directly, which is
Yoshiki> bad.
Dealing with the internal representation directly _is_ bad, period.
People who do that deserve what they get. Unfortunately, innocent
users sometimes get handed such code, and they should be protected.
If we are going to continue to allow this, we should figure out what
the API is supposed to be.
Yoshiki> Still, you can do
Yoshiki> (let ((i 1442))
Yoshiki> (while (< i 2000)
Yoshiki> (insert (int-to-char i))
Yoshiki> (setq i (1+ i))))
This is exactly the kind of code that we should disallow. What does
each of those 558 characters mean when inserted in the buffer? Surely
most of them are not characters in any known character set. Should
they be inserted? Ignored? Do you tell the user what you've done?
In which functions do you want to put the checking code? What are you
supposed to do with the integers that are not characters: signal an
error, display a warning, set a flag for user code to check?
One way to handle this kind of code is to fix the arithmetic
functions to check the type of their operands, and have them (e.g.)
return an object cast to the type of the first operand. Maybe they
should also make the (probably slow) check that the result is really
a valid object of that type. And then they should do something
sensible if it's invalid.
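To make that concrete, here is a minimal sketch of such a
type-preserving increment. It is not an existing API: `char-add' and
the validity predicate `char-valid-int-p' are invented names; only
`char-to-int' and `int-to-char' are real functions.

    (defun char-add (ch n)
      "Return the character N code points after CH.
    Signal an error if the result is not a defined character."
      (let ((result (+ (char-to-int ch) n)))  ; arithmetic on the integer
        (if (char-valid-int-p result)         ; hypothetical validity check
            (int-to-char result)              ; cast back to character type
          (error "No character at code point %d" result))))

Whether the invalid branch should signal, warn, or set a flag is
exactly the open question above.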
The problem is that none of the questions above have been answered,
including, "what's `sensible' behavior for undefined characters?" If
you want to maintain backwards compatibility so that broken code will
not signal errors on ambiguous operations, OK, I can understand that
doing it right is hard and "mendokusai" (a bother). But let's not fix
things so
that broken code accidentally does the right thing more often, and
bind ourselves to more backwards-compatibility kludges in the future.
Yoshiki> It's cleaner to make new function, which does make-char
Yoshiki> according to the charset of language-info-alist so that
Yoshiki> people who use that often can bind it to C-q or some
Yoshiki> other keys.
Possibly you should make a new function for C-q; it should have an
optional argument which can either be a language environment or a
charset. You could redefine make-char that way, although I don't like
that much. Probably charsets should be preferred in case there's a
name conflict.
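Something along these lines, perhaps. This is only a sketch:
`insert-charset-char' is an invented name, the way it digs a charset
out of the language environment info is an assumption, and it assumes
XEmacs's `find-charset' and `get-language-info'; only `make-char' and
`language-info-alist' come from the discussion above.

    (defun insert-charset-char (code &optional charset-or-lang)
      "Insert the character at CODE in CHARSET-OR-LANG.
    CHARSET-OR-LANG is a charset or a language environment name;
    on a name conflict the charset wins."
      (interactive "nCode: \nSCharset or language environment: ")
      (let ((charset
             (cond ((null charset-or-lang) 'ascii) ; assumed default
                   ;; prefer charsets on a name conflict
                   ((find-charset charset-or-lang) charset-or-lang)
                   ;; else consult the language environment's info
                   (t (let ((cs (get-language-info
                                 (symbol-name charset-or-lang) 'charset)))
                        (if (consp cs) (car cs) cs))))))
        ;; one-octet charsets only, for brevity
        (insert (make-char charset code))))

and then bind it to C-q (or wherever) in place of quoted-insert.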
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
__________________________________________________________________________
__________________________________________________________________________
What are those two straight lines for? "Free software rules."