Re: Char-related crashes (hopefully) fixed

Friday, 19 November 1999

        OK, in summation:

1. C-q is a user-level function and should do whatever makes the most sense.
2. int-char is a low-level primitive and should never depend on high-level
settings like language environment.
3. Everything you can do with int-char can and should be done with make-char
-- representation-independent, much less likelihood of bugs, etc.  Therefore
int-char should be removed.
4. Note that CLTL2 also removes int-char.
5. Your statement

> In one-byte buffers (either Olivier's 1/2/4 extension or `xemacs -font
> *-iso8859-2') it implicitly will have dependence whatever you say.

is confusing internal and external representations.
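
To make the internal/external distinction concrete, here is a minimal sketch in Python rather than XEmacs Lisp. It assumes nothing about XEmacs internals: the `make_char` function below is a hypothetical stand-in that takes an explicit character set, not the real `make-char` primitive. It shows only that one external octet names different characters depending on which coded character set is applied to it.

```python
# One external octet, as it might sit in a file or a one-byte buffer.
raw = bytes([0xB1])

# The character it denotes depends on the coded character set applied to it.
as_latin1 = raw.decode("iso8859_1")  # U+00B1 PLUS-MINUS SIGN
as_latin2 = raw.decode("iso8859_2")  # U+0105 LATIN SMALL LETTER A WITH OGONEK


def make_char(charset: str, octet: int) -> str:
    """Hypothetical analogue of a make-char style constructor.

    The caller names the character set explicitly, so the result never
    depends on an ambient setting such as a language environment.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return bytes([octet]).decode(charset)


if __name__ == "__main__":
    print(as_latin1 == as_latin2)               # the two readings differ
    print(make_char("iso8859_2", 0xB1) == as_latin2)
```

Because the character set is an argument, the mapping from integer to character is fixed at the call site, whatever the internal representation happens to be.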

ben

"Stephen J. Turnbull" wrote:

> Can somebody give a bunch of examples where using integers as
> characters is useful?  For that matter, where they are actually used?
> Ben said "backward compatibility," but I haven't seen this used, and I
> don't really know how to grep for it.  I have grepped for int-char,
> int-to-char, char-int, and char-to-int and they're pretty rare in the
> core and package code (2/3 of it) that I have.
>
> The only one that I ever use is the C-q hack for inserting characters
> by code value at the keyboard, and that could arguably be (and in
> Japanese invariably is) delegated to an input method which would know
> about language environment (and return a true character).
>
> For iterating over a character set in "natural" order, only ASCII
> satisfies the requirement of having one, and even that's shaky.  AFAIK
> the Swedes and the Norwegians, or is it the Danes, disagree on
> ordering the _letters_ in the ISO-8859-1 character set.  This really
> should be table-driven, and will have to be for everything except
> ASCII and ISO-8859-1 if we go to a Unicode internal representation.
>
> We already have primitives for efficient case conversion and the like.
>
> The only example I can think of offhand where you would really really
> want the facility is to iterate over a code space where you don't know
> which points are legal characters.  Eg, to print out tables of fonts.
> Pretty specialized.  And this can be done through make-char, anyway.
>
> According to CLtL1, the main portable use for char-int is for hashing.
> But that doesn't square with the kind of usage we've been talking
> about (in loops and the like).
>
> What else am I missing?
>
> Ben's desiderata have some problems.
>
> >>>>> "Ben" == Ben Wing <ben(a)666.com> writes:
>
>     Ben> Either int-char should be the mirror opposite of char-int
>     Ben> (i.e. accept all legal char integers), or it should be
>     Ben> removed entirely.
>
> OK.  I agree with this.
>
>     Ben> int-char should *never* have any dependence on the language
>     Ben> environment.
>
> In one-byte buffers (either Olivier's 1/2/4 extension or `xemacs -font
> *-iso8859-2') it implicitly will have dependence whatever you say.  Even
> without Mule, people can always use external encoders to change
> raw ISO-8859-2 to ISO-2022 (not that anybody sane ever would, OK,
> Hrvoje?).  Then the two files will be interpreted differently in a
> Latin-1 locale Mule; the ISO-8859-2 file will be recognized as
> ISO-8859-1, and the ISO-2022 file will be internally interpreted as
> ISO-8859-2.
>
> The point is that people normally assume that int-char should accept
> their "natural" integer to character map.  For Americans, that's
> ASCII, for Germans, that's ISO-8859-1, for Croatians, that's
> ISO-8859-2.  And it works "correctly" in a no-mule XEmacs with `-font
> *-iso8859-2'!  Japanese usually use ku-ten or JIS, and there's a
> "natural" map from byte-sized integer pairs to shorts, but it's full
> of holes.  So language environments don't agree on what a legal char
> integer is, and where they do (eg, ISO-8859-1 and ISO-8859-2), they
> don't agree on the map.  To satisfy your dictum (with which I agree,
> but I take to mean we should get rid of these functions) we can take
> the intersection where they agree
>
> ==> legal char integers == ASCII
>
> which is what I prefer, or pick something arbitrary and efficient
>
> ==> char-int returns the internal representation
>
> which I really hate, or something else.  Suggestions?
>
>     Ben> I don't think C-q should either.  If Hrvoje wants to insert
>     Ben> Latin-2 characters by number, then make C-u C-q work so that
>     Ben> it also prompts for a character set, with a default chosen
>     Ben> from the language environment.
>
> And restrict this to ASCII?  Or assume Latin-1 in GR if there is no
> prefix argument?
>
> This is a useful feature.  C-q currently inserts Latin-2 characters
> for Hrvoje in no-mule XEmacs (stretching the point only a little); I
> think it should continue to do so in Mule.  This really is an input
> method issue, not a keyboard issue.  In XEmacs, inserting an integer
> into a buffer has no meaning.  Users insert characters.  So this is a
> completely different issue from the programming API, and should not be
> considered analogous.
>
> Maybe we could have C-q insert according to the Unicode standard, and
> treat C-u C-q as part of the input method.  But I think most users
> would prefer to have C-q insert according to their locale-standard
> tables, and select Unicode explicitly using the C-u C-q idiom.  In
> fact (again this points to the input method idea), Japanese users
> would probably like to have the alternatives of using kuten (pairs
> from 1--94 x 1--94) or JIS (pairs from 0x21--0x7E x 0x21--0x7E) as
> options since both indexing systems are common in tables.
>
> --
> University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
> Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
> __________________________________________________________________________
> __________________________________________________________________________
> What are those two straight lines for?  "Free software rules."
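
The two Japanese indexing schemes mentioned at the end of the quoted message, kuten (1--94 x 1--94) and JIS (0x21--0x7E x 0x21--0x7E), differ only by a fixed offset of 0x20 per component. A minimal sketch in Python, with function names that are purely illustrative and not part of any XEmacs API:

```python
KUTEN_MIN, KUTEN_MAX = 1, 94
JIS_OFFSET = 0x20  # kuten 1..94 maps onto JIS bytes 0x21..0x7E


def kuten_to_jis(ku: int, ten: int) -> tuple[int, int]:
    """Convert a (ku, ten) pair to the corresponding pair of JIS bytes."""
    for value in (ku, ten):
        if not KUTEN_MIN <= value <= KUTEN_MAX:
            raise ValueError("ku and ten must each be in the range 1..94")
    return ku + JIS_OFFSET, ten + JIS_OFFSET


def jis_to_kuten(first: int, second: int) -> tuple[int, int]:
    """Convert a pair of JIS bytes back to a (ku, ten) pair."""
    for value in (first, second):
        if not 0x21 <= value <= 0x7E:
            raise ValueError("JIS bytes must each be in the range 0x21..0x7E")
    return first - JIS_OFFSET, second - JIS_OFFSET


if __name__ == "__main__":
    print(kuten_to_jis(16, 1))
    print(jis_to_kuten(0x30, 0x21))
```

An input method that accepts either notation would only need to know which of the two the user is typing; the conversion itself carries no locale dependence.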

--
ben

--
In order to save my hands, I am cutting back on my responses, especially to
XEmacs-related mail.  You _will_ get a response, but please be patient.  If
you need an immediate response and it's not apparent in your message, please
say so.  Thanks for your understanding.
