Re: proposed Eistring interface

Wednesday, 19 April 2000

I absolutely don't understand what you're complaining about.  The point is to
allow things like

Lisp_Object filename;   /* obtained from somewhere */

Eistring (wild);
eicpy_str (wild, filename);
eicat_c (wild, "\\*");

without making people create another Eistring just for this literal.  Obviously
it shouldn't be used for data that isn't 100% known to be ASCII.  This will be
well-documented in the API.  But completely stupid people who can't read
shouldn't be working on the Mule code, anyway.

I'm surprised you're so worked up over a silly convenience function!

"Stephen J. Turnbull" wrote:

...
 >>>>> "Ben" == Ben Wing <ben(a)666.com> writes:

     Ben> "Stephen J. Turnbull" wrote:

     >> What are these "C strings where non-ASCII characters are
     >> illegal"?  I thought a C string (as defined by <string.h>) was
     >> a char[], with the length determined by the position of the
     >> first ASCII NULL.  Do you mean that, eg, trying to feed it an
     >> ISO-8859-1 string will cause an abort with error-checking?  How
     >> about control characters (C1, obviously, but C0, too)?  These
     >> are all legal characters in both the current leading-byte
     >> representation and any Unicode-based representation.

     Ben> you are confused.

 Oh.

 <+J,$G$7$g$&!#

 Now please explain to me what good your check is going to do with
 the absolutely no ASCII nowhere 100% Japanese string (which means,
 appropriately, "on the contrary, aren't _you_ confused?") on the line
 above.
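
The objection can be made concrete with a small sketch. It assumes the proposed check does nothing more than test that every byte is 7-bit; the thread does not say how the check would be implemented, and `all_bytes_7bit` is a hypothetical name, not part of the proposed Eistring API. The line quoted above is presumably text in a 7-bit encoding such as ISO-2022-JP, so every byte passes the test even though none of it is meant as ASCII.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an XEmacs function): returns true if every
   byte of the NUL-terminated string s lies in the range 0x00..0x7F. */
static bool all_bytes_7bit(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p > 0x7F)
            return false;   /* 8-bit byte, e.g. an ISO-8859-1 letter */
    }
    return true;            /* only 7-bit bytes were seen */
}
```

Under this assumption the check accepts the 7-bit Japanese payload and rejects an ISO-8859-1 literal such as "caf\xE9", which is the asymmetry being complained about.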

 The point is that there are _no_ characters representable in one byte
 that can harm _Mule_ if for practical purposes

 #define eicpy_c(ei, c_string) eicpy_ext(ei, c_string, Qbinary)

 They can "only" cause mojibake and even external data corruption if
 written back out under an inappropriate coding system.  OTOH, if you
 want to prevent mojibake and external data corruption, you need to
 have a full (and 100% accurate) character set autodetector, or at
 least a 100% reliable undo on all such operations.
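
A minimal sketch of why one-byte data cannot produce an invalid internal string, assuming a binary-style conversion that maps each byte to the character with the same code (0 to 255). That mapping is an assumption made for illustration; `binary_decode` and `binary_encode` are hypothetical names and this is not the XEmacs implementation of Qbinary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decode step: each input byte becomes the character
   with the same numeric code, so every byte value is accepted. */
static void binary_decode(const unsigned char *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Hypothetical encode step: writes each character back as one byte.
   Returns 0 on success, -1 if a character does not fit in one byte. */
static int binary_encode(const uint32_t *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i] > 0xFF)
            return -1;
        out[i] = (unsigned char)in[i];
    }
    return 0;
}
```

Under this mapping, decoding never fails and a decode followed by an encode returns the original bytes. What it cannot tell you is which character set the bytes were written in, which is where the mojibake comes from.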

 There may be a way to make handling external-format text Mule-safe,
 but this isn't it.  External-format text is by definition just
 infested with various ISO-sponsored diseases (and Microsoft toxins)
 and needs to be hedged round with real error-checking, with sharp
 teeth and thick hair on its chest.  Not half-ass filters that won't
 detect people doing many common kinds of Stupid Things[tm], while
 preventing arguably legitimate and convenient usages like ISO-8859-1
 literals.  (Remember the actual implementation of Qbinary!)

 If I had a veto, I'd use it on anything that allowed implicit typing
 of any external format data.

     Ben> This is defined only to make it easier to use literal ASCII
     Ben> strings, which is going to be extremely common.  If the
     Ben> string has any non-ASCII characters in it, they need to
     Ben> explicitly specify the encoding -- hence the restriction.

 This is just plain evil.  People are going to write code that is
 allegedly Mule-safe because it used your API, and have it randomly
 abort in non-ASCII locales when somebody feeds it non-ASCII data.

 And need I mention that such code will be 100% un-gettext-izable?
 Apparently.  That issue needs to be carefully thought through before
 you allow literals to be handled by these APIs.

 A better way of dealing with the issues with literals would simply be
 to specify that all XEmacs source files must be encoded in UCS-4 with
 the UTF-8 representation.  (That means that they must respect the
 reserved areas in UCS-4 if they want to go beyond the UTF-32 space.)
 Then string literals in the source code can be treated specially (but
 generic C strings[1] should not be, since you don't know where they
 come from), and will not need a coding system associated with them.
 (Obviously, we have to get a usable Unicode coding-system implemented
 first so that this is transparent to developers and users.)  For the
 moment, we can just make that a restriction to ASCII, except in
 mule-packages.  This will be easily extensible to UTF-8 with zero fuss
 as soon as we have the UTF-8 coding system.  And this would be
 transparent to gettext, should we decide that is the way to go.
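
The "zero fuss" claim rests on ASCII being a subset of UTF-8: an ASCII-only literal is already a valid UTF-8 literal, byte for byte. A rough sketch of a well-formedness check shows this. `is_valid_utf8` is a hypothetical helper, and it assumes the modern four-byte limit (code points up to U+10FFFF) rather than the longer UCS-4 forms mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if the n bytes at s are well-formed
   UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        uint32_t cp, min;
        size_t extra;

        if (b < 0x80) {                 /* plain ASCII byte */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F; extra = 1; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F; extra = 2; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            cp = b & 0x07; extra = 3; min = 0x10000;
        } else {
            return false;               /* stray continuation or bad lead byte */
        }

        if (n - i - 1 < extra)
            return false;               /* sequence truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;           /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;               /* overlong, out of range, or surrogate */

        i += extra + 1;
    }
    return true;
}
```

An ASCII literal such as "\\*" passes unchanged, a UTF-8 encoded "café" passes, and the same word as raw ISO-8859-1 bytes is rejected. So the restriction to ASCII could later be relaxed to UTF-8 without touching existing literals.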

 Anyway the pro-ASCII bias is just not PC.

 Footnotes:
 [1]  Hm.  What about .so modules?

 --
 University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
 _________________  _________________  _________________  _________________
 What are those straight lines for?  "XEmacs rules." 
--
Ben

In order to save my hands, I am cutting back on my mail.  I also write
as succinctly as possible -- please don't be offended.  If you send me
mail, you _will_ get a response, but please be patient, especially for
XEmacs-related mail.  If you need an immediate response and it is not
apparent in your message, please say so.  Thanks for your understanding.

See also http://www.666.com/ben/typing.html.
