Re: proposed Eistring interface

Friday, 21 April 2000

"Stephen J. Turnbull" wrote:

...
 >>>>> "Ben" == Ben Wing <ben(a)666.com> writes:

     Ben> that way i'll have a reasonable chance of answering your ?'s
     Ben> before my hands fall off.

 Sorry, you've been right all along, and I'm wrong (surprise,
 surprise).  I'm still glad I asked (I wouldn't have figured it out on
 my own for a long time), although I'm sorry about wasting your time
 with my misunderstanding.

 My excuse is that I've been living in buffer.h and mule-*.c for the
 last little while, which has given me a warped view of what "working
 with internally-formatted data" is about.

 I still have some comments and questions; hopefully they're more
 sensible now.

 (0) I suggest changing the description of the interface from "for
     working with internally-formatted data" to "for working with
     internally-formatted data in external contexts" or something like
     that, and emphasizing feature (e) "it provides easy operations to
     convert to/from externally-formatted data".  This is redundant if
     you already know what the interface is for, but more specific for
     the completely-new-to-internals person.  And it probably would
     have kept me from going off half-cocked. :-) 
yes, it tends to be in ext. contexts but doesn't have to.  nothing in xemacs
currently provides a good lightweight string mutator object.  the alternatives
are using a buffer or doing lots of concat()s and Fsubstrings and crap, and even
then, many obvious string ops are missing (and should be impl. god damn it!).

...

 (1) I still think it is harmless, at least for literals, and possibly
     useful to allow arbitrary bytes in ei{cat,cpy}_c(). 
i still disagree because it can corrupt the innards, unless you expect automatic
binary conversion?  then someone who feeds in JIS-encoded data this way will not
get Japanese, but raw data, and perhaps should've used eicpy_ext().

...

 (2) Is the usage     (eiref (filename, 0) == '.')    from
     mswindows_get_files() really correct? 
Yes, it's a char. ref not a byte ref, so it works in UTF8 (or in EBCDIC-UTF 8!)
provided our Emchars work like ASCII/Latin1, which, bar an EBCDICkian revolt,
they always will.

...
  I've convinced myself that
     _this_ case works for all internal representations so far proposed,
     because they are all extensions of ASCII,  and the comparison works
     after the char on the rhs is promoted to Emchar.  But this really
     does need to be restricted to ASCII.  C0 controls are OK.  But
     Latin-1 would break for UTF-8 default internal, eg.

     Possibly an extension of the current APIs to include say

            eirefcmp_* (eistring, eiindex, character)

     is called for?  But I can't think of an example offhand where the
     general case would be useful.
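
To make the char-ref vs. byte-ref point in (2) concrete, here is a minimal
sketch, assuming a UTF-8 internal representation.  sketch_char and
sketch_char_ref() are hypothetical names invented for this illustration; they
are not the Eistring API and not how eiref() is implemented.  The sketch only
shows why comparing a character reference against an ASCII literal such as '.'
is safe, while a byte-level comparison against a Latin-1 value is not.

```c
#include <stddef.h>

/* Hypothetical stand-in for a character type wide enough for any code
   point (an assumption for this sketch, not the real Emchar). */
typedef int sketch_char;

/* Return the code point at CHARACTER index idx of the NUL-terminated
   UTF-8 string s, or -1 if idx is past the end or the string is
   truncated in the middle of a sequence.  Assumes otherwise
   well-formed UTF-8. */
static sketch_char
sketch_char_ref (const unsigned char *s, size_t idx)
{
  while (*s)
    {
      size_t len, i;
      sketch_char c;

      /* The lead byte tells us how many bytes this character uses. */
      if (*s < 0x80)      { len = 1; c = *s; }
      else if (*s < 0xE0) { len = 2; c = *s & 0x1F; }
      else if (*s < 0xF0) { len = 3; c = *s & 0x0F; }
      else                { len = 4; c = *s & 0x07; }

      /* Fold in the continuation bytes, six payload bits each. */
      for (i = 1; i < len; i++)
        {
          if (s[i] == 0)
            return -1;
          c = (c << 6) | (s[i] & 0x3F);
        }

      if (idx == 0)
        return c;
      idx--;
      s += len;
    }
  return -1;
}
```

With this sketch, sketch_char_ref (".emacs", 0) == '.' holds, because ASCII
characters are a single byte whose value equals the code point.  For the bytes
'c' 'a' 'f' 0xC3 0xA9, character index 3 decodes to 0xE9, but the byte at
offset 3 is 0xC3, so a byte-wise test against the Latin-1 value fails.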

 (3) We may want to be a little bit careful with the notion of the
     default internal representation.  I can see that a default
     internal representation of UCS-2 (UTF-16, I presume is what you
     really mean?) would be attractive.  So what happens if you have
     data that is not representable in the default internal
     representation?  Do we just tell those users to get lost?

     It would be kind of weird if the default internal representation
     that Eistrings dealt with was UCS-2 but UTF-8 representation was
     available in buffers, which you don't rule out. 
by its nature, the default int. rep. must be able to represent all chars.  that
would rule out utf16 if we have more than 1,000,000 and some chars.  but it
doesn't rule out ucs4, or some utf16 extension that could encode gigs o' chars,
etc.
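
For reference, the "1,000,000 and some" ceiling follows from how UTF-16
surrogate pairs work: 63,488 non-surrogate values in the 16-bit range plus
1,024 x 1,024 = 1,048,576 pairs gives 1,112,064 encodable characters, topping
out at U+10FFFF.  A minimal sketch of the encoding step, with a hypothetical
function name that is not part of XEmacs:

```c
#include <stdint.h>

/* Encode code point cp as UTF-16 into out[].  Returns the number of
   16-bit units written (1 or 2), or 0 if cp cannot be represented.
   sketch_utf16_encode is an illustrative name, not an existing API. */
static int
sketch_utf16_encode (uint32_t cp, uint16_t out[2])
{
  /* The surrogate range is reserved and is not a character. */
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;

  /* Anything else below 0x10000 fits in a single unit. */
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;
      return 1;
    }

  /* A surrogate pair carries only 20 payload bits, so this is the limit. */
  if (cp > 0x10FFFF)
    return 0;

  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 | (cp >> 10));    /* high surrogate */
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));  /* low surrogate */
  return 2;
}
```

Anything above 0x10FFFF makes the function return 0, which is the case where
a fixed UTF-16 default could no longer "represent all chars", whereas a
32-bit representation would still have room.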

...

 --
 University of Tsukuba                Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
 Institute of Policy and Planning Sciences       Tel/fax: +81 (298) 53-5091
 _________________  _________________  _________________  _________________
 What are those straight lines for?  "XEmacs rules." 
--
Ben

In order to save my hands, I am cutting back on my mail.  I also write
as succinctly as possible -- please don't be offended.  If you send me
mail, you _will_ get a response, but please be patient, especially for
XEmacs-related mail.  If you need an immediate response and it is not
apparent in your message, please say so.  Thanks for your understanding.

See also http://www.666.com/ben/typing.html.
