"Stephen J. Turnbull" wrote:
>>>>> "Ben" == Ben Wing <ben(a)666.com> writes:
Ben> that way i'll have a reasonable chance of answering your ?'s
Ben> before my hands fall off.
Sorry, you've been right all along, and I'm wrong (surprise,
surprise). I'm still glad I asked (I wouldn't have figured it out on
my own for a long time), although I'm sorry about wasting your time
with my misunderstanding.
My excuse is that I've been living in buffer.h and mule-*.c for the
last little while, which has given me a warped view of what "working
with internally-formatted data" is about.
I still have some comments and questions; hopefully they're more
sensible now.
(0) I suggest changing the description of the interface from "for
working with internally-formatted data" to "for working with
internally-formatted data in external contexts" or something like
that, and emphasizing feature (e) "it provides easy operations to
convert to/from externally-formatted data". This is redundant if
you already know what the interface is for, but more specific for
the completely-new-to-internals person. And it probably would
have kept me from going off half-cocked. :-)
yes, it tends to be in ext. contexts but doesn't have to. nothing in xemacs
currently provides a good lightweight string mutator object. the alternatives
are using a buffer or doing lots of concat()s and Fsubstrings and crap, and even
then, many obvious string ops are missing (and should be impl. god damn it!).
(1) I still think it is harmless, at least for literals, and possibly
useful to allow arbitrary bytes in ei{cat,cpy}_c().
i still disagree, because it can corrupt the innards. or do you expect automatic
binary conversion? then someone who feeds in JIS-encoded data this way will get
raw data, not Japanese, and perhaps should've used eicpy_ext().
(2) Is the usage (eiref (filename, 0) == '.') from
mswindows_get_files() really correct?
Yes, it's a char ref, not a byte ref, so it works in UTF-8 (or in EBCDIC-UTF 8!)
provided our Emchars work like ASCII/Latin1, which, bar an EBCDICkian revolt,
they always will.
I've convinced myself that
_this_ case works for all internal representations so far proposed,
because they are all extensions of ASCII, and the comparison works
after the char on the rhs is promoted to Emchar. But this really
does need to be restricted to ASCII. C0 controls are OK. But
Latin-1 would break for UTF-8 default internal, eg.
Possibly an extension of the current APIs to include say
eirefcmp_* (eistring, eiindex, character)
is called for? But I can't think of an example offhand where the
general case would be useful.
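
A minimal sketch of the char-ref versus byte-ref distinction in (2),
assuming a UTF-8 internal representation and well-formed input. The names
Sketch_Char, sketch_char_at and sketch_byte_at are hypothetical stand-ins
invented for illustration; they are not the Eistring API, and Sketch_Char
is not the real Emchar.

```c
#include <stdint.h>

/* Hypothetical stand-in for a decoded character (NOT the real Emchar). */
typedef uint32_t Sketch_Char;

/* Byte reference: returns the raw first byte of the string. */
unsigned char
sketch_byte_at (const unsigned char *s)
{
  return s[0];
}

/* Character reference: decodes the UTF-8 sequence starting at s[0].
   Assumes well-formed UTF-8; does no validation. */
Sketch_Char
sketch_char_at (const unsigned char *s)
{
  if (s[0] < 0x80)                      /* 1 byte: ASCII, incl. C0 controls */
    return s[0];
  if ((s[0] & 0xE0) == 0xC0)            /* 2-byte sequence */
    return ((Sketch_Char) (s[0] & 0x1F) << 6)
      | (Sketch_Char) (s[1] & 0x3F);
  if ((s[0] & 0xF0) == 0xE0)            /* 3-byte sequence */
    return ((Sketch_Char) (s[0] & 0x0F) << 12)
      | ((Sketch_Char) (s[1] & 0x3F) << 6)
      | (Sketch_Char) (s[2] & 0x3F);
  return ((Sketch_Char) (s[0] & 0x07) << 18)   /* 4-byte sequence */
    | ((Sketch_Char) (s[1] & 0x3F) << 12)
    | ((Sketch_Char) (s[2] & 0x3F) << 6)
    | (Sketch_Char) (s[3] & 0x3F);
}
```

For ".emacs" both references yield '.', so the ASCII comparison is safe
either way. For a string starting with U+00E9 (bytes 0xC3 0xA9), the
character reference yields 0xE9 while the byte reference yields 0xC3, so
only a character-level comparison can match. Whether a plain C char
literal for a Latin-1 character compares correctly also depends on the
signedness of char, which this sketch sidesteps by using numeric values.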
(3) We may want to be a little bit careful with the notion of the
default internal representation. I can see that a default
internal representation of UCS-2 (UTF-16, I presume is what you
really mean?) would be attractive. So what happens if you have
data that is not representable in the default internal
representation? Do we just tell those users to get lost?
It would be kind of weird if the default internal representation
that Eistrings dealt with was UCS-2 but UTF-8 representation was
available in buffers, which you don't rule out.
by its nature, the default int. rep. must be able to represent all chars. that
would rule out utf16 if we have more than 1,000,000 and some chars. but it
doesn't rule out ucs4, or some utf16 extension that could encode gigs o' chars,
etc.
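
To put a number on the "1,000,000 and some" limit, here is a small sketch
of standard UTF-16 encoding. It assumes plain UTF-16 with surrogate pairs
and no extension; the function names are hypothetical and not taken from
XEmacs.

```c
#include <stdint.h>

/* Number of characters standard UTF-16 can represent: all code points
   up to U+10FFFF, minus the 2048 surrogate code points themselves. */
uint32_t
sketch_utf16_capacity (void)
{
  return 0x110000u - 0x800u;
}

/* Encode code point c into out[]; returns the number of 16-bit units
   written (1 or 2), or 0 if c is not representable in UTF-16. */
int
sketch_to_utf16 (uint32_t c, uint16_t out[2])
{
  if (c >= 0xD800 && c <= 0xDFFF)       /* surrogates are not characters */
    return 0;
  if (c < 0x10000)                      /* BMP: a single unit */
    {
      out[0] = (uint16_t) c;
      return 1;
    }
  if (c > 0x10FFFF)                     /* beyond the reach of UTF-16 */
    return 0;
  c -= 0x10000;                         /* 20 bits remain */
  out[0] = (uint16_t) (0xD800 + (c >> 10));     /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (c & 0x3FF));   /* low surrogate */
  return 2;
}
```

The capacity works out to 1,112,064 characters. Anything above U+10FFFF
returns 0 here, which is the case where UCS-4 or some extended encoding
would be needed instead.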
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
_________________ _________________ _________________ _________________
What are those straight lines for? "XEmacs rules."
--
Ben
In order to save my hands, I am cutting back on my mail. I also write
as succinctly as possible -- please don't be offended. If you send me
mail, you _will_ get a response, but please be patient, especially for
XEmacs-related mail. If you need an immediate response and it is not
apparent in your message, please say so. Thanks for your understanding.
See also http://www.666.com/ben/typing.html.