I couldn't read it easily, so I reformatted it. I'll comment in a
separate mail, as soon as I find the time.
OG.
On Mon, Apr 17, 2000 at 11:23:02PM -0700, Ben Wing wrote:
i've created a new api for simplifying mule-correct string operations.
it'll take a while to fully implement; perhaps i could get some help,
e.g. hrvoje?
i will be implementing piece by piece, as necessary.
here's what i have so far, and i'd like comments:
(E) For working with Eistrings:
-------------------------------
NOTE: An Eistring is a structure that makes it easy to work with
internally-formatted strings of data. It provides operations similar
in feel to the standard strcpy(), strcat(), strlen(), etc., but
(a) it is Mule-correct
(b) it does dynamic allocation so you never have to worry about size
restrictions (and all allocation is stack-local using alloca(), so
there is no need to explicitly clean up)
(c) it knows its own length, so it does not suffer from standard null
byte brain-damage
(d) it provides a much more powerful set of operations and knows about
all the standard places where string data might reside: Lisp_Objects,
other Eistrings, char * data with or without an explicit length, etc.
(e) it provides easy operations to convert to/from externally-formatted
data, and is much easier to use than the standard TO_INTERNAL_FORMAT
and TO_EXTERNAL_FORMAT macros.
The idea is to make it as easy to write Mule-correct string manipulation
code as it is to write normal string manipulation code. We also make
the API sufficiently general that it can handle multiple internal data
formats (e.g. some fixed-width optimizing formats and a default variable
width format) and allows for *ANY* data format we might choose in the
future for the default format, including UCS2. (In other words, we can't
assume that the internal format is ASCII-compatible and we can't assume
it doesn't have embedded null bytes.) All of this is hidden from the
user.
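
A minimal sketch of points (b) and (c) above: a string that carries its own byte length, so embedded null bytes are ordinary data. Everything here (sketch_str, sketch_set, sketch_cat, sketch_free) is a hypothetical stand-in and not part of the proposed API. It also uses malloc() where the proposal uses alloca(), so that the helpers can be ordinary functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an Eistring: the byte length is stored
   explicitly, so the data may contain embedded null bytes. */
typedef struct
{
  unsigned char *data;  /* heap-allocated; a null byte follows the data */
  size_t bytelen;       /* number of data bytes, terminator not counted */
} sketch_str;

/* Replace the contents of S with LEN bytes copied from SRC. */
void
sketch_set (sketch_str *s, const void *src, size_t len)
{
  unsigned char *p = malloc (len + 1);
  assert (p != NULL);
  memcpy (p, src, len);
  p[len] = '\0';
  free (s->data);
  s->data = p;
  s->bytelen = len;
}

/* Append LEN bytes from SRC; the buffer grows as needed.
   SRC must not point into S's own data. */
void
sketch_cat (sketch_str *s, const void *src, size_t len)
{
  unsigned char *p = realloc (s->data, s->bytelen + len + 1);
  assert (p != NULL);
  memcpy (p + s->bytelen, src, len);
  s->data = p;
  s->bytelen += len;
  s->data[s->bytelen] = '\0';
}

/* Release the heap storage and reset S to the empty state. */
void
sketch_free (sketch_str *s)
{
  free (s->data);
  s->data = NULL;
  s->bytelen = 0;
}
```

Initialize a sketch_str with { NULL, 0 } before first use. After copying in the five bytes "ab\0cd", bytelen is 5 even though strlen() on the same data stops at 2.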
#### It is really too bad that we don't have a real object-oriented
language, or at least a language with polymorphism!
Eistring (name):
Declare a new Eistring. This is a standard local variable declaration
and can go anywhere in the variable declaration section, but note that
you *MUST* supply the parens.
----- Initialization -----
eicpy_* (eistr, ...):
Initialize the Eistring from somewhere:
eicpy_ei (eistr, eistr2):
... from another Eistring
eicpy_str (eistr, lisp_string):
... from a Lisp_Object string
eicpy_str_off (eistr, lisp_string, charpos, charlen):
... from a section of a Lisp_Object string
eicpy_str_off_byte (eistr, lisp_string, bytepos, bytelen):
... from a section of a Lisp_Object string, with offset and length
specified in bytes rather than chars
eicpy_buf (eistr, lisp_buf, charpos, charlen):
... from a Lisp_Object buffer
eicpy_buf_byte (eistr, lisp_buf, bytepos, bytelen):
... from a Lisp_Object buffer, with offset and length specified in
bytes rather than chars
eicpy_raw (eistr, intdata, intlen, intfmt):
... from raw internal-format data in the specified format
eicpy_c (eistr, c_string):
... from an ASCII null-terminated string. Non-ASCII characters in
the string are *ILLEGAL* (read abort() with error-checking defined).
eicpy_c_len (eistr, c_string, len):
... from an ASCII string, with length specified. Non-ASCII characters
in the string are *ILLEGAL* (read abort() with error-checking defined).
eicpy_ext (eistr, extdata, coding_system):
... from external null-terminated data, with coding system specified.
eicpy_ext_len (eistr, extdata, extlen, coding_system):
... from external data, with length and coding system specified.
eicpy_lstream (eistr, lstream):
... from an lstream; reads data till eof. Data must be in default
internal format; otherwise, interpose a decoding lstream.
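
The ASCII-only restriction on eicpy_c and eicpy_c_len could be enforced along the following lines. This is only a sketch: sketch_all_ascii and sketch_check_ascii are hypothetical names, and assert() stands in for whatever the error-checking build would actually use to abort.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the first LEN bytes of S are all 7-bit ASCII, else 0. */
int
sketch_all_ascii (const char *s, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if ((unsigned char) s[i] > 0x7F)
      return 0;
  return 1;
}

/* Hypothetical guard to run before copying a C string: aborts on
   non-ASCII input unless NDEBUG is defined. */
void
sketch_check_ascii (const char *s, size_t len)
{
  assert (sketch_all_ascii (s, len));
}
```

Because the check is an assert(), it disappears in builds with NDEBUG defined, which loosely mirrors "abort() with error-checking defined".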
----- Getting the data out of the Eistring -----
eirawdata (eistr):
eimake_string (eistr):
eimake_string_sect (eistr, charpos, charlen):
eimake_string_sect_byte (eistr, bytepos, bytelen):
eicpyout_raw_alloca (eistr, intfmt, intlen_out):
eicpyout_raw_malloc (eistr, intfmt, intlen_out):
eicpyout_c_alloca (eistr):
eicpyout_c_malloc (eistr):
eicpyout_c_len_alloca (eistr, len_out):
eicpyout_c_len_malloc (eistr, len_out):
----- Moving to the heap -----
eito_malloc (eistr):
eifree (eistr):
eito_alloca (eistr):
----- Retrieving the length -----
eilen (eistr):
eilen_byte (eistr):
----- Working with positions -----
eicharpos_to_bytepos (eistr, charpos):
eibytepos_to_charpos (eistr, bytepos):
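
To illustrate why character and byte positions differ in a variable-width format, here is a sketch of the two conversions. It assumes UTF-8 as the encoding purely for illustration; the actual Mule internal format is different, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A byte starts a character unless it is a UTF-8 continuation byte
   (bit pattern 10xxxxxx). */
static int
is_lead_byte (unsigned char b)
{
  return (b & 0xC0) != 0x80;
}

/* Byte offset of the character at CHARPOS in DATA, which holds
   BYTELEN bytes of valid UTF-8.  CHARPOS may equal the character
   count, in which case BYTELEN is returned. */
size_t
sketch_charpos_to_bytepos (const unsigned char *data, size_t bytelen,
                           size_t charpos)
{
  size_t bytepos = 0;

  while (charpos > 0)
    {
      assert (bytepos < bytelen);
      /* Step over the lead byte, then over any continuation bytes. */
      bytepos++;
      while (bytepos < bytelen && !is_lead_byte (data[bytepos]))
        bytepos++;
      charpos--;
    }
  return bytepos;
}

/* Number of characters that start before BYTEPOS.  BYTEPOS should be
   on a character boundary or equal to BYTELEN. */
size_t
sketch_bytepos_to_charpos (const unsigned char *data, size_t bytelen,
                           size_t bytepos)
{
  size_t i, charpos = 0;

  assert (bytepos <= bytelen);
  for (i = 0; i < bytepos; i++)
    if (is_lead_byte (data[i]))
      charpos++;
  return charpos;
}
```

Both conversions are linear scans in this sketch, which is one reason an API might offer _byte variants of its functions for callers that already hold byte positions.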
----- Getting the character at a position -----
eiref (eistr, charpos):
eiref_byte (eistr, bytepos):
----- Concatenation -----
eicat_* (eistr, ...):
Concatenate onto the end of the Eistring. (Unlike the eicpy_* functions
above, all remaining functions that take a string source allow only two
possibilities: another Eistring or a simple C string. In the general
case, first create another Eistring from the source.)
eicat_ei (eistr, eistr2):
eicat_c (eistr, c_string):
----- Replacement -----
eisub_* (eistr, charoff, charlen, ...):
eisub_*_byte (eistr, byteoff, bytelen, ...):
Replace a section of the Eistring.
eisub_ei (eistr, charoff, charlen, eistr2):
eisub_ei_byte (eistr, byteoff, bytelen, eistr2):
eisub_c (eistr, charoff, charlen, c_string):
eisub_c_byte (eistr, byteoff, bytelen, c_string):
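
A rough sketch of what a byte-offset replacement such as eisub_c_byte has to do: build a new buffer from the head, the replacement, and the tail. The function sketch_sub_byte and its signature are hypothetical, and it works on a malloc()ed buffer rather than on an Eistring. A character-offset variant would presumably convert its offsets to byte offsets first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Replace OLDLEN bytes starting at OFF in *BUF with NEWLEN bytes from
   REPL.  *BUF must be a non-NULL malloc()ed buffer holding *LEN bytes
   plus a terminating null byte.  On return, *BUF and *LEN describe the
   new buffer; the old one has been freed. */
void
sketch_sub_byte (char **buf, size_t *len, size_t off, size_t oldlen,
                 const char *repl, size_t newlen)
{
  size_t taillen, total;
  char *p;

  assert (off <= *len && oldlen <= *len - off);
  taillen = *len - off - oldlen;
  total = off + newlen + taillen;

  p = malloc (total + 1);
  assert (p != NULL);
  memcpy (p, *buf, off);                                   /* head */
  memcpy (p + off, repl, newlen);                          /* replacement */
  memcpy (p + off + newlen, *buf + off + oldlen, taillen); /* tail */
  p[total] = '\0';

  free (*buf);
  *buf = p;
  *len = total;
}
```

Passing an empty replacement with newlen 0 deletes the section, and passing oldlen 0 inserts at OFF without removing anything.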
----- Converting to an external format -----
eito_external (eistr, coding_system):
eiextdata (eistr):
eiextlen (eistr):
----- Searching in the Eistring for a character -----
eichr (eistr, chr):
eichr_byte (eistr, chr):
eichr_off (eistr, chr, charpos):
eichr_off_byte (eistr, chr, bytepos):
eirchr (eistr, chr):
eirchr_byte (eistr, chr):
eirchr_off (eistr, chr, charpos):
eirchr_off_byte (eistr, chr, bytepos):
----- Searching in the Eistring for a string -----
eistr_ei (eistr, eistr2):
eistr_ei_byte (eistr, eistr2):
eistr_ei_off (eistr, eistr2, charpos):
eistr_ei_off_byte (eistr, eistr2, bytepos):
eirstr_ei (eistr, eistr2):
eirstr_ei_byte (eistr, eistr2):
eirstr_ei_off (eistr, eistr2, charpos):
eirstr_ei_off_byte (eistr, eistr2, bytepos):
eistr_c (eistr, c_string):
eistr_c_byte (eistr, c_string):
eistr_c_off (eistr, c_string, charpos):
eistr_c_off_byte (eistr, c_string, bytepos):
eirstr_c (eistr, c_string):
eirstr_c_byte (eistr, c_string):
eirstr_c_off (eistr, c_string, charpos):
eirstr_c_off_byte (eistr, c_string, bytepos):
----- Comparison -----
eicmp_* (eistr, ...):
eicmp_off_* (eistr, charoff, charlen, ...):
eicmp_off_*_byte (eistr, byteoff, bytelen, ...):
eicasecmp_* (eistr, ...):
eicasecmp_off_* (eistr, charoff, charlen, ...):
eicasecmp_off_*_byte (eistr, byteoff, bytelen, ...):
Compare the Eistring with the other data. Return value same as
from strcmp.
eicmp_ei (eistr, eistr2):
eicmp_off_ei (eistr, charoff, charlen, eistr2):
eicmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicasecmp_ei (eistr, eistr2):
eicasecmp_off_ei (eistr, charoff, charlen, eistr2):
eicasecmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicmp_c (eistr, c_string):
eicmp_off_c (eistr, charoff, charlen, c_string):
eicmp_off_c_byte (eistr, byteoff, bytelen, c_string):
eicasecmp_c (eistr, c_string):
eicasecmp_off_c (eistr, charoff, charlen, c_string):
eicasecmp_off_c_byte (eistr, byteoff, bytelen, c_string):
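
A sketch of a strcmp-style comparison that respects explicit lengths, and therefore embedded null bytes. The name sketch_cmp is hypothetical, and the ordering is plain byte order, which may not match the character ordering a real Mule-aware comparison would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare ALEN bytes at A with BLEN bytes at B.  Returns a negative
   value, zero, or a positive value, as strcmp() does.  When one
   string is a prefix of the other, the shorter one sorts first. */
int
sketch_cmp (const void *a, size_t alen, const void *b, size_t blen)
{
  size_t minlen = alen < blen ? alen : blen;
  int r;

  assert (a != NULL && b != NULL);
  r = memcmp (a, b, minlen);
  if (r != 0)
    return r;
  if (alen == blen)
    return 0;
  return alen < blen ? -1 : 1;
}
```

With the four-byte inputs "ab\0x" and "ab\0y", strcmp() reports equality because it stops at the null byte, while this comparison sees the difference in the last byte.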
----- Case-changing the Eistring -----
eilwr (eistr):
eiupr (eistr):
And the implementation:
/* ------------------------------ */
/* (E) For working with Eistrings */
/* ------------------------------ */
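typedef struct
{
  void *data;           /* internal-format data */
  Bytecount bytelen;    /* length of data in bytes */
  Charcount charlen;    /* length of data in characters */
  int mallocp;          /* nonzero if data is malloc()ed, not alloca()ed */
  void *extdata;        /* external-format data */
  Extcount extlen;      /* length of extdata */
} Eistring_;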
Eistring_ the_eistring_zero_init;
#define Eistring(name) Eistring_ name = the_eistring_zero_init
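/* Set the lengths and allocate room for the data plus a terminating
   null byte, on the heap or on the stack depending on mallocp. */
#define EI_ALLOC_(ei, charlen_, bytelen_)               \
do {                                                    \
  (ei).charlen = (charlen_);                            \
  (ei).bytelen = (bytelen_);                            \
  if ((ei).mallocp)                                     \
    (ei).data = xmalloc ((ei).bytelen + 1);             \
  else                                                  \
    (ei).data = alloca ((ei).bytelen + 1);              \
} while (0)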
/* Allocate as above, then copy the data including its terminating
   null byte. */
#define EI_ALLOC_AND_COPY_(ei, data_, charlen_, bytelen_)       \
do {                                                            \
  EI_ALLOC_ (ei, charlen_, bytelen_);                           \
  memcpy ((ei).data, data_, (ei).bytelen + 1);                  \
} while (0)
#define eicpy_ei(ei, ei2)                                               \
do {                                                                    \
  Eistring_ *ei__ = &(ei2);                                             \
  EI_ALLOC_AND_COPY_ (ei, ei__->data, ei__->charlen, ei__->bytelen);    \
} while (0)
#define eicpy_str(ei, lisp_string)                              \
do {                                                            \
  Lisp_Object ei__ = (lisp_string);                             \
  EI_ALLOC_AND_COPY_ (ei, XSTRING_DATA (ei__),                  \
                      XSTRING_CHAR_LENGTH (ei__),               \
                      XSTRING_LENGTH (ei__));                   \
} while (0)