I've created a new API for simplifying Mule-correct string operations.
It'll take a while to fully implement; perhaps I could get some help,
e.g. from Hrvoje?
I will be implementing it piece by piece, as necessary.
Here's what I have so far, and I'd like comments:
(E) For working with Eistrings:
-------------------------------
NOTE: An Eistring is a structure that makes it easy to work with
internally-formatted strings of data. It provides operations similar
in feel to the standard strcpy(), strcat(), strlen(), etc., but
(a) it is Mule-correct
(b) it does dynamic allocation so you never have to worry about size
    restrictions (and all allocation is stack-local using alloca(), so
    there is no need to explicitly clean up)
(c) it knows its own length, so it does not suffer from standard null
    byte brain-damage
(d) it provides a much more powerful set of operations and knows about
    all the standard places where string data might reside: Lisp_Objects,
    other Eistrings, char * data with or without an explicit length, etc.
(e) it provides easy operations to convert to/from externally-formatted
    data, and is much easier to use than the standard TO_INTERNAL_FORMAT
    and TO_EXTERNAL_FORMAT macros.
The idea is to make it as easy to write Mule-correct string manipulation
code as it is to write normal string manipulation code. We also make
the API sufficiently general that it can handle multiple internal data
formats (e.g. some fixed-width optimizing formats and a default variable
width format) and allows for *ANY* data format we might choose in the
future for the default format, including UCS2. (In other words, we can't
assume that the internal format is ASCII-compatible and we can't assume
it doesn't have embedded null bytes.) All of this is hidden from the
user.
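To illustrate point (c) and the UCS2 remark, here is a minimal sketch of why a
string that carries its own length survives embedded null bytes. The names
counted_str, counted_from and counted_len are hypothetical stand-ins made up
for this example; they are not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an Eistring: data plus an explicit byte length. */
typedef struct
{
  const unsigned char *data;
  size_t bytelen;
} counted_str;

/* Wrap a buffer whose length is known independently of its contents. */
static counted_str
counted_from (const void *data, size_t bytelen)
{
  counted_str s;
  s.data = (const unsigned char *) data;
  s.bytelen = bytelen;
  return s;
}

/* The length comes from the stored count, so embedded null bytes
   do not truncate the string the way strlen() would. */
static size_t
counted_len (counted_str s)
{
  return s.bytelen;
}
```

With big-endian UCS2 data for "AB" (the bytes 0, 'A', 0, 'B'), strlen() stops
at the very first byte, while the counted version still reports all 4 bytes.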
#### It is really too bad that we don't have a real object-oriented
language, or at least a language with polymorphism!
Eistring (name):
Declare a new Eistring. This is a standard local variable declaration
and can go anywhere in the variable declaration section, but note that
you *MUST* supply the parens.
----- Initialization -----
eicpy_* (eistr, ...):
Initialize the Eistring from somewhere:
eicpy_ei (eistr, eistr2):
... from another Eistring
eicpy_str (eistr, lisp_string):
... from a Lisp_Object string
eicpy_str_off (eistr, lisp_string, charpos, charlen):
... from a section of a Lisp_Object string
eicpy_str_off_byte (eistr, lisp_string, bytepos, bytelen):
... from a section of a Lisp_Object string, with offset and length
specified in bytes rather than chars
eicpy_buf (eistr, lisp_buf, charpos, charlen):
... from a Lisp_Object buffer
eicpy_buf_byte (eistr, lisp_buf, bytepos, bytelen):
... from a Lisp_Object buffer, with offset and length specified in
bytes rather than chars
eicpy_raw (eistr, intdata, intlen, intfmt):
... from raw internal-format data in the specified format
eicpy_c (eistr, c_string):
... from an ASCII null-terminated string. Non-ASCII characters in
the string are *ILLEGAL* (read: abort() with error-checking defined).
eicpy_c_len (eistr, c_string, len):
... from an ASCII string, with length specified. Non-ASCII characters
in the string are *ILLEGAL* (read: abort() with error-checking defined).
eicpy_ext (eistr, extdata, coding_system):
... from external null-terminated data, with coding system specified.
eicpy_ext_len (eistr, extdata, extlen, coding_system):
... from external data, with length and coding system specified.
eicpy_lstream (eistr, lstream):
... from an lstream; reads data till eof. Data must be in default
internal format; otherwise, interpose a decoding lstream.
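The ASCII-only restriction on eicpy_c and eicpy_c_len could be checked along
the following lines. This is only a sketch: is_all_ascii is a hypothetical
helper name, and how an error-checking build would actually hook such a test
into the macros is not specified above.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if every byte in s[0..len-1] is 7-bit ASCII, else 0.
   An error-checking build could abort() when this returns 0. */
static int
is_all_ascii (const char *s, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++)
    {
      /* Cast first: plain char may be signed, making bytes >= 0x80 negative. */
      if ((unsigned char) s[i] >= 0x80)
        return 0;
    }
  return 1;
}
```

A plain string such as "hello" passes, while one containing a Latin-1 byte
like 0xE9 is rejected and would have to go through eicpy_ext instead.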
----- Getting the data out of the Eistring -----
eirawdata (eistr):
eimake_string (eistr):
eimake_string_sect (eistr, charpos, charlen):
eimake_string_sect_byte (eistr, bytepos, bytelen):
eicpyout_raw_alloca (eistr, intfmt, intlen_out):
eicpyout_raw_malloc (eistr, intfmt, intlen_out):
eicpyout_c_alloca (eistr):
eicpyout_c_malloc (eistr):
eicpyout_c_len_alloca (eistr, len_out):
eicpyout_c_len_malloc (eistr, len_out):
----- Moving to the heap -----
eito_malloc (eistr):
eifree (eistr):
eito_alloca (eistr):
----- Retrieving the length -----
eilen (eistr):
eilen_byte (eistr):
----- Working with positions -----
eicharpos_to_bytepos (eistr, charpos):
eibytepos_to_charpos (eistr, bytepos):
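As a rough picture of what the position conversions involve in a
variable-width format, here is a sketch that uses UTF-8 as a stand-in
encoding. That choice is an assumption made purely for illustration (the
actual default internal format is different), and the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence starting with lead byte b.
   Assumes well-formed input. */
static size_t
utf8_seq_len (unsigned char b)
{
  if (b < 0x80) return 1;
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  return 4;
}

/* Walk forward charpos characters and return the byte offset reached. */
static size_t
charpos_to_bytepos (const unsigned char *data, size_t bytelen, size_t charpos)
{
  size_t bytepos = 0;
  while (charpos > 0 && bytepos < bytelen)
    {
      bytepos += utf8_seq_len (data[bytepos]);
      charpos--;
    }
  assert (charpos == 0);	/* charpos must lie within the string */
  return bytepos;
}

/* Count the characters that start before byte offset bytepos. */
static size_t
bytepos_to_charpos (const unsigned char *data, size_t bytelen, size_t bytepos)
{
  size_t charpos = 0, i = 0;
  assert (bytepos <= bytelen);
  while (i < bytepos)
    {
      i += utf8_seq_len (data[i]);
      charpos++;
    }
  return charpos;
}
```

For the four bytes 'a', 0xC3, 0xA9, 'z' (three characters), character
position 2 maps to byte position 3 and back again. Both conversions are
linear scans, which is why the API offers _byte variants for callers that
already have byte offsets.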
----- Getting the character at a position -----
eiref (eistr, charpos):
eiref_byte (eistr, bytepos):
----- Concatenation -----
eicat_* (eistr, ...):
Concatenate onto the end of the Eistring. (All functions from here on
that take string sources allow only two possibilities: another Eistring
and a simple C string. In the general case, first create another
Eistring from the source.)
eicat_ei (eistr, eistr2):
eicat_c (eistr, c_string):
----- Replacement -----
eisub_* (eistr, charoff, charlen, ...):
eisub_*_byte (eistr, byteoff, bytelen, ...):
Replace a section of the Eistring.
eisub_ei (eistr, charoff, charlen, eistr2):
eisub_ei_byte (eistr, byteoff, bytelen, eistr2):
eisub_c (eistr, charoff, charlen, c_string):
eisub_c_byte (eistr, byteoff, bytelen, c_string):
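The byte-level work behind a replacement such as eisub_c_byte might look
roughly like the following sketch. It operates on plain buffers with explicit
lengths and always uses malloc(); the name splice_bytes and its signature are
assumptions for illustration, not the planned implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Replace len bytes at offset off of (buf, buflen) with (rep, replen).
   Returns a newly malloc()ed, null-terminated buffer and stores its
   length in *outlen.  Returns NULL if allocation fails. */
static char *
splice_bytes (const char *buf, size_t buflen, size_t off, size_t len,
              const char *rep, size_t replen, size_t *outlen)
{
  char *out;
  size_t newlen;

  assert (off <= buflen && len <= buflen - off);
  newlen = buflen - len + replen;
  out = malloc (newlen + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, buf, off);				/* part before the section */
  memcpy (out + off, rep, replen);			/* the replacement */
  memcpy (out + off + replen, buf + off + len,
          buflen - off - len);				/* part after the section */
  out[newlen] = '\0';
  *outlen = newlen;
  return out;
}
```

Replacing 1 byte at offset 1 of "abc" with "XYZ" yields "aXYZc" with a length
of 5. A character-based variant would first convert charoff and charlen to
byte values and then do the same thing.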
----- Converting to an external format -----
eito_external (eistr, coding_system):
eiextdata (eistr):
eiextlen (eistr):
----- Searching in the Eistring for a character -----
eichr (eistr, chr):
eichr_byte (eistr, chr):
eichr_off (eistr, chr, charpos):
eichr_off_byte (eistr, chr, bytepos):
eirchr (eistr, chr):
eirchr_byte (eistr, chr):
eirchr_off (eistr, chr, charpos):
eirchr_off_byte (eistr, chr, bytepos):
----- Searching in the Eistring for a string -----
eistr_ei (eistr, eistr2):
eistr_ei_byte (eistr, eistr2):
eistr_ei_off (eistr, eistr2, charpos):
eistr_ei_off_byte (eistr, eistr2, bytepos):
eirstr_ei (eistr, eistr2):
eirstr_ei_byte (eistr, eistr2):
eirstr_ei_off (eistr, eistr2, charpos):
eirstr_ei_off_byte (eistr, eistr2, bytepos):
eistr_c (eistr, c_string):
eistr_c_byte (eistr, c_string):
eistr_c_off (eistr, c_string, charpos):
eistr_c_off_byte (eistr, c_string, bytepos):
eirstr_c (eistr, c_string):
eirstr_c_byte (eistr, c_string):
eirstr_c_off (eistr, c_string, charpos):
eirstr_c_off_byte (eistr, c_string, bytepos):
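A forward search from a byte offset, in the spirit of eistr_c_off_byte, could
be sketched as below. Return values are not specified above, so the
convention used here (byte position of the match, or -1 if there is none) is
an assumption, as is the name counted_find.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Find the first occurrence of (needle, needlelen) in (hay, haylen) at or
   after byte offset start.  Returns the byte position of the match, or -1
   if there is none.  Uses explicit lengths, so embedded null bytes are
   handled. */
static long
counted_find (const char *hay, size_t haylen,
              const char *needle, size_t needlelen, size_t start)
{
  size_t i;

  if (needlelen > haylen)
    return -1;
  for (i = start; i + needlelen <= haylen; i++)
    {
      if (memcmp (hay + i, needle, needlelen) == 0)
        return (long) i;
    }
  return -1;
}
```

Searching "abcabc" for "bc" from offset 0 finds position 1, from offset 2
finds position 4, and from offset 5 finds nothing. In a multibyte format a
real implementation also has to make sure a match starts on a character
boundary, which this byte-wise sketch does not check.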
----- Comparison -----
eicmp_* (eistr, ...):
eicmp_off_* (eistr, charoff, charlen, ...):
eicmp_off_*_byte (eistr, byteoff, bytelen, ...):
eicasecmp_* (eistr, ...):
eicasecmp_off_* (eistr, charoff, charlen, ...):
eicasecmp_off_*_byte (eistr, byteoff, bytelen, ...):
Compare the Eistring with the other data. The return value is the same
as from strcmp().
eicmp_ei (eistr, eistr2):
eicmp_off_ei (eistr, charoff, charlen, eistr2):
eicmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicasecmp_ei (eistr, eistr2):
eicasecmp_off_ei (eistr, charoff, charlen, eistr2):
eicasecmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicmp_c (eistr, c_string):
eicmp_off_c (eistr, charoff, charlen, c_string):
eicmp_off_c_byte (eistr, byteoff, bytelen, c_string):
eicasecmp_c (eistr, c_string):
eicasecmp_off_c (eistr, charoff, charlen, c_string):
eicasecmp_off_c_byte (eistr, byteoff, bytelen, c_string):
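A strcmp-style comparison that respects explicit lengths can be sketched as
follows. It compares raw bytes only, which is an assumption for the sake of
a short example: a Mule-correct version would compare characters, and the
name counted_cmp is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare (a, alen) with (b, blen) byte by byte.  Returns <0, 0 or >0 in
   the manner of strcmp(), but never stops early at a null byte. */
static int
counted_cmp (const void *a, size_t alen, const void *b, size_t blen)
{
  size_t n = alen < blen ? alen : blen;
  int r = memcmp (a, b, n);

  if (r != 0)
    return r;
  if (alen == blen)
    return 0;
  /* Equal over the common prefix: the shorter string sorts first. */
  return alen < blen ? -1 : 1;
}
```

Unlike strcmp(), this distinguishes "a\0b" from "a\0c", because the
comparison runs over the full stored length on both sides.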
----- Case-changing the Eistring -----
eilwr (eistr):
eiupr (eistr):
And the implementation:
/* ------------------------------ */
/* (E) For working with Eistrings */
/* ------------------------------ */
typedef struct
{
void *data;
Bytecount bytelen;
Charcount charlen;
int mallocp;
void *extdata;
Extcount extlen;
} Eistring_;
Eistring_ the_eistring_zero_init;
#define Eistring(name) Eistring_ name = the_eistring_zero_init
/* Set the lengths and allocate room for the data plus a terminating
   null byte, on the heap or on the stack depending on mallocp. */
#define EI_ALLOC_(ei, charlen_, bytelen_)			\
do {								\
  (ei).charlen = (charlen_);					\
  (ei).bytelen = (bytelen_);					\
  if ((ei).mallocp)						\
    (ei).data = xmalloc ((ei).bytelen + 1);			\
  else								\
    (ei).data = alloca ((ei).bytelen + 1);			\
} while (0)
/* The + 1 copies the source's terminating null byte as well. */
#define EI_ALLOC_AND_COPY_(ei, data_, charlen_, bytelen_)	\
do {								\
  EI_ALLOC_ (ei, charlen_, bytelen_);				\
  memcpy ((ei).data, data_, (ei).bytelen + 1);			\
} while (0)
#define eicpy_ei(ei, ei2)					\
do {								\
  Eistring_ *ei__ = &(ei2);					\
  EI_ALLOC_AND_COPY_ (ei, ei__->data, ei__->charlen,		\
		      ei__->bytelen);				\
} while (0)
#define eicpy_str(ei, lisp_string)				\
do {								\
  Lisp_Object ei__ = lisp_string;				\
  EI_ALLOC_AND_COPY_ (ei, XSTRING_DATA (ei__),			\
		      XSTRING_CHAR_LENGTH (ei__),		\
		      XSTRING_LENGTH (ei__));			\
} while (0)
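The reason these are macros rather than functions is the alloca() call:
memory from alloca() is released when the function that called it returns, so
the allocation has to be expanded inline in the user's own function. Here is
a small self-contained sketch of the same pattern. The names demo_str,
DEMO_COPY and demo_roundtrip are hypothetical, plain malloc() stands in for
xmalloc(), and <alloca.h> is a non-standard header that I am assuming is
available (it is on glibc and most Unix systems).

```c
#include <alloca.h>	/* non-standard; assumed available */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
  char *data;
  size_t bytelen;
  int mallocp;		/* nonzero: data lives on the heap */
} demo_str;

/* Must be a macro: alloca() memory belongs to the enclosing function's
   stack frame and stays valid until that function returns. */
#define DEMO_COPY(s, src, len)					\
do {								\
  (s).bytelen = (len);						\
  (s).data = (s).mallocp ? malloc ((s).bytelen + 1)		\
			 : alloca ((s).bytelen + 1);		\
  assert ((s).data != NULL);					\
  memcpy ((s).data, (src), (s).bytelen);			\
  (s).data[(s).bytelen] = '\0';					\
} while (0)

/* Copy text into a demo_str, on the stack or on the heap, and return the
   length of the copy as seen by strlen(). */
static size_t
demo_roundtrip (const char *text, int use_heap)
{
  demo_str s = { NULL, 0, 0 };
  size_t n;

  s.mallocp = use_heap;
  DEMO_COPY (s, text, strlen (text));
  n = strlen (s.data);
  if (s.mallocp)
    free (s.data);	/* only heap data needs explicit cleanup */
  return n;
}
```

Both demo_roundtrip ("hello", 0) and demo_roundtrip ("hello", 1) return 5;
only the heap case needs the explicit free(), which mirrors the division of
labor between eito_malloc and eifree in the API above.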
--
Ben