[ added Cc to xemacs-beta ]
"Kirill M. Katsnelson" <kkm(a)kis.ru> writes:
Some time ago, Hrvoje Niksic wrote...
|+
| Such changes are easy to do while the code is being written, but are
| hard to get right *after* the fact. For this reason, I think keeping
| the Muleness is better than massive attacks.
|-
I heartily agree, except that a massive attack is already
unavoidable...
Well, only for the parts that are unsafe. Most of XEmacs *is*
Mule-ized already.
I know nothing about the MULE API.
Here is a small tutorial on using Mule, which is kind of strange
coming from someone who doesn't even compile with Mule.
* Don't assume that characters are bytes.
OK, this one was obvious. Lose a cred. Basically, XEmacs can deal
with two kinds of buffer data: internal data encoded in an internal
format, and external data.
** Bufbyte
The data in buffers or strings consists of Bufbytes. An ASCII
character is represented by a single Bufbyte with the same value, and
an extended character is represented by a sequence of Bufbytes.
Many internal functions like make_string() accept Bufbytes, which
removes the need for them to convert the data they receive. This
makes sense because in most circumstances the data is already in
the internal format.
Without Mule, a Bufbyte is just a character.
** Emchar
Emchar is a single Emacs character. It can be an ASCII character or
an extended character, and it fits into an integer. An Emchar is not
the same as its Bufbyte representation.
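To make the Bufbyte/Emchar distinction concrete, here is a rough standalone sketch. It is not XEmacs code: the typedefs are local stand-ins, toy_set_charptr_emchar is a hypothetical helper, and UTF-8 is used only as a convenient example of a variable-width encoding (the real Mule internal format is different). The point is just that one Emchar is an integer, while its stored form is one or more Bufbytes.

```c
#include <assert.h>

/* Local stand-ins for illustration only; not the XEmacs definitions. */
typedef unsigned char Bufbyte;
typedef int Emchar;

/* Hypothetical helper: store CH at P and return the number of Bufbytes
   written.  UTF-8 is an assumed stand-in for the internal encoding, and
   only characters up to 0xFFFF are handled.  */
static int
toy_set_charptr_emchar (Bufbyte *p, Emchar ch)
{
  assert (ch >= 0 && ch <= 0xFFFF);
  if (ch < 0x80)
    {
      /* ASCII: a single Bufbyte with the same value.  */
      p[0] = (Bufbyte) ch;
      return 1;
    }
  if (ch < 0x800)
    {
      p[0] = (Bufbyte) (0xC0 | (ch >> 6));
      p[1] = (Bufbyte) (0x80 | (ch & 0x3F));
      return 2;
    }
  p[0] = (Bufbyte) (0xE0 | (ch >> 12));
  p[1] = (Bufbyte) (0x80 | ((ch >> 6) & 0x3F));
  p[2] = (Bufbyte) (0x80 | (ch & 0x3F));
  return 3;
}
```

The return value plays the same role as the Bytecount you get back from set_charptr_emchar(): it tells the caller how far to advance the output pointer.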
* Never use char pointers for internal work.
In most circumstances it is a mistake to use a `char *p' variable, and
increment it with p++, only to get Emchars from *p. The correct way
is to declare p as `Bufbyte *p', read the character at that position
with charptr_emchar (p), use INC_CHARPTR (p) to increment it, and
DEC_CHARPTR (p) to decrement it. Use set_charptr_emchar to set an
Emchar at that position.
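Here is a small standalone sketch of why stepping by character matters. Everything prefixed with toy_ is hypothetical and only imitates the shape of charptr_emchar and INC_CHARPTR; UTF-8 (up to three bytes) again stands in for the internal encoding, which is an assumption made for the sake of a runnable example.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for illustration only; not the XEmacs definitions. */
typedef unsigned char Bufbyte;
typedef int Emchar;

/* Hypothetical: number of Bufbytes used by the character starting at P.  */
static int
toy_rep_bytes (const Bufbyte *p)
{
  if (*p < 0x80)
    return 1;
  if ((*p & 0xE0) == 0xC0)
    return 2;
  assert ((*p & 0xF0) == 0xE0);
  return 3;
}

/* Hypothetical counterpart of charptr_emchar: decode the character at P.  */
static Emchar
toy_charptr_emchar (const Bufbyte *p)
{
  switch (toy_rep_bytes (p))
    {
    case 1:
      return p[0];
    case 2:
      return ((p[0] & 0x1F) << 6) | (p[1] & 0x3F);
    default:
      return ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
    }
}

/* Hypothetical counterpart of INC_CHARPTR, written as a function that
   returns the pointer to the next character.  */
static const Bufbyte *
toy_inc_charptr (const Bufbyte *p)
{
  return p + toy_rep_bytes (p);
}

/* Count occurrences of CH in the LEN bytes at DATA, one character at a
   time rather than one byte at a time.  */
static int
toy_count_emchar (const Bufbyte *data, size_t len, Emchar ch)
{
  const Bufbyte *p = data;
  const Bufbyte *end = data + len;
  int n = 0;

  for (; p < end; p = toy_inc_charptr (p))
    if (toy_charptr_emchar (p) == ch)
      n++;
  return n;
}
```

A byte-wise loop with p++ comparing *p against 0xC3 would report matches inside multi-byte characters; the character-wise loop does not, because it never looks at a byte in the middle of a character as if it were a character of its own.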
When an external function returns a char pointer, you should use a
conversion function to convert it to the internal format. One such
conversion function is GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA,
which see.
* Be careful not to confuse Charcount, Bytecount and Bufpos.
Bufpos is the logical buffer position (Lisp code only sees Bufpos).
Charcount is the count of characters between two Bufpos values -- or
a string index. Bytecount is the number of bytes -- for example, the
difference between two Bufbyte pointers, or the byte size of a string.
It is useful to be diligent about these naming conventions, to avoid
confusion. If you are careful with them, a single glance at the code
will tell you what data the code works with.
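The difference between the two counts is easy to see in a standalone sketch. The typedefs and the toy_ functions below are hypothetical stand-ins, not the XEmacs definitions, and the encoding is assumed to be UTF-8 purely so that the example runs on its own.

```c
#include <assert.h>

/* Local stand-ins for illustration only; not the XEmacs definitions. */
typedef unsigned char Bufbyte;
typedef int Charcount;
typedef int Bytecount;

/* Hypothetical: number of characters in the NBYTES bytes at DATA.
   In the assumed UTF-8 stand-in, every byte that is not a continuation
   byte (10xxxxxx) starts a new character.  */
static Charcount
toy_char_length (const Bufbyte *data, Bytecount nbytes)
{
  Charcount chars = 0;
  Bytecount i;

  assert (nbytes >= 0);
  for (i = 0; i < nbytes; i++)
    if ((data[i] & 0xC0) != 0x80)
      chars++;
  return chars;
}

/* Hypothetical: byte offset of the character with string index N,
   clamped to NBYTES.  */
static Bytecount
toy_charcount_to_bytecount (const Bufbyte *data, Bytecount nbytes,
                            Charcount n)
{
  Bytecount i = 0;

  assert (n >= 0 && nbytes >= 0);
  while (n > 0 && i < nbytes)
    {
      /* Skip the leading byte, then any continuation bytes.  */
      i++;
      while (i < nbytes && (data[i] & 0xC0) == 0x80)
        i++;
      n--;
    }
  return i;
}
```

With distinct type names, a signature alone tells you whether an argument is a string index or a byte offset, which is exactly the kind of glance-level check the naming convention is meant to allow.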
* An example.
Take this definition of Fstring:
DEFUN ("string", Fstring, 0, MANY, 0, /*
Concatenate all the argument characters and make the result a string.
*/
       (int nargs, Lisp_Object *args))
{
  Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
  Bufbyte *p = storage;

  for (; nargs; nargs--, args++)
    {
      Lisp_Object lisp_char = *args;
      CHECK_CHAR_COERCE_INT (lisp_char);
      p += set_charptr_emchar (p, XCHAR (lisp_char));
    }
  return make_string (storage, p - storage);
}
First, I stack-allocate an array of Bufbytes large enough for the worst
case: nargs Emchars, each taking up to MAX_EMCHAR_LEN Bufbytes.
Then, the loop checks that each argument is a char (coercing integers to
chars) and stores the Emchar at the current position. set_charptr_emchar()
also returns a Bytecount that can be used to increment the pointer.
Finally, make_string() receives the start of the storage and the number
of bytes actually written, p - storage.
Another useful example is Fnormalize_menu_item_name. Looking at the
code of this function should make the usage of various functions and
macros pretty clear.
DEFUN ("normalize-menu-item-name", Fnormalize_menu_item_name, 1, 2, 0, /*
Convert a menu item name string into normal form, and return the new string.
Menu item names should be converted to normal form before being compared.
*/
       (name, buffer))
{
  struct buffer *buf = decode_buffer (buffer, 0);
  struct Lisp_String *n;
  Charcount end;
  int i;
  Bufbyte *name_data;
  Bufbyte *string_result;
  Bufbyte *string_result_ptr;
  Emchar elt;
  int expecting_underscore = 0;

  CHECK_STRING (name);

  n = XSTRING (name);
  end = string_char_length (n);
  name_data = string_data (n);

  string_result = (Bufbyte *) alloca (end * MAX_EMCHAR_LEN);
  string_result_ptr = string_result;
  for (i = 0; i < end; i++)
    {
      elt = charptr_emchar (name_data);
      elt = DOWNCASE (buf, elt);
      if (expecting_underscore)
        {
          expecting_underscore = 0;
          switch (elt)
            {
            case '%':
              /* Allow `%%' to mean `%'. */
              string_result_ptr += set_charptr_emchar (string_result_ptr, '%');
              break;
            case '_':
              break;
            default:
              string_result_ptr += set_charptr_emchar (string_result_ptr, '%');
              string_result_ptr += set_charptr_emchar (string_result_ptr, elt);
            }
        }
      else if (elt == '%')
        expecting_underscore = 1;
      else
        string_result_ptr += set_charptr_emchar (string_result_ptr, elt);
      INC_CHARPTR (name_data);
    }

  return make_string (string_result, string_result_ptr - string_result);
}
How should I rewrite this?
> + command_line = alloca_array (char, (strlen (argv[0])
> + + XSTRING_LENGTH (args_or_ret) + 2));
> + strcpy (command_line, argv[0]);
> + strcat (command_line, " ");
> + strcat (command_line, XSTRING_DATA (args_or_ret));
This depends on where `command_line' will be used. If it will be
passed to a system routine, then you should convert args_or_ret to the
external format:
GET_STRING_EXT_OS_DATA_ALLOCA (args_or_ret, outptr, outlen);
And then:
command_line = alloca_array (char, strlen (argv[0]) + outlen + 2);
...
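For what it's worth, here is one way the joining step could look once you have the converted data and its length. This is only a plain-C sketch under my own assumptions, not the elided code: toy_build_command_line is a hypothetical helper, it uses malloc instead of alloca so that it can live in its own function, and `ext'/`extlen' stand for whatever the conversion macro handed back. Since the length is already known, memcpy avoids rescanning the string the way strcat does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly malloc'ed "PROGRAM EXT" string,
   or NULL if allocation fails.  EXT is EXTLEN bytes of data already
   converted to the external format; the caller frees the result.  */
static char *
toy_build_command_line (const char *program, const char *ext, size_t extlen)
{
  size_t proglen;
  char *cmd;

  assert (program != NULL && ext != NULL);
  proglen = strlen (program);

  /* One byte for the separating space, one for the terminating NUL.  */
  cmd = malloc (proglen + extlen + 2);
  if (cmd == NULL)
    return NULL;

  memcpy (cmd, program, proglen);
  cmd[proglen] = ' ';
  memcpy (cmd + proglen + 1, ext, extlen);
  cmd[proglen + 1 + extlen] = '\0';
  return cmd;
}
```

The important part is that the size comes from the length of the converted data, not from the length of the internal string, since the two need not be equal.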
--
Hrvoje Niksic <hniksic(a)srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
By any chance, have you seen a summoned 9th order fire elemental
wandering around? No? Oh.. Tell me if you do.