XEmacs Internal and External APIs: A Modest Proposal

Jerry James
May 22, 2001

1. Introduction

The introduction of loadable modules has exposed the internals of XEmacs
to outside developers. This situation is obviously undesirable, as it
allows developers to write code that will break due to small internal
changes. That, in turn, puts pressure on XEmacs developers to reduce or
eliminate changes to existing functions, variables, and data structures.

Clearly, a fixed and limited API should be exposed to module writers.
The API should be written in a way that maximizes developer flexibility,
yet gives enough power to the module writer to enable nontrivial modules
to be developed. It should, at the least, give the module writer access
to anything visible at the Lisp level. Developing such an API is called
the "external module API problem" throughout this document.

It would be a mistake to limit attention to loadable modules. XEmacs is
a large enough piece of software that it has a nontrivial internal
structure. We should take this opportunity to reflect on the internal
APIs that various parts of XEmacs expose to other parts. Determining
what those APIs should be is called the "internal API problem"
throughout this document.

This proposal attempts to address both the external module API problem
and the internal API problem (collectively the "API problems"). The rest
of this proposal is organized as follows. Part 2 proposes a significant
change to the way in which C/Lisp "glue" code is structured. Part 3
shows how the proposal in part 2 solves the API problems. Part 4
discusses the pros and cons of this proposal. Part 5 discusses an
approach to implementing the proposal. Part 6 discusses the capabilities
of the proposer. Part 7 contains some concluding remarks.

2. A New Glue: Changing the way that Lisp-visible C code is written

The currently available way of writing Lisp-visible C code suffers from
some restrictions, largely due to limitations of the C preprocessor.
However, the need to ship C code to those who build XEmacs does not
necessarily imply that all code has to be written in C! In particular, a
source-to-source translator with more capabilities than the C
preprocessor could be used to write the C/Lisp glue code in a more
Lispish fashion. Whether the ultimate source, the preprocessed source,
or both would be shipped with releases is not specified in this
proposal. This topic should be discussed by the current developers if
this proposal is found to have any merit.

The preprocessor would have several responsibilities, detailed in the
following subsections.

A. Function and special form declarations

Currently, Lisp-visible C functions are declared with the DEFUN macro,
defined in lisp.h. That macro has some limitations, including:

1. Limited number of required parameters: currently the limit is 8.
   While this could be extended, the current scheme requires *some*
   fixed limit.

2. Unhelpful documentation strings with MANY: no documentation string is
   generated for functions that accept a variable number of arguments
   (the equivalent of &rest in Lisp). If you want help to display the
   argument list, you have to write the list yourself in the function
   documentation. This has not been done for many functions (e.g.,
   concat, max, format).

3. Need to know whether a function with MANY is being called: the
   calling convention is different for functions that do not use MANY
   (one C parameter per Lisp parameter) and functions that do use MANY
   (two C parameters: a length, and an array of parameters).

4. Inability to automatically generate the C function name: you have to
   write essentially the same name twice, once in the Lisp form (e.g.,
   "widget-apply"), and once in the C form (e.g., Fwidget_apply).
   Nothing prevents a developer from making the Lisp and C names
   completely unrelated.

5. Requirement to list function names in syms_of_xxx manually: this is
   one of those trivial requirements that a computer can do better than
   a human.
   The computer never gets tired or overlooks a function.

The preprocessor would instead allow code like the following to be
written:

    DEFUN lisp-function-name (arglist)
      "docstring"
      "interactive-spec"
    {
      body;
    }

where the arglist could contain &rest and &optional declarations. A
&rest parameter would be given a Lisp list containing the actual
parameters (or Qnil if there are none). For example, the definition of
max would look like this:

    DEFUN max (num &rest nums)
      "Return largest ..."
      0
    {
      ...
    }

###THINK ABOUT THIS###
Making rest args contain Lisp lists is an incompatible change from the
current code. It might be better to stick with two arguments: a length
and an array. Otherwise, every MANY function will have to be rewritten
in a nontrivial way.

There would also be a companion DEFSPECIAL for declaring special forms
(with unevaluated arguments). The syntax would be equivalent in every
other respect, however.

The developer would put nothing in syms_of_xxx corresponding to the
function declaration.

B. Function calls

When Lisp calls are made from the C level, the caller currently has to
know:

- whether the called function is declared in C or Lisp;
- if in C, whether the called function takes MANY arguments or not.

The calling convention is different for the three cases. Instead, we
could use a uniform calling convention, which is mapped to one of these
three cases automatically. The new function call syntax would be:

    FUNCALL (lisp-function-name args)

and

    APPLY (lisp-function-name args)

where the semantics are equivalent to the Lisp funcall and apply.

C. Variable declarations

The current state of Lisp variable declarations is not bad. But as long
as we are considering a preprocessor, it behooves us to consider whether
that state can be improved at all. With a preprocessor, we could lift
the following restrictions:

1. Variables must be declared in vars_of_xxx: instead, we could declare
   them anywhere (e.g., close to related functions), and the
   preprocessor could move the declaration to vars_of_xxx automatically.

2. Inability to automatically generate the C variable name: this one has
   to be thought about carefully, however, due to the dual
   symbol/variable nature of such declarations and the inconsistent
   naming patterns in the current code. (That is, some code uses the
   convention that the Lisp variable foo is called Vfoo in C, and some
   code uses foo in C as well.)

D. Variable access

Accessing a Lisp variable from C requires the programmer to know whether
the variable is declared in Lisp or C. Instead, we will use the
following syntax to read a Lisp variable:

    LISP_VAR (lisp-variable-name)

and this syntax to write a Lisp variable:

    SETQ lisp-variable-name value;

E. Documentation generation

A preprocessor would render make-docfile superfluous. It would generate
the docfile as it processed the input file. This may actually be a good
thing, due to the primitive structure of make-docfile. It is easily
fooled (try naming a local variable DEFVAR, for example), and is
somewhat fragile (e.g., leave out the documentation comment on a DEFUN
and see what happens).

3. Solving the API problems

How does changing the glue code solve the API problems? It gives us a
uniform way of declaring and accessing Lisp-visible entities in C.
Behind the scenes, the preprocessor can select the most efficient
mechanism available for a given access, within the current
implementation. Thus, a function can migrate from Lisp to C (or vice
versa) and no calling code needs to be changed. The preprocessor will
change the implementation of calls, but no human needs to be explicitly
aware of this.

A simple way of accomplishing this is to have the preprocessor
automatically generate a list of C-visible Lisp functions and variables.
Then accesses to anything on the list can be done in the direct C
fashion.
Accesses to anything not on the list have to go through the existing
Lisp access mechanisms (e.g., call0, call1). This implies that the
preprocessor must make at least 2 passes through all of the sources.

4. Pros and Cons

Few changes to large systems are unequivocally good. This section
presents the good points and the bad points about the proposed change,
so that they can be balanced against one another.

A. Pros

1. Greater modularity in the internals. That is, the various parts of
   XEmacs will be more loosely coupled, enabling future migrations
   between C and Lisp to occur more easily.

2. Ability to target both C and C++. If somebody figures out how to
   implement lrecords as C++ classes, for example, or error signalling
   using C++ throw/catch, the preprocessor could be extended to allow
   those entities to be specified more abstractly. Then a command-line
   switch could be used to generate either C or C++ code.

3. Meaningful argument lists for Lisp functions defined in C. See
   section 2A.

4. Subsume the documentation generator. See section 2E.

B. Cons

1. Compatibility with FSF Emacs is severely reduced. This will make
   future synchs more difficult.

2. C/Lisp variable use more obscure. Currently, the C variable that is
   used as storage for the Lisp variable can be accessed like any normal
   C variable. The preprocessor would require it to be accessed using a
   more arcane syntax.

3. Syntax checking. The preprocessor would either have to deal with any
   possible C syntax error in a manner essentially equivalent to the C
   compiler itself, or be able to complete its work successfully in the
   presence of syntax errors.

5. Implementation

Get a C compiler front end from somewhere. Michael Sperber hinted that
he has something like that. Otherwise, look into cpplib from the gcc
folks. Otherwise otherwise, get a lex/yacc (flex/bison) C grammar from
somewhere (there are lots of them around) and write the rest ourselves.

Extend the accepted syntax to accept the constructs given above.
Use the list structure described in section 3 along with a 2-pass
preprocessor to do the transformation.

6. About the Proposer

This is the part where I try to inspire you with confidence in my
ability to actually pull this off. My c.v. is available on my web page,
if that helps. Let me just say that I have a Ph.D. in Computer Science
from the University of California, Santa Barbara of very recent vintage
(March 2000). My work has largely been in the area of distributed
systems. However, I have worked with parsers and source-to-source
translators before. For example, I wrote the JParse Java translator and
type checker, which I used as part of a source-to-source translator in
the Kan distributed Java object system. I also wrote a very large XEmacs
external module, an interface to IBM's ViaVoice. I am an assistant
professor in the Electrical Engineering & Computer Science Department at
the University of Kansas. My favorite vices are chocolate, 80's rock
music, and philosophy (especially cosmology).

7. Conclusion

This proposal actually dodges a fairly thorny problem: what about the
non-Lisp-visible parts of the internals? Nevertheless, I think this
proposal, if implemented, would represent a big step forward in managing
the XEmacs APIs successfully. There are cons, but in my opinion they are
outweighed by the pros.
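
As a concrete illustration of the list-based dispatch from section 3 and
the name generation from section 2A, here is a minimal sketch in C of
what the second preprocessor pass might do with a FUNCALL form. It is
only a sketch under stated assumptions, not a design: the names
lisp_to_c_name, struct c_defun, find_defun and emit_funcall are
hypothetical, the naming rule (prefix "F", turn "-" into "_") is
inferred from the widget-apply/Fwidget_apply example, and the shapes of
the generated text (intern for the Lisp case, a compound-literal array
for the MANY case) are assumptions about what the preprocessor could
emit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry in the list built by pass 1: one per DEFUN seen. */
struct c_defun
{
  const char *lisp_name;   /* e.g. "widget-apply" */
  int many;                /* nonzero if declared with &rest (MANY) */
};

/* Assumed naming rule: prefix 'F' and map each '-' to '_'.
   Returns 0 on success, -1 if OUT is too small. */
static int
lisp_to_c_name (const char *lisp, char *out, size_t outsize)
{
  size_t len, i;

  assert (lisp != NULL && out != NULL);
  len = strlen (lisp);
  if (len + 2 > outsize)        /* 'F' + name + terminating NUL */
    return -1;
  out[0] = 'F';
  for (i = 0; i < len; i++)
    out[i + 1] = (lisp[i] == '-') ? '_' : lisp[i];
  out[len + 1] = '\0';
  return 0;
}

/* Look LISP up in the pass-1 list; NULL means "not defined in C". */
static const struct c_defun *
find_defun (const char *lisp, const struct c_defun *list, size_t count)
{
  size_t i;

  for (i = 0; i < count; i++)
    if (strcmp (lisp, list[i].lisp_name) == 0)
      return &list[i];
  return NULL;
}

/* Pass 2: write into OUT the C text for FUNCALL (LISP ARGS), choosing
   one of the three calling conventions.  ARGS is the comma-separated
   argument text and NARGS its count.  Returns 0 on success, -1 if the
   result does not fit. */
static int
emit_funcall (const char *lisp, const char *args, int nargs,
              const struct c_defun *list, size_t count,
              char *out, size_t outsize)
{
  const struct c_defun *d = find_defun (lisp, list, count);
  char cname[128];
  int n;

  if (d != NULL && lisp_to_c_name (lisp, cname, sizeof cname) != 0)
    return -1;

  if (d != NULL && d->many)          /* C function taking MANY */
    n = snprintf (out, outsize, "%s (%d, (Lisp_Object[]) { %s })",
                  cname, nargs, args);
  else if (d != NULL)                /* ordinary C function */
    n = snprintf (out, outsize, "%s (%s)", cname, args);
  else if (nargs == 0)               /* defined in Lisp, no arguments */
    n = snprintf (out, outsize, "call0 (intern (\"%s\"))", lisp);
  else                               /* defined in Lisp */
    n = snprintf (out, outsize, "call%d (intern (\"%s\"), %s)",
                  nargs, lisp, args);

  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```

With a pass-1 list containing widget-apply (not MANY) and max (MANY),
FUNCALL (widget-apply w, prop) would come out as a direct call to
Fwidget_apply, FUNCALL (max a, b) as a call to Fmax with a count and an
array, and a call to any name not on the list would fall back to
call0/call1 and friends. Moving a function between C and Lisp then only
changes the list, not the calling code.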