XEmacs Internal and External APIs:
			   A Modest Proposal

		     Jerry James <james@xemacs.org>
			      May 22, 2001

1. Introduction

   The introduction of loadable modules has exposed the internals of
   XEmacs to outside developers.  This situation is obviously
   undesirable, as it allows developers to write code that will break
   due to small internal changes.  That, in turn, puts pressure on
   XEmacs developers to reduce or eliminate changes to existing
   functions, variables, and data structures.

   Clearly, a fixed and limited API should be exposed to module writers.
   The API should be written in a way that leaves XEmacs developers as
   much flexibility as possible, yet gives module writers enough power
   to develop nontrivial modules.  It should, at the least, give the
   module writer access to anything visible at the Lisp level.
   Developing such an API is called the "external module API problem"
   throughout this document.

   It would be a mistake to limit attention to loadable modules.  XEmacs
   is a large enough piece of software that it has a nontrivial internal
   structure.  We should take this opportunity to reflect on the
   internal APIs that various parts of XEmacs expose to other parts.
   Determining what those APIs should be is called the "internal API
   problem" throughout this document.

   This proposal attempts to address both the external module API
   problem and the internal API problem (collectively the "API
   problems").

   The rest of this proposal is organized as follows.  Part 2 proposes a
   significant change to the way in which C/Lisp "glue" code is
   structured.  Part 3 shows how the proposal in Part 2 solves the API
   problems.  Part 4 discusses the pros and cons of this proposal.  Part
   5 discusses an approach to implementing the proposal.  Part 6
   discusses the capabilities of the proposer.  Part 7 contains some
   concluding remarks.

2. A New Glue: Changing the way that Lisp-visible C code is written

   The currently available way of writing Lisp-visible C code suffers
   from some restrictions, largely due to limitations of the C
   preprocessor.  However, the need to ship C code to those who build
   XEmacs does not necessarily imply that all code has to be written in
   C!  In particular, a source-to-source translator with more
   capabilities than the C preprocessor could be used to write the
   C/Lisp glue code in a more Lispish fashion.  Whether the ultimate
   source, the preprocessed source, or both would be shipped with
   releases is not specified in this proposal.  This topic should be
   discussed by the current developers if this proposal is found to have
   any merit.

   The preprocessor would have several responsibilities, detailed in the
   following subsections.

   A. Function and special form declarations

      Currently, Lisp-visible C functions are declared with the DEFUN
      macro, defined in lisp.h.  That macro has some limitations,
      including:

      1. Limited number of required parameters: currently the limit is
         8.  While this could be extended, the current scheme requires
         *some* fixed limit.

      2. Unhelpful documentation strings with MANY: no argument list is
         generated for functions that accept a variable number of
         arguments (the equivalent of &rest in Lisp).  If you want help
         to display the argument list, you have to write the list
         yourself in the function documentation.  This has not been done
         for many functions (e.g., concat, max, format).

      3. Need to know whether a function with MANY is being called: the
         calling convention is different for functions that do not use
         MANY (one C parameter per Lisp parameter) and functions that do
         use MANY (two C parameters: a length, and an array of
         parameters).

      4. Inability to automatically generate the C function name: you
         have to write essentially the same name twice, once in the Lisp
         form (e.g., "widget-apply"), and once in the C form (e.g.,
         Fwidget_apply).  Nothing prevents a developer from making the
         Lisp and C names completely unrelated.

      5. Requirement to list function names in syms_of_xxx manually:
         this is one of those trivial requirements that a computer can
         do better than a human.  The computer never gets tired or
         overlooks a function.

      The preprocessor would instead allow code like the following to be
      written:

      DEFUN lisp-function-name (arglist) "docstring" "interactive-spec"
      {
        body;
      }

      where the arglist could contain &rest and &optional declarations.
      A &rest parameter would be given a Lisp list containing the actual
      parameters, or Qnil if there are none.  For example, the
      definition of max would look like this:

      DEFUN max (num &rest nums) "Return largest ..." 0
      {
        ...
      }

      ###THINK ABOUT THIS### Making rest args contain Lisp lists is an
      incompatible change from the current code.  It might be better to
      stick with two arguments: a length and an array.  Otherwise, every
      MANY function will have to be rewritten in a nontrivial way.

      There would also be a companion DEFSPECIAL for declaring special
      forms (whose arguments are not evaluated).  Its syntax would
      otherwise be identical to that of DEFUN.

      The developer would put nothing in syms_of_xxx corresponding to
      the function declaration.
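
      As a rough illustration of how limitation 4 might be addressed,
      the sketch below derives a C name from a Lisp name by prefixing F
      and mapping hyphens to underscores, following the
      widget-apply/Fwidget_apply example.  The helper name
      lisp_name_to_c_name is hypothetical, and the rule itself is only
      an assumption: names containing other characters (such as 1+) are
      simply rejected here, and a real preprocessor would need some
      other way to handle them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of XEmacs: derive a C identifier from
   a Lisp function name by prefixing 'F' and mapping '-' to '_'.
   Returns 0 on success, or -1 if the result does not fit in OUT or the
   name contains a character this sketch does not handle. */
int
lisp_name_to_c_name (const char *lisp_name, char *out, size_t out_size)
{
  size_t len;
  size_t i;

  assert (lisp_name != NULL && out != NULL);
  len = strlen (lisp_name);
  if (len == 0 || len + 2 > out_size)   /* 'F' + name + NUL */
    return -1;

  out[0] = 'F';
  for (i = 0; i < len; i++)
    {
      char c = lisp_name[i];

      if (c == '-')
        out[i + 1] = '_';
      else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
               || (c >= '0' && c <= '9') || c == '_')
        out[i + 1] = c;
      else
        return -1;                      /* e.g. '+' in "1+" */
    }
  out[len + 1] = '\0';
  return 0;
}
```

      With this rule, "widget-apply" yields Fwidget_apply and "max"
      yields Fmax.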

   B. Function calls

      When Lisp calls are made from the C level, the caller currently
      has to know:

      - whether the called function is declared in C or Lisp;
      - if in C, whether the called function takes MANY arguments or
        not.

      The calling convention is different for each of the three cases
      (Lisp, C without MANY, and C with MANY).  Instead, we could use a
      uniform calling convention, which is mapped to one of these three
      cases automatically.  The new function call syntax would be:

      FUNCALL (lisp-function-name args)

      and

      APPLY (lisp-function-name args)

      where the semantics are equivalent to the Lisp funcall and apply.
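
      The following toy sketch shows the three cases hidden behind one
      entry point.  It resolves the case at run time with a table
      lookup, whereas the preprocessor described here would make the
      same decision at translation time.  Everything in it is
      hypothetical: Lisp_Object is reduced to a long, the fixed-argument
      case only handles two arguments, and toy_minus, toy_max,
      toy_call_lisp, and uniform_funcall are illustrative names, not
      XEmacs functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long Lisp_Object;       /* toy stand-in for the real type */

enum callee_kind { CALLEE_C_FIXED, CALLEE_C_MANY };

struct callee
{
  const char *lisp_name;
  enum callee_kind kind;
  Lisp_Object (*fixed2) (Lisp_Object, Lisp_Object);     /* C, no MANY */
  Lisp_Object (*many) (int nargs, Lisp_Object *args);   /* C, MANY */
};

/* C function with one C parameter per Lisp parameter. */
static Lisp_Object
toy_minus (Lisp_Object a, Lisp_Object b)
{
  return a - b;
}

/* C function using the MANY convention: a length and an array. */
static Lisp_Object
toy_max (int nargs, Lisp_Object *args)
{
  Lisp_Object best;
  int i;

  assert (nargs >= 1);
  best = args[0];
  for (i = 1; i < nargs; i++)
    if (args[i] > best)
      best = args[i];
  return best;
}

/* Stand-in for calling a function defined in Lisp.  It only reports
   how many arguments it received. */
static Lisp_Object
toy_call_lisp (const char *name, int nargs, Lisp_Object *args)
{
  (void) name;
  (void) args;
  return (Lisp_Object) nargs;
}

static const struct callee callee_table[] = {
  { "toy-minus", CALLEE_C_FIXED, toy_minus, NULL },
  { "toy-max", CALLEE_C_MANY, NULL, toy_max },
};

/* One calling convention for the caller; the three cases are chosen
   here.  Anything not in the table is assumed to be defined in Lisp. */
Lisp_Object
uniform_funcall (const char *name, int nargs, Lisp_Object *args)
{
  size_t i;

  for (i = 0; i < sizeof callee_table / sizeof callee_table[0]; i++)
    {
      const struct callee *c = &callee_table[i];

      if (strcmp (c->lisp_name, name) != 0)
        continue;
      if (c->kind == CALLEE_C_MANY)
        return c->many (nargs, args);
      assert (nargs == 2);
      return c->fixed2 (args[0], args[1]);
    }
  return toy_call_lisp (name, nargs, args);
}
```

      The caller writes the same call in all three cases; moving
      toy-max out of the table (that is, "into Lisp") would change
      which branch runs, but not the calling code.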

   C. Variable declarations

      The current state of Lisp variable declarations is not bad.  But
      as long as we are considering a preprocessor, it behooves us to
      consider whether that state can be improved at all.  With a
      preprocessor, we could lift the following restrictions:

      1. Variables must be declared in vars_of_xxx: instead, we could
         declare them anywhere (e.g., close to related functions), and
         the preprocessor could move the declaration to vars_of_xxx
         automatically.

      2. Inability to automatically generate the C variable name: this
         one has to be thought about carefully, however, due to the dual
         symbol/variable nature of such declarations and the
         inconsistent naming patterns in the current code.  (That is,
         some code uses the convention that the Lisp variable foo is
         called Vfoo in C, and some code uses foo in C as well.)

   D. Variable access

      Currently, accessing a Lisp variable from C requires the
      programmer to know whether the variable is declared in Lisp or C.
      Instead, we would use the following syntax to read a Lisp
      variable:

      LISP_VAR (lisp-variable-name)

      and this syntax to write a Lisp variable:

      SETQ lisp-variable-name value;
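
      To make the two underlying cases concrete, here is a minimal
      sketch of what the proposed syntax might be translated into.  It
      is an assumption about one possible translation, not a
      description of existing XEmacs code: Lisp_Object is reduced to a
      long, and Vtoy_fill_column, toy_symbol_value, toy_set, and the
      variable names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long Lisp_Object;       /* toy stand-in for the real type */

/* Case 1: a Lisp variable whose storage is a C variable. */
Lisp_Object Vtoy_fill_column = 70;

/* Case 2: variables defined in Lisp, reached through a toy symbol
   table instead of a C variable. */
struct toy_symbol
{
  const char *name;
  Lisp_Object value;
};

static struct toy_symbol toy_symbols[] = {
  { "toy-lisp-var", 1 },
};

static struct toy_symbol *
toy_intern (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof toy_symbols / sizeof toy_symbols[0]; i++)
    if (strcmp (toy_symbols[i].name, name) == 0)
      return &toy_symbols[i];
  return NULL;
}

Lisp_Object
toy_symbol_value (const char *name)
{
  struct toy_symbol *sym = toy_intern (name);

  assert (sym != NULL);
  return sym->value;
}

void
toy_set (const char *name, Lisp_Object value)
{
  struct toy_symbol *sym = toy_intern (name);

  assert (sym != NULL);
  sym->value = value;
}

/* Possible translations of the proposed syntax:
     LISP_VAR (toy-fill-column)  ->  Vtoy_fill_column
     SETQ toy-fill-column 72;    ->  Vtoy_fill_column = 72;
     LISP_VAR (toy-lisp-var)     ->  toy_symbol_value ("toy-lisp-var")
     SETQ toy-lisp-var 2;        ->  toy_set ("toy-lisp-var", 2);  */
Lisp_Object
example_after_translation (void)
{
  Vtoy_fill_column = 72;
  toy_set ("toy-lisp-var", 2);
  return Vtoy_fill_column + toy_symbol_value ("toy-lisp-var");
}
```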

   E. Documentation generation

      A preprocessor would render make-docfile superfluous.  It would
      generate the docfile as it processed the input file.  This may
      actually be a good thing, due to the primitive structure of
      make-docfile.  It is easily fooled (try naming a local variable
      DEFVAR, for example), and is somewhat fragile (e.g., leave out the
      documentation comment on a DEFUN and see what happens).
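
      Because the preprocessor would see the argument list, it could
      also emit a usage line for every function, including those that
      take &rest arguments.  The sketch below is one way that might
      look.  The function name format_usage is hypothetical, and the
      convention of printing parameter names in upper case while
      leaving &-keywords alone is an assumption, not something this
      proposal specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build a usage line such as
   "(max NUM &rest NUMS)" from a Lisp name and a space-separated
   argument list.  Returns 0 on success, -1 if OUT is too small. */
int
format_usage (const char *name, const char *arglist,
              char *out, size_t out_size)
{
  size_t used;
  int keyword = 0;      /* inside a word that starts with '&' */
  int word_start = 1;   /* the next non-space character begins a word */
  const char *p;
  int written;

  assert (name != NULL && arglist != NULL && out != NULL);
  written = snprintf (out, out_size, "(%s%s", name,
                      *arglist != '\0' ? " " : "");
  if (written < 0 || (size_t) written >= out_size)
    return -1;
  used = (size_t) written;

  for (p = arglist; *p != '\0'; p++)
    {
      char c = *p;

      if (used + 2 > out_size)          /* keep room for the NUL */
        return -1;
      if (c == ' ')
        word_start = 1;
      else
        {
          if (word_start)
            keyword = (c == '&');
          word_start = 0;
          if (!keyword)
            c = (char) toupper ((unsigned char) c);
        }
      out[used++] = c;
    }

  if (used + 2 > out_size)              /* ')' plus the NUL */
    return -1;
  out[used++] = ')';
  out[used] = '\0';
  return 0;
}
```

      For the max example in section 2A, this produces the line
      "(max NUM &rest NUMS)".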

3. Solving the API problems

   How does changing the glue code solve the API problems?  It gives us
   a uniform way of declaring and accessing Lisp-visible entities in C.
   Behind the scenes, the preprocessor can select the most efficient
   mechanism available for a given access, within the current
   implementation.  Thus, a function can migrate from Lisp to C (or vice
   versa) and no calling code needs to be changed.  The preprocessor
   will change the implementation of calls, but no human needs to be
   explicitly aware of this.

   A simple way of accomplishing this is to have the preprocessor
   automatically generate a list of C-visible Lisp functions and
   variables.  Then accesses to anything on the list can be done in the
   direct C fashion.  Accesses to anything not on the list have to go
   through the existing Lisp access mechanisms (e.g., call0, call1).
   This implies that the preprocessor must make at least 2 passes
   through all of the sources.
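
   The two passes can be sketched as follows.  This is a deliberately
   naive illustration that recognizes declarations by matching "DEFUN"
   at the start of a line; a real implementation would work from a
   proper parse.  The names collect_defuns, is_c_visible, and
   struct name_list are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 64
#define MAX_NAME_LEN 64

struct name_list
{
  char names[MAX_NAMES][MAX_NAME_LEN];
  int count;
};

/* Pass 1: record the Lisp name that follows each "DEFUN" found at the
   start of a line, building the list of C-visible functions. */
void
collect_defuns (const char *const *lines, int nlines,
                struct name_list *out)
{
  int i;

  assert (lines != NULL && out != NULL);
  out->count = 0;
  for (i = 0; i < nlines && out->count < MAX_NAMES; i++)
    {
      char name[MAX_NAME_LEN];

      /* The scanset stops at the first space or '('. */
      if (sscanf (lines[i], "DEFUN %63[^ (]", name) == 1)
        strcpy (out->names[out->count++], name);
    }
}

/* Pass 2 decision: return 1 if NAME is on the list, so a call can be
   emitted in the direct C fashion, or 0 if the call has to go through
   the Lisp access mechanisms. */
int
is_c_visible (const struct name_list *list, const char *name)
{
  int i;

  assert (list != NULL && name != NULL);
  for (i = 0; i < list->count; i++)
    if (strcmp (list->names[i], name) == 0)
      return 1;
  return 0;
}
```

   The list has to be complete before any call is translated, which is
   why a single pass over the sources is not enough.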

4. Pros and Cons

   Few changes to large systems are unequivocally good.  This section
   presents the good points and the bad points about the proposed
   change, so that they can be balanced against one another.

   A. Pros

      1. Greater modularity in the internals.  That is, the various
         parts of XEmacs will be more loosely coupled, enabling future
         migrations between C and Lisp to occur more easily.

      2. Ability to target both C and C++.  If somebody figures out how
         to implement lrecords as C++ classes, for example, or error
         signalling using C++ throw/catch, the preprocessor could be
         extended to allow those entities to be specified more
         abstractly.  Then a command-line switch could be used to
         generate either C or C++ code.

      3. Meaningful argument lists for Lisp functions defined in C.  See
         section 2A.

      4. Subsume the documentation generator.  See section 2E.

   B. Cons

      1. Compatibility with FSF Emacs is severely reduced.  This will
         make future synchs more difficult.

      2. C/Lisp variable use becomes more obscure.  Currently, the C
         variable that is used as storage for the Lisp variable can be
         accessed like any normal C variable.  The preprocessor would
         require it to be accessed using a more arcane syntax.

      3. Syntax checking.  The preprocessor would either have to deal
         with any possible C syntax error in a manner essentially
         equivalent to the C compiler itself, or be able to complete its
         work successfully in the presence of syntax errors.

5. Implementation

   Get a C compiler front end from somewhere.  Michael Sperber hinted
   that he has something like that.  Otherwise, look into cpplib from
   the gcc folks.  Failing that, get a lex/yacc (flex/bison) C
   grammar from somewhere (there are lots of them around) and write the
   rest ourselves.  Extend the accepted syntax to accept the constructs
   given above.  Use the list structure described in section 3 along
   with a 2-pass preprocessor to do the transformation.

6. About the Proposer

   This is the part where I try to inspire you with confidence in my
   ability to actually pull this off.  My c.v. is available on my web
   page (<URL:http://www.ittc.ku.edu/~james/>), if that helps.  Let me
   just say that I have a Ph.D. in Computer Science from the University
   of California, Santa Barbara of very recent vintage (March 2000).  My
   work has largely been in the area of distributed systems.  However, I
   have worked with parsers and source-to-source translators before.
   For example, I wrote the JParse Java translator and type checker
   (<URL:http://www.ittc.ku.edu/JParse/>) which I used as part of a
   source-to-source translator in the Kan distributed Java object system
   (<URL:http://www.ittc.ku.edu/kan/>).  I also wrote a very large
   XEmacs external module, an interface to IBM's ViaVoice
   (<URL:http://www.ittc.ku.edu/~james/xemacs/viavoice.html>).  I am an
   assistant professor in the Electrical Engineering & Computer Science
   Department at the University of Kansas.  My favorite vices are
   chocolate, 80's rock music, and philosophy (especially cosmology).

7. Conclusion

   This proposal actually dodges a fairly thorny problem: what about the
   non-Lisp-visible parts of the internals?  Nevertheless, I think this
   proposal, if implemented, would represent a big step forward in
   managing the XEmacs APIs successfully.  There are cons, but in my
   opinion they are outweighed by the pros.