Making Elisp Function Calls Faster

Owner: ???

Effort: ???

Dependencies: ???

Abstract: This page describes many optimizations that can be made to the existing Elisp function call mechanism without too much effort. The most important optimizations can probably be implemented with only a day or two of work. I think it's important to do this work regardless of whether we eventually decide to replace the Lisp engine.

Many complaints have been made about the speed of Elisp, and in particular about the slowness in executing function calls, and rightly so. If you look at the implementation of the funcall function, you'll notice that it does an incredible amount of work. Now logically, it doesn't need to be so. Let's look first from the theoretical standpoint at what absolutely needs to be done to call a Lisp function.

First, let's look at the situation that would exist if we were smart enough to have made lexical scoping be the default language policy. We know at compile time exactly which code can reference the variables that are the formal parameters for the function being called (specifically, only the code that is part of that function's definition) and where these references are. As a result, we can simply push all the values of the variables onto a stack, and convert all the variable references in the function definition into stack references. Therefore, binding lexically-scoped parameters in preparation for a function call involves nothing more than pushing the values of the parameters onto a stack and then setting a new value for the frame pointer, at the same time remembering the old one. Because the byte-code interpreter has a stack-based architecture, however, the parameter values have already been pushed onto the stack at the time of the function call invocation. Therefore, binding the variables involves doing nothing at all, other than dealing with the frame pointer.

With dynamic scoping, the situation is somewhat more complicated. Because the parameters can be referenced anywhere, and these references cannot be located at compile time, their values have to be stored into a global table that maps the name of the parameter to its current value. In Elisp, this table is called the obarray. Variable binding in Elisp is done using the C function specbind(). (This stands for "special binding" where special is the standard Lisp terminology for a dynamically-scoped variable.) What specbind() does, essentially, is retrieve the old value of the variable out of the obarray, remember the value by pushing it, along with the name of the variable, onto what's called the specpdl stack, and then store the new value into the obarray. (Martin thinks that "specpdl" means special pushdown list, where pushdown list is an archaic computer science term for a stack.) These binding operations, however, should still not take very much time because of the use of symbols, i.e. because the location in the obarray where the variable's value is stored has already been determined (specifically, it was determined at the time that the byte code was loaded and the symbol created), so no expensive hash table lookups need to be performed.

An actual function invocation in Elisp does a great deal more work, however, than was just outlined above. Let's just take a look at what happens when one byte-compiled function invokes another byte-compiled function, checking for places where unnecessary work is being done and determining how to optimize these places.

The byte-compiled function's parameter list is stored in exactly the format that the programmer entered it in, which is to say as a Lisp list, complete with &optional and &rest keywords. This list has to be parsed for every function invocation, which means that for every element in a list, the element is checked to see whether it's the &optional or &rest keywords, its surrounding cons cell is checked to make sure that it is indeed a cons cell, the QUIT macro is called, etc. What should be happening here is that the argument list is parsed exactly once, at the time that the byte code is loaded, and converted into a C array. The C array should be stored as part of the byte-code object. The C array should also contain, in addition to the symbols themselves, the number of required and optional arguments. At function call time, the C array can be very quickly retrieved and processed.
For every variable that is to be bound, the specbind() function is called. This actually does quite a lot of things, including:
1. Checking the symbol argument to the function to make sure it's actually a symbol.
2. Checking for specpdl stack overflow, and increasing its size as necessary.
3. Calling symbol_value_buffer_local_info() to retrieve buffer local information for the symbol, and then processing the return value from this function in a series of if statements.
4. Actually storing the old value onto the specpdl stack.
5. Calling Fset() to change the variable's value.

The entire series of calls to specbind() should be inline and merged into the argument processing code as a single tight loop, with no function calls in the vast majority of cases. The specbind() logic should be streamlined as follows:

The symbol argument type checking is unnecessary.
The check for the specpdl stack overflow needs to be done only once, not once per argument.
All of the remaining logic should be boiled down as follows:
1. Retrieve the old value from the symbol's value cell.
2. If this value is a symbol-value-magic object, then call the real specbind() to do the work.
3. Otherwise, we know that nothing complicated needs to be done, so we simply push the symbol and its value onto the specpdl stack, and then replace the value in the symbol's value cell.
4. The only logic that we are omitting is the code in Fset() that checks to make sure a constant isn't being set. These checks should be made at the time that the byte code for the function is loaded and the C array of parameters to the function is created. (Whether a symbol is constant or not is generally known at XEmacs compile time. The only issue here is with symbols whose names begin with a colon. These symbols should simply be disallowed completely as parameter names.)

Other optimizations that could be done are:

At the beginning of the function that implements the byte-code interpreter (this is the Lisp primitive byte-code), the string containing the actual byte code is converted into an array of integers. I added this code specifically for MULE so that the byte-code engine didn't have to deal with the complexities of the internal string format for text. This conversion, however, is generally useful because on modern processors accessing 32-bit values out of an array is significantly faster than accessing unaligned 8-bit values. This conversion takes time, though, and should be done once at load time rather than each time the byte code is executed. This array should be stored in the byte-code object. Currently, this is a bit tricky to do, because byte-code is not actually passed the byte-code object, but rather three of its elements. We can't just change byte-code so that it is directly passed the byte-code object because this function, with its existing argument calling pattern, is called directly from compiled Elisp files. What we can and should do, however, is create a subfunction that does take a byte-code object and actually implements the byte-code interpreter engine. Whenever the C code wants to execute byte code, it calls this subfunction. byte-code itself also calls this subfunction after conjuring up an appropriate byte-code object and storing its arguments into this object. With a small amount of work, it's possible to do this conjuring in such a way that it doesn't generate any garbage.
At the end of a function call, the parameter bindings that have been done need to be undone. This is standardly done by calling unbind_to(). Just as for a specbind(), this function does a lot of work that is unnecessary in the vast majority of cases, and it could also be inlined and streamlined.
As part of each Elisp function call, a whole bunch of checks are done for a series of unlikely but possible conditions that may occur. These include, for example,
- Calling the QUIT macro, which essentially involves checking a global volatile variable to see whether additional processing needs to be done.
- Checking whether a garbage collection needs to be done.
- Checking the variable debug_on_next_call.
- Checking for whether Elisp profiling is active. (An additional optimization that's perhaps not worth the effort is to do some post-processing on the array of integers after it has been converted. For example, whenever a 16-bit value occurs in the byte code, it has to be encoded as two separate 8-bit values. These values could be combined. The tricky part here is that all of the places where a goto occurs across the place where this modification is made would have to have their offsets changed. Other such optimizations can easily be imagined as well.)
With a little bit smarter code, it should be possible to make a single trip variable that indicates whether any of these conditions is true. This variable would be updated by any code that changes the actual variables whose values are checked in the various checks just mentioned. (By the way, all of this is occurring in the C function funcall_recording_as().) There is a little bit of code between each of the checks. This code would simply have to be duplicated between the two cases where this general trip variable is true and is false. (Note: the optimization detailed in this item is probably not worth doing on the first pass.)

Ben Wing