Owner: ???
Effort: ???
Dependencies: ???
Abstract: This page describes many optimizations that can be made to the existing Elisp function call mechanism without too much effort. The most important optimizations can probably be implemented with only a day or two of work. I think it's important to do this work regardless of whether we eventually decide to replace the Lisp engine.
Many complaints have been made about the speed of Elisp, and in
particular about the slowness in executing function calls, and rightly
so. If you look at the implementation of the funcall
function, you'll notice that it does an incredible amount of work.
Now logically, it doesn't need to be so. Let's look first from the
theoretical standpoint at what absolutely needs to be done to call a
Lisp function.
First, let's look at the situation that would exist if we were smart enough to have made lexical scoping be the default language policy. We know at compile time exactly which code can reference the variables that are the formal parameters for the function being called (specifically, only the code that is part of that function's definition) and where these references are. As a result, we can simply push all the values of the variables onto a stack, and convert all the variable references in the function definition into stack references. Therefore, binding lexically-scoped parameters in preparation for a function call involves nothing more than pushing the values of the parameters onto a stack and then setting a new value for the frame pointer, at the same time remembering the old one. Because the byte-code interpreter has a stack-based architecture, however, the parameter values have already been pushed onto the stack at the time of the function call invocation. Therefore, binding the variables involves doing nothing at all, other than dealing with the frame pointer.
With dynamic scoping, the situation is somewhat more complicated.
Because the parameters can be referenced anywhere, and these
references cannot be located at compile time, their values have to be
stored into a global table that maps the name of the parameter to its
current value. In Elisp, this table is called the obarray.
Variable binding in Elisp is done using the C function
specbind()
. (This stands for "special binding" where
special is the standard Lisp terminology for a
dynamically-scoped variable.) What specbind()
does,
essentially, is retrieve the old value of the variable out of the
obarray, remember the value by pushing it, along with the name of the
variable, onto what's called the specpdl stack, and then
store the new value into the obarray. (Martin thinks that "specpdl"
means special pushdown list, where pushdown list is
an archaic computer science term for a stack.) These binding
operations, however, should still not take very much time because of
the use of symbols, i.e. because the location in the obarray where the
variable's value is stored has already been determined (specifically,
it was determined at the time that the byte code was loaded and the
symbol created), so no expensive hash table lookups need to be
performed.
An actual function invocation in Elisp does a great deal more work, however, than was just outlined above. Let's just take a look at what happens when one byte-compiled function invokes another byte-compiled function, checking for places where unnecessary work is being done and determining how to optimize these places.
The byte-compiled function's parameter list is stored in exactly
the format that the programmer entered it in, which is to say as a
Lisp list, complete with &optional
and
&rest
keywords. This list has to be parsed for
every function invocation, which means that for every element
in a list, the element is checked to see whether it's the
&optional
or &rest
keywords, its
surrounding cons cell is checked to make sure that it is indeed a cons
cell, the QUIT
macro is called, etc. What should be
happening here is that the argument list is parsed exactly once, at
the time that the byte code is loaded, and converted into a C array.
The C array should be stored as part of the byte-code object. The C
array should also contain, in addition to the symbols themselves, the
number of required and optional arguments. At function call time, the
C array can be very quickly retrieved and processed.
For every variable that is to be bound, the
specbind()
function is called. This actually does quite
a lot of things, including:
Checking the symbol argument to the function to make sure it's actually a symbol.
Checking for specpdl stack overflow, and increasing its size as necessary.
Calling symbol_value_buffer_local_info()
to retrieve
buffer local information for the symbol, and then processing the
return value from this function in a series of if statements.
Actually storing the old value onto the specpdl stack.
Calling Fset()
to change the variable's value.
The entire series of calls to specbind()
should be
inline and merged into the argument processing code as a single tight
loop, with no function calls in the vast majority of cases. The
specbind()
logic should be streamlined as follows:
The symbol argument type checking is unnecessary.
The check for the specpdl stack overflow needs to be done only once, not once per argument.
All of the remaining logic should be boiled down as follows:
Retrieve the old value from the symbol's value cell.
If this value is a symbol-value-magic object, then call the
real specbind()
to do the work.
Otherwise, we know that nothing complicated needs to be done, so we simply push the symbol and its value onto the specpdl stack, and then replace the value in the symbol's value cell.
The only logic that we are omitting is the code in
Fset()
that checks to make sure a constant isn't being
set. These checks should be made at the time that the byte code for
the function is loaded and the C array of parameters to the function
is created. (Whether a symbol is constant or not is generally known
at XEmacs compile time. The only issue here is with symbols whose
names begin with a colon. These symbols should simply be disallowed
completely as parameter names.)
Other optimizations that could be done are:
At the beginning of the function that implements the byte-code
interpreter (this is the Lisp primitive byte-code
), the
string containing the actual byte code is converted into an array of
integers. I added this code specifically for MULE so that the
byte-code engine didn't have to deal with the complexities of the
internal string format for text. This conversion, however, is
generally useful because on modern processors accessing 32-bit values
out of an array is significantly faster than accessing unaligned 8-bit
values. This conversion takes time, though, and should be done once
at load time rather than each time the byte code is executed. This
array should be stored in the byte-code object. Currently, this is a
bit tricky to do, because byte-code
is not actually
passed the byte-code object, but rather three of its elements. We
can't just change byte-code
so that it is directly passed
the byte-code object because this function, with its existing argument
calling pattern, is called directly from compiled Elisp files. What
we can and should do, however, is create a subfunction that does take
a byte-code object and actually implements the byte-code interpreter
engine. Whenever the C code wants to execute byte code, it calls this
subfunction. byte-code
itself also calls this
subfunction after conjuring up an appropriate byte-code object and
storing its arguments into this object. With a small amount of work,
it's possible to do this conjuring in such a way that it doesn't
generate any garbage.
At the end of a function call, the parameter bindings that have
been done need to be undone. This is standardly done by calling
unbind_to()
. Just as for a specbind()
, this
function does a lot of work that is unnecessary in the vast majority
of cases, and it could also be inlined and streamlined.
As part of each Elisp function call, a whole bunch of checks are done for a series of unlikely but possible conditions that may occur. These include, for example,
Calling the QUIT
macro, which essentially involves
checking a global volatile variable to see whether additional processing
needs to be done.
Checking whether a garbage collection needs to be done.
Checking the variable debug_on_next_call
.
Checking for whether Elisp profiling is active. (An additional optimization that's perhaps not worth the effort is to do some post-processing on the array of integers after it has been converted. For example, whenever a 16-bit value occurs in the byte code, it has to be encoded as two separate 8-bit values. These values could be combined. The tricky part here is that all of the places where a goto occurs across the place where this modification is made would have to have their offsets changed. Other such optimizations can easily be imagined as well.)
With a little bit smarter code, it should be possible to make a
single trip variable that indicates whether any of these conditions is
true. This variable would be updated by any code that changes the
actual variables whose values are checked in the various checks just
mentioned. (By the way, all of this is occurring in the C function
funcall_recording_as()
.) There is a little bit of code
between each of the checks. This code would simply have to be
duplicated between the two cases where this general trip variable is
true and is false. (Note: the optimization detailed in this item is
probably not worth doing on the first pass.)