Martin Buchholz <martin@xemacs.org> writes:
> For best optimization, you would store the arglist directly inside
> the opaque object with the instructions for better locality of
> reference.  Same with the constant vector.
Maybe.  Maybe that's not really needed.  For starters, I would like
to see whether looping over the parameters (via LIST_LOOP_3) takes a
measurable amount of time in the average funcall case.  If it does,
*any* optimization will do some good.  If not, we needn't bother with
locality.
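
For reference, the per-call walk in question does roughly this (a
simplified sketch of the idea, not the literal bytecode.c source;
next_arg_or_nil() is an invented stand-in for the actual argument
fetching):

/* Simplified sketch: what happens to the arglist on every funcall.  */
Lisp_Object tail, sym;
int optional_p = 0, rest_p = 0;

for (tail = arglist; CONSP (tail); tail = XCDR (tail))
  {
    sym = XCAR (tail);
    if (EQ (sym, Qand_optional))          /* test for &optional ... */
      optional_p = 1;
    else if (EQ (sym, Qand_rest))         /* ... and for &rest */
      rest_p = 1;
    else
      specbind (sym, next_arg_or_nil ()); /* bind one parameter */
    QUIT;                                 /* C-g check per element */
  }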
Since you carried this off to xemacs-beta, I might as well quote Ben
from <URL:http://www.666.com/xemacs/faster-elisp.html>:
1. The byte-compiled function's parameter list is stored in exactly
the format that the programmer entered it in, which is to say as a
Lisp list, complete with &optional and &rest keywords. This list
has to be parsed for every function invocation, which means that
for every element in a list, the element is checked to see whether
it's the &optional or &rest keywords, its surrounding cons cell is
checked to make sure that it is indeed a cons cell, the QUIT macro
is called, etc. What should be happening here is that the argument
list is parsed exactly once, at the time that the byte code is
loaded, and converted into a C array. The C array should be stored
as part of the byte-code object. The C array should also contain,
in addition to the symbols themselves, the number of required and
optional arguments. At function call time, the C array can be very
quickly retrieved and processed.
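
Concretely, the parse-once idea might look something like this (a
hypothetical sketch only; the struct layout and all names are
invented for illustration, not taken from the actual source):

/* Hypothetical per-function argument info, built once at load time
   and stored in the byte-code object.  */
struct cf_arg_info
{
  int n_required;        /* args before &optional */
  int n_optional;        /* args between &optional and &rest */
  int rest_p;            /* non-zero if a &rest arg is present */
  Lisp_Object *symbols;  /* the argument symbols, in order */
};

/* Walk the Lisp arglist exactly once, when the byte code is loaded.  */
static void
cf_parse_arglist (Lisp_Object arglist, struct cf_arg_info *info)
{
  Lisp_Object tail;
  int n = 0, seen_optional = 0, seen_rest = 0;

  info->n_required = info->n_optional = info->rest_p = 0;
  /* Slight over-allocation: the keyword slots are counted too.  */
  info->symbols = xmalloc (XINT (Flength (arglist)) * sizeof (Lisp_Object));

  for (tail = arglist; CONSP (tail); tail = XCDR (tail))
    {
      Lisp_Object sym = XCAR (tail);
      if (EQ (sym, Qand_optional))
        seen_optional = 1;
      else if (EQ (sym, Qand_rest))
        seen_rest = 1;
      else
        {
          info->symbols[n++] = sym;
          if (seen_rest)
            info->rest_p = 1;
          else if (seen_optional)
            info->n_optional++;
          else
            info->n_required++;
        }
    }
}

At funcall time, binding the arguments is then a tight loop over a C
array -- no keyword tests, no cons-cell checks, no QUIT per element.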
How much of the rest of what this URL describes have you implemented?
The specbind() stuff in the source looks pretty much like the stuff
Ben says we should have. The same goes for the massaged byte-code
stuff, as well as for inlined unbind_to().
> This will also speed up garbage collection.  bytecode.c will get a
> little uglier, however.
>
> It won't buy as much as you'd like, though.
It depends on my value of "like". Making funcalls 5% or 10% faster is
not a bad thing.
On second thought, the obvious way to exercise funcall is using mapc.
Here's a gprof output for (mapc (lambda (x) (incf z x)) long-list):
  %   cumulative    self               self     total
 time   seconds    seconds     calls  ms/call  ms/call  name
 35.3     226.20    226.20 300015011     0.00     0.00  execute_optimized_program <cycle 3> [6]
 23.3     375.60    149.40 300013990     0.00     0.00  funcall_compiled_function <cycle 3> [7]
 19.4     500.16    124.56 300364373     0.00     0.00  Ffuncall <cycle 3> [8]
According to this benchmark, quite a lot of time is spent in
funcall_compiled_function().  Building with Quantify might be useful
here, to see which instructions take up how much time.
BTW, how do you obtain gprof output for a single benchmark?
Of course, mapc and friends can be sped up by putting Ffuncall-style
smarts into mapcar1().  But I don't see offhand how to do this
elegantly.
Perhaps by making them into a macro, similar to how SPECBIND was
handled? Admittedly, this is still rather ugly, but...
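
One shape those smarts could take (a rough sketch only; mapc_fast()
and call_compiled_directly() are invented names, the latter standing
in for whatever direct entry into funcall_compiled_function() would
actually be used): dispatch on the function type once, outside the
loop, so the common compiled-function case skips Ffuncall's
per-element dispatch:

/* Sketch of hoisting Ffuncall's type dispatch out of the mapping loop.  */
static Lisp_Object
mapc_fast (Lisp_Object fn, Lisp_Object list)
{
  Lisp_Object tail;

  if (COMPILED_FUNCTIONP (fn))
    {
      /* Dispatch once, then take the compiled-function path
         directly for every element.  */
      for (tail = list; CONSP (tail); tail = XCDR (tail))
        call_compiled_directly (fn, XCAR (tail));
    }
  else
    {
      /* Fall back to the fully general Ffuncall per element.  */
      Lisp_Object args[2];
      args[0] = fn;
      for (tail = list; CONSP (tail); tail = XCDR (tail))
        {
          args[1] = XCAR (tail);
          Ffuncall (2, args);
        }
    }
  return list;
}

Whether that is worth the code duplication is another question;
wrapping the dispatch in a macro, as suggested above, would at least
keep the two loops from drifting apart.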