>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)srce.hr> writes:
Hrvoje> Martin Buchholz <martin(a)xemacs.org> writes:
> For best optimization, you would store the arglist directly inside the
> opaque object with the instructions for better locality of reference.
> Same with the constant vector.
Hrvoje> Maybe. Maybe that's not really needed. For starters, I would like
Hrvoje> to see if the parameter looping (via LIST_LOOP_3) takes a measurable
Hrvoje> amount of time for the average funcall case. If yes, *any*
Hrvoje> optimization will do good. If not, we needn't bother with locality.
LIST_LOOP_3 itself is pretty cheap.
Hrvoje> Since you carried this off to xemacs-beta, I might as well quote Ben
Hrvoje> from <URL:http://www.666.com/xemacs/faster-elisp.html>:
Hrvoje> 1. The byte-compiled function's parameter list is stored in exactly
Hrvoje> the format that the programmer entered it in, which is to say as a
Hrvoje> Lisp list, complete with &optional and &rest keywords. This list
Hrvoje> has to be parsed for every function invocation, which means that
Hrvoje> for every element in a list, the element is checked to see whether
Hrvoje> it's the &optional or &rest keywords, its surrounding cons cell is
Hrvoje> checked to make sure that it is indeed a cons cell, the QUIT macro
Hrvoje> is called, etc. What should be happening here is that the argument
Hrvoje> list is parsed exactly once, at the time that the byte code is
Hrvoje> loaded, and converted into a C array. The C array should be stored
Hrvoje> as part of the byte-code object. The C array should also contain,
Hrvoje> in addition to the symbols themselves, the number of required and
Hrvoje> optional arguments. At function call time, the C array can be very
Hrvoje> quickly retrieved and processed.
Hrvoje> How much of the rest of what this URL describes have you implemented?
Hrvoje> The specbind() stuff in the source looks pretty much like the stuff
Hrvoje> Ben says we should have. The same goes for the massaged byte-code
Hrvoje> stuff, as well as for inlined unbind_to().
The QUIT macro is no longer called.
The arglist is only checked for proper-list-ness once.
The other optimizations are not implemented.
Perhaps, Hrvoje, you'd like to?
Keep in mind that the average function has one or two arguments.
There is a danger that introducing too much machinery for argument
parsing will actually be counter-productive. For example, we could
store an arglist-parser function pointer in the compiled-function
object, but the function call overhead is likely to be as high as
parsing the arglist in the first place.
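
To make the parse-once idea quoted above a bit more concrete, here is a minimal C sketch. It is only an illustration under stated assumptions: the struct, the function names, the MAX_ARGS limit, and the use of plain C strings in place of Lisp symbols are all hypothetical and do not come from the XEmacs sources. The point is that once the counts are stored, the per-call argument check reduces to a couple of integer comparisons.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; none of these names come from the XEmacs sources.
   Symbols are represented as C strings purely for illustration. */
#define MAX_ARGS 16

struct parsed_arglist {
  int n_required;               /* symbols before &optional */
  int n_optional;               /* symbols between &optional and &rest */
  int has_rest;                 /* 1 if a &rest symbol is present */
  int n_symbols;                /* number of entries used in symbols[] */
  const char *symbols[MAX_ARGS];
};

/* Parse the arglist once, e.g. at load time.
   Returns 0 on success, -1 if the list is malformed or too long. */
static int
parse_arglist (const char *const *list, int len, struct parsed_arglist *out)
{
  enum { ST_REQUIRED, ST_OPTIONAL, ST_REST } state = ST_REQUIRED;
  int i;

  memset (out, 0, sizeof *out);
  for (i = 0; i < len; i++)
    {
      if (strcmp (list[i], "&optional") == 0)
        {
          if (state != ST_REQUIRED)
            return -1;
          state = ST_OPTIONAL;
        }
      else if (strcmp (list[i], "&rest") == 0)
        {
          if (state == ST_REST)
            return -1;
          state = ST_REST;
        }
      else
        {
          if (out->n_symbols == MAX_ARGS)
            return -1;
          if (state == ST_REST && out->has_rest)
            return -1;          /* this sketch allows one &rest symbol */
          out->symbols[out->n_symbols++] = list[i];
          if (state == ST_REQUIRED)
            out->n_required++;
          else if (state == ST_OPTIONAL)
            out->n_optional++;
          else
            out->has_rest = 1;
        }
    }
  if (state == ST_REST && !out->has_rest)
    return -1;                  /* &rest with no symbol after it */
  return 0;
}

/* At call time, checking the argument count needs no list walking. */
static int
nargs_ok (const struct parsed_arglist *a, int nargs)
{
  if (nargs < a->n_required)
    return 0;
  if (!a->has_rest && nargs > a->n_required + a->n_optional)
    return 0;
  return 1;
}
```

Whether this beats walking a one- or two-element list is exactly the open question raised above; the sketch only shows the shape of the data, not a measured win.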
Another win from reorganizing the arglist is simply saving on cons
cells. In my current xemacs I measure 6242 arglist cons cells:
(let ((argcount 0))
  (mapatoms
   (lambda (s)
     (when (and (fboundp s)
                (compiled-function-p (symbol-function s)))
       (incf argcount (length (compiled-function-arglist (symbol-function s)))))))
  argcount)
==> 6242
To put this in perspective, my xemacs seems to have 172131 cons cells
in total, so arglists account for roughly 3.6% of them.
Every saving on Lisp_Objects makes garbage collection faster and
reduces the XEmacs working set.
Hrvoje> According to this benchmark, quite a lot of time is spent in
Hrvoje> funcall_compiled_function(). Building with quantify might be useful
Hrvoje> here, to see what instructions take up what amount of time.
Hrvoje> Perhaps by making them into a macro, similar to how SPECBIND was
Hrvoje> handled? Admittedly, this is still rather ugly, but...
I have to admit I've really badly wanted computed gotos in C a lot
lately. FORTRAN has them...
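
For readers wondering what computed gotos would buy in a byte-code interpreter, here is a minimal sketch. It assumes GCC's "labels as values" extension (also accepted by clang), which is not standard C. The opcodes and the run() function are invented for this illustration and are not XEmacs byte-codes. Each instruction jumps straight to the next handler through a table of label addresses instead of returning to a central switch.

```c
/* Toy stack machine using GCC's "labels as values" extension.
   The opcodes are hypothetical and exist only for this sketch. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

static int
run (const unsigned char *pc)
{
  /* One label address per opcode, indexed by opcode value. */
  static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
  int stack[32];                /* no overflow checking in this sketch */
  int *sp = stack;              /* points one past the top of the stack */

/* Fetch the next opcode and jump directly to its handler. */
#define NEXT() goto *dispatch[*pc++]

  NEXT ();

 do_push:
  *sp++ = *pc++;                /* the operand is the byte after the opcode */
  NEXT ();
 do_add:
  sp--;
  sp[-1] += sp[0];
  NEXT ();
 do_mul:
  sp--;
  sp[-1] *= sp[0];
  NEXT ();
 do_halt:
  return sp[-1];

#undef NEXT
}
```

With the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, run() returns 20. Because this relies on a compiler extension, a portable build would still need a switch-based fallback.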
Martin
Hrvoje> BTW, how do you obtain gprof output for a single benchmark?
You build with -pg, run temacs, not xemacs, and run the benchmark in a
loop a bazillion times. My benchmark script, which has never seen the
light of day before, follows: