Martin Buchholz <martin(a)xemacs.org> writes:
>>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)srce.hr> writes:
Hrvoje> Martin Buchholz <martin(a)xemacs.org> writes:
>> For best optimization, you would store the arglist directly inside the
>> opaque object with the instructions for better locality of reference.
>> Same with the constant vector.
Hrvoje> Maybe. Maybe that's not really needed. For starters, I
Hrvoje> would like to see whether looping over the parameters (via
Hrvoje> LIST_LOOP_3) takes a measurable amount of time in the
Hrvoje> average funcall case. If yes, *any* optimization will help.
Hrvoje> If not, we needn't bother with locality.
LIST_LOOP_3 itself is pretty cheap.
Hmm, yes. It bothers my esthetic soul to know that every time a
function is called, its arglist is parsed in a vain search for
`&optional', `&rest', and so on. It somehow doesn't "feel
right".
Obviously, the yearnings of my soul needn't have anything to do with the
actual speed of the code. :-)
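The locality idea quoted above, storing the arglist and constant vector directly alongside the instructions, can be sketched roughly as follows. This assumes a simplified object model: `Obj`, `struct compiled_fn`, `make_compiled_fn()`, and the accessor functions are hypothetical names, not XEmacs's actual compiled-function definition. One allocation holds a small header, the arglist symbols, the constants, and the bytecode, so a funcall touches one contiguous block of memory.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Lisp_Object: an opaque pointer-sized word. */
typedef uintptr_t Obj;

/* Hypothetical compiled-function object.  Everything a funcall needs lives
   in one allocation: arglist symbols, then constants, then instructions. */
struct compiled_fn {
  size_t n_args;    /* number of arglist symbols */
  size_t n_consts;  /* number of constants */
  size_t n_insns;   /* number of bytecode bytes */
  Obj slots[];      /* n_args + n_consts words, followed by the bytecode */
};

static Obj *fn_args(struct compiled_fn *f) { return f->slots; }
static Obj *fn_consts(struct compiled_fn *f) { return f->slots + f->n_args; }
static unsigned char *fn_insns(struct compiled_fn *f)
{
  return (unsigned char *) (f->slots + f->n_args + f->n_consts);
}

/* Copy the three pieces into a single malloc'd block; NULL on failure.
   Callers must pass valid (non-NULL) arrays. */
static struct compiled_fn *
make_compiled_fn(const Obj *args, size_t n_args,
                 const Obj *consts, size_t n_consts,
                 const unsigned char *insns, size_t n_insns)
{
  struct compiled_fn *f =
    malloc(sizeof *f + (n_args + n_consts) * sizeof(Obj) + n_insns);
  if (!f)
    return NULL;
  f->n_args = n_args;
  f->n_consts = n_consts;
  f->n_insns = n_insns;
  memcpy(fn_args(f), args, n_args * sizeof(Obj));
  memcpy(fn_consts(f), consts, n_consts * sizeof(Obj));
  memcpy(fn_insns(f), insns, n_insns);
  return f;
}
```

Whether this pays off is exactly what the quoted reply questions: it only matters if arglist and constant access actually shows up in a profile.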
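To make the per-call cost concrete, here is a toy model of the loop being discussed. `TOY_LIST_LOOP_3` is a hypothetical macro loosely modeled on what LIST_LOOP_3 (elt, list, tail) does, running over hypothetical conses of C strings; the real code presumably compares symbols with EQ rather than `strcmp()`. The loop itself is one null test and two loads per element; the keyword comparisons are the part done "in vain" for the typical arglist with no `&optional` or `&rest`.

```c
#include <stddef.h>
#include <string.h>

/* Toy cons cell: car is a symbol name, cdr the rest of the list (NULL = nil).
   Real XEmacs conses hold tagged Lisp_Objects, not C strings. */
struct cons { const char *car; struct cons *cdr; };

/* Hypothetical macro loosely modeled on LIST_LOOP_3 (elt, list, tail):
   bind ELT to each car and TAIL to the current cell.  Each step is one
   null test plus loads of car and cdr, so the loop itself is cheap. */
#define TOY_LIST_LOOP_3(elt, list, tail) \
  for ((tail) = (list); (tail) && ((elt) = (tail)->car, 1); (tail) = (tail)->cdr)

struct arg_counts { int required, optional, has_rest; };

/* The per-call work being objected to: every element is compared against
   &optional and &rest, even though most arglists contain neither. */
static struct arg_counts
scan_arglist(struct cons *arglist)
{
  struct arg_counts c = {0, 0, 0};
  enum { REQ, OPT, REST } state = REQ;
  const char *elt;
  struct cons *tail;
  TOY_LIST_LOOP_3(elt, arglist, tail) {
    if (strcmp(elt, "&optional") == 0)
      state = OPT;
    else if (strcmp(elt, "&rest") == 0)
      state = REST;
    else if (state == REQ)
      c.required++;
    else if (state == OPT)
      c.optional++;
    else
      c.has_rest = 1;
  }
  return c;
}
```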
Hrvoje> How much of the rest of what this URL describes have you
Hrvoje> implemented? The specbind() stuff in the source looks
Hrvoje> pretty much like the stuff Ben says we should have. The
Hrvoje> same goes for the massaged byte-code stuff, as well as for
Hrvoje> inlined unbind_to().
The QUIT macro is no longer called.
The arglist is only checked for proper-list-ness once.
The other optimizations are not implemented.
Looking at the code, I cannot believe that neither the specbind()
optimization nor the unbind_to() optimization is implemented.
Perhaps, Hrvoje, you'd like to?
Why not?
Keep in mind that the average function has one or two arguments.
There is a danger that introducing too much machinery for argument
parsing will actually be counter-productive. For example, we could
store an arglist-parser function pointer in the compiled-function
object, but the function call overhead is likely to be as high as
parsing the arglist in the first place.
Not if the arglist is parsed only once, perhaps in
optimize_compiled_function(). All the other times, we have one
Lisp_Object pointer and two or three integers, and just use them. We
can even inline the call to Flist(), or whatever.
Another win from reorganizing the arglist is simply saving on cons
cells. (...)
Another good point, yes.
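A minimal sketch of the parse-once alternative discussed above, assuming the arglist is reduced ahead of time (the message suggests optimize_compiled_function() as one place) to a few integers. `struct arg_spec`, `precompute_arglist()`, and `bind_with_spec()` are hypothetical; the pointer to the parameter symbols is omitted for brevity, and the arglist is modeled as a NULL-terminated array of names. Every later call does only integer comparisons and copies, with no keyword scanning and no indirect call through a per-function parser pointer.

```c
#include <string.h>

typedef long Obj;      /* hypothetical stand-in for Lisp_Object */
#define NIL ((Obj) 0)  /* hypothetical stand-in for Qnil */

/* Everything a call needs to know about the arglist, computed once. */
struct arg_spec {
  int min_args;  /* required parameters */
  int max_args;  /* required + &optional parameters */
  int has_rest;  /* nonzero if &rest is present */
};

/* Done once, when the function is loaded or optimized. */
static struct arg_spec
precompute_arglist(const char *const *arglist)
{
  struct arg_spec s = {0, 0, 0};
  int in_optional = 0;
  for (; *arglist; arglist++) {
    if (strcmp(*arglist, "&optional") == 0) {
      in_optional = 1;
    } else if (strcmp(*arglist, "&rest") == 0) {
      s.has_rest = 1;
      break;  /* the single &rest parameter ends the list */
    } else {
      s.max_args++;
      if (!in_optional)
        s.min_args++;
    }
  }
  return s;
}

/* Done on every call: bind positional args into frame[0 .. max_args-1],
   padding missing optionals with NIL.  Returns how many arguments are
   left over for &rest, or -1 on an arity error. */
static int
bind_with_spec(const struct arg_spec *s, int nargs, const Obj *args, Obj *frame)
{
  int i;
  if (nargs < s->min_args || (!s->has_rest && nargs > s->max_args))
    return -1;
  for (i = 0; i < s->max_args && i < nargs; i++)
    frame[i] = args[i];
  for (; i < s->max_args; i++)
    frame[i] = NIL;
  return nargs > s->max_args ? nargs - s->max_args : 0;
}
```

For `(a &optional b c &rest d)` this yields min 1, max 3, and the rest flag set; a two-argument call fills `c` with `NIL` and leaves nothing over for `&rest`.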
Hrvoje> BTW, how do you obtain gprof output for a single benchmark?
You build with -pg, run temacs, not xemacs, and run the benchmark in a
loop a bazillion times.
Ugly, ugly, ugly... Wouldn't it be nice if gprof supported something
like gprof_start_recording() and gprof_stop_recording(), analogous to
what Quantify has?
On Linux, where the sources for gprof and libc are available, this might
even be implementable! Yay!
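What such a pair of calls would have to do can be sketched as a flag that the sampling hook checks before counting a sample. `gprof_start_recording()` and `gprof_stop_recording()` are the hypothetical names from the message, `record_sample()` is a hypothetical stand-in for the profiling-timer handler, and the small histogram stands in for gprof's real data structures. This is a toy under those assumptions, not a patch to gprof or libc.

```c
#include <signal.h>
#include <stddef.h>

#define N_BUCKETS 16

/* Hypothetical profiler state: a histogram of sampled locations, plus a
   switch saying whether samples should currently be kept.  The switch is
   a volatile sig_atomic_t because a real sampler reads it from a signal
   handler. */
static unsigned long histogram[N_BUCKETS];
static volatile sig_atomic_t recording = 0;

/* Hypothetical API in the spirit of Quantify's start/stop calls. */
static void gprof_start_recording(void) { recording = 1; }
static void gprof_stop_recording(void) { recording = 0; }

/* Stand-in for the profiling-timer handler: in a real -pg build the sample
   would come from the timer signal; here the caller passes the bucket.
   Samples taken outside a start/stop bracket are dropped. */
static void record_sample(size_t bucket)
{
  if (recording && bucket < N_BUCKETS)
    histogram[bucket]++;
}
```

With something like this, a benchmark could be bracketed by the two calls instead of being looped until it dominates the profile.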