>>>> "OG" == Olivier Galibert
<galibert(a)pobox.com> writes:
OG> On Wed, Feb 16, 2000 at 03:42:25AM -0800, Kyle Jones wrote:
> Martin Buchholz writes:
> > Wouldn't we get a tremendous performance boost by having the
> > lrecord_implementations_table indices be constant? We would lose a
> > level of indirection on every FOOP, because it would not have to
> > examine lrecord_implementations_table itself.
>
> Yes, this is a great idea. I think the reason the changeover to
> minimal tagbits was a wash speedwise was because of increased
> time spent type checking. We got rid of the pointer masking and
> unmasking, but almost all the type checks got more expensive.
OG> We need a game plan.
OG> On one hand, we have the possibility of implementing the paginated
OG> allocator. Type testing will become "*(blah **)(ptr & MASK) ==
OG> constant"[1]. This is mostly equivalent (one instruction difference)
OG> to the "*(byte *)(ptr) == constant" you'd like to have. A good thing
OG> with such a paginated allocator is that fixed block types will go, and
OG> lrecord-lists will probably become useless. Bad thing is that it will
OG> be a huge change in the portable dumper. Also, I'm not sure how much
OG> the overhead will be, but I'm afraid it may outweigh the space gain
OG> we would have. Also, we will have to kill all the flags.
No, the flags can also be placed at the beginning of the block.
We make a, say, 8k aligned block. At the beginning of the block, we
put a lisp object type identifier. Then we can have an array of
per-object flags. We can decide how many flags per object, but 8 is
probably good. We could choose to have the mark bits for all objects
together in a special mark-bit area, or all the per-object flags could
be put together into one byte. Since the mark-bits are only used at a
special time, it is more cache-friendly to isolate the mark-bits in
their own ghetto.
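A rough sketch of that block layout, under stated assumptions: every
name and size here (BLOCK_SIZE, OBJ_SIZE, N_OBJS, struct block, the
helper functions) is made up for illustration and is not XEmacs code.
It shows the type identifier at the start of an aligned 8k block, the
mark bits in their own area, and one byte of flags per object. It
assumes C11 for aligned_alloc and _Alignas.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names and sizes below are hypothetical. */
#define BLOCK_SIZE 8192u   /* block size, also the block alignment */
#define OBJ_SIZE   16u     /* every object in a block has the same size */
#define N_OBJS     448u    /* objects per block; leaves room for the header */

struct block {
    uint32_t type_id;                 /* one type identifier per block */
    uint8_t  mark_bits[N_OBJS / 8];   /* gc mark bits, isolated together */
    uint8_t  flags[N_OBJS];           /* 8 per-object flags, one byte each */
    _Alignas(16) unsigned char objs[N_OBJS][OBJ_SIZE];
};

_Static_assert(sizeof(struct block) <= BLOCK_SIZE, "header too large");

/* Mask off the low bits of an object pointer to find its block. */
struct block *block_of(const void *obj)
{
    return (struct block *)((uintptr_t)obj & ~(uintptr_t)(BLOCK_SIZE - 1));
}

/* Type test: one mask and one load, no per-object header needed. */
uint32_t type_of(const void *obj)
{
    return block_of(obj)->type_id;
}

/* Position of an object within its block's object array. */
size_t index_of(const void *obj)
{
    const struct block *b = block_of(obj);
    return (size_t)((const unsigned char *)obj - &b->objs[0][0]) / OBJ_SIZE;
}

void set_mark(const void *obj)
{
    size_t i = index_of(obj);
    block_of(obj)->mark_bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

int marked_p(const void *obj)
{
    size_t i = index_of(obj);
    return (block_of(obj)->mark_bits[i / 8] >> (i % 8)) & 1;
}

/* Allocate a zeroed block aligned to its own size. */
struct block *new_block(uint32_t type_id)
{
    struct block *b = aligned_alloc(BLOCK_SIZE, BLOCK_SIZE);
    if (b == NULL)
        return NULL;
    assert(((uintptr_t)b & (BLOCK_SIZE - 1)) == 0);
    memset(b, 0, sizeof *b);
    b->type_id = type_id;
    return b;
}
```

With this layout the mark phase only touches the small mark_bits area
of each block, and the per-object flags stay out of the way.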
OG> On the other hand, we can decide not to implement the allocator. I
OG> can hack a way to get the data sharing back which will cost only
OG> ~40Kbytes of memory at run time without, I hope, changing the speed of
OG> marking/unmarking in a noticeable manner. We can also add a small
OG> program to lib-src that will look at all the
OG> DEFINE_*_LRECORD_*_IMPLEMENTATION and generate a big enum in a file
OG> out of them.
We should continue to think about the best way to do this, _BUT_ we
should delay making any such major changes until after 21.2 is
released. Let's try to get our current xemacs releasable. It's been
too long since the last release. I suggest we do the constant type
optimization for 21.2 since it's quite safe and relatively easy to
implement, and delay rewriting the allocation system yet again till
later.
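To make the constant type optimization concrete, here is a minimal
sketch. Only lrecord_implementations_table is a name from the
discussion; the enum, the three example types, the header struct and
both predicates are hypothetical stand-ins, not the real definitions.

```c
#include <assert.h>
#include <string.h>

struct lrecord_implementation { const char *name; };

/* Hypothetical generated enum: one compile-time constant per type. */
enum lrecord_type {
    lrecord_type_symbol,
    lrecord_type_cons,
    lrecord_type_string,
    lrecord_type_count
};

const struct lrecord_implementation lrecord_symbol = { "symbol" };
const struct lrecord_implementation lrecord_cons   = { "cons" };
const struct lrecord_implementation lrecord_string = { "string" };

/* The table is still there for everything that needs the full record. */
const struct lrecord_implementation *
lrecord_implementations_table[lrecord_type_count] = {
    [lrecord_type_symbol] = &lrecord_symbol,
    [lrecord_type_cons]   = &lrecord_cons,
    [lrecord_type_string] = &lrecord_string,
};

/* Stand-in object header holding the type index. */
struct lrecord_header { unsigned char type; };

/* Check that goes through the table: an extra load and a pointer compare. */
int consp_via_table(const struct lrecord_header *h)
{
    return lrecord_implementations_table[h->type] == &lrecord_cons;
}

/* Check with constant indices: compare the header byte to a constant. */
int consp_via_constant(const struct lrecord_header *h)
{
    return h->type == lrecord_type_cons;
}
```

Both predicates give the same answer; the second one never reads the
table, which is the level of indirection we would lose.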
OG> Also, what would you think of:
OG> - killing basic_p and free
I don't really understand the `free' bit yet.
`basic_p' certainly is confusing. I imagine the `c' in lcrecord
stands for `chained', so we should have `chained_p' instead of
`basic_p'.
OG> - splitting the symbol_value_magic type in multiple types, separating
OG> in particular forwarders to C from the rest
Be careful here. The key piece of code you want to be fast is this in
bytecode.c:
  do_varref:
    {
      Lisp_Object symbol = constants_data[n];
      Lisp_Object value = XSYMBOL (symbol)->value;
      if (SYMBOL_VALUE_MAGIC_P (value))
        value = Fsymbol_value (symbol);
      PUSH (value);
      break;
    }
Make sure that SYMBOL_VALUE_MAGIC_P is as fast as possible, which
might be hard if magic values are multiple types. As I said before, I
think setting a bit in the symbol object itself is the way to go here
(and demoting magic values from Lisp_Object status). Allocated memory
that is never visible to the user, and always logically part of some
other object, should not be subject to gc. You want to keep the
objects examined by gc as small as possible. ... but after 21.2....
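A sketch of the bit-in-the-symbol idea, with every name invented for
illustration (value_t stands in for Lisp_Object, and the struct is not
the real symbol layout). The fast path tests one bit in the symbol and
never has to look at the type of the value. Only forwarding to a C
variable is modelled; the other kinds of magic values are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef long value_t;            /* hypothetical stand-in for Lisp_Object */

struct symbol {
    value_t value;               /* ordinary value slot */
    const value_t *fwd;          /* magic symbols: the C variable forwarded to */
    unsigned magic : 1;          /* set when the slow path must be taken */
};

/* Slow path: here it only follows a forward to a C variable. */
value_t slow_symbol_value(const struct symbol *s)
{
    return *s->fwd;
}

/* Fast path for a variable reference: one bit test on the symbol. */
value_t varref(const struct symbol *s)
{
    if (s->magic)
        return slow_symbol_value(s);
    return s->value;
}
```

The forwarding data is then plain memory owned by the symbol rather
than a separate object that gc has to examine.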
OG> [1] MASK=~((1<<PAGESIZE)-1), blah = lrecord_implementation *, constant =
OG> &lrecord_<type>