>>>>> "Marcus" == Marcus Crestani <crestani(a)informatik.uni-tuebingen.de> writes:
Marcus> Thanks for your patch, Jan. A few comments now, I'll take a
Marcus> closer look later.
>>>>> "JR" == Jan Rychter <jan(a)rychter.com> writes:
JR> 3. The gc_checking_assert in get_mark_bit really hurts. I'd suggest
JR> that it be removed as soon as we know it doesn't get hit under normal
JR> circumstances.
Marcus> It should stay in for development builds. When you profile, it
Marcus> would be much better to recompile XEmacs with
Marcus> `--disable-error-checking'. This removes a lot of time
Marcus> overhead (it also removes the costly gc_checking_asserts) and
Marcus> brings better results.
It was a build without error checking. Or at least that's what I
thought. I was actually surprised to find that this assert did not go
away in builds without error checking and I thought this was a mistake.
I'll go back and redo the builds and the profiling over the weekend, to
double check this.
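
To make clear what I expected to happen, here is a minimal sketch of an
assert that compiles away when error checking is off. The names
(MY_ERROR_CHECK_GC, my_gc_checking_assert, my_get_mark_bit) and the toy
bitmap are hypothetical stand-ins, not the actual XEmacs definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: define MY_ERROR_CHECK_GC for development builds. */
#ifdef MY_ERROR_CHECK_GC
# define my_gc_checking_assert(ex) assert (ex)
#else
/* Expands to a no-op; the expression is not evaluated at all. */
# define my_gc_checking_assert(ex) ((void) 0)
#endif

#define N_OBJECTS 64

/* Toy mark bitmap: one bit per object. */
static unsigned char mark_bits[N_OBJECTS / 8];

/* Return the mark bit of object number IDX. */
static int
my_get_mark_bit (size_t idx)
{
  my_gc_checking_assert (idx < N_OBJECTS);
  return (mark_bits[idx / 8] >> (idx % 8)) & 1;
}

/* Set the mark bit of object number IDX. */
static void
my_set_mark_bit (size_t idx)
{
  my_gc_checking_assert (idx < N_OBJECTS);
  mark_bits[idx / 8] |= (unsigned char) (1u << (idx % 8));
}
```

Without MY_ERROR_CHECK_GC defined, the check should cost nothing in the
hot path, which is why seeing it in the profile surprised me.
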
Marcus> How big is the performance benefit of your patch in a
Marcus> `--disable-error-checking' build?
JR> It will only work for one mark bit per object. Do you actually plan
JR> to use more? We will likely lose a lot of the efficiency if we
JR> conditionalize this.
Marcus> The incremental garbage collector needs two mark bits, so we
Marcus> have to move the focus on finding optimizations for the two-bit
Marcus> case.
Ok, in that case I'll have to rework the whole scheme
significantly. Does this mean I can stick to optimizing the two-bit
case, or will we need to support a different number of bits soon?
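
As a starting point for discussion, here is a rough sketch of a table
with two mark bits per object, packed four objects to a byte. Everything
in it is an assumption for illustration: the names, the table size and
the white/grey/black values are mine, not taken from the XEmacs sources.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_OBJECT 2
#define OBJECTS_PER_BYTE (8 / BITS_PER_OBJECT)
#define MARK_MASK 3u
#define N_OBJECTS 64

/* Hypothetical colour encoding for the two bits. */
enum mark_color { MARK_WHITE = 0, MARK_GREY = 1, MARK_BLACK = 2 };

/* Two bits per object, four objects per byte. */
static unsigned char mark_table[N_OBJECTS / OBJECTS_PER_BYTE];

/* Return the two mark bits of object number IDX. */
static unsigned int
get_mark (size_t idx)
{
  unsigned int shift
    = (unsigned int) (idx % OBJECTS_PER_BYTE) * BITS_PER_OBJECT;

  assert (idx < N_OBJECTS);
  return (mark_table[idx / OBJECTS_PER_BYTE] >> shift) & MARK_MASK;
}

/* Store VALUE (0..3) as the mark bits of object number IDX. */
static void
set_mark (size_t idx, unsigned int value)
{
  unsigned int shift
    = (unsigned int) (idx % OBJECTS_PER_BYTE) * BITS_PER_OBJECT;
  unsigned char *cell = &mark_table[idx / OBJECTS_PER_BYTE];

  assert (idx < N_OBJECTS && value <= MARK_MASK);
  /* Clear the old two bits, then OR in the new ones. */
  *cell = (unsigned char) ((*cell & ~(MARK_MASK << shift))
                           | (value << shift));
}
```

Because the bit count is a compile-time constant, the divisions and
modulo operations on the unsigned index reduce to shifts and masks. That
is what we would lose if the number of bits became a run-time parameter.
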
JR> After the optimizations, the worst performance loss occurs in
JR> kkcc_marking, and that's because of pipeline stalls due to
JR> mispredicted branches. I don't know what we could do about this,
JR> other than rewrite the switch as a series of conditionals ordered
JR> so that branch prediction can do a better job.
Marcus> My todo list contains the following: A basic re-design of the
Marcus> object layout by separating pointer-containing cells from
Marcus> no-pointer opaque data would speed up the mark phase. This
Marcus> way, the traversal has no need to parse an object, it can assume
Marcus> that every cell in the pointer part of the object has to be
Marcus> examined. So no switch is needed at all.
That would be a big improvement indeed, especially on architectures with
long pipelines (P4).
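
If I understand the idea correctly, the mark loop could then look
roughly like the sketch below. The struct layout, the names and the
fixed cell limit are hypothetical and only meant to show the shape of
the traversal. A real collector would also use an explicit mark stack
rather than recursion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CELLS 4

/* Hypothetical object layout: pointer cells first, opaque data after. */
struct obj
{
  int marked;
  size_t n_pointers;             /* number of pointer cells in use */
  struct obj *cells[MAX_CELLS];  /* pointer part: every cell is examined */
  char payload[16];              /* no-pointer part: never examined */
};

/* Mark O and everything reachable from it.  There is no per-type
   switch: the loop walks the pointer part and ignores the payload. */
static void
mark_object (struct obj *o)
{
  size_t i;

  if (o == NULL || o->marked)
    return;
  assert (o->n_pointers <= MAX_CELLS);
  o->marked = 1;
  for (i = 0; i < o->n_pointers; i++)
    mark_object (o->cells[i]);
}
```

The only data-dependent branches left are the NULL/already-marked test
and the loop bound, which should be much easier on the branch predictor
than a switch over cell types.
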
--J.