Kyle Jones <kyle_jones(a)wonderworks.com> writes:
> I'm not sure why you would expect memcmp to be faster.
Because the current code is basically an unrolled memory compare loop.
I think the compiler can do better on a real memcmp[1] and in addition the
function becomes simple enough to inline.
I checked that with my changes, compare_runes is indeed inlined.
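To make the comparison concrete, here is a minimal sketch of the two styles. The struct and its fields are hypothetical stand-ins, not the real rune structure, and the names compare_runes_fields, compare_runes_memcmp and check_compare_runes are made up for illustration. Note that memcmp also compares any padding bytes, so this only works if padding is absent or zeroed consistently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the redisplay rune structure;
   the real struct has different fields. */
struct rune {
    int xpos;
    int width;
    int face_index;
    int glyph;
};

/* Old style: hand-unrolled, field-by-field comparison. */
static int compare_runes_fields(const struct rune *a, const struct rune *b)
{
    return a->xpos == b->xpos
        && a->width == b->width
        && a->face_index == b->face_index
        && a->glyph == b->glyph;
}

/* memcmp style: small enough to inline, and a compiler with a
   builtin memcmp may expand it in place. Padding bytes are compared
   too; this struct of four ints has none on typical ABIs. */
static inline int compare_runes_memcmp(const struct rune *a,
                                       const struct rune *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}

/* Usage: both comparators should agree on equal and unequal runes. */
static void check_compare_runes(void)
{
    struct rune a, b;

    memset(&a, 0, sizeof a);        /* zero everything, padding included */
    a.xpos = 10;
    a.width = 8;
    a.face_index = 1;
    a.glyph = 'x';
    memcpy(&b, &a, sizeof b);       /* byte-for-byte copy */

    assert(compare_runes_fields(&a, &b));
    assert(compare_runes_memcmp(&a, &b));

    b.glyph = 'y';
    assert(!compare_runes_fields(&a, &b));
    assert(!compare_runes_memcmp(&a, &b));
}
```

Whether the memcmp version actually wins depends on the compiler expanding memcmp inline rather than emitting a library call.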
> bits that the old inlined code did. If there's a machine instruction
> that lets you do memcmp in an instruction or two, you will save some
> time, but given chip speeds today, most of the time is probably
> spent in the memory fetches. You can't get around the fetches.
[When gcc uses its builtin memcmp it uses a very tight loop]
I think I was hoping that, since the fetches are basically linear through
memory (the inner loop in redisplay output is comparing two arrays),
the prefetching would help.
Maybe we can force the compiler to inline compare_runes some other way.
(On a machine where memcmp is a function call, the memcmp trick doesn't
work: it just replaces one function call with another.)
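One possible way to force the inlining, sketched under assumptions: gcc accepts __attribute__((always_inline)) on a function, so a field-by-field compare_runes could be kept and still be expanded into the loop. The FORCE_INLINE macro, the two-field struct and count_matching_runes are hypothetical names for illustration only; other compilers fall back to a plain "static inline" hint here.

```c
#include <string.h>

/* gcc-specific attribute; elsewhere this is only a hint. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Hypothetical, cut-down rune structure. */
struct rune {
    int xpos;
    int glyph;
};

/* Field-by-field compare, no memcmp involved, so it does not
   depend on the compiler having a builtin memcmp. */
FORCE_INLINE int compare_runes(const struct rune *a, const struct rune *b)
{
    return a->xpos == b->xpos && a->glyph == b->glyph;
}

/* Stand-in for the redisplay inner loop: walk two arrays linearly
   and count the positions where the runes are identical. */
static int count_matching_runes(const struct rune *old_line,
                                const struct rune *new_line, int n)
{
    int i, same = 0;

    for (i = 0; i < n; i++)
        same += compare_runes(&old_line[i], &new_line[i]);
    return same;
}
```

This keeps the call site unchanged, so it can be compared against the memcmp variant just by looking at the generated assembly.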
Jan
Footnotes:
[1] i.e. I am betting the compiler uses a builtin memcmp. However,
looking at the SPARC output, it seems that gcc doesn't always do that.