Hrvoje had trouble believing that the Mule penalty is only 30%.
Obviously it depends on what you're doing.
There will be places in the sources where switching from non-mule to
mule will change an order-N algorithm to an order-N*N algorithm. I
fixed such an algorithm in casefiddle.c. I am sure there are others.
It is quite likely that using a fixed-width internal representation
for characters instead of the current variable-width one is the right
long-term strategy.
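
To make the order-N versus order-N*N point concrete, here is a minimal sketch. It is not the casefiddle.c code. It assumes UTF-8 as a stand-in for the variable-width representation (the real Mule encoding differs in detail), and all function names are made up for illustration.

```python
"""Sketch: character indexing in a variable-width buffer.

Assumption: UTF-8 stands in for the variable-width internal
representation.  All names here are hypothetical.
"""


def char_width(lead_byte):
    """Number of bytes in the UTF-8 character that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte < 0xE0:
        return 2
    if lead_byte < 0xF0:
        return 3
    return 4


def charpos_to_bytepos(buf, charpos):
    """Byte offset of character number charpos, found by scanning from
    the start of the buffer.  Cost grows with charpos."""
    bytepos = 0
    for _ in range(charpos):
        bytepos += char_width(buf[bytepos])
    return bytepos


def walk_by_index(buf, nchars):
    """Visit every character through its index.  Each lookup rescans
    from the start, so the whole walk is order N*N."""
    starts, steps = [], 0
    for i in range(nchars):
        starts.append(charpos_to_bytepos(buf, i))
        steps += i  # characters stepped over by this lookup
    return starts, steps


def walk_incrementally(buf, nchars):
    """Visit every character by advancing a byte position: order N."""
    starts, steps, bytepos = [], 0, 0
    for _ in range(nchars):
        starts.append(bytepos)
        bytepos += char_width(buf[bytepos])
        steps += 1
    return starts, steps


def fixed_width_bytepos(charpos, width=4):
    """With a fixed-width representation the lookup is plain arithmetic."""
    return charpos * width


text = "na\u00efve caf\u00e9 " * 50
buf = text.encode("utf-8")
n = len(text)

slow_starts, slow_steps = walk_by_index(buf, n)
fast_starts, fast_steps = walk_incrementally(buf, n)

# Both walks find the same character boundaries; only the work differs.
print(slow_starts == fast_starts)
print(n, fast_steps, slow_steps)
```

The indexed walk does N*(N-1)/2 steps where the incremental walk does N. A loop that was written against one-byte characters and indexes by character position falls into the first pattern as soon as characters stop being one byte wide. A fixed-width representation removes the scan altogether, at the cost of more memory per character.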
Here are some benchmark results:

Pentium Pro, pgcc -mcpu=pentiumpro -march=pentiumpro -O6
  -fno-risc -fno-peep-spills -fno-omit-frame-pointer -fno-exceptions

             bytecomp    font-lock
---------------------------------------
Latin-1        42.8        40.3
Mule           75.5        43.7
Latin-1-M      31.4        40.5
Mule-M         38.4        41.6

Ultrasparc, Sun cc -fast -xO5

             bytecomp    font-lock
---------------------------------------
Latin-1        31.0        30.9
Mule           51.8        32.5
Latin-1-M      21.7        31.2
Mule-M         25.9        34.3

Ultrasparc, egcs -O3 -mcpu=ultrasparc

             bytecomp    font-lock
---------------------------------------
Latin-1        31.7        26.5
Mule           55.5        28.0
Latin-1-M      22.0        25.7
Mule-M         25.7        27.5

The bytecomp benchmark byte-compiles simple.el 20 times.
The font-lock benchmark runs font-lock-fontify-buffer on redisplay.c 3 times.
The -M suffix means `hacked by Martin', i.e. built with my changes to
speed up bytecomp.
Smaller numbers are better.
Only relative values count.

So there may be a significant Mule performance penalty, or maybe not,
depending on what you use XEmacs for: in these runs it is large for
bytecomp and small for font-lock.
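
Since only relative values count, the penalty can be read off the tables as the ratio of Mule time to Latin-1 time. The sketch below does that arithmetic. The numbers are copied from the tables above; the platform labels and function names are only illustrative.

```python
# Timings copied from the tables above, as (bytecomp, font-lock).
RESULTS = {
    "Pentium Pro, pgcc": {
        "Latin-1": (42.8, 40.3), "Mule": (75.5, 43.7),
        "Latin-1-M": (31.4, 40.5), "Mule-M": (38.4, 41.6),
    },
    "Ultrasparc, Sun cc": {
        "Latin-1": (31.0, 30.9), "Mule": (51.8, 32.5),
        "Latin-1-M": (21.7, 31.2), "Mule-M": (25.9, 34.3),
    },
    "Ultrasparc, egcs": {
        "Latin-1": (31.7, 26.5), "Mule": (55.5, 28.0),
        "Latin-1-M": (22.0, 25.7), "Mule-M": (25.7, 27.5),
    },
}


def penalty_percent(mule, latin1):
    """Extra time the Mule build takes, as a percentage of Latin-1 time."""
    return (mule / latin1 - 1.0) * 100.0


def mule_penalties(results):
    """Return {(platform, mule build, benchmark): penalty in percent}."""
    out = {}
    for platform, rows in results.items():
        for base, mule in (("Latin-1", "Mule"), ("Latin-1-M", "Mule-M")):
            for col, bench in enumerate(("bytecomp", "font-lock")):
                out[(platform, mule, bench)] = penalty_percent(
                    rows[mule][col], rows[base][col])
    return out


penalties = mule_penalties(RESULTS)
for (platform, build, bench), pct in penalties.items():
    print(f"{platform:20s} {build:7s} {bench:10s} {pct:5.1f}%")
```

With these inputs the bytecomp penalty comes out at roughly 67 to 76 percent for the unmodified builds and roughly 17 to 22 percent for the -M builds, while the font-lock penalty stays at or below about 10 percent in every row.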
There are some interesting conclusions from the above:
- Sun cc wins on some benchmarks, egcs on others.
- The changes I have made to speed up bytecomp actually slow down
  font-lock a little, but only when using Sun cc (??).  It is
  fortuitous that this accidental change worked this way, since we
  ought to be targeting x86 and egcs as the architecture and compiler
  that more of our users will be using.
- Benchmarking is hard.
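
The second point can be checked against the tables by comparing each font-lock time with its -M counterpart. A small sketch of that comparison follows; the numbers are copied from the tables above, and the short compiler labels and names are only illustrative.

```python
# font-lock timings from the tables above, as (plain build, -M build).
FONT_LOCK = {
    ("pgcc", "Latin-1"): (40.3, 40.5),
    ("pgcc", "Mule"): (43.7, 41.6),
    ("Sun cc", "Latin-1"): (30.9, 31.2),
    ("Sun cc", "Mule"): (32.5, 34.3),
    ("egcs", "Latin-1"): (26.5, 25.7),
    ("egcs", "Mule"): (28.0, 27.5),
}


def change_percent(before, after):
    """Positive means the -M build is slower, negative means faster."""
    return (after / before - 1.0) * 100.0


changes = {key: change_percent(*pair) for key, pair in FONT_LOCK.items()}

for (compiler, build), pct in changes.items():
    print(f"{compiler:7s} {build:8s} {pct:+5.1f}%")
```

Both Sun cc rows come out slower and both egcs rows come out faster. The pgcc Latin-1 row also moves by about half a percent in the slower direction, which may well be measurement noise.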

I've been concentrating lately on speeding up the bytecomp benchmark,
since it is one of the purest tests of the Lisp engine.

Martin