I think you should still put in that configure option, but make it take a flag, e.g.
--enable-profiling=gprof, because different code is needed for different profiling systems
(Quantify,
Sun's Analyzer, etc.). I don't trust claims at all about pdumper making the need
for this to go away
-- not in the general case, certainly.
Jan Vroonhof wrote:
Ben Wing <ben(a)666.com> writes:
> Why don't you post exactly the steps you took in order to get the
> profiling output, as well as the relevant part of the output itself?
> I think people are scared off of profiling because XEmacs is big and
> does nonstandard things with its executable, and they don't want to
> figure out how to do it.
I had originally planned to do make a --enable-profiling configure
option. Because now you have to hand edit the makefile to make sure
the right startup code gets linked in.
Maybe I should indeed just do a Q&D README becuase the pdumper will
make all linking magic unnecessary hopefully.
> I really really think you should implement your comment about
> memcmp(). Do something like this:
It is on the todo list for one of those long winter nights
(Unfortunately, "Write thesis" is also on that list).
Let me take this opportunity to discuss some things.
grprof data from doing (hanoi 6) 100-times
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
19.87 128.23 128.23 6977571 0.02 0.03 create_text_block
16.35 233.71 105.48 577884641 0.00 0.00 add_emchar_rune
15.36 332.81 99.10 543668585 0.00 0.00 compare_runes
6.99 377.94 45.13 544418806 0.00 0.00 display_table_entry
3.29 399.16 21.22 6977832 0.00 0.00 compare_display_blocks
2.89 417.78 18.62 715922 0.03 0.04 generate_fstring_runes
1.76 429.15 11.37 6978119 0.00 0.00 output_display_line
1.44 438.44 9.29 14864431 0.00 0.00 specifier_instance
1.36 447.23 8.79 23712557 0.00 0.00 specifier_instance_from_inst_list
1.06 454.08 6.85 44087313 0.00 0.00 find_symbol_value_1
Notes
1. add_emchar_rune and compare_runes are called approximately equal
numbers of times. So we must take care not to add too much
overhead to the one to optimize the other.
2. The self_seconds data seems fairly inaccurate and/or I am
interpreting the data wrong. Here is the callgraph data for
add_emchar_rune
33107464 generate_fstring_runes <cycle 1>
[16]
544777177 create_text_block <cycle 1> [5]
[8] 16.3 105.48 0.00 577884641 add_emchar_rune <cycle 1> [8]
0.00 0.00 362/980266 Dynarr_resize [173]
9125117 ensure_face_cachel_contains_charset
<cycle 1> [86]
Note how its children takeup almost no time, so where did the CPU time
go...
Now to completely differnt data. This is from gcov (compile using
--profile-arcs). This was a normal news reading and writing session
using Gnus.
This is in point_in_line_start_cache from redisplay.c
1563 if (!Dynarr_length (cache) || line_start_cache_start (w) > point
branch 0 taken = 37%
call 1 returns = 100%
branch 2 taken = 1%
call 3 returns = 100%
branch 4 taken = 0%
branch 5 taken = 100%
|| line_start_cache_end (w) < point)
{
589 int loop;
Notice that in almost 40% of the cases, there was no cache present.
Note also the callcount of the function and compare that with the
callcount in redisplay_window.
/* Don't pollute the cache if not sure if we are correct */
8843 if (w->start_at_line_beg)
branch 0 taken = 0%
8842 update_line_start_cache (w, start, end, pointm, 1);
call 0 returns = 100%
Is the line_start_cache being ineffective?
Jan
--
ben
--
In order to save my hands, I am cutting back on my responses, especially to XEmacs-related
mail. You
_will_ get a response, but please be patient. If you need an immediate response and itÂ’s
not apparent
in
your message, please say so. Thanks for your understanding.