New subject: 10% redisplay speed up...

Monday, 22 November 1999

        Ben Wing <ben(a)666.com&gt; writes:

...
 Why don't you post exactly the steps you took in order to get
the
 profiling output, as well as the relevant part of the output itself?
 I think people are scared off of profiling because XEmacs is big and
 does nonstandard things with its executable, and they don't want to
 figure out how to do it. 
I had originally planned to do make a --enable-profiling configure
option. Because now you have to hand edit the makefile to make sure
the right startup code gets linked in.

Maybe I should indeed just do a Q&D README becuase the pdumper will
make all linking magic unnecessary hopefully.

...
 I really really think you should implement your comment about
 memcmp(). Do something like this: 
It is on the todo list for one of those long winter nights
(Unfortunately, "Write thesis" is also on that list).

Let me take this opportunity to discuss some things.

grprof data from doing (hanoi 6) 100-times

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 19.87    128.23   128.23  6977571     0.02     0.03  create_text_block
 16.35    233.71   105.48 577884641     0.00     0.00  add_emchar_rune
 15.36    332.81    99.10 543668585     0.00     0.00  compare_runes
  6.99    377.94    45.13 544418806     0.00     0.00  display_table_entry
  3.29    399.16    21.22  6977832     0.00     0.00  compare_display_blocks
  2.89    417.78    18.62   715922     0.03     0.04  generate_fstring_runes
  1.76    429.15    11.37  6978119     0.00     0.00  output_display_line
  1.44    438.44     9.29 14864431     0.00     0.00  specifier_instance
  1.36    447.23     8.79 23712557     0.00     0.00  specifier_instance_from_inst_list
  1.06    454.08     6.85 44087313     0.00     0.00  find_symbol_value_1

Notes
1.  add_emchar_rune and  compare_runes are called approximately equal
    numbers of times. So we must take care not to add too much
    overhead to the one to optimize the other.
2. The self_seconds data seems fairly inaccurate and/or I am
   interpreting the data wrong. Here is the callgraph data for
   add_emchar_rune

                             33107464             generate_fstring_runes <cycle 1>
[16]
                             544777177             create_text_block <cycle 1> [5]
[8]     16.3  105.48    0.00 577884641         add_emchar_rune <cycle 1> [8]
                0.00    0.00     362/980266      Dynarr_resize [173]
                             9125117             ensure_face_cachel_contains_charset
<cycle 1> [86]

Note how its children takeup almost no time, so where did the CPU time
go... 

Now to completely differnt data. This is from gcov (compile using
--profile-arcs). This was a normal news reading and writing session
using Gnus.

This is in point_in_line_start_cache from redisplay.c

        1563      if (!Dynarr_length (cache) || line_start_cache_start (w) > point
branch 0 taken = 37%
call 1 returns = 100%
branch 2 taken = 1%
call 3 returns = 100%
branch 4 taken = 0%
branch 5 taken = 100%
		      || line_start_cache_end (w) < point)
		    {
         589          int loop;

Notice that in almost 40% of the cases, there was no cache present.
Note also the callcount of the function and compare that with the
callcount in redisplay_window.

		      /* Don't pollute the cache if not sure if we are correct */
        8843          if (w->start_at_line_beg)
branch 0 taken = 0%
        8842    	update_line_start_cache (w, start, end, pointm, 1);
call 0 returns = 100%

Is the line_start_cache being ineffective?

Jan

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

Re: 10% redisplay speed up...