I have been looking back at the recent discussions about the
slowness of font-lock and syntax properties, and I
didn't see any conclusion being reached. If one was, I missed it,
and this will hopefully still be a decent summary.
The evidence:
[Eparnaud]
1. Load the TraverseSchema.java file from Xerces that Stefan
Eparnaud posted to the list in xemacs-vanilla
2. M-x font-lock-fontify-buffer
3. M-x goto-char 413514
4. C-j
This reindents a function argument spilled over to the next line.
On my box (with some minor improvements mentioned below):
lookup-syntax-properties == t : 13.5 seconds
lookup-syntax-properties == nil : 1.4 seconds
[James]
Jerry James reports slow font-locking of a file of his. From the
profile he posted:
4 sys_re_match_2 <cycle 2> [1860]
6 scan_words <cycle 2> [1773]
22536 sys_re_search_2 <cycle 2> [162]
39182 find_end_of_comment <cycle 2> [179]
101547 scan_sexps_forward <cycle 2> [147]
14448488 re_match_2_internal <cycle 2> [8]
36.3 0.59 5.69 14611763 update_syntax_cache <cycle 2> [5]
0.04 2.78 819566/819566 Fprevious_extent_change [11]
0.05 2.67 819566/819566 Fnext_extent_change [12]
0.09 0.00 819566/819616 Fsyntax_table_p [67]
0.06 0.00 2431641/2441538 make_buffer [74]
819566 Fget_char_property <cycle 2> [64]
update_syntax_cache is called a whopping 14 million times from
re_match_2_internal. Presumably it is backtracking a regexp with
some syntax-class matching in it.
update_syntax_cache basically:
- Computes the Qsyntax_table property using Fget_char_property
- Estimates the boundaries of the region where this table is valid
using Fprevious_extent_change and Fnext_extent_change
(note that this is a vast underestimate; most of the time the
true valid region is the whole buffer, (point-min) to (point-max)).
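For concreteness, the logic amounts to something like the following. This is a toy Python model of the C code, not the actual implementation; extents are modelled as simple (start, end, properties) triples, and the function names mirror the C ones only loosely:

```python
# Toy model of update_syntax_cache. Positions are 0-based, ends exclusive.
# An "extent" here is just a (start, end, props) triple.

def get_char_property(extents, pos, prop):
    """Model of Fget_char_property: value of `prop` at `pos`."""
    for start, end, props in extents:
        if start <= pos < end and prop in props:
            return props[prop]
    return None

def extent_changes(extents, buffer_end):
    """All positions where some extent starts or ends."""
    changes = {0, buffer_end}
    for start, end, _ in extents:
        changes.update((start, end))
    return sorted(changes)

def update_syntax_cache(extents, pos, buffer_end):
    """Recompute the cached value and its validity window.  The window is
    bounded by the nearest extent changes on either side, which vastly
    underestimates the true validity region: ANY extent boundary narrows
    it, even one carrying no syntax-table property at all."""
    value = get_char_property(extents, pos, 'syntax-table')
    changes = extent_changes(extents, buffer_end)
    prev = max(c for c in changes if c <= pos)   # Fprevious_extent_change
    nxt = min(c for c in changes if c > pos)     # Fnext_extent_change
    return {'value': value, 'start': prev, 'end': nxt}
```

For example, with an unrelated font-lock extent over (10, 20) in a 1000-character buffer, a lookup at position 15 gets the narrow window (10, 20) even though the syntax-table property is nil over the whole buffer.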
I have got a patch that cleans things up a bit:
- removes some duplication from the UPDATE_SYNTAX_CACHE macros
- checks for the valid region in the macros to avoid the function call
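The effect of that macro-level check can be modelled like this (again a Python sketch; the class and its fixed-width window are my invention, standing in for the inline range test the macros would do before calling the C function):

```python
# Sketch of the UPDATE_SYNTAX_CACHE fast path: only fall into the
# (expensive) update function when pos leaves the cached window.

class SyntaxCache:
    def __init__(self):
        self.start = self.end = -1   # empty window: first lookup misses
        self.value = None
        self.updates = 0             # how often the slow path ran

    def _update(self, pos):
        # Stand-in for update_syntax_cache(); here the new validity
        # window is simply a fixed-width run starting at pos.
        self.updates += 1
        self.start, self.end = pos, pos + 100
        self.value = 'word'          # dummy syntax class

    def lookup(self, pos):
        # The macro's inline check: no function call inside the window.
        if not (self.start <= pos < self.end):
            self._update(pos)
        return self.value
```

With this check, scanning 1000 consecutive positions triggers only 10 slow-path updates instead of 1000 function calls.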
Some other optimizations that could be made:
- Use the lower-level functions directly, avoiding needless
make_buffer calls (I have already removed most of those), etc.
- Most users either deal in Byteind's (regex.c) or have them readily
available (everybody else uses BUF_FETCH_CHAR close by). So it
would avoid a lot of useless conversion if the cache boundaries
were Byteind's.
However, that is all just tweaking; to speed this up, more
fundamental changes are needed. Any ideas?
We could:
1. make it a real cache, i.e. keep an array of (consecutive) positions
and their syntax table property values.
2. Look into speeding up the extent-manipulation routines, maybe
providing a more specialized version for the syntax cache, caching some
of the soe data etc., the same as is done for redisplay.
3. If I recall correctly, syntax-property extents cannot overlap;
therefore the boundaries of the extent where we found the current
property value are good values for the cache boundaries, and much more
likely to be bigger (possibly even the whole buffer). It would be a
pity because it would make this not a real extent property.
4. Maybe it is worth searching more aggressively for the real
boundary with a specialized extent walker. That could be slightly
slower per call, but if it could reduce the 900000 calls to the update
routines to just a few by making the cache work better, it would pay
off quickly.
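Idea 1, for instance, could look roughly like this (a Python sketch under assumed data structures; the real thing would be C, and one extent walk would fill the whole run rather than the per-position lookup used here for brevity):

```python
# Sketch of idea 1: cache syntax-table values for a run of consecutive
# positions, refilling the array once per run instead of consulting the
# extent machinery once per position.

class RunCache:
    def __init__(self, lookup, run_length=512):
        self.lookup = lookup          # the expensive per-position lookup
        self.run_length = run_length
        self.base = -1
        self.values = []
        self.fills = 0                # how many times we refilled

    def value_at(self, pos):
        if not (self.base <= pos < self.base + len(self.values)):
            # Miss: refill the array starting at pos.
            self.fills += 1
            self.base = pos
            self.values = [self.lookup(p)
                           for p in range(pos, pos + self.run_length)]
        return self.values[pos - self.base]
```

A forward scan over 1024 positions then refills only twice, so even an expensive fill amortizes well for the sequential access pattern regexp matching produces.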
Note that a simple
(defun walk-extents ()
  (interactive)
  (let ((pos (point))
        (no 0))
    (while (not (eq pos (point-max)))
      (setq pos (next-extent-change pos))
      (incf no))
    (message "%s" no)))
walks TraverseSchema.java's 22000 extents in just 5 ms!
Jan