[Warning: This patch is wrong it does not handle modifying buffers
properly. But at least I wanted to show you the intermediate results
as I am working only very slowly.]
Bits of this probably belong -design... Anyway here is a little
history of the patch
Summary of the problem: In buffers with a lot of extents syntax
related operations such as font-locking with lot of regexps with
syntax codes in them or reindenting becomes unbearably slow.
The syntax cache maintains an interval saying where the cached value
is valid. It used to compute this with Fnext/Fprev_extent_change which
in heavily font-locked buffers gives quite a small range. In a the
typical case (syntax properties not used) the real range should stretch the
whole buffer!
This means that the cache can often not be used and needs to be
recomputed roughly 10000000 times during a single font-lock run for
Jerry James.
I have been using Stefan Epardaud's TraverseSchema.java example where
a single indent of 1 line costs about 15 seconds on my 950 Mhz athlon.
I think Java files hit this thing (and emacs in general) pretty hard
because they have so few things at top-level.
This change contains the following changes
1. Attempts to streamline cache usage
Made virtually no difference, but does clean-up the code.
In a Mule build there is potential for not using Bufpos as the
cache units.
2. When computing a new range value see whether it can be merged with
the old. Made only a little difference, the reindenting code jumps
around to irregurlarly.
3. A new routine to compute a very good estimate for the validity
range. For the typical case this gives the whole buffer as a range.
It still is quite quick. It currently isn't as quick as I hoped (i.e.
no more expensive than the old code). Unfortunately this change
make things slower in general because the syntax cache is thrown away
so often (thousands of times during a simple indent).
4. A quick hack to not throw away the cache so often. This still needs
some work, such as invalidating on the cache on modifications and
keeping a cache per buffer so that quick regexps on strings don't
trash it.
With 4. in place the changes in 3 finally start paying off. For
instance Stefan's indent case is now 1.5 seconds. An order of
magnitude faster.
Please do a few test-runs with this patch to see how it "feels".
It may also be advisable to do some work on extent-property storage to
make finding extents with a certain property much faster without
growing the structure even more.
src/ChangeLog addition:
2002-04-13 Jan Vroonhof <jan(a)xemacs.org>
* extents.c:
* extents.c (get_text_property_bytind):
* extents.c (get_text_property_1):
* extents.c (put_text_prop_mapper):
* extents.c (put_text_prop):
New 'range' argument in get_text_property_bytind.
* extents.c (get_text_propery_range):
New function, lookup a property and an estimate of its
validity range. Real work done in
* extents.h:
export get_text_propery_range
* syntax.c (update_syntax_cache):
Use get_text_propery_range to approximate the cached
region much better. Attempt to merge old and new cache.
* syntax.h:
* syntax.h (SYNTAX_CACHE_VALID):
* syntax.h (UPDATE_SYNTAX_CACHE_FORWARD):
* syntax.h (UPDATE_SYNTAX_CACHE):
* syntax.h (SETUP_SYNTAX_CACHE):
* syntax.h (SETUP_SYNTAX_CACHE_FOR_BUFFER):
* syntax.h (SETUP_SYNTAX_CACHE_FOR_OBJECT):
Common-up some macros.
Make the region be [prev,next) instead of (prev,next)
Handle lookup-syntax-properties == nil by creating a fake
cache that is always valid.
Q&D hack to only invalidate the cache on true buffer
switches. Does not handle modifications
XEmacs21.4 Patch (cvs -q diff -u):
Index: src/extents.c
===================================================================
RCS file: /local/xemacs/cvs/XEmacs/xemacs/src/extents.c,v
retrieving revision 1.31.2.1
diff -u -u -r1.31.2.1 extents.c
--- src/extents.c 16 Apr 2001 09:24:51 -0000 1.31.2.1
+++ src/extents.c 13 Apr 2002 17:24:44 -0000
@@ -6029,10 +6029,14 @@
Lisp_Object Qtext_prop;
Lisp_Object Qtext_prop_extent_paste_function;
+struct bytind_range {
+ Bytind beg, end;
+};
+
static Lisp_Object
get_text_property_bytind (Bytind position, Lisp_Object prop,
Lisp_Object object, enum extent_at_flag fl,
- int text_props_only)
+ int text_props_only, struct bytind_range *range)
{
Lisp_Object extent;
@@ -6048,16 +6052,116 @@
extent = extent_at_bytind (position, object, Qtext_prop, prior,
fl, 0);
if (NILP (extent))
- return Qnil;
+ break;
if (EQ (prop, Fextent_property (extent, Qtext_prop, Qnil)))
break;
prior = XEXTENT (extent);
}
}
+ if (range)
+ {
+ /* Unfortunately neither the general extent mapper (cannot
+ iterate backwards over endpoints), nor the find_*_of_run routines
+ (no "must-have-property" constraint and duplication of soe searching)
+ is ideal. So we roll our own. */
+ Memind beg, end, mempos;
+ Extent_List *bel = buffer_or_string_extent_list (object);
+ int pos; /* Position in extent list */
+
+ mempos = buffer_or_string_bytind_to_memind (object, position);
+
+ if (!NILP (extent))
+ {
+ beg = extent_start (XEXTENT(extent));
+ end = extent_end (XEXTENT(extent));
+ }
+ else
+ {
+ /* Simulate a virtual extent overlapping the whole buffer, that
+ has all properties. */
+ beg = buffer_or_string_bytind_to_memind(object,
+ buffer_or_string_accessible_begin_byte(object));
+ end = buffer_or_string_bytind_to_memind(object,
+ buffer_or_string_accessible_end_byte(object));
+ }
+
+ /* The endpoints
+ extent 'top' we used for the property-value, i.e. the "top-most" one
+ with the propery provides a zeroth-order approximation of the
+ validity range. As only an convervative estimate is required
+ we only have to worry if the range can be smaller.
+
+ N.B. For the purposes of the following discussion we assume
+ that all extents have the property set.
+
+ How can range.start be later that top.start?
+
+ 1. There is an extent != top
+ that ends before position but after top.start,
+ thus we have to consider all extents e such that e.end in
+ <top.start, position>. In particular we need only the last
+ one in that range in e-order.
+ 2. There is an extent != top that begings before position
+ but after top.start. However, if it would end after position IT would
+ have been the top extent, therefore it must end before position.
+ Therefore it part of the set defined in point 1. Moreover
+ because the end is always after the start it sufficient to
+ consider the end point only, which means that point 1 is sufficient.
+ */
+ if (bel && extent_list_num_els (bel) > 0)
+ {
+ for (pos = extent_list_locate_from_pos (bel, mempos-1, 1) ;
+ pos >= 0 && pos < extent_list_num_els (bel) ;
+ --pos)
+ {
+ EXTENT candidate;
+ Lisp_Object cobj;
+
+ candidate = extent_list_at(bel, pos, 1);
+ /* Only consider extents in the relevant interval */
+ if (extent_end (candidate) <= beg)
+ break;
+ XSETEXTENT(cobj,candidate);
+ if (!NILP (Fextent_property (cobj, prop, Qnil)))
+ {
+ beg = extent_end (candidate);
+ /* We only need the first hit */
+ break;
+ }
+ }
+
+ /* Similarly, we need only condider extents with start in
+ <position,top.end> to find candidates for range.end */
+
+ for (pos = extent_list_locate_from_pos (bel, mempos+1, 0) ;
+ pos < extent_list_num_els (bel) ;
+ ++pos)
+ {
+ EXTENT candidate;
+ Lisp_Object cobj;
+
+ candidate = extent_list_at(bel, pos, 0);
+ /* Only consider extents in the relevant interval */
+ if (extent_start (candidate) >= end)
+ break;
+ XSETEXTENT(cobj,candidate);
+ if (!NILP (Fextent_property (cobj, prop, Qnil)))
+ {
+ end = extent_start (candidate);
+ /* We only need the first hit */
+ break;
+ }
+ }
+ }
+
+ range->beg = buffer_or_string_memind_to_bytind(object,beg);
+ range->end = buffer_or_string_memind_to_bytind(object,end);
+ }
+
if (!NILP (extent))
return Fextent_property (extent, prop, Qnil);
- if (!NILP (Vdefault_text_properties))
+ if (!text_props_only && !NILP (Vdefault_text_properties))
return Fplist_get (Vdefault_text_properties, prop, Qnil);
return Qnil;
}
@@ -6092,13 +6196,32 @@
Lisp_Object val =
get_text_property_bytind (position, prop, object,
decode_extent_at_flag (at_flag),
- text_props_only);
+ text_props_only, NULL);
if (invert)
val = NILP (val) ? Qt : Qnil;
return val;
}
}
+Lisp_Object get_text_propery_range(Bufpos pos, Lisp_Object prop,
+ Lisp_Object object,
+ Bufpos *range_beg,Bufpos *range_end)
+{
+ Bytind position;
+ Lisp_Object val;
+ struct bytind_range range;
+
+ position = get_buffer_or_string_pos_byte (object, make_int(pos),
+ GB_NO_ERROR_IF_BAD);
+ val = get_text_property_bytind (position, prop, object, EXTENT_AT_AFTER,
+ 0, &range);
+ *range_beg = buffer_or_string_bytind_to_bufpos(object,range.beg);
+ *range_end = buffer_or_string_bytind_to_bufpos(object,range.end);
+ return val;
+}
+
+
+
DEFUN ("get-text-property", Fget_text_property, 2, 4, 0, /*
Return the value of the PROP property at the given position.
Optional arg OBJECT specifies the buffer or string to look in, and
@@ -6209,11 +6332,11 @@
set_extent_openness (e, new_start != e_start
? !NILP (get_text_property_bytind
(start, Qstart_open, object,
- EXTENT_AT_AFTER, 1)) : -1,
+ EXTENT_AT_AFTER, 1, NULL)) : -1,
new_end != e_end
? NILP (get_text_property_bytind
(end - 1, Qend_closed, object,
- EXTENT_AT_AFTER, 1))
+ EXTENT_AT_AFTER, 1, NULL))
: -1);
closure->changed_p = 1;
}
@@ -6290,7 +6413,7 @@
set_extent_endpoints (e, e_start, start, Qnil);
set_extent_openness (e, -1, NILP (get_text_property_bytind
(start - 1, Qend_closed, object,
- EXTENT_AT_AFTER, 1)));
+ EXTENT_AT_AFTER, 1, NULL)));
closure->changed_p = 1;
}
}
@@ -6304,7 +6427,7 @@
set_extent_endpoints (e, end, e_end, Qnil);
set_extent_openness (e, !NILP (get_text_property_bytind
(end, Qstart_open, object,
- EXTENT_AT_AFTER, 1)), -1);
+ EXTENT_AT_AFTER, 1, NULL)), -1);
closure->changed_p = 1;
}
}
@@ -6315,11 +6438,11 @@
set_extent_endpoints (e, e_start, start, Qnil);
set_extent_openness (e, -1, NILP (get_text_property_bytind
(start - 1, Qend_closed, object,
- EXTENT_AT_AFTER, 1)));
+ EXTENT_AT_AFTER, 1, NULL)));
set_extent_openness (copy_extent (e, end, e_end, extent_object (e)),
!NILP (get_text_property_bytind
(end, Qstart_open, object,
- EXTENT_AT_AFTER, 1)), -1);
+ EXTENT_AT_AFTER, 1, NULL)), -1);
closure->changed_p = 1;
}
@@ -6421,10 +6544,10 @@
set_extent_openness (XEXTENT (extent),
!NILP (get_text_property_bytind
(start, Qstart_open, object,
- EXTENT_AT_AFTER, 1)),
+ EXTENT_AT_AFTER, 1, NULL)),
NILP (get_text_property_bytind
(end - 1, Qend_closed, object,
- EXTENT_AT_AFTER, 1)));
+ EXTENT_AT_AFTER, 1, NULL)));
}
if (EQ (prop, Qstart_open) || EQ (prop, Qend_closed))
Index: src/extents.h
===================================================================
RCS file: /local/xemacs/cvs/XEmacs/xemacs/src/extents.h,v
retrieving revision 1.11
diff -u -u -r1.11 extents.h
--- src/extents.h 12 Apr 2001 18:23:43 -0000 1.11
+++ src/extents.h 5 Apr 2002 01:10:10 -0000
@@ -390,6 +390,10 @@
void set_extent_endpoints (EXTENT extent, Bytind s, Bytind e,
Lisp_Object object);
+Lisp_Object get_text_propery_range(Bufpos pos, Lisp_Object prop,
+ Lisp_Object object,
+ Bufpos *range_beg,Bufpos *range_end);
+
#ifdef ERROR_CHECK_EXTENTS
void sledgehammer_extent_check (Lisp_Object obj);
#endif
Index: src/syntax.c
===================================================================
RCS file: /local/xemacs/cvs/XEmacs/xemacs/src/syntax.c,v
retrieving revision 1.9.2.1
diff -u -u -r1.9.2.1 syntax.c
--- src/syntax.c 25 Jul 2001 07:45:37 -0000 1.9.2.1
+++ src/syntax.c 13 Apr 2002 17:50:58 -0000
@@ -267,69 +267,70 @@
syntax_cache.next_change = -1;
}
- if (pos > syntax_cache.prev_change &&
+ if (pos >= syntax_cache.prev_change &&
pos < syntax_cache.next_change)
{
/* do nothing */
}
else
{
- if (NILP (syntax_cache.object) || EQ (syntax_cache.object, Qt))
- {
- int get_change_before = pos + 1;
-
- tmp_table = Fget_char_property (make_int(pos), Qsyntax_table,
- make_buffer (syntax_cache.buffer), Qnil);
- syntax_cache.next_change =
- XINT (Fnext_extent_change (make_int (pos > 0 ? pos : 1),
- make_buffer (syntax_cache.buffer)));
-
- if (get_change_before < 1)
- get_change_before = 1;
- else if (get_change_before > BUF_ZV (syntax_cache.buffer))
- get_change_before = BUF_ZV (syntax_cache.buffer);
-
- syntax_cache.prev_change =
- XINT (Fprevious_extent_change (make_int (get_change_before),
- make_buffer (syntax_cache.buffer)));
- }
- else
- {
- int get_change_before = pos + 1;
-
- tmp_table = Fget_char_property (make_int(pos), Qsyntax_table,
- syntax_cache.object, Qnil);
- syntax_cache.next_change =
- XINT (Fnext_extent_change (make_int (pos >= 0 ? pos : 0),
- syntax_cache.object));
-
- if (get_change_before < 0)
- get_change_before = 0;
- else if (get_change_before > XSTRING_LENGTH(syntax_cache.object))
- get_change_before = XSTRING_LENGTH(syntax_cache.object);
-
- syntax_cache.prev_change =
- XINT (Fprevious_extent_change (make_int (pos >= 0 ? pos : 0),
- syntax_cache.object));
- }
+ /* First compute the new values for the cache */
+ struct syntax_cache new_cache;
+ /* fprintf(stderr, init ? "+" : "."); */
+
+ tmp_table = get_text_propery_range(pos, Qsyntax_table,
+ syntax_cache.object,
+ &new_cache.prev_change,
+ &new_cache.next_change);
if (EQ (Fsyntax_table_p (tmp_table), Qt))
{
- syntax_cache.use_code = 0;
- syntax_cache.current_syntax_table =
+ new_cache.use_code = 0;
+ new_cache.current_syntax_table =
XCHAR_TABLE (tmp_table)->mirror_table;
- }
+ }
else if (CONSP (tmp_table) && INTP (XCAR (tmp_table)))
{
- syntax_cache.use_code = 1;
- syntax_cache.syntax_code = XINT (XCAR(tmp_table));
+ new_cache.use_code = 1;
+ new_cache.syntax_code = XINT (XCAR(tmp_table));
}
else
{
- syntax_cache.use_code = 0;
- syntax_cache.current_syntax_table =
+ new_cache.use_code = 0;
+ new_cache.current_syntax_table =
syntax_cache.buffer->mirror_syntax_table;
}
+
+ /* Check whether we can get away with just extending the region,
+ i.e. if the data is the same and the validity region overlaps.
+ */
+ if ( new_cache.use_code == syntax_cache.use_code &&
+ ( new_cache.use_code
+ ? (new_cache.syntax_code == syntax_cache.syntax_code)
+ : EQ(new_cache.current_syntax_table,
+ syntax_cache.current_syntax_table))
+ &&
+ ( new_cache.next_change >= syntax_cache.prev_change
+ && new_cache.prev_change <= syntax_cache.next_change))
+ {
+ /* Just extend the region. This will make the approximate region
+ grow to the true validity region, in the case where we
+ are repeatedly accessing just beyong its borders, such as
+ during regexp matching. */
+ if ( new_cache.next_change > syntax_cache.next_change)
+ syntax_cache.next_change = new_cache.next_change;
+ if ( new_cache.prev_change < syntax_cache.prev_change)
+ syntax_cache.prev_change = new_cache.prev_change;
+ }
+ else
+ {
+ syntax_cache.use_code = new_cache.use_code;
+ syntax_cache.current_syntax_table = new_cache.current_syntax_table;
+ syntax_cache.syntax_code = new_cache.syntax_code;
+ syntax_cache.next_change = new_cache.next_change;
+ syntax_cache.prev_change = new_cache.prev_change;
+ }
+
}
}
Index: src/syntax.h
===================================================================
RCS file: /local/xemacs/cvs/XEmacs/xemacs/src/syntax.h,v
retrieving revision 1.5
diff -u -u -r1.5 syntax.h
--- src/syntax.h 12 Apr 2001 18:24:21 -0000 1.5
+++ src/syntax.h 13 Apr 2002 17:43:15 -0000
@@ -290,25 +290,30 @@
void update_syntax_cache (int pos, int count, int init);
+#define SYNTAX_CACHE_VALID(pos) \
+ (((pos) >= syntax_cache.prev_change) && ((pos) < syntax_cache.next_change))
+
+#define UPDATE_SYNTAX_CACHE_INNER(pos,direction) \
+ do { \
+ Bufpos mypos = (pos); \
+ \
+ if (!SYNTAX_CACHE_VALID(mypos)) \
+ update_syntax_cache ((mypos), (direction), 0); \
+ } while (0)
+
/* Make syntax cache state good for CHARPOS, assuming it is
currently good for a position before CHARPOS. */
#define UPDATE_SYNTAX_CACHE_FORWARD(pos) \
- (lookup_syntax_properties \
- ? (update_syntax_cache ((pos), 1, 0), 1) \
-: 0)
-
+ UPDATE_SYNTAX_CACHE_INNER(pos,1)
+
/* Make syntax cache state good for CHARPOS, assuming it is
currently good for a position after CHARPOS. */
#define UPDATE_SYNTAX_CACHE_BACKWARD(pos) \
- (lookup_syntax_properties \
- ? (update_syntax_cache ((pos), -1, 0), 1) \
-: 0)
+ UPDATE_SYNTAX_CACHE_INNER(pos,-1)
/* Make syntax cache state good for CHARPOS */
#define UPDATE_SYNTAX_CACHE(pos) \
- (lookup_syntax_properties \
- ? (update_syntax_cache ((pos), 0, 0), 1) \
-: 0)
+ UPDATE_SYNTAX_CACHE_INNER(pos,0)
#define SYNTAX_FROM_CACHE(table, c) \
SYNTAX_FROM_CODE (SYNTAX_CODE_FROM_CACHE (table, c))
@@ -357,40 +362,19 @@
#endif /* emacs */
#define SETUP_SYNTAX_CACHE(FROM, COUNT) \
- do { \
- syntax_cache.buffer = current_buffer; \
- syntax_cache.object = Qnil; \
- syntax_cache.current_syntax_table \
- = current_buffer->mirror_syntax_table; \
- syntax_cache.use_code = 0; \
- if (lookup_syntax_properties) \
- update_syntax_cache ((COUNT) > 0 ? (FROM) : (FROM) - 1, \
- (COUNT), 1); \
- } while (0)
+ SETUP_SYNTAX_CACHE_FOR_BUFFER(current_buffer, (FROM), (COUNT))
#define SETUP_SYNTAX_CACHE_FOR_BUFFER(BUFFER, FROM, COUNT) \
- do { \
- syntax_cache.buffer = (BUFFER); \
- syntax_cache.object = Qnil; \
- syntax_cache.current_syntax_table = \
- syntax_cache.buffer->mirror_syntax_table; \
- syntax_cache.use_code = 0; \
- if (lookup_syntax_properties) \
- update_syntax_cache ((FROM) + ((COUNT) > 0 ? 0 : -1), \
- (COUNT), 1); \
- } while (0)
+ SETUP_SYNTAX_CACHE_FOR_OBJECT(Qnil, (BUFFER), (FROM), (COUNT))
#define SETUP_SYNTAX_CACHE_FOR_OBJECT(OBJECT, BUFFER, FROM, COUNT) \
do { \
+ Lisp_Object oldobject = syntax_cache.object; \
syntax_cache.buffer = (BUFFER); \
syntax_cache.object = (OBJECT); \
- if (NILP (syntax_cache.object)) \
+ if (NILP (syntax_cache.object) || EQ (syntax_cache.object, Qt)) \
{ \
- /* do nothing */; \
- } \
- else if (EQ (syntax_cache.object, Qt)) \
- { \
- /* do nothing */; \
+ syntax_cache.object = make_buffer(syntax_cache.buffer); \
} \
else if (STRINGP (syntax_cache.object)) \
{ \
@@ -405,12 +389,20 @@
/* OBJECT must be buffer/string/t/nil */ \
assert(0); \
} \
+ if (EQ(oldobject,syntax_cache.object)) \
+ break; \
syntax_cache.current_syntax_table \
= syntax_cache.buffer->mirror_syntax_table; \
syntax_cache.use_code = 0; \
if (lookup_syntax_properties) \
update_syntax_cache ((FROM) + ((COUNT) > 0 ? 0 : -1), \
(COUNT), 1); \
+ else \
+ { \
+ /* Initial value always valid */ \
+ syntax_cache.prev_change = -1; \
+ syntax_cache.next_change = EMACS_INT_MAX; \
+ } \
} while (0)
#define SYNTAX_CODE_PREFIX(c) \