Stephen J. Turnbull wrote:
>>>>>"Ben" == Ben Wing <ben(a)xemacs.org>
writes:
>>>>>
>>>>>
Ben> Stephen J. Turnbull wrote:
>> It's really ugly to have a buffer property that's actually a
>> text property just because performance sucks when an extent
>> covers the whole buffer.
I assume the above is what you're talking about?
yes.
Ben> what do you mean? as i said before, i have mostly finished
Ben> code that will make text properties efficient by implementing
Ben> them entirely in non-overlapping intervals, as fsf does.
What I mean is that there are a number of things that we do as buffer
attributes or buffer-local variables (language environment, coded
character set, major mode for starters) that conceptually are text
properties. Think about it: wouldn't it be cool if instead of needing
a hack like mmm, we simply attached LISP-major-mode to an extent
covering the buffer, string-major-mode to strings, and
comment-major-mode to comments? Wouldn't it be cool if we could
attach xml-mode to individual strings in xml.el or use html-mode for
strings by default in blog-mode.el?
Ben> in general this is a very tough problem, and we should use
Ben> simpler solutions when possible (e.g. non-overlapping text
Ben> properties).
Whatever it is, it has to be part of the implementation, not a
constraint on user code. (I'm pretty sure you agree, but it's not
clear to me from that sentence.)
yes, i do. once i get the new text property implementation down, then
any program that only uses text properties and does not directly use
extents or overlays, or which only uses extents/overlays covering
small sections of the buffer, will be fast. however, in the long run
that's not enough; we'd like extents to be fast, period, without
making the user go through the text property interface. (the reason
this is harder is that the text property interface is higher-level and
doesn't explicitly give you the concept of an entity tied to a
particular range.)
if people normally used text properties, and tended to only use a small
number of extents, then we might consider as a stop-gap measure
separating the two internally into a sort of parallel structure -- this
is much like what FSF does, with its overlays, i think. however, it
might turn out to be not much easier to implement that than to just move
entirely to the "correct" implementation of extents in terms of
non-overlapping intervals.
Ben> it might be possible to simulate extents in general using
Ben> non-overlapping intervals, where multiple intervals could
Ben> represent a single extent and each interval contains parts of
Ben> all extents crossing over it. this interval scheme would be
Ben> guaranteed efficient and not too hard to implement, i think
Ben> -- in fact it's the same algorithm i'd be using for
Ben> text-properties, and could be extended to extents in general.
Something like this? A linked list P of (buffer position, extent
list) pairs, with intervals defined as [P[n].pos, P[n+1].pos) for
each n. An extent is a pair of pointers into P defining which
fragments are covered by the extent plus a property list. P[n].ext is
the list of extents covering that interval. Then for random access,
finding n is O(lg n), and finding the property is O(|P[n].ext|), which
is the best you can get, I guess. (For many operations, in particular
redisplay, which walk P in order, finding n is O(1), right?) P itself
only needs to be updated when text is inserted/deleted in the buffer,
and when an extent is created/deleted. |P| <= |# boundary points of
all extents|, so that is no worse than what we currently have, and we
never have to update the extents themselves.
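
Concretely, I'm imagining something like this (a C sketch with
invented names, not actual XEmacs code):

struct extent;

/* One node of P: a boundary position; the node covers the interval
   [pos, next->pos).  */
struct pnode
{
  long pos;                 /* buffer position of this boundary */
  struct extent **ext;      /* extents covering [pos, next->pos) */
  int n_ext;
  struct pnode *next;       /* next boundary, in buffer order */
};

/* An extent is a pair of pointers into P plus its properties.  */
struct extent
{
  struct pnode *first, *last;   /* first/last fragment covered */
  void *plist;                  /* property list */
};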
this sounds about right, although i'm a little confused about what
"finding n" means, and finding anything in a linked list is O(n), not
O(lg n).

we maintain an ordered gap array (or whatever) of non-overlapping
intervals as defined by the extents, where each interval is the
maximum stretch not crossed by an extent endpoint. each extent
contains a pointer to the first and last interval covered by the
extent. most likely, these will be indices into the gap array, and
hence need to be adjusted as we do gap movement in the array; this is
just like what happens currently with the endpoints of an extent in
the buffer. we can avoid this update if the gap array, rather than
containing the interval structures themselves, contains pointers to
heap-allocated interval objects, and each such object contains forward
and probably backward pointers to the next/previous object, so that
essentially the intervals are in a combination linked list and array.
the array, btw, is necessary so that we can locate the interval
surrounding a particular position in O(lg n) time; it's not possible
to do this in a pure linked list -- you'd get O(n) instead. another
data structure would involve having the intervals be objects with
pointers back to the array position they came from. then gap movement
in the array requires updating the intervals rather than the extents;
there could potentially be significant savings if there are a whole
lot of overlapping extents, but in practice this seems unlikely.

in reality i'm somewhat loath to do any of these alternative schemes
because they would add a lot of space overhead, fragment the memory,
etc. in practice, i think that extents are modified much less often
than they are accessed by redisplay, and you only get O(n) behavior
when inserting or deleting extents when the gap is far away from where
you previously were. but i'm certainly willing to be proven wrong by
appropriate profiling data.
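
to make this concrete, here's a rough sketch in C of the scheme i'm
describing. all the names are invented and the details will surely
change:

struct extent;

/* one interval: a maximal stretch of the buffer not crossed by any
   extent endpoint, plus a list of all extents covering it (in
   practice a Stynarr; see below). */
struct interval
{
  long start;                /* buffer position where interval begins */
  struct extent **covering;  /* extents overlapping this interval */
  int n_covering;
};

/* an extent points at the first and last interval it covers; as
   indices into the gap array, these must be adjusted on gap
   movement, just as extent endpoints are adjusted now. */
struct extent
{
  int first_interval;        /* index into the gap array */
  int last_interval;
  void *plist;               /* a Lisp_Object property list in reality */
};

/* locate the interval surrounding POS in O(lg n): binary search on
   the (ordered) start positions.  this sketch pretends the gap
   doesn't exist; real code would skip over it when indexing. */
static int
find_interval (struct interval *ivals, int n, long pos)
{
  int lo = 0, hi = n - 1;
  while (lo < hi)
    {
      int mid = (lo + hi + 1) / 2;
      if (ivals[mid].start <= pos)
        lo = mid;
      else
        hi = mid - 1;
    }
  return lo;
}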
the crucial thing, then, is that each interval contains some sort of
list of all the extents overlapping that interval. this list will
probably be small, so either a linked list or a Stynarr could be used.
(a Stynarr is a new struct that i invented in my unicode-internal ws,
based on existing face-cachel code. the idea is that it uses a fixed
array of small size (e.g. 4 or 6) that's built into the object, plus a
pointer to a dynarr that's used to handle overflow. it's useful when
you need to maintain a list of objects and handle an arbitrary # of
objects on the list, but where in practice the list very rarely
exceeds a small # and you'd like to avoid lots and lots of heap
allocation.)
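
as a sketch, the idea looks something like this (invented names; the
real code in my ws uses a dynarr for the overflow and may differ in
details):

#include <stdlib.h>

#define STYNARR_FIXED_SIZE 4

typedef struct
{
  void *fixed[STYNARR_FIXED_SIZE]; /* inline storage, no heap needed */
  int len;                         /* total number of elements */
  void **overflow;                 /* heap storage for elements past */
  int overflow_size;               /*   the fixed part, or NULL */
} Stynarr;

static void
stynarr_push (Stynarr *st, void *el)
{
  if (st->len < STYNARR_FIXED_SIZE)
    st->fixed[st->len] = el;       /* common case: no allocation */
  else
    {
      int i = st->len - STYNARR_FIXED_SIZE;
      if (i >= st->overflow_size)
        {
          st->overflow_size = st->overflow_size ? st->overflow_size * 2 : 8;
          st->overflow = realloc (st->overflow,
                                  st->overflow_size * sizeof (void *));
        }
      st->overflow[i] = el;        /* rare case: spill to the heap */
    }
  st->len++;
}

static void *
stynarr_at (Stynarr *st, int i)
{
  return i < STYNARR_FIXED_SIZE
    ? st->fixed[i]
    : st->overflow[i - STYNARR_FIXED_SIZE];
}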
we might still have to maintain the display-order and e-order for extent
endpoints, like we currently do; i'll have to look into the code, but i
bet there will still be places that need to look stuff up by endpoints,
and it may not be too easy to derive this from the interval data.
what do you mean "we never have to update the extents themselves"?
Ben> but first we need some more profiling data.
Why? Basically in your explanation of how the soe (stack of extents)
works, you are saying that large extents (not just extents covering
the whole buffer, but any extent that overlaps many other extents)
are very expensive because they make _all_ the overlapped extents
inefficient.
why? because i have limited time and i need to schedule things
according to how badly they slow things down. i see that my attempts
to speed up byte<->char conversion actually slowed things down
significantly for Raymond, although they did indeed speed things up in
my own profiling of whatever i was running into problems with. i
think it's because i deleted the "known range" cache that was the
first line of attack. i did this because fsf didn't have any such
thing and seemed to be doing fine; but they may have a lot more code
internally that operates on byte positions rather than char positions
[or even on pairs of the two; that seems common]. so i'm going to
look into putting the cache back.
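
for reference, the idea of the known-range cache is roughly this (a
sketch; next_char_len/prev_char_len are assumed helpers giving the
byte length of the character at/before a byte position, not actual
functions in the source):

struct buffer;
extern int next_char_len (struct buffer *buf, long bytepos);
extern int prev_char_len (struct buffer *buf, long bytepos);

/* one known char<->byte correspondence point, updated on each use so
   that conversions near the last one are cheap. */
struct known_range_cache
{
  long char_pos;
  long byte_pos;
};

static long
charpos_to_bytepos (struct buffer *buf, struct known_range_cache *cache,
                    long charpos)
{
  long cpos = cache->char_pos, bpos = cache->byte_pos;
  /* scan forward or backward from the cached point, one character at
     a time, tracking both positions. */
  while (cpos < charpos)
    {
      bpos += next_char_len (buf, bpos);
      cpos++;
    }
  while (cpos > charpos)
    {
      bpos -= prev_char_len (buf, bpos);
      cpos--;
    }
  cache->char_pos = cpos;
  cache->byte_pos = bpos;
  return bpos;
}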
meanwhile, i'm mostly done with the rewrite of char tables for
unicode-internal. there isn't much coding left before i start
compiling, but there's definitely a good deal of work still to be
done. for one, the font handling is still done by charset, meaning
that unicode chars for which there's no charset won't be displayable!
yuck. that requires some significant work. for another, char-classes
haven't been implemented yet. and i don't know what the hell to do
with ccl; hopefully it really won't be necessary, because i've
expanded the notion of charset to include any rectangular subset of
256x256 (so we can properly have a windows-1252 charset, e.g.) and i'm
planning on creating a general mbcs coding system to replace the
ad-hoc shift-jis and big5 coding systems; it could also be used for
the windows-125* coding systems plus lots of simple coding systems
(e.g. iso-8859-1) that are currently handled using iso2022.
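
roughly, i'm thinking of a charset as something like this (a sketch;
the field names are invented and the exact ranges are to be decided):

/* a charset is any rectangular subset of the 256x256 code space. */
struct charset_shape
{
  int dimension;       /* 1 or 2 bytes per character */
  int low1, high1;     /* valid range of the first byte */
  int low2, high2;     /* valid range of the second byte (dim 2 only) */
};

/* a proper windows-1252 charset: one dimension, the full byte range */
static const struct charset_shape windows_1252 = { 1, 0x00, 0xFF, 0, 0 };
/* vs. iso-8859-1 in the old 96x96 world: only 0xA0..0xFF fit */
static const struct charset_shape latin_1 = { 1, 0xA0, 0xFF, 0, 0 };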
questions for you, stephen:

[1] how are unicode fonts handled under X windows?

[2] now that i look more at ccl, i see that it's utterly tied to the
old mule encoding, as it expects to look directly at the internal
encoding of the text. it would have to be radically restructured for
it to work on a character-by-character level, not byte-by-byte. do
you think it's reasonable to just make ccl be an optional feature
that is compiled in only with old mule?
So that means that code like

(let ((header (make-extent (message-beginning) (header-end)))
      (body (make-extent (header-end) (message-end))))
  (font-lock-fontify-buffer))

necessarily sucks. If I'm not totally missing the point, I have to
ask: Do you consider that restriction acceptable?
no, not in the long run.