>>>> "CW" == Christoph Wedler <wedler(a)fmi.uni-passau.de> writes:
>>>>> "SJT" == Stephen J Turnbull <turnbull(a)sk.tsukuba.ac.jp> writes:
>>>> "Hrvoje" == Hrvoje Niksic <hniksic(a)srce.hr> writes:
Hrvoje> Jan Vroonhof <vroonhof(a)math.ethz.ch> writes:
>>> Why is font-lock using text-properties anyway? Isn't what it
>>> does much more naturally mapped onto extents?
Hrvoje> Not really (although it may look that way at the start).
Hrvoje> The font-lock code wants to apply certain properties to
Hrvoje> parts of the buffer, merging or breaking the extents as
Hrvoje> needed. This is exactly what the text-properties are for.
SJT> Could you be more specific? The documentation says something
SJT> quite different: text-properties are for attaching properties
SJT> to characters rather than intervals.
CW> That's exactly what font-lock is supposed to do: attaching
CW> faces to characters matched by specific (parts of) font-lock
CW> keywords.
My point, and I believe Jan's, is that the part of a keyword that a
character matches is the character itself. We don't want characters
that match characters in keywords fontified; we want _intervals_ that
match _whole_ keywords (modulo leading and trailing context)
fontified. That sounds a lot like extents to me, and to Jan.
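For concreteness, a toy model of the two representations, in Python (the Extent class and the runs helper are hypothetical, and this is not how XEmacs actually stores text properties or extents internally). Setting a face on each character of two adjacent keyword matches such as \alpha\beta leaves one undifferentiated run, whereas one extent-style record per match keeps each keyword a distinct unit.

```python
from dataclasses import dataclass


# Hypothetical interval record standing in for an extent.
@dataclass(frozen=True)
class Extent:
    start: int  # inclusive buffer offset
    end: int    # exclusive buffer offset
    face: str


def runs(faces):
    """Collapse a per-character face list into maximal (start, end, face) runs."""
    result = []
    start = 0
    for i in range(1, len(faces) + 1):
        if i == len(faces) or faces[i] != faces[start]:
            if faces[start] is not None:  # unfontified stretches are skipped
                result.append((start, i, faces[start]))
            start = i
    return result


text = r"\alpha\beta"
matches = [(0, 6), (6, 11)]  # two separate TeX control sequences

# Text-property style: each character just carries a face.
faces = [None] * len(text)
for start, end in matches:
    for i in range(start, end):
        faces[i] = "keyword"

# Extent style: one record per matched interval.
extents = [Extent(start, end, "keyword") for start, end in matches]

print(runs(faces))   # [(0, 11, 'keyword')]: the two matches are indistinguishable
print(len(extents))  # 2: each match is still its own unit
```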
SJT> That seems like exactly the _wrong_ thing to do with strings
SJT> and comments, both of which should be locked up in an
SJT> unbreakable (from the point of view of font-lock) extent once
SJT> identified;
CW> There are not such things like unbreakable extents in
CW> font-lock: parts of previously fontified characters can get
CW> additional/new face properties, etc.
For my preferences, not in a string or comment. Why aren't there
unbreakable extents?
CW> In fact, two extents are "glumped" if appropriate, e.g., for a
CW> TeX sequence like \alpha\beta.
Sure. I wasn't talking about things like that; I was talking about
strings and comments, which are syntactically single units but can
easily contain whole instances of other keywords. For that reason
re-parsing them could be expensive. Why not just skip them?
Extents, not character properties, are the right way to do that.
I don't know whether this observation can be translated into an
optimization of the fontification algorithm, but I would like someone
to explain why looking at characters one by one is a better match for
the majority of fontification operations.
SJT> the individual characters in them should never be considered
SJT> "locally".
CW> ???
I.e., I personally do not (in C) want the word "for" in
    /* This comment is for example. */
fontified as a C keyword, although I can imagine someone who might
(because they would want the "for" in
    /* Comment explaining a for loop */
fontified as a keyword), and I can't even imagine anyone who would
want that same word fontified as a C keyword in
    printf("This output is for example.");
Given that, I don't see why font-lock should _ever again_ look inside
those extents unless they are edited. I.e., from font-lock's point of
view they should be considered atomic: "Been there, done that, OK,
let's hurry on to the next task."
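A toy sketch of that policy, in Python, assuming C-style block comments and double-quoted strings only (the keyword list, the regular expressions, and the function names atomic_extents and keyword_extents are all hypothetical, not font-lock's actual code). Comments and strings are identified first as atomic extents; the keyword search then runs only in the gaps between them, jumping over each extent without examining its characters.

```python
import re

# Hypothetical toy fontifier for C source; the keyword list, the regexps,
# and the function names are illustrative assumptions, not font-lock's code.
KEYWORDS = re.compile(r"\b(?:for|if|while|return)\b")
# Block comments and double-quoted strings (allowing backslash escapes).
ATOMIC = re.compile(r'/\*.*?\*/|"(?:\\.|[^"\\])*"', re.DOTALL)


def atomic_extents(src):
    """Return (start, end, face) for each comment and string, in buffer order."""
    extents = []
    for m in ATOMIC.finditer(src):
        face = "comment" if m.group().startswith("/*") else "string"
        extents.append((m.start(), m.end(), face))
    return extents


def keyword_extents(src, atoms):
    """Search for keywords only in the gaps between atomic extents."""
    extents = []
    pos = 0
    # A zero-width sentinel at the end makes the final gap get scanned too.
    for start, end, _face in atoms + [(len(src), len(src), None)]:
        for m in KEYWORDS.finditer(src, pos, start):
            extents.append((m.start(), m.end(), "keyword"))
        pos = end  # jump past the whole comment/string without looking inside
    return extents


src = '''/* This comment is for example. */
for (i = 0; i < n; i++)
    printf("This output is for example.");
'''

atoms = atomic_extents(src)
for start, end, face in atoms + keyword_extents(src, atoms):
    print(face, repr(src[start:end]))
```

With this input, only the "for" that begins the loop comes back as a keyword; the occurrences inside the comment and the printf string are never examined.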
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091