Re: Reducing the number of 'lock|shot' packages

Thursday, 9 August 2001

        On 06 Aug 2001, Jens Lautenbacher yowled:
...
 On 05 Aug 2001 12:24:12 +0100, Nix wrote:
> The work of obsessive genius heavy-duty fontification regexp? Here it
> is:

 This is sick :-) But reading a bit of the comments in the file, I
 wonder if we shouldn't abandon the idea of using regexps for
 fontification completely. Sure, you can get away with 90% correct 
Absolutely. I've always thought that using regexps for fontification is
a bad idea; they cannot do a good job in the general case, and I mean
*cannot*; the languages describable by regexps (FSMs) are a proper
subset of those that fontification is trying to describe (context-free
or pseudo-context-sensitive). As such, fontification schemes that depend
upon e.g. counting brackets correctly are not implementable in the
current scheme.
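
To make the bracket-counting point concrete, here is a minimal Python
sketch (not from the original thread; `bracket_depths` and `TWO_LEVELS`
are hypothetical names chosen for illustration). A counter handles any
nesting depth, whereas a classic regexp has to be written for a fixed
maximum depth and fails one level beyond it:

```python
import re


def bracket_depths(text):
    """Return (index, depth) for every parenthesis in text.

    A single integer counter is all the extra state needed, but it is
    exactly the state a finite-state regexp does not have.
    """
    depth = 0
    out = []
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
            out.append((i, depth))
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at index {i}")
            out.append((i, depth))
            depth -= 1
    if depth:
        raise ValueError("unclosed '('")
    return out


# A regexp for balanced parentheses nested at most two levels deep.
# Supporting a third level means writing a new, longer regexp.
TWO_LEVELS = re.compile(r"\((?:[^()]|\([^()]*\))*\)")

if __name__ == "__main__":
    print(bracket_depths("(a(b)c)"))
    print(bool(TWO_LEVELS.fullmatch("(a(b)c)")))    # two levels: matches
    print(bool(TWO_LEVELS.fullmatch("(a(b(c)))")))  # three levels: no match
```

Some regexp dialects add recursion or balancing extensions, but those
go beyond finite-state matching, which is the point being made above.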

...
 result with some reasonable sized regexps, but to get the last 10%
 makes the regexps so complicated and slow, that maybe another approach
 would be better. 
More, `another approach would work'.

...
 JDE, the emacs java environment, uses the semantic package to construct
 a real parser to get to the semantic meaning of the code. Building such a
 parser with semantic seems to be a doable work if one has the BNF grammar
 of language in question. 
The ideal language-sensitive mode would use semantic to drive the
indentation *and* fontification; IIRC, this is one of semantic's
eventual design goals. The latest development version of semantic allows
the construction of non-rec-descent parsers, which is a good step; not
all languages can be conveniently described by rec-descent parsers
(although they're damned good for some languages, e.g. C++).
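
For readers who have not met the term, a rough sketch of what a
rec-descent parser looks like, in Python. This is not semantic's API and
not taken from JDE; the toy grammar and every name here (`tokenize`,
`Parser`, `parse`) are assumptions made for illustration. Each grammar
rule becomes one function, and the resulting tree labels every leaf with
a kind ("num" or "name") of the sort a fontifier or indenter could use:

```python
import re

# Toy grammar (hypothetical):
#   expr   := term ('+' term)*
#   term   := factor ('*' factor)*
#   factor := NUMBER | NAME | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[()+*])")


def tokenize(text):
    """Split text into number, name and operator tokens."""
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad input at index {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()  # recursion handles arbitrary nesting
            self.take(")")
            return node
        tok = self.take()
        if tok in ("(", ")", "+", "*"):
            raise SyntaxError(f"unexpected {tok!r}")
        return ("num" if tok.isdigit() else "name", tok)


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("a + 2 * (b + 3)"))
```

One token of lookahead (`peek`) is enough to pick the next rule in this
grammar, so the parser never has to undo a decision.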

...
 Of course this would be slower than some easy regexps, but judging from
Slower? Probably not. Regexps have little knowledge of the language
grammar, so getting them right often requires massive backtracking in
the regexp engine. A proper language grammar, on the other hand, can
often be backtrack-free (and for most common languages it is).
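
As a small illustration of the backtrack-free idea, here is a Python
sketch (an assumption-laden toy, not XEmacs or GCC code; `classify` is a
hypothetical name). It splits C-like source into comment, string and
code runs in one left-to-right pass. Because the scanner knows which
construct it is inside, it never has to go back over text it has already
classified. It is deliberately simplified: character literals, `//`
comments and preprocessor lines are ignored.

```python
def classify(src):
    """Return a list of (kind, text) runs covering src exactly once.

    kind is "comment", "string" or "code".
    """
    spans = []
    code_start = 0  # start of the pending run of plain code
    i, n = 0, len(src)
    while i < n:
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2  # unterminated: run to EOF
            kind = "comment"
        elif src[i] == '"':
            end = i + 1
            while end < n and src[end] != '"':
                # Skip the character after a backslash so \" does not
                # terminate the string.
                end += 2 if src[end] == "\\" else 1
            end = min(end + 1, n)  # include the closing quote if present
            kind = "string"
        else:
            i += 1
            continue
        if i > code_start:
            spans.append(("code", src[code_start:i]))
        spans.append((kind, src[i:end]))
        i = code_start = end
    if code_start < n:
        spans.append(("code", src[code_start:]))
    return spans


if __name__ == "__main__":
    for kind, text in classify('a = "x/*y"; /* "c" */ b'):
        print(f"{kind:8}{text!r}")
```

Note how the `/*` inside the string and the quotes inside the comment
are handled without any special cases: the scanner's current state
decides what each character means.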

...
 jde, perfectly doable on a modern machine. While this overhead is
 bound, introducing more and more complex regexps will make fontlocking
 as slow as one wants to have it. 
There is no overhead. Compare the time taken by that massive regexp to
fontify one of my 2000-line C programs (440 seconds) with the time taken
by GCC-3.0 to parse it (0.9 seconds, according to -ftime-report). Both
engines are written in C; regex.c in the XEmacs core versus c-parse.[yc]
(and some other stuff) in GCC. The difference is that GCC has more
knowledge of the language, and that pure regexp engines are *very* bad
at parsing :)

-- 
`It's all about bossing computers around. Users have to say "please".
Programmers get to say "do what I want NOW or the hard disk gets it".'
                        -- Richard Heathfield on the nature of programming
