Re: Unbalanced parentheses

Wednesday, 7 December 2005

        ...
>>>> "Zajcev" == Zajcev Evgeny <zevlg(a)yandex.ru> writes:
    Zajcev> Ok, in modes where syntax for `"' is "(string quote) -
    Zajcev> C-M-u gives error, but why?

...
>>>> "sjt" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
    sjt> Because scanlists is a horrific hack.  The problem is that
    sjt> basically the only way to be sure whether you're inside or
    sjt> outside of a string is to parse forward from the beginning of
    sjt> the buffer.

And scan_lists (the internal function that handles all the parsing)
doesn't even try.  It simply assumes you've just moved on to a string
delimiter from outside, and skips to the next one.  So C-M-u has the
following effects, where < and > denote beginning and end of buffer
respectively, and ! is point:

<xxxxxxx ("abcd!efgh")> ==> unbalanced parentheses    # original report
<"xxxxx" ("abcd!efgh")> ==> unbalanced parentheses    # extra " doesn't help
<"x ( x" ("abcd!efgh")> ==> <"x !( x" ("abcdefgh")>   # dives into string!

For now the answer has to be "don't use C-M-u inside a string." :-(

Heuristics are not going to help much.  <(" word ")> is obviously a
list containing a single space-padded word, but there's no general
heuristic to distinguish that from <(concat "func (" arg ");")>.

I suspect that at some point in the distant past (cvs blame seems to
say this hasn't changed in any significant respect since 21.2 at
least, but of course cvs blame doesn't tell you about _deletions_),
scan_lists did check that precondition.  Then the code got eliminated
because it's expensive and looked like a no-op.

I think that the way to handle this is to always parse forward from a
point known to be outside of a comment or string, and thus determine
whether you are now inside a string (so you zip to the matching
delimiter) or outside (and resume list-oriented parsing).  This can be
made pretty efficient by caching known places.  Note that because this
kind of parsing always goes forward, you can always postpone parsing
forward past where you are until you need it at no loss.  Then any
time a relevant text change takes place, you invalidate the cache past
the point of the text change.
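
To make the forward-parsing idea concrete, here is a minimal sketch in
Python (not the C of scan_lists).  It assumes a toy Lisp-like syntax
that is my simplification, not the real syntax table: `"' delimits
strings, `\' escapes the next character inside a string, and `;' starts
a comment that runs to end of line.  The names step and syntax_state
are hypothetical.

```python
def step(state, ch):
    """Advance the toy syntax state machine by one character."""
    if state == "code":
        if ch == '"':
            return "string"
        if ch == ";":
            return "comment"
        return "code"
    if state == "string":
        if ch == "\\":
            return "escape"      # next character is taken literally
        if ch == '"':
            return "code"
        return "string"
    if state == "escape":
        return "string"
    # state == "comment": a comment ends at the newline
    return "code" if ch == "\n" else "comment"


def syntax_state(text, pos):
    """Return the state at index pos by parsing forward from the buffer start.

    The buffer start is the one place known to be outside any string
    or comment, so that is where the scan begins.
    """
    state = "code"
    for ch in text[:pos]:
        state = step(state, ch)
    return state


# The three buffers from the examples above, with point given as an index.
print(syntax_state('xxxxxxx ("abcdefgh")', 14))   # string
print(syntax_state('"x ( x" ("abcdefgh")', 14))   # string
print(syntax_state('"x ( x" ("abcdefgh")', 3))    # string: that paren is not a list
```

With that information in hand, a command like C-M-u could first move
to the opening delimiter of the enclosing string and only then resume
list-oriented parsing.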

This strategy can be refined in stages:

1.  No cache---parse comments and strings from the beginning every time.

2.  Cache and invalidate after point on every insertion or deletion.

3.  Cache and invalidate after point on insertion or deletion of a delimiter.
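
As an illustration of stage 2, here is a rough Python sketch under the
same toy syntax assumptions (`"' for strings, `\' as escape inside a
string, `;' comments to end of line).  The SyntaxCache class and its
methods are hypothetical names of mine, and the buffer is modeled as a
plain Python string rather than a real buffer.  Stage 1 is the same
thing with the cache never consulted; stage 3 would additionally need
to shift cached positions when non-delimiter text is inserted or
deleted, which this sketch does not attempt.

```python
import bisect


def step(state, ch):
    """Advance the toy syntax state machine by one character."""
    if state == "code":
        return "string" if ch == '"' else "comment" if ch == ";" else "code"
    if state == "string":
        return "escape" if ch == "\\" else "code" if ch == '"' else "string"
    if state == "escape":
        return "string"
    return "code" if ch == "\n" else "comment"


class SyntaxCache:
    """Remember the syntax state at positions already parsed."""

    def __init__(self, text):
        self.text = text
        self.positions = [0]      # sorted positions whose state is known
        self.states = ["code"]    # buffer start is outside strings and comments
        self.scanned = 0          # characters examined so far (for demonstration)

    def state_at(self, pos):
        # Resume from the nearest cached position at or before pos.
        k = bisect.bisect_right(self.positions, pos) - 1
        start, state = self.positions[k], self.states[k]
        for ch in self.text[start:pos]:
            state = step(state, ch)
            self.scanned += 1
        if start != pos:
            self.positions.insert(k + 1, pos)
            self.states.insert(k + 1, state)
        return state

    def replace(self, start, end, new):
        """Replace text[start:end] with new; forget everything cached past start."""
        self.text = self.text[:start] + new + self.text[end:]
        k = bisect.bisect_right(self.positions, start)
        del self.positions[k:]
        del self.states[k:]


cache = SyntaxCache('"x ( x" ("abcdefgh")')
print(cache.state_at(14), cache.scanned)   # string 14
print(cache.state_at(16), cache.scanned)   # string 16  (only 2 more characters parsed)
cache.replace(0, 0, '"')                   # insert a stray quote at the buffer start
print(cache.state_at(15))                  # code  (same spot, state has flipped)
```

The state cached at the change position itself stays valid, because it
depends only on the text before that position; only entries strictly
after it are dropped.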

I need to go to sleep, but anybody who wants to look at how GNU Emacs
does this would be a hero.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
