Stephen J. Turnbull wrote:
>>>>> "Zajcev" == Zajcev Evgeny <zevlg(a)yandex.ru> writes:
Zajcev> Ok, in modes where syntax for `"' is "(string quote) -
Zajcev> C-M-u gives error, but why?
>>>>> "sjt" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
sjt> Because scanlists is a horrific hack. The problem is that
sjt> basically the only way to be sure whether you're inside or
sjt> outside of a string is to parse forward from the beginning of
sjt> the buffer.
And scan_lists (the internal function that handles all the parsing)
doesn't even try. It simply assumes you've just moved on to a string
delimiter from outside, and skips to the next one. So C-M-u has the
following effects, where < and > denote beginning and end of buffer
respectively, and ! is point:
  <xxxxxxx ("abcd!efgh")> ==> unbalanced parentheses      # original report
  <"xxxxx" ("abcd!efgh")> ==> unbalanced parentheses      # extra " doesn't help
  <"x ( x" ("abcd!efgh")> ==> <"x !( x" ("abcdefgh")>     # dives into string!
For now the answer has to be "don't use C-M-u inside a string." :-(
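To make the "parse forward from the beginning of the buffer" point concrete, here is a minimal sketch in Python (not the C of scan_lists). It assumes a toy syntax in which only `"` delimits strings and backslash escapes the next character inside a string; comments and real syntax tables are ignored. The name in_string is hypothetical, chosen only for illustration.

```python
def in_string(text: str, point: int) -> bool:
    """Return True if offset `point` (0-based) lies inside a "..." string.

    Toy model: only double quotes delimit strings, and a backslash
    inside a string escapes the following character.
    """
    inside = False
    i = 0
    while i < point:
        ch = text[i]
        if inside and ch == "\\":
            i += 2          # skip the backslash and the escaped character
            continue
        if ch == '"':
            inside = not inside
        i += 1
    return inside


if __name__ == "__main__":
    # The three buffers from the examples above, with point placed before "efgh".
    for buf in ('xxxxxxx ("abcdefgh")',
                '"xxxxx" ("abcdefgh")',
                '"x ( x" ("abcdefgh")'):
        print(buf, in_string(buf, buf.index("efgh")))
```

In all three buffers the forward parse reports that point is inside a string, and in the third it also distinguishes the parenthesis inside "x ( x" from the real list opener, which is exactly the information a backward scan from point lacks.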
Heuristics are not going to help much. <(" word ")> is obviously a
list containing a single space-padded word, but there's no general
heuristic to distinguish that from <(concat "func (" arg ");")>.
I suspect that at some point in the distant past (cvs blame seems to
say this hasn't changed in any significant respect since 21.2 at
least, but of course cvs blame doesn't tell you about _deletions_),
scan_lists did check that precondition. Then the code got eliminated
because it's expensive and looked like a no-op.
I think that the way to handle this is to always parse forward from a
point known to be outside of a comment or string, and thus determine
whether you are now inside a string (so you zip to the matching
delimiter) or outside (and resume list-oriented parsing). This can be
made pretty efficient by caching known places. Note that because this
kind of parsing always goes forward, you can always postpone parsing
forward past where you are until you need it at no loss. Then any
time a relevant text change takes place, you invalidate the cache past
the point of the text change.
This strategy can be refined in stages:
1. No cache---parse comments and strings from the beginning every time.
2. Cache and invalidate after point on every insertion or deletion.
3. Cache and invalidate after point on insertion or deletion of a delimiter.
I need to go to sleep, but anybody who wants to look at how GNU Emacs
does this would be a hero.
dammit, why hasn't anyone written C code to properly maintain syntactic
context for the entire buffer? this cannot be that difficult to write,
nor overly compute-intensive given modern processors. emacs is *way*
behind the times here; i'm positive that most or all other editors with
syntax highlighting do this correctly. we can fall back on heuristics
when the buffer gets too big, if necessary.
no cache of known locations should be needed (since we know everything);
nor, with a bit of cleverness, should it be necessary to invalidate
everything after an insertion.
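One possible reading of "a bit of cleverness" is sketched below in Python: keep the state for every position in the buffer, and after an edit re-scan only until the recomputed state agrees with the state previously stored for the same text, then reuse the rest. This is a guess at the idea under a toy syntax (only `"` strings with backslash escapes, no comments), and SyntaxMap, step and full_scan are hypothetical names.

```python
CODE, STRING, ESCAPE = 0, 1, 2   # ESCAPE: inside a string, just after a backslash


def step(state: int, ch: str) -> int:
    """Advance the string/code state over one character."""
    if state == CODE:
        return STRING if ch == '"' else CODE
    if state == ESCAPE:
        return STRING
    if ch == "\\":
        return ESCAPE
    return CODE if ch == '"' else STRING


def full_scan(text: str) -> list:
    """states[i] is the state just before text[i]; one extra entry for end of text."""
    states = [CODE]
    for ch in text:
        states.append(step(states[-1], ch))
    return states


class SyntaxMap:
    def __init__(self, text: str):
        self.text = text
        self.states = full_scan(text)

    def in_string(self, point: int) -> bool:
        return self.states[point] != CODE

    def replace(self, start: int, end: int, new: str) -> int:
        """Replace text[start:end] with `new`; return how many characters were re-scanned."""
        old = self.states
        self.text = self.text[:start] + new + self.text[end:]
        delta = len(new) - (end - start)
        states = old[:start + 1]          # unaffected by the edit
        rescanned = 0
        i = start
        while i < len(self.text):
            # Past the edited region the characters match the old text at
            # offset i - delta, so equal states mean an identical future.
            if i >= start + len(new) and states[i] == old[i - delta]:
                states.extend(old[i - delta + 1:])
                break
            states.append(step(states[i], self.text[i]))
            rescanned += 1
            i += 1
        self.states = states
        return rescanned
```

Typing ordinary characters re-scans only the inserted text. Inserting an unmatched `"` still forces a re-scan to the end of the buffer in this sketch, because every later position really does change state.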
i bet that stuff like cc-mode would get *much* faster if we had this
available; these modes do all sorts of crap to compute their own
context, and way slow, since they're in lisp.
i'm also positive that this is a problem with well-known, published
solutions; in fact, i wouldn't be at all surprised if there is freely
available code to do this.
ben