-----Original Message-----
From: Stephen J. Turnbull [mailto:stephen@xemacs.org]
Sent: Friday, February 04, 2005 11:47 PM
To: ben(a)xemacs.org
Cc: Albrecht Dreß; XEmacs Beta
Subject: Re: [Bug: 21.4.15] Stack overflow in regexp matcher
>>>>> "APA" == Adrian Aichner <Adrian.Aichner(a)t-online.de> writes:
APA> "Albrecht Dreß" <albrecht.dress(a)arcormail.de> writes:
>> Now I middle-click on the error line, but I get a stack
>> overflow in regexp matcher error.
APA> Sorry for the late confirmation, Albrecht.
APA> I can reproduce this in XEmacs 21.5 (beta18) "chestnut"
APA> (+CVS-20050202) [Lucid] (i586-pc-win32, Mule) of Fri Feb 04
APA> 2005 on D5DC120J when doing a make all in the xemacsweb
APA> module of the XEmacs website.
Wow. That is one obscene regexp!!
looking-at((concat "\\([^\n]*: Entering directory `\\([^\n]*\\)'$\\)"
"\\|\\([^\n]*: Leaving directory `\\([^\n]*\\)'$\\)"
"\\|\\(\\(\\([a-zA-Z]?:?[^:( \n]+\\)[:(][
]*\\([0-9]+\\)\\([) ]\\|:\\([^0-9\n]\\|\\([0-9]+:\\)\\)\\)\\)"
"\\|\\(\\(\\([a-zA-Z]:\\)?[^:( \n-]+\\)[:(][
]*\\([0-9]+\\)[:) ]\\)"
"\\|\\(\\(Error\\|Warning\\) \\([EW][0-9]+
\\)?\\([a-zA-Z]?:?[^:( \n]+\\) \\([0-9]+\\)\\([)
]\\|:[^0-9\n]\\)\\)"
"\\|\\(.*[ :]\\([a-zA-Z]?:?[^:( \n]+\\)[:(](+[
]*\\([0-9]+\\))[:) ]*$\\)"
"\\|\\(.*([ ]*\\([a-zA-Z]?:?[^:( \n]+\\)[:(][
]*\\([0-9]+\\))\\)"
"\\|\\([^\n ]+ (\\([0-9]+\\)) in \\([^ \n]+\\)\\)"
"\\|\\(.*in \\([^(\n]+\\)(\\([0-9]+\\))$\\)"
"\\|\\(\\(cfe\\|fort\\): [^:\n]*: \\([^ \n]*\\), line
\\([0-9]+\\):\\)"
"\\|\\(^cc-[0-9]* \\(cc\\|CC\\|f77\\):
\\(REMARK\\|WARNING\\|ERROR\\) File = \\(.*\\), Line =
\\([0-9]*\\)\\)"
"\\|\\(\\(.* on \\)?[Ll]ine[ ]+\\([0-9]+\\)[
]+of[ ]+\"?\\([a-zA-Z]?:?[^\":\n]+\\)\"?:\\)"
"\\|\\(.*\"\\([^,\" \n ]+\\)\", lines?
\\([0-9]+\\)\\([(.]\\([0-9]+\\))?\\)?[:., (-]\\)"
"\\|\\(^File \"\\([^,\" \n ]+\\)\", line \\([0-9]+\\),\\)"
"\\|\\(^File \"\\([^,\" \n ]+\\)\", lines?
\\([0-9]+\\)[-0-9]*, characters? \\([0-9]+\\)\\)"
"\\|\\([a-z0-9/]+: \\([eE]rror\\|[wW]arning\\): \\([^,\" \n
]+\\)[,:] \\(line \\)?\\([0-9]+\\):\\)"
"\\|\\(.*in line \\([0-9]+\\) of file \\([^ \n]+[^. \n]\\)\\.? \\)"
"\\|\\([EW], \\([^(\n]*\\)(\\([0-9]+\\),[ ]*\\([0-9]+\\)\\)"
"\\|\\([a-zA-Z]?:?[^0-9 \n :]+:[ ]*\\([^ \n
:]+\\):\\([0-9]+\\):\\(\\([0-9]+\\)[: ]\\)?\\)"
"\\|\\([^0-9 \n :]+:[ ]*\\([^ \n
:]+\\):\\([0-9]+\\):\\(\\([0-9]+\\):\\)?[A-Za-z]:\\)"
"\\|\\([^\n]* \\([^ \n,\"]+\\), line \\([0-9]+\\):\\)"
"\\|\\([^\n]*: \\([^ \n,\"]+\\): \\([0-9]+\\):\\)"
"\\|\\(\\(cc\\| cft\\)-[0-9]+ c\\(c\\|f77\\): ERROR
\\([^,\n]+, \\)* File = \\([^,\n]+\\), Line = \\([0-9]+\\)\\)"
"\\|\\(\\([^( \n ]+\\)(\\([0-9]+\\):\\([0-9]+\\)) : \\)"
"\\|\\(\"\\(.*\\)\",\\([0-9]+\\)\\s-+\\(Error\\|Warning\\)\\[[
0-9]+\\]:\\)"
"\\|\\(\\([^, \n ]+\\), line \\([0-9]+\\), char
\\([0-9]+\\)[:.,
(-]\\)"
"\\|\\(.* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]\\)"
"\\|\\(Semantic error at line \\([0-9]+\\), column \\([0-9]+\\), file
\\(.*\\):\\)"
"\\|\\(Error [0-9]+ at (\\([0-9]*\\):\\([^)\n]+\\))\\)"
"\\|\\(.*: ERROR File = \\(.+\\), Line = \\([0-9]+\\)\\)"
"\\|\\(.*: WARNING File = \\(.+\\), Line = \\([0-9]+\\)\\)"
"\\|\\(.* ERROR [a-zA-Z0-9 ]+, File = \\(.+\\), Line = \\([0-9]+\\),
Column =
\\([0-9]+\\)\\)"
"\\|\\(Error:.*\n.* line \\([0-9]+\\) char \\([0-9]+\\) of
file://\\(.+\\)\\)"
"\\|\\(Warning:.*\n.* line \\([0-9]+\\) char \\([0-9]+\\) of
file://\\(.+\\)\\)"
"\\|\\(^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\([0-9]+\\):[0-9]+:[0-9]
+:\\)"
"\\|\\(^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\)"
"\\|\\(file:\\(\\([a-zA-Z]:\\)?[^:( \n]+\\):[ ]*\\([0-9]+\\)[:
]\\)\\)"))
The second and third to last lines above contain iterated negated
(starting with ^) character classes that match newline. Such a regexp
will try to swallow the whole buffer, pushing a retry point on the
stack every time it hits "]", IIRC.
Try substituting "[^]\n]" for "[^]]" in those two expressions,
especially if your *Compile* buffer contains lots of [] pairs.
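
To illustrate the substitution, here is a minimal sketch. It uses
Python's `re` module purely as a stand-in, on the assumption (true for
both engines) that a negated character class matches newline unless
newline is listed in it. The sample text and variable names are made
up for the example; it says nothing about how regex.c uses its stack.

```python
import re

# Hypothetical buffer text: the bracket opened on line 1 is only
# closed on line 3.
text = "[open on line 1\nline 2\nclosed on line 3] rest"

# Counterpart of the Emacs fragment "\\[[^]]*\\]".
loose = re.compile(r"\[[^\]]*\]")
# Counterpart of the suggested "\\[[^]\n]*\\]".
strict = re.compile(r"\[[^\]\n]*\]")

loose_match = loose.match(text)
strict_match = strict.match(text)

# The loose class runs across both newlines to reach the "]".
print(loose_match.group(0).count("\n"))  # 2
# The strict class stops at the first newline, so nothing matches.
print(strict_match)  # None
```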
Ben? Is my analysis correct? Any other ideas, in case that doesn't
pan out?
I'm very tired, so I'm not sure everything I'm writing is right.
Your analysis is correct in that it will skip past newlines just like other
chars. However, it will stop when it finds the first right bracket.
Therefore, it won't swallow the whole buffer unless you really have such a
multiline bracket construction spanning the whole buffer -- or, you have an
unmatched bracket -- and it won't push failure points because there won't be
any ]. *If* your crash is due to matching an extremely large amount of text
(i.e. a probable case of forgotten bracket), then the inserted newline would
make the match stop quicker -- but too soon if a bracketed expression can
legally continue across a line.
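
A rough way to see the difference between the balanced and the
unmatched case is to measure how much text the bracket class consumes.
This is only a sketch, again using Python's `re` as a stand-in rather
than XEmacs itself; the helper `scanned` and the sample buffers are
hypothetical, and the numbers show the extent of the scan, not the
actual stack usage in regex.c.

```python
import re

def scanned(pattern: str, buffer: str) -> int:
    """Return how many characters the pattern consumes from the start."""
    m = re.match(pattern, buffer)
    return m.end() if m else 0

LOOSE = r"\[[^\]]*"      # "[", then anything that is not "]"
STRICT = r"\[[^\]\n]*"   # same, but also stopping at newline

# A well-formed tag followed by a long buffer.
balanced = "[javac] Foo.java:12: oops\n" + "plain line\n" * 1000
# A forgotten "]" followed by the same long buffer.
unmatched = "[never closed\n" + "plain line\n" * 1000

# With a closing bracket present, the scan stops right before it.
print(scanned(LOOSE, balanced))                     # 6
# Without one, the loose class runs to the end of the buffer.
print(scanned(LOOSE, unmatched) == len(unmatched))  # True
# Excluding newline limits the scan to the first line.
print(scanned(STRICT, unmatched))                   # 13
```

The last case is the quicker stop described above; the price is that a
bracketed expression which legitimately spans lines no longer matches.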
It would be useful to get debugging info to find out where exactly it's
crashing. There's debugging info in regex.c, although it's a bit hard to
turn on; you'd have to attach to a running program and set the `debug'
variable to 1 around line 743 in regex.c. There should be a var to control
this; I may add this at some point.
--
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573
JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.