-- "Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> spake thusly:
>>>>> "sjt" == Stephen J Turnbull <turnbull(a)sk.tsukuba.ac.jp> writes:
>>>>> "Matt" == Matt Tucker <tuck(a)whistlingfish.net> writes:
Matt> This one's the syntax stuff again. I'm pretty sure that all
Matt> of these 'VALID_BYTIND_P' crashes (there are at least three
Matt> that I know of) are the same error
sjt> I got this one, same stack trace as Norbert, I think.
sjt> The offending code in my dump seems to be at line 4319 of regex.c:
sjt> These changes may require fiddling with the new code
sjt> (changing array manipulations to pointer idioms, eg) to get
sjt> them to make sense.
Some more on this. First, the same code occurs in re_search_2 at line
4042. It needs to be fixed there as well.
I wonder if re_match_2 is the right place for this adjustment.
Shouldn't it be done in re_match_2_internal? (re_search_2 needs to be
checked for this issue too.)
The proper idiom uses the re_char* variable d (e.g., from re_search_2
at line 4187):
    d = ((const unsigned char *)
         (startpos >= size1 ? string2 - size1 : string1) + startpos);
    d_size = charcount_to_bytecount (d, 1);
    range -= d_size;
    startpos += d_size;
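For readers new to the two-string convention, here is a small, self-contained sketch of what that idiom does: it treats string1 and string2 as one virtual string, finds the byte at startpos, and moves startpos forward (and range down) by exactly one whole character. Assumptions: `one_char_bytes` is a hypothetical stand-in for charcount_to_bytecount (d, 1) that uses UTF-8 lead-byte rules rather than the real Mule internal encoding, and only forward searches (range > 0) are handled. It also computes the pointer as string2 + (startpos - size1) rather than (string2 - size1) + startpos; the address is the same, but it avoids forming a pointer outside string2.

```c
#include <assert.h>

typedef const unsigned char re_char;

/* HYPOTHETICAL stand-in for charcount_to_bytecount (d, 1): byte length of
   the single character starting at d.  Uses UTF-8 lead-byte rules, NOT
   the real Mule encoding. */
static int
one_char_bytes (re_char *d)
{
  if (*d < 0x80)
    return 1;
  if ((*d & 0xE0) == 0xC0)
    return 2;
  if ((*d & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Map a position in the virtual concatenation string1 ++ string2 to a
   real pointer into whichever string holds that byte. */
static re_char *
virtual_pos_to_ptr (re_char *string1, int size1, re_char *string2,
                    int startpos)
{
  return startpos >= size1 ? string2 + (startpos - size1)
                           : string1 + startpos;
}

/* Forward-only version of the re_search_2 idiom: skip one character,
   shrinking range by the same number of bytes. */
static void
skip_one_char (re_char *string1, int size1, re_char *string2,
               int *startpos, int *range)
{
  re_char *d;
  int d_size;

  assert (*range > 0);
  d = virtual_pos_to_ptr (string1, size1, string2, *startpos);
  d_size = one_char_bytes (d);
  *range -= d_size;
  *startpos += d_size;
}
```

With string1 = "ab" and string2 holding a two-byte character followed by "z", three calls move startpos 0 → 1 → 2 → 4, and the third call crosses from string1 into string2 without landing inside the multibyte character.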
I'm not sure whether you can use `d' in re_match_2 (or
re_match_2_internal), because it seems to be used differently there
than in re_search_2. This idiom
doesn't seem to appear in re_match_2_internal at all. I suppose this
is because there is no issue of "skipping" characters since the re is
anchored at point.
This idiom recurs often. It probably ought to be encapsulated in a
set of macros. (That's not your job unless you volunteer for
it---Martin is good at that kind of thing and might be willing to
help. Definitely it can't be included in the release.)
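One possible shape for such macros, purely as a sketch: POS_TO_PTR and SKIP_ONE_CHAR are hypothetical names, not existing regex.c macros, and, like many regex.c macros, they assume string1, size1 and string2 are in scope in the calling function. ONE_CHAR_BYTES is again a hypothetical UTF-8 stand-in for charcount_to_bytecount (p, 1). Because `pos` is evaluated more than once, callers must not pass an expression with side effects.

```c
#include <assert.h>

typedef const unsigned char re_char;

/* HYPOTHETICAL stand-in for charcount_to_bytecount (p, 1), using UTF-8
   lead-byte rules rather than the real Mule encoding. */
#define ONE_CHAR_BYTES(p)                                   \
  (*(p) < 0x80 ? 1                                          \
   : (*(p) & 0xE0) == 0xC0 ? 2                              \
   : (*(p) & 0xF0) == 0xE0 ? 3 : 4)

/* HYPOTHETICAL: pointer to virtual position pos in string1 ++ string2.
   Expects string1, size1 and string2 in the enclosing scope. */
#define POS_TO_PTR(pos)                                             \
  ((pos) >= size1 ? string2 + ((pos) - size1) : string1 + (pos))

/* HYPOTHETICAL: advance pos past one character, shrinking range by the
   same number of bytes. */
#define SKIP_ONE_CHAR(pos, range)                               \
  do {                                                          \
    int skip_bytes_ = ONE_CHAR_BYTES (POS_TO_PTR (pos));        \
    (range) -= skip_bytes_;                                     \
    (pos) += skip_bytes_;                                       \
  } while (0)

/* Example caller: count the characters in string1 ++ string2. */
static int
count_chars (re_char *string1, int size1, re_char *string2, int size2)
{
  int pos = 0, range = size1 + size2, n = 0;

  while (range > 0)
    {
      SKIP_ONE_CHAR (pos, range);
      n++;
    }
  assert (range == 0);  /* we stopped exactly on a character boundary */
  return n;
}
```

The do { ... } while (0) wrapper lets SKIP_ONE_CHAR be used like a statement, including in an unbraced if/else.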
I'm pretty sure I've got it nailed down now. The adjpos was a false
lead. It had nothing to do with Mule characters; it was used to
compensate for the fact that buffers are indexed from 1 and strings
are indexed from 0.
The real problem was that the code in regex.c converts from byte
indices to bufpos's, but the macros I was using in syntax.h weren't
taking narrowed buffers into account (I had to rewrite them when
porting from the GNU code, and somehow lost that bit). This was showing
up most commonly in info buffers, where narrowing is frequently used.
I'm now properly offsetting by BI_BUF_BEGV. Interestingly, this
obviates the need for adjpos (apparently the GNU code had regex.c
adding the 1 for buffers and the syntax.h macros subtracting it
again; true weirdness).
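A minimal sketch of why offsetting by BI_BUF_BEGV makes adjpos unnecessary, assuming (as described above) that regex.c works with 0-based byte offsets counted from the start of the accessible region. struct toy_buffer and both functions are hypothetical; only `begv` mirrors what BI_BUF_BEGV returns. Since buffer positions start at 1, begv is 1 in an unnarrowed buffer, so adding it already supplies the +1 that adjpos provided; in a narrowed buffer begv is larger, and a fixed +1 gives the wrong position.

```c
#include <assert.h>

/* HYPOTHETICAL, simplified buffer: only the byte index where the
   accessible (possibly narrowed) region begins, i.e. what BI_BUF_BEGV
   would return.  Buffer byte indices start at 1. */
struct toy_buffer
{
  long begv;  /* 1 when not narrowed, larger when narrowed */
};

/* Offset by begv: correct whether or not the buffer is narrowed. */
static long
offset_to_bytind (const struct toy_buffer *buf, long offset)
{
  assert (buf->begv >= 1 && offset >= 0);
  return buf->begv + offset;
}

/* Adds only the fixed 1 (what adjpos accounted for), so it is right
   only when begv == 1, i.e. when the buffer is not narrowed. */
static long
offset_to_bytind_ignoring_narrowing (long offset)
{
  return 1 + offset;
}
```

In an unnarrowed buffer the two functions agree; in a buffer narrowed to start at byte 1001, a match at offset 41 maps to 1042 with the first and to 42 with the second, which is the kind of bad index that would trip VALID_BYTIND_P.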
I tested the fix against the bindist snippet and it worked fine. I'm
fairly certain I'm doing the right thing now, so I'm going to go ahead
and commit tonight (despite the fact that it's way past my bedtime),
but I'll look at it again in the morning just to be sure.