Re: Dump while making bindist on packages

Tuesday, 20 February 2001

        -- "Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> spake thusly:

...
>>>>> "sjt" == Stephen J Turnbull <turnbull(a)sk.tsukuba.ac.jp> writes:

>>>>> "Matt" == Matt Tucker <tuck(a)whistlingfish.net> writes:
     Matt> This one's the syntax stuff again. I'm pretty sure that all
     Matt> of these 'VALID_BYTIND_P' crashes (there are at least three
     Matt> that I know of) are the same error

     sjt> I got this one, same stack trace as Norbert, I think.

     sjt> The offending code in my dump seems to be at 4319 of regex.c:

     sjt> These changes may require fiddling with the new code
     sjt> (changing array manipulations to pointer idioms, eg) to get
     sjt> them to make sense.

 Some more on this.  First, the same code occurs in re_search_2 at line
 4042.  It needs to be fixed there as well.

 I wonder if re_match_2 is the right place for this adjustment.
 Shouldn't it be done in re_match_2_internal?  (re_search_2 needs to be
 checked for this issue too.)

 The proper idiom uses the re_char* variable d (eg, from re_search_2
 line 4187):

	   d = ((const unsigned char *)
	        (startpos >= size1 ? string2 - size1 : string1) + startpos);
	   d_size = charcount_to_bytecount (d, 1);
	   range -= d_size;
	   startpos += d_size;

 I'm not sure if you can use `d' in re_match_2(_internal)? because it
 seems to be used in a different way from re_search_2.  This idiom
 doesn't seem to appear in re_match_2_internal at all.  I suppose this
 is because there is no issue of "skipping" characters since the re is
 anchored at point.
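 The idiom above can be sketched as a standalone helper.  This is only an
 illustration, not code from regex.c: char_bytes_at is a hypothetical
 stand-in for charcount_to_bytecount (d, 1) that assumes a UTF-8-like
 encoding rather than the real Mule encoding, skip_one_char is an
 invented name, and only forward movement is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for charcount_to_bytecount (p, 1): the byte
   length of the character starting at P.  Assumes a UTF-8-like
   encoding in which the lead byte fixes the length; the real Mule
   internal encoding is different.  */
static size_t
char_bytes_at (const unsigned char *p)
{
  if (*p < 0x80)
    return 1;
  if ((*p & 0xE0) == 0xC0)
    return 2;
  if ((*p & 0xF0) == 0xE0)
    return 3;
  return 4;
}

/* Advance *STARTPOS past one whole character of the text formed by
   STRING1 (SIZE1 bytes) followed by STRING2, and shrink *RANGE by the
   same number of bytes.  Hypothetical helper, forward search only.  */
static void
skip_one_char (const unsigned char *string1, size_t size1,
               const unsigned char *string2,
               size_t *startpos, size_t *range)
{
  /* Same address as (startpos >= size1 ? string2 - size1 : string1)
     + startpos, written so that no pointer before the start of
     STRING2 is ever formed.  */
  const unsigned char *d = (*startpos >= size1
                            ? string2 + (*startpos - size1)
                            : string1 + *startpos);
  size_t d_size = char_bytes_at (d);

  assert (*range >= d_size);    /* caller must leave room for the character */
  *range -= d_size;
  *startpos += d_size;
}
```

 Stepping by d_size rather than by 1 keeps startpos on a character
 boundary, which is the point of the idiom.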

 This idiom recurs often.  It probably ought to be encapsulated in a
 set of macros.  (That's not your job unless you volunteer for
 it---Martin is good at that kind of thing and might be willing to
 help.  Definitely it can't be included in the release.)

I'm pretty sure I've got it nailed down now. The adjpos was a false
lead. It had nothing to do with mule characters; it was used to
account for the fact that buffers are indexed from 1 and strings are
indexed from 0.

The real problem was that the code in regex.c converts from byte
indices to bufpos's, but the macros I was using in syntax.h weren't
taking narrowed buffers into account (I had to rewrite them when
porting from the GNU code, and somehow lost that bit). This was showing
up most commonly in info buffers, where narrowing is frequently used.

I'm now properly offsetting by BI_BUF_BEGV. Interestingly, this
obviates the need for adjpos (apparently the GNU code had regex.c
adding the 1 for buffers and then the syntax.h macros were subtracting
it again; true weirdness).
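
To make the narrowing issue concrete, here is a toy sketch. It is not
the syntax.h code: struct toy_buffer and both function names are made
up for illustration, and it assumes the matcher's offsets count bytes
from the start of the accessible region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature buffer holding only the two fields this
   sketch needs.  Byte indices are 1-based, as in buffers.  */
struct toy_buffer
{
  size_t begv;  /* first byte index of the accessible (narrowed) region */
  size_t zv;    /* byte index just past the accessible region */
};

/* Convert OFFSET, counted from 0 at the start of the accessible
   region, to an absolute byte index by offsetting from BEGV.  */
static size_t
regex_offset_to_bytind (const struct toy_buffer *buf, size_t offset)
{
  assert (offset <= buf->zv - buf->begv);
  return buf->begv + offset;
}

/* The broken variant: only adds the 1 that separates 0-based strings
   from 1-based buffers, so it is right only when BEGV happens to be 1
   (that is, when the buffer is not narrowed).  */
static size_t
regex_offset_to_bytind_unnarrowed (size_t offset)
{
  return 1 + offset;
}
```

In an unnarrowed buffer the two agree because begv is 1, which would
explain why a separate adjustment of 1 becomes unnecessary once the
offset is taken from the start of the accessible region.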

I tested the fix against the bindist snippet and it worked fine. I'm
fairly certain I'm doing the right thing now, so I'm going to go ahead
and commit tonight (despite the fact that it's way past my bedtime),
but I'll look at it again in the morning just to be sure.
