No luck finding this bug, but in the process of looking for it I did
find a workaround that is probably an improvement in any case. The args,
min_args, max_args and args_in_array fields of Lisp_Compiled_Function
objects are currently being initialized by make-byte-code. I moved this
code to optimize_compiled_function instead, so that the args array isn't
created until the first time the function is called. This should be
better in almost all cases, since functions that are never called never
get an args array at all, and it avoids this bug since the args arrays
of most functions don't even exist when pdump is called.
I'm fairly confident that this change works. I did have to change
make_compiled_function to initialize those slots to zero, so the GC
wouldn't get confused, plus I had to change function_argcount to make
sure that optimize_compiled_function has been called before it uses
max_args. I'm pretty certain I've found all the places where it needs to
be initialized.
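A minimal sketch of the lazy-initialization idea described above, not the actual patch. The struct is a made-up, much-simplified stand-in for Lisp_Compiled_Function: the `optimized` flag, the `arglist` and `arglist_len` fields and the use of C strings for argument names are all assumptions for illustration, and every argument is treated as required (no &optional or &rest handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified stand-in for Lisp_Compiled_Function. */
typedef struct {
  int optimized;              /* has the lazy setup run yet? (assumed flag) */
  int min_args;
  int max_args;
  const char **args;          /* NULL until the function is first used */
  const char *const *arglist; /* always present; source for args */
  int arglist_len;
} compiled_function;

/* The constructor zeroes the lazily filled slots, so anything that
   walks the object before first use sees NULL/0 rather than garbage. */
void make_compiled_function(compiled_function *f,
                            const char *const *arglist, int n)
{
  f->optimized = 0;
  f->min_args = 0;
  f->max_args = 0;
  f->args = NULL;
  f->arglist = arglist;
  f->arglist_len = n;
}

/* Build the args array on demand; a second call is a no-op. */
void optimize_compiled_function(compiled_function *f)
{
  if (f->optimized)
    return;
  if (f->arglist_len > 0) {
    f->args = malloc(sizeof *f->args * (size_t) f->arglist_len);
    assert(f->args != NULL);
    for (int i = 0; i < f->arglist_len; i++)
      f->args[i] = f->arglist[i];
  }
  /* Simplification: all arguments are required. */
  f->min_args = f->max_args = f->arglist_len;
  f->optimized = 1;
}

/* Any reader of max_args must make sure the lazy setup has run first. */
int function_argcount(compiled_function *f)
{
  if (!f->optimized)
    optimize_compiled_function(f);
  return f->max_args;
}
```

In this sketch a freshly made function has a NULL args pointer, so a dump taken at that point has no args array to get wrong; the array only appears once function_argcount or a call forces the setup.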
Is there a stress test that people like to use these days? Dumping and
running larger sets of preloads would be a good test, also just running
lots and lots of code.
Is there any chance of someone looking at this change?
Eric Benson wrote:
Well, I've traced through pdump() a bit without becoming terribly
enlightened. I haven't "caught it in the act" of putting the wrong
value in the args field of zmacs-make-extent-for-region. I have
noticed that different builds cause it to be aliased to different
values, always the args array of some other bytecode function. First it
was perform-replace, then quit-char, then no-upper-case-p. I've gotten
the general idea of what pdump does, although I don't fully understand
it. I've learned enough to know that there are some lines like this:
    EMACS_INT rdata = pdump_get_entry (elt->obj)->save_offset;
in pdump_dump_rtables that don't make much sense.
'pdump_get_entry(elt->obj)' should always be exactly the same as
'elt', otherwise something is very wrong, I think. So this line is
just doing a useless indirection and hashtable lookup. I don't think
this has anything to do with the bug I'm trying to track down, but it
does look bogus. At any rate, the bug must be happening somewhere
inside pdump, because the objects are correct in memory before pdump
gets called, but they are incorrect immediately after loading, even
before relocation.
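The round-trip claim above (looking up an entry's own object should give back that same entry) could be checked with something like the following. This is only a sketch under assumptions: `entry`, `add_entry`, `get_entry`, `hash_ptr` and `entry_is_consistent` are hypothetical simplified names, not the real dumper code, and the chained hash table is just one plausible layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* Hypothetical simplified dump-table entry, keyed by object address. */
typedef struct entry {
  const void *obj;      /* the object being dumped */
  size_t save_offset;   /* where it lands in the dump file */
  struct entry *next;   /* chain of entries sharing a bucket */
} entry;

entry *table[TABLE_SIZE];

/* Bucket index derived from the object's address. */
size_t hash_ptr(const void *p)
{
  return (size_t) (((uintptr_t) p >> 3) % TABLE_SIZE);
}

void add_entry(entry *e)
{
  size_t h = hash_ptr(e->obj);
  e->next = table[h];
  table[h] = e;
}

/* Lookup compares the full key, so two objects that share a bucket
   still resolve to their own entries. */
entry *get_entry(const void *obj)
{
  for (entry *e = table[hash_ptr(obj)]; e != NULL; e = e->next)
    if (e->obj == obj)
      return e;
  return NULL;
}

/* Returns 1 if looking up the entry's own object yields the entry itself. */
int entry_is_consistent(const entry *e)
{
  return get_entry(e->obj) == e;
}
```

If a check like entry_is_consistent ever failed in a real table, the indirect lookup would be reading some other object's save_offset, which is the kind of aliasing seen here. If it always holds, the indirection is merely redundant and using the entry's own save_offset directly is equivalent.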
Who's maintaining dumper.c these days?
Eric Benson wrote:
> Here's what I've got so far:
>
> The 'args' field of the Lisp_Compiled_Function object that is the
> function value of zmacs-make-extent-for-region is incorrect. The
> first slot, i.e. f->args[0] should be the symbol 'region' but instead
> it is the symbol 'from-string'. This seems to be the way it arrives
> from the xemacs.dmp file. In fact, the f->args array appears to be
> exactly the one that belongs to the function perform-replace, i.e.
> args[0] is 'from-string', args[1] is 'replacements', args[2] is
> 'query-flag', and so on. It turns out that the f->args array
> in perform-replace is == to the one in zmacs-make-extent-for-region,
> i.e. both functions point to the same args array. This condition
> seems to have occurred during dumping, because the two function
> objects have identical values before relocation after reading the
> dump file. The most likely culprit I can see is the hashing code in
> dumper.c. It seems likely that somehow these two arrays are getting
> thrown in the same hash bucket. Why turning on error checking makes
> the problem go away is a mystery. I can only guess that it perturbs
> the data just enough to avert a rarely occurring bug. It does happen
> whether or not it's compiled with optimization.
>
> I'll try to look at this more tomorrow, but any advice is welcome.
>
> Stephen J. Turnbull wrote:
>
>>>>>>> "Eric" == Eric Benson <eric_a_benson(a)yahoo.com> writes:
>>
>> Eric> Can I assume this was already fixed on those other
>> Eric> platforms, but the fix did not make it to Darwin? Can
>> Eric> someone fill me in on the fix, so I can apply it to the
>> Eric> Darwin version as well?
>>
>> No. As far as I know this has not been diagnosed.