Jerry James <james(a)xemacs.org> wrote:
My brand spanking new GDB 6.0 says:
------------------------------------------------------------------------
(gdb) bt
#0 0xffffe002 in ?? ()
#1 0x420277d6 in kill () from /lib/tls/libc.so.6
#2 0x080e4a11 in fatal_error_signal (sig=17125)
    at /usr/src/xemacs/xemacs-21.5/src/emacs.c:3507
#3 <signal handler called>
#4 0xffffe002 in ?? ()
#5 0x42027561 in raise () from /lib/tls/libc.so.6
#6 0x42028b93 in abort () from /lib/tls/libc.so.6
#7 0x080e5c55 in complex_vars_of_emacs() ()
    at /usr/src/xemacs/xemacs-21.5/src/emacs.c:4341
#8 0x080e4a94 in assert_failed (file=0x8300580 "ably", line=6,
    expr=0x42131a14 " \031\023B")
    at /usr/src/xemacs/xemacs-21.5/src/emacs.c:3640
#9 0x081ef493 in printing_major_badness (printcharfun=0,
    badness_string=0x6 <Address 0x6 out of bounds>, type=0, val=0x42131a14,
    badness=BADNESS_INTEGER_OBJECT)
    at /usr/src/xemacs/xemacs-21.5/src/print.c:1459
Is there any way you can provide a stack trace from a debug,
unoptimized XEmacs?
I ask because I think we may be chasing the wrong problem. The cause of
the "freed lrecord" problem may not be "stack corruption" but something
else:
1. Mixing debug and optimization flags often "scrambles" line numbers,
and the trace above certainly shows this (e.g.,
printing_major_badness() at print.c:1459 doesn't make any sense;
line 1480 is probably the correct line number).
2. With stack corruption problems, one can generally follow a good stack
trace from main() up to the point where the corruption is detected. In the
gdb 6.0 trace, it appears as if a good trace exists from main() to
assert_failed() (frame #8, above).
This means that, if stack corruption occurred, it probably occurred
*BELOW* assert_failed().
As assert_failed() is called because of the lrecord problem, any
stack corruption is probably occurring after the lrecord problem
occurs.
The point is that any "stack corruption" is not causing the lrecord
problem.
3. However, there may not be any real stack corruption. In fact, if you
look at emacs.c, the mysterious call, "complex_vars_of_emacs()", sits
just before really_abort(), so frame #7 is quite possibly really_abort()
reported under its neighbor's name. That would make complete sense
(assert_failed() calls really_abort()). It's also possible that the
mixing of debug and opt flags is confusing the debugger.
4. Complicating all this is the fact that XEmacs catches SIGABRT, which
is the signal that abort() raises (if I'm reading the source
correctly). So, an lrecord problem occurs, and then:
printing_major_badness() -> assert_failed() -> abort() ->
handle_signal_if_fatal() -> fatal_error_signal() ->
kill(0, SIGABRT) -> death.
(I'm *really* hoping that gdb is just printing a confused stack
trace, due to mixing of debug and opt flags, and that stack
corruption really isn't occurring. ;-)
[ On a side note, XEmacs should really not be catching SIGABRT. ]
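To make the chain in point 4 concrete, here is a minimal C sketch of a process that catches SIGABRT and then calls abort(). This is not XEmacs code: sketch_fatal_handler() and abort_with_handler() are names made up for illustration, and the handler is only my guess at the general shape of what fatal_error_signal() does. The sketch also assumes a POSIX system, and it signals only its own process with kill(getpid(), sig) rather than kill(0, ...), so that it does not take the caller's process group down with it. The abort() happens in a forked child so the caller survives to look at the result.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int note_fd = -1;   /* write end of a pipe, used by the handler */

/* Hypothetical stand-in for a fatal-signal handler: record that we ran,
   restore the default action, then re-send the signal to ourselves. */
static void sketch_fatal_handler(int sig)
{
    const char mark = 'H';
    ssize_t n = write(note_fd, &mark, 1);   /* write() is async-signal-safe */
    (void) n;
    signal(sig, SIG_DFL);    /* next delivery gets the default action */
    kill(getpid(), sig);     /* assumption: own process only, not kill(0, ...) */
}

/* Fork a child that installs the handler and calls abort().
   Returns the signal that terminated the child, 0 if it exited
   normally, or -1 on error.  Sets *handler_ran to 1 if the handler
   was entered before the child died. */
int abort_with_handler(int *handler_ran)
{
    int fds[2];
    int status = 0;
    char mark = 0;
    pid_t pid;

    *handler_ran = 0;
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                      /* child */
        close(fds[0]);
        note_fd = fds[1];
        signal(SIGABRT, sketch_fatal_handler);
        abort();                         /* raises SIGABRT, entering the handler */
        _exit(1);                        /* not reached */
    }

    close(fds[1]);                       /* parent keeps only the read end */
    if (read(fds[0], &mark, 1) == 1 && mark == 'H')
        *handler_ran = 1;
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a typical Linux/glibc system I would expect abort_with_handler(&ran) to return SIGABRT with ran set to 1: the handler runs first, and the process still dies from SIGABRT afterwards. That is why a debugger attached to such a process can show a signal-handler frame and a kill() stacked on top of the original abort(), as in frames #0 to #6 of the trace.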
--
Darryl Okahata
darrylo(a)soco.agilent.com
DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.