Re: [Bug: 21.5-b16] Gnus-induced freed lrecord crash again

Thursday, 16 October 2003

        Jerry James <james(a)xemacs.org> wrote:

...
 My brand spanking new GDB 6.0 says:

 ------------------------------------------------------------------------
 (gdb) bt
 #0  0xffffe002 in ?? ()
 #1  0x420277d6 in kill () from /lib/tls/libc.so.6
 #2  0x080e4a11 in fatal_error_signal (sig=17125)
     at /usr/src/xemacs/xemacs-21.5/src/emacs.c:3507
 #3  <signal handler called>
 #4  0xffffe002 in ?? ()
 #5  0x42027561 in raise () from /lib/tls/libc.so.6
 #6  0x42028b93 in abort () from /lib/tls/libc.so.6
 #7  0x080e5c55 in complex_vars_of_emacs() ()
     at /usr/src/xemacs/xemacs-21.5/src/emacs.c:4341
 #8  0x080e4a94 in assert_failed (file=0x8300580 "ably", line=6,
     expr=0x42131a14 " \031\023B")
     at /usr/src/xemacs/xemacs-21.5/src/emacs.c:3640
 #9  0x081ef493 in printing_major_badness (printcharfun=0,
     badness_string=0x6 <Address 0x6 out of bounds>, type=0, val=0x42131a14,
     badness=BADNESS_INTEGER_OBJECT)
     at /usr/src/xemacs/xemacs-21.5/src/print.c:1459

     Is there any way you can provide a stack trace from a debug,
un-optimized XEmacs?

     I'm asking this because I think we're chasing the wrong problem;
the problem may not be "stack corruption", but something else that is
causing the "freed lrecord" problem:

1. Mixing debug and optimization flags often "scrambles" line
   numbers, which is certainly visible above (e.g.,
   printing_major_badness() at print.c:1459 doesn't make any sense;
   line 1480 is probably the correct line number).

2. With stack corruption problems, one can generally follow a good
   stack trace from main() to the point where the corruption is
   detected.  In the gdb 6.0 trace, it appears as if a good trace
   exists from main() to assert_failed() (frame #8, above).

   This means that, if stack corruption occurred, it probably occurred
   *BELOW* assert_failed().

   As assert_failed() is called because of the lrecord problem, any
   stack corruption is probably occurring after the lrecord problem
   occurs.

   The point is that any "stack corruption" is not causing the lrecord
   problem.

3. However, there may not be any real stack corruption.  In fact, if you
   look at emacs.c, the mysterious call, "complex_vars_of_emacs()", is
   just before really_abort(), which would make complete sense
   (assert_failed() calls really_abort()).  It's also possible that the
   mixing of debug and opt flags is confusing the debugger.

4. Complicating all this is the fact that XEmacs catches SIGABRT (if
   I'm reading the source correctly), which is the signal that abort()
   raises.  So, an lrecord problem occurs, and then:
   printing_major_badness() -> assert_failed() -> abort() ->
   handle_signal_if_fatal() -> fatal_error_signal() ->
   kill(0, SIGABRT) -> death.

   (I'm *really* hoping that gdb is just printing a confused stack
   trace, due to mixing of debug and opt flags, and that stack
   corruption really isn't occurring. ;-)

   [ On a side note, XEmacs should really not be catching SIGABRT.  ]
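
   To make the chain in point 4 concrete, here is a minimal sketch (not
   the XEmacs code) of a process that catches SIGABRT and then re-sends
   it, which is why handler and kill() frames show up above abort() in a
   backtrace.  The names sketch_fatal_handler() and run_abort_chain()
   are made up for illustration, and the sketch uses kill(getpid(), ...)
   in a forked child so that only the child dies.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for fatal_error_signal(): restore the default
   action, then re-send the signal so the process really dies. */
static void
sketch_fatal_handler (int sig)
{
  signal (sig, SIG_DFL);
  kill (getpid (), sig);
}

/* Fork a child that catches SIGABRT and then calls abort().
   Returns the signal that killed the child, or -1 on error. */
int
run_abort_chain (void)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      signal (SIGABRT, sketch_fatal_handler);
      abort ();   /* abort() -> handler -> kill() -> death */
      _exit (1);  /* not reached */
    }
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

   The child still dies from SIGABRT, but only after passing through
   the handler, so a debugger attached at that point would see the
   extra frames on top of abort().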

-- 
	Darryl Okahata
	darrylo(a)soco.agilent.com

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion, or policy of Agilent Technologies, or
of the little green men that have been following him all day.
