After upgrading from glibc 2.2.5 to 2.3.1 in Debian `sid', the unexec
version of XEmacs builds successfully, but crashes immediately when
invoked. Are there reasons to believe that the new glibc might cause
problems with the unexec'd binary? It may be related to changes in
malloc; see below.
N.B. I don't consider this a glibc bug; I just want to confirm that it
is related to the unexec process or to malloc (we now have an
alternative way to "dump" Lisp, the so-called "portable dumper").
If it's not, it's a Heisenbug waiting to bite....
Debian version: testing/unstable (updated Mon Nov 4 19:41:15 2002)
Kernel version: Linux tleepslib 2.2.18 #1 SMP Tue Dec 26 11:36:10 JST 2000 i686 unknown unknown GNU/Linux
glibc version: 2.3.1-3
binutils version: 2.13.90.0.10-2
gcc version: 2.95.4-12
libgcc1: 3.2.1-0pre5
(The version parts after the hyphen are Debian release numbers,
included for the benefit of the Debian maintainers.) The kernel is
locally compiled; all the other packages mentioned are stock Debian sid.
XEmacs sources (for completeness):
CVSROOT=:pserver:cvs@cvs.xemacs.org:/pack/xemacscvs
(password = cvs)
cvs checkout -r r21-4-10 xemacs
To reproduce the crash (my comments are in square brackets []):
------------------------------------------------------------------------
./configure --without-x11 --without-postgresql --with-database=no \
--with-sound=none --without-modules --with-ncurses=no --with-gpm=no
[This gives pretty close to the minimal number of other libraries:
bash-2.05b$ ldd xemacs
libncurses.so.5 => /lib/libncurses.so.5 (0x40022000)
libm.so.6 => /lib/libm.so.6 (0x40061000)
libutil.so.1 => /lib/libutil.so.1 (0x40082000)
libc.so.6 => /lib/libc.so.6 (0x40085000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
and:]
make
[A couple thousand lines of normal output, then:]
Dumping under the name xemacs
[So temacs works fine, and produces an xemacs by unexec.
The following invocation of ./xemacs does NOT appear in normal make
output due to an `@' in the Makefile.]
Testing for Lisp shadows ...
./xemacs -batch -vanilla -f list-load-path-shadows
Wrong type argument: vectorp, #<INTERNAL OBJECT (XEmacs bug?)
(symbol-value-forward type 7444) 0x40198008>
xemacs exiting
.
make[1]: *** [xemacs] Error 255
make[1]: Leaving directory `/playpen/Projects/XEmacs/staging/xemacs-21.4.10/src'
make: *** [src] Error 2
[This doesn't leave a core. gdb tells me:]
bash-2.05b$ gdb xemacs
GNU gdb 2002-08-18-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...
(gdb) run
Starting program: /playpen/Projects/XEmacs/staging/xemacs-21.4.10/src/xemacs
Wrong type argument: vectorp, #<INTERNAL OBJECT (XEmacs bug?) (symbol-value-forward
type 7444) 0x40198008>
Program exited with code 0377.
(gdb) bt
No stack.
------------------------------------------------------------------------
Adding --with-system-malloc to the configure line makes everything
proceed normally. This seems strange, though, because the link list is
the same except for some debugging code (vm-limits.c). In the
--with-system-malloc=yes case, configure says
  GNU version of malloc: no
  - User chose not to use GNU allocators.
(this option short-circuits all attempts to check for special malloc
features and simply uses the <malloc.h> interface, so configure doesn't
recognize GNU malloc in GNU libc, either), while in the
--with-system-malloc=no case it says
  GNU version of malloc: yes
  - Using Doug Lea's new malloc from the GNU C Library.
There are a number of places where the relevant conditional
compilation (SYSTEM_MALLOC, defined in the first case, and GNU_MALLOC
and DOUG_LEA_MALLOC, defined in the second) changes the use of what
look like defined malloc APIs: malloc_get_state and malloc_set_state,
apparently called before and after the unexec respectively. They are
used under GNU_MALLOC and DOUG_LEA_MALLOC, and not under
SYSTEM_MALLOC. The only place where implementation details seem to be
involved is in the snippet included below.
OK, I'm stuck. Any further information I should provide?
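As a rough illustration of the save-and-restore idea behind the
malloc_get_state/malloc_set_state calls mentioned above, here is a toy
sketch. It does not call the glibc functions and is not XEmacs code:
the struct and every toy_* name are hypothetical stand-ins, and the
allocator state is reduced to two counters. It only shows why state
captured before the dump would have to be reinstated when the dumped
binary starts, and how defining SYSTEM_MALLOC compiles that step out.
(The actual XEmacs snippet referred to above follows the rule below.)

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an allocator's private bookkeeping.  Real malloc
   state is far richer; this only illustrates the save/restore idea.  */
struct toy_heap_state
{
  size_t bytes_in_use;
  size_t top_of_heap;
};

static struct toy_heap_state live_state;    /* the running "allocator" */
static struct toy_heap_state dumped_state;  /* copy kept in the dumped image */
static int have_dumped_state;

/* Hypothetical: pretend to allocate N bytes.  */
void
toy_allocate (size_t n)
{
  live_state.bytes_in_use += n;
  live_state.top_of_heap += n;
}

/* Hypothetical: called just before the image is dumped.  */
void
toy_save_state_before_dump (void)
{
#ifndef SYSTEM_MALLOC
  dumped_state = live_state;
  have_dumped_state = 1;
#endif
}

/* Hypothetical: a new process starts with a pristine allocator.  */
void
toy_fresh_process (void)
{
  memset (&live_state, 0, sizeof live_state);
}

/* Hypothetical: called early in startup of the dumped binary.  */
void
toy_restore_state_after_startup (void)
{
#ifndef SYSTEM_MALLOC
  if (have_dumped_state)
    live_state = dumped_state;
#endif
}

size_t
toy_bytes_in_use (void)
{
  return live_state.bytes_in_use;
}
```

In this toy model, skipping the restore step leaves the allocator
believing the heap is empty even though the dumped image contains live
objects, which is the kind of mismatch one would suspect here.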
------------------------------------------------------------------------
#ifdef GNU_MALLOC
  if (claimed_size < 2 * sizeof (void *))
    claimed_size = 2 * sizeof (void *);
# ifdef SUNOS_LOCALTIME_BUG
  if (claimed_size < 16)
    claimed_size = 16;
# endif
  if (claimed_size < 4096)
    {
      int log = 1;
      /* compute the log base two, more or less, then use it to compute
         the block size needed. */
      claimed_size--;
      /* It's big, it's heavy, it's wood! */
      while ((claimed_size /= 2) != 0)
        ++log;
      claimed_size = 1;
      /* It's better than bad, it's good! */
      while (log > 0)
        {
          claimed_size *= 2;
          log--;
        }
      /* We have to come up with some average about the amount of
         blocks used. */
      if ((size_t) (rand () & 4095) < claimed_size)
        claimed_size += 3 * sizeof (void *);
    }
  else
    {
      claimed_size += 4095;
      claimed_size &= ~4095;
      claimed_size += (claimed_size / 4096) * 3 * sizeof (size_t);
    }
#elif defined (SYSTEM_MALLOC)
  if (claimed_size < 16)
    claimed_size = 16;
  claimed_size += 2 * sizeof (void *);
#else
------------------------------------------------------------------------
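To make the arithmetic in the GNU_MALLOC branch above concrete, here
is a minimal standalone sketch. The function name
estimate_gnu_malloc_size is hypothetical, and the sketch deliberately
omits the SUNOS_LOCALTIME_BUG clamp and the rand()-based overhead term
so that the result is deterministic. It illustrates the rounding only
and is not the XEmacs code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of XEmacs: mirrors the rounding done
   in the GNU_MALLOC branch, minus the random overhead term.  */
size_t
estimate_gnu_malloc_size (size_t claimed_size)
{
  /* Never report less than two pointers' worth of storage.  */
  if (claimed_size < 2 * sizeof (void *))
    claimed_size = 2 * sizeof (void *);

  if (claimed_size < 4096)
    {
      int log = 1;

      /* Count how many times (size - 1) can be halved.  For the sizes
         reachable here this yields the exponent of the smallest power
         of two that is >= the original size.  */
      claimed_size--;
      while ((claimed_size /= 2) != 0)
        ++log;

      /* Rebuild the size as 2^log.  */
      claimed_size = 1;
      while (log > 0)
        {
          claimed_size *= 2;
          log--;
        }

      /* Sanity check: the result is a power of two.  */
      assert ((claimed_size & (claimed_size - 1)) == 0);
    }
  else
    {
      /* Round up to a whole number of 4096-byte pages, then add three
         size_t words of assumed bookkeeping per page.  */
      claimed_size += 4095;
      claimed_size &= ~(size_t) 4095;
      claimed_size += (claimed_size / 4096) * 3 * sizeof (size_t);
    }
  return claimed_size;
}
```

With this sketch, a request of 100 bytes comes out as 128, a request
of 129 as 256, and a request of 5000 as 8192 plus six size_t words
(two pages). Nothing in this estimate touches the allocator's internal
data structures; it only guesses at per-block overhead.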
--
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.