Re: XEmacs 21.4.10 crashes with glibc 2.3.1

Thursday, 7 November 2002

        Hi,

...
 Thank you very much for the review and the patch!  The patch "works"
 in the sense that I can now build.
Ok, good.

...
 I'll run with it for a while, put it in the next release candidate,
 and see how it affects XEmacs's footprint.  Probably it will go in the
 next release of 21.4, safety over efficiency.  But I would appreciate
 it if you could help us recover the mmap capabilities. 
Yes, I'll give it a try when I get some time.

...
     Wolfram> N.B. the exact same problem should/could show up with
     Wolfram> earlier glibc releases, but maybe the allocation pattern
     Wolfram> was slightly different.

 I don't understand the problem so I'm not sure what you're saying.

 First, your patch also affects "portable dumper" builds which build
 and (mostly) run fine.  Is this intentional?  Ie, is this a generic
 problem with our allocator implementation, which "just happened" to
 manifest dramatically only in unexec builds on very recent glibcs? 
That's quite probable.  You see, the problem is Lisp's tagged
pointers.  Pointers to Lisp objects are "coloured" in their most
significant bits (I think 3 bits) with type information.  Therefore,
when the malloc implementation hands out chunks with one of those high
bits already set, you get a clash.  This can indeed happen with
glibc's malloc on Linux, because it hands out mmapped chunks (they
start near 0x40000000 on ix86-linux), but generally only for "large"
allocations.  (You had the M_MMAP_THRESHOLD set to 64k, a reasonable
choice IMHO; one Lisp vector allocation from the "temacs -dump" run
just exceeded that threshold.)  By setting M_MMAP_MAX to 0 I've
disabled all use of mmap; glibc's malloc behaves more like a classic
malloc then.
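
The clash described above can be illustrated with a small sketch.  It
assumes a 32-bit word whose top 3 bits carry the type tag, following
the "I think 3 bits" estimate; the names make_tagged, untag and
address_fits are hypothetical and are not taken from the XEmacs or
GNU Emacs sources, whose actual layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, for illustration only: a 32-bit word whose top
   3 bits hold the type tag and whose low 29 bits hold the address. */
#define TAG_BITS   3u
#define VALUE_BITS (32u - TAG_BITS)
#define VALUE_MASK ((uint32_t) ((UINT32_C(1) << VALUE_BITS) - 1u))

/* Combine a type tag with an address.  Any address bits that overlap
   the tag field are silently lost. */
uint32_t make_tagged(uint32_t addr, uint32_t tag)
{
    assert(tag < (1u << TAG_BITS));
    return (uint32_t) (tag << VALUE_BITS) | (addr & VALUE_MASK);
}

/* Strip the tag again to recover the address part of the word. */
uint32_t untag(uint32_t word)
{
    return word & VALUE_MASK;
}

/* True if the address survives a tag/untag round trip unchanged,
   i.e. none of its top TAG_BITS bits are set. */
int address_fits(uint32_t addr)
{
    return (addr & (uint32_t) ~VALUE_MASK) == 0;
}
```

Under these assumptions a low heap address such as 0x08100000
round-trips intact, while an mmapped chunk at 0x40000000 already has
a bit set inside the tag field and comes back as a different address.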

For GNU Emacs, I've added temporary switches of M_MMAP_MAX to 0 and
back _only_ in the Lisp object allocation paths (there were about half
a dozen places); I'll try to do the same for XEmacs when I find the
time.
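
A rough sketch of such a temporary switch, for illustration only.
The wrapper name alloc_lisp_object and the value ASSUMED_MMAP_MAX are
hypothetical and not taken from either code base; mallopt() offers no
way to read the current limit back, so the value to restore has to be
one the program chose itself.

```c
#include <assert.h>
#include <stdlib.h>
#if defined(__GLIBC__)
#include <malloc.h>   /* mallopt(), M_MMAP_MAX */
#endif

/* Assumed limit to restore after the allocation.  A real program
   would restore whatever value it configured at startup. */
#define ASSUMED_MMAP_MAX 64

/* Hypothetical wrapper: allocate n bytes with mmap disabled for the
   duration of this one call, so the chunk comes from the classic
   heap rather than from an mmapped region. */
void *alloc_lisp_object(size_t n)
{
    void *p;
#ifdef M_MMAP_MAX
    int ok = mallopt(M_MMAP_MAX, 0);   /* mallopt returns 1 on success */
    assert(ok == 1);
    (void) ok;
#endif
    p = malloc(n);
#ifdef M_MMAP_MAX
    mallopt(M_MMAP_MAX, ASSUMED_MMAP_MAX);   /* allow mmap again */
#endif
    return p;
}
```

Allocations made outside the wrapper keep the normal mmap behaviour,
which is the point of switching only in the Lisp object allocation
paths instead of disabling mmap globally.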

...
 Second, if it's a generic problem, is it possible that it would
 generate GCPRO-bug-like symptoms (ie, weird crashes in "obviously
 correct" code because data that we know is correctly initialized
 mysteriously changes)?  For example, we fixed a couple of GCPRO bugs
 recently, but we're still seeing mysterious "illegal bytecode" crashes
 (especially in Gnus), although fewer of them :-).  We're pretty sure
 the bytecompiler isn't responsible for this, because we've checked the
 code in memory. 
Not sure about this, but I suspect that "mysteriously changing" memory
cannot occur due to this.  The chance that a coloured pointer is
masked (by stripping off the top bits) into a valid memory region
seems quite small to me.  But it's certainly not impossible.

...
 Third, is there still a possible problem if we use
 --with-system-malloc?  Ie, we use the Doug Lea malloc from glibc,
 but do no mallopt tweaking. 
Yes, there is the same problem, AFAICS.  IMHO the autoconf test should
check whether mallopt(M_MMAP_MAX) is available, and deduce that it's a
variant of Doug Lea malloc.

Regards,
Wolfram.
