Hi,
Thank you very much for the review and the patch!  The patch "works"
in the sense that I can now build.
Ok, good.
I'll run with it for a while, put it in the next release candidate,
and see how it affects XEmacs's footprint. Probably it will go in the
next release of 21.4, safety over efficiency. But I would appreciate
it if you could help us recover the mmap capabilities.
Yes, I'll give it a try when I get some time.
Wolfram> N.B. the exact same problem should/could show up with
Wolfram> earlier glibc releases, but maybe the allocation pattern
Wolfram> was slightly different.
I don't understand the problem so I'm not sure what you're saying.
First, your patch also affects "portable dumper" builds which build
and (mostly) run fine. Is this intentional? Ie, is this a generic
problem with our allocator implementation, which "just happened" to
manifest dramatically only in unexec builds on very recent glibcs?
That's quite probable. You see, the problem is Lisp's tagged
pointers. Pointers to Lisp objects are "coloured" in their most
significant bits (I think 3 bits) with type information. Therefore,
when the malloc implementation hands out chunks with one of those high
bits already set, you get a clash. This can indeed happen with
glibc's malloc on Linux, because it hands out mmapped chunks (they
start near 0x40000000 on ix86-linux), but generally only for "large"
allocations. (You had the M_MMAP_THRESHOLD set to 64k, a reasonable
choice IMHO; one Lisp vector allocation from the "temacs -dump" run
just exceeded that threshold.) By setting M_MMAP_MAX to 0 I've
disabled all use of mmap; glibc's malloc behaves more like a classic
malloc then.
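To make the clash concrete, here is a minimal sketch in C. It assumes a
simplified model: a 32-bit word whose top 3 bits hold the type tag. The
real XEmacs layout may differ, and the helper names (make_tagged,
strip_tag, survives_tagging) are hypothetical, chosen only for this
illustration.

```c
#include <stdint.h>

/* Assumption: a 32-bit Lisp word whose top 3 bits carry the type tag. */
#define TAG_BITS   3u
#define VALUE_BITS (32u - TAG_BITS)
#define VALUE_MASK ((UINT32_C(1) << VALUE_BITS) - 1u)   /* 0x1FFFFFFF */

/* Hypothetical helper: pack a raw address and a type tag into one word.
   Any address bits above VALUE_BITS are silently lost here. */
static uint32_t make_tagged(uint32_t addr, uint32_t tag)
{
    return (tag << VALUE_BITS) | (addr & VALUE_MASK);
}

/* Hypothetical helper: recover the address by stripping the tag bits. */
static uint32_t strip_tag(uint32_t word)
{
    return word & VALUE_MASK;
}

/* Returns 1 if the address comes back unchanged after tagging and
   untagging, 0 if its own high bits collided with the tag field. */
static int survives_tagging(uint32_t addr, uint32_t tag)
{
    return strip_tag(make_tagged(addr, tag)) == addr;
}
```

Under these assumptions an illustrative low heap address such as
0x08100000 survives the round trip, while an mmapped address such as
0x40000000 does not: bit 30 lies inside the tag field, so the stripped
value comes back as 0.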
For GNU Emacs, I've added temporary switches of M_MMAP_MAX to 0 and
back _only_ in the Lisp object allocation paths (there were about half
a dozen places); I'll try to do the same for XEmacs when I find the
time.
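The temporary switch could look roughly like the sketch below. This is
not the actual GNU Emacs code; alloc_lisp_storage and RESTORE_MMAP_MAX
are hypothetical names, and the value restored afterwards is an
assumption (65536 is the default documented for current glibc
releases). On systems without M_MMAP_MAX the function falls back to a
plain malloc call.

```c
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h>   /* glibc: mallopt() and M_MMAP_MAX */
#endif

/* Assumption: the mmap chunk limit to restore after the allocation. */
#define RESTORE_MMAP_MAX 65536

/* Hypothetical wrapper for a Lisp object allocation path: mmap is
   switched off only while this one request is served, so other large
   allocations can still be mmapped. */
static void *alloc_lisp_storage(size_t nbytes)
{
    void *p;

#ifdef M_MMAP_MAX
    mallopt(M_MMAP_MAX, 0);                 /* no mmapped chunks for now */
#endif
    p = malloc(nbytes);
#ifdef M_MMAP_MAX
    mallopt(M_MMAP_MAX, RESTORE_MMAP_MAX);  /* allow mmap again */
#endif
    return p;
}
```

Compared with setting M_MMAP_MAX to 0 globally, this keeps mmap
available for non-Lisp data such as large buffers, at the cost of two
extra mallopt calls per Lisp allocation.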
Second, if it's a generic problem, is it possible that it would
generate GCPRO-bug-like symptoms (ie, weird crashes in "obviously
correct" code because data that we know is correctly initialized
mysteriously changes)? For example, we fixed a couple of GCPRO bugs
recently, but we're still seeing mysterious "illegal bytecode" crashes
(especially in Gnus), although fewer of them :-). We're pretty sure
the bytecompiler isn't responsible for this, because we've checked the
code in memory.
Not sure about this, but I suspect that "mysteriously changing" memory
cannot occur due to this. The chance that a coloured pointer is
masked (by stripping off the top bits) into a valid memory region
seems quite small to me. But it's certainly not impossible.
Third, is there still a possible problem if we use
--with-system-malloc? Ie, we use the Doug Lea malloc from glibc,
but do no mallopt tweaking.
Yes, there is the same problem, AFAICS.  IMHO the autoconf test should
check whether mallopt(M_MMAP_MAX) is available and, if so, deduce that
the system malloc is a variant of Doug Lea malloc.
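A probe along those lines might look like the following sketch. The
function name probe_mallopt_mmap_max is hypothetical, and this is a
runtime variant written so that it compiles everywhere; a real
configure check would more likely include <malloc.h> unconditionally
and treat a compile or link failure as "not available".

```c
#include <stdlib.h>
#ifdef __GLIBC__
#include <malloc.h>   /* glibc: mallopt() and M_MMAP_MAX */
#endif

/* Hypothetical probe: returns 1 if M_MMAP_MAX is defined and mallopt()
   accepts it (suggesting a Doug Lea style malloc), 0 otherwise.
   Side effect: on success, mmapped chunks are disabled, which is the
   setting the workaround wants anyway. */
static int probe_mallopt_mmap_max(void)
{
#ifdef M_MMAP_MAX
    return mallopt(M_MMAP_MAX, 0) != 0;
#else
    return 0;
#endif
}
```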
Regards,
Wolfram.