Question about internal Lisp objects and KKCC/NEW_GC
crestani at informatik.uni-tuebingen.de
Thu Mar 25 04:42:26 EDT 2010
>>>>> "BW" == Ben Wing <ben at benwing.com> writes:
BW> What this doesn't answer is when do I *HAVE* to convert non-Lisp
BW> structures to Lisp objects? I ask because whether or not it's
BW> theoretically better to do this, in practice it's a lot of extra work
BW> that I'd rather not do. I know that the new GC can handle non-Lisp
BW> structures with pointers to Lisp objects in them.
You need to convert non-Lisp structures that contain pointers to Lisp
objects to Lisp objects. The GC cannot handle non-Lisp objects with
pointers to Lisp objects in them. Non-Lisp structures are not allocated
on the Lisp heap, and thus the write barrier does not cover them: changes
to their pointers to Lisp objects made while a GC is in progress go
unnoticed by the collector and corrupt the heap. Eventually XEmacs will
crash. What structures are you referring to?
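To illustrate the failure mode described above, here is a minimal sketch
of an incremental mark phase with a write barrier. It is a toy model
under stated assumptions, not XEmacs code: every name (toy_obj, toy_aux,
toy_heap_store, ...) is hypothetical, and the barrier is a plain software
check on a store, whereas the real barrier works on virtual memory pages.
The point is only that a store into a structure outside the barrier's
reach, made after the owning object has been scanned, leaves the new
referent unmarked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tri-color model; all names are hypothetical, not the XEmacs API. */
enum toy_color { TOY_WHITE, TOY_GRAY, TOY_BLACK };

struct toy_obj;

/* Stands in for a non-Lisp structure: it lives outside the collected
   heap, so stores into it are invisible to the write barrier. */
struct toy_aux {
  struct toy_obj *ref;
};

/* Stands in for a Lisp object on the collected heap. */
struct toy_obj {
  enum toy_color color;
  struct toy_obj *slot;  /* reference stored inside the heap object */
  struct toy_aux *aux;   /* plain C structure owned by this object */
};

/* Barrier-protected store: if the owner was already scanned (black),
   turn it gray again so the collector re-scans it. */
void toy_heap_store(struct toy_obj *owner, struct toy_obj *value)
{
  owner->slot = value;
  if (owner->color == TOY_BLACK)
    owner->color = TOY_GRAY;
}

/* Scan one object: gray its white referents, then blacken it. */
void toy_scan(struct toy_obj *o)
{
  if (o->slot && o->slot->color == TOY_WHITE)
    o->slot->color = TOY_GRAY;
  if (o->aux && o->aux->ref && o->aux->ref->color == TOY_WHITE)
    o->aux->ref->color = TOY_GRAY;
  o->color = TOY_BLACK;
}

/* Finish the mark phase: scan gray objects until none remain. */
void toy_finish_mark(struct toy_obj **heap, size_t n)
{
  int progress = 1;
  while (progress) {
    progress = 0;
    for (size_t i = 0; i < n; i++)
      if (heap[i]->color == TOY_GRAY) {
        toy_scan(heap[i]);
        progress = 1;
      }
  }
}

/* Returns 1 if the freshly referenced object ends up marked, else 0. */
int toy_survives(int use_barrier)
{
  struct toy_aux aux = { NULL };
  struct toy_obj owner = { TOY_GRAY, NULL, &aux };  /* root, queued */
  struct toy_obj fresh = { TOY_WHITE, NULL, NULL };
  struct toy_obj *heap[] = { &owner, &fresh };

  toy_scan(&owner);  /* first GC increment: owner is now black */

  /* The program runs between increments and creates a new reference. */
  if (use_barrier)
    toy_heap_store(&owner, &fresh);  /* owner is re-grayed */
  else
    aux.ref = &fresh;                /* no barrier: nobody notices */

  toy_finish_mark(heap, 2);
  return fresh.color != TOY_WHITE;
}
```

In this sketch toy_survives(1) reports the object as marked, while
toy_survives(0) leaves it white, so a sweep would free an object that
is still referenced from the plain C structure.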
BW> The only limitation I know of that appears to force conversion to
BW> Lisp objects is the apparent inability of dumped Lisp objects to
BW> have finalizers. I say "apparent" here because there are in fact
BW> some dumpable Lisp objects with finalizers during NEW_GC -- at
BW> least, markers and the new number types (ratios, bigfloats, etc.).
BW> In fact, these types have finalizers ONLY during NEW_GC, which is
BW> strange -- maybe someone messed up and meant to do things the other
BW> way around? Maybe it only happens to work because there are none of
BW> these objects actually around at dump time?
Jerry fixed the new number types a couple of weeks ago: They need
finalizers and do not need to be dumped, thus their dumpable flag should
probably be 0.
Markers also have a finalizer #ifndef NEW_GC, see ADDITIONAL_FREE_marker
in alloc.c. Since markers are obviously not around at dump time (you
are right, otherwise it would break), their dumpable flag should be 0, too.
BW> Also, it's not obvious to me that it's necessarily a good idea to
BW> eliminate all manual allocation/freeing -- for one thing it adds a
BW> significant extra load onto the garbage collector, forcing more
BW> garbage collection. With KKCC/NEW_GC, there is too much garbage
BW> collection already, which makes it significantly slower than the old
BW> garbage collector -- are there plans to fix this?
The new GC is incremental and faster. Its parameters need to be
tweaked to find the right proportion of run time vs. memory overhead
(figuring this out would be a nice term project). First the remaining
problem (a hard-to-reproduce segfault that seems to be related to
lstream finalization) needs to be fixed. Mike and I plan to get to the
code review needed to figure it out one of these days.
BW> An example of where conversion to Lisp objects is a real hassle is
BW> something I ran into in my Unicode-internal repository. I created a
BW> new dumpable type that encapsulated a dynarr of Lisp_Object pointers.
BW> Since I needed to free the dynarr during finalization and I have no
BW> finalizers allowed any more, I had to convert the dynarr to a Lisp
BW> dynarr. Unfortunately, as it currently stands, the Lisp dynarr can
BW> only handle inline Lisp objects, not Lisp_Object pointers. Either I
BW> would have had to create another Lisp object whose only purpose is to
BW> encapsulate a Lisp_Object pointer (yucko) or modify the Lisp dynarr
BW> code so it can also handle Lisp_Object pointers -- not something
BW> that's very obvious to do. What I did instead in practice was to
BW> arrange things so that none of these objects needed to be dumped --
BW> which happened to be possible but was not an ideal solution.
What about using vectors as a basis and extending them to dyn_vectors?
BW> Better documentation would also be good -- much of KKCC and NEW_GC is
BW> sparsely documented, and there is no overview anywhere. At least,
BW> write a long comment providing an overview of how NEW_GC works. E.g.
BW> incremental vs. full GC? gray vs. black bits? How are the various
BW> data structures organized and how do they connect to each other?
BW> There is no mark bit located inside the lrecord header with NEW_GC --
BW> where is it located and in what sort of structure?
BW> What is the difference between KKCC and NEW_GC, and what does it mean
BW> to compile with one but not the other? How does the write barrier
BW> work and what objects is the barrier set on?
Yes, you are right, I'll add documentation to the sources. Until then,
you can find detailed information and answers to all your questions in
my thesis http://crestani.de/xemacs/pdf/thesis-newgc.pdf .
KKCC is a mark algorithm that uses the information of the object's
memory_descriptions (instead of separate mark_object functions) to find
all live Lisp objects. KKCC works with the old and the new GC. NEW_GC
is the new incremental garbage collector and new memory allocator. It
also always uses KKCC.
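As a rough illustration of marking driven by per-type descriptions
rather than per-type mark functions, here is a minimal sketch. It is a
toy under stated assumptions: the types and functions (toy_header,
toy_description, toy_mark, ...) are hypothetical and far simpler than
the real memory_descriptions, which this sketch does not reproduce. Each
type supplies only a table of byte offsets at which object pointers
live, and a single generic loop with an explicit stack walks them.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these names are hypothetical, not the XEmacs API. */
struct toy_description {
  size_t n_slots;         /* number of object-pointer fields */
  const size_t *offsets;  /* byte offset of each such field */
};

struct toy_header {
  const struct toy_description *desc;  /* layout shared by the type */
  int marked;
};

/* A pair-like type with two object pointers. */
struct toy_cons {
  struct toy_header header;
  struct toy_header *car;
  struct toy_header *cdr;
};

static const size_t toy_cons_offsets[] = {
  offsetof(struct toy_cons, car),
  offsetof(struct toy_cons, cdr)
};
static const struct toy_description toy_cons_desc = { 2, toy_cons_offsets };

/* A leaf type without object pointers. */
struct toy_int {
  struct toy_header header;
  int value;
};
static const struct toy_description toy_int_desc = { 0, NULL };

#define TOY_STACK_SIZE 64

/* Generic mark: no per-type mark function, only the description.
   Returns the number of objects newly marked. */
size_t toy_mark(struct toy_header *root)
{
  struct toy_header *stack[TOY_STACK_SIZE];
  size_t top = 0, n_marked = 0;

  if (root)
    stack[top++] = root;
  while (top > 0) {
    struct toy_header *obj = stack[--top];
    if (obj->marked)
      continue;
    obj->marked = 1;
    n_marked++;
    for (size_t i = 0; i < obj->desc->n_slots; i++) {
      /* Read the pointer stored at the described offset. */
      struct toy_header *child =
        *(struct toy_header **) ((char *) obj + obj->desc->offsets[i]);
      if (child && !child->marked) {
        assert(top < TOY_STACK_SIZE);  /* toy limit, no overflow handling */
        stack[top++] = child;
      }
    }
  }
  return n_marked;
}

/* Builds a tiny graph with a cycle and an unreachable object.
   Returns 1 if exactly the reachable objects were marked. */
int toy_demo(void)
{
  struct toy_int one = { { &toy_int_desc, 0 }, 1 };
  struct toy_int orphan = { { &toy_int_desc, 0 }, 2 };
  struct toy_cons cell = { { &toy_cons_desc, 0 }, NULL, NULL };

  cell.car = &one.header;
  cell.cdr = &cell.header;  /* cycle back to itself */

  size_t n = toy_mark(&cell.header);
  return n == 2 && one.header.marked && !orphan.header.marked;
}
```

The design choice this shows is that adding a new type means adding a
table of offsets, not another hand-written traversal routine, and that
the explicit stack avoids recursion in the marker.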
BW> Also, how do you debug seg faults when NEW_GC is enabled? When I try
BW> running under a debugger, the debugger constantly stops with SIGSEGV
BW> -- apparently triggered by the write barrier, or something. What's
BW> the best way of debugging a genuine seg fault? ("Use the core file"
BW> isn't an option since I'm running under Cygwin, which unfortunately
BW> doesn't have workable core files.)
See section 3.2.6 in the above PDF. Maybe you are out of luck under
Cygwin, though: at least at the time of my writing, gdb under Cygwin
wasn't able to debug the write barrier:
"Using a debugger on a Cygwin-built XEmacs with the virtual-dirty-bit
write barrier does not work: Cygwin-based debuggers, such as Cygwin’s
gdb, conflict with the native Windows exception handling; and
Windows-based debuggers, such as the Microsoft Visual Studio
Debugger, do not produce usable results with Cygwin-compiled
BW> Another question: Apparently NEW_GC is unfinished. What is left to be
BW> done and when will it get done?
The lstream finalization crash is the only severe bug I know of. As
said, it is on our list (but no due date set).