On Mon, Oct 11, 1999 at 09:29:05PM +0200, Jan Vroonhof wrote:
> Do you have an infinite amount of memory too?
384Mb is not infinite, but it is more than enough not to notice.
> +/* An estimation (not the exact number) of the number of currently
> +   existing l(c)records */
> +static int object_count;
> +
> [..]
> +  quickhash_init (object_count*4);
> Unless I am going mad, the memory overhead is now 4+1 words per object,
> i.e. 125% (only during the sweep, but does that matter?).
Yes, it does matter. The table is usually allocated at the top of the
heap, and since the gc only frees memory, any decent malloc gives the
virtual space back to the system when the final free is done.
The 4 is a bit out of the blue. Some quick calculations with
Mathematica show that the mean number of lookups needed for a search
is 1.33 for a ratio of 4, 1.5 for 3, 2 for 2 and 3 for 1.5 (for the
curious, 1/(1-1/x), where x is the ratio of table size to object
count). Looking at the curve, 2 is probably a better space/time
trade-off.
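
For anyone who wants to reproduce those numbers without Mathematica,
here is a minimal sketch (assuming quickhash behaves like an ordinary
open-addressing table, for which the uniform-hashing approximation says
a lookup at load factor alpha takes 1/(1-alpha) probes on average):

#include <stdio.h>

/* Expected probes per lookup as a function of x, the ratio of table
   size to object count.  The load factor is then alpha = 1/x, and the
   classic open-addressing estimate is 1/(1-alpha) = 1/(1-1/x). */
int
main (void)
{
  static const double ratios[] = { 4.0, 3.0, 2.0, 1.5 };
  int i;
  for (i = 0; i < 4; i++)
    printf ("ratio %.1f -> %.2f probes\n",
            ratios[i], 1.0 / (1.0 - 1.0 / ratios[i]));
  return 0;
}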
> I don't think you can really say much about the speed of the GC
> unless you use one which is actually space efficient (and thus has to
> do more shifts & masks).
Shifts and masks are probably faster than the % the hashing one does.
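
To make the comparison concrete, a hypothetical sketch of the two
indexing styles (the function names are mine, not the real quickhash
code):

/* Arbitrary table size: one integer division per lookup. */
static unsigned long
index_mod (unsigned long hash, unsigned long table_size)
{
  return hash % table_size;
}

/* Power-of-two table size: a single AND, no division.  The price is
   that the table may be up to twice as big as strictly needed. */
static unsigned long
index_mask (unsigned long hash, unsigned long table_size)
{
  return hash & (table_size - 1);
}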
> If you don't want to do the pool thing just yet:
I'd like to finish the portable dumper first. All setq-default
settings are lost (hence the lack of a modeline). I know why, I just
have to fix it. After that, the more I think about your idea, the more
I like it. Funnily enough, it would be the end of the lcrecord lists
and would change most of the lcrecords into lrecords (8 more bytes
saved per object, see the sketch below). It would also allow a
slightly more clever way of handling the free lists, one with a better
packing ratio.
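
To show where the 8 bytes come from, a sketch of the two headers (field
layout written from memory, so take it as an approximation of the real
XEmacs declarations, not a verbatim copy):

struct lrecord_header
{
  unsigned int type :8;          /* index into the implementation table */
  unsigned int mark :1;          /* GC mark bit */
  unsigned int c_readonly :1;
  unsigned int lisp_readonly :1;
};

struct lcrecord_header
{
  struct lrecord_header lheader;
  struct lcrecord_header *next;  /* chains all lcrecords for the sweep */
  unsigned int uid :31;          /* stable id for printing/dumping */
  unsigned int free :1;          /* currently on a free list? */
};

/* On a 32-bit machine the lcrecord variant costs 8 more bytes: the
   chain pointer and the uid/free word.  A pool allocator makes the
   chain redundant, since the sweep can walk the pools directly. */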
> What about using a "conservative" garbage collector, in the sense
> that you mark the hash(address)'th bit in a bit array of size
> object_count*4? This reduces the sweep array to 4 bits per object, or
> <6.3%. This doesn't free all the dead objects (i.e. those that hash
> to the same value as a referenced object), but by slightly varying
> the hash function from sweep to sweep (introducing a bit of
> pseudo-randomness) most of them will be freed later. Does somebody
> have numbers handy on the expected ratio of hash clashes for varying
> hash sizes?
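
If I read the proposal correctly, it amounts to something like the
following sketch (mark_bitmap, hash_object and sweep_seed are names I
made up for illustration):

/* Mark bits live in a hashed bit array instead of in the objects.
   sweep_seed changes on every collection, so two objects that collide
   in this sweep will probably not collide in the next one. */
static unsigned char *mark_bitmap;   /* object_count*4 bits */
static unsigned long bitmap_bits;
static unsigned long sweep_seed;

static unsigned long
hash_object (void *obj)
{
  return (((unsigned long) obj >> 3) ^ sweep_seed) % bitmap_bits;
}

static int
markedp (void *obj)
{
  unsigned long h = hash_object (obj);
  return (mark_bitmap[h / 8] >> (h % 8)) & 1;
}

static void
mark_object (void *obj)
{
  unsigned long h = hash_object (obj);
  mark_bitmap[h / 8] |= (unsigned char) (1 << (h % 8));
}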
You'll end up freeing objects that shouldn't be freed, because a
marking collision prevents you from traversing some of the live
objects to mark what they point to.
An example may be clearer. A, a root, points to B and C. C points to
D. B and C happen to hash to the same value. The marking goes this
way:
- mark A
- markedp B ? No -> mark B. B points to nothing
- markedp C ? Yes (collision with B), ignore it (if you don't, you'll
loop on circular references).
Then the marking ends, and the sweep removes all non-marked objects.
Oops, D isn't marked.
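
In terms of the sketch above, the bug sits in the cycle check of the
recursive marker (mark_children is a hypothetical traversal hook):

extern void mark_children (void *obj, void (*fn) (void *obj));

static void
mark_recursively (void *obj)
{
  if (markedp (obj))   /* also answers yes on a mere hash collision... */
    return;            /* ...so C is skipped and D is never reached */
  mark_object (obj);
  mark_children (obj, mark_recursively);
}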
> P.S. I just spent waaaaaaayyyyy too much time debugging the 21.1.7
> unexec problems on Linux, so AFAIC the portable dumper stays,
> whatever the overhead.
This gc hacking is not necessary for the portable dumper; it just
makes things (much) easier (no need to rebuild the list of objects for
the unmarking phase), and makes it possible to get rid of the purecopy
thingy and still have sharing.
OG.