unicode-internal now gets most of the way through the build process;
currently, it gets most of the way through autoload re-creation before
hitting a syntax error.

but in the process i've noticed that kkcc is *unbelievably* slow with
the new char tables i've designed. or at least i assume that's where
the slowness is; this is the only major change i've made to any
objects. without kkcc, gc is fast, but with it, gc time rapidly balloons, to
the point where a single gc takes 2 *minutes* or more once all modules
are loaded.

the format of the char tables is something like a trie. a typical char
table for unicode has three levels. the top level is an array of 256
elements indexing bits 23-16 of a unicode char; each element points to
another array of 256 elements, indexing bits 15-8; each element of the
second level also points to an array of 256 elements, for bits 7-0,
whose values are Lisp objects. there will be a fourth level on top of
the other three if a unicode char of 0x1000000 or greater is seen. in
order to avoid a memory explosion, each place at which there are no
defined elements points to a shared "blank" table. hence, originally
the char table contains a single 256-element array, each of whose
elements points to the same empty table. when a value for a particular
character is added, the tables are expanded along that path.
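
to make the layout above concrete, here is a minimal C sketch of the
three-level case. all of the names (top_t, mid_t, leaf_t, blank_mid,
blank_leaf, get_char, put_char) are made up for illustration and are not
the actual identifiers; values are plain pointers standing in for Lisp
objects; having one shared blank table per level is an assumption of the
sketch; and the fourth level for chars of 0x1000000 or greater is left out.

```c
#include <assert.h>
#include <stdlib.h>

#define FANOUT 256

typedef void *value_t;                              /* stand-in for a Lisp object */

typedef struct { value_t vals[FANOUT]; } leaf_t;    /* indexed by bits 7-0   */
typedef struct { leaf_t *sub[FANOUT]; } mid_t;      /* indexed by bits 15-8  */
typedef struct { mid_t *sub[FANOUT]; } top_t;       /* indexed by bits 23-16 */

static leaf_t blank_leaf;   /* shared blank leaf: every value is null */
static mid_t blank_mid;     /* shared blank middle: every slot is &blank_leaf */

/* must be called once before any table is created */
static void init_blanks(void)
{
    for (int i = 0; i < FANOUT; i++)
        blank_mid.sub[i] = &blank_leaf;
}

/* a new table is one 256-element array whose slots all share blank_mid */
static top_t *make_table(void)
{
    top_t *t = malloc(sizeof *t);
    if (!t) abort();
    for (int i = 0; i < FANOUT; i++)
        t->sub[i] = &blank_mid;
    return t;
}

static value_t get_char(const top_t *t, unsigned ch)
{
    assert(ch < 0x1000000);   /* no fourth level in this sketch */
    return t->sub[(ch >> 16) & 0xFF]->sub[(ch >> 8) & 0xFF]->vals[ch & 0xFF];
}

/* expand the tables along the path to ch only, then store the value */
static void put_char(top_t *t, unsigned ch, value_t v)
{
    assert(ch < 0x1000000);
    unsigned i = (ch >> 16) & 0xFF, j = (ch >> 8) & 0xFF, k = ch & 0xFF;

    if (t->sub[i] == &blank_mid) {
        mid_t *m = malloc(sizeof *m);
        if (!m) abort();
        *m = blank_mid;       /* private copy; slots still share blank_leaf */
        t->sub[i] = m;
    }
    if (t->sub[i]->sub[j] == &blank_leaf) {
        leaf_t *l = malloc(sizeof *l);
        if (!l) abort();
        *l = blank_leaf;      /* private copy; all values null */
        t->sub[i]->sub[j] = l;
    }
    t->sub[i]->sub[j]->vals[k] = v;
}
```

with this layout a lookup is three dependent loads, and a table with
nothing in it costs one 256-pointer array plus the shared blanks.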

because the mark method is a method, i can put logic into it to check to
see whether a blank table is being traversed and stop traversing at that
point. the problem seems to be that kkcc can't be told this, and isn't
smart enough (at least i don't think so) to recognize whether it's
already traversed something unless it's a Lisp object -- and the
subtables aren't Lisp objects. i could make them Lisp objects, but this
means either [1] i integrate the header and the table, leading to an
object whose size is slightly over a power of 2, which makes its allocation
difficult to manage efficiently (i assume, at least -- marcus, how
does mc-alloc handle this case?); or [2] i separate them into two
objects, which doubles the number of memory accesses to look up a character.
which is the lesser of two evils?
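
for reference, the blank-table check in a hand-written mark method could
look roughly like the sketch below. it is only an illustration: the type
and function names are hypothetical, mark_value is a counting stand-in for
marking a Lisp object, and the skip_blanks == 0 case is just my guess at
what a traversal that cannot recognize the shared tables ends up doing,
not a statement about how kkcc actually works.

```c
#include <assert.h>
#include <stddef.h>

#define FANOUT 256

typedef struct { void *vals[FANOUT]; } leaf_t;   /* bits 7-0   */
typedef struct { leaf_t *sub[FANOUT]; } mid_t;   /* bits 15-8  */
typedef struct { mid_t *sub[FANOUT]; } top_t;    /* bits 23-16 */

static leaf_t blank_leaf;   /* shared blank tables, compared by address */
static mid_t blank_mid;

static unsigned long slots_visited;   /* leaf slots examined by the marker */

/* stand-in for marking one Lisp object; here it only counts the visit */
static void mark_value(void *v)
{
    (void) v;
    slots_visited++;
}

/* walk the table; with skip_blanks set, stop at any shared blank table */
static void mark_table(const top_t *t, int skip_blanks)
{
    assert(t != NULL);
    for (int i = 0; i < FANOUT; i++) {
        const mid_t *m = t->sub[i];
        if (skip_blanks && m == &blank_mid)
            continue;                 /* nothing defined under this slot */
        for (int j = 0; j < FANOUT; j++) {
            const leaf_t *l = m->sub[j];
            if (skip_blanks && l == &blank_leaf)
                continue;
            for (int k = 0; k < FANOUT; k++)
                mark_value(l->vals[k]);
        }
    }
}
```

on a table with nothing in it, the skipping walk examines 0 leaf slots
while the non-skipping walk examines 256*256*256 = 16,777,216 of them;
with a single character's path expanded, the skipping walk examines 256.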
ben