Uday S Reddy writes:
Stephen J. Turnbull writes:
Ok, it looks like we have agreement on that issue :-)
> Eh? I may be a few weeks behind, but in the version I synched to most
> recently the ordering of the message list is still done in the basic
> sort function. That just ain't right.
Quoting from the current VM Manual, Sec. 10.1:
"Unlike in previous versions of VM, threading is not a form of
sorting.
I agree that VM is better than Netscape 4.0 in the UI, since it has a
separate toggle for threading, rather than listing it as a kind of
sorting.
But the implementation contradicts what the docs say:
How the threads affect the sort order is of course handled in the
sorting function.
There's no "of course" about it. It's much simpler if you sort the
sibling sets in the tree rather than trying to use thread information
in the sort. It's not clear to me that it's all that much slower in
practice (vm-sort is O(N log N), jwz-thread is O(N log M) where N is
messages in folder and M is number of siblings per parent).
In fact, it turns out that VM's sorting function (as of the version I
have) special cases the thread key to insure it trumps the other other
keys. So not only is the sorting function formally the way (user-
visible) threads are actually computed, but this approach greatly
complicates the code that actually does the work. IIRC this
complexity is introduced in several places in the main sort function.
By contrast, my code doesn't need a vm-sort function at all (although
of course it needs all the comparator functions); it just needs
Emacs's `sort', and a function that tries each of a list of
comparisons in order until it gets a not equal result.
pretty much the same ideas. The details differ.
As far as I can see, there are a couple of ideas missing.
In particular, the empty container at root concept. ISTR something
else, but it was a while back.
Jamie is only concerned with building the thread tree. He
doesn't
care to maintain it.
Well, if it's fast enough (and my experiments suggest it will be), you
don't need to maintain it, just rebuild it as needed.
I don't know if you will be able to do all these things if you
use
Jamie's build-and-discard approach to the threads database.
There are a number of reasons why I don't think discarding the
containers is a good idea in VM. First, the thread tree is not
folder-local, it's universal (ie, across users and nodes on the
Internet!) Once I get the basic algorithm working, I plan to look at
computing "views" per-folder from a global database for the VM
session. (Maybe this isn't a great idea globally for a VM session,
but virtual folders mean a per-real-folder database doesn't fly
either. If you use V A and the like a lot, you're going to be
building a lot of databases unless you reuse existing thread
information.)
Also, note that Jamie's algorithm works fine for *efficiently* adding
information from new messages to the database (ie, the tree).
Finally, you're quite right, that (limited!) use of operations on
trees like reparenting (sub)trees, and adding or removing nodes can be
very useful to help the user present the threads. One that you didn't
mention here but is in the VM code is one that bugs me a lot, namely
multiple instances of the same message. The database of containers
can group these for you in a flexible way, the summary list can't.
Oh, if I do understand why it proved to be so hard, it will
probably go into a journal or perhaps it will turn into a bigger
project!
Message threading in principle isn't hard, because the RFC 1035/5322
definitions of References and In-Reply-To guarantee a tree if all MUAs
implement those fields conformantly, and the information provided
gives the appropriate links. Handling various error conditions can be
a little tricky conceptually, but in practice they're rare, so don't
need to be implemented efficiently. The only real issue is
guaranteeing no loops in the face of arbitrary input, and Jamie's
algorithm does that for you already.
Even implementing the more advanced operations is going to be nearly
trivial, I think, as long as sorting is subordinate to threading, not
the other way around.
If you want a problem that baffles me, at least, try merging the per
file history trees generated by CVS into a project-wide tree a la git
or bzr *without* tree metadata like branch names, but using only
per-commit metadata like timestamp and commit log.
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta