On Wed, Apr 23, 2008 at 6:53 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:

Robert Widhopf-Fenk writes:

> GCC allows for the generation of dependencies for Makefiles
> by -MD.
>
> I wonder if there is something similar for Emacs, but I did
> not find anything.

No. elisp is designed to make this difficult and unreliable.

It is possible to construct a graph of requires, but a require that is
not listed in the package Makefile REQUIRES variable will also be
reported as a fatal error when attempting to build.

However, Lisp code that calls a function that is not defined at load
time or compile time is legal by design, so there is no way (except to
keep a database of all functions ever defined anywhere) to determine
whether a function that is called is a dependency, and if so, it is
not possible to determine where it was defined.

This goes way beyond what Robert wants, but I've thought about making a database of all functions for static analysis purposes. Lisp is a dynamic system, which makes static analysis difficult and unreliable, as you note. However, the set of Lisp files distributed with XEmacs plus the packages at any given point in time comprise a static set that is subject to analysis. I would like to be able to make certain claims about that set, then append the caveat, "and what you, the user, do to this system by taking advantage of Lisp's dynamic nature may render the analysis moot."

Here's the way I would like to construct an analysis engine for Elisp. The bottom layer is a database of all functions defined in C, with type information on the parameters and return values [1] [2]. The next layer up is an Emacs Lisp reader. There is one in XEmacs, but XEmacs is not optimized for this kind of analysis. It would be slow. I have looked at using a Common Lisp engine for the analysis, but Mule-encoded files make that difficult. One possibility is to have XEmacs convert all Mule-encoded files to something else (probably some form of Unicode) off-line, then feed the converted files to the analysis engine.

Next is the definition database. Essentially, you read in every single Lisp file distributed with XEmacs and the packages, and make a list of all definitions and the files in which they occur. This is complicated by conditional definitions (such as defun-when-void) and multiple unconditional definitions of the same name, which can occur in the packages. I haven't thought of a good way of dealing with that problem yet. Assuming I find a magic wand to solve it, then next we generate a dependency graph. Actually, we need two dependency graphs: a build-time dependency graph and a run-time dependency graph. Macros, for example, induce a build-time dependency on the macro definition, and a run-time dependency on functions and variables used in the body of the macro. Some of the dependencies are bogus, because they stem from GNU Emacs-only code. So we have to do some constant folding. At a minimum, we have to know that (featurep 'xemacs) and running-xemacs are both the constant t, and we have to be prepared to perform a string-match on emacs-version. We then have to be able to collapse if, when, unless, etc. forms to eliminate code that XEmacs will never execute. Note that these two dependency graphs both represent definitional dependencies, not (necessarily) functional dependencies.

Then I'd like to start processing code for each function to determine type information. By making nil be a member of a distinct type, we can do nil-tracking to see where we can get nils [3] into code that cannot handle them. We can also see whether we can provoke type-related throws that are not caught by anything.

And I don't know what is after that. Some kind of deeper analysis that I haven't stopped to dream up yet, because I've already got my head way up in the clouds with just this much.

Is anybody interested in working on something like this? If so, I'll bring my attempts at designing function type representations out into public instead of trying to suffer through it in silence.

Also, I'm probably going to continue devoting most of my free time over the next few months to a C analysis engine I'm working on (anybody see a theme here?). I don't anticipate having time to seriously pursue this until this fall ... when I was planning to revive work on voice recognition support in XEmacs. I think both projects are fascinating, so I'm willing to have my priorities swayed by the community. Plus, I haven't sprung for a good microphone yet. :-)

Footnotes:
[1] In fact, I started working on this database some months ago, but haven't gotten very far yet, because I keep finding that my type representation isn't rich enough. That means I have to enrich it and start over. I've done that 3 times already. The sticking point has repeatedly been cases where there are dependencies between the types. For example, the return type of #'identity is the type of its argument. I'm trying to use a type variable approach, much like Haskell and ML. There is also the question of how to maintain the database as the C code is maintained and extended, to which I do not have a satisfactory answer.
[2] I also need to decide whether to attempt "throw" tracking, much like exception declarations in Java. That is, if we track what can be thrown from a given function, then a later analysis step can decide that a throw is "bad" (i.e., indicates a program error) only if it reaches top-level without being caught. This is uncomputable in the general case. What does funcall throw? Whatever the function it invokes throws. On the other hand, we are trying to do the analysis for a closed set of code, so we should be able to figure this out in many cases. Where we can't figure it out (e.g., we are funcalling a user-supplied function), we assume that ANYTHING can be thrown (and that the return type is "anything"). This is right, because such code *should* be prepared to handle arbitrary behavior from a user-supplied function.
[3] Not Oleson. He's a character from "Little House on the Prairie." Besides, he spelled his name "Nels", not "Nils" so why did you even think of him?
--
Jerry James
http://loganjerry.googlepages.com/