>>>> "Simon" == Simon Josefsson <jas(a)extundo.com> writes:
Simon> Looking at the Makefile in the CVS directory for the
Simon> package [is the best/only way to find maintainers]?
Yes, I think that's true. We should do something about it.
One thing that can be done is to publish the
PACKAGE-maintainer(a)xemacs.org list. Also, Ville or Rendhalver might
want to work up a list from the Makefiles.
Maybe what should be done is to have a Maintainers.values make include
file as a sibling of XEmacs.rules, which would contain a list of
assignments like

    EDICT_MAINTAINER = Stephen J. Turnbull <stephen(a)xemacs.org>
    mule-ucs_MAINTAINER = Stephen J. Turnbull <stephen(a)xemacs.org>

and XEmacs.rules could do

    include Maintainers.values
    MAINTAINER = $($(PACKAGE)_MAINTAINER)

(Dealing with whether "mule-ucs_MAINTAINER" is a valid make variable
name is an exercise for the implementer; you could always run the
package name through tr to upcase it and transform ?- to ?_.)
> Writing a formal RFC2822 parser might actually give us less
> than what we need...
Simon> I don't think so -- having a real parser would allow you to
Simon> say that you want to look at the first comment after an
Simon> addr-spec. A function could try all the standard ways to
Simon> put the full name in a header line and return it to the
Simon> caller.
The problem is that there aren't (as of RFC 1123; I'm not really up on
the RFC2822 clarifications) standardized ways, and identifying them is
actually more a lexical problem than an RFC2?822 syntactic issue. A
correct parser would at least allow us to separate the net address
from the rest of the goop, though, which none of the current
applications I use (Gnus, VM, BBDB, Supercite) do with 100% accuracy.
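
To make "separate the net address from the rest of the goop" concrete,
here is a minimal sketch in Python (not Emacs Lisp) using the standard
library's email.utils.parseaddr. The helper name split_address and the
example.org addresses are made up for illustration, and this is not
how Gnus, VM, BBDB, or Supercite do it. It only covers the easy cases:
the name is taken from a display-name phrase or from a trailing
comment, which is convention rather than anything the RFCs standardize.

```python
from email.utils import parseaddr


def split_address(header_value):
    """Return (full_name, net_address) for a single-address header value.

    Hypothetical helper: a thin wrapper around email.utils.parseaddr.
    The full name is whatever the parser finds in the phrase or comment;
    it is not canonicalized or MIME-decoded here.
    """
    return parseaddr(header_value)


if __name__ == "__main__":
    samples = [
        "Simon Josefsson <simon@example.org>",   # phrase + angle-addr
        "simon@example.org (Simon Josefsson)",   # addr-spec + comment
        "simon@example.org",                     # bare addr-spec, no name
    ]
    for value in samples:
        name, addr = split_address(value)
        print(repr(addr), repr(name))
```

The first two samples yield the same name and address even though the
name sits in a different syntactic slot; the third yields an empty
name, which is the case where a caller has nothing to go on.
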
> Another problem (separate from RFC2822 issues) is that things
> like BBDB store the rendered version, so if we start pulling
> our unrendered chunks of headers, we'd have to render them
> before comparing...
Simon> This is a problem in BBDB currently too. Most if not all
Simon> non-ASCII names have multiple entries in my .bbdb because
Simon> they are encoded differently. OTOH using raw mail headers
Simon> would be just as bad. A proper
Simon> canonicalize-then-compare-for-equality function is needed
Simon> to fix this.
"Proper" probably has to wait for full Unicode support, as people can
always confuse you with [U+00E4] <-> [U+0061 U+0308] in UTF-8, or even
[a "] in pidgin composition. But as long as the BBDB database is
stored in a UCS (which it currently is not), a quite high degree of
canonicalization is automatic in Mule if rendered versions are always
used, as things like Unicode composing characters are rare.
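
A rough sketch of what canonicalize-then-compare looks like once full
Unicode support is available, written in Python for brevity. The
choice of normalization form NFC and the function names are
assumptions of this example; nothing in Mule or BBDB works this way
today. It handles the [U+00E4] <-> [U+0061 U+0308] case but does
nothing for pidgin forms like [a "], which need their own rules.

```python
import unicodedata


def canonicalize(name):
    """Map canonically equivalent spellings of a name to one form (NFC)."""
    return unicodedata.normalize("NFC", name)


def same_name(a, b):
    """Compare two rendered names after canonicalization."""
    return canonicalize(a) == canonicalize(b)


if __name__ == "__main__":
    precomposed = "\u00e4"    # a-umlaut as a single code point
    decomposed = "a\u0308"    # 'a' followed by COMBINING DIAERESIS
    print(precomposed == decomposed)           # raw comparison
    print(same_name(precomposed, decomposed))  # canonicalized comparison
```
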
> Maybe it is a case of incrementally fixing the glitches... or
> coming up with something really clever. I think the really
> clever idea is to always deal with (and store) unrendered ASCII
> chunks and be careful when comparing them... but that sounds
> like a lot of work
Which is already solved (differently) by each of the major MUAs, plus
TM and SEMI. Unicode and Mule really are the answer here.
Simon> I dunno, there doesn't seem to be a simple solution.
Simon> Dealing with and storing unrendered ASCII chunks has the
Simon> same problems, the same name can be expressed in many ways.
Exactly. Even at the level of transport encoding, Chinese spammers
use 8-bit, QP, and Base64 for their headers. I don't know if things
have improved recently, but I have often seen 8-bit in headers from
human beings, too. (Naive ports of ASCII-oriented MUAs just extract a
full name from a file and copy it verbatim to the header. You're
supposed to use a utility to translate to MIME, but "power users"
often don't bother---and this works fine for Japanese MUAs, which are
all quite lenient about such things.) Nor are Base64 or QP actually
uniquely defined! Naive implementations of QP may encode all
characters, others only the minimal set, and others will go to some
trouble to also encode a set of ASCII characters that could cause
trouble in headers, e.g., the period, which naive software often
assumes means "part of a host spec".
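
To illustrate why comparing unrendered chunks is hopeless, here is a
small Python sketch using the standard email.header module. The name
"Jürgen" and the helper render are invented for the example. It builds
three different MIME encoded-words for the same name (Base64, QP with
only the non-ASCII bytes encoded, and QP with every byte encoded) and
shows that they only compare equal after decoding.

```python
import base64
from email.header import decode_header, make_header

NAME = "J\u00fcrgen"          # hypothetical full name, with u-umlaut
RAW = NAME.encode("utf-8")

# Three legal encoded-words for the same text.
b64 = "=?utf-8?b?" + base64.b64encode(RAW).decode("ascii") + "?="
qp_min = "=?utf-8?q?J=C3=BCrgen?="                             # minimal set
qp_all = "=?utf-8?q?" + "".join("=%02X" % b for b in RAW) + "?="  # everything


def render(raw_header):
    """Decode MIME encoded-words into a plain Unicode string."""
    return str(make_header(decode_header(raw_header)))


if __name__ == "__main__":
    for raw in (b64, qp_min, qp_all):
        print(raw, "->", render(raw))
```

The raw strings are all different, so a byte-for-byte comparison of
headers would treat them as three people. Unencoded 8-bit headers are
not covered by this sketch at all, since there the charset has to be
guessed.
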
We really need to get this stuff unified and ripped _out_ of the MUAs
and into a separate library so that fixes and improvements to VM,
Gnus, Mew, Wanderlust, TM, SEMI, mh-e, etc etc all go into the same
library. (Yeah, I know, that was the idea behind TM and SEMI and look
how far _that_ got. :-( )
--
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.