"Martin Schwenke" <martin(a)meltin.net> writes:
[OK, I've finished annoying Joe Wells... By the way, is there an easy
way of finding out who the current XEmacs maintainer for something is?]
Looking at the Makefile in the CVS directory for the package?
>>>>> "Simon" == Simon Josefsson <jas(a)extundo.com> writes:
Simon> Do you mean how to use it? Like I said, call it with ASCII
Simon> only strings. mail-extr.el seems to have been designed for
Simon> parsing names and addresses from RFC 822 headers (see the
Simon> first line of the file, for instance), and mail headers are
Simon> ASCII only. I guess the problem comes when it is used in
Simon> non-RFC822 situations, where the full mail address includes
Simon> non-ASCII, but mail-extr.el was not designed to be used for
Simon> that from what I can tell, and it does a poor job of it in
Simon> practice too (demonstrated by the buggy snippet in this
Simon> thread).
OK, I have a really naive question (since I've always taken the
non-ASCII stuff for granted, since I only have an ASCII name :-):
In general, is there a simple way of getting the contents of the
original 7-bit mail header without rendering it?
Surely that's the right way of handling it...
Yes. In Gnus, you find the original article in the buffer specified
by `gnus-original-article-buffer'.
However, mail-extr.el still doesn't work adequately on the unrendered
version:
ELISP> (mail-extract-address-components "Ville =?iso-8859-1?q?Skytt=E4?= <scop(a)xemacs.org>")
("Ville" "scop(a)xemacs.org")
:-(
I'd call this a real bug in mail-extr.el. The heuristic has some code
in it to remove various things regarded as garbage; perhaps it treats
the QP-encoded word as garbage.
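
To make the expected behaviour concrete, here is a minimal sketch of
"split first, then decode the encoded-word instead of dropping it". It
is in Python rather than elisp only because its standard library has
the pieces to hand; `decode_words' and `extract_components' are
hypothetical helper names, and the address is a placeholder. This is
not how mail-extr.el works.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr


def decode_words(text):
    """Decode RFC 2047 encoded-words in text; plain ASCII passes through."""
    decoded = str(make_header(decode_header(text)))
    # Collapse runs of whitespace left over from joining the pieces.
    return " ".join(decoded.split())


def extract_components(header_value):
    """Return (full name, address) for one mailbox from an unrendered header."""
    # parseaddr splits name from address but leaves encoded-words untouched.
    raw_name, address = parseaddr(header_value)
    return decode_words(raw_name), address


if __name__ == "__main__":
    # Placeholder address; the name is the one from the example above.
    print(extract_components("Ville =?iso-8859-1?q?Skytt=E4?= <scop@example.org>"))
```

The point is only that the encoded-word is part of the name and should
survive the split, rather than being thrown away with the garbage.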
Simon> As for implementing this properly, I think it would be nice
Simon> to have a higher-level API that handled mail addresses with
Simon> non-ASCII content. Perhaps this new API should even
Simon> replace the current `mail-extract-address-components' entry
Simon> point. Of course, that effect could be simulated by
Simon> enhancing the current mail-extr.el but looking at the code
Simon> I doubt it will be a pleasant experience.
OK, as you said, mail headers are ASCII. Once they are rendered using
whatever encoding they might specify, it is probably too late to try
to do something clever to try and handle them as ASCII, because the
rendering isn't necessarily reversible (I guess).
Hmmm... maybe we're saying the same thing?
I think so. OTOH, if you do have the rendered full mail address, you
should be able to use some elisp package to get the mail address only.
`gnus-extract-address-components' does this, but perhaps you don't
want to depend on Gnus. You could copy the code (it is short), but
then you will have to keep it in sync whenever it is updated, so it is
probably not such a good idea.
Simon> (Another rant is that mail-extr.el doesn't implement a
Simon> correct RFC(2)822 parser, but rather a heuristic parser
Simon> improved over time. In fact, emacs doesn't have a correct
Simon> RFC 2822 parser anywhere, which is a shame.)
I think that would be nice, but there is (at least) one interesting
problem with using a proper RFC2822 parser. My reading of it tells me
that in
Simon Josefsson <jas(a)extundo.com>
the "Simon Josefsson" part is a display-name, which is useful for our
purposes. However, in
jas(a)extundo.com (Simon Josefsson)
the "Simon Josefsson" part is a comment that isn't a separate syntactic
(and, therefore, semantic) entity.
Writing a formal RFC2822 parser might actually give us less than what
we need...
I don't think so -- having a real parser would allow you to say that
you want to look at the first comment after an addr-spec. A function
could try all the standard ways to put the full name in a header line
and return it to the caller.
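
A rough sketch of such a function, covering just the two forms
mentioned above. It uses regular expressions, so it is itself only a
heuristic standing in for what a real RFC 2822 parser would hand back
(no nested comments, quoted pairs or address groups). It is written in
Python purely for illustration; `full_name_and_address' is a
hypothetical name and the addresses are placeholders.

```python
import re

# "display-name <addr-spec>", e.g.  Simon Josefsson <jas@example.com>
ANGLE_FORM = re.compile(r'^\s*(?P<name>[^<>]*?)\s*<(?P<addr>[^<>\s]+)>\s*$')
# "addr-spec (comment)", e.g.  jas@example.com (Simon Josefsson)
COMMENT_FORM = re.compile(r'^\s*(?P<addr>[^\s()<>]+)\s*\((?P<name>[^()]*)\)\s*$')


def full_name_and_address(mailbox):
    """Return (name, address, where_found) for a single mailbox string.

    where_found is "display-name", "comment", or "none" when no name
    could be found in either position.
    """
    forms = ((ANGLE_FORM, "display-name"), (COMMENT_FORM, "comment"))
    for pattern, where in forms:
        match = pattern.match(mailbox)
        if match:
            # Drop surrounding whitespace and a pair of enclosing quotes.
            name = match.group("name").strip().strip('"').strip()
            if name:
                return (name, match.group("addr"), where)
            return (None, match.group("addr"), "none")
    # Neither form matched: treat the whole string as a bare address.
    return (None, mailbox.strip(), "none")


if __name__ == "__main__":
    print(full_name_and_address("Simon Josefsson <jas@example.com>"))
    print(full_name_and_address("jas@example.com (Simon Josefsson)"))
```

The caller gets the name either way, and can still tell whether it came
from a display-name or from a comment if that distinction matters.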
Another problem (separate from RFC2822 issues) is that things like
BBDB store the rendered version, so if we start pulling out unrendered
chunks of headers, we'd have to render them before comparing...
This is a problem in BBDB currently too. Most if not all non-ASCII
names have multiple entries in my .bbdb because they are encoded
differently. OTOH using raw mail headers would be just as bad. A
proper canonicalize-then-compare-for-equality function is needed to
fix this.
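
As a sketch of what canonicalize-then-compare could look like: decode
any encoded-words, apply Unicode normalization, fold case and collapse
whitespace, and only then compare. The choice of NFKC plus case folding
is an assumption about what "canonical" should mean here, not something
BBDB does; `canonical_name' and `same_name' are hypothetical names, and
the code is Python only for illustration.

```python
import unicodedata
from email.header import decode_header, make_header


def canonical_name(raw):
    """Reduce a possibly RFC 2047-encoded name to one comparable form."""
    # Step 1: render encoded-words (plain text passes through unchanged).
    decoded = str(make_header(decode_header(raw)))
    # Step 2: unify composed/decomposed and compatibility characters.
    normalized = unicodedata.normalize("NFKC", decoded)
    # Step 3: ignore case and whitespace differences.
    return " ".join(normalized.casefold().split())


def same_name(a, b):
    """True if two names are equal after canonicalization."""
    return canonical_name(a) == canonical_name(b)


if __name__ == "__main__":
    latin1 = "=?iso-8859-1?q?Ville_Skytt=E4?="
    utf8 = "=?utf-8?b?VmlsbGUgU2t5dHTDpA==?="
    print(same_name(latin1, utf8))
```

The two inputs in the usage example are the same name encoded once as
ISO-8859-1 with Q encoding and once as UTF-8 with base64, which is the
sort of pair that ends up as duplicate entries.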
Maybe it is a case of incrementally fixing the glitches... or coming
up with something really clever. I think the really clever idea is to
always deal with (and store) unrendered ASCII chunks and be careful
when comparing them... but that sounds like a lot of work and involves
getting a lot of people to agree...
I dunno, there doesn't seem to be a simple solution. Dealing with and
storing unrendered ASCII chunks has the same problem: the same name
can be expressed in many ways.