[OK, I've finished annoying Joe Wells... By the way, is there an easy
way of finding out who the current XEmacs maintainer for something
is?]
>>>> "Simon" == Simon Josefsson
<jas(a)extundo.com> writes:
Simon> Do you mean how to use it? Like I said, call it with ASCII
Simon> only strings. mail-extr.el seems to have been designed for
Simon> parsing names and addresses from RFC 822 headers (see the
Simon> first line of the file, for instance), and mail headers are
Simon> ASCII only. I guess the problems come when it is used in
Simon> non-RFC822 situations, where the full mail address includes
Simon> non-ASCII, but mail-extr.el was not designed to be used for
Simon> that from what I can tell, and it does a poor job of it in
Simon> practice too (demonstrated by the buggy snippet in this
Simon> thread).
OK, I have a really naive question (I've always taken the non-ASCII
stuff for granted, since I only have an ASCII name :-):
In general, is there a simple way of getting the contents of the
original 7-bit mail header without rendering it?
Surely that's the right way of handling it...
In the case of VM, it looks like the headers need to be taken from the
INBOX buffer instead of the "INBOX Presentation" buffer (substitute
appropriate folder names where necessary). That looks to be an easy
way of getting the ASCII version (and see comment below).
However, mail-extr.el still doesn't work adequately on the unrendered
version:
ELISP> (mail-extract-address-components "Ville =?iso-8859-1?q?Skytt=E4?= <scop(a)xemacs.org>")
("Ville" "scop(a)xemacs.org")
:-(
[Suddenly Ville joins people such as Prince and Madonna... :-]
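For what it's worth, here is a rough sketch of the order of operations
I have in mind: split the raw 7-bit header into name and address
first, and only then decode the name. It assumes rfc2047.el from Gnus
is available (for `rfc2047-decode-string'); the `my-' functions and the
example.org address are made up for illustration, and the regexp only
copes with the plain "NAME <ADDR>" form.

```elisp
;; Sketch only: split the raw header first, decode afterwards.
;; Assumption: rfc2047.el (part of Gnus) is installed.
(require 'rfc2047)

(defun my-split-raw-address (header)
  "Split raw HEADER of the form \"NAME <ADDR>\" into (NAME ADDR).
NAME is returned still encoded.  Return nil for any other form."
  (if (string-match "\\`[ \t]*\\([^<>]*\\)<\\([^<>]+\\)>[ \t]*\\'" header)
      (let ((name (match-string 1 header))
            (addr (match-string 2 header)))
        ;; Drop the whitespace between the name and the opening bracket.
        (if (string-match "[ \t]+\\'" name)
            (setq name (substring name 0 (match-beginning 0))))
        (list name addr))))

(defun my-extract-and-decode (header)
  "Return (DECODED-NAME ADDR) for raw HEADER, or nil if it does not parse."
  (let ((parts (my-split-raw-address header)))
    (if parts
        ;; Only the name is decoded; the address stays plain ASCII.
        (list (rfc2047-decode-string (car parts))
              (car (cdr parts))))))
```

In a Mule-enabled Emacs I would expect
(my-extract-and-decode "Ville =?iso-8859-1?q?Skytt=E4?= <ville@example.org>")
to keep both words of the name, since the encoded word is never shown
to the heuristic parser.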
Simon> As for implementing this properly, I think it would be nice
Simon> to have a higher-level API that handled mail addresses with
Simon> non-ASCII content. Perhaps this new API should even
Simon> replace the current `mail-extract-address-components' entry
Simon> point. Of course, that effect could be simulated by
Simon> enhancing the current mail-extr.el but looking at the code
Simon> I doubt it will be a pleasant experience.
OK, as you said, mail headers are ASCII. Once they have been rendered
using whatever encoding they specify, it is probably too late to do
anything clever that treats them as ASCII, because the rendering
isn't necessarily reversible (I guess).
Hmmm... maybe we're saying the same thing?
Simon> (Another rant is that mail-extr.el doesn't implement a
Simon> correct RFC(2)822 parser, but rather a heuristic parser
Simon> improved over time. In fact, emacs doesn't have a correct
Simon> RFC 2822 parser anywhere, which is a shame.)
I think that would be nice, but there is (at least) one interesting
problem with using a proper RFC2822 parser. My reading of it tells me
that in
Simon Josefsson <jas(a)extundo.com>
the "Simon Josefsson" part is a display-name, which is useful for our
purposes. However, in
jas(a)extundo.com (Simon Josefsson)
the "Simon Josefsson" part is a comment that isn't a separate syntactic
(and, therefore, semantic) entity.
Writing a formal RFC2822 parser might actually give us less than what
we need...
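To make the distinction concrete, here is a toy classifier. It is not
an RFC2822 parser; the function name and regexps are mine, the
example.com addresses are placeholders, and it only covers these two
simple shapes. It reports whether the name came from a display-name or
from a comment:

```elisp
;; Toy illustration only, not a real RFC 2822 parser.
(defun my-name-source (header)
  "Say where the human-readable name in raw HEADER lives.
Return (display-name . NAME), (comment . NAME), or nil."
  (cond
   ;; A display-name followed by an address in angle brackets.
   ((string-match "\\`[ \t]*\\([^<>()]*[^<>() \t]\\)[ \t]*<[^<>]+>" header)
    (cons 'display-name (match-string 1 header)))
   ;; A bare address followed by a parenthesised comment.
   ((string-match "\\`[ \t]*[^ \t<>()]+[ \t]*(\\([^()]*\\))[ \t]*\\'" header)
    (cons 'comment (match-string 1 header)))))

;; Both spellings carry the same name, but in different syntactic slots:
;; (my-name-source "Simon Josefsson <jas@example.com>")
;; (my-name-source "jas@example.com (Simon Josefsson)")
```

A parser that throws comments away as the grammar allows would return
nothing useful for the second form, so whatever we write would have to
keep comments around deliberately.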
Another problem (separate from RFC2822 issues) is that things like
BBDB store the rendered version, so if we start pulling out unrendered
chunks of headers, we'd have to render them before comparing...
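A minimal sketch of "render before comparing", assuming rfc2047.el
from Gnus is available. `my-same-name-p' is a made-up helper, and
STORED-NAME merely stands in for whatever rendered string a BBDB
record holds; no BBDB function is called here:

```elisp
;; Assumption: rfc2047.el (part of Gnus) is installed.
(require 'rfc2047)

(defun my-same-name-p (raw-name stored-name)
  "Non-nil if RAW-NAME, once decoded, matches the rendered STORED-NAME.
RAW-NAME is the unrendered 7-bit name taken from a header.
The comparison ignores case."
  (string= (downcase (rfc2047-decode-string raw-name))
           (downcase stored-name)))
```

This only moves the problem, of course: the comparison is still done
on rendered text, so it is no better than the decoder underneath it.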
Maybe it is a case of incrementally fixing the glitches... or coming
up with something really clever. I think the really clever idea is to
always deal with (and store) unrendered ASCII chunks and be careful
when comparing them... but that sounds like a lot of work and involves
getting a lot of people to agree...
peace & happiness,
martin