Re: supercite.el: sc-get-address/sc-attribs-extract-namestring

Tuesday, 26 November 2002

        "Martin Schwenke" <martin(a)meltin.net> writes:

...
 [OK, I've finished annoying Joe Wells...  By the way, is there an easy
  way of finding out who the current XEmacs maintainer for something
  is?] 
Looking at the Makefile in the CVS directory for the package?

...
>>>>> "Simon" == Simon Josefsson <jas(a)extundo.com> writes:

     Simon> Do you mean how to use it?  Like I said, call it with ASCII
     Simon> only strings.  mail-extr.el seems to have been designed for
     Simon> parsing names and addresses from RFC 822 headers (see the
     Simon> first line of the file, for instance), and mail headers are
     Simon> ASCII only.  I guess the problem comes when it is used in
     Simon> non-RFC822 situations, where the full mail address includes
     Simon> non-ASCII, but mail-extr.el was not designed to be used for
     Simon> that from what I can tell, and it does a poor job of it in
     Simon> practice too (demonstrated by the buggy snippet in this
     Simon> thread).

 OK, I have a really naive question (since I've always taken the
 non-ASCII stuff for granted, since I only have an ASCII name :-):

   In general, is there a simple way of getting the contents of the
   original 7-bit mail header without rendering it?

 Surely that's the right way of handling it... 
Yes.  In Gnus, you find the original article in the buffer specified
by `gnus-original-article-buffer'.
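
A minimal sketch of pulling one undecoded field out of such a buffer,
assuming a GNU Emacs where mail-utils.el provides `mail-fetch-field'.
The helper name `my-raw-header-field' is made up for illustration and
is not part of Gnus or supercite.

```elisp
(require 'mail-utils)

(defun my-raw-header-field (field &optional buffer)
  "Return the undecoded value of header FIELD in BUFFER, or nil.
BUFFER defaults to the current buffer.  Hypothetical helper."
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (save-restriction
        (widen)
        (goto-char (point-min))
        ;; The header block ends at the first empty line, so narrow
        ;; to it before searching; body text is then never matched.
        (narrow-to-region (point-min)
                          (if (re-search-forward "^$" nil t)
                              (point)
                            (point-max)))
        (mail-fetch-field field)))))
```

Inside Gnus this would be called as
(my-raw-header-field "from" gnus-original-article-buffer), which only
makes sense while an article is selected.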

...
 However, mail-extr.el still doesn't work adequately on the unrendered
 version:

 ELISP> (mail-extract-address-components "Ville =?iso-8859-1?q?Skytt=E4?= <scop(a)xemacs.org>")
 ("Ville" "scop(a)xemacs.org")

:-( 
I'd call this a real bug in mail-extr.el.  The heuristic has some code
in it to remove various things regarded as garbage; perhaps it treats
QP as garbage.
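
One way to sidestep the heuristic for the simple "NAME <ADDR>" shape
is to split the raw string first and decode the name afterwards.  This
is only a sketch: `my-split-raw-from' is a made-up name, the address
is a placeholder, other header shapes are not handled, and it assumes
rfc2047.el (shipped with Gnus) is available for
`rfc2047-decode-string'.

```elisp
(require 'rfc2047)

(defun my-split-raw-from (raw)
  "Split RAW, an undecoded \"NAME <ADDR>\" value, into (NAME ADDR).
NAME is decoded only after the split.  If RAW does not have that
shape, return (nil RAW) unchanged.  Hypothetical helper."
  (if (string-match "\\`[ \t]*\\(.*?\\)[ \t]*<\\([^<>]+\\)>[ \t]*\\'" raw)
      ;; Bind both groups before decoding, because the decoder may
      ;; clobber the match data.
      (let ((name (match-string 1 raw))
            (addr (match-string 2 raw)))
        (list (rfc2047-decode-string name) addr))
    (list nil raw)))

;; Example call; the address is a placeholder.
(my-split-raw-from
 "Ville =?iso-8859-1?q?Skytt=E4?= <ville@example.org>")
```

The point of the ordering is that the address is taken from the ASCII
form, so nothing in the decoded name can confuse the split.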

...
     Simon> As for implementing this properly, I think it would be nice
     Simon> to have a higher-level API that handled mail addresses with
     Simon> non-ASCII content.  Perhaps this new API should even
     Simon> replace the current `mail-extract-address-components' entry
     Simon> point.  Of course, that effect could be simulated by
     Simon> enhancing the current mail-extr.el but looking at the code
     Simon> I doubt it will be a pleasant experience.

 OK, as you said, mail headers are ASCII.  Once they are rendered using
 whatever encoding they might specify, it is probably too late to try
 to do something clever to try and handle them as ASCII, because the
 rendering isn't necessarily reversible (I guess).

 Hmmm... maybe we're saying the same thing? 
I think so.  OTOH, if you do have the rendered full mail address, you
should be able to use some elisp package to get the mail address only.
`gnus-extract-address-components' does this, but perhaps you don't
want to depend on Gnus.  You could copy the code (it is short), but
then you will have to keep it in sync whenever it is updated, so it is
probably not such a good idea.

...
     Simon> (Another rant is that mail-extr.el doesn't implement a
     Simon> correct RFC(2)822 parser, but rather a heuristic parser
     Simon> improved over time.  In fact, emacs doesn't have a correct
     Simon> RFC 2822 parser anywhere, which is a shame.)

 I think that would be nice, but there is (at least) one interesting
 problem with using a proper RFC2822 parser.  My reading of it tells me
 that in

   Simon Josefsson <jas(a)extundo.com>

 the "Simon Josefsson" part is a display-name, which is useful for our
 purposes.  However, in

   jas(a)extundo.com (Simon Josefsson)

 the "Simon Josefsson" part is a comment that isn't a separate syntactic
 (and, therefore, semantic) entity.

 Writing a formal RFC2822 parser might actually give us less than what
 we need... 
I don't think so -- having a real parser would allow you to say that
you want to look at the first comment after an addr-spec.  A function
could try all the standard ways to put the full name in a header line
and return it to the caller.
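
A rough sketch of such a function follows.  To be clear, this is not
an RFC 2822 parser: plain regexps stand in for one, so quoted pairs,
nested comments, groups and address lists are all ignored.  The name
`my-full-name-and-address' and the example.org address in the usage
note are made up for illustration.

```elisp
(defun my-full-name-and-address (field)
  "Return (NAME ADDRESS) for FIELD, a single-mailbox header value.
Try the display-name form first, then a comment after the address,
then a bare address.  Return nil if none fits.  Hypothetical helper."
  (cond
   ;; display-name <addr-spec>
   ((string-match "\\`[ \t]*\\(.*?\\)[ \t]*<\\([^<>]+\\)>" field)
    (let ((name (match-string 1 field))
          (addr (match-string 2 field)))
      ;; Drop the double quotes around a quoted display-name.
      (when (string-match "\\`\"\\(.*\\)\"\\'" name)
        (setq name (match-string 1 name)))
      (list (if (string= name "") nil name) addr)))
   ;; addr-spec (comment): take the first comment as the name
   ((string-match "\\`[ \t]*\\([^ \t()]+\\)[ \t]*(\\([^()]*\\))" field)
    (list (match-string 2 field) (match-string 1 field)))
   ;; bare addr-spec, no name available
   ((string-match "\\`[ \t]*\\([^ \t()<>]+\\)[ \t]*\\'" field)
    (list nil (match-string 1 field)))))
```

With this, "Simon Josefsson <simon@example.org>" and
"simon@example.org (Simon Josefsson)" give the same list, which is the
behaviour argued for above.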

...
 Another problem (separate from RFC2822 issues) is that things like
 BBDB store the rendered version, so if we start pulling out unrendered
 chunks of headers, we'd have to render them before comparing... 
This is a problem in BBDB currently too.  Most if not all non-ASCII
names have multiple entries in my .bbdb because they are encoded
differently.  OTOH using raw mail headers would be just as bad.  A
proper canonicalize-then-compare-for-equality function is needed to
fix this.
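
A sketch of what canonicalize-then-compare might look like, assuming a
recent GNU Emacs with rfc2047.el, ucs-normalize.el and subr-x.el.  The
names `my-canonical-name' and `my-same-name-p' are made up, and the
chosen steps (decode, NFC, drop quotes, squeeze whitespace) are one
guess at "canonical", not something BBDB does.

```elisp
(require 'rfc2047)
(require 'ucs-normalize)
(require 'subr-x)

(defun my-canonical-name (name)
  "Return a canonical form of NAME for comparison.  Hypothetical helper.
Decode RFC 2047 encoded words, normalize to Unicode NFC, drop double
quotes, and collapse runs of whitespace."
  (let* ((decoded (rfc2047-decode-string name))
         (nfc (ucs-normalize-NFC-string decoded))
         (unquoted (replace-regexp-in-string "\"" "" nfc))
         (squeezed (replace-regexp-in-string "[ \t\n]+" " " unquoted)))
    (string-trim squeezed)))

(defun my-same-name-p (a b)
  "Return non-nil if names A and B are equal after canonicalization."
  (string= (my-canonical-name a) (my-canonical-name b)))

;; The same name in two different encodings:
(my-same-name-p "Ville =?iso-8859-1?q?Skytt=E4?="
                "Ville =?utf-8?q?Skytt=C3=A4?=")
```

Whether to fold case or strip titles as well is a policy question the
sketch leaves open.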

...
 Maybe it is a case of incrementally fixing the glitches...  or coming
 up with something really clever.  I think the really clever idea is to
 always deal with (and store) unrendered ASCII chunks and be careful
 when comparing them... but that sounds like a lot of work and involves
 getting a lot of people to agree... 
I dunno, there doesn't seem to be a simple solution.  Dealing with and
storing unrendered ASCII chunks has the same problem: the same name
can be expressed in many ways.
