Re: supercite.el: sc-get-address/sc-attribs-extract-namestring

Tuesday, 26 November 2002

        "Stephen J. Turnbull" <stephen(a)xemacs.org> writes:

...
     >> Writing a formal RFC2822 parser might actually give us less
     >> than what we need...

     Simon> I don't think so -- having a real parser would allow you to
     Simon> say that you want to look at the first comment after an
     Simon> addr-spec.  A function could try all the standard ways to
     Simon> put the full name in a header line and return it to the
     Simon> caller.

 The problem is that there aren't (as of 1123, I'm not really up on the
 RFC2822 clarifications) standardized ways, and identifying them is
 actually more a lexical problem than an RFC2822 syntactic issue.  A
 correct parser would at least allow us to separate the net address
 from the rest of the goop, though, which none of the current
 applications I use (Gnus, VM, BBDB, Supercite) do with 100% accuracy.
Identifying the addr-spec should be straightforward.  As for
extracting the full name, RFC 2822 now says one format SHOULD be used.
There is at least one older format in use (first comment after the
addr-spec), but supporting those two <famous last words>should be
sufficient</>.
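
A minimal sketch of handling those two formats, written in Python rather than Emacs Lisp purely for illustration. It relies on the standard library's `email.utils.parseaddr`, which accepts both the `Full Name <addr-spec>` form and the older `addr-spec (Full Name)` comment form. The wrapper name `name_and_address` and the example mailbox are made up for this sketch, and it only covers a header value holding a single mailbox.

```python
from email.utils import parseaddr


def name_and_address(header_value):
    """Return (full_name, addr_spec) for a header value with one mailbox.

    Hypothetical helper: it only wraps parseaddr so that both the
    angle-bracket form and the trailing-comment form yield the same pair.
    """
    name, addr = parseaddr(header_value)
    return name, addr


# Display name followed by the address in angle brackets.
preferred = name_and_address('"Jane Q. Example" <jane@example.org>')

# Older convention: bare addr-spec followed by a comment holding the name.
legacy = name_and_address('jane@example.org (Jane Q. Example)')
```

Both calls yield the same name and address pair, which is the property a shared library would need before comparing against stored records.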

...
     >> Another problem (separate from RFC2822 issues) is that things
     >> like BBDB store the rendered version, so if we start pulling
     >> out unrendered chunks of headers, we'd have to render them
     >> before comparing...

     Simon> This is a problem in BBDB currently too.  Most if not all
     Simon> non-ASCII names have multiple entries in my .bbdb because
     Simon> they are encoded differently.  OTOH using raw mail headers
     Simon> would be just as bad.  A proper
     Simon> canonicalize-then-compare-for-equality function is needed
     Simon> to fix this.

 "Proper" probably has to wait for full Unicode support, as people can
 always confuse you with [U+00E4] <-> [U+0061 U+0308] in UTF-8, or even
 [a "] in pidgin composition.
I don't think there will ever be a proper solution.  Full Unicode has
its problems too: the main one is that Unicode is a moving target and
keeps evolving, but you also usually need to implement a
decomposition mechanism to compare strings.  And Unicode Inc keeps
remapping the decomposition tables, so you never know what results you
get.

...
 But as long as the BBDB database is stored in a UCS (which it
 currently is not), a quite high degree of canonicalization is
 automatic in Mule if rendered versions are always used, as things
 like Unicode composing characters are rare. 
They will become more common when Unicode becomes more common... most
likely a Unicode decomposition should be performed before comparing
the strings, though.
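
A rough sketch of what canonicalize-then-compare could look like, again in Python for illustration only. It assumes the name arrives either as plain text or as RFC 2047 encoded words, decodes it with the standard `email.header` module, and then applies canonical decomposition (NFD) via `unicodedata.normalize`. The function names `canonical_name` and `same_name` and the sample name are hypothetical, and the result depends on the Unicode tables shipped with the interpreter.

```python
import unicodedata
from email.header import decode_header, make_header


def canonical_name(raw):
    """Decode any RFC 2047 encoded words, then decompose to NFD.

    Hypothetical helper for illustration; a real library would likely
    also need to settle whitespace and case handling.
    """
    decoded = str(make_header(decode_header(raw)))
    return unicodedata.normalize("NFD", decoded)


def same_name(a, b):
    """Compare two names after canonicalization."""
    return canonical_name(a) == canonical_name(b)


# The same name, once as Latin-1 quoted-printable and once as UTF-8 base64.
latin1_q = "=?iso-8859-1?q?J=F6rg?="
utf8_b = "=?utf-8?b?SsO2cmc=?="

# Precomposed U+00F6 versus "o" followed by combining U+0308.
precomposed = "J\u00f6rg"
combining = "Jo\u0308rg"
```

With this, `same_name(latin1_q, utf8_b)` and `same_name(precomposed, combining)` both hold, while an unaccented "Jorg" still compares as different.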

...
 We really need to get this stuff unified and ripped _out_ of the MUAs
 and into a separate library so that fixes and improvements to VM,
 Gnus, Mew, Wanderlust, TM, SEMI, mh-e, etc etc all go into the same
 library.
I agree completely.  Unfortunately I don't have the time. :-/
