Josh Huber <huber(a)alum.wpi.edu> writes:
Simon Josefsson <jas(a)extundo.com> writes:
> RFC (2)822 articles should not contain iso-8859-15 characters.
> Perhaps mail-extr is supposed to operate on raw articles, not MIME
> decoded ones?
Ah, you're probably right. Now I'm not sure what's supposed to be
happening, since BBDB gets passed the header after it's been decoded,
but it uses mail-extract-address-components (mail-extr.el) or
rfc822-addresses (rfc822.el) [both part of mail-lib]. So, where's the
bug? There's code in there already to handle latin-1 chars:
  (let* ((latin1-ss (string (make-char 'latin-iso8859-1 223)))
         (latin9-ss (string (make-char 'latin-iso8859-15 1759)))
         (latin1-addr (concat "Joe Te" latin1-ss "t <joe.test@foo.org>"))
         (latin9-addr (concat "Joe Te" latin9-ss "t <joe.test@foo.org>")))
    (concat "Works: <" (car (mail-extract-address-components latin1-addr))
            ">, Broken: <" (car (mail-extract-address-components latin9-addr))
            ">"))
=> "Works: <Joe Teßt>, Broken: <Joe Te>"
This might just be an accident; I don't think mail-extr.el was
designed for anything but ASCII. Can't BBDB be modified to work with
the raw header? Then it doesn't have to rely on the caller passing it
correctly decoded data.
> mail-extr.el is in need of FSF syncing. Perhaps that would be the
> first step? It is a large undertaking though.
Indeed. Since there isn't support (although there is mention of it in
regex.h) for POSIX char classes, this will be even more work. Adding
support for them first would be a good thing, imho. Or perhaps we
could rely on the fact that syntax tables are defined for each part of
the address during parsing and just use \sw as the match character?
Emacs' mail-extr.el uses POSIX char classes in three places (which I
suspect might not be entirely correct: they only work for raw 8-bit
characters in headers, which are forbidden), so I think it can be
synced without worrying about them very much.
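
For reference, here is a minimal sketch of the \sw idea mentioned above.
It is not taken from mail-extr.el: the helper name sketch-first-match is
made up for illustration, and it assumes a multibyte session where the
current syntax table gives non-ASCII letters such as ß word syntax (the
default in GNU Emacs). It only contrasts an ASCII-only bracket
expression with the syntax-class match.

```elisp
;; Hypothetical helper for illustration; not part of mail-extr.el.
(defun sketch-first-match (regexp string)
  "Return the first substring of STRING matching REGEXP, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

;; An ASCII-only bracket expression stops at the first non-ASCII letter.
(sketch-first-match "[A-Za-z]+" "Teßt")

;; \sw matches any character the current syntax table marks as a word
;; constituent, so no POSIX character class is needed.
(sketch-first-match "\\sw+" "Teßt")
```

Whether this is good enough for mail-extr.el depends on the syntax
tables it installs while parsing each part of the address, which the
sketch does not try to reproduce.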