Re: latin encodings, mail-lib, and posix regexp classes

Thursday, 6 June 2002

Simon Josefsson <jas(a)extundo.com> writes:

...
> RFC (2)822 articles should not contain iso-8859-15 characters.
> Perhaps mail-extr is supposed to operate on raw articles, not MIME
> decoded ones?

Ah, you're probably right.  Now I'm not sure what's supposed to be
happening, since BBDB gets passed the header after it's been decoded,
but it uses mail-extract-address-components (mail-extr.el) or
rfc822-addresses (rfc822.el) [both part of mail-lib].  So, where's the
bug?  There's code in there already to handle latin-1 chars:

(let* ((latin1-ss (string (make-char 'latin-iso8859-1 223)))
       (latin9-ss (string (make-char 'latin-iso8859-15 1759)))
       (latin1-addr (concat "Joe Te" latin1-ss "t <joe.test@foo.org>"))
       (latin9-addr (concat "Joe Te" latin9-ss "t <joe.test@foo.org>")))
  (concat "Works: <" (car (mail-extract-address-components latin1-addr))
	  ">, Broken: <" (car (mail-extract-address-components latin9-addr))
	  ">"))

=> "Works: <Joe Teßt>, Broken: <Joe Te>"   

...
> mail-extr.el is in need of FSF syncing.  Perhaps that would be the
> first step?  It is a large undertaking though.

Indeed.  Since there isn't support for POSIX char classes (although
regex.h mentions them), this will be even more work.  Adding support
for them first would be a good thing, imho.  Or perhaps we could rely
on the fact that syntax tables are defined for each part of the
address during parsing and just use \sw as the match character?

I'm not sure.
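
A rough sketch of the \sw idea, for illustration only.  The helper
names (my-leading-word-ascii, my-leading-word-syntax) and the syntax
table built here are hypothetical and are not taken from mail-extr.el;
the sketch only assumes that the table in effect marks the non-ASCII
letter as a word constituent.  It contrasts a hard-coded ASCII range,
which stops at the first non-ASCII letter, with a syntax-class match:

```elisp
;; Hypothetical helpers for illustration; not part of mail-extr.el.

(defun my-leading-word-ascii (string)
  "Return the leading run of ASCII letters in STRING, or nil."
  ;; A fixed character range knows nothing about non-ASCII letters.
  (when (string-match "\\`[A-Za-z]+" string)
    (match-string 0 string)))

(defun my-leading-word-syntax (string table)
  "Return the leading run of word-syntax characters in STRING, or nil.
TABLE is the syntax table that decides what counts as a word character."
  ;; \sw consults the syntax table in effect while matching.
  (with-syntax-table table
    (when (string-match "\\`\\sw+" string)
      (match-string 0 string))))

;; Assumption: a table in which sharp s is explicitly a word constituent.
(defvar my-name-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?ß "w" table)
    table))

(my-leading-word-ascii "Teßt <joe.test@foo.org>")
;; => "Te"
(my-leading-word-syntax "Teßt <joe.test@foo.org>" my-name-syntax-table)
;; => "Teßt"
```

The point of the second form is that the regexp no longer lists
characters at all, so adding another charset would only mean
extending the syntax table, not every pattern that uses it.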

-- 
Josh Huber
