>>>> "Stephen" == Stephen J Turnbull <stephen(a)xemacs.org> writes:
>>>> "Jan" == Jan Rychter <jan(a)rychter.com> writes:
Jan> And indeed, the Emacs version of mail-extr.el seems to use
Jan> [:alnum:] and [:alpha:]:
Stephen> Which is the right thing to do. But our regexp engine doesn't
Stephen> currently support those. It's not clear to me what the right
Stephen> way to do that is, given the brokenness of the POSIX standard
Stephen> with respect to multilingual text. I guess the best bet is
Stephen> simply to use Unicode's idea of whether something is a word
Stephen> component or not.
Stephen> Volunteers? Note that the implementation will be very
Stephen> different in 21.4 (which has no good access to Unicode
Stephen> tables) and 21.5.
Well, it seems there were no volunteers :-( (and, as I wrote, I am not
able to fix it myself).
Jan> Now, opinions are divided on whether
Jan> mail-extract-address-components should really get multilingual
Jan> text or not. In any case, the current solution is rather broken.
Stephen> mail-extr.el is basically a collection of hacks and kludges
Stephen> anyway; you may as well just add the relevant characters to
Stephen> those regexps. That will work in 21.4 as well as 21.5. Or
Stephen> (as a heuristic), just use [^\000-\255] (you'd have to do that
Stephen> as an alternative rather than a character class). I bet that
Stephen> will work well enough in practice.
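
To make the trade-off concrete, here is a rough sketch of the two ideas
discussed above, written in Python rather than Emacs Lisp, so the regexp
syntax is not what mail-extr.el would need. The patterns and the
first_word helper are hypothetical and are not taken from mail-extr.el.
The sketch also assumes the range in the heuristic is meant to cover
code points 0 through 255.

```python
import re

# ASCII-only letters: stands in for a hard-coded character list.
# (Hypothetical pattern, not the one used in mail-extr.el.)
ASCII_WORD = re.compile(r"[A-Za-z]+")

# Heuristic: an ASCII letter OR any character outside code points 0-255,
# written as an alternation rather than as a single character class.
HEURISTIC_WORD = re.compile(r"(?:[A-Za-z]|[^\x00-\xff])+")

# Unicode-aware letters: word characters minus digits and underscore.
# In Python 3, \w and \d follow Unicode properties for str patterns.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def first_word(pattern, text):
    """Return the leading run of TEXT matched by PATTERN, or ""."""
    m = pattern.match(text)
    return m.group(0) if m else ""


polish = "\u0141ukasz"   # starts with U+0141, outside ISO-8859-1
latin1 = "Jos\u00e9"     # ends with U+00E9, inside ISO-8859-1

for label, pattern in [("ascii", ASCII_WORD),
                       ("heuristic", HEURISTIC_WORD),
                       ("unicode", UNICODE_WORD)]:
    print(label, repr(first_word(pattern, polish)),
          repr(first_word(pattern, latin1)))
```

In this sketch the heuristic accepts everything above code point 255
but nothing new inside the 0-255 range, so the Latin-1 letters would
still have to be listed explicitly alongside it.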
Well, I can continue to kludge around this by adding the relevant
characters myself, but...
The current situation basically means that mail-extr is broken for
anyone who doesn't use ASCII + ISO-8859-1. It breaks BBDB and Supercite
for me. I'd consider it quite serious breakage.
--J.