>>>> "Jan" == Jan Rychter <jan(a)rychter.com> writes:
Jan> And indeed, the Emacs version of mail-extr.el seems to use
Jan> [:alnum:] and [:alpha:]:

Which is the right thing to do. But our regexp engine doesn't
currently support those. It's not clear to me what the right way to
do that is, given the brokenness of the POSIX standard with respect to
multilingual text. I guess the best bet is simply to use Unicode's
idea of whether something is a word component or not.

Volunteers? Note that the implementation will be very different in
21.4 (which has no good access to Unicode tables) and 21.5.

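To make "Unicode's idea of a word component" concrete, here is a rough sketch in Python rather than in the regexp engine itself. The function name is_word_component and the choice of general categories (letters, marks, decimal digits, connector punctuation) are my assumptions for illustration; nothing here is what XEmacs implements.

```python
import unicodedata


def is_word_component(ch: str) -> bool:
    """Guess whether a single character is a word component.

    Assumption: letters (L*), combining marks (M*), decimal digits (Nd)
    and connector punctuation such as "_" (Pc) count as word components.
    """
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "M") or cat in ("Nd", "Pc")
```

With this definition, "a", "日" and "_" are word components, while " " and "@" are not.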
Jan> Now, opinions are divided on whether
Jan> mail-extract-address-components should really get
Jan> multilingual text or not.
Jan> In any case, the current solution is rather broken.

mail-extr.el is basically a collection of hacks and kludges anyway;
you may as well just add the relevant characters to those regexps.
That will work in 21.4 as well as 21.5. Or (as a heuristic), just
treat anything matching [^\000-\255] as a word character (you'd have
to add that as a regexp alternative next to the existing character
class, since a negated range can't be folded into a positive one).
I bet that will work well enough in practice.

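A rough illustration of that heuristic, in Python rather than Emacs Lisp. The names ASCII_WORD, NON_8BIT, WORD and words are made up for the example, and it assumes the range is meant to cover every 8-bit character (written here as \x00-\xff). The negated range has to sit beside the positive class as an alternative, because the two cannot be merged into one bracket expression.

```python
import re

# Positive class for the usual ASCII word characters (assumed set).
ASCII_WORD = r"[A-Za-z0-9]"
# Heuristic: anything outside the 8-bit range counts as a word character.
NON_8BIT = r"[^\x00-\xff]"

# The two classes are combined as alternatives, not as one class.
WORD = re.compile(rf"(?:{ASCII_WORD}|{NON_8BIT})+")


def words(text: str) -> list[str]:
    """Return the runs of word characters found in text."""
    return WORD.findall(text)
```

Note that accented Latin-1 letters fall inside the excluded range, so words("Zoë") stops at "Zo"; those characters would still have to be added to the positive class by hand.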
--
Institute of Policy and Planning Sciences
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.