latin encodings, mail-lib, and posix regexp classes

Thursday, 6 June 2002

        In an attempt to track down problems I'm having with BBDB and
Latin-9 characters, I've come across a problem.  My understanding of
these issues isn't very complete at the moment.  My guess is that even
when using the latin-unity package some of the characters are not
marked as word constituents...

For example:

thisshouldbeßonewordright?

Going to the start of that word, and hitting M-f advances to the ß.
Actually, the ß in this mail is a Latin-1 character, but the effect is
the same as with the iso-8859-15 encoded character.

Basically, how can I get those extra characters in the syntax table in
some kind of proper way?  Or, are they supposed to be there already,
and is my installation screwed in some way?
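
For reference, a minimal sketch of one manual workaround, assuming the
cause really is that the character lacks word syntax in the buffer's
syntax table (this is a guess, and it is not necessarily what
latin-unity itself does).  It checks the syntax class of ß and then
marks it as a word constituent in the current buffer's table only:

```elisp
;; Inspect the syntax class currently assigned to the character
;; in the current buffer's syntax table (?w means "word").
(char-syntax ?ß)

;; Assumption: the entry is simply missing.  Mark the character as a
;; word constituent in the current buffer's syntax table.
(modify-syntax-entry ?ß "w")

;; The character should now report word syntax.
(char-syntax ?ß)
```

Note that this only covers the one character as read from this text;
the Latin-1 and iso-8859-15 versions of a letter may be distinct
characters internally, in which case each would need its own entry.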

Now onto what caused me to look for this.  In mail-lib there are a
couple of things that concern me.  First, the regexps used to match
names won't work for names containing non-ASCII letters, such as the
accented characters in iso-8859-15.

For example, here is one regexp:

(defconst mail-extr-first-letters (purecopy "A-Za-z"))

Looking at the Emacs CVS, I see they use [:alpha:] there instead, which
(in theory) should match the correct set of characters.

But, I tried changing this, and it didn't work as expected -- does
XEmacs support POSIX character classes?

(string-match "[:alpha:]" "test")
=> nil
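
One caveat about the test above: in GNU Emacs regexps, a POSIX class is
only recognized inside a bracket expression, so the pattern is written
"[[:alpha:]]".  A bare "[:alpha:]" is read as an ordinary character set
containing ":", "a", "l", "p" and "h", none of which occur in "test",
so nil is expected there whether or not classes are supported.  A
minimal sketch, assuming a regexp engine with POSIX class support
(whether XEmacs has one is exactly the open question):

```elisp
;; Bare "[:alpha:]" is just the character set {:, a, l, p, h}.
(string-match "[:alpha:]" "test")     ; no such character in "test"
(string-match "[:alpha:]" "help")     ; matches at the leading "h"

;; The class form needs the extra pair of brackets.  Assumption: the
;; engine supports POSIX classes; without that support this pattern
;; means something else and will not match "test".
(string-match "[[:alpha:]]" "test")
```

So the third form is the more telling test of class support.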

I was going to go through the mail-extr.el file to find all the
changes, but decided to hold off until I found out the deal with POSIX
regexp classes (since they're used fairly extensively in Emacs'
mail-lib).

Thanks for any help,

-- 
Josh Huber
