commit/XEmacs: 2 new changesets

Friday, 4 May 2012

        2 new commits in XEmacs:

https://bitbucket.org/xemacs/xemacs/changeset/3df910176b6a/
changeset:   3df910176b6a
user:        kehoea
date:        2012-05-04 22:12:02
summary:     Support predefined character classes in #'skip-chars-{forward,backward},
too

src/ChangeLog addition:

2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

	* regex.c:
	Move various #defines and enums to regex.h, since we need them
	when implementing #'skip-chars-{backward,forward}.
	* regex.c (re_wctype):
	* regex.c (re_iswctype):
	Be more robust about case insensitivity here.
	* regex.c (regex_compile):
	* regex.h:
	* regex.h (RE_ISWCTYPE_ARG_DECL):
	* regex.h (CHAR_CLASS_MAX_LENGTH):
	* search.c (skip_chars):
	Implement support for the predefined character classes in this
	function.

tests/ChangeLog addition:

2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

	* automated/regexp-tests.el (equal):
	* automated/regexp-tests.el (Assert-char-class):
	Correct a stray parenthesis; add tests for the predefined
	character classes with #'skip-chars-{forward,backward}; update the
	tests to reflect some changed design decisions on my part.

man/ChangeLog addition:

2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

	* lispref/searching.texi (Regular Expressions):
	* lispref/searching.texi (Syntax of Regexps):
	* lispref/searching.texi (Char Classes):
	* lispref/searching.texi (Regexp Example):
	Document the predefined character classes in this file.
affected #:  8 files

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 man/ChangeLog
--- a/man/ChangeLog
+++ b/man/ChangeLog
＠＠ -1,3 +1,11 ＠＠
+2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* lispref/searching.texi (Regular Expressions):
+	* lispref/searching.texi (Syntax of Regexps):
+	* lispref/searching.texi (Char Classes):
+	* lispref/searching.texi (Regexp Example):
+	Document the predefined character classes in this file.
+
 2011-12-30  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* cl.texi (Top):

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 man/lispref/searching.texi
--- a/man/lispref/searching.texi
+++ b/man/lispref/searching.texi
＠＠ -180,6 +180,7 ＠＠

 ＠menu
 * Syntax of Regexps::       Rules for writing regular expressions.
+* Char Classes::            Predefined character classes for searching.
 * Regexp Example::          Illustrates regular expression syntax.
 ＠end menu

＠＠ -335,6 +336,11 ＠＠
 To include ＠samp{^} in a set, put it anywhere but at the beginning of
 the set.

+It is also possible to specify named character classes as part of your
+character set; for example, ＠samp{[:xdigit:]} will match hexadecimal
+digits, ＠samp{[:nonascii:]} will match characters outside the basic
+ASCII set.  These are documented elsewhere, ＠pxref{Char Classes}.
+
 ＠item [^ ＠dots{} ]
 ＠cindex ＠samp{^} in regexp
 ＠samp{[^} begins a ＠dfn{complement character set}, which matches any
＠＠ -604,6 +610,61 ＠＠
 ＠end example
 ＠end defun

+＠node Char Classes
+＠subsection Char Classes
+
+These are the predefined character classes available within regular
+expression character sets, and within ＠samp{skip-chars-forward} and
+＠samp{skip-chars-backward}, ＠xref{Skipping Characters}.
+
+＠table ＠samp
+＠item [:alnum:]
+This matches any ASCII letter or digit, or any non-ASCII character
+with word syntax.
+＠item [:alpha:]
+This matches any ASCII letter, or any non-ASCII character with word syntax.
+＠item [:ascii:]
+This matches any character with a numeric value below ＠samp{?\x80}.
+＠item [:blank:]
+This matches space or tab.
+＠item [:cntrl:]
+This matches any character with a numeric value below ＠samp{?\x20},
+the code for space; these are the ASCII control characters.
+＠item [:digit:]
+This matches the characters ＠samp{?0} to ＠samp{?9}, inclusive.
+＠item [:graph:]
+This matches ``graphic'' characters, with numeric values greater than
+＠samp{?\x20}, exclusive of ＠samp{?\x7f}, the delete character. 
+＠item [:lower:]
+This matches minuscule characters, or any character with case
+information if ＠samp{case-fold-search} is non-nil.
+＠item [:multibyte:]
+This matches non-ASCII characters, that is, any character with a
+numeric value above ＠samp{?\x7f}.
+＠item [:nonascii:]
+This is equivalent to ＠samp{[:multibyte:]}.
+＠item [:print:]
+This is equivalent to [:graph:], but also matches the space character,
+＠samp{?\x20}.
+＠item [:punct:]
+This matches non-control, non-alphanumeric ASCII characters, or any
+non-ASCII character without word syntax.
+＠item [:space:]
+This matches any character with whitespace syntax.
+＠item [:unibyte:]
+This is a GNU Emacs extension; in XEmacs it is equivalent to
+＠samp{[:ascii:]}. Note that this means it is not equivalent to
+＠samp{"\x00-\xff"}, which one might have assumed to be the case.
+＠item [:upper:]
+This matches majuscule characters, or any character with case
+information if ＠samp{case-fold-search} is non-nil.
+＠item [:word:]
+This matches any character with word syntax.
+＠item [:xdigit:]
+This matches hexadecimal digits, so the decimal digits ＠samp{0-9} and the
+letters ＠samp{a-F} and ＠samp{A-F}.
+＠end table
+
 ＠node Regexp Example
 ＠subsection Complex Regexp Example

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 src/ChangeLog
--- a/src/ChangeLog
+++ b/src/ChangeLog
＠＠ -1,3 +1,19 ＠＠
+2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* regex.c:
+	Move various #defines and enums to regex.h, since we need them
+	when implementing #'skip-chars-{backward,forward}.
+	* regex.c (re_wctype):
+	* regex.c (re_iswctype):
+	Be more robust about case insensitivity here.
+	* regex.c (regex_compile):
+	* regex.h:
+	* regex.h (RE_ISWCTYPE_ARG_DECL):
+	* regex.h (CHAR_CLASS_MAX_LENGTH):
+	* search.c (skip_chars):
+	Implement support for the predefined character classes in this
+	function.
+
 2012-04-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* search.c (string_match_1): Actually use the POSIX argument here,

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 src/regex.c
--- a/src/regex.c
+++ b/src/regex.c
＠＠ -178,51 +178,7 ＠＠
 /* isalpha etc. are used for the character classes.  */
 #include <ctype.h>

-#ifdef emacs
-
-/* 1 if C is an ASCII character.  */
-#define ISASCII(c) ((c) < 0x80)
-
-/* 1 if C is a unibyte character.  */
-#define ISUNIBYTE(c) 0
-
-/* The Emacs definitions should not be directly affected by locales.  */
-
-/* In Emacs, these are only used for single-byte characters.  */
-#define ISDIGIT(c) ((c) >= '0' && (c) <= '9')
-#define ISCNTRL(c) ((c) < ' ')
-#define ISXDIGIT(c) (ISDIGIT (c) || ((c) >= 'a' && (c) <=
'f')	\
-		     || ((c) >= 'A' && (c) <= 'F'))
-
-/* This is only used for single-byte characters.  */
-#define ISBLANK(c) ((c) == ' ' || (c) == '\t')
-
-/* The rest must handle multibyte characters.  */
-
-#define ISGRAPH(c) ((c) > ' ' && (c) != 0x7f)
-#define ISPRINT(c) ((c) == ' ' || ISGRAPH (c))
-#define ISALPHA(c) (ISASCII (c) ? (((c) >= 'a' && (c) <=
'z')		\
-				   || ((c) >= 'A' && (c) <= 'Z'))	\
-		    : ISWORD (c))
-#define ISALNUM(c) (ISALPHA (c) || ISDIGIT (c))
-
-#define ISLOWER(c) LOWERCASEP (lispbuf, c)
-
-#define ISPUNCT(c) (ISASCII (c)                                 \
-		    ? ((c) > ' ' && (c) < 0x7F			\
-		       && !(((c) >= 'a' && (c) <= 'z')		\
-		            || ((c) >= 'A' && (c) <= 'Z')	\
-		            || ((c) >= '0' && (c) <= '9')))	\
-		    : !ISWORD (c))
-
-#define ISSPACE(c) \
-	(SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Swhitespace)
-
-#define ISUPPER(c) UPPERCASEP (lispbuf, c)
-
-#define ISWORD(c) (SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Sword)
-
-#else /* not emacs */
+#ifndef emacs /* For the emacs build, we need these in the header. */

 /* 1 if C is an ASCII character.  */
 #define ISASCII(c) ((c) < 0200)
＠＠ -2013,23 +1969,6 ＠＠
 /* The next available element.  */
 #define COMPILE_STACK_TOP (compile_stack.stack[compile_stack.avail])

-/* Bits used to implement the multibyte-part of the various character
-   classes such as [:alnum:] in a charset's range table. XEmacs; use an
-   enum, so they're visible in the debugger. */
-enum
-{
-  BIT_WORD = (1 << 0),
-  BIT_LOWER = (1 << 1),
-  BIT_PUNCT = (1 << 2),
-  BIT_SPACE = (1 << 3),
-  BIT_UPPER = (1 << 4),
-  /* XEmacs; we need this, because we unify treatment of ASCII and non-ASCII
-     (possible matches) in charset_mule. [:alpha:] matches all characters
-     with word syntax, with the exception of [0-9]. We don't need
-     BIT_MULTIBYTE. */
-  BIT_ALPHA = (1 << 5)
-};
-
 /* Set the bit for character C in a bit vector.  */
 #define SET_LIST_BIT(c)				\
   (buf_end[((unsigned char) (c)) / BYTEWIDTH]	\
＠＠ -2059,10 +1998,8 ＠＠
        } 								\
     }

-#define CHAR_CLASS_MAX_LENGTH  9 /* Namely, `multibyte'.  */
-
 /* Map a string to the char class it names (if any).  */
-static re_wctype_t
+re_wctype_t
 re_wctype (const char *string)
 {
   if      (STREQ (string, "alnum"))	return RECC_ALNUM;
＠＠ -2086,17 +2023,10 ＠＠
 }

 /* True if CH is in the char class CC.  */
-static re_bool
-re_iswctype (int ch, re_wctype_t cc)
+int
+re_iswctype (int ch, re_wctype_t cc
+             RE_ISWCTYPE_ARG_DECL)
 {
-#ifdef emacs
-  /* This is cheesy, lispbuf isn't available to us when compiling the
-     pattern. It's effectively only called (on Mule builds) when the current
-     buffer doesn't matter (e.g. for RECC_ASCII, RECC_CNTRL), so it's not a
-     big deal. */
-  struct buffer *lispbuf = current_buffer;
-#endif
-
   switch (cc)
     {
     case RECC_ALNUM: return ISALNUM (ch) != 0;
＠＠ -2105,11 +2035,20 ＠＠
     case RECC_CNTRL: return ISCNTRL (ch) != 0;
     case RECC_DIGIT: return ISDIGIT (ch) != 0;
     case RECC_GRAPH: return ISGRAPH (ch) != 0;
-    case RECC_LOWER: return ISLOWER (ch) != 0;
     case RECC_PRINT: return ISPRINT (ch) != 0;
     case RECC_PUNCT: return ISPUNCT (ch) != 0;
     case RECC_SPACE: return ISSPACE (ch) != 0;
+#ifdef emacs
+    case RECC_UPPER: 
+      return NILP (lispbuf->case_fold_search) ? ISUPPER (ch) != 0
+: !NOCASEP (lispbuf, ch);
+    case RECC_LOWER: 
+      return NILP (lispbuf->case_fold_search) ? ISLOWER (ch) != 0
+: !NOCASEP (lispbuf, ch);
+#else
     case RECC_UPPER: return ISUPPER (ch) != 0;
+    case RECC_LOWER: return ISLOWER (ch) != 0;
+#endif
     case RECC_XDIGIT: return ISXDIGIT (ch) != 0;
     case RECC_ASCII: return ISASCII (ch) != 0;
     case RECC_NONASCII: case RECC_MULTIBYTE: return !ISASCII (ch);
＠＠ -2140,6 +2079,10 ＠＠
     }
 }

+#endif /* MULE */
+
+#ifdef emacs
+
 /* Return a bit-pattern to use in the range-table bits to match multibyte
    chars of class CC.  */
 static unsigned char
＠＠ -2158,7 +2101,8 ＠＠
     case RECC_ASCII: case RECC_DIGIT: case RECC_XDIGIT: case RECC_CNTRL:
     case RECC_BLANK: case RECC_UNIBYTE: case RECC_ERROR: return 0;
     default:
-      abort ();
+      ABORT ();
+      return 0;
     }
 }

＠＠ -2185,9 +2129,12 ＠＠
 					     RE_TRANSLATE_TYPE translate,
 					     reg_syntax_t syntax,
 					     Lisp_Object rtab);
-static reg_errcode_t compile_char_class (re_wctype_t cc, Lisp_Object rtab,
-                                         Bitbyte *flags_out);
 #endif /* MULE */
+#ifdef emacs
+reg_errcode_t compile_char_class (re_wctype_t cc, Lisp_Object rtab,
+                                  Bitbyte *flags_out);
+#endif
+
 static re_bool group_match_null_string_p (unsigned char **p,
 					  unsigned char *end,
 					  register_info_type *reg_info);
＠＠ -2814,7 +2761,8 ＠＠
 #endif /* MULE */
 			for (ch = 0; ch < (1 << BYTEWIDTH); ++ch)
 			  {
-			    if (re_iswctype (ch, cc))
+			    if (re_iswctype (ch, cc
+                                             RE_ISWCTYPE_ARG (current_buffer)))
 			      {
 				SET_LIST_BIT (ch);
 			      }
＠＠ -3938,7 +3886,11 ＠＠
   return REG_NOERROR;
 }

-static reg_errcode_t
+#endif /* MULE */
+
+#ifdef emacs
+
+reg_errcode_t
 compile_char_class (re_wctype_t cc, Lisp_Object rtab, Bitbyte *flags_out)
 {
   *flags_out |= re_wctype_to_bit (cc);

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 src/regex.h
--- a/src/regex.h
+++ b/src/regex.h
＠＠ -30,6 +30,8 ＠＠
 #define RE_LISP_CONTEXT_ARGS_DECL , Lisp_Object lispobj, struct buffer *lispbuf, struct
syntax_cache *scache
 #define RE_LISP_CONTEXT_ARGS_MULE_DECL , Lisp_Object lispobj, struct buffer *USED_IF_MULE
(lispbuf), struct syntax_cache *scache
 #define RE_LISP_CONTEXT_ARGS , lispobj, lispbuf, scache
+#define RE_ISWCTYPE_ARG_DECL , struct buffer *lispbuf
+#define RE_ISWCTYPE_ARG(varname) , varname
 #else
 #define RE_TRANSLATE_TYPE char *
 #define RE_LISP_SHORT_CONTEXT_ARGS_DECL
＠＠ -37,6 +39,8 ＠＠
 #define RE_LISP_CONTEXT_ARGS_DECL
 #define RE_LISP_CONTEXT_ARGS_MULE_DECL
 #define RE_LISP_CONTEXT_ARGS
+#define RE_ISWCTYPE_ARG_DECL 
+#define RE_ISWCTYPE_ARG(varname)
 #define Elemcount ssize_t
 #define Bytecount ssize_t
 #endif /* emacs */
＠＠ -559,6 +563,86 ＠＠
     RECC_ASCII, RECC_UNIBYTE
 } re_wctype_t;

+#define CHAR_CLASS_MAX_LENGTH  9 /* Namely, `multibyte'.  */
+
+/* Map a string to the char class it names (if any).  */
+re_wctype_t re_wctype (const char *);
+
+/* Is character CH a member of the character class CC? */
+int re_iswctype (int ch, re_wctype_t cc RE_ISWCTYPE_ARG_DECL);
+
+/* Bits used to implement the multibyte-part of the various character
+   classes such as [:alnum:] in a charset's range table. XEmacs; use an
+   enum, so they're visible in the debugger. */
+enum
+{
+  BIT_WORD = (1 << 0),
+  BIT_LOWER = (1 << 1),
+  BIT_PUNCT = (1 << 2),
+  BIT_SPACE = (1 << 3),
+  BIT_UPPER = (1 << 4),
+  /* XEmacs; we need this, because we unify treatment of ASCII and non-ASCII
+     (possible matches) in charset_mule. [:alpha:] matches all characters
+     with word syntax, with the exception of [0-9]. We don't need
+     BIT_MULTIBYTE. */
+  BIT_ALPHA = (1 << 5)
+};
+
+#ifdef emacs
+reg_errcode_t compile_char_class (re_wctype_t cc, Lisp_Object rtab,
+                                  Bitbyte *flags_out);
+
+#endif
+
+/* isalpha etc. are used for the character classes.  */
+#include <ctype.h>
+
+#ifdef emacs
+
+/* 1 if C is an ASCII character.  */
+#define ISASCII(c) ((c) < 0x80)
+
+/* 1 if C is a unibyte character.  */
+#define ISUNIBYTE ISASCII
+
+/* The Emacs definitions should not be directly affected by locales.  */
+
+/* In Emacs, these are only used for single-byte characters.  */
+#define ISDIGIT(c) ((c) >= '0' && (c) <= '9')
+#define ISCNTRL(c) ((c) < ' ')
+#define ISXDIGIT(c) (ISDIGIT (c) || ((c) >= 'a' && (c) <=
'f')	\
+		     || ((c) >= 'A' && (c) <= 'F'))
+
+/* This is only used for single-byte characters.  */
+#define ISBLANK(c) ((c) == ' ' || (c) == '\t')
+
+/* The rest must handle multibyte characters.  */
+
+#define ISGRAPH(c) ((c) > ' ' && (c) != 0x7f)
+#define ISPRINT(c) ((c) == ' ' || ISGRAPH (c))
+#define ISALPHA(c) (ISASCII (c) ? (((c) >= 'a' && (c) <=
'z')		\
+				   || ((c) >= 'A' && (c) <= 'Z'))	\
+		    : ISWORD (c))
+#define ISALNUM(c) (ISALPHA (c) || ISDIGIT (c))
+
+#define ISLOWER(c) LOWERCASEP (lispbuf, c)
+
+#define ISPUNCT(c) (ISASCII (c)                                 \
+		    ? ((c) > ' ' && (c) < 0x7F			\
+		       && !(((c) >= 'a' && (c) <= 'z')		\
+		            || ((c) >= 'A' && (c) <= 'Z')	\
+		            || ((c) >= '0' && (c) <= '9')))	\
+		    : !ISWORD (c))
+
+#define ISSPACE(c) \
+	(SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Swhitespace)
+
+#define ISUPPER(c) UPPERCASEP (lispbuf, c)
+
+#define ISWORD(c) (SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Sword)
+
+#endif 
+
 END_C_DECLS

 #endif /* INCLUDED_regex_h_ */

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 src/search.c
--- a/src/search.c
+++ b/src/search.c
＠＠ -887,9 +887,9 ＠＠
      a range table. */
   unsigned char fastmap[0400];
   int negate = 0;
-  REGISTER int i;
   Charbpos limit;
   struct syntax_cache *scache;
+  Bitbyte class_bits = 0;

   if (NILP (lim))
     limit = forwardp ? BUF_ZV (buf) : BUF_BEGV (buf);
＠＠ -957,6 +957,51 ＠＠
 				  Vskip_chars_range_table);
 	      INC_IBYTEPTR (p);
 	    }
+          else if ('[' == c && p != pend && *p == ':')
+            {
+              Ibyte *colonp;
+              Extbyte *classname;
+              int ch = 0;
+              re_wctype_t cc;
+
+              INC_IBYTEPTR (p);
+
+              if (p == pend)
+                {
+                  fastmap ['['] = fastmap[':'] = 1;
+                  break;
+                }
+
+              colonp = memchr (p, ':', pend - p);
+              if (NULL == colonp || (colonp + 1) == pend || colonp[1] != ']')
+                {
+                  fastmap ['['] = fastmap[':'] = 1;
+                  continue;
+                }
+
+              classname = alloca_extbytes (colonp - p + 1);
+              memmove (classname, p, colonp - p);
+              classname[colonp - p] = '\0';
+              cc = re_wctype (classname);
+                  
+              if (cc == RECC_ERROR)
+                {
+                  invalid_argument ("Invalid character class",
+                                    build_extstring (classname, Qbinary));
+                }
+
+              for (ch = 0; ch < countof (fastmap); ++ch)
+                {
+                  if (re_iswctype (ch, cc, buf))
+                    {
+                      fastmap[ch] = 1;
+                    }
+                }
+
+              compile_char_class (cc, Vskip_chars_range_table, &class_bits);
+
+              p = colonp + 2;
+            }
 	  else
 	    {
 	      if (c < 0400)
＠＠ -972,14 +1017,6 ＠＠
   if (syntaxp && fastmap['-'] != 0)
     fastmap[' '] = 1;

-  /* If ^ was the first character, complement the fastmap.
-     We don't complement the range table, however; we just use negate
-     in the comparisons below. */
-
-  if (negate)
-    for (i = 0; i < (int) (sizeof (fastmap)); i++)
-      fastmap[i] ^= 1;
-
   {
     Charbpos start_point = BUF_PT (buf);
     Charbpos pos = start_point;
＠＠ -996,7 +1033,8 ＠＠
 	      while (fastmap[(unsigned char)
 			     syntax_code_spec
 			     [(int) SYNTAX_FROM_CACHE
-			      (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]])
+			      (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]]
+                     != negate)
 		{
 		  pos++;
 		  INC_BYTEBPOS (buf, pos_byte);
＠＠ -1013,10 +1051,11 ＠＠
 		pos--;
 		DEC_BYTEBPOS (buf, pos_byte);
 		UPDATE_SYNTAX_CACHE_BACKWARD (scache, pos);
-		if (!fastmap[(unsigned char)
-			     syntax_code_spec
-			     [(int) SYNTAX_FROM_CACHE
-			      (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]])
+		if (fastmap[(unsigned char)
+                            syntax_code_spec
+                            [(int) SYNTAX_FROM_CACHE
+                             (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]]
+                    == negate)
 		  {
 		    pos++;
 		    pos_byte = savepos;
＠＠ -1027,16 +1066,30 ＠＠
       }
     else
       {
+        struct buffer *lispbuf = buf;
+
+#define CLASS_BIT_CHECK(c)                                              \
+        (class_bits && ((class_bits & BIT_ALPHA && ISALPHA (c))      
  \
+                        || (class_bits & BIT_SPACE && ISSPACE (c))      \
+                        || (class_bits & BIT_PUNCT && ISPUNCT (c))      \
+                        || (class_bits & BIT_WORD && ISWORD (c))        \
+                        || (NILP (buf->case_fold_search) ?              \
+                            ((class_bits & BIT_UPPER && ISUPPER (c))    \
+                             || (class_bits & BIT_LOWER && ISLOWER (c))) \
+: (class_bits & (BIT_UPPER | BIT_LOWER)     \
+                               && !NOCASEP (buf, c)))))
 	if (forwardp)
 	  {
 	    while (pos < limit)
 	      {
 		Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
-		if ((ch < 0400) ? fastmap[ch] :
-		    (NILP (Fget_range_table (make_fixnum (ch),
-					     Vskip_chars_range_table,
-					     Qnil))
-		     == negate))
+
+                if ((ch < countof (fastmap) ? fastmap[ch]
+: (CLASS_BIT_CHECK (ch) ||
+                        (EQ (Qt, Fget_range_table (make_fixnum (ch),
+                                                   Vskip_chars_range_table,
+                                                   Qnil)))))
+                    != negate)
 		  {
 		    pos++;
 		    INC_BYTEBPOS (buf, pos_byte);
＠＠ -1054,11 +1107,12 ＠＠

 		DEC_BYTEBPOS (buf, prev_pos_byte);
 		ch = BYTE_BUF_FETCH_CHAR (buf, prev_pos_byte);
-		if ((ch < 0400) ? fastmap[ch] :
-		    (NILP (Fget_range_table (make_fixnum (ch),
-					     Vskip_chars_range_table,
-					     Qnil))
-		     == negate))
+                if ((ch < countof (fastmap) ? fastmap[ch]
+: (CLASS_BIT_CHECK (ch) ||
+                        (EQ (Qt, Fget_range_table (make_fixnum (ch),
+                                                   Vskip_chars_range_table,
+                                                   Qnil)))))
+                    != negate)
 		  {
 		    pos--;
 		    pos_byte = prev_pos_byte;

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 tests/ChangeLog
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
＠＠ -1,3 +1,11 ＠＠
+2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* automated/regexp-tests.el (equal):
+	* automated/regexp-tests.el (Assert-char-class):
+	Correct a stray parenthesis; add tests for the predefined
+	character classes with #'skip-chars-{forward,backward}; update the
+	tests to reflect some changed design decisions on my part.
+
 2012-04-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* automated/regexp-tests.el: Check that #'posix-string-match

diff -r d026b665014fda7a8d6148e8cc8fb9d046bff7f7 -r
3df910176b6abc7d66f691cc8cff97498bf5d419 tests/automated/regexp-tests.el
--- a/tests/automated/regexp-tests.el
+++ b/tests/automated/regexp-tests.el
＠＠ -76,7 +76,7 ＠＠
   (save-match-data
     (progn (posix-string-match "i\\|ii" "ii") (match-data)))
   '(0 2))
- "checking #'posix-string-match actually returns the longest match"))
+ "checking #'posix-string-match actually returns the longest match")

 ;; looking-at
 (with-temp-buffer
＠＠ -665,7 +665,25 ＠＠
          (Assert (null (string-match ,(concat "[^" class
                                               (string non-matching-char) "]")
                                      ,(concat (string matching-char)
-                                              (string non-matching-char)))))))
+                                              (string non-matching-char)))))
+         (let ((old-case-fold-search case-fold-search))
+           (with-temp-buffer
+             (setq case-fold-search old-case-fold-search)
+             (insert-char ,matching-char 20)
+             (insert-char ,non-matching-char 20)
+             (goto-char (point-min))
+             (Assert (eql (skip-chars-forward ,class) 20)
+                     ,(format "making sure %s skips %S forward"
+                              class matching-char))
+             (Assert (eql (skip-chars-forward ,(concat "^" class)) 20)
+                     ,(format "making sure ^%s skips %S forward"
+                              class non-matching-char))
+             (Assert (eql (skip-chars-backward ,(concat "^" class)) -20)
+                     ,(format "making sure ^%s skips %S backward"
+                              class non-matching-char))
+             (Assert (eql (skip-chars-backward ,class) -20)
+                     ,(format "making sure %s skips %S backward"
+                              class matching-char))))))
      (Assert-never-matching (class &rest characters)
        (cons
         'progn
＠＠ -706,7 +724,7 ＠＠
   (Assert-char-class "[:alnum:]" ?A ?/)
   (Assert-char-class "[:alnum:]" ?Z ?!)
   (Assert-char-class "[:alnum:]" ?0 ?,)
-  (Assert-char-class "[:alnum:]" ?9 ?$)
+  (Assert-char-class "[:alnum:]" ?9 ?\t)
   (Assert-char-class "[:alnum:]" ?b ?\x00)
   (Assert-char-class "[:alnum:]" ?c ?\x09)
   (Assert-char-class "[:alnum:]" ?d ?\   )
＠＠ -724,13 +742,12 ＠＠
    (decode-char 'ucs #x03B2)  ;; GREEK SMALL LETTER BETA
    (decode-char 'ucs #x0385)) ;; GREEK DIALYTIKA TONOS

-  ;; Word is equivalent to alnum in this implementation.
   (Assert-char-class "[:word:]" ?a ?.)
   (Assert-char-class "[:word:]" ?z ?')
   (Assert-char-class "[:word:]" ?A ?/)
   (Assert-char-class "[:word:]" ?Z ?!)
   (Assert-char-class "[:word:]" ?0 ?,)
-  (Assert-char-class "[:word:]" ?9 ?$)
+  (Assert-char-class "[:word:]" ?9 ?\t)
   (Assert-char-class "[:word:]" ?b ?\x00)
   (Assert-char-class "[:word:]" ?c ?\x09)
   (Assert-char-class "[:word:]" ?d ?\   )
＠＠ -1083,7 +1100,7 ＠＠

   (Assert-never-matching
    "[:unibyte:]"
-   ?\x01 ?\t ?A ?B ?C ?\x7f
+   ?\x80 ?\xe4 ?\xdf ?\xf8
    (decode-char 'ucs #x03B2) ;; GREEK SMALL LETTER BETA
    (decode-char 'ucs #x0410) ;; CYRILLIC CAPITAL LETTER A
    (decode-char 'ucs #x0430) ;; CYRILLIC SMALL LETTER A

https://bitbucket.org/xemacs/xemacs/changeset/ddf56c45634e/
changeset:   ddf56c45634e
user:        kehoea
date:        2012-05-04 22:12:51
summary:     Automated merge with file:///Sources/xemacs-21.5-checked-out
affected #:  8 files

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 man/ChangeLog
--- a/man/ChangeLog
+++ b/man/ChangeLog
＠＠ -1,3 +1,11 ＠＠
+2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* lispref/searching.texi (Regular Expressions):
+	* lispref/searching.texi (Syntax of Regexps):
+	* lispref/searching.texi (Char Classes):
+	* lispref/searching.texi (Regexp Example):
+	Document the predefined character classes in this file.
+
 2011-12-30  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* cl.texi (Top):

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 man/lispref/searching.texi
--- a/man/lispref/searching.texi
+++ b/man/lispref/searching.texi
＠＠ -180,6 +180,7 ＠＠

 ＠menu
 * Syntax of Regexps::       Rules for writing regular expressions.
+* Char Classes::            Predefined character classes for searching.
 * Regexp Example::          Illustrates regular expression syntax.
 ＠end menu

＠＠ -335,6 +336,11 ＠＠
 To include ＠samp{^} in a set, put it anywhere but at the beginning of
 the set.

+It is also possible to specify named character classes as part of your
+character set; for example, ＠samp{[:xdigit:]} will match hexadecimal
+digits, ＠samp{[:nonascii:]} will match characters outside the basic
+ASCII set.  These are documented elsewhere, ＠pxref{Char Classes}.
+
 ＠item [^ ＠dots{} ]
 ＠cindex ＠samp{^} in regexp
 ＠samp{[^} begins a ＠dfn{complement character set}, which matches any
＠＠ -604,6 +610,61 ＠＠
 ＠end example
 ＠end defun

+＠node Char Classes
+＠subsection Char Classes
+
+These are the predefined character classes available within regular
+expression character sets, and within ＠samp{skip-chars-forward} and
+＠samp{skip-chars-backward}, ＠xref{Skipping Characters}.
+
+＠table ＠samp
+＠item [:alnum:]
+This matches any ASCII letter or digit, or any non-ASCII character
+with word syntax.
+＠item [:alpha:]
+This matches any ASCII letter, or any non-ASCII character with word syntax.
+＠item [:ascii:]
+This matches any character with a numeric value below ＠samp{?\x80}.
+＠item [:blank:]
+This matches space or tab.
+＠item [:cntrl:]
+This matches any character with a numeric value below ＠samp{?\x20},
+the code for space; these are the ASCII control characters.
+＠item [:digit:]
+This matches the characters ＠samp{?0} to ＠samp{?9}, inclusive.
+＠item [:graph:]
+This matches ``graphic'' characters, with numeric values greater than
+＠samp{?\x20}, exclusive of ＠samp{?\x7f}, the delete character. 
+＠item [:lower:]
+This matches minuscule characters, or any character with case
+information if ＠samp{case-fold-search} is non-nil.
+＠item [:multibyte:]
+This matches non-ASCII characters, that is, any character with a
+numeric value above ＠samp{?\x7f}.
+＠item [:nonascii:]
+This is equivalent to ＠samp{[:multibyte:]}.
+＠item [:print:]
+This is equivalent to [:graph:], but also matches the space character,
+＠samp{?\x20}.
+＠item [:punct:]
+This matches non-control, non-alphanumeric ASCII characters, or any
+non-ASCII character without word syntax.
+＠item [:space:]
+This matches any character with whitespace syntax.
+＠item [:unibyte:]
+This is a GNU Emacs extension; in XEmacs it is equivalent to
+＠samp{[:ascii:]}. Note that this means it is not equivalent to
+＠samp{"\x00-\xff"}, which one might have assumed to be the case.
+＠item [:upper:]
+This matches majuscule characters, or any character with case
+information if ＠samp{case-fold-search} is non-nil.
+＠item [:word:]
+This matches any character with word syntax.
+＠item [:xdigit:]
+This matches hexadecimal digits, so the decimal digits ＠samp{0-9} and the
+letters ＠samp{a-F} and ＠samp{A-F}.
+＠end table
+
 ＠node Regexp Example
 ＠subsection Complex Regexp Example

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 src/ChangeLog
--- a/src/ChangeLog
+++ b/src/ChangeLog
＠＠ -1,3 +1,19 ＠＠
+2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* regex.c:
+	Move various #defines and enums to regex.h, since we need them
+	when implementing #'skip-chars-{backward,forward}.
+	* regex.c (re_wctype):
+	* regex.c (re_iswctype):
+	Be more robust about case insensitivity here.
+	* regex.c (regex_compile):
+	* regex.h:
+	* regex.h (RE_ISWCTYPE_ARG_DECL):
+	* regex.h (CHAR_CLASS_MAX_LENGTH):
+	* search.c (skip_chars):
+	Implement support for the predefined character classes in this
+	function.
+
 2012-04-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* search.c (string_match_1): Actually use the POSIX argument here,

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 src/regex.c
--- a/src/regex.c
+++ b/src/regex.c
＠＠ -178,51 +178,7 ＠＠
 /* isalpha etc. are used for the character classes.  */
 #include <ctype.h>

-#ifdef emacs
-
-/* 1 if C is an ASCII character.  */
-#define ISASCII(c) ((c) < 0x80)
-
-/* 1 if C is a unibyte character.  */
-#define ISUNIBYTE(c) 0
-
-/* The Emacs definitions should not be directly affected by locales.  */
-
-/* In Emacs, these are only used for single-byte characters.  */
-#define ISDIGIT(c) ((c) >= '0' && (c) <= '9')
-#define ISCNTRL(c) ((c) < ' ')
-#define ISXDIGIT(c) (ISDIGIT (c) || ((c) >= 'a' && (c) <=
'f')	\
-		     || ((c) >= 'A' && (c) <= 'F'))
-
-/* This is only used for single-byte characters.  */
-#define ISBLANK(c) ((c) == ' ' || (c) == '\t')
-
-/* The rest must handle multibyte characters.  */
-
-#define ISGRAPH(c) ((c) > ' ' && (c) != 0x7f)
-#define ISPRINT(c) ((c) == ' ' || ISGRAPH (c))
-#define ISALPHA(c) (ISASCII (c) ? (((c) >= 'a' && (c) <=
'z')		\
-				   || ((c) >= 'A' && (c) <= 'Z'))	\
-		    : ISWORD (c))
-#define ISALNUM(c) (ISALPHA (c) || ISDIGIT (c))
-
-#define ISLOWER(c) LOWERCASEP (lispbuf, c)
-
-#define ISPUNCT(c) (ISASCII (c)                                 \
-		    ? ((c) > ' ' && (c) < 0x7F			\
-		       && !(((c) >= 'a' && (c) <= 'z')		\
-		            || ((c) >= 'A' && (c) <= 'Z')	\
-		            || ((c) >= '0' && (c) <= '9')))	\
-		    : !ISWORD (c))
-
-#define ISSPACE(c) \
-	(SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Swhitespace)
-
-#define ISUPPER(c) UPPERCASEP (lispbuf, c)
-
-#define ISWORD(c) (SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Sword)
-
-#else /* not emacs */
+#ifndef emacs /* For the emacs build, we need these in the header. */

 /* 1 if C is an ASCII character.  */
 #define ISASCII(c) ((c) < 0200)
＠＠ -2013,23 +1969,6 ＠＠
 /* The next available element.  */
 #define COMPILE_STACK_TOP (compile_stack.stack[compile_stack.avail])

-/* Bits used to implement the multibyte-part of the various character
-   classes such as [:alnum:] in a charset's range table. XEmacs; use an
-   enum, so they're visible in the debugger. */
-enum
-{
-  BIT_WORD = (1 << 0),
-  BIT_LOWER = (1 << 1),
-  BIT_PUNCT = (1 << 2),
-  BIT_SPACE = (1 << 3),
-  BIT_UPPER = (1 << 4),
-  /* XEmacs; we need this, because we unify treatment of ASCII and non-ASCII
-     (possible matches) in charset_mule. [:alpha:] matches all characters
-     with word syntax, with the exception of [0-9]. We don't need
-     BIT_MULTIBYTE. */
-  BIT_ALPHA = (1 << 5)
-};
-
 /* Set the bit for character C in a bit vector.  */
 #define SET_LIST_BIT(c)				\
   (buf_end[((unsigned char) (c)) / BYTEWIDTH]	\
＠＠ -2059,10 +1998,8 ＠＠
        } 								\
     }

-#define CHAR_CLASS_MAX_LENGTH  9 /* Namely, `multibyte'.  */
-
 /* Map a string to the char class it names (if any).  */
-static re_wctype_t
+re_wctype_t
 re_wctype (const char *string)
 {
   if      (STREQ (string, "alnum"))	return RECC_ALNUM;
＠＠ -2086,17 +2023,10 ＠＠
 }

 /* True if CH is in the char class CC.  */
-static re_bool
-re_iswctype (int ch, re_wctype_t cc)
+int
+re_iswctype (int ch, re_wctype_t cc
+             RE_ISWCTYPE_ARG_DECL)
 {
-#ifdef emacs
-  /* This is cheesy, lispbuf isn't available to us when compiling the
-     pattern. It's effectively only called (on Mule builds) when the current
-     buffer doesn't matter (e.g. for RECC_ASCII, RECC_CNTRL), so it's not a
-     big deal. */
-  struct buffer *lispbuf = current_buffer;
-#endif
-
   switch (cc)
     {
     case RECC_ALNUM: return ISALNUM (ch) != 0;
＠＠ -2105,11 +2035,20 ＠＠
     case RECC_CNTRL: return ISCNTRL (ch) != 0;
     case RECC_DIGIT: return ISDIGIT (ch) != 0;
     case RECC_GRAPH: return ISGRAPH (ch) != 0;
-    case RECC_LOWER: return ISLOWER (ch) != 0;
     case RECC_PRINT: return ISPRINT (ch) != 0;
     case RECC_PUNCT: return ISPUNCT (ch) != 0;
     case RECC_SPACE: return ISSPACE (ch) != 0;
+#ifdef emacs
+    case RECC_UPPER: 
+      return NILP (lispbuf->case_fold_search) ? ISUPPER (ch) != 0
+: !NOCASEP (lispbuf, ch);
+    case RECC_LOWER: 
+      return NILP (lispbuf->case_fold_search) ? ISLOWER (ch) != 0
+: !NOCASEP (lispbuf, ch);
+#else
     case RECC_UPPER: return ISUPPER (ch) != 0;
+    case RECC_LOWER: return ISLOWER (ch) != 0;
+#endif
     case RECC_XDIGIT: return ISXDIGIT (ch) != 0;
     case RECC_ASCII: return ISASCII (ch) != 0;
     case RECC_NONASCII: case RECC_MULTIBYTE: return !ISASCII (ch);
＠＠ -2140,6 +2079,10 ＠＠
     }
 }

+#endif /* MULE */
+
+#ifdef emacs
+
 /* Return a bit-pattern to use in the range-table bits to match multibyte
    chars of class CC.  */
 static unsigned char
＠＠ -2158,7 +2101,8 ＠＠
     case RECC_ASCII: case RECC_DIGIT: case RECC_XDIGIT: case RECC_CNTRL:
     case RECC_BLANK: case RECC_UNIBYTE: case RECC_ERROR: return 0;
     default:
-      abort ();
+      ABORT ();
+      return 0;
     }
 }

＠＠ -2185,9 +2129,12 ＠＠
 					     RE_TRANSLATE_TYPE translate,
 					     reg_syntax_t syntax,
 					     Lisp_Object rtab);
-static reg_errcode_t compile_char_class (re_wctype_t cc, Lisp_Object rtab,
-                                         Bitbyte *flags_out);
 #endif /* MULE */
+#ifdef emacs
+reg_errcode_t compile_char_class (re_wctype_t cc, Lisp_Object rtab,
+                                  Bitbyte *flags_out);
+#endif
+
 static re_bool group_match_null_string_p (unsigned char **p,
 					  unsigned char *end,
 					  register_info_type *reg_info);
＠＠ -2814,7 +2761,8 ＠＠
 #endif /* MULE */
 			for (ch = 0; ch < (1 << BYTEWIDTH); ++ch)
 			  {
-			    if (re_iswctype (ch, cc))
+			    if (re_iswctype (ch, cc
+                                             RE_ISWCTYPE_ARG (current_buffer)))
 			      {
 				SET_LIST_BIT (ch);
 			      }
＠＠ -3938,7 +3886,11 ＠＠
   return REG_NOERROR;
 }

-static reg_errcode_t
+#endif /* MULE */
+
+#ifdef emacs
+
+reg_errcode_t
 compile_char_class (re_wctype_t cc, Lisp_Object rtab, Bitbyte *flags_out)
 {
   *flags_out |= re_wctype_to_bit (cc);

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 src/regex.h
--- a/src/regex.h
+++ b/src/regex.h
＠＠ -30,6 +30,8 ＠＠
 #define RE_LISP_CONTEXT_ARGS_DECL , Lisp_Object lispobj, struct buffer *lispbuf, struct
syntax_cache *scache
 #define RE_LISP_CONTEXT_ARGS_MULE_DECL , Lisp_Object lispobj, struct buffer *USED_IF_MULE
(lispbuf), struct syntax_cache *scache
 #define RE_LISP_CONTEXT_ARGS , lispobj, lispbuf, scache
+#define RE_ISWCTYPE_ARG_DECL , struct buffer *lispbuf
+#define RE_ISWCTYPE_ARG(varname) , varname
 #else
 #define RE_TRANSLATE_TYPE char *
 #define RE_LISP_SHORT_CONTEXT_ARGS_DECL
＠＠ -37,6 +39,8 ＠＠
 #define RE_LISP_CONTEXT_ARGS_DECL
 #define RE_LISP_CONTEXT_ARGS_MULE_DECL
 #define RE_LISP_CONTEXT_ARGS
+#define RE_ISWCTYPE_ARG_DECL 
+#define RE_ISWCTYPE_ARG(varname)
 #define Elemcount ssize_t
 #define Bytecount ssize_t
 #endif /* emacs */
＠＠ -559,6 +563,86 ＠＠
     RECC_ASCII, RECC_UNIBYTE
 } re_wctype_t;

+#define CHAR_CLASS_MAX_LENGTH  9 /* Namely, `multibyte'.  */
+
+/* Map a string to the char class it names (if any).  */
+re_wctype_t re_wctype (const char *);
+
+/* Is character CH a member of the character class CC? */
+int re_iswctype (int ch, re_wctype_t cc RE_ISWCTYPE_ARG_DECL);
+
+/* Bits used to implement the multibyte-part of the various character
+   classes such as [:alnum:] in a charset's range table. XEmacs; use an
+   enum, so they're visible in the debugger. */
+enum
+{
+  BIT_WORD = (1 << 0),
+  BIT_LOWER = (1 << 1),
+  BIT_PUNCT = (1 << 2),
+  BIT_SPACE = (1 << 3),
+  BIT_UPPER = (1 << 4),
+  /* XEmacs; we need this, because we unify treatment of ASCII and non-ASCII
+     (possible matches) in charset_mule. [:alpha:] matches all characters
+     with word syntax, with the exception of [0-9]. We don't need
+     BIT_MULTIBYTE. */
+  BIT_ALPHA = (1 << 5)
+};
+
+#ifdef emacs
+reg_errcode_t compile_char_class (re_wctype_t cc, Lisp_Object rtab,
+                                  Bitbyte *flags_out);
+
+#endif
+
+/* isalpha etc. are used for the character classes.  */
+#include <ctype.h>
+
+#ifdef emacs
+
+/* 1 if C is an ASCII character.  */
+#define ISASCII(c) ((c) < 0x80)
+
+/* 1 if C is a unibyte character.  */
+#define ISUNIBYTE ISASCII
+
+/* The Emacs definitions should not be directly affected by locales.  */
+
+/* In Emacs, these are only used for single-byte characters.  */
+#define ISDIGIT(c) ((c) >= '0' && (c) <= '9')
+#define ISCNTRL(c) ((c) < ' ')
+#define ISXDIGIT(c) (ISDIGIT (c) || ((c) >= 'a' && (c) <=
'f')	\
+		     || ((c) >= 'A' && (c) <= 'F'))
+
+/* This is only used for single-byte characters.  */
+#define ISBLANK(c) ((c) == ' ' || (c) == '\t')
+
+/* The rest must handle multibyte characters.  */
+
+#define ISGRAPH(c) ((c) > ' ' && (c) != 0x7f)
+#define ISPRINT(c) ((c) == ' ' || ISGRAPH (c))
+#define ISALPHA(c) (ISASCII (c) ? (((c) >= 'a' && (c) <=
'z')		\
+				   || ((c) >= 'A' && (c) <= 'Z'))	\
+		    : ISWORD (c))
+#define ISALNUM(c) (ISALPHA (c) || ISDIGIT (c))
+
+#define ISLOWER(c) LOWERCASEP (lispbuf, c)
+
+#define ISPUNCT(c) (ISASCII (c)                                 \
+		    ? ((c) > ' ' && (c) < 0x7F			\
+		       && !(((c) >= 'a' && (c) <= 'z')		\
+		            || ((c) >= 'A' && (c) <= 'Z')	\
+		            || ((c) >= '0' && (c) <= '9')))	\
+		    : !ISWORD (c))
+
+#define ISSPACE(c) \
+	(SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Swhitespace)
+
+#define ISUPPER(c) UPPERCASEP (lispbuf, c)
+
+#define ISWORD(c) (SYNTAX (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf), c) == Sword)
+
+#endif 
+
 END_C_DECLS

 #endif /* INCLUDED_regex_h_ */

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 src/search.c
--- a/src/search.c
+++ b/src/search.c
＠＠ -887,9 +887,9 ＠＠
      a range table. */
   unsigned char fastmap[0400];
   int negate = 0;
-  REGISTER int i;
   Charbpos limit;
   struct syntax_cache *scache;
+  Bitbyte class_bits = 0;

   if (NILP (lim))
     limit = forwardp ? BUF_ZV (buf) : BUF_BEGV (buf);
＠＠ -957,6 +957,51 ＠＠
 				  Vskip_chars_range_table);
 	      INC_IBYTEPTR (p);
 	    }
+          else if ('[' == c && p != pend && *p == ':')
+            {
+              Ibyte *colonp;
+              Extbyte *classname;
+              int ch = 0;
+              re_wctype_t cc;
+
+              INC_IBYTEPTR (p);
+
+              if (p == pend)
+                {
+                  fastmap ['['] = fastmap[':'] = 1;
+                  break;
+                }
+
+              colonp = memchr (p, ':', pend - p);
+              if (NULL == colonp || (colonp + 1) == pend || colonp[1] != ']')
+                {
+                  fastmap ['['] = fastmap[':'] = 1;
+                  continue;
+                }
+
+              classname = alloca_extbytes (colonp - p + 1);
+              memmove (classname, p, colonp - p);
+              classname[colonp - p] = '\0';
+              cc = re_wctype (classname);
+                  
+              if (cc == RECC_ERROR)
+                {
+                  invalid_argument ("Invalid character class",
+                                    build_extstring (classname, Qbinary));
+                }
+
+              for (ch = 0; ch < countof (fastmap); ++ch)
+                {
+                  if (re_iswctype (ch, cc, buf))
+                    {
+                      fastmap[ch] = 1;
+                    }
+                }
+
+              compile_char_class (cc, Vskip_chars_range_table, &class_bits);
+
+              p = colonp + 2;
+            }
 	  else
 	    {
 	      if (c < 0400)
＠＠ -972,14 +1017,6 ＠＠
   if (syntaxp && fastmap['-'] != 0)
     fastmap[' '] = 1;

-  /* If ^ was the first character, complement the fastmap.
-     We don't complement the range table, however; we just use negate
-     in the comparisons below. */
-
-  if (negate)
-    for (i = 0; i < (int) (sizeof (fastmap)); i++)
-      fastmap[i] ^= 1;
-
   {
     Charbpos start_point = BUF_PT (buf);
     Charbpos pos = start_point;
＠＠ -996,7 +1033,8 ＠＠
 	      while (fastmap[(unsigned char)
 			     syntax_code_spec
 			     [(int) SYNTAX_FROM_CACHE
-			      (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]])
+			      (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]]
+                     != negate)
 		{
 		  pos++;
 		  INC_BYTEBPOS (buf, pos_byte);
＠＠ -1013,10 +1051,11 ＠＠
 		pos--;
 		DEC_BYTEBPOS (buf, pos_byte);
 		UPDATE_SYNTAX_CACHE_BACKWARD (scache, pos);
-		if (!fastmap[(unsigned char)
-			     syntax_code_spec
-			     [(int) SYNTAX_FROM_CACHE
-			      (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]])
+		if (fastmap[(unsigned char)
+                            syntax_code_spec
+                            [(int) SYNTAX_FROM_CACHE
+                             (scache, BYTE_BUF_FETCH_CHAR (buf, pos_byte))]]
+                    == negate)
 		  {
 		    pos++;
 		    pos_byte = savepos;
＠＠ -1027,16 +1066,30 ＠＠
       }
     else
       {
+        struct buffer *lispbuf = buf;
+
+#define CLASS_BIT_CHECK(c)                                              \
+        (class_bits && ((class_bits & BIT_ALPHA && ISALPHA (c))      
  \
+                        || (class_bits & BIT_SPACE && ISSPACE (c))      \
+                        || (class_bits & BIT_PUNCT && ISPUNCT (c))      \
+                        || (class_bits & BIT_WORD && ISWORD (c))        \
+                        || (NILP (buf->case_fold_search) ?              \
+                            ((class_bits & BIT_UPPER && ISUPPER (c))    \
+                             || (class_bits & BIT_LOWER && ISLOWER (c))) \
+: (class_bits & (BIT_UPPER | BIT_LOWER)     \
+                               && !NOCASEP (buf, c)))))
 	if (forwardp)
 	  {
 	    while (pos < limit)
 	      {
 		Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
-		if ((ch < 0400) ? fastmap[ch] :
-		    (NILP (Fget_range_table (make_fixnum (ch),
-					     Vskip_chars_range_table,
-					     Qnil))
-		     == negate))
+
+                if ((ch < countof (fastmap) ? fastmap[ch]
+: (CLASS_BIT_CHECK (ch) ||
+                        (EQ (Qt, Fget_range_table (make_fixnum (ch),
+                                                   Vskip_chars_range_table,
+                                                   Qnil)))))
+                    != negate)
 		  {
 		    pos++;
 		    INC_BYTEBPOS (buf, pos_byte);
＠＠ -1054,11 +1107,12 ＠＠

 		DEC_BYTEBPOS (buf, prev_pos_byte);
 		ch = BYTE_BUF_FETCH_CHAR (buf, prev_pos_byte);
-		if ((ch < 0400) ? fastmap[ch] :
-		    (NILP (Fget_range_table (make_fixnum (ch),
-					     Vskip_chars_range_table,
-					     Qnil))
-		     == negate))
+                if ((ch < countof (fastmap) ? fastmap[ch]
+: (CLASS_BIT_CHECK (ch) ||
+                        (EQ (Qt, Fget_range_table (make_fixnum (ch),
+                                                   Vskip_chars_range_table,
+                                                   Qnil)))))
+                    != negate)
 		  {
 		    pos--;
 		    pos_byte = prev_pos_byte;

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 tests/ChangeLog
--- a/tests/ChangeLog
+++ b/tests/ChangeLog
＠＠ -1,3 +1,11 ＠＠
+2012-05-04  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* automated/regexp-tests.el (equal):
+	* automated/regexp-tests.el (Assert-char-class):
+	Correct a stray parenthesis; add tests for the predefined
+	character classes with #'skip-chars-{forward,backward}; update the
+	tests to reflect some changed design decisions on my part.
+
 2012-04-25  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* automated/regexp-tests.el: Check that #'posix-string-match

diff -r cc6f0266bc367884f3d0d8cda3f82528935743ef -r
ddf56c45634e53e4b1cdfd4777a53c95f6501fb5 tests/automated/regexp-tests.el
--- a/tests/automated/regexp-tests.el
+++ b/tests/automated/regexp-tests.el
＠＠ -76,7 +76,7 ＠＠
   (save-match-data
     (progn (posix-string-match "i\\|ii" "ii") (match-data)))
   '(0 2))
- "checking #'posix-string-match actually returns the longest match"))
+ "checking #'posix-string-match actually returns the longest match")

 ;; looking-at
 (with-temp-buffer
＠＠ -665,7 +665,25 ＠＠
          (Assert (null (string-match ,(concat "[^" class
                                               (string non-matching-char) "]")
                                      ,(concat (string matching-char)
-                                              (string non-matching-char)))))))
+                                              (string non-matching-char)))))
+         (let ((old-case-fold-search case-fold-search))
+           (with-temp-buffer
+             (setq case-fold-search old-case-fold-search)
+             (insert-char ,matching-char 20)
+             (insert-char ,non-matching-char 20)
+             (goto-char (point-min))
+             (Assert (eql (skip-chars-forward ,class) 20)
+                     ,(format "making sure %s skips %S forward"
+                              class matching-char))
+             (Assert (eql (skip-chars-forward ,(concat "^" class)) 20)
+                     ,(format "making sure ^%s skips %S forward"
+                              class non-matching-char))
+             (Assert (eql (skip-chars-backward ,(concat "^" class)) -20)
+                     ,(format "making sure ^%s skips %S backward"
+                              class non-matching-char))
+             (Assert (eql (skip-chars-backward ,class) -20)
+                     ,(format "making sure %s skips %S backward"
+                              class matching-char))))))
      (Assert-never-matching (class &rest characters)
        (cons
         'progn
＠＠ -706,7 +724,7 ＠＠
   (Assert-char-class "[:alnum:]" ?A ?/)
   (Assert-char-class "[:alnum:]" ?Z ?!)
   (Assert-char-class "[:alnum:]" ?0 ?,)
-  (Assert-char-class "[:alnum:]" ?9 ?$)
+  (Assert-char-class "[:alnum:]" ?9 ?\t)
   (Assert-char-class "[:alnum:]" ?b ?\x00)
   (Assert-char-class "[:alnum:]" ?c ?\x09)
   (Assert-char-class "[:alnum:]" ?d ?\   )
＠＠ -724,13 +742,12 ＠＠
    (decode-char 'ucs #x03B2)  ;; GREEK SMALL LETTER BETA
    (decode-char 'ucs #x0385)) ;; GREEK DIALYTIKA TONOS

-  ;; Word is equivalent to alnum in this implementation.
   (Assert-char-class "[:word:]" ?a ?.)
   (Assert-char-class "[:word:]" ?z ?')
   (Assert-char-class "[:word:]" ?A ?/)
   (Assert-char-class "[:word:]" ?Z ?!)
   (Assert-char-class "[:word:]" ?0 ?,)
-  (Assert-char-class "[:word:]" ?9 ?$)
+  (Assert-char-class "[:word:]" ?9 ?\t)
   (Assert-char-class "[:word:]" ?b ?\x00)
   (Assert-char-class "[:word:]" ?c ?\x09)
   (Assert-char-class "[:word:]" ?d ?\   )
＠＠ -1083,7 +1100,7 ＠＠

   (Assert-never-matching
    "[:unibyte:]"
-   ?\x01 ?\t ?A ?B ?C ?\x7f
+   ?\x80 ?\xe4 ?\xdf ?\xf8
    (decode-char 'ucs #x03B2) ;; GREEK SMALL LETTER BETA
    (decode-char 'ucs #x0410) ;; CYRILLIC CAPITAL LETTER A
    (decode-char 'ucs #x0430) ;; CYRILLIC SMALL LETTER A

Repository URL: https://bitbucket.org/xemacs/xemacs/

--

This is a commit notification from bitbucket.org. You are receiving
this because you have the service enabled, addressing the recipient of
this email.

_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-patches

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

commit/XEmacs: 2 new changesets