Hrvoje Niksic <hniksic(a)xemacs.org> writes:
> Go to your *scratch* buffer. Insert "Ü" (that's \334, Uuml)
>
> (re-search-forward "Ü") => works
> (re-search-forward "[Ü]") => fails
I can repeat this.
I think the problem is in the case conversions inside the regexp
engine. If I search for "[ü]" instead, or if I set `case-fold-search'
to nil, the regexp succeeds. I'm not sure if either of these
workarounds would work for you, but that's the best I can come up with
right now.
Yes, it's a case conversion bug.
Yoshiki Hayashi, who introduced non-ASCII case folding to regex.c,
might be in a position to fix this one. Yoshiki, if you're still
around, could you take a look at this one?
OK, here's the patch to fix it.
I take the blame for not implementing it correctly. I must
confess that I've never really understood the charset_mule part
of the regex engine.
We need more test cases to see whether fastmap and charset_mule
are working correctly, but I don't have time to write thorough
test cases right now. Please wait until the weekend. I'll
probably be able to find some time to work on this again.
2003-09-12 Yoshiki Hayashi <yoshiki(a)xemacs.org>
* regex.c (TRANSLATE_EXTENDED_UNSAFE): Remove.
(re_search_2): Match the first byte of Bufbyte, not Emchar.
(re_match_2_internal): Use TRANSLATE instead of
TRANSLATE_EXTENDED_UNSAFE. The latter was a hack to bypass
non-ASCII char case conversion.
Index: src/regex.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/regex.c,v
retrieving revision 1.25.2.4
diff -u -r1.25.2.4 regex.c
--- src/regex.c 2003/06/19 03:34:42 1.25.2.4
+++ src/regex.c 2003/09/12 18:01:01
@@ -1591,13 +1591,6 @@
when we use a character as a subscript we must make it unsigned. */
#define TRANSLATE(d) (TRANSLATE_P (translate) ? RE_TRANSLATE (d) : (d))
-#ifdef MULE
-
-#define TRANSLATE_EXTENDED_UNSAFE(emch) \
- (TRANSLATE_P (translate) && emch < 0x80 ? RE_TRANSLATE (emch) : (emch))
-
-#endif
-
/* Macros for outputting the compiled pattern into `buffer'. */
/* If the buffer isn't allocated when it comes in, use this. */
@@ -4114,10 +4107,12 @@
{
#ifdef MULE
Emchar buf_ch;
+ Bufbyte str[MAX_EMCHAR_LEN];
buf_ch = charptr_emchar (d);
buf_ch = RE_TRANSLATE (buf_ch);
- if (buf_ch >= 0200 || fastmap[(unsigned char) buf_ch])
+ set_charptr_emchar (str, buf_ch);
+ if (buf_ch >= 0200 || fastmap[(unsigned char) *str])
break;
#else
if (fastmap[(unsigned char)RE_TRANSLATE (*d)])
@@ -4865,7 +4860,7 @@
REGEX_PREFETCH ();
c = charptr_emchar ((const Bufbyte *) d);
- c = TRANSLATE_EXTENDED_UNSAFE (c); /* The character to match. */
+ c = TRANSLATE (c); /* The character to match. */
if (EQ (Qt, unified_range_table_lookup (p, c, Qnil)))
not_p = !not_p;
--
Yoshiki Hayashi