SUPERSEDE APPROVE COMMIT 21.5
Please discuss on xemacs-patches; reply-to set.
Stephen J. Turnbull writes:
21.5 RECOMMEND 21.4
Unfortunately, I can't recommend this for 21.4 at this point. By
adding tests to show that certain error conditions were properly
signaled, I discovered that certain bad arguments weren't being caught
at all. I've fixed that, but I'm not sure that code might not depend
on some errors passing silently.
Specifically, `replace-match' was happily accepting a STRBUFFER
argument that specified a subexpression that had not matched. Errors
you might now see are "match data register invalid" and "match data
register not set". I'd really like to hear about those (which is why
I'm cross posting to -beta; -beta readers can stop reading now ;-).
I also added a bunch of documentation (in particular the case-munging
escapes in replace-match are now documented in the manual), and a
gratuitous patch to suppress some "unexpected error" in os-tests (due
to inability to find `executable-find').
We still need a lot more tests, but I'm not going to wait this patch
on that.
Index: lisp/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/ChangeLog,v
retrieving revision 1.842
diff -u -U0 -r1.842 ChangeLog
--- lisp/ChangeLog 30 Sep 2007 11:59:34 -0000 1.842
+++ lisp/ChangeLog 1 Oct 2007 05:06:24 -0000
@@ -0,0 +1,4 @@
+2007-09-23 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * subr.el (replace-regexp-in-string): Handle SUBEXP arg properly.
+
Index: man/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/ChangeLog,v
retrieving revision 1.360
diff -u -U0 -r1.360 ChangeLog
--- man/ChangeLog 30 Sep 2007 15:26:05 -0000 1.360
+++ man/ChangeLog 1 Oct 2007 05:06:25 -0000
@@ -0,0 +1,6 @@
+2007-09-22 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * lispref/searching.texi (Replacing Match): Document the escapes
+ for changing case in `replace-match'. Document the change to
+ STRBUFFER to permit subexpressions in string replacement.
+
Index: src/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/ChangeLog,v
retrieving revision 1.1097
diff -u -U0 -r1.1097 ChangeLog
--- src/ChangeLog 30 Sep 2007 11:37:59 -0000 1.1097
+++ src/ChangeLog 1 Oct 2007 05:06:28 -0000
@@ -0,0 +1,9 @@
+2007-09-30 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * search.c (Freplace_match): Improve range-checking on STRBUFFER.
+
+2007-09-22 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * search.c (Freplace_match): Allow STRBUFFER to specify a
+ subexpression when the source is a string.
+
Index: tests/ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/ChangeLog,v
retrieving revision 1.95
diff -u -U0 -r1.95 ChangeLog
--- tests/ChangeLog 21 Aug 2007 12:39:06 -0000 1.95
+++ tests/ChangeLog 1 Oct 2007 05:06:28 -0000
@@ -0,0 +1,16 @@
+2007-09-30 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * automated/os-tests.el: Suppress `executable-find' lossage.
+
+ * automated/regexp-tests.el (replace-regexp-in-string):
+ New tests for error conditions.
+
+ * automated/test-harness.el (test-harness-from-buffer): Comment
+ about inability to check the FROB in invalid_argument.
+
+2007-09-23 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * automated/regexp-tests.el (replace-regexp-in-string):
+ New tests for correct operation. Comment need for `replace-match'
+ tests.
+
Index: lisp/subr.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/subr.el,v
retrieving revision 1.41
diff -u -r1.41 subr.el
--- lisp/subr.el 7 Aug 2007 23:08:52 -0000 1.41
+++ lisp/subr.el 1 Oct 2007 05:06:34 -0000
@@ -791,20 +791,33 @@
Return a new string containing the replacements.
-Optional arguments FIXEDCASE, LITERAL and SUBEXP are like the
-arguments with the same names of function `replace-match'. If START
-is non-nil, start replacements at that index in STRING.
+Optional arguments FIXEDCASE and LITERAL are like the arguments with
+the same names of function `replace-match'. If START is non-nil,
+start replacements at that index in STRING.
+
+For compatibility with old XEmacs code and with recent GNU Emacs, the
+interpretation of SUBEXP is somewhat complicated. If SUBEXP is a
+buffer, it is interpreted as the buffer which provides syntax tables
+and case tables for the match and replacement. If it is not a buffer,
+the current buffer is used. If SUBEXP is an integer, it is the index
+of the subexpression of REGEXP which is to be replaced.
REP is either a string used as the NEWTEXT arg of `replace-match' or a
function. If it is a function it is applied to each match to generate
the replacement passed to `replace-match'; the match-data at this
-point are such that match 0 is the function's argument.
+point are such that `(match-string SUBEXP STRING)' is the function's
+argument if SUBEXP is an integer \(otherwise the whole match is passed
+and replaced).
To replace only the first match (if any), make REGEXP match up to \\'
and replace a sub-expression, e.g.
(replace-regexp-in-string \"\\(foo\\).*\\'\" \"bar\" \"
foo foo\" nil nil 1)
=> \" bar foo\"
-"
+
+Signals `invalid-argument' if SUBEXP is not an integer, buffer, or nil;
+or is an integer, but the indicated subexpression was not matched.
+Signals `invalid-argument' if STRING is nil but the last text matched was a string,
+or if STRING is a string but the last text matched was a buffer."
;; To avoid excessive consing from multiple matches in long strings,
;; don't just call `replace-match' continually. Walk down the
@@ -817,6 +830,7 @@
;; might be reasonable to do so for long enough STRING.]
(let ((l (length string))
(start (or start 0))
+ (expndx (if (integerp subexp) subexp 0))
matches str mb me)
(save-match-data
(while (and (< start l) (string-match regexp string start))
@@ -833,7 +847,8 @@
(setq matches
(cons (replace-match (if (stringp rep)
rep
- (funcall rep (match-string 0 str)))
+ (funcall rep (match-string expndx str)))
+ ;; no, this subexp shouldn't be expndx
fixedcase literal str subexp)
(cons (substring string start mb) ; unmatched prefix
matches)))
Index: man/lispref/searching.texi
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/lispref/searching.texi,v
retrieving revision 1.13
diff -u -r1.13 searching.texi
--- man/lispref/searching.texi 23 Feb 2005 15:33:38 -0000 1.13
+++ man/lispref/searching.texi 1 Oct 2007 05:06:34 -0000
@@ -1282,19 +1282,14 @@
@var{replacement}.
If you did the last search in a buffer, you should specify @code{nil}
-for @var{string}. Then @code{replace-match} does the replacement by
-editing the buffer; it leaves point at the end of the replacement text,
-and returns @code{t}.
+for @var{string}. (An error will be signaled if you don't.) Then
+@code{replace-match} does the replacement by editing the buffer; it
+leaves point at the end of the replacement text, and returns @code{t}.
If you did the search in a string, pass the same string as @var{string}.
-Then @code{replace-match} does the replacement by constructing and
-returning a new string.
-
-If the fourth argument @var{string} is a string, fifth argument
-@var{strbuffer} specifies the buffer to be used for syntax-table and
-case-table lookup and defaults to the current buffer. When @var{string}
-is not a string, the buffer that the match occurred in has automatically
-been remembered and you do not need to specify it.
+(An error will be signaled if you specify nil.) Then
+@code{replace-match} does the replacement by constructing and returning
+a new string.
If @var{fixedcase} is non-@code{nil}, then the case of the replacement
text is not changed; otherwise, the replacement text is converted to a
@@ -1317,20 +1312,64 @@
@table @asis
@item @samp{\&}
-@cindex @samp{&} in replacement
+@cindex @samp{\&} in replacement
@samp{\&} stands for the entire text being replaced.
@item @samp{\@var{n}}
@cindex @samp{\@var{n}} in replacement
+@cindex @samp{\@var{digit}} in replacement
@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
matched the @var{n}th subexpression in the original regexp.
Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
@item @samp{\\}
-@cindex @samp{\} in replacement
+@cindex @samp{\\} in replacement
@samp{\\} stands for a single @samp{\} in the replacement text.
+
+@item @samp{\u}
+@cindex @samp{\u} in replacement
+@samp{\u} means upcase the next character.
+
+@item @samp{\l}
+@cindex @samp{\l} in replacement
+@samp{\l} means downcase the next character.
+
+@item @samp{\U}
+@cindex @samp{\U} in replacement
+@samp{\U} means begin upcasing all following characters.
+
+@item @samp{\L}
+@cindex @samp{\L} in replacement
+@samp{\L} means begin downcasing all following characters.
+
+@item @samp{\E}
+@cindex @samp{\E} in replacement
+@samp{\E} means terminate the effect of any @samp{\U} or @samp{\L}.
@end table
+
+Case changes made with @samp{\u}, @samp{\l}, @samp{\U}, and @samp{\L}
+override all other case changes that may be made in the replaced text.
+
+The fifth argument @var{strbuffer} may be a buffer to be used for
+syntax-table and case-table lookup. If @var{strbuffer} is not a buffer,
+the current buffer is used. When @var{string} is not a string, the
+buffer that the match occurred in has automatically been remembered and
+you do not need to specify it. @var{string} may also be an integer,
+specifying the index of the subexpression to match. When @var{string}
+is not an integer, the ``subexpression'' is 0, @emph{i.e.}, the whole
+match. An @code{invalid-argument} error will be signaled if you specify
+a buffer when @var{string} is nil, or specify a subexpression which was
+not matched.
+
+It is not possible to specify both a buffer and a subexpression, but the
+idiom
+@example
+(with-current-buffer @var{buffer} (replace-match ... @var{integer}))
+@end example
+may be used.
+
@end defun
+
@node Entire Match Data
@subsection Accessing the Entire Match Data
Index: src/search.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/search.c,v
retrieving revision 1.46
diff -u -r1.46 search.c
--- src/search.c 24 Jan 2005 23:34:09 -0000 1.46
+++ src/search.c 1 Oct 2007 05:06:34 -0000
@@ -2364,15 +2364,22 @@
DEFUN ("replace-match", Freplace_match, 1, 5, 0, /*
Replace text matched by last search with REPLACEMENT.
-If second arg FIXEDCASE is non-nil, do not alter case of replacement text.
+Leaves point at end of replacement text.
+Optional boolean FIXEDCASE inhibits matching case of REPLACEMENT to source.
+Optional boolean LITERAL inhibits interpretation of escape sequences.
+Optional STRING provides the source text to replace.
+Optional STRBUFFER may be a buffer, providing match context, or an integer
+ specifying the subexpression to replace.
+
+If FIXEDCASE is non-nil, do not alter case of replacement text.
Otherwise maybe capitalize the whole text, or maybe just word initials,
based on the replaced text.
-If the replaced text has only capital letters
-and has at least one multiletter word, convert REPLACEMENT to all caps.
+If the replaced text has only capital letters and has at least one
+multiletter word, convert REPLACEMENT to all caps.
If the replaced text has at least one word starting with a capital letter,
then capitalize each word in REPLACEMENT.
-If third arg LITERAL is non-nil, insert REPLACEMENT literally.
+If LITERAL is non-nil, insert REPLACEMENT literally.
Otherwise treat `\\' as special:
`\\&' in REPLACEMENT means substitute original matched text.
`\\N' means substitute what matched the Nth `\\(...\\)'.
@@ -2385,24 +2392,31 @@
`\\E' means terminate the effect of any `\\U' or `\\L'.
Case changes made with `\\u', `\\l', `\\U', and `\\L' override
all other case changes that may be made in the replaced text.
-FIXEDCASE and LITERAL are optional arguments.
-Leaves point at end of replacement text.
-The optional fourth argument STRING can be a string to modify.
-In that case, this function creates and returns a new string
-which is made by replacing the part of STRING that was matched.
-When fourth argument is a string, fifth argument STRBUFFER specifies
-the buffer to be used for syntax-table and case-table lookup and
-defaults to the current buffer. When fourth argument is not a string,
-the buffer that the match occurred in has automatically been remembered
-and you do not need to specify it.
-
-When fourth argument is nil, STRBUFFER specifies a subexpression of
-the match. It says to replace just that subexpression instead of the
-whole match. This is useful only after a regular expression search or
-match since only regular expressions have distinguished subexpressions.
+If non-nil, STRING is the source string, and a new string with the specified
+replacements is created and returned. Otherwise the current buffer is the
+source text.
+
+If non-nil, STRBUFFER may be an integer, interpreted as the index of the
+subexpression to replace in the source text, or a buffer to provide the
+syntax table and case table. If nil, then the \"subexpression\" is 0, i.e.,
+the whole match, and the current buffer provides the syntax and case tables.
+If STRING is nil, STRBUFFER must be nil or an integer.
+
+Specifying a subexpression is only useful after a regular expression match,
+since a fixed string search has no non-trivial subexpressions.
+
+It is not possible to specify both a buffer and a subexpression. If that is
+desired, the idiom `(with-current-buffer BUFFER (replace-match ... INTEGER))'
+may be appropriate.
+
+If STRING is nil but the last thing matched (or searched) was a string, or
+STRING is a string but the last thing matched was a buffer, an
+`invalid-argument' error will be signaled. (XEmacs does not check that the
+last thing searched is the source string, but it is not useful to use a
+different string as source.)
-If no match (including searches) has been conducted or the requested
+If no match (including searches) has been successful or the requested
subexpression was not matched, an `args-out-of-range' error will be
signaled. (If no match has ever been conducted in this instance of
XEmacs, an `invalid-operation' error will be signaled. This is very
@@ -2430,31 +2444,59 @@
CHECK_STRING (replacement);
+ /* Because GNU decided to be incompatible here, we support the following
+ baroque and bogus API for the STRING and STRBUFFER arguments:
+ types interpretations
+ STRING STRBUFFER STRING STRBUFFER
+ nil nil none 0 = index of subexpression to replace
+ nil integer none index of subexpression to replace
+ nil other ***** error *****
+ string nil source current buffer provides syntax table
+ subexpression = 0 (whole match)
+ string buffer source buffer providing syntax table
+ subexpression = 0 (whole match)
+ string integer source current buffer provides syntax table
+ subexpression = STRBUFFER
+ string other ***** error *****
+ */
+
+ /* Do STRBUFFER first; if STRING is nil, we'll overwrite BUF and BUFFER. */
+
+ /* If the match data were abstracted into a special "match data" type
+ instead of the typical half-assed "let the implementation be visible"
+ form it's in, we could extend it to include the last string matched
+ and the buffer used for that matching. But of course we can't change
+ it as it is.
+ */
+ if (NILP (strbuffer) || BUFFERP (strbuffer))
+ {
+ buf = decode_buffer (strbuffer, 0);
+ }
+ else if (!NILP (strbuffer))
+ {
+ CHECK_INT (strbuffer);
+ sub = XINT (strbuffer);
+ if (sub < 0 || sub >= (int) search_regs.num_regs)
+ invalid_argument ("match data register invalid", strbuffer);
+ if (search_regs.start[sub] < 0)
+ invalid_argument ("match data register not set", strbuffer);
+ buf = current_buffer;
+ }
+ else
+ invalid_argument ("STRBUFFER must be nil, a buffer, or an integer",
+ strbuffer);
+ buffer = wrap_buffer (buf);
+
if (! NILP (string))
{
CHECK_STRING (string);
if (!EQ (last_thing_searched, Qt))
- invalid_argument ("last thing matched was not a string", Qunbound);
- /* If the match data
- were abstracted into a special "match data" type instead
- of the typical half-assed "let the implementation be
- visible" form it's in, we could extend it to include
- the last string matched and the buffer used for that
- matching. But of course we can't change it as it is. */
- buf = decode_buffer (strbuffer, 0);
- buffer = wrap_buffer (buf);
+ invalid_argument ("last thing matched was not a string", Qunbound);
}
else
{
- if (!NILP (strbuffer))
- {
- CHECK_INT (strbuffer);
- sub = XINT (strbuffer);
- if (sub < 0 || sub >= (int) search_regs.num_regs)
- args_out_of_range (strbuffer, make_int (search_regs.num_regs));
- }
if (!BUFFERP (last_thing_searched))
- invalid_argument ("last thing matched was not a buffer", Qunbound);
+ invalid_argument ("last thing matched was not a buffer", Qunbound);
buffer = last_thing_searched;
buf = XBUFFER (buffer);
}
@@ -2557,8 +2599,8 @@
Lisp_Object before, after;
speccount = specpdl_depth ();
- before = Fsubstring (string, Qzero, make_int (search_regs.start[0]));
- after = Fsubstring (string, make_int (search_regs.end[0]), Qnil);
+ before = Fsubstring (string, Qzero, make_int (search_regs.start[sub]));
+ after = Fsubstring (string, make_int (search_regs.end[sub]), Qnil);
/* Do case substitution into REPLACEMENT if desired. */
if (NILP (literal))
@@ -2600,6 +2642,8 @@
substart = search_regs.start[0];
subend = search_regs.end[0];
}
+ /* #### This logic is totally broken,
+ since we can have backrefs like "\99", right? */
else if (c >= '1' && c <= '9' &&
c <= search_regs.num_regs + '0')
{
@@ -2759,6 +2803,8 @@
(buffer,
make_int (search_regs.start[0] + offset),
make_int (search_regs.end[0] + offset));
+ /* #### This logic is totally broken,
+ since we can have backrefs like "\99", right? */
else if (c >= '1' && c <= '9' &&
c <= search_regs.num_regs + '0')
{
Index: tests/automated/os-tests.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/automated/os-tests.el,v
retrieving revision 1.1
diff -u -r1.1 os-tests.el
--- tests/automated/os-tests.el 28 Oct 2004 11:31:23 -0000 1.1
+++ tests/automated/os-tests.el 1 Oct 2007 05:06:34 -0000
@@ -36,15 +36,18 @@
;; in <b9yoeipvwn0.fsf(a)jpl.org>.
;; tac works by lines, unfortunately
+;; #### The contortions around `executable-find' gag me, but I don't have time
+;; to deal today. If we have `executable-find', we should use its value!
(let* ((original-string "a\nb\nc\nd\n")
- (tac-cases (if (executable-find "tac")
+ ;; `executable-find' is in a package and may be unavailable.
+ (tac-cases (if (and (fboundp 'executable-find) (executable-find
"tac"))
'((1 . "c\nb\na\nd\n")
(3 . "a\nc\nb\nd\n")
(5 . "a\nc\nb\nd\n")
(7 . "a\nc\nb\nd\n")
(9 . "a\nd\nc\nb\n"))
nil))
- (cat-cases (if (executable-find "cat")
+ (cat-cases (if (and (fboundp 'executable-find) (executable-find
"cat"))
'((1 . "b\nc\na\nd\n")
(3 . "a\nb\nc\nd\n")
(5 . "a\nb\nc\nd\n")
Index: tests/automated/regexp-tests.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/automated/regexp-tests.el,v
retrieving revision 1.15
diff -u -r1.15 regexp-tests.el
--- tests/automated/regexp-tests.el 2 Feb 2005 11:37:21 -0000 1.15
+++ tests/automated/regexp-tests.el 1 Oct 2007 05:06:34 -0000
@@ -459,3 +459,72 @@
(Assert (null (match-string 2 text2)))
)
+;; replace-regexp-in-string (regexp rep source
+;; fixedcase literal buf-or-subexp start)
+
+;; Currently we test the following cases:
+;; where `cbuf' and `bar-or-empty' are bound below.
+
+;; #### Tests for the various functional features (fixedcase, literal, start)
+;; should be added.
+
+(with-temp-buffer
+ (flet ((bar-or-empty (subexp) (if (string= subexp "foo") "bar"
"")))
+ (let ((cbuf (current-buffer)))
+ (dolist (test-case
+ ;; REP BUF-OR-SUBEXP EXPECTED RESULT
+ `(("bar" nil " bar")
+ ("bar" ,cbuf " bar")
+ ("bar" 0 " bar")
+ ("bar" 1 " bar foo")
+ (bar-or-empty nil " ")
+ (bar-or-empty ,cbuf " ")
+ (bar-or-empty 0 " ")
+ (bar-or-empty 1 " bar foo")))
+ (Assert
+ (string=
+ (nth 2 test-case)
+ (replace-regexp-in-string "\\(foo\\).*\\'" (nth 0 test-case)
+ " foo foo" nil nil (nth 1 test-case)))))
+ ;; #### Why doesn't this loop work right?
+; (dolist (test-case
+; ;; REP BUF-OR-SUBEXP EXPECTED ERROR EXPECTED MESSAGE
+; `(;; expected message was "bufferp, symbol" up to 21.5.28
+; ("bar" 'symbol wrong-type-argument "integerp,
symbol")
+; ("bar" -1 invalid-argument
+; "match data register invalid, -1")
+; ("bar" 2 invalid-argument
+; "match data register not set, 2")
+; ))
+; (eval
+; `(Check-Error-Message ,(nth 2 test-case) ,(nth 3 test-case)
+; (replace-regexp-in-string "\\(foo\\).*\\'" ,(nth 0 test-case)
+; " foo foo" nil nil ,(nth 1 test-case)))))
+ ;; #### Can't test the message with w-t-a, see test-harness.el.
+ (Check-Error wrong-type-argument
+ (replace-regexp-in-string "\\(foo\\).*\\'"
+ "bar"
+ " foo foo" nil nil
+ 'symbol))
+ ;; #### Can't test the FROB (-1), see test-harness.el.
+ (Check-Error-Message invalid-argument
+ "match data register invalid"
+ (replace-regexp-in-string "\\(foo\\).*\\'"
+ "bar"
+ " foo foo" nil nil
+ -1))
+ ;; #### Can't test the FROB (-1), see test-harness.el.
+ (Check-Error-Message invalid-argument
+ "match data register not set"
+ (replace-regexp-in-string "\\(foo\\).*\\'"
+ "bar"
+ " foo foo" nil nil
+ 2))
+ )))
+
+;; replace-match (REPLACEMENT &optional FIXEDCASE LITERAL STRING STRBUFFER)
+
+;; #### Write some tests! Much functionality is implicitly tested above
+;; via `replace-regexp-in-string', but we should specifically test bogus
+;; combinations of STRING and STRBUFFER.
+
Index: tests/automated/test-harness.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/tests/automated/test-harness.el,v
retrieving revision 1.22
diff -u -r1.22 test-harness.el
--- tests/automated/test-harness.el 24 Jun 2006 14:30:39 -0000 1.22
+++ tests/automated/test-harness.el 1 Oct 2007 05:06:34 -0000
@@ -285,6 +285,10 @@
,quoted-body ',expected-error)
(incf no-error-failures))
(,expected-error
+ ;; #### Damn, this binding doesn't capture frobs, eg, for
+ ;; invalid_argument() ... you only get the REASON. And for
+ ;; wrong_type_argument(), there's no reason only FROBs.
+ ;; If this gets fixed, fix tests in regexp-tests.el.
(let ((error-message (second error-info)))
(if (string-match ,expected-error-regexp error-message)
(progn
_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches