On the twenty-first of February, Stephen J. Turnbull wrote:
The semantics of a regexp that matches only the empty string at point
are well-defined: the replacement string should be inserted once.
OTOH, if it matches a non-empty string, it should replace that string
_once_ (and due to the default greedy semantics, will prefer it to the
null match).
Well-defined by Perl and other regexp engines, you mean? For example, this
does nothing of the kind:
(with-string-as-buffer-contents
"ac" (replace-regexp "a*" "b"))
=>"ac"
Its Perl equivalent does do what you say, though (ignoring that your example
result didn’t pay attention to the empty match at the starting anchor):
ns5 [ perl -ne 'chop; s/a*/b/g; print "=>$_\n";'
ac
=>bbcb
ns5 [
Ditto for PHP:
derrick [ php -r 'print "=>".ereg_replace("a*","b","ac")."\n";'
=>bbcb
derrick [
[...] But I would argue the algorithm should match the null string at
point iff it has moved since the last replacement. So it matches "a"
(because it's greedy) and replaces it. Now looking at "c" it matches the
null string, but it hasn't moved so it should ignore that match.
Skipping past ?c, it now is looking at "", which it does match, and
should be replaced by "b", giving "bcb".
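
For concreteness, here is a minimal sketch of the rule described above, written in Python rather than in the C or Lisp of the actual replace code. The function name replace_all and its arguments are hypothetical, the replacement is treated as a literal string (no backreferences), and Python's re module stands in for the Emacs regexp engine, so take it as an illustration of the proposed semantics, not of any existing implementation.

```python
import re


def replace_all(pattern, replacement, text):
    """Replace every match of pattern in text, ignoring an empty match
    at point when point has not moved since the last replacement.

    Hypothetical helper for illustration only; replacement is literal.
    """
    regex = re.compile(pattern)
    out = []
    pos = 0          # "point": where the next search starts
    last_end = None  # where the previous replacement ended, if any
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        if m.start() == m.end() and m.start() == last_end:
            # Null match exactly where the last replacement ended:
            # point has not moved, so ignore it and step over one char.
            if pos == len(text):
                break
            out.append(text[pos])
            pos += 1
            continue
        out.append(text[pos:m.start()])  # text skipped by the search
        out.append(replacement)
        pos = last_end = m.end()
    out.append(text[pos:])  # whatever is left after the last match
    return "".join(out)


print("=>" + replace_all("a*", "b", "ac"))
```

With these assumptions the "ac" example comes out as "bcb", and a pattern that matches only the empty string in an empty text inserts the replacement exactly once.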
We should review the C code for this bug (the lack of trailing "b" is
arguably convenient, but the replacement of a single "a" with "bb" is
definitely a bug, right?)
Yeah, when you take the above regexp interpretation as the right one, which
is exactly what we should do.
We should review buffer code for this bug.
We should check current GNU and notify them if they're busted.
[They are.]
21.4 will need these patches, I'm pretty sure.
Whee.
--
“Ah come on now Ted, a Volkswagen with a mind of its own, driving all over
the place and going mad, if that’s not scary I don’t know what is.”