On Jan 2, 2011, at 10:53 PM, Stephen J. Turnbull wrote:
> > It also sure would be nice if copying a regexp from some random
> > piece of perl code and pasting it into emacs-lisp code just worked,
> I'm not sure that can happen. You'll need to wrap it either with a
> function or a variable binding because Elisp will have to default to
> Emacs regexps for the foreseeable future. Is such wrapping good
> enough?
A real-world example: here's some Perl code to canonicalize the various forms of
YouTube URLs:
  # YouTube /watch?v= or /watch#!v= or /v/ URLs, with or without subdomain,
  # or possibly on youtube-nocookie.com.
  if ($url =~ m@^http:// (?:[a-z]+\.)? (youtube) (?:-nocookie)? \.com/
                (?: (?: watch )? (?: \? | \#! ) v= | v/ )
                ([^<>?&,'"]+) ($|&) @sx) {
    my ($site, $id) = ($1, $2);
    $url = "http://www.$site.com/watch?v=$id";
  }
Wouldn't it be nice to cut and paste that hairy regexp into Emacs-Lisp unchanged:
  (if (string-match
       #R@^http:// (?:[a-z]+\.)? (youtube) (?:-nocookie)? \.com/
          (?: (?: watch )? (?: \? | \#! ) v= | v/ )
          ([^<>?&,'"]+) ($|&) @sx
       url)
      (setq url (replace-match #r"http://www.\1.com/watch?v=\2" nil nil url)))
(Oooh, and what if replace-match also hacked $foo into (symbol-value 'foo)? Sweet!)
> > So, "#r" can mean "just do backslash hacking", as now, for historical
> > compatibility, and the new "#R" can mean "read this in Perl syntax".
> Pretty random (although Lars came up with the same kind of thing, so
> maybe ...). But remember, #r is reader syntax. How do you propose
> conveying those flags to the regexp compiler and/or search driver? As
> string properties, maybe. Seems unlispy to me.
It's no more unlispy than any reader macro. The contract of the "#R" macro
is that the character following the "R" is a double-quote-like delimiter, like
in the Common Lisp |foo| symbol-quoting syntax or #|foo|# block-comment syntax. So the
flow for the C code implementing "#R" would be:
- read the next character, number 3 (usually /, sometimes @, sometimes something weird).
- copy all literal bytes until the next occurrence of that character: that's the
regexp string.
- read following bytes until whitespace or EOF: that's the set of post-regexp flags.
- return Funcall ("perl-regexp-to-emacs-regexp", regexp_string, regexp_flags);
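To make the steps above concrete, here is a minimal sketch of that reader flow in Python rather than C. The name read_perl_regexp and the returned tuple are made up for illustration; the real thing would live in the Lisp reader and end with the Funcall instead of returning a tuple.

```python
def read_perl_regexp(src, pos=0):
    """Read a #R literal starting at src[pos].

    Returns (regexp, flags, next_pos). Hypothetical helper: it only
    mirrors the reader steps described in the message.
    """
    if not src.startswith("#R", pos):
        raise SyntaxError("expected #R")
    pos += 2
    if pos >= len(src):
        raise SyntaxError("EOF after #R")

    # Step 1: the character after "R" is the delimiter.
    delim = src[pos]
    pos += 1

    # Step 2: everything up to the next delimiter is the regexp, verbatim.
    end = src.find(delim, pos)
    if end < 0:
        raise SyntaxError("unterminated #R literal")
    regexp = src[pos:end]
    pos = end + 1

    # Step 3: the flags run until whitespace or EOF.
    start = pos
    while pos < len(src) and not src[pos].isspace():
        pos += 1
    flags = src[start:pos]

    # Step 4 would hand (regexp, flags) to perl-regexp-to-emacs-regexp.
    return regexp, flags, pos


print(read_perl_regexp("#R@^a (b) @sx url)"))
```

Note that this follows the rule exactly as stated: the flags stop only at whitespace, so something like #R/foo/i) would swallow the close-paren into the flags. Nothing here knows which flags are valid; the regexp string and the flag string are simply passed on to perl-regexp-to-emacs-regexp.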
That function can either mechanically transform the string into an Emacs-syntax string, or
it can return a new object with type-of "perl-regexp" which prints itself
with #R syntax and which string-match accepts as an argument. (Maybe perl-regexp objects
can also be typep 'string.)
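A rough sketch of the first option, the mechanical transform, again in Python for illustration only. It covers just the constructs used in the YouTube example: /x whitespace and comments are dropped, and grouping, alternation and interval characters swap their escaped and unescaped forms. Everything else is an assumption: the /s flag is ignored, bracket expressions are copied verbatim (so Perl backslash escapes inside them are not handled), and Perl-only escapes such as \d pass through untranslated.

```python
def perl_regexp_to_emacs_regexp(regexp, flags=""):
    """Translate a small subset of Perl regexp syntax to Emacs syntax."""
    extended = "x" in flags
    out = []
    i, n = 0, len(regexp)
    while i < n:
        c = regexp[i]
        if c == "\\" and i + 1 < n:
            nxt = regexp[i + 1]
            if nxt in "(){}|# ":
                # A Perl-escaped literal; in Emacs syntax the bare
                # character is already a literal.
                out.append(nxt)
            else:
                out.append(c + nxt)
            i += 2
        elif c == "[":
            # Copy a bracket expression verbatim up to its closing "]".
            j = i + 1
            if j < n and regexp[j] == "^":
                j += 1
            if j < n and regexp[j] == "]":
                j += 1
            j = regexp.find("]", j)
            if j < 0:
                raise ValueError("unterminated character class")
            out.append(regexp[i:j + 1])
            i = j + 1
        elif extended and c.isspace():
            i += 1                      # /x: insignificant whitespace
        elif extended and c == "#":
            while i < n and regexp[i] != "\n":
                i += 1                  # /x: comment to end of line
        elif c in "(){}|":
            out.append("\\" + c)        # special in Perl, needs \ in Emacs
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


YOUTUBE = "\n".join([
    r'''^http:// (?:[a-z]+\.)? (youtube) (?:-nocookie)? \.com/''',
    r'''(?: (?: watch )? (?: \? | \#! ) v= | v/ )''',
    r'''([^<>?&,'"]+) ($|&) ''',
])
print(perl_regexp_to_emacs_regexp(YOUTUBE, "sx"))
```

The result is the regexp string itself; written as an ordinary Emacs Lisp string literal, each backslash would still have to be doubled.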
(I can't even imagine how much RMS would object to such "unnecessary
complexity". Oh wait, yes I can!)
> And what about #r/foo\bar/unknown -- would that be a syntax error?
> But then you'd need to embed knowledge of the flag syntax in the lexer.
#'perl-regexp-to-emacs-regexp just signals, resulting in a parse error.
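In other words, the reader collects the flag characters blindly and the conversion function is the only place that knows which ones are legal. A tiny Python sketch of that division of labor; the name check_flags and the set of accepted flags are assumptions for illustration, and ValueError stands in for a Lisp signal:

```python
# Assumed flag set, for illustration only; the real list would be
# whatever perl-regexp-to-emacs-regexp chooses to support.
KNOWN_FLAGS = frozenset("imsx")


def check_flags(flags):
    """Return flags unchanged, or raise if any flag is unrecognized."""
    unknown = sorted(set(flags) - KNOWN_FLAGS)
    if unknown:
        # Raised while the form is being read, so the caller sees it
        # as a parse error.
        raise ValueError("unknown regexp flag(s): " + "".join(unknown))
    return flags


print(check_flags("sx"))
```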
Incidentally, #r is inconsistent, as it has no way to read a string ending in a backslash
or a double-quote. That is, the rules for parsing character N differ from the rules for
characters [0-N). A more consistent grammar would prohibit \" within #r, which is
exactly why sed-syntax regexps allow arbitrary quote-characters after the "s".
  #r"\"X"  ->  "\\\"X"   [3 bytes]
  #r"\""   ->  "\\\""    [2 bytes]
  #r"\\X"  ->  "\\\\X"   [3 bytes]
  #r"\\"   ->  "\\\\"    [2 bytes]
  #r"\X"   ->  "\\X"     [2 bytes]
  #r"\"    ->  EOF
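The table above can be reproduced with a few lines of Python. This is a sketch of the rule as inferred from the table, not the actual reader code: a backslash is kept along with whatever character follows it, and only an unescaped double-quote ends the string. The name read_raw_string is made up for the example.

```python
def read_raw_string(src):
    """Read a #r"..." literal from the start of src.

    Returns (contents, chars_consumed); raises EOFError if the input
    runs out before an unescaped closing double-quote.
    """
    if not src.startswith('#r"'):
        raise SyntaxError('expected #r"')
    out = []
    i = 3
    while i < len(src):
        c = src[i]
        if c == '"':
            return "".join(out), i + 1
        if c == "\\":
            if i + 1 >= len(src):
                break
            # Keep the backslash AND the next character, even if that
            # character is a double-quote.
            out.append(c + src[i + 1])
            i += 2
        else:
            out.append(c)
            i += 1
    raise EOFError("unterminated #r string")


for literal in [r'#r"\"X"', r'#r"\""', r'#r"\\X"', r'#r"\\"', r'#r"\X"']:
    contents, _ = read_raw_string(literal)
    print(literal, "->", len(contents), "bytes")
```

Because the backslash always consumes the following character, the last row of the table, a string consisting of a single backslash, never finds its closing quote and runs into EOF.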
--
Jamie Zawinski
http://www.jwz.org/ http://www.dnalounge.com/
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta