Brian Palmer wrote:
On 10/3/05, Ben Wing <ben(a)666.com> wrote:
-- [4] the perl regexp \p syntax should be adopted for referencing
charsets. (char categories just suck.) for that matter, we should move
in the direction of being as perl-compatible as possible with our
regexps, since that is where the world is going. (cf java, python, ruby,
c#, ...) the big problem here is \( and (, which are backwards. the
only reasonable solutions i can see are [a] a global variable to control
which kinds of regexps are used; [b] a double set of all functions that
take regexps. comments?
The other big problem is that using strings to encode regexes is
backslash pain. I've long thought a way to shift to pcre would be to
extend the reader with something like
#r/ re(g|u) lar ... /
where inside the delimiters (/ and /, for example), it works mostly
like perl's quoting and regex rules. This could
o either give a new regex object type, which would mean a bunch
of functions that accept regexes would need to switch on the argument
type (which is a fair amount of work, probably mostly for 3rd-party
libraries that are already switching on argument type and assume that
regular expressions would be strings);
o or it could result in a string that is boiled down to regular
elisp regex syntax. This might complicate using the added power of pcre.
An added advantage of that sort of reader macro would be that it's in
principle extendable:
#r:rx/.../ to use the rx library, #r:pcre/ ... / , which would be sort
of ugly, but nicer than simple strings with no room for metadata.
Of course, metadata on regex strings could always be done with string
properties.
I've looked a little bit at what it'd take to add this sort of macro,
but like many things in xemacs's source, lread.c defeats me.
i think we should follow the lead of python and C# and allow for "raw"
aka "verbatim" strings to be defined. python's syntax is
foo = r"\(.*\)\s*\\?$"
or
foo = r'\(.*\)\s*\\?$'
or
foo = r"""\(.*\)\s*\\?$"""
[let's say that this is a perl/python/c#-style regexp, meant to match an
expression in parens followed by optional whitespace followed by an
optional backslash at EOL.]
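as a quick sanity check of that reading, here is a minimal python sketch. the sample lines and variable names are made up for illustration; the only thing taken from above is the pattern itself. it shows that the raw spelling and the doubled-backslash spelling are the same string, and which lines the pattern matches:

```python
import re

# the raw-string spelling from above, and the same pattern written as an
# ordinary string, where every backslash has to be doubled
raw_pattern = r"\(.*\)\s*\\?$"
plain_pattern = "\\(.*\\)\\s*\\\\?$"

# both spellings produce the identical string object contents
same_pattern = raw_pattern == plain_pattern

# made-up sample lines, for illustration only
samples = [
    "(setq foo 1)",        # parens at end of line
    "(setq foo 1)   \\",   # parens, whitespace, one trailing backslash
    "no parens here",      # nothing in parens, should not match
]
matches = [bool(re.search(raw_pattern, s)) for s in samples]
```

the point is just that the raw form needs 5 backslashes where the plain form needs 10, with no change in what the regexp engine sees.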
python has multiple string delimiters. the triple-quote version does
something different from the others, but it doesn't matter too much
exactly what; for our purposes, we could treat them all the same.
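for the curious, a small sketch of the delimiter point in python (variable names are mine, purely illustrative). all three raw spellings give the same string; the one thing the triple-quoted form adds is that the literal may span lines and contain unescaped quote characters:

```python
# the three raw-string spellings of the same pattern
single = r'\(.*\)\s*\\?$'
double = r"\(.*\)\s*\\?$"
triple = r"""\(.*\)\s*\\?$"""

# the delimiter choice does not affect the resulting string
all_equal = single == double == triple

# what triple quotes add: the literal can span lines and hold bare quotes
multi = r"""first "line"
second line"""
has_newline = "\n" in multi
```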
c# would write:
foo = @"\(.*\)\s*\\?$";
same idea.
these just return strings, and are not tied to the regexp engine; so you
could also write
foo = r"c:\new\test.txt"
and not worry about \n and \t getting interpreted wrongly.
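to make the difference concrete, a minimal python sketch (variable names are made up for illustration) comparing the raw spelling of that path with the ordinary one:

```python
# raw string: backslashes are kept as-is
raw_path = r"c:\new\test.txt"

# ordinary string: \n becomes a newline and \t becomes a tab
cooked_path = "c:\new\test.txt"

# the raw version keeps both backslashes; the ordinary one has none left
raw_backslashes = raw_path.count("\\")
cooked_backslashes = cooked_path.count("\\")

# the ordinary version is shorter, since each two-character escape
# collapsed into a single control character
lengths = (len(raw_path), len(cooked_path))
```

nothing here involves the regexp engine at all; the raw prefix only changes how the reader turns the literal into a string.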
stephen?
ben