On 10/3/05, Ben Wing <ben@666.com> wrote:
-- [4] the perl regexp \p syntax should be adopted for referencing
charsets. (char categories just suck.) for that matter, we should move
in the direction of being as perl-compatible as possible with our
regexps, since that is where the world is going. (cf java, python, ruby,
c#, ...) the big problem here is \( and (, which are backwards.  the
only reasonable solutions i can see are [a] a global variable to control
which kinds of regexps are used; [b] a double set of all functions that
take regexps.  comments?

The other big problem is that encoding regexes as strings means backslash pain. I've long thought that one way to shift to PCRE would be to extend the reader with something like
#r/ re(g|u) lar ... /
where, inside the delimiters (/ and /, for example), it works mostly like Perl's quoting and regex rules. This could
   o     either give a new regex object type, which would mean that every function accepting a regex would need to switch on the argument type (a fair amount of work, probably mostly for 3rd-party libraries that already switch on argument type and assume that regular expressions are strings);
   o     or result in a string that is boiled down to regular elisp regex syntax. This might complicate using the added power of PCRE.
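
To make the second option a bit more concrete, here is a rough prototype of the "boil down" step, written in Python only because it is quick to try out; it is not reader code. The names pcre_to_elisp and elisp_string_literal are made up for illustration. It assumes the only thing being translated is the swapped escaping of ( ) | { }, copies bracket expressions through untouched, and does nothing about PCRE features such as \d that have no direct elisp spelling.

```python
# Characters that are special when bare in PCRE but special when
# backslashed in elisp regexps, so their escaping has to be flipped.
SWAPPED = set("(){}|")


def pcre_to_elisp(pattern: str) -> str:
    """Flip the escaping of grouping, alternation and interval operators."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            # \( is a literal paren in PCRE; the elisp literal is a bare (.
            # Any other escape (\w, \., ...) is passed through unchanged.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch in SWAPPED:
            # A bare ( is an operator in PCRE; the elisp operator is \(.
            out.append("\\" + ch)
            i += 1
        elif ch == "[":
            # Copy a bracket expression verbatim. A ] directly after
            # [ or [^ is a literal member, not the terminator.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            while j < n and pattern[j] != "]":
                j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def elisp_string_literal(regex: str) -> str:
    """Render a regex as it would have to be typed inside an elisp string."""
    return '"' + regex.replace("\\", "\\\\").replace('"', '\\"') + '"'
```

With this, re(g|u)lar comes out as re\(g\|u\)lar, and elisp_string_literal shows the doubled backslashes you would have to type today to get the same regex into a string.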
 
An added advantage of that sort of reader macro would be that it's in principle extendable:
#r:rx/.../ to use the rx library, or #r:pcre/ ... / for PCRE. That would be sort of ugly, but nicer than plain strings with no room for metadata.
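
As a sketch of how the dialect tag might be picked apart, here is a small Python prototype (again just for experimenting, not lread.c code). Everything in it is an assumption on my part: the function name read_regex_literal, the rule that \/ escapes the delimiter, the set of characters allowed in a dialect name, and the choice of pcre as the default when no tag is given.

```python
import re

# Hypothetical literal syntax: #r/BODY/ or #r:DIALECT/BODY/.
# Inside BODY, a backslash escapes the next character, so \/ does
# not end the literal.
LITERAL = re.compile(
    r"#r(?::(?P<dialect>[A-Za-z0-9_-]+))?/(?P<body>(?:\\.|[^/\\])*)/",
    re.DOTALL,
)


def read_regex_literal(text: str, default_dialect: str = "pcre"):
    """Parse one literal at the start of text.

    Returns (dialect, body, end), where end is the offset just past
    the closing delimiter.
    """
    m = LITERAL.match(text)
    if m is None:
        raise ValueError("no regex literal at start of input")
    dialect = m.group("dialect") or default_dialect
    # Only the delimiter escape is undone here; every other backslash
    # sequence is left for the dialect's own translator to interpret.
    body = m.group("body").replace("\\/", "/")
    return dialect, body, m.end()
```

The returned dialect name could then be used to look up a translator in a table, which is where the extensibility would come from.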

Of course, metadata on regex strings could always be done with string properties.

I've looked a little bit at what it'd take to add this sort of macro, but like many things in XEmacs's source, lread.c defeats me.