Ben Wing <ben@666.com> writes:
-- [1] at some point, use extent properties to track the language of a
text. this is well-recognized.
I'm a bit hazy on the concept of tracking "language". How is that
supposed to work, exactly? I mean, a word processor can do it because
it has a chance to save its markup when saving the document. Emacs
works, in most cases, with bare characters, or with charset (not
language) annotations, as is the case with coding cookies or with Gnus
processing MIME messages.
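Mechanically, tagging a stretch of text is easy enough with extents; a
rough sketch (the `language' property name is invented for
illustration, and it sidesteps exactly the part I'm hazy about, namely
where the annotation comes from and whether it survives a save):

    (defun tag-region-language (start end lang &optional buffer)
      "Mark the text from START to END in BUFFER as written in LANG."
      (let ((ext (make-extent start end buffer)))
        (set-extent-property ext 'language lang)
        ext))

    (defun language-at (pos &optional buffer)
      "Return the language tagged at POS, or nil if none."
      (let ((ext (extent-at pos buffer 'language)))
        (and ext (extent-property ext 'language))))

    ;; (tag-region-language 10 200 'french)
    ;; (language-at 50)   ; => french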
-- [4] the perl regexp \p syntax should be adopted for referencing
charsets. (char categories just suck.) for that matter, we should move
in the direction of being as perl-compatible as possible with our
regexps, since that is where the world is going. (cf java, python,
ruby, c#, ...)
It's true that the world is moving to Perl-compatible regexps. Note,
however, that everyone chooses a subset they like -- implementing the
whole thing is next to impossible. Also note that Perl itself is
moving *away* from its traditional regexp syntax: see Apocalypse 5.
the big problem here is \( and (, which are backwards. the only
reasonable solutions i can see are [a] a global variable to control
which kinds of regexps are used; [b] a double set of all functions
that take regexps. comments?
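To spell out the clash in current Emacs Lisp, where \( \) group and
bare parens are literal, while Perl reads the same patterns the other
way around:

    ;; Emacs syntax: \( \) is a group, ( ) match literal parentheses.
    (string-match "\\(ab\\)+" "xababy")   ; => 1, groups and repeats "ab"
    (string-match "(ab)+" "x(ab)y")       ; => 1, "(" and ")" are literal here

    ;; A Perl-compatible engine reads them the other way around:
    ;; (ab)+ groups, \(ab\) matches literal parentheses.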
The problem with [a] is that library functions can and do use regexps,
and setting the variable to something they don't expect will break
them. This is already the case with case-fold-search, but that one is
well-known to library authors. Introducing a new one would break huge
amounts of code.
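case-fold-search already shows the failure mode in miniature; a global
regexp-syntax switch would trigger it far more often:

    ;; A library function written against the default (case-insensitive)
    ;; setting of case-fold-search:
    (defun lib-find-keyword (string)
      (string-match "foo" string))

    (lib-find-keyword "FOO")          ; => 0 under the default settings
    (let ((case-fold-search nil))     ; a caller rebinds the global ...
      (lib-find-keyword "FOO"))       ; => nil, silently breaking the library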
I agree with Stephen that The Right Thing would be to expose "compiled
regexps" to Lisp. Python's "re" module provides an example of how
this can be done.
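Roughly the shape I have in mind, mimicked here in Lisp on top of the
existing primitives (the my-re-* names are invented; a real
implementation would compile the pattern in C, and the object could
also record which regexp syntax it was written in, making the global
variable unnecessary):

    ;; A compiled regexp as a first-class object that carries its own
    ;; options, in the spirit of Python's re.compile():
    (defun my-re-compile (pattern &optional case-fold)
      "Return an object pairing PATTERN with its matching options."
      (list 'my-compiled-re pattern case-fold))

    (defun my-re-match (re string)
      "Match the compiled regexp RE against STRING."
      (let ((case-fold-search (nth 2 re)))
        (string-match (nth 1 re) string)))

    ;; (my-re-match (my-re-compile "foo\\(bar\\)?" t) "FOOBAR")  ; => 0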