Hello!
I am proposing a patch to lisp/paragraphs.el to make the sentence-end
regexp more flexible.
The problem: Inline markup (as in XML) should not be considered part
of the sentence.
Example: when editing the document fragment
<para>One sentence. Another sentence.</para>
a user would see the following sentences:
<para>One sentence. Another sentence.</para>
      \----first--/ \----second-----/
Emacs, when moving by or killing sentences, sees these:
<para>One sentence. Another sentence.</para>
      \----first--/ \--------second--------/
Diagnosis: sentence-end consists of an end-marker and a
trailing-context part. These are not differentiated in the
definition. Instead, the function forward-sentence
searches for the entire regexp and then skips back over
whitespace.
Proposed fix: Mark the end-marker as the first subexpression of the
sentence-end regexp. After a successful search, jump to (match-end 1)
instead of skipping backwards over whitespace.
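The difference between the two strategies can be tried out in a
scratch buffer. What follows is only a minimal sketch, not the patch
itself: the helper name my-sentence-end-positions and the simplified
regexp are made up for illustration, and the regexp treats a closing
tag as trailing context so that it matches the example fragment.

```elisp
;; Hypothetical helper for illustration; not part of paragraphs.el.
(defun my-sentence-end-positions (text regexp)
  "Search TEXT once for REGEXP and return (OLD . NEW).
OLD is where point lands after skipping back over whitespace from the
end of the whole match (the current behaviour).  NEW is the end of the
first subexpression (the proposed behaviour)."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (let ((new (match-end 1)))
        ;; Point is at the end of the whole match here.
        (skip-chars-backward " \t\n")
        (cons (point) new)))))

;; The end-marker is group 1; the closing tag and the whitespace are
;; trailing context.
(my-sentence-end-positions
 "Another sentence.</para>\n"
 "\\([.?!]\\)\\(</[a-z]+>\\)*[ \t\n]*")
;; => (25 . 18)
```

With the old strategy point ends up after the closing tag (25); with
the first subexpression it ends up right after the period (18).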
Limitations:
* Always looks at the first subexpression. If preceding context is
to be considered, that may have to contain subexpressions. This
may be alleviated using shy grouping, but that is not very
portable.
* Probably breaks every sentence-end redefinition without the
possibility of detection. Only the rare case where the end-marker
happened to be the first group would survive.
* Addresses only sentence ends, not sentence beginnings.
* Does not fix/change other instances of whitespace skipping. These
skips always seem to use space, tab, and newline hard-coded in the
code, neglecting the syntax-table.
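The first limitation can be seen with a small experiment. This is a
sketch under assumptions: the helper my-first-group-end and both
regexps are invented for the illustration, and it needs an Emacs whose
regexp engine supports shy groups.

```elisp
;; Hypothetical helper for illustration; not part of paragraphs.el.
(defun my-first-group-end (regexp string)
  "Return the end of the first subexpression of REGEXP matched in STRING.
Return nil if REGEXP does not match."
  (save-match-data
    (and (string-match regexp string)
         (match-end 1))))

;; A plain group for the preceding context becomes group 1, so
;; (match-end 1) points in front of the period.
(my-first-group-end "\\([a-z]\\|)\\)\\([.?!]\\)" "end. Next")
;; => 3

;; With a shy group the punctuation is group 1 again.
(my-first-group-end "\\(?:[a-z]\\|)\\)\\([.?!]\\)" "end. Next")
;; => 4
```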
Alternatively, this might be turned into a separate
sgml-forward-sentence, since SGML is the language family most
prominently affected. But forward-sentence is used in several other
places (kill-sentence, for example), which would then have to be
overridden, too.
For completeness, here's the sentence-end regexp I use in xml-mode:
"\\([.?!]\\)[]\"')}]*\\($\\| $\\|\t\\|  \\|</[a-zA-Z:_-]*>\\)\\([ \t\n]\\|</[a-zA-Z:_-]*>\\)*"
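To see what such a regexp does with the example fragment, one can
collect the end of the first subexpression for every match. Again only
a sketch: my-collect-sentence-ends is a made-up name, the sample text
puts two spaces between the sentences, and the regexp is written on the
assumption that the alternative between the tab and the closing tag is
two spaces, as in the stock sentence-end.

```elisp
;; Hypothetical helper for illustration; not part of paragraphs.el.
(defun my-collect-sentence-ends (text regexp)
  "Return the buffer positions at which sentences in TEXT end.
A sentence end is the end of the first subexpression of each
successive match of REGEXP."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (ends)
      ;; Each search resumes after the previous whole match.
      (while (re-search-forward regexp nil t)
        (push (match-end 1) ends))
      (nreverse ends))))

(my-collect-sentence-ends
 "<para>One sentence.  Another sentence.</para>"
 "\\([.?!]\\)[]\"')}]*\\($\\| $\\|\t\\|  \\|</[a-zA-Z:_-]*>\\)\\([ \t\n]\\|</[a-zA-Z:_-]*>\\)*")
;; => (20 39)
```

Both positions lie directly after a period, so the closing tag no
longer counts as part of the second sentence.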
Comments?
diff -u -r1.1 paragraphs.el
--- paragraphs.el 2002/11/12 16:51:46 1.1
+++ paragraphs.el 2002/11/12 16:57:28
@@ -134,10 +134,16 @@
 ensures that the paragraph functions will work equally within a region of
 text indented by a margin setting.")
 
-(defconst sentence-end "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*" "\
+(defconst sentence-end "\\([.?!]\\)[]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*" "\
 *Regexp describing the end of a sentence.
 All paragraph boundaries also end sentences, regardless.
 
+Mark the actual end of the sentence as the first subexpression.
+Usually, this would be the sentence-ending punctuation. The remainder
+of the regexp then specifies required matching context. If you have
+to use subexpressions before the `sentence-end', use the shy grouping
+operator \\(?:...\\) in XEmacs.
+
 In order to be recognized as the end of a sentence, the ending period,
 question mark, or exclamation point must be followed by two spaces,
 unless it's inside some sort of quotes or parenthesis.")
@@ -352,8 +358,8 @@
               (end-of-paragraph-text))))))
 
 (defun forward-sentence (&optional arg)
-  "Move forward to next `sentence-end'. With argument, repeat.
-With negative argument, move backward repeatedly to `sentence-beginning'.
+  "Move forward to next `sentence-end'. With ARG, repeat.
+With negative ARG, move backward repeatedly to `sentence-beginning'.
 
 The variable `sentence-end' is a regular expression that matches ends of
 sentences. A paragraph boundary also terminates a sentence."
@@ -361,6 +367,9 @@
   (or arg (setq arg 1))
   (while (< arg 0)
     (let ((par-beg (save-excursion (start-of-paragraph-text) (point))))
+      ;; Not good: The concatenated string corresponds to the
+      ;; whitespace list at the end of the sentence-end regular
+      ;; expression.
       (if (re-search-backward (concat sentence-end "[^ \t\n]") par-beg t)
           (goto-char (1- (match-end 0)))
         (goto-char par-beg)))
@@ -368,7 +377,12 @@
   (while (> arg 0)
     (let ((par-end (save-excursion (end-of-paragraph-text) (point))))
       (if (re-search-forward sentence-end par-end t)
-          (skip-chars-backward " \t\n")
+          ;; If this happens to be used in a context where (a) no shy
+          ;; grouping is available and (b) there must be a group
+          ;; before the sentence-ending punctuation (as in "\(sentence
+          ;; type a\|sentence type b\)\([punct]\)"), the `1' would
+          ;; have to be replaced by something configurable.
+          (goto-char (match-end 1))
         (goto-char par-end)))
     (setq arg (1- arg))))
Best wishes,
--
Felix H. Gatzemeier fxg(a)i3.informatik.rwth-aachen.de
Office Phone: (0(049)241)80-21313
Disclaimer: I do not speak for anyone but myself.
Please do not send me mails containing documents in proprietary
formats (such as Microsoft Word) unless you really need to.