This is the development of
http://mid.gmane.org/18509.34907.513276.700290@parhasard.net . The patch as
shown in this mail is the output of hg outgoing -npM , which excludes
merges; I’ve attached a 120k compressed version of the patch with the
merges.
The speed of this implementation is not acceptable for interactive use. I
want to commit it soon despite that, because a) in its current version
interactive use does not involve the code in the patch and b) though I am
not particularly optimistic, it offers an opportunity for other people
besides me to move the #'query-coding-region implementations of the coding
systems into C. It certainly offers people the opportunity to examine the
architecture.
It includes some work on compatibility with GNU. I would like to add more,
but I look at comments like:
;; The text contains only ASCII characters. Any coding
;; systems are safe.
and I do not want to rush into implementing anything that excludes us
implementing EBCDIC or the GSM alphabet (which, for example, can’t encode
most of the control characters below #x20).
comparing with /Sources/xemacs-21.5-checked-out
searching for changes
changeset: 4559:e6a7054a9c30
tag: tip
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Sun Dec 28 22:51:14 2008 +0000
summary: Add check-coding-systems-region, test it and others, fix some bugs.
diff -r 80e0588fb42f -r e6a7054a9c30 lisp/ChangeLog
--- a/lisp/ChangeLog Sun Dec 28 14:55:02 2008 +0000
+++ b/lisp/ChangeLog Sun Dec 28 22:51:14 2008 +0000
@@ -1,3 +1,28 @@
+2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * coding.el (query-coding-region):
+ (query-coding-string):
+ Make these defsubsts, they're short enough and they're called
+ explicitly rarely enough that it makes some sense. The alternative
+ would be compiler macros that avoid the binding of the arguments.
+ (unencodable-char-position):
+ Document where the docstring and API are from.
+ Correct a special case for zero--check-argument-type returns nil
+ when it succeeds, we can't usefully chain its result in an and
+ here.
+ (check-coding-systems-region): New. API taken from GNU; docstring
+ and implementation are independent.
+ (encode-coding-char):
+ Add an optional third argument, as used by recent GNU. Document
+ the origen of the docstring.
+ (default-query-coding-region): Add a short docstring to the
+ non-Mule implementation of this function.
+ * unicode.el:
+ Don't set the query-coding-function property for unicode coding
+ systems if we're on non-mule. Unintern
+ unicode-query-coding-region, unicode-query-coding-skip-chars-arg
+ in the same context.
+
2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
* coding.el (default-query-coding-region):
diff -r 80e0588fb42f -r e6a7054a9c30 lisp/coding.el
--- a/lisp/coding.el Sun Dec 28 14:55:02 2008 +0000
+++ b/lisp/coding.el Sun Dec 28 22:51:14 2008 +0000
@@ -398,7 +398,7 @@
(values nil ranges)
(values t nil))))))
-(defun query-coding-region (start end coding-system &optional buffer
+(defsubst query-coding-region (start end coding-system &optional buffer
errorp highlight)
"Work out whether CODING-SYSTEM can losslessly encode a region.
@@ -423,7 +423,7 @@
#'default-query-coding-region)
start end coding-system buffer errorp highlight))
-(defun query-coding-string (string coding-system &optional errorp highlight)
+(defsubst query-coding-string (string coding-system &optional errorp highlight)
"Work out whether CODING-SYSTEM can losslessly encode STRING.
CODING-SYSTEM is the coding system to check.
@@ -446,6 +446,7 @@
;; ### Will highlight work here?
errorp highlight)))
+;; Function docstring and API are taken from GNU coding.c version 1.353, GPLv2.
(defun unencodable-char-position (start end coding-system
&optional count string)
"Return position of first un-encodable character in a region.
@@ -486,9 +487,9 @@
(check-argument-type #'integer-or-marker-p start)
(check-argument-type #'integer-or-marker-p end)
(check-coding-system coding-system)
- (and count (check-argument-type #'natnump count)
- ;; Special-case zero, sigh.
- (if (zerop count) (setq count 1)))
+ (when count (check-argument-type #'natnump count)
+ ;; Special-case zero, sigh.
+ (if (zerop count) (setq count 1)))
(and string (check-argument-type #'stringp string))
(if string
(with-temp-buffer
@@ -496,9 +497,64 @@
(funcall thunk start end coding-system count))
(funcall thunk start end coding-system count))))
-(defun encode-coding-char (char coding-system)
+;; XEmacs; this is a GPLv3 function in coding.c in GNU. This is why we have
+;; both a very divergent docstring and a very divergent implementation.
+(defun check-coding-systems-region (begin end coding-system-list)
+ "Can coding systems in CODING-SYSTEM-LIST encode text in a region?
+
+CODING-SYSTEM-LIST must be a list of coding systems. BEGIN and END are
+normally buffer positions delimiting the region. If some coding system in
+CODING-SYSTEM-LIST cannot encode the entire region, the return value of this
+function is an alist mapping coding system names to lists of individual
+buffer positions (not ranges) that the individual coding systems cannot
+encode.
+
+If all coding systems in CODING-SYSTEM-LIST can encode the region,
+this function returns t. This conflicts with the documented, but not
+with the observed, GNU behavior.
+
+If BEGIN is a string, `check-coding-systems-region' ignores END, and checks
+whether the coding systems can encode BEGIN. The alist that is returned
+uses zero-based string indices, not one-based buffer positions.
+
+This function is for GNU compatibility. See also `query-coding-region'."
+ (let ((thunk
+ #'(lambda (begin end coding-system-list stringp)
+ (loop
+ for coding-system in coding-system-list
+ with result = nil
+ with intermediate = nil
+ with range-lambda = (if stringp
+ #'(lambda (begin end value)
+ (while (< begin end)
+ (push (1- begin) intermediate)
+ (incf begin)))
+ #'(lambda (begin end value)
+ (while (< begin end)
+ (push begin intermediate)
+ (incf begin))))
+ do (setq coding-system (check-coding-system coding-system))
+ (multiple-value-bind (encoded ranges)
+ (query-coding-region begin end coding-system)
+ (unless encoded
+ (setq intermediate (list (coding-system-name coding-system)))
+ (map-range-table range-lambda ranges)
+ (push (nreverse intermediate) result)))
+ finally return (or result t)))))
+ (if (stringp begin)
+ (with-temp-buffer
+ (insert begin)
+ (funcall thunk (point-min) (point-max) coding-system-list t))
+ (check-argument-type #'integer-or-marker-p begin)
+ (check-argument-type #'integer-or-marker-p end)
+ (funcall thunk begin end coding-system-list nil))))
+
+;; XEmacs; docstring taken from GNU, international/mule-cmds.el, revision
+;; 1.311, GPLv2.
+(defun encode-coding-char (char coding-system &optional charset)
"Encode CHAR by CODING-SYSTEM and return the resulting string.
-If CODING-SYSTEM can't safely encode CHAR, return nil."
+If CODING-SYSTEM can't safely encode CHAR, return nil.
+The optional third argument CHARSET is, for the moment, ignored."
(check-argument-type #'characterp char)
(multiple-value-bind (succeededp)
(query-coding-string char coding-system)
@@ -509,7 +565,9 @@
;; If we're under non-Mule, every XEmacs character can be encoded
;; with every XEmacs coding system.
(fset #'default-query-coding-region
- #'(lambda (&rest ignored) (values t nil)))
+ #'(lambda (&rest ignored)
+ "Stub `query-coding-region' implementation. Always succeeds."
+ (values t nil)))
(unintern 'default-query-coding-region-safe-charset-skip-chars-map))
;;; coding.el ends here
diff -r 80e0588fb42f -r e6a7054a9c30 lisp/unicode.el
--- a/lisp/unicode.el Sun Dec 28 14:55:02 2008 +0000
+++ b/lisp/unicode.el Sun Dec 28 22:51:14 2008 +0000
@@ -678,6 +678,7 @@
(loop
for coding-system in (coding-system-list)
+ initially (unless (featurep 'mule) (return))
do (when (eq 'unicode (coding-system-type coding-system))
(coding-system-put coding-system 'query-coding-function
#'unicode-query-coding-region)))
@@ -691,7 +692,8 @@
(mapcar #'unintern
'(ccl-encode-to-ucs-2 unicode-error-default-translation-table
unicode-invalid-regexp-range frob-unicode-errors-region
- unicode-error-translate-region)))
+ unicode-error-translate-region unicode-query-coding-region
+ unicode-query-coding-skip-chars-arg)))
;; #### UTF-7 is not yet implemented, and it's tricky to do. There's
;; an implementation in appendix A.1 of the Unicode Standard, Version
diff -r 80e0588fb42f -r e6a7054a9c30 tests/ChangeLog
--- a/tests/ChangeLog Sun Dec 28 14:55:02 2008 +0000
+++ b/tests/ChangeLog Sun Dec 28 22:51:14 2008 +0000
@@ -1,3 +1,10 @@
+2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * automated/query-coding-tests.el:
+ Add tests for #'unencodable-char-position,
+ #'check-coding-systems-region, #'encode-coding-char. Remove some
+ debugging statements.
+
2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
* automated/query-coding-tests.el:
diff -r 80e0588fb42f -r e6a7054a9c30 tests/automated/query-coding-tests.el
--- a/tests/automated/query-coding-tests.el Sun Dec 28 14:55:02 2008 +0000
+++ b/tests/automated/query-coding-tests.el Sun Dec 28 22:51:14 2008 +0000
@@ -91,58 +91,31 @@
coding-system))
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-region (point-min) (point-max) coding-system)
- (q-c-debug "checking type, coding-system, q-c-s, q-c-t %S"
- (list (coding-system-type coding-system)
- coding-system query-coding-succeeded
- query-coding-table))
- (unless (and (eq t query-coding-succeeded)
- (null query-coding-table))
- (q-c-debug "(eq t query-coding-succeeded) %S, (\
-null query-coding-table) %S" (eq t query-coding-succeeded)
- (null query-coding-table)))
(Assert (eq t query-coding-succeeded))
(Assert (null query-coding-table)))
- (q-c-debug "testing the ASCII strings for %S" coding-system)
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-string ascii-chars-string coding-system)
- (unless (and (eq t query-coding-succeeded)
- (null query-coding-table))
- (q-c-debug "(eq t query-coding-succeeded) %S, (\
-null query-coding-table) %S" (eq t query-coding-succeeded)
- (null query-coding-table)))
(Assert (eq t query-coding-succeeded))
(Assert (null query-coding-table))))
- (q-c-debug "past the loop through the coding systems")
(delete-region (point-min) (point-max))
;; Check for success from the two Latin-1 coding systems
(insert latin-1-chars-string)
- (q-c-debug "point is now %S" (point))
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
(Assert (eq t query-coding-succeeded))
(Assert (null query-coding-table)))
- (q-c-debug "point is now %S" (point))
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-string (buffer-string) 'iso-8859-1-unix)
(Assert (eq t query-coding-succeeded))
(Assert (null query-coding-table)))
- (q-c-debug "point is now %S" (point))
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-string (buffer-string) 'iso-latin-1-with-esc-unix)
(Assert (eq t query-coding-succeeded))
(Assert (null query-coding-table)))
- (q-c-debug "point is now %S" (point))
;; Make it fail, check that it fails correctly
(insert (decode-char 'ucs #x20AC)) ;; EURO SIGN
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
- (unless (and (null query-coding-succeeded)
- (equal query-coding-table
- #s(range-table type start-closed-end-open data
- ((257 258) t))))
- (q-c-debug "dealing with %S" 'iso-8859-1-unix)
- (q-c-debug "query-coding-succeeded not null, query-coding-table \
-%S" query-coding-table))
(Assert (null query-coding-succeeded))
(Assert (equal query-coding-table
#s(range-table type start-closed-end-open data
@@ -153,12 +126,6 @@
;; Stupidly, this succeeds. The behaviour is compatible with
;; GNU, though, and we encourage people not to use
;; iso-latin-1-with-esc-unix anyway:
-
- (unless (and query-coding-succeeded
- (null query-coding-table))
- (q-c-debug "dealing with %S" 'iso-latin-1-with-esc-unix)
- (q-c-debug "query-coding-succeeded %S, query-coding-table \
-%S" query-coding-succeeded query-coding-table))
(Assert query-coding-succeeded)
(Assert (null query-coding-table)))
;; Check that it errors correctly.
@@ -186,13 +153,6 @@
(insert ?\x80)
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-region (point-min) (point-max) 'windows-1252-unix)
- (unless (and (null query-coding-succeeded)
- (equal query-coding-table
- #s(range-table type start-closed-end-open data
- ((257 258) t))))
- (q-c-debug "dealing with %S" 'windows-1252-unix)
- (q-c-debug "query-coding-succeeded not null, query-coding-table \
-%S" query-coding-table))
(Assert (null query-coding-succeeded))
(Assert (equal query-coding-table
#s(range-table type start-closed-end-open data
@@ -212,17 +172,6 @@
(Assert (null query-coding-table)))
(multiple-value-bind (query-coding-succeeded query-coding-table)
(query-coding-region (point-min) (point-max) 'windows-1252-unix)
- (unless (and (null query-coding-succeeded)
- (equal query-coding-table
- #s(range-table type start-closed-end-open
- data ((129 131) t (132 133) t
- (139 140) t (141 146) t
- (155 156) t (157 161) t
- (162 170) t (173 176) t
- (178 187) t (189 192) t
- (193 257) t))))
- (q-c-debug "query-coding-succeeded not null, query-coding-table \
-%S" query-coding-table))
(Assert (null query-coding-succeeded))
(Assert (equal query-coding-table
#s(range-table type start-closed-end-open
@@ -290,4 +239,68 @@
(query-coding-region (point-min) 173 coding-system nil t)
(text-conversion-error
(setq text-conversion-error-signalled t)))
- (Assert (null text-conversion-error-signalled))))))
+ (Assert (null text-conversion-error-signalled)))
+
+ ;; Now to test #'encode-coding-char. Most of the functionality was
+ ;; tested in the query-coding-region tests above, so we don't go into
+ ;; as much detail here.
+ (Assert (null (encode-coding-char
+ (decode-char 'ucs #x20ac) 'iso-8859-1)))
+ (Assert (equal "\x80" (encode-coding-char
+ (decode-char 'ucs #x20ac) 'windows-1252)))
+ (delete-region (point-min) (point-max))
+
+ ;; And #'unencodable-char-position.
+ (insert latin-1-chars-string)
+ (insert (decode-char 'ucs #x20ac))
+ (Assert (= 257 (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1)))
+ (Assert (equal '(257) (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1 1)))
+ ;; Compatiblity, sigh:
+ (Assert (equal '(257) (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1 0)))
+ (dotimes (i 6) (insert (decode-char 'ucs #x20ac)))
+ ;; Check if it stops at one:
+ (Assert (equal '(257) (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1 1)))
+ ;; Check if it stops at four:
+ (Assert (equal '(260 259 258 257)
+ (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1 4)))
+ ;; Check whether it stops at seven:
+ (Assert (equal '(263 262 261 260 259 258 257)
+ (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1 7)))
+ ;; Check that it still stops at seven:
+ (Assert (equal '(263 262 261 260 259 258 257)
+ (unencodable-char-position (point-min) (point-max)
+ 'iso-8859-1 2000)))
+ ;; Now, #'check-coding-systems-region.
+ ;; UTF-8 should certainly be able to encode these characters:
+ (Assert (eq t (check-coding-systems-region (point-min) (point-max)
+ '(utf-8))))
+ (Assert (equal '((iso-8859-1 257 258 259 260 261 262 263)
+ (windows-1252 129 131 132 133 134 135 136 137 138 139
+ 140 141 143 146 147 148 149 150 151 152
+ 153 154 155 156 157 159 160))
+ (sort
+ (check-coding-systems-region (point-min) (point-max)
+ '(utf-8 iso-8859-1
+ windows-1252))
+ ;; (The sort is to make the algorithm irrelevant.)
+ #'(lambda (left right)
+ (string< (car left) (car right))))))
+ ;; Ensure that the indices are all decreased by one when passed a
+ ;; string:
+ (Assert (equal '((iso-8859-1 256 257 258 259 260 261 262)
+ (windows-1252 128 130 131 132 133 134 135 136 137 138
+ 139 140 142 145 146 147 148 149 150 151
+ 152 153 154 155 156 158 159))
+ (sort
+ (check-coding-systems-region (buffer-string) nil
+ '(utf-8 iso-8859-1
+ windows-1252))
+ #'(lambda (left right)
+ (string< (car left) (car right)))))))))
+
changeset: 4513:1d74a1d115ee
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Sun Dec 28 14:46:24 2008 +0000
summary: Add #'query-coding-region tests; do the work necessary to get them
running.
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/ChangeLog
--- a/lisp/ChangeLog Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/ChangeLog Sun Dec 28 14:46:24 2008 +0000
@@ -1,3 +1,57 @@
+2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * coding.el (default-query-coding-region):
+ Declare using defun*, so we can #'return-from to it on
+ encountering a safe-charsets value of t. Comment out a few
+ debug messages.
+ (query-coding-region):
+ Correct the docstring, it deals with a region, not a string.
+ (unencodable-char-position):
+ Correct the implementation for non-nil COUNT, special-case a zero
+ value for count, treat it as one. Don't rely on dynamic scope when
+ calling the main lambda.
+ * unicode.el (unicode-query-coding-region):
+ Comment out some debug messages here.
+ * mule/mule-coding.el (8-bit-fixed-query-coding-region):
+ Comment out some debug messages here.
+
+ * code-init.el (raw-text):
+ Add a safe-charsets property to this coding system.
+ * mule/korean.el (iso-2022-int-1):
+ * mule/korean.el (euc-kr):
+ * mule/korean.el (iso-2022-kr):
+ Add safe-charsets properties for these coding systems.
+ * mule/japanese.el (iso-2022-jp):
+ * mule/japanese.el (jis7):
+ * mule/japanese.el (jis8):
+ * mule/japanese.el (shift-jis):
+ * mule/japanese.el (iso-2022-jp-1978-irv):
+ * mule/japanese.el (euc-jp):
+ Add safe-charsets properties for all these coding systems.
+ * mule/iso-with-esc.el:
+ Add safe-charsets properties to all the coding systems in
+ here. Comment on the downside of a safe-charsets value of t for
+ iso-latin-1-with-esc.
+ * mule/hebrew.el (ctext-hebrew):
+ Add a safe-charsets property for this coding system.
+ * mule/devanagari.el (in-is13194-devanagari):
+ Add a safe-charsets property for this coding system.
+ * mule/chinese.el (cn-gb-2312):
+ * mule/chinese.el (hz-gb-2312):
+ * mule/chinese.el (big5):
+ Add safe-charsets properties for these coding systems.
+ * mule/latin.el (iso-8859-14):
+ Add an implementation for this, using #'make-8-bit-coding-system.
+ * mule/mule-coding.el (ctext):
+ * mule/mule-coding.el (iso-2022-8bit-ss2):
+ * mule/mule-coding.el (iso-2022-7bit-ss2):
+ * mule/mule-coding.el (iso-2022-jp-2):
+ * mule/mule-coding.el (iso-2022-7bit):
+ * mule/mule-coding.el (iso-2022-8):
+ * mule/mule-coding.el (escape-quoted):
+ * mule/mule-coding.el (iso-2022-lock):
+ Add safe-charsets properties for all these coding systems.
+
2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
* mule/mule-coding.el (make-8-bit-coding-system):
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/code-init.el
--- a/lisp/code-init.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/code-init.el Sun Dec 28 14:46:24 2008 +0000
@@ -394,4 +394,6 @@
(reset-language-environment)
+(coding-system-put 'raw-text 'safe-charsets '(ascii control-1
latin-iso8859-1))
+
;;; code-init.el ends here
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/coding.el
--- a/lisp/coding.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/coding.el Sun Dec 28 14:46:24 2008 +0000
@@ -300,8 +300,8 @@
(extent-face extent))
(delete-extent extent))) buffer begin end))
-(defun default-query-coding-region (begin end coding-system
- &optional buffer errorp highlightp)
+(defun* default-query-coding-region (begin end coding-system
+ &optional buffer errorp highlightp)
"The default `query-coding-region' implementation.
Uses the `safe-charsets' and `safe-chars' coding system properties.
@@ -324,8 +324,11 @@
(gethash safe-charsets
default-query-coding-region-safe-charset-skip-chars-map))
(ranges (make-range-table))
- fail-range-start fail-range-end previous-fail char-after
+ fail-range-start fail-range-end char-after
looking-at-arg failed extent)
+ ;; Coding systems with a value of t for safe-charsets support everything.
+ (when (eq t safe-charsets)
+ (return-from default-query-coding-region (values t nil)))
(unless skip-chars-arg
(setq skip-chars-arg
(puthash safe-charsets
@@ -355,9 +358,9 @@
(goto-char begin buffer)
(skip-chars-forward skip-chars-arg end buffer)
(while (< (point buffer) end)
- (message
- "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
- fail-range-start previous-fail (point buffer) end)
+ ; (message
+ ; "fail-range-start is %S, point is %S, end is %S"
+ ; fail-range-start (point buffer) end)
(setq char-after (char-after (point buffer) buffer)
fail-range-start (point buffer))
(while (and
@@ -411,8 +414,8 @@
This function returns a list; the intention is that callers use
`multiple-value-bind' or the related CL multiple value functions to deal
-with it. The first element is `t' if the string can be encoded using
-CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string
+with it. The first element is `t' if the region can be encoded using
+CODING-SYSTEM, or `nil' if not. The second element is `nil' if the region
can be encoded using CODING-SYSTEM; otherwise, it is a range table
describing the positions of the unencodable characters. See
`make-range-table'."
@@ -456,33 +459,42 @@
If optional 5th argument STRING is non-nil, it is a string to search
for un-encodable characters. In that case, START and END are indexes
in the string."
- (flet ((thunk ()
- (multiple-value-bind (result ranges)
- (query-coding-region start end coding-system)
- (if result
- ;; If query-coding-region thinks the entire region is
- ;; encodable, result will be t, and the thunk should
- ;; return nil, because there are no unencodable
- ;; positions in the region.
- nil
- (if count
- (block counted
- (map-range-table
- #'(lambda (begin end value)
- (while (and (<= begin end) (<= begin count))
- (push begin result)
- (incf begin))
- (if (> begin count) (return-from counted)))
- ranges))
- (map-range-table
- #'(lambda (begin end value)
- (while (<= begin end)
- (push begin result)
- (incf begin))) ranges))
- result))))
+ (let ((thunk
+ #'(lambda (start end coding-system &optional count)
+ (multiple-value-bind (result ranges)
+ (query-coding-region start end coding-system)
+ (if result
+ nil
+ (block worked-it-all-out
+ (if count
+ (map-range-table
+ #'(lambda (begin end value)
+ (while (and (< begin end)
+ (< (length result) count))
+ (push begin result)
+ (incf begin))
+ (when (= (length result) count)
+ (return-from worked-it-all-out result)))
+ ranges)
+ (map-range-table
+ #'(lambda (begin end value)
+ (return-from worked-it-all-out begin))
+ ranges))
+ (assert (not (null count)) t
+ "We should never reach this point with null COUNT.")
+ result))))))
+ (check-argument-type #'integer-or-marker-p start)
+ (check-argument-type #'integer-or-marker-p end)
+ (check-coding-system coding-system)
+ (and count (check-argument-type #'natnump count)
+ ;; Special-case zero, sigh.
+ (if (zerop count) (setq count 1)))
+ (and string (check-argument-type #'stringp string))
(if string
- (with-temp-buffer (insert string) (thunk))
- (thunk))))
+ (with-temp-buffer
+ (insert string)
+ (funcall thunk start end coding-system count))
+ (funcall thunk start end coding-system count))))
(defun encode-coding-char (char coding-system)
"Encode CHAR by CODING-SYSTEM and return the resulting string.
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/chinese.el
--- a/lisp/mule/chinese.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/chinese.el Sun Dec 28 14:46:24 2008 +0000
@@ -157,6 +157,7 @@
charset-g1 chinese-gb2312
charset-g2 chinese-sisheng
charset-g3 t
+ safe-charsets (ascii chinese-gb2312 chinese-sisheng)
mnemonic "Zh-GB/EUC"
documentation
"Chinese EUC (Extended Unix Code), the standard Chinese encoding on Unix.
@@ -190,6 +191,7 @@
"Hz/ZW (Chinese)"
'(mnemonic "Zh-GB/Hz"
eol-type lf
+ safe-charsets (ascii chinese-gb2312)
post-read-conversion post-read-decode-hz
pre-write-conversion pre-write-encode-hz
documentation "Hz/ZW 7-bit encoding for Chinese GB2312 (MIME:HZ-GB-2312)"
@@ -259,6 +261,7 @@
'big5 'big5
"Big5"
'(mnemonic "Zh/Big5"
+ safe-charsets (ascii chinese-big5-1 chinese-big5-2)
documentation
"A non-modal encoding formed by five large Taiwanese companies
\(hence \"Big5\") to produce a character set and encoding for
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/devanagari.el
--- a/lisp/mule/devanagari.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/devanagari.el Sun Dec 28 14:46:24 2008 +0000
@@ -50,6 +50,7 @@
charset-g2 t
charset-g3 t
mnemonic "In-13194"
+ safe-charsets (ascii indian-is13194)
documentation
"8-bit encoding for ASCII (MSB=0) and IS13194-Devanagari (MSB=1)"
safe-charsets (ascii indian-is13194)
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/hebrew.el
--- a/lisp/mule/hebrew.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/hebrew.el Sun Dec 28 14:46:24 2008 +0000
@@ -92,6 +92,7 @@
charset-g1 hebrew-iso8859-8
charset-g2 t
charset-g3 t
+ safe-charsets (ascii hebrew-iso8859-8)
mnemonic "CText/Hbrw"
))
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/iso-with-esc.el
--- a/lisp/mule/iso-with-esc.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/iso-with-esc.el Sun Dec 28 14:46:24 2008 +0000
@@ -28,6 +28,10 @@
;;; Code:
+;; It is not particularly reasonable that iso-latin-1-with-esc has a
+;; value of t for the safe-charsets property. We discourage its use,
+;; though, and this behaviour is compatible with GNU.
+
;;;###autoload
(define-coding-system-alias 'iso-latin-1-with-esc 'iso-2022-8)
@@ -38,6 +42,7 @@
charset-g1 latin-iso8859-2
charset-g2 t
charset-g3 t
+ safe-charsets (ascii latin-iso8859-2)
mnemonic "MIME/Ltn-2"))
;;;###autoload
@@ -47,6 +52,7 @@
charset-g1 latin-iso8859-3
charset-g2 t
charset-g3 t
+ safe-charsets (ascii latin-iso8859-3)
mnemonic "MIME/Ltn-3"))
;;;###autoload
@@ -56,6 +62,7 @@
charset-g1 latin-iso8859-4
charset-g2 t
charset-g3 t
+ safe-charsets (ascii latin-iso8859-4)
mnemonic "MIME/Ltn-4"))
;;;###autoload
@@ -63,6 +70,7 @@
'iso-latin-9-with-esc 'iso2022
"ISO 4873 conforming 8-bit code (ASCII + Latin 9; aka Latin-1 with Euro)"
'(mnemonic "MIME/Ltn-9" ; bletch
+ safe-charsets (ascii latin-iso8859-15)
eol-type nil
charset-g0 ascii
charset-g1 latin-iso8859-15
@@ -76,6 +84,7 @@
charset-g1 latin-iso8859-9
charset-g2 t
charset-g3 t
+ safe-charsets (ascii latin-iso8859-9)
mnemonic "MIME/Ltn-5"))
;;;###autoload
@@ -86,6 +95,7 @@
charset-g1 cyrillic-iso8859-5
charset-g2 t
charset-g3 t
+ safe-charsets (ascii cyrillic-iso8859-5)
mnemonic "ISO8/Cyr"))
;;;###autoload
@@ -97,6 +107,7 @@
charset-g2 t
charset-g3 t
no-iso6429 t
+ safe-charsets (ascii hebrew-iso8859-8)
mnemonic "MIME/Hbrw"))
;;;###autoload
@@ -106,6 +117,7 @@
charset-g1 greek-iso8859-7
charset-g2 t
charset-g3 t
+ safe-charsets (ascii greek-iso8859-7)
mnemonic "Grk"))
;; ISO 8859-6 is such a useless character set that it seems a waste of
@@ -201,5 +213,6 @@
charset-g2 t
charset-g3 t
no-iso6429 t
+ safe-charsets (ascii arabic-iso8859-6)
mnemonic "MIME/Arbc"))
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/japanese.el
--- a/lisp/mule/japanese.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/japanese.el Sun Dec 28 14:46:24 2008 +0000
@@ -195,6 +195,8 @@
seven t
input-charset-conversion ((latin-jisx0201 ascii)
(japanese-jisx0208-1978 japanese-jisx0208))
+ safe-charsets (ascii japanese-jisx0208-1978 japanese-jisx0208
+ latin-jisx0201 japanese-jisx0212 katakana-jisx0201)
mnemonic "MULE/7bit"
documentation
"Coding system used for communication with mail and news in Japan."
@@ -210,6 +212,7 @@
lock-shift t
input-charset-conversion ((latin-jisx0201 ascii)
(japanese-jisx0208-1978 japanese-jisx0208))
+ safe-charsets (latin-jisx0201 ascii japanese-jisx0208-1978 japanese-jisx0208)
mnemonic "JIS7"
documentation
"Old JIS 7-bit encoding; mostly superseded by ISO-2022-JP.
@@ -224,6 +227,8 @@
short t
input-charset-conversion ((latin-jisx0201 ascii)
(japanese-jisx0208-1978 japanese-jisx0208))
+ safe-charsets (latin-jisx0201 ascii japanese-jisx0208-1978
+ japanese-jisx0208)
mnemonic "JIS8"
documentation
"Old JIS 8-bit encoding; mostly superseded by ISO-2022-JP.
@@ -261,6 +266,8 @@
"Shift-JIS"
'(mnemonic "Ja/SJIS"
documentation "The standard Japanese encoding in MS Windows."
+ safe-charsets (ascii japanese-jisx0208 japanese-jisx0208-1978
+ latin-jisx0201 katakana-jisx0201)
))
;; A former name?
@@ -286,6 +293,8 @@
seven t
output-charset-conversion ((ascii latin-jisx0201)
(japanese-jisx0208 japanese-jisx0208-1978))
+ safe-charsets (ascii latin-jisx0201 japanese-jisx0208
+ japanese-jisx0208-1978)
documentation
"This is a coding system used for old JIS terminals. It's an ISO
2022 based 7-bit encoding for Japanese JISX0208-1978 and JISX0201-Roman."
@@ -314,6 +323,7 @@
charset-g1 japanese-jisx0208
charset-g2 katakana-jisx0201
charset-g3 japanese-jisx0212
+ safe-charsets (ascii japanese-jisx0208 katakana-jisx0201 japanese-jisx0212)
short t
mnemonic "Ja/EUC"
documentation
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/korean.el
--- a/lisp/mule/korean.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/korean.el Sun Dec 28 14:46:24 2008 +0000
@@ -57,6 +57,7 @@
"ISO-2022-INT-1 (Korean)"
'(charset-g0 ascii
charset-g1 korean-ksc5601
+ safe-charsets (ascii korean-ksc5601)
short t
seven t
lock-shift t
@@ -92,6 +93,7 @@
'(charset-g0 ascii
charset-g1 korean-ksc5601
mnemonic "ko/EUC"
+ safe-charsets (ascii korean-ksc5601)
documentation
"Korean EUC (Extended Unix Code), the standard Korean encoding on Unix.
This follows the same overall EUC principles (see the description under
@@ -122,6 +124,7 @@
force-g1-on-output t
seven t
lock-shift t
+ safe-charsets (ascii korean-ksc5601)
mnemonic "Ko/7bit"
documentation "Coding-System used for communication with mail in Korea."
eol-type lf))
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/latin.el
--- a/lisp/mule/latin.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/latin.el Sun Dec 28 14:46:24 2008 +0000
@@ -631,6 +631,43 @@
(#xDD #xFD) ;; Y WITH ACUTE
(#xDE #xFE))) ;; Y WITH CIRCUMFLEX
+(make-8-bit-coding-system
+ 'iso-8859-14
+ '((#xA1 ?\u1E02) ;; LATIN CAPITAL LETTER B WITH DOT ABOVE
+ (#xA2 ?\u1E03) ;; LATIN SMALL LETTER B WITH DOT ABOVE
+ (#xA4 ?\u010A) ;; LATIN CAPITAL LETTER C WITH DOT ABOVE
+ (#xA5 ?\u010B) ;; LATIN SMALL LETTER C WITH DOT ABOVE
+ (#xA6 ?\u1E0A) ;; LATIN CAPITAL LETTER D WITH DOT ABOVE
+ (#xA8 ?\u1E80) ;; LATIN CAPITAL LETTER W WITH GRAVE
+ (#xAA ?\u1E82) ;; LATIN CAPITAL LETTER W WITH ACUTE
+ (#xAB ?\u1E0B) ;; LATIN SMALL LETTER D WITH DOT ABOVE
+ (#xAC ?\u1EF2) ;; LATIN CAPITAL LETTER Y WITH GRAVE
+ (#xAF ?\u0178) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS
+ (#xB0 ?\u1E1E) ;; LATIN CAPITAL LETTER F WITH DOT ABOVE
+ (#xB1 ?\u1E1F) ;; LATIN SMALL LETTER F WITH DOT ABOVE
+ (#xB2 ?\u0120) ;; LATIN CAPITAL LETTER G WITH DOT ABOVE
+ (#xB3 ?\u0121) ;; LATIN SMALL LETTER G WITH DOT ABOVE
+ (#xB4 ?\u1E40) ;; LATIN CAPITAL LETTER M WITH DOT ABOVE
+ (#xB5 ?\u1E41) ;; LATIN SMALL LETTER M WITH DOT ABOVE
+ (#xB7 ?\u1E56) ;; LATIN CAPITAL LETTER P WITH DOT ABOVE
+ (#xB8 ?\u1E81) ;; LATIN SMALL LETTER W WITH GRAVE
+ (#xB9 ?\u1E57) ;; LATIN SMALL LETTER P WITH DOT ABOVE
+ (#xBA ?\u1E83) ;; LATIN SMALL LETTER W WITH ACUTE
+ (#xBB ?\u1E60) ;; LATIN CAPITAL LETTER S WITH DOT ABOVE
+ (#xBC ?\u1EF3) ;; LATIN SMALL LETTER Y WITH GRAVE
+ (#xBD ?\u1E84) ;; LATIN CAPITAL LETTER W WITH DIAERESIS
+ (#xBE ?\u1E85) ;; LATIN SMALL LETTER W WITH DIAERESIS
+ (#xBF ?\u1E61) ;; LATIN SMALL LETTER S WITH DOT ABOVE
+ (#xD0 ?\u0174) ;; LATIN CAPITAL LETTER W WITH CIRCUMFLEX
+ (#xD7 ?\u1E6A) ;; LATIN CAPITAL LETTER T WITH DOT ABOVE
+ (#xDE ?\u0176) ;; LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
+ (#xF0 ?\u0175) ;; LATIN SMALL LETTER W WITH CIRCUMFLEX
+ (#xF7 ?\u1E6B) ;; LATIN SMALL LETTER T WITH DOT ABOVE
+ (#xFE ?\u0177)) ;; LATIN SMALL LETTER Y WITH CIRCUMFLEX
+ "ISO-8859-14 (Latin-8)"
+ '(mnemonic "Latin 8"
+ aliases (iso-latin-8 latin-8)))
+
;; The syntax table code for ISO 8859-15 and ISO 8859-16 requires that the
;; guillemets not have parenthesis syntax, which they used to have in the
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/mule-coding.el Sun Dec 28 14:46:24 2008 +0000
@@ -104,6 +104,7 @@
'(charset-g0 ascii
charset-g1 latin-iso8859-1
eol-type nil
+ safe-charsets t ;; Reasonable
mnemonic "CText"))
(make-coding-system
@@ -113,6 +114,9 @@
charset-g1 latin-iso8859-1
charset-g2 t ;; unspecified but can be used later.
short t
+ safe-charsets (ascii katakana-jisx0201 japanese-jisx0208-1978
+ japanese-jisx0208 japanese-jisx0212 japanese-jisx0213-1
+ japanese-jisx0213-2)
mnemonic "ISO8/SS"
documentation "ISO 2022 based 8-bit encoding using SS2 for 96-charset"
))
@@ -124,6 +128,7 @@
charset-g2 t ;; unspecified but can be used later.
seven t
short t
+ safe-charsets t
mnemonic "ISO7/SS"
documentation "ISO 2022 based 7-bit encoding using SS2 for 96-charset"
eol-type nil))
@@ -136,6 +141,7 @@
charset-g2 t ;; unspecified but can be used later.
seven t
short t
+ safe-charsets t
mnemonic "ISO7/SS"
eol-type nil))
@@ -145,6 +151,7 @@
'(charset-g0 ascii
seven t
short t
+ safe-charsets t
mnemonic "ISO7"
documentation "ISO-2022-based 7-bit encoding using only G0"
))
@@ -158,6 +165,7 @@
'(charset-g0 ascii
charset-g1 latin-iso8859-1
short t
+ safe-charsets t
mnemonic "ISO8"
documentation "ISO-2022 eight-bit coding system. No single-shift or
locking-shift."
))
@@ -169,6 +177,7 @@
charset-g1 latin-iso8859-1
eol-type lf
escape-quoted t
+ safe-charsets t
mnemonic "ESC/Quot"
documentation "ISO-2022 eight-bit coding system with escape quoting; used for
.ELC files."
))
@@ -180,6 +189,7 @@
charset-g1 t ;; unspecified but can be used later.
seven t
lock-shift t
+ safe-charsets t
mnemonic "ISO7/Lock"
documentation "ISO-2022 coding system using Locking-Shift for 96-charset."
))
@@ -574,14 +584,14 @@
(goto-char begin buffer)
(skip-chars-forward skip-chars-arg end buffer)
(while (< (point buffer) end)
- (message
- "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
- fail-range-start previous-fail (point buffer) end)
+ ; (message
+ ; "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+ ; fail-range-start previous-fail (point buffer) end)
(setq char-after (char-after (point buffer) buffer)
fail-range-start (point buffer))
- (message "arguments are %S %S"
- (< (point buffer) end)
- (not (gethash (encode-char char-after 'ucs) from-unicode)))
+ ; (message "arguments are %S %S"
+ ; (< (point buffer) end)
+ ; (not (gethash (encode-char char-after 'ucs) from-unicode)))
(while (and
(< (point buffer) end)
(not (gethash (encode-char char-after 'ucs) from-unicode)))
@@ -593,7 +603,7 @@
;; system; check the characters past it.
(forward-char 1 buffer)
;; The character actually failed.
- (message "past the move through, point now %S" (point buffer))
+ ; (message "past the move through, point now %S" (point buffer))
(when errorp
(error 'text-conversion-error
(format "Cannot encode %s using coding system"
@@ -608,12 +618,12 @@
(point-max buffer)))
t ranges)
(when highlightp
- (message "highlighting")
+ ; (message "highlighting")
(setq extent (make-extent fail-range-start fail-range-end buffer))
(set-extent-priority extent (+ mouse-highlight-priority 2))
(set-extent-face extent 'query-coding-warning-face))
(skip-chars-forward skip-chars-arg end buffer)))
- (message "about to give the result, ranges %S" ranges)
+ ; (message "about to give the result, ranges %S" ranges)
(if failed
(values nil ranges)
(values t nil)))))
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/thai-xtis.el
--- a/lisp/mule/thai-xtis.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/thai-xtis.el Sun Dec 28 14:46:24 2008 +0000
@@ -355,6 +355,7 @@
`(mnemonic "TIS620"
decode ccl-decode-thai-xtis
encode ccl-encode-thai-xtis
+ safe-charsets (ascii thai-xtis)
documentation "external=tis620, internal=thai-xtis"))
(coding-system-put 'tis-620 'category 'iso-8-1))
(make-coding-system
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/unicode.el
--- a/lisp/unicode.el Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/unicode.el Sun Dec 28 14:46:24 2008 +0000
@@ -626,7 +626,7 @@
(let* ((skip-chars-arg unicode-query-coding-skip-chars-arg)
(ranges (make-range-table))
(looking-at-arg (concat "[" skip-chars-arg "]"))
- fail-range-start fail-range-end previous-fail char-after failed
+ fail-range-start fail-range-end char-after failed
extent)
(save-excursion
(when highlightp
@@ -638,8 +638,8 @@
(skip-chars-forward skip-chars-arg end buffer)
(while (< (point buffer) end)
; (message
-; "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
-; fail-range-start previous-fail (point buffer) end)
+; "fail-range-start is %S, point is %S, end is %S"
+; fail-range-start (point buffer) end)
(setq char-after (char-after (point buffer) buffer)
fail-range-start (point buffer))
(while (and
@@ -647,7 +647,6 @@
(not (looking-at looking-at-arg))
(= -1 (char-to-unicode char-after)))
(forward-char 1 buffer)
- (message "what?!?")
(setq char-after (char-after (point buffer) buffer)
failed t))
(if (= fail-range-start (point buffer))
diff -r 84d618b355f5 -r 1d74a1d115ee src/ChangeLog
--- a/src/ChangeLog Sat Aug 09 13:15:09 2008 +0200
+++ b/src/ChangeLog Sun Dec 28 14:46:24 2008 +0000
@@ -1,3 +1,17 @@
+2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * file-coding.c (Fmake_coding_system):
+ Document our use of the safe-chars and safe-charsets properties,
+ and the differences compared to GNU.
+ (make_coding_system_1): Don't drop the safe-chars and
+ safe-charsets properties.
+ (Fcoding_system_property): Return the safe-chars and safe-charsets
+ properties when asked for them.
+ * file-coding.h (CODING_SYSTEM_SAFE_CHARSETS):
+ * coding-system-slots.h:
+ Make the safe-chars and safe-charsets slots available in these
+ headers.
+
2008-08-05 Aidan Kehoe <kehoea(a)parhasard.net>
* mule-charset.c (complex_vars_of_mule_charset):
diff -r 84d618b355f5 -r 1d74a1d115ee src/coding-system-slots.h
--- a/src/coding-system-slots.h Sat Aug 09 13:15:09 2008 +0200
+++ b/src/coding-system-slots.h Sun Dec 28 14:46:24 2008 +0000
@@ -105,6 +105,10 @@
coding system). */
MARKED_SLOT (canonical)
+ MARKED_SLOT (safe_charsets)
+
+ MARKED_SLOT (safe_chars)
+
#undef MARKED_SLOT
#undef MARKED_SLOT_ARRAY
#undef CODING_SYSTEM_SLOT_DECLARATION
diff -r 84d618b355f5 -r 1d74a1d115ee src/file-coding.c
--- a/src/file-coding.c Sat Aug 09 13:15:09 2008 +0200
+++ b/src/file-coding.c Sun Dec 28 14:46:24 2008 +0000
@@ -1125,9 +1125,9 @@
else if (EQ (key, Qtranslation_table_for_encode))
;
else if (EQ (key, Qsafe_chars))
- ;
+ CODING_SYSTEM_SAFE_CHARS (cs) = value;
else if (EQ (key, Qsafe_charsets))
- ;
+ CODING_SYSTEM_SAFE_CHARSETS (cs) = value;
else if (EQ (key, Qmime_charset))
;
else if (EQ (key, Qvalid_codes))
@@ -1326,20 +1326,7 @@
`translation-table-for-encode'
The value is a translation table to be applied on encoding. This is
not applicable to CCL-based coding systems.
-
-`safe-chars'
- The value is a char table. If a character has non-nil value in it,
- the character is safely supported by the coding system. This
- overrides the specification of safe-charsets.
-
-`safe-charsets'
- The value is a list of charsets safely supported by the coding
- system. The value t means that all charsets Emacs handles are
- supported. Even if some charset is not in this list, it doesn't
- mean that the charset can't be encoded in the coding system;
- it just means that some other receiver of text encoded
- in the coding system won't be able to handle that charset.
-
+
`mime-charset'
The value is a symbol of which name is `MIME-charset' parameter of
the coding system.
@@ -1350,7 +1337,27 @@
In the former case, the integer value is a valid byte code. In the
latter case, the integers specifies the range of valid byte codes.
-
+The following properties are used by `default-query-coding-region',
+the default implementation of `query-coding-region'. This
+implementation and these properties are not used by the Unicode coding
+systems, nor by those CCL coding systems created with
+`make-8-bit-coding-system'.
+
+`safe-chars'
+ The value is a char table. If a character has non-nil value in it,
+ the character is safely supported by the coding system.
+ Under XEmacs, for the moment, this is used in addition to the
+ `safe-charsets' property. It does not override it as it does
+ under GNU Emacs. #### We need to consider if we should keep this
+ behaviour.
+
+`safe-charsets'
+ The value is a list of charsets safely supported by the coding
+ system. For coding systems based on ISO 2022, XEmacs may try to
+ encode characters outside these character sets, but outside of
+ East Asia and East Asian coding systems, it is unlikely that
+ consumers of the data will understand XEmacs' encoding.
+ The value t means that all XEmacs character sets handles are supported.
The following additional property is recognized if TYPE is `convert-eol':
@@ -1862,6 +1869,10 @@
return XCODING_SYSTEM_POST_READ_CONVERSION (coding_system);
else if (EQ (prop, Qpre_write_conversion))
return XCODING_SYSTEM_PRE_WRITE_CONVERSION (coding_system);
+ else if (EQ (prop, Qsafe_charsets))
+ return XCODING_SYSTEM_SAFE_CHARSETS (coding_system);
+ else if (EQ (prop, Qsafe_chars))
+ return XCODING_SYSTEM_SAFE_CHARS (coding_system);
else
{
Lisp_Object value = CODESYSMETH_OR_GIVEN (XCODING_SYSTEM (coding_system),
diff -r 84d618b355f5 -r 1d74a1d115ee src/file-coding.h
--- a/src/file-coding.h Sat Aug 09 13:15:09 2008 +0200
+++ b/src/file-coding.h Sun Dec 28 14:46:24 2008 +0000
@@ -583,6 +583,8 @@
#define CODING_SYSTEM_AUTO_EOL_WRAPPER(codesys) ((codesys)->auto_eol_wrapper)
#define CODING_SYSTEM_SUBSIDIARY_PARENT(codesys) ((codesys)->subsidiary_parent)
#define CODING_SYSTEM_CANONICAL(codesys) ((codesys)->canonical)
+#define CODING_SYSTEM_SAFE_CHARSETS(codesys) ((codesys)->safe_charsets)
+#define CODING_SYSTEM_SAFE_CHARS(codesys) ((codesys)->safe_chars)
#define CODING_SYSTEM_CHAIN_CHAIN(codesys) \
(CODING_SYSTEM_TYPE_DATA (codesys, chain)->chain)
@@ -623,6 +625,10 @@
CODING_SYSTEM_SUBSIDIARY_PARENT (XCODING_SYSTEM (codesys))
#define XCODING_SYSTEM_CANONICAL(codesys) \
CODING_SYSTEM_CANONICAL (XCODING_SYSTEM (codesys))
+#define XCODING_SYSTEM_SAFE_CHARSETS(codesys) \
+ CODING_SYSTEM_SAFE_CHARSETS (XCODING_SYSTEM (codesys))
+#define XCODING_SYSTEM_SAFE_CHARS(codesys) \
+ CODING_SYSTEM_SAFE_CHARS (XCODING_SYSTEM (codesys))
#define XCODING_SYSTEM_CHAIN_CHAIN(codesys) \
CODING_SYSTEM_CHAIN_CHAIN (XCODING_SYSTEM (codesys))
diff -r 84d618b355f5 -r 1d74a1d115ee tests/ChangeLog
--- a/tests/ChangeLog Sat Aug 09 13:15:09 2008 +0200
+++ b/tests/ChangeLog Sun Dec 28 14:46:24 2008 +0000
@@ -1,3 +1,9 @@
+2008-12-28 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * automated/query-coding-tests.el:
+ New file, testing the functionality of #'query-coding-region and
+ #'query-coding-string.
+
2008-05-21 Aidan Kehoe <kehoea(a)parhasard.net>
* automated/mule-tests.el (featurep):
diff -r 84d618b355f5 -r 1d74a1d115ee tests/automated/query-coding-tests.el
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/automated/query-coding-tests.el Sun Dec 28 14:46:24 2008 +0000
@@ -0,0 +1,293 @@
+;; Copyright (C) 2008 Free Software Foundation, Inc. -*- coding: iso-8859-1 -*-
+
+;; Author: Aidan Kehoe <kehoea(a)parhasard.net>
+;; Maintainer: Aidan Kehoe <kehoea(a)parhasard.net>
+;; Created: 2008
+;; Keywords: tests, query-coding-region
+
+;; This file is part of XEmacs.
+
+;; XEmacs is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 2, or (at your option)
+;; any later version.
+
+;; XEmacs is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with XEmacs; see the file COPYING. If not, write to the Free
+;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+;; 02111-1307, USA.
+
+;;; Synched up with: Not in FSF.
+
+;;; Commentary:
+
+;; Test the query-coding-region and query-coding-string implementations for
+;; some well-known coding systems.
+
+(require 'bytecomp)
+
+(defun q-c-debug (&rest aerger)
+ (let ((standard-output (get-buffer-create "query-coding-debug"))
+ (fmt (condition-case nil
+ (and (stringp (first aerger))
+ (apply #'format aerger))
+ (error nil))))
+ (if fmt
+ (progn
+ (princ (apply #'format aerger))
+ (terpri))
+ (princ "--> ")
+ (let ((i 1))
+ (dolist (sgra aerger)
+ (if (> i 1) (princ " "))
+ (princ (format "%d. " i))
+ (prin1 sgra)
+ (incf i))
+ (terpri)))))
+
+;; Comment this out if debugging:
+(defalias 'q-c-debug #'ignore)
+
+(when (featurep 'mule)
+ (let ((ascii-chars-string (apply #'string
+ (loop for i from #x0 to #x7f
+ collect (int-to-char i))))
+ (latin-1-chars-string (apply #'string
+ (loop for i from #x0 to #xff
+ collect (int-to-char i))))
+ unix-coding-system text-conversion-error-signalled)
+ (with-temp-buffer
+ (insert ascii-chars-string)
+ ;; First, check all the coding systems that are ASCII-transparent for
+ ;; ASCII-transparency in the check.
+ (dolist (coding-system
+ (delete-duplicates
+ (mapcar #'(lambda (coding-system)
+ (unless (coding-system-alias-p coding-system)
+ ;; We're only interested in the version with
+ ;; Unix line endings right now.
+ (setq unix-coding-system
+ (subsidiary-coding-system
+ (coding-system-base coding-system) 'lf))
+ (when (and
+ ;; ASCII-transparent
+ (equal ascii-chars-string
+ (encode-coding-string
+ ascii-chars-string
+ unix-coding-system))
+ (not
+ (memq (coding-system-type
+ unix-coding-system)
+ '(undecided chain))))
+ unix-coding-system)))
+ (coding-system-list nil))
+:test #'eq))
+ (q-c-debug "looking at coding system %S" (coding-system-name
+ coding-system))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) coding-system)
+ (q-c-debug "checking type, coding-system, q-c-s, q-c-t %S"
+ (list (coding-system-type coding-system)
+ coding-system query-coding-succeeded
+ query-coding-table))
+ (unless (and (eq t query-coding-succeeded)
+ (null query-coding-table))
+ (q-c-debug "(eq t query-coding-succeeded) %S, (\
+null query-coding-table) %S" (eq t query-coding-succeeded)
+ (null query-coding-table)))
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (q-c-debug "testing the ASCII strings for %S" coding-system)
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-string ascii-chars-string coding-system)
+ (unless (and (eq t query-coding-succeeded)
+ (null query-coding-table))
+ (q-c-debug "(eq t query-coding-succeeded) %S, (\
+null query-coding-table) %S" (eq t query-coding-succeeded)
+ (null query-coding-table)))
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table))))
+ (q-c-debug "past the loop through the coding systems")
+ (delete-region (point-min) (point-max))
+ ;; Check for success from the two Latin-1 coding systems
+ (insert latin-1-chars-string)
+ (q-c-debug "point is now %S" (point))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (q-c-debug "point is now %S" (point))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-string (buffer-string) 'iso-8859-1-unix)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (q-c-debug "point is now %S" (point))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-string (buffer-string) 'iso-latin-1-with-esc-unix)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (q-c-debug "point is now %S" (point))
+ ;; Make it fail, check that it fails correctly
+ (insert (decode-char 'ucs #x20AC)) ;; EURO SIGN
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
+ (unless (and (null query-coding-succeeded)
+ (equal query-coding-table
+ #s(range-table type start-closed-end-open data
+ ((257 258) t))))
+ (q-c-debug "dealing with %S" 'iso-8859-1-unix)
+ (q-c-debug "query-coding-succeeded not null, query-coding-table \
+%S" query-coding-table))
+ (Assert (null query-coding-succeeded))
+ (Assert (equal query-coding-table
+ #s(range-table type start-closed-end-open data
+ ((257 258) t)))))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max)
+ 'iso-latin-1-with-esc-unix)
+ ;; Stupidly, this succeeds. The behaviour is compatible with
+ ;; GNU, though, and we encourage people not to use
+ ;; iso-latin-1-with-esc-unix anyway:
+
+ (unless (and query-coding-succeeded
+ (null query-coding-table))
+ (q-c-debug "dealing with %S" 'iso-latin-1-with-esc-unix)
+ (q-c-debug "query-coding-succeeded %S, query-coding-table \
+%S" query-coding-succeeded query-coding-table))
+ (Assert query-coding-succeeded)
+ (Assert (null query-coding-table)))
+ ;; Check that it errors correctly.
+ (setq text-conversion-error-signalled nil)
+ (condition-case nil
+ (query-coding-region (point-min) (point-max) 'iso-8859-1-unix nil t)
+ (text-conversion-error
+ (setq text-conversion-error-signalled t)))
+ (Assert text-conversion-error-signalled)
+ (setq text-conversion-error-signalled nil)
+ (condition-case nil
+ (query-coding-region (point-min) (point-max)
+ 'iso-latin-1-with-esc-unix nil t)
+ (text-conversion-error
+ (setq text-conversion-error-signalled t)))
+ (Assert (null text-conversion-error-signalled))
+ (delete-region (point-min) (point-max))
+ (insert latin-1-chars-string)
+ (decode-coding-region (point-min) (point-max) 'windows-1252-unix)
+ (goto-char (point-max)) ;; #'decode-coding-region just messed up point.
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'windows-1252-unix)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (insert ?\x80)
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'windows-1252-unix)
+ (unless (and (null query-coding-succeeded)
+ (equal query-coding-table
+ #s(range-table type start-closed-end-open data
+ ((257 258) t))))
+ (q-c-debug "dealing with %S" 'windows-1252-unix)
+ (q-c-debug "query-coding-succeeded not null, query-coding-table \
+%S" query-coding-table))
+ (Assert (null query-coding-succeeded))
+ (Assert (equal query-coding-table
+ #s(range-table type start-closed-end-open data
+ ((257 258) t)))))
+ ;; Try a similar approach with koi8-o, the koi8 variant with
+ ;; support for Old Church Slavonic.
+ (delete-region (point-min) (point-max))
+ (insert latin-1-chars-string)
+ (decode-coding-region (point-min) (point-max) 'koi8-o-unix)
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'koi8-o-unix)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'escape-quoted)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'windows-1252-unix)
+ (unless (and (null query-coding-succeeded)
+ (equal query-coding-table
+ #s(range-table type start-closed-end-open
+ data ((129 131) t (132 133) t
+ (139 140) t (141 146) t
+ (155 156) t (157 161) t
+ (162 170) t (173 176) t
+ (178 187) t (189 192) t
+ (193 257) t))))
+ (q-c-debug "query-coding-succeeded not null, query-coding-table \
+%S" query-coding-table))
+ (Assert (null query-coding-succeeded))
+ (Assert (equal query-coding-table
+ #s(range-table type start-closed-end-open
+ data ((129 131) t (132 133) t (139 140) t
+ (141 146) t (155 156) t (157 161) t
+ (162 170) t (173 176) t (178 187) t
+ (189 192) t (193 257) t)))))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) 'koi8-r-unix)
+ (Assert (null query-coding-succeeded))
+ (Assert (equal query-coding-table
+ #s(range-table type start-closed-end-open
+ data ((129 154) t (155 161) t (162 164) t
+ (165 177) t (178 180) t
+ (181 192) t)))))
+ ;; Check that the Unicode coding systems handle characters
+ ;; without Unicode mappings.
+ (delete-region (point-min) (point-max))
+ (insert latin-1-chars-string)
+ (decode-coding-region (point-min) (point-max) 'greek-iso-8bit-with-esc)
+ (dolist (coding-system
+ '(utf-16-mac ucs-4-mac utf-16-little-endian-bom-dos ucs-4-dos
+ utf-16-little-endian-mac utf-16-bom-unix
+ utf-16-little-endian ucs-4 utf-16-dos
+ ucs-4-little-endian-dos utf-16-bom-mac utf-16-bom
+ utf-16-unix utf-32-unix utf-32-little-endian
+ utf-32-dos utf-32 utf-32-little-endian-dos utf-8-bom
+ utf-16-bom-dos ucs-4-unix
+ utf-16-little-endian-bom-unix utf-8-bom-mac
+ utf-32-little-endian-unix utf-16
+ utf-16-little-endian-dos utf-16-little-endian-bom-mac
+ utf-8-bom-dos ucs-4-little-endian-mac utf-8-bom-unix
+ utf-32-little-endian-mac utf-8-dos utf-8-unix
+ utf-32-mac utf-8-mac utf-16-little-endian-unix
+ ucs-4-little-endian ucs-4-little-endian-unix utf-8
+ utf-16-little-endian-bom))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) (point-max) coding-system)
+ (Assert (null query-coding-succeeded))
+ (Assert (equal query-coding-table
+ #s(range-table type start-closed-end-open data
+ ((173 174) t (209 210) t
+ (254 255) t)))))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region (point-min) 173 coding-system)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region 174 209 coding-system)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ (multiple-value-bind (query-coding-succeeded query-coding-table)
+ (query-coding-region 210 254 coding-system)
+ (Assert (eq t query-coding-succeeded))
+ (Assert (null query-coding-table)))
+ ;; Check that it errors correctly.
+ (setq text-conversion-error-signalled nil)
+ (condition-case nil
+ (query-coding-region (point-min) (point-max) coding-system nil t)
+ (text-conversion-error
+ (setq text-conversion-error-signalled t)))
+ (Assert text-conversion-error-signalled)
+ (setq text-conversion-error-signalled nil)
+ (condition-case nil
+ (query-coding-region (point-min) 173 coding-system nil t)
+ (text-conversion-error
+ (setq text-conversion-error-signalled t)))
+ (Assert (null text-conversion-error-signalled))))))
changeset: 4512:84d618b355f5
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Sat Aug 09 13:15:09 2008 +0200
summary: 2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
diff -r 26aae3bacf99 -r 84d618b355f5 lisp/ChangeLog
--- a/lisp/ChangeLog Sat Aug 09 13:11:06 2008 +0200
+++ b/lisp/ChangeLog Sat Aug 09 13:15:09 2008 +0200
@@ -1,3 +1,9 @@
+2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * mule/mule-coding.el (make-8-bit-coding-system):
+ * mule/general-late.el (posix-charset-to-coding-system-hash):
+ Use #'skip-chars-quote as appropriate.
+
2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
* subr.el (skip-chars-quote): New.
@@ -6,8 +12,6 @@
#'skip-chars-backward.
2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
-
- * subr.el (skip-chars-quote): New.
* mule/cyril-util.el: Remove. Use the version in packages instead.
diff -r 26aae3bacf99 -r 84d618b355f5 lisp/mule/general-late.el
--- a/lisp/mule/general-late.el Sat Aug 09 13:11:06 2008 +0200
+++ b/lisp/mule/general-late.el Sat Aug 09 13:15:09 2008 +0200
@@ -90,7 +90,7 @@
(setq skip-chars-string
(concat skip-chars-string
(charset-skip-chars-string charset))))
- finally return skip-chars-string))))
+ finally return (skip-chars-quote skip-chars-string)))))
;; At this point in the dump, all the charsets have been loaded. Now, load
;; their Unicode mappings.
diff -r 26aae3bacf99 -r 84d618b355f5 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el Sat Aug 09 13:11:06 2008 +0200
+++ b/lisp/mule/mule-coding.el Sat Aug 09 13:15:09 2008 +0200
@@ -699,7 +699,8 @@
(coding-system-put name 'category
(make-8-bit-choose-category decode-table))
(coding-system-put name '8-bit-fixed-query-skip-chars
- (apply #'string (append decode-table nil)))
+ (skip-chars-quote
+ (apply #'string (append decode-table nil))))
(coding-system-put name '8-bit-fixed-query-from-unicode encode-table)
(coding-system-put name 'query-coding-function
@@ -786,7 +787,8 @@
(coding-system-put ',name 'category
',(make-8-bit-choose-category decode-table))
(coding-system-put ',name '8-bit-fixed-query-skip-chars
- ',(apply #'string (append decode-table nil)))
+ ',(skip-chars-quote
+ (apply #'string (append decode-table nil))))
(coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table)
(coding-system-put ',name 'query-coding-function
#'8-bit-fixed-query-coding-region)
changeset: 4510:31344162cf9a
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Sat Aug 09 13:06:24 2008 +0200
summary: Add #'skip-chars-quote to subr.el
diff -r 89406c31b125 -r 31344162cf9a lisp/ChangeLog
--- a/lisp/ChangeLog Sat Aug 09 12:13:19 2008 +0200
+++ b/lisp/ChangeLog Sat Aug 09 13:06:24 2008 +0200
@@ -1,4 +1,13 @@
2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * subr.el (skip-chars-quote): New.
+ Given STRING, return a string that means that all characters in
+ STRING will be skipped when passed to #'skip-chars-forward,
+ #'skip-chars-backward.
+
+2008-08-09 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * subr.el (skip-chars-quote): New.
* mule/cyril-util.el: Remove. Use the version in packages instead.
diff -r 89406c31b125 -r 31344162cf9a lisp/subr.el
--- a/lisp/subr.el Sat Aug 09 12:13:19 2008 +0200
+++ b/lisp/subr.el Sat Aug 09 13:06:24 2008 +0200
@@ -1722,4 +1722,25 @@
;; define-mail-user-agent is in simple.el.
+;; XEmacs; added.
+(defun skip-chars-quote (string)
+ "Return a string that means all characters in STRING will be skipped,
+if passed to `skip-chars-forward' or `skip-chars-backward'.
+
+Ranges and carets are not treated specially. This implementation is
+in Lisp; do not use it in performance-critical code."
+ (let ((list (delete-duplicates (string-to-list string) :test #'=)))
+ (when (equal list '((?- ?\[) (?\[ ?\-)))
+ (error 'invalid-argument
+ "Cannot create `skip-chars-forward' arg from string"
+ string))
+ (when (memq ?\] list)
+ (setq list (cons ?\] (delq ?\] list))))
+ (when (eq ?^ (car list))
+ (setq list (nconc (cdr list) '(?^))))
+ (when (memq ?- list)
+ (setq list (delq ?- list)
+ list (nconc list (list (second list) ?- (second list) ?-))))
+ (apply #'string list)))
+
;;; subr.el ends here
changeset: 4478:bd1a68c34d44
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Wed May 21 21:49:19 2008 +0200
summary: Merge my change of 2008-05-14 to the query-coding-region code.
diff -r d9fcb5442c95 -r bd1a68c34d44 lisp/ChangeLog
--- a/lisp/ChangeLog Wed May 21 21:47:42 2008 +0200
+++ b/lisp/ChangeLog Wed May 21 21:49:19 2008 +0200
@@ -1,3 +1,8 @@
+2008-05-21 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * mule/mule-coding.el (make-8-bit-choose-category):
+ Merge my change of 2008-05-14 to the query-coding-region code.
+
2008-05-14 Stephen J. Turnbull <stephen(a)xemacs.org>
* subr.el (add-to-list): Fix Aidan's last commit.
diff -r d9fcb5442c95 -r bd1a68c34d44 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el Wed May 21 21:47:42 2008 +0200
+++ b/lisp/mule/mule-coding.el Wed May 21 21:49:19 2008 +0200
@@ -531,7 +531,7 @@
(check-argument-range (length decode-table) #x100 #x100)
(loop
named category
- for i from #x80 to #xBF
+ for i from #x80 to #x9F
do (unless (= i (aref decode-table i))
(return-from category 'no-conversion))
finally return 'iso-8-1))
changeset: 4453:20c32e489235
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Sun May 11 19:50:10 2008 +0200
summary: Add #'query-coding-clear-highlights.
diff -r 4953b7353349 -r 20c32e489235 lisp/ChangeLog
--- a/lisp/ChangeLog Sat May 03 13:09:06 2008 +0200
+++ b/lisp/ChangeLog Sun May 11 19:50:10 2008 +0200
@@ -1,3 +1,10 @@
+2008-05-11 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * coding.el (query-coding-clear-highlights):
+ New function--clear any face information added by
+ `query-coding-region'.
+ (default-query-coding-region): Use it.
+
2008-04-13 Henry S. Thompson <ht(a)inf.ed.ac.uk>, Mike Sperber
<mike(a)xemacs.org>
* window-xemacs.el (save-window-excursion/mapping,
diff -r 4953b7353349 -r 20c32e489235 lisp/coding.el
--- a/lisp/coding.el Sat May 03 13:09:06 2008 +0200
+++ b/lisp/coding.el Sun May 11 19:50:10 2008 +0200
@@ -286,6 +286,20 @@
#s(hash-table test equal data ())
"A map from list of charsets to `skip-chars-forward' arguments for
them.")
+(defsubst query-coding-clear-highlights (begin end &optional buffer)
+ "Remove extent faces added by `query-coding-region' between BEGIN and END.
+
+Optional argument BUFFER is the buffer to use, and defaults to the current
+buffer.
+
+The HIGHLIGHTP argument to `query-coding-region' indicates that it should
+display unencodable characters using `query-coding-warning-face'. After
+this function has been called, this will no longer be the case. "
+ (map-extents #'(lambda (extent ignored-arg)
+ (when (eq 'query-coding-warning-face
+ (extent-face extent))
+ (delete-extent extent))) buffer begin end))
+
(defun default-query-coding-region (begin end coding-system
&optional buffer errorp highlightp)
"The default `query-coding-region' implementation.
@@ -319,10 +333,7 @@
safe-charsets "")
default-query-coding-region-safe-charset-skip-chars-map)))
(when highlightp
- (map-extents #'(lambda (extent ignored-arg)
- (when (eq 'query-coding-warning-face
- (extent-face extent))
- (delete-extent extent))) buffer begin end))
+ (query-coding-clear-highlights begin end buffer))
(if (and (zerop (length skip-chars-arg)) (null safe-chars))
(progn
;; Uh-oh, nothing known about this coding system. Fail.
@@ -384,7 +395,7 @@
(values nil ranges)
(values t nil))))))
-(defsubst query-coding-region (start end coding-system &optional buffer
+(defun query-coding-region (start end coding-system &optional buffer
errorp highlight)
"Work out whether CODING-SYSTEM can losslessly encode a region.
changeset: 4443:75654496fa0e
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Sat May 03 13:08:54 2008 +0200
summary: Correct a docstring
diff -r 9c1cfceab252 -r 75654496fa0e lisp/coding.el
--- a/lisp/coding.el Thu Mar 13 10:24:34 2008 +0100
+++ b/lisp/coding.el Sat May 03 13:08:54 2008 +0200
@@ -398,7 +398,7 @@
Optional argument HIGHLIGHT says to display unencodable characters in the
region using `query-coding-warning-face'. It defaults to nil.
-This function returns a list; the intention is that callers use use
+This function returns a list; the intention is that callers use
`multiple-value-bind' or the related CL multiple value functions to deal
with it. The first element is `t' if the string can be encoded using
CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string
changeset: 4413:6812571bfcb9
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Thu Mar 13 10:21:01 2008 +0100
summary: Fix some bugs.
diff -r 1217f19ce196 -r 6812571bfcb9 lisp/coding.el
--- a/lisp/coding.el Mon Jan 21 22:54:43 2008 +0100
+++ b/lisp/coding.el Thu Mar 13 10:21:01 2008 +0100
@@ -299,8 +299,13 @@
(check-argument-type #'integer-or-marker-p begin)
(check-argument-type #'integer-or-marker-p end)
(let* ((safe-charsets
- (coding-system-get coding-system 'safe-charsets))
- (safe-chars (coding-system-get coding-system 'safe-chars))
+ (or (coding-system-get coding-system 'safe-charsets)
+ (coding-system-get (coding-system-base coding-system)
+ 'safe-charsets)))
+ (safe-chars
+ (or (coding-system-get coding-system 'safe-chars)
+ (coding-system-get (coding-system-base coding-system)
+ 'safe-chars)))
(skip-chars-arg
(gethash safe-charsets
default-query-coding-region-safe-charset-skip-chars-map))
@@ -313,6 +318,11 @@
(mapconcat #'charset-skip-chars-string
safe-charsets "")
default-query-coding-region-safe-charset-skip-chars-map)))
+ (when highlightp
+ (map-extents #'(lambda (extent ignored-arg)
+ (when (eq 'query-coding-warning-face
+ (extent-face extent))
+ (delete-extent extent))) buffer begin end))
(if (and (zerop (length skip-chars-arg)) (null safe-chars))
(progn
;; Uh-oh, nothing known about this coding system. Fail.
diff -r 1217f19ce196 -r 6812571bfcb9 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el Mon Jan 21 22:54:43 2008 +0100
+++ b/lisp/mule/mule-coding.el Thu Mar 13 10:21:01 2008 +0100
@@ -553,15 +553,24 @@
(check-argument-type #'integer-or-marker-p begin)
(check-argument-type #'integer-or-marker-p end)
(let ((from-unicode
- (coding-system-get coding-system '8-bit-fixed-query-from-unicode))
+ (or (coding-system-get coding-system '8-bit-fixed-query-from-unicode)
+ (coding-system-get (coding-system-base coding-system)
+ '8-bit-fixed-query-from-unicode)))
(skip-chars-arg
- (coding-system-get coding-system '8-bit-fixed-query-skip-chars))
+ (or (coding-system-get coding-system '8-bit-fixed-query-skip-chars)
+ (coding-system-get (coding-system-base coding-system)
+ '8-bit-fixed-query-skip-chars)))
(ranges (make-range-table))
char-after fail-range-start fail-range-end previous-fail extent
failed)
(check-type from-unicode hash-table)
(check-type skip-chars-arg string)
(save-excursion
+ (when highlightp
+ (map-extents #'(lambda (extent ignored-arg)
+ (when (eq 'query-coding-warning-face
+ (extent-face extent))
+ (delete-extent extent))) buffer begin end))
(goto-char begin buffer)
(skip-chars-forward skip-chars-arg end buffer)
(while (< (point buffer) end)
@@ -588,7 +597,7 @@
(when errorp
(error 'text-conversion-error
(format "Cannot encode %s using coding system"
- (buffer-substring fail-range-start (point buffeR)
+ (buffer-substring fail-range-start (point buffer)
buffer))
(coding-system-name coding-system)))
(put-range-table fail-range-start
@@ -603,8 +612,8 @@
(setq extent (make-extent fail-range-start fail-range-end buffer))
(set-extent-priority extent (+ mouse-highlight-priority 2))
(set-extent-face extent 'query-coding-warning-face))
- (skip-chars-forward skip-chars-arg end buffer))
- (message "about to give the result, ranges %S" ranges))
+ (skip-chars-forward skip-chars-arg end buffer)))
+ (message "about to give the result, ranges %S" ranges)
(if failed
(values nil ranges)
(values t nil)))))
diff -r 1217f19ce196 -r 6812571bfcb9 lisp/unicode.el
--- a/lisp/unicode.el Mon Jan 21 22:54:43 2008 +0100
+++ b/lisp/unicode.el Thu Mar 13 10:21:01 2008 +0100
@@ -624,15 +624,20 @@
(let* ((skip-chars-arg unicode-query-coding-skip-chars-arg)
(ranges (make-range-table))
(looking-at-arg (concat "[" skip-chars-arg "]"))
- fail-range-start fail-range-end previous-fail char-after
- failed extent)
+ fail-range-start fail-range-end previous-fail char-after failed
+ extent)
(save-excursion
+ (when highlightp
+ (map-extents #'(lambda (extent ignored-arg)
+ (when (eq 'query-coding-warning-face
+ (extent-face extent))
+ (delete-extent extent))) buffer begin end))
(goto-char begin buffer)
(skip-chars-forward skip-chars-arg end buffer)
(while (< (point buffer) end)
- (message
- "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
- fail-range-start previous-fail (point buffer) end)
+; (message
+; "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+; fail-range-start previous-fail (point buffer) end)
(setq char-after (char-after (point buffer) buffer)
fail-range-start (point buffer))
(while (and
@@ -646,7 +651,7 @@
(if (= fail-range-start (point buffer))
;; The character can actually be encoded by the coding
;; system; check the characters past it.
- (forward-char 1 buffer)
+ (forward-char 1 buffer)
;; Can't be encoded; note this.
(when errorp
(error 'text-conversion-error
changeset: 4403:68d1ca56cffa
user: Aidan Kehoe <kehoea(a)parhasard.net>
date: Mon Jan 21 22:51:21 2008 +0100
summary: First part of interactive checks that coding systems encode regions.
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/ChangeLog
--- a/lisp/ChangeLog Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/ChangeLog Mon Jan 21 22:51:21 2008 +0100
@@ -1,3 +1,50 @@
+2008-01-21 Aidan Kehoe <kehoea(a)parhasard.net>
+
+ * coding.el (decode-coding-string):
+ (encode-coding-string): Accept GNU's NOCOPY argument for
+ these. Todo; write compiler macros to use it.
+ (query-coding-warning-face): New face, to show unencodable
+ characters.
+ (default-query-coding-region-safe-charset-skip-chars-map):
+ New variable, a cache used by #'default-query-coding-region.
+ (default-query-coding-region): Default implementation of
+ #'query-coding-region, using the safe-charsets and safe-chars
+ coding systemproperties.
+ (query-coding-region): New function; can a given coding system
+ encode a given region?
+ (query-coding-string): New function; can a given coding system
+ encode a given string?
+ (unencodable-char-position): Function API taken from GNU; return
+ the first unencodable position given a string and coding system.
+ (encode-coding-char): Function API taken from GNU; return CHAR
+ encoded using CODING-SYSTEM, or nil if CODING-SYSTEM would trash
+ CHAR.
+ ((unless (featurep 'mule)): Override the default
+ query-coding-region implementation on non-Mule.
+ * mule/mule-coding.el (make-8-bit-generate-helper): Eliminate a
+ duplicate comment.
+ (make-8-bit-choose-category): Simplify implementation.
+ (8-bit-fixed-query-coding-region): Implementation of
+ #'query-coding-region for coding systems created with
+ #'make-8-bit-coding-system.
+ (make-8-bit-coding-system): Initialise the #'query-coding-region
+ implementation for these character sets.
+ (make-8-bit-coding-system): Ditto for the compiler macro version
+ of this function.
+ * unicode.el (unicode-query-coding-skip-chars-arg): New variable,
+ used by unicode-query-coding-region, initialised in
+ mule/general-late.el.
+ (unicode-query-coding-region): New function, the
+ #'query-coding-region implementation for Unicode coding systems.
+ Initialise the query-coding-function property for the Unicode
+ coding systems to #'unicode-query-coding-region.
+ * mule/mule-charset.el (charset-skip-chars-string): New
+ function. Return a #'skip-chars-forward argument that skips all
+ characters in CHARSET.
+ (map-charset-chars): Function synced from GNU, modified to work
+ with XEmacs. Map FUNC across the int value charset ranges of
+ CHARSET.
+
2008-01-17 Mike Sperber <mike(a)xemacs.org>
* files.el (insert-directory): Bind `coding-system-for-read' to
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/coding.el
--- a/lisp/coding.el Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/coding.el Mon Jan 21 22:51:21 2008 +0100
@@ -125,15 +125,20 @@
(interactive "r\nP")
(princ (detect-coding-region start end)))
-(defun decode-coding-string (str coding-system)
+(defun decode-coding-string (str coding-system &optional nocopy)
"Decode the string STR which is encoded in CODING-SYSTEM.
-Does not modify STR. Returns the decoded string on successful conversion."
+Normally does not modify STR. Returns the decoded string on
+successful conversion.
+Optional argument NOCOPY says that modifying STR and returning it is
+allowed."
(with-string-as-buffer-contents
str (decode-coding-region (point-min) (point-max) coding-system)))
-(defun encode-coding-string (str coding-system)
+(defun encode-coding-string (str coding-system &optional nocopy)
"Encode the string STR using CODING-SYSTEM.
-Does not modify STR. Returns the encoded string on successful conversion."
+Does not modify STR. Returns the encoded string on successful conversion.
+Optional argument NOCOPY says that the original string may be returned
+if does not differ from the encoded string. "
(with-string-as-buffer-contents
str (encode-coding-region (point-min) (point-max) coding-system)))
@@ -274,4 +279,204 @@
(make-compatible-variable 'enable-multibyte-characters "Unimplemented")
+;; Sure would be nice to be able to use defface here.
+(copy-face 'highlight 'query-coding-warning-face)
+
+(defvar default-query-coding-region-safe-charset-skip-chars-map
+ #s(hash-table test equal data ())
+ "A map from list of charsets to `skip-chars-forward' arguments for
them.")
+
+(defun default-query-coding-region (begin end coding-system
+ &optional buffer errorp highlightp)
+ "The default `query-coding-region' implementation.
+
+Uses the `safe-charsets' and `safe-chars' coding system properties.
+The former is a list of XEmacs character sets that can be safely
+encoded by CODING-SYSTEM; the latter a char table describing, in
+addition, characters that can be safely encoded by CODING-SYSTEM."
+ (check-argument-type #'coding-system-p
+ (setq coding-system (find-coding-system coding-system)))
+ (check-argument-type #'integer-or-marker-p begin)
+ (check-argument-type #'integer-or-marker-p end)
+ (let* ((safe-charsets
+ (coding-system-get coding-system 'safe-charsets))
+ (safe-chars (coding-system-get coding-system 'safe-chars))
+ (skip-chars-arg
+ (gethash safe-charsets
+ default-query-coding-region-safe-charset-skip-chars-map))
+ (ranges (make-range-table))
+ fail-range-start fail-range-end previous-fail char-after
+ looking-at-arg failed extent)
+ (unless skip-chars-arg
+ (setq skip-chars-arg
+ (puthash safe-charsets
+ (mapconcat #'charset-skip-chars-string
+ safe-charsets "")
+ default-query-coding-region-safe-charset-skip-chars-map)))
+ (if (and (zerop (length skip-chars-arg)) (null safe-chars))
+ (progn
+ ;; Uh-oh, nothing known about this coding system. Fail.
+ (when errorp
+ (error 'text-conversion-error
+ "Coding system doesn't say what it can encode"
+ (coding-system-name coding-system)))
+ (put-range-table begin end t ranges)
+ (when highlightp
+ (setq extent (make-extent begin end buffer))
+ (set-extent-priority extent (+ mouse-highlight-priority 2))
+ (set-extent-face extent 'query-coding-warning-face))
+ (values nil ranges))
+ (setq looking-at-arg (if (equal "" skip-chars-arg)
+ ;; Regexp that will never match.
+ #r".\{0,0\}"
+ (concat "[" skip-chars-arg "]")))
+ (save-excursion
+ (goto-char begin buffer)
+ (skip-chars-forward skip-chars-arg end buffer)
+ (while (< (point buffer) end)
+ (message
+ "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+ fail-range-start previous-fail (point buffer) end)
+ (setq char-after (char-after (point buffer) buffer)
+ fail-range-start (point buffer))
+ (while (and
+ (< (point buffer) end)
+ (not (looking-at looking-at-arg))
+ (or (not safe-chars)
+ (not (get-char-table char-after safe-chars))))
+ (forward-char 1 buffer)
+ (setq char-after (char-after (point buffer) buffer)
+ failed t))
+ (if (= fail-range-start (point buffer))
+ ;; The character can actually be encoded by the coding
+ ;; system; check the characters past it.
+ (forward-char 1 buffer)
+ ;; Can't be encoded; note this.
+ (when errorp
+ (error 'text-conversion-error
+ (format "Cannot encode %s using coding system"
+ (buffer-substring fail-range-start (point buffer)
+ buffer))
+ (coding-system-name coding-system)))
+ (put-range-table fail-range-start
+ ;; If char-after is non-nil, we're not at
+ ;; the end of the buffer.
+ (setq fail-range-end (if char-after
+ (point buffer)
+ (point-max buffer)))
+ t ranges)
+ (when highlightp
+ (setq extent (make-extent fail-range-start fail-range-end buffer))
+ (set-extent-priority extent (+ mouse-highlight-priority 2))
+ (set-extent-face extent 'query-coding-warning-face)))
+ (skip-chars-forward skip-chars-arg end buffer))
+ (if failed
+ (values nil ranges)
+ (values t nil))))))
+
+(defsubst query-coding-region (start end coding-system &optional buffer
+ errorp highlight)
+ "Work out whether CODING-SYSTEM can losslessly encode a region.
+
+START and END are the beginning and end of the region to check.
+CODING-SYSTEM is the coding system to try.
+
+Optional argument BUFFER is the buffer to check, and defaults to the current
+buffer. Optional argument ERRORP says to signal a `text-conversion-error'
+if some character in the region cannot be encoded, and defaults to nil.
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+region using `query-coding-warning-face'. It defaults to nil.
+
+This function returns a list; the intention is that callers use use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it. The first element is `t' if the string can be encoded using
+CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string
+can be encoded using CODING-SYSTEM; otherwise, it is a range table
+describing the positions of the unencodable characters. See
+`make-range-table'."
+ (funcall (or (coding-system-get coding-system 'query-coding-function)
+ #'default-query-coding-region)
+ start end coding-system buffer errorp highlight))
+
+(defun query-coding-string (string coding-system &optional errorp highlight)
+ "Work out whether CODING-SYSTEM can losslessly encode STRING.
+CODING-SYSTEM is the coding system to check.
+
+Optional argument ERRORP says to signal a `text-conversion-error' if some
+character in the region cannot be encoded, and defaults to nil.
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+region using `query-coding-warning-face'. It defaults to nil.
+
+This function returns a list; the intention is that callers use use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it. The first element is `t' if the string can be encoded using
+CODING-SYSTEM, or `nil' if not. The second element is `nil' if the string
+can be encoded using CODING-SYSTEM; otherwise, it is a range table
+describing the positions of the unencodable characters. See
+`make-range-table'."
+ (with-temp-buffer
+ (insert string)
+ (query-coding-region (point-min) (point-max) coding-system (current-buffer)
+ ;; ### Will highlight work here?
+ errorp highlight)))
+
+(defun unencodable-char-position (start end coding-system
+ &optional count string)
+ "Return position of first un-encodable character in a region.
+START and END specify the region and CODING-SYSTEM specifies the
+encoding to check. Return nil if CODING-SYSTEM does encode the region.
+
+If optional 4th argument COUNT is non-nil, it specifies at most how
+many un-encodable characters to search. In this case, the value is a
+list of positions.
+
+If optional 5th argument STRING is non-nil, it is a string to search
+for un-encodable characters. In that case, START and END are indexes
+in the string."
+ (flet ((thunk ()
+ (multiple-value-bind (result ranges)
+ (query-coding-region start end coding-system)
+ (if result
+ ;; If query-coding-region thinks the entire region is
+ ;; encodable, result will be t, and the thunk should
+ ;; return nil, because there are no unencodable
+ ;; positions in the region.
+ nil
+ (if count
+ (block counted
+ (map-range-table
+ #'(lambda (begin end value)
+ (while (and (<= begin end) (<= begin count))
+ (push begin result)
+ (incf begin))
+ (if (> begin count) (return-from counted)))
+ ranges))
+ (map-range-table
+ #'(lambda (begin end value)
+ (while (<= begin end)
+ (push begin result)
+ (incf begin))) ranges))
+ result))))
+ (if string
+ (with-temp-buffer (insert string) (thunk))
+ (thunk))))
+
+(defun encode-coding-char (char coding-system)
+ "Encode CHAR by CODING-SYSTEM and return the resulting string.
+If CODING-SYSTEM can't safely encode CHAR, return nil."
+ (check-argument-type #'characterp char)
+ (multiple-value-bind (succeededp)
+ (query-coding-string char coding-system)
+ (when succeededp
+ (encode-coding-string char coding-system))))
+
+(unless (featurep 'mule)
+ ;; If we're under non-Mule, every XEmacs character can be encoded
+ ;; with every XEmacs coding system.
+ (fset #'default-query-coding-region
+ #'(lambda (&rest ignored) (values t nil)))
+ (unintern 'default-query-coding-region-safe-charset-skip-chars-map))
+
;;; coding.el ends here
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/mule/general-late.el
--- a/lisp/mule/general-late.el Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/mule/general-late.el Mon Jan 21 22:51:21 2008 +0100
@@ -63,7 +63,34 @@
(decode-coding-string
Installation-string
Installation-file-coding-system)
- Installation-string))
+ Installation-string)
+
+ ;; Convince the byte compiler that, really, this file can't be encoded
+ ;; as binary. Ugh.
+ system-type (symbol-value (intern "\u0073ystem-type"))
+
+ unicode-query-coding-skip-chars-arg
+ (eval-when-compile
+ (when-fboundp #'map-charset-chars
+ (loop
+ for charset in (charset-list)
+ with skip-chars-string = ""
+ do
+ (block no-ucs-mapping
+ (map-charset-chars
+ #'(lambda (begin end)
+ (loop
+ while (/= end begin)
+ do
+ (when (= -1 (char-to-unicode begin))
+ (setq this-charset-works nil)
+ (return-from no-ucs-mapping))
+ (setq begin (int-to-char (1+ begin)))))
+ charset)
+ (setq skip-chars-string
+ (concat skip-chars-string
+ (charset-skip-chars-string charset))))
+ finally return skip-chars-string))))
;; At this point in the dump, all the charsets have been loaded. Now, load
;; their Unicode mappings.
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/mule/mule-charset.el
--- a/lisp/mule/mule-charset.el Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/mule/mule-charset.el Mon Jan 21 22:51:21 2008 +0100
@@ -116,6 +116,65 @@
(defun charset-bytes (charset)
"Useless in XEmacs, returns 1."
1)
+
+(defun charset-skip-chars-string (charset)
+ "Given CHARSET, return a string suitable for for `skip-chars-forward'.
+Passing the string to `skip-chars-forward' will cause it to skip all
+characters in CHARSET."
+ (setq charset (get-charset charset))
+ (cond
+ ;; Aargh, the general algorithm doesn't work for these charsets, because
+ ;; make-char strips the high bit. Hard code them.
+ ((eq (find-charset 'ascii) charset) "\x00-\x7f")
+ ((eq (find-charset 'control-1) charset) "\x80-\x9f")
+ (t
+ (let (charset-lower charset-upper row-upper row-lower)
+ (if (= 1 (charset-dimension charset))
+ (condition-case args-out-of-range
+ (make-char charset #x100)
+ (args-out-of-range
+ (setq charset-lower (third args-out-of-range)
+ charset-upper (fourth args-out-of-range))
+ (format "%c-%c"
+ (make-char charset charset-lower)
+ (make-char charset charset-upper))))
+ (condition-case args-out-of-range
+ (make-char charset #x100 #x22)
+ (args-out-of-range
+ (setq row-lower (third args-out-of-range)
+ row-upper (fourth args-out-of-range))))
+ (condition-case args-out-of-range
+ (make-char charset #x22 #x100)
+ (args-out-of-range
+ (setq charset-lower (third args-out-of-range)
+ charset-upper (fourth args-out-of-range))))
+ (format "%c-%c"
+ (make-char charset row-lower charset-lower)
+ (make-char charset row-upper charset-upper)))))))
+;; From GNU.
+(defun map-charset-chars (func charset)
+ "Use FUNC to map over all characters in CHARSET for side effects.
+FUNC is a function of two args, the start and end (inclusive) of a
+character code range. Thus FUNC should iterate over [START, END]."
+ (check-argument-type #'functionp func)
+ (check-argument-type #'charsetp (setq charset (find-charset charset)))
+ (let* ((dim (charset-dimension charset))
+ (chars (charset-chars charset))
+ (start (if (= chars 94)
+ 33
+ 32)))
+ (if (= dim 1)
+ (cond
+ ((eq (find-charset 'ascii) charset) (funcall func ?\x00 ?\x7f))
+ ((eq (find-charset 'control-1) charset) (funcall func ?\x80 ?\x9f))
+ (t
+ (funcall func
+ (make-char charset start)
+ (make-char charset (+ start chars -1)))))
+ (dotimes (i chars)
+ (funcall func
+ (make-char charset (+ i start) start)
+ (make-char charset (+ i start) (+ start chars -1)))))))
;;;; Define setf methods for all settable Charset properties
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/mule/mule-coding.el Mon Jan 21 22:51:21 2008 +0100
@@ -238,8 +238,6 @@
(if (r0 == ,(charset-id 'ascii))
(write r1)
((if (r0 == #xABAB)
- ;; #xBFFE is a sentinel in the compiled
- ;; program.
;; #xBFFE is a sentinel in the compiled
;; program.
((r0 = r1 & #x7F)
@@ -531,12 +529,85 @@
disk to XEmacs characters for some fixed-width 8-bit coding system. "
(check-argument-type #'vectorp decode-table)
(check-argument-range (length decode-table) #x100 #x100)
- (block category
- (loop
- for i from #x80 to #xBF
- do (unless (= i (aref decode-table i))
- (return-from category 'no-conversion)))
- 'iso-8-1))
+ (loop
+ named category
+ for i from #x80 to #xBF
+ do (unless (= i (aref decode-table i))
+ (return-from category 'no-conversion))
+ finally return 'iso-8-1))
+
+(defun 8-bit-fixed-query-coding-region (begin end coding-system
+ &optional buffer errorp highlightp)
+ "The `query-coding-region' implementation for 8-bit-fixed coding systems.
+
+Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars'
+coding system properties. The former is a hash table mapping from valid
+Unicode code points to on-disk octets in the coding system; the latter a set
+of characters as used by `skip-chars-forward'. Both of these properties are
+generated automatically by `make-8-bit-coding-system'.
+
+See that the documentation of `query-coding-region'; see also
+`make-8-bit-coding-system'. "
+ (check-argument-type #'coding-system-p
+ (setq coding-system (find-coding-system coding-system)))
+ (check-argument-type #'integer-or-marker-p begin)
+ (check-argument-type #'integer-or-marker-p end)
+ (let ((from-unicode
+ (coding-system-get coding-system '8-bit-fixed-query-from-unicode))
+ (skip-chars-arg
+ (coding-system-get coding-system '8-bit-fixed-query-skip-chars))
+ (ranges (make-range-table))
+ char-after fail-range-start fail-range-end previous-fail extent
+ failed)
+ (check-type from-unicode hash-table)
+ (check-type skip-chars-arg string)
+ (save-excursion
+ (goto-char begin buffer)
+ (skip-chars-forward skip-chars-arg end buffer)
+ (while (< (point buffer) end)
+ (message
+ "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+ fail-range-start previous-fail (point buffer) end)
+ (setq char-after (char-after (point buffer) buffer)
+ fail-range-start (point buffer))
+ (message "arguments are %S %S"
+ (< (point buffer) end)
+ (not (gethash (encode-char char-after 'ucs) from-unicode)))
+ (while (and
+ (< (point buffer) end)
+ (not (gethash (encode-char char-after 'ucs) from-unicode)))
+ (forward-char 1 buffer)
+ (setq char-after (char-after (point buffer) buffer)
+ failed t))
+ (if (= fail-range-start (point buffer))
+ ;; The character can actually be encoded by the coding
+ ;; system; check the characters past it.
+ (forward-char 1 buffer)
+ ;; The character actually failed.
+ (message "past the move through, point now %S" (point buffer))
+ (when errorp
+ (error 'text-conversion-error
+ (format "Cannot encode %s using coding system"
+ (buffer-substring fail-range-start (point buffeR)
+ buffer))
+ (coding-system-name coding-system)))
+ (put-range-table fail-range-start
+ ;; If char-after is non-nil, we're not at
+ ;; the end of the buffer.
+ (setq fail-range-end (if char-after
+ (point buffer)
+ (point-max buffer)))
+ t ranges)
+ (when highlightp
+ (message "highlighting")
+ (setq extent (make-extent fail-range-start fail-range-end buffer))
+ (set-extent-priority extent (+ mouse-highlight-priority 2))
+ (set-extent-face extent 'query-coding-warning-face))
+ (skip-chars-forward skip-chars-arg end buffer))
+ (message "about to give the result, ranges %S" ranges))
+ (if failed
+ (values nil ranges)
+ (values t nil)))))
;;;###autoload
(defun make-8-bit-coding-system (name unicode-map &optional description props)
@@ -618,13 +689,27 @@
(coding-system-put name '8-bit-fixed t)
(coding-system-put name 'category
(make-8-bit-choose-category decode-table))
+ (coding-system-put name '8-bit-fixed-query-skip-chars
+ (apply #'string (append decode-table nil)))
+ (coding-system-put name '8-bit-fixed-query-from-unicode encode-table)
+
+ (coding-system-put name 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
+ (coding-system-put (intern (format "%s-unix" name))
+ 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
+ (coding-system-put (intern (format "%s-dos" name))
+ 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
+ (coding-system-put (intern (format "%s-mac" name))
+ 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
(loop for alias in aliases
do (define-coding-system-alias alias name))
result))
(define-compiler-macro make-8-bit-coding-system (&whole form name unicode-map
&optional description props)
-
;; We provide the compiler macro (= macro that is expanded only on
;; compilation, and that can punt to a runtime version of the
;; associate function if necessary) not for reasons of speed, though
@@ -674,8 +759,9 @@
;; (invalid-read-syntax "Multiply defined symbol label" 1)
;;
;; when the file is byte compiled.
- (case-fold-search t))
- (define-translation-hash-table encode-table-sym ,encode-table)
+ (case-fold-search t)
+ (encode-table ,encode-table))
+ (define-translation-hash-table encode-table-sym encode-table)
(make-coding-system
',name 'ccl ,description
(plist-put (plist-put ',props 'decode
@@ -688,8 +774,22 @@
(symbol-value 'encode-table-sym)))
',encode-program))))
(coding-system-put ',name '8-bit-fixed t)
- (coding-system-put ',name 'category ',
- (make-8-bit-choose-category decode-table))
+ (coding-system-put ',name 'category
+ ',(make-8-bit-choose-category decode-table))
+ (coding-system-put ',name '8-bit-fixed-query-skip-chars
+ ',(apply #'string (append decode-table nil)))
+ (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table)
+ (coding-system-put ',name 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
+ (coding-system-put ',(intern (format "%s-unix" name))
+ 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
+ (coding-system-put ',(intern (format "%s-dos" name))
+ 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
+ (coding-system-put ',(intern (format "%s-mac" name))
+ 'query-coding-function
+ #'8-bit-fixed-query-coding-region)
,(macroexpand `(loop for alias in ',aliases
do (define-coding-system-alias alias
',name)))
@@ -703,4 +803,3 @@
'(mnemonic "Latin 1"
documentation "The most used encoding of Western Europe and the Americas."
aliases (iso-latin-1 latin-1)))
-
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/unicode.el
--- a/lisp/unicode.el Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/unicode.el Mon Jan 21 22:51:21 2008 +0100
@@ -611,6 +611,71 @@
(translate-region start finish table))
begin end buffer))
+(defvar unicode-query-coding-skip-chars-arg nil ;; Set in general-late.el
+ "Used by `unicode-query-coding-region' to skip chars with known
mappings.")
+
+(defun unicode-query-coding-region (begin end coding-system
+ &optional buffer errorp highlightp)
+ "The `query-coding-region' implementation for Unicode coding systems."
+ (check-argument-type #'coding-system-p
+ (setq coding-system (find-coding-system coding-system)))
+ (check-argument-type #'integer-or-marker-p begin)
+ (check-argument-type #'integer-or-marker-p end)
+ (let* ((skip-chars-arg unicode-query-coding-skip-chars-arg)
+ (ranges (make-range-table))
+ (looking-at-arg (concat "[" skip-chars-arg "]"))
+ fail-range-start fail-range-end previous-fail char-after
+ failed extent)
+ (save-excursion
+ (goto-char begin buffer)
+ (skip-chars-forward skip-chars-arg end buffer)
+ (while (< (point buffer) end)
+ (message
+ "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+ fail-range-start previous-fail (point buffer) end)
+ (setq char-after (char-after (point buffer) buffer)
+ fail-range-start (point buffer))
+ (while (and
+ (< (point buffer) end)
+ (not (looking-at looking-at-arg))
+ (= -1 (char-to-unicode char-after)))
+ (forward-char 1 buffer)
+ (message "what?!?")
+ (setq char-after (char-after (point buffer) buffer)
+ failed t))
+ (if (= fail-range-start (point buffer))
+ ;; The character can actually be encoded by the coding
+ ;; system; check the characters past it.
+ (forward-char 1 buffer)
+ ;; Can't be encoded; note this.
+ (when errorp
+ (error 'text-conversion-error
+ (format "Cannot encode %s using coding system"
+ (buffer-substring fail-range-start (point buffer)
+ buffer))
+ (coding-system-name coding-system)))
+ (put-range-table fail-range-start
+ ;; If char-after is non-nil, we're not at
+ ;; the end of the buffer.
+ (setq fail-range-end (if char-after
+ (point buffer)
+ (point-max buffer)))
+ t ranges)
+ (when highlightp
+ (setq extent (make-extent fail-range-start fail-range-end buffer))
+ (set-extent-priority extent (+ mouse-highlight-priority 2))
+ (set-extent-face extent 'query-coding-warning-face)))
+ (skip-chars-forward skip-chars-arg end buffer))
+ (if failed
+ (values nil ranges)
+ (values t nil)))))
+
+(loop
+ for coding-system in (coding-system-list)
+ do (when (eq 'unicode (coding-system-type coding-system))
+ (coding-system-put coding-system 'query-coding-function
+ #'unicode-query-coding-region)))
+
(unless (featurep 'mule)
;; We do this in such a roundabout way--instead of having the above defun
;; and defvar calls inside a (when (featurep 'mule) ...) form--to have
--
¿Dónde estará ahora mi sobrino Yoghurtu Nghé, que tuvo que huir
precipitadamente de la aldea por culpa de la escasez de rinocerontes?
_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches