New subject: [COMMIT] Re: Add #'query-coding-region, #'query-coding-string.

Sunday, 28 December 2008

This is the development of
http://mid.gmane.org/18509.34907.513276.700290＠parhasard.net . The patch as
shown in this mail is the output of hg outgoing -npM , which excludes
merges; I’ve attached a 120k compressed version of the patch with the
merges.

The speed of this implementation is not acceptable for interactive use. I
want to commit it soon despite that, because a) in its current version
interactive use does not involve the code in the patch and b) though I am
not particularly optimistic, it offers an opportunity for other people
besides me to move the #'query-coding-region implementations of the coding
systems into C. It certainly offers people the opportunity to examine the
architecture.

It includes some work on compatibility with GNU. I would like to add more,
but I look at comments like: 
	;; The text contains only ASCII characters.  Any coding
	;; systems are safe.
and I do not want to rush into implementing anything that excludes us
implementing EBCDIC or the GSM alphabet (which, for example, can’t encode
most of the control characters below #x20).

comparing with /Sources/xemacs-21.5-checked-out
searching for changes
changeset:   4559:e6a7054a9c30
tag:         tip
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Sun Dec 28 22:51:14 2008 +0000
summary:     Add check-coding-systems-region, test it and others, fix some bugs.

diff -r 80e0588fb42f -r e6a7054a9c30 lisp/ChangeLog
--- a/lisp/ChangeLog	Sun Dec 28 14:55:02 2008 +0000
+++ b/lisp/ChangeLog	Sun Dec 28 22:51:14 2008 +0000
＠＠ -1,3 +1,28 ＠＠
+2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* coding.el (query-coding-region): 
+	(query-coding-string): 
+	Make these defsubsts, they're short enough and they're called
+	explicitly rarely enough that it makes some sense. The alternative
+	would be compiler macros that avoid the binding of the arguments. 
+	(unencodable-char-position): 
+	Document where the docstring and API are from. 
+	Correct a special case for zero--check-argument-type returns nil
+	when it succeeds, we can't usefully chain its result in an and
+	here. 
+	(check-coding-systems-region): New. API taken from GNU; docstring
+	and implementation are independent. 
+	(encode-coding-char): 
+	Add an optional third argument, as used by recent GNU. Document
+	the origen of the docstring. 
+	(default-query-coding-region): Add a short docstring to the
+	non-Mule implementation of this function. 
+	* unicode.el: 
+	Don't set the query-coding-function property for unicode coding
+	systems if we're on non-mule. Unintern
+	unicode-query-coding-region, unicode-query-coding-skip-chars-arg
+	in the same context. 
+
 2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* coding.el (default-query-coding-region): 
diff -r 80e0588fb42f -r e6a7054a9c30 lisp/coding.el
--- a/lisp/coding.el	Sun Dec 28 14:55:02 2008 +0000
+++ b/lisp/coding.el	Sun Dec 28 22:51:14 2008 +0000
＠＠ -398,7 +398,7 ＠＠
 	    (values nil ranges)
 	  (values t nil))))))

-(defun query-coding-region (start end coding-system &optional buffer
+(defsubst query-coding-region (start end coding-system &optional buffer
                                errorp highlight)
   "Work out whether CODING-SYSTEM can losslessly encode a region.

＠＠ -423,7 +423,7 ＠＠
                #'default-query-coding-region)
            start end coding-system buffer errorp highlight))

-(defun query-coding-string (string coding-system &optional errorp highlight)
+(defsubst query-coding-string (string coding-system &optional errorp highlight)
   "Work out whether CODING-SYSTEM can losslessly encode STRING.
 CODING-SYSTEM is the coding system to check.

＠＠ -446,6 +446,7 ＠＠
                          ;; ### Will highlight work here?
                          errorp highlight)))

+;; Function docstring and API are taken from GNU coding.c version 1.353, GPLv2. 
 (defun unencodable-char-position  (start end coding-system
                                    &optional count string) 
   "Return position of first un-encodable character in a region.
＠＠ -486,9 +487,9 ＠＠
     (check-argument-type #'integer-or-marker-p start)
     (check-argument-type #'integer-or-marker-p end)
     (check-coding-system coding-system)
-    (and count (check-argument-type #'natnump count)
-	 ;; Special-case zero, sigh. 
-	 (if (zerop count) (setq count 1)))
+    (when count (check-argument-type #'natnump count)
+	  ;; Special-case zero, sigh. 
+	  (if (zerop count) (setq count 1)))
     (and string (check-argument-type #'stringp string))
     (if string
 	(with-temp-buffer
＠＠ -496,9 +497,64 ＠＠
 	  (funcall thunk start end coding-system count))
       (funcall thunk start end coding-system count))))

-(defun encode-coding-char (char coding-system)
+;; XEmacs; this is a GPLv3 function in coding.c in GNU. This is why we have
+;; both a very divergent docstring and a very divergent implementation.
+(defun check-coding-systems-region (begin end coding-system-list)
+  "Can coding systems in CODING-SYSTEM-LIST encode text in a region?
+
+CODING-SYSTEM-LIST must be a list of coding systems.  BEGIN and END are
+normally buffer positions delimiting the region.  If some coding system in
+CODING-SYSTEM-LIST cannot encode the entire region, the return value of this
+function is an alist mapping coding system names to lists of individual
+buffer positions (not ranges) that the individual coding systems cannot
+encode.
+
+If all coding systems in CODING-SYSTEM-LIST can encode the region,
+this function returns t.  This conflicts with the documented, but not
+with the observed, GNU behavior.
+
+If BEGIN is a string, `check-coding-systems-region' ignores END, and checks
+whether the coding systems can encode BEGIN.  The alist that is returned
+uses zero-based string indices, not one-based buffer positions.
+
+This function is for GNU compatibility.  See also `query-coding-region'."
+  (let ((thunk
+	 #'(lambda (begin end coding-system-list stringp)
+	     (loop
+               for coding-system in coding-system-list
+               with result = nil
+               with intermediate = nil
+               with range-lambda = (if stringp
+                                       #'(lambda (begin end value)
+                                           (while (< begin end)
+                                             (push (1- begin) intermediate)
+                                             (incf begin)))
+                                     #'(lambda (begin end value)
+                                         (while (< begin end)
+                                           (push begin intermediate)
+                                           (incf begin))))
+               do (setq coding-system (check-coding-system coding-system))
+               (multiple-value-bind (encoded ranges)
+		   (query-coding-region begin end coding-system)
+                 (unless encoded
+                   (setq intermediate (list (coding-system-name coding-system)))
+                   (map-range-table range-lambda ranges)
+                   (push (nreverse intermediate) result)))
+               finally return (or result t)))))
+  (if (stringp begin)
+      (with-temp-buffer
+	(insert begin)
+	(funcall thunk (point-min) (point-max) coding-system-list t))
+    (check-argument-type #'integer-or-marker-p begin)
+    (check-argument-type #'integer-or-marker-p end)
+    (funcall thunk begin end coding-system-list nil))))
+
+;; XEmacs; docstring taken from GNU, international/mule-cmds.el, revision
+;; 1.311, GPLv2. 
+(defun encode-coding-char (char coding-system &optional charset)
   "Encode CHAR by CODING-SYSTEM and return the resulting string.
-If CODING-SYSTEM can't safely encode CHAR, return nil."
+If CODING-SYSTEM can't safely encode CHAR, return nil.
+The optional third argument CHARSET is, for the moment, ignored."
   (check-argument-type #'characterp char)
   (multiple-value-bind (succeededp)
       (query-coding-string char coding-system)
＠＠ -509,7 +565,9 ＠＠
   ;; If we're under non-Mule, every XEmacs character can be encoded
   ;; with every XEmacs coding system.
   (fset #'default-query-coding-region
-	#'(lambda (&rest ignored) (values t nil)))
+	#'(lambda (&rest ignored)
+	    "Stub `query-coding-region' implementation. Always succeeds."
+	    (values t nil)))
   (unintern 'default-query-coding-region-safe-charset-skip-chars-map))

 ;;; coding.el ends here
diff -r 80e0588fb42f -r e6a7054a9c30 lisp/unicode.el
--- a/lisp/unicode.el	Sun Dec 28 14:55:02 2008 +0000
+++ b/lisp/unicode.el	Sun Dec 28 22:51:14 2008 +0000
＠＠ -678,6 +678,7 ＠＠

 (loop
   for coding-system in (coding-system-list)
+  initially (unless (featurep 'mule) (return))
   do (when (eq 'unicode (coding-system-type coding-system))
        (coding-system-put coding-system 'query-coding-function
 			  #'unicode-query-coding-region)))
＠＠ -691,7 +692,8 ＠＠
   (mapcar #'unintern
           '(ccl-encode-to-ucs-2 unicode-error-default-translation-table
             unicode-invalid-regexp-range frob-unicode-errors-region
-            unicode-error-translate-region)))
+            unicode-error-translate-region unicode-query-coding-region
+            unicode-query-coding-skip-chars-arg)))

 ;; #### UTF-7 is not yet implemented, and it's tricky to do.  There's
 ;; an implementation in appendix A.1 of the Unicode Standard, Version
diff -r 80e0588fb42f -r e6a7054a9c30 tests/ChangeLog
--- a/tests/ChangeLog	Sun Dec 28 14:55:02 2008 +0000
+++ b/tests/ChangeLog	Sun Dec 28 22:51:14 2008 +0000
＠＠ -1,3 +1,10 ＠＠
+2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* automated/query-coding-tests.el: 
+	Add tests for #'unencodable-char-position,
+	#'check-coding-systems-region, #'encode-coding-char. Remove some
+	debugging statements.
+
 2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* automated/query-coding-tests.el: 
diff -r 80e0588fb42f -r e6a7054a9c30 tests/automated/query-coding-tests.el
--- a/tests/automated/query-coding-tests.el	Sun Dec 28 14:55:02 2008 +0000
+++ b/tests/automated/query-coding-tests.el	Sun Dec 28 22:51:14 2008 +0000
＠＠ -91,58 +91,31 ＠＠
                                                   coding-system))
         (multiple-value-bind (query-coding-succeeded query-coding-table)
             (query-coding-region (point-min) (point-max) coding-system)
-          (q-c-debug "checking type, coding-system, q-c-s, q-c-t %S"
-                     (list (coding-system-type coding-system)
-                           coding-system query-coding-succeeded
-                           query-coding-table))
-          (unless (and (eq t query-coding-succeeded)
-                       (null query-coding-table))
-            (q-c-debug "(eq t query-coding-succeeded) %S, (\
-null query-coding-table) %S" (eq t query-coding-succeeded)
-                             (null query-coding-table)))
           (Assert (eq t query-coding-succeeded))
           (Assert (null query-coding-table)))
-        (q-c-debug "testing the ASCII strings for %S" coding-system)
         (multiple-value-bind (query-coding-succeeded query-coding-table)
             (query-coding-string ascii-chars-string coding-system)
-          (unless (and (eq t query-coding-succeeded)
-                       (null query-coding-table))
-            (q-c-debug "(eq t query-coding-succeeded) %S, (\
-null query-coding-table) %S" (eq t query-coding-succeeded)
-                             (null query-coding-table)))
           (Assert (eq t query-coding-succeeded))
           (Assert (null query-coding-table))))
-      (q-c-debug "past the loop through the coding systems")
       (delete-region (point-min) (point-max))
       ;; Check for success from the two Latin-1 coding systems 
       (insert latin-1-chars-string)
-      (q-c-debug "point is now %S" (point))
       (multiple-value-bind (query-coding-succeeded query-coding-table)
           (query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
         (Assert (eq t query-coding-succeeded))
         (Assert (null query-coding-table)))
-      (q-c-debug "point is now %S" (point))
       (multiple-value-bind (query-coding-succeeded query-coding-table)
           (query-coding-string (buffer-string) 'iso-8859-1-unix)
         (Assert (eq t query-coding-succeeded))
         (Assert (null query-coding-table)))
-      (q-c-debug "point is now %S" (point))
       (multiple-value-bind (query-coding-succeeded query-coding-table)
           (query-coding-string (buffer-string) 'iso-latin-1-with-esc-unix)
         (Assert (eq t query-coding-succeeded))
         (Assert (null query-coding-table)))
-      (q-c-debug "point is now %S" (point))
       ;; Make it fail, check that it fails correctly
       (insert (decode-char 'ucs #x20AC)) ;; EURO SIGN
       (multiple-value-bind (query-coding-succeeded query-coding-table)
           (query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
-        (unless (and (null query-coding-succeeded)
-                     (equal query-coding-table
-                            #s(range-table type start-closed-end-open data
-                                           ((257 258) t))))
-          (q-c-debug "dealing with %S" 'iso-8859-1-unix)
-          (q-c-debug "query-coding-succeeded not null, query-coding-table \
-%S" query-coding-table))
         (Assert (null query-coding-succeeded))
         (Assert (equal query-coding-table
                        #s(range-table type start-closed-end-open data
＠＠ -153,12 +126,6 ＠＠
         ;; Stupidly, this succeeds. The behaviour is compatible with
         ;; GNU, though, and we encourage people not to use
         ;; iso-latin-1-with-esc-unix anyway:
-
-        (unless (and query-coding-succeeded
-                     (null query-coding-table))
-          (q-c-debug "dealing with %S" 'iso-latin-1-with-esc-unix)
-          (q-c-debug "query-coding-succeeded %S, query-coding-table \
-%S" query-coding-succeeded query-coding-table))
         (Assert query-coding-succeeded)
         (Assert (null query-coding-table)))
       ;; Check that it errors correctly. 
＠＠ -186,13 +153,6 ＠＠
       (insert ?\x80)
       (multiple-value-bind (query-coding-succeeded query-coding-table)
           (query-coding-region (point-min) (point-max) 'windows-1252-unix)
-        (unless (and (null query-coding-succeeded)
-                     (equal query-coding-table
-                            #s(range-table type start-closed-end-open data
-                                           ((257 258) t))))
-          (q-c-debug "dealing with %S" 'windows-1252-unix)
-          (q-c-debug "query-coding-succeeded not null, query-coding-table \
-%S" query-coding-table))
         (Assert (null query-coding-succeeded))
         (Assert (equal query-coding-table
                        #s(range-table type start-closed-end-open data
＠＠ -212,17 +172,6 ＠＠
         (Assert (null query-coding-table)))
       (multiple-value-bind (query-coding-succeeded query-coding-table)
           (query-coding-region (point-min) (point-max) 'windows-1252-unix)
-        (unless (and (null query-coding-succeeded)
-                     (equal query-coding-table
-                            #s(range-table type start-closed-end-open
-                                           data ((129 131) t (132 133) t
-                                                 (139 140) t (141 146) t
-                                                 (155 156) t (157 161) t
-                                                 (162 170) t (173 176) t
-                                                 (178 187) t (189 192) t
-                                                 (193 257) t))))
-          (q-c-debug "query-coding-succeeded not null, query-coding-table \
-%S" query-coding-table))
         (Assert (null query-coding-succeeded))
         (Assert (equal query-coding-table
                        #s(range-table type start-closed-end-open
＠＠ -290,4 +239,68 ＠＠
             (query-coding-region (point-min) 173 coding-system nil t)
           (text-conversion-error
            (setq text-conversion-error-signalled t)))
-        (Assert (null text-conversion-error-signalled))))))
+        (Assert (null text-conversion-error-signalled)))
+
+      ;; Now to test #'encode-coding-char. Most of the functionality was
+      ;; tested in the query-coding-region tests above, so we don't go into
+      ;; as much detail here.
+      (Assert (null (encode-coding-char
+                     (decode-char 'ucs #x20ac) 'iso-8859-1)))
+      (Assert (equal "\x80" (encode-coding-char 
+                             (decode-char 'ucs #x20ac) 'windows-1252)))
+      (delete-region (point-min) (point-max))
+
+      ;; And #'unencodable-char-position. 
+      (insert latin-1-chars-string)
+      (insert (decode-char 'ucs #x20ac))
+      (Assert (= 257 (unencodable-char-position (point-min) (point-max)
+                                                'iso-8859-1)))
+      (Assert (equal '(257) (unencodable-char-position (point-min) (point-max)
+                                                       'iso-8859-1 1)))
+      ;; Compatiblity, sigh: 
+      (Assert (equal '(257) (unencodable-char-position (point-min) (point-max)
+                                                       'iso-8859-1 0)))
+      (dotimes (i 6) (insert (decode-char 'ucs #x20ac)))
+      ;; Check if it stops at one:
+      (Assert (equal '(257) (unencodable-char-position (point-min) (point-max)
+                                                       'iso-8859-1 1)))
+      ;; Check if it stops at four:
+      (Assert (equal '(260 259 258 257)
+                     (unencodable-char-position (point-min) (point-max)
+                                                       'iso-8859-1 4)))
+      ;; Check whether it stops at seven: 
+      (Assert (equal '(263 262 261 260 259 258 257)
+                     (unencodable-char-position (point-min) (point-max)
+                                                       'iso-8859-1 7)))
+      ;; Check that it still stops at seven:
+      (Assert (equal '(263 262 261 260 259 258 257)
+                     (unencodable-char-position (point-min) (point-max)
+                                                       'iso-8859-1 2000)))
+      ;; Now, #'check-coding-systems-region. 
+      ;; UTF-8 should certainly be able to encode these characters:
+      (Assert (eq t (check-coding-systems-region (point-min) (point-max)
+                                                 '(utf-8))))
+      (Assert (equal '((iso-8859-1 257 258 259 260 261 262 263)
+                       (windows-1252 129 131 132 133 134 135 136 137 138 139
+                                     140 141 143 146 147 148 149 150 151 152
+                                     153 154 155 156 157 159 160))
+                       (sort
+                        (check-coding-systems-region (point-min) (point-max)
+                                                     '(utf-8 iso-8859-1
+                                                       windows-1252))
+                        ;; (The sort is to make the algorithm irrelevant.)
+                        #'(lambda (left right)
+                            (string< (car left) (car right))))))
+      ;; Ensure that the indices are all decreased by one when passed a
+      ;; string:
+      (Assert (equal '((iso-8859-1 256 257 258 259 260 261 262)
+                       (windows-1252 128 130 131 132 133 134 135 136 137 138
+                                     139 140 142 145 146 147 148 149 150 151
+                                     152 153 154 155 156 158 159))
+                     (sort
+                      (check-coding-systems-region (buffer-string) nil
+                                                   '(utf-8 iso-8859-1
+                                                     windows-1252))
+                      #'(lambda (left right)
+                          (string< (car left) (car right)))))))))
+

changeset:   4513:1d74a1d115ee
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Sun Dec 28 14:46:24 2008 +0000
summary:     Add #'query-coding-region tests; do the work necessary to get them
running.

diff -r 84d618b355f5 -r 1d74a1d115ee lisp/ChangeLog
--- a/lisp/ChangeLog	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/ChangeLog	Sun Dec 28 14:46:24 2008 +0000
＠＠ -1,3 +1,57 ＠＠
+2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* coding.el (default-query-coding-region): 
+	Declare using defun*, so we can #'return-from to it on
+	encountering a safe-charsets value of t. Comment out a few
+	debug messages. 
+	(query-coding-region): 
+	Correct the docstring, it deals with a region, not a string.
+	(unencodable-char-position): 
+	Correct the implementation for non-nil COUNT, special-case a zero
+	value for count, treat it as one. Don't rely on dynamic scope when
+	calling the main lambda.
+	* unicode.el (unicode-query-coding-region): 
+	Comment out some debug messages here. 
+	* mule/mule-coding.el (8-bit-fixed-query-coding-region): 
+	Comment out some debug messages here. 
+
+	* code-init.el (raw-text): 
+	Add a safe-charsets property to this coding system. 
+	* mule/korean.el (iso-2022-int-1): 
+	* mule/korean.el (euc-kr): 
+	* mule/korean.el (iso-2022-kr): 
+	Add safe-charsets properties for these coding systems. 
+	* mule/japanese.el (iso-2022-jp): 
+	* mule/japanese.el (jis7): 
+	* mule/japanese.el (jis8): 
+	* mule/japanese.el (shift-jis): 
+	* mule/japanese.el (iso-2022-jp-1978-irv): 
+	* mule/japanese.el (euc-jp): 
+	Add safe-charsets properties for all these coding systems. 
+	* mule/iso-with-esc.el: 
+	Add safe-charsets properties to all the coding systems in
+	here. Comment on the downside of a safe-charsets value of t for
+	iso-latin-1-with-esc.
+	* mule/hebrew.el (ctext-hebrew): 
+	Add a safe-charsets property for this coding system. 
+	* mule/devanagari.el (in-is13194-devanagari): 
+	Add a safe-charsets property for this coding system. 
+	* mule/chinese.el (cn-gb-2312): 
+	* mule/chinese.el (hz-gb-2312): 
+	* mule/chinese.el (big5): 
+	Add safe-charsets properties for these coding systems. 
+	* mule/latin.el (iso-8859-14): 
+	Add an implementation for this, using #'make-8-bit-coding-system.
+	* mule/mule-coding.el (ctext): 
+	* mule/mule-coding.el (iso-2022-8bit-ss2): 
+	* mule/mule-coding.el (iso-2022-7bit-ss2): 
+	* mule/mule-coding.el (iso-2022-jp-2): 
+	* mule/mule-coding.el (iso-2022-7bit): 
+	* mule/mule-coding.el (iso-2022-8): 
+	* mule/mule-coding.el (escape-quoted): 
+	* mule/mule-coding.el (iso-2022-lock): 
+	Add safe-charsets properties for all these coding systems. 
+	
 2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* mule/mule-coding.el (make-8-bit-coding-system): 
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/code-init.el
--- a/lisp/code-init.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/code-init.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -394,4 +394,6 ＠＠

 (reset-language-environment)

+(coding-system-put 'raw-text 'safe-charsets '(ascii control-1
latin-iso8859-1))
+
 ;;; code-init.el ends here
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/coding.el
--- a/lisp/coding.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/coding.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -300,8 +300,8 ＠＠
                              (extent-face extent))
                      (delete-extent extent))) buffer begin end))

-(defun default-query-coding-region (begin end coding-system
-				    &optional buffer errorp highlightp)
+(defun* default-query-coding-region (begin end coding-system
+				     &optional buffer errorp highlightp)
   "The default `query-coding-region' implementation.

 Uses the `safe-charsets' and `safe-chars' coding system properties.
＠＠ -324,8 +324,11 ＠＠
           (gethash safe-charsets
                    default-query-coding-region-safe-charset-skip-chars-map))
          (ranges (make-range-table))
-         fail-range-start fail-range-end previous-fail char-after
+         fail-range-start fail-range-end char-after
 	 looking-at-arg failed extent)
+    ;; Coding systems with a value of t for safe-charsets support everything.
+    (when (eq t safe-charsets)
+      (return-from default-query-coding-region (values t nil)))
     (unless skip-chars-arg
       (setq skip-chars-arg
 	    (puthash safe-charsets
＠＠ -355,9 +358,9 ＠＠
 	(goto-char begin buffer)
 	(skip-chars-forward skip-chars-arg end buffer)
 	(while (< (point buffer) end)
-	  (message
-	   "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
-	   fail-range-start previous-fail (point buffer) end)
+	  ; (message
+	  ; "fail-range-start is %S, point is %S, end is %S"
+	  ;  fail-range-start (point buffer) end)
 	  (setq char-after (char-after (point buffer) buffer)
 		fail-range-start (point buffer))
 	  (while (and
＠＠ -411,8 +414,8 ＠＠

 This function returns a list; the intention is that callers use 
 `multiple-value-bind' or the related CL multiple value functions to deal
-with it.  The first element is `t' if the string can be encoded using
-CODING-SYSTEM, or `nil' if not.  The second element is `nil' if the string
+with it.  The first element is `t' if the region can be encoded using
+CODING-SYSTEM, or `nil' if not.  The second element is `nil' if the region
 can be encoded using CODING-SYSTEM; otherwise, it is a range table
 describing the positions of the unencodable characters. See
 `make-range-table'."
＠＠ -456,33 +459,42 ＠＠
 If optional 5th argument STRING is non-nil, it is a string to search
 for un-encodable characters.  In that case, START and END are indexes
 in the string."
-  (flet ((thunk ()
-	   (multiple-value-bind (result ranges)
-	       (query-coding-region start end coding-system)
-	     (if result
-		 ;; If query-coding-region thinks the entire region is
-		 ;; encodable, result will be t, and the thunk should
-		 ;; return nil, because there are no unencodable
-		 ;; positions in the region.
-                 nil
-               (if count 
-                   (block counted
-                     (map-range-table
-                      #'(lambda (begin end value)
-                          (while (and (<= begin end) (<= begin count))
-                            (push begin result)
-                            (incf begin))
-                          (if (> begin count) (return-from counted)))
-                      ranges))
-                 (map-range-table
-                  #'(lambda (begin end value)
-		      (while (<= begin end)
-			(push begin result)
-			(incf begin))) ranges))
-	       result))))
+  (let ((thunk
+	 #'(lambda (start end coding-system &optional count)
+	     (multiple-value-bind (result ranges)
+		 (query-coding-region start end coding-system)
+	       (if result
+		   nil
+		 (block worked-it-all-out
+		   (if count
+		       (map-range-table
+			#'(lambda (begin end value)
+			    (while (and (< begin end)
+					(< (length result) count))
+			      (push begin result)
+			      (incf begin))
+			    (when (= (length result) count)
+			      (return-from worked-it-all-out result)))
+			ranges)
+		     (map-range-table
+		      #'(lambda (begin end value)
+			  (return-from worked-it-all-out begin))
+		      ranges))
+		   (assert (not (null count)) t
+			   "We should never reach this point with null COUNT.")
+		   result))))))
+    (check-argument-type #'integer-or-marker-p start)
+    (check-argument-type #'integer-or-marker-p end)
+    (check-coding-system coding-system)
+    (and count (check-argument-type #'natnump count)
+	 ;; Special-case zero, sigh. 
+	 (if (zerop count) (setq count 1)))
+    (and string (check-argument-type #'stringp string))
     (if string
-	(with-temp-buffer (insert string) (thunk))
-      (thunk))))
+	(with-temp-buffer
+	  (insert string)
+	  (funcall thunk start end coding-system count))
+      (funcall thunk start end coding-system count))))

 (defun encode-coding-char (char coding-system)
   "Encode CHAR by CODING-SYSTEM and return the resulting string.
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/chinese.el
--- a/lisp/mule/chinese.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/chinese.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -157,6 +157,7 ＠＠
    charset-g1 chinese-gb2312
    charset-g2 chinese-sisheng
    charset-g3 t
+   safe-charsets (ascii chinese-gb2312 chinese-sisheng)
    mnemonic "Zh-GB/EUC"
    documentation
    "Chinese EUC (Extended Unix Code), the standard Chinese encoding on Unix.
＠＠ -190,6 +191,7 ＠＠
  "Hz/ZW (Chinese)"
  '(mnemonic "Zh-GB/Hz"
    eol-type lf
+   safe-charsets (ascii chinese-gb2312)
    post-read-conversion post-read-decode-hz
    pre-write-conversion pre-write-encode-hz
    documentation "Hz/ZW 7-bit encoding for Chinese GB2312 (MIME:HZ-GB-2312)"
＠＠ -259,6 +261,7 ＠＠
  'big5 'big5
  "Big5"
  '(mnemonic "Zh/Big5"
+   safe-charsets (ascii chinese-big5-1 chinese-big5-2)
    documentation
    "A non-modal encoding formed by five large Taiwanese companies
 \(hence \"Big5\") to produce a character set and encoding for
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/devanagari.el
--- a/lisp/mule/devanagari.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/devanagari.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -50,6 +50,7 ＠＠
    charset-g2 t
    charset-g3 t
    mnemonic "In-13194"
+   safe-charsets (ascii indian-is13194)
    documentation
    "8-bit encoding for ASCII (MSB=0) and IS13194-Devanagari (MSB=1)"
    safe-charsets (ascii indian-is13194)
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/hebrew.el
--- a/lisp/mule/hebrew.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/hebrew.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -92,6 +92,7 ＠＠
    charset-g1 hebrew-iso8859-8
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii hebrew-iso8859-8)
    mnemonic "CText/Hbrw"
    ))

diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/iso-with-esc.el
--- a/lisp/mule/iso-with-esc.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/iso-with-esc.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -28,6 +28,10 ＠＠

 ;;; Code:

+;; It is not particularly reasonable that iso-latin-1-with-esc has a
+;; value of t for the safe-charsets property. We discourage its use,
+;; though, and this behaviour is compatible with GNU.
+
 ;;;###autoload
 (define-coding-system-alias 'iso-latin-1-with-esc 'iso-2022-8)

＠＠ -38,6 +42,7 ＠＠
    charset-g1 latin-iso8859-2
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii latin-iso8859-2)
    mnemonic "MIME/Ltn-2"))

 ;;;###autoload
＠＠ -47,6 +52,7 ＠＠
    charset-g1 latin-iso8859-3
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii latin-iso8859-3)
    mnemonic "MIME/Ltn-3"))

 ;;;###autoload
＠＠ -56,6 +62,7 ＠＠
    charset-g1 latin-iso8859-4
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii latin-iso8859-4)
    mnemonic "MIME/Ltn-4"))

 ;;;###autoload
＠＠ -63,6 +70,7 ＠＠
  'iso-latin-9-with-esc 'iso2022
   "ISO 4873 conforming 8-bit code (ASCII + Latin 9; aka Latin-1 with Euro)"
   '(mnemonic "MIME/Ltn-9"		; bletch
+    safe-charsets (ascii latin-iso8859-15)
     eol-type nil
     charset-g0 ascii
     charset-g1 latin-iso8859-15
＠＠ -76,6 +84,7 ＠＠
    charset-g1 latin-iso8859-9
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii latin-iso8859-9)
    mnemonic "MIME/Ltn-5"))

 ;;;###autoload
＠＠ -86,6 +95,7 ＠＠
    charset-g1 cyrillic-iso8859-5
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii cyrillic-iso8859-5)
    mnemonic "ISO8/Cyr"))

 ;;;###autoload
＠＠ -97,6 +107,7 ＠＠
    charset-g2 t
   charset-g3 t
    no-iso6429 t
+   safe-charsets (ascii hebrew-iso8859-8)
    mnemonic "MIME/Hbrw"))

 ;;;###autoload
＠＠ -106,6 +117,7 ＠＠
    charset-g1 greek-iso8859-7
    charset-g2 t
    charset-g3 t
+   safe-charsets (ascii greek-iso8859-7)
    mnemonic "Grk"))

 ;; ISO 8859-6 is such a useless character set that it seems a waste of
＠＠ -201,5 +213,6 ＠＠
    charset-g2 t
    charset-g3 t
    no-iso6429 t
+   safe-charsets (ascii arabic-iso8859-6)
    mnemonic "MIME/Arbc"))

diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/japanese.el
--- a/lisp/mule/japanese.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/japanese.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -195,6 +195,8 ＠＠
    seven t
    input-charset-conversion ((latin-jisx0201 ascii)
 			     (japanese-jisx0208-1978 japanese-jisx0208))
+   safe-charsets (ascii japanese-jisx0208-1978 japanese-jisx0208
+			latin-jisx0201 japanese-jisx0212 katakana-jisx0201)
    mnemonic "MULE/7bit"
    documentation
    "Coding system used for communication with mail and news in Japan."
＠＠ -210,6 +212,7 ＠＠
    lock-shift t
    input-charset-conversion ((latin-jisx0201 ascii)
 			     (japanese-jisx0208-1978 japanese-jisx0208))
+   safe-charsets (latin-jisx0201 ascii japanese-jisx0208-1978 japanese-jisx0208)
    mnemonic "JIS7"
    documentation
    "Old JIS 7-bit encoding; mostly superseded by ISO-2022-JP.
＠＠ -224,6 +227,8 ＠＠
    short t
    input-charset-conversion ((latin-jisx0201 ascii)
 			     (japanese-jisx0208-1978 japanese-jisx0208))
+   safe-charsets (latin-jisx0201 ascii japanese-jisx0208-1978
+                                 japanese-jisx0208)
    mnemonic "JIS8"
    documentation
    "Old JIS 8-bit encoding; mostly superseded by ISO-2022-JP.
＠＠ -261,6 +266,8 ＠＠
  "Shift-JIS"
  '(mnemonic "Ja/SJIS"
    documentation "The standard Japanese encoding in MS Windows."
+   safe-charsets (ascii japanese-jisx0208 japanese-jisx0208-1978
+                        latin-jisx0201 katakana-jisx0201)
 ))

 ;; A former name?
＠＠ -286,6 +293,8 ＠＠
    seven t
    output-charset-conversion ((ascii latin-jisx0201)
 			      (japanese-jisx0208 japanese-jisx0208-1978))
+   safe-charsets (ascii latin-jisx0201 japanese-jisx0208
+                        japanese-jisx0208-1978)
    documentation
    "This is a coding system used for old JIS terminals.  It's an ISO
 2022 based 7-bit encoding for Japanese JISX0208-1978 and JISX0201-Roman."
＠＠ -314,6 +323,7 ＠＠
    charset-g1 japanese-jisx0208
    charset-g2 katakana-jisx0201
    charset-g3 japanese-jisx0212
+   safe-charsets (ascii japanese-jisx0208 katakana-jisx0201 japanese-jisx0212)
    short t
    mnemonic "Ja/EUC"
    documentation
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/korean.el
--- a/lisp/mule/korean.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/korean.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -57,6 +57,7 ＠＠
  "ISO-2022-INT-1 (Korean)"
  '(charset-g0 ascii
    charset-g1 korean-ksc5601
+   safe-charsets (ascii korean-ksc5601)
    short t
    seven t
    lock-shift t
＠＠ -92,6 +93,7 ＠＠
  '(charset-g0 ascii
    charset-g1 korean-ksc5601
    mnemonic "ko/EUC"
+   safe-charsets (ascii korean-ksc5601)
    documentation
    "Korean EUC (Extended Unix Code), the standard Korean encoding on Unix.
 This follows the same overall EUC principles (see the description under
＠＠ -122,6 +124,7 ＠＠
    force-g1-on-output t
    seven t
    lock-shift t
+   safe-charsets (ascii korean-ksc5601)
    mnemonic "Ko/7bit"
    documentation "Coding-System used for communication with mail in Korea."
    eol-type lf))
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/latin.el
--- a/lisp/mule/latin.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/latin.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -631,6 +631,43 ＠＠
    (#xDD #xFD) ;; Y WITH ACUTE
    (#xDE #xFE))) ;; Y WITH CIRCUMFLEX

+(make-8-bit-coding-system
+ 'iso-8859-14
+ '((#xA1 ?\u1E02) ;; LATIN CAPITAL LETTER B WITH DOT ABOVE
+   (#xA2 ?\u1E03) ;; LATIN SMALL LETTER B WITH DOT ABOVE
+   (#xA4 ?\u010A) ;; LATIN CAPITAL LETTER C WITH DOT ABOVE
+   (#xA5 ?\u010B) ;; LATIN SMALL LETTER C WITH DOT ABOVE
+   (#xA6 ?\u1E0A) ;; LATIN CAPITAL LETTER D WITH DOT ABOVE
+   (#xA8 ?\u1E80) ;; LATIN CAPITAL LETTER W WITH GRAVE
+   (#xAA ?\u1E82) ;; LATIN CAPITAL LETTER W WITH ACUTE
+   (#xAB ?\u1E0B) ;; LATIN SMALL LETTER D WITH DOT ABOVE
+   (#xAC ?\u1EF2) ;; LATIN CAPITAL LETTER Y WITH GRAVE
+   (#xAF ?\u0178) ;; LATIN CAPITAL LETTER Y WITH DIAERESIS
+   (#xB0 ?\u1E1E) ;; LATIN CAPITAL LETTER F WITH DOT ABOVE
+   (#xB1 ?\u1E1F) ;; LATIN SMALL LETTER F WITH DOT ABOVE
+   (#xB2 ?\u0120) ;; LATIN CAPITAL LETTER G WITH DOT ABOVE
+   (#xB3 ?\u0121) ;; LATIN SMALL LETTER G WITH DOT ABOVE
+   (#xB4 ?\u1E40) ;; LATIN CAPITAL LETTER M WITH DOT ABOVE
+   (#xB5 ?\u1E41) ;; LATIN SMALL LETTER M WITH DOT ABOVE
+   (#xB7 ?\u1E56) ;; LATIN CAPITAL LETTER P WITH DOT ABOVE
+   (#xB8 ?\u1E81) ;; LATIN SMALL LETTER W WITH GRAVE
+   (#xB9 ?\u1E57) ;; LATIN SMALL LETTER P WITH DOT ABOVE
+   (#xBA ?\u1E83) ;; LATIN SMALL LETTER W WITH ACUTE
+   (#xBB ?\u1E60) ;; LATIN CAPITAL LETTER S WITH DOT ABOVE
+   (#xBC ?\u1EF3) ;; LATIN SMALL LETTER Y WITH GRAVE
+   (#xBD ?\u1E84) ;; LATIN CAPITAL LETTER W WITH DIAERESIS
+   (#xBE ?\u1E85) ;; LATIN SMALL LETTER W WITH DIAERESIS
+   (#xBF ?\u1E61) ;; LATIN SMALL LETTER S WITH DOT ABOVE
+   (#xD0 ?\u0174) ;; LATIN CAPITAL LETTER W WITH CIRCUMFLEX
+   (#xD7 ?\u1E6A) ;; LATIN CAPITAL LETTER T WITH DOT ABOVE
+   (#xDE ?\u0176) ;; LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
+   (#xF0 ?\u0175) ;; LATIN SMALL LETTER W WITH CIRCUMFLEX
+   (#xF7 ?\u1E6B) ;; LATIN SMALL LETTER T WITH DOT ABOVE
+   (#xFE ?\u0177)) ;; LATIN SMALL LETTER Y WITH CIRCUMFLEX
+ "ISO-8859-14 (Latin-8)"
+ '(mnemonic "Latin 8"
+   aliases (iso-latin-8 latin-8)))
+

 ;; The syntax table code for ISO 8859-15 and ISO 8859-16 requires that the
 ;; guillemets not have parenthesis syntax, which they used to have in the
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/mule-coding.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -104,6 +104,7 ＠＠
  '(charset-g0 ascii
    charset-g1 latin-iso8859-1
    eol-type nil
+   safe-charsets t ;; Reasonable
    mnemonic "CText"))

 (make-coding-system
＠＠ -113,6 +114,9 ＠＠
    charset-g1 latin-iso8859-1
    charset-g2 t ;; unspecified but can be used later.
    short t
+   safe-charsets (ascii katakana-jisx0201 japanese-jisx0208-1978
+                  japanese-jisx0208 japanese-jisx0212 japanese-jisx0213-1
+                  japanese-jisx0213-2)
    mnemonic "ISO8/SS"
    documentation "ISO 2022 based 8-bit encoding using SS2 for 96-charset"
    ))
＠＠ -124,6 +128,7 ＠＠
    charset-g2 t ;; unspecified but can be used later.
    seven t
    short t
+   safe-charsets t
    mnemonic "ISO7/SS"
    documentation "ISO 2022 based 7-bit encoding using SS2 for 96-charset"
    eol-type nil))
＠＠ -136,6 +141,7 ＠＠
    charset-g2 t ;; unspecified but can be used later.
    seven t
    short t
+   safe-charsets t
    mnemonic "ISO7/SS"
    eol-type nil))

＠＠ -145,6 +151,7 ＠＠
  '(charset-g0 ascii
    seven t
    short t
+   safe-charsets t
    mnemonic "ISO7"
    documentation "ISO-2022-based 7-bit encoding using only G0"
    ))
＠＠ -158,6 +165,7 ＠＠
  '(charset-g0 ascii
    charset-g1 latin-iso8859-1
    short t
+   safe-charsets t
    mnemonic "ISO8"
    documentation "ISO-2022 eight-bit coding system.  No single-shift or
locking-shift."
    ))
＠＠ -169,6 +177,7 ＠＠
    charset-g1 latin-iso8859-1
    eol-type lf
    escape-quoted t
+   safe-charsets t
    mnemonic "ESC/Quot"
    documentation "ISO-2022 eight-bit coding system with escape quoting; used for
.ELC files."
    ))
＠＠ -180,6 +189,7 ＠＠
    charset-g1 t ;; unspecified but can be used later.
    seven t
    lock-shift t
+   safe-charsets t
    mnemonic "ISO7/Lock"
    documentation "ISO-2022 coding system using Locking-Shift for 96-charset."
    ))
＠＠ -574,14 +584,14 ＠＠
       (goto-char begin buffer)
       (skip-chars-forward skip-chars-arg end buffer)
       (while (< (point buffer) end)
-        (message
-	 "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
-	 fail-range-start previous-fail (point buffer) end)
+        ; (message
+	; "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+	; fail-range-start previous-fail (point buffer) end)
 	(setq char-after (char-after (point buffer) buffer)
 	      fail-range-start (point buffer))
-	(message "arguments are %S %S"
-		 (< (point buffer) end)
-		 (not (gethash (encode-char char-after 'ucs) from-unicode)))
+	; (message "arguments are %S %S"
+	;	 (< (point buffer) end)
+	;	 (not (gethash (encode-char char-after 'ucs) from-unicode)))
 	(while (and
 		(< (point buffer) end)
 		(not (gethash (encode-char char-after 'ucs) from-unicode)))
＠＠ -593,7 +603,7 ＠＠
 	    ;; system; check the characters past it.
 	    (forward-char 1 buffer)
 	  ;; The character actually failed. 
-	  (message "past the move through, point now %S" (point buffer))
+	  ; (message "past the move through, point now %S" (point buffer))
 	  (when errorp 
 	    (error 'text-conversion-error
 		   (format "Cannot encode %s using coding system"
＠＠ -608,12 +618,12 ＠＠
 						  (point-max buffer)))
 			   t ranges)
 	  (when highlightp
-	    (message "highlighting")
+	    ; (message "highlighting")
 	    (setq extent (make-extent fail-range-start fail-range-end buffer))
 	    (set-extent-priority extent (+ mouse-highlight-priority 2))
 	    (set-extent-face extent 'query-coding-warning-face))
 	  (skip-chars-forward skip-chars-arg end buffer)))
-      (message "about to give the result, ranges %S" ranges)
+      ; (message "about to give the result, ranges %S" ranges)
       (if failed 
 	  (values nil ranges)
 	(values t nil)))))
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/mule/thai-xtis.el
--- a/lisp/mule/thai-xtis.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/mule/thai-xtis.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -355,6 +355,7 ＠＠
        `(mnemonic "TIS620"
 	 decode ccl-decode-thai-xtis
 	 encode ccl-encode-thai-xtis
+         safe-charsets (ascii thai-xtis)
 	 documentation "external=tis620, internal=thai-xtis"))
       (coding-system-put 'tis-620 'category 'iso-8-1))
   (make-coding-system
diff -r 84d618b355f5 -r 1d74a1d115ee lisp/unicode.el
--- a/lisp/unicode.el	Sat Aug 09 13:15:09 2008 +0200
+++ b/lisp/unicode.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -626,7 +626,7 ＠＠
   (let* ((skip-chars-arg unicode-query-coding-skip-chars-arg)
          (ranges (make-range-table))
          (looking-at-arg (concat "[" skip-chars-arg "]"))
-         fail-range-start fail-range-end previous-fail char-after failed
+         fail-range-start fail-range-end char-after failed
 	 extent)
     (save-excursion
       (when highlightp
＠＠ -638,8 +638,8 ＠＠
       (skip-chars-forward skip-chars-arg end buffer)
       (while (< (point buffer) end)
 ;        (message
-;         "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
-;         fail-range-start previous-fail (point buffer) end)
+;         "fail-range-start is %S, point is %S, end is %S"
+;         fail-range-start (point buffer) end)
         (setq char-after (char-after (point buffer) buffer)
               fail-range-start (point buffer))
         (while (and
＠＠ -647,7 +647,6 ＠＠
                 (not (looking-at looking-at-arg))
                 (= -1 (char-to-unicode char-after)))
           (forward-char 1 buffer)
-	  (message "what?!?")
           (setq char-after (char-after (point buffer) buffer)
                 failed t))
         (if (= fail-range-start (point buffer))
diff -r 84d618b355f5 -r 1d74a1d115ee src/ChangeLog
--- a/src/ChangeLog	Sat Aug 09 13:15:09 2008 +0200
+++ b/src/ChangeLog	Sun Dec 28 14:46:24 2008 +0000
＠＠ -1,3 +1,17 ＠＠
+2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* file-coding.c (Fmake_coding_system): 
+	Document our use of the safe-chars and safe-charsets properties,
+	and the differences compared to GNU. 
+	(make_coding_system_1): Don't drop the safe-chars and
+	safe-charsets properties. 
+	(Fcoding_system_property): Return the safe-chars and safe-charsets
+	properties when asked for them.
+	* file-coding.h (CODING_SYSTEM_SAFE_CHARSETS): 
+	* coding-system-slots.h: 
+	Make the safe-chars and safe-charsets slots available in these
+	headers. 
+
 2008-08-05  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* mule-charset.c (complex_vars_of_mule_charset): 
diff -r 84d618b355f5 -r 1d74a1d115ee src/coding-system-slots.h
--- a/src/coding-system-slots.h	Sat Aug 09 13:15:09 2008 +0200
+++ b/src/coding-system-slots.h	Sun Dec 28 14:46:24 2008 +0000
＠＠ -105,6 +105,10 ＠＠
      coding system). */
   MARKED_SLOT (canonical)

+  MARKED_SLOT (safe_charsets)
+
+  MARKED_SLOT (safe_chars)
+
 #undef MARKED_SLOT
 #undef MARKED_SLOT_ARRAY
 #undef CODING_SYSTEM_SLOT_DECLARATION
diff -r 84d618b355f5 -r 1d74a1d115ee src/file-coding.c
--- a/src/file-coding.c	Sat Aug 09 13:15:09 2008 +0200
+++ b/src/file-coding.c	Sun Dec 28 14:46:24 2008 +0000
＠＠ -1125,9 +1125,9 ＠＠
 	else if (EQ (key, Qtranslation_table_for_encode))
 	  ;
 	else if (EQ (key, Qsafe_chars))
-	  ;
+	  CODING_SYSTEM_SAFE_CHARS (cs) = value;
 	else if (EQ (key, Qsafe_charsets))
-	  ;
+	  CODING_SYSTEM_SAFE_CHARSETS (cs) = value;
 	else if (EQ (key, Qmime_charset))
 	  ;
 	else if (EQ (key, Qvalid_codes))
＠＠ -1326,20 +1326,7 ＠＠
 `translation-table-for-encode'
      The value is a translation table to be applied on encoding.  This is
      not applicable to CCL-based coding systems.
-    
-`safe-chars'
-     The value is a char table.  If a character has non-nil value in it,
-     the character is safely supported by the coding system.  This
-     overrides the specification of safe-charsets.
-   
-`safe-charsets'
-     The value is a list of charsets safely supported by the coding
-     system.  The value t means that all charsets Emacs handles are
-     supported.  Even if some charset is not in this list, it doesn't
-     mean that the charset can't be encoded in the coding system;
-     it just means that some other receiver of text encoded
-     in the coding system won't be able to handle that charset.
-    
+     
 `mime-charset'
      The value is a symbol of which name is `MIME-charset' parameter of
      the coding system.
＠＠ -1350,7 +1337,27 ＠＠
      In the former case, the integer value is a valid byte code.  In the
      latter case, the integers specifies the range of valid byte codes.

-
+The following properties are used by `default-query-coding-region',
+the default implementation of `query-coding-region'. This
+implementation and these properties are not used by the Unicode coding
+systems, nor by those CCL coding systems created with
+`make-8-bit-coding-system'. 
+
+`safe-chars'
+     The value is a char table.  If a character has non-nil value in it,
+     the character is safely supported by the coding system.  
+     Under XEmacs, for the moment, this is used in addition to the
+     `safe-charsets' property. It does not override it as it does
+     under GNU Emacs. #### We need to consider if we should keep this
+     behaviour.
+   
+`safe-charsets'
+     The value is a list of charsets safely supported by the coding
+     system.  For coding systems based on ISO 2022, XEmacs may try to
+     encode characters outside these character sets, but outside of
+     East Asia and East Asian coding systems, it is unlikely that
+     consumers of the data will understand XEmacs' encoding.
+     The value t means that all XEmacs character sets handles are supported.  

 The following additional property is recognized if TYPE is `convert-eol':

＠＠ -1862,6 +1869,10 ＠＠
     return XCODING_SYSTEM_POST_READ_CONVERSION (coding_system);
   else if (EQ (prop, Qpre_write_conversion))
     return XCODING_SYSTEM_PRE_WRITE_CONVERSION (coding_system);
+  else if (EQ (prop, Qsafe_charsets))
+    return XCODING_SYSTEM_SAFE_CHARSETS (coding_system);
+  else if (EQ (prop, Qsafe_chars))
+    return XCODING_SYSTEM_SAFE_CHARS (coding_system);
   else
     {
       Lisp_Object value = CODESYSMETH_OR_GIVEN (XCODING_SYSTEM (coding_system),
diff -r 84d618b355f5 -r 1d74a1d115ee src/file-coding.h
--- a/src/file-coding.h	Sat Aug 09 13:15:09 2008 +0200
+++ b/src/file-coding.h	Sun Dec 28 14:46:24 2008 +0000
＠＠ -583,6 +583,8 ＠＠
 #define CODING_SYSTEM_AUTO_EOL_WRAPPER(codesys) ((codesys)->auto_eol_wrapper)
 #define CODING_SYSTEM_SUBSIDIARY_PARENT(codesys) ((codesys)->subsidiary_parent)
 #define CODING_SYSTEM_CANONICAL(codesys) ((codesys)->canonical)
+#define CODING_SYSTEM_SAFE_CHARSETS(codesys) ((codesys)->safe_charsets)
+#define CODING_SYSTEM_SAFE_CHARS(codesys) ((codesys)->safe_chars)

 #define CODING_SYSTEM_CHAIN_CHAIN(codesys) \
   (CODING_SYSTEM_TYPE_DATA (codesys, chain)->chain)
＠＠ -623,6 +625,10 ＠＠
   CODING_SYSTEM_SUBSIDIARY_PARENT (XCODING_SYSTEM (codesys))
 #define XCODING_SYSTEM_CANONICAL(codesys) \
   CODING_SYSTEM_CANONICAL (XCODING_SYSTEM (codesys))
+#define XCODING_SYSTEM_SAFE_CHARSETS(codesys) \
+  CODING_SYSTEM_SAFE_CHARSETS (XCODING_SYSTEM (codesys))
+#define XCODING_SYSTEM_SAFE_CHARS(codesys) \
+  CODING_SYSTEM_SAFE_CHARS (XCODING_SYSTEM (codesys))

 #define XCODING_SYSTEM_CHAIN_CHAIN(codesys) \
   CODING_SYSTEM_CHAIN_CHAIN (XCODING_SYSTEM (codesys))
diff -r 84d618b355f5 -r 1d74a1d115ee tests/ChangeLog
--- a/tests/ChangeLog	Sat Aug 09 13:15:09 2008 +0200
+++ b/tests/ChangeLog	Sun Dec 28 14:46:24 2008 +0000
＠＠ -1,3 +1,9 ＠＠
+2008-12-28  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* automated/query-coding-tests.el: 
+	New file, testing the functionality of #'query-coding-region and
+	#'query-coding-string.
+
 2008-05-21  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* automated/mule-tests.el (featurep): 
diff -r 84d618b355f5 -r 1d74a1d115ee tests/automated/query-coding-tests.el
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tests/automated/query-coding-tests.el	Sun Dec 28 14:46:24 2008 +0000
＠＠ -0,0 +1,293 ＠＠
+;; Copyright (C) 2008 Free Software Foundation, Inc. -*- coding: iso-8859-1 -*-
+
+;; Author: Aidan Kehoe <kehoea(a)parhasard.net&gt;
+;; Maintainer: Aidan Kehoe <kehoea(a)parhasard.net&gt;
+;; Created: 2008
+;; Keywords: tests, query-coding-region
+
+;; This file is part of XEmacs.
+
+;; XEmacs is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 2, or (at your option)
+;; any later version.
+
+;; XEmacs is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with XEmacs; see the file COPYING.  If not, write to the Free
+;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+;; 02111-1307, USA.
+
+;;; Synched up with: Not in FSF.
+
+;;; Commentary:
+
+;; Test the query-coding-region and query-coding-string implementations for
+;; some well-known coding systems.
+
+(require 'bytecomp)
+
+(defun q-c-debug (&rest aerger)
+  (let ((standard-output (get-buffer-create "query-coding-debug"))
+        (fmt (condition-case nil
+                 (and (stringp (first aerger))
+                      (apply #'format aerger))
+               (error nil))))
+    (if fmt
+        (progn
+          (princ (apply #'format aerger))
+          (terpri))
+      (princ "--> ")
+      (let ((i 1))
+        (dolist (sgra aerger)
+          (if (> i 1) (princ "  "))
+          (princ (format "%d. " i))
+          (prin1 sgra)
+          (incf i))
+        (terpri)))))
+
+;; Comment this out if debugging:
+(defalias 'q-c-debug #'ignore)
+
+(when (featurep 'mule)
+  (let ((ascii-chars-string (apply #'string
+                                   (loop for i from #x0 to #x7f
+                                     collect (int-to-char i))))
+        (latin-1-chars-string (apply #'string 
+                                     (loop for i from #x0 to #xff
+                                       collect (int-to-char i))))
+        unix-coding-system text-conversion-error-signalled)
+    (with-temp-buffer
+      (insert ascii-chars-string)
+      ;; First, check all the coding systems that are ASCII-transparent for
+      ;; ASCII-transparency in the check.
+      (dolist (coding-system
+               (delete-duplicates
+                (mapcar #'(lambda (coding-system)
+                            (unless (coding-system-alias-p coding-system)
+                              ;; We're only interested in the version with
+                              ;; Unix line endings right now.
+                              (setq unix-coding-system 
+                                    (subsidiary-coding-system
+                                     (coding-system-base coding-system) 'lf))
+                              (when (and 
+                                     ;; ASCII-transparent
+                                     (equal ascii-chars-string
+                                           (encode-coding-string
+                                            ascii-chars-string
+                                            unix-coding-system))
+                                     (not 
+                                      (memq (coding-system-type
+                                             unix-coding-system)
+                                            '(undecided chain))))
+                                unix-coding-system)))
+                        (coding-system-list nil))
+:test #'eq))
+        (q-c-debug "looking at coding system %S" (coding-system-name
+                                                  coding-system))
+        (multiple-value-bind (query-coding-succeeded query-coding-table)
+            (query-coding-region (point-min) (point-max) coding-system)
+          (q-c-debug "checking type, coding-system, q-c-s, q-c-t %S"
+                     (list (coding-system-type coding-system)
+                           coding-system query-coding-succeeded
+                           query-coding-table))
+          (unless (and (eq t query-coding-succeeded)
+                       (null query-coding-table))
+            (q-c-debug "(eq t query-coding-succeeded) %S, (\
+null query-coding-table) %S" (eq t query-coding-succeeded)
+                             (null query-coding-table)))
+          (Assert (eq t query-coding-succeeded))
+          (Assert (null query-coding-table)))
+        (q-c-debug "testing the ASCII strings for %S" coding-system)
+        (multiple-value-bind (query-coding-succeeded query-coding-table)
+            (query-coding-string ascii-chars-string coding-system)
+          (unless (and (eq t query-coding-succeeded)
+                       (null query-coding-table))
+            (q-c-debug "(eq t query-coding-succeeded) %S, (\
+null query-coding-table) %S" (eq t query-coding-succeeded)
+                             (null query-coding-table)))
+          (Assert (eq t query-coding-succeeded))
+          (Assert (null query-coding-table))))
+      (q-c-debug "past the loop through the coding systems")
+      (delete-region (point-min) (point-max))
+      ;; Check for success from the two Latin-1 coding systems 
+      (insert latin-1-chars-string)
+      (q-c-debug "point is now %S" (point))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
+        (Assert (eq t query-coding-succeeded))
+        (Assert (null query-coding-table)))
+      (q-c-debug "point is now %S" (point))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-string (buffer-string) 'iso-8859-1-unix)
+        (Assert (eq t query-coding-succeeded))
+        (Assert (null query-coding-table)))
+      (q-c-debug "point is now %S" (point))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-string (buffer-string) 'iso-latin-1-with-esc-unix)
+        (Assert (eq t query-coding-succeeded))
+        (Assert (null query-coding-table)))
+      (q-c-debug "point is now %S" (point))
+      ;; Make it fail, check that it fails correctly
+      (insert (decode-char 'ucs #x20AC)) ;; EURO SIGN
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'iso-8859-1-unix)
+        (unless (and (null query-coding-succeeded)
+                     (equal query-coding-table
+                            #s(range-table type start-closed-end-open data
+                                           ((257 258) t))))
+          (q-c-debug "dealing with %S" 'iso-8859-1-unix)
+          (q-c-debug "query-coding-succeeded not null, query-coding-table \
+%S" query-coding-table))
+        (Assert (null query-coding-succeeded))
+        (Assert (equal query-coding-table
+                       #s(range-table type start-closed-end-open data
+                                      ((257 258) t)))))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max)
+                               'iso-latin-1-with-esc-unix)
+        ;; Stupidly, this succeeds. The behaviour is compatible with
+        ;; GNU, though, and we encourage people not to use
+        ;; iso-latin-1-with-esc-unix anyway:
+
+        (unless (and query-coding-succeeded
+                     (null query-coding-table))
+          (q-c-debug "dealing with %S" 'iso-latin-1-with-esc-unix)
+          (q-c-debug "query-coding-succeeded %S, query-coding-table \
+%S" query-coding-succeeded query-coding-table))
+        (Assert query-coding-succeeded)
+        (Assert (null query-coding-table)))
+      ;; Check that it errors correctly. 
+      (setq text-conversion-error-signalled nil)
+      (condition-case nil
+          (query-coding-region (point-min) (point-max) 'iso-8859-1-unix nil t)
+        (text-conversion-error
+         (setq text-conversion-error-signalled t)))
+      (Assert text-conversion-error-signalled)
+      (setq text-conversion-error-signalled nil)
+      (condition-case nil
+          (query-coding-region (point-min) (point-max)
+                               'iso-latin-1-with-esc-unix nil t)
+        (text-conversion-error
+         (setq text-conversion-error-signalled t)))
+      (Assert (null text-conversion-error-signalled))
+      (delete-region (point-min) (point-max))
+      (insert latin-1-chars-string)
+      (decode-coding-region (point-min) (point-max) 'windows-1252-unix)
+      (goto-char (point-max)) ;; #'decode-coding-region just messed up point.
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'windows-1252-unix)
+        (Assert (eq t query-coding-succeeded))
+        (Assert (null query-coding-table)))
+      (insert ?\x80)
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'windows-1252-unix)
+        (unless (and (null query-coding-succeeded)
+                     (equal query-coding-table
+                            #s(range-table type start-closed-end-open data
+                                           ((257 258) t))))
+          (q-c-debug "dealing with %S" 'windows-1252-unix)
+          (q-c-debug "query-coding-succeeded not null, query-coding-table \
+%S" query-coding-table))
+        (Assert (null query-coding-succeeded))
+        (Assert (equal query-coding-table
+                       #s(range-table type start-closed-end-open data
+                                      ((257 258) t)))))
+      ;; Try a similar approach with koi8-o, the koi8 variant with
+      ;; support for Old Church Slavonic.
+      (delete-region (point-min) (point-max))
+      (insert latin-1-chars-string)
+      (decode-coding-region (point-min) (point-max) 'koi8-o-unix)
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'koi8-o-unix)
+        (Assert (eq t query-coding-succeeded))
+        (Assert (null query-coding-table)))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'escape-quoted)
+        (Assert (eq t query-coding-succeeded))
+        (Assert (null query-coding-table)))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'windows-1252-unix)
+        (unless (and (null query-coding-succeeded)
+                     (equal query-coding-table
+                            #s(range-table type start-closed-end-open
+                                           data ((129 131) t (132 133) t
+                                                 (139 140) t (141 146) t
+                                                 (155 156) t (157 161) t
+                                                 (162 170) t (173 176) t
+                                                 (178 187) t (189 192) t
+                                                 (193 257) t))))
+          (q-c-debug "query-coding-succeeded not null, query-coding-table \
+%S" query-coding-table))
+        (Assert (null query-coding-succeeded))
+        (Assert (equal query-coding-table
+                       #s(range-table type start-closed-end-open
+                                      data ((129 131) t (132 133) t (139 140) t
+                                            (141 146) t (155 156) t (157 161) t
+                                            (162 170) t (173 176) t (178 187) t
+                                            (189 192) t (193 257) t)))))
+      (multiple-value-bind (query-coding-succeeded query-coding-table)
+          (query-coding-region (point-min) (point-max) 'koi8-r-unix)
+        (Assert (null query-coding-succeeded))
+        (Assert (equal query-coding-table
+                       #s(range-table type start-closed-end-open
+                                      data ((129 154) t (155 161) t (162 164) t
+                                            (165 177) t (178 180) t
+                                            (181 192) t)))))
+      ;; Check that the Unicode coding systems handle characters
+      ;; without Unicode mappings.
+      (delete-region (point-min) (point-max))
+      (insert latin-1-chars-string)
+      (decode-coding-region (point-min) (point-max) 'greek-iso-8bit-with-esc)
+      (dolist (coding-system
+               '(utf-16-mac ucs-4-mac utf-16-little-endian-bom-dos ucs-4-dos
+                 utf-16-little-endian-mac utf-16-bom-unix
+                 utf-16-little-endian ucs-4 utf-16-dos
+                 ucs-4-little-endian-dos utf-16-bom-mac utf-16-bom
+                 utf-16-unix utf-32-unix utf-32-little-endian
+                 utf-32-dos utf-32 utf-32-little-endian-dos utf-8-bom
+                 utf-16-bom-dos ucs-4-unix
+                 utf-16-little-endian-bom-unix utf-8-bom-mac
+                 utf-32-little-endian-unix utf-16
+                 utf-16-little-endian-dos utf-16-little-endian-bom-mac
+                 utf-8-bom-dos ucs-4-little-endian-mac utf-8-bom-unix
+                 utf-32-little-endian-mac utf-8-dos utf-8-unix
+                 utf-32-mac utf-8-mac utf-16-little-endian-unix
+                 ucs-4-little-endian ucs-4-little-endian-unix utf-8
+                 utf-16-little-endian-bom))
+        (multiple-value-bind (query-coding-succeeded query-coding-table)
+            (query-coding-region (point-min) (point-max) coding-system)
+          (Assert (null query-coding-succeeded))
+          (Assert (equal query-coding-table
+                         #s(range-table type start-closed-end-open data
+                                        ((173 174) t (209 210) t
+                                         (254 255) t)))))
+        (multiple-value-bind (query-coding-succeeded query-coding-table)
+            (query-coding-region (point-min) 173 coding-system)
+          (Assert (eq t query-coding-succeeded))
+          (Assert (null query-coding-table)))
+        (multiple-value-bind (query-coding-succeeded query-coding-table)
+            (query-coding-region 174 209 coding-system)
+          (Assert (eq t query-coding-succeeded))
+          (Assert (null query-coding-table)))
+        (multiple-value-bind (query-coding-succeeded query-coding-table)
+            (query-coding-region 210 254 coding-system)
+          (Assert (eq t query-coding-succeeded))
+          (Assert (null query-coding-table)))
+        ;; Check that it errors correctly. 
+        (setq text-conversion-error-signalled nil)
+        (condition-case nil
+            (query-coding-region (point-min) (point-max) coding-system nil t)
+          (text-conversion-error
+           (setq text-conversion-error-signalled t)))
+        (Assert text-conversion-error-signalled)
+        (setq text-conversion-error-signalled nil)
+        (condition-case nil
+            (query-coding-region (point-min) 173 coding-system nil t)
+          (text-conversion-error
+           (setq text-conversion-error-signalled t)))
+        (Assert (null text-conversion-error-signalled))))))

changeset:   4512:84d618b355f5
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Sat Aug 09 13:15:09 2008 +0200
summary:     2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

diff -r 26aae3bacf99 -r 84d618b355f5 lisp/ChangeLog
--- a/lisp/ChangeLog	Sat Aug 09 13:11:06 2008 +0200
+++ b/lisp/ChangeLog	Sat Aug 09 13:15:09 2008 +0200
＠＠ -1,3 +1,9 ＠＠
+2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* mule/mule-coding.el (make-8-bit-coding-system): 
+	* mule/general-late.el (posix-charset-to-coding-system-hash): 
+	Use #'skip-chars-quote as appropriate. 
+
 2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;

 	* subr.el (skip-chars-quote): New.
＠＠ -6,8 +12,6 ＠＠
 	#'skip-chars-backward. 

 2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
-
-	* subr.el (skip-chars-quote): New.

 	* mule/cyril-util.el: Remove. Use the version in packages instead.

diff -r 26aae3bacf99 -r 84d618b355f5 lisp/mule/general-late.el
--- a/lisp/mule/general-late.el	Sat Aug 09 13:11:06 2008 +0200
+++ b/lisp/mule/general-late.el	Sat Aug 09 13:15:09 2008 +0200
＠＠ -90,7 +90,7 ＠＠
               (setq skip-chars-string
                     (concat skip-chars-string
                             (charset-skip-chars-string charset))))
-            finally return skip-chars-string))))
+            finally return (skip-chars-quote skip-chars-string)))))

 ;; At this point in the dump, all the charsets have been loaded. Now, load
 ;; their Unicode mappings.
diff -r 26aae3bacf99 -r 84d618b355f5 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el	Sat Aug 09 13:11:06 2008 +0200
+++ b/lisp/mule/mule-coding.el	Sat Aug 09 13:15:09 2008 +0200
＠＠ -699,7 +699,8 ＠＠
     (coding-system-put name 'category 
                        (make-8-bit-choose-category decode-table))
     (coding-system-put name '8-bit-fixed-query-skip-chars
-                       (apply #'string (append decode-table nil)))
+                       (skip-chars-quote
+			      (apply #'string (append decode-table nil))))
     (coding-system-put name '8-bit-fixed-query-from-unicode encode-table)

     (coding-system-put name 'query-coding-function
＠＠ -786,7 +787,8 ＠＠
         (coding-system-put ',name 'category 
                            ',(make-8-bit-choose-category decode-table))
         (coding-system-put ',name '8-bit-fixed-query-skip-chars
-                           ',(apply #'string (append decode-table nil)))
+                           ',(skip-chars-quote
+			      (apply #'string (append decode-table nil))))
         (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table)
         (coding-system-put ',name 'query-coding-function
                            #'8-bit-fixed-query-coding-region)

changeset:   4510:31344162cf9a
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Sat Aug 09 13:06:24 2008 +0200
summary:     Add #'skip-chars-quote to subr.el

diff -r 89406c31b125 -r 31344162cf9a lisp/ChangeLog
--- a/lisp/ChangeLog	Sat Aug 09 12:13:19 2008 +0200
+++ b/lisp/ChangeLog	Sat Aug 09 13:06:24 2008 +0200
＠＠ -1,4 +1,13 ＠＠
 2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* subr.el (skip-chars-quote): New.
+	Given STRING, return a string that means that all characters in
+	STRING will be skipped when passed to #'skip-chars-forward,
+	#'skip-chars-backward. 
+
+2008-08-09  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* subr.el (skip-chars-quote): New.

 	* mule/cyril-util.el: Remove. Use the version in packages instead.

diff -r 89406c31b125 -r 31344162cf9a lisp/subr.el
--- a/lisp/subr.el	Sat Aug 09 12:13:19 2008 +0200
+++ b/lisp/subr.el	Sat Aug 09 13:06:24 2008 +0200
＠＠ -1722,4 +1722,25 ＠＠

 ;; define-mail-user-agent is in simple.el.

+;; XEmacs; added. 
+(defun skip-chars-quote (string)
+  "Return a string that means all characters in STRING will be skipped,
+if passed to `skip-chars-forward' or `skip-chars-backward'.
+
+Ranges and carets are not treated specially.  This implementation is
+in Lisp; do not use it in performance-critical code."
+  (let ((list (delete-duplicates (string-to-list string) :test #'=)))
+    (when (equal list '((?- ?\[) (?\[ ?\-)))
+      (error 'invalid-argument
+	     "Cannot create `skip-chars-forward' arg from string"
+	     string))
+    (when (memq ?\] list)
+      (setq list (cons ?\] (delq ?\] list))))
+    (when (eq ?^ (car list))
+      (setq list (nconc (cdr list) '(?^))))
+    (when (memq ?- list)
+      (setq list (delq ?- list)
+	    list (nconc list (list (second list) ?- (second list) ?-))))
+    (apply #'string list)))
+
 ;;; subr.el ends here

changeset:   4478:bd1a68c34d44
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Wed May 21 21:49:19 2008 +0200
summary:     Merge my change of 2008-05-14 to the query-coding-region code.

diff -r d9fcb5442c95 -r bd1a68c34d44 lisp/ChangeLog
--- a/lisp/ChangeLog	Wed May 21 21:47:42 2008 +0200
+++ b/lisp/ChangeLog	Wed May 21 21:49:19 2008 +0200
＠＠ -1,3 +1,8 ＠＠
+2008-05-21  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* mule/mule-coding.el (make-8-bit-choose-category): 
+	Merge my change of 2008-05-14 to the query-coding-region code.
+
 2008-05-14  Stephen J. Turnbull  <stephen(a)xemacs.org&gt;

 	* subr.el (add-to-list): Fix Aidan's last commit.
diff -r d9fcb5442c95 -r bd1a68c34d44 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el	Wed May 21 21:47:42 2008 +0200
+++ b/lisp/mule/mule-coding.el	Wed May 21 21:49:19 2008 +0200
＠＠ -531,7 +531,7 ＠＠
   (check-argument-range (length decode-table) #x100 #x100)
   (loop
     named category
-    for i from #x80 to #xBF
+    for i from #x80 to #x9F
     do (unless (= i (aref decode-table i))
 	 (return-from category 'no-conversion))
     finally return 'iso-8-1))

changeset:   4453:20c32e489235
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Sun May 11 19:50:10 2008 +0200
summary:     Add #'query-coding-clear-highlights.

diff -r 4953b7353349 -r 20c32e489235 lisp/ChangeLog
--- a/lisp/ChangeLog	Sat May 03 13:09:06 2008 +0200
+++ b/lisp/ChangeLog	Sun May 11 19:50:10 2008 +0200
＠＠ -1,3 +1,10 ＠＠
+2008-05-11  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* coding.el (query-coding-clear-highlights): 
+	New function--clear any face information added by
+	`query-coding-region'. 
+	(default-query-coding-region): Use it.
+
 2008-04-13  Henry S. Thompson <ht(a)inf.ed.ac.uk&gt;, Mike Sperber 
<mike(a)xemacs.org&gt;

 	* window-xemacs.el (save-window-excursion/mapping,
diff -r 4953b7353349 -r 20c32e489235 lisp/coding.el
--- a/lisp/coding.el	Sat May 03 13:09:06 2008 +0200
+++ b/lisp/coding.el	Sun May 11 19:50:10 2008 +0200
＠＠ -286,6 +286,20 ＠＠
   #s(hash-table test equal data ())
   "A map from list of charsets to `skip-chars-forward' arguments for
them.")

+(defsubst query-coding-clear-highlights (begin end &optional buffer)
+  "Remove extent faces added by `query-coding-region' between BEGIN and END.
+
+Optional argument BUFFER is the buffer to use, and defaults to the current
+buffer.
+
+The HIGHLIGHTP argument to `query-coding-region' indicates that it should
+display unencodable characters using `query-coding-warning-face'.  After
+this function has been called, this will no longer be the case.  "
+  (map-extents #'(lambda (extent ignored-arg)
+                   (when (eq 'query-coding-warning-face
+                             (extent-face extent))
+                     (delete-extent extent))) buffer begin end))
+
 (defun default-query-coding-region (begin end coding-system
 				    &optional buffer errorp highlightp)
   "The default `query-coding-region' implementation.
＠＠ -319,10 +333,7 ＠＠
 				safe-charsets "")
 		     default-query-coding-region-safe-charset-skip-chars-map)))
     (when highlightp
-      (map-extents #'(lambda (extent ignored-arg)
-		       (when (eq 'query-coding-warning-face
-				 (extent-face extent))
-			 (delete-extent extent))) buffer begin end))
+      (query-coding-clear-highlights begin end buffer))
     (if (and (zerop (length skip-chars-arg)) (null safe-chars))
 	(progn
 	    ;; Uh-oh, nothing known about this coding system. Fail. 
＠＠ -384,7 +395,7 ＠＠
 	    (values nil ranges)
 	  (values t nil))))))

-(defsubst query-coding-region (start end coding-system &optional buffer
+(defun query-coding-region (start end coding-system &optional buffer
                                errorp highlight)
   "Work out whether CODING-SYSTEM can losslessly encode a region.

changeset:   4443:75654496fa0e
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Sat May 03 13:08:54 2008 +0200
summary:     Correct a docstring

diff -r 9c1cfceab252 -r 75654496fa0e lisp/coding.el
--- a/lisp/coding.el	Thu Mar 13 10:24:34 2008 +0100
+++ b/lisp/coding.el	Sat May 03 13:08:54 2008 +0200
＠＠ -398,7 +398,7 ＠＠
 Optional argument HIGHLIGHT says to display unencodable characters in the
 region using `query-coding-warning-face'. It defaults to nil.

-This function returns a list; the intention is that callers use use
+This function returns a list; the intention is that callers use 
 `multiple-value-bind' or the related CL multiple value functions to deal
 with it.  The first element is `t' if the string can be encoded using
 CODING-SYSTEM, or `nil' if not.  The second element is `nil' if the string

changeset:   4413:6812571bfcb9
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Thu Mar 13 10:21:01 2008 +0100
summary:     Fix some bugs.

diff -r 1217f19ce196 -r 6812571bfcb9 lisp/coding.el
--- a/lisp/coding.el	Mon Jan 21 22:54:43 2008 +0100
+++ b/lisp/coding.el	Thu Mar 13 10:21:01 2008 +0100
＠＠ -299,8 +299,13 ＠＠
   (check-argument-type #'integer-or-marker-p begin)
   (check-argument-type #'integer-or-marker-p end)
   (let* ((safe-charsets
-          (coding-system-get coding-system 'safe-charsets))
-         (safe-chars (coding-system-get coding-system 'safe-chars))
+          (or (coding-system-get coding-system 'safe-charsets)
+	      (coding-system-get (coding-system-base coding-system)
+				 'safe-charsets)))
+         (safe-chars
+	  (or (coding-system-get coding-system 'safe-chars)
+	      (coding-system-get (coding-system-base coding-system)
+				 'safe-chars)))
          (skip-chars-arg
           (gethash safe-charsets
                    default-query-coding-region-safe-charset-skip-chars-map))
＠＠ -313,6 +318,11 ＠＠
 		     (mapconcat #'charset-skip-chars-string
 				safe-charsets "")
 		     default-query-coding-region-safe-charset-skip-chars-map)))
+    (when highlightp
+      (map-extents #'(lambda (extent ignored-arg)
+		       (when (eq 'query-coding-warning-face
+				 (extent-face extent))
+			 (delete-extent extent))) buffer begin end))
     (if (and (zerop (length skip-chars-arg)) (null safe-chars))
 	(progn
 	    ;; Uh-oh, nothing known about this coding system. Fail. 
diff -r 1217f19ce196 -r 6812571bfcb9 lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el	Mon Jan 21 22:54:43 2008 +0100
+++ b/lisp/mule/mule-coding.el	Thu Mar 13 10:21:01 2008 +0100
＠＠ -553,15 +553,24 ＠＠
   (check-argument-type #'integer-or-marker-p begin)
   (check-argument-type #'integer-or-marker-p end)
   (let ((from-unicode
-         (coding-system-get coding-system '8-bit-fixed-query-from-unicode))
+         (or (coding-system-get coding-system '8-bit-fixed-query-from-unicode)
+	     (coding-system-get (coding-system-base coding-system)
+				'8-bit-fixed-query-from-unicode)))
         (skip-chars-arg
-         (coding-system-get coding-system '8-bit-fixed-query-skip-chars))
+         (or (coding-system-get coding-system '8-bit-fixed-query-skip-chars)
+	     (coding-system-get (coding-system-base coding-system)
+				'8-bit-fixed-query-skip-chars)))
 	(ranges (make-range-table))
         char-after fail-range-start fail-range-end previous-fail extent
 	failed)
     (check-type from-unicode hash-table)
     (check-type skip-chars-arg string)
     (save-excursion
+      (when highlightp
+	(map-extents #'(lambda (extent ignored-arg)
+			 (when (eq 'query-coding-warning-face
+				   (extent-face extent))
+			   (delete-extent extent))) buffer begin end))
       (goto-char begin buffer)
       (skip-chars-forward skip-chars-arg end buffer)
       (while (< (point buffer) end)
＠＠ -588,7 +597,7 ＠＠
 	  (when errorp 
 	    (error 'text-conversion-error
 		   (format "Cannot encode %s using coding system"
-			   (buffer-substring fail-range-start (point buffeR)
+			   (buffer-substring fail-range-start (point buffer)
 					     buffer))
 		   (coding-system-name coding-system)))
 	  (put-range-table fail-range-start
＠＠ -603,8 +612,8 ＠＠
 	    (setq extent (make-extent fail-range-start fail-range-end buffer))
 	    (set-extent-priority extent (+ mouse-highlight-priority 2))
 	    (set-extent-face extent 'query-coding-warning-face))
-	  (skip-chars-forward skip-chars-arg end buffer))
-	(message "about to give the result, ranges %S" ranges))
+	  (skip-chars-forward skip-chars-arg end buffer)))
+      (message "about to give the result, ranges %S" ranges)
       (if failed 
 	  (values nil ranges)
 	(values t nil)))))
diff -r 1217f19ce196 -r 6812571bfcb9 lisp/unicode.el
--- a/lisp/unicode.el	Mon Jan 21 22:54:43 2008 +0100
+++ b/lisp/unicode.el	Thu Mar 13 10:21:01 2008 +0100
＠＠ -624,15 +624,20 ＠＠
   (let* ((skip-chars-arg unicode-query-coding-skip-chars-arg)
          (ranges (make-range-table))
          (looking-at-arg (concat "[" skip-chars-arg "]"))
-         fail-range-start fail-range-end previous-fail char-after
-	 failed extent)
+         fail-range-start fail-range-end previous-fail char-after failed
+	 extent)
     (save-excursion
+      (when highlightp
+	(map-extents #'(lambda (extent ignored-arg)
+			 (when (eq 'query-coding-warning-face
+				   (extent-face extent))
+			   (delete-extent extent))) buffer begin end))
       (goto-char begin buffer)
       (skip-chars-forward skip-chars-arg end buffer)
       (while (< (point buffer) end)
-        (message
-         "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
-         fail-range-start previous-fail (point buffer) end)
+;        (message
+;         "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+;         fail-range-start previous-fail (point buffer) end)
         (setq char-after (char-after (point buffer) buffer)
               fail-range-start (point buffer))
         (while (and
＠＠ -646,7 +651,7 ＠＠
         (if (= fail-range-start (point buffer))
             ;; The character can actually be encoded by the coding
             ;; system; check the characters past it.
-            (forward-char 1 buffer)
+	    (forward-char 1 buffer)
           ;; Can't be encoded; note this.
           (when errorp 
             (error 'text-conversion-error

changeset:   4403:68d1ca56cffa
user:        Aidan Kehoe <kehoea(a)parhasard.net&gt;
date:        Mon Jan 21 22:51:21 2008 +0100
summary:     First part of interactive checks that coding systems encode regions.

diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/ChangeLog
--- a/lisp/ChangeLog	Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/ChangeLog	Mon Jan 21 22:51:21 2008 +0100
＠＠ -1,3 +1,50 ＠＠
+2008-01-21  Aidan Kehoe  <kehoea(a)parhasard.net&gt;
+
+	* coding.el (decode-coding-string): 
+	(encode-coding-string): Accept GNU's NOCOPY argument for
+	these. Todo; write compiler macros to use it. 
+	(query-coding-warning-face): New face, to show unencodable
+	characters. 
+	(default-query-coding-region-safe-charset-skip-chars-map): 
+	New variable, a cache used by #'default-query-coding-region. 
+	(default-query-coding-region): Default implementation of
+	#'query-coding-region, using the safe-charsets and safe-chars
+	coding systemproperties. 
+	(query-coding-region): New function; can a given coding system
+	encode a given region? 
+	(query-coding-string): New function; can a given coding system
+	encode a given string? 
+	(unencodable-char-position): Function API taken from GNU; return
+	the first unencodable position given a string and coding system. 
+	(encode-coding-char): Function API taken from GNU; return CHAR
+	encoded using CODING-SYSTEM, or nil if CODING-SYSTEM would trash
+	CHAR. 
+	((unless (featurep 'mule)): Override the default
+	query-coding-region implementation on non-Mule. 
+	* mule/mule-coding.el (make-8-bit-generate-helper): Eliminate a
+	duplicate comment. 
+	(make-8-bit-choose-category): Simplify implementation. 
+	(8-bit-fixed-query-coding-region): Implementation of
+	#'query-coding-region for coding systems created with
+	#'make-8-bit-coding-system. 
+	(make-8-bit-coding-system): Initialise the #'query-coding-region
+	implementation for these character sets. 
+	(make-8-bit-coding-system): Ditto for the compiler macro version
+	of this function. 
+	* unicode.el (unicode-query-coding-skip-chars-arg): New variable,
+	used by unicode-query-coding-region, initialised in
+	mule/general-late.el. 
+	(unicode-query-coding-region): New function, the
+	#'query-coding-region implementation for Unicode coding systems. 
+	Initialise the query-coding-function property for the Unicode
+	coding systems to #'unicode-query-coding-region.
+	* mule/mule-charset.el (charset-skip-chars-string): New
+	function. Return a #'skip-chars-forward argument that skips all
+	characters in CHARSET. 
+	(map-charset-chars): Function synced from GNU, modified to work
+	with XEmacs. Map FUNC across the int value charset ranges of
+	CHARSET. 
+
 2008-01-17  Mike Sperber  <mike(a)xemacs.org&gt;

 	* files.el (insert-directory): Bind `coding-system-for-read' to
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/coding.el
--- a/lisp/coding.el	Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/coding.el	Mon Jan 21 22:51:21 2008 +0100
＠＠ -125,15 +125,20 ＠＠
   (interactive "r\nP")
   (princ (detect-coding-region start end)))

-(defun decode-coding-string (str coding-system)
+(defun decode-coding-string (str coding-system &optional nocopy)
   "Decode the string STR which is encoded in CODING-SYSTEM.
-Does not modify STR.  Returns the decoded string on successful conversion."
+Normally does not modify STR.  Returns the decoded string on
+successful conversion.
+Optional argument NOCOPY says that modifying STR and returning it is
+allowed."
   (with-string-as-buffer-contents
    str (decode-coding-region (point-min) (point-max) coding-system)))

-(defun encode-coding-string (str coding-system)
+(defun encode-coding-string (str coding-system &optional nocopy)
   "Encode the string STR using CODING-SYSTEM.
-Does not modify STR.  Returns the encoded string on successful conversion."
+Does not modify STR.  Returns the encoded string on successful conversion.
+Optional argument NOCOPY says that the original string may be returned
+if does not differ from the encoded string. "
   (with-string-as-buffer-contents
    str (encode-coding-region (point-min) (point-max) coding-system)))

＠＠ -274,4 +279,204 ＠＠

 (make-compatible-variable 'enable-multibyte-characters "Unimplemented")

+;; Sure would be nice to be able to use defface here. 
+(copy-face 'highlight 'query-coding-warning-face)
+
+(defvar default-query-coding-region-safe-charset-skip-chars-map
+  #s(hash-table test equal data ())
+  "A map from list of charsets to `skip-chars-forward' arguments for
them.")
+
+(defun default-query-coding-region (begin end coding-system
+				    &optional buffer errorp highlightp)
+  "The default `query-coding-region' implementation.
+
+Uses the `safe-charsets' and `safe-chars' coding system properties.
+The former is a list of XEmacs character sets that can be safely
+encoded by CODING-SYSTEM; the latter a char table describing, in
+addition, characters that can be safely encoded by CODING-SYSTEM."
+  (check-argument-type #'coding-system-p
+                       (setq coding-system (find-coding-system coding-system)))
+  (check-argument-type #'integer-or-marker-p begin)
+  (check-argument-type #'integer-or-marker-p end)
+  (let* ((safe-charsets
+          (coding-system-get coding-system 'safe-charsets))
+         (safe-chars (coding-system-get coding-system 'safe-chars))
+         (skip-chars-arg
+          (gethash safe-charsets
+                   default-query-coding-region-safe-charset-skip-chars-map))
+         (ranges (make-range-table))
+         fail-range-start fail-range-end previous-fail char-after
+	 looking-at-arg failed extent)
+    (unless skip-chars-arg
+      (setq skip-chars-arg
+	    (puthash safe-charsets
+		     (mapconcat #'charset-skip-chars-string
+				safe-charsets "")
+		     default-query-coding-region-safe-charset-skip-chars-map)))
+    (if (and (zerop (length skip-chars-arg)) (null safe-chars))
+	(progn
+	    ;; Uh-oh, nothing known about this coding system. Fail. 
+	    (when errorp 
+	      (error 'text-conversion-error
+		     "Coding system doesn't say what it can encode"
+		     (coding-system-name coding-system)))
+	    (put-range-table begin end t ranges)
+	    (when highlightp
+	      (setq extent (make-extent begin end buffer))
+	      (set-extent-priority extent (+ mouse-highlight-priority 2))
+	      (set-extent-face extent 'query-coding-warning-face))
+	    (values nil ranges))
+      (setq looking-at-arg (if (equal "" skip-chars-arg)
+			       ;; Regexp that will never match.
+			       #r".\{0,0\}" 
+                             (concat "[" skip-chars-arg "]")))
+      (save-excursion
+	(goto-char begin buffer)
+	(skip-chars-forward skip-chars-arg end buffer)
+	(while (< (point buffer) end)
+	  (message
+	   "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+	   fail-range-start previous-fail (point buffer) end)
+	  (setq char-after (char-after (point buffer) buffer)
+		fail-range-start (point buffer))
+	  (while (and
+		  (< (point buffer) end)
+		  (not (looking-at looking-at-arg))
+		  (or (not safe-chars)
+		      (not (get-char-table char-after safe-chars))))
+	    (forward-char 1 buffer)
+	    (setq char-after (char-after (point buffer) buffer)
+		  failed t))
+	  (if (= fail-range-start (point buffer))
+	      ;; The character can actually be encoded by the coding
+	      ;; system; check the characters past it.
+	      (forward-char 1 buffer)
+            ;; Can't be encoded; note this.
+	    (when errorp 
+	      (error 'text-conversion-error
+		     (format "Cannot encode %s using coding system"
+			     (buffer-substring fail-range-start (point buffer)
+					       buffer))
+		     (coding-system-name coding-system)))
+	    (put-range-table fail-range-start
+			     ;; If char-after is non-nil, we're not at
+			     ;; the end of the buffer.
+			     (setq fail-range-end (if char-after
+						      (point buffer)
+						    (point-max buffer)))
+			     t ranges)
+	    (when highlightp
+	      (setq extent (make-extent fail-range-start fail-range-end buffer))
+	      (set-extent-priority extent (+ mouse-highlight-priority 2))
+	      (set-extent-face extent 'query-coding-warning-face)))
+	  (skip-chars-forward skip-chars-arg end buffer))
+	(if failed
+	    (values nil ranges)
+	  (values t nil))))))
+
+(defsubst query-coding-region (start end coding-system &optional buffer
+                               errorp highlight)
+  "Work out whether CODING-SYSTEM can losslessly encode a region.
+
+START and END are the beginning and end of the region to check.
+CODING-SYSTEM is the coding system to try.
+
+Optional argument BUFFER is the buffer to check, and defaults to the current
+buffer.  Optional argument ERRORP says to signal a `text-conversion-error'
+if some character in the region cannot be encoded, and defaults to nil. 
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+region using `query-coding-warning-face'. It defaults to nil.
+
+This function returns a list; the intention is that callers use use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it.  The first element is `t' if the string can be encoded using
+CODING-SYSTEM, or `nil' if not.  The second element is `nil' if the string
+can be encoded using CODING-SYSTEM; otherwise, it is a range table
+describing the positions of the unencodable characters. See
+`make-range-table'."
+  (funcall (or (coding-system-get coding-system 'query-coding-function)
+               #'default-query-coding-region)
+           start end coding-system buffer errorp highlight))
+
+(defun query-coding-string (string coding-system &optional errorp highlight)
+  "Work out whether CODING-SYSTEM can losslessly encode STRING.
+CODING-SYSTEM is the coding system to check.
+
+Optional argument ERRORP says to signal a `text-conversion-error' if some
+character in the region cannot be encoded, and defaults to nil.
+
+Optional argument HIGHLIGHT says to display unencodable characters in the
+region using `query-coding-warning-face'. It defaults to nil.
+
+This function returns a list; the intention is that callers use use
+`multiple-value-bind' or the related CL multiple value functions to deal
+with it.  The first element is `t' if the string can be encoded using
+CODING-SYSTEM, or `nil' if not.  The second element is `nil' if the string
+can be encoded using CODING-SYSTEM; otherwise, it is a range table
+describing the positions of the unencodable characters. See
+`make-range-table'."
+  (with-temp-buffer 
+    (insert string)
+    (query-coding-region (point-min) (point-max) coding-system (current-buffer)
+                         ;; ### Will highlight work here?
+                         errorp highlight)))
+
+(defun unencodable-char-position  (start end coding-system
+                                   &optional count string) 
+  "Return position of first un-encodable character in a region.
+START and END specify the region and CODING-SYSTEM specifies the
+encoding to check.  Return nil if CODING-SYSTEM does encode the region.
+
+If optional 4th argument COUNT is non-nil, it specifies at most how
+many un-encodable characters to search.  In this case, the value is a
+list of positions.
+
+If optional 5th argument STRING is non-nil, it is a string to search
+for un-encodable characters.  In that case, START and END are indexes
+in the string."
+  (flet ((thunk ()
+	   (multiple-value-bind (result ranges)
+	       (query-coding-region start end coding-system)
+	     (if result
+		 ;; If query-coding-region thinks the entire region is
+		 ;; encodable, result will be t, and the thunk should
+		 ;; return nil, because there are no unencodable
+		 ;; positions in the region.
+                 nil
+               (if count 
+                   (block counted
+                     (map-range-table
+                      #'(lambda (begin end value)
+                          (while (and (<= begin end) (<= begin count))
+                            (push begin result)
+                            (incf begin))
+                          (if (> begin count) (return-from counted)))
+                      ranges))
+                 (map-range-table
+                  #'(lambda (begin end value)
+		      (while (<= begin end)
+			(push begin result)
+			(incf begin))) ranges))
+	       result))))
+    (if string
+	(with-temp-buffer (insert string) (thunk))
+      (thunk))))
+
+(defun encode-coding-char (char coding-system)
+  "Encode CHAR by CODING-SYSTEM and return the resulting string.
+If CODING-SYSTEM can't safely encode CHAR, return nil."
+  (check-argument-type #'characterp char)
+  (multiple-value-bind (succeededp)
+      (query-coding-string char coding-system)
+    (when succeededp
+      (encode-coding-string char coding-system))))
+
+(unless (featurep 'mule)
+  ;; If we're under non-Mule, every XEmacs character can be encoded
+  ;; with every XEmacs coding system.
+  (fset #'default-query-coding-region
+	#'(lambda (&rest ignored) (values t nil)))
+  (unintern 'default-query-coding-region-safe-charset-skip-chars-map))
+
 ;;; coding.el ends here
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/mule/general-late.el
--- a/lisp/mule/general-late.el	Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/mule/general-late.el	Mon Jan 21 22:51:21 2008 +0100
＠＠ -63,7 +63,34 ＠＠
 			      (decode-coding-string
 			       Installation-string
 			       Installation-file-coding-system)
-			    Installation-string))
+			    Installation-string)
+
+      ;; Convince the byte compiler that, really, this file can't be encoded
+      ;; as binary. Ugh.
+      system-type (symbol-value (intern "\u0073ystem-type"))
+
+      unicode-query-coding-skip-chars-arg
+      (eval-when-compile 
+        (when-fboundp #'map-charset-chars 
+          (loop
+            for charset in (charset-list)
+            with skip-chars-string = ""
+            do
+            (block no-ucs-mapping
+              (map-charset-chars
+               #'(lambda (begin end)
+                   (loop
+                     while (/= end begin)
+                     do
+                     (when (= -1 (char-to-unicode begin))
+                       (setq this-charset-works nil)
+                       (return-from no-ucs-mapping))
+                     (setq begin (int-to-char (1+ begin)))))
+               charset)
+              (setq skip-chars-string
+                    (concat skip-chars-string
+                            (charset-skip-chars-string charset))))
+            finally return skip-chars-string))))

 ;; At this point in the dump, all the charsets have been loaded. Now, load
 ;; their Unicode mappings.
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/mule/mule-charset.el
--- a/lisp/mule/mule-charset.el	Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/mule/mule-charset.el	Mon Jan 21 22:51:21 2008 +0100
＠＠ -116,6 +116,65 ＠＠
 (defun charset-bytes (charset)
   "Useless in XEmacs, returns 1."
    1)
+
+(defun charset-skip-chars-string (charset)
+  "Given  CHARSET, return a string suitable for for `skip-chars-forward'.
+Passing the string to `skip-chars-forward' will cause it to skip all
+characters in CHARSET."
+  (setq charset (get-charset charset))
+  (cond 
+   ;; Aargh, the general algorithm doesn't work for these charsets, because
+   ;; make-char strips the high bit. Hard code them.
+   ((eq (find-charset 'ascii) charset) "\x00-\x7f")
+   ((eq (find-charset 'control-1) charset) "\x80-\x9f")
+   (t 
+    (let (charset-lower charset-upper row-upper row-lower)
+      (if (= 1 (charset-dimension charset))
+          (condition-case args-out-of-range
+              (make-char charset #x100)
+            (args-out-of-range 
+             (setq charset-lower (third args-out-of-range)
+                   charset-upper (fourth args-out-of-range))
+             (format "%c-%c"
+                     (make-char charset charset-lower)
+                     (make-char charset charset-upper))))
+        (condition-case args-out-of-range
+            (make-char charset #x100 #x22)
+          (args-out-of-range
+           (setq row-lower (third args-out-of-range)
+                 row-upper (fourth args-out-of-range))))
+        (condition-case args-out-of-range
+            (make-char charset #x22 #x100)
+          (args-out-of-range
+           (setq charset-lower (third args-out-of-range)
+                 charset-upper (fourth args-out-of-range))))
+        (format "%c-%c"
+                (make-char charset row-lower charset-lower)
+                (make-char charset row-upper charset-upper)))))))
+;; From GNU. 
+(defun map-charset-chars (func charset)
+  "Use FUNC to map over all characters in CHARSET for side effects.
+FUNC is a function of two args, the start and end (inclusive) of a
+character code range.  Thus FUNC should iterate over [START, END]."
+  (check-argument-type #'functionp func)
+  (check-argument-type #'charsetp (setq charset (find-charset charset)))
+  (let* ((dim (charset-dimension charset))
+	 (chars (charset-chars charset))
+	 (start (if (= chars 94)
+		    33
+		  32)))
+    (if (= dim 1)
+        (cond 
+         ((eq (find-charset 'ascii) charset) (funcall func ?\x00 ?\x7f))
+         ((eq (find-charset 'control-1) charset) (funcall func ?\x80 ?\x9f))
+         (t 
+          (funcall func
+                   (make-char charset start)
+                   (make-char charset (+ start chars -1)))))
+      (dotimes (i chars)
+	(funcall func
+		 (make-char charset (+ i start) start)
+		 (make-char charset (+ i start) (+ start chars -1)))))))

 ;;;; Define setf methods for all settable Charset properties

diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/mule/mule-coding.el
--- a/lisp/mule/mule-coding.el	Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/mule/mule-coding.el	Mon Jan 21 22:51:21 2008 +0100
＠＠ -238,8 +238,6 ＠＠
                          (if (r0 == ,(charset-id 'ascii))
                              (write r1)
                            ((if (r0 == #xABAB)
-                                ;; #xBFFE is a sentinel in the compiled
-                                ;; program.
                                 ;; #xBFFE is a sentinel in the compiled
                                 ;; program.
 				((r0 = r1 & #x7F)
＠＠ -531,12 +529,85 ＠＠
 disk to XEmacs characters for some fixed-width 8-bit coding system.  "
   (check-argument-type #'vectorp decode-table)
   (check-argument-range (length decode-table) #x100 #x100)
-  (block category
-    (loop
-      for i from #x80 to #xBF
-      do (unless (= i (aref decode-table i))
-           (return-from category 'no-conversion)))
-    'iso-8-1))
+  (loop
+    named category
+    for i from #x80 to #xBF
+    do (unless (= i (aref decode-table i))
+	 (return-from category 'no-conversion))
+    finally return 'iso-8-1))
+
+(defun 8-bit-fixed-query-coding-region (begin end coding-system
+                                        &optional buffer errorp highlightp)
+  "The `query-coding-region' implementation for 8-bit-fixed coding systems.
+
+Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars'
+coding system properties.  The former is a hash table mapping from valid
+Unicode code points to on-disk octets in the coding system; the latter a set
+of characters as used by `skip-chars-forward'.  Both of these properties are
+generated automatically by `make-8-bit-coding-system'.
+
+See that the documentation of `query-coding-region'; see also
+`make-8-bit-coding-system'. "
+  (check-argument-type #'coding-system-p
+                       (setq coding-system (find-coding-system coding-system)))
+  (check-argument-type #'integer-or-marker-p begin)
+  (check-argument-type #'integer-or-marker-p end)
+  (let ((from-unicode
+         (coding-system-get coding-system '8-bit-fixed-query-from-unicode))
+        (skip-chars-arg
+         (coding-system-get coding-system '8-bit-fixed-query-skip-chars))
+	(ranges (make-range-table))
+        char-after fail-range-start fail-range-end previous-fail extent
+	failed)
+    (check-type from-unicode hash-table)
+    (check-type skip-chars-arg string)
+    (save-excursion
+      (goto-char begin buffer)
+      (skip-chars-forward skip-chars-arg end buffer)
+      (while (< (point buffer) end)
+        (message
+	 "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+	 fail-range-start previous-fail (point buffer) end)
+	(setq char-after (char-after (point buffer) buffer)
+	      fail-range-start (point buffer))
+	(message "arguments are %S %S"
+		 (< (point buffer) end)
+		 (not (gethash (encode-char char-after 'ucs) from-unicode)))
+	(while (and
+		(< (point buffer) end)
+		(not (gethash (encode-char char-after 'ucs) from-unicode)))
+	  (forward-char 1 buffer)
+	  (setq char-after (char-after (point buffer) buffer)
+		failed t))
+	(if (= fail-range-start (point buffer))
+	    ;; The character can actually be encoded by the coding
+	    ;; system; check the characters past it.
+	    (forward-char 1 buffer)
+	  ;; The character actually failed. 
+	  (message "past the move through, point now %S" (point buffer))
+	  (when errorp 
+	    (error 'text-conversion-error
+		   (format "Cannot encode %s using coding system"
+			   (buffer-substring fail-range-start (point buffeR)
+					     buffer))
+		   (coding-system-name coding-system)))
+	  (put-range-table fail-range-start
+			   ;; If char-after is non-nil, we're not at
+			   ;; the end of the buffer.
+			   (setq fail-range-end (if char-after
+						    (point buffer)
+						  (point-max buffer)))
+			   t ranges)
+	  (when highlightp
+	    (message "highlighting")
+	    (setq extent (make-extent fail-range-start fail-range-end buffer))
+	    (set-extent-priority extent (+ mouse-highlight-priority 2))
+	    (set-extent-face extent 'query-coding-warning-face))
+	  (skip-chars-forward skip-chars-arg end buffer))
+	(message "about to give the result, ranges %S" ranges))
+      (if failed 
+	  (values nil ranges)
+	(values t nil)))))

 ;;;###autoload
 (defun make-8-bit-coding-system (name unicode-map &optional description props)
＠＠ -618,13 +689,27 ＠＠
     (coding-system-put name '8-bit-fixed t)
     (coding-system-put name 'category 
                        (make-8-bit-choose-category decode-table))
+    (coding-system-put name '8-bit-fixed-query-skip-chars
+                       (apply #'string (append decode-table nil)))
+    (coding-system-put name '8-bit-fixed-query-from-unicode encode-table)
+
+    (coding-system-put name 'query-coding-function
+                       #'8-bit-fixed-query-coding-region)
+    (coding-system-put (intern (format "%s-unix" name))
+		       'query-coding-function
+                       #'8-bit-fixed-query-coding-region)
+    (coding-system-put (intern (format "%s-dos" name))
+		       'query-coding-function
+                       #'8-bit-fixed-query-coding-region)
+    (coding-system-put (intern (format "%s-mac" name))
+		       'query-coding-function
+                       #'8-bit-fixed-query-coding-region)
     (loop for alias in aliases
       do (define-coding-system-alias alias name))
     result))

 (define-compiler-macro make-8-bit-coding-system (&whole form name unicode-map
 						 &optional description props)
-
   ;; We provide the compiler macro (= macro that is expanded only on
   ;; compilation, and that can punt to a runtime version of the
   ;; associate function if necessary) not for reasons of speed, though
＠＠ -674,8 +759,9 ＠＠
              ;; (invalid-read-syntax "Multiply defined symbol label" 1)
              ;;
              ;; when the file is byte compiled.
-             (case-fold-search t))
-        (define-translation-hash-table encode-table-sym ,encode-table)
+             (case-fold-search t)
+             (encode-table ,encode-table))
+        (define-translation-hash-table encode-table-sym encode-table)
         (make-coding-system 
          ',name 'ccl ,description
          (plist-put (plist-put ',props 'decode 
＠＠ -688,8 +774,22 ＠＠
                                    (symbol-value 'encode-table-sym)))
                             ',encode-program))))
 	(coding-system-put ',name '8-bit-fixed t)
-        (coding-system-put ',name 'category ',
-                           (make-8-bit-choose-category decode-table))
+        (coding-system-put ',name 'category 
+                           ',(make-8-bit-choose-category decode-table))
+        (coding-system-put ',name '8-bit-fixed-query-skip-chars
+                           ',(apply #'string (append decode-table nil)))
+        (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table)
+        (coding-system-put ',name 'query-coding-function
+                           #'8-bit-fixed-query-coding-region)
+	(coding-system-put ',(intern (format "%s-unix" name))
+			   'query-coding-function
+			   #'8-bit-fixed-query-coding-region)
+	(coding-system-put ',(intern (format "%s-dos" name))
+			   'query-coding-function
+			   #'8-bit-fixed-query-coding-region)
+	(coding-system-put ',(intern (format "%s-mac" name))
+			   'query-coding-function
+			   #'8-bit-fixed-query-coding-region)
         ,(macroexpand `(loop for alias in ',aliases
                         do (define-coding-system-alias alias
                              ',name)))
＠＠ -703,4 +803,3 ＠＠
  '(mnemonic "Latin 1"
    documentation "The most used encoding of Western Europe and the Americas."
    aliases (iso-latin-1 latin-1)))
-
diff -r e70cc8a90e90 -r 68d1ca56cffa lisp/unicode.el
--- a/lisp/unicode.el	Thu Jan 17 11:55:11 2008 +0100
+++ b/lisp/unicode.el	Mon Jan 21 22:51:21 2008 +0100
＠＠ -611,6 +611,71 ＠＠
        (translate-region start finish table))
      begin end buffer))

+(defvar unicode-query-coding-skip-chars-arg nil ;; Set in general-late.el
+  "Used by `unicode-query-coding-region' to skip chars with known
mappings.")
+
+(defun unicode-query-coding-region (begin end coding-system
+				    &optional buffer errorp highlightp)
+  "The `query-coding-region' implementation for Unicode coding systems."
+  (check-argument-type #'coding-system-p
+                       (setq coding-system (find-coding-system coding-system)))
+  (check-argument-type #'integer-or-marker-p begin)
+  (check-argument-type #'integer-or-marker-p end)
+  (let* ((skip-chars-arg unicode-query-coding-skip-chars-arg)
+         (ranges (make-range-table))
+         (looking-at-arg (concat "[" skip-chars-arg "]"))
+         fail-range-start fail-range-end previous-fail char-after
+	 failed extent)
+    (save-excursion
+      (goto-char begin buffer)
+      (skip-chars-forward skip-chars-arg end buffer)
+      (while (< (point buffer) end)
+        (message
+         "fail-range-start is %S, previous-fail %S, point is %S, end is %S"
+         fail-range-start previous-fail (point buffer) end)
+        (setq char-after (char-after (point buffer) buffer)
+              fail-range-start (point buffer))
+        (while (and
+                (< (point buffer) end)
+                (not (looking-at looking-at-arg))
+                (= -1 (char-to-unicode char-after)))
+          (forward-char 1 buffer)
+	  (message "what?!?")
+          (setq char-after (char-after (point buffer) buffer)
+                failed t))
+        (if (= fail-range-start (point buffer))
+            ;; The character can actually be encoded by the coding
+            ;; system; check the characters past it.
+            (forward-char 1 buffer)
+          ;; Can't be encoded; note this.
+          (when errorp 
+            (error 'text-conversion-error
+                   (format "Cannot encode %s using coding system"
+                           (buffer-substring fail-range-start (point buffer)
+                                             buffer))
+                   (coding-system-name coding-system)))
+          (put-range-table fail-range-start
+                           ;; If char-after is non-nil, we're not at
+                           ;; the end of the buffer.
+                           (setq fail-range-end (if char-after
+                                                    (point buffer)
+                                                  (point-max buffer)))
+                           t ranges)
+          (when highlightp
+            (setq extent (make-extent fail-range-start fail-range-end buffer))
+            (set-extent-priority extent (+ mouse-highlight-priority 2))
+            (set-extent-face extent 'query-coding-warning-face)))
+        (skip-chars-forward skip-chars-arg end buffer))
+      (if failed
+          (values nil ranges)
+        (values t nil)))))
+
+(loop
+  for coding-system in (coding-system-list)
+  do (when (eq 'unicode (coding-system-type coding-system))
+       (coding-system-put coding-system 'query-coding-function
+			  #'unicode-query-coding-region)))
+
 (unless (featurep 'mule)
   ;; We do this in such a roundabout way--instead of having the above defun
   ;; and defvar calls inside a (when (featurep 'mule) ...) form--to have

-- 
¿Dónde estará ahora mi sobrino Yoghurtu Nghé, que tuvo que huir
precipitadamente de la aldea por culpa de la escasez de rinocerontes?

_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

[PATCH] Add #'query-coding-region, #'query-coding-string.