On the twenty-fifth of August, Aidan Kehoe wrote:
[...]
+ ;; Make them available to user code.
+ (defvar unicode-error-sequence-zero
+ (aref (decode-coding-string "\xd8\x00\x01\x00" 'utf-16-be) 3)
+ "The XEmacs character representing an invalid zero octet in Unicode.
+
+Subtract this character from each XEmacs character in an invalid sequence to
+get the octet on disk. E.g.
+
+\(- (aref (decode-coding-string ?\\x80 'utf-8) 0)
+ unicode-error-characters-zero)
+=> ?\\x80
+
+You can search for invalid sequences using
+`unicode-error-sequence-regexp-range', which see. ")
[...]
Note to everyone: this doesn’t work, since the integer values of the
relevant characters are not numerically contiguous, so subtracting a fixed
"zero" character does not recover the octet. For example, for me on
this build,

  (decode-coding-string "\xD0" 'utf-8)

gives (U+2000D0 jit-ucs-charset-0 35 32), numerical value 1069472, while

  (decode-coding-string "\xCF" 'utf-8)

gives (U+2000CF jit-ucs-charset-0 34 127), numerical value 1069439. That is
33 numeric values apart instead of one.
The approach I intend to take to fix this is to create a char table that maps
each error character to the character whose value is the on-disk octet, and
to advise using this char table with translate-region to get the octets.
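
To make the idea concrete, here is a rough, untested sketch of what such a
table might look like. It is not the eventual patch. The names
my-unicode-error-octet-table and my-unicode-error-region-to-octets are
hypothetical, and it assumes that each lone octet from #x80 to #xFF decodes
under utf-8 to exactly one error character (as "\xD0" and "\xCF" do above) and
that translate-region leaves characters with a nil table entry unchanged.

```elisp
;; Sketch only; hypothetical names, XEmacs-specific functions.
(defvar my-unicode-error-octet-table
  (let ((table (make-char-table 'generic))
        (octet #x80))
    (while (<= octet #xFF)
      ;; Decode the lone octet as UTF-8.  Assumption: the result is a
      ;; one-character string holding the corresponding error character.
      (let ((decoded (decode-coding-string (string (int-char octet)) 'utf-8)))
        (when (= 1 (length decoded))
          ;; Map the error character back to the character whose integer
          ;; value is the octet that was on disk.
          (put-char-table (aref decoded 0) (int-char octet) table)))
      (setq octet (1+ octet)))
    table)
  "Char table mapping UTF-8 error characters to their on-disk octets.")

(defun my-unicode-error-region-to-octets (start end)
  "Replace error characters between START and END with their octets."
  (interactive "*r")
  (translate-region start end my-unicode-error-octet-table))
```

A lookup table avoids any dependence on how the error characters happen to be
numbered, which is exactly what broke the subtraction approach. Error octets
below #x80 (possible with utf-16, as in the quoted docstring) are not covered
by this sketch.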
--
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)