>>>> "Aidan" == Aidan Kehoe <kehoea(a)parhasard.net> writes:
Aidan> To reproduce it here, XEmacs 21.5-b27 "fiddleheads"
Aidan> (+CVS-20060521) configured for `i686-pc-linux'.,
OK, I reproduced it and figured out what's happening. I suspect that
latin-unity is being stressed beyond its limits here. If you look at
the commentary on the related functions, you can see that I was
heading in the direction of getting them out of the coding-system
business as much as possible. Instead, I'd just unify (or de-unify,
in the case of Han sets) *charsets*, and let some other facility
handle choosing an appropriate coding system. I.e., not caring
whether non-Latin charsets were present or not was more or less by
design.
Unfortunately, I don't recall how far that program proceeded (I didn't
have a clear idea at the time in any case). Whatever we do (or don't)
here is going to carry some risk. :-(
If you have any opinion, Aidan, I'd especially like to hear it, since
I believe you've worked on this code a little.
The technical problem is that non-Latin charsets were inconceivable to
latin-unity. The following patch adds a signal that tells latin-unity
"whoa, hands off!" Unfortunately, this means that you can't remap a
region that contains non-Latin characters *at all*; you need to
restrict attention to Latin-only regions.
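
For readers who don't want to decode the diff, here is a minimal,
standalone sketch of the flag-bit idea. It is only an illustration
under stated assumptions, not the patch itself: every `demo-' name is
made up, the charset list is a stand-in for the real one, and the
final predicate is just one way a caller might react to the flag.

```elisp
;; Hypothetical names throughout; this is NOT the latin-unity code.
(defvar demo-charsets '(latin-iso8859-1 latin-iso8859-2 latin-iso8859-15)
  "Stand-in for the list of charsets the package knows how to handle.")

(defconst demo-unknown-flag
  (let ((bit 1))
    (dolist (cs demo-charsets)
      ;; Give each known charset its own bit: 1, 2, 4, ...
      (put cs 'demo-flag-bit bit)
      (setq bit (ash bit 1)))
    ;; After the loop, BIT is the first bit no known charset uses.
    bit)
  "Flag bit standing for any charset not in `demo-charsets'.")

(defun demo-flags-for (charsets)
  "Return the bitwise OR of the flag bits of CHARSETS.
A charset without a flag bit contributes `demo-unknown-flag'."
  (let ((flags 0))
    (dolist (cs charsets)
      (setq flags (logior flags
                          (or (get cs 'demo-flag-bit)
                              demo-unknown-flag))))
    flags))

(defun demo-safe-to-remap-p (flags)
  "Return non-nil if FLAGS mentions only known charsets."
  (= 0 (logand flags demo-unknown-flag)))
```

With three known charsets the reserved bit is 8, so
(demo-flags-for '(latin-iso8859-1 japanese-jisx0208)) gives 9, and
`demo-safe-to-remap-p' returns nil for it. The sketch uses
(or (get ...) ...) rather than the three-argument `get' seen in the
patch, since that default argument is an XEmacs extension.
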
The recoding functions may need similar treatment; I haven't looked yet.
I'm almost sure this will cause annoyance to somebody, but most such
somebodies *probably* can just turn off latin-unity.
BTW, I apologize for the code ... that stuff is way too tricky. Well,
maybe not; it was gawdawful slow when I first coded it. I need to go
back and comment some of it, though.
Index: ChangeLog
===================================================================
RCS file: /pack/xemacscvs/XEmacs/packages/mule-packages/latin-unity/ChangeLog,v
retrieving revision 1.45
diff -u -r1.45 ChangeLog
--- ChangeLog 3 May 2006 12:04:44 -0000 1.45
+++ ChangeLog 22 Jun 2006 09:17:03 -0000
@@ -0,0 +1,8 @@
+2006-06-22 Stephen J. Turnbull <stephen(a)xemacs.org>
+
+ * latin-unity-vars.el (latin-unity-non-latin-bit-flag): New constant.
+
+ * latin-unity.el (latin-unity-representations-present-region): Use
+ it to kludge around bug reported in
+ <m364iwq4yq.fsf(a)jerrypc.cs.usu.edu>.
+
Index: latin-unity-vars.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/packages/mule-packages/latin-unity/latin-unity-vars.el,v
retrieving revision 1.6
diff -u -r1.6 latin-unity-vars.el
--- latin-unity-vars.el 15 Feb 2005 22:22:48 -0000 1.6
+++ latin-unity-vars.el 22 Jun 2006 09:17:03 -0000
@@ -150,14 +150,17 @@
"Bit vector representing the set of all Latin character sets.")
;; put the character set indicies and flag bits in reasonable places
-(let ((index 1) (bit 1))
- (if (> (length latin-unity-character-sets) 25)
- (error "representation too small to support so many charsets!"))
- (mapcar (lambda (cs)
- (put cs 'latin-unity-flag-bit bit)
- (put cs 'latin-unity-index index)
- (setq bit (lsh bit 1)
- index (1+ index)))
- latin-unity-character-sets))
+(defconst latin-unity-non-latin-bit-flag
+ (let ((index 1) (bit 1))
+ (if (> (length latin-unity-character-sets) 25)
+ (error "representation too small to support so many charsets!"))
+ (mapcar (lambda (cs)
+ (put cs 'latin-unity-flag-bit bit)
+ (put cs 'latin-unity-index index)
+ (setq bit (lsh bit 1)
+ index (1+ index)))
+ latin-unity-character-sets)
+ bit)
+ "A bit-flag indicating charsets not handled by latin-unity.")
;;; end of latin-unity-vars.
Index: latin-unity.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/packages/mule-packages/latin-unity/latin-unity.el,v
retrieving revision 1.14
diff -u -r1.14 latin-unity.el
--- latin-unity.el 2 May 2006 21:58:12 -0000 1.14
+++ latin-unity.el 22 Jun 2006 09:17:03 -0000
@@ -469,21 +469,23 @@
(goto-char (point-min))
(while (not (eobp))
(let* ((ch (char-after))
- (cs (car (split-char ch))))
+ (cs (car (split-char ch)))
+ (flag (get cs 'latin-unity-flag-bit 0)))
(cond
((eq cs 'ascii)
(setq skipchars (concat "\000-\177" skipchars))
- (setq asets (logior (get cs 'latin-unity-flag-bit 0) asets)))
+ (setq asets (logior flag asets)))
((eq cs 'latin-jisx0201)
;; #### get this someday
;;(setq skipchars (concat skipchars latin-unity-latin-jisx0201))
(setq skipchars (concat skipchars (list ch)))
- (setq asets (logior (get cs 'latin-unity-flag-bit 0) asets)))
+ (setq asets (logior flag asets)))
(t
;; #### actually we can do the whole charset here
;; precompute and set a property on the cs symbol
(setq skipchars (concat skipchars (list ch)))
- (setq lsets (logior (get cs 'latin-unity-flag-bit 0) lsets)))))
+ (when (= flag 0) (setq lsets (logior latin-unity-non-latin-bit-flag lsets)))
+ (setq lsets (logior flag lsets)))))
;; The characters skipped here can't change asciisets
(skip-chars-forward skipchars))))
(cons lsets asets)))
--
School of Systems and Information Engineering
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.