>>>> "Mike" == Mike Fabian
Mike> LANG=ja_JP xemacs -q -vanilla kanji.euc-iso
>>>> "Stephen" == Stephen J Turnbull
Stephen> I've reproduced this up to the Lisp backtrace in 21.1
Stephen> (patch 12) "Channel Islands" plus some CVS updates. I
Stephen> should be able to take a close look at it later today.
Stephen> It's not obvious to me what's happening here though, so I
Stephen> can't promise a quick fix.
Oh, boy, are things fxxked up here.
What is happening is that the presence of undesignated characters from
GR followed by the ISO-2022 escape sequences causes Mule to
auto-detect the coding category as 'iso-lock-shift, and then the
coding system itself is set to 'iso-2022-lock-unix. There are some
bugs in the implementation of this coding system, such that characters
which are represented as negative Emchars by MAKE_CHAR are generated by
decode-coding-region. This should not happen; I believe the range of
Emchars is still only 19 bits, so with a 30-bit character
representation there's no excuse for wrap-around. :-(
I don't understand why this happens yet. I will have some kind of
patch in about two days (heavy class load next two days); either I'll
make the "safe" coding-priority-list the default and document the
problem, or I'll have a real fix. If somebody else can do something
useful in the interim, I'd be much obliged!
A work-around is to put
(defun make-coding-priority-safe ()
  "Give `no-conversion' higher priority than some buggy coding categories.
`iso-lock-shift' is known to cause crashes in a Japanese environment in
certain situations, and `iso-8-designate' is rare and perhaps also not
to be trusted."
  (let* ((buggy '(iso-lock-shift iso-8-designate))
         ;; Dummy head cell, so `no-conversion' can also end up first.
         (head (cons nil (delq 'no-conversion (coding-priority-list))))
         (cpl head))
    ;; Stop at the cell just before the first buggy category.
    (while (and (cdr cpl)
                (not (memq (car (cdr cpl)) buggy)))
      (setq cpl (cdr cpl)))
    (setcdr cpl (cons 'no-conversion (cdr cpl)))
    ;; Install the reordered list.
    (set-coding-priority-list (cdr head))))
;; Not very safe; should check for current coding system and buffer
;; change status etc.
(defun convert-buffer-using-coding-system (coding-system)
  "Convert the whole buffer according to CODING-SYSTEM.
Should only be called on an unnarrowed binary buffer with a known
external encoding.  Any other use will have undefined results."
  (interactive "SCoding system: ")
  (if (coding-system-p coding-system)
      (decode-coding-region (point-min) (point-max) coding-system)
    (error "You bozo!  Try again, with a REAL coding system this time.")))
in ~/.emacs and to call `make-coding-priority-safe' any time you change
your language environment or call the function
`set-coding-priority-list' by hand. Those are the only ways I know of
that coding-priority-list changes.
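To see what the reordering is meant to do without touching the real
priority list, here is a minimal sketch on a plain list. The helper
name `insert-before-first-buggy' and the sample list are made up for
illustration; the real list comes from (coding-priority-list) and will
differ on your system.

```elisp
;; Hypothetical helper, for illustration only: pure list manipulation,
;; no Mule functions are called.
(defun insert-before-first-buggy (categories buggy)
  "Return a copy of CATEGORIES with `no-conversion' moved before the
first element that is also in BUGGY, or to the end if there is none."
  (let ((rest (delq 'no-conversion (copy-sequence categories)))
        (safe nil))
    ;; Collect the categories that precede the first buggy one.
    (while (and rest (not (memq (car rest) buggy)))
      (setq safe (cons (car rest) safe)
            rest (cdr rest)))
    (append (nreverse safe) (cons 'no-conversion rest))))

;; Made-up sample priority list, not the real default.
(insert-before-first-buggy
 '(iso-7 iso-lock-shift iso-8-designate shift-jis no-conversion)
 '(iso-lock-shift iso-8-designate))
;; => (iso-7 no-conversion iso-lock-shift iso-8-designate shift-jis)
```

The point is only that `no-conversion' wins over the two suspect
categories while everything ahead of them keeps its place.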
The file in question will be left in binary form.
`convert-buffer-using-coding-system' is a convenience function to do
the conversion at user request. In Mike's case, use of 'euc-jp gives
appropriate results. You can do this at any time, even after editing
the buffer, as long as you have not screwed up the external encoding
(e.g., by altering an escape sequence or changing or deleting one of the
bytes in a multibyte character).
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
_________________ _________________ _________________ _________________
What are those straight lines for? "XEmacs rules."