Mike Fabian <mfabian(a)suse.de> writes:
Attached is a small lisp file "bug.el" to reproduce the
problem.
Execute it as follows:
mfabian@gregory:~/xemacs-bug$ LANG=C xemacs -q -batch -eval "(require 'un-define)" -l bug.el
Wrote /tmp/utf-8-output
mfabian@gregory:~/xemacs-bug$ LANG=C iconv -f utf-8 -t euc-jp /tmp/utf-8-output > /dev/null
iconv: illegal input sequence at position 65535
mfabian@gregory:~/xemacs-bug$
and you see that one Japanese character in the output has been
destroyed.
This problem seems to occur with any big Japanese file in UTF-8. When
saving such a file, characters are destroyed at file positions close to
N * 65535
bytes.
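To check whether the corruption really clusters near multiples of 65535
bytes, a small diagnostic like the following could be used. This is my
own sketch, not part of the original report, and the function name
find_invalid_utf8 is made up for illustration:

```python
def find_invalid_utf8(path):
    """Return the byte offsets in the file at `path` where UTF-8
    decoding fails, so the offsets can be compared against N * 65535."""
    with open(path, "rb") as f:
        data = f.read()
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode the remainder of the file in one go.
            data[pos:].decode("utf-8")
            break  # everything from `pos` onward is valid
        except UnicodeDecodeError as e:
            # e.start is relative to data[pos:]; record the absolute offset
            # and resume scanning one byte past the bad position.
            offsets.append(pos + e.start)
            pos += e.start + 1
    return offsets
```

Offsets reported at or just before multiples of 65535 would confirm the
pattern described above.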
I forgot to mention the XEmacs version:
"XEmacs 21.4 (patch 8) \"Honest Recruiter\" [Lucid] (i386-suse-linux, Mule)
of Mon Sep 9 2002 on weber"
Mule-UCS 0.84, from the sumo packages dated 2002-05-22.
It's a really annoying bug, because it destroys data when editing big
Japanese files in UTF-8.
;;; -*- coding: utf-8 -*-
;;; xemacs -q -batch -eval "(require 'un-define)" -l bug.el
(require 'un-define)
(setq number-of-characters 22000)
(defun reproduce-utf-8-bug ()
  (interactive)
  (with-temp-buffer
    (set-buffer-file-coding-system 'utf-8)
    (let ((i number-of-characters))
      (while (> i 0)
        (insert "あ") ; <--- insert Japanese hiragana 'a' in UTF-8
        (setq i (1- i))))
    (write-file "/tmp/utf-8-output")))
(reproduce-utf-8-bug)
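For reference (my addition, not part of the original report): a correct
run of bug.el should produce a file that decodes cleanly as UTF-8 to
exactly 22000 copies of あ (U+3042, three bytes E3 81 82 each). A sketch
of such a check, assuming the /tmp/utf-8-output path used above; the
function name check_output is invented here:

```python
def check_output(path, expected_chars=22000):
    """Return True when the file at `path` decodes as UTF-8 to exactly
    `expected_chars` hiragana 'a' (U+3042) characters, ignoring a
    possible final newline that write-file may append."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # the bug corrupted at least one byte sequence
    return text.rstrip("\n") == "\u3042" * expected_chars
```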
--
Mike Fabian <mfabian(a)suse.de>
http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。 ("Lack of sleep is the enemy of good work.")