Mike Fabian <mfabian(a)suse.de> writes:
Attached is a small lisp file "bug.el" to reproduce the
problem.
Execute it as follows:
mfabian@gregory:~/xemacs-bug$ LANG=C xemacs -q -batch -eval "(require 'un-define)" -l bug.el
Wrote /tmp/utf-8-output
mfabian@gregory:~/xemacs-bug$ LANG=C iconv -f utf-8 -t euc-jp /tmp/utf-8-output > /dev/null
iconv: illegal input sequence at position 65535
mfabian@gregory:~/xemacs-bug$
and you see that one Japanese character in the output has been
destroyed.
This problem seems to occur with any big Japanese file in UTF-8. When
saving such a file, characters are destroyed at file positions close to
N * 65535
bytes.
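To check whether the corruption really clusters near multiples of 65535
bytes, a small diagnostic like the following could be used. This is my
own sketch, not part of the original report, and the function name
find_invalid_utf8 is made up for illustration:

```python
def find_invalid_utf8(path):
    """Return the byte offsets in the file at `path` where UTF-8
    decoding fails, so the offsets can be compared against N * 65535."""
    with open(path, "rb") as f:
        data = f.read()
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode the remainder of the file in one go.
            data[pos:].decode("utf-8")
            break  # everything from `pos` onward is valid
        except UnicodeDecodeError as e:
            # e.start is relative to data[pos:]; record the absolute offset
            # and resume scanning one byte past the bad position.
            offsets.append(pos + e.start)
            pos += e.start + 1
    return offsets
```

Offsets reported at or just before multiples of 65535 would confirm the
pattern described above.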
I forgot to mention the XEmacs version:
"XEmacs 21.4 (patch 8) \"Honest Recruiter\" [Lucid] (i386-suse-linux, Mule)
of Mon Sep 9 2002 on weber"
Mule-UCS 0.84, from the sumo packages dated 2002-05-22.
It's a really annoying bug, because it destroys data when editing big
Japanese files in UTF-8.
;;; -*- coding: utf-8 -*-
;;; xemacs -q -batch -eval "(require 'un-define)" -l bug.el
(require 'un-define)
(setq number-of-characters 22000)
(defun reproduce-utf-8-bug ()
  (interactive)
  (with-temp-buffer
    (set-buffer-file-coding-system 'utf-8)
    (let ((i number-of-characters))
      (while (> i 0)
        (insert "あ") ; <--- insert Japanese hiragana 'a' in UTF-8
        (setq i (1- i))))
    (write-file "/tmp/utf-8-output")))
(reproduce-utf-8-bug)
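For reference (my addition, not part of the original report): a correct
run of bug.el should produce a file that decodes cleanly as UTF-8 to
exactly 22000 copies of あ (U+3042, three bytes E3 81 82 each). A sketch
of such a check, assuming the /tmp/utf-8-output path used above; the
function name check_output is invented here:

```python
def check_output(path, expected_chars=22000):
    """Return True when the file at `path` decodes as UTF-8 to exactly
    `expected_chars` hiragana 'a' (U+3042) characters, ignoring a
    possible final newline that write-file may append."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # the bug corrupted at least one byte sequence
    return text.rstrip("\n") == "\u3042" * expected_chars
```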
--
Mike Fabian <mfabian(a)suse.de>
http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。 ("Lack of sleep is the enemy of good work.")