XEmacs 21.5.x cannot display U+FFFD correctly. In "normal"
text files, a wrong glyph is shown; in web pages viewed
with w3m.el, only garbage Chinese characters are shown after
the first occurrence of U+FFFD.
The problem seems to be that XEmacs maps U+FFFD to Big5:
(split-char (string-to-char (decode-coding-string "\357\277\275" 'utf-8)))
=> (chinese-big5-1 35 110)
and the reason for this seems to be that BIG5.TXT in the XEmacs
sources (which originally comes from Unicode.org) maps several
Big5 characters to U+FFFD.
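As a cross-check outside XEmacs, the octal escapes used in the test string
above can be decoded with any other UTF-8 decoder. The following is only a
small Python sketch (the variable names are mine), showing that the three
bytes are the UTF-8 encoding of a single U+FFFD character, so the input to
split-char is what it is claimed to be:

```python
# The three octal escapes from the Emacs Lisp string "\357\277\275".
raw = bytes([0o357, 0o277, 0o275])

# Decode them as UTF-8; this yields exactly one code point.
decoded = raw.decode("utf-8")

print(raw.hex())                 # efbfbd
print(len(decoded))              # 1
print(f"U+{ord(decoded):04X}")   # U+FFFD
```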
# WARNING! It is currently impossible to provide round-trip compatibility
# between BIG5 and Unicode.
#
# A number of characters are not currently mapped because
# of conflicts with other mappings. They are as follows:
#
# BIG5 Description Comments
#
# 0xA15A SPACING UNDERSCORE duplicates A1C4
# 0xA1C3 SPACING HEAVY OVERSCORE not in Unicode
# 0xA1C5 SPACING HEAVY UNDERSCORE not in Unicode
# 0xA1FE LT DIAG UP RIGHT TO LOW LEFT duplicates A2AC
# 0xA240 LT DIAG UP LEFT TO LOW RIGHT duplicates A2AD
# 0xA2CC HANGZHOU NUMERAL TEN conflicts with A451 mapping
# 0xA2CE HANGZHOU NUMERAL THIRTY conflicts with A4CA mapping
#
# We currently map all of these characters to U+FFFD REPLACEMENT CHARACTER.
# It is also possible to map these characters to their duplicates, or to
# the user zone.
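To illustrate why such a many-to-one mapping can be a problem, here is a
minimal Python sketch. It is not how XEmacs actually loads the table (I have
not traced that code); it only assumes a two-column "Big5 code, Unicode code"
layout and a naive loader that builds the reverse table by plain assignment.
The sample rows use the seven Big5 codes listed in the comment above:

```python
# Hypothetical sample rows in an assumed "Big5 code, Unicode code" layout.
SAMPLE = """\
0xA15A 0xFFFD
0xA1C3 0xFFFD
0xA1C5 0xFFFD
0xA1FE 0xFFFD
0xA240 0xFFFD
0xA2CC 0xFFFD
0xA2CE 0xFFFD
"""


def build_tables(text):
    """Build forward (Big5 -> Unicode) and reverse (Unicode -> Big5) tables."""
    to_unicode = {}
    from_unicode = {}
    for line in text.splitlines():
        # Drop trailing comments and skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        big5, ucs = (int(field, 16) for field in line.split()[:2])
        to_unicode[big5] = ucs
        # Naive reverse mapping: a later row silently overwrites an earlier one.
        from_unicode[ucs] = big5
    return to_unicode, from_unicode


to_unicode, from_unicode = build_tables(SAMPLE)
print(len(to_unicode), len(from_unicode))  # 7 1
print(hex(from_unicode[0xFFFD]))           # 0xa2ce
```

With a loader like this, U+FFFD ends up "owned" by whichever Big5 code was
read last, which is the kind of effect the split-char result suggests.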
To verify this, I made the attached patch which comments out
all lines which map Big5 characters to U+FFFD.
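For readers without the attachment, the effect of the patch can be sketched
in a few lines of Python. This is not the patch itself, only an illustration
under the assumption that each mapping line starts with the Big5 code followed
by the Unicode code; the function name and the sample rows are made up for
the example:

```python
def comment_out_fffd(lines):
    """Return the lines with every mapping to U+FFFD commented out."""
    result = []
    for line in lines:
        # Look only at the part before any trailing comment.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[1].upper() == "0XFFFD":
            result.append("# " + line)
        else:
            result.append(line)
    return result


# Illustrative input: one comment line, one U+FFFD mapping, one normal mapping.
sample = [
    "# existing comment",
    "0xA15A 0xFFFD",
    "0xA440 0x4E00",
]
patched = comment_out_fffd(sample)
for line in patched:
    print(line)
```

Lines that are already comments, and mappings to anything other than U+FFFD,
pass through unchanged.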
With this patch, U+FFFD is displayed correctly in plain text files
(tested with an Xft build of XEmacs and a suitable font which has
a glyph for U+FFFD).
With this patch, web pages containing U+FFFD are "mostly" displayed
correctly with w3m.el. Only "mostly" because U+FFFD itself is displayed
as '???' (3 question marks) in that case, which is still not correct.
But the rest of such web pages is displayed correctly.
I'm not sure whether commenting out the lines in BIG5.TXT which
map characters to U+FFFD is the right solution.
Maybe these characters should be mapped to the user zone instead
as suggested in the comment at the top of BIG5.TXT?
But at least this patch should help to illustrate where the problem
comes from.
For more details, please have a look at
http://bugzilla.novell.com/show_bug.cgi?id=293109
--
Mike FABIAN <mfabian(a)suse.de>
http://www.suse.de/~mfabian
睡眠不足はいい仕事の敵だ。
I � Unicode