Jerry James writes:
On Mon, May 12, 2014 at 10:45 AM, Jerry James
<james(a)xemacs.org> wrote:
> $ iconv -f iso-2022-jp -t utf-8 skk.texi > skk.texi.utf8
> iconv: illegal input sequence at position 154746
>
> That is line 4779 of skk.texi, right after the "W:" if I am counting
> characters correctly. Any advice on how to proceed?
This is a character from JIS X 0213, which merged JIS X 0208 and JIS X
0212 and added some characters at previously undefined code points in
2000, and was then updated in 2004. The three untranslatable
characters corresponding to menu choices W, R, and Y in that line are
W: JIS 0x2231 = U+FF0D FULLWIDTH HYPHEN-MINUS (compatibility for U+002D)
R: JIS 0x2251 = U+2295 CIRCLED PLUS
Y: JIS 0x2271 = U+2194 LEFT RIGHT ARROW
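
For anyone who wants to check the list above, here is a minimal Python
sketch. It only uses the code points given in this message plus the
standard unicodedata module; the variable names (jis_to_unicode, rows)
are mine, made up for illustration. It prints the official Unicode name
and the UTF-8 bytes that a correctly converted file should contain.

```python
import unicodedata

# JIS code -> Unicode code point, copied from the list above.
jis_to_unicode = {0x2231: 0xFF0D, 0x2251: 0x2295, 0x2271: 0x2194}

rows = []
for jis, cp in jis_to_unicode.items():
    ch = chr(cp)
    # Record the Unicode name and the UTF-8 encoding as upper-case hex.
    rows.append((jis, cp, unicodedata.name(ch), ch.encode("utf-8").hex().upper()))

for jis, cp, name, utf8_hex in rows:
    print(f"JIS 0x{jis:04X} = U+{cp:04X} {name} (UTF-8 bytes {utf8_hex})")
```

The hex strings are what to look for (for example with a hex dump) on
the affected line of the converted file.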
This is a bug in Japan, a country whose inhabitants have no respect
for standards (or traffic laws, for that matter). The problem is that
skk.texi uses the escape sequence for JIS X 0208, and then uses the
new characters from a different, not quite compatible, coded character
set. Grrrr.
I think the easiest way to fix this file is to use XEmacs to generate
a (partly corrupt) UTF-8 file and then backpatch in the correct
characters.
Try this patch on the file generated by XEmacs conversion to UTF-8:
> utf8 "\xE04576" does not map to Unicode at
> /usr/share/texinfo/Texinfo/Parser.pm line 1909, <FH> line 4771.
> utf8 "\xE04596" does not map to Unicode at
> /usr/share/texinfo/Texinfo/Parser.pm line 1909, <FH> line 4771.
> utf8 "\xE045B6" does not map to Unicode at
> /usr/share/texinfo/Texinfo/Parser.pm line 1909, <FH> line 4771.
I don't understand the reference to line 4771. The only "interesting"
character on line 4771 is U+2103 DEGREE CELSIUS, but that also occurs
on line 4770. Also, note that the three erroneous characters are three
different characters. I assume this refers to line 4779, after all.
The fact that the three characters differ from each other by 32, just
as the JIS characters do, suggests to me that this is some kind of
internal XEmacs trick to deal with unknown characters that is leaking
into the file being written (and creating syntactically invalid UTF-8
in the process, a bad bug).
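
The arithmetic is easy to check. The following is a rough Python
sketch, not a tested fix: the byte triples are read straight off the
error messages, and the pairing of each bad triple with W, R, and Y is
an assumption based purely on ascending order. The names steps,
is_valid_utf8, and backpatch are hypothetical helpers of my own.

```python
# JIS X 0213 code points named earlier in this message (W, R, Y).
jis = [0x2231, 0x2251, 0x2271]
# Byte triples taken from the "does not map to Unicode" errors.
bad = [bytes.fromhex(h) for h in ("E04576", "E04596", "E045B6")]

def steps(values):
    """Return the differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]

jis_steps = steps(jis)
bad_steps = steps([int.from_bytes(b, "big") for b in bad])

def is_valid_utf8(data):
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

validity = [is_valid_utf8(b) for b in bad]

# ASSUMPTION: bad triples pair with W, R, Y in ascending order.
replacement = dict(zip(bad, ("\uFF0D", "\u2295", "\u2194")))

def backpatch(data):
    """Swap each bad byte triple for the UTF-8 form of its assumed character."""
    for seq, ch in replacement.items():
        data = data.replace(seq, ch.encode("utf-8"))
    return data

print(jis_steps, bad_steps, validity)
```

A lead byte of 0xE0 must be followed by two continuation bytes in the
range 0x80-0xBF, so 0xE0 0x45 cannot occur in well-formed UTF-8, which
means a byte-level replace like this should not touch any correctly
encoded text. It would be applied to the raw bytes of the converted
file (opened in binary mode), and the result checked by eye on the
affected line before committing anything.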
> skk.texi:4780: misplaced {
> skk.texi:4780: misplaced }
> skk.texi:4780: misplaced {
> skk.texi:4780: misplaced }
> skk.texi:4780: misplaced {
> skk.texi:4780: misplaced }
> make[2]: *** [texi/skk_toc.html] Error 1
If this doesn't get fixed by fixing the previous error, I have no idea
what's going on. I can't see any misplaced brackets.