On the twenty-second of May, Jerry James wrote:
On Mon, May 12, 2014 at 10:45 AM, Jerry James
<james(a)xemacs.org> wrote:
> $ iconv -f iso-2022-jp -t utf-8 skk.texi > skk.texi.utf8
> iconv: illegal input sequence at position 154746
>
> That is line 4779 of skk.texi, right after the "W:" if I am counting
> characters correctly. Any advice on how to proceed?
I've got a framework in place that seems to generate HTML from the
info files correctly, except for this problem. It's not just iconv
that is confused by skk.texi, either; XEmacs 21.5 also has problems
with it.
Would one of you who reads Japanese please take a look at lines 4771
and 4780 of skk.texi? There is something wrong with those two lines.
If I use XEmacs itself to convert this file to UTF-8, then makeinfo
still chokes on the results, complaining:
The characters on line 4779 that iconv chokes on are all encoded in
JIS X 0208; #'split-char reports them as follows:
((japanese-jisx0208 34 49) (japanese-jisx0208 34 81) (japanese-jisx0208 34 113))
Converted to ku/ten (row/column) indexes, these are:
2 17
2 49
2 81
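For anyone checking the arithmetic: each ku/ten index is the corresponding
7-bit byte value minus 32 (0x20). A minimal sketch in Python, where the
function name to_kuten is just something made up for illustration:

```python
def to_kuten(byte1, byte2):
    """Convert a 7-bit JIS X 0208 byte pair to (ku, ten) indexes.

    Both bytes must lie in 0x21..0x7E; subtracting 0x20 gives the
    row (ku) and column (ten), each in the range 1..94.
    """
    for b in (byte1, byte2):
        if not 0x21 <= b <= 0x7E:
            raise ValueError("not a 7-bit JIS X 0208 byte: %r" % (b,))
    return byte1 - 0x20, byte2 - 0x20


# The byte pairs reported by #'split-char above.
pairs = [(34, 49), (34, 81), (34, 113)]
kuten = [to_kuten(b1, b2) for b1, b2 in pairs]

for ku, ten in kuten:
    print(ku, ten)
```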
Looking at page 814 of Ken Lunde’s CJKV Information Processing, second
edition (2002), these all have no mappings; they are blank spaces. They also
don’t have mappings in our japanese-jisx0208-1978 character set.
The relevant text in skk.texi is describing how to enter characters by means
of their ku/ten code (type \ within the japanese-skk input method), and
indeed, if I type C-u C-\ japanese-skk RET \ RET XEmacs prints that line
(4779) in the minibuffer, unencodable characters and all.
It’d be reasonable to use replacement characters, something like U+FFFD, for
them in skk.texi, and iconv and makeinfo shouldn’t choke on those. U+3013 is
in JISX0208 and is intended for this purpose, as I understand it. To be
clear, the characters to replace are those after W:, R: and Y:.
I doubt there will be any problems with lines 4771 and 4780 of skk.texi
once line 4779 is fixed. Give it a shot.
I’m also irritated that the Unicode coding systems don’t error more cleanly
on this; I should have noticed and fixed this long ago. Oh well.
Thanks for the work, Jerry!
utf8 "\xE04576" does not map to Unicode at
/usr/share/texinfo/Texinfo/Parser.pm line 1909, <FH> line 4771.
utf8 "\xE04596" does not map to Unicode at
/usr/share/texinfo/Texinfo/Parser.pm line 1909, <FH> line 4771.
utf8 "\xE045B6" does not map to Unicode at
/usr/share/texinfo/Texinfo/Parser.pm line 1909, <FH> line 4771.
skk.texi:4780: misplaced {
skk.texi:4780: misplaced }
skk.texi:4780: misplaced {
skk.texi:4780: misplaced }
skk.texi:4780: misplaced {
skk.texi:4780: misplaced }
make[2]: *** [texi/skk_toc.html] Error 1
And reloading the UTF-8 version of the file in XEmacs results in a
bunch of weird characters showing up. I have no idea what should be
on those lines, so I need some help to proceed. Thanks,
--
‘Liston operated so fast that he once accidentally amputated an assistant’s
fingers along with a patient’s leg, […] The patient and the assistant both
died of sepsis, and a spectator reportedly died of shock, resulting in the
only known procedure with a 300% mortality.’ (Atul Gawande, NEJM, 2012)