I'm looking at how to get readable output from texi2any --html by
hooking iconv into the pipeline. We have some ... interesting ...
encoding situations. For example, in mule-packages/mule-base/texi, we
have several files encoded with ISO-2022-JP, but canna-jp.texi appears
to be encoded with EUC-JP, and languages.texi appears to contain
multiple encodings (e.g., ISO-2022-JP-2 on line 96, something I
haven't been able to figure out on line 97, ISO-2022-KR on lines
461-465, etc.).
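
For the files that are in a single known encoding, the hookup I have
in mind looks something like this (just a sketch; the EUC-JP guess
for canna-jp.texi is from above, and the .utf8.texi name is
arbitrary):

# Convert to UTF-8 before texi2any ever sees the file.
iconv -f EUC-JP -t UTF-8 canna-jp.texi > canna-jp.utf8.texi
# Any @documentencoding line has to be updated to match:
sed -i 's/^@documentencoding .*/@documentencoding UTF-8/' canna-jp.utf8.texi
texi2any --html canna-jp.utf8.texi

The mixed-encoding languages.texi obviously can't be handled with a
single -f, which is the real problem.
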
I tried to look at some of the info files with XEmacs, and found that
the info files did not survive being processed by makeinfo. The texi
files show non-ASCII characters okay when opened with XEmacs, but the
info files just show ASCII garbage. (Raw ISO-2022 sequences maybe?)
Do the rest of you see that? Try looking at the skk or mule-base info
files. I wonder if this is because modern Linux distributions default
to UTF-8 as the encoding for everything.
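
One quick way to test the raw-ISO-2022 guess (a sketch; substitute
whatever the generated info file is actually called for
mule-base.info):

file mule-base.info
# Count raw ESC (0x1b) bytes; ISO-2022 text is full of escape
# sequences, while clean UTF-8 output should contain none.
tr -dc '\033' < mule-base.info | wc -c
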
I'm not sure how to handle this. Any suggestions on how to proceed
would be much appreciated.
By the way, when I couldn't figure out an encoding, I did this:
# Try every converter iconv knows about on the mystery line.
for enc in $(iconv -l); do
  # glibc's "iconv -l" ends each name with //; strip it.
  echo -n "${enc%//}: "
  sed -n '97p' languages.texi | iconv -f "${enc%//}" -t UTF-8
done
and looked for output that didn't contain complaints about bad byte
sequences and that looked like it might reasonably be correct. (I
have unifont installed, so all valid Unicode characters should be
displayable, if ugly.) Even with this approach, line 97 of
mule-base/texi/languages.texi remains a mystery to me.
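
To cut down on the eyeballing, the loop can keep only the encodings
that convert without complaint (a variant of the above; note that a
clean conversion only means the bytes were valid in that encoding,
not that the result is the right text):

for enc in $(iconv -l); do
  enc=${enc%//}
  # Keep only candidates where iconv accepted every byte.
  if out=$(sed -n '97p' languages.texi | iconv -f "$enc" -t UTF-8 2>/dev/null); then
    printf '%s: %s\n' "$enc" "$out"
  fi
done
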
--
Jerry James
http://www.jamezone.org/