I'm looking at how to get readable output from texi2any --html by
hooking iconv into the pipeline. We have some ... interesting ...
encoding situations. For example, in mule-packages/mule-base/texi, we
have several files encoded with ISO-2022-JP, but canna-jp.texi appears
to be encoded with EUC-JP, and languages.texi appears to contain
multiple encodings (e.g., ISO-2022-JP-2 on line 96, something I
haven't been able to figure out on line 97, ISO-2022-KR on lines
461-465, etc.).
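
For the files that are in a single known encoding, the hookup I have
in mind looks something like this (just a sketch; the EUC-JP guess
for canna-jp.texi is from above, and the .utf8.texi name is
arbitrary):

# Convert to UTF-8 before texi2any ever sees the file.
iconv -f EUC-JP -t UTF-8 canna-jp.texi > canna-jp.utf8.texi
# Any @documentencoding line has to be updated to match:
sed -i 's/^@documentencoding .*/@documentencoding UTF-8/' canna-jp.utf8.texi
texi2any --html canna-jp.utf8.texi

The mixed-encoding languages.texi obviously can't be handled with a
single -f, which is the real problem.
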
I tried to look at some of the info files with XEmacs, and found that
the info files did not survive being processed by makeinfo. The texi
files show non-ASCII characters okay when opened with XEmacs, but the
info files just show ASCII garbage. (Raw ISO-2022 sequences maybe?)
Do the rest of you see that? Try looking at the skk or mule-base info
files. I wonder if this is because modern Linux distributions default
to UTF-8 as the encoding for everything.
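
One quick way to test the raw-ISO-2022 guess (a sketch; substitute
whatever the generated info file is actually called for
mule-base.info):

file mule-base.info
# Count raw ESC (0x1b) bytes; ISO-2022 text is full of escape
# sequences, while clean UTF-8 output should contain none.
tr -dc '\033' < mule-base.info | wc -c
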
I'm not sure how to handle this. Any suggestions on how to proceed
would be much appreciated.
By the way, when I couldn't figure out an encoding, I did this:
# Try every converter iconv knows about on the mystery line.
for enc in $(iconv -l); do
  # glibc's "iconv -l" ends each name with //; strip it.
  echo -n "${enc%//}: "
  sed -n '97p' languages.texi | iconv -f "${enc%//}" -t UTF-8
done
and looked for output that didn't contain complaints about bad byte
sequences and that looked like it might reasonably be correct. (I
have unifont installed, so all valid Unicode characters should be
displayable, if ugly.) Even with this approach, line 97 of
mule-base/texi/languages.texi remains a mystery to me.
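
To cut down on the eyeballing, the loop can keep only the encodings
that convert without complaint (a variant of the above; note that a
clean conversion only means the bytes were valid in that encoding,
not that the result is the right text):

for enc in $(iconv -l); do
  enc=${enc%//}
  # Keep only candidates where iconv accepted every byte.
  if out=$(sed -n '97p' languages.texi | iconv -f "$enc" -t UTF-8 2>/dev/null); then
    printf '%s: %s\n' "$enc" "$out"
  fi
done
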
--
Jerry James
http://www.jamezone.org/