Hi,
Here’s some code to generate the various East Asian character set mappings
that have been deprecated by Unicode.org, taking the Han mappings from
Unihan.txt.
It uses Microsoft’s code pages as the source for most of the non-Han
mappings, because there are no officially sanctioned Unicode mappings for
these characters, and because Apple’s mappings tend to be one (national
standard character) to many (Unicode characters). For CNS 11643 and JIS
0212, I’ve taken the non-Han data from the obsolete files; there doesn’t
appear to be anything arguable about it, and none of the vendor coding
systems seems to have the data (to my knowledge--does anyone know of any
vendor equivalent to EUC-TW or JIS 0212’s normal coding system?).
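
To illustrate the Unihan.txt step, here is a minimal Python sketch (not the
attached Emacs Lisp). It assumes the tab-separated "U+XXXX, tag, value" line
layout and reads only the kBigFive tag, whose values it assumes to be
hexadecimal. The function name parse_unihan and the sample lines are
hypothetical, made up for the example rather than copied from the real file.

```python
import io

def parse_unihan(lines, tag):
    """Return {legacy_code: unicode_code_point} for one Unihan tag.

    Assumes each data line is 'U+XXXX<TAB>tag<TAB>value' and that the
    value for the requested tag is a single hexadecimal number.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != tag:
            continue                      # not the tag we were asked for
        code_point = int(fields[0][2:], 16)   # strip the "U+" prefix
        legacy = int(fields[2], 16)
        mapping[legacy] = code_point
    return mapping

# Illustrative sample data only, in the assumed layout.
SAMPLE = (
    "# illustrative sample\n"
    "U+4E00\tkBigFive\tA440\n"
    "U+4E00\tkGB0\t5027\n"
    "U+4E01\tkBigFive\tA442\n"
)

big5 = parse_unihan(io.StringIO(SAMPLE), "kBigFive")
for legacy, ucs in sorted(big5.items()):
    print(f"0x{legacy:04X} -> U+{ucs:04X}")
```

Other tags would need their own value parsing, since not every field uses
the same notation; the sketch deliberately handles only the one case.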
Possible issues: it doesn’t use any of the Big 5 extensions, Microsoft’s
mappings may not be the ideal source for the non-Han mappings, and no
testing for multiple mappings is done. (I’ve already written that check, and
can add it if it’s an issue.) Available at
http://parhasard.net/eastasia-ucs-mappings.el
if the attachment breaks.
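
For anyone wondering what a test for multiple mappings might look like, here
is a rough Python sketch. It is not the check mentioned above, just one
possible approach: it takes (legacy code, Unicode code point) pairs and
reports both legacy codes that map to several code points and code points
reached from several legacy codes. The name find_multiple_mappings and the
sample pairs are hypothetical.

```python
from collections import defaultdict

def find_multiple_mappings(pairs):
    """Report non one-to-one entries in (legacy_code, code_point) pairs.

    Returns (one_to_many, many_to_one):
      one_to_many: legacy code -> sorted code points, where more than one
      many_to_one: code point -> sorted legacy codes, where more than one
    """
    by_legacy = defaultdict(set)
    by_unicode = defaultdict(set)
    for legacy, ucs in pairs:
        by_legacy[legacy].add(ucs)
        by_unicode[ucs].add(legacy)
    one_to_many = {k: sorted(v) for k, v in by_legacy.items() if len(v) > 1}
    many_to_one = {k: sorted(v) for k, v in by_unicode.items() if len(v) > 1}
    return one_to_many, many_to_one

# Made-up pairs for illustration; not real mapping data.
pairs = [
    (0x2121, 0x3000),
    (0x2122, 0x3001),
    (0x2122, 0xFF64),   # same legacy code, second code point
    (0x2123, 0x3000),   # second legacy code for an already-used code point
]

one_to_many, many_to_one = find_multiple_mappings(pairs)
print("one to many:", one_to_many)
print("many to one:", many_to_one)
```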
Best regards,
- Aidan Kehoe
--
“Ah come on now Ted, a Volkswagen with a mind of its own, driving all over
the place and going mad, if that’s not scary I don’t know what is.”