Uwe Brauer writes:
> In case of ISO-2022-7bit for example ("qj" ?^[%Gױ^[%@)
This isn't ISO-2022-7bit; it is 8-bit (UTF-8, in fact), introduced by the
DOCS (Designate Other Coding System) control sequence. Effectively it says
"ISO 2022 can't handle this, so we'll talk UTF-8 for a while".
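To make the DOCS switch concrete, here is a minimal Python sketch. It assumes only the two escape sequences seen in the example above (ESC % G to enter UTF-8, ESC % @ to return to ISO 2022) and, for simplicity, treats everything outside the UTF-8 segment as plain ASCII. The function name split_docs_segments is made up for illustration; a real ISO 2022 decoder tracks far more state than this.

```python
ESC = b"\x1b"
DOCS_UTF8 = ESC + b"%G"    # DOCS: switch to UTF-8
DOCS_RETURN = ESC + b"%@"  # DOCS: return to ISO 2022


def split_docs_segments(data: bytes):
    """Return a list of (coding, text) pairs for a stream using the two DOCS escapes.

    Assumption: bytes outside a UTF-8 segment are plain ASCII.
    """
    segments = []
    pos = 0
    in_utf8 = False
    while pos < len(data):
        # Look for the escape that would end the current mode.
        marker = DOCS_RETURN if in_utf8 else DOCS_UTF8
        end = data.find(marker, pos)
        chunk = data[pos:] if end == -1 else data[pos:end]
        if chunk:
            coding = "utf-8" if in_utf8 else "ascii"
            segments.append((coding, chunk.decode(coding)))
        if end == -1:
            break
        pos = end + len(marker)
        in_utf8 = not in_utf8
    return segments


# A Hebrew-block character wrapped in the DOCS escapes, between ASCII text.
sample = b"qj " + DOCS_UTF8 + "\u05f1".encode("utf-8") + DOCS_RETURN + b" done"
print(split_docs_segments(sample))
```

The point of the sketch is that the UTF-8 bytes are not ISO 2022 data at all: the escapes simply bracket a region where a different coding system applies.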
> It seems to me that ISO-2022-7bit can handle everything that UTF-8 (or UTF-16) can
As far as I know, that has not been even close to true for a while.
Most of the more recent additions to Unicode do not have ISO-
registered coded character sets, so you need the above "include UTF-8
in the ISO 2022 stream" workaround.
> and even UTF-8 has been criticized for having shortcomings, for
> example for Thai.
If so, that's the Thai government's fault, not that of Unicode.
Unicode is basically a consortium of national standards bodies. It
does not decide which individual characters belong in Unicode, only
whether a proposed set is actually a new character set, and where to
put those characters not already in Unicode.
> So why was UTF-8 (or UTF-16) chosen over ISO-2022-7bit?
*Much* simpler and more portable. Unicode is a *single* extensible
character set. It is easy to make a (possibly ugly, but probably
readable) font for it, and once that is done, *any* Unicode
implementation can display it. (Doing a good job for some scripts
like Arabic or Thai is quite hard, but that has nothing to do with
Unicode vs. ISO 2022.) If you have a character set that you want to
implement that isn't in Unicode already, make a font and use some of
the more than one hundred thousand character codes reserved for "private
use". Very easy!
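As a rough illustration of the private-use codes, the sketch below lists the three private-use ranges defined by the Unicode standard and checks them against Python's standard unicodedata module, which reports general category "Co" for private-use code points. The helper name is_private_use is made up for this example.

```python
import unicodedata

# The three private-use ranges defined by Unicode (inclusive bounds).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(ch: str) -> bool:
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


# Count how many code points are available for private assignments.
total = sum(hi - lo + 1 for lo, hi in PRIVATE_USE_RANGES)
print(total)

# Spot-check the range boundaries against the Unicode database.
for lo, hi in PRIVATE_USE_RANGES:
    assert is_private_use(chr(lo)) and is_private_use(chr(hi))
```

Any font that maps glyphs to code points in these ranges can be displayed by an ordinary Unicode implementation, with no new designation codes needed.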
ISO 2022, on the other hand, is a framework for *switching* character
sets. In order to implement ISO 2022 you need to know the code that
designates each character set, and you have to have a font for the
specific set of codes (which may differ for the same character in
different sets). XEmacs faces an even bigger problem, which is that
the set of codes for designating character sets internally is *very*
limited (only a few dozen), and they are all already in use. GNU
Emacs chose to use private codes for some standard sets, so they had ten
more, but even so they were about to run out when they switched to
XEmacs-Beta mailing list