unicode-internal status
Aidan Kehoe
kehoea at parhasard.net
Tue Nov 15 08:07:43 EST 2005
On the fourteenth day of November, Ben Wing wrote:
> it now builds and runs when compiled without UNICODE_INTERNAL; this is a
> sign that the new char tables work, the new char-conversion system
> basically works, etc.
>
> with UNICODE_INTERNAL i've encountered the dreaded
> "iso2022-preservation" problem -- i.e. byte-compiling an iso2022-encoded
> lisp file is unlikely to work without further hacking. converting them
> to utf-8 is not an option since in many cases the lisp code depends on
> having the proper charset encoded in the actual chars in the file, at
> least in an old-Mule world. so i'm [a] trying to remove this
> assumption,
It’s not _that_ widespread, since non-Mule XEmacs wouldn’t honour it, as I
understand it.
> [b] putting in hacks to make byte-compilation iso2022-preserving, using
> private unicode chars.
There is also the problem of what to do with Unicode-encoded source code
containing characters that have no mapping in Mule. For example, I’m
studying a language of the former Soviet Union that uses more Cyrillic
characters than ISO 8859-5 provides, and with my local code for preserving
UTF code points[1], my hacked input method cannot be reliably byte-compiled.
The byte-compiled code uses escape-quoted as its coding system, which,
although it is in general a universal coding system for Mule, breaks down
under just-in-time allocation of the kind the implementation linked below
performs: different invocations will have different internal mappings for
Unicode code points.
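To make that failure mode concrete, here is a minimal Python model. It is not XEmacs code, and it rests on stated assumptions: every non-ASCII character is assumed to lack a Mule mapping, and a per-session allocator hands out internal codes in order of first use, starting at an arbitrary hypothetical base of 0x200000. `JitAllocator`, `compile_in_session` and `load_in_session` are illustrative names only. The compiled "file" is simply a list of internal codes standing in for escape-quoted output.

```python
FIRST_INTERNAL = 0x200000  # hypothetical base for JIT-allocated internal codes


class JitAllocator:
    """Toy per-session allocator (hypothetical): each new Unicode code point
    gets the next free internal code, in order of first use."""

    def __init__(self):
        self.next_code = FIRST_INTERNAL
        self.to_internal = {}
        self.to_unicode = {}

    def internal(self, cp):
        # Allocate on first sight; later lookups reuse the same code.
        if cp not in self.to_internal:
            self.to_internal[cp] = self.next_code
            self.to_unicode[self.next_code] = cp
            self.next_code += 1
        return self.to_internal[cp]

    def unicode(self, code):
        # None if this session never allocated the code.
        return self.to_unicode.get(code)


def compile_in_session(text, alloc):
    """Model escape-quoted output: non-ASCII characters are written as this
    session's internal codes, not as Unicode code points."""
    return [ord(ch) if ord(ch) < 0x80 else alloc.internal(ord(ch)) for ch in text]


def load_in_session(codes, alloc):
    """Decode internal codes using some session's allocation table."""
    out = []
    for c in codes:
        if c >= FIRST_INTERNAL:
            cp = alloc.unicode(c)
            # Unknown codes become U+FFFD REPLACEMENT CHARACTER.
            out.append(chr(cp) if cp is not None else "\ufffd")
        else:
            out.append(chr(c))
    return "".join(out)
```

For instance, compiling "\u04e9\u04af" (ө, ү, neither of which is in ISO 8859-5) in a fresh session and loading the result in that same session returns the original string. Loading it in a session that happened to meet ү before ө returns "үө" instead, and loading it in a session that met neither yields two replacement characters.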
I suppose there are three cases we care about --
-- The source code sticks to ASCII. In this case, don’t specify a coding
system in the compiled file, and we don’t have a problem.
-- The source code uses ISO 2022 escape sequences. In this case, we should
specify escape-quoted as the coding system in the compiled file, because ISO
2022 encodes more characters than Unicode does, because we want the
assumptions about character sets you mentioned above to be preserved, and
because it’s more compatible with old implementations of Mule.
I consider files with the existing eight-bit coding systems a subset of
this case, because that behaviour is more compatible with old Mule.
-- The source code uses a Unicode coding system. In this case, specify utf-8
as the coding system in the compiled file. This will be incompatible with
older implementations of Mule, but hopefully with the advantages that they
will choke obviously rather than silently, and that the compiled file will
preserve non-Mule characters stored in the source file.
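The three cases amount to a small decision procedure. The Python sketch below only illustrates that logic and is not XEmacs's byte compiler. `SourceEncoding` and `compiled_coding_system` are hypothetical names. The sketch assumes the source file's coding system is already known and has been classified into one of three families, and the returned strings are just the coding-system names used in this message, with `None` meaning "declare no coding system".

```python
from enum import Enum
from typing import Optional


class SourceEncoding(Enum):
    """Hypothetical classification of a source file's coding system."""
    ISO2022 = "iso2022"      # ISO 2022 escape sequences
    EIGHT_BIT = "eight-bit"  # existing eight-bit coding systems
    UNICODE = "unicode"      # a Unicode coding system, e.g. UTF-8


def compiled_coding_system(source_text: str,
                           encoding: SourceEncoding) -> Optional[str]:
    """Coding system to declare in the compiled file, or None for none."""
    if source_text.isascii():
        # Case 1: ASCII only, so no declaration is needed.
        return None
    if encoding in (SourceEncoding.ISO2022, SourceEncoding.EIGHT_BIT):
        # Case 2 (eight-bit files treated as a subset): keep charset
        # information and stay compatible with old Mule.
        return "escape-quoted"
    # Case 3: Unicode source. Old Mule should fail visibly, and characters
    # without a Mule mapping survive compilation.
    return "utf-8"
```

Checking for pure ASCII first means that a file saved in, say, UTF-8 but containing only ASCII still gets no coding declaration, which matches the first case.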
[1] http://list-archive.xemacs.org/xemacs-patches/200506/msg00040.html
--
“I, alone, perhaps, in this city of nearly two million, view it with sadness,
sympathy, and respect, seeing in the millions of Russian youngsters who laid
down their lives in that war a tragedy rising above all the political
emotions of that time [...]” -- George Kennan on Vienna's „Erbsenkönig“