unicode-internal status
Ben Wing
benwing666 at gmail.com
Tue Nov 15 20:40:56 EST 2005
On 11/15/05, Aidan Kehoe <kehoea at parhasard.net> wrote:
>
>
> On the fourteenth day of November, Ben Wing wrote:
>
> > it now builds and runs when compiled without UNICODE_INTERNAL; this is a
> > sign that the new char tables work, the new char-conversion system
> > basically works, etc.
> >
> > with UNICODE_INTERNAL i've encountered the dreaded
> > "iso2022-preservation" problem -- i.e. byte-compiling an iso2022-encoded
> > lisp file is unlikely to work without further hacking. converting them
> > to utf-8 is not an option since in many cases the lisp code depends on
> > having the proper charset encoded in the actual chars in the file, at
> > least in an old-Mule world. so i'm [a] trying to remove this
> > assumption,
>
> It's not _that_ widespread, since non-Mule XEmacs wouldn't honour it, as I
> understand it.
btw "remove this assumption" means using calls to (make-char ...) instead of
literal characters.
> > [b] putting in hacks to make byte-compilation iso2022-preserving, using
> > private unicode chars.
>
> There also arises the problem of what to do when you have Unicode-encoded
> source code with characters that don't have a mapping in Mule. For example,
> I'm studying a language of the former Soviet Union which uses more Cyrillic
> characters than are available in ISO 8859-5, and my local UTF-code-points
> preserving code[1] will not allow my hacked input method to be reliably
> byte-compiled. The byte-compiled code uses escape-quoted as its coding
> system, which, although it is in general a universal coding system for Mule,
> breaks down if you have just-in-time allocation like the implementation
> linked below does, as different invocations will have different internal
> Unicode code point mappings.
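
to make the just-in-time problem concrete, here is a rough python model. it
is not xemacs code: the JitAllocator class, the slot numbering and the
run_session helper are all invented for illustration, and it assumes the
allocator simply hands out the next free internal slot in first-seen order.
the two code points are Cyrillic letters outside ISO 8859-5.

```python
# Hypothetical model of just-in-time allocation; none of these names
# come from XEmacs. Assumption: an unmapped Unicode code point gets the
# next free internal slot the first time a session encounters it.

class JitAllocator:
    def __init__(self):
        self.slots = {}  # Unicode code point -> internal slot number

    def slot_for(self, code_point):
        # Allocate on first sight, otherwise reuse the earlier slot.
        if code_point not in self.slots:
            self.slots[code_point] = len(self.slots)
        return self.slots[code_point]


def run_session(code_points_in_order_seen):
    """Simulate one invocation that meets the code points in this order."""
    allocator = JitAllocator()
    for code_point in code_points_in_order_seen:
        allocator.slot_for(code_point)
    return dict(allocator.slots)


BARRED_O = 0x04E9    # CYRILLIC SMALL LETTER BARRED O
STRAIGHT_U = 0x04AF  # CYRILLIC SMALL LETTER STRAIGHT U

# The compiling session and the loading session see the characters in a
# different order, so the same slot number names different characters.
compiling = run_session([BARRED_O, STRAIGHT_U])
loading = run_session([STRAIGHT_U, BARRED_O])
```

if the compiled file records internal slots rather than the Unicode code
points themselves, a slot written by the compiling session can be read back
as a different character by the loading session.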
>
> I suppose there are three cases we care about --
>
> -- The source code sticks to ASCII. In this case, don't specify a coding
> system in the compiled file, and we don't have a problem.
>
> -- The source code uses ISO 2022 escape sequences. In this case, we should
> specify escape-quoted as the coding system in the compiled file, because ISO
> 2022 encodes more characters than Unicode does, because we want the
> assumptions about character sets mentioned by you above to be preserved, and
> because it's more compatible with old implementations of Mule.
>
> I consider files with the existing eight-bit coding systems a subset of
> this case, because that behaviour is more compatible with old Mule.
>
> -- The source code uses a Unicode coding system. In this case, specify utf-8
> as the coding system in the compiled file. This will be incompatible with
> older implementations of Mule; but hopefully with the advantage that they
> will choke obviously, and that they will preserve non-Mule characters stored
> in the source file.
>
> [1] http://list-archive.xemacs.org/xemacs-patches/200506/msg00040.html
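
the three cases quoted above boil down to a small decision function. this is
only a python sketch of that proposal, not what the byte-compiler does:
elc_coding_system and the UNICODE_CODING_SYSTEMS set are hypothetical names,
and treating "no byte >= 0x80 and no ESC" as "sticks to ASCII" is an
assumption of the sketch (7-bit ISO 2022 files are all-ASCII bytes apart from
their escape sequences).

```python
# Sketch of the three-case proposal; all names here are illustrative.
UNICODE_CODING_SYSTEMS = {"utf-8", "utf-16", "ucs-4"}  # assumed set

ESC = b"\x1b"  # introduces ISO 2022 escape sequences


def elc_coding_system(source_bytes, source_coding):
    """Pick the coding system to declare in the compiled file.

    Returns None when no coding system needs to be specified.
    """
    # Case 1: plain ASCII source, nothing to declare.
    if source_bytes.isascii() and ESC not in source_bytes:
        return None
    # Case 3: Unicode source keeps its code points by compiling to utf-8.
    if source_coding in UNICODE_CODING_SYSTEMS:
        return "utf-8"
    # Case 2: ISO 2022 source, and the eight-bit coding systems treated
    # as a subset of it, stay with escape-quoted.
    return "escape-quoted"
```

for example, a file holding only `(setq x 1)` needs no declaration whatever
its nominal coding system, while a latin-1 file with a single 0xE9 byte falls
into the escape-quoted case.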
so i've also created some unicode charsets for old-mule; these cover
everything up through 0x2FFFF and should take care of your problem to some
extent. (i rearranged the handling of leading bytes so any private leading
byte can be for charsets of any size, so fewer issues with running out of
charset space.) if i assign appropriate ISO2022 final bytes, or implement
the ISO2022 extension mechanism, these can be reliably stored in
byte-compiled code. (but this problem only arises if the source .el file is
in utf-8, *and* we use escape-quoted for the .elc file, *and* you have
literal characters embedded in your .el file. are these all true?) the use
of the extension mechanism is more standard and less hackish but with the
former method, current 21.5 versions of xemacs can at least load the files
without puking, since they create unknown iso charsets on the fly. probably
we don't really care about such compatibility in any case.