About making elisp files UTF-8

Saturday, 11 December 2004

Stephen, you mentioned this as one of the 'must have for 22.0' items. The
idea is that a non-Mule build should be able to compile Mule files.

I agree this would be good, and we have all the mechanism in place to do
UTF-8 conversion.

There are a few problems, though --

[1] What do we do about translating the text into the internal format when
we read it in? We have the problem of which charset to choose. [This would
of course go away if we just went whole hog to UTF-8 internally, but we'd
still have the (potential) issue of not distinguishing the CJK characters,
and we might need extents, and maybe language tags, to keep that
information. Stephen, please comment on this as well.]

If we don't go whole hog to internal UTF-8, then what? Do we just stick
language tags in? If so, to make sure things work in the
non-Mule-compiles-Mule case, we may well have to insert a language tag at
*every* transition from ASCII to non-ASCII.
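To make the cost of the "tag at every transition" option concrete, here is a minimal C sketch. It assumes the tags are the Unicode Plane 14 tag characters (U+E0001 LANGUAGE TAG followed by U+E0000 plus each ASCII letter of the language code); that choice of tag format, and the names utf8_encode, make_lang_tag and tag_transitions, are illustrative assumptions, not existing XEmacs code. It copies a UTF-8 buffer and inserts the tag before each run of non-ASCII bytes.

```c
#include <stddef.h>
#include <string.h>

/* Encode one code point (assumed <= U+10FFFF) as UTF-8; returns bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Assumed tag format (hypothetical choice): U+E0001 LANGUAGE TAG followed
   by U+E0000 + each ASCII character of the language code. */
static size_t make_lang_tag(const char *lang, unsigned char *out)
{
    size_t n = utf8_encode(0xE0001UL, out);
    for (; *lang; lang++)
        n += utf8_encode(0xE0000UL + (unsigned char)*lang, out + n);
    return n;
}

/* Copy 'len' bytes of UTF-8 text from 'in' to 'out', inserting the language
   tag before every ASCII -> non-ASCII transition.  Continuation bytes are
   also >= 0x80, so one multi-byte character never triggers a second tag.
   Returns the output length, or 0 if 'out' is too small or 'lang' too long. */
static size_t tag_transitions(const unsigned char *in, size_t len,
                              const char *lang,
                              unsigned char *out, size_t outsz)
{
    unsigned char tag[64];          /* 4 bytes for U+E0001 + 4 per letter */
    size_t taglen, o = 0;
    int prev_ascii = 1;

    if (strlen(lang) > 15)
        return 0;
    taglen = make_lang_tag(lang, tag);

    for (size_t i = 0; i < len; i++) {
        int ascii = in[i] < 0x80;
        if (!ascii && prev_ascii) {     /* start of a non-ASCII run */
            if (o + taglen > outsz)
                return 0;
            memcpy(out + o, tag, taglen);
            o += taglen;
        }
        if (o + 1 > outsz)
            return 0;
        out[o++] = in[i];
        prev_ascii = ascii;
    }
    return o;
}
```

With a two-letter code such as "ja", each tag costs 12 bytes, so text that switches often between ASCII and CJK grows noticeably, and every reader of the file, including a non-Mule one, has to understand the tags.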

[2] What about character objects? Strings are no problem, but a character
object is written in a .elc file as a ? followed by a series of bytes that
resolves to a single Mule character. To handle this we'd have to [a] add a
UTF-8 decoder to the non-Mule build [not a big deal]; and [b] extend
characters in non-Mule to be big enough to hold a whole Mule character.
Then we still have to deal with whatever solution we come up with for [1]
[e.g. if we use language tags, then (i) the non-Mule UTF-8 parser needs to
recognize them and then generate them again when the character is output,
and (ii) it needs a way of encoding the particular language in the upper
bits of a character].
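Points [a] and [b](ii) can be sketched in C under stated assumptions: a character is held in a 32-bit integer, the low 21 bits carry the Unicode code point (enough for U+10FFFF), and the remaining 11 bits carry a hypothetical language id. The names utf8_decode, make_tagged_char, tagged_char_cp and tagged_char_lang are illustrative only, not existing XEmacs functions, and recognizing tags in the reader (point (i)) is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (at most len bytes).  Stores the code
   point in *cp and returns the number of bytes consumed, or 0 if the input
   is truncated or malformed (bad lead byte, bad continuation byte, overlong
   form, surrogate, or value beyond U+10FFFF). */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c;
    size_t n, i;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else return 0;

    if (len < n) return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800) ||
        (n == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
        c > 0x10FFFF)
        return 0;
    *cp = c;
    return n;
}

/* Hypothetical layout: bits 0-20 = code point, bits 21-31 = language id. */
#define CHAR_CP_BITS   21
#define CHAR_CP_MASK   ((UINT32_C(1) << CHAR_CP_BITS) - 1)
#define CHAR_LANG_MASK UINT32_C(0x7FF)   /* 11 bits left for the language */

static uint32_t make_tagged_char(uint32_t cp, uint32_t lang_id)
{
    return ((lang_id & CHAR_LANG_MASK) << CHAR_CP_BITS) | (cp & CHAR_CP_MASK);
}

static uint32_t tagged_char_cp(uint32_t ch)   { return ch & CHAR_CP_MASK; }
static uint32_t tagged_char_lang(uint32_t ch) { return ch >> CHAR_CP_BITS; }
```

Reading a `?` literal would then mean decoding the bytes after the `?` with utf8_decode and, if a language tag preceded it, folding that language into the upper bits; the writer would have to reverse both steps, which is the round-trip burden described in (i).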

Comments? It sounds like it might just be better to go ahead, switch to
UTF-8 internally, and be done with it.

Stephen [once again], what in your opinion are the big issues connected to
this, including but not limited to the CJK language-preservation issue?

Also, someone [maybe Stephen]: what's going on in FSF land w.r.t. Unicode
support in GNU Emacs? What specifically are they planning, and how far along
are they? Is anyone in communication with them to try to ensure that their
APIs look like ours, or are they just going to be incompatible AGAIN? If no
one is, shouldn't we be in communication with them?

ben
