i'm starting to make the changes needed to implement unicode-internal support.
some of the lisp primitives related to characters will need changes.
here are my ideas; please comment.
char-charset, char-octet and split-char will remain in a unicode world
but take an extra optional argument -- a charset precedence list, the
same as the one `unicode-to-char' takes now; it is ignored in a
mule-internal world.
i'm thinking of making make-char be a polymorphic function -- either it
takes a charset and octets, as currently, or it takes a unicode
codepoint and an optional charset precedence list (ignored in a
unicode-internal world), like the current `unicode-to-char' (which would
be eliminated). this is not very lispy, but it seems preferable to
`make-char' not being the obvious way to make a character.
alternatives are to leave the make-char/unicode-to-char split or to
rename unicode-to-char to make-unicode-char.
also, in a unicode-internal world, make-char can return nil if the
given charset and octets have no unicode equivalent.
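a rough model of the polymorphic dispatch described above, in python rather than lisp so it runs standalone. assumptions, none of which come from any actual implementation: it models only the unicode-internal world, where a char is just its unicode codepoint; python codec names such as "cp1252" stand in for charset names; and octets are raw byte values rather than iso2022-style values.

```python
def make_char(first, *rest):
    """Hypothetical model of a polymorphic make-char (unicode-internal world).

    make_char(charset, octet, ...)       -> char, or None if nothing maps
    make_char(codepoint [, precedence])  -> char
    """
    if isinstance(first, int):
        # unicode form: the optional precedence list is accepted but
        # ignored, because in this world the char *is* the codepoint.
        if len(rest) > 1:
            raise TypeError("expected at most one precedence list")
        if not 0 <= first <= 0x10FFFF:
            raise ValueError("not a unicode codepoint")
        return first

    # charset form: `first` names a charset (here, a python codec) and
    # `rest` holds the octets.
    try:
        text = bytes(rest).decode(first)
    except UnicodeDecodeError:
        return None  # no unicode equivalent: the nil case
    return ord(text) if len(text) == 1 else None


euro = make_char("cp1252", 0x80)      # charset + octet form
same = make_char(0x20AC, ("cp1252",))  # codepoint + precedence form
hole = make_char("cp1252", 0x81)      # unassigned in python's cp1252 table
```

dispatching on the type of the first argument is what makes this "not very lispy"; the alternative of a separate `make-unicode-char' would split the two branches into two functions.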
char-to-unicode should probably remain, but should return nil, not -1,
in a mule-internal world if no unicode equivalent exists.
int-to-char and char-to-int always convert between chars and internal
codes, same as now. in a unicode-internal world, the internal code is
simply the unicode codepoint.
internally, some concept like "leading byte" will remain but will simply
be an arbitrary charset index. more than 256 charsets can exist, and
charsets should be added for things like `windows-1252'. for charsets
like these, the octet in `make-charset' need not be in the iso2022 range
of 32-127 (or equivalently, 160-255) -- windows-1252 defines various
weird chars in the range 128-159. we should also add big5, shift-jis
and the like.
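a minimal sketch of what "an arbitrary charset index" plus a per-charset octet range might look like, again in python as a standalone model. the `Charset' fields, the `CharsetRegistry' class and the `defined_octets' helper are all hypothetical names invented for illustration; only the cp1252 table itself (taken from python's codec) is real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Charset:
    name: str
    index: int      # arbitrary id, standing in for the old "leading byte"
    dimension: int  # octets per character
    low: int        # lowest legal octet
    high: int       # highest legal octet


class CharsetRegistry:
    """Hands out indexes sequentially, with no 256-charset ceiling."""

    def __init__(self):
        self._charsets = {}

    def register(self, name, dimension, low, high):
        if name in self._charsets:
            raise ValueError(f"charset {name!r} already registered")
        if not 0 <= low <= high <= 255:
            raise ValueError("octet range must lie within 0-255")
        cs = Charset(name, len(self._charsets), dimension, low, high)
        self._charsets[name] = cs
        return cs

    def valid_octet(self, name, octet):
        cs = self._charsets[name]
        return cs.low <= octet <= cs.high


def defined_octets(codec, low, high):
    """Octets in [low, high] that the given python codec maps to a character."""
    found = []
    for octet in range(low, high + 1):
        try:
            bytes([octet]).decode(codec)
        except UnicodeDecodeError:
            continue
        found.append(octet)
    return found


registry = CharsetRegistry()
for n in range(300):                      # well past 256 charsets
    registry.register(f"dummy-{n}", 1, 32, 127)
win = registry.register("windows-1252", 1, 128, 255)
c1_range = defined_octets("cp1252", 128, 159)
```

here `win.index' is 300, and `c1_range' holds the octets in 128-159 that python's cp1252 table assigns; that table leaves five octets in the range unmapped, so a charset definition would need some way to represent holes as well as a range.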
there should also be functions to convert directly between unicode
codepoints and charset codepoints, without the need to go through a
char. maybe `charset-codepoint-to-unicode' and
`unicode-to-charset-codepoint' (returning a list, like `split-char').
ideas for better names? do i have the terminology correct?
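to make the proposal concrete, here is a small python model of the two direct conversion functions. it is only a sketch under stated assumptions: python codecs stand in for charsets, octets are raw byte values, and the names are just the tentative ones above with underscores. no char object is created at any point.

```python
def charset_codepoint_to_unicode(charset, *octets):
    """Unicode codepoint for the given charset codepoint, or None."""
    try:
        text = bytes(octets).decode(charset)
    except UnicodeDecodeError:
        return None
    return ord(text) if len(text) == 1 else None


def unicode_to_charset_codepoint(code, precedence):
    """[charset, octet, ...] from the first charset in `precedence`
    that can represent `code`, or None if none of them can."""
    for charset in precedence:
        try:
            octets = chr(code).encode(charset)
        except UnicodeEncodeError:
            continue  # this charset has no such character; try the next
        return [charset, *octets]
    return None


# the precedence list decides which charset wins when several qualify
a = unicode_to_charset_codepoint(0x20AC, ("latin-1", "cp1252"))
b = unicode_to_charset_codepoint(0x20AC, ("iso8859-15", "cp1252"))

# a two-octet charset gives a three-element list, like split-char
c = unicode_to_charset_codepoint(0x4E2D, ("latin-1", "big5"))
back = charset_codepoint_to_unicode(*c)
```

in this model `a' comes back as cp1252 and `b' as iso8859-15 for the same codepoint, which is the whole point of the precedence list, and `back' is 0x4E2D again.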
ben