>>>> "Ben" == Ben Wing <ben(a)666.com>
writes:
Ben> it's the other way around. the current situation is
Ben> incompatible with the X11 fonts, so we have to hack the
Ben> values using the bogus `graphic' characteristic.
Nothing bogus about it, it's perfectly (ISO 2022) standard. The X
fonts' use of character codes as font indexes where they fit into the
appropriate space is perfectly reasonable (although I've never
actually looked closely at an ISO-8859 font, thus the confusion on my
part, mea culpa), but is not of any particular interest here. Note
that multibyte fonts (like Japanese) do assume font indices in the
32-127 range (usually 33-126, actually).
Ben> note that in the new world, charsets can have values > 127 in
Ben> any case. cf. big5, shift-jis, etc.
Please stop abusing the word "charset" for an object that is only
well-defined in a workspace I have no access to. It's confusing you,
too, it would seem.
Ben> so when i'm creating a new charset like `latin-windows-1252',
Ben> which is compatible with iso-8859-1 but has extra chars in
Ben> the range 128-159, do i do the right thing and have its chars
Ben> in the range 128-255 be indexed as 128-255 (and hence be
Ben> inconsistent with the `latin-iso8859-1' charset), or do i do
Ben> the wrong thing and move its range down to 0-127? and then
Ben> it appears to have ascii control chars in the range 0-31, but
Ben> they aren't control chars, value 10 is not linefeed, value 13
Ben> is not cr, etc.?
You're thinking in terms of Mule charsets. Don't, it's no help.
Those values should _never_ _ever_ appear in a context where they
could be confused with characters.
We don't need named coded character sets internally, and we don't need
to associate random octets with charsets to make characters. (Except for
backward compatibility, where backward is spelled P E R V E R S E.)
We only need subsets of Unicode. Abstractly, characters from internal
text (LISP characters, strings, and buffers) should only ever be
mapped to their Unicode values, and then from Unicode to external
coding systems for I/O.
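
To make that concrete, here is a minimal sketch of the "Unicode in the
middle" pipeline, using Python's codecs purely as an illustration. The
helper name `transcode` is made up for this example and is not anything
in XEmacs. It also shows why the windows-1252 question goes away: the
octet value never needs to be a character index internally.

```python
# Sketch only: Unicode is the sole internal representation.
# `transcode` is a hypothetical helper, not an XEmacs function.

def transcode(octets: bytes, source: str, target: str) -> bytes:
    """Decode external octets to Unicode, then encode for output."""
    text = octets.decode(source)   # external coding system -> Unicode
    return text.encode(target)     # Unicode -> external coding system

# Octet 0x80 is EURO SIGN in windows-1252; internally it is just U+20AC.
euro = b"\x80".decode("cp1252")

# Octet 0xE9 decodes to the same Unicode character from either coding
# system, so there is nothing to keep "consistent" between two charsets.
e_acute_1252 = b"\xe9".decode("cp1252")
e_acute_8859 = b"\xe9".decode("iso-8859-1")

# I/O to another coding system goes back out through Unicode.
euro_utf8 = transcode(b"\x80", "cp1252", "utf-8")
```

Going the other way, encoding U+20AC as iso-8859-1 simply fails, which
is the right answer: that character is not in that coding system.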
If you're worrying about the practical problems of mapping Unicode
characters to font indices, please don't bother. It's a practical
problem, yes, but you aren't going to enforce sanity on fonts by
perpetuating the charset bogosity. For now, _any old hack_ will do,
just get glyphs on the screen for the fonts you use. As long as the
API looks like a table and handles two-byte indices, we can
generalize and optimize the internals for space later, if we even need
to.
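
As a rough illustration of "looks like a table and handles two-byte
indices", something along these lines would do. This is a sketch under
assumptions, not a proposed implementation: `FontIndexTable` and
`jisx0208_index` are hypothetical names, the backing store is a plain
dict, and the JIS X 0208 index is derived from Python's EUC-JP codec
only because that is a convenient way to get a known-good mapping.

```python
# Sketch only: a font index table is "Unicode code point -> font index".
# FontIndexTable and jisx0208_index are hypothetical names.
from typing import Dict, Optional


class FontIndexTable:
    def __init__(self, name: str) -> None:
        self.name = name
        self._map: Dict[int, int] = {}   # any old hack; optimize later

    def set(self, codepoint: int, index: int) -> None:
        if not 0 <= index <= 0xFFFF:
            raise ValueError("font index must fit in two bytes")
        self._map[codepoint] = index

    def lookup(self, char: str) -> Optional[int]:
        """Return the font index for a character, or None if unmapped."""
        return self._map.get(ord(char))


def jisx0208_index(char: str) -> int:
    """Two-byte JIS X 0208 index, each byte in the 0x21-0x7E range.

    EUC-JP encodes a JIS X 0208 character as those two bytes with the
    high bit set, so clearing the high bit recovers the index.
    """
    encoded = char.encode("euc_jp")
    if len(encoded) != 2 or encoded[0] == 0x8E:
        raise ValueError("not a JIS X 0208 character")
    hi, lo = encoded
    return ((hi & 0x7F) << 8) | (lo & 0x7F)


table = FontIndexTable("jisx0208")
table.set(ord("あ"), jisx0208_index("あ"))
```

Callers only ever see `lookup`, so the dict can be replaced by
something more compact without touching them.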
I agree, we'll need named font index tables (a la Cmaps). We've
already got the tables in etc/unicode. Give them their Unicode names,
provide an aliasing mechanism, add an xemacs vendor directory, and put
anything we need that we don't already have in there.
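
The naming and aliasing part could be as simple as the following
sketch. Everything here is an assumption for illustration:
`TableRegistry` is a hypothetical name, the table names are
placeholders rather than the actual file names in etc/unicode, and the
tables are plain dicts from Unicode code point to font index.

```python
# Sketch only: named font index tables plus an alias mechanism.
# TableRegistry and the table names below are hypothetical.
from typing import Dict


class TableRegistry:
    def __init__(self) -> None:
        self._tables: Dict[str, Dict[int, int]] = {}
        self._aliases: Dict[str, str] = {}

    def canonical(self, name: str) -> str:
        """Resolve an alias to its canonical name (case-insensitive)."""
        key = name.lower()
        return self._aliases.get(key, key)

    def register(self, name: str, table: Dict[int, int]) -> None:
        self._tables[name.lower()] = table

    def alias(self, alias: str, name: str) -> None:
        target = self.canonical(name)
        if target not in self._tables:
            raise KeyError(f"no such table: {name}")
        self._aliases[alias.lower()] = target

    def get(self, name: str) -> Dict[int, int]:
        return self._tables[self.canonical(name)]


registry = TableRegistry()
# ISO 8859-1 code positions coincide with Unicode code points, so the
# upper half of the table is the identity mapping.
registry.register("ISO-8859-1", {cp: cp for cp in range(0xA0, 0x100)})
# The old Mule charset name becomes an alias, nothing more.
registry.alias("latin-iso8859-1", "ISO-8859-1")
```

With this shape, a legacy name keeps working for backward
compatibility without becoming a separate object with its own ideas
about index ranges.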
Footnotes:
[1] If you read this as an anti-Microsoft rant, you're missing the
point.
--
School of Systems and Information Engineering
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.