Stephen J. Turnbull wrote:
>>>>> "Ben" == Ben Wing <ben(a)xemacs.org>
Ben> well, there are some fonts that have ...-devanagari-10646-... in them,
Ben> which are presumably unicode.
Yeah, I know, but I can't get them to work (haven't tried in a while,
though, because I'm using Xft almost all the time now). AFAICT the
standard X11R6 functions didn't handle them at all, and the XFree86
extensions wanted UTF-8 and did something magic internally.
theoretically it should just use straight UCS-2, no?
Basically I think the way to go is fontconfig/Xft.
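For reference, the two conventions contrasted here differ only in byte layout. Core Xlib draws 16-bit fonts, such as ISO10646-1 ones, through XChar2b pairs (high byte in byte1, low byte in byte2) passed to XDrawString16. XFree86's Xutf8DrawString takes UTF-8 instead. The Python sketch below only computes both byte sequences for one character. It makes no X calls, and the helper names are made up for illustration.

```python
from __future__ import annotations


def to_xchar2b(ch: str) -> tuple[int, int]:
    """Split a BMP character into (byte1, byte2), high byte first, as XChar2b stores it."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return (cp >> 8, cp & 0xFF)


def to_utf8(ch: str) -> bytes:
    """The same character as the UTF-8 byte sequence an Xutf8* call would take."""
    return ch.encode("utf-8")


ka = "\u0915"  # DEVANAGARI LETTER KA
print(to_xchar2b(ka))     # (9, 21)
print(to_utf8(ka).hex())  # e0a495
```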
Ben> i need some info on how X does it before thinking about
Ben> doing it.
Xft simply does Unicode, period, AFAICS. (There may be an exception
for 8-bit encodings, but I hope that the 8-bit functions are
restricted to ISO-8859-1---aka Plane 0, Row 00 of Unicode---as they
should be.) I'm not sure how they go about using fonts indexed by
other registries, although I know that Xft can render traditional
bitmap fonts (BDF, PCF at least). Xft and X Unicode stuff is terribly
documented as far as I can tell, and where it is documented (e.g.,
Markus Kuhn's standard for Unicode X keysyms) the implementations
if they use other registries then they have to have some conversion
between unicode and national charsets (not esp. hard).
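Both points are easy to sanity-check with Python's built-in codecs. ISO-8859-1 byte values coincide with Unicode code points U+0000..U+00FF, and converting between Unicode and a national charset is a table lookup in each direction. This is only an illustration, with Python's latin-1 and euc_jp codecs standing in for whatever tables a font backend would use. It says nothing about Xft's internals, and the function names are hypothetical.

```python
def latin1_is_row_00() -> bool:
    """Every ISO-8859-1 byte decodes to the Unicode code point with the same number."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


def roundtrip(text: str, charset: str) -> str:
    """Convert Unicode text to a national charset and back again.

    Raises UnicodeEncodeError if the charset has no code for some character.
    """
    return text.encode(charset).decode(charset)


print(latin1_is_row_00())                          # True
print(roundtrip("\u65e5\u672c\u8a9e", "euc_jp"))   # 日本語
```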
Ben> 1. the goal is: redisplay gets a stretch of characters and
Ben> the combined faces for each (if you want, just assume you've
Ben> been given a stretch entirely covered by the same combined
Ben> face) and has to figure out the appropriate font for each one
Ben> and the width of each character. (currently, it optimizes by
Ben> using the charset and assuming one font per charset; that
Ben> allows it to make one width check per stretch of characters
Ben> in the same charset. but this has to go, and i don't think
Ben> anything equivalent e.g. over unicode subranges would be
Ben> sound, as some fonts support only parts of certain subranges
Ben> so we'd have to use another font in some cases.)
I think that, in general, language segments are going to be large (99%
of the time covering the whole buffer). So you break up the region to
be displayed into segments by language, first. Then pass (segment,
language) pairs to the face-merging machinery, which returns a list of
(subsegment, language, face) triples. Then you pass those to the
rendering engine, which chooses the font, and possibly other
characteristics, according to language. For example, in Japanese
italic is rarely used; underlines and boldface are preferred. So it
should be possible for an "emphasis" face to map to an italic font
with no underlining for English, but to mincho (the same as the
default face) with underlining for Japanese, while "strong emphasis"
maps to bold with no underlining for both languages.
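A minimal sketch of the last stage of that pipeline, assuming face merging has already produced (subsegment, language, face) triples. A table keyed on (face, language) realizes each abstract face differently per language, following the English/Japanese "emphasis" example. The table, the Rendering type, and the font names are hypothetical placeholders, not an XEmacs interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendering:
    font: str        # placeholder font name
    underline: bool


# Hypothetical per-language realizations of abstract faces.
REALIZATIONS = {
    ("default", "english"): Rendering("serif", False),
    ("emphasis", "english"): Rendering("serif-italic", False),
    ("strong-emphasis", "english"): Rendering("serif-bold", False),
    ("default", "japanese"): Rendering("mincho", False),
    ("emphasis", "japanese"): Rendering("mincho", True),
    ("strong-emphasis", "japanese"): Rendering("mincho-bold", False),
}


def render(triples):
    """Map (subsegment, language, face) triples to concrete renderings.

    Raises KeyError for a (face, language) pair the table does not cover.
    """
    return [(text, REALIZATIONS[(face, lang)]) for text, lang, face in triples]


triples = [
    ("Hello, ", "english", "default"),
    ("really", "english", "emphasis"),
    ("\u672c\u5f53\u306b", "japanese", "emphasis"),
]
for text, rendering in render(triples):
    print(text, rendering)
```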
If the first-choice font doesn't contain the glyph for some character,
you fall back, first going down the later choices for that language, then
inheriting from parents of the face (up to default), then finally
querying the system for any font with a glyph for that character (you
can do this in fontconfig).
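The fallback order above can be sketched as follows. Font repertoires, face parents, and per-language font lists are hypothetical in-memory tables. In a real backend the final "any font on the system" step would be a fontconfig query (for example, FcFontSort on a pattern followed by FcCharSetHasChar on each candidate's charset) rather than a scan of a dict.

```python
from typing import Dict, List, Optional, Set

# Hypothetical font repertoires, as sets of code points.
FONT_COVERAGE: Dict[str, Set[int]] = {
    "serif-italic": set(range(0x20, 0x7F)),
    "serif": set(range(0x20, 0x100)),
    "mincho": set(range(0x20, 0x7F)) | {0x3042, 0x65E5, 0x672C},
    "devanagari-sans": set(range(0x0900, 0x0980)),
}

# Hypothetical per-face, per-language font choices, most preferred first.
FACE_FONTS: Dict[str, Dict[str, List[str]]] = {
    "emphasis": {"english": ["serif-italic"]},
    "default": {"english": ["serif"], "japanese": ["mincho"]},
}
FACE_PARENT: Dict[str, Optional[str]] = {"emphasis": "default", "default": None}


def find_font(ch: str, face: str, language: str) -> Optional[str]:
    cp = ord(ch)
    # Try this face's choices for the language, then its parents up to default.
    current: Optional[str] = face
    while current is not None:
        for font in FACE_FONTS.get(current, {}).get(language, []):
            if cp in FONT_COVERAGE[font]:
                return font
        current = FACE_PARENT.get(current)
    # Last resort: any known font with a glyph for the character.
    for font, coverage in FONT_COVERAGE.items():
        if cp in coverage:
            return font
    return None  # only now would redisplay draw a box
```

For example, find_font("\u0915", "emphasis", "english") misses both English fonts and is only satisfied by the final scan, returning "devanagari-sans".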
ok. redisplay currently segments first by merged face and then by
charset. as a first pass i don't think we need to change that; the only
time this loses is when someone wants to do something weird like [a] use
a ligating script like arabic or devanagari; [b] switch faces in the
middle of a word. for the moment i'm not even worrying about ligating
at all, although dealing with this is not impossibly hard (basically,
redisplay has to calculate widths based on a segment, not
glyph-by-glyph; it already passes these segments on to the
display-specific code, which draws them a segment at a time, which more
or less does correct ligating behavior, at least under windows).
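For comparison, the current two-level segmentation (merged face, then charset) amounts to grouping consecutive characters on a (face, charset) key, so that each run needs only one font lookup and one width measurement. In this sketch charset_of is a hypothetical, deliberately crude stand-in for the real charset lookup; only the Mule charset names are borrowed.

```python
from itertools import groupby


def charset_of(ch: str) -> str:
    """Crude, hypothetical stand-in for a Mule-style charset lookup."""
    cp = ord(ch)
    if cp < 0x80:
        return "ascii"
    if cp < 0x100:
        return "latin-iso8859-1"
    return "other"


def segment(chars_and_faces):
    """Split (char, face) pairs into runs sharing the same (face, charset)."""
    runs = []
    key = lambda cf: (cf[1], charset_of(cf[0]))
    for (face, cs), group in groupby(chars_and_faces, key=key):
        runs.append((face, cs, "".join(ch for ch, _ in group)))
    return runs


text = [(c, "default") for c in "caf"] + [("\u00e9", "default")] + [(c, "bold") for c in " ok"]
print(segment(text))
```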
mostly what i'm confused about is how exactly we map characters to
fonts, see below ...
Ben> 3. maybe you have a language->font mapping inside of a
Ben> free to choose the appropriate mapping and interface.
This is going to be hard. The reason is that fontconfig and Windows,
at least, have their own mechanisms for grouping languages. We
probably want to be careful that our mechanism can be configured to be
compatible with that.
well, we don't have to worry too much about this, i think; presumably,
`language' is an XEmacs concept that's used somehow or other to derive a
list of fonts to look through. what i don't understand is what the
proper interface onto the `face' object should be, to specify
language-specific fonts; i assume you have a better idea since you've
had to do this work under xft.
Ben> 4. while you're at it, feel free to design the appropriate
Ben> for mapping languages (or whatever) to a precedence list of charsets,
Ben> for unicode conversion (including both the mapping itself and the lisp
Ben> commands used to control this mapping).
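One possible shape for such a precedence list, sketched under the assumption that each language simply orders a set of charsets and the first charset able to encode the character wins. Python codec names stand in for Mule charsets, and PRECEDENCE and unicode_to_charset are hypothetical. The example character, U+4E2D, is in both JIS X 0208 and GB 2312, so the language decides which national charset (and hence which font) is chosen.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical language -> charset precedence lists (Python codec names).
PRECEDENCE: Dict[str, List[str]] = {
    "japanese": ["ascii", "euc_jp", "gb2312", "big5"],
    "chinese-gb": ["ascii", "gb2312", "big5", "euc_jp"],
}


def unicode_to_charset(ch: str, language: str) -> Optional[Tuple[str, bytes]]:
    """Return (charset, encoded bytes) for the first charset that can encode ch."""
    for charset in PRECEDENCE.get(language, []):
        try:
            return charset, ch.encode(charset)
        except UnicodeEncodeError:
            continue
    return None


han = "\u4e2d"  # U+4E2D, present in both JIS X 0208 and GB 2312
print(unicode_to_charset(han, "japanese")[0])    # euc_jp
print(unicode_to_charset(han, "chinese-gb")[0])  # gb2312
```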
Ben> 5. assume that our goal is to find the best font given the
Ben> language and associated preferences, but that in all cases we
Ben> need to find *some* font, if it's available. displaying a
Ben> "box" or whatever is a last resort, and ideally should happen
Ben> only when there is *no* font on the system (or at least, no
Ben> font from all those available to look through, i.e. specified
Ben> as global defaults).
Take a look at the fontconfig manual (in the general-docs package of
XEmacs as texinfo). I recommend this not because it's perfect; there
are warts. However, it seems to satisfy the vast majority of users
out there, and it is the underlying mechanism for Xft and (I believe)
GTK v2. fontconfig/Xft are standard extensions in X11R6.8 at the
latest (they may already be in X11R6.6).
ok, i'll look into this.
Ben> assume also that we want this to be reasonably fast, so
Ben> will probably need to cache char-> font mappings.
I don't know if fontconfig can give us that, so we may have to do some
we can do this ourselves; a cache won't be too hard.
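A cache along those lines can be as small as memoizing the lookup on (character, face, language), since the same character may resolve to different fonts under different faces or languages. This sketch uses Python's functools.lru_cache; slow_font_lookup is a hypothetical stand-in for the expensive query.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive lookup actually runs


def slow_font_lookup(cp: int, face: str, language: str) -> str:
    """Hypothetical stand-in for an expensive fontconfig query."""
    global calls
    calls += 1
    return "mincho" if language == "japanese" else "serif"


@lru_cache(maxsize=4096)
def font_for(cp: int, face: str, language: str) -> str:
    """Memoize the (char, face, language) -> font decision."""
    return slow_font_lookup(cp, face, language)


for ch in "aaab":
    font_for(ord(ch), "default", "english")
print(calls, font_for.cache_info().hits)  # 2 2
```

Any change to face or font configuration would have to flush the cache (here, font_for.cache_clear()).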
Ben> maybe this sounds overwhelming, so don't take it as a
Actually, this is something I've been thinking about a lot. I don't
think it's really that hard, once we get rid of Mule charsets and
start thinking in terms of font (etc.) repertoires defined as Unicode
subsets. Implementation is scary, so I'm very glad you're going to be
helping a lot with that.
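If font repertoires are defined as Unicode subsets, one compact representation is a sorted list of inclusive code point ranges, with membership tested by binary search. This also handles fonts that cover only part of a Unicode block, which is why per-block shortcuts are unsound. Repertoire is a hypothetical class, not an XEmacs or fontconfig type (in fontconfig, FcCharSet plays this role).

```python
from bisect import bisect_right
from typing import List, Tuple


class Repertoire:
    """A font's coverage as sorted, non-overlapping, inclusive code point ranges."""

    def __init__(self, ranges: List[Tuple[int, int]]):
        self.ranges = sorted(ranges)
        self.starts = [lo for lo, _ in self.ranges]

    def __contains__(self, ch: str) -> bool:
        cp = ord(ch)
        # Index of the last range starting at or before cp, or -1 if none.
        i = bisect_right(self.starts, cp) - 1
        return i >= 0 and cp <= self.ranges[i][1]


# Hypothetical font covering ASCII and only part of the Devanagari block
# (U+0900..U+097F), the partial-subrange case raised earlier in the thread.
font = Repertoire([(0x20, 0x7E), (0x0900, 0x094F)])
print("a" in font, "\u0915" in font, "\u0966" in font)  # True True False
```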
well, it'll take time. i did a bunch of work on char tables recently,
but it was time-consuming ... and there's still more to go before i can
even start compiling. first pass will not display non-charset fonts
(but at least it won't eat the chars), and it will be littered with
functions named ichar_charset_obsolete_me_baby_please(). (well, there's
only one of those; but there are other functions that deserve the same
treatment, e.g. ensure_face_cachel_contains_charset().)
Now, if we could do font-lock in CCL ... but we can't.
well, most font-lock slowness is in the regexp or extent code, not in