Re: Speed difference in Gnus

Friday, 21 October 2005

        ...
>>>> "Ben" == Ben Wing <ben(a)xemacs.org> writes:
    Ben> well, there are some fonts that have ...-devanagari-10646-... in them,
    Ben> which are presumably unicode.

Yeah, I know, but I can't get them to work (haven't tried in a while,
though, because I'm using Xft almost all the time now).  AFAICT the
standard X11R6 functions didn't handle them at all, and the XFree86
extensions wanted UTF-8 and did something magic internally.

Basically I think the way to go is fontconfig/Xft.

    Ben> i need some info on how X does it before thinking about
    Ben> doing it.

Xft simply does Unicode, period, AFAICS.  (There may be an exception
for 8 bit encodings, but I hope that the 8 bit functions are
restricted to ISO-8859-1---aka Plane 0, Row 00 of Unicode---as they
should be.)  I'm not sure how they go about using fonts indexed by
other registries, although I know that Xft can render traditional
bitmap fonts (BDF, PCF at least).  Xft and X Unicode stuff is terribly
documented as far as I can tell, and where it is documented (eg,
Markus Kuhn's standard for Unicode X keysyms) the implementations
don't conform.

    Ben> 1. the goal is: redisplay gets a stretch of characters and
    Ben> the combined faces for each (if you want, just assume you've
    Ben> been given a stretch entirely covered by the same combined
    Ben> face) and has to figure out the appropriate font for each one
    Ben> and the width of each character. (currently, it optimizes by
    Ben> using the charset and assuming one font per charset; that
    Ben> allows it to make one width check per stretch of characters
    Ben> in the same charset.  but this has to go, and i don't think
    Ben> anything equivalent e.g. over unicode subranges would be
    Ben> sound, as some fonts support only parts of certain subranges
    Ben> so we'd have to use another font in some cases.)

I think that in general language segments are going to be large (99%
of the time covering the whole buffer).  So you break up the region to
be displayed into segments by language, first.  Then pass (segment,
language) pairs to the face-merging machinery, which returns a list of
(subsegment, language, face) triples.  Then you pass those to the
rendering engine, which chooses font, and possibly other
characteristics, according to language.  For example, in Japanese
italic is rarely used; underlines and boldface are preferred.  So it
should be possible for an "emphasis" face to map to italic fonts
and no underlining for English, while in Japanese it maps to mincho
(same as the default face) and underlining, while "strong emphasis"
maps to bold and no underline for both languages.
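
To make the pipeline above concrete, here is a minimal sketch in Python
(not XEmacs code).  It assumes per-character lists of language tags and
face names are already available; the FACE_STYLE table, the style keys,
and the function names are all hypothetical, chosen only to mirror the
"emphasis" example.

```python
from itertools import groupby

# Hypothetical table: how an abstract face is realized for each language.
FACE_STYLE = {
    ("emphasis", "en"): {"family": "serif", "slant": "italic",
                         "weight": "medium", "underline": False},
    ("emphasis", "ja"): {"family": "mincho", "slant": "roman",
                         "weight": "medium", "underline": True},
    ("strong-emphasis", "en"): {"family": "serif", "slant": "roman",
                                "weight": "bold", "underline": False},
    ("strong-emphasis", "ja"): {"family": "mincho", "slant": "roman",
                                "weight": "bold", "underline": False},
}


def split_runs(values):
    """Yield (start, end, value) for each maximal run of equal values."""
    pos = 0
    for value, group in groupby(values):
        length = len(list(group))
        yield pos, pos + length, value
        pos += length


def render_plan(languages, faces):
    """Split by language first, then by face within each language segment.

    Returns a list of (start, end, language, face, style) tuples, where
    style is what the rendering engine would use to pick a font.
    """
    plan = []
    for seg_start, seg_end, lang in split_runs(languages):
        for sub_start, sub_end, face in split_runs(faces[seg_start:seg_end]):
            style = FACE_STYLE[(face, lang)]
            plan.append((seg_start + sub_start, seg_start + sub_end,
                         lang, face, style))
    return plan
```

Since language segments are usually long, the outer loop typically runs
once or twice per buffer, and the per-language lookup happens once per
subsegment rather than once per character.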

If the first choice font doesn't contain the glyph for some character,
you fall back, first going down later choices for that language, then
inheriting from parents of the face (up to default), then finally
querying the system for any font with a glyph for that character (you
can do this in fontconfig).
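
The fallback order just described could be sketched as follows.  This
is only an illustration under assumed data structures: the `faces`
dict, the `coverage` dict of glyph sets, and the `system_fonts` list
are hypothetical stand-ins, and in real code the last step would be a
query to fontconfig rather than a loop over a Python list.

```python
def choose_font(char, language, face, faces, coverage, system_fonts):
    """Pick a font for char, following the fallback order in the text.

    faces:        face name -> {"parent": face name or None,
                                "fonts": {language: [font names in order]}}
    coverage:     font name -> set of characters the font has glyphs for
    system_fonts: every font name known to the system (hypothetical
                  stand-in for a system-wide font query)
    """
    current = face
    while current is not None:
        # Steps 1 and 2: go down this face's choices for the language.
        for font in faces[current]["fonts"].get(language, []):
            if char in coverage.get(font, ()):
                return font
        # Step 3: inherit from the parent face, up to the default face.
        current = faces[current]["parent"]
    # Step 4: any font on the system that has the glyph.
    for font in system_fonts:
        if char in coverage.get(font, ()):
            return font
    return None  # nothing has the glyph: the "box" case
```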

    Ben> 2. you also have available the buffer that the text came from

It's not clear to me that that's really useful, except at the very top
level.  As I say, I believe that the vast majority of the time you'll
have only a few language segments.  I don't think what I propose would
be inefficient for a document with many intralinear glosses, even.

    Ben> maybe better renamed to a `culture' object

That's not really right, as you point out.  We may eventually want
"culture" objects, but they should affect far more than redisplay.
How about "dialect" as something finer than language and a component
of culture?

    Ben> 3. maybe you have a language->font mapping inside of a face.  feel
    Ben> free to choose the appropriate mapping and interface.

This is going to be hard.  The reason is that fontconfig and Windows,
at least, have their own mechanisms for grouping languages.  We
probably want to be careful that our mechanism can be configured to be
compatible with that.

    Ben> 4. while you're at it, feel free to design the appropriate
    Ben> interface for mapping languages (or whatever) to a precedence
    Ben> list of charsets, for unicode conversion (including both the
    Ben> mapping itself and the lisp commands used to control this
    Ben> mapping).

    Ben> 5. assume that our goal is to find the best font given the
    Ben> language and associated preferences, but that in all cases we
    Ben> need to find *some* font, if it's available.  displaying a
    Ben> "box" or whatever is a last resort, and ideally should happen
    Ben> only when there is *no* font on the system (or at least, no
    Ben> font from all those available to look through, i.e. specified
    Ben> as global defaults).

Take a look at the fontconfig manual (in the general-docs package of
XEmacs as texinfo).  I recommend this not because it's perfect; there
are warts.  However, it seems to satisfy the vast majority of users
out there and it is the underlying mechanism for Xft and (I believe)
GTK v2.  fontconfig/Xft are standard extensions as of X11R6.8 at the
latest (they may be in X11R6.6 too).

    Ben> assume also that we want this to be reasonably fast, so we
    Ben> will probably need to cache char-> font mappings.

I don't know if fontconfig can give us that, so we may have to do some
redundant work.
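
If we do end up caching on our side, the cache itself is simple.  A
minimal sketch, assuming some slow `lookup(char, language, face)`
function exists; the FontCache class and its method names are
hypothetical:

```python
class FontCache:
    """Memoize (char, language, face) -> font lookups."""

    def __init__(self, lookup):
        self._lookup = lookup   # the slow lookup function (assumed given)
        self._cache = {}
        self.misses = 0         # how often the slow path was taken

    def font_for(self, char, language, face):
        key = (char, language, face)
        if key not in self._cache:
            self.misses += 1
            # Negative results (None) are cached too, so a missing
            # glyph is not searched for again on every redisplay.
            self._cache[key] = self._lookup(char, language, face)
        return self._cache[key]

    def invalidate(self):
        """Drop everything, e.g. after a face or font list changes."""
        self._cache.clear()
```

The key includes language and face because, under the scheme above,
the same character can map to different fonts in different contexts.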

    Ben> maybe this sounds overwhelming, so don't take it as a command :-)

Actually, this is something I've been thinking about a lot.  I don't
think it's really that hard, once we get rid of Mule charsets and
start thinking in terms of font (etc) repertoires defined as Unicode
subsets.  Implementation is scary, so I'm very glad you're going to be
helping a lot with that.
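
By "repertoire defined as a Unicode subset" I mean something like the
following sketch, which stores a set of code points as sorted
inclusive ranges.  The Repertoire class and its methods are
hypothetical names for illustration, not an existing API.

```python
from bisect import bisect_right


class Repertoire:
    """A set of Unicode code points stored as sorted inclusive ranges."""

    def __init__(self, ranges):
        merged = []
        for first, last in sorted(ranges):
            # Merge ranges that overlap or are directly adjacent.
            if merged and first <= merged[-1][1] + 1:
                merged[-1] = (merged[-1][0], max(merged[-1][1], last))
            else:
                merged.append((first, last))
        self._ranges = merged
        self._starts = [first for first, _ in merged]

    def __contains__(self, char):
        code_point = ord(char)
        # Find the last range starting at or before this code point.
        index = bisect_right(self._starts, code_point) - 1
        return index >= 0 and code_point <= self._ranges[index][1]

    def covers(self, text):
        """True if every character of text is in the repertoire."""
        return all(char in self for char in text)
```

A font, a coded character set, or a language's expected character
inventory could each be described this way, so "does this font cover
this text" becomes a plain subset test.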

    Ben> the problem of shift-jis, big5 and koi8 just goes away with
    Ben> the new extended charsets.  i don't know about mule-ucs, but
    Ben> it's just a stopgap way of providing unicode support before
    Ben> the real support is implemented, right?

Sorry for the imprecision.  That's exactly what I meant.

    Ben> i'd need to see real evidence of significant things that
    Ben> could be done in ccl but no other reasonable way before
    Ben> considering this.

No, let's junk it (but check me on the last sentence of the next
paragraph! we don't want to lose KOI8 and so on, and we do need the
ability to define such at runtime).

CCL arithmetic is fast (because it doesn't do any of the magical LISP
stuff) and it can safely be called from redisplay (for the same
reason).  That's all you can say for it, except that it can be done at
runtime.  But `load-unicode-mapping-table' means that you can just
cons up a coded character set in a buffer, and install it directly.
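
For what it's worth, here is roughly what "cons up a coded character
set" amounts to, as a Python sketch rather than Lisp.  It assumes the
table is in the usual Unicode Consortium text format (a hex byte, a hex
code point, and an optional `#` comment per line); the function name is
hypothetical and this says nothing about the actual arguments of
`load-unicode-mapping-table`.

```python
def parse_mapping_table(text):
    """Build decode/encode tables from a Unicode-style mapping table.

    Returns (decode, encode): decode maps byte value -> character,
    encode maps character -> byte value.
    """
    decode = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue                           # byte with no mapping
        decode[int(fields[0], 16)] = chr(int(fields[1], 16))
    encode = {char: byte for byte, char in decode.items()}
    return decode, encode


# A two-line fragment of a KOI8-R table, for illustration only.
KOI8_R_FRAGMENT = """\
# byte  unicode
0xC1\t0x0430\t# CYRILLIC SMALL LETTER A
0xC2\t0x0431\t# CYRILLIC SMALL LETTER BE
"""
```

Nothing here needs a special-purpose bytecode interpreter: the table is
just data, and it can be built and installed at runtime.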

Now, if we could do font-lock in CCL ... but we can't.

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
