Stephen J. Turnbull wrote:
I'll be looking more carefully at this later, but a couple of quick
corrections and clarifications.
I see your point about priorities; it's especially clear when you put
it in terms of redisplay access vs. extent modifications as the
typical cases.
me likewise. i'll just add a few [rather long!] comments below.
>>>>>"Ben" == Ben Wing <ben(a)xemacs.org> writes:
Ben> yes, i do. once i get the new text property implementation down, then
Ben> any program that only uses text properties and does not directly use
Ben> extents or overlays, or which only uses extents/overlays covering
Ben> small sections of the buffer, will be fast.
That's useful to know; I wasn't sure.
>> Something like this? A linked list P of (buffer position, extent
This was originally an array, which is where the "finding n is O(lg
n)" came from.
>> list) pairs, with intervals defined as [P[n].pos,P[n+1].pos) for
>> each n. An extent is a pair of pointers into P defining which
>> fragments are covered by the extent plus a property list. P[n].ext is
>> the list of extents covering that interval. Then for random access,
>> finding n is O(lg n), and finding the property is O(|P[n].ext|), which
"Finding n" == "finding the n such that P[n].pos = some_position".
>> is the best you can get, I guess. (For many operations, in particular
>> redisplay, which walk P in order, finding n is O(1), right?) P itself
>> only needs to be updated when text is inserted/deleted in the buffer,
>> and when an extent is created/deleted. |P| <= |# boundary points of
>> all extents|, so that is no worse than what we currently have, and we
>> never have to update the extents themselves.
Ben> what do you mean "we never have to update the extents
Ben> themselves"?
Since the extents are pointers to *links*, and links will be
inserted/deleted only when fragments are split/coalesced, which occurs
only when an extent is "created with an endpoint not in the
list"/deleted, the pointers in the extents never need to be updated.
You make the same point. What I missed is:
Ben> the array, btw, is necessary so that we can locate the
Ben> interval surrounding a particular position in O(lg n) time. it's not
Ben> possible to do this in a pure linked list; you'd get O(n) instead.
Ben> [1] how are unicode fonts handled under X windows?
Not, as far as I can tell.
well, there are some fonts that have ...-devanagari-10646-... in them,
which are presumably unicode. maybe we create a special unicode-bcs[?]
charset, 256x256, for such fonts -- and make sure we check for the
presence of characters in a particular font before using it. the
question is, what do other modern Unicode-based applications do under
X windows for displaying Unicode chars? (think Firefox, KDE's web
browser, etc.) What about xft?
i'm asking this because the code to handle char->font mapping needs to
be totally redone (e.g. it's currently phrased in terms of char*set* to
font!) and i need some info on how X does it before thinking about
doing it. in windows it's easy because all fonts are indexed by unicode
and have some bit fields corresponding to specific Unicode subranges
indicating, more or less, which subranges they support. (OTOH i haven't
yet figured out exactly what this means -- supports at least one char in
the range? supports "more than a few"? "many"? "all"?)
almost certainly i'll need a cache of characters->fonts.
since you have a better handle on these issues than i do, maybe you
could present the general model you have in mind for font mapping?
assume something like this:
1. the goal is: redisplay gets a stretch of characters and the combined
faces for each (if you want, just assume you've been given a stretch
entirely covered by the same combined face) and has to figure out the
appropriate font for each one and the width of each character.
(currently, it optimizes by using the charset and assuming one font per
charset; that allows it to make one width check per stretch of
characters in the same charset. but this has to go, and i don't think
anything equivalent e.g. over unicode subranges would be sound, as some
fonts support only parts of certain subranges so we'd have to use
another font in some cases.)
2. you also have available the buffer that the text came from (from
which you can derive a `language' object; not yet implemented, but will
be; maybe better renamed to a `culture' object, as per .NET, since it
refers to something more than just pure language and might well want to
differentiate "simplified chinese" from "traditional chinese" as
different 'languages/cultures/whatever'; OTOH, 'culture' in .NET is
really something different, something like 'en-US' or `arb-EGY', which
is not what we're looking for). you also may have available a
'language' (or whatever) taken from an extent. in fact, you probably
just have a single character-specific `language' object and you don't
know or care where it comes from.
3. maybe you have a language->font mapping inside of a face. feel free
to choose the appropriate mapping and interface.
4. while you're at it, feel free to design the appropriate interface for
mapping languages (or whatever) to a precedence list of charsets, for
unicode conversion (including both the mapping itself and the lisp
commands used to control this mapping).
5. assume that our goal is to find the best font given the language and
associated preferences, but that in all cases we need to find *some*
font, if one is available. displaying a "box" or whatever is a last
resort, and ideally should happen only when there is *no* font on the
system (or at least, no font from all those available to look through,
i.e. specified as global defaults). assume also that we want this to be
reasonably fast, so we will probably need to cache char->font mappings.
since presumably it's much more expensive to get at a font's data than
to look through it, maybe we put in some special hackery whenever we go
looking in a font to see what other characters are available in that
font (maybe in some reasonable-sized chunk surrounding the char we're
looking for, or even throughout the whole damn font). We can store that
info in a range table or something. maybe we also have special hacks to
make sure that ascii goes fast.
6. try to do it in as system-independent a way as possible, indicating,
as necessary, how certain system-dependent things would be handled [see
comments above about windows vs. x-windows in font handling].
maybe this sounds overwhelming, so don't take it as a command :-) i'm
just interested in the model of char/language->font mapping in your
head, since you've obviously thought a lot about this and you work first
hand with it in your sjt-xft workspace, and i'm confused in a lot of
these areas.
Ben> [2] now that i look more at ccl, i see that it's utterly tied
Ben> to the old mule encoding, as it expects to look directly at
Ben> the internal encoding of the text. it would have to be
Ben> radically restructured for it to work on a
Ben> character-by-character level, not byte-by-byte. do you think
Ben> it's reasonable to just make ccl be an optional feature that
Ben> is compiled in only with old mule?
Yes. I think that CCL has four interesting applications that I know
of: shift-jis, big5, mule-ucs, and fast table-based character sets
(e.g., KOI8). The first two have been in C forever, and the latter two
are in C in the unicode support now (at least in principle).
the problem of shift-jis, big5 and koi8 just goes away with the new
extended charsets. i don't know about mule-ucs, but it's just a stopgap
way of providing unicode support before the real support is implemented,
right?
ccl could be fixed by making it work on characters not bytes, e.g.
having commands to separate a character into a charset and its octets
(although this depends on the current language, etc.). but this would
be a real undertaking and i'd need to see real evidence of significant
things that could be done in ccl but no other reasonable way before
considering this. just ripping out certain things and making it
non-mule-specific is another possibility; for the moment i'll just have
it be old-mule only.
There are also some minor hacks from Aidan and others.
It might be worth using the CCL interpreter as a general fast
arithmetic engine for small numbers, but rip out the Mule stuff.
ben