>>>> "Ben" == Ben Wing <ben(a)666.com> writes:
Ben> a lot of what you say here is confusing so i'm going to make
Ben> some general points and ask some questions.
Ben> we need a way of representing named character classes.
Which is precisely what I said, using my words to boot. You talked
about "generalized charsets".
Ben> what api for new regexps would you propose?
1. Regexp object type, a first class Lisp object with room for lots
of attributes (although it could be a vector, hash, or list rather
than a new object type).
2. There's a family of regexp-compile methods with signature
(regexp-compile REGEXP SYNTAX) which convert the string REGEXP to a
regexp object according to SYNTAX. Initially we should support GNU
Elisp syntax and PCRE syntax.
3. The search and match (aka looking-at) functions return match
objects, which are first class Lisp objects with properties. The new
functions need names that avoid collision with existing Elisp functions.
4. There is a (deprecated) global match variable, say global-match.
For backward compatibility we define
    (defun re-search-forward (regexp &optional syntax)
      (setq global-match
            (new-re-search-forward (regexp-compile regexp
                                                   (or syntax 'legacy-elisp))))
      (match-end global-match 0))
etc.
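
To make points 1-3 concrete, here is a rough sketch of how the pieces
might fit together. It is only an illustration, not a proposed
implementation: it assumes GNU Emacs with cl-lib, every my-* name is
hypothetical, and the "compile" step just wraps the string, because
actually translating PCRE is out of scope for a sketch.

```elisp
;;; Sketch only: every my-* name below is hypothetical, not an existing API.
(require 'cl-lib)

;; Point 1: a regexp object carrying the source string plus attributes.
(cl-defstruct my-regexp source syntax native)

;; Point 3: a match object holding positions recorded at match time,
;; independent of the global match data.
(cl-defstruct my-match data)

(defun my-regexp-compile (regexp syntax)
  "Convert the string REGEXP, written in SYNTAX, to a regexp object."
  (unless (eq syntax 'legacy-elisp)
    ;; A PCRE-to-native translation would go here; not attempted.
    (error "Unsupported regexp syntax: %S" syntax))
  (make-my-regexp :source regexp :syntax syntax :native regexp))

(defun my-re-search-forward (re)
  "Search forward from point for RE; return a match object or nil."
  ;; save-match-data keeps the search from clobbering global state.
  (save-match-data
    (when (re-search-forward (my-regexp-native re) nil t)
      ;; (match-data t) returns plain integers instead of markers.
      (make-my-match :data (match-data t)))))

(defun my-match-end (match n)
  "Return the end position of group N in MATCH, or nil if unmatched."
  (nth (1+ (* 2 n)) (my-match-data match)))
```

Since the match object owns its positions, two searches can be held and
compared at the same time, which the single global match data cannot do.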
> Does it matter? What are the programmatic applications for
> these things?
Ben> what do you mean?
Now that you've made it clear that you're not talking about charsets,
you're talking about what Elisp implements via "syntax tables", I'm
not confused any more.
Ben> it does matter because with the wrong implementation we
Ben> either [a] take a humongous amount of space or [b]
Ben> potentially make our regexps slower than they should be.
Yeah, sure. "Premature optimization is the root of all evil." Since
it's pretty clear that both methods will work and can be encapsulated
in the same API, pick one, get it working, and we'll deal with
optimization and/or user preference later.
Ben> as for fonts, i'm not sure what is so wrong about a
Ben> char->font mapping,
Han unification is what's wrong with it, as a representative example
of a large class of similar issues (such as ISO 8859).
Ben> the intention is that `put-char-table' can take a character
Ben> class as well as a single character, and sets a value for
Ben> that whole class. this seems quite natural to me -- usually,
Ben> you want to specify the e.g. "Traditional Arabic" font for
Ben> Arabic characters, the e.g. "MS Mincho" font for Japanese,
Ben> etc.
Natural, and wrong for multilingual documents, which are precisely the
ones where it matters. We should map language -> font, and then check
the font repertoire for the character and have fallbacks. Pretty much
as we currently do.
If you want a cache mapping characters to known good fonts, that's
another matter, but I think that's more premature optimization.
--
School of Systems and Information Engineering
http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.