Re: Emacs bidi

Saturday, 10 November 2001

        "Stephen J. Turnbull" <turnbull(a)sk.tsukuba.ac.jp> writes:

...
>>>>> "Alex" == Alex Schroeder <kensanata(a)yahoo.com> writes:

     Alex> I just checked the "On-line proceedings for m17n2000" and
     Alex> found your name on the list of speakers.  I'm currently
     Alex> starting to work on Emacs bidi.  One place where this takes
     Alex> place is the emacs-bidi mailing list.  Is there a similar
     Alex> place for XEmacs bidi work?

 xemacs-beta will do.  Nobody's picked up on it yet; we have no Arabic-
 writing users AFAIK and very few Hebrew writers.  If something gets
 started, we could have a separate list.  However, presumably bidi will
 involve surgery on Mule and redisplay.  We would probably make a CVS
 branch, with intent to merge fairly quickly.  So it would be best to
 keep the general developer community informed, and xemacs-beta is
 currently the best channel for that.

Ok, I'll just keep you XEmacs guys informed of what's going on on
emacs-bidi from time to time.

Eli Zaretskii is writing code for the redisplay which will take bidi
into account while redisplaying.  It is based on UAX#9 but uses a
character-by-character approach, just like the current Emacs redisplay:
based on the current position in the buffer, the "next" character is
fetched.  Instead of just incrementing the position, Eli does some
scanning, caching, jumping and direction reversing such that the
"next" character for redisplay is indeed the next character in visual
order (in the window), but not necessarily the next character in
logical order (in the buffer).  UAX#9 is Unicode Standard Annex #9
(a Unicode Technical Report) and is available at
http://www.unicode.org/unicode/reports/tr9/.
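
To make "next character in visual order" concrete, here is a minimal
Python sketch of the reordering step of UAX#9 (rule L2).  It is not
Eli's code: it assumes the embedding levels of one line have already
been resolved and reorders the whole line at once, whereas his
redisplay code works incrementally with scanning and caching.  The
function name visual_order is made up for this illustration.

```python
def visual_order(levels):
    """Return buffer (logical) indices in left-to-right visual order.

    `levels` holds one resolved embedding level per character of a
    single line (even = left-to-right, odd = right-to-left).  This is
    only UAX#9 rule L2: from the highest level down to the lowest odd
    level, reverse every maximal run of characters at that level or
    higher.  Resolving the levels themselves is assumed to be done.
    """
    order = list(range(len(levels)))
    odd_levels = [lvl for lvl in levels if lvl % 2 == 1]
    if not odd_levels:
        return order  # nothing is right-to-left, so nothing moves
    for level in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])  # flip this run
                i = j
            else:
                i += 1
    return order


def iter_visual(text, levels):
    """Yield the characters of `text` one by one in visual order."""
    for index in visual_order(levels):
        yield text[index]
```

With this, "".join(iter_visual(text, levels)) gives the line as it
would appear on screen from left to right; a redisplay engine would
instead ask for one "next" position at a time.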

TAKAHASHI Naoto and HANDA Kenichi are writing code to display Arabic
from Unicode and ISO-8859-6 correctly.  Those encodings use only one
character per letter, whichever glyph (initial, medial, final, or
isolated form) is needed, so the display code needs to know some basic
Arabic joining rules to get this right.  This is done using font-lock
to identify the region of
interest and compose-region to compose any number of characters into
another character.  There's a screenshot available at
http://www.m17n.org/ntakahas/arabic.png.
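
The kind of rule involved can be sketched in a few lines of Python.
This is not the font-lock/compose-region code mentioned above, just
an illustration that assumes the glyph variants can be taken from the
Arabic Presentation Forms-B block (U+FE70..U+FEFF), whose Unicode
decompositions are tagged <isolated>, <final>, <initial> and
<medial>.  It ignores ligatures such as lam-alef and treats combining
marks as breaking the joining, which real shaping must not do.  The
names build_forms and shape are made up for this example.

```python
import unicodedata

TAGS = {"<isolated>", "<final>", "<initial>", "<medial>"}


def build_forms():
    """Map each base letter to its available presentation forms."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only single-letter forms; ligatures have more parts.
        if len(parts) == 2 and parts[0] in TAGS:
            base = chr(int(parts[1], 16))
            forms.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return forms


FORMS = build_forms()


def shape(text):
    """Replace Arabic letters in logical order by contextual forms."""
    out = []
    for i, ch in enumerate(text):
        own = FORMS.get(ch)
        if own is None:
            out.append(ch)  # not a letter we know how to shape
            continue
        prev = FORMS.get(text[i - 1], {}) if i > 0 else {}
        nxt = FORMS.get(text[i + 1], {}) if i + 1 < len(text) else {}
        # A letter connects forward only if it has an initial form,
        # and backward only if it has a final form.
        joins_prev = "initial" in prev and "final" in own
        joins_next = "initial" in own and "final" in nxt
        if joins_prev and joins_next:
            key = "medial"
        elif joins_prev:
            key = "final"
        elif joins_next:
            key = "initial"
        else:
            key = "isolated"
        out.append(own.get(key, ch))
    return "".join(out)
```

For example, shape("\u0628\u0627\u0628") (beh, alef, beh) picks the
initial form of beh, the final form of alef, and then the isolated
form of beh, because alef never joins to the letter that follows it.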

I myself am just getting into this.  :) At the moment I'm writing code
to create categories usable for bidi according to UAX#9.  Currently I
use one such category for each bidi type in UAX#9.  I use a table
derived from UnicodeData.txt (available from www.unicode.org) to get
all bidi types for UCS characters, and then I use tables from Dave
Love's ucs-tables.el to use this information for all 8859 charsets.  I
assume that -- as more unification tables from UCS to the various
other charsets become available -- the bidi categories can be defined
for more and more charsets.  This code is available from the emacs-bidi
archives, http://mail.gnu.org/pipermail/emacs-bidi/.  The next thing
on the list is a function to reverse the bidi algorithm: to derive a
logical form given the visual form.  We assume that such a thing may
be valuable when treating text pasted from other applications where
the original logical ordering (and all the explicit bidi marks) got
lost.
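
As an illustration of the table lookup described above (not the Emacs
Lisp code from the archives), the same idea can be sketched in
Python.  The sketch assumes Python's unicodedata module as a stand-in
for the table derived from UnicodeData.txt, and Python's ISO 8859
codecs as a stand-in for the unification tables in ucs-tables.el.
The function name bidi_types_for_charset is made up for this example.

```python
import unicodedata


def bidi_types_for_charset(codec):
    """Map each defined byte of an 8-bit charset to a bidi type.

    The byte is first unified to its UCS character through the codec,
    then the bidi type of that character is looked up.  Bytes that
    the charset leaves undefined are skipped.
    """
    table = {}
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte is not assigned in this charset
        table[byte] = unicodedata.bidirectional(ch)
    return table


hebrew = bidi_types_for_charset("iso8859_8")
arabic = bidi_types_for_charset("iso8859_6")
```

Here hebrew[0xE0] (HEBREW LETTER ALEF) comes out as type R and
arabic[0xC7] (ARABIC LETTER ALEF) as type AL, while ASCII letters and
digits in either charset come out as L and EN.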

Alex.
-- 
http://www.emacswiki.org/
