RFC: generic support for single-byte encodings in non-Mule XEmacs

Wednesday, 23 May 2001

Salam,

[this proposal is a rethought and refined version of the "RFC:
(set-xkb-cyrillic-charset "koi8-r") for non-Mule XEmacs".  Thanks to
everyone for their comments, especially to Hrvoje for
'ascii-character, and to Martin for ideas about not bothering with C
and the need of "many more thinking", you were right :) ]

I will speak of "single-byte encodings", using mostly Cyrillic ones as
an example.

It seems the 'ascii-character property could be renamed to
'single-byte-equivalent for clarity and formal correctness, were it
not for compatibility.  (One option is to declare 'ascii-character
obsolete and consult it only if 'single-byte-equivalent is not
defined.  The renamed property should be properly documented, like
everything else discussed here, of course.)

There are three primary things that should be modified when switching
single-byte encoding: mapping of X keysyms to single-byte codes;
case-tables; syntax-tables.

1. Mapping of keysyms was the primary subject of previous discussion
and the best practice seems to be

	(put 'Keysym 'single-byte-equivalent ?\xXX)

All possible keysyms that are subject to this mapping are tracked and
bound to nil if they do not have an equivalent in the encoding that is
being switched to.  E.g., when we are using "latin-1" encoding, we
have

	(global-set-key 'ntilde 'self-insert-command)
	(put 'ntilde 'single-byte-equivalent ?\xXX)

When we're switching to "koi8-r" encoding, we have to

	(global-unset-key 'ntilde)

because it does not have a koi8-r equivalent.  (Alternatively, we could
bind it to some `no-single-byte-equivalent' function that just does

    (message "last typed character has no CURRENT-ENCODING equivalent")

instead of saying "ASCII", which is not always true.)
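
A minimal sketch of such a fallback command, only to make the idea
concrete.  The variable `single-byte-current-encoding' and the helper
`single-byte-no-equivalent-text' are hypothetical names, not existing
XEmacs API; only `no-single-byte-equivalent' is the name proposed
above.

```elisp
;; Hypothetical variable: name of the encoding currently in effect.
(defvar single-byte-current-encoding "koi8-r"
  "Name of the single-byte encoding currently in effect.")

;; Hypothetical helper, split out so the text can be checked
;; without going through the echo area.
(defun single-byte-no-equivalent-text ()
  "Return the warning text for the encoding currently in effect."
  (format "last typed character has no %s equivalent"
          single-byte-current-encoding))

(defun no-single-byte-equivalent ()
  "Tell the user that the key just typed has no single-byte equivalent."
  (interactive)
  (message "%s" (single-byte-no-equivalent-text)))
```

A keysym without an equivalent could then be bound to this command
instead of being unbound, e.g. (global-set-key [ntilde]
'no-single-byte-equivalent).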

Of course, if the user has rebound 'ntilde to something like
(insert "\~n"), it will be left alone.
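
The switching step described in item 1 might look roughly like the
sketch below.  All names (`single-byte-known-keysyms',
`single-byte-install-keysyms') are hypothetical, the keysym list is
a two-element illustration rather than the real table, and keys are
written as one-element vectors.  The codes used are 0xF1 for ntilde
in latin-1 and 0xC1 for Cyrillic_a in koi8-r.

```elisp
;; Hypothetical: every keysym that is subject to single-byte mapping.
;; A real list would cover all keysyms of all supported encodings.
(defvar single-byte-known-keysyms '(ntilde Cyrillic_a)
  "Keysyms tracked by the single-byte support (illustrative subset).")

(defun single-byte-install-keysyms (alist)
  "Make ALIST of (KEYSYM . CODE) pairs the current keysym mapping.
Tracked keysyms that are missing from ALIST lose their
`single-byte-equivalent' property and are unbound, but only if
they are still bound to `self-insert-command'."
  (dolist (keysym single-byte-known-keysyms)
    (let ((code (cdr (assq keysym alist)))
          (key (vector keysym)))
      (if code
          (progn
            (put keysym 'single-byte-equivalent code)
            ;; Bind only when nothing is there yet, so that a
            ;; binding made by the user is left alone.
            (unless (lookup-key global-map key)
              (global-set-key key 'self-insert-command)))
        (put keysym 'single-byte-equivalent nil)
        ;; Unbind only our own binding, never a user's command.
        (when (eq (lookup-key global-map key) 'self-insert-command)
          (global-unset-key key))))))
```

Switching from latin-1 to koi8-r would then be
(single-byte-install-keysyms '((Cyrillic_a . ?\xc1))), which drops
the 'ntilde mapping installed earlier by
(single-byte-install-keysyms '((ntilde . ?\xf1))).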

2. case-tables.  It's a rather minor and straightforward issue, except
that it effectively does not work in 21.1.x.  Is the change in 21.4.x
that enabled this feature small enough to be somehow backported to
21.1?  It'd be good to be able to make a patch that brings it all
together to one more release of 21.1.

3. syntax-tables (mostly for the "word constituent" syntax class).  That's
as minor and straightforward an issue as the previous one, and without
bugs.  It just needs to be carefully checked against the various major
modes that change syntax tables, so that they all live together.
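
For the "word constituent" part, a rough sketch could be as follows.
The function name is hypothetical, and the range 0xC0..0xFF (the
letters block of koi8-r) is only an example.  The codes are passed
as plain integers here, which is an assumption; where characters are
a distinct type they may need wrapping with `int-to-char'.

```elisp
;; Hypothetical helper: build a syntax table in which every code
;; from FIRST to LAST (inclusive) is a word constituent.
(defun single-byte-make-word-syntax-table (first last)
  "Return a copy of the standard syntax table with FIRST..LAST as words."
  (let ((table (copy-syntax-table (standard-syntax-table)))
        (code first))
    (while (<= code last)
      ;; "w" is the descriptor for the word-constituent class.
      (modify-syntax-entry code "w" table)
      (setq code (1+ code)))
    table))
```

Working on a copy keeps the standard table intact; making major modes
that install their own syntax tables pick up the same entries is the
part that still needs the careful checking mentioned above.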

I propose that the generic implementation of all this be put in
"single-byte.el", while descriptions of particular encodings be put in
e.g. "cyrillic.el" and "x-cyrillic.el" (different from
"mule/cyrillic.el"), à la "iso8859-1.el" and
"x-iso8859-1.el".

Previous implementations: koi8-r-only support for all of this, based on
the results of the previous discussion in xemacs-beta, was written by
"Andrew W. Nosenko" <awn(a)bcs.zp.ua> and seems to be in successful use.
There are also widely used .el files by Ilya Perminov that support the
most popular Cyrillic encodings; they also feature automatic detection
of encoding and recoding of buffers.

There are also several more issues, like choosing the correct ispell
dictionary, but they probably could be done in hooks or just discussed
later.

Any comments?  I mostly do not see any unresolved issues and plan to
start developing in a couple of days.

--alexm
