Re: Is there any more information about unicode-internal?

Monday, 28 November 2005

        Zhang Wei wrote:

...
As I learned from the xemacs-beta mail list, Ben is working on
unicode-internal support. I want to learn more about it.

What internel representation scheme shall we use? Is it the same
scheme as in GNU Emacs's emacs-unicode-2 branch (which named
utf-8-emacs)? Or we'll use a different scheme such as utf-16-like? Is
there a unicode-internal branch in the cvs repository?

 Zhang, i've attached a recent diff.  it's unlikely to build using these 
diffs, as i'm in the middle of merging the recent code changes; but 
you'll get an idea.

the internal representation is definitely not like emacs-unicode-2.  
that branch seems to use a hacked-up form of utf-8 that keeps the 
different charsets embedded in the characters.  i try to use pure utf-8 
whenever possible; i.e. han characters will be unified.  i do have a 
hack to preserve charset info using high unicode chars (outside of the 
utf-16 space), but this is used only for processing .el files and for 
characters that have no unicode equivalent; otherwise, we intend to use 
text properties to preserve language info.

at some point we'll also support a fixed-width 8/16/32-bit buffer 
representation.  most of the code to do this is present; it just needs 
some debugging.

in any case, the internal representation should never escape to any 
external files.  if this happens in emacs-unicode-2 (does it ever 
happen?), i'd consider it a bug.  XEmacs has never allowed its internal 
representation to escape; in fact, you can't create such a file even if 
you want to, unless you have compiled with `--debug'.

ben

? .new.configure.ac
? html
? unicode-megabuild
? unicode.112205-merged
? unicode.build
? lib-src/dump-id.c
? lisp/mule/ethiopic.elc-good
? man/beta.html
? man/xemacs-faq.html
? man/internals/internals.aux
? man/internals/internals.cp
? man/internals/internals.fn
? man/internals/internals.fns
? man/internals/internals.ky
? man/internals/internals.log
? man/internals/internals.pdf
? man/internals/internals.pg
? man/internals/internals.toc
? man/internals/internals.tp
? man/internals/internals.vr
? modules/canna/autom4te.cache
? nt/.xemacs.dsw.swp
? nt/make-docfile.out
? nt/xemacs.sln
? nt/xemacs.suo
? nt/xemacs.vcproj
? src/.compiler.h.swp
? src/temacs.sln
? src/temacs.suo
? src/xemacs.pdb
? src/xemacs.sln
? src/xemacs.suo
cvs server: Diffing .
Index: configure.ac
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/configure.ac,v
retrieving revision 1.17
diff -u -r1.17 configure.ac
--- configure.ac	2005/10/24 21:01:08	1.17
+++ configure.ac	2005/11/22 14:00:02
＠＠ -469,6 +469,20 ＠＠
 $3
 ],[$4])])dnl
 dnl
+dnl XE_COMPLEX_ARG_ENABLE(FEATURE, HELP-STRING, ACTION-IF-TRUE, ACTION-IF-FALSE,
+dnl                     [XE_COMPLEX_OPTION, ....])
+dnl --------------------------------------------------------------------------
+dnl
+dnl Expanded version of AC_ARG_ENABLE for complex options.  All the parameters
+dnl are required.
+dnl
+define([XE_COMPLEX_ARG_ENABLE],
+[XE_INIT_COMPLEX_OPTION([enable_]patsubst([$1], -, _), m4_shiftn(4, $＠))
+AC_ARG_ENABLE([$1],[$2],
+[XE_PARSE_COMPLEX_OPTION([enable_]patsubst([$1], -, _), [--enable-$1])
+$3
+],[$4])])
+dnl
 dnl -------------------------------------------------------------------------
 XE_APPEND(lib-src, MAKE_SUBDIR)
 XE_APPEND(lib-src, INSTALL_ARCH_DEP_SUBDIR)
＠＠ -480,18 +494,18 ＠＠
 dnl
 dnl I think these will be caught by autoconf internal checks,
 dnl only --with-* are unchecked
-dnl --external-widget         --enable-external-widget
+dnl --external-widget         --with-external-widget
 dnl --native-sound-lib=LIB    --with-native-sound-lib
 dnl --mail-locking=TYPE       --with-mail-locking
 dnl --rel-alloc               --with-rel-alloc
-dnl --use-number-lib          --enable-bignum
-dnl --debug                   --enable-debug
-dnl --error-checking          --enable-error-checking
-dnl --memory-usage-stats      --enable-memory-usage-stats
-dnl --quick-build             --enable-quick-build
-dnl --use-union-type          --enable-union-type
-dnl --pdump                   --enable-pdump
-dnl --use-kkcc                --enable-kkcc
+dnl --use-number-lib          --with-bignum
+dnl --debug                   --with-debug
+dnl --error-checking          --with-error-checking
+dnl --memory-usage-stats      --with-memory-usage-stats
+dnl --quick-build             --with-quick-build
+dnl --use-union-type          --with-union-type
+dnl --pdump                   --with-pdump
+dnl --use-kkcc                --with-kkcc
 dnl
 dnl parse flags
 XE_MERGED_ARG([prefix],
＠＠ -593,7 +607,7 ＠＠
 	AC_HELP_STRING([--with-quantify],[Support performance debugging using Quantify.]),
 	[], [])
 XE_MERGED_ARG([toolbars],
-	AC_HELP_STRING([--enable-toolbars],[Enable toolbar support. Default: yes.]),
+	AC_HELP_STRING([--with-toolbars],[Enable toolbar support. Default: yes.]),
 	[], [])
 XE_MERGED_ARG([tty],
 	AC_HELP_STRING([--with-tty],[Enable TTY support. Default: yes.]),
＠＠ -601,15 +615,20 ＠＠
 XE_MERGED_ARG([xfs],
 	AC_HELP_STRING([--with-xfs],[Enable XFontSet support for internationalized
                         menubar.  Incompatible with `--with-xim=motif'.
-                        `--enable-menubars=lucid' (the default) is desirable.]),
+                        `--with-menubars=lucid' (the default) is desirable.]),
 	[], [])
 XE_MERGED_ARG([mule],
-	AC_HELP_STRING([--enable-mule],[Compile with Mule (Multi-Lingual Emacs) support,
+	AC_HELP_STRING([--with-mule],[Compile with Mule (Multi-Lingual Emacs), i.e.
internationalization support,
                         needed to support non-Latin-1 (including Asian)
                         languages.]),
 	[], [])
+AC_ARG_WITH([unicode-internal],
+	AC_HELP_STRING([--with-unicode-internal],[Use Unicode and UTF-8 internally for text. 
Automatically selects `--with-mule'.  Without this,
+Mule uses an old internal encoding that explicitly encodes the national
+character set associated with a char and cannot represent all of Unicode.]),
+	[], [])
 XE_MERGED_ARG([default-eol-detection],
-	AC_HELP_STRING([--enable-default-eol-detection],[Turns on by default auto-detection of
end-of-line type
+	AC_HELP_STRING([--with-default-eol-detection],[Turns on by default auto-detection of
end-of-line type
 			when reading a file.  Applies to those platforms where
 			auto-detection is off by default (non-Mule Unix).  Has
 			no effect otherwise.]),
＠＠ -706,44 +725,44 ＠＠
 	AC_HELP_STRING([--with-package-path],[Search path for package hierarchies.]),
 	[AC_DEFINE(PACKAGE_PATH_USER_DEFINED)], [])
 XE_MERGED_ARG([external-widget],
-	AC_HELP_STRING([--enable-external-widget],[Support XEmacs server for text widgets in
other applications.]),
+	AC_HELP_STRING([--with-external-widget],[Support XEmacs server for text widgets in other
applications.]),
 	[], [])
 XE_MERGED_ARG([kkcc],
-	AC_HELP_STRING([--enable-kkcc],[Enable new GC algorithms.]),
-	[], [enable_kkcc=yes])
+	AC_HELP_STRING([--with-kkcc],[Enable new GC algorithms.]),
+	[], [with_kkcc=yes])
 XE_MERGED_ARG([mc-alloc],
-	AC_HELP_STRING([--enable-mc-alloc],[Enable new allocator.]),
-	[], [enable_mc_alloc=yes])
+	AC_HELP_STRING([--with-mc-alloc],[Enable new allocator.]),
+	[], [with_mc_alloc=yes])
 XE_MERGED_ARG([union-type],
-	AC_HELP_STRING([--enable-union-type],[Use union definition of Lisp_Object type.  Known
to trigger bugs in some compilers.]),
+	AC_HELP_STRING([--with-union-type],[Use union definition of Lisp_Object type.  Known to
trigger bugs in some compilers.]),
 	[], [])
 XE_MERGED_ARG([pdump],
-	AC_HELP_STRING([--enable-pdump],[Enable portable LISP preloader.]),
+	AC_HELP_STRING([--with-pdump],[Enable portable LISP preloader.]),
 	[], [])
 XE_MERGED_ARG([dump-in-exec],
-	AC_HELP_STRING([--enable-dump-in-exec],[Enable dumping into executable (enabled by
default
+	AC_HELP_STRING([--with-dump-in-exec],[Enable dumping into executable (enabled by
default
 			for `pdump', not enabled by default in combination 
 			with `mc-alloc').]),
 	[], [])
 XE_MERGED_ARG([debug],
-	AC_HELP_STRING([--enable-debug],[Enable additional debugging information.  No time
cost.]),
+	AC_HELP_STRING([--with-debug],[Enable additional debugging information.  No time
cost.]),
 	[], [])
 XE_MERGED_ARG([assertions],
-	AC_HELP_STRING([--enable-assertions],[]),
+	AC_HELP_STRING([--with-assertions],[]),
 	[], [])
 XE_MERGED_ARG([memory-usage-stats],
-	AC_HELP_STRING([--enable-memory-usage-stats],[Enable LISP memory usage API.]),
+	AC_HELP_STRING([--with-memory-usage-stats],[Enable LISP memory usage API.]),
 	[], [])
 XE_MERGED_ARG([clash-detection],
-	AC_HELP_STRING([--enable-clash-detection],[Disable use of lock files to detect multiple
edits
+	AC_HELP_STRING([--with-clash-detection],[Disable use of lock files to detect multiple
edits
 			of the same file.]),
 	[], [])
 XE_MERGED_ARG([modules],
-	AC_HELP_STRING([--enable-modules],[Compile in experimental support for dynamically
+	AC_HELP_STRING([--with-modules],[Compile in experimental support for dynamically
                         loaded libraries (Dynamic Shared Objects).]),
 	[], [])
 XE_MERGED_ARG([quick-build],
-	AC_HELP_STRING([--enable-quick-build],[Speed up the build cycle by leaving out steps
where
+	AC_HELP_STRING([--with-quick-build],[Speed up the build cycle by leaving out steps
where
                         XEmacs will still work (more or less) without them.
                         Potentially dangerous if you don't know what you're
                         doing. This (1) doesn't garbage-collect after loading
＠＠ -769,7 +788,7 ＠＠

 XE_MERGED_ARG([cflags],
 	AC_HELP_STRING([--with-cflags],
-	[Compiler flags.  These flags will be placed after any flags inserted for warnings,
debugging or optimization; setting this does not disable the insertion of those flags. 
Use configure settings such as `--with-optimization=no' or `enable-debug=no' to
turn them off, or override them with `--with-cflags-optimization',
`--with-cflags-debugging', or `with-cflags-warning'.]),
+	[Compiler flags.  These flags will be placed after any flags inserted for warnings,
debugging or optimization; setting this does not disable the insertion of those flags. 
Use configure settings such as `--with-optimization=no' or `with-debug=no' to turn
them off, or override them with `--with-cflags-optimization',
`--with-cflags-debugging', or `with-cflags-warning'.]),
 	[], [])

 XE_MERGED_ARG([cflags-warning],
＠＠ -823,11 +842,11 ＠＠
 #Enable code.

 XE_COMPLEX_ARG([database],
-	AC_HELP_STRING([--enable-database],[Compile with database support.  Valid types are
+	AC_HELP_STRING([--with-database],[Compile with database support.  Valid types are
                         `no' or a comma-separated list of one or more
                         of `berkdb' and either `dbm' or `gnudbm'.]),
 	[
-if test "$enable_database_dbm"  = "yes" -a
"$enable_database_gdbm" = "yes"; then
+if test "$with_database_dbm"  = "yes" -a
"$with_database_gdbm" = "yes"; then
 	USAGE_ERROR("Only one of \`dbm' and \`gnudbm' may be specified
 with the \`--with-database' option.")
 fi
＠＠ -838,7 +857,7 ＠＠
  XE_COMPLEX_OPTION([gdbm],[""])])

 XE_COMPLEX_ARG([sound],
-	AC_HELP_STRING([--enable-sound],[Compile with sound support.
+	AC_HELP_STRING([--with-sound],[Compile with sound support.
                         Valid types are `native', `nas' and `esd'.
                         Prefix a type with 'no' to disable.
                         The first type can be `none' or `all'.  `none' means
＠＠ -847,7 +866,7 ＠＠
                         The default is to autodetect all sound support except 
                         for ESD which defaults to off.]),
 	[],
-	[enable_sound_nas=""],
+	[with_sound_nas=""],
 	[XE_COMPLEX_OPTION([native],[""]),
 	 XE_COMPLEX_OPTION([nas],[""]),
 	 XE_COMPLEX_OPTION([esd],[no])])
＠＠ -866,13 +885,13 ＠＠

 XE_KEYWORD_ARG([bignum],
-	AC_HELP_STRING([--enable-bignum=TYPE],[Compile in support for bignums, ratios, or
bigfloats
+	AC_HELP_STRING([--with-bignum=TYPE],[Compile in support for bignums, ratios, or
bigfloats
                         using library support.  TYPE must be one of "gmp"
                         (for GNU MP), "mp" (for BSD MP), or "no"
(disabled).]),
-	[], [enable_bignum="no"],[no,gmp,mp])
+	[], [with_bignum="no"],[no,gmp,mp])

 XE_COMPLEX_ARG([error-checking],
-	AC_HELP_STRING([--enable-error-checking],[Compile with internal error-checking added.
+	AC_HELP_STRING([--with-error-checking],[Compile with internal error-checking added.
                         Causes noticeable loss of speed.  Valid types
                         are extents, bufpos, malloc, gc, types, text, byte_code, glyphs,
display, structures.]),
 	[], [],
＠＠ -887,25 +906,25 ＠＠
  XE_COMPLEX_OPTION([structures],[""])])

 XE_KEYWORD_ARG([menubars],
-	AC_HELP_STRING([--enable-menubars=TYPE],[Use TYPE menubars (lucid, motif, or no).  The
Lucid
+	AC_HELP_STRING([--with-menubars=TYPE],[Use TYPE menubars (lucid, motif, or no).  The
Lucid
                         widgets emulate Motif (mostly) but are faster.
                         *WARNING*  The Motif menubar is currently broken.
                         Lucid menubars are the default.]),
 	[], [],[yes,no,lucid,motif,athena,gtk,msw])

 XE_KEYWORD_ARG([scrollbars],
-	AC_HELP_STRING([--enable-scrollbars=TYPE],[Use TYPE scrollbars (lucid, motif, athena, or
no).
+	AC_HELP_STRING([--with-scrollbars=TYPE],[Use TYPE scrollbars (lucid, motif, athena, or
no).
                         Lucid scrollbars are the default.]),
 	[], [],[yes,no,lucid,motif,athena,gtk,msw])

 XE_KEYWORD_ARG([dialogs],
-	AC_HELP_STRING([--enable-dialogs=TYPE],[Use TYPE dialog boxes (lucid, motif, athena, or
no).
+	AC_HELP_STRING([--with-dialogs=TYPE],[Use TYPE dialog boxes (lucid, motif, athena, or
no).
                         There are no true Lucid dialogs; Motif dialogs will be
                         used if Motif can be found, else Athena is used.]),
 	[], [],[yes,no,lucid,motif,athena,gtk,msw])

 XE_KEYWORD_ARG([widgets],
-	AC_HELP_STRING([--enable-widgets],[Use TYPE native widgets (lucid, motif, athena, or
no).
+	AC_HELP_STRING([--with-widgets],[Use TYPE native widgets (lucid, motif, athena, or no).
                         Other widget types are currently unsupported.
                         There are no true Lucid widgets; Motif widgets will be
                         used if Motif can be found, else Athena is used.]),
＠＠ -1078,27 +1097,27 ＠＠
 dnl Error checking and debugging flags
 dnl ----------------------------------
 dnl Error checking default to "yes" in beta versions, to "no" in
releases.
-dnl Same goes for --enable-debug and --extra-verbosity.
+dnl Same goes for --with-debug and --extra-verbosity.
 if test -n "$emacs_is_beta"; then beta=yes; else beta=no; fi
-test "${enable_error_checking_extents:=$beta}"    = yes &&
AC_DEFINE(ERROR_CHECK_EXTENTS)
-test "${enable_error_checking_types:=$beta}"      = yes &&
AC_DEFINE(ERROR_CHECK_TYPES)
-test "${enable_error_checking_text:=$beta}"       = yes &&
AC_DEFINE(ERROR_CHECK_TEXT)
-test "${enable_error_checking_gc:=$beta}"         = yes &&
AC_DEFINE(ERROR_CHECK_GC)
-test "${enable_error_checking_malloc:=$beta}"     = yes &&
AC_DEFINE(ERROR_CHECK_MALLOC)
-test "${enable_error_checking_byte_code:=$beta}"  = yes &&
AC_DEFINE(ERROR_CHECK_BYTE_CODE)
-test "${enable_error_checking_glyphs:=$beta}"     = yes &&
AC_DEFINE(ERROR_CHECK_GLYPHS)
-test "${enable_error_checking_display:=$beta}"    = yes &&
AC_DEFINE(ERROR_CHECK_DISPLAY)
-test "${enable_error_checking_structures:=$beta}" = yes &&
AC_DEFINE(ERROR_CHECK_STRUCTURES)
-dnl enable_debug=yes must be set when error checking is present.  This should be
+test "${with_error_checking_extents:=$beta}"    = yes &&
AC_DEFINE(ERROR_CHECK_EXTENTS)
+test "${with_error_checking_types:=$beta}"      = yes &&
AC_DEFINE(ERROR_CHECK_TYPES)
+test "${with_error_checking_text:=$beta}"       = yes &&
AC_DEFINE(ERROR_CHECK_TEXT)
+test "${with_error_checking_gc:=$beta}"         = yes &&
AC_DEFINE(ERROR_CHECK_GC)
+test "${with_error_checking_malloc:=$beta}"     = yes &&
AC_DEFINE(ERROR_CHECK_MALLOC)
+test "${with_error_checking_byte_code:=$beta}"  = yes &&
AC_DEFINE(ERROR_CHECK_BYTE_CODE)
+test "${with_error_checking_glyphs:=$beta}"     = yes &&
AC_DEFINE(ERROR_CHECK_GLYPHS)
+test "${with_error_checking_display:=$beta}"    = yes &&
AC_DEFINE(ERROR_CHECK_DISPLAY)
+test "${with_error_checking_structures:=$beta}" = yes &&
AC_DEFINE(ERROR_CHECK_STRUCTURES)
+dnl with_debug=yes must be set when error checking is present.  This should be
 dnl fixed up.
-dnl enable_debug implies other options
-if test "${enable_debug:=$beta}" = "yes"; then
-  enable_assertions=yes
-  enable_memory_usage_stats=yes
+dnl with_debug implies other options
+if test "${with_debug:=$beta}" = "yes"; then
+  with_assertions=yes
+  with_memory_usage_stats=yes
   AC_DEFINE(DEBUG_XEMACS)
 fi
-test "$enable_assertions"         = "yes" &&
AC_DEFINE(USE_ASSERTIONS)
-test "$enable_memory_usage_stats" = "yes" &&
AC_DEFINE(MEMORY_USAGE_STATS)
+test "$with_assertions"         = "yes" &&
AC_DEFINE(USE_ASSERTIONS)
+test "$with_memory_usage_stats" = "yes" &&
AC_DEFINE(MEMORY_USAGE_STATS)

 dnl ------------------------------
 dnl Determine the s&m files to use
＠＠ -1641,12 +1660,12 ＠＠
 dnl $opsys detection complete; defaults depending on $opsys follow
 dnl --------------------------------------------------------------

-if test -z "$enable_pdump"; then
+if test -z "$with_pdump"; then
   case "$opsys" in
-    linux* )  enable_pdump=yes ;;          dnl glibc 2.3.1 seems to hose unexec
-    darwin )  enable_pdump=yes ;;          dnl No "native" working dumper
available
-    cygwin* ) enable_pdump=yes ;;          dnl unexec is broken
-    * )       enable_pdump=no ;;
+    linux* )  with_pdump=yes ;;          dnl glibc 2.3.1 seems to hose unexec
+    darwin )  with_pdump=yes ;;          dnl No "native" working dumper
available
+    cygwin* ) with_pdump=yes ;;          dnl unexec is broken
+    * )       with_pdump=no ;;
   esac
 fi

＠＠ -1693,7 +1712,7 ＠＠
   AC_DEFINE(SUNPRO)
 fi

-if test "$enable_clash_detection" != "no"; then
+if test "$with_clash_detection" != "no"; then
   AC_DEFINE(CLASH_DETECTION)
 fi

＠＠ -1985,9 +2004,9 ＠＠
 dnl If the s&m files don't define a system-specific dumper, simply use pdump.
 dnl Sometime in the future, we'll remove all definitions of UNEXEC
 dnl from all the s&m files.
-test -z "$unexec" && enable_pdump=yes
+test -z "$unexec" && with_pdump=yes

-if test "$enable_pdump" = "yes"; then
+if test "$with_pdump" = "yes"; then
   ordinary_link="yes"
   ld="${ordinary_ld}"
   start_files=
＠＠ -1997,16 +2016,16 ＠＠
 fi

 dnl Enable KKCC and MC-ALLOC by default
-test -z "$enable_kkcc" && enable_kkcc=yes
-test -z "$enable_mc_alloc" && enable_mc_alloc=yes
+test -z "$with_kkcc" && with_kkcc=yes
+test -z "$with_mc_alloc" && with_mc_alloc=yes

 dnl Dump into executable
-if test -z "$enable_dump_in_exec"; then
-  if test "$enable_pdump" = "yes"; then
-    if test "$enable_mc_alloc" = "yes"; then
-      enable_dump_in_exec=no
+if test -z "$with_dump_in_exec"; then
+  if test "$with_pdump" = "yes"; then
+    if test "$with_mc_alloc" = "yes"; then
+      with_dump_in_exec=no
     else
-      enable_dump_in_exec=yes
+      with_dump_in_exec=yes
     fi
   fi
 fi
＠＠ -2302,7 +2321,7 ＠＠
 dnl a usage message, that's often good enough.  Please report it, though.
 dnl #### Should make this Solaris-friendly.
 dnl Link with -z nocombreloc for now.
-if test "$enable_pdump" != "yes"; then
+if test "$with_pdump" != "yes"; then
   AC_MSG_CHECKING(for \"-z nocombreloc\" linker flag)
   case "`ld --help 2>&1`" in
     *-z\ nocombreloc* ) AC_MSG_RESULT(yes)
＠＠ -2321,8 +2340,8 ＠＠
 dnl Add s&m-determined objects (including unexec) to link line
 test -n "$objects_machine" && XE_ADD_OBJS($objects_machine)
 test -n "$objects_system"  && XE_ADD_OBJS($objects_system)
-test -n "$unexec"          && test ! "$enable_pdump" =
"yes" && XE_ADD_OBJS($unexec)
-test "$enable_pdump" = "yes" && XE_ADD_OBJS(dumper.o)
+test -n "$unexec"          && test ! "$with_pdump" =
"yes" && XE_ADD_OBJS($unexec)
+test "$with_pdump" = "yes" && XE_ADD_OBJS(dumper.o)

 dnl Dynodump (Solaris 2.x, x<6)
 AC_MSG_CHECKING(for dynodump)
＠＠ -2683,7 +2702,7 ＠＠
 dnl Try this again when 2.1 hits the streets.
 dnl Avoid using free-hook.c if support exists for malloc debugging in libc
 dnl have_libmcheck=no
-dnl if test "$enable_error_checking_malloc" = "yes" -a \
+dnl if test "$with_error_checking_malloc" = "yes" -a \
 dnl    "$have_glibc" = "yes" -a \
 dnl    "$doug_lea_malloc" = "yes"; then
 dnl   AC_CHECK_HEADERS(mcheck.h)
＠＠ -3000,24 +3019,24 ＠＠

       for feature in scrollbars toolbars menubars dialogs widgets
       do
-        eval "feature_value=\${enable_${feature}}"
+        eval "feature_value=\${with_${feature}}"
 	case "${feature_value}" in
 	yes|no|gtk|"" )
 	    ;;
 	* )
 	  feature_conflict_with_gtk=yes
-	  AC_MSG_WARN([--enable-${feature}=${feature_value} is incompatible with --with-gtk])
;;
+	  AC_MSG_WARN([--with-${feature}=${feature_value} is incompatible with --with-gtk]) ;;
 	esac
       done
       if test "${feature_conflict_with_gtk}" = "yes"; then
 	XE_DIE(["One or more GUI toolkit features conflict with GTK"])
       fi

-      test "${enable_scrollbars}" != "no" &&
enable_scrollbars=gtk
-      test "${enable_toolbars}" != "no" &&
enable_toolbars=gtk
-      test "${enable_menubars}" != "no" &&
enable_menubars=gtk
-      test "${enable_dialogs}" != "no" && enable_dialogs=gtk
-      test "${enable_widgets}" != "no" && enable_widgets=gtk
+      test "${with_scrollbars}" != "no" &&
with_scrollbars=gtk
+      test "${with_toolbars}" != "no" && with_toolbars=gtk
+      test "${with_menubars}" != "no" && with_menubars=gtk
+      test "${with_dialogs}" != "no" && with_dialogs=gtk
+      test "${with_widgets}" != "no" && with_widgets=gtk

       dnl Check for libglade support (it rocks)
       OLD_CFLAGS="${CFLAGS}"
＠＠ -3235,8 +3254,8 ＠＠
   AC_DEFINE_UNQUOTED(THIS_IS_X11R${x11_release})

   if test "${x11_release}" = "4"; then
-    case "$enable_widgets" in
-      "" | "no") enable_widgets=no ;;
+    case "$with_widgets" in
+      "" | "no") with_widgets=no ;;
       *) XE_DIE("Widget support requires X11R5 or greater") ;;
     esac
   fi
＠＠ -3310,11 +3329,11 ＠＠
     fi
     if test "$window_system" != x11; then
       window_system=msw
-      test "$enable_scrollbars" != "no" &&
enable_scrollbars=msw
-      test "$enable_menubars"   != "no" &&
enable_menubars=msw
-      test "$enable_toolbars"   != "no" &&
enable_toolbars=msw
-      test "$enable_dialogs"    != "no" &&
enable_dialogs=msw
-      test "$enable_widgets"    != "no" &&
enable_widgets=msw
+      test "$with_scrollbars" != "no" && with_scrollbars=msw
+      test "$with_menubars"   != "no" && with_menubars=msw
+      test "$with_toolbars"   != "no" && with_toolbars=msw
+      test "$with_dialogs"    != "no" && with_dialogs=msw
+      test "$with_widgets"    != "no" && with_widgets=msw
     fi
     dnl check for our special version of select
     AC_RUN_IFELSE([AC_LANG_SOURCE([#include <fcntl.h>
＠＠ -3332,13 +3351,13 ＠＠
 if test "$window_system" = "none"; then
   for feature in menubars scrollbars toolbars dialogs dragndrop xface
   do
-    if eval "test -n \"\$enable_${feature}\" -a
\"\$enable_${feature}\" != no" ; then
-       AC_MSG_WARN([--enable-$feature ignored:  Not valid without window system
support])
+    if eval "test -n \"\$with_${feature}\" -a
\"\$with_${feature}\" != no" ; then
+       AC_MSG_WARN([--with-$feature ignored:  Not valid without window system support])
     fi
-    eval "enable_${feature}=no"
+    eval "with_${feature}=no"
   done
 else
-  test -z "$enable_toolbars" && enable_toolbars=yes
+  test -z "$with_toolbars" && with_toolbars=yes
 fi

 dnl ### Test for features that require mswindows support - currently none
＠＠ -3362,7 +3381,7 ＠＠
   dnl if test "$with_tty" = "no" ; then
   dnl   AC_MSG_ERROR([No window system support and no TTY support - Unable to proceed.])
   dnl fi
-  for feature in with_tooltalk with_cde with_offix with_wmcommand with_xim with_xmu
enable_sound_nas
+  for feature in with_tooltalk with_cde with_offix with_wmcommand with_xim with_xmu
with_sound_nas
   do
     if eval "test -n \"\$${feature}\" -a \"\$${feature}\" !=
\"no\"" ; then
        AC_MSG_WARN([--$feature ignored:  Not valid without X support])
＠＠ -3438,7 +3457,7 ＠＠
 AC_SUBST(LIBSTDCPP)

 dnl This must come before the detection code for anything that is in a module
-if test "$enable_modules" != "no"; then
+if test "$with_modules" != "no"; then
   AC_CHECKING([for module support])

   case "$opsys" in
＠＠ -3515,22 +3534,22 ＠＠
     XE_APPEND(src, INSTALL_ARCH_DEP_SUBDIR)
     test -n "$libdl" && XE_PREPEND(-l${libdl}, LIBS)
     AC_CHECK_FUNCS(dlerror _dlerror)
-    enable_modules=yes
+    with_modules=yes
     MOD_CC="../../lib-src/ellcc"
     MODCFLAGS="\$(CFLAGS) --mode=compile --mod-output=\$＠ -I../../src
-I\$(srcdir)/../../src"
     INSTALLPATH="\$(moduledir)"
     MOD_INSTALL_PROGRAM=${INSTALL_PROGRAM}
     OBJECT_TO_BUILD="\$(MODNAME).ell"
   else
-    if test "$enable_modules" = "yes"; then
+    if test "$with_modules" = "yes"; then
       XE_DIE("Required module support cannot be provided.")
     else
       echo "    No module support."
     fi
-    enable_modules=no
+    with_modules=no
   fi
 fi
-if test "$enable_modules" != "yes"; then
+if test "$with_modules" != "yes"; then
   MOD_CC="$XEMACS_CC"
   MODCFLAGS="\$(XE_CFLAGS) -I../../src -I\$(srcdir)/../../src"
   INSTALLPATH=""
＠＠ -3539,7 +3558,7 ＠＠
 fi
 MODARCHDIR=
 MAKE_DOCFILE="../../lib-src/make-docfile"
-AC_SUBST(with_modules, $enable_modules)
+AC_SUBST(with_modules, $with_modules)
 AC_SUBST(MOD_CC)
 AC_SUBST(MODARCHDIR)
 AC_SUBST(MAKE_DOCFILE)
＠＠ -3684,7 +3703,7 ＠＠
   LIBS="$save_LIBS"
   XE_APPEND(modules/ldap, MAKE_SUBDIR)
   need_modules_common=yes
-  if test "$enable_modules" = "yes"; then
+  if test "$with_modules" = "yes"; then
     XE_APPEND(modules/ldap, INSTALL_ARCH_DEP_SUBDIR)
   fi
 fi
＠＠ -3719,7 +3738,7 ＠＠
     XE_PREPEND(-lpq, postgresql_libs)
     XE_APPEND(modules/postgresql, MAKE_SUBDIR)
     need_modules_common=yes
-    if test "$enable_modules" = "yes"; then
+    if test "$with_modules" = "yes"; then
       XE_APPEND(modules/postgresql, INSTALL_ARCH_DEP_SUBDIR)
     fi
   elif test "$with_postgresql" = "yes"; then
＠＠ -3925,12 +3944,12 ＠＠
   AC_CHECKING([for X11 graphics libraries])
 fi

-case "$enable_widgets" in
+case "$with_widgets" in
   "" | "yes" | "athena")	detect_athena=yes ;;
    *)				detect_athena=no ;;
 esac

-case "$enable_dialogs" in
+case "$with_dialogs" in
   "" | "yes" | "athena")	detect_athena=yes ;;
 esac

＠＠ -4093,44 +4112,44 ＠＠
   * )        lucid_prefers_motif="yes" ;;
 esac

-case "$enable_menubars" in "" | "yes" | "athena"
)
-  enable_menubars="lucid" ;;
+case "$with_menubars" in "" | "yes" | "athena" )
+  with_menubars="lucid" ;;
 esac
-case "$enable_dialogs" in "" | "yes" | "lucid" )
+case "$with_dialogs" in "" | "yes" | "lucid" )
   if test "$lucid_prefers_motif" = "yes"; then
-    if   test "$have_motif"     = "yes"; then
enable_dialogs="motif"
-    elif test "$have_xaw"       = "yes"; then
enable_dialogs="athena"
-    elif test "$with_msw"       = "yes"; then
enable_dialogs="msw"
-    else enable_dialogs=no
+    if   test "$have_motif"     = "yes"; then
with_dialogs="motif"
+    elif test "$have_xaw"       = "yes"; then
with_dialogs="athena"
+    elif test "$with_msw"       = "yes"; then
with_dialogs="msw"
+    else with_dialogs=no
     fi
   else
-    if   test "$have_xaw"       = "yes"; then
enable_dialogs="athena"
-    elif test "$have_motif"     = "yes"; then
enable_dialogs="motif"
-    elif test "$with_msw"       = "yes"; then
enable_dialogs="msw"
-    else enable_dialogs=no
+    if   test "$have_xaw"       = "yes"; then
with_dialogs="athena"
+    elif test "$have_motif"     = "yes"; then
with_dialogs="motif"
+    elif test "$with_msw"       = "yes"; then
with_dialogs="msw"
+    else with_dialogs=no
     fi
   fi ;;
 esac
-case "$enable_scrollbars" in "" | "yes" )
-  enable_scrollbars="lucid" ;;
+case "$with_scrollbars" in "" | "yes" )
+  with_scrollbars="lucid" ;;
 esac
-case "$enable_widgets" in "" | "yes" | "lucid")
+case "$with_widgets" in "" | "yes" | "lucid")
   if test "$lucid_prefers_motif" = "yes"; then
-    if   test "$have_motif" = "yes"; then
enable_widgets="motif"
-    elif test "$have_xaw"   = "yes"; then
enable_widgets="athena"
-    elif test "$with_msw"   = "yes"; then
enable_widgets="msw"
-    else enable_widgets=no
+    if   test "$have_motif" = "yes"; then
with_widgets="motif"
+    elif test "$have_xaw"   = "yes"; then
with_widgets="athena"
+    elif test "$with_msw"   = "yes"; then
with_widgets="msw"
+    else with_widgets=no
     fi
   else
-    if   test "$have_xaw"   = "yes"; then
enable_widgets="athena"
-    elif test "$have_motif" = "yes"; then
enable_widgets="motif"
-    elif test "$with_msw"   = "yes"; then
enable_widgets="msw"
-    else enable_widgets=no
+    if   test "$have_xaw"   = "yes"; then
with_widgets="athena"
+    elif test "$have_motif" = "yes"; then
with_widgets="motif"
+    elif test "$with_msw"   = "yes"; then
with_widgets="msw"
+    else with_widgets=no
     fi
   fi ;;
 esac

-all_widgets="$enable_menubars $enable_scrollbars $enable_dialogs $enable_toolbars
$enable_widgets"
+all_widgets="$with_menubars $with_scrollbars $with_dialogs $with_toolbars
$with_widgets"

 case "$all_widgets" in
   *athena* )
＠＠ -4176,12 +4195,12 ＠＠
   need_motif=yes ;;
 esac

-test "$enable_menubars"   = "lucid" && XE_APPEND(xlwmenu.o,
lwlib_objs)
-test "$enable_menubars"   = "motif" && XE_APPEND(xlwmenu.o,
lwlib_objs)
-test "$enable_scrollbars" = "lucid" &&
XE_APPEND(xlwscrollbar.o, lwlib_objs)
-test "$enable_widgets"   != "no" && test
"$enable_widgets" != "msw" && \
+test "$with_menubars"   = "lucid" && XE_APPEND(xlwmenu.o,
lwlib_objs)
+test "$with_menubars"   = "motif" && XE_APPEND(xlwmenu.o,
lwlib_objs)
+test "$with_scrollbars" = "lucid" &&
XE_APPEND(xlwscrollbar.o, lwlib_objs)
+test "$with_widgets"   != "no" && test
"$with_widgets" != "msw" && \
 	XE_APPEND(xlwtabs.o xlwgcs.o, lwlib_objs)
-case "$enable_widgets" in athena* )
+case "$with_widgets" in athena* )
 	XE_APPEND(xlwradio.o xlwcheckbox.o xlwgauge.o, lwlib_objs);;
 esac
 case "$all_widgets" in *lucid* )
＠＠ -4191,46 +4210,56 ＠＠

 AC_SUBST(lwlib_objs)

-test "$enable_scrollbars" = "athena" &&
AC_DEFINE(LWLIB_SCROLLBARS_ATHENA)
-test "$enable_dialogs"    = "athena" &&
AC_DEFINE(LWLIB_DIALOGS_ATHENA)
+test "$with_scrollbars" = "athena" &&
AC_DEFINE(LWLIB_SCROLLBARS_ATHENA)
+test "$with_dialogs"    = "athena" &&
AC_DEFINE(LWLIB_DIALOGS_ATHENA)

 if test "$athena_3d" = "yes"; then
-  test "$enable_scrollbars" = "athena" &&
AC_DEFINE(LWLIB_SCROLLBARS_ATHENA3D)
-  test "$enable_dialogs"    = "athena" &&
AC_DEFINE(LWLIB_DIALOGS_ATHENA3D)
+  test "$with_scrollbars" = "athena" &&
AC_DEFINE(LWLIB_SCROLLBARS_ATHENA3D)
+  test "$with_dialogs"    = "athena" &&
AC_DEFINE(LWLIB_DIALOGS_ATHENA3D)
 fi

-case "$enable_widgets" in athena* ) AC_DEFINE(LWLIB_WIDGETS_ATHENA);; esac
-test "$enable_widgets" != "no" && test
"$enable_widgets" != "msw" && \
+case "$with_widgets" in athena* ) AC_DEFINE(LWLIB_WIDGETS_ATHENA);; esac
+test "$with_widgets" != "no" && test
"$with_widgets" != "msw" && \
 	AC_DEFINE(LWLIB_TABS_LUCID)

-test "$enable_menubars"   != "no"    &&
AC_DEFINE(HAVE_MENUBARS)
-test "$enable_scrollbars" != "no"    &&
AC_DEFINE(HAVE_SCROLLBARS)
-test "$enable_dialogs"    != "no"    &&
AC_DEFINE(HAVE_DIALOGS)
-test "$enable_toolbars"   != "no"    &&
AC_DEFINE(HAVE_TOOLBARS)
-test "$enable_widgets"    != "no"    &&
AC_DEFINE(HAVE_WIDGETS)
-
-test "$enable_menubars"   = "lucid"  &&
AC_DEFINE(LWLIB_MENUBARS_LUCID)
-test "$enable_scrollbars" = "lucid"  &&
AC_DEFINE(LWLIB_SCROLLBARS_LUCID)
-
-test "$enable_menubars"   = "motif"  &&
AC_DEFINE(LWLIB_MENUBARS_MOTIF)
-test "$enable_scrollbars" = "motif"  &&
AC_DEFINE(LWLIB_SCROLLBARS_MOTIF)
-test "$enable_dialogs"    = "motif"  &&
AC_DEFINE(LWLIB_DIALOGS_MOTIF)
-test "$enable_widgets"    = "motif"  &&
AC_DEFINE(LWLIB_WIDGETS_MOTIF)
+test "$with_menubars"   != "no"    &&
AC_DEFINE(HAVE_MENUBARS)
+test "$with_scrollbars" != "no"    &&
AC_DEFINE(HAVE_SCROLLBARS)
+test "$with_dialogs"    != "no"    &&
AC_DEFINE(HAVE_DIALOGS)
+test "$with_toolbars"   != "no"    &&
AC_DEFINE(HAVE_TOOLBARS)
+test "$with_widgets"    != "no"    &&
AC_DEFINE(HAVE_WIDGETS)
+
+test "$with_menubars"   = "lucid"  &&
AC_DEFINE(LWLIB_MENUBARS_LUCID)
+test "$with_scrollbars" = "lucid"  &&
AC_DEFINE(LWLIB_SCROLLBARS_LUCID)
+
+test "$with_menubars"   = "motif"  &&
AC_DEFINE(LWLIB_MENUBARS_MOTIF)
+test "$with_scrollbars" = "motif"  &&
AC_DEFINE(LWLIB_SCROLLBARS_MOTIF)
+test "$with_dialogs"    = "motif"  &&
AC_DEFINE(LWLIB_DIALOGS_MOTIF)
+test "$with_widgets"    = "motif"  &&
AC_DEFINE(LWLIB_WIDGETS_MOTIF)

 dnl ----------------------
 dnl Mule-dependent options
 dnl ----------------------

-test -z "$enable_mule" && enable_mule=no
+if test "$with_unicode_internal" = "yes"; then
+  test "$with_mule" = "no" && \
+	USAGE_ERROR("Cannot give \`--with-mule=no' and
`--with-unicode-internal=yes'.
+Unicode-internal requires Mule.")
+  test "$with_mule" != "yes" && \
+	AC_MSG_WARN([\`--with-mule' forced to \`yes': Unicode-internal requires Mule.])
+  with_mule=yes
+  AC_DEFINE(UNICODE_INTERNAL)
+fi

-dnl if test "$enable_mule" = "yes" && test ! -d
"$srcdir/lisp/mule"; then
+test -z "$with_mule" && with_mule=no
+
+dnl if test "$with_mule" = "yes" && test ! -d
"$srcdir/lisp/mule"; then
 dnl	echo "Attempt to Build with Mule without Mule/Lisp"
 dnl	echo "Please install the XEmacs/Mule tarball or"
 dnl	echo "rerun configure with --with-mule=no"
 dnl	exit 1
 dnl fi

-if test "$enable_default_eol_detection" = "yes"; then
+if test "$with_default_eol_detection" = "yes"; then
   AC_DEFINE(HAVE_DEFAULT_EOL_DETECTION)
 fi

＠＠ -4238,7 +4267,7 ＠＠
 canna_libs=
 wnn_libs=

-if test "$enable_mule" = "yes" ; then
+if test "$with_mule" = "yes" ; then
   AC_CHECKING([for Mule-related features])
   AC_DEFINE(MULE)

＠＠ -4281,7 +4310,7 ＠＠
   if test "$with_xfs" = "yes" ; then
     AC_CHECKING([for XFontSet])
     AC_CHECK_LIB(X11, XmbDrawString, [:], with_xfs=no)
-    if test "$with_xfs" = "yes" && test
"$enable_menubars" = "lucid"; then
+    if test "$with_xfs" = "yes" && test
"$with_menubars" = "lucid"; then
       AC_DEFINE(USE_XFONTSET)
       if test "$with_xim" = "no" ; then
         XE_ADD_OBJS(input-method-xlib.o)
＠＠ -4348,7 +4377,7 ＠＠
   fi
 AC_SUBST(canna_libs)

-else dnl "$enable_mule" = "no"
+else dnl "$with_mule" = "no"
   for feature in xim canna wnn; do
     if eval "test -n \"\$with_${feature}\" -a
\"\$with_${feature}\" != no" ; then
       AC_MSG_WARN([--with-${feature} ignored:  Not valid without Mule support])
＠＠ -4370,7 +4399,7 ＠＠
 dnl Check for POSIX functions.
 dnl ----------------------------------------------------------------

-AC_CHECK_FUNCS(cbrt closedir dup2 eaccess fmod fpathconf frexp fsync ftime ftruncate
getaddrinfo gethostname getnameinfo getpagesize getrlimit gettimeofday getcwd link logb
lrand48 matherr mkdir mktime perror poll random readlink rename res_init rint rmdir select
setitimer setpgid setsid sigblock sighold sigprocmask snprintf strerror strlwr strupr
symlink tzset ulimit umask usleep vlimit vsnprintf waitpid wcscmp wcslen)
+AC_CHECK_FUNCS(cbrt closedir dup2 eaccess fmod fpathconf frexp fsync ftime ftruncate
getaddrinfo gethostname getnameinfo getpagesize getrlimit gettimeofday getcwd link logb
lrand48 matherr mkdir mktime perror poll random readlink rename res_init rint rmdir select
setitimer setpgid setsid sigblock sighold sigprocmask snprintf strerror strlwr strupr
symlink tzset ulimit umask usleep vlimit vsnprintf waitpid wcscmp wcslen wcwidth)

 dnl getaddrinfo() is borked under hpux11
 if test "$ac_cv_func_getaddrinfo" != "no" ; then
＠＠ -4745,9 +4774,9 ＠＠

 dnl Autodetect native sound
 AC_CHECKING([for sound support])
-test -n "$with_native_sound_lib" && enable_sound_native=yes
+test -n "$with_native_sound_lib" && with_sound_native=yes

-if test "$enable_sound_native" != "no"; then
+if test "$with_sound_native" != "no"; then
   dnl Maybe sound is already on include path...
   if test -n "$with_native_sound_lib"; then
     AC_CHECK_HEADER(multimedia/audio_device.h,
＠＠ -4848,26 +4877,26 ＠＠
   fi

   if test "$sound_found" = "yes"; then
-    enable_sound_native=yes
+    with_sound_native=yes
   else
-    if test "$enable_sound_native" = "yes" ; then
+    if test "$with_sound_native" = "yes" ; then
       AC_MSG_WARN([No native libraries found.  Disabling native sound support.])
     fi
-    enable_sound_native=no     
+    with_sound_native=no     
   fi
 fi

-if test "$enable_sound_native" = "yes"; then
+if test "$with_sound_native" = "yes"; then
   AC_DEFINE(HAVE_NATIVE_SOUND)
   test -n "$with_native_sound_lib" &&
XE_PREPEND($with_native_sound_lib, LIBS)
 fi

 dnl NAS Sound support
-if test "$enable_sound_nas" != "no"; then
+if test "$with_sound_nas" != "no"; then
   AC_CHECK_HEADER(audio/audiolib.h, [
     AC_CHECK_LIB(audio, AuOpenServer, have_nas_sound=yes)])
   if test "$have_nas_sound" = "yes"; then
-    enable_sound_nas=yes
+    with_sound_nas=yes
     AC_DEFINE(HAVE_NAS_SOUND)
     XE_ADD_OBJS(nas.o)
     XE_PREPEND(-laudio, libs_x)
＠＠ -4875,14 +4904,14 ＠＠
     dnl then we force safer behavior.
     AC_EGREP_HEADER(AuXtErrorJump,audio/Xtutil.h,,[old_nas=yes;
AC_DEFINE(NAS_NO_ERROR_JUMP)])
   else
-    test "$enable_sound_nas" = "yes" && \
+    test "$with_sound_nas" = "yes" && \
       XE_DIE("Required NAS sound support cannot be provided.")
-    enable_sound_nas=no
+    with_sound_nas=no
   fi
 fi

 dnl ESD Sound support
-if test "$enable_sound_esd" != "no"; then
+if test "$with_sound_esd" != "no"; then
   AC_CHECK_PROG(have_esd_config, esd-config, yes, no)
   if test "$have_esd_config" = "yes"; then
     save_c_switch_site="$c_switch_site" save_LIBS="$LIBS"
＠＠ -4894,14 +4923,14 ＠＠
   fi

   if test "$have_esd_sound" = "yes"; then
-    enable_sound_esd=yes
+    with_sound_esd=yes
     need_miscplay=yes
     XE_ADD_OBJS(esd.o)
     AC_DEFINE(HAVE_ESD_SOUND)
   else
-    test "$enable_sound_esd" = "yes" && \
+    test "$with_sound_esd" = "yes" && \
       XE_DIE("Required ESD sound support cannot be provided.")
-    enable_sound_esd=no
+    with_sound_esd=no
   fi
 fi

＠＠ -4998,45 +5027,45 ＠＠
 dnl On FreeBSD, both DB and DBM are part of libc.
 dnl By default, we check for DBM support in libgdbm, then libc, then libdbm.

-test "$enable_database_gdbm $enable_database_dbm $enable_database_berkdb" \
+test "$with_database_gdbm $with_database_dbm $with_database_berkdb" \
   != "no no no" && AC_CHECKING([for database support])

 dnl Check for ndbm.h, required for either kind of DBM support.
-if test "$enable_database_gdbm $enable_database_dbm" != "no no";
then
+if test "$with_database_gdbm $with_database_dbm" != "no no"; then
   AC_CHECK_HEADER(ndbm.h, [:], [
-    test "$enable_database_gdbm" = "yes" -o \
-         "$enable_database_dbm"  = "yes" && \
+    test "$with_database_gdbm" = "yes" -o \
+         "$with_database_dbm"  = "yes" && \
       XE_DIE("Required DBM support cannot be provided.")
-    enable_database_gdbm=no enable_database_dbm=no])
+    with_database_gdbm=no with_database_dbm=no])
 fi

 dnl Check for DBM support in libgdbm.
-if test "$enable_database_gdbm" != "no"; then
+if test "$with_database_gdbm" != "no"; then
   AC_CHECK_LIB(gdbm, dbm_open, [
-   enable_database_gdbm=yes enable_database_dbm=no libdbm=-lgdbm], [
-   if test "$enable_database_gdbm" = "yes"; then
+   with_database_gdbm=yes with_database_dbm=no libdbm=-lgdbm], [
+   if test "$with_database_gdbm" = "yes"; then
      XE_DIE("Required GNU DBM support cannot be provided.")
    fi
-   enable_database_gdbm=no])
+   with_database_gdbm=no])
 fi

 dnl Check for DBM support in libc and libdbm.
-if test "$enable_database_dbm" != "no"; then
-  AC_CHECK_FUNC(dbm_open, [enable_database_dbm=yes libdbm=], [
-    AC_CHECK_LIB(dbm, dbm_open, [enable_database_dbm=yes libdbm=-ldbm], [
-      test "$enable_database_dbm" = "yes" && \
+if test "$with_database_dbm" != "no"; then
+  AC_CHECK_FUNC(dbm_open, [with_database_dbm=yes libdbm=], [
+    AC_CHECK_LIB(dbm, dbm_open, [with_database_dbm=yes libdbm=-ldbm], [
+      test "$with_database_dbm" = "yes" && \
         XE_DIE("Required DBM support cannot be provided.")
-      enable_database_dbm=no])])
+      with_database_dbm=no])])
 fi

 dnl Tell make about the DBM support we detected.
 test -n "$libdbm" && XE_PREPEND("$libdbm", LIBS)
-test "$enable_database_gdbm" = "yes" -o \
-     "$enable_database_dbm"  = "yes" && \
+test "$with_database_gdbm" = "yes" -o \
+     "$with_database_dbm"  = "yes" && \
   AC_DEFINE(HAVE_DBM)

 dnl Check for Berkeley DB.
-if test "$enable_database_berkdb" != "no"; then
+if test "$with_database_berkdb" != "no"; then
   AC_MSG_CHECKING(for Berkeley db.h)
   for header in "db/db.h" "db.h"; do
     AC_TRY_COMPILE([
＠＠ -5057,11 +5086,11 ＠＠
 ],[], db_h_file="$header"; break)
   done
   if test -z "$db_h_file"
-  then AC_MSG_RESULT(no); enable_database_berkdb=no
+  then AC_MSG_RESULT(no); with_database_berkdb=no
   else AC_MSG_RESULT($db_h_file)
   fi

-  if test "$enable_database_berkdb" != "no"; then
+  if test "$with_database_berkdb" != "no"; then
     AC_MSG_CHECKING(for Berkeley DB version)
     AC_EGREP_CPP(yes,
 [#include <$db_h_file>
＠＠ -5082,12 +5111,12 ＠＠
     AC_MSG_RESULT(3); dbfunc=db_create; dbver=3])],[
     AC_MSG_RESULT(2); dbfunc=db_open; dbver=2])],[
     AC_MSG_RESULT(1); dbfunc=dbopen; dbver=1])
-    AC_CHECK_FUNC($dbfunc,     enable_database_berkdb=yes need_libdb=no, [
-    AC_CHECK_LIB(db, $dbfunc,  enable_database_berkdb=yes need_libdb=yes)])
+    AC_CHECK_FUNC($dbfunc,     with_database_berkdb=yes need_libdb=no, [
+    AC_CHECK_LIB(db, $dbfunc,  with_database_berkdb=yes need_libdb=yes)])
   fi

   dnl Berk db 4.1 decorates public functions with version information
-  if test "$enable_database_berkdb" != "yes" -a "$dbver" =
"4"; then
+  if test "$with_database_berkdb" != "yes" -a "$dbver" =
"4"; then
     rm -f $tempcname
     echo "#include <$db_h_file>" > $tempcname
     echo "configure___ dbfunc=db_create" >> $tempcname
＠＠ -5096,18 +5125,18 ＠＠
 	| sed -n -e "s/[[ TAB]]*=[[ TAB\"]]*/='/" -e "s/[[
TAB\"]]*\$/'/" -e "s/^configure___//p"`
     rm -f $tempcname
     AC_MSG_WARN("db_create is really $dbfunc")
-    AC_CHECK_LIB(db, $dbfunc,  enable_database_berkdb=yes need_libdb=yes)
+    AC_CHECK_LIB(db, $dbfunc,  with_database_berkdb=yes need_libdb=yes)
   fi

-  if test "$enable_database_berkdb" = "yes"; then
+  if test "$with_database_berkdb" = "yes"; then
     AC_DEFINE_UNQUOTED(DB_H_FILE, "$db_h_file")
     AC_DEFINE(HAVE_BERKELEY_DB)
     test "$need_libdb" = "yes" && XE_PREPEND(-ldb, LIBS)
-  else enable_database_berkdb=no
+  else with_database_berkdb=no
   fi
 fi

-if test "$enable_database_gdbm $enable_database_dbm $enable_database_berkdb" \
+if test "$with_database_gdbm $with_database_dbm $with_database_berkdb" \
      != "no no no"; then
   AC_DEFINE(HAVE_DATABASE)
 fi
＠＠ -5125,7 +5154,7 ＠＠
 fi

 dnl Enhanced number support
-if test "$enable_bignum" = "gmp"; then
+if test "$with_bignum" = "gmp"; then
   AC_CHECK_HEADER(gmp.h, [
     AC_CHECK_LIB(gmp, __gmpz_init, have_mpz_init=yes)])
   if test "$have_mpz_init" = "yes"; then
＠＠ -5135,7 +5164,7 ＠＠
   else
     XE_DIE("Required GMP numeric support cannot be provided.")
   fi
-elif test "$enable_bignum" = "mp"; then
+elif test "$with_bignum" = "mp"; then
   for library in "" "-lcrypto"; do
     AC_CHECK_HEADER(mp.h, [
       AC_CHECK_LIB(mp, mp_mfree, have_mp_mfree=yes; break, [
＠＠ -5194,7 +5223,7 ＠＠

 dnl We ignore (C|LD)_SWITCH_X_(MACHINE|SYSTEM)
 dnl Use XE_SPACE instead of plain assignment statements to remove extraneous blanks
-if test "$enable_modules" = "yes"; then
+if test "$with_modules" = "yes"; then
   ld_libs_module=
 else
   XE_SPACE(ld_libs_module, $ldap_libs $postgresql_libs $canna_libs)
＠＠ -5213,7 +5242,7 ＠＠
 XE_SPACE(ld_libs_all, $ld_libs_window_system $ld_libs_general $ld_libs_module)

 dnl For no-module builds, make the src dir last
-if test "$enable_modules" = "no"; then
+if test "$with_modules" = "no"; then
   XE_APPEND(src, MAKE_SUBDIR)
 fi

＠＠ -5363,7 +5392,7 ＠＠
 AC_SUBST(INFOPATH)

 if test -z "$with_package_path" && test -n
"$with_package_prefix" ; then
-  if test "$enable_mule" = "yes" ; then
+  if test "$with_mule" = "yes" ; then

with_package_path="~/.xemacs::${with_package_prefix}/site-packages:${with_package_prefix}/xemacs-packages:${with_package_prefix}/mule-packages"
   else

with_package_path="~/.xemacs::${with_package_prefix}/site-packages:${with_package_prefix}/xemacs-packages"
＠＠ -5531,18 +5560,18 ＠＠
 					      AC_DEFINE(USE_SYSTEM_MALLOC)
 fi
 test "$GCC"                = "yes" && AC_DEFINE(USE_GCC)
-test "$enable_external_widget" = "yes" &&
AC_DEFINE(EXTERNAL_WIDGET)
-test "$enable_kkcc"        = "yes" && AC_DEFINE(USE_KKCC)
-test "$enable_mc_alloc"    = "yes" && AC_DEFINE(MC_ALLOC)
-test "$enable_quick_build" = "yes" && AC_DEFINE(QUICK_BUILD)
+test "$with_external_widget" = "yes" &&
AC_DEFINE(EXTERNAL_WIDGET)
+test "$with_kkcc"        = "yes" && AC_DEFINE(USE_KKCC)
+test "$with_mc_alloc"    = "yes" && AC_DEFINE(MC_ALLOC)
+test "$with_quick_build" = "yes" && AC_DEFINE(QUICK_BUILD)
 test "$with_purify"        = "yes" && AC_DEFINE(PURIFY)
 test "$with_quantify"      = "yes" && AC_DEFINE(QUANTIFY)
 test "$with_pop"           = "yes" &&
AC_DEFINE(MAIL_USE_POP)
 test "$with_kerberos"      = "yes" && AC_DEFINE(KERBEROS)
 test "$with_hesiod"        = "yes" && AC_DEFINE(HESIOD)
-test "$enable_union_type"  = "yes" &&
AC_DEFINE(USE_UNION_TYPE)
-test "$enable_pdump"       = "yes" && AC_DEFINE(PDUMP)
-test "$enable_dump_in_exec" = "yes" &&
AC_DEFINE(DUMP_IN_EXEC)
+test "$with_union_type"  = "yes" &&
AC_DEFINE(USE_UNION_TYPE)
+test "$with_pdump"       = "yes" && AC_DEFINE(PDUMP)
+test "$with_dump_in_exec" = "yes" && AC_DEFINE(DUMP_IN_EXEC)
 test "$with_ipv6_cname"    = "yes" &&
AC_DEFINE(IPV6_CANONICALIZE)

＠＠ -5625,7 +5654,7 ＠＠
 echo "  GNU version of malloc:             ${GNU_MALLOC}${GNU_MALLOC_reason}"
 case "$ld_switch_site" in
   *nocombreloc*) echo "  Linking with \`-z nocombreloc'.
-    - Consider configuring with --enable-pdump." ;;
+    - Consider configuring with --with-pdump." ;;
 esac

 echo "
＠＠ -5662,35 +5691,35 ＠＠
   echo "    - Athena headers location:                    $athena_h_path"
   echo "    - Athena library to link:                     $athena_lib"
 fi
-case "$enable_menubars" in
+case "$with_menubars" in
   gtk   ) echo "  Using GTK menubars."   ;;
   lucid ) echo "  Using Lucid menubars." ;;
   motif ) echo "  Using Motif menubars."
           echo "  *WARNING*  The Motif menubar implementation is currently
buggy."
           echo "             We recommend using the Lucid menubar instead."
-          echo "             Re-run configure with
--enable-menubars='lucid'." ;;
+          echo "             Re-run configure with
--with-menubars='lucid'." ;;
     msw ) echo "  Using MS-Windows menubars." ;;
 esac
-case "$enable_scrollbars" in
+case "$with_scrollbars" in
   gtk    ) echo "  Using GTK scrollbars."        ;;
   lucid  ) echo "  Using Lucid scrollbars."      ;;
   motif  ) echo "  Using Motif scrollbars."      ;;
   athena ) echo "  Using Athena scrollbars."     ;;
      msw ) echo "  Using MS-Windows scrollbars." ;;
 esac
-case "$enable_dialogs" in
+case "$with_dialogs" in
   gtk    ) echo "  Using GTK dialog boxes." ;;
   motif  ) echo "  Using Motif dialog boxes."
            if test "$unexec" = "unexaix.o"; then if test "`uname
-v`" = 4 -a "`uname -r`" -ge 3; then
              echo "  *WARNING*  The Motif dialog boxes cause problems on AIX 4.3 and
higher."
              echo "             We recommend using the Athena dialog boxes
instead."
-             echo "             Install libXaw and re-run configure with
--enable-dialogs='athena'."
+             echo "             Install libXaw and re-run configure with
--with-dialogs='athena'."
              echo "             Read the PROBLEMS file for more information."
            fi; fi ;;
   athena ) echo "  Using Athena dialog boxes."     ;;
      msw ) echo "  Using MS-Windows dialog boxes." ;;
 esac
-case "$enable_widgets" in
+case "$with_widgets" in
   gtk    ) echo "  Using GTK native widgets."        ;;
   motif  ) echo "  Using Motif native widgets."      ;;
   athena ) echo "  Using Athena native widgets."     ;;
＠＠ -5741,16 +5770,16 ＠＠

 echo "
 Sound:"
-test "$enable_sound_native" = yes && echo "  Compiling in support
for sound (native)."
-test "$enable_sound_nas" = yes && echo "  Compiling in support for
NAS (network audio system)."
+test "$with_sound_native" = yes && echo "  Compiling in support
for sound (native)."
+test "$with_sound_nas" = yes && echo "  Compiling in support for
NAS (network audio system)."
 test "$old_nas" = yes && echo "    - NAS library lacks error
trapping; will play synchronously."
-test "$enable_sound_esd" = yes && echo "  Compiling in support for
ESD (Enlightened Sound Daemon)."
+test "$with_sound_esd" = yes && echo "  Compiling in support for
ESD (Enlightened Sound Daemon)."

 echo "
 Databases:"
-test "$enable_database_berkdb" = yes && echo "  Compiling in
support for Berkeley database."
-test "$enable_database_dbm"    = yes && echo "  Compiling in
support for DBM."
-test "$enable_database_gdbm"   = yes && echo "  Compiling in
support for GNU DBM."
+test "$with_database_berkdb" = yes && echo "  Compiling in support
for Berkeley database."
+test "$with_database_dbm"    = yes && echo "  Compiling in support
for DBM."
+test "$with_database_gdbm"   = yes && echo "  Compiling in support
for GNU DBM."
 test "$with_ldap" = yes && echo "  Compiling in support for
LDAP."
 if test "$with_postgresql" = yes; then
   echo "  Compiling in support for PostgreSQL."
＠＠ -5760,7 +5789,11 ＠＠

 echo "
 Internationalization:"
-test "$enable_mule" = yes && echo "  Compiling in support for Mule
(multi-lingual Emacs)."
+if test "$with_mule" = yes; then
+  echo "  Compiling in support for Mule (multi-lingual Emacs)."
+  test "$with_unicode_internal" = yes && echo "    -  Unicode and
UTF-8 will be used internally for text."
+  test "$with_unicode_internal" != yes && echo "    -  The
old-Mule format will be used internally for text."
+fi
 test "$with_xim" != no && echo "  Compiling in support for XIM
(X11R5+ I18N input method)."
 test "$with_xim" = motif && echo "    - Using Motif to provide XIM
support."
 test "$with_xim" = xlib && echo "    - Using raw Xlib to provide
XIM support."
＠＠ -5785,10 +5818,10 ＠＠
 test "$with_workshop" = yes && echo "  Compiling in support for
Sun WorkShop."
 test "$with_socks" = yes && echo "  Compiling in support for
SOCKS."
 test "$with_dnet" = yes && echo "  Compiling in support for
DNET."
-test "$enable_modules" = "yes" && echo "  Compiling in
support for dynamic shared object modules."
-test "$enable_bignum" = "gmp" && echo "  Compiling in
support for more number types using the GNU MP library."
-test "$enable_bignum" = "mp" && echo "  Compiling in
support for more number types using the BSD MP library."
-if test "$enable_union_type" = yes ; then
+test "$with_modules" = "yes" && echo "  Compiling in
support for dynamic shared object modules."
+test "$with_bignum" = "gmp" && echo "  Compiling in
support for more number types using the GNU MP library."
+test "$with_bignum" = "mp" && echo "  Compiling in
support for more number types using the BSD MP library."
+if test "$with_union_type" = yes ; then
   echo "  Using the union type for Lisp_Objects."
   echo "  WARNING: ---------------------------------------------------------"
   echo "  WARNING: This tends to trigger compiler bugs, especially when"
＠＠ -5798,7 +5831,7 ＠＠
   echo "  WARNING: More recent versions may be safer, or not."
   echo "  WARNING: ---------------------------------------------------------"
 fi
-if test "$enable_kkcc" = yes ; then
+if test "$with_kkcc" = yes ; then
   echo "  Using the new GC algorithms."
   echo "  WARNING: ---------------------------------------------------------"
   echo "  WARNING: The new algorithms are experimental. They are enabled by"
＠＠ -5806,7 +5839,7 ＠＠
   echo "  WARNING: turn it off."
   echo "  WARNING: ---------------------------------------------------------"
 fi
-if test "$enable_mc_alloc" = yes ; then
+if test "$with_mc_alloc" = yes ; then
   echo "  Using the new allocator."
   echo "  WARNING: ---------------------------------------------------------"
   echo "  WARNING: The new allocator is experimental. It is enabled by"
＠＠ -5814,11 +5847,11 ＠＠
   echo "  WARNING: turn it off."
   echo "  WARNING: ---------------------------------------------------------"
 fi
-test "$enable_pdump" = yes && echo "  Using the new portable
dumper."
-test "$enable_dump_in_exec" = yes && echo "  Dumping into
executable."
-test "$enable_debug" = yes && echo "  Compiling in support for
extra debugging code."
+test "$with_pdump" = yes && echo "  Using the new portable
dumper."
+test "$with_dump_in_exec" = yes && echo "  Dumping into
executable."
+test "$with_debug" = yes && echo "  Compiling in support for extra
debugging code."
 test "$usage_tracking" = yes && echo "  Compiling in support for
active usage tracking (Sun internal)."
-if test "$enable_error_checking_extents $enable_error_checking_types
$enable_error_checking_text $enable_error_checking_gc $enable_error_checking_malloc
$enable_error_checking_glyphs $enable_error_checking_byte_code
$enable_error_checking_display $enable_error_checking_structures" \
+if test "$with_error_checking_extents $with_error_checking_types
$with_error_checking_text $with_error_checking_gc $with_error_checking_malloc
$with_error_checking_glyphs $with_error_checking_byte_code $with_error_checking_display
$with_error_checking_structures" \
   != "no no no no no no no no no"; then
   echo "  Compiling in support for runtime error checking."
   echo "  WARNING: ---------------------------------------------------------"
＠＠ -5858,7 +5891,7 ＠＠
   AC_CONFIG_FILES($file)
 done
 AC_CONFIG_FILES(src/paths.h src/xemacs.def.in lib-src/config.values)
-test "$enable_modules" = "yes" &&
AC_CONFIG_FILES(lib-src/ellcc.h)
+test "$with_modules" = "yes" &&
AC_CONFIG_FILES(lib-src/ellcc.h)

 dnl Normally []'s are used for quoting but this will cause problems
cvs server: Diffing dynodump
cvs server: Diffing dynodump/i386
cvs server: Diffing dynodump/ppc
cvs server: Diffing dynodump/sparc
cvs server: Diffing etc
cvs server: Diffing etc/custom
cvs server: Diffing etc/custom/example-themes
cvs server: Diffing etc/eos
cvs server: Diffing etc/idd
cvs server: Diffing etc/photos
cvs server: Diffing etc/sparcworks
cvs server: Diffing etc/tests
cvs server: Diffing etc/tests/external-widget
cvs server: Diffing etc/toolbar
cvs server: Diffing etc/unicode
cvs server: Diffing etc/unicode/ibm
cvs server: Diffing etc/unicode/mule-ucs
cvs server: Diffing etc/unicode/other
cvs server: Diffing etc/unicode/unicode-consortium
cvs server: Diffing info
cvs server: Diffing lib-src
cvs server: Diffing lisp
Index: lisp/bytecomp.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/bytecomp.el,v
retrieving revision 1.19
diff -u -r1.19 bytecomp.el
--- lisp/bytecomp.el	2004/08/13 21:19:15	1.19
+++ lisp/bytecomp.el	2005/11/22 14:00:04
＠＠ -1,7 +1,7 ＠＠
 ;;; bytecomp.el --- compilation of Lisp code into byte code.

 ;;; Copyright (C) 1985-1987, 1991-1994 Free Software Foundation, Inc.
-;;; Copyright (C) 1996 Ben Wing.
+;;; Copyright (C) 1996, 2005 Ben Wing.

 ;; Authors: Jamie Zawinski <jwz(a)jwz.org&gt;
 ;;	Hallvard Furuseth <hbf(a)ulrik.uio.no&gt;
＠＠ -1592,7 +1592,16 ＠＠
       (setq input-buffer (get-buffer-create " *Compiler Input*"))
       (set-buffer input-buffer)
       (erase-buffer)
-      (insert-file-contents filename)
+      (let ((codesys
+	     (and (featurep 'mule)
+		  (find-coding-system-magic-cookie-in-file filename))))
+	(when codesys
+	  (setq codesys (find-coding-system (intern codesys))))
+	(if (and codesys (eq 'iso2022 (coding-system-type codesys)))
+	    (let ((coding-system-for-read 'iso-2022-8bit-preserve))
+	      (debug-print "foo")
+	      (insert-file-contents filename))
+	  (insert-file-contents filename)))
       ;; Run hooks including the uncompression hook.
       ;; If they change the file name, then change it for the output also.
       (let ((buffer-file-name filename)
Index: lisp/derived.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/derived.el,v
retrieving revision 1.10
diff -u -r1.10 derived.el
--- lisp/derived.el	2004/06/18 15:48:38	1.10
+++ lisp/derived.el	2005/11/22 14:00:04
＠＠ -421,17 +421,8 ＠＠
   ;; check for inheritance.
   (map-char-table
    #'(lambda (key value)
-       (let ((newval (get-range-char-table key new 'multi)))
-	 (cond ((eq newval 'multi)	; OK, dive into the class hierarchy
-		(map-char-table
-		 #'(lambda (key1 value1)
-		     (when (eq ?＠ (char-syntax-from-code
-				   (get-range-char-table key new ?＠)))
-		       (put-char-table key1 value new))
-		     nil)
-		 new
-		 key))
-	       ((eq ?＠ (char-syntax-from-code newval)) ;; class at once
+       (let ((newval (get-char-table key new)))
+	 (cond ((eq ?＠ (char-syntax-from-code newval)) ;; class at once
 		(put-char-table key value new))))
        nil)
    old))
Index: lisp/dumped-lisp.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/dumped-lisp.el,v
retrieving revision 1.56
diff -u -r1.56 dumped-lisp.el
--- lisp/dumped-lisp.el	2005/10/04 17:51:20	1.56
+++ lisp/dumped-lisp.el	2005/11/22 14:00:04
＠＠ -155,7 +155,6 ＠＠
        "code-process"
        ;; Provide basic commands to set coding systems to user
        "code-cmds"
-       "unicode"
 	;;;;;;;;;;;;;;;;;; MULE support
        (when (featurep 'mule)
 	 '("mule/mule-charset"
＠＠ -164,18 +163,25 ＠＠
 	   "mule/mule-composite-stub"
 	   "mule/mule-composite"
 	   ))
+       ;; Initialize Unicode and load the translation tables.  This requires
+       ;; that all charsets be created (happens in mule/mule-charset).
+       "unicode"
        ;; may initialize coding systems
        (when (featurep '(and mule x)) "mule/mule-x-init")
        (when (featurep '(and mule tty)) "mule/mule-tty-init")
        (when (and (featurep 'mule) (memq system-type '(windows-nt cygwin32)))
 	 "mule/mule-win32-init")
        "code-init" ; set up defaults
-       ;; All files after this can have extended characters in them.
+
+;;; ***************************************************************************
+;;;           All files after this can have extended characters in them.
+;;; ***************************************************************************
+
        (when (featurep 'mule)
 	 '("mule/mule-category"
-	   "mule/mule-ccl"
 	   "mule/kinsoku"
 	   ))
+       (when (featurep 'ccl) "mule/mule-ccl")

 ;; after this goes the specific lisp routines for a particular input system
 ;; 97.2.5 JHod Shouldn't these go into a site-load file to allow site
＠＠ -215,6 +221,7 ＠＠
 	   "mule/thai-xtis"
 	   "mule/tibetan"
 	   "mule/vietnamese"
+	   "mule/windows"
 	   ))

 	;; Specialized language support
Index: lisp/loadup.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/loadup.el,v
retrieving revision 1.31
diff -u -r1.31 loadup.el
--- lisp/loadup.el	2004/12/27 12:25:14	1.31
+++ lisp/loadup.el	2005/11/22 14:00:04
＠＠ -2,7 +2,7 ＠＠

 ;; Copyright (C) 1985, 1986, 1992, 1994, 1997 Free Software Foundation, Inc.
 ;; Copyright (C) 1996 Richard Mlynarik.
-;; Copyright (C) 1995, 1996, 2003 Ben Wing.
+;; Copyright (C) 1995, 1996, 2003, 2005 Ben Wing.

 ;; Maintainer: XEmacs Development Team
 ;; Keywords: internal, dumped
＠＠ -38,6 +38,12 ＠＠
 ;; Help debug problems.
 (setq stack-trace-on-error t
       load-always-display-messages t)
+(when (featurep 'debug-xemacs)
+  ;; Immediately dump core upon an unhandled error, rather than just quitting
+  ;; the program.  This can also be achieved by setting an environment variable
+  ;; XEMACSDEBUG to contain '(setq debug-on-error t)', e.g.
+  ;; export XEMACSDEBUG='(setq debug-on-error t)'
+  (setq debug-on-error t))

 ;(princ (format "command-line-args: %s\n" command-line-args))
 ;(princ (format "configure-lisp-directory: %S\n" configure-lisp-directory))
＠＠ -158,13 +164,6 ＠＠
 	    (defun toolbar-button-p    (obj) "No toolbar support." nil)
 	    (defun toolbar-specifier-p (obj) "No toolbar support." nil))
 	  (fmakunbound 'pureload))
-
-	;; We cannot do this in mule-cmds.el because not all the
-	;; appropriate charsets are loaded yet.
-	(when (and (featurep 'mule)
-		   load-unicode-tables-at-dump-time)
-	  (let ((data-directory (expand-file-name "etc" source-root)))
-	    (load-unicode-tables)))

 	(packages-load-package-dumped-lisps late-package-load-path)

Index: lisp/occur.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/occur.el,v
retrieving revision 1.2
diff -u -r1.2 occur.el
--- lisp/occur.el	2005/10/24 10:07:27	1.2
+++ lisp/occur.el	2005/11/22 14:00:05
＠＠ -319,12 +319,12 ＠＠
   (interactive (occur-read-primary-args))
   (occur-1 regexp nlines (list (current-buffer))))

+(defvar ido-ignore-item-temp-list)
 ;;;###autoload
 (defun multi-occur (bufs regexp &optional nlines)
   "Show all lines in buffers BUFS containing a match for REGEXP.
 This function acts on multiple buffers; otherwise, it is exactly like
 `occur'."
-  (defvar ido-ignore-item-temp-list)
   (interactive
    (cons
     (let* ((bufs (list (read-buffer "First buffer to search: "
＠＠ -465,9 +465,9 ＠＠
 		    (setq marker (make-marker))
 		    (set-marker marker matchbeg)
 		    (if (and keep-props
-			     (if (boundp 'jit-lock-mode) jit-lock-mode)
+			     (if-boundp 'jit-lock-mode jit-lock-mode)
 			     (text-property-not-all begpt endpt 'fontified t))
-			(if (fboundp 'jit-lock-fontify-now)
+			(if-fboundp 'jit-lock-fontify-now
 			    (jit-lock-fontify-now begpt endpt)))
 		    (setq curstring (buffer-substring begpt endpt))
 		    ;; Depropertize the string, and maybe
Index: lisp/unicode.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/unicode.el,v
retrieving revision 1.12
diff -u -r1.12 unicode.el
--- lisp/unicode.el	2005/03/04 21:59:43	1.12
+++ lisp/unicode.el	2005/11/22 14:00:05
＠＠ -1,6 +1,6 ＠＠
-;;; unicode.el --- Unicode support -*- coding: iso-2022-7bit; -*-
+;;; unicode.el --- Unicode support

-;; Copyright (C) 2001, 2002 Ben Wing.
+;; Copyright (C) 2001, 2002, 2005 Ben Wing.

 ;; Keywords: multilingual, Unicode

＠＠ -29,65 +29,12 ＠＠

 ;;; Code:

-; ;; Subsets of Unicode.
-
-; #### what is this bogosity ... "chars 96, final ?2" !!?!
-; (make-charset 'mule-unicode-2500-33ff 
-; 	      "Unicode characters of the range U+2500..U+33FF."
-; 	      '(dimension
-; 		2
-; 		registry "ISO10646-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?2
-; 		graphic 0
-; 		short-name "Unicode subset 2"
-; 		long-name "Unicode subset (U+2500..U+33FF)"
-; 		))
-
-
-; (make-charset 'mule-unicode-e000-ffff 
-; 	      "Unicode characters of the range U+E000..U+FFFF."
-; 	      '(dimension
-; 		2
-; 		registry "ISO10646-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?3
-; 		graphic 0
-; 		short-name "Unicode subset 3"
-; 		long-name "Unicode subset (U+E000+FFFF)"
-; 		))
-
-
-; (make-charset 'mule-unicode-0100-24ff 
-; 	      "Unicode characters of the range U+0100..U+24FF."
-; 	      '(dimension
-; 		2
-; 		registry "ISO10646-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?1
-; 		graphic 0
-; 		short-name "Unicode subset"
-; 		long-name "Unicode subset (U+0100..U+24FF)"
-; 		))
-
-
-;; accessed in loadup.el, mule-cmds.el; see discussion in unicode.c
-(defvar load-unicode-tables-at-dump-time (eq system-type 'windows-nt)
-  "[INTERNAL] Whether to load the Unicode tables at dump time.
-Setting this at run-time does nothing.")
-
 ;; NOTE: This takes only a fraction of a second on my Pentium III
 ;; 700Mhz even with a totally optimization-disabled XEmacs.
 (defun load-unicode-tables ()
   "Initialize the Unicode translation tables for all standard charsets."
   (let ((parse-args
-	 '(("unicode/unicode-consortium"
+	 `(("unicode/unicode-consortium"
 	    ;; Due to the braindamaged way Mule treats the ASCII and Control-1
 	    ;; charsets' types, trying to load them results in out-of-range
 	    ;; warnings at unicode.c:1439.  They're no-ops anyway, they're
＠＠ -96,74 +43,83 ＠＠
 	    ;; ("8859-1.TXT" control-1 #x80 #x9F #x-80)
             ;; The 8859-1.TXT G1 assignments are half no-ops, hardwired in
 	    ;; unicode.c ichar_to_unicode, but not in unicode_to_ichar.
-	    ("8859-1.TXT" latin-iso8859-1 #xA0 #xFF #x-80)
-	    ;; "8859-10.TXT"
-	    ;; "8859-13.TXT"
-	    ("8859-14.TXT" latin-iso8859-14 #xA0 #xFF #x-80)
-	    ("8859-15.TXT" latin-iso8859-15 #xA0 #xFF #x-80)
-	    ("8859-16.TXT" latin-iso8859-16 #xA0 #xFF #x-80)
-	    ("8859-2.TXT" latin-iso8859-2 #xA0 #xFF #x-80)
-	    ("8859-3.TXT" latin-iso8859-3 #xA0 #xFF #x-80)
-	    ("8859-4.TXT" latin-iso8859-4 #xA0 #xFF #x-80)
-	    ("8859-5.TXT" cyrillic-iso8859-5 #xA0 #xFF #x-80)
-	    ("8859-6.TXT" arabic-iso8859-6 #xA0 #xFF #x-80)
-	    ("8859-7.TXT" greek-iso8859-7 #xA0 #xFF #x-80)

-	    ("8859-8.TXT" hebrew-iso8859-8 #xA0 #xFF #x-80)
-	    ("8859-9.TXT" latin-iso8859-9 #xA0 #xFF #x-80)
-	    ;; charset for Big5 does not matter; specifying `big5' will
-	    ;; automatically make the right thing happen
-	    ("BIG5.TXT" chinese-big5-1 nil nil nil big5)
-	    ("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000)
-	    ("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000)
-	    ;; "CP1250.TXT" 
-	    ;; "CP1251.TXT" 
-	    ;; "CP1252.TXT" 
-	    ;; "CP1253.TXT" 
-	    ;; "CP1254.TXT" 
-	    ;; "CP1255.TXT" 
-	    ;; "CP1256.TXT" 
-	    ;; "CP1257.TXT" 
-	    ;; "CP1258.TXT" 
-	    ;; "CP874.TXT" 
-	    ;; "CP932.TXT" 
-	    ;; "CP936.TXT" 
-	    ;; "CP949.TXT" 
-	    ;; "CP950.TXT" 
+	    ;; ＠＠#### Add more charsets, esp. things like KOI8-R; take them
+	    ;; from emacs-unicode-2, among other things.
+	    ("8859-1.TXT" latin-iso8859-1 #xA0)
+	    ("8859-10.TXT" latin-iso8859-10 #xA0)
+	    ("8859-13.TXT" latin-iso8859-13 #xA0)
+	    ("8859-14.TXT" latin-iso8859-14 #xA0)
+	    ("8859-15.TXT" latin-iso8859-15 #xA0)
+	    ("8859-16.TXT" latin-iso8859-16 #xA0)
+	    ("8859-2.TXT" latin-iso8859-2 #xA0)
+	    ("8859-3.TXT" latin-iso8859-3 #xA0)
+	    ("8859-4.TXT" latin-iso8859-4 #xA0)
+	    ("8859-5.TXT" cyrillic-iso8859-5 #xA0)
+	    ("8859-6.TXT" arabic-iso8859-6 #xA0)
+	    ("8859-7.TXT" greek-iso8859-7 #xA0)
+	    ("8859-8.TXT" hebrew-iso8859-8 #xA0)
+	    ("8859-9.TXT" latin-iso8859-9 #xA0)
+	    ,＠(if (find-charset 'chinese-big5-1)
+		  ;; Under old-Mule, charset for Big5 does not matter;
+		  ;; specifying `big5' will automatically make the right
+		  ;; thing happen.
+		  '(("BIG5.TXT" chinese-big5-1 nil nil nil big5))
+		'(("BIG5.TXT" chinese-big5)))
+	    ;; Currently, these files are based on CNS 11643-1986
+	    ;; (with planes 1, 2, and 14), rather than CNS 11643-1992,
+	    ;; with planes 1-7.  See below.
+	    ;("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000)
+	    ;("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000)
+	    ("CP1250.TXT" latin-windows-1250 #x80)
+	    ("CP1251.TXT" cyrillic-windows-1251 #x80)
+	    ("CP1252.TXT" latin-windows-1252 #x80)
+	    ("CP1253.TXT" greek-windows-1253 #x80)
+	    ("CP1254.TXT" latin-windows-1254 #x80)
+	    ("CP1255.TXT" hebrew-windows-1255 #x80)
+	    ("CP1256.TXT" arabic-windows-1256 #x80)
+	    ("CP1257.TXT" latin-windows-1257 #x80)
+	    ("CP1258.TXT" latin-windows-1258 #x80)
+	    ("CP874.TXT" thai-windows-874 #x80)
+	    ("CP932.TXT" japanese-windows-932 #x8000)
+	    ("CP936.TXT" chinese-windows-936 #x8000)
+	    ("CP949.TXT" korean-windows-949 #x8000)
+	    ("CP950.TXT" chinese-windows-950 #x8000)
 	    ;; "GB12345.TXT" 
 	    ("GB2312.TXT" chinese-gb2312)
 	    ;; "HANGUL.TXT"
-	    ;; #### shouldn't JIS X 0201's upper limit be 7f?
-	    ("JIS0201.TXT" latin-jisx0201 #x21 #x80)
-	    ("JIS0201.TXT" katakana-jisx0201 #xA0 #xFF #x-80)
+	    ("JIS0201.TXT" latin-jisx0201 #x21 #x7F)
+	    ("JIS0201.TXT" katakana-jisx0201 #xA0)
 	    ("JIS0208.TXT" japanese-jisx0208 nil nil nil ignore-first-column)
 	    ("JIS0212.TXT" japanese-jisx0212)
-	    ;; "JOHAB.TXT" 
-	    ;; "KOI8-R.TXT" 
+	    ("JOHAB.TXT" korean-johab #x8000)
+	    ("KOI8-R.TXT" cyrillic-koi8-r #x80)
 	    ;; "KSC5601.TXT" 
 	    ;; note that KSC5601.TXT as currently distributed is NOT what
 	    ;; it claims to be!  see comments in KSX1001.TXT.
 	    ("KSX1001.TXT" korean-ksc5601)
 	    ;; "OLD5601.TXT" 
-	    ;; "SHIFTJIS.TXT"
+	    ,＠(when (find-charset 'japanese-shift-jis)
+		'(("SHIFTJIS.TXT" japanese-shift-jis #x8000)))
 	    )
+
 	   ("unicode/mule-ucs"
-	    ;; #### we don't support surrogates?!??
-	    ;; use these instead of the above ones once we support surrogates
-	    ;;("chinese-cns11643-1.txt" chinese-cns11643-1)
-	    ;;("chinese-cns11643-2.txt" chinese-cns11643-2)
-	    ;;("chinese-cns11643-3.txt" chinese-cns11643-3)
-	    ;;("chinese-cns11643-4.txt" chinese-cns11643-4)
-	    ;;("chinese-cns11643-5.txt" chinese-cns11643-5)
-	    ;;("chinese-cns11643-6.txt" chinese-cns11643-6)
-	    ;;("chinese-cns11643-7.txt" chinese-cns11643-7)
+	    ("chinese-cns11643-1.txt" chinese-cns11643-1)
+	    ("chinese-cns11643-2.txt" chinese-cns11643-2)
+	    ("chinese-cns11643-3.txt" chinese-cns11643-3)
+	    ("chinese-cns11643-4.txt" chinese-cns11643-4)
+	    ("chinese-cns11643-5.txt" chinese-cns11643-5)
+	    ("chinese-cns11643-6.txt" chinese-cns11643-6)
+	    ("chinese-cns11643-7.txt" chinese-cns11643-7)
 	    ("chinese-sisheng.txt" chinese-sisheng)
 	    ("ethiopic.txt" ethiopic)
-	    ("indian-is13194.txt" indian-is13194)
-	    ("ipa.txt" ipa)
-	    ("thai-tis620.txt" thai-tis620)
+	    ("indian-is13194.txt" indian-is13194 nil nil #x80)
+	    ("ipa.txt" ipa nil nil #x80)
+	    ("thai-tis620.txt" thai-tis620 nil nil #x80)
 	    ("tibetan.txt" tibetan)
-	    ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower)
-	    ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper)
+	    ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower
+	     nil nil #x80)
+	    ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper
+	     nil nil #x80)
 	    )
 	   ("unicode/other"
 	    ("lao.txt" lao)
＠＠ -179,6 +135,20 ＠＠
 			  (cdr tables))))
 	    parse-args)))

+;; Now we always load them at dump time; necessary for Unicode-internal.
+
+; ;; accessed in loadup.el, mule-cmds.el; see discussion in unicode.c
+; (defvar load-unicode-tables-at-dump-time t ;(eq system-type 'windows-nt)
+;   "[INTERNAL] Whether to load the Unicode tables at dump time.
+; Setting this at run-time does nothing.")
+
+;; Load the Unicode tables now!!!
+(when (and (featurep 'mule)
+	   ;load-unicode-tables-at-dump-time
+	   )
+  (let ((data-directory (expand-file-name "etc" source-root)))
+    (load-unicode-tables)))
+
 (make-coding-system
  'utf-16 'unicode
  "UTF-16"
＠＠ -309,7 +279,8 ＠＠
 The second argument must be 'ucs, the third argument is ignored.  "
   (assert (eq quote-ucs 'ucs) 
 	  "Sorry, decode-char doesn't yet support anything but the UCS.  ")
-  (unicode-to-char code))
+  ;(unicode-to-char code)
+  (make-char code))

 (defun encode-char (char quote-ucs &optional restriction)
   "FSF compatibility--return the Unicode code point of `char'.
＠＠ -317,6 +288,10 ＠＠
   (assert (eq quote-ucs 'ucs)
 	  "Sorry, encode-char doesn't yet support anything but the UCS.  ")
   (char-to-unicode char))
+
+(make-obsolete 'char-octets 'char-to-charset-codepoint)
+(make-obsolete 'char-charset 'char-to-charset-codepoint)
+(make-obsolete 'split-char 'char-to-charset-codepoint)

 ;; #### UTF-7 is not yet implemented, and it's tricky to do.  There's
 ;; an implementation in appendix A.1 of the Unicode Standard, Version
Index: lisp/x-compose.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/x-compose.el,v
retrieving revision 1.4
diff -u -r1.4 x-compose.el
--- lisp/x-compose.el	2005/06/26 18:04:50	1.4
+++ lisp/x-compose.el	2005/11/22 14:00:05
＠＠ -104,6 +104,9 ＠＠
 ;; 20050324103919.8D22E4901(a)boffi95.stru.polimi.it is caused by Xlib doing
 ;; the compose processing. To turn that off, I'm not certain what's 

+;; ＠＠#### This should probably be integrated into general Usenet handling
+;; of composition sequences.
+
 ;;; Code:

 (macrolet
cvs server: Diffing lisp/mule
Index: lisp/mule/arabic.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/arabic.el,v
retrieving revision 1.7
diff -u -r1.7 arabic.el
--- lisp/mule/arabic.el	2002/03/16 10:39:05	1.7
+++ lisp/mule/arabic.el	2005/11/22 14:00:05
＠＠ -28,63 +28,6 ＠＠

 ;;; Code:

-; (make-charset 'arabic-iso8859-6 
-; 	      "Right-Hand Part of Latin/Arabic Alphabet (ISO/IEC 8859-6):
ISO-IR-127"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-6"
-; 		chars 96
-; 		columns 1
-; 		direction r2l
-; 		final ?G
-; 		graphic 1
-; 		short-name "RHP of ISO8859/6"
-; 		long-name "RHP of Arabic (ISO 8859-6): ISO-IR-127"
-; 		))
-
-;; For Arabic, we need three different types of character sets.
-;; Digits are of direction left-to-right and of width 1-column.
-;; Others are of direction right-to-left and of width 1-column or
-;; 2-column.
-(make-charset 'arabic-digit "Arabic digit"
-	      '(dimension
-		1
-		registry "MuleArabic-0"
-		chars 94
-		columns 1
-		direction l2r
-		final ?2
-		graphic 0
-		short-name "Arabic digit"
-		long-name "Arabic digit"
-		))
-
-(make-charset 'arabic-1-column "Arabic 1-column"
-	      '(dimension
-		1
-		registry "MuleArabic-1"
-		chars 94
-		columns 1
-		direction r2l
-		final ?3
-		graphic 0
-		short-name "Arabic 1-col"
-		long-name "Arabic 1-column"
-		))
-
-(make-charset 'arabic-2-column "Arabic 2-column"
-	      '(dimension
-		1
-		registry "MuleArabic-2"
-		chars 94
-		columns 2
-		direction r2l
-		final ?4
-		graphic 0
-		short-name "Arabic 2-col"
-		long-name "Arabic 2-column"
-		))
-
 (make-coding-system 'iso-8859-6 'iso2022
 		    "ISO-8859-6 (Arabic)"
 		    '(charset-g0 ascii
Index: lisp/mule/chinese.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/chinese.el,v
retrieving revision 1.10
diff -u -r1.10 chinese.el
--- lisp/mule/chinese.el	2002/04/07 17:10:50	1.10
+++ lisp/mule/chinese.el	2005/11/22 14:00:05
＠＠ -3,7 +3,7 ＠＠
 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1997 MORIOKA Tomohiko
-;; Copyright (C) 2000, 2001, 2002 Ben Wing.
+;; Copyright (C) 2000, 2001, 2002, 2005 Ben Wing.

 ;; Keywords: multilingual, Chinese

＠＠ -34,75 +34,18 ＠＠
 (eval-when-compile
   (require 'china-util))

-; (make-charset 'chinese-gb2312 
-; 	      "GB2312 Chinese simplified: ISO-IR-58"
-; 	      '(dimension
-; 		2
-; 		registry "GB2312.1980"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?A
-; 		graphic 0
-; 		short-name "GB2312"
-; 		long-name "GB2312: ISO-IR-58"
-; 		))
-
-; (make-charset 'chinese-cns11643-1 
-; 	      "CNS11643 Plane 1 Chinese traditional: ISO-IR-171"
-; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-1"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?G
-; 		graphic 0
-; 		short-name "CNS11643-1"
-; 		long-name "CNS11643-1 (Chinese traditional): ISO-IR-171"
-; 		))
-
-; (make-charset 'chinese-cns11643-2 
-; 	      "CNS11643 Plane 2 Chinese traditional: ISO-IR-172"
-; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-2"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?H
-; 		graphic 0
-; 		short-name "CNS11643-2"
-; 		long-name "CNS11643-2 (Chinese traditional): ISO-IR-172"
-; 		))
-
-; (make-charset 'chinese-big5-1 
-; 	      "Frequently used part (A141-C67F) of Big5 (Chinese traditional)"
-; 	      '(dimension
-; 		2
-; 		registry "Big5"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?0
-; 		graphic 0
-; 		short-name "Big5 (Level-1)"
-; 		long-name "Big5 (Level-1) A141-C67F"
-; 		))
-
-; (make-charset 'chinese-big5-2 
-; 	      "Less frequently used part (C940-FEFE) of Big5 (Chinese traditional)"
-; 	      '(dimension
-; 		2
-; 		registry "Big5"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?1
-; 		graphic 0
-; 		short-name "Big5 (Level-2)"
-; 		long-name "Big5 (Level-2) C940-FEFE"
-; 		))
+(flet
+    ((frob-chinese-cns11643-charset
+      (name)
+      (modify-syntax-entry   name "w")
+      (modify-category-entry name ?t)
+      ))
+  (frob-chinese-cns11643-charset 'chinese-cns11643-3)
+  (frob-chinese-cns11643-charset 'chinese-cns11643-4)
+  (frob-chinese-cns11643-charset 'chinese-cns11643-5)
+  (frob-chinese-cns11643-charset 'chinese-cns11643-6)
+  (frob-chinese-cns11643-charset 'chinese-cns11643-7)
+)

 ;; Syntax of Chinese characters.
 (modify-syntax-entry 'chinese-gb2312 "w")
＠＠ -115,86 +58,13 ＠＠

 (modify-syntax-entry 'chinese-cns11643-1  "w")
 (modify-syntax-entry 'chinese-cns11643-2  "w")
-(modify-syntax-entry 'chinese-big5-1 "w")
-(modify-syntax-entry 'chinese-big5-2 "w")
+(if (find-charset 'chinese-big5-1)
+    (progn
+      (modify-syntax-entry 'chinese-big5-1 "w")
+      (modify-syntax-entry 'chinese-big5-2 "w")
+      )
+  (modify-syntax-entry 'chinese-big5 "w"))

-; ;; Chinese CNS11643 Plane3 thru Plane7.  Although these are official
-; ;; character sets, the use is rare and don't have to be treated
-; ;; space-efficiently in the buffer.
-; (make-charset 'chinese-cns11643-3 
-; 	      "CNS11643 Plane 3 Chinese Traditional: ISO-IR-183"
-; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-3"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?I
-; 		graphic 0
-; 		short-name "CNS11643-3"
-; 		long-name "CNS11643-3 (Chinese traditional): ISO-IR-183"
-; 		))
-
-;; CNS11643 Plane3 thru Plane7
-;; These represent more and more obscure Chinese characters.
-;; By the time you get to Plane 7, we're talking about characters
-;; that appear once in some ancient manuscript and whose meaning
-;; is unknown.
-
-(flet
-    ((make-chinese-cns11643-charset
-      (name plane final)
-      (make-charset
-       name (concat "CNS 11643 Plane " plane " (Chinese
traditional)")
-       `(registry 
-         ,(concat "CNS11643[.-]\\(.*[.-]\\)?" plane "$")
-         dimension 2
-         chars 94
-         final ,final
-         graphic 0
-	 short-name ,(concat "CNS11643-" plane)
-	 long-name ,(format "CNS11643-%s (Chinese traditional): ISO-IR-183"
-			    plane)))
-      (modify-syntax-entry   name "w")
-      (modify-category-entry name ?t)
-      ))
-  (make-chinese-cns11643-charset 'chinese-cns11643-3 "3" ?I)
-  (make-chinese-cns11643-charset 'chinese-cns11643-4 "4" ?J)
-  (make-chinese-cns11643-charset 'chinese-cns11643-5 "5" ?K)
-  (make-chinese-cns11643-charset 'chinese-cns11643-6 "6" ?L)
-  (make-chinese-cns11643-charset 'chinese-cns11643-7 "7" ?M)
-  )
-
-;; ISO-IR-165 (CCITT Extended GB)
-;;    It is based on CCITT Recommendation T.101, includes GB 2312-80 +
-;;    GB 8565-88 table A4 + 293 characters.
-(make-charset ;; not in FSF 21.1
- 'chinese-isoir165
- "ISO-IR-165 (CCITT Extended GB; Chinese simplified)"
- `(registry "isoir165"
-   dimension 2
-   chars 94
-   final ?E
-   graphic 0
-   short-name "ISO-IR-165"
-   long-name "ISO-IR-165 (CCITT Extended GB; Chinese simplified)"))
-
-;; PinYin-ZhuYin
-(make-charset 'chinese-sisheng 
-	      "SiSheng characters for PinYin/ZhuYin"
-	      '(dimension
-		1
-		;; XEmacs addition: second half of registry spec
-		registry "sisheng_cwnn\\|OMRON_UDC_ZH"
-		chars 94
-		columns 1
-		direction l2r
-		final ?0
-		graphic 0
-		short-name "SiSheng"
-		long-name "SiSheng (PinYin/ZhuYin)"
-		))
-
 ;; If you prefer QUAIL to EGG, please modify below as you wish.
 ;;(when (and (featurep 'egg) (featurep 'wnn))
 ;;  (setq wnn-server-type 'cserver)
＠＠ -328,7 +198,7 ＠＠
 			"chinese-s" "zh"
 			 (lambda (arg)
 			      (and arg (let ((case-fold-search t))
-					 (string-match "^zh_.*.GB.*" arg)))))
+					 (string-match "^zh_.*\\.GB.*" arg)))))
 		(mswindows-locale ("CHINESE" . "CHINESE_SIMPLIFIED"))
 		(native-coding-system cn-gb-2312)
 		(input-method . "chinese-py-punct")
＠＠ -371,29 +241,33 ＠＠

 (define-coding-system-alias 'cn-big5 'big5)

-;; Big5 font requires special encoding.
-(define-ccl-program ccl-encode-big5-font
-  `(0
-    ;; In:  R0:chinese-big5-1 or chinese-big5-2
-    ;;      R1:position code 1
-    ;;      R2:position code 2
-    ;; Out: R1:font code point 1
-    ;;      R2:font code point 2
-    ((r2 = ((((r1 - ?\x21) * 94) + r2) - ?\x21))
-     (if (r0 == ,(charset-id 'chinese-big5-2)) (r2 += 6280))
-     (r1 = ((r2 / 157) + ?\xA1))
-     (r2 %= 157)
-     (if (r2 < ?\x3F) (r2 += ?\x40) (r2 += ?\x62))))
-  "CCL program to encode a Big5 code to code point of Big5 font.")
+(when (featurep 'ccl)
+  ;; Big5 font requires special encoding.
+  (define-ccl-program ccl-encode-big5-font
+    `(0
+      ;; In:  R0:chinese-big5-1 or chinese-big5-2
+      ;;      R1:position code 1
+      ;;      R2:position code 2
+      ;; Out: R1:font code point 1
+      ;;      R2:font code point 2
+      ((r2 = ((((r1 - ?\x21) * 94) + r2) - ?\x21))
+       (if (r0 == ,(charset-id 'chinese-big5-2)) (r2 += 6280))
+       (r1 = ((r2 / 157) + ?\xA1))
+       (r2 %= 157)
+       (if (r2 < ?\x3F) (r2 += ?\x40) (r2 += ?\x62))))
+    "CCL program to encode a Big5 code to code point of Big5 font.")

-;; (setq font-ccl-encoder-alist
-;;       (cons (cons "big5" ccl-encode-big5-font) font-ccl-encoder-alist))
+  ;; (setq font-ccl-encoder-alist
+  ;;       (cons (cons "big5" ccl-encode-big5-font) font-ccl-encoder-alist))

-(set-charset-ccl-program 'chinese-big5-1 'ccl-encode-big5-font)
-(set-charset-ccl-program 'chinese-big5-2 'ccl-encode-big5-font)
+  (set-charset-ccl-program 'chinese-big5-1 'ccl-encode-big5-font)
+  (set-charset-ccl-program 'chinese-big5-2 'ccl-encode-big5-font)
+  )

 (set-language-info-alist
- "Chinese-BIG5" '((charset chinese-big5-1 chinese-big5-2)
+ "Chinese-BIG5" `(,(if (find-charset 'chinese-big5-1)
+		       '(charset chinese-big5-1 chinese-big5-2)
+		     '(charset chinese-big5))
 		  (coding-system big5 iso-2022-7bit)
 		  (coding-priority big5 cn-gb-2312 iso-2022-7bit)
 		  (cygwin-locale "zh_TW")
Index: lisp/mule/cyril-util.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/cyril-util.el,v
retrieving revision 1.6
diff -u -r1.6 cyril-util.el
--- lisp/mule/cyril-util.el	2003/02/06 06:35:52	1.6
+++ lisp/mule/cyril-util.el	2005/11/22 14:00:05
＠＠ -1,7 +1,7 ＠＠
 ;;; cyril-util.el ---  utilities for Cyrillic scripts -*- coding: iso-2022-7bit; -*-

 ;; Copyright (C) 1997  Free Software Foundation, Inc.
-;; Copyright (C) 2002 Ben Wing.
+;; Copyright (C) 2002, 2005 Ben Wing.

 ;; Keywords: mule, multilingual, Cyrillic

＠＠ -28,16 +28,6 ＠＠

 ;;; Code:

-;;;###autoload
-(defun cyrillic-encode-koi8-r-char (char)
-  "Return KOI8-R external character code of CHAR if appropriate."
-  (get-char-table char cyrillic-koi8-r-to-external-code-table))
-
-;;;###autoload
-(defun cyrillic-encode-alternativnyj-char (char)
-  "Return ALTERNATIVNYJ external character code of CHAR if appropriate."
-  (get-char-table char cyrillic-alternativnyj-to-external-code-table))
-

 ;; Display 

＠＠ -76,118 +66,118 ＠＠

   (if (null cyrillic-language)
       (setq standard-display-table (make-display-table))
-    (aset standard-display-table ?,LP(B  [?a])
-    (aset standard-display-table ?,LQ(B  [?b])
-    (aset standard-display-table ?,LR(B  [?v])
-    (aset standard-display-table ?,LS(B  [?g])
-    (aset standard-display-table ?,LT(B  [?d])
-    (aset standard-display-table ?,LU(B  [?e])
-    (aset standard-display-table ?,Lq(B  [?y?o])
-    (aset standard-display-table ?,LV(B  [?z?h])
-    (aset standard-display-table ?,LW(B  [?z])
-    (aset standard-display-table ?,LX(B  [?i])
-    (aset standard-display-table ?,LY(B  [?j])
-    (aset standard-display-table ?,LZ(B  [?k])
-    (aset standard-display-table ?,L[(B  [?l])
-    (aset standard-display-table ?,L\(B  [?m])
-    (aset standard-display-table ?,L](B  [?n])
-    (aset standard-display-table ?,L^(B  [?o])
-    (aset standard-display-table ?,L_(B  [?p])
-    (aset standard-display-table ?,L`(B  [?r])
-    (aset standard-display-table ?,La(B  [?s])
-    (aset standard-display-table ?,Lb(B  [?t])
-    (aset standard-display-table ?,Lc(B  [?u])
-    (aset standard-display-table ?,Ld(B  [?f])
-    (aset standard-display-table ?,Le(B  [?k?h])
-    (aset standard-display-table ?,Lf(B  [?t?s])
-    (aset standard-display-table ?,Lg(B  [?c?h])
-    (aset standard-display-table ?,Lh(B  [?s?h])
-    (aset standard-display-table ?,Li(B  [?s?c?h])
-    (aset standard-display-table ?,Lj(B  [?~])
-    (aset standard-display-table ?,Lk(B  [?y])
-    (aset standard-display-table ?,Ll(B  [?'])
-    (aset standard-display-table ?,Lm(B  [?e?'])
-    (aset standard-display-table ?,Ln(B  [?y?u])
-    (aset standard-display-table ?,Lo(B  [?y?a])
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd0)  [?a])
;?,LP(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd1)  [?b])
;?,LQ(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd2)  [?v])
;?,LR(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd3)  [?g])
;?,LS(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd4)  [?d])
;?,LT(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd5)  [?e])
;?,LU(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf1)  [?y?o])
;?,Lq(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd6)  [?z?h])
;?,LV(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd7)  [?z])
;?,LW(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd8)  [?i])
;?,LX(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd9)  [?j])
;?,LY(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xda)  [?k])
;?,LZ(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xdb)  [?l])
;?,L[(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xdc)  [?m])
;?,L\(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xdd)  [?n])
;?,L](B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xde)  [?o])
;?,L^(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xdf)  [?p])
;?,L_(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe0)  [?r])
;?,L`(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe1)  [?s])
;?,La(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe2)  [?t])
;?,Lb(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe3)  [?u])
;?,Lc(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe4)  [?f])
;?,Ld(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe5)  [?k?h])
;?,Le(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe6)  [?t?s])
;?,Lf(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe7)  [?c?h])
;?,Lg(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe8)  [?s?h])
;?,Lh(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe9)  [?s?c?h])
;?,Li(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xea)  [?~])
;?,Lj(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xeb)  [?y])
;?,Lk(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xec)  [?'])
;?,Ll(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xed)  [?e?'])
;?,Lm(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xee)  [?y?u])
;?,Ln(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xef)  [?y?a])
;?,Lo(B

-    (aset standard-display-table ?,L0(B  [?A])
-    (aset standard-display-table ?,L1(B  [?B])
-    (aset standard-display-table ?,L2(B  [?V])
-    (aset standard-display-table ?,L3(B  [?G])
-    (aset standard-display-table ?,L4(B  [?D])
-    (aset standard-display-table ?,L5(B  [?E])
-    (aset standard-display-table ?,L!(B  [?Y?o])
-    (aset standard-display-table ?,L6(B  [?Z?h])
-    (aset standard-display-table ?,L7(B  [?Z])
-    (aset standard-display-table ?,L8(B  [?I])
-    (aset standard-display-table ?,L9(B  [?J])
-    (aset standard-display-table ?,L:(B  [?K])
-    (aset standard-display-table ?,L;(B  [?L])
-    (aset standard-display-table ?,L<(B  [?M])
-    (aset standard-display-table ?,L=(B  [?N])
-    (aset standard-display-table ?,L>(B  [?O])
-    (aset standard-display-table ?,L?(B  [?P])
-    (aset standard-display-table ?,L＠(B  [?R])
-    (aset standard-display-table ?,LA(B  [?S])
-    (aset standard-display-table ?,LB(B  [?T])
-    (aset standard-display-table ?,LC(B  [?U])
-    (aset standard-display-table ?,LD(B  [?F])
-    (aset standard-display-table ?,LE(B  [?K?h])
-    (aset standard-display-table ?,LF(B  [?T?s])
-    (aset standard-display-table ?,LG(B  [?C?h])
-    (aset standard-display-table ?,LH(B  [?S?h])
-    (aset standard-display-table ?,LI(B  [?S?c?h])
-    (aset standard-display-table ?,LJ(B  [?~])
-    (aset standard-display-table ?,LK(B  [?Y])
-    (aset standard-display-table ?,LL(B  [?'])
-    (aset standard-display-table ?,LM(B  [?E?'])
-    (aset standard-display-table ?,LN(B  [?Y?u])
-    (aset standard-display-table ?,LO(B  [?Y?a])
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb0)  [?A])
;?,L0(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb1)  [?B])
;?,L1(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb2)  [?V])
;?,L2(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb3)  [?G])
;?,L3(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb4)  [?D])
;?,L4(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb5)  [?E])
;?,L5(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa1)  [?Y?o])
;?,L!(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb6)  [?Z?h])
;?,L6(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb7)  [?Z])
;?,L7(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb8)  [?I])
;?,L8(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb9)  [?J])
;?,L9(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xba)  [?K])
;?,L:(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xbb)  [?L])
;?,L;(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xbc)  [?M])
;?,L<(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xbd)  [?N])
;?,L=(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xbe)  [?O])
;?,L>(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xbf)  [?P])
;?,L?(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc0)  [?R])
;?,L＠(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc1)  [?S])
;?,LA(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc2)  [?T])
;?,LB(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc3)  [?U])
;?,LC(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc4)  [?F])
;?,LD(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc5)  [?K?h])
;?,LE(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc6)  [?T?s])
;?,LF(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc7)  [?C?h])
;?,LG(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc8)  [?S?h])
;?,LH(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc9)  [?S?c?h])
;?,LI(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xca)  [?~])
;?,LJ(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xcb)  [?Y])
;?,LK(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xcc)  [?'])
;?,LL(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xcd)  [?E?'])
;?,LM(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xce)  [?Y?u])
;?,LN(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xcf)  [?Y?a])
;?,LO(B

-    (aset standard-display-table ?,Lt(B  [?i?e])
-    (aset standard-display-table ?,Lw(B  [?i])
-    (aset standard-display-table ?,L~(B  [?u])
-    (aset standard-display-table ?,Lr(B  [?d?j])
-    (aset standard-display-table ?,L{(B  [?c?h?j])
-    (aset standard-display-table ?,Ls(B  [?g?j])
-    (aset standard-display-table ?,Lu(B  [?s])
-    (aset standard-display-table ?,L|(B  [?k])
-    (aset standard-display-table ?,Lv(B  [?i])
-    (aset standard-display-table ?,Lx(B  [?j])
-    (aset standard-display-table ?,Ly(B  [?l?j])
-    (aset standard-display-table ?,Lz(B  [?n?j])
-    (aset standard-display-table ?,L(B  [?d?z])
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf4)  [?i?e])
;?,Lt(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf7)  [?i])
;?,Lw(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xfe)  [?u])
;?,L~(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf2)  [?d?j])
;?,Lr(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xfb)  [?c?h?j])
;?,L{(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf3)  [?g?j])
;?,Ls(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf5)  [?s])
;?,Lu(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xfc)  [?k])
;?,L|(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf6)  [?i])
;?,Lv(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf8)  [?j])
;?,Lx(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xf9)  [?l?j])
;?,Ly(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xfa)  [?n?j])
;?,Lz(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xff)  [?d?z])
;?,L(B

-    (aset standard-display-table ?,L$(B  [?Y?e])
-    (aset standard-display-table ?,L'(B  [?Y?i])
-    (aset standard-display-table ?,L.(B  [?U])
-    (aset standard-display-table ?,L"(B  [?D?j])
-    (aset standard-display-table ?,L+(B  [?C?h?j])
-    (aset standard-display-table ?,L#(B  [?G?j])
-    (aset standard-display-table ?,L%(B  [?S])
-    (aset standard-display-table ?,L,(B  [?K])
-    (aset standard-display-table ?,L&(B  [?I])
-    (aset standard-display-table ?,L((B  [?J])
-    (aset standard-display-table ?,L)(B  [?L?j])
-    (aset standard-display-table ?,L*(B  [?N?j])
-    (aset standard-display-table ?,L/(B  [?D?j])
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa4)  [?Y?e])
;?,L$(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa7)  [?Y?i])
;?,L'(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xae)  [?U])
;?,L.(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa2)  [?D?j])
;?,L"(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xab)  [?C?h?j])
;?,L+(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa3)  [?G?j])
;?,L#(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa5)  [?S])
;?,L%(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xac)  [?K])
;?,L,(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa6)  [?I])
;?,L&(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa8)  [?J])
;?,L((B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xa9)  [?L?j])
;?,L)(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xaa)  [?N?j])
;?,L*(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xaf)  [?D?j])
;?,L/(B

     (when (equal cyrillic-language "Bulgarian")
-      (aset standard-display-table ?,Li(B [?s?h?t])
-      (aset standard-display-table ?,LI(B [?S?h?t])
-      (aset standard-display-table ?,Ln(B [?i?u])
-      (aset standard-display-table ?,LN(B [?I?u])
-      (aset standard-display-table ?,Lo(B [?i?a])
-      (aset standard-display-table ?,LO(B [?I?a]))
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xe9) [?s?h?t])
;?,Li(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xc9) [?S?h?t])
;?,LI(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xee) [?i?u])
;?,Ln(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xce) [?I?u])
;?,LN(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xef) [?i?a])
;?,Lo(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xcf) [?I?a]))
;?,LO(B

     (when (equal cyrillic-language "Ukrainian") ; based on the official 
 					; transliteration table
-      (aset standard-display-table ?,LX(B [?y])
-      (aset standard-display-table ?,L8(B [?Y])
-      (aset standard-display-table ?,LY(B [?i])
-      (aset standard-display-table ?,L9(B [?Y])
-    (aset standard-display-table ?,Ln(B [?i?u])
-    (aset standard-display-table ?,Lo(B [?i?a]))))
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd8) [?y])
;?,LX(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb8) [?Y])
;?,L8(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xd9) [?i])
;?,LY(B
+      (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xb9) [?Y])
;?,L9(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xee) [?i?u])
;?,Ln(B
+    (aset standard-display-table (make-char 'cyrillic-iso8859-5 #xef) [?i?a]))))
;?,Lo(B

 ;;
Index: lisp/mule/cyrillic.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/cyrillic.el,v
retrieving revision 1.10
diff -u -r1.10 cyrillic.el
--- lisp/mule/cyrillic.el	2002/03/21 07:30:21	1.10
+++ lisp/mule/cyrillic.el	2005/11/22 14:00:05
＠＠ -3,7 +3,7 ＠＠
 ;; Copyright (C) 1995,1999 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1997 MORIOKA Tomohiko
-;; Copyright (C) 2001, 2002 Ben Wing.
+;; Copyright (C) 2001, 2002, 2005 Ben Wing.

 ;; Keywords: multilingual, Cyrillic

＠＠ -26,18 +26,17 ＠＠

 ;;; Commentary:

-;; The character set ISO8859-5 is supported.  KOI-8 and ALTERNATIVNYJ are
-;; converted to ISO8859-5 internally.
+;; The character sets ISO8859-5, KOI-8 and ALTERNATIVNYJ are supported.

-;; Windows-1251 support deleted because XEmacs has automatic support.
+;; Windows-1251 support in windows.el.

 ;;; Code:

 ;; Cyrillic syntax
 (modify-syntax-entry 'cyrillic-iso8859-5 "w")
-(modify-syntax-entry ?,L-(B ".")
-(modify-syntax-entry ?,Lp(B ".")
-(modify-syntax-entry ?,L}(B ".")
+(modify-syntax-entry (make-char 'cyrillic-iso8859-5 #xad) ".")
+(modify-syntax-entry (make-char 'cyrillic-iso8859-5 #xf0) ".")
+(modify-syntax-entry (make-char 'cyrillic-iso8859-5 #xfd) ".")

 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;; CYRILLIC
＠＠ -45,20 +44,6 ＠＠

 ;; ISO-8859-5

-; (make-charset 'cyrillic-iso8859-5 
-; 	      "Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5):
ISO-IR-144"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-5"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?L
-; 		graphic 1
-; 		short-name "RHP of ISO8859/5"
-; 		long-name "RHP of Cyrillic (ISO 8859-5): ISO-IR-144"
-; 		))
-
 (make-coding-system
  'iso-8859-5 'iso2022
  "ISO-8859-5 (Cyrillic)"
＠＠ -82,96 +67,17 ＠＠

 ;; KOI-8

-(eval-and-compile
-
-(defvar cyrillic-koi8-r-decode-table
-  [
-   0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
-   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
-   32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
-   48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
-   64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
-   80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
-   96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
-   112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
-   ?$B(!(B ?$B("(B ?$B(#(B ?$B($(B ?$B(&(B ?$B(%(B ?$B('(B
?$B()(B ?$B(((B ?$B(*(B ?$B(+(B 32  ?$(G#'(B ?$(G#+(B ?$(G#/(B 32
-   32  ?$(C"F(B 32  32  ?$B"#(B 32  ?$B"e(B ?$A!V(B ?$A!\(B
?$A!](B ?,L (B  32  ?,A0(B  ?,A2(B  ?,A7(B  ?,Aw(B
-   ?$(G#D(B 32  32  ?,Lq(B  32  32  32  32  32  32  32  32  32  32  32  ?$(G#E(B
-   32  32  ?$(G#G(B ?,L!(B  32  32  32  32  32  32  32  32  ?$(G#F(B 32  32 
?,A)(B
-   ?,Ln(B  ?,LP(B  ?,LQ(B  ?,Lf(B  ?,LT(B  ?,LU(B  ?,Ld(B  ?,LS(B 
?,Le(B  ?,LX(B  ?,LY(B  ?,LZ(B  ?,L[(B  ?,L\(B  ?,L](B  ?,L^(B 
-   ?,L_(B  ?,Lo(B  ?,L`(B  ?,La(B  ?,Lb(B  ?,Lc(B  ?,LV(B  ?,LR(B 
?,Ll(B  ?,Lk(B  ?,LW(B  ?,Lh(B  ?,Lm(B  ?,Li(B  ?,Lg(B  ?,Lj(B 
-   ?,LN(B  ?,L0(B  ?,L1(B  ?,LF(B  ?,L4(B  ?,L5(B  ?,LD(B  ?,L3(B 
?,LE(B  ?,L8(B  ?,L9(B  ?,L:(B  ?,L;(B  ?,L<(B  ?,L=(B  ?,L>(B 
-   ?,L?(B  ?,LO(B  ?,L＠(B  ?,LA(B  ?,LB(B  ?,LC(B  ?,L6(B  ?,L2(B 
?,LL(B  ?,LK(B  ?,L7(B  ?,LH(B  ?,LM(B  ?,LI(B  ?,LG(B  ?,LJ(B ]
-  "Cyrillic KOI8-R decoding table.")
-
-(defvar cyrillic-koi8-r-encode-table
-  (let ((table (make-vector 256 32))
-	(i 0))
-    (while (< i 256)
-      (let* ((ch (aref cyrillic-koi8-r-decode-table i))
-	     (split (split-char ch)))
-	(cond ((eq (car split) 'cyrillic-iso8859-5)
-	       (aset table (logior (nth 1 split) 128) i)
-	       )
-	      ((eq ch 32))
-	      ((eq (car split) 'ascii)
-	       (aset table ch i)
-	       )))
-      (setq i (1+ i)))
-    table)
-  "Cyrillic KOI8-R encoding table.")
-
-)
-
-(define-ccl-program ccl-decode-koi8
-  `(3
-    ((read r0)
-     (loop
-      (write-read-repeat r0 ,cyrillic-koi8-r-decode-table))))
-  "CCL program to decode KOI8.")
-
-(define-ccl-program ccl-encode-koi8
-  `(1
-    ((read r0)
-     (loop
-      (if (r0 != ,(charset-id 'cyrillic-iso8859-5))
-	  (write-read-repeat r0)
-	((read r0)
-	 (write-read-repeat r0 , cyrillic-koi8-r-encode-table))))))
-  "CCL program to encode KOI8.")
-
-;; (define-coding-system-alias 'koi8-r 'cyrillic-koi8)
-;; (define-coding-system-alias 'koi8 'cyrillic-koi8)
-
 (make-coding-system
- 'koi8-r 'ccl
+ 'koi8-r 'mbcs
  "KOI8-R (Cyrillic)"
- '(decode ccl-decode-koi8
-   encode ccl-encode-koi8
+ '(charsets (ascii cyrillic-koi8-r)
    mnemonic "KOI8"))

-;; `iso-8-1' is not correct, but XEmacs doesn't have a `ccl' category
-(coding-system-put 'koi8-r 'category 'iso-8-1)
+;; (define-coding-system-alias 'koi8-r 'cyrillic-koi8)
+;; (define-coding-system-alias 'koi8 'cyrillic-koi8)

-;; (define-ccl-program ccl-encode-koi8-font
-;;   `(0
-;;     ((r1 |= 128)
-;;      (r1 = r1 ,cyrillic-koi8-r-encode-table)))
-;;   "CCL program to encode Cyrillic chars to KOI font.")
-
-;; (setq font-ccl-encoder-alist
-;;       (cons (cons "koi8" ccl-encode-koi8-font) font-ccl-encoder-alist))
-
-(defvar cyrillic-koi8-r-to-external-code-table
-  (let ((table (make-char-table 'generic))
-	(i 0)
-	(len (length cyrillic-koi8-r-decode-table)))
-    (while (< i len)
-      (let ((ch (aref cyrillic-koi8-r-decode-table i)))
-	(if (characterp ch)
-	    (put-char-table ch i table)))
-      (incf i)))
-  "Table to convert from characters to their Koi8-R code.")
+;; `iso-8-1' is not correct, but XEmacs doesn't have an `mbcs' category
+(coding-system-put 'koi8-r 'category 'iso-8-1)

 (set-language-info-alist
  "Cyrillic-KOI8" '((charset cyrillic-iso8859-5)
＠＠ -186,98 +92,20 ＠＠
 		   (documentation . "Support for Cyrillic KOI8-R."))
  '("Cyrillic"))

-;;; WINDOWS-1251 deleted; we support it automatically in XEmacs
+;; Windows-1251 support in windows.el.

 ;;; ALTERNATIVNYJ

-(eval-and-compile
-
-(defvar cyrillic-alternativnyj-decode-table
-  [
-   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
-   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
-   32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
-   48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
-   64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79
-   80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
-   96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111
-   112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
-   ?,L0(B  ?,L1(B  ?,L2(B  ?,L3(B  ?,L4(B  ?,L5(B  ?,L6(B  ?,L7(B 
?,L8(B  ?,L9(B  ?,L:(B  ?,L;(B  ?,L<(B  ?,L=(B  ?,L>(B  ?,L?(B
-   ?,L＠(B  ?,LA(B  ?,LB(B  ?,LC(B  ?,LD(B  ?,LE(B  ?,LF(B  ?,LG(B 
?,LH(B  ?,LI(B  ?,LJ(B  ?,LK(B  ?,LL(B  ?,LM(B  ?,LN(B  ?,LO(B
-   ?,LP(B  ?,LQ(B  ?,LR(B  ?,LS(B  ?,LT(B  ?,LU(B  ?,LV(B  ?,LW(B 
?,LX(B  ?,LY(B  ?,LZ(B  ?,L[(B  ?,L\(B  ?,L](B  ?,L^(B  ?,L_(B
-   32  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32
-   32  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32
-   32  32  32  32  32  32  32  32  32  32  32  32  32  32  32  32
-   ?,L`(B  ?,La(B  ?,Lb(B  ?,Lc(B  ?,Ld(B  ?,Le(B  ?,Lf(B  ?,Lg(B 
?,Lh(B  ?,Li(B  ?,Lj(B  ?,Lk(B  ?,Ll(B  ?,Lm(B  ?,Ln(B  ?,Lo(B
-   ?,L!(B  ?,Lq(B   32  32  32  32  32  32  32  32  32  32  32  32  32 ?,Lp(B]
-  "Cyrillic ALTERNATIVNYJ decoding table.")
-
-(defvar cyrillic-alternativnyj-encode-table
-  (let ((table (make-vector 256 32))
-	(i 0))
-    (while (< i 256)
-      (let* ((ch (aref cyrillic-alternativnyj-decode-table i))
-	     (split (split-char ch)))
-	(if (eq (car split) 'cyrillic-iso8859-5)
-	    (aset table (logior (nth 1 split) 128) i)
-	  (if (/= ch 32)
-	      (aset table ch i))))
-      (setq i (1+ i)))
-    table)
-  "Cyrillic ALTERNATIVNYJ encoding table.")
-  
-)
-
-
-(define-ccl-program ccl-decode-alternativnyj
-  `(3
-    ((read r0)
-     (loop
-      (write-read-repeat r0 ,cyrillic-alternativnyj-decode-table))))
-  "CCL program to decode Alternativnyj.")
-
-(define-ccl-program ccl-encode-alternativnyj
-  `(1
-    ((read r0)
-     (loop
-      (if (r0 != ,(charset-id 'cyrillic-iso8859-5))
-	  (write-read-repeat r0)
-	((read r0)
-	 (write-read-repeat r0 ,cyrillic-alternativnyj-encode-table))))))
-  "CCL program to encode Alternativnyj.")
-
 ;; (define-coding-system-alias 'alternativnyj 'cyrillic-alternativnyj)

 (make-coding-system
- 'alternativnyj 'ccl
+ 'alternativnyj 'mbcs
  "Alternativnyj (Cyrillic)"
- '(decode ccl-decode-alternativnyj
-   encode ccl-encode-alternativnyj
+ '(charsets (ascii cyrillic-alternativnyj)
    mnemonic "Cy.Alt"))

-;; `iso-8-1' is not correct, but XEmacs doesn't have `ccl' category
+;; `iso-8-1' is not correct, but XEmacs doesn't have an `mbcs' category
 (coding-system-put 'alternativnyj 'category 'iso-8-1)
-
-;; (define-ccl-program ccl-encode-alternativnyj-font
-;;   '(0
-;;     ((r1 |= 128)
-;;      (r1 = r1 ,cyrillic-alternativnyj-encode-table)))
-;;   "CCL program to encode Cyrillic chars to Alternativnyj font.")
-
-;; (setq font-ccl-encoder-alist
-;;       (cons (cons "alternativnyj" ccl-encode-alternativnyj-font)
-;;             font-ccl-encoder-alist))
-
-(defvar cyrillic-alternativnyj-to-external-code-table
-  (let ((table (make-char-table 'generic))
-	(i 0)
-	(len (length cyrillic-alternativnyj-decode-table)))
-    (while (< i len)
-      (let ((ch (aref cyrillic-alternativnyj-decode-table i)))
-	(if (characterp ch)
-	    (put-char-table ch i table)))
-      (incf i)))
-  "Table to convert from characters to their Alternativnyj code.")

 (set-language-info-alist
  "Cyrillic-ALT" '((charset cyrillic-iso8859-5)
Index: lisp/mule/devan-util.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/devan-util.el,v
retrieving revision 1.5
diff -u -r1.5 devan-util.el
--- lisp/mule/devan-util.el	2004/11/15 20:15:22	1.5
+++ lisp/mule/devan-util.el	2005/11/22 14:00:05
＠＠ -1,6 +1,7 ＠＠
 ;;; devan-util.el --- support for Devanagari Script Composition -*- coding:
iso-2022-7bit; -*-

 ;; Copyright (C) 1996, 2001 Free Software Foundation, Inc.
+;; Copyright (C) 2005 Ben Wing.

 ;; Author: KAWABATA, Taichi <kawabata(a)is.s.u-tokyo.ac.jp&gt;

＠＠ -667,382 +668,386 ＠＠
 ;; If application-direction is omitted, it is asumbed to be '(mr . ml).

 (defconst devanagari-composition-rules
-  '((?$(5!!(B 0 (tr . br))
-    (?$(5!"(B 0 (mr . mr))
-    (?$(5!#(B 0)
-    (?$(5!$(B 0)
-    (?$(5!%(B 0)
-    (?$(5!&(B 0)
-    (?$(5!'(B 0)
-    (?$(5!((B 0)
-    (?$(5!)(B 0)
-    (?$(5!*(B 0)
-    (?$(5!+(B 0)
-    (?$(5!,(B 0)
-    (?$(5!-(B 0)
-    (?$(5!.(B 0)
-    (?$(5!/(B 0)
-    (?$(5!0(B 0)
-    (?$(5!1(B 0)
-    (?$(5!2(B 0)
-    (?$(5!3(B 0)
-    (?$(5!4(B 0)
-    (?$(5!5(B 0)
-    (?$(5!6(B 0)
-    (?$(5!7(B 0)
-    (?$(5!8(B 0)
-    (?$(5!9(B 0)
-    (?$(5!:(B 0)
-    (?$(5!;(B 0)
-    (?$(5!<(B 0)
-    (?$(5!=(B 0)
-    (?$(5!>(B 0)
-    (?$(5!?(B 0)
-    (?$(5!＠(B 0)
-    (?$(5!A(B 0)
-    (?$(5!B(B 0)
-    (?$(5!C(B 0)
-    (?$(5!D(B 0)
-    (?$(5!E(B 0)
-    (?$(5!F(B 0)
-    (?$(5!G(B 0)
-    (?$(5!H(B 0)
-    (?$(5!I(B 0)
-    (?$(5!J(B 0)
-    (?$(5!K(B 0)
-    (?$(5!L(B 0)
-    (?$(5!M(B 0)
-    (?$(5!N(B 0)
-    (?$(5!O(B 0)
-    (?$(5!P(B 0)
-    (?$(5!Q(B 0)
-    (?$(5!R(B 0)
-    (?$(5!S(B 0)
-    (?$(5!T(B 0)
-    (?$(5!U(B 0)
-    (?$(5!V(B 0)
-    (?$(5!W(B 0)
-    (?$(5!X(B 0)
-    (?$(5!Y(B 0)
-    (?$(5!Z(B 0)
-    (?$(5![(B 0 (ml . mr))
-    (?$(5!\(B 0)
-    (?$(5!](B 0 (br . tr))
-    (?$(5!^(B 0 (br . tr))
-    (?$(5!_(B 0 (br . tr))
-    (?$(5!`(B 0 (mr . mr))  ; (tc . bc)
-    (?$(5!a(B 0 (mr . mr))
-    (?$(5!b(B 0 (mr . mr))
-    (?$(5!c(B 0 (mr . mr))
-    (?$(5!d(B 0)
-    (?$(5!e(B 0)
-    (?$(5!f(B 0)
-    (?$(5!g(B 0)
-    (?$(5!h(B 0 (br . tr))
-    (?$(5!i(B 0 (br . tr))
-    (?$(5!j(B 0)
-    (nil 0)
-    (nil 0)
-    (nil 0)
-    (nil 0)
-    (nil 0)
-    (nil 0)
-    (?$(5!q(B 0)
-    (?$(5!r(B 0)
-    (?$(5!s(B 0)
-    (?$(5!t(B 0)
-    (?$(5!u(B 0)
-    (?$(5!v(B 0)
-    (?$(5!w(B 0)
-    (?$(5!x(B 0)
-    (?$(5!y(B 0)
-    (?$(5!z(B 0)
-    (nil 0)
-    (nil 0)
-    (nil 0)
-    (nil 0)
-    (?$(5"!(B 0)
-    (?$(5""(B 0)
-    (?$(5"#(B 0)
-    (?$(5"$(B 0)
-    (?$(5"%(B 0)
-    (?$(5"&(B 0)
-    (?$(5"'(B 0)
-    (?$(5"((B 0)
-    (?$(5")(B 0)
-    (?$(5"*(B 0)
-    (?$(5"+(B 0)
-    (?$(5",(B 0)
-    (?$(5"-(B 0)
-    (?$(5".(B 0)
-    (?$(5"/(B 0)
-    (?$(5"0(B 0)
-    (?$(5"1(B 0)
-    (?$(5"2(B 0)
-    (?$(5"3(B 0)
-    (?$(5"4(B 0)
-    (?$(5"5(B 0)
-    (?$(5"6(B 0)
-    (?$(5"7(B 0)
-    (?$(5"8(B 0)
-    (?$(5"9(B 0)
-    (?$(5":(B 0)
-    (?$(5";(B 0)
-    (?$(5"<(B 0)
-    (?$(5"=(B 0)
-    (?$(5">(B 0)
-    (?$(5"?(B 0)
-    (?$(5"＠(B 0)
-    (?$(5"A(B 0)
-    (?$(5"B(B 0)
-    (?$(5"C(B 0)
-    (?$(5"D(B 0)
-    (?$(5"E(B 0)
-    (?$(5"F(B 0)
-    (?$(5"G(B 0)
-    (?$(5"H(B 0)
-    (?$(5"I(B 0)
-    (?$(5"J(B 0)
-    (?$(5"K(B 0)
-    (?$(5"L(B 0)
-    (?$(5"M(B 0)
-    (?$(5"N(B 0)
-    (?$(5"O(B 0)
-    (?$(5"P(B 0)
-    (?$(5"Q(B 0)
-    (?$(5"R(B 0)
-    (?$(5"S(B 0)
-    (?$(5"T(B 0)
-    (?$(5"U(B 0)
-    (?$(5"V(B 0)
-    (?$(5"W(B 0)
-    (?$(5"X(B 0)
-    (?$(5"Y(B 0)
-    (?$(5"Z(B 0)
-    (?$(5"[(B 0)
-    (?$(5"\(B 0)
-    (?$(5"](B 0)
-    (?$(5"^(B 0)
-    (?$(5"_(B 0)
-    (?$(5"`(B 0)
-    (?$(5"a(B 0)
-    (?$(5"b(B 0)
-    (?$(5"c(B 0)
-    (?$(5"d(B 0)
-    (?$(5"e(B 0)
-    (?$(5"f(B 0)
-    (?$(5"g(B 0)
-    (?$(5"h(B 0)
-    (?$(5"i(B 0)
-    (?$(5"j(B 0)
-    (?$(5"k(B 0)
-    (?$(5"l(B 0)
-    (?$(5"m(B 0)
-    (?$(5"n(B 0)
-    (?$(5"o(B 0)
-    (?$(5"p(B 10 (mr . mr))
-    (?$(5"q(B 0 (br . br))
-    (?$(5"r(B 0 (br . tr))
-    (?$(5"s(B 0)
-    (?$(5"t(B 0)
-    (?$(5"u(B 0)
-    (?$(5"v(B 0)
-    (?$(5"w(B 0)
-    (?$(5"x(B 0)
-    (?$(5"y(B 0)
-    (?$(5"z(B 0)
-    (?$(5"{(B 0)
-    (?$(5"|(B 0)
-    (?$(5"}(B 0)
-    (?$(5"~(B 0)
-    (?$(5#!(B 0)
-    (?$(5#"(B 0)
-    (?$(5##(B 0)
-    (?$(5#$(B 0)
-    (?$(5#%(B 0)
-    (?$(5#&(B 0)
-    (?$(5#'(B 0)
-    (?$(5#((B 0)
-    (?$(5#)(B 0)
-    (?$(5#*(B 0)
-    (?$(5#+(B 0)
-    (?$(5#,(B 0)
-    (?$(5#-(B 0)
-    (?$(5#.(B 0)
-    (?$(5#/(B 0)
-    (?$(5#0(B 0)
-    (?$(5#1(B 0)
-    (?$(5#2(B 0)
-    (?$(5#3(B 0)
-    (?$(5#4(B 0)
-    (?$(5#5(B 0)
-    (?$(5#6(B 0)
-    (?$(5#7(B 0)
-    (?$(5#8(B 0)
-    (?$(5#9(B 0)
-    (?$(5#:(B 0)
-    (?$(5#;(B 0)
-    (?$(5#<(B 0)
-    (?$(5#=(B 0)
-    (?$(5#>(B 0)
-    (?$(5#?(B 0)
-    (?$(5#＠(B 0)
-    (?$(5#A(B 0)
-    (?$(5#B(B 0)
-    (?$(5#C(B 0)
-    (?$(5#D(B 0)
-    (?$(5#E(B 0)
-    (?$(5#F(B 0)
-    (?$(5#G(B 0)
-    (?$(5#H(B 0)
-    (?$(5#I(B 0)
-    (?$(5#J(B 0)
-    (?$(5#K(B 0 (br . tr))
-    (?$(5#L(B 0 (br . tr))
-    (?$(5#M(B 0 (br . tr))
-    (?$(5#N(B 0)
-    (?$(5#O(B 0)
-    (?$(5#P(B 0)
-    (?$(5#Q(B 0)
-    (?$(5#R(B 0)
-    (?$(5#S(B 0)
-    (?$(5#T(B 0)
-    (?$(5#U(B 0)
-    (?$(5#V(B 0)
-    (?$(5#W(B 0)
-    (?$(5#X(B 0)
-    (?$(5#Y(B 0)
-    (?$(5#Z(B 0)
-    (?$(5#[(B 0)
-    (?$(5#\(B 0)
-    (?$(5#](B 0)
-    (?$(5#^(B 0)
-    (?$(5#_(B 0)
-    (?$(5#`(B 0)
-    (?$(5#a(B 0)
-    (?$(5#b(B 0)
-    (?$(5#c(B 0)
-    (?$(5#d(B 0)
-    (?$(5#e(B 0)
-    (?$(5#f(B 0)
-    (?$(5#g(B 0)
-    (?$(5#h(B 0)
-    (?$(5#i(B 0)
-    (?$(5#j(B 0)
-    (?$(5#k(B 0)
-    (?$(5#l(B 0)
-    (?$(5#m(B 0)
-    (?$(5#n(B 0)
-    (?$(5#o(B 0)
-    (?$(5#p(B 0)
-    (?$(5#q(B 0)
-    (?$(5#r(B 0)
-    (?$(5#s(B 0)
-    (?$(5#t(B 0)
-    (?$(5#u(B 0)
-    (?$(5#v(B 0)
-    (?$(5#w(B 0)
-    (?$(5#x(B 0)
-    (?$(5#y(B 0)
-    (?$(5#z(B 0)
-    (?$(5#{(B 0)
-    (?$(5#|(B 0)
-    (?$(5#}(B 0)
-    (?$(5#~(B 0)
-    (?$(5$!(B 0)
-    (?$(5$"(B 0)
-    (?$(5$#(B 0)
-    (?$(5$$(B 0)
-    (?$(5$%(B 0)
-    (?$(5$&(B 0)
-    (?$(5$'(B 0)
-    (?$(5$((B 0)
-    (?$(5$)(B 0)
-    (?$(5$*(B 0)
-    (?$(5$+(B 0)
-    (?$(5$,(B 0)
-    (?$(5$-(B 0)
-    (?$(5$.(B 0)
-    (?$(5$/(B 0)
-    (?$(5$0(B 0)
-    (?$(5$1(B 0)
-    (?$(5$2(B 0)
-    (?$(5$3(B 0)
-    (?$(5$4(B 0)
-    (?$(5$5(B 0)
-    (?$(5$6(B 0)
-    (?$(5$7(B 0)
-    (?$(5$8(B 0)
-    (?$(5$9(B 0)
-    (?$(5$:(B 0)
-    (?$(5$;(B 0)
-    (?$(5$<(B 0)
-    (?$(5$=(B 0)
-    (?$(5$>(B 0)
-    (?$(5$?(B 0)
-    (?$(5$＠(B 0)
-    (?$(5$A(B 0)
-    (?$(5$B(B 0)
-    (?$(5$C(B 0)
-    (?$(5$D(B 0)
-    (?$(5$E(B 0)
-    (?$(5$F(B 0)
-    (?$(5$G(B 0)
-    (?$(5$H(B 0)
-    (?$(5$I(B 0)
-    (?$(5$J(B 0)
-    (?$(5$K(B 0)
-    (?$(5$L(B 0)
-    (?$(5$M(B 0)
-    (?$(5$N(B 0)
-    (?$(5$O(B 0)
-    (?$(5$P(B 0)
-    (?$(5$Q(B 0)
-    (?$(5$R(B 0)
-    (?$(5$S(B 0)
-    (?$(5$T(B 0)
-    (?$(5$U(B 0)
-    (?$(5$V(B 0)
-    (?$(5$W(B 0)
-    (?$(5$X(B 0)
-    (?$(5$Y(B 0)
-    (?$(5$Z(B 0)
-    (?$(5$[(B 0)
-    (?$(5$\(B 0)
-    (?$(5$](B 0)
-    (?$(5$^(B 0)
-    (?$(5$_(B 0)
-    (?$(5$`(B 0)
-    (?$(5$a(B 0)
-    (?$(5$b(B 0)
-    (?$(5$c(B 0)
-    (?$(5$d(B 0)
-    (?$(5$e(B 0)
-    (?$(5$f(B 0)
-    (?$(5$g(B 0)
-    (?$(5$h(B 0)
-    (?$(5$i(B 0)
-    (?$(5$j(B 0)
-    (?$(5$k(B 0)
-    (?$(5$l(B 0)
-    (?$(5$m(B 0)
-    (?$(5$n(B 0)
-    (?$(5$o(B 0)
-    (?$(5$p(B 0)
-    (?$(5$q(B 0)
-    (?$(5$r(B 0)
-    (?$(5$s(B 0)
-    (?$(5$t(B 0)
-    (?$(5$u(B 0)
-    (?$(5$v(B 0)
-    (?$(5$w(B 0)
-    (?$(5$x(B 0)
-    (?$(5$y(B 0)
-    (?$(5$z(B 0)
-    (?$(5${(B 0)
-    (?$(5$|(B 0)
-    (?$(5$}(B 0)
-    (?$(5$~(B 0)
+  (loop for (a . b) in
+    '(((#x21 #x21) 0 (tr . br))		;?$(5!!(B
+      ((#x21 #x22) 0 (mr . mr))		;?$(5!"(B
+      ((#x21 #x23) 0)			;?$(5!#(B
+      ((#x21 #x24) 0)			;?$(5!$(B
+      ((#x21 #x25) 0)			;?$(5!%(B
+      ((#x21 #x26) 0)			;?$(5!&(B
+      ((#x21 #x27) 0)			;?$(5!'(B
+      ((#x21 #x28) 0)			;?$(5!((B
+      ((#x21 #x29) 0)			;?$(5!)(B
+      ((#x21 #x2a) 0)			;?$(5!*(B
+      ((#x21 #x2b) 0)			;?$(5!+(B

+      ((#x21 #x2c) 0)			;?$(5!,(B
+      ((#x21 #x2d) 0)			;?$(5!-(B
+      ((#x21 #x2e) 0)			;?$(5!.(B
+      ((#x21 #x2f) 0)			;?$(5!/(B
+      ((#x21 #x30) 0)			;?$(5!0(B
+      ((#x21 #x31) 0)			;?$(5!1(B
+      ((#x21 #x32) 0)			;?$(5!2(B
+      ((#x21 #x33) 0)			;?$(5!3(B
+      ((#x21 #x34) 0)			;?$(5!4(B
+      ((#x21 #x35) 0)			;?$(5!5(B
+      ((#x21 #x36) 0)			;?$(5!6(B
+      ((#x21 #x37) 0)			;?$(5!7(B
+      ((#x21 #x38) 0)			;?$(5!8(B
+      ((#x21 #x39) 0)			;?$(5!9(B
+      ((#x21 #x3a) 0)			;?$(5!:(B
+      ((#x21 #x3b) 0)			;?$(5!;(B
+      ((#x21 #x3c) 0)			;?$(5!<(B
+      ((#x21 #x3d) 0)			;?$(5!=(B
+      ((#x21 #x3e) 0)			;?$(5!>(B
+      ((#x21 #x3f) 0)			;?$(5!?(B
+      ((#x21 #x40) 0)			;?$(5!＠(B
+      ((#x21 #x41) 0)			;?$(5!A(B
+      ((#x21 #x42) 0)			;?$(5!B(B
+      ((#x21 #x43) 0)			;?$(5!C(B
+      ((#x21 #x44) 0)			;?$(5!D(B
+      ((#x21 #x45) 0)			;?$(5!E(B
+      ((#x21 #x46) 0)			;?$(5!F(B
+      ((#x21 #x47) 0)			;?$(5!G(B
+      ((#x21 #x48) 0)			;?$(5!H(B
+      ((#x21 #x49) 0)			;?$(5!I(B
+      ((#x21 #x4a) 0)			;?$(5!J(B
+      ((#x21 #x4b) 0)			;?$(5!K(B
+      ((#x21 #x4c) 0)			;?$(5!L(B
+      ((#x21 #x4d) 0)			;?$(5!M(B
+      ((#x21 #x4e) 0)			;?$(5!N(B
+      ((#x21 #x4f) 0)			;?$(5!O(B
+      ((#x21 #x50) 0)			;?$(5!P(B
+      ((#x21 #x51) 0)			;?$(5!Q(B
+      ((#x21 #x52) 0)			;?$(5!R(B
+      ((#x21 #x53) 0)			;?$(5!S(B
+      ((#x21 #x54) 0)			;?$(5!T(B
+      ((#x21 #x55) 0)			;?$(5!U(B
+      ((#x21 #x56) 0)			;?$(5!V(B
+      ((#x21 #x57) 0)			;?$(5!W(B
+      ((#x21 #x58) 0)			;?$(5!X(B
+      ((#x21 #x59) 0)			;?$(5!Y(B
+      ((#x21 #x5a) 0)			;?$(5!Z(B
+      ((#x21 #x5b) 0 (ml . mr))		;?$(5![(B
+      ((#x21 #x5c) 0)			;?$(5!\(B
+      ((#x21 #x5d) 0 (br . tr))		;?$(5!](B
+      ((#x21 #x5e) 0 (br . tr))		;?$(5!^(B
+      ((#x21 #x5f) 0 (br . tr))		;?$(5!_(B
+      ((#x21 #x60) 0 (mr . mr))		; (tc . bc) ;?$(5!`(B
+      ((#x21 #x61) 0 (mr . mr))		;?$(5!a(B
+      ((#x21 #x62) 0 (mr . mr))		;?$(5!b(B
+      ((#x21 #x63) 0 (mr . mr))		;?$(5!c(B
+      ((#x21 #x64) 0)			;?$(5!d(B
+      ((#x21 #x65) 0)			;?$(5!e(B
+      ((#x21 #x66) 0)			;?$(5!f(B
+      ((#x21 #x67) 0)			;?$(5!g(B
+      ((#x21 #x68) 0 (br . tr))		;?$(5!h(B
+      ((#x21 #x69) 0 (br . tr))		;?$(5!i(B
+      ((#x21 #x6a) 0)			;?$(5!j(B
+      (nil 0)
+      (nil 0)
+      (nil 0)
+      (nil 0)
+      (nil 0)
+      (nil 0)
+      ((#x21 #x71) 0)			;?$(5!q(B
+      ((#x21 #x72) 0)			;?$(5!r(B
+      ((#x21 #x73) 0)			;?$(5!s(B
+      ((#x21 #x74) 0)			;?$(5!t(B
+      ((#x21 #x75) 0)			;?$(5!u(B
+      ((#x21 #x76) 0)			;?$(5!v(B
+      ((#x21 #x77) 0)			;?$(5!w(B
+      ((#x21 #x78) 0)			;?$(5!x(B
+      ((#x21 #x79) 0)			;?$(5!y(B
+      ((#x21 #x7a) 0)			;?$(5!z(B
+      (nil 0)
+      (nil 0)
+      (nil 0)
+      (nil 0)
+      ((#x22 #x21) 0)			;?$(5"!(B
+      ((#x22 #x22) 0)			;?$(5""(B
+      ((#x22 #x23) 0)			;?$(5"#(B
+      ((#x22 #x24) 0)			;?$(5"$(B
+      ((#x22 #x25) 0)			;?$(5"%(B
+      ((#x22 #x26) 0)			;?$(5"&(B
+      ((#x22 #x27) 0)			;?$(5"'(B
+      ((#x22 #x28) 0)			;?$(5"((B
+      ((#x22 #x29) 0)			;?$(5")(B
+      ((#x22 #x2a) 0)			;?$(5"*(B
+      ((#x22 #x2b) 0)			;?$(5"+(B
+      ((#x22 #x2c) 0)			;?$(5",(B
+      ((#x22 #x2d) 0)			;?$(5"-(B
+      ((#x22 #x2e) 0)			;?$(5".(B
+      ((#x22 #x2f) 0)			;?$(5"/(B
+      ((#x22 #x30) 0)			;?$(5"0(B
+      ((#x22 #x31) 0)			;?$(5"1(B
+      ((#x22 #x32) 0)			;?$(5"2(B
+      ((#x22 #x33) 0)			;?$(5"3(B
+      ((#x22 #x34) 0)			;?$(5"4(B
+      ((#x22 #x35) 0)			;?$(5"5(B
+      ((#x22 #x36) 0)			;?$(5"6(B
+      ((#x22 #x37) 0)			;?$(5"7(B
+      ((#x22 #x38) 0)			;?$(5"8(B
+      ((#x22 #x39) 0)			;?$(5"9(B
+      ((#x22 #x3a) 0)			;?$(5":(B
+      ((#x22 #x3b) 0)			;?$(5";(B
+      ((#x22 #x3c) 0)			;?$(5"<(B
+      ((#x22 #x3d) 0)			;?$(5"=(B
+      ((#x22 #x3e) 0)			;?$(5">(B
+      ((#x22 #x3f) 0)			;?$(5"?(B
+      ((#x22 #x40) 0)			;?$(5"＠(B
+      ((#x22 #x41) 0)			;?$(5"A(B
+      ((#x22 #x42) 0)			;?$(5"B(B
+      ((#x22 #x43) 0)			;?$(5"C(B
+      ((#x22 #x44) 0)			;?$(5"D(B
+      ((#x22 #x45) 0)			;?$(5"E(B
+      ((#x22 #x46) 0)			;?$(5"F(B
+      ((#x22 #x47) 0)			;?$(5"G(B
+      ((#x22 #x48) 0)			;?$(5"H(B
+      ((#x22 #x49) 0)			;?$(5"I(B
+      ((#x22 #x4a) 0)			;?$(5"J(B
+      ((#x22 #x4b) 0)			;?$(5"K(B
+      ((#x22 #x4c) 0)			;?$(5"L(B
+      ((#x22 #x4d) 0)			;?$(5"M(B
+      ((#x22 #x4e) 0)			;?$(5"N(B
+      ((#x22 #x4f) 0)			;?$(5"O(B
+      ((#x22 #x50) 0)			;?$(5"P(B
+      ((#x22 #x51) 0)			;?$(5"Q(B
+      ((#x22 #x52) 0)			;?$(5"R(B
+      ((#x22 #x53) 0)			;?$(5"S(B
+      ((#x22 #x54) 0)			;?$(5"T(B
+      ((#x22 #x55) 0)			;?$(5"U(B
+      ((#x22 #x56) 0)			;?$(5"V(B
+      ((#x22 #x57) 0)			;?$(5"W(B
+      ((#x22 #x58) 0)			;?$(5"X(B
+      ((#x22 #x59) 0)			;?$(5"Y(B
+      ((#x22 #x5a) 0)			;?$(5"Z(B
+      ((#x22 #x5b) 0)			;?$(5"[(B
+      ((#x22 #x5c) 0)			;?$(5"\(B
+      ((#x22 #x5d) 0)			;?$(5"](B
+      ((#x22 #x5e) 0)			;?$(5"^(B
+      ((#x22 #x5f) 0)			;?$(5"_(B
+      ((#x22 #x60) 0)			;?$(5"`(B
+      ((#x22 #x61) 0)			;?$(5"a(B
+      ((#x22 #x62) 0)			;?$(5"b(B
+      ((#x22 #x63) 0)			;?$(5"c(B
+      ((#x22 #x64) 0)			;?$(5"d(B
+      ((#x22 #x65) 0)			;?$(5"e(B
+      ((#x22 #x66) 0)			;?$(5"f(B
+      ((#x22 #x67) 0)			;?$(5"g(B
+      ((#x22 #x68) 0)			;?$(5"h(B
+      ((#x22 #x69) 0)			;?$(5"i(B
+      ((#x22 #x6a) 0)			;?$(5"j(B
+      ((#x22 #x6b) 0)			;?$(5"k(B
+      ((#x22 #x6c) 0)			;?$(5"l(B
+      ((#x22 #x6d) 0)			;?$(5"m(B
+      ((#x22 #x6e) 0)			;?$(5"n(B
+      ((#x22 #x6f) 0)			;?$(5"o(B
+      ((#x22 #x70) 10 (mr . mr))	;?$(5"p(B
+      ((#x22 #x71) 0 (br . br))		;?$(5"q(B
+      ((#x22 #x72) 0 (br . tr))		;?$(5"r(B
+      ((#x22 #x73) 0)			;?$(5"s(B
+      ((#x22 #x74) 0)			;?$(5"t(B
+      ((#x22 #x75) 0)			;?$(5"u(B
+      ((#x22 #x76) 0)			;?$(5"v(B
+      ((#x22 #x77) 0)			;?$(5"w(B
+      ((#x22 #x78) 0)			;?$(5"x(B
+      ((#x22 #x79) 0)			;?$(5"y(B
+      ((#x22 #x7a) 0)			;?$(5"z(B
+      ((#x22 #x7b) 0)			;?$(5"{(B
+      ((#x22 #x7c) 0)			;?$(5"|(B
+      ((#x22 #x7d) 0)			;?$(5"}(B
+      ((#x22 #x7e) 0)			;?$(5"~(B
+      ((#x23 #x21) 0)			;?$(5#!(B
+      ((#x23 #x22) 0)			;?$(5#"(B
+      ((#x23 #x23) 0)			;?$(5##(B
+      ((#x23 #x24) 0)			;?$(5#$(B
+      ((#x23 #x25) 0)			;?$(5#%(B
+      ((#x23 #x26) 0)			;?$(5#&(B
+      ((#x23 #x27) 0)			;?$(5#'(B
+      ((#x23 #x28) 0)			;?$(5#((B
+      ((#x23 #x29) 0)			;?$(5#)(B
+      ((#x23 #x2a) 0)			;?$(5#*(B
+      ((#x23 #x2b) 0)			;?$(5#+(B
+      ((#x23 #x2c) 0)			;?$(5#,(B
+      ((#x23 #x2d) 0)			;?$(5#-(B
+      ((#x23 #x2e) 0)			;?$(5#.(B
+      ((#x23 #x2f) 0)			;?$(5#/(B
+      ((#x23 #x30) 0)			;?$(5#0(B
+      ((#x23 #x31) 0)			;?$(5#1(B
+      ((#x23 #x32) 0)			;?$(5#2(B
+      ((#x23 #x33) 0)			;?$(5#3(B
+      ((#x23 #x34) 0)			;?$(5#4(B
+      ((#x23 #x35) 0)			;?$(5#5(B
+      ((#x23 #x36) 0)			;?$(5#6(B
+      ((#x23 #x37) 0)			;?$(5#7(B
+      ((#x23 #x38) 0)			;?$(5#8(B
+      ((#x23 #x39) 0)			;?$(5#9(B
+      ((#x23 #x3a) 0)			;?$(5#:(B
+      ((#x23 #x3b) 0)			;?$(5#;(B
+      ((#x23 #x3c) 0)			;?$(5#<(B
+      ((#x23 #x3d) 0)			;?$(5#=(B
+      ((#x23 #x3e) 0)			;?$(5#>(B
+      ((#x23 #x3f) 0)			;?$(5#?(B
+      ((#x23 #x40) 0)			;?$(5#＠(B
+      ((#x23 #x41) 0)			;?$(5#A(B
+      ((#x23 #x42) 0)			;?$(5#B(B
+      ((#x23 #x43) 0)			;?$(5#C(B
+      ((#x23 #x44) 0)			;?$(5#D(B
+      ((#x23 #x45) 0)			;?$(5#E(B
+      ((#x23 #x46) 0)			;?$(5#F(B
+      ((#x23 #x47) 0)			;?$(5#G(B
+      ((#x23 #x48) 0)			;?$(5#H(B
+      ((#x23 #x49) 0)			;?$(5#I(B
+      ((#x23 #x4a) 0)			;?$(5#J(B
+      ((#x23 #x4b) 0 (br . tr))		;?$(5#K(B
+      ((#x23 #x4c) 0 (br . tr))		;?$(5#L(B
+      ((#x23 #x4d) 0 (br . tr))		;?$(5#M(B
+      ((#x23 #x4e) 0)			;?$(5#N(B
+      ((#x23 #x4f) 0)			;?$(5#O(B
+      ((#x23 #x50) 0)			;?$(5#P(B
+      ((#x23 #x51) 0)			;?$(5#Q(B
+      ((#x23 #x52) 0)			;?$(5#R(B
+      ((#x23 #x53) 0)			;?$(5#S(B
+      ((#x23 #x54) 0)			;?$(5#T(B
+      ((#x23 #x55) 0)			;?$(5#U(B
+      ((#x23 #x56) 0)			;?$(5#V(B
+      ((#x23 #x57) 0)			;?$(5#W(B
+      ((#x23 #x58) 0)			;?$(5#X(B
+      ((#x23 #x59) 0)			;?$(5#Y(B
+      ((#x23 #x5a) 0)			;?$(5#Z(B
+      ((#x23 #x5b) 0)			;?$(5#[(B
+      ((#x23 #x5c) 0)			;?$(5#\(B
+      ((#x23 #x5d) 0)			;?$(5#](B
+      ((#x23 #x5e) 0)			;?$(5#^(B
+      ((#x23 #x5f) 0)			;?$(5#_(B
+      ((#x23 #x60) 0)			;?$(5#`(B
+      ((#x23 #x61) 0)			;?$(5#a(B
+      ((#x23 #x62) 0)			;?$(5#b(B
+      ((#x23 #x63) 0)			;?$(5#c(B
+      ((#x23 #x64) 0)			;?$(5#d(B
+      ((#x23 #x65) 0)			;?$(5#e(B
+      ((#x23 #x66) 0)			;?$(5#f(B
+      ((#x23 #x67) 0)			;?$(5#g(B
+      ((#x23 #x68) 0)			;?$(5#h(B
+      ((#x23 #x69) 0)			;?$(5#i(B
+      ((#x23 #x6a) 0)			;?$(5#j(B
+      ((#x23 #x6b) 0)			;?$(5#k(B
+      ((#x23 #x6c) 0)			;?$(5#l(B
+      ((#x23 #x6d) 0)			;?$(5#m(B
+      ((#x23 #x6e) 0)			;?$(5#n(B
+      ((#x23 #x6f) 0)			;?$(5#o(B
+      ((#x23 #x70) 0)			;?$(5#p(B
+      ((#x23 #x71) 0)			;?$(5#q(B
+      ((#x23 #x72) 0)			;?$(5#r(B
+      ((#x23 #x73) 0)			;?$(5#s(B
+      ((#x23 #x74) 0)			;?$(5#t(B
+      ((#x23 #x75) 0)			;?$(5#u(B
+      ((#x23 #x76) 0)			;?$(5#v(B
+      ((#x23 #x77) 0)			;?$(5#w(B
+      ((#x23 #x78) 0)			;?$(5#x(B
+      ((#x23 #x79) 0)			;?$(5#y(B
+      ((#x23 #x7a) 0)			;?$(5#z(B
+      ((#x23 #x7b) 0)			;?$(5#{(B
+      ((#x23 #x7c) 0)			;?$(5#|(B
+      ((#x23 #x7d) 0)			;?$(5#}(B
+      ((#x23 #x7e) 0)			;?$(5#~(B
+      ((#x24 #x21) 0)			;?$(5$!(B
+      ((#x24 #x22) 0)			;?$(5$"(B
+      ((#x24 #x23) 0)			;?$(5$#(B
+      ((#x24 #x24) 0)			;?$(5$$(B
+      ((#x24 #x25) 0)			;?$(5$%(B
+      ((#x24 #x26) 0)			;?$(5$&(B
+      ((#x24 #x27) 0)			;?$(5$'(B
+      ((#x24 #x28) 0)			;?$(5$((B
+      ((#x24 #x29) 0)			;?$(5$)(B
+      ((#x24 #x2a) 0)			;?$(5$*(B
+      ((#x24 #x2b) 0)			;?$(5$+(B
+      ((#x24 #x2c) 0)			;?$(5$,(B
+      ((#x24 #x2d) 0)			;?$(5$-(B
+      ((#x24 #x2e) 0)			;?$(5$.(B
+      ((#x24 #x2f) 0)			;?$(5$/(B
+      ((#x24 #x30) 0)			;?$(5$0(B
+      ((#x24 #x31) 0)			;?$(5$1(B
+      ((#x24 #x32) 0)			;?$(5$2(B
+      ((#x24 #x33) 0)			;?$(5$3(B
+      ((#x24 #x34) 0)			;?$(5$4(B
+      ((#x24 #x35) 0)			;?$(5$5(B
+      ((#x24 #x36) 0)			;?$(5$6(B
+      ((#x24 #x37) 0)			;?$(5$7(B
+      ((#x24 #x38) 0)			;?$(5$8(B
+      ((#x24 #x39) 0)			;?$(5$9(B
+      ((#x24 #x3a) 0)			;?$(5$:(B
+      ((#x24 #x3b) 0)			;?$(5$;(B
+      ((#x24 #x3c) 0)			;?$(5$<(B
+      ((#x24 #x3d) 0)			;?$(5$=(B
+      ((#x24 #x3e) 0)			;?$(5$>(B
+      ((#x24 #x3f) 0)			;?$(5$?(B
+      ((#x24 #x40) 0)			;?$(5$＠(B
+      ((#x24 #x41) 0)			;?$(5$A(B
+      ((#x24 #x42) 0)			;?$(5$B(B
+      ((#x24 #x43) 0)			;?$(5$C(B
+      ((#x24 #x44) 0)			;?$(5$D(B
+      ((#x24 #x45) 0)			;?$(5$E(B
+      ((#x24 #x46) 0)			;?$(5$F(B
+      ((#x24 #x47) 0)			;?$(5$G(B
+      ((#x24 #x48) 0)			;?$(5$H(B
+      ((#x24 #x49) 0)			;?$(5$I(B
+      ((#x24 #x4a) 0)			;?$(5$J(B
+      ((#x24 #x4b) 0)			;?$(5$K(B
+      ((#x24 #x4c) 0)			;?$(5$L(B
+      ((#x24 #x4d) 0)			;?$(5$M(B
+      ((#x24 #x4e) 0)			;?$(5$N(B
+      ((#x24 #x4f) 0)			;?$(5$O(B
+      ((#x24 #x50) 0)			;?$(5$P(B
+      ((#x24 #x51) 0)			;?$(5$Q(B
+      ((#x24 #x52) 0)			;?$(5$R(B
+      ((#x24 #x53) 0)			;?$(5$S(B
+      ((#x24 #x54) 0)			;?$(5$T(B
+      ((#x24 #x55) 0)			;?$(5$U(B
+      ((#x24 #x56) 0)			;?$(5$V(B
+      ((#x24 #x57) 0)			;?$(5$W(B
+      ((#x24 #x58) 0)			;?$(5$X(B
+      ((#x24 #x59) 0)			;?$(5$Y(B
+      ((#x24 #x5a) 0)			;?$(5$Z(B
+      ((#x24 #x5b) 0)			;?$(5$[(B
+      ((#x24 #x5c) 0)			;?$(5$\(B
+      ((#x24 #x5d) 0)			;?$(5$](B
+      ((#x24 #x5e) 0)			;?$(5$^(B
+      ((#x24 #x5f) 0)			;?$(5$_(B
+      ((#x24 #x60) 0)			;?$(5$`(B
+      ((#x24 #x61) 0)			;?$(5$a(B
+      ((#x24 #x62) 0)			;?$(5$b(B
+      ((#x24 #x63) 0)			;?$(5$c(B
+      ((#x24 #x64) 0)			;?$(5$d(B
+      ((#x24 #x65) 0)			;?$(5$e(B
+      ((#x24 #x66) 0)			;?$(5$f(B
+      ((#x24 #x67) 0)			;?$(5$g(B
+      ((#x24 #x68) 0)			;?$(5$h(B
+      ((#x24 #x69) 0)			;?$(5$i(B
+      ((#x24 #x6a) 0)			;?$(5$j(B
+      ((#x24 #x6b) 0)			;?$(5$k(B
+      ((#x24 #x6c) 0)			;?$(5$l(B
+      ((#x24 #x6d) 0)			;?$(5$m(B
+      ((#x24 #x6e) 0)			;?$(5$n(B
+      ((#x24 #x6f) 0)			;?$(5$o(B
+      ((#x24 #x70) 0)			;?$(5$p(B
+      ((#x24 #x71) 0)			;?$(5$q(B
+      ((#x24 #x72) 0)			;?$(5$r(B
+      ((#x24 #x73) 0)			;?$(5$s(B
+      ((#x24 #x74) 0)			;?$(5$t(B
+      ((#x24 #x75) 0)			;?$(5$u(B
+      ((#x24 #x76) 0)			;?$(5$v(B
+      ((#x24 #x77) 0)			;?$(5$w(B
+      ((#x24 #x78) 0)			;?$(5$x(B
+      ((#x24 #x79) 0)			;?$(5$y(B
+      ((#x24 #x7a) 0)			;?$(5$z(B
+      ((#x24 #x7b) 0)			;?$(5${(B
+      ((#x24 #x7c) 0)			;?$(5$|(B
+      ((#x24 #x7d) 0)			;?$(5$}(B
+      ((#x24 #x7e) 0)			;?$(5$~(B
+      )
+    collect (cons (and a (apply #'make-char 'indian-2-column a))
+		  b)
     ))

 ;; Determine composition priority and rule of the array of Glyphs.
Index: lisp/mule/devanagari.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/devanagari.el,v
retrieving revision 1.3
diff -u -r1.3 devanagari.el
--- lisp/mule/devanagari.el	2002/03/16 10:39:05	1.3
+++ lisp/mule/devanagari.el	2005/11/22 14:00:06
＠＠ -1,6 +1,7 ＠＠
 ;;; devanagari.el --- support for Devanagari -*- coding: iso-2022-7bit; -*-

 ;; Copyright (C) 1996 Free Software Foundation, Inc.
+;; Copyright (C) 2005 Ben Wing.

 ;; Author: KAWABATA, Taichi <kawabata(a)is.s.u-tokyo.ac.jp&gt;

＠＠ -138,382 +139,386 ＠＠
 ;; Modify the following table if you change the set of 1-column font.
 ;;
 (defconst devanagari-1-column-char 
-  '((?$(5!!(B . ?$(6!!(B)
-    (?$(5!"(B . ?$(6!"(B)
-    (?$(5!#(B . ?$(6!#(B)
-    (?$(5!$(B . nil)
-    (?$(5!%(B . nil)
-    (?$(5!&(B . ?$(6!&(B)
-    (?$(5!'(B . ?$(6!'(B)
-    (?$(5!((B . ?$(6!((B)
-    (?$(5!)(B . nil)
-    (?$(5!*(B . nil)
-    (?$(5!+(B . nil)
-    (?$(5!,(B . nil)
-    (?$(5!-(B . nil)
-    (?$(5!.(B . nil)
-    (?$(5!/(B . nil)
-    (?$(5!0(B . nil)
-    (?$(5!1(B . nil)
-    (?$(5!2(B . nil)
-    (?$(5!3(B . nil)
-    (?$(5!4(B . nil)
-    (?$(5!5(B . ?$(6!5(B)
-    (?$(5!6(B . nil)
-    (?$(5!7(B . nil)
-    (?$(5!8(B . nil)
-    (?$(5!9(B . nil)
-    (?$(5!:(B . nil)
-    (?$(5!;(B . nil)
-    (?$(5!<(B . nil)
-    (?$(5!=(B . ?$(6!=(B)
-    (?$(5!>(B . ?$(6!>(B)
-    (?$(5!?(B . ?$(6!?(B)
-    (?$(5!＠(B . ?$(6!＠(B)
-    (?$(5!A(B . nil)
-    (?$(5!B(B . ?$(6!B(B)
-    (?$(5!C(B . ?$(6!C(B)
-    (?$(5!D(B . ?$(6!D(B)
-    (?$(5!E(B . ?$(6!E(B)
-    (?$(5!F(B . ?$(6!F(B)
-    (?$(5!G(B . ?$(6!G(B)
-    (?$(5!H(B . ?$(6!H(B)
-    (?$(5!I(B . nil)
-    (?$(5!J(B . ?$(6!J(B)
-    (?$(5!K(B . ?$(6!K(B)
-    (?$(5!L(B . ?$(6!L(B)
-    (?$(5!M(B . ?$(6!M(B)
-    (?$(5!N(B . ?$(6!N(B)
-    (?$(5!O(B . ?$(6!O(B)
-    (?$(5!P(B . ?$(6!P(B)
-    (?$(5!Q(B . nil)
-    (?$(5!R(B . nil)
-    (?$(5!S(B . nil)
-    (?$(5!T(B . ?$(6!T(B)
-    (?$(5!U(B . nil)
-    (?$(5!V(B . ?$(6!V(B)
-    (?$(5!W(B . ?$(6!W(B)
-    (?$(5!X(B . ?$(6!X(B)
-    (?$(5!Y(B . nil)
-    (?$(5!Z(B . ?$(6!Z(B)
-    (?$(5![(B . ?$(6![(B)
-    (?$(5!\(B . ?$(6!\(B)
-    (?$(5!](B . ?$(6!](B)
-    (?$(5!^(B . ?$(6!^(B)
-    (?$(5!_(B . ?$(6!_(B)
-    (?$(5!`(B . ?$(6!`(B)
-    (?$(5!a(B . ?$(6!a(B)
-    (?$(5!b(B . ?$(6!b(B)
-    (?$(5!c(B . ?$(6!c(B)
-    (?$(5!d(B . ?$(6!d(B)
-    (?$(5!e(B . ?$(6!e(B)
-    (?$(5!f(B . ?$(6!f(B)
-    (?$(5!g(B . ?$(6!g(B)
-    (?$(5!h(B . ?$(6!h(B)
-    (?$(5!i(B . ?$(6!i(B)
-    (?$(5!j(B . ?$(6!j(B)
-    (nil . nil)
-    (nil . nil)
-    (nil . nil)
-    (nil . nil)
-    (nil . nil)
-    (nil . nil)
-    (?$(5!q(B . ?$(6!q(B)
-    (?$(5!r(B . ?$(6!r(B)
-    (?$(5!s(B . ?$(6!s(B)
-    (?$(5!t(B . ?$(6!t(B)
-    (?$(5!u(B . ?$(6!u(B)
-    (?$(5!v(B . ?$(6!v(B)
-    (?$(5!w(B . ?$(6!w(B)
-    (?$(5!x(B . ?$(6!x(B)
-    (?$(5!y(B . ?$(6!y(B)
-    (?$(5!z(B . ?$(6!z(B)
-    (nil . nil)
-    (nil . nil)
-    (nil . nil)
-    (nil . nil)
-    (?$(5"!(B . nil)
-    (?$(5""(B . nil)
-    (?$(5"#(B . nil)
-    (?$(5"$(B . ?$(6"$(B)
-    (?$(5"%(B . ?$(6"%(B)
-    (?$(5"&(B . ?$(6"&(B)
-    (?$(5"'(B . nil)
-    (?$(5"((B . nil)
-    (?$(5")(B . nil)
-    (?$(5"*(B . nil)
-    (?$(5"+(B . nil)
-    (?$(5",(B . ?$(6",(B)
-    (?$(5"-(B . nil)
-    (?$(5".(B . ?$(6".(B)
-    (?$(5"/(B . nil)
-    (?$(5"0(B . nil)
-    (?$(5"1(B . nil)
-    (?$(5"2(B . nil)
-    (?$(5"3(B . ?$(6"3(B)
-    (?$(5"4(B . ?$(6"4(B)
-    (?$(5"5(B . ?$(6"5(B)
-    (?$(5"6(B . ?$(6"6(B)
-    (?$(5"7(B . nil)
-    (?$(5"8(B . ?$(6"8(B)
-    (?$(5"9(B . nil)
-    (?$(5":(B . ?$(6":(B)
-    (?$(5";(B . ?$(6";(B)
-    (?$(5"<(B . ?$(6"<(B)
-    (?$(5"=(B . nil)
-    (?$(5">(B . nil)
-    (?$(5"?(B . nil)
-    (?$(5"＠(B . nil)
-    (?$(5"A(B . ?$(6"A(B)
-    (?$(5"B(B . ?$(6"B(B)
-    (?$(5"C(B . ?$(6"C(B)
-    (?$(5"D(B . nil)
-    (?$(5"E(B . ?$(6"E(B)
-    (?$(5"F(B . ?$(6"F(B)
-    (?$(5"G(B . ?$(6"G(B)
-    (?$(5"H(B . ?$(6"H(B)
-    (?$(5"I(B . ?$(6"I(B)
-    (?$(5"J(B . ?$(6"J(B)
-    (?$(5"K(B . ?$(6"K(B)
-    (?$(5"L(B . ?$(6"L(B)
-    (?$(5"M(B . ?$(6"M(B)
-    (?$(5"N(B . ?$(6"N(B)
-    (?$(5"O(B . nil)
-    (?$(5"P(B . nil)
-    (?$(5"Q(B . ?$(6"Q(B)
-    (?$(5"R(B . nil)
-    (?$(5"S(B . nil)
-    (?$(5"T(B . ?$(6"T(B)
-    (?$(5"U(B . ?$(6"U(B)
-    (?$(5"V(B . ?$(6"V(B)
-    (?$(5"W(B . ?$(6"W(B)
-    (?$(5"X(B . nil)
-    (?$(5"Y(B . nil)
-    (?$(5"Z(B . nil)
-    (?$(5"[(B . nil)
-    (?$(5"\(B . nil)
-    (?$(5"](B . ?$(6"](B)
-    (?$(5"^(B . nil)
-    (?$(5"_(B . nil)
-    (?$(5"`(B . ?$(6"`(B)
-    (?$(5"a(B . ?$(6"a(B)
-    (?$(5"b(B . ?$(6"b(B)
-    (?$(5"c(B . ?$(6"c(B)
-    (?$(5"d(B . ?$(6"d(B)
-    (?$(5"e(B . ?$(6"e(B)
-    (?$(5"f(B . ?$(6"f(B)
-    (?$(5"g(B . ?$(6"g(B)
-    (?$(5"h(B . ?$(6"h(B)
-    (?$(5"i(B . ?$(6"i(B)
-    (?$(5"j(B . ?$(6"j(B)
-    (?$(5"k(B . ?$(6"k(B)
-    (?$(5"l(B . ?$(6"l(B)
-    (?$(5"m(B . ?$(6"m(B)
-    (?$(5"n(B . nil)
-    (?$(5"o(B . nil)
-    (?$(5"p(B . ?$(6"p(B)
-    (?$(5"q(B . ?$(6"q(B)
-    (?$(5"r(B . ?$(6"r(B)
-    (?$(5"s(B . ?$(6"s(B)
-    (?$(5"t(B . ?$(6"t(B)
-    (?$(5"u(B . ?$(6"u(B)
-    (?$(5"v(B . nil)
-    (?$(5"w(B . nil)
-    (?$(5"x(B . nil)
-    (?$(5"y(B . ?$(6"y(B)
-    (?$(5"z(B . ?$(6"z(B)
-    (?$(5"{(B . nil)
-    (?$(5"|(B . nil)
-    (?$(5"}(B . nil)
-    (?$(5"~(B . nil)
-    (?$(5#!(B . nil)
-    (?$(5#"(B . nil)
-    (?$(5##(B . nil)
-    (?$(5#$(B . nil)
-    (?$(5#%(B . nil)
-    (?$(5#&(B . nil)
-    (?$(5#'(B . nil)
-    (?$(5#((B . nil)
-    (?$(5#)(B . nil)
-    (?$(5#*(B . nil)
-    (?$(5#+(B . nil)
-    (?$(5#,(B . nil)
-    (?$(5#-(B . nil)
-    (?$(5#.(B . nil)
-    (?$(5#/(B . nil)
-    (?$(5#0(B . nil)
-    (?$(5#1(B . nil)
-    (?$(5#2(B . nil)
-    (?$(5#3(B . nil)
-    (?$(5#4(B . nil)
-    (?$(5#5(B . ?$(6#5(B)
-    (?$(5#6(B . nil)
-    (?$(5#7(B . nil)
-    (?$(5#8(B . nil)
-    (?$(5#9(B . nil)
-    (?$(5#:(B . nil)
-    (?$(5#;(B . nil)
-    (?$(5#<(B . nil)
-    (?$(5#=(B . nil)
-    (?$(5#>(B . nil)
-    (?$(5#?(B . ?$(6#?(B)
-    (?$(5#＠(B . ?$(6#＠(B)
-    (?$(5#A(B . nil)
-    (?$(5#B(B . nil)
-    (?$(5#C(B . nil)
-    (?$(5#D(B . nil)
-    (?$(5#E(B . nil)
-    (?$(5#F(B . nil)
-    (?$(5#G(B . nil)
-    (?$(5#H(B . nil)
-    (?$(5#I(B . nil)
-    (?$(5#J(B . ?$(6#J(B)
-    (?$(5#K(B . ?$(6#K(B)
-    (?$(5#L(B . ?$(6#L(B)
-    (?$(5#M(B . ?$(6#M(B)
-    (?$(5#N(B . nil)
-    (?$(5#O(B . nil)
-    (?$(5#P(B . nil)
-    (?$(5#Q(B . nil)
-    (?$(5#R(B . ?$(6#R(B)
-    (?$(5#S(B . nil)
-    (?$(5#T(B . nil)
-    (?$(5#U(B . nil)
-    (?$(5#V(B . nil)
-    (?$(5#W(B . nil)
-    (?$(5#X(B . nil)
-    (?$(5#Y(B . nil)
-    (?$(5#Z(B . nil)
-    (?$(5#[(B . nil)
-    (?$(5#\(B . nil)
-    (?$(5#](B . nil)
-    (?$(5#^(B . nil)
-    (?$(5#_(B . nil)
-    (?$(5#`(B . nil)
-    (?$(5#a(B . ?$(6#a(B)
-    (?$(5#b(B . ?$(6#b(B)
-    (?$(5#c(B . nil)
-    (?$(5#d(B . nil)
-    (?$(5#e(B . nil)
-    (?$(5#f(B . nil)
-    (?$(5#g(B . nil)
-    (?$(5#h(B . nil)
-    (?$(5#i(B . nil)
-    (?$(5#j(B . ?$(6#j(B)
-    (?$(5#k(B . ?$(6#k(B)
-    (?$(5#l(B . ?$(6#l(B)
-    (?$(5#m(B . nil)
-    (?$(5#n(B . nil)
-    (?$(5#o(B . nil)
-    (?$(5#p(B . nil)
-    (?$(5#q(B . nil)
-    (?$(5#r(B . nil)
-    (?$(5#s(B . nil)
-    (?$(5#t(B . nil)
-    (?$(5#u(B . nil)
-    (?$(5#v(B . nil)
-    (?$(5#w(B . nil)
-    (?$(5#x(B . nil)
-    (?$(5#y(B . nil)
-    (?$(5#z(B . nil)
-    (?$(5#{(B . nil)
-    (?$(5#|(B . nil)
-    (?$(5#}(B . nil)
-    (?$(5#~(B . nil)
-    (?$(5$!(B . nil)
-    (?$(5$"(B . nil)
-    (?$(5$#(B . nil)
-    (?$(5$$(B . nil)
-    (?$(5$%(B . nil)
-    (?$(5$&(B . nil)
-    (?$(5$'(B . nil)
-    (?$(5$((B . nil)
-    (?$(5$)(B . nil)
-    (?$(5$*(B . nil)
-    (?$(5$+(B . nil)
-    (?$(5$,(B . nil)
-    (?$(5$-(B . nil)
-    (?$(5$.(B . nil)
-    (?$(5$/(B . nil)
-    (?$(5$0(B . nil)
-    (?$(5$1(B . nil)
-    (?$(5$2(B . nil)
-    (?$(5$3(B . nil)
-    (?$(5$4(B . nil)
-    (?$(5$5(B . nil)
-    (?$(5$6(B . nil)
-    (?$(5$7(B . nil)
-    (?$(5$8(B . nil)
-    (?$(5$9(B . nil)
-    (?$(5$:(B . nil)
-    (?$(5$;(B . nil)
-    (?$(5$<(B . nil)
-    (?$(5$=(B . nil)
-    (?$(5$>(B . nil)
-    (?$(5$?(B . nil)
-    (?$(5$＠(B . nil)
-    (?$(5$A(B . ?$(6$A(B)
-    (?$(5$B(B . nil)
-    (?$(5$C(B . nil)
-    (?$(5$D(B . nil)
-    (?$(5$E(B . ?$(6$E(B)
-    (?$(5$F(B . nil)
-    (?$(5$G(B . nil)
-    (?$(5$H(B . ?$(6$H(B)
-    (?$(5$I(B . ?$(6$I(B)
-    (?$(5$J(B . ?$(6$J(B)
-    (?$(5$K(B . nil)
-    (?$(5$L(B . nil)
-    (?$(5$M(B . nil)
-    (?$(5$N(B . ?$(6$N(B)
-    (?$(5$O(B . nil)
-    (?$(5$P(B . ?$(6$P(B)
-    (?$(5$Q(B . ?$(6$Q(B)
-    (?$(5$R(B . ?$(6$R(B)
-    (?$(5$S(B . nil)
-    (?$(5$T(B . nil)
-    (?$(5$U(B . nil)
-    (?$(5$V(B . nil)
-    (?$(5$W(B . nil)
-    (?$(5$X(B . nil)
-    (?$(5$Y(B . nil)
-    (?$(5$Z(B . nil)
-    (?$(5$[(B . nil)
-    (?$(5$\(B . nil)
-    (?$(5$](B . nil)
-    (?$(5$^(B . nil)
-    (?$(5$_(B . nil)
-    (?$(5$`(B . nil)
-    (?$(5$a(B . nil)
-    (?$(5$b(B . nil)
-    (?$(5$c(B . nil)
-    (?$(5$d(B . nil)
-    (?$(5$e(B . nil)
-    (?$(5$f(B . nil)
-    (?$(5$g(B . nil)
-    (?$(5$h(B . ?$(6$h(B)
-    (?$(5$i(B . ?$(6$i(B)
-    (?$(5$j(B . ?$(6$j(B)
-    (?$(5$k(B . nil)
-    (?$(5$l(B . ?$(6$l(B)
-    (?$(5$m(B . ?$(6$m(B)
-    (?$(5$n(B . ?$(6$n(B)
-    (?$(5$o(B . nil)
-    (?$(5$p(B . ?$(6$p(B)
-    (?$(5$q(B . ?$(6$q(B)
-    (?$(5$r(B . ?$(6$r(B)
-    (?$(5$s(B . nil)
-    (?$(5$t(B . nil)
-    (?$(5$u(B . ?$(6$u(B)
-    (?$(5$v(B . ?$(6$v(B)
-    (?$(5$w(B . nil)
-    (?$(5$x(B . ?$(6$x(B)
-    (?$(5$y(B . ?$(6$y(B)
-    (?$(5$z(B . nil)
-    (?$(5${(B . nil)
-    (?$(5$|(B . nil)
-    (?$(5$}(B . nil)
-    (?$(5$~(B . nil)
+  (loop for (a . b) in
+    '(((#x21 #x21) . (#x21 #x21))	;?$(5!!(B ;?$(6!!(B
+      ((#x21 #x22) . (#x21 #x22))	;?$(5!"(B ;?$(6!"(B
+      ((#x21 #x23) . (#x21 #x23))	;?$(5!#(B ;?$(6!#(B
+      ((#x21 #x24) . nil)		;?$(5!$(B
+      ((#x21 #x25) . nil)		;?$(5!%(B
+      ((#x21 #x26) . (#x21 #x26))	;?$(5!&(B ;?$(6!&(B
+      ((#x21 #x27) . (#x21 #x27))	;?$(5!'(B ;?$(6!'(B
+      ((#x21 #x28) . (#x21 #x28))	;?$(5!((B ;?$(6!((B
+      ((#x21 #x29) . nil)		;?$(5!)(B
+      ((#x21 #x2a) . nil)		;?$(5!*(B
+      ((#x21 #x2b) . nil)		;?$(5!+(B
+      ((#x21 #x2c) . nil)		;?$(5!,(B
+      ((#x21 #x2d) . nil)		;?$(5!-(B
+      ((#x21 #x2e) . nil)		;?$(5!.(B
+      ((#x21 #x2f) . nil)		;?$(5!/(B
+      ((#x21 #x30) . nil)		;?$(5!0(B
+      ((#x21 #x31) . nil)		;?$(5!1(B
+      ((#x21 #x32) . nil)		;?$(5!2(B
+      ((#x21 #x33) . nil)		;?$(5!3(B
+      ((#x21 #x34) . nil)		;?$(5!4(B
+      ((#x21 #x35) . (#x21 #x35))	;?$(5!5(B ;?$(6!5(B
+      ((#x21 #x36) . nil)		;?$(5!6(B
+      ((#x21 #x37) . nil)		;?$(5!7(B
+      ((#x21 #x38) . nil)		;?$(5!8(B
+      ((#x21 #x39) . nil)		;?$(5!9(B
+      ((#x21 #x3a) . nil)		;?$(5!:(B
+      ((#x21 #x3b) . nil)		;?$(5!;(B
+      ((#x21 #x3c) . nil)		;?$(5!<(B
+      ((#x21 #x3d) . (#x21 #x3d))	;?$(5!=(B ;?$(6!=(B
+      ((#x21 #x3e) . (#x21 #x3e))	;?$(5!>(B ;?$(6!>(B
+      ((#x21 #x3f) . (#x21 #x3f))	;?$(5!?(B ;?$(6!?(B
+      ((#x21 #x40) . (#x21 #x40))	;?$(5!＠(B ;?$(6!＠(B
+      ((#x21 #x41) . nil)		;?$(5!A(B
+      ((#x21 #x42) . (#x21 #x42))	;?$(5!B(B ;?$(6!B(B
+      ((#x21 #x43) . (#x21 #x43))	;?$(5!C(B ;?$(6!C(B
+      ((#x21 #x44) . (#x21 #x44))	;?$(5!D(B ;?$(6!D(B
+      ((#x21 #x45) . (#x21 #x45))	;?$(5!E(B ;?$(6!E(B
+      ((#x21 #x46) . (#x21 #x46))	;?$(5!F(B ;?$(6!F(B
+      ((#x21 #x47) . (#x21 #x47))	;?$(5!G(B ;?$(6!G(B
+      ((#x21 #x48) . (#x21 #x48))	;?$(5!H(B ;?$(6!H(B
+      ((#x21 #x49) . nil)		;?$(5!I(B
+      ((#x21 #x4a) . (#x21 #x4a))	;?$(5!J(B ;?$(6!J(B
+      ((#x21 #x4b) . (#x21 #x4b))	;?$(5!K(B ;?$(6!K(B
+      ((#x21 #x4c) . (#x21 #x4c))	;?$(5!L(B ;?$(6!L(B
+      ((#x21 #x4d) . (#x21 #x4d))	;?$(5!M(B ;?$(6!M(B
+      ((#x21 #x4e) . (#x21 #x4e))	;?$(5!N(B ;?$(6!N(B
+      ((#x21 #x4f) . (#x21 #x4f))	;?$(5!O(B ;?$(6!O(B
+      ((#x21 #x50) . (#x21 #x50))	;?$(5!P(B ;?$(6!P(B
+      ((#x21 #x51) . nil)		;?$(5!Q(B
+      ((#x21 #x52) . nil)		;?$(5!R(B
+      ((#x21 #x53) . nil)		;?$(5!S(B
+      ((#x21 #x54) . (#x21 #x54))	;?$(5!T(B ;?$(6!T(B
+      ((#x21 #x55) . nil)		;?$(5!U(B
+      ((#x21 #x56) . (#x21 #x56))	;?$(5!V(B ;?$(6!V(B
+      ((#x21 #x57) . (#x21 #x57))	;?$(5!W(B ;?$(6!W(B
+      ((#x21 #x58) . (#x21 #x58))	;?$(5!X(B ;?$(6!X(B
+      ((#x21 #x59) . nil)		;?$(5!Y(B
+      ((#x21 #x5a) . (#x21 #x5a))	;?$(5!Z(B ;?$(6!Z(B
+      ((#x21 #x5b) . (#x21 #x5b))	;?$(5![(B ;?$(6![(B
+      ((#x21 #x5c) . (#x21 #x5c))	;?$(5!\(B ;?$(6!\(B
+      ((#x21 #x5d) . (#x21 #x5d))	;?$(5!](B ;?$(6!](B
+      ((#x21 #x5e) . (#x21 #x5e))	;?$(5!^(B ;?$(6!^(B
+      ((#x21 #x5f) . (#x21 #x5f))	;?$(5!_(B ;?$(6!_(B
+      ((#x21 #x60) . (#x21 #x60))	;?$(5!`(B ;?$(6!`(B
+      ((#x21 #x61) . (#x21 #x61))	;?$(5!a(B ;?$(6!a(B
+      ((#x21 #x62) . (#x21 #x62))	;?$(5!b(B ;?$(6!b(B
+      ((#x21 #x63) . (#x21 #x63))	;?$(5!c(B ;?$(6!c(B
+      ((#x21 #x64) . (#x21 #x64))	;?$(5!d(B ;?$(6!d(B
+      ((#x21 #x65) . (#x21 #x65))	;?$(5!e(B ;?$(6!e(B
+      ((#x21 #x66) . (#x21 #x66))	;?$(5!f(B ;?$(6!f(B
+      ((#x21 #x67) . (#x21 #x67))	;?$(5!g(B ;?$(6!g(B
+      ((#x21 #x68) . (#x21 #x68))	;?$(5!h(B ;?$(6!h(B
+      ((#x21 #x69) . (#x21 #x69))	;?$(5!i(B ;?$(6!i(B
+      ((#x21 #x6a) . (#x21 #x6a))	;?$(5!j(B ;?$(6!j(B
+      (nil . nil)
+      (nil . nil)
+      (nil . nil)
+      (nil . nil)
+      (nil . nil)
+      (nil . nil)
+      ((#x21 #x71) . (#x21 #x71))	;?$(5!q(B ;?$(6!q(B
+      ((#x21 #x72) . (#x21 #x72))	;?$(5!r(B ;?$(6!r(B
+      ((#x21 #x73) . (#x21 #x73))	;?$(5!s(B ;?$(6!s(B
+      ((#x21 #x74) . (#x21 #x74))	;?$(5!t(B ;?$(6!t(B
+      ((#x21 #x75) . (#x21 #x75))	;?$(5!u(B ;?$(6!u(B
+      ((#x21 #x76) . (#x21 #x76))	;?$(5!v(B ;?$(6!v(B
+      ((#x21 #x77) . (#x21 #x77))	;?$(5!w(B ;?$(6!w(B
+      ((#x21 #x78) . (#x21 #x78))	;?$(5!x(B ;?$(6!x(B
+      ((#x21 #x79) . (#x21 #x79))	;?$(5!y(B ;?$(6!y(B
+      ((#x21 #x7a) . (#x21 #x7a))	;?$(5!z(B ;?$(6!z(B
+      (nil . nil)
+      (nil . nil)
+      (nil . nil)
+      (nil . nil)
+      ((#x22 #x21) . nil)		;?$(5"!(B
+      ((#x22 #x22) . nil)		;?$(5""(B
+      ((#x22 #x23) . nil)		;?$(5"#(B
+      ((#x22 #x24) . (#x22 #x24))	;?$(5"$(B ;?$(6"$(B
+      ((#x22 #x25) . (#x22 #x25))	;?$(5"%(B ;?$(6"%(B
+      ((#x22 #x26) . (#x22 #x26))	;?$(5"&(B ;?$(6"&(B
+      ((#x22 #x27) . nil)		;?$(5"'(B
+      ((#x22 #x28) . nil)		;?$(5"((B
+      ((#x22 #x29) . nil)		;?$(5")(B
+      ((#x22 #x2a) . nil)		;?$(5"*(B
+      ((#x22 #x2b) . nil)		;?$(5"+(B
+      ((#x22 #x2c) . (#x22 #x2c))	;?$(5",(B ;?$(6",(B
+      ((#x22 #x2d) . nil)		;?$(5"-(B
+      ((#x22 #x2e) . (#x22 #x2e))	;?$(5".(B ;?$(6".(B
+      ((#x22 #x2f) . nil)		;?$(5"/(B
+      ((#x22 #x30) . nil)		;?$(5"0(B
+      ((#x22 #x31) . nil)		;?$(5"1(B
+      ((#x22 #x32) . nil)		;?$(5"2(B
+      ((#x22 #x33) . (#x22 #x33))	;?$(5"3(B ;?$(6"3(B
+      ((#x22 #x34) . (#x22 #x34))	;?$(5"4(B ;?$(6"4(B
+      ((#x22 #x35) . (#x22 #x35))	;?$(5"5(B ;?$(6"5(B
+      ((#x22 #x36) . (#x22 #x36))	;?$(5"6(B ;?$(6"6(B
+      ((#x22 #x37) . nil)		;?$(5"7(B
+      ((#x22 #x38) . (#x22 #x38))	;?$(5"8(B ;?$(6"8(B
+      ((#x22 #x39) . nil)		;?$(5"9(B
+      ((#x22 #x3a) . (#x22 #x3a))	;?$(5":(B ;?$(6":(B
+      ((#x22 #x3b) . (#x22 #x3b))	;?$(5";(B ;?$(6";(B
+      ((#x22 #x3c) . (#x22 #x3c))	;?$(5"<(B ;?$(6"<(B
+      ((#x22 #x3d) . nil)		;?$(5"=(B
+      ((#x22 #x3e) . nil)		;?$(5">(B
+      ((#x22 #x3f) . nil)		;?$(5"?(B
+      ((#x22 #x40) . nil)		;?$(5"＠(B
+      ((#x22 #x41) . (#x22 #x41))	;?$(5"A(B ;?$(6"A(B
+      ((#x22 #x42) . (#x22 #x42))	;?$(5"B(B ;?$(6"B(B
+      ((#x22 #x43) . (#x22 #x43))	;?$(5"C(B ;?$(6"C(B
+      ((#x22 #x44) . nil)		;?$(5"D(B
+      ((#x22 #x45) . (#x22 #x45))	;?$(5"E(B ;?$(6"E(B
+      ((#x22 #x46) . (#x22 #x46))	;?$(5"F(B ;?$(6"F(B
+      ((#x22 #x47) . (#x22 #x47))	;?$(5"G(B ;?$(6"G(B
+      ((#x22 #x48) . (#x22 #x48))	;?$(5"H(B ;?$(6"H(B
+      ((#x22 #x49) . (#x22 #x49))	;?$(5"I(B ;?$(6"I(B
+      ((#x22 #x4a) . (#x22 #x4a))	;?$(5"J(B ;?$(6"J(B
+      ((#x22 #x4b) . (#x22 #x4b))	;?$(5"K(B ;?$(6"K(B
+      ((#x22 #x4c) . (#x22 #x4c))	;?$(5"L(B ;?$(6"L(B
+      ((#x22 #x4d) . (#x22 #x4d))	;?$(5"M(B ;?$(6"M(B
+      ((#x22 #x4e) . (#x22 #x4e))	;?$(5"N(B ;?$(6"N(B
+      ((#x22 #x4f) . nil)		;?$(5"O(B
+      ((#x22 #x50) . nil)		;?$(5"P(B
+      ((#x22 #x51) . (#x22 #x51))	;?$(5"Q(B ;?$(6"Q(B
+      ((#x22 #x52) . nil)		;?$(5"R(B
+      ((#x22 #x53) . nil)		;?$(5"S(B
+      ((#x22 #x54) . (#x22 #x54))	;?$(5"T(B ;?$(6"T(B
+      ((#x22 #x55) . (#x22 #x55))	;?$(5"U(B ;?$(6"U(B
+      ((#x22 #x56) . (#x22 #x56))	;?$(5"V(B ;?$(6"V(B
+      ((#x22 #x57) . (#x22 #x57))	;?$(5"W(B ;?$(6"W(B
+      ((#x22 #x58) . nil)		;?$(5"X(B
+      ((#x22 #x59) . nil)		;?$(5"Y(B
+      ((#x22 #x5a) . nil)		;?$(5"Z(B
+      ((#x22 #x5b) . nil)		;?$(5"[(B
+      ((#x22 #x5c) . nil)		;?$(5"\(B
+      ((#x22 #x5d) . (#x22 #x5d))	;?$(5"](B ;?$(6"](B
+      ((#x22 #x5e) . nil)		;?$(5"^(B
+      ((#x22 #x5f) . nil)		;?$(5"_(B
+      ((#x22 #x60) . (#x22 #x60))	;?$(5"`(B ;?$(6"`(B
+      ((#x22 #x61) . (#x22 #x61))	;?$(5"a(B ;?$(6"a(B
+      ((#x22 #x62) . (#x22 #x62))	;?$(5"b(B ;?$(6"b(B
+      ((#x22 #x63) . (#x22 #x63))	;?$(5"c(B ;?$(6"c(B
+      ((#x22 #x64) . (#x22 #x64))	;?$(5"d(B ;?$(6"d(B
+      ((#x22 #x65) . (#x22 #x65))	;?$(5"e(B ;?$(6"e(B
+      ((#x22 #x66) . (#x22 #x66))	;?$(5"f(B ;?$(6"f(B
+      ((#x22 #x67) . (#x22 #x67))	;?$(5"g(B ;?$(6"g(B
+      ((#x22 #x68) . (#x22 #x68))	;?$(5"h(B ;?$(6"h(B
+      ((#x22 #x69) . (#x22 #x69))	;?$(5"i(B ;?$(6"i(B
+      ((#x22 #x6a) . (#x22 #x6a))	;?$(5"j(B ;?$(6"j(B
+      ((#x22 #x6b) . (#x22 #x6b))	;?$(5"k(B ;?$(6"k(B
+      ((#x22 #x6c) . (#x22 #x6c))	;?$(5"l(B ;?$(6"l(B
+      ((#x22 #x6d) . (#x22 #x6d))	;?$(5"m(B ;?$(6"m(B
+      ((#x22 #x6e) . nil)		;?$(5"n(B
+      ((#x22 #x6f) . nil)		;?$(5"o(B
+      ((#x22 #x70) . (#x22 #x70))	;?$(5"p(B ;?$(6"p(B
+      ((#x22 #x71) . (#x22 #x71))	;?$(5"q(B ;?$(6"q(B
+      ((#x22 #x72) . (#x22 #x72))	;?$(5"r(B ;?$(6"r(B
+      ((#x22 #x73) . (#x22 #x73))	;?$(5"s(B ;?$(6"s(B
+      ((#x22 #x74) . (#x22 #x74))	;?$(5"t(B ;?$(6"t(B
+      ((#x22 #x75) . (#x22 #x75))	;?$(5"u(B ;?$(6"u(B
+      ((#x22 #x76) . nil)		;?$(5"v(B
+      ((#x22 #x77) . nil)		;?$(5"w(B
+      ((#x22 #x78) . nil)		;?$(5"x(B
+      ((#x22 #x79) . (#x22 #x79))	;?$(5"y(B ;?$(6"y(B
+      ((#x22 #x7a) . (#x22 #x7a))	;?$(5"z(B ;?$(6"z(B
+      ((#x22 #x7b) . nil)		;?$(5"{(B
+      ((#x22 #x7c) . nil)		;?$(5"|(B
+      ((#x22 #x7d) . nil)		;?$(5"}(B
+      ((#x22 #x7e) . nil)		;?$(5"~(B
+      ((#x23 #x21) . nil)		;?$(5#!(B
+      ((#x23 #x22) . nil)		;?$(5#"(B
+      ((#x23 #x23) . nil)		;?$(5##(B
+      ((#x23 #x24) . nil)		;?$(5#$(B
+      ((#x23 #x25) . nil)		;?$(5#%(B
+      ((#x23 #x26) . nil)		;?$(5#&(B
+      ((#x23 #x27) . nil)		;?$(5#'(B
+      ((#x23 #x28) . nil)		;?$(5#((B
+      ((#x23 #x29) . nil)		;?$(5#)(B
+      ((#x23 #x2a) . nil)		;?$(5#*(B
+      ((#x23 #x2b) . nil)		;?$(5#+(B
+      ((#x23 #x2c) . nil)		;?$(5#,(B
+      ((#x23 #x2d) . nil)		;?$(5#-(B
+      ((#x23 #x2e) . nil)		;?$(5#.(B
+      ((#x23 #x2f) . nil)		;?$(5#/(B
+      ((#x23 #x30) . nil)		;?$(5#0(B
+      ((#x23 #x31) . nil)		;?$(5#1(B
+      ((#x23 #x32) . nil)		;?$(5#2(B
+      ((#x23 #x33) . nil)		;?$(5#3(B
+      ((#x23 #x34) . nil)		;?$(5#4(B
+      ((#x23 #x35) . (#x23 #x35))	;?$(5#5(B ;?$(6#5(B
+      ((#x23 #x36) . nil)		;?$(5#6(B
+      ((#x23 #x37) . nil)		;?$(5#7(B
+      ((#x23 #x38) . nil)		;?$(5#8(B
+      ((#x23 #x39) . nil)		;?$(5#9(B
+      ((#x23 #x3a) . nil)		;?$(5#:(B
+      ((#x23 #x3b) . nil)		;?$(5#;(B
+      ((#x23 #x3c) . nil)		;?$(5#<(B
+      ((#x23 #x3d) . nil)		;?$(5#=(B
+      ((#x23 #x3e) . nil)		;?$(5#>(B
+      ((#x23 #x3f) . (#x23 #x3f))	;?$(5#?(B ;?$(6#?(B
+      ((#x23 #x40) . (#x23 #x40))	;?$(5#＠(B ;?$(6#＠(B
+      ((#x23 #x41) . nil)		;?$(5#A(B
+      ((#x23 #x42) . nil)		;?$(5#B(B
+      ((#x23 #x43) . nil)		;?$(5#C(B
+      ((#x23 #x44) . nil)		;?$(5#D(B
+      ((#x23 #x45) . nil)		;?$(5#E(B
+      ((#x23 #x46) . nil)		;?$(5#F(B
+      ((#x23 #x47) . nil)		;?$(5#G(B
+      ((#x23 #x48) . nil)		;?$(5#H(B
+      ((#x23 #x49) . nil)		;?$(5#I(B
+      ((#x23 #x4a) . (#x23 #x4a))	;?$(5#J(B ;?$(6#J(B
+      ((#x23 #x4b) . (#x23 #x4b))	;?$(5#K(B ;?$(6#K(B
+      ((#x23 #x4c) . (#x23 #x4c))	;?$(5#L(B ;?$(6#L(B
+      ((#x23 #x4d) . (#x23 #x4d))	;?$(5#M(B ;?$(6#M(B
+      ((#x23 #x4e) . nil)		;?$(5#N(B
+      ((#x23 #x4f) . nil)		;?$(5#O(B
+      ((#x23 #x50) . nil)		;?$(5#P(B
+      ((#x23 #x51) . nil)		;?$(5#Q(B
+      ((#x23 #x52) . (#x23 #x52))	;?$(5#R(B ;?$(6#R(B
+      ((#x23 #x53) . nil)		;?$(5#S(B
+      ((#x23 #x54) . nil)		;?$(5#T(B
+      ((#x23 #x55) . nil)		;?$(5#U(B
+      ((#x23 #x56) . nil)		;?$(5#V(B
+      ((#x23 #x57) . nil)		;?$(5#W(B
+      ((#x23 #x58) . nil)		;?$(5#X(B
+      ((#x23 #x59) . nil)		;?$(5#Y(B
+      ((#x23 #x5a) . nil)		;?$(5#Z(B
+      ((#x23 #x5b) . nil)		;?$(5#[(B
+      ((#x23 #x5c) . nil)		;?$(5#\(B
+      ((#x23 #x5d) . nil)		;?$(5#](B
+      ((#x23 #x5e) . nil)		;?$(5#^(B
+      ((#x23 #x5f) . nil)		;?$(5#_(B
+      ((#x23 #x60) . nil)		;?$(5#`(B
+      ((#x23 #x61) . (#x23 #x61))	;?$(5#a(B ;?$(6#a(B
+      ((#x23 #x62) . (#x23 #x62))	;?$(5#b(B ;?$(6#b(B
+      ((#x23 #x63) . nil)		;?$(5#c(B
+      ((#x23 #x64) . nil)		;?$(5#d(B
+      ((#x23 #x65) . nil)		;?$(5#e(B
+      ((#x23 #x66) . nil)		;?$(5#f(B
+      ((#x23 #x67) . nil)		;?$(5#g(B
+      ((#x23 #x68) . nil)		;?$(5#h(B
+      ((#x23 #x69) . nil)		;?$(5#i(B
+      ((#x23 #x6a) . (#x23 #x6a))	;?$(5#j(B ;?$(6#j(B
+      ((#x23 #x6b) . (#x23 #x6b))	;?$(5#k(B ;?$(6#k(B
+      ((#x23 #x6c) . (#x23 #x6c))	;?$(5#l(B ;?$(6#l(B
+      ((#x23 #x6d) . nil)		;?$(5#m(B
+      ((#x23 #x6e) . nil)		;?$(5#n(B
+      ((#x23 #x6f) . nil)		;?$(5#o(B
+      ((#x23 #x70) . nil)		;?$(5#p(B
+      ((#x23 #x71) . nil)		;?$(5#q(B
+      ((#x23 #x72) . nil)		;?$(5#r(B
+      ((#x23 #x73) . nil)		;?$(5#s(B
+      ((#x23 #x74) . nil)		;?$(5#t(B
+      ((#x23 #x75) . nil)		;?$(5#u(B
+      ((#x23 #x76) . nil)		;?$(5#v(B
+      ((#x23 #x77) . nil)		;?$(5#w(B
+      ((#x23 #x78) . nil)		;?$(5#x(B
+      ((#x23 #x79) . nil)		;?$(5#y(B
+      ((#x23 #x7a) . nil)		;?$(5#z(B
+      ((#x23 #x7b) . nil)		;?$(5#{(B
+      ((#x23 #x7c) . nil)		;?$(5#|(B
+      ((#x23 #x7d) . nil)		;?$(5#}(B
+      ((#x23 #x7e) . nil)		;?$(5#~(B
+      ((#x24 #x21) . nil)		;?$(5$!(B
+      ((#x24 #x22) . nil)		;?$(5$"(B
+      ((#x24 #x23) . nil)		;?$(5$#(B
+      ((#x24 #x24) . nil)		;?$(5$$(B
+      ((#x24 #x25) . nil)		;?$(5$%(B
+      ((#x24 #x26) . nil)		;?$(5$&(B
+      ((#x24 #x27) . nil)		;?$(5$'(B
+      ((#x24 #x28) . nil)		;?$(5$((B
+      ((#x24 #x29) . nil)		;?$(5$)(B
+      ((#x24 #x2a) . nil)		;?$(5$*(B
+      ((#x24 #x2b) . nil)		;?$(5$+(B
+      ((#x24 #x2c) . nil)		;?$(5$,(B
+      ((#x24 #x2d) . nil)		;?$(5$-(B
+      ((#x24 #x2e) . nil)		;?$(5$.(B
+      ((#x24 #x2f) . nil)		;?$(5$/(B
+      ((#x24 #x30) . nil)		;?$(5$0(B
+      ((#x24 #x31) . nil)		;?$(5$1(B
+      ((#x24 #x32) . nil)		;?$(5$2(B
+      ((#x24 #x33) . nil)		;?$(5$3(B
+      ((#x24 #x34) . nil)		;?$(5$4(B
+      ((#x24 #x35) . nil)		;?$(5$5(B
+      ((#x24 #x36) . nil)		;?$(5$6(B
+      ((#x24 #x37) . nil)		;?$(5$7(B
+      ((#x24 #x38) . nil)		;?$(5$8(B
+      ((#x24 #x39) . nil)		;?$(5$9(B
+      ((#x24 #x3a) . nil)		;?$(5$:(B
+      ((#x24 #x3b) . nil)		;?$(5$;(B
+      ((#x24 #x3c) . nil)		;?$(5$<(B
+      ((#x24 #x3d) . nil)		;?$(5$=(B
+      ((#x24 #x3e) . nil)		;?$(5$>(B
+      ((#x24 #x3f) . nil)		;?$(5$?(B
+      ((#x24 #x40) . nil)		;?$(5$＠(B
+      ((#x24 #x41) . (#x24 #x41))	;?$(5$A(B ;?$(6$A(B
+      ((#x24 #x42) . nil)		;?$(5$B(B
+      ((#x24 #x43) . nil)		;?$(5$C(B
+      ((#x24 #x44) . nil)		;?$(5$D(B
+      ((#x24 #x45) . (#x24 #x45))	;?$(5$E(B ;?$(6$E(B
+      ((#x24 #x46) . nil)		;?$(5$F(B
+      ((#x24 #x47) . nil)		;?$(5$G(B
+      ((#x24 #x48) . (#x24 #x48))	;?$(5$H(B ;?$(6$H(B
+      ((#x24 #x49) . (#x24 #x49))	;?$(5$I(B ;?$(6$I(B
+      ((#x24 #x4a) . (#x24 #x4a))	;?$(5$J(B ;?$(6$J(B
+      ((#x24 #x4b) . nil)		;?$(5$K(B
+      ((#x24 #x4c) . nil)		;?$(5$L(B
+      ((#x24 #x4d) . nil)		;?$(5$M(B
+      ((#x24 #x4e) . (#x24 #x4e))	;?$(5$N(B ;?$(6$N(B
+      ((#x24 #x4f) . nil)		;?$(5$O(B
+      ((#x24 #x50) . (#x24 #x50))	;?$(5$P(B ;?$(6$P(B
+      ((#x24 #x51) . (#x24 #x51))	;?$(5$Q(B ;?$(6$Q(B
+      ((#x24 #x52) . (#x24 #x52))	;?$(5$R(B ;?$(6$R(B
+      ((#x24 #x53) . nil)		;?$(5$S(B
+      ((#x24 #x54) . nil)		;?$(5$T(B
+      ((#x24 #x55) . nil)		;?$(5$U(B
+      ((#x24 #x56) . nil)		;?$(5$V(B
+      ((#x24 #x57) . nil)		;?$(5$W(B
+      ((#x24 #x58) . nil)		;?$(5$X(B
+      ((#x24 #x59) . nil)		;?$(5$Y(B
+      ((#x24 #x5a) . nil)		;?$(5$Z(B
+      ((#x24 #x5b) . nil)		;?$(5$[(B
+      ((#x24 #x5c) . nil)		;?$(5$\(B
+      ((#x24 #x5d) . nil)		;?$(5$](B
+      ((#x24 #x5e) . nil)		;?$(5$^(B
+      ((#x24 #x5f) . nil)		;?$(5$_(B
+      ((#x24 #x60) . nil)		;?$(5$`(B
+      ((#x24 #x61) . nil)		;?$(5$a(B
+      ((#x24 #x62) . nil)		;?$(5$b(B
+      ((#x24 #x63) . nil)		;?$(5$c(B
+      ((#x24 #x64) . nil)		;?$(5$d(B
+      ((#x24 #x65) . nil)		;?$(5$e(B
+      ((#x24 #x66) . nil)		;?$(5$f(B
+      ((#x24 #x67) . nil)		;?$(5$g(B
+      ((#x24 #x68) . (#x24 #x68))	;?$(5$h(B ;?$(6$h(B
+      ((#x24 #x69) . (#x24 #x69))	;?$(5$i(B ;?$(6$i(B
+      ((#x24 #x6a) . (#x24 #x6a))	;?$(5$j(B ;?$(6$j(B
+      ((#x24 #x6b) . nil)		;?$(5$k(B
+      ((#x24 #x6c) . (#x24 #x6c))	;?$(5$l(B ;?$(6$l(B
+      ((#x24 #x6d) . (#x24 #x6d))	;?$(5$m(B ;?$(6$m(B
+      ((#x24 #x6e) . (#x24 #x6e))	;?$(5$n(B ;?$(6$n(B
+      ((#x24 #x6f) . nil)		;?$(5$o(B
+      ((#x24 #x70) . (#x24 #x70))	;?$(5$p(B ;?$(6$p(B
+      ((#x24 #x71) . (#x24 #x71))	;?$(5$q(B ;?$(6$q(B
+      ((#x24 #x72) . (#x24 #x72))	;?$(5$r(B ;?$(6$r(B
+      ((#x24 #x73) . nil)		;?$(5$s(B
+      ((#x24 #x74) . nil)		;?$(5$t(B
+      ((#x24 #x75) . (#x24 #x75))	;?$(5$u(B ;?$(6$u(B
+      ((#x24 #x76) . (#x24 #x76))	;?$(5$v(B ;?$(6$v(B
+      ((#x24 #x77) . nil)		;?$(5$w(B
+      ((#x24 #x78) . (#x24 #x78))	;?$(5$x(B ;?$(6$x(B
+      ((#x24 #x79) . (#x24 #x79))	;?$(5$y(B ;?$(6$y(B
+      ((#x24 #x7a) . nil)		;?$(5$z(B
+      ((#x24 #x7b) . nil)		;?$(5${(B
+      ((#x24 #x7c) . nil)		;?$(5$|(B
+      ((#x24 #x7d) . nil)		;?$(5$}(B
+      ((#x24 #x7e) . nil)		;?$(5$~(B
+      )
+    collect (cons (and a (apply #'make-char 'indian-2-column a))
+		  (and b (apply #'make-char 'indian-1-column b)))
     ))

 (provide 'devanagari)
Index: lisp/mule/ethio-util.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/ethio-util.el,v
retrieving revision 1.7
diff -u -r1.7 ethio-util.el
--- lisp/mule/ethio-util.el	2004/06/06 23:58:46	1.7
+++ lisp/mule/ethio-util.el	2005/11/22 14:00:06
＠＠ -1778,7 +1778,7 ＠＠
 (defun ethio-fidel-to-java-buffer nil
   "Convert Ethiopic characters into the Java escape sequences.

-Each escape sequence is of the form \uXXXX, where XXXX is the
+Each escape sequence is of the form \\uXXXX, where XXXX is the
 character's codepoint (in hex) in Unicode.

 If `ethio-java-save-lowercase' is non-nil, use [0-9a-f].
Index: lisp/mule/ethiopic.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/ethiopic.el,v
retrieving revision 1.6
diff -u -r1.6 ethiopic.el
--- lisp/mule/ethiopic.el	2002/03/16 10:39:06	1.6
+++ lisp/mule/ethiopic.el	2005/11/22 14:00:06
＠＠ -28,20 +28,6 ＠＠

 ;;; Code:

-;; Ethiopic characters (Amahric and Tigrigna).
-(make-charset 'ethiopic "Ethiopic characters"
-	      '(dimension
-		2
-		registry "Ethiopic-Unicode"
-		chars 94
-		columns 2
-		direction l2r
-		final ?3
-		graphic 0
-		short-name "Ethiopic"
-		long-name "Ethiopic characters"
-		))
-
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;; ETHIOPIC
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
＠＠ -49,28 +35,29 ＠＠
 (define-category ?E "Ethiopic (Ge'ez) character.")
 (modify-category-entry 'ethiopic ?E)

-(define-ccl-program ccl-encode-ethio-font
-  '(0
-    ;; In:  R0:ethiopic (not checked)
-    ;;      R1:position code 1
-    ;;      R2:position code 2
-    ;; Out: R1:font code point 1
-    ;;      R2:font code point 2
-    ((r1 -= 33)
-     (r2 -= 33)
-     (r1 *= 94)
-     (r2 += r1)
-     (if (r2 < 256)
-	 (r1 = ?\x12)
-       (if (r2 < 448)
-	   ((r1 = ?\x13) (r2 -= 256))
-	 ((r1 = ?\xfd) (r2 -= 208))
-	 ))))
-  "CCL program to encode an Ethiopic code to code point of Ethiopic font.")
+(when (featurep 'ccl)
+  (define-ccl-program ccl-encode-ethio-font
+    '(0
+      ;; In:  R0:ethiopic (not checked)
+      ;;      R1:position code 1
+      ;;      R2:position code 2
+      ;; Out: R1:font code point 1
+      ;;      R2:font code point 2
+      ((r1 -= 33)
+       (r2 -= 33)
+       (r1 *= 94)
+       (r2 += r1)
+       (if (r2 < 256)
+	   (r1 = ?\x12)
+	 (if (r2 < 448)
+	     ((r1 = ?\x13) (r2 -= 256))
+	   ((r1 = ?\xfd) (r2 -= 208))
+	   ))))
+    "CCL program to encode an Ethiopic code to code point of Ethiopic font.")

-;; (setq font-ccl-encoder-alist
-;;       (cons (cons "ethiopic" ccl-encode-ethio-font)
font-ccl-encoder-alist))
-(set-charset-ccl-program 'ethiopic 'ccl-encode-ethio-font)
+  ;; (setq font-ccl-encoder-alist
+  ;;       (cons (cons "ethiopic" ccl-encode-ethio-font)
font-ccl-encoder-alist))
+  (set-charset-ccl-program 'ethiopic 'ccl-encode-ethio-font))

 (set-language-info-alist
  "Ethiopic" '((setup-function . setup-ethiopic-environment-internal)
Index: lisp/mule/european.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/european.el,v
retrieving revision 1.12
diff -u -r1.12 european.el
--- lisp/mule/european.el	2005/05/10 17:02:59	1.12
+++ lisp/mule/european.el	2005/11/22 14:00:06
＠＠ -3,7 +3,7 ＠＠
 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1997 MORIOKA Tomohiko
-;; Copyright (C) 2001 Ben Wing.
+;; Copyright (C) 2001, 2005 Ben Wing
 ;; Copyright (C) 2002, 2005 Free Software Foundation

 ;; Keywords: multilingual, European
＠＠ -28,125 +28,16 ＠＠
 ;;; Commentary:

 ;; For Roman-alphabet-using Europeans, eight coded character sets,
-;; ISO8859-1,2,3,4,9,14,15,16 are supported.
+;; ISO8859-1,2,3,4,9,10,13,14,15,16 are supported.

 ;; #### latin.el would be a better name for this file.

 ;;; Code:
-; (make-charset 'latin-iso8859-1 
-; 	      "Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?A
-; 		graphic 1
-; 		short-name "RHP of Latin-1"
-; 		long-name "RHP of Latin-1 (ISO 8859-1): ISO-IR-100"
-; 		))
-
-; (make-charset 'latin-iso8859-2 
-; 	      "Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-2"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?B
-; 		graphic 1
-; 		short-name "RHP of Latin-2"
-; 		long-name "RHP of Latin-2 (ISO 8859-2): ISO-IR-101"
-; 		))
-
-; (make-charset 'latin-iso8859-3 
-; 	      "Right-Hand Part of Latin Alphabet 3 (ISO/IEC 8859-3): ISO-IR-109"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-3"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?C
-; 		graphic 1
-; 		short-name "RHP of Latin-3"
-; 		long-name "RHP of Latin-3 (ISO 8859-3): ISO-IR-109"
-; 		))
-
-; (make-charset 'latin-iso8859-4 
-; 	      "Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-4"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?D
-; 		graphic 1
-; 		short-name "RHP of Latin-4"
-; 		long-name "RHP of Latin-4 (ISO 8859-4): ISO-IR-110"
-; 		))
-
-; (make-charset 'latin-iso8859-9 
-; 	      "Right-Hand Part of Latin Alphabet 5 (ISO/IEC 8859-9): ISO-IR-148"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-9"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?M
-; 		graphic 1
-; 		short-name "RHP of Latin-5"
-; 		long-name "RHP of Latin-5 (ISO 8859-9): ISO-IR-148"
-; 		))
-
-; (make-charset 'latin-iso8859-15 
-; 	      "Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-15"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?b
-; 		graphic 1
-; 		short-name "RHP of Latin-9"
-; 		long-name "RHP of Latin-9 (ISO 8859-15): ISO-IR-203"
-; 		))
-
-(make-charset 'latin-iso8859-14 
-	      "Right-Hand Part of Latin Alphabet 8 (ISO/IEC 8859-14)"
-	      '(dimension
-		1
-		registry "ISO8859-14"
-		chars 96
-		columns 1
-		direction l2r
-		final ?_
-		graphic 1
-		short-name "RHP of Latin-8"
-		long-name "RHP of Latin-8 (ISO 8859-14)"
-		))
-
-(make-charset 'latin-iso8859-16
-	      "Right-Hand Part of Latin Alphabet 10 (ISO/IEC 8859-16)"
-	      '(dimension
-		1
-		registry "ISO8859-16"
-		chars 96
-		columns 1
-		direction l2r
-		final ?f			; octet 06/06; cf ISO-IR 226
-		graphic 1
-		short-name "RHP of Latin-10"
-		long-name "RHP of Latin-10 (ISO 8859-16)"
-		))

 ;; Latin-1 is dealt with in iso8859-1.el, which see. 

+;; ＠＠#### Support ISO 8859-10, 13.
+
 ;; ISO 8859-14. 
 ;; 
 ;; Initialise all characters to word syntax.
＠＠ -209,8 +100,9 ＠＠
 	  (make-char 'latin-iso8859-15 c)
 	  (string (char-syntax (make-char 'latin-iso8859-1 c)))))
 ;; Now, the exceptions
-(loop for c in '(?,b&(B ?,b((B ?,b4(B ?,b8(B ?,b<(B ?,b=(B
?,b>(B)
-      do (modify-syntax-entry c "w"))
+(loop for c in '(#xa6 #xa8 #xb4 #xb8 #xbc #xbd #xbe)
+  ;;(?,b&(B ?,b((B ?,b4(B ?,b8(B ?,b<(B ?,b=(B ?,b>(B)
+  do (modify-syntax-entry (make-char 'latin-iso8859-15 c) "w"))

 ;; Again, perpetuating insanity with the guillemets.
 (modify-syntax-entry (make-char 'latin-iso8859-16 #xab) 
＠＠ -220,37 +112,46 ＠＠
 ;; end of ISO 8859-15. 

 ;; For syntax of Latin-2
-(loop for c in '(?,B!(B ?,B#(B ?,B%(B ?,B&(B ?,B)(B ?,B*(B ?,B+(B
?,B,(B ?,B.(B ?,B/(B ?,B1(B ?,B3(B ?,B5(B ?,B6(B ?,B9(B ?,B:(B ?,B;(B
?,B<(B)
-      do (modify-syntax-entry c "w"))
+(loop for c in '(#xa1 #xa3 #xa5 #xa6 #xa9 #xaa #xab #xac #xae #xaf #xb1 #xb3
+		 #xb5 #xb6 #xb9 #xba #xbb #xbc)
+  ;;(?,B!(B ?,B#(B ?,B%(B ?,B&(B ?,B)(B ?,B*(B ?,B+(B ?,B,(B ?,B.(B
?,B/(B ?,B1(B ?,B3(B ?,B5(B ?,B6(B ?,B9(B ?,B:(B ?,B;(B ?,B<(B)
+  do (modify-syntax-entry (make-char 'latin-iso8859-2 c) "w"))

 (loop for c from 62 to 126
       do (modify-syntax-entry (make-char 'latin-iso8859-2 c) "w"))

 (modify-syntax-entry (make-char 'latin-iso8859-2 32) "w") ; no-break space
-(modify-syntax-entry ?,BW(B ".")
-(modify-syntax-entry ?,Bw(B ".")
+(modify-syntax-entry (make-char 'latin-iso8859-2 #xd7) ".") ;?,BW(B
+(modify-syntax-entry (make-char 'latin-iso8859-2 #xf7) ".") ;?,Bw(B

 ;; For syntax of Latin-3
-(loop for c in '(?,C!(B ?,C&(B ?,C)(B ?,C*(B ?,C+(B ?,C,(B ?,C/(B
?,C1(B ?,C5(B ?,C6(B ?,C:(B ?,C;(B ?,C<(B ?,C?(B)
-  do (modify-syntax-entry c "w"))
-
-(loop for c from 64 to 126
+(loop for c in '(#xa1 #xa6 #xa9 #xaa #xab #xac #xaf #xb1 #xb5 #xb6 #xba #xbb
+		 #xbc #xbf)
+  ;;(?,C!(B ?,C&(B ?,C)(B ?,C*(B ?,C+(B ?,C,(B ?,C/(B ?,C1(B ?,C5(B
?,C6(B ?,C:(B ?,C;(B ?,C<(B ?,C?(B)
   do (modify-syntax-entry (make-char 'latin-iso8859-3 c) "w"))

+(loop for c from 64 to 126
+  do (let ((ch (make-char 'latin-iso8859-3 c)))
+       ;; There are gaps in the ISO8859-3 encoding.
+       (when ch
+	 (modify-syntax-entry ch "w"))))
+       
 (modify-syntax-entry (make-char 'latin-iso8859-3 32) "w") ; no-break space
-(modify-syntax-entry ?,CW(B ".")
-(modify-syntax-entry ?,Cw(B ".")
+(modify-syntax-entry (make-char 'latin-iso8859-3 #xd7) ".") ;?,CW(B
+(modify-syntax-entry (make-char 'latin-iso8859-3 #xf7) ".") ;?,Cw(B

 ;; For syntax of Latin-4
-(loop for c in '(?,D!(B ?,D"(B ?,D#(B ?,D%(B ?,D&(B ?,D)(B
?,D*(B ?,D+(B ?,D,(B ?,D.(B ?,D1(B ?,D3(B ?,D5(B ?,D6(B ?,D9(B ?,D:(B
?,D;(B ?,D<(B ?,D=(B ?,D>(B ?,D?(B)
-  do (modify-syntax-entry c "w"))
+(loop for c in '(#xa1 #xa2 #xa3 #xa5 #xa6 #xa9 #xaa #xab #xac #xae #xb1 #xb3
+		 #xb5 #xb6 #xb9 #xba #xbb #xbc #xbd #xbe #xbf)
+  ;;(?,D!(B ?,D"(B ?,D#(B ?,D%(B ?,D&(B ?,D)(B ?,D*(B ?,D+(B
?,D,(B ?,D.(B ?,D1(B ?,D3(B ?,D5(B ?,D6(B ?,D9(B ?,D:(B ?,D;(B
?,D<(B ?,D=(B ?,D>(B ?,D?(B)
+  do (modify-syntax-entry (make-char 'latin-iso8859-4 c) "w"))

 (loop for c from 64 to 126
   do (modify-syntax-entry (make-char 'latin-iso8859-4 c) "w"))

 (modify-syntax-entry (make-char 'latin-iso8859-4 32) "w") ; no-break space
-(modify-syntax-entry ?,DW(B ".")
-(modify-syntax-entry ?,Dw(B ".")
+(modify-syntax-entry (make-char 'latin-iso8859-4 #xd7) ".") ;?,DW(B
+(modify-syntax-entry (make-char 'latin-iso8859-4 #xf7) ".") ;?,Dw(B

 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
＠＠ -456,7 +357,7 ＠＠
 		       ("\
 This language environment is a generic one for Latin-9 (ISO-8859-15)
 character set which supports the Euro sign and the following languages
-(they use the Latin-1 character set by default):
+\(they use the Latin-1 character set by default):
  Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic,
  Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
 Each also has its own specific language environment."))
Index: lisp/mule/greek.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/greek.el,v
retrieving revision 1.5
diff -u -r1.5 greek.el
--- lisp/mule/greek.el	2002/03/16 10:39:06	1.5
+++ lisp/mule/greek.el	2005/11/22 14:00:06
＠＠ -29,27 +29,16 ＠＠

 ;;; Code:

-; (make-charset 'greek-iso8859-7 
-; 	      "Right-Hand Part of Latin/Greek Alphabet (ISO/IEC 8859-7):
ISO-IR-126"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-7"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?F
-; 		graphic 1
-; 		short-name "RHP of ISO8859/7"
-; 		long-name "RHP of Greek (ISO 8859-7): ISO-IR-126"
-; 		))
-
 ;; For syntax of Greek
 (loop for c from 54 to 126
-      do (modify-syntax-entry (make-char 'greek-iso8859-7 c) "w"))
+  do (let ((ch (make-char 'greek-iso8859-7 c)))
+       ;; There are gaps in the ISO8859-7 encoding.
+       (when ch
+	 (modify-syntax-entry ch "w"))))
 (modify-syntax-entry (make-char 'greek-iso8859-7 32) "w") ; no-break space
-(modify-syntax-entry ?,F7(B ".")
-(modify-syntax-entry ?,F;(B ".")
-(modify-syntax-entry ?,F=(B ".")
+(modify-syntax-entry (make-char 'greek-iso8859-7 #xb7) ".") ;?,F7(B
+(modify-syntax-entry (make-char 'greek-iso8859-7 #xbb) ".") ;?,F;(B
+(modify-syntax-entry (make-char 'greek-iso8859-7 #xbd) ".") ;?,F=(B

 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Index: lisp/mule/hebrew.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/hebrew.el,v
retrieving revision 1.4
diff -u -r1.4 hebrew.el
--- lisp/mule/hebrew.el	2002/03/16 10:39:06	1.4
+++ lisp/mule/hebrew.el	2005/11/22 14:00:06
＠＠ -28,20 +28,6 ＠＠

 ;;; Code:

-; (make-charset 'hebrew-iso8859-8 
-; 	      "Right-Hand Part of Latin/Hebrew Alphabet (ISO/IEC 8859-8):
ISO-IR-138"
-; 	      '(dimension
-; 		1
-; 		registry "ISO8859-8"
-; 		chars 96
-; 		columns 1
-; 		direction r2l
-; 		final ?H
-; 		graphic 1
-; 		short-name "RHP of ISO8859/8"
-; 		long-name "RHP of Hebrew (ISO 8859-8): ISO-IR-138"
-; 		))
-
 ;; Syntax of Hebrew characters
 (loop for c from 96 to 122
       do (modify-syntax-entry (make-char 'hebrew-iso8859-8 c) "w"))
Index: lisp/mule/indian.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/indian.el,v
retrieving revision 1.3
diff -u -r1.3 indian.el
--- lisp/mule/indian.el	2002/03/16 10:39:06	1.3
+++ lisp/mule/indian.el	2005/11/22 14:00:06
＠＠ -95,50 +95,6 ＠＠
 ;; not assigned.  They are automatically converted to each Indian
 ;; script which IS-13194 supports.

-(make-charset 'indian-is13194 
-	      "Generic Indian charset for data exchange with IS 13194"
-	      '(dimension
-		1
-		registry "IS13194-Devanagari"
-		chars 94
-		columns 2
-		direction l2r
-		final ?5

-		graphic 1
-		short-name "IS 13194"
-		long-name "Indian IS 13194"
-		))
-
-;; Actual Glyph for 1-column width.
-(make-charset 'indian-1-column 
-	      "Indian charset for 2-column width glyphs"
-	      '(dimension
-		2
-		registry "MuleIndian-1"
-		chars 94
-		columns 1
-		direction l2r
-		final ?6
-		graphic 0
-		short-name "Indian 1-col"
-		long-name "Indian 1 Column"
-		))
-
-;; Actual Glyph for 2-column width.
-(make-charset 'indian-2-column 
-	      "Indian charset for 2-column width glyphs"
-	      '(dimension
-		2
-		registry "MuleIndian-2"
-		chars 94
-		columns 2
-		direction l2r
-		final ?5
-		graphic 0
-		short-name "Indian 2-col"
-		long-name "Indian 2 Column"
-		))
-
 (defvar indian-itrans-consonant-alist
   '(
     ("k" . "(53(B")
Index: lisp/mule/japanese.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/japanese.el,v
retrieving revision 1.11
diff -u -r1.11 japanese.el
--- lisp/mule/japanese.el	2004/01/29 05:22:40	1.11
+++ lisp/mule/japanese.el	2005/11/22 14:00:07
＠＠ -3,7 +3,7 ＠＠
 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1997 MORIOKA Tomohiko
-;; Copyright (C) 2000, 2002 Ben Wing.
+;; Copyright (C) 2000, 2002, 2005 Ben Wing.

 ;; Keywords: multilingual, Japanese

＠＠ -33,103 +33,6 ＠＠

 ;;; Code:

-; (make-charset 'katakana-jisx0201 
-; 	      "Katakana Part of JISX0201.1976"
-; 	      '(dimension
-; 		1
-; 		registry "JISX0201"
-; 		chars 94
-; 		columns 1
-; 		direction l2r
-; 		final ?I
-; 		graphic 1
-; 		short-name "JISX0201 Katakana"
-; 		long-name "Japanese Katakana (JISX0201.1976)"
-; 		))
-
-; (make-charset 'latin-jisx0201 
-; 	      "Roman Part of JISX0201.1976"
-; 	      '(dimension
-; 		1
-; 		registry "JISX0201"
-; 		chars 94
-; 		columns 1
-; 		direction l2r
-; 		final ?J
-; 		graphic 0
-; 		short-name "JISX0201 Roman"
-; 		long-name "Japanese Roman (JISX0201.1976)"
-; 		))
-
-; (make-charset 'japanese-jisx0208-1978 
-; 	      "JISX0208.1978 Japanese Kanji (so called \"old JIS\"):
ISO-IR-42"
-; 	      '(dimension
-; 		2
-; 		registry "JISX0208.1990"
-; 		registry "JISX0208.1978"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?＠
-; 		graphic 0
-; 		short-name "JISX0208.1978"
-; 		long-name "JISX0208.1978 (Japanese): ISO-IR-42"
-; 		))
-
-; (make-charset 'japanese-jisx0208 
-; 	      "JISX0208.1983/1990 Japanese Kanji: ISO-IR-87"
-; 	      '(dimension
-; 		2
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?B
-; 		graphic 0
-; 		short-name "JISX0208"
-; 		long-name "JISX0208.1983/1990 (Japanese): ISO-IR-87"
-; 		))
-
-; (make-charset 'japanese-jisx0212 
-; 	      "JISX0212 Japanese supplement: ISO-IR-159"
-; 	      '(dimension
-; 		2
-; 		registry "JISX0212"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?D
-; 		graphic 0
-; 		short-name "JISX0212"
-; 		long-name "JISX0212 (Japanese): ISO-IR-159"
-; 		))
-
-(make-charset 'japanese-jisx0213-1 "JISX0213 Plane 1 (Japanese)"
-	      '(dimension
-		2
-		registry "JISX0213.2000-1"
-		chars 94
-		columns 2
-		direction l2r
-		final ?O
-		graphic 0
-		short-name "JISX0213-1"
-		long-name "JISX0213-1"
-		))
-
-;; JISX0213 Plane 2
-(make-charset 'japanese-jisx0213-2 "JISX0213 Plane 2 (Japanese)"
-	      '(dimension
-		2
-		registry "JISX0213.2000-2"
-		chars 94
-		columns 2
-		direction l2r
-		final ?P
-		graphic 0
-		short-name "JISX0213-2"
-		long-name "JISX0213-2"
-		))
-
 ;;; Syntax of Japanese characters.
 (modify-syntax-entry 'katakana-jisx0201 "w")
 (modify-syntax-entry 'japanese-jisx0212 "w")
＠＠ -137,18 +40,20 ＠＠
 (modify-syntax-entry 'japanese-jisx0208 "w")
 (loop for row in '(33 34 40)
       do (modify-syntax-entry `[japanese-jisx0208 ,row] "_"))
-(loop for char in '(?$B!<(B ?$B!+(B ?$B!,(B ?$B!3(B ?$B!4(B ?$B!5(B
?$B!6(B ?$B!7(B ?$B!8(B ?$B!9(B ?$B!:(B ?$B!;(B)
-      do (modify-syntax-entry char "w"))
-(modify-syntax-entry ?\$B!J(B "($B!K(B")
-(modify-syntax-entry ?\$B!N(B "($B!O(B")
-(modify-syntax-entry ?\$B!P(B "($B!Q(B")
-(modify-syntax-entry ?\$B!V(B "($B!W(B")
-(modify-syntax-entry ?\$B!X(B "($B!Y(B")
-(modify-syntax-entry ?\$B!K(B ")$B!J(B")
-(modify-syntax-entry ?\$B!O(B ")$B!N(B")
-(modify-syntax-entry ?\$B!Q(B ")$B!P(B")
-(modify-syntax-entry ?\$B!W(B ")$B!V(B")
-(modify-syntax-entry ?\$B!Y(B ")$B!X(B")
+(loop for char in '(#x3c #x2b #x2c #x33 #x34 #x35 #x36 #x37 #x38 #x39
+		    #x3a #x3b)
+  ;;(?$B!<(B ?$B!+(B ?$B!,(B ?$B!3(B ?$B!4(B ?$B!5(B ?$B!6(B ?$B!7(B
?$B!8(B ?$B!9(B ?$B!:(B ?$B!;(B)
+  do (modify-syntax-entry (make-char 'japanese-jisx0208 #x21 char) "w"))
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x4a) "($B!K(B")
;?$B!J(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x4e) "($B!O(B")
;?$B!N(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x50) "($B!Q(B")
;?$B!P(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x56) "($B!W(B")
;?$B!V(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x58) "($B!Y(B")
;?$B!X(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x4b) ")$B!J(B")
;?$B!K(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x4f) ")$B!N(B")
;?$B!O(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x51) ")$B!P(B")
;?$B!Q(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x57) ")$B!V(B")
;?$B!W(B
+(modify-syntax-entry (make-char 'japanese-jisx0208 #x21 #x59) ")$B!X(B")
;?$B!Y(B

 ;;; Character categories S, A, H, K, G, Y, and C
 (define-category ?S "Japanese 2-byte symbol character.")
Index: lisp/mule/korean.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/korean.el,v
retrieving revision 1.6
diff -u -r1.6 korean.el
--- lisp/mule/korean.el	2002/03/16 10:39:06	1.6
+++ lisp/mule/korean.el	2005/11/22 14:00:07
＠＠ -3,6 +3,7 ＠＠
 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1997 MORIOKA Tomohiko
+;; Copyright (C) 2005 Ben Wing.

 ;; Keywords: multilingual, Korean

＠＠ -28,20 +29,6 ＠＠
 ;; For Korean, the character set KSC5601 is supported.

 ;;; Code:
-
-; (make-charset 'korean-ksc5601 
-; 	      "KSC5601 Korean Hangul and Hanja: ISO-IR-149"
-; 	      '(dimension
-; 		2
-; 		registry "KSC5601.1989"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?C
-; 		graphic 0
-; 		short-name "KSC5601"
-; 		long-name "KSC5601 (Korean): ISO-IR-149"
-; 		))

 ;; Syntax of Korean characters.
 (loop for row from 33 to  34 do
Index: lisp/mule/lao.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/lao.el,v
retrieving revision 1.4
diff -u -r1.4 lao.el
--- lisp/mule/lao.el	2002/03/18 10:07:37	1.4
+++ lisp/mule/lao.el	2005/11/22 14:00:07
＠＠ -28,21 +28,6 ＠＠

 ;;; Code:

-;; Lao script.
-;; ISO10646's 0x0E80..0x0EDF are mapped to 0x20..0x7F.
-(make-charset 'lao "Lao characters (ISO10646 0E80..0EDF)"
-	      '(dimension
-		1
-		registry "MuleLao-1"
-		chars 94
-		columns 1
-		direction l2r
-		final ?1
-		graphic 0
-		short-name "Lao"
-		long-name "Lao"
-		))
-
 ; (make-coding-system
 ;  'lao 2 ?L
 ;  "8-bit encoding for ASCII (MSB=0) and LAO (MSB=1)"
＠＠ -68,11 +53,11 ＠＠
 	 (features lao-util)
 	 (documentation . t)))

-(put-char-table ?(1;(B t use-default-ascent)
-(put-char-table ?(1=(B t use-default-ascent)
-(put-char-table ?(1?(B t use-default-ascent)
-(put-char-table ?(1B(B t use-default-ascent)
-(put-char-table ?(1\(B t ignore-relative-composition)
+(put-char-table (make-char 'lao #x3b) t use-default-ascent) ;?(1;(B
+(put-char-table (make-char 'lao #x3d) t use-default-ascent) ;?(1=(B
+(put-char-table (make-char 'lao #x3f) t use-default-ascent) ;?(1?(B
+(put-char-table (make-char 'lao #x42) t use-default-ascent) ;?(1B(B
+(put-char-table (make-char 'lao #x5c) t ignore-relative-composition) ;?(1\(B

 ;; Register a function to compose Lao characters.
 (put-char-table 'lao
Index: lisp/mule/misc-lang.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/misc-lang.el,v
retrieving revision 1.5
diff -u -r1.5 misc-lang.el
--- lisp/mule/misc-lang.el	2002/03/16 10:39:07	1.5
+++ lisp/mule/misc-lang.el	2005/11/22 14:00:07
＠＠ -30,20 +30,6 ＠＠
 ;;; IPA (International Phonetic Alphabet)
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

-;; IPA characters for phonetic symbols.
-(make-charset 'ipa "IPA (International Phonetic Association)"
-	      '(dimension
-		1
-		registry "MuleIPA"
-		chars 96
-		columns 1
-		direction l2r
-		final ?0
-		graphic 1
-		short-name "IPA"
-		long-name "IPA"
-		))
-
 (set-language-info-alist
  "IPA" '((charset . (ipa))
 	 (coding-priority iso-2022-7bit)
Index: lisp/mule/mule-category.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-category.el,v
retrieving revision 1.9
diff -u -r1.9 mule-category.el
--- lisp/mule/mule-category.el	2002/08/02 16:54:25	1.9
+++ lisp/mule/mule-category.el	2005/11/22 14:00:07
＠＠ -229,7 +229,7 ＠＠
     (terpri)))

 (defconst predefined-category-list
-  '((latin-iso8859-1	?l "Latin-1 through Latin-5 character set")
+  `((latin-iso8859-1	?l "Latin-1 through Latin-5 character set")
     (latin-iso8859-2	?l)
     (latin-iso8859-3	?l)
     (latin-iso8859-4	?l)
＠＠ -246,8 +246,10 ＠＠
     (chinese-gb2312	?c "Chinese GB (China, PRC) 2-byte character set")
     (chinese-cns11643-1	?t "Chinese Taiwan (CNS or Big5) 2-byte character
set")
     (chinese-cns11643-2	?t)
-    (chinese-big5-1	?t)
-    (chinese-big5-2	?t)
+    ,＠(if (find-charset 'chinese-big5-1)
+	  '((chinese-big5-1	?t)
+	    (chinese-big5-2	?t))
+	'((chinese-big5	?t)))
     (korean-ksc5601	?h "Hangul (Korean) 2-byte character set")
     )
   "List of predefined categories.
Index: lisp/mule/mule-ccl.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-ccl.el,v
retrieving revision 1.9
diff -u -r1.9 mule-ccl.el
--- lisp/mule/mule-ccl.el	2005/05/05 17:10:38	1.9
+++ lisp/mule/mule-ccl.el	2005/11/22 14:00:07
＠＠ -1,4 +1,4 ＠＠
-;;; mule-ccl.el --- CCL (Code Conversion Language) compiler -*- coding: iso-2022-7bit;
-*-
+;;; mule-ccl.el --- CCL (Code Conversion Language) compiler

 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
＠＠ -268,7 +268,6 ＠＠
 ;; the current loop.
 (defvar ccl-breaks nil)

-;;;###autoload
 (defun ccl-compile (ccl-program)
   "Return a compiled code of CCL-PROGRAM as a vector of integer."
   (if (or (null (consp ccl-program))
＠＠ -906,7 +905,6 ＠＠
 ;; To avoid byte-compiler warning.
 (defvar ccl-code)

-;;;###autoload
 (defun ccl-dump (ccl-code)
   "Disassemble compiled CCL-CODE."
   (let ((len (length ccl-code))
＠＠ -1223,7 +1221,6 ＠＠

 ;; Auto-loaded functions.

-;;;###autoload
 (defmacro declare-ccl-program (name &optional vector)
   "Declare NAME as a name of CCL program.

＠＠ -1236,7 +1233,6 ＠＠
 Optional arg VECTOR is a compiled CCL code of the CCL program."
   `(put ',name 'ccl-program-idx (register-ccl-program ',name ,vector)))

-;;;###autoload
 (defmacro define-ccl-program (name ccl-program &optional doc)
   "Set NAME to be the compiled CCL code of CCL-PROGRAM.

＠＠ -1452,7 +1448,6 ＠＠
      (put ',name 'ccl-program-idx (register-ccl-program ',name prog))
      nil))

-;;;###autoload
 (defmacro check-ccl-program (ccl-program &optional name)
   "Check validity of CCL-PROGRAM.
 If CCL-PROGRAM is a symbol denoting a CCL program, return
＠＠ -1466,7 +1461,6 ＠＠
 	     ,name)
 	 ,ccl-program)))

-;;;###autoload
 (defun ccl-execute-with-args (ccl-prog &rest args)
   "Execute CCL-PROGRAM with registers initialized by the remaining args.
 The return value is a vector of resulting CCL registers.
＠＠ -1481,7 +1475,5 ＠＠
       (setq args (cdr args) i (1+ i)))
     (ccl-execute ccl-prog reg)
     reg))
-
-(provide 'ccl)

 ;; ccl.el ends here
Index: lisp/mule/mule-charset.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-charset.el,v
retrieving revision 1.16
diff -u -r1.16 mule-charset.el
--- lisp/mule/mule-charset.el	2004/09/22 02:25:06	1.16
+++ lisp/mule/mule-charset.el	2005/11/22 14:00:07
＠＠ -1,11 +1,11 ＠＠
-;;; mule-charset.el --- Charset functions for Mule. -*- coding: iso-2022-7bit; -*-
+;;; mule-charset.el --- Charset functions for Mule.

 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Copyright (C) 1992, 2001 Free Software Foundation, Inc.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1995 Amdahl Corporation.
 ;; Copyright (C) 1996 Sun Microsystems.
-;; Copyright (C) 2002 Ben Wing.
+;; Copyright (C) 2002, 2005 Ben Wing.

 ;; Author: Unknown
 ;; Keywords: i18n, mule, internal
＠＠ -113,10 +113,11 ＠＠
 that can display the characters in CHARSET."
   (charset-property charset 'registry))

-(defun charset-ccl-program (charset)
-  "Return the CCL program of CHARSET.
+(when (featurep 'ccl)
+  (defun charset-ccl-program (charset)
+    "Return the CCL program of CHARSET.
 See `make-charset'."
-  (charset-property charset 'ccl-program))
+    (charset-property charset 'ccl-program)))

 (defun charset-bytes (charset)
   "Useless in XEmacs, returns 1."
＠＠ -130,7 +131,8 ＠＠
 ;;;; Define setf methods for all settable Charset properties

 (defsetf charset-registry    set-charset-registry)
-(defsetf charset-ccl-program set-charset-ccl-program)
+(when (featurep 'ccl)
+  (defsetf charset-ccl-program set-charset-ccl-program))

 ;;; FSF compatibility functions
 (defun charset-after (&optional pos)
＠＠ -289,11 +291,15 ＠＠
       (error 'invalid-argument "No such translation table"
table-or-name)))

+;; ＠＠####
 ;; Setup auto-fill-chars for charsets that should invoke auto-filling.
 ;; SPACE and NEWLINE are already set.
-(let ((l '(katakana-jisx0201
+(let ((l `(katakana-jisx0201
 	   japanese-jisx0208 japanese-jisx0212
-	   chinese-gb2312 chinese-big5-1 chinese-big5-2)))
+	   chinese-gb2312
+	   ,＠(if (find-charset 'chinese-big5-1)
+		 '(chinese-big5-1 chinese-big5-2)
+	       '(chinese-big5)))))
   (while l
     (put-char-table (car l) t auto-fill-chars)
     (setq l (cdr l))))
＠＠ -374,762 +380,779 ＠＠
 ;; japanese-jisx0213-1 "JISX0213.2000-1"
 ;; japanese-jisx0213-2 "JISX0213.2000-2"

-;;; Begin stuff from international/mule-conf.el.
+;;; In international/mule-conf.el in GNU Emacs.

-; ;;; Definitions of character sets.
+;;; Definitions of character sets.  We must put them here, rather than
+;;; in the individual files devoted to particular languages (as we did
+;;; before), because we need to load the Unicode tables for them
+;;; *before* loading any files containing characters from these
+;;; character sets. (If/when these files are converted to UTF-8, the
+;;; problem will conceivably go away, at least for Unicode-internal --
+;;; but then the opposite problem would exist for old-Mule, if this is
+;;; still being preserved.)

-; ;; Basic (official) character sets.  These character sets are treated
-; ;; efficiently with respect to buffer memory.
+; #### No equivalent of the following charset from FSF

-; ;; Syntax:
-; ;; (define-charset CHARSET-ID CHARSET
-; ;;   [ DIMENSION CHARS WIDTH DIRECTION ISO-FINAL-CHAR ISO-GRAPHIC-PLANE
-; ;;     SHORT-NAME LONG-NAME DESCRIPTION ])
-; ;; ASCII charset is defined in src/charset.c as below.
-; ;; (define-charset 0 ascii
-; ;;    [1 94 1 0 ?B 0 "ASCII" "ASCII" "ASCII (ISO646
IRV)"])
-
-; ;; 1-byte charsets.  Valid range of CHARSET-ID is 128..143.
-
-; ;; CHARSET-ID 128 is not used.
-
-; ; An extra level of commenting means an official (done in C) charset.
-; ; (make-charset 'latin-iso8859-1 
-; ; 	      "Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-1"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?A
-; ; 		graphic 1
-; ; 		short-name "RHP of Latin-1"
-; ; 		long-name "RHP of Latin-1 (ISO 8859-1): ISO-IR-100"
-; ; 		))
-
-; ; (make-charset 'latin-iso8859-2 
-; ; 	      "Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-2"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?B
-; ; 		graphic 1
-; ; 		short-name "RHP of Latin-2"
-; ; 		long-name "RHP of Latin-2 (ISO 8859-2): ISO-IR-101"
-; ; 		))
-
-; ; (make-charset 'latin-iso8859-3 
-; ; 	      "Right-Hand Part of Latin Alphabet 3 (ISO/IEC 8859-3): ISO-IR-109"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-3"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?C
-; ; 		graphic 1
-; ; 		short-name "RHP of Latin-3"
-; ; 		long-name "RHP of Latin-3 (ISO 8859-3): ISO-IR-109"
-; ; 		))
-
-; ; (make-charset 'latin-iso8859-4 
-; ; 	      "Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-4"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?D
-; ; 		graphic 1
-; ; 		short-name "RHP of Latin-4"
-; ; 		long-name "RHP of Latin-4 (ISO 8859-4): ISO-IR-110"
-; ; 		))
-
-; ; (make-charset 'thai-tis620 
-; ; 	      "Right-Hand Part of TIS620.2533 (Thai): ISO-IR-166"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "TIS620"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?T
-; ; 		graphic 1
-; ; 		short-name "RHP of TIS620"
-; ; 		long-name "RHP of Thai (TIS620): ISO-IR-166"
-; ; 		))
-
-; ; (make-charset 'greek-iso8859-7 
-; ; 	      "Right-Hand Part of Latin/Greek Alphabet (ISO/IEC 8859-7):
ISO-IR-126"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-7"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?F
-; ; 		graphic 1
-; ; 		short-name "RHP of ISO8859/7"
-; ; 		long-name "RHP of Greek (ISO 8859-7): ISO-IR-126"
-; ; 		))
-
-; ; (make-charset 'arabic-iso8859-6 
-; ; 	      "Right-Hand Part of Latin/Arabic Alphabet (ISO/IEC 8859-6):
ISO-IR-127"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-6"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction r2l
-; ; 		final ?G
-; ; 		graphic 1
-; ; 		short-name "RHP of ISO8859/6"
-; ; 		long-name "RHP of Arabic (ISO 8859-6): ISO-IR-127"
-; ; 		))
-
-; ; (make-charset 'hebrew-iso8859-8 
-; ; 	      "Right-Hand Part of Latin/Hebrew Alphabet (ISO/IEC 8859-8):
ISO-IR-138"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-8"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction r2l
-; ; 		final ?H
-; ; 		graphic 1
-; ; 		short-name "RHP of ISO8859/8"
-; ; 		long-name "RHP of Hebrew (ISO 8859-8): ISO-IR-138"
-; ; 		))
-
-; ; (make-charset 'katakana-jisx0201 
-; ; 	      "Katakana Part of JISX0201.1976"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "JISX0201"
-; ; 		chars 94
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?I
-; ; 		graphic 1
-; ; 		short-name "JISX0201 Katakana"
-; ; 		long-name "Japanese Katakana (JISX0201.1976)"
-; ; 		))
-
-; ; (make-charset 'latin-jisx0201 
-; ; 	      "Roman Part of JISX0201.1976"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "JISX0201"
-; ; 		chars 94
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?J
-; ; 		graphic 0
-; ; 		short-name "JISX0201 Roman"
-; ; 		long-name "Japanese Roman (JISX0201.1976)"
-; ; 		))
-
-
-; ;; CHARSET-ID is not used 139.
-
-; ; (make-charset 'cyrillic-iso8859-5 
-; ; 	      "Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5):
ISO-IR-144"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-5"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?L
-; ; 		graphic 1
-; ; 		short-name "RHP of ISO8859/5"
-; ; 		long-name "RHP of Cyrillic (ISO 8859-5): ISO-IR-144"
-; ; 		))
-
-; ; (make-charset 'latin-iso8859-9 
-; ; 	      "Right-Hand Part of Latin Alphabet 5 (ISO/IEC 8859-9): ISO-IR-148"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-9"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?M
-; ; 		graphic 1
-; ; 		short-name "RHP of Latin-5"
-; ; 		long-name "RHP of Latin-5 (ISO 8859-9): ISO-IR-148"
-; ; 		))
-
-; ; (make-charset 'latin-iso8859-15 
-; ; 	      "Right-Hand Part of Latin Alphabet 9 (ISO/IEC 8859-15): ISO-IR-203"
-; ; 	      '(dimension
-; ; 		1
-; ; 		registry "ISO8859-15"
-; ; 		chars 96
-; ; 		columns 1
-; ; 		direction l2r
-; ; 		final ?b
-; ; 		graphic 1
-; ; 		short-name "RHP of Latin-9"
-; ; 		long-name "RHP of Latin-9 (ISO 8859-15): ISO-IR-203"
-; ; 		))
-
-; (make-charset 'latin-iso8859-14 
-; 	      "Right-Hand Part of Latin Alphabet 8 (ISO/IEC 8859-14)"
+; ;; ASCII with right-to-left direction.
+; (make-charset 'ascii-right-to-left 
+; 	      "ASCII (left half of ISO 8859-1) with right-to-left direction"
 ; 	      '(dimension
 ; 		1
-; 		registry "ISO8859-14"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?_
-; 		graphic 1
-; 		short-name "RHP of Latin-8"
-; 		long-name "RHP of Latin-8 (ISO 8859-14)"
-; 		))
-
-
-; ;; 2-byte charsets.  Valid range of CHARSET-ID is 144..153.
-
-; ; (make-charset 'japanese-jisx0208-1978 
-; ; 	      "JISX0208.1978 Japanese Kanji (so called \"old JIS\"):
ISO-IR-42"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "JISX0208.1990"
-; ; 		registry "JISX0208.1978"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?＠
-; ; 		graphic 0
-; ; 		short-name "JISX0208.1978"
-; ; 		long-name "JISX0208.1978 (Japanese): ISO-IR-42"
-; ; 		))
-
-; ; (make-charset 'chinese-gb2312 
-; ; 	      "GB2312 Chinese simplified: ISO-IR-58"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "GB2312.1980"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?A
-; ; 		graphic 0
-; ; 		short-name "GB2312"
-; ; 		long-name "GB2312: ISO-IR-58"
-; ; 		))
-
-; ; (make-charset 'japanese-jisx0208 
-; ; 	      "JISX0208.1983/1990 Japanese Kanji: ISO-IR-87"
-; ; 	      '(dimension
-; ; 		2
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?B
-; ; 		graphic 0
-; ; 		short-name "JISX0208"
-; ; 		long-name "JISX0208.1983/1990 (Japanese): ISO-IR-87"
-; ; 		))
-
-; ; (make-charset 'korean-ksc5601 
-; ; 	      "KSC5601 Korean Hangul and Hanja: ISO-IR-149"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "KSC5601.1989"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?C
-; ; 		graphic 0
-; ; 		short-name "KSC5601"
-; ; 		long-name "KSC5601 (Korean): ISO-IR-149"
-; ; 		))
-
-; ; (make-charset 'japanese-jisx0212 
-; ; 	      "JISX0212 Japanese supplement: ISO-IR-159"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "JISX0212"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?D
-; ; 		graphic 0
-; ; 		short-name "JISX0212"
-; ; 		long-name "JISX0212 (Japanese): ISO-IR-159"
-; ; 		))
-
-; ; (make-charset 'chinese-cns11643-1 
-; ; 	      "CNS11643 Plane 1 Chinese traditional: ISO-IR-171"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "CNS11643.1992-1"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?G
-; ; 		graphic 0
-; ; 		short-name "CNS11643-1"
-; ; 		long-name "CNS11643-1 (Chinese traditional): ISO-IR-171"
-; ; 		))
-
-; ; (make-charset 'chinese-cns11643-2 
-; ; 	      "CNS11643 Plane 2 Chinese traditional: ISO-IR-172"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "CNS11643.1992-2"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?H
-; ; 		graphic 0
-; ; 		short-name "CNS11643-2"
-; ; 		long-name "CNS11643-2 (Chinese traditional): ISO-IR-172"
-; ; 		))
-
-; (make-charset 'japanese-jisx0213-1 "JISX0213 Plane 1 (Japanese)"
-; 	      '(dimension
-; 		2
-; 		registry "JISX0213.2000-1"
+; 		registry "ISO8859-1"
 ; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?O
+; 		columns 1
+; 		direction r2l
+; 		final ?B
 ; 		graphic 0
-; 		short-name "JISX0213-1"
-; 		long-name "JISX0213-1"
+; 		short-name "rev ASCII"
+; 		long-name "ASCII with right-to-left direction"
 ; 		))

-; ; (make-charset 'chinese-big5-1 
-; ; 	      "Frequently used part (A141-C67F) of Big5 (Chinese traditional)"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "Big5"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?0
-; ; 		graphic 0
-; ; 		short-name "Big5 (Level-1)"
-; ; 		long-name "Big5 (Level-1) A141-C67F"
-; ; 		))
-
-; ; (make-charset 'chinese-big5-2 
-; ; 	      "Less frequently used part (C940-FEFE) of Big5 (Chinese
traditional)"
-; ; 	      '(dimension
-; ; 		2
-; ; 		registry "Big5"
-; ; 		chars 94
-; ; 		columns 2
-; ; 		direction l2r
-; ; 		final ?1
-; ; 		graphic 0
-; ; 		short-name "Big5 (Level-2)"
-; ; 		long-name "Big5 (Level-2) C940-FEFE"
-; ; 		))
-
-
-; ;; Additional (private) character sets.  These character sets are
-; ;; treated less space-efficiently in the buffer.
-
-; ;; Syntax:
-; ;; (define-charset CHARSET-ID CHARSET
-; ;;   [ DIMENSION CHARS WIDTH DIRECTION ISO-FINAL-CHAR ISO-GRAPHIC-PLANE
-; ;;     SHORT-NAME LONG-NAME DESCRIPTION ])

 ; ;; ISO-2022 allows a use of character sets not registered in ISO with
 ; ;; final characters `0' (0x30) through `?' (0x3F).  Among them, Emacs
 ; ;; reserves `0' through `9' to support several private character sets.
 ; ;; The remaining final characters `:' through `?' are for users.
-
-; ;; 1-byte 1-column charsets.  Valid range of CHARSET-ID is 160..223.

-; (make-charset 'chinese-sisheng 
-; 	      "SiSheng characters for PinYin/ZhuYin"
+; (make-charset 'latin-iso8859-1 
+; 	      "Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100"
 ; 	      '(dimension
 ; 		1
-; 		registry "sisheng_cwnn"
-; 		chars 94
-; 		columns 1
-; 		direction l2r
-; 		final ?0
-; 		graphic 0
-; 		short-name "SiSheng"
-; 		long-name "SiSheng (PinYin/ZhuYin)"
+; 		registry "ISO8859-1"
+; 		chars 96
+; 		final ?A
+; 		graphic 1
+; 		short-name "Latin-1"
+; 		long-name "RHP of Latin-1 (ISO 8859-1): ISO-IR-100"
 ; 		))

+; (make-charset 'latin-iso8859-2 
+; 	      "Right-Hand Part of Latin Alphabet 2 (ISO/IEC 8859-2): ISO-IR-101"
+; 	      '(dimension
+; 		1
+; 		registry "ISO8859-2"
+; 		chars 96
+; 		final ?B
+; 		graphic 1
+; 		short-name "Latin-2"
+; 		long-name "RHP of Latin-2 (ISO 8859-2): ISO-IR-101"
+; 		))

-; ;; IPA characters for phonetic symbols.
-; (make-charset 'ipa "IPA (International Phonetic Association)"
+; (make-charset 'latin-iso8859-3 
+; 	      "Right-Hand Part of Latin Alphabet 3 (ISO/IEC 8859-3): ISO-IR-109"
 ; 	      '(dimension
 ; 		1
-; 		registry "MuleIPA"
+; 		registry "ISO8859-3"
 ; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?0
+; 		final ?C
 ; 		graphic 1
-; 		short-name "IPA"
-; 		long-name "IPA"
+; 		short-name "Latin-3"
+; 		long-name "RHP of Latin-3 (ISO 8859-3): ISO-IR-109"
 ; 		))

+; (make-charset 'latin-iso8859-4 
+; 	      "Right-Hand Part of Latin Alphabet 4 (ISO/IEC 8859-4): ISO-IR-110"
+; 	      '(dimension
+; 		1
+; 		registry "ISO8859-4"
+; 		chars 96
+; 		final ?D
+; 		graphic 1
+; 		short-name "Latin-4"
+; 		long-name "RHP of Latin-4 (ISO 8859-4): ISO-IR-110"
+; 		))

-; ;; Vietnamese VISCII.  VISCII is 1-byte character set which contains
-; ;; more than 96 characters.  Since Emacs can't handle it as one
-; ;; character set, it is divided into two: lower case letters and upper
-; ;; case letters.
-; (make-charset 'vietnamese-viscii-lower "VISCII1.1 lower-case"
+; (make-charset 'latin-iso8859-9 
+; 	      "Right-Hand Part of Latin Alphabet 5 (ISO/IEC 8859-9): ISO-IR-148"
 ; 	      '(dimension
 ; 		1
-; 		registry "VISCII1.1"
+; 		registry "ISO8859-9"
 ; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?1
+; 		final ?M
 ; 		graphic 1
-; 		short-name "VISCII lower"
-; 		long-name "VISCII lower-case"
+; 		short-name "Latin-5"
+; 		long-name "RHP of Latin-5 (ISO 8859-9): ISO-IR-148"
 ; 		))

-; (make-charset 'vietnamese-viscii-upper "VISCII1.1 upper-case"
+(make-charset 'latin-iso8859-10
+	      "Supplementary Set for Latin Alphabet No. 6 (ISO/IEC 8859-10): ISO-IR-157
+\"This set is intended for a version of ISO 4873 using the coding method of
+ISO 8859 and requiring the character repertoires of the languages used in
+Northern Europe.\""
+	      '(dimension
+		1
+		registry "ISO8859-10"
+		chars 96
+		final ?V ;; 0x56 aka octet 5/6
+		graphic 1
+		short-name "Latin-6 (Northern Europe)"
+		long-name "RHP of Latin-6 (Northern Europe) (ISO 8859-10): ISO-IR-157"
+		))
+
+(make-charset 'latin-iso8859-13
+	      "Baltic Rim Supplementary Set (Latin-7) (ISO/IEC 8859-13): ISO-IR-179"
+	      '(dimension
+		1
+		registry "ISO8859-13"
+		chars 96
+		final ?Y ;; 0x59 aka octet 5/9
+		graphic 1
+		short-name "Latin-7 (Baltic Rim)"
+		long-name "RHP of Latin-7 (Baltic Rim) (ISO 8859-13): ISO-IR-179"
+		))
+
+(make-charset 'latin-iso8859-14 
+	      "Celtic Supplementary Latin Set (Latin-8) (ISO/IEC 8859-14): ISO-IR-199
+FIELD OF UTILIZATION: \"Communication and processing of text in the Celtic
+languages, especially Welsh and Irish Gaelic. The set also provides for the
+languages enumerated in ISO/IEC 8859-1 (though French is not fully
+covered).\""
+	      '(dimension
+		1
+		registry "ISO8859-14"
+		chars 96
+		final ?_
+		graphic 1
+		short-name "Latin-8 (Celtic)"
+		long-name "RHP of Latin-8 (Celtic) (ISO 8859-14): ISO-IR-199"
+		))
+
+; (make-charset 'latin-iso8859-15 
+; 	      "European Supplementary Latin Set (\"Latin 9\") (Euro Sign)
(ISO/IEC 8859-15): ISO-IR-203
+;FIELD OF UTILIZATION: \"Communication and processing of text in European
+;languages. The set provides for the languages enumerated in ISO/IEC
+;8859-1. In addition, it contains the EURO SIGN and provides support for the
+;French, and Finnish languages in addition.\""
 ; 	      '(dimension
 ; 		1
-; 		registry "VISCII1.1"
+; 		registry "ISO8859-15"
 ; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?2
+; 		final ?b
 ; 		graphic 1
-; 		short-name "VISCII upper"
-; 		long-name "VISCII upper-case"
+; 		short-name "Latin-9 (Euro Sign)"
+; 		long-name "RHP of Latin-9 (Euro Sign) (ISO 8859-15): ISO-IR-203"
 ; 		))

+(make-charset 'latin-iso8859-16
+	      "Romanian Character Set for Information Interchange (Latin-10) (ISO/IEC
8859-16): ISO-IR-226
+FIELD OF UTILIZATION: \"Communication, processing, transfer of text in the
+Romanian language\""
+	      '(dimension
+		1
+		registry "ISO8859-16"
+		chars 96
+		final ?f			; octet 06/06; cf ISO-IR 226
+		graphic 1
+		short-name "Latin-10 (Romanian)"
+		long-name "RHP of Latin-10 (Romanian) (ISO 8859-16): ISO-IR-226"
+		))

-; ;; For Arabic, we need three different types of character sets.
-; ;; Digits are of direction left-to-right and of width 1-column.
-; ;; Others are of direction right-to-left and of width 1-column or
-; ;; 2-column.
-; (make-charset 'arabic-digit "Arabic digit"
+; (make-charset 'chinese-gb2312 
+; 	      "GB2312 Chinese simplified: ISO-IR-58"
 ; 	      '(dimension
-; 		1
-; 		registry "MuleArabic-0"
+; 		2
+; 		registry "GB2312.1980"
 ; 		chars 94
-; 		columns 1
-; 		direction l2r
-; 		final ?2
+; 		final ?A
 ; 		graphic 0
-; 		short-name "Arabic digit"
-; 		long-name "Arabic digit"
+; 		short-name "GB2312"
+; 		long-name "GB2312: ISO-IR-58"
 ; 		))

-; (make-charset 'arabic-1-column "Arabic 1-column"
+; (make-charset 'chinese-cns11643-1 
+; 	      "CNS11643 Plane 1 Chinese traditional: ISO-IR-171"
 ; 	      '(dimension
-; 		1
-; 		registry "MuleArabic-1"
+; 		2
+; 		registry "CNS11643.1992-1"
 ; 		chars 94
-; 		columns 1
-; 		direction r2l
-; 		final ?3
+; 		final ?G
 ; 		graphic 0
-; 		short-name "Arabic 1-col"
-; 		long-name "Arabic 1-column"
+; 		short-name "CNS11643-1"
+; 		long-name "CNS11643-1 (Chinese traditional): ISO-IR-171"
 ; 		))

-
-; ;; ASCII with right-to-left direction.
-; (make-charset 'ascii-right-to-left 
-; 	      "ASCII (left half of ISO 8859-1) with right-to-left direction"
+; (make-charset 'chinese-cns11643-2 
+; 	      "CNS11643 Plane 2 Chinese traditional: ISO-IR-172"
 ; 	      '(dimension
-; 		1
-; 		registry "ISO8859-1"
+; 		2
+; 		registry "CNS11643.1992-2"
 ; 		chars 94
-; 		columns 1
-; 		direction r2l
-; 		final ?B
+; 		final ?H
 ; 		graphic 0
-; 		short-name "rev ASCII"
-; 		long-name "ASCII with right-to-left direction"
+; 		short-name "CNS11643-2"
+; 		long-name "CNS11643-2 (Chinese traditional): ISO-IR-172"
 ; 		))

+; (make-charset 'chinese-big5-1 
+; 	      "Frequently used part (A141-C67F) of Big5 (Chinese traditional)"
+; 	      '(dimension
+; 		2
+; 		registry "Big5"
+; 		chars 94
+; 		final ?0
+; 		graphic 0
+; 		short-name "Big5 (Level-1)"
+; 		long-name "Big5 (Level-1) A141-C67F"
+; 		))

-; ;; Lao script.
-; ;; ISO10646's 0x0E80..0x0EDF are mapped to 0x20..0x7F.
-; (make-charset 'lao "Lao characters (ISO10646 0E80..0EDF)"
+; (make-charset 'chinese-big5-2 
+; 	      "Less frequently used part (C940-FEFE) of Big5 (Chinese traditional)"
 ; 	      '(dimension
-; 		1
-; 		registry "MuleLao-1"
+; 		2
+; 		registry "Big5"
 ; 		chars 94
-; 		columns 1
-; 		direction l2r
 ; 		final ?1
 ; 		graphic 0
-; 		short-name "Lao"
-; 		long-name "Lao"
+; 		short-name "Big5 (Level-2)"
+; 		long-name "Big5 (Level-2) C940-FEFE"
 ; 		))

-
-; ;; CHARSET-IDs 168..223 are not used.
-
-; ;; 1-byte 2-column charsets.  Valid range of CHARSET-ID is 224..239.
-
-; (make-charset 'arabic-2-column "Arabic 2-column"
+; ;; PinYin-ZhuYin
+; (make-charset 'chinese-sisheng 
+; 	      "SiSheng characters for PinYin/ZhuYin"
 ; 	      '(dimension
 ; 		1
-; 		registry "MuleArabic-2"
+; 		;; XEmacs addition: second half of registry spec
+; 		registry "sisheng_cwnn\\|OMRON_UDC_ZH"
 ; 		chars 94
-; 		columns 2
-; 		direction r2l
-; 		final ?4
+; 		final ?0
 ; 		graphic 0
-; 		short-name "Arabic 2-col"
-; 		long-name "Arabic 2-column"
+; 		short-name "SiSheng"
+; 		long-name "SiSheng (PinYin/ZhuYin)"
 ; 		))

-
-; ;; Indian scripts.  Symbolic charset for data exchange.  Glyphs are
-; ;; not assigned.  They are automatically converted to each Indian
-; ;; script which IS-13194 supports.
-
-; (make-charset 'indian-is13194 
-; 	      "Generic Indian charset for data exchange with IS 13194"
+; ;; Chinese CNS11643 Plane3 thru Plane7.  Although these are official
+; ;; character sets, the use is rare and don't have to be treated
+; ;; space-efficiently in the buffer.
+; (make-charset 'chinese-cns11643-3 
+; 	      "CNS11643 Plane 3 Chinese Traditional: ISO-IR-183"
 ; 	      '(dimension
-; 		1
-; 		registry "IS13194-Devanagari"
+; 		2
+; 		registry "CNS11643.1992-3"
 ; 		chars 94
 ; 		columns 2
 ; 		direction l2r
-; 		final ?5
-; 		graphic 1
-; 		short-name "IS 13194"
-; 		long-name "Indian IS 13194"
+; 		final ?I
+; 		graphic 0
+; 		short-name "CNS11643-3"
+; 		long-name "CNS11643-3 (Chinese traditional): ISO-IR-183"
 ; 		))
-

-; ;; CHARSET-IDs 226..239 are not used.
+;; CNS11643 Plane3 thru Plane7
+;; These represent more and more obscure Chinese characters.
+;; By the time you get to Plane 7, we're talking about characters
+;; that appear once in some ancient manuscript and whose meaning
+;; is unknown.
+
+(flet
+    ((make-chinese-cns11643-charset
+      (name plane final)
+      (make-charset
+       name (concat "CNS 11643 Plane " plane " (Chinese
traditional)")
+       `(registry 
+         ,(concat "CNS11643[.-]\\(.*[.-]\\)?" plane "$")
+         dimension 2
+         chars 94
+         final ,final
+         graphic 0
+	 short-name ,(concat "CNS11643-" plane)
+	 long-name ,(format "CNS11643-%s (Chinese traditional): ISO-IR-183"
+			    plane)))
+      ))
+  (make-chinese-cns11643-charset 'chinese-cns11643-3 "3" ?I)
+  (make-chinese-cns11643-charset 'chinese-cns11643-4 "4" ?J)
+  (make-chinese-cns11643-charset 'chinese-cns11643-5 "5" ?K)
+  (make-chinese-cns11643-charset 'chinese-cns11643-6 "6" ?L)
+  (make-chinese-cns11643-charset 'chinese-cns11643-7 "7" ?M)
+  )
+
+;; ISO-IR-165 (CCITT Extended GB)
+;;    It is based on CCITT Recommendation T.101, includes GB 2312-80 +
+;;    GB 8565-88 table A4 + 293 characters.
+(make-charset ;; not in FSF 21.1
+ 'chinese-isoir165
+ "ISO-IR-165 (CCITT Extended GB; Chinese simplified)"
+ `(registry "isoir165"
+   dimension 2
+   chars 94
+   final ?E
+   graphic 0
+   short-name "ISO-IR-165"
+   long-name "ISO-IR-165 (CCITT Extended GB; Chinese simplified)"))

-; ;; 2-byte 1-column charsets.  Valid range of CHARSET-ID is 240..244.
-
-; ;; Actual Glyph for 1-column width.
-; (make-charset 'indian-1-column 
-; 	      "Indian charset for 2-column width glyphs"
+; (make-charset 'katakana-jisx0201 
+; 	      "Katakana Part of JISX0201.1976"
 ; 	      '(dimension
-; 		2
-; 		registry "MuleIndian-1"
+; 		1
+; 		registry "JISX0201"
 ; 		chars 94
-; 		columns 1
-; 		direction l2r
-; 		final ?6
-; 		graphic 0
-; 		short-name "Indian 1-col"
-; 		long-name "Indian 1 Column"
+; 		final ?I
+; 		graphic 1
+; 		short-name "JISX0201 Katakana"
+; 		long-name "Japanese Katakana (JISX0201.1976)"
 ; 		))

-
-; (make-charset 'tibetan-1-column "Tibetan 1 column glyph"
+; (make-charset 'latin-jisx0201 
+; 	      "Roman Part of JISX0201.1976"
 ; 	      '(dimension
-; 		2
-; 		registry "MuleTibetan-1"
+; 		1
+; 		registry "JISX0201"
 ; 		chars 94
-; 		columns 1
-; 		direction l2r
-; 		final ?8
+; 		final ?J
 ; 		graphic 0
-; 		short-name "Tibetan 1-col"
-; 		long-name "Tibetan 1 column"
+; 		short-name "JISX0201 Roman"
+; 		long-name "Japanese Roman (JISX0201.1976)"
 ; 		))
-
-
-; ;; Subsets of Unicode.

-; (make-charset 'mule-unicode-2500-33ff 
-; 	      "Unicode characters of the range U+2500..U+33FF."
+; (make-charset 'japanese-jisx0208-1978 
+; 	      "JISX0208.1978 Japanese Kanji (so called \"old JIS\"):
ISO-IR-42"
 ; 	      '(dimension
 ; 		2
-; 		registry "ISO10646-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?2
+; 		registry "JISX0208.1990"
+; 		registry "JISX0208.1978"
+; 		chars 94
+; 		final ?＠
 ; 		graphic 0
-; 		short-name "Unicode subset 2"
-; 		long-name "Unicode subset (U+2500..U+33FF)"
+; 		short-name "JISX0208.1978"
+; 		long-name "JISX0208.1978 (Japanese): ISO-IR-42"
 ; 		))

-
-; (make-charset 'mule-unicode-e000-ffff 
-; 	      "Unicode characters of the range U+E000..U+FFFF."
+; (make-charset 'japanese-jisx0208 
+; 	      "JISX0208.1983/1990 Japanese Kanji: ISO-IR-87"
 ; 	      '(dimension
 ; 		2
-; 		registry "ISO10646-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?3
+; 		chars 94
+; 		final ?B
 ; 		graphic 0
-; 		short-name "Unicode subset 3"
-; 		long-name "Unicode subset (U+E000+FFFF)"
+; 		short-name "JISX0208"
+; 		long-name "JISX0208.1983/1990 (Japanese): ISO-IR-87"
 ; 		))
-

-; (make-charset 'mule-unicode-0100-24ff 
-; 	      "Unicode characters of the range U+0100..U+24FF."
+; (make-charset 'japanese-jisx0212 
+; 	      "JISX0212 Japanese supplement: ISO-IR-159"
 ; 	      '(dimension
 ; 		2
-; 		registry "ISO10646-1"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?1
+; 		registry "JISX0212"
+; 		chars 94
+; 		final ?D
 ; 		graphic 0
-; 		short-name "Unicode subset"
-; 		long-name "Unicode subset (U+0100..U+24FF)"
+; 		short-name "JISX0212"
+; 		long-name "JISX0212 (Japanese): ISO-IR-159"
 ; 		))

+(make-charset 'japanese-jisx0213-1 "JISX0213 Plane 1 (Japanese)"
+	      '(dimension
+		2
+		registry "JISX0213.2000-1"
+		chars 94
+		final ?O
+		graphic 0
+		short-name "JISX0213-1"
+		long-name "JISX0213-1"
+		))
+
+;; JISX0213 Plane 2
+(make-charset 'japanese-jisx0213-2 "JISX0213 Plane 2 (Japanese)"
+	      '(dimension
+		2
+		registry "JISX0213.2000-2"
+		chars 94
+		final ?P
+		graphic 0
+		short-name "JISX0213-2"
+		long-name "JISX0213-2"
+		))

-; ;; 2-byte 2-column charsets.  Valid range of CHARSET-ID is 245..254.
-
-; ;; Ethiopic characters (Amahric and Tigrigna).
-; (make-charset 'ethiopic "Ethiopic characters"
+; (make-charset 'korean-ksc5601 
+; 	      "KSC5601 Korean Hangul and Hanja: ISO-IR-149"
 ; 	      '(dimension
 ; 		2
-; 		registry "Ethiopic-Unicode"
+; 		registry "KSC5601.1989"
 ; 		chars 94
 ; 		columns 2
 ; 		direction l2r
-; 		final ?3
+; 		final ?C
 ; 		graphic 0
-; 		short-name "Ethiopic"
-; 		long-name "Ethiopic characters"
+; 		short-name "KSC5601"
+; 		long-name "KSC5601 (Korean): ISO-IR-149"
 ; 		))

+;; See comments in mule-coding.c.
+;; Hangul uses the range [84 - D3], [41 - 7E, 81 - FE]
+;; Symbols and Hanja use [D8 - DE, E0 - F9], [31 - 7E, 91 - FE]
+;; So for our purposes, this is [84 - F9], [31 - FE] */
+(make-charset 'korean-johab
+	      "Johab (Korean)"
+	      '(dimension
+		2
+		registry "johab" ;; ＠＠#### FIXME
+		chars (118 206)
+		offset (#x84 #x31)
+		short-name "Johab"
+		long-name "Johab (Korean)"
+		))
+
+;; Vietnamese VISCII.  VISCII is 1-byte character set which contains
+;; more than 96 characters.  Since Emacs can't handle it as one
+;; character set, it is divided into two: lower case letters and upper
+;; case letters.
+(make-charset 'vietnamese-viscii-lower "VISCII1.1 lower-case"
+	      '(dimension
+		1
+		registry "VISCII1.1"
+		chars 96
+		final ?1
+		graphic 1
+		short-name "VISCII lower"
+		long-name "VISCII lower-case"
+		))
+
+(make-charset 'vietnamese-viscii-upper "VISCII1.1 upper-case"
+	      '(dimension
+		1
+		registry "VISCII1.1"
+		chars 96
+		final ?2
+		graphic 1
+		short-name "VISCII upper"
+		long-name "VISCII upper-case"
+		))

-; ;; Chinese CNS11643 Plane3 thru Plane7.  Although these are official
-; ;; character sets, the use is rare and don't have to be treated
-; ;; space-efficiently in the buffer.
-; (make-charset 'chinese-cns11643-3 
-; 	      "CNS11643 Plane 3 Chinese Traditional: ISO-IR-183"
+; (make-charset 'greek-iso8859-7 
+; 	      "Right-Hand Part of Latin/Greek Alphabet (ISO/IEC 8859-7):
ISO-IR-126"
 ; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-3"
-; 		chars 94
-; 		columns 2
+; 		1
+; 		registry "ISO8859-7"
+; 		chars 96
+; 		columns 1
 ; 		direction l2r
-; 		final ?I
-; 		graphic 0
-; 		short-name "CNS11643-3"
-; 		long-name "CNS11643-3 (Chinese traditional): ISO-IR-183"
+; 		final ?F
+; 		graphic 1
+; 		short-name "RHP of ISO8859/7"
+; 		long-name "RHP of Greek (ISO 8859-7): ISO-IR-126"
 ; 		))

-; (make-charset 'chinese-cns11643-4 
-; 	      "CNS11643 Plane 4 Chinese Traditional: ISO-IR-184"
+; (make-charset 'cyrillic-iso8859-5 
+; 	      "Right-Hand Part of Latin/Cyrillic Alphabet (ISO/IEC 8859-5):
ISO-IR-144"
 ; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-4"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?J
-; 		graphic 0
-; 		short-name "CNS11643-4"
-; 		long-name "CNS11643-4 (Chinese traditional): ISO-IR-184"
+; 		1
+; 		registry "ISO8859-5"
+; 		chars 96
+; 		final ?L
+; 		graphic 1
+; 		short-name "RHP of ISO8859/5"
+; 		long-name "RHP of Cyrillic (ISO 8859-5): ISO-IR-144"
 ; 		))

-; (make-charset 'chinese-cns11643-5 
-; 	      "CNS11643 Plane 5 Chinese Traditional: ISO-IR-185"
+(make-charset 'cyrillic-koi8-r
+	      "Cyrillic KOI8-R"
+	      '(dimension
+		1
+		chars 256
+		short-name "Cyrillic KOI8-R"
+		long-name "Cyrillic KOI8-R"
+		))
+
+(make-charset 'cyrillic-alternativnyj
+	      "Cyrillic Alternativnyj"
+	      '(dimension
+		1
+		chars 256
+		short-name "Cyrillic Alternativnyj"
+		long-name "Cyrillic Alternativnyj"
+		))
+
+; (make-charset 'hebrew-iso8859-8 
+; 	      "Right-Hand Part of Latin/Hebrew Alphabet (ISO/IEC 8859-8):
ISO-IR-138"
 ; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-5"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?K
-; 		graphic 0
-; 		short-name "CNS11643-5"
-; 		long-name "CNS11643-5 (Chinese traditional): ISO-IR-185"
+; 		1
+; 		registry "ISO8859-8"
+; 		chars 96
+; 		columns 1
+; 		direction r2l
+; 		final ?H
+; 		graphic 1
+; 		short-name "RHP of ISO8859/8"
+; 		long-name "RHP of Hebrew (ISO 8859-8): ISO-IR-138"
 ; 		))

-; (make-charset 'chinese-cns11643-6 
-; 	      "CNS11643 Plane 6 Chinese Traditional: ISO-IR-186"
+; (make-charset 'arabic-iso8859-6 
+; 	      "Right-Hand Part of Latin/Arabic Alphabet (ISO/IEC 8859-6):
ISO-IR-127"
 ; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-6"
-; 		chars 94
-; 		columns 2
-; 		direction l2r
-; 		final ?L
-; 		graphic 0
-; 		short-name "CNS11643-6"
-; 		long-name "CNS11643-6 (Chinese traditional): ISO-IR-186"
+; 		1
+; 		registry "ISO8859-6"
+; 		chars 96
+; 		columns 1
+; 		direction r2l
+; 		final ?G
+; 		graphic 1
+; 		short-name "RHP of ISO8859/6"
+; 		long-name "RHP of Arabic (ISO 8859-6): ISO-IR-127"
 ; 		))
+
+;; For Arabic, we need three different types of character sets.
+;; Digits are of direction left-to-right and of width 1-column.
+;; Others are of direction right-to-left and of width 1-column or
+;; 2-column.
+(make-charset 'arabic-digit "Arabic digit"
+	      '(dimension
+		1
+		registry "MuleArabic-0"
+		chars 94
+		columns 1
+		direction l2r
+		final ?2
+		graphic 0
+		short-name "Arabic digit"
+		long-name "Arabic digit"
+		))
+
+(make-charset 'arabic-1-column "Arabic 1-column"
+	      '(dimension
+		1
+		registry "MuleArabic-1"
+		chars 94
+		columns 1
+		direction r2l
+		final ?3
+		graphic 0
+		short-name "Arabic 1-col"
+		long-name "Arabic 1-column"
+		))
+
+(make-charset 'arabic-2-column "Arabic 2-column"
+	      '(dimension
+		1
+		registry "MuleArabic-2"
+		chars 94
+		columns 2
+		direction r2l
+		final ?4
+		graphic 0
+		short-name "Arabic 2-col"
+		long-name "Arabic 2-column"
+		))

-; (make-charset 'chinese-cns11643-7 
-; 	      "CNS11643 Plane 7 Chinese Traditional: ISO-IR-187"
+; (make-charset 'thai-tis620 
+; 	      "Right-Hand Part of TIS620.2533 (Thai): ISO-IR-166"
 ; 	      '(dimension
-; 		2
-; 		registry "CNS11643.1992-7"
-; 		chars 94
-; 		columns 2
+; 		1
+; 		registry "TIS620"
+; 		chars 96
+; 		columns 1
 ; 		direction l2r
-; 		final ?M
-; 		graphic 0
-; 		short-name "CNS11643-7"
-; 		long-name "CNS11643-7 (Chinese traditional): ISO-IR-187"
+; 		final ?T
+; 		graphic 1
+; 		short-name "RHP of TIS620"
+; 		long-name "RHP of Thai (TIS620): ISO-IR-166"
 ; 		))

+(make-charset 'thai-xtis "Precomposed Thai (XTIS by Virach)."
+	      '(registry "xtis-0"
+			 dimension 2
+			 columns 1
+			 chars 94
+			 final ??
+			 graphic 0))
+
+; ;; Indian scripts.  Symbolic charset for data exchange.  Glyphs are
+; ;; not assigned.  They are automatically converted to each Indian
+; ;; script which IS-13194 supports.
+
+(make-charset 'indian-is13194 
+	      "Generic Indian charset for data exchange with IS 13194"
+	      '(dimension
+		1
+		registry "IS13194-Devanagari"
+		chars 94
+		columns 2
+		direction l2r
+		final ?5
+		graphic 1
+		short-name "IS 13194"
+		long-name "Indian IS 13194"
+		))
+
+;; Actual Glyph for 1-column width.
+(make-charset 'indian-1-column 
+	      "Indian charset for 2-column width glyphs"
+	      '(dimension
+		2
+		registry "MuleIndian-1"
+		chars 94
+		columns 1
+		direction l2r
+		final ?6
+		graphic 0
+		short-name "Indian 1-col"
+		long-name "Indian 1 Column"
+		))
+
+;; Actual Glyph for 2-column width.
+(make-charset 'indian-2-column 
+	      "Indian charset for 2-column width glyphs"
+	      '(dimension
+		2
+		registry "MuleIndian-2"
+		chars 94
+		columns 2
+		direction l2r
+		final ?5
+		graphic 0
+		short-name "Indian 2-col"
+		long-name "Indian 2 Column"
+		))
+
+;; Lao script.
+;; ISO10646's 0x0E80..0x0EDF are mapped to 0x20..0x7F.
+(make-charset 'lao "Lao characters (ISO10646 0E80..0EDF)"
+	      '(dimension
+		1
+		registry "MuleLao-1"
+		chars 94
+		columns 1
+		direction l2r
+		final ?1
+		graphic 0
+		short-name "Lao"
+		long-name "Lao"
+		))
+
+;; Ethiopic characters (Amahric and Tigrigna).
+(make-charset 'ethiopic "Ethiopic characters"
+	      '(dimension
+		2
+		registry "Ethiopic-Unicode"
+		chars 94
+		final ?3
+		graphic 0
+		short-name "Ethiopic"
+		long-name "Ethiopic characters"
+		))
+
+(make-charset 'tibetan-1-column "Tibetan 1 column glyph"
+	      '(dimension
+		2
+		registry "MuleTibetan-1"
+		chars 94
+		columns 1
+		direction l2r
+		final ?8
+		graphic 0
+		short-name "Tibetan 1-col"
+		long-name "Tibetan 1 column"
+		))
+
+;; Tibetan script.
+(make-charset 'tibetan "Tibetan characters"
+	      '(dimension
+		2
+		registry "MuleTibetan-2"
+		chars 94
+		columns 2
+		direction l2r
+		final ?7
+		graphic 0
+		short-name "Tibetan 2-col"
+		long-name "Tibetan 2 column"
+		))
+
+;; IPA characters for phonetic symbols.
+(make-charset 'ipa "IPA (International Phonetic Association)"
+	      '(dimension
+		1
+		registry "MuleIPA"
+		chars 96
+		columns 1
+		direction l2r
+		final ?0
+		graphic 1
+		short-name "IPA"
+		long-name "IPA"
+		))

-; ;; Actual Glyph for 2-column width.
-; (make-charset 'indian-2-column 
-; 	      "Indian charset for 2-column width glyphs"
+; ;; Subsets of Unicode.
+
+; #### what is this bogosity ... "chars 96, final ?2" !!?!
+; (make-charset 'mule-unicode-2500-33ff 
+; 	      "Unicode characters of the range U+2500..U+33FF."
 ; 	      '(dimension
 ; 		2
-; 		registry "MuleIndian-2"
-; 		chars 94
-; 		columns 2
+; 		registry "ISO10646-1"
+; 		chars 96
+; 		columns 1
 ; 		direction l2r
-; 		final ?5
+; 		final ?2
 ; 		graphic 0
-; 		short-name "Indian 2-col"
-; 		long-name "Indian 2 Column"
+; 		short-name "Unicode subset 2"
+; 		long-name "Unicode subset (U+2500..U+33FF)"
 ; 		))

-; ;; Tibetan script.
-; (make-charset 'tibetan "Tibetan characters"
+; (make-charset 'mule-unicode-e000-ffff 
+; 	      "Unicode characters of the range U+E000..U+FFFF."
 ; 	      '(dimension
 ; 		2
-; 		registry "MuleTibetan-2"
-; 		chars 94
-; 		columns 2
+; 		registry "ISO10646-1"
+; 		chars 96
+; 		columns 1
 ; 		direction l2r
-; 		final ?7
+; 		final ?3
 ; 		graphic 0
-; 		short-name "Tibetan 2-col"
-; 		long-name "Tibetan 2 column"
+; 		short-name "Unicode subset 3"
+; 		long-name "Unicode subset (U+E000+FFFF)"
 ; 		))

-; ;; CHARSET-ID 253 is not used.
-
-; ;; JISX0213 Plane 2
-; (make-charset 'japanese-jisx0213-2 "JISX0213 Plane 2 (Japanese)"
+; (make-charset 'mule-unicode-0100-24ff 
+; 	      "Unicode characters of the range U+0100..U+24FF."
 ; 	      '(dimension
 ; 		2
-; 		registry "JISX0213.2000-2"
-; 		chars 94
-; 		columns 2
+; 		registry "ISO10646-1"
+; 		chars 96
+; 		columns 1
 ; 		direction l2r
-; 		final ?P
+; 		final ?1
 ; 		graphic 0
-; 		short-name "JISX0213-2"
-; 		long-name "JISX0213-2"
+; 		short-name "Unicode subset"
+; 		long-name "Unicode subset (U+0100..U+24FF)"
 ; 		))
+
+(let ((charsets '((874 thai "Thai")
+		  (1250 latin "Eastern Europe")
+		  (1251 cyrillic "Cyrillic")
+		  (1252 latin "ANSI")
+		  (1253 greek "Greek")
+		  (1254 latin "Turkish")
+		  (1255 hebrew "Hebrew")
+		  (1256 arabic "Arabic")
+		  (1257 latin "Baltic Rim")
+		  (1258 latin "Vietnamese"))))
+  (loop for (num script name) in charsets do
+    (make-charset (intern (format "%s-windows-%s" script num))
+		  (format "Windows code page %s (%s)" num name)
+		  `(dimension
+		    1
+		    chars 256
+		    short-name ,(format "Windows %s (%s)" num name)
+		    long-name ,(format "Windows code page %s (%s)" num name)
+		    ))))
+
+(let ((charsets '((932 japanese "Japanese" #x81 #x40 #xfe #xfe)
+		  (936 chinese "Simplified Chinese" #x81 #x40 #xfe #xfe)
+		  (949 korean "Korean" #x81 #x41 #xfe #xfe)
+		  (950 chinese "Traditional Chinese" #xa1 #x40 #xfe #xfe)
+		  )))
+  (loop for (num script name l1 l2 h1 h2) in charsets do
+    (make-charset (intern (format "%s-windows-%s" script num))
+		  (format "Windows code page %s (%s)" num name)
+		  `(dimension
+		    2
+		    chars (,(1+ (- h1 l1)) ,(1+ (- h2 l2)))
+		    offset (,l1 ,l2)
+		    short-name ,(format "Windows %s (%s)" num name)
+		    long-name ,(format "Windows code page %s (%s)" num name)
+		    ))))

 ;;; mule-charset.el ends here

Index: lisp/mule/mule-cmds.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-cmds.el,v
retrieving revision 1.25
diff -u -r1.25 mule-cmds.el
--- lisp/mule/mule-cmds.el	2005/10/04 16:43:36	1.25
+++ lisp/mule/mule-cmds.el	2005/11/22 14:00:08
＠＠ -1,4 +1,4 ＠＠
-;;; mule-cmds.el --- Commands for multilingual environment -*- coding: iso-2022-7bit;
-*-
+;;; mule-cmds.el --- Commands for multilingual environment

 ;; Copyright (C) 1995,1999 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
＠＠ -1345,8 +1345,8 ＠＠
 (defun init-mule-at-startup ()
   "Initialize MULE environment at startup.  Don't call this."

-  (when (not load-unicode-tables-at-dump-time)
-    (load-unicode-tables))
+;   (when (not load-unicode-tables-at-dump-time)
+;     (load-unicode-tables))

   ;; This is called (currently; might be moved earlier) from startup.el,
   ;; after the basic GUI systems have been initialized, and just before the
Index: lisp/mule/mule-coding.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-coding.el,v
retrieving revision 1.19
diff -u -r1.19 mule-coding.el
--- lisp/mule/mule-coding.el	2002/03/13 08:52:17	1.19
+++ lisp/mule/mule-coding.el	2005/11/22 14:00:08
＠＠ -1,11 +1,11 ＠＠
-;;; mule-coding.el --- Coding-system functions for Mule. -*- coding: iso-2022-7bit; -*-
+;;; mule-coding.el --- Coding-system functions for Mule.

 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN.
 ;; Licensed to the Free Software Foundation.
 ;; Copyright (C) 1995 Amdahl Corporation.
 ;; Copyright (C) 1995 Sun Microsystems.
 ;; Copyright (C) 1997 MORIOKA Tomohiko
-;; Copyright (C) 2001 Ben Wing.
+;; Copyright (C) 2001, 2005 Ben Wing.

 ;; This file is part of XEmacs.

＠＠ -74,13 +74,15 ＠＠
   "Return the 'no-iso6429 property of CODING-SYSTEM."
   (coding-system-property coding-system 'no-iso6429))

-(defun coding-system-ccl-encode (coding-system)
-  "Return the CCL 'encode property of CODING-SYSTEM."
-  (coding-system-property coding-system 'encode))
-
-(defun coding-system-ccl-decode (coding-system)
-  "Return the CCL 'decode property of CODING-SYSTEM."
-  (coding-system-property coding-system 'decode))
+(when (featurep 'ccl)
+  (defun coding-system-ccl-encode (coding-system)
+    "Return the CCL 'encode property of CODING-SYSTEM."
+    (coding-system-property coding-system 'encode))
+
+  (defun coding-system-ccl-decode (coding-system)
+    "Return the CCL 'decode property of CODING-SYSTEM."
+    (coding-system-property coding-system 'decode))
+  )

 (defun coding-system-iso2022-charset (coding-system register)
 "Return the charset initially designated to REGISTER in CODING-SYSTEM.
＠＠ -153,6 +155,20 ＠＠

 ;; compatibility for old XEmacsen
 (define-coding-system-alias 'iso-2022-7 'iso-2022-7bit)
+
+(make-coding-system
+ 'iso-2022-8bit-preserve 'iso2022
+ "ISO 2022 8-bit, ISO-2022-preserving"
+ '(charset-g0 ascii
+   charset-g1 latin-iso8859-1
+   short t
+   iso2022-preserve t
+   mnemonic "ISO8-Preserve"
+   documentation "ISO-2022-based 8-bit encoding with I/O preservation.
+This uses private Unicode characters, as necessary, to preserve the particular
+ISO-2022 charset upon output.  This will make such characters unusable
+in normal editing."
+   ))

 (make-coding-system
  'iso-2022-8 'iso2022
Index: lisp/mule/mule-composite-stub.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-composite-stub.el,v
retrieving revision 1.1
diff -u -r1.1 mule-composite-stub.el
--- lisp/mule/mule-composite-stub.el	2002/03/16 10:39:07	1.1
+++ lisp/mule/mule-composite-stub.el	2005/11/22 14:00:08
＠＠ -1,4 +1,4 ＠＠
-;;; mule-composite-stub.el --- Stubs of composition support -*- coding: iso-2022-7bit;
-*-
+;;; mule-composite-stub.el --- Stubs of composition support

 ;; Copyright (C) 2002 Ben Wing.

Index: lisp/mule/mule-composite.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-composite.el,v
retrieving revision 1.1
diff -u -r1.1 mule-composite.el
--- lisp/mule/mule-composite.el	2002/03/16 10:39:07	1.1
+++ lisp/mule/mule-composite.el	2005/11/22 14:00:08
＠＠ -28,7 +28,6 ＠＠

 ;;; Code:

-;;;###autoload
 (defconst reference-point-alist
   '((tl . 0) (tc . 1) (tr . 2)
     (Bl . 3) (Bc . 4) (Br . 5)
＠＠ -157,7 +156,6 ＠＠
       (setq i (+ i 2))))
   components)

-;;;###autoload
 (defun compose-region (start end &optional components modification-func)
   "UNIMPLEMENTED.
 Compose characters in the current region.
＠＠ -195,7 +193,6 ＠＠
     (compose-region-internal start end components modification-func)
     (set-buffer-modified-p modified-p)))

-;;;###autoload
 (defun decompose-region (start end)
   "UNIMPLEMENTED.
 Decompose text in the current region.
＠＠ -208,7 +205,6 ＠＠
     (remove-text-properties start end '(composition nil))
     (set-buffer-modified-p modified-p)))

-;;;###autoload
 (defun compose-string (string &optional start end components modification-func)
   "UNIMPLEMENTED.
 Compose characters in string STRING.
＠＠ -234,14 +230,12 ＠＠
   (compose-string-internal string start end components modification-func)
   string)

-;;;###autoload
 (defun decompose-string (string)
   "UNIMPLEMENTED.
 Return STRING where `composition' property is removed."
   (remove-text-properties 0 (length string) '(composition nil) string)
   string)

-;;;###autoload
 (defun compose-chars (&rest args)
   "UNIMPLEMENTED.
 Return a string from arguments in which all characters are composed.
＠＠ -266,7 +260,6 ＠＠
       (setq str (concat args)))
     (compose-string-internal str 0 (length str) components)))

-;;;###autoload
 (defun find-composition (pos &optional limit string detail-p)
   "UNIMPLEMENTED.
 Return information about a composition at or nearest to buffer position POS.
＠＠ -307,7 +300,6 ＠＠
     result))

-;;;###autoload
 (defun compose-chars-after (pos &optional limit object)
   "UNIMPLEMENTED.
 Compose characters in current buffer after position POS.
＠＠ -349,7 +341,6 ＠＠
 	      (setq func nil tail (cdr tail)))))))
       result))

-;;;###autoload
 (defun compose-last-chars (args)
   "UNIMPLEMENTED.
 Compose last characters.
＠＠ -377,7 +368,6 ＠＠
 ;;; The following codes are only for backward compatibility with Emacs
 ;;; 20.4 and the earlier.

-;;;###autoload
 (defun decompose-composite-char (char &optional type with-composition-rule)
   "UNIMPLEMENTED.
 Convert CHAR to string.
Index: lisp/mule/mule-msw-init-late.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-msw-init-late.el,v
retrieving revision 1.2
diff -u -r1.2 mule-msw-init-late.el
--- lisp/mule/mule-msw-init-late.el	2002/06/20 21:18:11	1.2
+++ lisp/mule/mule-msw-init-late.el	2005/11/22 14:00:08
＠＠ -1,5 +1,5 ＠＠
 ;;; mule-msw-init-late.el --- initialization code for MS Windows under MULE
-;;; Copyright (C) 2001, 2002 Ben Wing.
+;;; Copyright (C) 2001, 2002, 2005 Ben Wing.

 ;; This file is part of XEmacs.

＠＠ -30,7 +30,7 ＠＠
 ;; guess we're supposed to query the font for what ranges it supports, and
 ;; what its preferred range is.)

-(let ((l '((ascii . "Western")
+(let ((l `((ascii . "Western")
 	   (latin-iso8859-2 . "Central European")
 	   (cyrillic-iso8859-5 . "Cyrillic")
 	   (latin-iso8859-1 . "Western")
＠＠ -49,15 +49,17 ＠＠
 	   (japanese-jisx0212 . "Japanese")
 	   (chinese-gb2312 . "Simplified Chinese")
 	   (korean-ksc5601 . "Korean")
-	   (chinese-big5-1 . "Traditional Chinese")
-	   (chinese-big5-2 . "Traditional Chinese"))))
+	   ,＠(if (find-charset 'chinese-big5-1)
+		 '((chinese-big5-1 . "Traditional Chinese")
+		   (chinese-big5-2 . "Traditional Chinese"))
+	       '((chinese-big5 . "Traditional Chinese"))))))
   (while l
     (let ((charset (car (car l)))
 	  (registry (cdr (car l))))
     (mswindows-set-charset-registry charset registry)
     (setq l (cdr l)))))

-(let ((l '((ascii . 1252)
+(let ((l `((ascii . 1252)
 	   (latin-iso8859-2 . 1250)
 	   (cyrillic-iso8859-5 . 1251)
 	   (latin-iso8859-1 . 1252)
＠＠ -76,8 +78,10 ＠＠
 	   (japanese-jisx0212 . 932)
 	   (chinese-gb2312 . 936)
 	   (korean-ksc5601 . 949)
-	   (chinese-big5-1 . 950)
-	   (chinese-big5-2 . 950))))
+	   ,＠(if (find-charset 'chinese-big5-1)
+		 '((chinese-big5-1 . 950)
+		   (chinese-big5-2 . 950))
+	       '((chinese-big5 . 950))))))
   (while l
     (let ((charset (car (car l)))
 	  (code-page (cdr (car l))))
Index: lisp/mule/mule-tty-init.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-tty-init.el,v
retrieving revision 1.3
diff -u -r1.3 mule-tty-init.el
--- lisp/mule/mule-tty-init.el	2002/03/21 07:30:23	1.3
+++ lisp/mule/mule-tty-init.el	2005/11/22 14:00:08
＠＠ -1,4 +1,4 ＠＠
-;;; mule-tty-init.el --- Initialization code for console tty under MULE -*- coding:
iso-2022-7bit; -*-
+;;; mule-tty-init.el --- Initialization code for console tty under MULE

 ;; Copyright (C) 1998 Free Software Foundation, Inc.
 ;; Copyright (C) 1998 Kazuyuki IENAGA <kazz(a)imasy.or.jp&gt;
Index: lisp/mule/mule-x-init.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-x-init.el,v
retrieving revision 1.7
diff -u -r1.7 mule-x-init.el
--- lisp/mule/mule-x-init.el	2002/06/20 21:18:11	1.7
+++ lisp/mule/mule-x-init.el	2005/11/22 14:00:08
＠＠ -1,4 +1,4 ＠＠
-;;; mule-x-init.el --- initialization code for X Windows under MULE -*- coding:
iso-2022-7bit; -*-
+;;; mule-x-init.el --- initialization code for X Windows under MULE
 ;; Copyright (C) 1994 Free Software Foundation, Inc.
 ;; Copyright (C) 1996, 2002 Ben Wing <ben(a)xemacs.org&gt;

Index: lisp/mule/thai-xtis.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/thai-xtis.el,v
retrieving revision 1.4
diff -u -r1.4 thai-xtis.el
--- lisp/mule/thai-xtis.el	2002/03/18 10:07:37	1.4
+++ lisp/mule/thai-xtis.el	2005/11/22 14:00:08
＠＠ -34,45 +34,27 ＠＠

 ;;; Code:

-(make-charset 'thai-xtis "Precomposed Thai (XTIS by Virach)."
-	      '(registry "xtis-0"
-			 dimension 2
-			 columns 1
-			 chars 94
-			 final ??
-			 graphic 0))
-
 (define-category ?x "Precomposed Thai character.")
 (modify-category-entry 'thai-xtis ?x)

 (when (featurep 'xemacs)
   (let ((deflist	'(;; chars	syntax
-			  ("$(?!0(B-$(?NxP0R0S0`0(B-$(?e0(B"	"w")
-			  ("$(?p0(B-$(?y0(B"	"w")
-			  ("$(?O0f0_0o0z0{0(B"	"_")
-			  ))
-	elm chars len syntax to ch i)
-    (while deflist
-      (setq elm (car deflist))
-      (setq chars (car elm)
-	    len (length chars)
-	    syntax (nth 1 elm)
-	    i 0)
-      (while (< i len)
-	(if (= (aref chars i) ?-)
-	    (setq i (1+ i)
-		  to (nth 1 (split-char (aref chars i))))
-	  (setq ch (nth 1 (split-char (aref chars i)))
-		to ch))
-	(while (<= ch to)
-	  (modify-syntax-entry (vector 'thai-xtis ch) syntax)
-	  (setq ch (1+ ch)))
-	(setq i (1+ i)))
-      (setq deflist (cdr deflist))))
-
+			  (((33 . 78) 80 82 83 (96 . 101) (112 . 121)) "w")
+			  ((79 102 95 111 122 123) "_"))))
+    (loop for (chars syntax) in deflist do
+      (loop for ch in chars do
+	(let (from to)
+	  (if (consp ch)
+	      (setq from (car ch) to (cdr ch))
+	    (setq from ch to ch))
+	  (loop for i from from to to do
+	    (modify-syntax-entry (vector 'thai-xtis i) syntax))))))
   (put-charset-property 'thai-xtis 'preferred-coding-system 'tis-620)
   )

+;; ＠＠#### This entire file is bogus.  Do Thai the normal way.
+(when (featurep 'ccl)
+
 ;; This is the ccl-decode-thai-xtis automaton.
 ;;
 ;; "WRITE x y" == (insert (make-char 'thai-xtis x y))
＠＠ -373,5 +355,6 ＠＠
    (coding-priority tis-620 iso-2022-7bit)
    (sample-text . "$(?!:(B")
    (documentation . t)))
+) ; (featurep 'ccl)

 ;; thai-xtis.el ends here.
Index: lisp/mule/thai.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/thai.el,v
retrieving revision 1.5
diff -u -r1.5 thai.el
--- lisp/mule/thai.el	2002/03/16 10:39:07	1.5
+++ lisp/mule/thai.el	2005/11/22 14:00:08
＠＠ -33,20 +33,6 ＠＠

 ;;; Code:

-; (make-charset 'thai-tis620 
-; 	      "Right-Hand Part of TIS620.2533 (Thai): ISO-IR-166"
-; 	      '(dimension
-; 		1
-; 		registry "TIS620"
-; 		chars 96
-; 		columns 1
-; 		direction l2r
-; 		final ?T
-; 		graphic 1
-; 		short-name "RHP of TIS620"
-; 		long-name "RHP of Thai (TIS620): ISO-IR-166"
-; 		))
-
 ; ; (make-coding-system
 ; ;  'thai-tis620 2 ?T
 ; ;  "8-bit encoding for ASCII (MSB=0) and Thai TIS620 (MSB=1)"
Index: lisp/mule/tibetan.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/tibetan.el,v
retrieving revision 1.3
diff -u -r1.3 tibetan.el
--- lisp/mule/tibetan.el	2002/03/16 10:39:07	1.3
+++ lisp/mule/tibetan.el	2005/11/22 14:00:08
＠＠ -84,33 +84,6 ＠＠
 ;;;2570 $(7%p(B $(7%q(B $(7%r(B $(7%s(B $(7%t(B $(7%u(B $(7%v(B $(7%w(B
$(7%x(B $(7%y(B $(7%z(B $(7%{(B $(7%|(B $(7%}(B $(7%~(B // ;
 ;;;

-(make-charset 'tibetan-1-column "Tibetan 1 column glyph"
-	      '(dimension
-		2
-		registry "MuleTibetan-1"
-		chars 94
-		columns 1
-		direction l2r
-		final ?8
-		graphic 0
-		short-name "Tibetan 1-col"
-		long-name "Tibetan 1 column"
-		))
-
-;; Tibetan script.
-(make-charset 'tibetan "Tibetan characters"
-	      '(dimension
-		2
-		registry "MuleTibetan-2"
-		chars 94
-		columns 2
-		direction l2r
-		final ?7
-		graphic 0
-		short-name "Tibetan 2-col"
-		long-name "Tibetan 2 column"
-		))
-
 ; (make-coding-system
 ;  'tibetan-iso-8bit 2 ?Q
 ;  "8-bit encoding for ASCII (MSB=0) and TIBETAN (MSB=1)"
Index: lisp/mule/vietnamese.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/vietnamese.el,v
retrieving revision 1.6
diff -u -r1.6 vietnamese.el
--- lisp/mule/vietnamese.el	2002/03/21 07:30:24	1.6
+++ lisp/mule/vietnamese.el	2005/11/22 14:00:09
＠＠ -30,36 +30,6 ＠＠

 ;;; Code:

-;; Vietnamese VISCII.  VISCII is 1-byte character set which contains
-;; more than 96 characters.  Since Emacs can't handle it as one
-;; character set, it is divided into two: lower case letters and upper
-;; case letters.
-(make-charset 'vietnamese-viscii-lower "VISCII1.1 lower-case"
-	      '(dimension
-		1
-		registry "VISCII1.1"
-		chars 96
-		columns 1
-		direction l2r
-		final ?1
-		graphic 1
-		short-name "VISCII lower"
-		long-name "VISCII lower-case"
-		))
-
-(make-charset 'vietnamese-viscii-upper "VISCII1.1 upper-case"
-	      '(dimension
-		1
-		registry "VISCII1.1"
-		chars 96
-		columns 1
-		direction l2r
-		final ?2
-		graphic 1
-		short-name "VISCII upper"
-		long-name "VISCII upper-case"
-		))
-
 (modify-syntax-entry 'vietnamese-viscii-lower "w")
 (modify-syntax-entry 'vietnamese-viscii-upper "w")

＠＠ -96,11 +66,11 ＠＠
 	char-component)
     (while (< i 256)
       (setq char-component
-	    (split-char (aref viet-viscii-decode-table i)))
+	    (char-to-charset-codepoint (aref viet-viscii-decode-table i)))
       (cond ((eq (car char-component) 'vietnamese-viscii-lower)
-	     (aset table-lower (nth 1 char-component) i))
+	     (aset table-lower (- (nth 1 char-component) 128) i))
 	    ((eq (car char-component) 'vietnamese-viscii-upper)
-	     (aset table-upper (nth 1 char-component) i)))
+	     (aset table-upper (- (nth 1 char-component) 128) i)))
       (setq i (1+ i)))
     (cons table-lower table-upper))
   "Vietnamese VISCII encoding table.
＠＠ -134,11 +104,11 ＠＠
 	char-component)
     (while (< i 256)
       (setq char-component
-	    (split-char (aref viet-vscii-decode-table i)))
+	    (char-to-charset-codepoint (aref viet-vscii-decode-table i)))
       (cond ((eq (car char-component) 'vietnamese-viscii-lower)
-	     (aset table-lower (nth 1 char-component) i))
+	     (aset table-lower (- (nth 1 char-component) 128) i))
 	    ((eq (car char-component) 'vietnamese-viscii-upper)
-	     (aset table-upper (nth 1 char-component) i)))
+	     (aset table-upper (- (nth 1 char-component) 128) i)))
       (setq i (1+ i)))
     (cons table-lower table-upper))
   "Vietnamese VSCII encoding table.
＠＠ -147,6 +117,8 ＠＠

 )

+(when (featurep 'ccl)
+
 (define-ccl-program ccl-decode-viscii
   `(3
     ((read r0)
＠＠ -279,6 +251,8 ＠＠

 ;; (define-coding-system-alias 'vscii 'vietnamese-vscii)

+) ; (featurep 'ccl)
+
 (make-coding-system
  'viqr 'no-conversion
  "VIQR (Vietnamese)"
＠＠ -300,6 +274,8 ＠＠

 ;; (define-coding-system-alias 'viqr 'vietnamese-viqr)

+(when (featurep 'ccl)
+
 ;; For VISCII users
 (set-charset-ccl-program 'vietnamese-viscii-lower
 			 'ccl-encode-viscii-font)
＠＠ -314,6 +290,8 ＠＠

 ;; (setq font-ccl-encoder-alist
 ;;       (cons (cons "vscii" ccl-encode-vscii-font) font-ccl-encoder-alist))
+
+) ; (featurep 'ccl)

 (defvar viet-viscii-to-external-code-table
   (let ((table (make-char-table 'generic))
cvs server: lisp/mule/windows.el is a new entry, no comparison available
cvs server: Diffing lisp/term
cvs server: Diffing lock
cvs server: Diffing lwlib
cvs server: Diffing man
cvs server: Diffing man/internals
Index: man/internals/internals.texi
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/man/internals/internals.texi,v
retrieving revision 1.69
diff -u -r1.69 internals.texi
--- man/internals/internals.texi	2005/10/20 12:38:04	1.69
+++ man/internals/internals.texi	2005/11/22 14:00:21
＠＠ -6735,12 +6735,13 ＠＠
 as ＠file{xemacs} or ＠file{temacs} because it sets the global variable
 ＠code{initialized} to 1 after step (4) above.) At this point,
 ＠file{xemacs} calls a Lisp function to do any further initialization,
-which includes parsing the command-line (the C code can only do limited
-command-line parsing, which includes looking for the ＠samp{-batch} and
-＠samp{-l} flags and a few other flags that it needs to know about before
-initialization is complete), creating the first frame (or ＠dfn{window}
-in standard window-system parlance), running the user's init file
-(usually the file ＠file{.emacs} in the user's home directory), etc.  The
+which includes parsing the command-line (the C code can only do
+limited command-line parsing, which includes looking for the
+＠samp{-batch} and ＠samp{-l} flags and a few other flags that it needs
+to know about before initialization is complete), creating the first
+frame (or ＠dfn{window} in standard window-system parlance), running
+the user's init file (usually the file ＠file{init.el} in the
+＠file{xemacs} subdirectory of the user's home directory), etc.  The
 function to do this is usually called ＠code{normal-top-level};
 ＠file{loadup.el} tells the C code about this function by setting its
 name as the value of the Lisp variable ＠code{top-level}.
＠＠ -10847,22 +10848,29 ＠＠
 ＠cindex character sets

   A ＠dfn{character set} (or ＠dfn{charset}) is an ordered set of
-characters.  A particular character in a charset is indexed using one or
-more ＠dfn{position codes}, which are non-negative integers.  The number
-of position codes needed to identify a particular character in a charset
-is called the ＠dfn{dimension} of the charset.  In XEmacs/Mule, all
-charsets have dimension 1 or 2, and the size of all charsets (except for
-a few special cases) is either 94, 96, 94 by 94, or 96 by 96.  The range
-of position codes used to index characters from any of these types of
-character sets is as follows:
+characters.  A particular character in a charset is indexed using one
+or more ＠dfn{position codes}, which are non-negative integers.  The
+number of position codes needed to identify a particular character in
+a charset is called the ＠dfn{dimension} of the charset.  In
+XEmacs/Mule, all charsets have dimension 1 or 2.  Formerly the size of
+all charsets (except for a few special cases) was either 94, 96, 94 by
+94, or 96 by 96.  Now, however, the size of either dimension of a
+charset can be anywhere between 1 and 256 and an offset (minimum
+allowed value) can be specified for either dimension, anywhere between
+0 and 255 provided that the maximum value also falls within this
+range.
+
+Some charsets are ＠dfn{ISO-2022-compatible}, meaning that they can be
+used in an ISO-2022 encoding.  Such charsets must have position codes
+in the following range:

 ＠example
 Charset type            Position code 1         Position code 2
-------------------------------------------------------------
-94                      33 - 126                N/A
-96                      32 - 127                N/A
-94x94                   33 - 126                33 - 126
-96x96                   32 - 127                32 - 127
+---------------------------------------------------------------
+94                      33 - 126 or 161-254     N/A
+96                      32 - 127 or 160-255     N/A
+94x94                   33 - 126 or 161-254     33 - 126 or 161-254
+96x96                   32 - 127 or 160-255     32 - 127 or 160-255
 ＠end example

   Note that in the above cases position codes do not start at an
＠＠ -10878,20 +10886,18 ＠＠
 all the slots whose first position code is in the range 118 - 127 are
 empty.]

-  There are three charsets that do not follow the above rules.  All of
-them have one dimension, and have ranges of position codes as follows:
+  There are some charsets that do not follow the above rules, e.g.:

 ＠example
-Charset name            Position code 1
-------------------------------------
-ASCII                   0 - 127
-Control-1               0 - 31
-Composite               0 - some large number
+Charset type            Position code 1         Position code 2
+---------------------------------------------------------------
+ASCII                   0 - 127                 N/A
+Control-1               128 - 159               N/A
+Composite               not implemented         not implemented
+Big5                    161 - 254               64 - 254
+Johab                   132 - 249               49 - 254
 ＠end example

-  (The upper bound of the position code for composite characters has not
-yet been determined, but it will probably be at least 16,383).
-
   ASCII is the union of two subsidiary character sets: Printing-ASCII
 (the printing ASCII character set, consisting of position codes 33 -
 126, like for a standard 94-character charset) and Control-ASCII (the
＠＠ -10901,9 +10907,6 ＠＠
   Control-1 contains the non-printing characters that would appear in a
 binary file with codes 128 - 159.

-  Composite contains characters that are generated by overstriking one
-or more characters from other charsets.
-
   Note that some characters in ASCII, and all characters in Control-1,
 are ＠dfn{control} (non-printing) characters.  These have no printed
 representation but instead control some other function of the printing
＠＠ -10911,19 +10914,6 ＠＠
 stop).  All other characters in all charsets are ＠dfn{graphic}
 (printing) characters.

-  When a binary file is read in, the bytes in the file are assigned to
-character sets as follows:
-
-＠example
-Bytes           Character set           Range
---------------------------------------------------
-0 - 127         ASCII                   0 - 127
-128 - 159       Control-1               0 - 31
-160 - 255       Latin-1                 32 - 127
-＠end example
-
-  This is a bit ad-hoc but gets the job done.
-
 ＠node Encodings, Internal Mule Encodings, Character Sets, Multilingual Support
 ＠section Encodings
 ＠cindex encodings, Mule
＠＠ -11031,38 +11021,41 ＠＠
 ＠cindex Mule encodings, internal
 ＠cindex encodings, internal Mule

-In XEmacs/Mule, each character set is assigned a unique number, called a
-＠dfn{leading byte}.  This is used in the encodings of a character.
-Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has
-a leading byte of 0), although some leading bytes are reserved.
-
-Charsets whose leading byte is in the range 0x80 - 0x9F are called
-＠dfn{official} and are used for built-in charsets.  Other charsets are
-called ＠dfn{private} and have leading bytes in the range 0xA0 - 0xFF;
-these are user-defined charsets.
+In XEmacs/Mule, each character set is assigned a unique number, called
+a ＠dfn{charset ID} (formerly ＠dfn{leading byte}).  This is used in the
+encodings of a character.  In the old-Mule internal encoding, Charset
+ID's are in the range 0x7F - 0x15F for "encodable" charsets (those
+which can be encoded into a string or character), while other charsets
+have higher ID's.
+
+Charsets whose ID in the range 0x7F - 0x9F are called ＠dfn{official}
+and take up less space in the string encoding; these are used for
+built-in charsets.  Other encodable charsets are called ＠dfn{private}
+and have higher ID's, in the range 0xA0 - 0x15F; these are mostly for
+user-defined charsets, although some built-in charsets are in this
+range.

   More specifically:

 ＠example
-Character set                Leading byte
--------------                ------------
-ASCII                        0 (0x7F in arrays indexed by leading byte)
-Composite                    0x8D
-Dimension-1 Official         0x80 - 0x8C/0x8D
-                               (0x8E is free)
-Control                      0x8F
-Dimension-2 Official         0x90 - 0x99
+Character set                ID
+-------------                --
+ASCII                        0x7F
+Control-1                    0x80
+Dimension-1 Official         0x81 - 0x8C/0x8D (subject to change)
+Dimension-2 Official         0x8D/0x8E - 0x9D (subject to change)
                                (0x9A - 0x9D are free)
-Dimension-1 Private Marker   0x9E
-Dimension-2 Private Marker   0x9F
-Dimension-1 Private          0xA0 - 0xEF
-Dimension-2 Private          0xF0 - 0xFF
+Dimension-1 Private          0xA0 - 0xFF
+Dimension-2 Private          0x100 - 0x15F
 ＠end example

+0x9E and 0x9F are reserved; these are used in the string encoding for the
+Dimension-1 Private Marker and Dimension-2 Private Marker, respectively.
+
 There are two internal encodings for characters in XEmacs/Mule.  One is
 called ＠dfn{string encoding} and is an 8-bit encoding that is used for
 representing characters in a buffer or string.  It uses 1 to 4 bytes per
-character.  The other is called ＠dfn{character encoding} and is a 19-bit
+character.  The other is called ＠dfn{character encoding} and is a 21-bit
 encoding that is used for representing characters individually in a
 variable.

＠＠ -11073,6 +11066,7 ＠＠
 ＠menu
 * Internal String Encoding::    
 * Internal Character Encoding::  
+* Old Internal Character Encoding::
 ＠end menu

 ＠node Internal String Encoding, Internal Character Encoding, Internal Mule Encodings,
Internal Mule Encodings
＠＠ -11081,25 +11075,29 ＠＠
 ＠cindex string encoding, internal
 ＠cindex encoding, internal string

-ASCII characters are encoded using their position code directly.  Other
-characters are encoded using their leading byte followed by their
-position code(s) with the high bit set.  Characters in private character
-sets have their leading byte prefixed with a ＠dfn{leading byte prefix},
-which is either 0x9E or 0x9F. (No character sets are ever assigned these
-leading bytes.) Specifically:
+ASCII characters are encoded using their position code directly.
+Other characters are encoded using their ＠dfn{lead byte}, which is
+derived from the charset ID, followed by their position code(s) with
+the high bit set.  Characters in private character sets have their
+lead byte prefixed with a ＠dfn{lead byte prefix}, which is either 0x9E
+or 0x9F. (No character sets are ever assigned these lead bytes.) 
+Specifically:

 ＠example
-Character set           Encoding (PC=position-code, LB=leading-byte)
+Character set           Encoding (PC=position-code, ID=charset ID)
 -------------           --------
 ASCII                   PC-1 |
-Control-1               LB   |  PC1 + 0xA0 |
-Dimension-1 official    LB   |  PC1 + 0x80 |
-Dimension-1 private     0x9E |  LB         | PC1 + 0x80 |
-Dimension-2 official    LB   |  PC1 + 0x80 | PC2 + 0x80 |
-Dimension-2 private     0x9F |  LB         | PC1 + 0x80 | PC2 + 0x80
+Control-1               ID   |  PC1 + 0xA0 |
+Dimension-1 official    ID   |  PC1 + 0x80 |
+Dimension-1 private     0x9E |  ID         | PC1 + 0x80 |
+Dimension-2 official    ID   |  PC1 + 0x80 | PC2 + 0x80 |
+Dimension-2 private     0x9F |  ID - 0x60  | PC1 + 0x80 | PC2 + 0x80
 ＠end example
+
+Notice how the lead byte is usually the same as the charset ID, but
+is offset by 0x60 for dimension-2 private charsets.

-  The basic characteristic of this encoding is that the first byte
+The basic characteristic of this encoding is that the first byte
 of all characters is in the range 0x00 - 0x9F, and the second and
 following bytes of all characters is in the range 0xA0 - 0xFF.
 This means that it is impossible to get out of sync, or more
＠＠ -11127,17 +11125,89 ＠＠
 conditions.  For example, EUC satisfies only (2) and (3), while
 Shift-JIS and Big5 (not yet described) satisfy only (2). (All
 non-modal encodings must satisfy (2), in order to be unambiguous.)
+
+＠strong{NOTE}: The Boyer-Moore code in ＠file{search.c} can only work
+in the presence of a translation table if all characters with a
+distinct translation differ in their string representation from the
+translation only in the final byte. (This is because the Boyer-Moore
+algorithm works on the byte level and uses the last byte of the search
+string to determine how far to jump ahead on each comparison.
+Modifying the algorithm to handle translations of that particular byte
+is reasonable, but trying to handle translations of other bytes makes
+things too complicated.) The code in ＠code{search_buffer()} that
+determines whether to use Boyer-Moore knows this and only uses it when
+appropriate.  The choice of internal representation can have an effect
+on how often Boyer-Moore can be used; luckily, both the
+Unicode-internal (UTF-8) and old-Mule representations work reasonably
+because characters near each other tend to differ only in their final
+byte, and translation pairs tend to be near each other because
+alphabets are grouped together.

-＠node Internal Character Encoding,  , Internal String Encoding, Internal Mule Encodings
+＠node Internal Character Encoding, Old Internal Character Encoding, Internal String
Encoding, Internal Mule Encodings
 ＠subsection Internal Character Encoding
 ＠cindex internal character encoding
 ＠cindex character encoding, internal
 ＠cindex encoding, internal character

-  One 19-bit word represents a single character.  The word is
-separated into three fields:
+In the new old-Mule encoding, one 21-bit word represents a single
+character.  The word is again separated into three fields:

 ＠example
+Bit number:  20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+             <------------------> <------------------>
<------------------>
+Field:                 1                    2                    3
+＠end example
+
+The big difference compared to the older old-Mule encoding is that
+field 1 has grown to 7 bits.  This relates to the number of encodable
+charsets: The old encoding could encode 80 private dimension-1
+charsets and only 16 private dimension-2 charsets; the new encoding
+encodes 96 of both.
+
+＠example
+Character set           Field 1         Field 2         Field 3
+-------------           -------         -------         -------
+ASCII                      0               0              PC1
+   range:                                                   (00 - 7F)
+Control-1                  0               1              PC1
+   range:                                                   (00 - 1F)
+Dimension-1 official       0            ID - 0x80         PC1
+   range:                                    (01 - 1D)      (20 - 7F)
+Dimension-1 private        0            ID - 0x80         PC1
+   range:                                    (20 - 7F)      (20 - 7F)
+Dimension-2 official    ID - 0x80         PC1             PC2
+   range:                    (01 - 1D)       (20 - 7F)      (20 - 7F)
+Dimension-2 private     ID - 0xE0         PC1             PC2
+   range:                    (20 - 7F)       (20 - 7F)      (20 - 7F)
+＠end example
+
+The critical thing to note is that a distinction has been made between
+charset ID's and ``lead bytes''.  A ＠dfn{lead byte} (formerly a
+``leading byte'') is a byte that forms the first byte in a multibyte
+string representation of a character.  This is very different from a
+＠dfn{charset ID}, which is a unique identifier for a character.  The
+charset ID may sometimes correspond to one of the bytes in the
+representation of the character, but not necessarily, and not
+necessarily the first one.
+
+Note that character codes 0 - 255 are the same as the ``binary
+encoding'' described above.
+
+Most of the code in XEmacs knows nothing of the representation of a
+character other than that values 0 - 255 represent ASCII, Control 1,
+and Latin 1.
+
+＠node Old Internal Character Encoding,  , Internal Character Encoding, Internal Mule
Encodings
+＠subsection Old Internal Character Encoding
+＠cindex internal character encoding, old
+＠cindex old internal character encoding
+＠cindex character encoding, internal, old
+＠cindex encoding, internal character, old
+
+In the original old-Mule encoding, one 19-bit word represented a single
+character.  The word was separated into three fields:
+
+＠example
 Bit number:     18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
                 <------------> <------------------>
<------------------>
 Field:                1                  2                    3
＠＠ -11163,20 +11233,7 ＠＠
 Composite                 0x1F             ?               ?
 ＠end example

-Note that character codes 0 - 255 are the same as the ``binary
-encoding'' described above.

-Most of the code in XEmacs knows nothing of the representation of a
-character other than that values 0 - 255 represent ASCII, Control 1,
-and Latin 1.
-
-＠strong{WARNING WARNING WARNING}: The Boyer-Moore code in
-(a)file{search.c}, and the code in ＠code{search_buffer()} that determines
-whether that code can be used, knows that ``field 3'' in a character
-always corresponds to the last byte in the textual representation of the
-character. (This is important because the Boyer-Moore algorithm works by
-looking at the last byte of the search string and &&#### finish this.
-
 ＠node Byte/Character Types; Buffer Positions; Other Typedefs, Internal Text APIs,
Internal Mule Encodings, Multilingual Support
 ＠section Byte/Character Types; Buffer Positions; Other Typedefs
 ＠cindex byte/character types; buffer positions; other typedefs
＠＠ -11395,10 +11452,10 ＠＠
 you cannot do the standard C trick of passing a pointer to a character
 to a function that expects a string.

-An Ichar takes up 19 bits of representation and (for code compatibility
-and such) is compatible with an int.  This representation is visible on
-the Lisp level.  The important characteristics of the Ichar
-representation are
+An Ichar takes up 21 bits of representation in old-Mule and up to 30
+bits in Unicode-internal and (for code compatibility and such) is
+compatible with an int.  This representation is visible on the Lisp
+level.  The important characteristics of the Ichar representation are

 ＠itemize ＠minus
 ＠item
＠＠ -12448,11 +12505,11 ＠＠
 multiplying a ＠code{Charcount} value with ＠code{MAX_ICHAR_LEN} produces
 a ＠code{Bytecount} value.

-In the current Mule implementation, ＠code{MAX_ICHAR_LEN} equals 4.
-Without Mule, it is 1.  In a mature Unicode-based XEmacs, it will also
-be 4 (since all Unicode characters can be encoded in UTF-8 in 4 bytes or
-less), but some versions may use up to 6, in order to use the large
-private space provided by ISO 10646 to ``mirror'' the Mule code space.
+In old-Mule, ＠code{MAX_ICHAR_LEN} equals 4.  Without Mule, it is 1.
+In Unicode-internal, it is 6, which is the maximum size of UTF-8.
+Although all official Unicode characters can be encoded using 4 bytes
+or less in UTF-8, we support characters up to 30 bits wide, for
+private use.

 ＠item itext_ichar
 ＠itemx set_itext_ichar
＠＠ -12583,7 +12640,7 ＠＠
 user complete control over this kind of behavior.
 ＠item
 On output, characters 0--255 are converted into bytes 0--255 and other
-characters are converted into ＠samp{~}.
+characters are converted into ＠samp{?}.
 ＠end enumerate

 ＠item Qnative
＠＠ -13740,7 +13797,7 ＠＠
 note further that you are not limited to the language/sublanguage
 combinations predefined by Windows.  You can set weird combinations
 like "Chinese_Kenya.1255" (Chinese spoken in Kenya, represented by
-Windows-1255, i.e. Hebrew!) and Windows don't complain, despite the
+Windows-1255, i.e. Hebrew!) and Windows doesn't complain, despite the
 language-encoding inconsistency.  You can also make up a weird
 combination and leave out the encoding, e.g. "Chinese_Qatar", which
 maps to "Chinese_Qatar.1256", where Windows-1256 is Arabic -- i.e. it
＠＠ -13970,16 +14027,6 ＠＠
 external programs used to implement the Canna and WNN input methods,
 respectively.  This is currently in beta.

-(a)file{mule-mcpath.c} provides some functions to allow for pathnames
-containing extended characters.  This code is fragmentary, obsolete, and
-completely non-working.  Instead, ＠code{pathname-coding-system} is used
-to specify conversions of names of files and directories.  The standard
-C I/O functions like ＠samp{open()} are wrapped so that conversion occurs
-automatically.
-
-(a)file{mule.c} contains a few miscellaneous things.  It currently seems
-to be unused and probably should be removed.
-

 ＠example
＠＠ -13992,14 +14039,6 ＠＠

-＠example
-(a)file{iso-wide.h}
-＠end example
-
-This contains leftover code from an earlier implementation of
-Asian-language support, and is not currently used.
-
-
 ＠node Consoles; Devices; Frames; Windows, The Redisplay Mechanism, Multilingual Support,
Top
 ＠chapter Consoles; Devices; Frames; Windows
 ＠cindex consoles; devices; frames; windows
＠＠ -15749,7 +15788,7 ＠＠
 need to.  Before invoking ＠code{command_loop_2()},
 ＠code{initial_command_loop()} calls ＠code{top_level_1()}, which handles
 all of the startup stuff (creating the initial frame, handling the
-command-line options, loading the user's ＠file{.emacs} file, etc.).  The
+command-line options, loading the user's ＠file{init.el} file, etc.).  The
 function that actually does this is in Lisp and is pointed to by the
 variable ＠code{top-level}; normally this function is
 ＠code{normal-top-level}.  ＠code{top_level_1()} is just an error-handling
＠＠ -21041,7 +21080,7 ＠＠

 ＠item
 The ＠dfn{default} case is POSIX locale, and no environment
-information in ~/.emacs.
+information in ~/.xemacs/init.el.

 N.B.  This ＠strong{will} cause breakage for all 1-byte users because
 the default case can no longer assume Latin-1.  You ＠strong{may} be
＠＠ -21060,7 +21099,7 ＠＠

 ＠item
 The ＠dfn{European} case is any Latin-* locale, either implied by
-setlocale() and friends or set in ~/.emacs.  Latin-1 is
+setlocale() and friends or set in ~/.xemacs/init.el.  Latin-1 is
 specifically not given precedence over other Latin-*, or
 non-Latin or non-ISO-8859 for that matter.  I suspect but am
 not sure that this case extends to all ISO-8859 encodings, and
＠＠ -21407,10 +21446,10 ＠＠
 are provided to do this so that various locale-specific areas can optionally
 be changed together with it.

-[Then you describe how the XEmacs locale is extracted from .emacs, from
-＠code{setlocale()}, from the LANG environment variables, from -font, or wherever
-else.  All other sections assume this dirty work is done and never even
-mention it]
+[Then you describe how the XEmacs locale is extracted from
+(a)file{init.el}, from ＠code{setlocale()}, from the LANG environment
+variables, from -font, or wherever else.  All other sections assume
+this dirty work is done and never even mention it]

 ＠subsubheading Section 7

＠＠ -23918,7 +23957,7 ＠＠
 replace.  Custom currently doesn't do this.

 PRINCIPLE #2: Currently, lots of functions want to add code to the
-.emacs. (e.g. I get prompted for my mail address from
+(a)file{init.el}. (e.g. I get prompted for my mail address from
 add-change-log-entry, and then prompted if I want to make this
 permanent).  There needs to be a Lisp API for working with arbitrary
 code to be added to a user's startup.  This API hides all the details
cvs server: Diffing man/lispref
cvs server: Diffing man/new-users-guide
cvs server: Diffing man/xemacs
cvs server: Diffing modules
cvs server: Diffing modules/base64
cvs server: Diffing modules/canna
cvs server: modules/canna/canna-api.c is a new entry, no comparison available
cvs server: Diffing modules/common
cvs server: Diffing modules/ldap
cvs server: Diffing modules/postgresql
cvs server: Diffing modules/sample
cvs server: Diffing modules/sample/external
cvs server: Diffing modules/sample/internal
cvs server: Diffing modules/zlib
cvs server: Diffing netinstall
cvs server: Diffing nt
Index: nt/config.inc.samp
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/nt/config.inc.samp,v
retrieving revision 1.22
diff -u -r1.22 config.inc.samp
--- nt/config.inc.samp	2005/09/26 08:13:00	1.22
+++ nt/config.inc.samp	2005/11/22 14:00:22
＠＠ -15,6 +15,9 ＠＠
 # Multilingual support.
 MULE=0

+# Use Unicode internally rather than the old flawed Mule encoding.
+UNICODE_INTERNAL=1
+
 # Native MS Windows support.
 HAVE_MS_WINDOWS=1

Index: nt/xemacs.mak
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/nt/xemacs.mak,v
retrieving revision 1.117
diff -u -r1.117 xemacs.mak
--- nt/xemacs.mak	2005/10/28 05:42:07	1.117
+++ nt/xemacs.mak	2005/11/22 14:00:22
＠＠ -606,13 +606,24 ＠＠
 OPT_DEFINES=$(OPT_DEFINES) -DHAVE_NATIVE_SOUND
 !endif

+!if $(UNICODE_INTERNAL)
+!if !$(MULE)
+!message Unicode-internal requires Mule.  Mule forced on.
+MULE=1
+!endif
+!endif
+
 !if $(MULE)
 OPT_DEFINES=$(OPT_DEFINES) -DMULE
 OPT_OBJS=$(OPT_OBJS) \
-	$(OUTDIR)\mule-ccl.obj \
 	$(OUTDIR)\mule-charset.obj \
 	$(OUTDIR)\mule-coding.obj
+!if $(UNICODE_INTERNAL)
+OPT_DEFINES=$(OPT_DEFINES) -DUNICODE_INTERNAL
+!else
+OPT_OBJS=$(OPT_OBJS) $(OUTDIR)\mule-ccl.obj
 !endif
+!endif

 !if $(DEBUG_XEMACS)
 OPT_OBJS=$(OPT_OBJS) $(OUTDIR)\debug.obj $(OUTDIR)\tests.obj
＠＠ -1144,6 +1155,11 ＠＠
 !endif
 !if $(MULE)
   Compiling in international (MULE) support.
+!if $(UNICODE_INTERNAL)
+  Unicode is used internally.
+!else
+  The old-Mule format is used internally.
+!endif
 !endif
 !if $(HAVE_GTK)
   --------------------------------------------------------------------
cvs server: Diffing nt/installer
cvs server: Diffing nt/installer/Wise
cvs server: Diffing src
Index: src/Makefile.in.in
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/Makefile.in.in,v
retrieving revision 1.119
diff -u -r1.119 Makefile.in.in
--- src/Makefile.in.in	2005/10/21 15:48:42	1.119
+++ src/Makefile.in.in	2005/11/22 14:00:22
＠＠ -3,7 +3,7 ＠＠
 ##   Copyright (C) 1994, 1995 Board of Trustees, University of Illinois
 ##   Copyright (C) 1996, 1997 Sun Microsystems, Inc.
 ##   Copyright (C) 1998, 1999 J. Kean Johnston.
-##   Copyright (C) 2001, 2002, 2003 Ben Wing.
+##   Copyright (C) 2001, 2002, 2003, 2005 Ben Wing.

 ## This file is part of XEmacs.

＠＠ -193,16 +193,17 ＠＠
 #endif

 #ifdef MULE
-mule_objs=mule-ccl.o mule-charset.o mule-coding.o
+mule_objs=mule-charset.o mule-coding.o
+#ifdef HAVE_CCL
+mule_objs += mule-ccl.o
 #endif
-
 #if defined(HAVE_CANNA) && !defined(HAVE_SHLIB)
-mule_canna_objs=$(BLDMODULES)/canna/canna_api.o
+mule_objs += $(BLDMODULES)/canna/canna_api.o
 #endif
-
 #ifdef HAVE_WNN
-mule_wnn_objs=mule-wnnfns.o
+mule_objs += mule-wnnfns.o
 #endif
+#endif /* MULE */

 #ifdef WITH_GMP
 number_objs=number-gmp.o number.o
＠＠ -278,10 +279,10 ＠＠
  gutter.o\
  hash.o imgproc.o indent.o insdel.o intl.o\
  keymap.o $(RTC_patch_objs) line-number.o $(ldap_objs) lread.o lstream.o\
- $(mc_alloc_objs) \
- macros.o marker.o md5.o minibuf.o $(mswindows_objs) $(mswindows_gui_objs)\
- $(mule_objs) $(mule_canna_objs) $(mule_wnn_objs) $(number_objs) objects.o\
- opaque.o $(postgresql_objs) print.o process.o $(process_objs) $(profile_objs)\
+ $(mc_alloc_objs) macros.o marker.o md5.o minibuf.o \
+ $(mswindows_objs) $(mswindows_gui_objs) $(mule_objs) \
+ $(number_objs) objects.o opaque.o\
+ $(postgresql_objs) print.o process.o $(process_objs) $(profile_objs)\
  rangetab.o realpath.o redisplay.o redisplay-output.o regex.o\
  search.o select.o $(sheap_objs) $(shlib_objs) signal.o sound.o\
  specifier.o strftime.o $(sunpro_objs) symbols.o syntax.o sysdep.o\
Index: src/abbrev.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/abbrev.c,v
retrieving revision 1.20
diff -u -r1.20 abbrev.c
--- src/abbrev.c	2004/11/04 23:06:15	1.20
+++ src/abbrev.c	2005/11/22 14:00:22
＠＠ -36,6 +36,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "commands.h"
 #include "insdel.h"
 #include "syntax.h"
＠＠ -161,7 +162,7 ＠＠
   closure.buf = buf;
   closure.point = BUF_PT (buf);
   closure.maxlen = closure.point - BUF_BEGV (buf);
-  closure.chartab = buf->mirror_syntax_table;
+  closure.chartab = BUFFER_MIRROR_SYNTAX_TABLE (buf);
   closure.found = 0;

   map_obarray (obarray, abbrev_match_mapper, &closure);
＠＠ -387,7 +388,7 ＠＠
       Charbpos pos = abbrev_start;
       /* Find the initial.  */
       while (pos < point
-	     && !WORD_SYNTAX_P (buf->mirror_syntax_table,
+	     && !WORD_SYNTAX_P (BUFFER_MIRROR_SYNTAX_TABLE (buf),
 				BUF_FETCH_CHAR (buf, pos)))
 	pos++;
       /* Change just that.  */
Index: src/alloc.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/alloc.c,v
retrieving revision 1.118
diff -u -r1.118 alloc.c
--- src/alloc.c	2005/10/25 08:32:46	1.118
+++ src/alloc.c	2005/11/22 14:00:24
＠＠ -820,142 +820,130 ＠＠
 /*			  Fixed-size type macros			*/
 /************************************************************************/

-/* For fixed-size types that are commonly used, we malloc() large blocks
-   of memory at a time and subdivide them into chunks of the correct
-   size for an object of that type.  This is more efficient than
-   malloc()ing each object separately because we save on malloc() time
-   and overhead due to the fewer number of malloc()ed blocks, and
-   also because we don't need any extra pointers within each object
-   to keep them threaded together for GC purposes.  For less common
-   (and frequently large-size) types, we use lcrecords, which are
-   malloc()ed individually and chained together through a pointer
-   in the lcrecord header.  lcrecords do not need to be fixed-size
-   (i.e. two objects of the same type need not have the same size;
-   however, the size of a particular object cannot vary dynamically).
+/* For fixed-size types that are commonly used, we malloc() large blocks of
+   memory at a time and subdivide them into chunks of the correct size for
+   an object of that type.  This is more efficient than malloc()ing each
+   object separately because we save on malloc() time and overhead due to
+   the fewer number of malloc()ed blocks, and also because we don't need
+   any extra pointers within each object to keep them threaded together for
+   GC purposes.  For less common (and frequently large-size) types, we use
+   lcrecords, which are malloc()ed individually and chained together
+   through a pointer in the lcrecord header.  lcrecords do not need to be
+   fixed-size (i.e. two objects of the same type need not have the same
+   size; however, the size of a particular object cannot vary dynamically).
    It is also much easier to create a new lcrecord type because no
-   additional code needs to be added to alloc.c.  Finally, lcrecords
-   may be more efficient when there are only a small number of them.
+   additional code needs to be added to alloc.c.  Finally, lcrecords may be
+   more efficient when there are only a small number of them.

-   The types that are stored in these large blocks (or "frob blocks")
-   are cons, all number types except fixnum, compiled-function, symbol,
-   marker, extent, event, and string.
-
-   Note that strings are special in that they are actually stored in
-   two parts: a structure containing information about the string, and
-   the actual data associated with the string.  The former structure
-   (a struct Lisp_String) is a fixed-size structure and is managed the
-   same way as all the other such types.  This structure contains a
-   pointer to the actual string data, which is stored in structures of
-   type struct string_chars_block.  Each string_chars_block consists
-   of a pointer to a struct Lisp_String, followed by the data for that
-   string, followed by another pointer to a Lisp_String, followed by
-   the data for that string, etc.  At GC time, the data in these
-   blocks is compacted by searching sequentially through all the
-   blocks and compressing out any holes created by unmarked strings.
-   Strings that are more than a certain size (bigger than the size of
-   a string_chars_block, although something like half as big might
-   make more sense) are malloc()ed separately and not stored in
-   string_chars_blocks.  Furthermore, no one string stretches across
-   two string_chars_blocks.
+   The types that are stored in these large blocks (or "frob blocks") are
+   cons, all number types except fixnum, compiled-function, symbol, marker,
+   extent, event, sometimes char-subtable, and string.
+
+   Note that strings are special in that they are actually stored in two
+   parts: a structure containing information about the string, and the
+   actual data associated with the string.  The former structure (a struct
+   Lisp_String) is a fixed-size structure and is managed the same way as
+   all the other such types.  This structure contains a pointer to the
+   actual string data, which is stored in structures of type struct
+   string_chars_block.  Each string_chars_block consists of a pointer to a
+   struct Lisp_String, followed by the data for that string, followed by
+   another pointer to a Lisp_String, followed by the data for that string,
+   etc.  At GC time, the data in these blocks is compacted by searching
+   sequentially through all the blocks and compressing out any holes
+   created by unmarked strings.  Strings that are more than a certain size
+   (bigger than the size of a string_chars_block, although something like
+   half as big might make more sense) are malloc()ed separately and not
+   stored in string_chars_blocks.  Furthermore, no one string stretches
+   across two string_chars_blocks.

    Vectors are each malloc()ed separately as lcrecords.

-   In the following discussion, we use conses, but it applies equally
-   well to the other fixed-size types.
+   In the following discussion, we use conses, but it applies equally well
+   to the other fixed-size types.

-   We store cons cells inside of cons_blocks, allocating a new
-   cons_block with malloc() whenever necessary.  Cons cells reclaimed
-   by GC are put on a free list to be reallocated before allocating
-   any new cons cells from the latest cons_block.  Each cons_block is
-   just under 2^n - MALLOC_OVERHEAD bytes long, since malloc (at least
-   the versions in malloc.c and gmalloc.c) really allocates in units
-   of powers of two and uses 4 bytes for its own overhead.
-
-   What GC actually does is to search through all the cons_blocks,
-   from the most recently allocated to the oldest, and put all
-   cons cells that are not marked (whether or not they're already
-   free) on a cons_free_list.  The cons_free_list is a stack, and
-   so the cons cells in the oldest-allocated cons_block end up
-   at the head of the stack and are the first to be reallocated.
-   If any cons_block is entirely free, it is freed with free()
-   and its cons cells removed from the cons_free_list.  Because
-   the cons_free_list ends up basically in memory order, we have
-   a high locality of reference (assuming a reasonable turnover
-   of allocating and freeing) and have a reasonable probability
-   of entirely freeing up cons_blocks that have been more recently
-   allocated.  This stage is called the "sweep stage" of GC, and
-   is executed after the "mark stage", which involves starting
-   from all places that are known to point to in-use Lisp objects
-   (e.g. the obarray, where are all symbols are stored; the
-   current catches and condition-cases; the backtrace list of
-   currently executing functions; the gcpro list; etc.) and
-   recursively marking all objects that are accessible.
-
-   At the beginning of the sweep stage, the conses in the cons blocks
-   are in one of three states: in use and marked, in use but not
-   marked, and not in use (already freed).  Any conses that are marked
-   have been marked in the mark stage just executed, because as part
-   of the sweep stage we unmark any marked objects.  The way we tell
-   whether or not a cons cell is in use is through the LRECORD_FREE_P
-   macro.  This uses a special lrecord type `lrecord_type_free',
-   which is never associated with any valid object.
-
-   Conses on the free_cons_list are threaded through a pointer stored
-   in the conses themselves.  Because the cons is still in a
-   cons_block and needs to remain marked as not in use for the next
-   time that GC happens, we need room to store both the "free"
-   indicator and the chaining pointer.  So this pointer is stored
-   after the lrecord header (actually where C places a pointer after
-   the lrecord header; they are not necessarily contiguous).  This
-   implies that all fixed-size types must be big enough to contain at
-   least one pointer.  This is true for all current fixed-size types,
-   with the possible exception of Lisp_Floats, for which we define the
-   meat of the struct using a union of a pointer and a double to
-   ensure adequate space for the free list chain pointer.
-
-   Some types of objects need additional "finalization" done
-   when an object is converted from in use to not in use;
-   this is the purpose of the ADDITIONAL_FREE_type macro.
-   For example, markers need to be removed from the chain
-   of markers that is kept in each buffer.  This is because
-   markers in a buffer automatically disappear if the marker
-   is no longer referenced anywhere (the same does not
-   apply to extents, however).
+   We store cons cells inside of cons_blocks, allocating a new cons_block
+   with malloc() whenever necessary.  Cons cells reclaimed by GC are put on
+   a free list to be reallocated before allocating any new cons cells from
+   the latest cons_block.  Each cons_block is just under 2^n -
+   MALLOC_OVERHEAD bytes long, since malloc (at least the versions in
+   malloc.c and gmalloc.c) really allocates in units of powers of two and
+   uses 4 bytes for its own overhead.
+
+   What GC actually does is to search through all the cons_blocks, from the
+   most recently allocated to the oldest, and put all cons cells that are
+   not marked (whether or not they're already free) on a cons_free_list.
+   The cons_free_list is a stack, and so the cons cells in the
+   oldest-allocated cons_block end up at the head of the stack and are the
+   first to be reallocated.  If any cons_block is entirely free, it is
+   freed with free() and its cons cells removed from the cons_free_list.
+   Because the cons_free_list ends up basically in memory order, we have a
+   high locality of reference (assuming a reasonable turnover of allocating
+   and freeing) and have a reasonable probability of entirely freeing up
+   cons_blocks that have been more recently allocated.  This stage is
+   called the "sweep stage" of GC, and is executed after the "mark
stage",
+   which involves starting from all places that are known to point to
+   in-use Lisp objects (e.g. the obarray, where are all symbols are stored;
+   the current catches and condition-cases; the backtrace list of currently
+   executing functions; the gcpro list; etc.) and recursively marking all
+   objects that are accessible.
+
+   At the beginning of the sweep stage, the conses in the cons blocks are
+   in one of three states: in use and marked, in use but not marked, and
+   not in use (already freed).  Any conses that are marked have been marked
+   in the mark stage just executed, because as part of the sweep stage we
+   unmark any marked objects.  The way we tell whether or not a cons cell
+   is in use is through the LRECORD_FREE_P macro.  This uses a special
+   lrecord type `lrecord_type_free', which is never associated with any
+   valid object.
+
+   Conses on the free_cons_list are threaded through a pointer stored in
+   the conses themselves.  Because the cons is still in a cons_block and
+   needs to remain marked as not in use for the next time that GC happens,
+   we need room to store both the "free" indicator and the chaining
+   pointer.  So this pointer is stored after the lrecord header (actually
+   where C places a pointer after the lrecord header; they are not
+   necessarily contiguous).  This implies that all fixed-size types must be
+   big enough to contain at least one pointer.  This is true for all
+   current fixed-size types, with the possible exception of Lisp_Floats,
+   for which we define the meat of the struct using a union of a pointer
+   and a double to ensure adequate space for the free list chain pointer.
+
+   Some types of objects need additional "finalization" done when an object
+   is converted from in use to not in use; this is the purpose of the
+   ADDITIONAL_FREE_type macro.  For example, markers need to be removed
+   from the chain of markers that is kept in each buffer.  This is because
+   markers in a buffer automatically disappear if the marker is no longer
+   referenced anywhere (the same does not apply to extents, however).

-   WARNING: Things are in an extremely bizarre state when
-   the ADDITIONAL_FREE_type macros are called, so beware!
+   WARNING: Things are in an extremely bizarre state when the
+   ADDITIONAL_FREE_type macros are called, so beware!

    When ERROR_CHECK_GC is defined, we do things differently so as to
    maximize our chances of catching places where there is insufficient
-   GCPROing.  The thing we want to avoid is having an object that
-   we're using but didn't GCPRO get freed by GC and then reallocated
-   while we're in the process of using it -- this will result in
-   something seemingly unrelated getting trashed, and is extremely
-   difficult to track down.  If the object gets freed but not
-   reallocated, we can usually catch this because we set most of the
-   bytes of a freed object to 0xDEADBEEF. (The lisp object type is set
-   to the invalid type `lrecord_type_free', however, and a pointer
-   used to chain freed objects together is stored after the lrecord
-   header; we play some tricks with this pointer to make it more
-   bogus, so crashes are more likely to occur right away.)
-
-   We want freed objects to stay free as long as possible,
-   so instead of doing what we do above, we maintain the
-   free objects in a first-in first-out queue.  We also
-   don't recompute the free list each GC, unlike above;
-   this ensures that the queue ordering is preserved.
-   [This means that we are likely to have worse locality
-   of reference, and that we can never free a frob block
-   once it's allocated. (Even if we know that all cells
-   in it are free, there's no easy way to remove all those
-   cells from the free list because the objects on the
-   free list are unlikely to be in memory order.)]
-   Furthermore, we never take objects off the free list
-   unless there's a large number (usually 1000, but
-   varies depending on type) of them already on the list.
-   This way, we ensure that an object that gets freed will
-   remain free for the next 1000 (or whatever) times that
-   an object of that type is allocated.  */
+   GCPROing.  The thing we want to avoid is having an object that we're
+   using but didn't GCPRO get freed by GC and then reallocated while we're
+   in the process of using it -- this will result in something seemingly
+   unrelated getting trashed, and is extremely difficult to track down.  If
+   the object gets freed but not reallocated, we can usually catch this
+   because we set most of the bytes of a freed object to 0xDEADBEEF. (The
+   lisp object type is set to the invalid type `lrecord_type_free',
+   however, and a pointer used to chain freed objects together is stored
+   after the lrecord header; we play some tricks with this pointer to make
+   it more bogus, so crashes are more likely to occur right away.)
+
+   We want freed objects to stay free as long as possible, so instead of
+   doing what we do above, we maintain the free objects in a first-in
+   first-out queue.  We also don't recompute the free list each GC, unlike
+   above; this ensures that the queue ordering is preserved.  [This means
+   that we are likely to have worse locality of reference, and that we can
+   never free a frob block once it's allocated. (Even if we know that all
+   cells in it are free, there's no easy way to remove all those cells from
+   the free list because the objects on the free list are unlikely to be in
+   memory order.)]  Furthermore, we never take objects off the free list
+   unless there's a large number (usually 1000, but varies depending on
+   type) of them already on the list.  This way, we ensure that an object
+   that gets freed will remain free for the next 1000 (or whatever) times
+   that an object of that type is allocated. --ben */

 #ifndef MALLOC_OVERHEAD
 #ifdef GNU_MALLOC
＠＠ -1247,11 +1235,13 ＠＠
 				     mark_cons, print_cons, 0,
 				     cons_equal,
 				     /*
-				      * No `hash' method needed.
-				      * internal_hash knows how to
-				      * handle conses.
+				      * No `hash' method needed since
+				      * internal_hash_1 knows how to
+				      * handle conses, but there's an
+				      * early shortcut in internal_hash
+				      * so we need to put something here.
 				      */
-				     0,
+				     internal_hash_1,
 				     cons_description,
 				     Lisp_Cons);

＠＠ -2304,6 +2294,15 ＠＠
 						1, /*dumpable-flag*/
 						mark_string, print_string,
 						0, string_equal, 0,
+						/*
+						 * No `hash' method needed
+						 * since internal_hash_1
+						 * knows how to handle
+						 * strings, but there's an
+						 * early shortcut in
+						 * internal_hash so we need
+						 * to put something here.
+						 */
 						string_description,
 						string_getprop,
 						string_putprop,
＠＠ -3364,17 +3363,6 ＠＠
 #define GC_CHECK_LHEADER_INVARIANTS(lheader)
 #endif

-
-static const struct memory_description lisp_object_description_1[] = {
-  { XD_LISP_OBJECT, 0 },
-  { XD_END }
-};
-
-const struct sized_memory_description lisp_object_description = {
-  sizeof (Lisp_Object),
-  lisp_object_description_1
-};
-
 #if defined (USE_KKCC) || defined (PDUMP)

 /* This function extracts the value of a count variable described somewhere 
＠＠ -4531,6 +4519,7 ＠＠

   SWEEP_FIXED_TYPE_BLOCK (event, Lisp_Event);
 }
+
 #endif /* not MC_ALLOC */

 #ifdef EVENT_DATA_AS_OBJECTS
＠＠ -5227,6 +5216,7 ＠＠
       || preparing_for_armageddon)
     return;

+  stderr_out ("Entering GC\n");
   PROFILE_RECORD_ENTERING_SECTION (QSin_garbage_collection);

   /* We used to call selected_frame() here.
＠＠ -5345,26 +5335,21 ＠＠
 #endif /* USE_KKCC */

   { /* staticpro() */
-    Lisp_Object **p = Dynarr_begin (staticpros);
+    Lisp_Object **p = Dynarr_firstp (staticpros);
     Elemcount count;
     for (count = Dynarr_length (staticpros); count; count--)
       mark_object (**p++);
   }

   { /* staticpro_nodump() */
-    Lisp_Object **p = Dynarr_begin (staticpros_nodump);
+    Lisp_Object **p = Dynarr_firstp (staticpros_nodump);
     Elemcount count;
     for (count = Dynarr_length (staticpros_nodump); count; count--)
       mark_object (**p++);
   }

 #ifdef MC_ALLOC
-  { /* mcpro () */
-    Lisp_Object *p = Dynarr_begin (mcpros);
-    Elemcount count;
-    for (count = Dynarr_length (mcpros); count; count--)
-      mark_object (*p++);
-  }
+  mark_Lisp_Object_dynarr (mcpros);
 #endif /* MC_ALLOC */

   { /* GCPRO() */
＠＠ -5523,6 +5508,7 ＠＠
   funcall_allocation_flag = 1;

   PROFILE_RECORD_EXITING_SECTION (QSin_garbage_collection);
+  stderr_out ("Exiting GC\n");

   return;
 }
＠＠ -5593,7 +5579,7 ＠＠
   EMACS_INT s = 0;							\
   struct type##_block *x = current_##type##_block;			\
   while (x) { s += sizeof (*x) + MALLOC_OVERHEAD; x = x->prev; }	\
-  tgu_val += s;							\
+  tgu_val += s;								\
   (pl) = gc_plist_hack ((name), s, (pl));				\
 } while (0)

＠＠ -5631,33 +5617,35 ＠＠
           pl = gc_plist_hack (buf, lcrecord_stats[i].instances_in_use, pl);
         }
     }
+
+#define FROB(ty, strty)							\
+  HACK_O_MATIC (ty, strty "-storage", pl);				\
+  pl = gc_plist_hack (strty "s-free", gc_count_num_##ty##_freelist, pl); \
+  pl = gc_plist_hack (strty "s-used", gc_count_num_##ty##_in_use, pl)

-  HACK_O_MATIC (extent, "extent-storage", pl);
-  pl = gc_plist_hack ("extents-free", gc_count_num_extent_freelist, pl);
-  pl = gc_plist_hack ("extents-used", gc_count_num_extent_in_use, pl);
-  HACK_O_MATIC (event, "event-storage", pl);
-  pl = gc_plist_hack ("events-free", gc_count_num_event_freelist, pl);
-  pl = gc_plist_hack ("events-used", gc_count_num_event_in_use, pl);
-  HACK_O_MATIC (marker, "marker-storage", pl);
-  pl = gc_plist_hack ("markers-free", gc_count_num_marker_freelist, pl);
-  pl = gc_plist_hack ("markers-used", gc_count_num_marker_in_use, pl);
-  HACK_O_MATIC (float, "float-storage", pl);
-  pl = gc_plist_hack ("floats-free", gc_count_num_float_freelist, pl);
-  pl = gc_plist_hack ("floats-used", gc_count_num_float_in_use, pl);
+  FROB (extent, "extent");
+  FROB (event, "event");
+#ifdef EVENT_DATA_AS_OBJECTS
+  FROB (key_data, "key-data");
+  FROB (button_data, "button-data");
+  FROB (motion_data, "motion-data");
+  FROB (process_data, "process-data");
+  FROB (timeout_data, "timeout-data");
+  FROB (magic_data, "magic-data");
+  FROB (magic_eval_data, "magic-eval-data");
+  FROB (eval_data, "eval-data");
+  FROB (misc_user_data, "misc-user-data");
+#endif /* EVENT_DATA_AS_OBJECTS */
+  FROB (marker, "marker");
+  FROB (float, "float");
 #ifdef HAVE_BIGNUM
-  HACK_O_MATIC (bignum, "bignum-storage", pl);
-  pl = gc_plist_hack ("bignums-free", gc_count_num_bignum_freelist, pl);
-  pl = gc_plist_hack ("bignums-used", gc_count_num_bignum_in_use, pl);
+  FROB (bignum, "bignum");
 #endif /* HAVE_BIGNUM */
 #ifdef HAVE_RATIO
-  HACK_O_MATIC (ratio, "ratio-storage", pl);
-  pl = gc_plist_hack ("ratios-free", gc_count_num_ratio_freelist, pl);
-  pl = gc_plist_hack ("ratios-used", gc_count_num_ratio_in_use, pl);
+  FROB (ratio, "ratio");
 #endif /* HAVE_RATIO */
 #ifdef HAVE_BIGFLOAT
-  HACK_O_MATIC (bigfloat, "bigfloat-storage", pl);
-  pl = gc_plist_hack ("bigfloats-free", gc_count_num_bigfloat_freelist, pl);
-  pl = gc_plist_hack ("bigfloats-used", gc_count_num_bigfloat_in_use, pl);
+  FROB (bigfloat, "bigfloat");
 #endif /* HAVE_BIGFLOAT */
   HACK_O_MATIC (string, "string-header-storage", pl);
   pl = gc_plist_hack ("long-strings-total-length",
＠＠ -5672,21 +5660,14 ＠＠
 		      - gc_count_num_short_string_in_use, pl);
   pl = gc_plist_hack ("short-strings-used",
                       gc_count_num_short_string_in_use, pl);
-
-  HACK_O_MATIC (compiled_function, "compiled-function-storage", pl);
-  pl = gc_plist_hack ("compiled-functions-free",
-		      gc_count_num_compiled_function_freelist, pl);
-  pl = gc_plist_hack ("compiled-functions-used",
-		      gc_count_num_compiled_function_in_use, pl);
-
-  HACK_O_MATIC (symbol, "symbol-storage", pl);
-  pl = gc_plist_hack ("symbols-free", gc_count_num_symbol_freelist, pl);
-  pl = gc_plist_hack ("symbols-used", gc_count_num_symbol_in_use, pl);

+  FROB (compiled_function, "compiled-function");
+  FROB (symbol, "symbol");
   HACK_O_MATIC (cons, "cons-storage", pl);
   pl = gc_plist_hack ("conses-free", gc_count_num_cons_freelist, pl);
   pl = gc_plist_hack ("conses-used", gc_count_num_cons_in_use, pl);

+#undef FROB
 #undef HACK_O_MATIC

 #endif /* MC_ALLOC */
Index: src/buffer.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/buffer.c,v
retrieving revision 1.73
diff -u -r1.73 buffer.c
--- src/buffer.c	2005/10/25 08:32:47	1.73
+++ src/buffer.c	2005/11/22 14:00:25
＠＠ -2233,8 +2233,10 ＠＠
   defs->category_table = Vstandard_category_table;
 #endif /* MULE */
   defs->syntax_table = Vstandard_syntax_table;
+#ifdef MIRROR_TABLE
   defs->mirror_syntax_table =
     XCHAR_TABLE (Vstandard_syntax_table)->mirror_table;
+#endif /* MIRROR_TABLE */
   defs->modeline_format = build_string ("%-");  /* reset in loaddefs.el */
   defs->case_fold_search = Qt;
   defs->selective_display_ellipses = Qt;
Index: src/buffer.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/buffer.h,v
retrieving revision 1.33
diff -u -r1.33 buffer.h
--- src/buffer.h	2005/10/24 10:07:34	1.33
+++ src/buffer.h	2005/11/22 14:00:25
＠＠ -33,9 +33,6 ＠＠
 #ifndef INCLUDED_buffer_h_
 #define INCLUDED_buffer_h_

-#include "casetab.h"
-#include "chartab.h"
-
 /************************************************************************/
 /*                                                                      */
 /*                    definition of Lisp buffer object                  */
＠＠ -429,7 +426,7 ＠＠
     case FORMAT_DEFAULT:						\
       {									\
 	Ibyte *VBB_ptr = BYTE_BUF_BYTE_ADDRESS_NO_VERIFY (buf, x);	\
-	while (!ibyte_first_byte_p (*VBB_ptr))			\
+	while (!ibyte_first_byte_p (*VBB_ptr))				\
 	  VBB_ptr--, (x)--;						\
       }									\
       break;								\
＠＠ -461,7 +458,7 ＠＠
     case FORMAT_DEFAULT:						\
       {									\
 	Ibyte *VBF_ptr = BYTE_BUF_BYTE_ADDRESS_NO_VERIFY (buf, x);	\
-	while (!ibyte_first_byte_p (*VBF_ptr))			\
+	while (!ibyte_first_byte_p (*VBF_ptr))				\
 	  VBF_ptr++, (x)++;						\
       }									\
       break;								\
＠＠ -839,7 +836,7 ＠＠

 /* The character at position POS in buffer. */

-#define BYTE_BUF_FETCH_CHAR(buf, pos)					   \
+#define BYTE_BUF_FETCH_CHAR(buf, pos)					\
    itext_ichar_fmt (BYTE_BUF_BYTE_ADDRESS (buf, pos), BUF_FORMAT (buf), \
 		       wrap_buffer (buf))
 #define BUF_FETCH_CHAR(buf, pos) \
＠＠ -928,8 +925,8 ＠＠
 #define BUF_SIZE(buf) (BUF_Z (buf) - BUF_BEG (buf))

 /* Is this buffer narrowed? */
-#define BUF_NARROWED(buf) \
-   ((BYTE_BUF_BEGV (buf) != BYTE_BUF_BEG (buf)) || \
+#define BUF_NARROWED(buf)				\
+   ((BYTE_BUF_BEGV (buf) != BYTE_BUF_BEG (buf)) ||	\
     (BYTE_BUF_ZV   (buf) != BYTE_BUF_Z   (buf)))

 /* Modification count */
＠＠ -952,35 +949,6 ＠＠

 #define BUF_MARKERS(buf) ((buf)->markers)

-#ifdef MULE
-
-DECLARE_INLINE_HEADER (
-Lisp_Object
-BUFFER_CATEGORY_TABLE (struct buffer *buf)
-)
-{
-  return buf ? buf->category_table : Vstandard_category_table;
-}
-
-#endif /* MULE */
-
-DECLARE_INLINE_HEADER (
-Lisp_Object
-BUFFER_SYNTAX_TABLE (struct buffer *buf)
-)
-{
-  return buf ? buf->syntax_table : Vstandard_syntax_table;
-}
-
-DECLARE_INLINE_HEADER (
-Lisp_Object
-BUFFER_MIRROR_SYNTAX_TABLE (struct buffer *buf)
-)
-{
-  return buf ? buf->mirror_syntax_table :
-    XCHAR_TABLE (Vstandard_syntax_table)->mirror_table;
-}
-
 /* WARNING:

    The new definitions of CEILING_OF() and FLOOR_OF() differ semantically
＠＠ -1138,97 +1106,5 ＠＠
 #define R_ALLOC_DECLARE(var,data)

 #endif /* !REL_ALLOC */
-
-
-/************************************************************************/
-/*                         Case conversion                              */
-/************************************************************************/
-
-/* A "trt" table is a mapping from characters to other characters,
-   typically used to convert between uppercase and lowercase.
-   */
-
-/* The _1 macros are named as such because they assume that you have
-   already guaranteed that the character values are all in the range
-   0 - 255.  Bad lossage will happen otherwise. */
-
-#define MAKE_TRT_TABLE() Fmake_char_table (Qgeneric)
-DECLARE_INLINE_HEADER (
-Ichar
-TRT_TABLE_OF (Lisp_Object table, Ichar ch)
-)
-{
-  Lisp_Object TRT_char;
-  TRT_char = get_char_table (ch, table);
-  if (NILP (TRT_char))
-    return ch;
-  else
-    return XCHAR (TRT_char);
-}
-#define SET_TRT_TABLE_OF(table, ch1, ch2)	\
-  Fput_char_table (make_char (ch1), make_char (ch2), table)
-
-DECLARE_INLINE_HEADER (
-Lisp_Object
-BUFFER_CASE_TABLE (struct buffer *buf)
-)
-{
-  return buf ? buf->case_table : Vstandard_case_table;
-}
-
-/* Macros used below. */
-#define DOWNCASE_TABLE_OF(buf, c)	\
-  TRT_TABLE_OF (XCASE_TABLE_DOWNCASE (BUFFER_CASE_TABLE (buf)), c)
-#define UPCASE_TABLE_OF(buf, c)		\
-  TRT_TABLE_OF (XCASE_TABLE_UPCASE (BUFFER_CASE_TABLE (buf)), c)
-
-/* 1 if CH is upper case.  */
-
-DECLARE_INLINE_HEADER (
-int
-UPPERCASEP (struct buffer *buf, Ichar ch)
-)
-{
-  return DOWNCASE_TABLE_OF (buf, ch) != ch;
-}
-
-/* 1 if CH is lower case.  */
-
-DECLARE_INLINE_HEADER (
-int
-LOWERCASEP (struct buffer *buf, Ichar ch)
-)
-{
-  return (UPCASE_TABLE_OF   (buf, ch) != ch &&
-	  DOWNCASE_TABLE_OF (buf, ch) == ch);
-}
-
-/* 1 if CH is neither upper nor lower case.  */
-
-DECLARE_INLINE_HEADER (
-int
-NOCASEP (struct buffer *buf, Ichar ch)
-)
-{
-  return UPCASE_TABLE_OF (buf, ch) == ch;
-}
-
-/* Upcase a character, or make no change if that cannot be done.  */
-
-DECLARE_INLINE_HEADER (
-Ichar
-UPCASE (struct buffer *buf, Ichar ch)
-)
-{
-  return (DOWNCASE_TABLE_OF (buf, ch) == ch) ? UPCASE_TABLE_OF (buf, ch) : ch;
-}
-
-/* Upcase a character known to be not upper case.  Unused. */
-
-#define UPCASE1(buf, ch) UPCASE_TABLE_OF (buf, ch)
-
-/* Downcase a character, or make no change if that cannot be done. */
-
-#define DOWNCASE(buf, ch) DOWNCASE_TABLE_OF (buf, ch)

 #endif /* INCLUDED_buffer_h_ */
Index: src/bufslots.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/bufslots.h,v
retrieving revision 1.10
diff -u -r1.10 bufslots.h
--- src/bufslots.h	2003/01/12 11:08:08	1.10
+++ src/bufslots.h	2005/11/22 14:00:25
＠＠ -102,8 +102,10 ＠＠
     MARKED_SLOT (abbrev_table)
     /* This buffer's syntax table.  */
     MARKED_SLOT (syntax_table)
+#ifdef MIRROR_TABLE
     /* Massaged values from the syntax table, for faster lookup. */
     MARKED_SLOT (mirror_syntax_table)
+#endif /* MIRROR_TABLE */

 #ifdef MULE
     /* This buffer's category table. */
Index: src/bytecode.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/bytecode.c,v
retrieving revision 1.47
diff -u -r1.47 bytecode.c
--- src/bytecode.c	2005/04/08 23:11:19	1.47
+++ src/bytecode.c	2005/11/22 14:00:26
＠＠ -2180,7 +2180,7 ＠＠
 static const struct memory_description compiled_function_description[] = {
   { XD_INT,         offsetof (Lisp_Compiled_Function, args_in_array) },
   { XD_BLOCK_PTR,  offsetof (Lisp_Compiled_Function, args),
-    XD_INDIRECT (0, 0), { &lisp_object_description } },
+    XD_INDIRECT (0, 0), { &Lisp_Object_description } },
   { XD_LISP_OBJECT, offsetof (Lisp_Compiled_Function, instructions) },
   { XD_LISP_OBJECT, offsetof (Lisp_Compiled_Function, constants) },
   { XD_LISP_OBJECT, offsetof (Lisp_Compiled_Function, arglist) },
Index: src/casefiddle.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/casefiddle.c,v
retrieving revision 1.13
diff -u -r1.13 casefiddle.c
--- src/casefiddle.c	2004/11/04 23:06:16	1.13
+++ src/casefiddle.c	2005/11/22 14:00:26
＠＠ -25,6 +25,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "insdel.h"
 #include "syntax.h"

＠＠ -49,7 +50,7 ＠＠

   if (STRINGP (string_or_char))
     {
-      Lisp_Object syntax_table = buf->mirror_syntax_table;
+      Lisp_Object syntax_table = BUFFER_MIRROR_SYNTAX_TABLE (buf);
       Ibyte *storage =
 	alloca_ibytes (XSTRING_LENGTH (string_or_char) * MAX_ICHAR_LEN);
       Ibyte *newp = storage;
＠＠ -157,7 +158,7 ＠＠
 {
   /* This function can GC */
   Charbpos pos, s, e;
-  Lisp_Object syntax_table = buf->mirror_syntax_table;
+  Lisp_Object syntax_table = BUFFER_MIRROR_SYNTAX_TABLE (buf);
   int mccount;
   int wordp = 0, wordp_prev;

Index: src/casetab.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/casetab.c,v
retrieving revision 1.17
diff -u -r1.17 casetab.c
--- src/casetab.c	2005/10/24 10:07:34	1.17
+++ src/casetab.c	2005/11/22 14:00:26
＠＠ -1,7 +1,7 ＠＠
 /* XEmacs routines to deal with case tables.
    Copyright (C) 1987, 1992, 1993, 1994 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2002 Ben Wing.
+   Copyright (C) 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -60,10 +60,10 ＠＠

 #include <config.h>
 #include "lisp.h"
+
 #include "buffer.h"
-#include "opaque.h"
-#include "chartab.h"
 #include "casetab.h"
+#include "opaque.h"

 Lisp_Object Qcase_tablep, Qdowncase, Qupcase;
 Lisp_Object Vstandard_case_table;
＠＠ -90,7 +90,7 ＠＠
 {
   Lisp_Case_Table *ct = XCASE_TABLE (obj);
   if (print_readably)
-    printing_unreadable_object ("#<case-table 0x%x>",
ct->header.uid);
+    printing_unreadable_lcrecord (obj, 0);
   write_fmt_string_lisp
     (printcharfun, "#<case-table downcase=%s upcase=%s canon=%s eqv=%s ",
4,
      CASE_TABLE_DOWNCASE (ct), CASE_TABLE_UPCASE (ct),
＠＠ -286,44 +286,40 ＠＠
 }

 static int
-compute_canon_mapper (struct chartab_range *range,
-		      Lisp_Object UNUSED (table), Lisp_Object val, void *arg)
+compute_canon_mapper (Lisp_Object UNUSED (table), Ichar code, Lisp_Object val,
+		      void * arg)
 {
   Lisp_Object casetab = VOID_TO_LISP (arg);
-  if (range->type == CHARTAB_RANGE_CHAR)
-    SET_TRT_TABLE_OF (XCASE_TABLE_CANON (casetab), range->ch,
-		      TRT_TABLE_OF (XCASE_TABLE_DOWNCASE (casetab),
-				    TRT_TABLE_OF (XCASE_TABLE_UPCASE (casetab),
-						  XCHAR (val))));
+  SET_TRT_TABLE_OF (XCASE_TABLE_CANON (casetab), code,
+		    TRT_TABLE_OF (XCASE_TABLE_DOWNCASE (casetab),
+				  TRT_TABLE_OF (XCASE_TABLE_UPCASE (casetab),
+						XCHAR (val))));

   return 0;
 }

 static int
-initialize_identity_mapper (struct chartab_range *range,
-			    Lisp_Object UNUSED (table),
-			    Lisp_Object UNUSED (val), void *arg)
+initialize_identity_mapper (Lisp_Object UNUSED (table), Ichar code,
+			    Lisp_Object UNUSED (val), void * arg)
 {
   Lisp_Object trt = VOID_TO_LISP (arg);
-  if (range->type == CHARTAB_RANGE_CHAR)
-    SET_TRT_TABLE_OF (trt, range->ch, range->ch);
+  SET_TRT_TABLE_OF (trt, code, code);

   return 0;
 }

 static int
-compute_up_or_eqv_mapper (struct chartab_range *range,
-			  Lisp_Object UNUSED (table),
-			  Lisp_Object val, void *arg)
+compute_up_or_eqv_mapper (Lisp_Object UNUSED (table), Ichar code,
+			  Lisp_Object val, void * arg)
 {
   Lisp_Object inverse = VOID_TO_LISP (arg);
   Ichar toch = XCHAR (val);

-  if (range->type == CHARTAB_RANGE_CHAR && range->ch != toch)
+  if (code != toch)
     {
       Ichar c = TRT_TABLE_OF (inverse, toch);
-      SET_TRT_TABLE_OF (inverse, toch, range->ch);
-      SET_TRT_TABLE_OF (inverse, range->ch, c);
+      SET_TRT_TABLE_OF (inverse, toch, code);
+      SET_TRT_TABLE_OF (inverse, code, c);
     }

   return 0;
Index: src/casetab.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/casetab.h,v
retrieving revision 1.7
diff -u -r1.7 casetab.h
--- src/casetab.h	2005/10/24 10:07:34	1.7
+++ src/casetab.h	2005/11/22 14:00:26
＠＠ -1,6 +1,6 ＠＠
 /* XEmacs routines to deal with case tables.
    Copyright (C) 2000 Yoshiki Hayashi.
-   Copyright (C) 2002 Ben Wing.
+   Copyright (C) 2002, 2005 Ben Wing.
 This file is part of XEmacs.

 XEmacs is free software; you can redistribute it and/or modify it
＠＠ -23,6 +23,10 ＠＠
 #ifndef INCLUDED_casetab_h_
 #define INCLUDED_casetab_h_

+/* These are needed for LOWERCASEP, NOCASEP, UPCASE, DOWNCASE below */
+#include "buffer.h"
+#include "chartab.h"
+
 struct Lisp_Case_Table
 {
   struct LCRECORD_HEADER header;
＠＠ -84,5 +88,97 ＠＠
   SET_CASE_TABLE_EQV (XCASE_TABLE (ct),  p)

 extern Lisp_Object Vstandard_case_table;
+
+
+/************************************************************************/
+/*                         Case conversion                              */
+/************************************************************************/
+
+/* A "trt" table is a mapping from characters to other characters,
+   typically used to convert between uppercase and lowercase.
+   */
+
+/* The _1 macros are named as such because they assume that you have
+   already guaranteed that the character values are all in the range
+   0 - 255.  Bad lossage will happen otherwise. */
+
+#define MAKE_TRT_TABLE() Fmake_char_table (Qgeneric)
+DECLARE_INLINE_HEADER (
+Ichar
+TRT_TABLE_OF (Lisp_Object table, Ichar ch)
+)
+{
+  Lisp_Object TRT_char;
+  TRT_char = get_char_table (ch, table);
+  if (NILP (TRT_char))
+    return ch;
+  else
+    return XCHAR (TRT_char);
+}
+#define SET_TRT_TABLE_OF(table, ch1, ch2)	\
+  Fput_char_table (make_char (ch1), make_char (ch2), table)
+
+DECLARE_INLINE_HEADER (
+Lisp_Object
+BUFFER_CASE_TABLE (struct buffer *buf)
+)
+{
+  return buf ? buf->case_table : Vstandard_case_table;
+}
+
+/* Macros used below. */
+#define DOWNCASE_TABLE_OF(buf, c)	\
+  TRT_TABLE_OF (XCASE_TABLE_DOWNCASE (BUFFER_CASE_TABLE (buf)), c)
+#define UPCASE_TABLE_OF(buf, c)		\
+  TRT_TABLE_OF (XCASE_TABLE_UPCASE (BUFFER_CASE_TABLE (buf)), c)
+
+/* 1 if CH is upper case.  */
+
+DECLARE_INLINE_HEADER (
+int
+UPPERCASEP (struct buffer *buf, Ichar ch)
+)
+{
+  return DOWNCASE_TABLE_OF (buf, ch) != ch;
+}
+
+/* 1 if CH is lower case.  */
+
+DECLARE_INLINE_HEADER (
+int
+LOWERCASEP (struct buffer *buf, Ichar ch)
+)
+{
+  return (UPCASE_TABLE_OF   (buf, ch) != ch &&
+	  DOWNCASE_TABLE_OF (buf, ch) == ch);
+}
+
+/* 1 if CH is neither upper nor lower case.  */
+
+DECLARE_INLINE_HEADER (
+int
+NOCASEP (struct buffer *buf, Ichar ch)
+)
+{
+  return UPCASE_TABLE_OF (buf, ch) == ch;
+}
+
+/* Upcase a character, or make no change if that cannot be done.  */
+
+DECLARE_INLINE_HEADER (
+Ichar
+UPCASE (struct buffer *buf, Ichar ch)
+)
+{
+  return (DOWNCASE_TABLE_OF (buf, ch) == ch) ? UPCASE_TABLE_OF (buf, ch) : ch;
+}
+
+/* Upcase a character known to be not upper case.  Unused. */
+
+#define UPCASE1(buf, ch) UPCASE_TABLE_OF (buf, ch)
+
+/* Downcase a character, or make no change if that cannot be done. */
+
+#define DOWNCASE(buf, ch) DOWNCASE_TABLE_OF (buf, ch)

 #endif /* INCLUDED_casetab_h_ */
Index: src/charset.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/charset.h,v
retrieving revision 1.12
diff -u -r1.12 charset.h
--- src/charset.h	2005/10/24 10:07:34	1.12
+++ src/charset.h	2005/11/22 14:00:26
＠＠ -1,7 +1,7 ＠＠
 /* Header for charsets.
    Copyright (C) 1992, 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2001, 2002 Ben Wing.
+   Copyright (C) 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -27,6 +27,50 ＠＠
 #ifndef INCLUDED_charset_h_
 #define INCLUDED_charset_h_

+/* If defined, we always use the maximum depth for the translation tables.
+   This will increase their size to a certain extent but speed up lookup,
+   as it eliminates all branches. */
+#define MAXIMIZE_UNICODE_TABLE_DEPTH
+
+/* These are placeholders.  In case we decide to be more clever and allow
+   for different sizes of the to-Unicode tables depending on the size of
+   the associated charset, we will have to change the places where these
+   constants are referenced.  Doing this requires having multiple versions
+   of the blank to-Unicode tables; probably not worth it since the
+   to-Unicode tables are not space hogs compared to the from-Unicode tables,
+   in any case. */
+
+#define CHARSET_MIN_OFFSET 0
+#define CHARSET_MAX_SIZE 256
+
+void get_charset_limits (Lisp_Object charset, int *low0, int *high0,
+			 int *low1, int *high1);
+int get_charset_iso2022_type (Lisp_Object charset);
+void non_ascii_unicode_to_charset_codepoint (int code,
+					     Lisp_Object_dynarr *charsets,
+					     Lisp_Object *charset, int *c1,
+					     int *c2);
+Ichar old_mule_non_ascii_charset_codepoint_to_ichar_raw (Lisp_Object charset,
+							 int c1, int c2);
+void old_mule_non_ascii_itext_to_charset_codepoint_raw (const Ibyte *ptr,
+							Lisp_Object *charset,
+							int *c1,
+							int *c2);
+Bytecount old_mule_non_ascii_charset_codepoint_to_itext_raw (Lisp_Object charset,
+							     int c1, int c2,
+							     Ibyte *ptr);
+void old_mule_non_ascii_ichar_to_charset_codepoint_raw (Ichar c,
+							Lisp_Object *charset,
+							int *c1, int *c2);
+Ichar charset_codepoint_to_ichar (Lisp_Object charset, int c1, int c2,
+				  enum converr fail);
+Bytecount non_ascii_charset_codepoint_to_itext (Lisp_Object charset, int c1,
+						int c2, Ibyte *ptr,
+						enum converr fail);
+int charset_codepoint_to_private_unicode (Lisp_Object charset, int c1, int c2);
+void private_unicode_to_charset_codepoint (int priv, Lisp_Object *charset,
+					   int *c1, int *c2);
+extern Lisp_Object Vcharset_hash_table;

 #ifndef MULE
＠＠ -39,143 +83,12 ＠＠
    be done */

 #define Vcharset_ascii Qnil
+#define ichar_charset_obsolete_me_baby_please(c) Qnil

-#define ichar_charset(ch) Vcharset_ascii
-#define ichar_leading_byte(ch) LEADING_BYTE_ASCII
-#define ichar_len(ch) 1
-#define ichar_len_fmt(ch, fmt) 1
-#define LEADING_BYTE_ASCII 0x80
-#define NUM_LEADING_BYTES 1
-#define MIN_LEADING_BYTE 0x80
-#define CHARSETP(cs) 1
-#define charset_by_leading_byte(lb) Vcharset_ascii
-#define XCHARSET_LEADING_BYTE(cs) LEADING_BYTE_ASCII
-#define XCHARSET_GRAPHIC(cs) -1
-#define XCHARSET_COLUMNS(cs) 1
-#define XCHARSET_DIMENSION(cs) 1
-#define BREAKUP_ICHAR(ch, charset, byte1, byte2) do {	\
-  (charset) = Vcharset_ascii;				\
-  (byte1) = (ch);					\
-  (byte2) = 0;						\
-} while (0)
-
 #else /* MULE */

 /************************************************************************/
-/*                    Definition of leading bytes                       */
-/************************************************************************/
-
-#define MIN_LEADING_BYTE		0x7F
-
-/** The following are for 1-byte characters in an official charset. **/
-enum LEADING_BYTE_OFFICIAL_1
-{
-  MIN_LEADING_BYTE_OFFICIAL_1 = 0x80,
-  /* LEADING_BYTE_LATIN_ISO8859_1 *MUST* be equal to
-     MIN_LEADING_BYTE_OFFICIAL_1. */
-  LEADING_BYTE_LATIN_ISO8859_1 =  /* 0x80 Right half of ISO 8859-1 */
-    MIN_LEADING_BYTE_OFFICIAL_1,
-  LEADING_BYTE_LATIN_ISO8859_2,   /* 0x81 Right half of ISO 8859-2 */
-  LEADING_BYTE_LATIN_ISO8859_3,   /* 0x82 Right half of ISO 8859-3 */
-  LEADING_BYTE_LATIN_ISO8859_4,   /* 0x83 Right half of ISO 8859-4 */
-  LEADING_BYTE_THAI_TIS620,       /* 0x84 TIS620-2533 */
-  LEADING_BYTE_GREEK_ISO8859_7,   /* 0x85 Right half of ISO 8859-7 */
-  LEADING_BYTE_ARABIC_ISO8859_6,  /* 0x86 Right half of ISO 8859-6 */
-  LEADING_BYTE_HEBREW_ISO8859_8,  /* 0x87 Right half of ISO 8859-8 */
-  LEADING_BYTE_KATAKANA_JISX0201, /* 0x88 Right half of JIS X0201-1976 */
-  LEADING_BYTE_LATIN_JISX0201,    /* 0x89 Left  half of JIS X0201-1976 */
-  LEADING_BYTE_CYRILLIC_ISO8859_5,/* 0x8A Right half of ISO 8859-5 */
-  LEADING_BYTE_LATIN_ISO8859_9,   /* 0x8B Right half of ISO 8859-9 */
-  LEADING_BYTE_LATIN_ISO8859_15,  /* 0x8C Right half of ISO 8859-15 */
-#ifdef ENABLE_COMPOSITE_CHARS
-  LEADING_BYTE_COMPOSITE,         /* 0x8D For a composite character */
-  MAX_LEADING_BYTE_OFFICIAL_1 =
-    LEADING_BYTE_COMPOSITE - 1,
-#else
-    /* Does not need to be the last entry, but simplifies things */
-  LEADING_BYTE_COMPOSITE_REPLACEMENT, /* 0x8D Replaces ESC 0 - ESC 4 in a
-                                         buffer */
-  MAX_LEADING_BYTE_OFFICIAL_1 =
-    LEADING_BYTE_COMPOSITE_REPLACEMENT,
-#endif
-                                  /* 0x8E Unused */
-};
-
-/* These next 2 + LEADING_BYTE_COMPOSITE need special treatment in a string
-   and/or character */
-
-#define LEADING_BYTE_ASCII		0x7F /* Not used except in arrays
-						indexed by leading byte */
-#define LEADING_BYTE_CONTROL_1          0x8F /* represent normal 80-9F */
-
-/** The following are for 2-byte characters in an official charset. **/
-enum LEADING_BYTE_OFFICIAL_2
-{
-  MIN_LEADING_BYTE_OFFICIAL_2 = 0x90,
-  LEADING_BYTE_JAPANESE_JISX0208_1978 =
-    MIN_LEADING_BYTE_OFFICIAL_2,       /* 0x90 Japanese JIS X0208-1978 */
-  LEADING_BYTE_CHINESE_GB2312,         /* 0x91 Chinese Hanzi GB2312-1980 */
-  LEADING_BYTE_JAPANESE_JISX0208,      /* 0x92 Japanese JIS X0208-1983 */
-  LEADING_BYTE_KOREAN_KSC5601,         /* 0x93 Hangul KS C5601-1987 */
-  LEADING_BYTE_JAPANESE_JISX0212,      /* 0x94 Japanese JIS X0212-1990 */
-  LEADING_BYTE_CHINESE_CNS11643_1,     /* 0x95 Chinese CNS11643 Set 1 */
-  LEADING_BYTE_CHINESE_CNS11643_2,     /* 0x96 Chinese CNS11643 Set 2 */
-  LEADING_BYTE_CHINESE_BIG5_1,         /* 0x97 Big5 Level 1 */
-  LEADING_BYTE_CHINESE_BIG5_2,         /* 0x98 Big5 Level 2 */
-  MAX_LEADING_BYTE_OFFICIAL_2 =
-    LEADING_BYTE_CHINESE_BIG5_2,
-
-                                       /* 0x99 unused */
-                                       /* 0x9A unused */
-                                       /* 0x9B unused */
-                                       /* 0x9C unused */
-                                       /* 0x9D unused */
-};
-
-
-/** The following are for 1- and 2-byte characters in a private charset. **/
-
-#define PRE_LEADING_BYTE_PRIVATE_1	0x9E	/* 1-byte char-set */
-#define PRE_LEADING_BYTE_PRIVATE_2	0x9F	/* 2-byte char-set */
-
-#define MIN_LEADING_BYTE_PRIVATE_1	0xA0
-#define MAX_LEADING_BYTE_PRIVATE_1	0xEF
-#define MIN_LEADING_BYTE_PRIVATE_2	0xF0
-#define MAX_LEADING_BYTE_PRIVATE_2	0xFF
-
-#define NUM_LEADING_BYTES 129
-
-
-/************************************************************************/
-/*                    Operations on leading bytes                       */
-/************************************************************************/
-
-/* Is this leading byte for a private charset? */
-
-#define leading_byte_private_p(lb) ((lb) >= MIN_LEADING_BYTE_PRIVATE_1)
-
-/* Is this a prefix for a private leading byte? */
-
-DECLARE_INLINE_HEADER (
-int
-leading_byte_prefix_p (Ibyte lb)
-)
-{
-  return (lb == PRE_LEADING_BYTE_PRIVATE_1 ||
-	  lb == PRE_LEADING_BYTE_PRIVATE_2);
-}
-
-/* Given a private leading byte, return the leading byte prefix stored
-   in a string. */
-
-#define private_leading_byte_prefix(lb)			\
-  ((unsigned int) (lb) < MIN_LEADING_BYTE_PRIVATE_2 ?	\
-   PRE_LEADING_BYTE_PRIVATE_1 :				\
-   PRE_LEADING_BYTE_PRIVATE_2)
-
-
-/************************************************************************/
 /*            Information about a particular character set              */
 /************************************************************************/

＠＠ -183,16 +96,19 ＠＠
 {
   struct LCRECORD_HEADER header;

-  int id;
-  Lisp_Object name;
-  Lisp_Object doc_string;
-  Lisp_Object registry;
+  int id; /* ID for this charset.  If old-Mule, this is the actual leading
+	     byte for this charset; otherwise, an arbitrary unique value. */
+  Lisp_Object name; /* Unique symbol that identifies this charset */
+  Lisp_Object doc_string; /* */
+  Lisp_Object registry; /* regexp matching XLFD registry portion */
   Lisp_Object short_name;
   Lisp_Object long_name;

   Lisp_Object reverse_direction_charset;

+#ifdef HAVE_CCL
   Lisp_Object ccl_program;
+#endif /* HAVE_CCL */

   /* Unicode translation tables.  See unicode.c for the format of these
      tables, and discussion of how they are initialized.
＠＠ -200,16 +116,21 ＠＠
   void *to_unicode_table;
   void *from_unicode_table;
   int from_unicode_levels;
+  /* A value (combination of two octets) that is not a legal index in this
+     charset. #### Problematic with a full 256x256 charset.  To get around
+     this partially, we choose a value that is outside the range of nearly
+     all charsets, and unlikely in Unicode.  #### But to do it properly,
+     we need a separate table tracking all entries that map to this value.
+     Most likely we should use a hash table; you're unlikely to have many
+     entries mapping to the same conversion value.  Use a simple NULL
+     pointer for an empty hash table, in the common case where no entries
+     at all have that value. */
+  UINT_16_BIT badval;

   /* Final byte of this character set in ISO2022 designating escape
      sequence */
   Ibyte final;

-  /* Number of bytes (1 - 4) required in the internal representation
-     for characters in this character set.  This is *not* the
-     same as the dimension of the character set). */
-  int rep_bytes;
-
   /* Number of columns a character in this charset takes up, on TTY
      devices.  Not used for X devices. */
   int columns;
＠＠ -217,18 +138,27 ＠＠
   /* Direction of this character set */
   int direction;

-  /* Type of this character set (94, 96, 94x94, 96x96) */
-  int type;
-
   /* Number of bytes used in encoding of this character set (1 or 2) */
   int dimension;

   /* Number of chars in each dimension (usually 94 or 96) */
-  int chars;
+  int chars[2];
+
+  /* Minimum offset of index in each dimension (usually 33 for dimension <= 94,
+     32 for dimension 95 or 96, 0 otherwise). */
+  int offset[2];

   /* Which half of font to be used to display this character set */
   int graphic;

+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+  /* If >= 0, indicates a charset where the conversion between Unicode
+     and its members is "algorithmic" in a simple linear fashion, starting
+     at the value of ALGO_LOW.  Currently used only under old-Mule for
+     handling the crockish Unicode-subset charsets. */
+  int algo_low;
+#endif /* ALLOW_ALGORITHMIC_CONVERSION_TABLES */
+
   /* If set, this is a "temporary" charset created when we encounter
      an unknown final.  This is so that we can successfully compile
      and load such files.  We allow a real charset to be created on top
＠＠ -244,6 +174,9 ＠＠
 #define CHECK_CHARSET(x) CHECK_RECORD (x, charset)
 #define CONCHECK_CHARSET(x) CONCHECK_RECORD (x, charset)

+/* These are special types used in conjunction with ISO-2022, which only
+   allows charsets of these dimensions.  In general, we allow charsets of
+   any dimensions as long as no single dimension exceeeds 256. */
 #define CHARSET_TYPE_94    0	/* This charset includes 94    characters. */
 #define CHARSET_TYPE_96    1	/* This charset includes 96    characters. */
 #define CHARSET_TYPE_94X94 2	/* This charset includes 94x94 characters. */
＠＠ -252,301 +185,491 ＠＠
 #define CHARSET_LEFT_TO_RIGHT	0
 #define CHARSET_RIGHT_TO_LEFT	1

-/* Leading byte and id have been regrouped. -- OG */
-#define CHARSET_ID(cs)		 ((cs)->id)
-#define CHARSET_LEADING_BYTE(cs) ((Ibyte) CHARSET_ID (cs))
-#define CHARSET_NAME(cs)	 ((cs)->name)
-#define CHARSET_SHORT_NAME(cs)	 ((cs)->short_name)
-#define CHARSET_LONG_NAME(cs)	 ((cs)->long_name)
-#define CHARSET_REP_BYTES(cs)	 ((cs)->rep_bytes)
+#define CHARSET_CHARS(cs, dim)	 ((cs)->chars[dim])
 #define CHARSET_COLUMNS(cs)	 ((cs)->columns)
-#define CHARSET_GRAPHIC(cs)	 ((cs)->graphic)
-#define CHARSET_TYPE(cs)	 ((cs)->type)
+#define CHARSET_DIMENSION(cs)	 ((cs)->dimension)
 #define CHARSET_DIRECTION(cs)	 ((cs)->direction)
-#define CHARSET_FINAL(cs)	 ((cs)->final)
 #define CHARSET_DOC_STRING(cs)	 ((cs)->doc_string)
+#define CHARSET_FINAL(cs)	 ((cs)->final)
+#define CHARSET_FROM_UNICODE_LEVELS(cs) ((cs)->from_unicode_levels)
+#define CHARSET_FROM_UNICODE_TABLE(cs) ((cs)->from_unicode_table)
+#define CHARSET_GRAPHIC(cs)	 ((cs)->graphic)
+#define CHARSET_ID(cs)		 ((cs)->id)
+#define CHARSET_LONG_NAME(cs)	 ((cs)->long_name)
+#define CHARSET_NAME(cs)	 ((cs)->name)
+#define CHARSET_OFFSET(cs, dim)	 ((cs)->offset[dim])
 #define CHARSET_REGISTRY(cs)	 ((cs)->registry)
-#define CHARSET_CCL_PROGRAM(cs)  ((cs)->ccl_program)
-#define CHARSET_DIMENSION(cs)	 ((cs)->dimension)
-#define CHARSET_CHARS(cs)	 ((cs)->chars)
 #define CHARSET_REVERSE_DIRECTION_CHARSET(cs) ((cs)->reverse_direction_charset)
+#define CHARSET_SHORT_NAME(cs)	 ((cs)->short_name)
 #define CHARSET_TO_UNICODE_TABLE(cs) ((cs)->to_unicode_table)
-#define CHARSET_FROM_UNICODE_TABLE(cs) ((cs)->from_unicode_table)
-#define CHARSET_FROM_UNICODE_LEVELS(cs) ((cs)->from_unicode_levels)
-

-#define CHARSET_PRIVATE_P(cs) leading_byte_private_p (CHARSET_LEADING_BYTE (cs))
-
-#define XCHARSET_ID(cs)		  CHARSET_ID           (XCHARSET (cs))
-#define XCHARSET_NAME(cs)	  CHARSET_NAME         (XCHARSET (cs))
-#define XCHARSET_SHORT_NAME(cs)	  CHARSET_SHORT_NAME   (XCHARSET (cs))
-#define XCHARSET_LONG_NAME(cs)	  CHARSET_LONG_NAME    (XCHARSET (cs))
-#define XCHARSET_REP_BYTES(cs)	  CHARSET_REP_BYTES    (XCHARSET (cs))
+#define XCHARSET_CHARS(cs, dim)	  CHARSET_CHARS        (XCHARSET (cs), dim)
 #define XCHARSET_COLUMNS(cs)	  CHARSET_COLUMNS      (XCHARSET (cs))
-#define XCHARSET_GRAPHIC(cs)      CHARSET_GRAPHIC      (XCHARSET (cs))
-#define XCHARSET_TYPE(cs)	  CHARSET_TYPE         (XCHARSET (cs))
+#define XCHARSET_DIMENSION(cs)	  CHARSET_DIMENSION    (XCHARSET (cs))
 #define XCHARSET_DIRECTION(cs)	  CHARSET_DIRECTION    (XCHARSET (cs))
-#define XCHARSET_FINAL(cs)	  CHARSET_FINAL        (XCHARSET (cs))
 #define XCHARSET_DOC_STRING(cs)	  CHARSET_DOC_STRING   (XCHARSET (cs))
+#define XCHARSET_FINAL(cs)	  CHARSET_FINAL        (XCHARSET (cs))
+#define XCHARSET_FROM_UNICODE_LEVELS(cs) CHARSET_FROM_UNICODE_LEVELS (XCHARSET (cs))
+#define XCHARSET_FROM_UNICODE_TABLE(cs) CHARSET_FROM_UNICODE_TABLE (XCHARSET (cs))
+#define XCHARSET_GRAPHIC(cs)      CHARSET_GRAPHIC      (XCHARSET (cs))
+#define XCHARSET_ID(cs)		  CHARSET_ID           (XCHARSET (cs))
+#define XCHARSET_LONG_NAME(cs)	  CHARSET_LONG_NAME    (XCHARSET (cs))
+#define XCHARSET_NAME(cs)	  CHARSET_NAME         (XCHARSET (cs))
+#define XCHARSET_OFFSET(cs, dim)  CHARSET_OFFSET       (XCHARSET (cs), dim)
 #define XCHARSET_REGISTRY(cs)	  CHARSET_REGISTRY     (XCHARSET (cs))
-#define XCHARSET_LEADING_BYTE(cs) CHARSET_LEADING_BYTE (XCHARSET (cs))
+#define XCHARSET_REVERSE_DIRECTION_CHARSET(cs)   CHARSET_REVERSE_DIRECTION_CHARSET
(XCHARSET (cs))
+#define XCHARSET_SHORT_NAME(cs)	  CHARSET_SHORT_NAME   (XCHARSET (cs))
+#define XCHARSET_TO_UNICODE_TABLE(cs) CHARSET_TO_UNICODE_TABLE (XCHARSET (cs))
+
+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+#define CHARSET_ALGO_LOW(cs)	 ((cs)->algo_low)
+#define XCHARSET_ALGO_LOW(cs)	  CHARSET_ALGO_LOW     (XCHARSET (cs))
+#endif /* ALLOW_ALGORITHMIC_CONVERSION_TABLES */
+
+#ifdef HAVE_CCL
+#define CHARSET_CCL_PROGRAM(cs)  ((cs)->ccl_program)
 #define XCHARSET_CCL_PROGRAM(cs)  CHARSET_CCL_PROGRAM  (XCHARSET (cs))
-#define XCHARSET_DIMENSION(cs)	  CHARSET_DIMENSION    (XCHARSET (cs))
-#define XCHARSET_CHARS(cs)	  CHARSET_CHARS        (XCHARSET (cs))
-#define XCHARSET_PRIVATE_P(cs)	  CHARSET_PRIVATE_P    (XCHARSET (cs))
-#define XCHARSET_REVERSE_DIRECTION_CHARSET(cs) \
-  CHARSET_REVERSE_DIRECTION_CHARSET (XCHARSET (cs))
-#define XCHARSET_TO_UNICODE_TABLE(cs) \
-  CHARSET_TO_UNICODE_TABLE (XCHARSET (cs))
-#define XCHARSET_FROM_UNICODE_TABLE(cs) \
-  CHARSET_FROM_UNICODE_TABLE (XCHARSET (cs))
-#define XCHARSET_FROM_UNICODE_LEVELS(cs) \
-  CHARSET_FROM_UNICODE_LEVELS (XCHARSET (cs))
+#endif /* HAVE_CCL */

 struct charset_lookup
 {
-  /* Table of charsets indexed by leading byte. */
-  Lisp_Object charset_by_leading_byte[NUM_LEADING_BYTES];
-
   /* Table of charsets indexed by type/final-byte/direction. */
   Lisp_Object charset_by_attributes[4][128][2];
-  Ibyte next_allocated_1_byte_leading_byte;
-  Ibyte next_allocated_2_byte_leading_byte;
+
+#ifndef UNICODE_INTERNAL
+  /* Table of charsets indexed by ID, for encodable ID's. */
+  Lisp_Object charset_by_encodable_id[NUM_ENCODABLE_CHARSET_IDS];
+
+  int next_allocated_private_dim1_id;
+  int next_allocated_private_dim2_id;
+#endif /* not UNICODE_INTERNAL */
 };

 extern struct charset_lookup *chlook;

+Lisp_Object charset_by_id (int id);
+
+#ifndef UNICODE_INTERNAL
+
+/* Given an encodable charset ID, return the corresponding charset.
+ */
+
 DECLARE_INLINE_HEADER (
 Lisp_Object
-charset_by_leading_byte (int lb)
+charset_by_encodable_id (int id)
 )
 {
 #ifdef ERROR_CHECK_TEXT
   /* When error-checking is on, x86 GCC 2.95.2 -O3 miscompiles the
      following unless we introduce `tem'. */
-  int tem = lb;
-  text_checking_assert (tem >= MIN_LEADING_BYTE && tem <= 0xFF);
+  int tem = id;
+  text_checking_assert (tem >= MIN_ENCODABLE_CHARSET_ID &&
+			tem <= MAX_ENCODABLE_CHARSET_ID);
 #endif
-  return chlook->charset_by_leading_byte[lb - MIN_LEADING_BYTE];
+  return chlook->charset_by_encodable_id[id - MIN_ENCODABLE_CHARSET_ID];
 }

+#endif /* not UNICODE_INTERNAL */
+
 DECLARE_INLINE_HEADER (
 Lisp_Object
 charset_by_attributes (int type, int final, int dir)
 )
 {
-  type_checking_assert (type  < countof (chlook->charset_by_attributes) &&
+  text_checking_assert (type  >= 0 &&
+			type  < countof (chlook->charset_by_attributes) &&
+			final >= 0 &&
 			final < countof (chlook->charset_by_attributes[0]) &&
+			dir   >= 0 &&
 			dir   < countof (chlook->charset_by_attributes[0][0]));
   return chlook->charset_by_attributes[type][final][dir];
 }

 /************************************************************************/
-/*                        Dealing with characters                       */
+/*                     General character manipulation                   */
 /************************************************************************/

-/* The bit fields of character are divided into 3 parts:
-   FIELD1(5bits):FIELD2(7bits):FIELD3(7bits) */
+/* Return true if the specified codepoint is valid in the specified charset. */

-#define ICHAR_FIELD1_MASK (0x1F << 14)
-#define ICHAR_FIELD2_MASK (0x7F << 7)
-#define ICHAR_FIELD3_MASK 0x7F
+DECLARE_INLINE_HEADER (
+int
+valid_charset_codepoint_p (Lisp_Object charset, int c1, int c2)
+)
+{
+  int l1, h1, l2, h2;
+  get_charset_limits (charset, &l1, &h1, &l2, &h2);
+  return c1 >= l1 && c1 <= h1 && c2 >= l2 && c2 <=
h2;
+}

-/* Macros to access each field of a character code of C.  */
+/* Convert a charset codepoint (CHARSET, one or two octets) to Unicode.
+   Return -1 if can't convert. */

-#define ichar_field1(c) (((c) & ICHAR_FIELD1_MASK) >> 14)
-#define ichar_field2(c) (((c) & ICHAR_FIELD2_MASK) >> 7)
-#define ichar_field3(c)  ((c) & ICHAR_FIELD3_MASK)
+DECLARE_INLINE_HEADER (
+int
+charset_codepoint_to_unicode_raw_1 (Lisp_Object charset, int c1, int c2
+				    INLINE_TEXT_CHECK_ARGS)
+)
+{
+  inline_text_checking_assert (valid_charset_codepoint_p (charset, c1, c2));
+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+  {
+    /* Conceivably a good idea not to have this in Unicode-internal, since
+       it slows down this function slightly, and it may be called a huge
+       number of times when reading in a file.  Probably doesn't matter,
+       though. */
+    int algo_low = XCHARSET_ALGO_LOW (charset);
+    if (algo_low >= 0)
+      {
+	return (c1 - XCHARSET_OFFSET (charset, 0)) *
+	  XCHARSET_CHARS (charset, 1) + (c2 - XCHARSET_OFFSET (charset, 1)) +
+	  algo_low;
+      }
+  }
+#endif /* ALLOW_ALGORITHMIC_CONVERSION_TABLES */
+#ifndef MAXIMIZE_UNICODE_TABLE_DEPTH
+  if (XCHARSET_DIMENSION (charset) == 1)
+    return ((int *) XCHARSET_TO_UNICODE_TABLE (charset))
+      [c2- CHARSET_MIN_OFFSET];
+  else
+#endif /* not MAXIMIZE_UNICODE_TABLE_DEPTH */
+    return ((int **) XCHARSET_TO_UNICODE_TABLE (charset))
+      [c1 - CHARSET_MIN_OFFSET][c2 - CHARSET_MIN_OFFSET];
+}

-/* Field 1, if non-zero, usually holds a leading byte for a
-   dimension-2 charset.  Field 2, if non-zero, usually holds a leading
-   byte for a dimension-1 charset. */
+#define charset_codepoint_to_unicode_raw(charset, c1, c2) \
+  charset_codepoint_to_unicode_raw_1 (charset, c1, c2 INLINE_TEXT_CHECK_CALL)

-/* Converting between field values and leading bytes.  */

-#define FIELD2_TO_OFFICIAL_LEADING_BYTE (MIN_LEADING_BYTE_OFFICIAL_1 - 1)
-#define FIELD2_TO_PRIVATE_LEADING_BYTE  0x80
+/* Convert a charset codepoint to Unicode, with error behavior specifiable.
+   FAIL controls what happens when the charset codepoint cannot be
+   converted to an official Unicode codepoint.
+
+   This is inline because these functions are often called with a constant
+   value for FAIL, and a good inlining optimizing compiler will strip away
+   all the garbage so that a call to charset_codepoint_unicode with
+   CONVERR_FAIL reduces directly to a call to
+   charset_codepoint_to_unicode_raw_1(). */

-#define FIELD1_TO_OFFICIAL_LEADING_BYTE (MIN_LEADING_BYTE_OFFICIAL_2 - 1)
-#define FIELD1_TO_PRIVATE_LEADING_BYTE  0xE1
+DECLARE_INLINE_HEADER (
+int
+charset_codepoint_to_unicode (Lisp_Object charset, int c1, int c2,
+			      enum converr fail)
+)
+{
+  int code;

-/* Minimum and maximum allowed values for the fields. */
+  code = charset_codepoint_to_unicode_raw (charset, c1, c2);
+  if (code < 0)
+    {
+      switch (fail)
+	{
+	case CONVERR_FAIL:
+	  break;
+
+	case CONVERR_SUBSTITUTE:
+	  code = UNICODE_REPLACEMENT_CHAR;
+	  break;
+
+	case CONVERR_SUCCEED:
+	case CONVERR_USE_PRIVATE:
+	  code = charset_codepoint_to_private_unicode (charset, c1, c2);
+	  break;
+
+	case CONVERR_ABORT: /* ＠＠#### implement me */
+	default:
+	  ABORT ();
+	}
+    }

-#define MIN_ICHAR_FIELD2_OFFICIAL \
-  (MIN_LEADING_BYTE_OFFICIAL_1 - FIELD2_TO_OFFICIAL_LEADING_BYTE)
-#define MAX_ICHAR_FIELD2_OFFICIAL \
-  (MAX_LEADING_BYTE_OFFICIAL_1 - FIELD2_TO_OFFICIAL_LEADING_BYTE)
+  return code;
+}

-#define MIN_ICHAR_FIELD1_OFFICIAL \
-  (MIN_LEADING_BYTE_OFFICIAL_2 - FIELD1_TO_OFFICIAL_LEADING_BYTE)
-#define MAX_ICHAR_FIELD1_OFFICIAL \
-  (MAX_LEADING_BYTE_OFFICIAL_2 - FIELD1_TO_OFFICIAL_LEADING_BYTE)

-#define MIN_ICHAR_FIELD2_PRIVATE \
-  (MIN_LEADING_BYTE_PRIVATE_1 - FIELD2_TO_PRIVATE_LEADING_BYTE)
-#define MAX_ICHAR_FIELD2_PRIVATE \
-  (MAX_LEADING_BYTE_PRIVATE_1 - FIELD2_TO_PRIVATE_LEADING_BYTE)
+/* Convert Unicode codepoint to charset codepoint.  CHARSET will be nil if
+   can't convert.  Requires a precedence list of charsets, to determine
+   the order that charsets are checked for conversion codepoints. */

-#define MIN_ICHAR_FIELD1_PRIVATE \
-  (MIN_LEADING_BYTE_PRIVATE_2 - FIELD1_TO_PRIVATE_LEADING_BYTE)
-#define MAX_ICHAR_FIELD1_PRIVATE \
-  (MAX_LEADING_BYTE_PRIVATE_2 - FIELD1_TO_PRIVATE_LEADING_BYTE)
+DECLARE_INLINE_HEADER (
+void
+unicode_to_charset_codepoint (int code, Lisp_Object_dynarr *charsets,
+			      Lisp_Object *charset, int *c1, int *c2)
+)
+{
+  text_checking_assert (valid_unicode_codepoint_p (code));
+  /* #### This optimization is not necessarily correct.  We'd like to be
+     able to have a jis-roman charset and to convert to that if called for.
+     What we really should do is have a special type for the charset
+     precedence list rather than just a dynarr; and in this type, mark
+     whether ASCII is higher-precedence than any other charsets with values
+     in the range 0 - 7F. */
+  if (code <= 0x7F)
+    {
+      *charset = Vcharset_ascii;
+      *c1 = 0;
+      *c2 = code;
+      return;
+    }

-/* Minimum character code of each <type> character.  */
+  non_ascii_unicode_to_charset_codepoint (code, charsets, charset, c1, c2);
+}

-#define MIN_CHAR_OFFICIAL_TYPE9N    (MIN_ICHAR_FIELD2_OFFICIAL <<  7)
-#define MIN_CHAR_PRIVATE_TYPE9N     (MIN_ICHAR_FIELD2_PRIVATE  <<  7)
-#define MIN_CHAR_OFFICIAL_TYPE9NX9N (MIN_ICHAR_FIELD1_OFFICIAL << 14)
-#define MIN_CHAR_PRIVATE_TYPE9NX9N  (MIN_ICHAR_FIELD1_PRIVATE  << 14)
-#define MIN_CHAR_COMPOSITION        (0x1F << 14)
+/* Return a character whose charset is CHARSET and position-codes are C1
+   and C2.  C1 and C2 must be within the range of the charset. (For
+   charsets of dimension 1, C2 must be 0.)

-/* Leading byte of a character.
+   The allowed range of a charset is derived from way the charset is usually
+   coded in a simple MBCS representation.

-   NOTE: This takes advantage of the fact that
-   FIELD2_TO_OFFICIAL_LEADING_BYTE and
-   FIELD2_TO_PRIVATE_LEADING_BYTE are the same.
+   For ISO-2022 charsets, the range of each position code is either 32/33
+   to 127/126 or 160/161 to 255/254, with the choice of low or high range
+   depending on the way the charset is usually coded in a simple MBCS
+   representation (this choice is almost always derivable from the ISO-2022
+   `graphic' property of the charset).  The choice of larger or smaller
+   range (bounds to the left and right of the slash, respectively) depends
+   on the size of the charset (94 or 94x94 vs. 96 or 96x96).
+
+   See `make-char'.
+
+   Return (Ichar) -1 in Unicode-internal if no (official) Unicode
+   equivalent for this charset codepoint.  See also
+   charset_codepoint_to_ichar(), which allows other possibilties for
+   handling such cases.
    */

 DECLARE_INLINE_HEADER (
-Ibyte
-ichar_leading_byte (Ichar c)
+Ichar
+charset_codepoint_to_ichar_raw (Lisp_Object charset, int c1, int c2)
 )
 {
-  if (ichar_ascii_p (c))
-    return LEADING_BYTE_ASCII;
-  else if (c < 0xA0)
-    return LEADING_BYTE_CONTROL_1;
-  else if (c < MIN_CHAR_OFFICIAL_TYPE9NX9N)
-    return ichar_field2 (c) + FIELD2_TO_OFFICIAL_LEADING_BYTE;
-  else if (c < MIN_CHAR_PRIVATE_TYPE9NX9N)
-    return ichar_field1 (c) + FIELD1_TO_OFFICIAL_LEADING_BYTE;
-  else if (c < MIN_CHAR_COMPOSITION)
-    return ichar_field1 (c) + FIELD1_TO_PRIVATE_LEADING_BYTE;
+#ifdef UNICODE_INTERNAL
+  return (Ichar) charset_codepoint_to_unicode_raw (charset, c1, c2);
+#else
+  Ichar retval;
+  text_checking_assert (valid_charset_codepoint_p (charset, c1, c2));
+  if (EQ (charset, Vcharset_ascii))
+    retval = c2;
   else
-    {
-#ifdef ENABLE_COMPOSITE_CHARS
-      return LEADING_BYTE_COMPOSITE;
+    retval = old_mule_non_ascii_charset_codepoint_to_ichar_raw (charset, c1,
+								c2);
+  text_checking_assert (valid_ichar_p (retval));
+  return retval;
+#endif /* (not) UNICODE_INTERNAL */
+}
+
+/* Convert a charset codepoint into a character, as for
+   charset_codepoint_to_ichar_raw().  FAIL controls what happens when the
+   charset codepoint cannot be converted to Unicode. */
+DECLARE_INLINE_HEADER (
+Ichar
+charset_codepoint_to_ichar (Lisp_Object charset, int c1, int c2,
+			    enum converr fail)
+)
+{
+#ifdef UNICODE_INTERNAL
+  return (Ichar) charset_codepoint_to_unicode (charset, c1, c2, fail);
 #else
-      ABORT();
-      return 0;
-#endif /* ENABLE_COMPOSITE_CHARS */
-    }
+  return charset_codepoint_to_ichar_raw (charset, c1, c2);
+#endif /* (not) UNICODE_INTERNAL */
 }

+/* Given an Ichar and charset precedence dynarr, convert it to a charset
+   codepoint.  CHARSET will be nil if no conversion possible. */
+ 
 DECLARE_INLINE_HEADER (
-Bytecount
-ichar_len (Ichar c)
+void
+ichar_to_charset_codepoint (Ichar ch, Lisp_Object_dynarr *
+			    USED_IF_UNICODE_INTERNAL (charsets),
+			    Lisp_Object *charset, int *c1, int *c2)
 )
 {
-  if (ichar_ascii_p (c))
-    return 1;
-  else if (c < MIN_CHAR_OFFICIAL_TYPE9NX9N)
-    return 2;
-  else if (c < MIN_CHAR_PRIVATE_TYPE9NX9N)
-    return 3; /* dimension-2 official or dimension-1 private */
-  else if (c < MIN_CHAR_COMPOSITION)
-    return 4;
-  else
-    {
-#ifdef ENABLE_COMPOSITE_CHARS
-#error Not yet implemented
+  text_checking_assert (valid_ichar_p (ch));
+#ifdef UNICODE_INTERNAL
+  unicode_to_charset_codepoint ((int) ch, charsets, charset, c1, c2);
 #else
-      ABORT();
-      return 0;
-#endif /* ENABLE_COMPOSITE_CHARS */
+  if (ch <= 0x7F)
+    {
+      *charset = Vcharset_ascii;
+      *c1 = 0;
+      *c2 = (int) ch;
+      return;
     }
+  old_mule_non_ascii_ichar_to_charset_codepoint_raw (ch, charset, c1, c2);
+#endif /* (not) UNICODE_INTERNAL */
 }

+/* Convert a charset codepoint into a character in the internal string
+   representation.  Return number of bytes written out.  FAIL controls
+   failure mode when charset conversion to Unicode is not possible. */
 DECLARE_INLINE_HEADER (
 Bytecount
-ichar_len_fmt (Ichar c, Internal_Format fmt)
+charset_codepoint_to_itext (Lisp_Object charset, int c1, int c2, Ibyte *ptr,
+			    enum converr fail)
 )
 {
-  switch (fmt)
+  if (EQ (charset, Vcharset_ascii))
     {
-    case FORMAT_DEFAULT:
-      return ichar_len (c);
-    case FORMAT_16_BIT_FIXED:
-      return 2;
-    case FORMAT_32_BIT_FIXED:
-      return 4;
-    default:
-      text_checking_assert (fmt == FORMAT_8_BIT_FIXED);
+      text_checking_assert (c2 >= 0 && c2 < 0x80);
+      ptr[0] = (Ibyte) c2;
       return 1;
     }
+  return non_ascii_charset_codepoint_to_itext (charset, c1, c2, ptr, fail);
 }
-
-#define ichar_charset(c) charset_by_leading_byte (ichar_leading_byte (c))
-
-/* Return a character whose charset is CHARSET and position-codes are C1
-   and C2.  TYPE9N character ignores C2. (For typical charsets, i.e. not
-   ASCII, Control-1 or Composite, C1 and C2 will be in the range of 32 to
-   127 or 33 to 126.  See `make-char'.)
-
-   NOTE: This takes advantage of the fact that
-   FIELD2_TO_OFFICIAL_LEADING_BYTE and
-   FIELD2_TO_PRIVATE_LEADING_BYTE are the same.
-   */

+/* Convert a character in the internal string representation (guaranteed
+   not to be ASCII) into a charset codepoint.  CHARSET will be nil if no
+   conversion possible. */
 DECLARE_INLINE_HEADER (
-Ichar
-make_ichar (Lisp_Object charset, int c1, int c2)
+void
+non_ascii_itext_to_charset_codepoint_raw (const Ibyte *ptr,
+					  Lisp_Object_dynarr *
+					  USED_IF_UNICODE_INTERNAL (charsets),
+					  Lisp_Object *charset, int *c1,
+					  int *c2)
 )
 {
-  Ichar retval;
-  if (EQ (charset, Vcharset_ascii))
-    retval = c1;
-  else if (EQ (charset, Vcharset_control_1))
-    retval = c1 | 0x80;
-#ifdef ENABLE_COMPOSITE_CHARS
-  else if (EQ (charset, Vcharset_composite))
-    retval = (0x1F << 14) | ((c1) << 7) | (c2);
-#endif
-  else if (XCHARSET_DIMENSION (charset) == 1)
-    retval = ((XCHARSET_LEADING_BYTE (charset) -
-	       FIELD2_TO_OFFICIAL_LEADING_BYTE) << 7) | (c1);
-  else if (!XCHARSET_PRIVATE_P (charset))
-    retval = ((XCHARSET_LEADING_BYTE (charset) -
-	       FIELD1_TO_OFFICIAL_LEADING_BYTE) << 14) | ((c1) << 7) | (c2);
-  else
-    retval = ((XCHARSET_LEADING_BYTE (charset) -
-	       FIELD1_TO_PRIVATE_LEADING_BYTE) << 14) | ((c1) << 7) | (c2);
-  text_checking_assert (valid_ichar_p (retval));
-  return retval;
+#ifdef UNICODE_INTERNAL
+  non_ascii_unicode_to_charset_codepoint
+    ((int) non_ascii_itext_ichar (ptr), charsets, charset, c1, c2);
+#else
+  old_mule_non_ascii_itext_to_charset_codepoint_raw (ptr, charset, c1, c2);
+#endif /* (not) UNICODE_INTERNAL */
 }
-
-/* BREAKUP_ICHAR_1_UNSAFE assumes that the charset has already been
-   calculated, and just computes c1 and c2.

-   BREAKUP_ICHAR also computes and stores the charset. */
+/* Convert a character in the internal string representation (guaranteed
+   not to be ASCII) into a charset codepoint.  CHARSET will be nil if no
+   conversion possible. */
+DECLARE_INLINE_HEADER (
+void
+itext_to_charset_codepoint_raw (const Ibyte *ptr, Lisp_Object_dynarr *charsets,
+				Lisp_Object *charset, int *c1, int *c2)
+)
+{
+  /* #### Not necessarily correct; see unicode_to_charset_codepoint(). */
+  if (byte_ascii_p (*ptr))
+    {
+      *charset = Vcharset_ascii;
+      *c1 = 0;
+      *c2 = *ptr;
+      return;
+    }

-#define BREAKUP_ICHAR_1_UNSAFE(c, charset, c1, c2)	\
-  XCHARSET_DIMENSION (charset) == 1			\
-  ? ((c1) = ichar_field3 (c), (c2) = 0)		\
-: ((c1) = ichar_field2 (c),				\
-     (c2) = ichar_field3 (c))
+  non_ascii_itext_to_charset_codepoint_raw (ptr, charsets, charset, c1, c2);
+}

 DECLARE_INLINE_HEADER (
 void
-breakup_ichar_1 (Ichar c, Lisp_Object *charset, int *c1, int *c2)
+itext_to_charset_codepoint (const Ibyte *ptr, Lisp_Object_dynarr *charsets,
+			    Lisp_Object *charset, int *c1, int *c2,
+			    enum converr fail)
 )
 {
-  text_checking_assert (valid_ichar_p (c));
-  *charset = ichar_charset (c);
-  BREAKUP_ICHAR_1_UNSAFE (c, *charset, *c1, *c2);
+  itext_to_charset_codepoint_raw (ptr, charsets, charset, c1, c2);
+  switch (fail)
+    {
+    case CONVERR_FAIL: return;
+    case CONVERR_ABORT: ABORT(); /* ＠＠#### implement me */
+    default:
+      *charset = Vcharset_ascii;
+      *c1 = 0;
+      *c2 = CANT_CONVERT_CHAR_WHEN_ENCODING;
+      return;
+    }
 }
+
+/* Convert a charset codepoint (guaranteed not to be ASCII) into a
+   character in the internal string representation and write to dynarr DST.
+   Returns number of bytes added to the Dynarr.  FAIL controls failure
+   mode when charset conversion to Unicode is not possible. */
+DECLARE_INLINE_HEADER (
+Bytecount
+non_ascii_charset_codepoint_to_dynarr (Lisp_Object charset, int c1, int c2,
+				       unsigned_char_dynarr *dst,
+				       enum converr fail)
+)
+{
+  /* Potentially, we could rewrite the routines that write out to an Ibyte*
+     to work directly with Dynarrs, but it would be a lot of code
+     duplication and it's not clear it would be any faster. */
+  Ibyte work[MAX_ICHAR_LEN];
+  text_checking_assert (!EQ (charset, Vcharset_ascii));
+  Bytecount len = non_ascii_charset_codepoint_to_itext (charset, c1, c2, work,
+							fail);
+  if (len)
+    Dynarr_add_many (dst, work, len);
+  return len;
+}
+
+/* Convert a charset codepoint into a character in the internal string
+   representation and write to dynarr DST.  Returns length of chars added
+   to the Dynarr.  FAIL controls failure mode when charset conversion to
+   Unicode is not possible. */
+DECLARE_INLINE_HEADER (
+int
+charset_codepoint_to_dynarr (Lisp_Object charset, int c1, int c2,
+			     unsigned_char_dynarr *dst,
+			     enum converr fail)
+)
+{
+  if (EQ (charset, Vcharset_ascii))
+    {
+      text_checking_assert (c2 >= 0 && c2 < 0x80);
+      Dynarr_add (dst, (Ibyte) c2);
+      return 1;
+    }

-/* BREAKUP_ICHAR separates an Ichar into its components.  The charset of
-   character C is set to CHARSET, and the position-codes of C are set to C1
-   and C2.  C2 of TYPE9N character is 0.  */
+  return non_ascii_charset_codepoint_to_dynarr (charset, c1, c2, dst, fail);
+}

-#define BREAKUP_ICHAR(c, charset, c1, c2) \
-  breakup_ichar_1 (c, &(charset), &(c1), &(c2))
+DECLARE_INLINE_HEADER (
+Ichar
+old_mule_handle_bad_ichar (enum converr fail)
+)
+{
+  switch (fail)
+    {
+    case CONVERR_FAIL: return -1;
+    case CONVERR_ABORT: ABORT (); /* ＠＠#### implement me */
+    default:
+      return CANT_CONVERT_CHAR_WHEN_DECODING;
+    }
+}

-void get_charset_limits (Lisp_Object charset, int *low, int *high);
-int ichar_to_unicode (Ichar chr);
+#ifdef UNICODE_INTERNAL
+/* ＠＠####
+   Get rid of this crap now!!!!!!!!!!!!!!
+
+   This will simply not fly in a Unicode world, where there may not be any
+   national charset for a particular character.  Almost everywhere that this
+   is used, it's used for font handling.  We need to replace device methods
+   like find_charset_font() and font_spec_matches_charset() with similar
+   methods that operate on a character, not a charset.  We might still need
+   to do some charset lookup if we want to implement the idea that we use
+   the appropriate Chinese, Japanese or Korean specific font depending
+   on the language that a particular character is tagged as (as determined
+   by the string extent surrounding the character in a buffer, or a
+   buffer-local value indicating the language) -- but we absolutely do not
+   want to be *dependent* on finding some national charset. (And in any
+   case it probably makes more sense to do such conditionalizing on the
+   Unicode range of the character, and just check whether a font
+   contains the appropriate character -- or maybe not even conditionalize
+   at all on any character-specific property.) */
+DECLARE_INLINE_HEADER (
+Lisp_Object
+ichar_charset_obsolete_me_baby_please (Ichar ch)
+)
+{
+  int byte1, byte2;
+  Lisp_Object charset;
+  ichar_to_charset_codepoint (ch, get_unicode_precedence (), &charset,
+			      &byte1, &byte2);
+  return charset;
+}
+#else
+#define old_mule_ichar_charset(c) \
+  charset_by_encodable_id (old_mule_ichar_charset_id (c))
+#define ichar_charset_obsolete_me_baby_please(c) old_mule_ichar_charset (c)
+#endif /* (not) UNICODE_INTERNAL */

 #endif /* MULE */
-
-#endif /* INCLUDED_charset_h_ */
+#endif /* not INCLUDED_charset_h_ */
Index: src/chartab.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/chartab.c,v
retrieving revision 1.38
diff -u -r1.38 chartab.c
--- src/chartab.c	2005/10/25 11:16:21	1.38
+++ src/chartab.c	2005/11/22 14:00:27
＠＠ -1,7 +1,7 ＠＠
 /* XEmacs routines to deal with char tables.
    Copyright (C) 1992, 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 1995, 1996, 2002, 2003 Ben Wing.
+   Copyright (C) 1995, 1996, 2002, 2003, 2005 Ben Wing.
    Copyright (C) 1995, 1997, 1999 Electrotechnical Laboratory, JAPAN.
    Licensed to the Free Software Foundation.

＠＠ -33,6 +33,10 ＠＠
              loosely based on the original Mule.
    Jareth Hein: fixed a couple of bugs in the implementation, and
    	     added regex support for categories with check_category_at
+   Ben Wing: Drastic rewrite, October 2005, for Unicode-internal support.
+   The old implementation used an ugly system indexed by charset ID,
+   with `char-table-entry' objects.  The implementation of map_char_table()
+   was long and nasty.  The new system uses page tables, as in unicode.c.
  */

 #include <config.h>
＠＠ -93,100 +97,646 ＠＠
    */

 /************************************************************************/
-/*                         Char Table object                            */
+/*                         Char Table tables                            */
 /************************************************************************/

-#ifdef MULE
+/* We use the same code from unicode.c.
+
+   Code duplication is generally a bad thing, but there isn't that much total
+   code and there are a lot of differences.  I originally tried abstracting
+   using preprocessing, but it got real ugly real fast.  This is even more
+   the case now that char tables can use Lisp objects for their subtables. */
+
+static SUBTAB_TYPE chartab_blank[5];
+
+static const struct memory_description char_subtable_description[] = {
+  { XD_LISP_OBJECT_ARRAY, offsetof (Lisp_Char_Subtable, ptr), 256 }, 
+  { XD_END }
+};

 static Lisp_Object
-mark_char_table_entry (Lisp_Object obj)
+mark_char_subtable (Lisp_Object obj)
+{
+  int i;
+
+  for (i = 1; i < 256; i++)
+    mark_object (XCHAR_SUBTABLE (obj)->ptr[i]);
+
+  return XCHAR_SUBTABLE (obj)->ptr[0];
+}
+
+DEFINE_LRECORD_IMPLEMENTATION ("char-subtable", char_subtable,
+			       1, /*dumpable-flag*/
+                               mark_char_subtable, internal_object_printer,
+			       0, 0, 0, char_subtable_description,
+			       Lisp_Char_Subtable);
+
+
+/************************************************************************/
+/*                      Char table implementation                       */
+/************************************************************************/
+
+static void
+init_blank_chartab_tables (void)
 {
-  Lisp_Char_Table_Entry *cte = XCHAR_TABLE_ENTRY (obj);
   int i;
+
+  chartab_blank[1] = ALLOCATE_LEVEL_1_SUB_TABLE ();
+  chartab_blank[2] = ALLOCATE_LEVEL_N_SUB_TABLE ();
+  chartab_blank[3] = ALLOCATE_LEVEL_N_SUB_TABLE ();
+  chartab_blank[4] = ALLOCATE_LEVEL_N_SUB_TABLE ();
+  for (i = 0; i < 256; i++)
+    {
+      LISPOBJ_ARRAY_FROM_SUBTAB (chartab_blank[1])[i] = Qunbound;
+      SUBTAB_ARRAY_FROM_SUBTAB (chartab_blank[2])[i] = chartab_blank[1];
+      SUBTAB_ARRAY_FROM_SUBTAB (chartab_blank[3])[i] = chartab_blank[2];
+      SUBTAB_ARRAY_FROM_SUBTAB (chartab_blank[4])[i] = chartab_blank[3];
+    }
+}
+
+static SUBTAB_TYPE
+copy_chartab_table (SUBTAB_TYPE table, int level)
+{
+  SUBTAB_TYPE newtab;
+  Bytecount size;
+
+  text_checking_assert (level >= 1 && level <= 4);
+  /* WARNING: sizeof (Lisp_Object) maybe != sizeof (SUBTAB_TYPE). */
+  if (level == 1)
+    {
+      size = sizeof (Lisp_Object);
+      newtab = ALLOCATE_LEVEL_1_SUB_TABLE ();
+      memcpy (LISPOBJ_ARRAY_FROM_SUBTAB (newtab),
+	      LISPOBJ_ARRAY_FROM_SUBTAB (table),
+	      256 * size);
+    }
+  else
+    {
+      size = sizeof (SUBTAB_TYPE);
+      newtab = ALLOCATE_LEVEL_N_SUB_TABLE ();
+      memcpy (SUBTAB_ARRAY_FROM_SUBTAB (newtab),
+	      SUBTAB_ARRAY_FROM_SUBTAB (table),
+	      256 * size);
+    }
+
+  if (level >= 2)
+    {
+      int i;
+      SUBTAB_ARRAY_TYPE tab = SUBTAB_ARRAY_FROM_SUBTAB (newtab);
+      for (i = 0; i < 256; i++)
+	{
+	  if (!SUBTAB_EQ (tab[i], chartab_blank[level - 1]))
+	    tab[i] = copy_chartab_table (tab[i], level - 1);
+	}
+    }
+
+  return newtab;
+}
+
+static SUBTAB_TYPE
+create_new_chartab_table (int level)
+{
+  return copy_chartab_table (chartab_blank[level], level);
+}

-  for (i = 0; i < 96; i++)
+static void
+free_chartab_table (SUBTAB_TYPE table, int level)
+{
+  if (level >= 2)
     {
-      mark_object (cte->level2[i]);
+      int i;
+      SUBTAB_ARRAY_TYPE tab = SUBTAB_ARRAY_FROM_SUBTAB (table);
+
+      for (i = 0; i < 256; i++)
+	{
+	  if (!SUBTAB_EQ (tab[i], chartab_blank[level - 1]))
+	    free_chartab_table (tab[i], level - 1);
+	}
     }
-  return Qnil;
+
+  FREE_ONE_SUBTABLE (table);
 }

+#ifdef MEMORY_USAGE_STATS
+
+static Bytecount
+compute_chartab_table_size_1 (SUBTAB_TYPE table, int level,
+			      struct overhead_stats *stats)
+{
+  Bytecount size = 0;
+
+  if (level >= 2)
+    {
+      int i;
+      SUBTAB_ARRAY_TYPE tab = SUBTAB_ARRAY_FROM_SUBTAB (table);
+      for (i = 0; i < 256; i++)
+	{
+	  if (!SUBTAB_EQ (tab[i], chartab_blank[level - 1]))
+	    size += compute_chartab_table_size_1 (tab[i], level - 1, stats);
+	}
+    }
+
+  size += SUBTAB_STORAGE_SIZE (table, level, stats);
+  return size;
+}
+
+Bytecount
+compute_chartab_table_size (Lisp_Object chartab,
+			    struct overhead_stats *stats)
+{
+  return (compute_chartab_table_size_1
+	  (XCHAR_TABLE_TABLE (chartab),
+	   XCHAR_TABLE_LEVELS (chartab),
+	   stats));
+}
+
+#endif
+
+void
+put_char_table_1 (Lisp_Object chartab, Ichar ch, Lisp_Object val)
+{
+  /* #### NOTE NOTE NOTE!
+
+  If it turns out that people are often setting large ranges to a
+  particular value (and particularly so if we have to implement the FSF
+  characteristic of allowing `t' to signify *all* characters), then we
+  should consider either (a) modifying things so that the subtables are
+  actual Lisp objects and at any level there can either be a subtable or
+  some other Lisp object, which signifies the value everywhere at and below
+  that level (then we also don't need blank tables; instead we just use
+  Qunbound); (b) modifying the code that loops over a range to create
+  shared subtables, similar to the current blank tables. (Then, we would
+  need to implement reference-counting over the tables, to know when to
+  free them, and copy-on-write semantics if the reference count is greater
+  than one.  This would also obviate the need for special blank tables.
+  Either we need to keep the reference count with the subtables themselves,
+  which is convenient but potentially a bad idea since it makes the tables
+  slightly larger than a power of 2 and hence difficult for the memory
+  manager to handle efficiently, or we need to use a separate hash table to
+  track the references.  This scheme is harder to implement and may make
+  table updating slower compared to the Lisp-object scheme, but has the
+  advantage that lookup is faster -- we don't need to do a bunch of if-then
+  checks for each lookup.  The Lisp-object scheme also suffers from the
+  same slightly-over-a-power-of-2 problem.)
+
+  shared tables we check for could potentially be specific to the particular
+  char table; we'd keep track of the shared tables in the char-table object,
+  and check to see if they are shared with the generic blank tables.)
+
+  If we don't do this, we should make sure to put in a call to QUIT
+  periodically when setting a range so if someone does something stupid
+  like set a range of (0,2000000000), they can break out.  We also need a
+  big warning about this in the docs to `put-char-table' and such.
+  Maybe we should also allow for two different types of char tables, one
+  that allows for semi-efficient handling of large ranges and one that doesn't
+  (but is faster).  In such a case it might make sense for there to be a
+  get_char_table() method pointer to avoid an if-check every time for the
+  type.  Similarly if we allow the `always-maximize-table-size' option to
+  be given. */
+
+  int levels;
+  int u4, u3, u2, u1;
+#ifndef MAXIMIZE_CHAR_TABLE_DEPTH
+  int code_levels;
+#endif
+
+  text_checking_assert (valid_ichar_p (ch));
+  CHARTAB_BREAKUP_CHAR_CODE ((int) ch, u4, u3, u2, u1, code_levels);
+
+  levels = CHARTAB_LEVELS (XCHAR_TABLE_LEVELS (chartab));
+  text_checking_assert (levels >= 1 && levels <= 4);
+
+#ifndef MAXIMIZE_CHAR_TABLE_DEPTH
+  /* Make sure the chartab's tables have at least as many levels as
+     the code point has: Note that the table is guaranteed to have
+     at least one level, because it was created that way */
+  if (levels < code_levels)
+    {
+      int i;
+
+      for (i = 2; i <= code_levels; i++)
+	{
+	  if (levels < i)
+	    {
+	      SUBTAB_TYPE old_table = XCHAR_TABLE_TABLE (chartab);
+	      SUBTAB_TYPE table = create_new_chartab_table (i);
+	      XCHAR_TABLE_TABLE (chartab) = table;
+	      SUBTAB_ARRAY_FROM_SUBTAB (table)[0] = old_table;
+	    }
+	}
+
+      levels = code_levels;
+      XCHAR_TABLE_LEVELS (chartab) = code_levels;
+    }
+#endif /* not MAXIMIZE_CHAR_TABLE_DEPTH */
+
+  /* Now, make sure there is a non-default table at each level */
+  {
+    int i;
+    SUBTAB_TYPE table = XCHAR_TABLE_TABLE (chartab);
+
+    for (i = levels; i >= 2; i--)
+      {
+	int ind;
+
+	switch (i)
+	  {
+	  case 4: ind = u4; break;
+	  case 3: ind = u3; break;
+	  case 2: ind = u2; break;
+	  default: ABORT (); ind = 0;
+	  }
+
+	if (SUBTAB_EQ (SUBTAB_ARRAY_FROM_SUBTAB (table)[ind],
+		       chartab_blank[i - 1]))
+	  SUBTAB_ARRAY_FROM_SUBTAB (table)[ind] =
+	    create_new_chartab_table (i - 1);
+	table = SUBTAB_ARRAY_FROM_SUBTAB (table)[ind];
+      }
+  }
+
+  /* Finally, set the character */
+	  
+  {
+    register SUBTAB_TYPE table = XCHAR_TABLE_TABLE (chartab);
+    /* We are really helping the compiler here.  CHARTAB_LEVELS() will
+       evaluate to a constant when MAXIMIZE_CHAR_TABLE_DEPTH is true,
+       so any reasonable optimizing compiler should eliminate the
+       switch entirely. */
+    switch (CHARTAB_LEVELS (levels))
+      {
+#if 1 /* The new way */
+	/* fall through */
+      case 4: table = SUBTAB_ARRAY_FROM_SUBTAB (table)[u4];
+      case 3: table = SUBTAB_ARRAY_FROM_SUBTAB (table)[u3];
+      case 2: table = SUBTAB_ARRAY_FROM_SUBTAB (table)[u2];
+      case 1: LISPOBJ_ARRAY_FROM_SUBTAB (table)[u1] = val;
+#else /* The old way */
+      case 1: ((Lisp_Object *) table)[u1] = val; break;
+      case 2: ((Lisp_Object **) table)[u2][u1] = val; break;
+      case 3: ((Lisp_Object ***) table)[u3][u2][u1] = val; break;
+      case 4: ((Lisp_Object ****) table)[u4][u3][u2][u1] = val; break;
+#endif
+      }
+  }
+}
+
+/* Map over all characters in the range [START, END].  TABLE is an array
+   of 256 elements, LEVEL is the depth (1 - 4).  OFFSET is the character
+   offset corresponding to this table.  CHARTAB is the char-table object
+   being mapped over.  The FN will be called with CHARTAB, the code of the
+   character in question, its value, and the value of ARG.  Stops mapping
+   the first time that FN returns non-zero, and returns that value.
+   Returns zero if mapping got all the way to the end. */
+
 static int
-char_table_entry_equal (Lisp_Object obj1, Lisp_Object obj2, int depth)
+map_chartab_table (SUBTAB_TYPE table, int level, int offset, int start,
+		   int end, Lisp_Object chartab,
+		   int (*fn) (Lisp_Object chartab, Ichar code, Lisp_Object val,
+			      void *arg),
+		   void *arg)
 {
-  Lisp_Char_Table_Entry *cte1 = XCHAR_TABLE_ENTRY (obj1);
-  Lisp_Char_Table_Entry *cte2 = XCHAR_TABLE_ENTRY (obj2);
   int i;
+  int startind = max (0, (start - offset) >> ((level - 1) * 8));
+  int endind = min (255, (end - offset) >> ((level - 1) * 8));

-  for (i = 0; i < 96; i++)
-    if (!internal_equal (cte1->level2[i], cte2->level2[i], depth + 1))
-      return 0;
+  structure_checking_assert (startind <= 255);
+  structure_checking_assert (endind >= 0);
+  structure_checking_assert (startind <= endind);

+  switch (level)
+    {
+    case 1:
+      {
+	Lisp_Object *tab = LISPOBJ_ARRAY_FROM_SUBTAB (table);
+	for (i = startind; i <= endind; i++)
+	  {
+	    if (!UNBOUNDP (tab[i]))
+	      {
+		int retval = (fn) (chartab, offset + i, tab[i], arg);
+		if (retval)
+		  return retval;
+	      }
+	  }
+	break;
+      }
+    case 2:
+    case 3:
+    case 4:
+      {
+	SUBTAB_ARRAY_TYPE tab = SUBTAB_ARRAY_FROM_SUBTAB (table);
+	for (i = startind; i <= endind; i++)
+	  {
+	    if (!SUBTAB_EQ (tab[i], chartab_blank[level - 1]))
+	      {
+		int retval =
+		  map_chartab_table (tab[i], level - 1,
+				     offset + (i << ((level - 1) * 8)),
+				     start, end, chartab, fn, arg);
+		if (retval)
+		  return retval;
+	      }
+	  }
+	break;
+      }
+    default:
+      ABORT ();
+    }
+
+  return 0;
+}
+
+/* Check whether the given table is entirely blank.  TABLE is LEVEL levels
+   deep.  Start checking at START (this will normally be 1, since we don't
+   want to check the 0th level, which indexes the possibly non-blank
+   lower levels.
+   */
+
+static int
+check_if_blank (SUBTAB_TYPE table, int level, int start, int depth)
+{
+  int i;
+
+  switch (level)
+    {
+    case 1:
+      {
+	Lisp_Object *tab = LISPOBJ_ARRAY_FROM_SUBTAB (table);
+	for (i = start; i < 256; i++)
+	  {
+	    if (!UNBOUNDP (tab[i]))
+	      return 0;
+	  }
+	break;
+      }
+    case 2:
+    case 3:
+    case 4:
+      {
+	SUBTAB_ARRAY_TYPE tab = SUBTAB_ARRAY_FROM_SUBTAB (table);
+	for (i = start; i < 256; i++)
+	  {
+	    if (!SUBTAB_EQ (tab[i], chartab_blank[level - 1]) &&
+		!check_if_blank (tab[i], level - 1, 0, depth))
+	      return 0;
+	  }
+	break;
+      }
+    default:
+      ABORT ();
+    }
+
   return 1;
 }

-static Hashcode
-char_table_entry_hash (Lisp_Object obj, int depth)
+static int
+chartab_tables_equal (SUBTAB_TYPE table1, SUBTAB_TYPE table2, int level,
+		      int depth)
+{
+  int i;
+
+  switch (level)
+    {
+    case 1:
+      {
+	Lisp_Object *tab1 = LISPOBJ_ARRAY_FROM_SUBTAB (table1);
+	Lisp_Object *tab2 = LISPOBJ_ARRAY_FROM_SUBTAB (table2);
+	for (i = 0; i < 256; i++)
+	  {
+	    if (!internal_equal (tab1[i], tab2[i], depth + 1))
+	      return 0;
+	  }
+	break;
+      }
+    case 2:
+    case 3:
+    case 4:
+      {
+	SUBTAB_ARRAY_TYPE tab1 = SUBTAB_ARRAY_FROM_SUBTAB (table1);
+	SUBTAB_ARRAY_TYPE tab2 = SUBTAB_ARRAY_FROM_SUBTAB (table2);
+	for (i = 0; i < 256; i++)
+	  {
+	    if (SUBTAB_EQ (tab1[i], chartab_blank[level - 1]) &&
+		SUBTAB_EQ (tab2[i], chartab_blank[level - 1]))
+	      ;
+	    else if (SUBTAB_EQ (tab1[i], chartab_blank[level - 1]))
+	      {
+		if (!check_if_blank (tab2[i], level - 1, 0, depth))
+		  return 0;
+	      }
+	    else if (SUBTAB_EQ (tab2[i], chartab_blank[level - 1]))
+	      {
+		if (!check_if_blank (tab1[i], level - 1, 0, depth))
+		  return 0;
+	      }
+	    else
+	      {
+		if (!chartab_tables_equal (tab1[1], tab2[1], level - 1,
+					   depth))
+		  return 0;
+	      }
+	  }
+	break;
+      }
+    default:
+      ABORT ();
+    }
+
+  return 1;
+}
+
+static int
+char_table_equal (Lisp_Object obj1, Lisp_Object obj2, int depth)
 {
-  Lisp_Char_Table_Entry *cte = XCHAR_TABLE_ENTRY (obj);
+  /* NOTE:
+
+     This code was formerly written so that it checks to see whether all
+     entries are actually equal, whether or not they have the same default,
+     by using the actual value of the char table entry, after the default
+     had been supplied if necessary.  What we do now, which is considerably
+     simpler, is just check the underlying entries, *before* applying the
+     default.  Some things that are `equal' under the other scheme aren't
+     `equal' under this scheme.  I think this makes more sense because
+     objects that are `equal' should be identical in their behavior and
+     have the same print representation; neither of these may be true in the
+     former case.
+
+     To speed things up, we should also keep track of the # of items
+     currently set. */
+
+  SUBTAB_TYPE table;
+
+  if (XCHAR_TABLE_TYPE (obj1) != XCHAR_TABLE_TYPE (obj2))
+    return 0;
+
+  if (!internal_equal (XCHAR_TABLE_DEFAULT (obj1), XCHAR_TABLE_DEFAULT (obj2),
+		       depth + 1) ||
+      !internal_equal (XCHAR_TABLE_PARENT (obj1), XCHAR_TABLE_PARENT (obj2),
+		       depth + 1))
+    return 0;

-  return internal_array_hash (cte->level2, 96, depth + 1);
+  /* Switch if necessary so that obj1 always has >= # of levels of obj2 */
+  if (XCHAR_TABLE_LEVELS (obj2) > XCHAR_TABLE_LEVELS (obj1))
+    {
+      Lisp_Object tmp = obj1;
+      obj1 = obj2;
+      obj2 = tmp;
+    }
+
+  table = XCHAR_TABLE_TABLE (obj1);
+  /* If one table has more levels than the other, make sure the extra
+     levels are all blank.  Successively drill down the tables,
+     checking that all subtables except #0 are completely blank (including
+     recursively checking any sub-subtables of them). */
+  if (XCHAR_TABLE_LEVELS (obj1) > XCHAR_TABLE_LEVELS (obj2))
+    {
+      int i;
+      for (i = XCHAR_TABLE_LEVELS (obj1); i > XCHAR_TABLE_LEVELS (obj2); i--)
+	{
+	  if (!check_if_blank (table, i, 1, depth))
+	    return 0;
+	  table = SUBTAB_ARRAY_FROM_SUBTAB (table)[0];
+	}
+    }
+
+  /* If we got this far, TABLE points to the appropriate (sub)table with the
+     same number of levels as that of OBJ2. */
+  return chartab_tables_equal (table,
+			       XCHAR_TABLE_TABLE (obj2),
+			       XCHAR_TABLE_LEVELS (obj2),
+			       depth);
 }

-static const struct memory_description char_table_entry_description[] = {
-  { XD_LISP_OBJECT_ARRAY, offsetof (Lisp_Char_Table_Entry, level2), 96 },
-  { XD_END }
-};
+/* Characters likely to have case pairs or special syntax -- e.g. comment
+   characters -- or unusual mappings in e.g. JIS-ROMAN, or in certain
+   national character sets that map higher ASCII punctuation chars into
+   extra letters (cf. the need for digraphs and trigraphs in C/C++) plus
+   some random ones to boot; probably, ASCII chars are more likely to show
+   up in char tables than others. */
+static char *likely_test =
"\t\n\r\f\016\025\0330128!＠#$%^&*`'_+=-,.<>?;:/~()[]{}\\\"acehijlnortuxyzADEGIKMOQSVY";

-DEFINE_LRECORD_IMPLEMENTATION ("char-table-entry", char_table_entry,
-			       1, /* dumpable flag */
-                               mark_char_table_entry, internal_object_printer,
-			       0, char_table_entry_equal,
-			       char_table_entry_hash,
-			       char_table_entry_description,
-			       Lisp_Char_Table_Entry);
+static Hashcode
+char_table_hash (Lisp_Object obj, int depth)
+{
+  Hashcode hashval = HASH2 (XCHAR_TABLE_TYPE (obj),
+			    internal_hash (XCHAR_TABLE_DEFAULT (obj),
+					   depth + 1));
+  char *p;
+  Ichar ch;

+  /* Hash those most likely to have values */
+  for (p = likely_test; *p; p++)
+    hashval = HASH2 (hashval, internal_hash
+		     (get_char_table_raw ((Ichar) *p, obj), depth + 1));
+  /* Hash some random Latin characters */
+  for (ch = 130; ch <= 255; ch += 5)
+    hashval = HASH2 (hashval, internal_hash
+		     (get_char_table_raw (ch, obj), depth + 1));
+  /* Don't bother trying to hash higher stuff if there is none. */
+  if (XCHAR_TABLE_LEVELS (obj) > 1)
+    {
+#ifdef UNICODE_INTERNAL
+      /* #### We should really hash less in some of the higher realms but try
+	 to get at least one value from each of the defined Unicode ranges.
+	 Note that we cannot do charset lookups like we do below for old-Mule
+	 because computed hash values for a particular object need to be the
+	 same throughout the lifetime of the program, whereas they would
+	 change if the Unicode-to-charset tables are changed. */
+      /* Hash some random extended Latin characters */
+      for (ch = 260; ch <= 500; ch += 10)
+	hashval = HASH2 (hashval, internal_hash (get_char_table_raw (ch, obj),
+						 depth + 1));
+      /* Hash some higher characters */
+      for (ch = 500; ch <= 4000; ch += 50)
+	hashval = HASH2 (hashval, internal_hash (get_char_table_raw (ch, obj),
+						 depth + 1));
+      /* Hash some random CJK characters */
+      for (ch = 0x4E00; ch <= 0x9FFF; ch += 791)
+	hashval = HASH2 (hashval, internal_hash (get_char_table_raw (ch, obj),
+						 depth + 1));
+      /* Hash some random Hangul characters */
+      for (ch = 0xAC00; ch <= 0xD7AF; ch += 791)
+	hashval = HASH2 (hashval, internal_hash (get_char_table_raw (ch, obj),
+						 depth + 1));
+#elif defined (MULE)
+/* 0xA1 aka 0x21 is usually the first alphabetic character and differs
+   across charsets, whereas 0xA0 is no-break-space across many of them.
+   charset_codepoint_to_ichar_raw() can't fail because we are in non-
+   Unicode-internal. */
+#define FROB1(cs)							\
+  hashval = HASH2 (hashval,						\
+		   internal_hash (get_char_table_raw			\
+				  (charset_codepoint_to_ichar_raw	\
+				   (cs, 0, 0x21),			\
+				   obj), depth + 1))
+/* 0x3021 is the first CJK character in a number of different CJK charsets
+   and differs across them. */
+#define FROB2(cs)							\
+  hashval = HASH2 (hashval,						\
+		   internal_hash (get_char_table_raw			\
+				  (charset_codepoint_to_ichar_raw	\
+				   (cs, 0x30, 0x21),			\
+				   obj), depth + 1))
+      FROB1 (Vcharset_latin_iso8859_2);
+      FROB1 (Vcharset_latin_iso8859_3);
+      FROB1 (Vcharset_latin_iso8859_4);
+      FROB1 (Vcharset_thai_tis620);
+      FROB1 (Vcharset_greek_iso8859_7);
+      FROB1 (Vcharset_arabic_iso8859_6);
+      FROB1 (Vcharset_hebrew_iso8859_8);
+      FROB1 (Vcharset_katakana_jisx0201);
+      FROB1 (Vcharset_latin_jisx0201);
+      FROB1 (Vcharset_cyrillic_iso8859_5);
+      FROB1 (Vcharset_latin_iso8859_9);
+      FROB1 (Vcharset_latin_iso8859_15);
+      FROB1 (Vcharset_chinese_sisheng);
+      FROB2 (Vcharset_japanese_jisx0208_1978);
+      FROB2 (Vcharset_chinese_gb2312);
+      FROB2 (Vcharset_japanese_jisx0208);
+      FROB2 (Vcharset_korean_ksc5601);
+      FROB2 (Vcharset_japanese_jisx0212);
+      FROB2 (Vcharset_chinese_cns11643_1);
+      FROB2 (Vcharset_chinese_cns11643_2);
+      FROB2 (Vcharset_chinese_big5_1);
+      FROB2 (Vcharset_chinese_big5_2);
+#undef FROB1
+#undef FROB2
 #endif /* MULE */
+    }
+  return hashval;
+}

 static Lisp_Object
 mark_char_table (Lisp_Object obj)
 {
-  Lisp_Char_Table *ct = XCHAR_TABLE (obj);
-  int i;
-
-  for (i = 0; i < NUM_ASCII_CHARS; i++)
-    mark_object (ct->ascii[i]);
-#ifdef MULE
-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    mark_object (ct->level1[i]);
-#endif
-  mark_object (ct->parent);
-  mark_object (ct->default_);
-  return ct->mirror_table;
+  mark_object (XCHAR_TABLE_PARENT (obj));
+  mark_object (XCHAR_TABLE_DEFAULT (obj));
+#ifdef MIRROR_TABLE
+  mark_object (XCHAR_TABLE_MIRROR_TABLE (obj));
+#endif /* MIRROR_TABLE */
+  return XCHAR_TABLE_TABLE (obj);
 }
-
-/* WARNING: All functions of this nature need to be written extremely
-   carefully to avoid crashes during GC.  Cf. prune_specifiers()
-   and prune_weak_hash_tables(). */

-void
-prune_syntax_tables (void)
+/* Allocate and blank the tables. */
+static void
+init_chartab_tables (Lisp_Object chartab)
 {
-  Lisp_Object rest, prev = Qnil;
+  /* CHARTAB_LEVELS (foo) will evaluates to 4 when MAXIMIZE_CHAR_TABLE_DEPTH
+     and MULE, to 1 if MAXIMIZE_CHAR_TABLE_DEPTH and not MULE, and to
+     foo otherwise. */
+  XCHAR_TABLE_LEVELS (chartab) = CHARTAB_LEVELS (1);
+  XCHAR_TABLE_TABLE (chartab) =
+    create_new_chartab_table (XCHAR_TABLE_LEVELS (chartab));
+}

-  for (rest = Vall_syntax_tables;
-       !NILP (rest);
-       rest = XCHAR_TABLE (rest)->next_table)
+static void
+free_chartab_tables (Lisp_Object chartab)
+{
+  if (!UNBOUNDP (XCHAR_TABLE_TABLE (chartab)))
     {
-      if (! marked_p (rest))
-	{
-	  /* This table is garbage.  Remove it from the list. */
-	  if (NILP (prev))
-	    Vall_syntax_tables = XCHAR_TABLE (rest)->next_table;
-	  else
-	    XCHAR_TABLE (prev)->next_table =
-	      XCHAR_TABLE (rest)->next_table;
-	}
+      free_chartab_table (XCHAR_TABLE_TABLE (chartab),
+			  XCHAR_TABLE_LEVELS (chartab));
+      XCHAR_TABLE_TABLE (chartab) = Qunbound;
     }
 }

＠＠ -223,82 +773,6 ＠＠
   RETURN_NOT_REACHED (CHAR_TABLE_TYPE_GENERIC);
 }

-static void
-decode_char_table_range (Lisp_Object range, struct chartab_range *outrange)
-{
-  if (EQ (range, Qt))
-    outrange->type = CHARTAB_RANGE_ALL;
-  else if (CHAR_OR_CHAR_INTP (range))
-    {
-      outrange->type = CHARTAB_RANGE_CHAR;
-      outrange->ch = XCHAR_OR_CHAR_INT (range);
-    }
-#ifndef MULE
-  else
-    sferror ("Range must be t or a character", range);
-#else /* MULE */
-  else if (VECTORP (range))
-    {
-      Lisp_Vector *vec = XVECTOR (range);
-      Lisp_Object *elts = vector_data (vec);
-      if (vector_length (vec) != 2)
-	sferror ("Length of charset row vector must be 2",
-			     range);
-      outrange->type = CHARTAB_RANGE_ROW;
-      outrange->charset = Fget_charset (elts[0]);
-      CHECK_INT (elts[1]);
-      outrange->row = XINT (elts[1]);
-      switch (XCHARSET_TYPE (outrange->charset))
-	{
-	case CHARSET_TYPE_94:
-	case CHARSET_TYPE_96:
-	  sferror ("Charset in row vector must be multi-byte",
-			       outrange->charset);
-	case CHARSET_TYPE_94X94:
-	  check_int_range (outrange->row, 33, 126);
-	  break;
-	case CHARSET_TYPE_96X96:
-	  check_int_range (outrange->row, 32, 127);
-	  break;
-	default:
-	  ABORT ();
-	}
-    }
-  else
-    {
-      if (!CHARSETP (range) && !SYMBOLP (range))
-	sferror
-	  ("Char table range must be t, charset, char, or vector", range);
-      outrange->type = CHARTAB_RANGE_CHARSET;
-      outrange->charset = Fget_charset (range);
-    }
-#endif /* MULE */
-}
-
-static Lisp_Object
-encode_char_table_range (struct chartab_range *range)
-{
-  switch (range->type)
-    {
-    case CHARTAB_RANGE_ALL:
-      return Qt;
-      
-#ifdef MULE
-    case CHARTAB_RANGE_CHARSET:
-      return XCHARSET_NAME (Fget_charset (range->charset));
-
-    case CHARTAB_RANGE_ROW:
-      return vector2 (XCHARSET_NAME (Fget_charset (range->charset)),
-		      make_int (range->row));
-#endif
-    case CHARTAB_RANGE_CHAR:
-      return make_char (range->ch);
-    default:
-      ABORT ();
-    }
-  return Qnil; /* not reached */
-}
-
 struct ptemap
 {
   Lisp_Object printcharfun;
＠＠ -306,19 +780,14 ＠＠
 };

 static int
-print_table_entry (struct chartab_range *range, Lisp_Object UNUSED (table),
+print_table_entry (Lisp_Object UNUSED (table), Ichar ch,
 		   Lisp_Object val, void *arg)
 {
   struct ptemap *a = (struct ptemap *) arg;
-  struct gcpro gcpro1;
-  Lisp_Object lisprange;
   if (!a->first)
     write_c_string (a->printcharfun, " ");
   a->first = 0;
-  lisprange = encode_char_table_range (range);
-  GCPRO1 (lisprange);
-  write_fmt_string_lisp (a->printcharfun, "%s %s", 2, lisprange, val);
-  UNGCPRO;
+  write_fmt_string_lisp (a->printcharfun, "%s %s", 2, make_char (ch), val);
   return 0;
 }

＠＠ -343,62 +812,105 ＠＠
      default to be modified, which we don't (yet) support -- but FSF does */
 }

-static int
-char_table_equal (Lisp_Object obj1, Lisp_Object obj2, int depth)
-{
-  Lisp_Char_Table *ct1 = XCHAR_TABLE (obj1);
-  Lisp_Char_Table *ct2 = XCHAR_TABLE (obj2);
-  int i;
-
-  if (CHAR_TABLE_TYPE (ct1) != CHAR_TABLE_TYPE (ct2))
-    return 0;
-
-  for (i = 0; i < NUM_ASCII_CHARS; i++)
-    if (!internal_equal (ct1->ascii[i], ct2->ascii[i], depth + 1))
-      return 0;
-
-#ifdef MULE
-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    if (!internal_equal (ct1->level1[i], ct2->level1[i], depth + 1))
-      return 0;
-#endif /* MULE */
-
-  return internal_equal (ct1->default_, ct2->default_, depth + 1);
-}
-
-static Hashcode
-char_table_hash (Lisp_Object obj, int depth)
-{
-  Lisp_Char_Table *ct = XCHAR_TABLE (obj);
-  Hashcode hashval = internal_array_hash (ct->ascii, NUM_ASCII_CHARS,
-					   depth + 1);
-#ifdef MULE
-  hashval = HASH2 (hashval,
-		   internal_array_hash (ct->level1, NUM_LEADING_BYTES,
-					depth + 1));
-#endif /* MULE */
-  return HASH2 (hashval, internal_hash (ct->default_, depth + 1));
-}
-
 static const struct memory_description char_table_description[] = {
-  { XD_LISP_OBJECT_ARRAY, offsetof (Lisp_Char_Table, ascii), NUM_ASCII_CHARS },
-#ifdef MULE
-  { XD_LISP_OBJECT_ARRAY, offsetof (Lisp_Char_Table, level1), NUM_LEADING_BYTES },
-#endif
+  { XD_LISP_OBJECT, offsetof (Lisp_Char_Table, table) },
   { XD_LISP_OBJECT, offsetof (Lisp_Char_Table, parent) },
   { XD_LISP_OBJECT, offsetof (Lisp_Char_Table, default_) },
-  { XD_LISP_OBJECT, offsetof (Lisp_Char_Table, mirror_table) },
   { XD_LO_LINK,     offsetof (Lisp_Char_Table, next_table) },
+#ifdef MIRROR_TABLE
+  { XD_LISP_OBJECT, offsetof (Lisp_Char_Table, mirror_table) },
+#endif /* MIRROR_TABLE */
   { XD_END }
 };

 DEFINE_LRECORD_IMPLEMENTATION ("char-table", char_table,
 			       1, /*dumpable-flag*/
-                               mark_char_table, print_char_table, 0,
+                               mark_char_table, print_char_table,
+			       0,
 			       char_table_equal, char_table_hash,
 			       char_table_description,
 			       Lisp_Char_Table);

+/* WARNING: All functions of this nature need to be written extremely
+   carefully to avoid crashes during GC.  Cf. prune_specifiers()
+   and prune_weak_hash_tables(). */
+
+void
+prune_syntax_tables (void)
+{
+  Lisp_Object rest, prev = Qnil;
+
+  for (rest = Vall_syntax_tables;
+       !NILP (rest);
+       rest = XCHAR_TABLE (rest)->next_table)
+    {
+      if (! marked_p (rest))
+	{
+	  /* This table is garbage.  Remove it from the list. */
+	  if (NILP (prev))
+	    Vall_syntax_tables = XCHAR_TABLE (rest)->next_table;
+	  else
+	    XCHAR_TABLE (prev)->next_table =
+	      XCHAR_TABLE (rest)->next_table;
+	}
+    }
+}
+
+static void
+decode_char_table_range (Lisp_Object range, struct chartab_range *outrange)
+{
+  if (EQ (range, Qt))
+    outrange->type = CHARTAB_RANGE_ALL;
+  else if (CHAR_OR_CHAR_INTP (range))
+    {
+      outrange->type = CHARTAB_RANGE_CHAR;
+      outrange->ch = XCHAR_OR_CHAR_INT (range);
+    }
+  else if (CONSP (range))
+    {
+      CHECK_CHAR_COERCE_INT (XCAR (range));
+      CHECK_CHAR_COERCE_INT (XCDR (range));
+      outrange->type = CHARTAB_RANGE_RANGE;
+      outrange->ch = XCHAR_OR_CHAR_INT (XCAR (range));
+      outrange->chtop = XCHAR_OR_CHAR_INT (XCDR (range));
+    }
+#ifndef MULE
+  else
+    sferror ("Range must be t, character or cons of char range", range);
+#else /* MULE */
+  else if (VECTORP (range))
+    {
+      Lisp_Vector *vec = XVECTOR (range);
+      Lisp_Object *elts = vector_data (vec);
+      if (vector_length (vec) != 2)
+	sferror ("Length of charset row vector must be 2",
+			     range);
+      outrange->type = CHARTAB_RANGE_ROW;
+      outrange->charset = Fget_charset (elts[0]);
+      CHECK_INT (elts[1]);
+      outrange->row = XINT (elts[1]);
+      if (XCHARSET_DIMENSION (outrange->charset) == 1)
+	sferror ("Charset in row vector must be multi-byte",
+		 outrange->charset);
+      else
+	{
+	  check_int_range (outrange->row,
+			   XCHARSET_OFFSET (outrange->charset, 0),
+			   XCHARSET_OFFSET (outrange->charset, 0) +
+			   XCHARSET_CHARS (outrange->charset, 0) - 1);
+	}
+    }
+  else
+    {
+      if (!CHARSETP (range) && !SYMBOLP (range))
+	sferror
+	  ("Char table range must be t, char, charset, cons or vector", range);
+      outrange->type = CHARTAB_RANGE_CHARSET;
+      outrange->charset = Fget_charset (range);
+    }
+#endif /* MULE */
+}
+
 DEFUN ("char-table-p", Fchar_table_p, 1, 1, 0, /*
 Return non-nil if OBJECT is a char table.
 */
＠＠ -448,8 +960,10 ＠＠
 static void
 set_char_table_dirty (Lisp_Object table)
 {
+#ifdef MIRROR_TABLE
   assert (!XCHAR_TABLE (table)->mirror_table_p);
   XCHAR_TABLE (XCHAR_TABLE (table)->mirror_table)->dirty = 1;
+#endif /* MIRROR_TABLE */
 }

 void
＠＠ -458,31 +972,7 ＠＠
   Lisp_Char_Table *ct = XCHAR_TABLE (table);
   ct->default_ = value;
   if (ct->type == CHAR_TABLE_TYPE_SYNTAX)
-    set_char_table_dirty (table);
-}
-
-static void
-fill_char_table (Lisp_Char_Table *ct, Lisp_Object value)
-{
-  int i;
-
-  for (i = 0; i < NUM_ASCII_CHARS; i++)
-    ct->ascii[i] = value;
-#ifdef MULE
-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    {
-      /* Don't get stymied when initting the table, or when trying to
-	 free a pdump object. */
-      if (!EQ (ct->level1[i], Qnull_pointer) &&
-	  CHAR_TABLE_ENTRYP (ct->level1[i]) &&
-	  !OBJECT_DUMPED_P (ct->level1[1]))
-	FREE_LCRECORD (ct->level1[i]);
-      ct->level1[i] = value;
-    }
-#endif /* MULE */
-
-  if (ct->type == CHAR_TABLE_TYPE_SYNTAX)
-    set_char_table_dirty (wrap_char_table (ct));
+    set_char_table_dirty (table);
 }

 DEFUN ("reset-char-table", Freset_char_table, 1, 1, 0, /*
＠＠ -521,8 +1011,9 ＠＠

   /* Avoid doubly updating the syntax table by setting the default ourselves,
      since set_char_table_default() also updates. */
-  ct->default_ = def;
-  fill_char_table (ct, Qunbound);
+  XCHAR_TABLE_DEFAULT (char_table) = def;
+  free_chartab_tables (char_table);
+  init_chartab_tables (char_table);

   return Qnil;
 }
＠＠ -595,6 +1086,8 ＠＠
   ct = ALLOC_LCRECORD_TYPE (Lisp_Char_Table, &lrecord_char_table);
   ct->type = ty;
   obj = wrap_char_table (ct);
+  ct->table = Qunbound;
+#ifdef MIRROR_TABLE
   if (ty == CHAR_TABLE_TYPE_SYNTAX)
     {
       /* Qgeneric not Qsyntax because a syntax table has a mirror table
＠＠ -606,6 +1099,7 ＠＠
     }
   else
     ct->mirror_table = Qnil;
+#endif /* MIRROR_TABLE */
   ct->next_table = Qnil;
   ct->parent = Qnil;
   ct->default_ = Qnil;
＠＠ -618,43 +1112,6 ＠＠
   return obj;
 }

-#ifdef MULE
-
-static Lisp_Object
-make_char_table_entry (Lisp_Object initval)
-{
-  int i;
-  Lisp_Char_Table_Entry *cte =
-    ALLOC_LCRECORD_TYPE (Lisp_Char_Table_Entry, &lrecord_char_table_entry);
-
-  for (i = 0; i < 96; i++)
-    cte->level2[i] = initval;
-
-  return wrap_char_table_entry (cte);
-}
-
-static Lisp_Object
-copy_char_table_entry (Lisp_Object entry)
-{
-  Lisp_Char_Table_Entry *cte = XCHAR_TABLE_ENTRY (entry);
-  int i;
-  Lisp_Char_Table_Entry *ctenew =
-    ALLOC_LCRECORD_TYPE (Lisp_Char_Table_Entry, &lrecord_char_table_entry);
-
-  for (i = 0; i < 96; i++)
-    {
-      Lisp_Object new_ = cte->level2[i];
-      if (CHAR_TABLE_ENTRYP (new_))
-	ctenew->level2[i] = copy_char_table_entry (new_);
-      else
-	ctenew->level2[i] = new_;
-    }
-
-  return wrap_char_table_entry (ctenew);
-}
-
-#endif /* MULE */
-
 DEFUN ("copy-char-table", Fcopy_char_table, 1, 1, 0, /*
 Return a new char table which is a copy of CHAR-TABLE.
 It will contain the same values for the same characters and ranges
＠＠ -664,7 +1121,6 ＠＠
 {
   Lisp_Char_Table *ct, *ctnew;
   Lisp_Object obj;
-  int i;

   CHECK_CHAR_TABLE (char_table);
   ct = XCHAR_TABLE (char_table);
＠＠ -672,31 +1128,12 ＠＠
   ctnew->type = ct->type;
   ctnew->parent = ct->parent;
   ctnew->default_ = ct->default_;
-  ctnew->mirror_table_p = ct->mirror_table_p;
+  ctnew->levels = ct->levels;
+  ctnew->table = copy_chartab_table (ct->table, ct->levels);
   obj = wrap_char_table (ctnew);
-
-  for (i = 0; i < NUM_ASCII_CHARS; i++)
-    {
-      Lisp_Object new_ = ct->ascii[i];
-#ifdef MULE
-      assert (! (CHAR_TABLE_ENTRYP (new_)));
-#endif /* MULE */
-      ctnew->ascii[i] = new_;
-    }
-
-#ifdef MULE
-
-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    {
-      Lisp_Object new_ = ct->level1[i];
-      if (CHAR_TABLE_ENTRYP (new_))
-	ctnew->level1[i] = copy_char_table_entry (new_);
-      else
-	ctnew->level1[i] = new_;
-    }
-
-#endif /* MULE */

+#ifdef MIRROR_TABLE
+  ctnew->mirror_table_p = ct->mirror_table_p;
   if (!ct->mirror_table_p && CHAR_TABLEP (ct->mirror_table))
     {
       ctnew->mirror_table = Fcopy_char_table (ct->mirror_table);
＠＠ -704,6 +1141,7 ＠＠
     }
   else
     ctnew->mirror_table = ct->mirror_table;
+#endif /* MIRROR_TABLE */
   ctnew->next_table = Qnil;
   if (ctnew->type == CHAR_TABLE_TYPE_SYNTAX)
     {
＠＠ -713,37 +1151,6 ＠＠
   return obj;
 }

-#ifdef MULE
-
-/* called from get_char_table(). */
-Lisp_Object
-get_non_ascii_char_table_value (Lisp_Char_Table *ct, int leading_byte,
-				Ichar c)
-{
-  Lisp_Object val;
-  Lisp_Object charset = charset_by_leading_byte (leading_byte);
-  int byte1, byte2;
-
-  BREAKUP_ICHAR_1_UNSAFE (c, charset, byte1, byte2);
-  val = ct->level1[leading_byte - MIN_LEADING_BYTE];
-  if (CHAR_TABLE_ENTRYP (val))
-    {
-      Lisp_Char_Table_Entry *cte = XCHAR_TABLE_ENTRY (val);
-      val = cte->level2[byte1 - 32];
-      if (CHAR_TABLE_ENTRYP (val))
-	{
-	  cte = XCHAR_TABLE_ENTRY (val);
-	  assert (byte2 >= 32);
-	  val = cte->level2[byte2 - 32];
-	  assert (!CHAR_TABLE_ENTRYP (val));
-	}
-    }
-
-  return val;
-}
-
-#endif /* MULE */
-
 DEFUN ("char-table-default", Fchar_table_default, 1, 1, 0, /*
 Return the default value for CHAR-TABLE.  When an entry for a character
 does not exist, the default is returned.
＠＠ -780,157 +1187,6 ＠＠

   return get_char_table (XCHAR (character), char_table);
 }
-
-static int
-copy_mapper (struct chartab_range *range, Lisp_Object UNUSED (table),
-	     Lisp_Object val, void *arg)
-{
-  put_char_table (VOID_TO_LISP (arg), range, val);
-  return 0;
-}
-
-void
-copy_char_table_range (Lisp_Object from, Lisp_Object to,
-		       struct chartab_range *range)
-{
-  map_char_table (from, range, copy_mapper, LISP_TO_VOID (to));
-}
-
-static Lisp_Object
-get_range_char_table_1 (struct chartab_range *range, Lisp_Object table,
-			Lisp_Object multi)
-{
-  Lisp_Char_Table *ct = XCHAR_TABLE (table);
-  Lisp_Object retval = Qnil;
-
-  switch (range->type)
-    {
-    case CHARTAB_RANGE_CHAR:
-      return get_char_table (range->ch, table);
-
-    case CHARTAB_RANGE_ALL:
-      {
-	int i;
-	retval = ct->ascii[0];
-
-	for (i = 1; i < NUM_ASCII_CHARS; i++)
-	  if (!EQ (retval, ct->ascii[i]))
-	    return multi;
-
-#ifdef MULE
-	for (i = MIN_LEADING_BYTE; i < MIN_LEADING_BYTE + NUM_LEADING_BYTES;
-	     i++)
-	  {
-	    if (!CHARSETP (charset_by_leading_byte (i))
-		|| i == LEADING_BYTE_ASCII
-		|| i == LEADING_BYTE_CONTROL_1)
-	      continue;
-	    if (!EQ (retval, ct->level1[i - MIN_LEADING_BYTE]))
-	      return multi;
-	  }
-#endif /* MULE */
-
-	break;
-      }
-
-#ifdef MULE
-    case CHARTAB_RANGE_CHARSET:
-      if (EQ (range->charset, Vcharset_ascii))
-	{
-	  int i;
-	  retval = ct->ascii[0];
-
-	  for (i = 1; i < 128; i++)
-	    if (!EQ (retval, ct->ascii[i]))
-	      return multi;
-	  break;
-	}
-
-      if (EQ (range->charset, Vcharset_control_1))
-	{
-	  int i;
-	  retval = ct->ascii[128];
-
-	  for (i = 129; i < 160; i++)
-	    if (!EQ (retval, ct->ascii[i]))
-	      return multi;
-	  break;
-	}
-
-      {
-	retval = ct->level1[XCHARSET_LEADING_BYTE (range->charset) -
-			    MIN_LEADING_BYTE];
-	if (CHAR_TABLE_ENTRYP (retval))
-	  return multi;
-	break;
-      }
-
-    case CHARTAB_RANGE_ROW:
-      {
-	retval = ct->level1[XCHARSET_LEADING_BYTE (range->charset) -
-			    MIN_LEADING_BYTE];
-	if (!CHAR_TABLE_ENTRYP (retval))
-	  break;
-	retval = XCHAR_TABLE_ENTRY (retval)->level2[range->row - 32];
-	if (CHAR_TABLE_ENTRYP (retval))
-	  return multi;
-	break;
-      }
-#endif /* not MULE */
-
-    default:
-      ABORT ();
-    }
-
-  if (UNBOUNDP (retval))
-    return ct->default_;
-  return retval;
-}
-
-Lisp_Object
-get_range_char_table (struct chartab_range *range, Lisp_Object table,
-		      Lisp_Object multi)
-{
-  if (range->type == CHARTAB_RANGE_CHAR)
-    return get_char_table (range->ch, table);
-  else
-    return get_range_char_table_1 (range, table, multi);
-}
-
-#ifdef ERROR_CHECK_TYPES
-
-/* Only exists so as not to trip an assert in get_char_table(). */
-Lisp_Object
-updating_mirror_get_range_char_table (struct chartab_range *range,
-				      Lisp_Object table,
-				      Lisp_Object multi)
-{
-  if (range->type == CHARTAB_RANGE_CHAR)
-    return get_char_table_1 (range->ch, table);
-  else
-    return get_range_char_table_1 (range, table, multi);
-}
-
-#endif /* ERROR_CHECK_TYPES */
-
-DEFUN ("get-range-char-table", Fget_range_char_table, 2, 3, 0, /*
-Find value for RANGE in CHAR-TABLE.
-If there is more than one value, return MULTI (defaults to nil).
-
-Valid values for RANGE are single characters, charsets, a row in a
-two-octet charset, and all characters.  See `put-char-table'.
-*/
-       (range, char_table, multi))
-{
-  struct chartab_range rainj;
-
-  if (CHAR_OR_CHAR_INTP (range))
-    return Fget_char_table (range, char_table);
-  CHECK_CHAR_TABLE (char_table);
-
-  decode_char_table_range (range, &rainj);
-  return get_range_char_table (&rainj, char_table, multi);
-}

 static int
 check_valid_char_table_value (Lisp_Object value, enum char_table_type type,
＠＠ -966,8 +1222,8 ＠＠
     case CHAR_TABLE_TYPE_DISPLAY:
       /* #### fix this */
       maybe_signal_error (Qunimplemented,
-			       "Display char tables not yet implemented",
-			       value, Qchar_table, errb);
+			  "Display char tables not yet implemented",
+			  value, Qchar_table, errb);
       return 0;

     case CHAR_TABLE_TYPE_CHAR:
＠＠ -1034,91 +1290,57 ＠＠
 		Lisp_Object val)
 {
   Lisp_Char_Table *ct = XCHAR_TABLE (table);
+#ifdef MULE
+  int l1, h1, l2, h2;
+#endif

   switch (range->type)
     {
-    case CHARTAB_RANGE_ALL:
-      fill_char_table (ct, val);
-      return; /* fill_char_table() recorded the table as dirty. */
-
 #ifdef MULE
-    case CHARTAB_RANGE_CHARSET:
-      if (EQ (range->charset, Vcharset_ascii))
-	{
-	  int i;
-	  for (i = 0; i < 128; i++)
-	    ct->ascii[i] = val;
-	}
-      else if (EQ (range->charset, Vcharset_control_1))
-	{
-	  int i;
-	  for (i = 128; i < 160; i++)
-	    ct->ascii[i] = val;
-	}
-      else
-	{
-	  int lb = XCHARSET_LEADING_BYTE (range->charset) - MIN_LEADING_BYTE;
-	  if (CHAR_TABLE_ENTRYP (ct->level1[lb]) &&
-	      !OBJECT_DUMPED_P (ct->level1[lb]))
-	    FREE_LCRECORD (ct->level1[lb]);
-	  ct->level1[lb] = val;
-	}
-      break;
-
     case CHARTAB_RANGE_ROW:
       {
-	Lisp_Char_Table_Entry *cte;
-	int lb = XCHARSET_LEADING_BYTE (range->charset) - MIN_LEADING_BYTE;
-	/* make sure that there is a separate entry for the row. */
-	if (!CHAR_TABLE_ENTRYP (ct->level1[lb]))
-	  ct->level1[lb] = make_char_table_entry (ct->level1[lb]);
-	cte = XCHAR_TABLE_ENTRY (ct->level1[lb]);
-	cte->level2[range->row - 32] = val;
+	get_charset_limits (range->charset, &l1, &h1, &l2, &h2);
+	l1 = h1 = range->row;
+	goto iterate_charset;
       }
+
+    case CHARTAB_RANGE_CHARSET:
+      {
+	int i, j;
+	get_charset_limits (range->charset, &l1, &h1, &l2, &h2);
+      iterate_charset:
+	for (i = l1; i <= h1; i++)
+	  for (j = l2; j <= h2; j++)
+	    {
+	      Ichar ch = charset_codepoint_to_ichar_raw (range->charset, i, j);
+	      if (ch >= 0)
+		put_char_table_1 (table, ch, val);
+	    }
+      }
       break;
 #endif /* MULE */

-    case CHARTAB_RANGE_CHAR:
-#ifdef MULE
+#define CHAR_INTERVAL_FOR_QUIT 1000
+    case CHARTAB_RANGE_RANGE:
       {
-	Lisp_Object charset;
-	int byte1, byte2;
-
-	BREAKUP_ICHAR (range->ch, charset, byte1, byte2);
-	if (EQ (charset, Vcharset_ascii))
-	  ct->ascii[byte1] = val;
-	else if (EQ (charset, Vcharset_control_1))
-	  ct->ascii[byte1 + 128] = val;
-	else
+	Ichar i;
+	for (i = range->ch; i <= range->chtop; i += CHAR_INTERVAL_FOR_QUIT)
 	  {
-	    Lisp_Char_Table_Entry *cte;
-	    int lb = XCHARSET_LEADING_BYTE (charset) - MIN_LEADING_BYTE;
-	    /* make sure that there is a separate entry for the row. */
-	    if (!CHAR_TABLE_ENTRYP (ct->level1[lb]))
-	      ct->level1[lb] = make_char_table_entry (ct->level1[lb]);
-	    cte = XCHAR_TABLE_ENTRY (ct->level1[lb]);
-	    /* now CTE is a char table entry for the charset;
-	       each entry is for a single row (or character of
-	       a one-octet charset). */
-	    if (XCHARSET_DIMENSION (charset) == 1)
-	      cte->level2[byte1 - 32] = val;
-	    else
-	      {
-		/* assigning to one character in a two-octet charset. */
-		/* make sure that the charset row contains a separate
-		   entry for each character. */
-		if (!CHAR_TABLE_ENTRYP (cte->level2[byte1 - 32]))
-		  cte->level2[byte1 - 32] =
-		    make_char_table_entry (cte->level2[byte1 - 32]);
-		cte = XCHAR_TABLE_ENTRY (cte->level2[byte1 - 32]);
-		cte->level2[byte2 - 32] = val;
-	      }
+	    Ichar stop = min (i + CHAR_INTERVAL_FOR_QUIT - 1, range->chtop);
+	    Ichar j;
+
+	    /* QUIT every CHAR_INTERVAL_FOR_QUIT characters */
+	    for (j = i; j <= stop; j++)
+	      put_char_table_1 (table, j, val);
+	    QUIT;
 	  }
       }
-#else /* not MULE */
-      ct->ascii[(unsigned char) (range->ch)] = val;
+
+      break;
+
+    case CHARTAB_RANGE_CHAR:
+      put_char_table_1 (table, range->ch, val);
       break;
-#endif /* not MULE */
     }

   if (ct->type == CHAR_TABLE_TYPE_SYNTAX)
＠＠ -1126,16 +1348,17 ＠＠
 }

 DEFUN ("put-char-table", Fput_char_table, 3, 3, 0, /*
-Set the value for chars in RANGE to be VALUE in CHAR-TABLE.
+Set the value for CHAR to be VALUE in CHAR-TABLE.

-RANGE specifies one or more characters to be affected and should be
+CHAR specifies one or more characters to be affected and should be
 one of the following:

--- t (all characters are affected)
--- A charset (only allowed when Mule support is present)
+-- A charset (only allowed when Mule support is present; all characters
+   in the charset are set)
 -- A vector of two elements: a two-octet charset and a row number; the row
    must be an integer, not a character (only allowed when Mule support is
    present)
+-- A cons of two characters (a range, inclusive on both ends)
 -- A single character

 VALUE must be a value appropriate for the type of CHAR-TABLE.
＠＠ -1150,6 +1373,9 ＠＠
   ct = XCHAR_TABLE (char_table);
   check_valid_char_table_value (value, ct->type, ERROR_ME);
   decode_char_table_range (range, &rainj);
+  if (rainj.type == CHARTAB_RANGE_ALL)
+    invalid_argument ("Can't currently set all characters in a char
table",
+                      range);
   value = canonicalize_char_table_value (value, ct->type);
   put_char_table (char_table, &rainj, value);
   return Qnil;
＠＠ -1162,6 +1388,7 ＠＠
 one of the following:

 -- t (all characters are affected)
+-- A cons of two characters (a range, inclusive on both ends)
 -- A charset (only allowed when Mule support is present)
 -- A vector of two elements: a two-octet charset and a row number
    (only allowed when Mule support is present)
＠＠ -1176,258 +1403,91 ＠＠

   CHECK_CHAR_TABLE (char_table);
   decode_char_table_range (range, &rainj);
-  put_char_table (char_table, &rainj, Qunbound);
-  return Qnil;
-}
-
-/* Map FN over the ASCII chars in CT. */
-
-static int
-map_over_charset_ascii_1 (Lisp_Char_Table *ct,
-			  int start, int stop,
-			  int (*fn) (struct chartab_range *range,
-				     Lisp_Object table, Lisp_Object val,
-				     void *arg),
-			  void *arg)
-{
-  struct chartab_range rainj;
-  int i, retval;
-
-  rainj.type = CHARTAB_RANGE_CHAR;
-
-  for (i = start, retval = 0; i <= stop && retval == 0; i++)
-    {
-      rainj.ch = (Ichar) i;
-      if (!UNBOUNDP (ct->ascii[i]))
-	retval = (fn) (&rainj, wrap_char_table (ct), ct->ascii[i], arg);
-    }
-
-  return retval;
-}
-
-
-/* Map FN over the ASCII chars in CT. */
-
-static int
-map_over_charset_ascii (Lisp_Char_Table *ct,
-			int (*fn) (struct chartab_range *range,
-				   Lisp_Object table, Lisp_Object val,
-				   void *arg),
-			void *arg)
-{
-  return map_over_charset_ascii_1 (ct, 0,
-#ifdef MULE
-				   127,
-#else
-				   255,
-#endif
-				   fn, arg);
-}
-
-#ifdef MULE
-
-/* Map FN over the Control-1 chars in CT. */
-
-static int
-map_over_charset_control_1 (Lisp_Char_Table *ct,
-			    int (*fn) (struct chartab_range *range,
-				       Lisp_Object table, Lisp_Object val,
-				       void *arg),
-			    void *arg)
-{
-  return map_over_charset_ascii_1 (ct, 128, 159, fn, arg);
-}
-
-/* Map FN over the row ROW of two-byte charset CHARSET.
-   There must be a separate value for that row in the char table.
-   CTE specifies the char table entry for CHARSET. */
-
-static int
-map_over_charset_row (Lisp_Char_Table *ct,
-		      Lisp_Char_Table_Entry *cte,
-		      Lisp_Object charset, int row,
-		      int (*fn) (struct chartab_range *range,
-				 Lisp_Object table, Lisp_Object val,
-				 void *arg),
-		      void *arg)
-{
-  Lisp_Object val = cte->level2[row - 32];
-
-  if (UNBOUNDP (val))
-    return 0;
-  else if (!CHAR_TABLE_ENTRYP (val))
+  if (rainj.type == CHARTAB_RANGE_ALL)
     {
-      struct chartab_range rainj;
-      
-      rainj.type = CHARTAB_RANGE_ROW;
-      rainj.charset = charset;
-      rainj.row = row;
-      return (fn) (&rainj, wrap_char_table (ct), val, arg);
+      free_chartab_tables (char_table);
+      init_chartab_tables (char_table);
     }
   else
-    {
-      struct chartab_range rainj;
-      int i, retval;
-      int start, stop;
-	  
-      get_charset_limits (charset, &start, &stop);
-
-      cte = XCHAR_TABLE_ENTRY (val);
-
-      rainj.type = CHARTAB_RANGE_CHAR;
-
-      for (i = start, retval = 0; i <= stop && retval == 0; i++)
-	{
-	  rainj.ch = make_ichar (charset, row, i);
-	  if (!UNBOUNDP (cte->level2[i - 32]))
-	    retval = (fn) (&rainj, wrap_char_table (ct), cte->level2[i - 32],
-			   arg);
-	}
-      return retval;
-    }
-}
-
-
-static int
-map_over_other_charset (Lisp_Char_Table *ct, int lb,
-			int (*fn) (struct chartab_range *range,
-				   Lisp_Object table, Lisp_Object val,
-				   void *arg),
-			void *arg)
-{
-  Lisp_Object val = ct->level1[lb - MIN_LEADING_BYTE];
-  Lisp_Object charset = charset_by_leading_byte (lb);
-
-  if (!CHARSETP (charset)
-      || lb == LEADING_BYTE_ASCII
-      || lb == LEADING_BYTE_CONTROL_1)
-    return 0;
-
-  if (UNBOUNDP (val))
-    return 0;
-  if (!CHAR_TABLE_ENTRYP (val))
-    {
-      struct chartab_range rainj;
-
-      rainj.type = CHARTAB_RANGE_CHARSET;
-      rainj.charset = charset;
-      return (fn) (&rainj, wrap_char_table (ct), val, arg);
-    }
-  {
-    Lisp_Char_Table_Entry *cte = XCHAR_TABLE_ENTRY (val);
-    int start, stop;
-    int i, retval;
-
-    get_charset_limits (charset, &start, &stop);
-    if (XCHARSET_DIMENSION (charset) == 1)
-      {
-	struct chartab_range rainj;
-	rainj.type = CHARTAB_RANGE_CHAR;
-
-	for (i = start, retval = 0; i <= stop && retval == 0; i++)
-	  {
-	    rainj.ch = make_ichar (charset, i, 0);
-	    if (!UNBOUNDP (cte->level2[i - 32]))
-	      retval = (fn) (&rainj, wrap_char_table (ct), cte->level2[i - 32],
-			     arg);
-	  }
-      }
-    else
-      {
-	for (i = start, retval = 0; i <= stop && retval == 0; i++)
-	  retval = map_over_charset_row (ct, cte, charset, i, fn, arg);
-      }
-
-    return retval;
-  }
+    put_char_table (char_table, &rainj, Qunbound);
+  return Qnil;
 }

-#endif /* MULE */
-
 /* Map FN (with client data ARG) over range RANGE in char table CT.
    Mapping stops the first time FN returns non-zero, and that value
    becomes the return value of map_char_table().
-
-   #### This mapping code is way ugly.  The FSF version, in contrast,
-   is short and sweet, and much more recursive.  There should be some way
-   of cleaning this up. */
+ */

 int
 map_char_table (Lisp_Object table,
 		struct chartab_range *range,
-		int (*fn) (struct chartab_range *range,
-			   Lisp_Object table, Lisp_Object val, void *arg),
+		int (*fn) (Lisp_Object table, Ichar code, Lisp_Object val,
+			   void *arg),
 		void *arg)
 {
-  Lisp_Char_Table *ct = XCHAR_TABLE (table);
+#ifdef MULE
+  int l1, h1, l2, h2;
+#endif
+  int levels = XCHAR_TABLE_LEVELS (table);
+  /* Compute maximum allowed value for this table, which may be less than
+     the range we have been requested to map over. */
+  int maxval = /* Value is 2^31-1 for 4, but 2^24-1 for 3,
+		  2^16-1 for 2, 2^8-1 for 1. */
+    levels == 4 ? INT_32_BIT_MAX : (1 << (levels * 8)) - 1;
   switch (range->type)
     {
     case CHARTAB_RANGE_ALL:
-      {
-	int retval;
+      return map_chartab_table (XCHAR_TABLE_TABLE (table),
+				XCHAR_TABLE_LEVELS (table),
+				0, 0, maxval,
+                                table, fn, arg);
+
+    case CHARTAB_RANGE_RANGE:
+      return map_chartab_table (XCHAR_TABLE_TABLE (table),
+				XCHAR_TABLE_LEVELS (table),
+				0, min (range->ch, maxval),
+				min (range->chtop, maxval),
+				table, fn, arg);

-	retval = map_over_charset_ascii (ct, fn, arg);
-	if (retval)
-	  return retval;
 #ifdef MULE
-	retval = map_over_charset_control_1 (ct, fn, arg);
-	if (retval)
-	  return retval;
-	{
-	  int i;
-	  int start = MIN_LEADING_BYTE;
-	  int stop  = start + NUM_LEADING_BYTES;
-
-	  for (i = start, retval = 0; i < stop && retval == 0; i++)
-	    {
-	      if (i != LEADING_BYTE_ASCII && i != LEADING_BYTE_CONTROL_1)
-		retval = map_over_other_charset (ct, i, fn, arg);
-	    }
-	}
-#endif /* MULE */
-	return retval;
+    case CHARTAB_RANGE_ROW:
+      {
+	get_charset_limits (range->charset, &l1, &h1, &l2, &h2);
+	l1 = h1 = range->row;
+	goto iterate_charset;
       }

-#ifdef MULE
     case CHARTAB_RANGE_CHARSET:
-      return map_over_other_charset (ct,
-				     XCHARSET_LEADING_BYTE (range->charset),
-				     fn, arg);
-
-    case CHARTAB_RANGE_ROW:
       {
-	Lisp_Object val = ct->level1[XCHARSET_LEADING_BYTE (range->charset) -
-				     MIN_LEADING_BYTE];
-
-	if (CHAR_TABLE_ENTRYP (val))
-	  return map_over_charset_row (ct, XCHAR_TABLE_ENTRY (val),
-				       range->charset, range->row, fn, arg);
-	else if (!UNBOUNDP (val))
-	  {
-	    struct chartab_range rainj;
-
-	    rainj.type = CHARTAB_RANGE_ROW;
-	    rainj.charset = range->charset;
-	    rainj.row = range->row;
-	    return (fn) (&rainj, table, val, arg);
-	  }
-	else
-	  return 0;
+	int i, j;
+	get_charset_limits (range->charset, &l1, &h1, &l2, &h2);
+      iterate_charset:
+	for (i = l1; i <= h1; i++)
+	  for (j = l2; j <= h2; j++)
+	    {
+	      Ichar ch = charset_codepoint_to_ichar_raw (range->charset, i, j);
+	      if (ch >= 0)
+		{
+		  Lisp_Object val = get_char_table (ch, table);
+		  if (!UNBOUNDP (val))
+		    {
+		      int retval = (fn) (table, ch, val, arg);
+		      if (retval)
+			return retval;
+		    }
+		}
+	    }
       }
+      break;
+
 #endif /* MULE */

     case CHARTAB_RANGE_CHAR:
       {
-	Ichar ch = range->ch;
-	Lisp_Object val = get_char_table (ch, table);
-	struct chartab_range rainj;
+	Lisp_Object val = get_char_table (range->ch, table);

 	if (!UNBOUNDP (val))
-	  {
-	    rainj.type = CHARTAB_RANGE_CHAR;
-	    rainj.ch = ch;
-	    return (fn) (&rainj, table, val, arg);
-	  }
+	  return (fn) (table, range->ch, val, arg);
 	else
 	  return 0;
       }
＠＠ -1446,31 +1506,26 ＠＠
 };

 static int
-slow_map_char_table_fun (struct chartab_range *range,
-			 Lisp_Object UNUSED (table), Lisp_Object val,
-			 void *arg)
+slow_map_char_table_fun (Lisp_Object UNUSED (table),
+			 Ichar ch, Lisp_Object val, void *arg)
 {
   struct slow_map_char_table_arg *closure =
     (struct slow_map_char_table_arg *) arg;

-  closure->retval = call2 (closure->function, encode_char_table_range (range),
-			   val);
+  closure->retval = call2 (closure->function, make_char (ch), val);
   return !NILP (closure->retval);
 }

 DEFUN ("map-char-table", Fmap_char_table, 2, 3, 0, /*
 Map FUNCTION over CHAR-TABLE until it returns non-nil; return that value.
-FUNCTION is called with two arguments, each key and entry in the table.
+FUNCTION is called with two arguments, a character and the value for that
+character in the table.  FUNCTION will only be called for characters whose
+value has been set.

 RANGE specifies a subrange to map over.  If omitted or t, it defaults to
-the entire table.
-
-Both RANGE and the keys passed to FUNCTION are in the same format as the
-RANGE argument to `put-char-table'.  N.B. This function does NOT map over
-all characters in RANGE, but over the subranges that have been assigned to.
-Thus this function is most suitable for searching a char-table, or for
-populating one char-table based on the contents of another.  The current
-implementation does not coalesce ranges all of whose values are the same.
+the entire table.  Other possible values are the same as can be passed to
+`put-char-table': an individual character, a cons specifying a character
+range, a charset or a vector giving a charset and a row in that charset.
 */
        (function, char_table, range))
 {
＠＠ -1773,14 +1828,16 ＠＠
   Lisp_Object tail;
   int default_result;

-#if 0
+#ifdef ENABLE_COMPOSITE_CHARS
   if (COMPOSITE_CHAR_P (c1))
     c1 = cmpchar_component (c1, 0, 1);
   if (COMPOSITE_CHAR_P (c2))
     c2 = cmpchar_component (c2, 0, 1);
 #endif

-  if (EQ (ichar_charset (c1), ichar_charset (c2)))
+  /* ＠＠#### fix me */
+  if (EQ (ichar_charset_obsolete_me_baby_please (c1),
+	  ichar_charset_obsolete_me_baby_please (c2)))
     {
       tail = Vword_separating_categories;
       default_result = 0;
＠＠ -1818,10 +1875,9 ＠＠
 syms_of_chartab (void)
 {
   INIT_LRECORD_IMPLEMENTATION (char_table);
+  INIT_LRECORD_IMPLEMENTATION (char_subtable);

 #ifdef MULE
-  INIT_LRECORD_IMPLEMENTATION (char_table_entry);
-
   DEFSYMBOL (Qcategory_table_p);
   DEFSYMBOL (Qcategory_designator_p);
   DEFSYMBOL (Qcategory_table_value_p);
＠＠ -1840,7 +1896,6 ＠＠
   DEFSUBR (Fmake_char_table);
   DEFSUBR (Fcopy_char_table);
   DEFSUBR (Fget_char_table);
-  DEFSUBR (Fget_range_char_table);
   DEFSUBR (Fvalid_char_table_value_p);
   DEFSUBR (Fcheck_valid_char_table_value);
   DEFSUBR (Fput_char_table);
＠＠ -1867,6 +1922,13 ＠＠
   /* DO NOT staticpro this.  It works just like Vweak_hash_tables. */
   Vall_syntax_tables = Qnil;
   dump_add_weak_object_chain (&Vall_syntax_tables);
+
+  init_blank_chartab_tables ();
+
+  staticpro (&chartab_blank[1]);
+  staticpro (&chartab_blank[2]);
+  staticpro (&chartab_blank[3]);
+  staticpro (&chartab_blank[4]);
 }

 void
Index: src/chartab.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/chartab.h,v
retrieving revision 1.16
diff -u -r1.16 chartab.h
--- src/chartab.h	2005/10/24 10:07:34	1.16
+++ src/chartab.h	2005/11/22 14:00:27
＠＠ -1,7 +1,7 ＠＠
 /* Declarations having to do with Mule char tables.
    Copyright (C) 1992 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2002, 2003 Ben Wing.
+   Copyright (C) 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -31,35 +31,104 ＠＠
 #include "charset.h"

 /************************************************************************/
-/*                               Char Tables                            */
+/*                            Basic Char Table Format                   */
 /************************************************************************/
-
-/* Under Mule, we use a complex representation (see below).
-   When not under Mule, there are only 256 possible characters
-   so we just represent them directly. */

-#ifdef MULE
+/* Things are written this way because at one point I designed the
+   subtables so they could either be stored as "plain tables" (as direct
+   256-element arrays), as unified Lisp objects (where the header and
+   following array is a single unit) or as split Lisp object (with a
+   wrapper Lisp object around a separately allocated table).  The plain
+   tables are the fastest and most memory efficient of the three, but
+   can't be used with either KKCC or NEWGC. (KKCC doesn't keep track of
+   whether it has already traversed non-Lisp-object arrays, and thus
+   traverses the shared "blank" subtables numerous times when marking,
+   making it become *extremely* slow.  NEWGC requires that all Lisp objects
+   occur inside of other Lisp objects, never inside of The split
+   Lisp object is slightly more efficient than the */

-struct Lisp_Char_Table_Entry
+struct Lisp_Char_Subtable
 {
-  struct LCRECORD_HEADER header;
-
-  /* In the interests of simplicity, we just use a fixed 96-entry
-     table.  If we felt like being smarter, we could make this
-     variable-size and add an offset value into this structure. */
-  Lisp_Object level2[96];
+  struct LCRECORD_HEADER lheader;
+  Lisp_Object ptr[256];
 };
-typedef struct Lisp_Char_Table_Entry Lisp_Char_Table_Entry;
+
+#define ALLOCATE_LEVEL_N_SUB_TABLE()					\
+  wrap_char_subtable (ALLOC_LCRECORD_TYPE				\
+		      (Lisp_Char_Subtable, &lrecord_char_subtable))
+
+#define SUBTAB_STORAGE_SIZE(table, level, stats)			\
+  LISPOBJ_STORAGE_SIZE (XCHAR_SUBTABLE (table),				\
+			sizeof (struct Lisp_Char_Subtable), stats)
+
+#define FREE_ONE_SUBTABLE(table) FREE_LCRECORD (table)
+
+/* If we use split Lisp char subtables, we'd modify the above struct and three
defines.
+   If we use "plain" non-Lisp char subtables, we'd modify the three macros
above
+   and the macros below as well, and omit the definition of a Lisp subtable
+   object. */
+
+#define SUBTAB_TYPE Lisp_Object
+#define SUBTAB_ARRAY_TYPE SUBTAB_TYPE *
+#define SUBTAB_ARRAY_FROM_SUBTAB(tab) (XCHAR_SUBTABLE (tab)->ptr)
+#define LISPOBJ_ARRAY_FROM_SUBTAB(tab) (XCHAR_SUBTABLE (tab)->ptr)
+#define SUBTAB_EQ(a, b) EQ (a, b)
+
+#define ALLOCATE_LEVEL_1_SUB_TABLE() ALLOCATE_LEVEL_N_SUB_TABLE ()
+
+typedef struct Lisp_Char_Subtable Lisp_Char_Subtable;
+
+DECLARE_LRECORD (char_subtable, Lisp_Char_Subtable);
+#define XCHAR_SUBTABLE(x) XRECORD (x, char_subtable, Lisp_Char_Subtable)
+#define wrap_char_subtable(p) wrap_record (p, char_subtable)
+#define CHAR_SUBTABLEP(x) RECORDP (x, char_subtable)
+#define CHECK_CHAR_SUBTABLE(x) CHECK_RECORD (x, char_subtable)
+#define CONCHECK_CHAR_SUBTABLE(x) CONCHECK_RECORD (x, char_subtable)
+
+/************************************************************************/
+/*                               Char Tables                            */
+/************************************************************************/
+
+#ifndef MULE
+#define MAXIMIZE_CHAR_TABLE_DEPTH
+#endif

-DECLARE_LRECORD (char_table_entry, Lisp_Char_Table_Entry);
-#define XCHAR_TABLE_ENTRY(x) \
-  XRECORD (x, char_table_entry, Lisp_Char_Table_Entry)
-#define wrap_char_table_entry(p) wrap_record (p, char_table_entry)
-#define CHAR_TABLE_ENTRYP(x) RECORDP (x, char_table_entry)
-/* #define CHECK_CHAR_TABLE_ENTRY(x) CHECK_RECORD (x, char_table_entry)
-   char table entries should never escape to Lisp */
+/* Break up a 32-bit character code into 8-bit parts. */

-#endif /* MULE */
+#ifdef MAXIMIZE_CHAR_TABLE_DEPTH
+# define CHARTAB_BREAKUP_CHAR_CODE(val, u1, u2, u3, u4, levels)	\
+do {								\
+  int buc_val = (val);						\
+								\
+  (u1) = buc_val >> 24;						\
+  (u2) = (buc_val >> 16) & 255;					\
+  (u3) = (buc_val >> 8) & 255;					\
+  (u4) = buc_val & 255;						\
+} while (0)
+/* Define the current chartab levels given an expr indicating the level value.
+   This is an optimization designed to cause compiler simplfication of code
+   due to constant expression in if, switch, etc. statements. */
+# ifdef MULE
+#  define CHARTAB_LEVELS(expr) 4
+# else
+#  define CHARTAB_LEVELS(expr) 1
+# endif
+#else /* not MAXIMIZE_CHAR_TABLE_DEPTH */
+# define CHARTAB_BREAKUP_CHAR_CODE(val, u1, u2, u3, u4, levels)	\
+do {								\
+  int buc_val = (val);						\
+								\
+  (u1) = buc_val >> 24;						\
+  (u2) = (buc_val >> 16) & 255;					\
+  (u3) = (buc_val >> 8) & 255;					\
+  (u4) = buc_val & 255;						\
+  (levels) = (buc_val <= 0xFF ? 1 :				\
+	      buc_val <= 0xFFFF ? 2 :				\
+	      buc_val <= 0xFFFFFF ? 3 :				\
+	      4);						\
+} while (0)
+# define CHARTAB_LEVELS(expr) (expr)
+#endif /* not MAXIMIZE_CHAR_TABLE_DEPTH */

 enum char_table_type
 {
＠＠ -72,59 +141,35 ＠＠
   CHAR_TABLE_TYPE_CHAR
 };

-#ifdef MULE
-#define NUM_ASCII_CHARS 160
-#else
-#define NUM_ASCII_CHARS 256
-#endif
-
 struct Lisp_Char_Table
 {
   struct LCRECORD_HEADER header;
+
+  /* Currently we use the same structure as for the Unicode->charset
+     translation tables in unicode.c.  This is extremely fast (constant-
+     time lookup) but a potential space hog, especially in the presence of
+     sparse, non-localized data.  Alternative representations could use
+     hash tables or sorted gap arrays (see extents.c; all the code is
+     already there, including the binary-search algorithm to do lookups).
+     Possibly, we could/should allow the type to be chosen at creation
+     time as a parameter to `make-char-table'. */

-  Lisp_Object ascii[NUM_ASCII_CHARS];
+  SUBTAB_TYPE table;
+  int levels;
   Lisp_Object default_;
   Lisp_Object parent; /* #### not yet implemented */

-#ifdef MULE
-  /* We basically duplicate the Mule vectors-of-vectors implementation.
-     We can do this because we know a great deal about the sorts of
-     things we are going to be indexing.
-
-     The current implementation is as follows:
-
-     ascii[0-159] is used for ASCII and Control-1 characters.
-
-     level1[0 .. (NUM_LEADING_BYTES-1)] indexes charsets by leading
-     byte (subtract MIN_LEADING_BYTE from the leading byte).  If the
-     value of this is not an opaque, then it specifies a value for all
-     characters in the charset.  Otherwise, it will be a
-     96-Lisp-Object opaque that we created, specifying a value for
-     each row.  If the value of this is not an opaque, then it
-     specifies a value for all characters in the row.  Otherwise, it
-     will be a 96-Lisp-Object opaque that we created, specifying a
-     value for each character.
-
-     NOTE: 1) This will fail if some C routine passes an opaque to
-              Fput_char_table().  Currently this is not a problem
-	      since all char tables that are created are Lisp-visible
-	      and thus no one should ever be putting an opaque in
-	      a char table.  Another possibility is to consider
-	      adding a type to */
-
-  Lisp_Object level1[NUM_LEADING_BYTES];
-
-#endif /* MULE */
-
   enum char_table_type type;

+  Lisp_Object next_table; /* DO NOT mark through this. */
+#ifdef MIRROR_TABLE
   /* stuff used for syntax tables */
   Lisp_Object mirror_table; /* points to mirror table for this table
 			       (a cache for quicker access), or a back
 			       pointer if MIRROR_TABLE_P. */
-  Lisp_Object next_table; /* DO NOT mark through this. */
   char dirty; /* nonzero if mirror dirty and needs updating. */
   char mirror_table_p; /* nonzero if this is a mirror table. */
+#endif /* MIRROR_TABLE */
 };
 typedef struct Lisp_Char_Table Lisp_Char_Table;

＠＠ -135,47 +180,110 ＠＠
 #define CHECK_CHAR_TABLE(x) CHECK_RECORD (x, char_table)
 #define CONCHECK_CHAR_TABLE(x) CONCHECK_RECORD (x, char_table)

+/* Note, there is no speed gain whatsoever from dereferencing XCHAR_TABLE()
+   once into a temporary variable and then using it, as compared with just
+   repeatedly using with XCHAR_TABLE_FOO macros, at least in a production
+   build (no-error checking, optimization).  Without error-checking,
+   XCHAR_TABLE() is merely a cast to (foo *), which is a no-op. */
+
+#define XCHAR_TABLE_TABLE(ct) (XCHAR_TABLE (ct)->table)
+#define XCHAR_TABLE_LEVELS(ct) (XCHAR_TABLE (ct)->levels)
+#define XCHAR_TABLE_DEFAULT(ct) (XCHAR_TABLE (ct)->default_)
+#define XCHAR_TABLE_PARENT(ct) (XCHAR_TABLE (ct)->parent)
+
 #define CHAR_TABLE_TYPE(ct) ((ct)->type)
-#define XCHAR_TABLE_TYPE(ct) CHAR_TABLE_TYPE (XCHAR_TABLE (ct))
+#define XCHAR_TABLE_TYPE(ct) (XCHAR_TABLE (ct)->type)
+
+#define XCHAR_TABLE_NEXT_TABLE(ct) (XCHAR_TABLE (ct)->next_table)
+#ifdef MIRROR_TABLE
+#define XCHAR_TABLE_MIRROR_TABLE(ct) (XCHAR_TABLE (ct)->mirror_table)
+#define XCHAR_TABLE_DIRTY(ct) (XCHAR_TABLE (ct)->dirty)
+#define XCHAR_TABLE_MIRROR_TABLE_P(ct) (XCHAR_TABLE (ct)->mirror_table_p)
+#endif /* MIRROR_TABLE */

-Lisp_Object get_non_ascii_char_table_value (Lisp_Char_Table *ct,
-					    int leading_byte,
-					    Ichar c);
+/* Get the raw value of CHARTAB for character CH.  This returns Qunbound
+   if the character's value has not been set. */

 DECLARE_INLINE_HEADER (
 Lisp_Object
-get_char_table_1 (Ichar ch, Lisp_Object table)
+get_char_table_raw (Ichar ch, Lisp_Object chartab)
 )
 {
-  Lisp_Object retval;
-  Lisp_Char_Table *ct = XCHAR_TABLE (table);
-#ifdef MULE
-  if (ch < NUM_ASCII_CHARS)
-    retval = ct->ascii[ch];
-  else
-    {
-      unsigned char lb = ichar_leading_byte (ch);
-      if (!CHAR_TABLE_ENTRYP (ct->level1[lb - MIN_LEADING_BYTE]))
-	retval = ct->level1[lb - MIN_LEADING_BYTE];
-      else
-	retval = get_non_ascii_char_table_value (ct, lb, ch);
-    }
-#else /* not MULE */
-  retval = ct->ascii[(unsigned char) ch];
-#endif /* not MULE */
+  int levels;
+  int u4, u3, u2, u1;
+#ifndef MAXIMIZE_CHAR_TABLE_DEPTH
+  int code_levels;
+#endif
+
+  text_checking_assert (valid_ichar_p (ch));
+  CHARTAB_BREAKUP_CHAR_CODE ((int) ch, u4, u3, u2, u1, code_levels);
+
+  levels = CHARTAB_LEVELS (XCHAR_TABLE_LEVELS (chartab));
+  text_checking_assert (levels >= 1 && levels <= 4);
+
+#if !defined (MULE) && defined (MAXIMIZE_CHAR_TABLE_DEPTH)
+  /* This better be the case or something has gone majorly wrong --
+     the "maximum" depth can't actually account for the highest possible
+     character. */
+  text_checking_assert (ch <= 255);
+#endif
+
+#ifndef MAXIMIZE_CHAR_TABLE_DEPTH
+  /* If not that many levels even in the table, then value definitely not
+     in the table */
+  if (levels < code_levels)
+    return Qunbound;
+#endif /* not MAXIMIZE_CHAR_TABLE_DEPTH */
+
+  {
+    register SUBTAB_TYPE table = XCHAR_TABLE_TABLE (chartab);
+    /* We are really helping the compiler here.  CHARTAB_LEVELS() will
+       evaluate to a constant when MAXIMIZE_CHAR_TABLE_DEPTH is true,
+       so any reasonable optimizing compiler should eliminate the
+       switch entirely. */
+    switch (CHARTAB_LEVELS (levels))
+      {
+	/* Fall through */
+      case 4: table = SUBTAB_ARRAY_FROM_SUBTAB (table)[u4];
+      case 3: table = SUBTAB_ARRAY_FROM_SUBTAB (table)[u3];
+      case 2: table = SUBTAB_ARRAY_FROM_SUBTAB (table)[u2];
+      case 1: return LISPOBJ_ARRAY_FROM_SUBTAB (table)[u1];
+      }
+  }
+
+  ABORT (); /* Should never happen */
+  return Qunbound;
+}
+
+/* Same as get_char_table but don't trip an assert that we aren't retrieving
+   the value for a mirror table. (Normally we have this assert in place
+   to make sure that mirror tables don't escape to where they shouldn't be.
+   But some code really does need to access the mirror value itself --
+   otherwise, of course, we wouldn't have any need for mirror tables. */
+DECLARE_INLINE_HEADER (
+Lisp_Object
+get_char_table_1 (Ichar ch, Lisp_Object chartab)
+)
+{
+  Lisp_Object retval = get_char_table_raw (ch, chartab);
   if (!UNBOUNDP (retval))
     return retval;
   else
-    return ct->default_;
+    return XCHAR_TABLE_DEFAULT (chartab);
 }

+/* Get the value of CHARTAB for character CH.  If the character's value has
+   not been set, this returns the default value for the char table. */
+
 #ifdef ERROR_CHECK_TYPES
 DECLARE_INLINE_HEADER (
 Lisp_Object
 get_char_table (Ichar ch, Lisp_Object table)
 )
 {
+#ifdef MIRROR_TABLE
   assert (!XCHAR_TABLE (table)->mirror_table_p);
+#endif /* MIRROR_TABLE */
   return get_char_table_1 (ch, table);
 }
 #else
＠＠ -189,38 +297,28 ＠＠
   CHARTAB_RANGE_CHARSET,
   CHARTAB_RANGE_ROW,
 #endif
+  CHARTAB_RANGE_RANGE,
   CHARTAB_RANGE_CHAR
 };

 struct chartab_range
 {
   enum chartab_range_type type;
-  Ichar ch;
+  Ichar ch, chtop;
   Lisp_Object charset;
   int row;
 };

 void set_char_table_default (Lisp_Object table, Lisp_Object value);
+void put_char_table_1 (Lisp_Object chartab, Ichar ch, Lisp_Object val);
 void put_char_table (Lisp_Object table, struct chartab_range *range,
 		     Lisp_Object val);
 int map_char_table (Lisp_Object table,
 		    struct chartab_range *range,
-		    int (*fn) (struct chartab_range *range,
-			       Lisp_Object table,
-			       Lisp_Object val, void *arg),
+		    int (*fn) (Lisp_Object table, Ichar code, Lisp_Object val,
+			       void *arg),
 		    void *arg);
 void prune_syntax_tables (void);
-Lisp_Object get_range_char_table (struct chartab_range *range,
-				  Lisp_Object table, Lisp_Object multi);
-#ifdef ERROR_CHECK_TYPES
-Lisp_Object updating_mirror_get_range_char_table (struct chartab_range *range,
-						  Lisp_Object table,
-						  Lisp_Object multi);
-#else
-#define updating_mirror_get_range_char_table get_range_char_table
-#endif
-void copy_char_table_range (Lisp_Object from, Lisp_Object to,
-			    struct chartab_range *range);
 int word_boundary_p (Ichar c1, Ichar c2);

 EXFUN (Fcopy_char_table, 1);
Index: src/cmdloop.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/cmdloop.c,v
retrieving revision 1.24
diff -u -r1.24 cmdloop.c
--- src/cmdloop.c	2005/10/25 11:16:21	1.24
+++ src/cmdloop.c	2005/11/22 14:00:27
＠＠ -1,6 +1,6 ＠＠
 /* Editor command loop.
    Copyright (C) 1992, 1993, 1994 Free Software Foundation, Inc.
-   Copyright (C) 1995, 1996, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1995, 1996, 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -137,6 +137,13 ＠＠
   stderr_out ("*** Backtrace\n");
   Fbacktrace (Qexternal_debugging_output, Qt);
   stderr_out ("*** Killing XEmacs\n");
+#ifdef DEBUG_XEMACS
+  if (!NILP (Vdebug_on_error))
+    {
+      stderr_out ("XEmacs exiting to debugger.\n");
+      Fforce_debugging_signal (Qt);
+    }
+#endif
 #ifdef HAVE_MS_WINDOWS
   Fmswindows_message_box (build_msg_string ("Initialization error"),
 			  Qnil, Qnil);
Index: src/cmds.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/cmds.c,v
retrieving revision 1.19
diff -u -r1.19 cmds.c
--- src/cmds.c	2005/07/17 20:08:47	1.19
+++ src/cmds.c	2005/11/22 14:00:27
＠＠ -378,7 +378,7 ＠＠
   int tab_width;

   overwrite = buf->overwrite_mode;
-  syntax_table = buf->mirror_syntax_table;
+  syntax_table = BUFFER_MIRROR_SYNTAX_TABLE (buf);

 #if 0
   /* No, this is very bad, it makes undo *always* undo a character at a time
＠＠ -552,6 +552,6 ＠＠
 Such characters have value t in this table.
 */);
   Vauto_fill_chars = Fmake_char_table (Qgeneric);
-  XCHAR_TABLE (Vauto_fill_chars)->ascii[' '] = Qt;
-  XCHAR_TABLE (Vauto_fill_chars)->ascii['\n'] = Qt;
+  put_char_table_1 (Vauto_fill_chars, ' ', Qt);
+  put_char_table_1 (Vauto_fill_chars, '\n', Qt);
 }
Index: src/compiler.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/compiler.h,v
retrieving revision 1.10
diff -u -r1.10 compiler.h
--- src/compiler.h	2005/10/25 11:16:21	1.10
+++ src/compiler.h	2005/11/22 14:00:27
＠＠ -1,7 +1,7 ＠＠
 /* Compiler-specific definitions for XEmacs.
    Copyright (C) 1998-1999, 2003 Free Software Foundation, Inc.
    Copyright (C) 1994 Richard Mlynarik.
-   Copyright (C) 1995, 1996, 2000-2004 Ben Wing.
+   Copyright (C) 1995, 1996, 2000-2004, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -223,8 +223,17 ＠＠
 # define UNUSED(decl) UNUSED_ARG (decl) ATTRIBUTE_UNUSED
 # ifdef MULE
 #  define USED_IF_MULE(decl) decl
+#  ifdef UNICODE_INTERNAL
+#   define USED_IF_UNICODE_INTERNAL(decl) decl
+#   define USED_IF_MULE_NOT_UNICODE_INTERNAL(decl) UNUSED (decl)
+#  else
+#   define USED_IF_UNICODE_INTERNAL(decl) UNUSED (decl)
+#   define USED_IF_MULE_NOT_UNICODE_INTERNAL(decl) decl
+#  endif
 # else
 #  define USED_IF_MULE(decl) UNUSED (decl)
+#  define USED_IF_UNICODE_INTERNAL(decl) UNUSED (decl)
+#  define USED_IF_MULE_NOT_UNICODE_INTERNAL(decl) UNUSED (decl)
 # endif
 # if defined (MULE) || defined (ERROR_CHECK_TEXT)
 #  define USED_IF_MULE_OR_CHECK_TEXT(decl) decl
Index: src/config.h.in
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/config.h.in,v
retrieving revision 1.102
diff -u -r1.102 config.h.in
--- src/config.h.in	2005/10/25 11:19:58	1.102
+++ src/config.h.in	2005/11/22 14:00:27
＠＠ -413,6 +413,7 ＠＠
 #undef HAVE_WAITPID
 #undef HAVE_WCSCMP
 #undef HAVE_WCSLEN
+#undef HAVE_WCWIDTH

 #undef HAVE_UTIME
 #undef HAVE_UTIMES
＠＠ -558,10 +559,11 ＠＠
    change is done, and do other extent-related checks. */
 #undef ERROR_CHECK_EXTENTS

-/* Turn on checks related to types -- make sure that all X... macros are
+/* Make sure that arguments are valid to functions and such.  Turns on
+   checks related to types -- makes sure that all X... macros are
    dereferencing the correct type, and that all XSET... macros (as much as
-   possible) are setting the correct type of structure; check any other
-   places that a specific type is expected. */
+   possible) are setting the correct type of structure.  Some code may
+   use a more specific error-checking flag. */
 #undef ERROR_CHECK_TYPES

 /* Turn on checks related to text -- check that text in strings and buffers
＠＠ -638,6 +640,12 ＠＠
    files are read in as binary.  Doesn't apply to Cygwin or MinGW. */
 #undef HAVE_DEFAULT_EOL_DETECTION

+/* Use Unicode-compatible character representation internally.  Otherwise,
+   the old Mule representation is used, which explicitly encodes the
+   character set of the char (with the result that multiple characters map
+   onto the same Unicode codepoint). */
+#undef UNICODE_INTERNAL
+
 /* Support X FontSets.  Evil, yes, but if we're going to make it go away
    by using faces in the menubar we should do so. */
 #undef USE_XFONTSET
＠＠ -1005,6 +1013,15 ＠＠
 #define OBJECTS_SYSTEM sunOS-fix.o strcmp.o strcpy.o
 #endif

+/* CCL is unworkable under UNICODE_INTERNAL currently.  See comments in
+   mule-ccl.c.  Algorithmic conversion tables are currently used only for
+   the Unicode-slop charsets for characters having no other representation
+   in a charset. */
+#if defined (MULE) && !defined (UNICODE_INTERNAL)
+#define HAVE_CCL
+#define ALLOW_ALGORITHMIC_CONVERSION_TABLES
+#endif
+
 #ifdef HAVE_CANNA
 #  define CANNA2
 #  define CANNA_MULE
＠＠ -1091,6 +1108,10 ＠＠
    #### font-lock does its own version using parse-partial-sexp.  We should
    merge the two. */
 #define USE_C_FONT_LOCK
+
+#if defined (ERROR_CHECK_EXTENTS) || defined (ERROR_CHECK_TYPES) || defined
(ERROR_CHECK_TEXT) || defined (ERROR_CHECK_GC) || defined (ERROR_CHECK_MALLOC) || defined
(ERROR_CHECK_BYTE_CODE) || defined (ERROR_CHECK_GLYPHS) || defined (ERROR_CHECK_DISPLAY)
|| defined (ERROR_CHECK_STRUCTURES)
+#define ERROR_CHECK_ANY
+#endif

 #ifdef ERROR_CHECK_ALL
 #define ERROR_CHECK_EXTENTS
Index: src/console-gtk.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/console-gtk.c,v
retrieving revision 1.6
diff -u -r1.6 console-gtk.c
--- src/console-gtk.c	2005/06/26 18:05:02	1.6
+++ src/console-gtk.c	2005/11/22 14:00:28
＠＠ -144,9 +144,12 ＠＠
    to 2.0. So if you're porting XEmacs to GTK 2.0, bear that in mind. */
       char_to_associate 
 #ifdef __GDK_KEYS_H__ 
-	= Funicode_to_char
-	      (make_int(gdk_keyval_to_unicode
-			(gdk_keyval_from_name(symbol_name))), Qnil);
+	= make_char (unicode_to_ichar
+		     (gdk_keyval_to_unicode
+		      (gdk_keyval_from_name (symbol_name)),
+		      /* ＠＠#### need to get some sort of buffer to compute
+			 this off; only applies in the old-Mule world */
+		      get_unicode_precedence (), CONVERR_SUCCEED));
 #else /* GTK 1.whatever doesn't. Use the X11 map. */
         = gtk_keysym_to_character(gdk_keyval_from_name(symbol_name));
 #endif
Index: src/console-impl.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/console-impl.h,v
retrieving revision 1.12
diff -u -r1.12 console-impl.h
--- src/console-impl.h	2005/10/24 10:07:34	1.12
+++ src/console-impl.h	2005/11/22 14:00:28
＠＠ -424,6 +424,8 ＠＠
      an XD_UNION clause to determine the Lisp objects in console_data. */
   enum console_variant contype;

+  /* ~~#### Instead of doing this, attach this data to the end of the same
+     structure; avoids the need to create new TTY, X, etc. Lisp objects */
   /* A structure of auxiliary data specific to the console type.
      struct x_console is used for X window frames; defined in console-x.h
      struct tty_console is used to TTY's; defined in console-tty.h */
Index: src/console.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/console.c,v
retrieving revision 1.44
diff -u -r1.44 console.c
--- src/console.c	2005/10/25 11:16:21	1.44
+++ src/console.c	2005/11/22 14:00:28
＠＠ -161,8 +161,7 ＠＠
   struct console *con = XCONSOLE (obj);

   if (print_readably)
-    printing_unreadable_object ("#<console %s 0x%x>",
-				XSTRING_DATA (con->name), con->header.uid);
+    printing_unreadable_lcrecord (obj, XSTRING_DATA (con->name));

   write_fmt_string (printcharfun, "#<%s-console",
 		    !CONSOLE_LIVE_P (con) ? "dead" : CONSOLE_TYPE_NAME (con));
Index: src/data.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/data.c,v
retrieving revision 1.65
diff -u -r1.65 data.c
--- src/data.c	2005/10/24 10:07:35	1.65
+++ src/data.c	2005/11/22 14:00:29
＠＠ -2546,7 +2546,7 ＠＠
 		 int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<weak-list>");
+    printing_unreadable_lcrecord (obj, 0);

   write_fmt_string_lisp (printcharfun, "#<weak-list %s %S>", 2,
 			 encode_weak_list_type (XWEAK_LIST (obj)->type),
＠＠ -3021,12 +3021,12 ＠＠
 }

 static void
-print_weak_box (Lisp_Object UNUSED (obj), Lisp_Object printcharfun,
+print_weak_box (Lisp_Object obj, Lisp_Object printcharfun,
 		int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<weak_box>");
-  write_fmt_string (printcharfun, "#<weak_box>");
+    printing_unreadable_lcrecord (obj, 0);
+  write_fmt_string (printcharfun, "#<weak-box>"); /* #### fix */
 }

 static int
＠＠ -3247,12 +3247,12 ＠＠
 }

 static void
-print_ephemeron (Lisp_Object UNUSED (obj), Lisp_Object printcharfun,
+print_ephemeron (Lisp_Object obj, Lisp_Object printcharfun,
 		 int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<ephemeron>");
-  write_fmt_string (printcharfun, "#<ephemeron>");
+    printing_unreadable_lcrecord (obj, 0);
+  write_fmt_string (printcharfun, "#<ephemeron>"); /* #### fix */
 }

 static int
Index: src/database.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/database.c,v
retrieving revision 1.37
diff -u -r1.37 database.c
--- src/database.c	2005/10/25 11:16:22	1.37
+++ src/database.c	2005/11/22 14:00:29
＠＠ -167,7 +167,7 ＠＠
   Lisp_Database *db = XDATABASE (obj);

   if (print_readably)
-    printing_unreadable_object ("#<database 0x%x>", db->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

   write_fmt_string_lisp (printcharfun, "#<database \"%s\"
(%s/%s/",
 			 3, db->fname, db->funcs->get_type (db),
Index: src/debug.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/debug.c,v
retrieving revision 1.10
diff -u -r1.10 debug.c
--- src/debug.c	2004/11/04 23:06:19	1.10
+++ src/debug.c	2005/11/22 14:00:29
＠＠ -207,4 +207,5 ＠＠
 void
 vars_of_debug (void)
 {
+  Fprovide (intern ("debug-xemacs"));
 }
Index: src/device-impl.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/device-impl.h,v
retrieving revision 1.6
diff -u -r1.6 device-impl.h
--- src/device-impl.h	2005/10/24 10:07:35	1.6
+++ src/device-impl.h	2005/11/22 14:00:29
＠＠ -80,6 +80,8 ＠＠
   /* Duplicates devmeths->symbol.  See comment in struct console. */
   enum console_variant devtype;

+  /* ~~#### Instead of doing this, attach this data to the end of the same
+     structure; avoids the need to create new TTY, X, etc. Lisp objects */
   /* A structure of auxiliary data specific to the device type.
      struct x_device is used for X window frames; defined in console-x.h
      struct tty_device is used to TTY's; defined in console-tty.h */
Index: src/device-msw.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/device-msw.c,v
retrieving revision 1.59
diff -u -r1.59 device-msw.c
--- src/device-msw.c	2005/10/25 11:16:22	1.59
+++ src/device-msw.c	2005/11/22 14:00:30
＠＠ -1120,8 +1120,7 ＠＠
 {
   Lisp_Devmode *dm = XDEVMODE (obj);
   if (print_readably)
-    printing_unreadable_object ("#<msprinter-settings 0x%x>",
-				dm->header.uid);
+    printing_unreadable_lcrecord (obj, 0);
   write_c_string (printcharfun, "#<msprinter-settings");
   if (!NILP (dm->printer_name))
     write_fmt_string_lisp (printcharfun, " for %S", 1, dm->printer_name);
Index: src/device-x.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/device-x.c,v
retrieving revision 1.65
diff -u -r1.65 device-x.c
--- src/device-x.c	2005/10/25 11:16:22	1.65
+++ src/device-x.c	2005/11/22 14:00:30
＠＠ -359,9 +359,12 ＠＠
   Extbyte *data;

   TO_EXTERNAL_FORMAT (LISP_STRING, str, ALLOCA, (data, len), Qbinary);
-  Dynarr_add_many (cda, data, len);
-  validify_resource_component (Dynarr_atp (cda, Dynarr_length (cda) - len),
-			       len);
+  if (len)
+    {
+      Dynarr_add_many (cda, data, len);
+      validify_resource_component (Dynarr_atp (cda, Dynarr_length (cda) - len),
+				   len);
+    }
 }

 #if 0
Index: src/device.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/device.c,v
retrieving revision 1.36
diff -u -r1.36 device.c
--- src/device.c	2005/10/25 11:16:22	1.36
+++ src/device.c	2005/11/22 14:00:31
＠＠ -145,8 +145,7 ＠＠
   struct device *d = XDEVICE (obj);

   if (print_readably)
-    printing_unreadable_object ("#<device %s 0x%x>",
-				XSTRING_DATA (d->name), d->header.uid);
+    printing_unreadable_lcrecord (obj, XSTRING_DATA (d->name));

   write_fmt_string (printcharfun, "#<%s-device", !DEVICE_LIVE_P (d) ?
"dead":
 		    DEVICE_TYPE_NAME (d));
Index: src/dumper.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/dumper.c,v
retrieving revision 1.31
diff -u -r1.31 dumper.c
--- src/dumper.c	2005/10/14 01:22:00	1.31
+++ src/dumper.c	2005/11/22 14:00:31
＠＠ -436,7 +436,7 ＠＠

 #ifdef MC_ALLOC
 /* PDUMP_HASHSIZE is a large prime. */
-#define PDUMP_HASHSIZE        1000003
+#define PDUMP_HASHSIZE        4754591
 /* Nothing special about PDUMP_HASH_MULTIPLIER: arbitrary odd integer
    smaller than PDUMP_HASHSIZE. */
 #define PDUMP_HASH_MULTIPLIER   12347
＠＠ -444,7 +444,7 ＠＠
    probing. */
 #define PDUMP_HASH_STEP        574853
 #else /* not MC_ALLOC */
-#define PDUMP_HASHSIZE 200001
+#define PDUMP_HASHSIZE 2164111
 #endif /* not MC_ALLOC */

 static pdump_block_list_elt **pdump_hash;
Index: src/dynarr.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/dynarr.c,v
retrieving revision 1.12
diff -u -r1.12 dynarr.c
--- src/dynarr.c	2005/01/24 23:33:50	1.12
+++ src/dynarr.c	2005/11/22 14:00:31
＠＠ -1,6 +1,6 ＠＠
 /* Support for dynamic arrays.
    Copyright (C) 1993 Sun Microsystems, Inc.
-   Copyright (C) 2002, 2003, 2004 Ben Wing.
+   Copyright (C) 2002, 2003, 2004, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -98,7 +98,8 ＠＠

    int Dynarr_largest(d)
       [MACRO] Return the maximum value that Dynarr_length(d) would
-      ever have returned.
+      ever have returned.  This is used esp. in the redisplay code,
+      which reuses dynarrs for performance reasons.

    type Dynarr_at(d, i)
       [MACRO] Return the element at the specified index (no bounds checking
＠＠ -126,6 +127,88 ＠＠
 #include <config.h>
 #include "lisp.h"

+/* ------------------------ dynamic arrays ------------------- */
+
+static const struct memory_description int_description_1[] = {
+  { XD_END }
+};
+
+const struct sized_memory_description int_description = {
+  sizeof (int),
+  int_description_1
+};
+
+static const struct memory_description int_dynarr_description_1[] = {
+  XD_DYNARR_DESC (int_dynarr, &int_description),
+  { XD_END }
+};
+
+const struct sized_memory_description int_dynarr_description = {
+  sizeof (int_dynarr),
+  int_dynarr_description_1
+};
+
+static const struct memory_description unsigned_char_description_1[] = {
+  { XD_END }
+};
+
+const struct sized_memory_description unsigned_char_description = {
+  sizeof (unsigned char),
+  unsigned_char_description_1
+};
+
+static const struct memory_description unsigned_char_dynarr_description_1[] = {
+  XD_DYNARR_DESC (unsigned_char_dynarr, &unsigned_char_description),
+  { XD_END }
+};
+
+const struct sized_memory_description unsigned_char_dynarr_description = {
+  sizeof (unsigned_char_dynarr),
+  unsigned_char_dynarr_description_1
+};
+
+static const struct memory_description Lisp_Object_description_1[] = {
+  { XD_LISP_OBJECT, 0 },
+  { XD_END }
+};
+
+const struct sized_memory_description Lisp_Object_description = {
+  sizeof (Lisp_Object),
+  Lisp_Object_description_1
+};
+
+static const struct memory_description Lisp_Object_dynarr_description_1[] = {
+  XD_DYNARR_DESC (Lisp_Object_dynarr, &Lisp_Object_description),
+  { XD_END }
+};
+
+/* Not static; used in mule-coding.c */
+const struct sized_memory_description Lisp_Object_dynarr_description = {
+  sizeof (Lisp_Object_dynarr),
+  Lisp_Object_dynarr_description_1
+};
+
+static const struct memory_description Lisp_Object_pair_description_1[] = {
+  { XD_LISP_OBJECT, offsetof (Lisp_Object_pair, key) },
+  { XD_LISP_OBJECT, offsetof (Lisp_Object_pair, value) },
+  { XD_END }
+};
+
+const struct sized_memory_description Lisp_Object_pair_description = {
+  sizeof (Lisp_Object_pair),
+  Lisp_Object_pair_description_1
+};
+
+static const struct memory_description Lisp_Object_pair_dynarr_description_1[] = {
+  XD_DYNARR_DESC (Lisp_Object_pair_dynarr, &Lisp_Object_pair_description),
+  { XD_END }
+};
+
+const struct sized_memory_description Lisp_Object_pair_dynarr_description = {
+  sizeof (Lisp_Object_pair_dynarr),
+  Lisp_Object_pair_dynarr_description_1
+};
+
 static int Dynarr_min_size = 8;

 static void
＠＠ -179,30 +262,33 ＠＠
 {
   Dynarr *dy = (Dynarr *) Dynarr_verify (d);

-  Dynarr_resize (dy, dy->cur+len);
+  if (dy->len + len > dy->max)
+    Dynarr_resize (dy, dy->len + len);
 #if 0
   /* WTF? We should be catching these problems. */
   /* Silently adjust start to be valid. */
-  if (start > dy->cur)
-    start = dy->cur;
+  if (start > dy->len)
+    start = dy->len;
   else if (start < 0)
     start = 0;
 #else
-  assert (start >= 0 && start <= dy->cur);
+  /* #### This could conceivably be wrong, if code wants to access stuff
+     between len and largest. */
+  type_checking_assert (start >= 0 && start <= dy->len);
 #endif

-  if (start != dy->cur)
+  if (start != dy->len)
     {
       memmove ((char *) dy->base + (start + len)*dy->elsize,
 	       (char *) dy->base + start*dy->elsize,
-	       (dy->cur - start)*dy->elsize);
+	       (dy->len - start)*dy->elsize);
     }
   if (el)
     memcpy ((char *) dy->base + start*dy->elsize, el, len*dy->elsize);
-  dy->cur += len;
+  dy->len += len;

-  if (dy->cur > dy->largest)
-    dy->largest = dy->cur;
+  if (dy->len > dy->largest)
+    dy->largest = dy->len;
 }

 void
＠＠ -210,11 +296,11 ＠＠
 {
   Dynarr *dy = (Dynarr *) Dynarr_verify (d);

-  assert (start >= 0 && len >= 0 && start + len <= dy->cur);
+  type_checking_assert (start >= 0 && len >= 0 && start + len <=
dy->len);
   memmove ((char *) dy->base + start*dy->elsize,
 	   (char *) dy->base + (start + len)*dy->elsize,
-	   (dy->cur - start - len)*dy->elsize);
-  dy->cur -= len;
+	   (dy->len - start - len)*dy->elsize);
+  dy->len -= len;
 }

 void
＠＠ -253,7 +339,7 ＠＠
       Bytecount malloc_used = malloced_storage_size (dy->base,
 						     dy->elsize * dy->max, 0);
       /* #### This may or may not be correct.  Some Dynarrs would
-	 prefer that we use dy->cur instead of dy->largest here. */
+	 prefer that we use dy->len instead of dy->largest here. */
       Bytecount was_requested = dy->elsize * dy->largest;
       Bytecount dynarr_overhead = dy->elsize * (dy->max - dy->largest);

＠＠ -272,6 +358,51 ＠＠

 #endif /* MEMORY_USAGE_STATS */

+void
+mark_Lisp_Object_dynarr (Lisp_Object_dynarr *dyn)
+{
+  int i;
+  for (i = 0; i < Dynarr_length (dyn); i++)
+    mark_object (Dynarr_at (dyn, i));
+}
+
+/* --------------------------- static dynarrs ------------------------- */
+
+/* Add a number of contiguous elements to the array starting at START. */
+void
+Stynarr_insert_many_1 (void *d, const void *els, int len, int start,
+		       int num_static, int elsize, int staticoff)
+{
+  Stynarr *dy = (Stynarr *) d;
+  type_checking_assert (start >= 0 && start <= dy->nels);
+  /* If we'll need Dynarr space, make sure the Dynarr is there */
+  if (len + dy->nels > num_static && !dy->els)
+    VOIDP_CAST (dy->els) = Dynarr_newf (elsize);
+  /* Entirely within Dynarr? */
+  if (start >= num_static)
+    Dynarr_insert_many (dy->els, els, len, start - num_static);
+  /* Entirely within static part? */
+  else if (len + dy->nels <= num_static)
+    {
+      if (start != dy->nels)
+	{
+	  memmove ((char *) dy + staticoff + (start + len)*elsize,
+		   (char *) dy + staticoff + start*elsize,
+		   (dy->nels - start)*elsize);
+	}
+      if (els)
+	memcpy ((char *) dy + staticoff + start*elsize, els, len*elsize);
+    }
+  /* Else, partly within static, partly within Dynarr */
+  else
+    {
+      /* #### Finish me */
+      ABORT ();
+    }
+}
+
+/* ---------------------- stack-like malloc ----------------------- */
+
 /* Version of malloc() that will be extremely efficient when allocation
    nearly always occurs in LIFO (stack) order.

＠＠ -302,7 +433,8 ＠＠
   else
     this_one = Dynarr_new (char);
   Dynarr_add (stack_like_in_use_list, this_one);

-  Dynarr_resize (this_one, size);
+  Dynarr_reset (this_one);
+  Dynarr_add_many (this_one, 0, size);
   return Dynarr_atp (this_one, 0);
 }

Index: src/elhash.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/elhash.c,v
retrieving revision 1.43
diff -u -r1.43 elhash.c
--- src/elhash.c	2005/10/24 10:07:35	1.43
+++ src/elhash.c	2005/11/22 14:00:32
＠＠ -1,6 +1,6 ＠＠
 /* Implementation of the hash table lisp object type.
    Copyright (C) 1992, 1993, 1994 Free Software Foundation, Inc.
-   Copyright (C) 1995, 1996, 2002, 2004 Ben Wing.
+   Copyright (C) 1995, 1996, 2002, 2004, 2005 Ben Wing.
    Copyright (C) 1997 Free Software Foundation, Inc.

 This file is part of XEmacs.
＠＠ -1657,10 +1657,11 ＠＠
    hash, but practically this won't ever happen. */

 Hashcode
-internal_hash (Lisp_Object obj, int depth)
+internal_hash_1 (Lisp_Object obj, int depth)
 {
   if (depth > 5)
     return 0;
+
   if (CONSP (obj))
     {
       /* no point in worrying about tail recursion, since we're not
＠＠ -1671,16 +1672,16 ＠＠
   if (STRINGP (obj))
     {
       return hash_string (XSTRING_DATA (obj), XSTRING_LENGTH (obj));
-    }
-  if (LRECORDP (obj))
-    {
-      const struct lrecord_implementation
-	*imp = XRECORD_LHEADER_IMPLEMENTATION (obj);
-      if (imp->hash)
-	return imp->hash (obj, depth);
     }
-
-  return LISP_HASH (obj);
+  /* We should have already caught all non-lrecords and all lrecords without
+     a hash method, in internal_hash(). */
+  structure_checking_assert (LRECORDP (obj));
+  {
+    const struct lrecord_implementation
+      *imp = XRECORD_LHEADER_IMPLEMENTATION (obj);
+    structure_checking_assert (imp->hash);
+    return imp->hash (obj, depth);
+  }
 }

 DEFUN ("sxhash", Fsxhash, 1, 1, 0, /*
Index: src/emacs.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/emacs.c,v
retrieving revision 1.160
diff -u -r1.160 emacs.c
--- src/emacs.c	2005/10/25 08:32:47	1.160
+++ src/emacs.c	2005/11/22 14:00:33
＠＠ -1635,7 +1635,9 ＠＠
       syms_of_file_coding ();
       syms_of_unicode ();
 #ifdef MULE
+#ifdef HAVE_CCL
       syms_of_mule_ccl ();
+#endif /* HAVE_CCL */
       syms_of_mule_charset ();
       syms_of_mule_coding ();
 #ifdef HAVE_WNN
＠＠ -2201,7 +2203,9 ＠＠
 #endif	/* HAVE_MS_WINDOWS */

 #ifdef MULE
+#ifdef HAVE_CCL
       vars_of_mule_ccl ();
+#endif /* HAVE_CCL */
       vars_of_mule_charset ();
 #endif
       vars_of_file_coding ();
＠＠ -2352,8 +2356,11 ＠＠
 	 function and another. */

 #ifdef MULE
-      /* This depends on vars initialized in vars_of_unicode(). */
+      /* This creates charsets, which depends on vars initialized in
+	 vars_of_unicode(). */
       complex_vars_of_mule_charset ();
+      /* This depends on charsets created in complex_vars_of_mule_charset(). */
+      complex_vars_of_mule_coding ();
 #endif
       /* This one doesn't depend on anything really, and could go into
 	 vars_of_(), but lots of lots of code gets called and it's easily
＠＠ -2362,6 +2369,7 ＠＠
 	 then we suddenly have dependence on the previous call. */
       complex_vars_of_file_coding ();
 #ifdef WIN32_ANY
+      /* Likewise this one. */
       complex_vars_of_intl_win32 ();
 #endif

＠＠ -3278,7 +3286,7 ＠＠

 /* Return whether all bytes in the specified memory block can be read. */
 int
-debug_can_access_memory (void *ptr, Bytecount len)
+debug_can_access_memory (const void *ptr, Bytecount len)
 {
   return !IsBadReadPtr (ptr, len);
 }
＠＠ -3299,7 +3307,7 ＠＠

 /* Return whether all bytes in the specified memory block can be read. */
 int
-debug_can_access_memory (void *ptr, Bytecount len)
+debug_can_access_memory (const void *ptr, Bytecount len)
 {
   /* Use volatile to protect variables from being clobbered by longjmp. */
   SIGTYPE (*volatile old_sigbus) (int);
Index: src/eval.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/eval.c,v
retrieving revision 1.90
diff -u -r1.90 eval.c
--- src/eval.c	2005/10/25 11:16:23	1.90
+++ src/eval.c	2005/11/22 14:00:35
＠＠ -2995,20 +2995,6 ＠＠
   signal_error (Qout_of_memory, reason, frob);
 }

-DOESNT_RETURN
-printing_unreadable_object (const CIbyte *fmt, ...)
-{
-  Lisp_Object obj;
-  va_list args;
-
-  va_start (args, fmt);
-  obj = emacs_vsprintf_string (CGETTEXT (fmt), args);
-  va_end (args);
-
-  /* Fsignal GC-protects its args */
-  signal_error (Qprinting_unreadable_object, 0, obj);
-}
-

 /************************************************************************/
 /*			      User commands				*/
Index: src/event-Xt.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/event-Xt.c,v
retrieving revision 1.87
diff -u -r1.87 event-Xt.c
--- src/event-Xt.c	2005/09/27 05:29:44	1.87
+++ src/event-Xt.c	2005/11/22 14:00:36
＠＠ -1,7 +1,7 ＠＠
 /* The event_stream interface for X11 with Xt, and/or tty frames.
    Copyright (C) 1991-5, 1997 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 1996, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1996, 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -136,6 +136,14 ＠＠
 /************************************************************************/
 /*                            keymap handling                           */
 /************************************************************************/
+
+/* ＠＠#### This is the wrong approach, I think.  We should not be exposing the
+   name of the keysym anywhere, or forcing the user to use this name.
+   At the Lisp level, the user should simply see the character itself, and
+   should be able to bind the actual character.  Furthermore, introducing
+   the symbol introduces an X-specific dependency; I can't expect to set
+   a binding for a particular Unicode character and have it work on both
+   Windows and X. --ben */

 /* See comment near character_to_event(). */
 static void
Index: src/event-xlike-inc.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/event-xlike-inc.c,v
retrieving revision 1.2
diff -u -r1.2 event-xlike-inc.c
--- src/event-xlike-inc.c	2005/06/26 18:05:04	1.2
+++ src/event-xlike-inc.c	2005/11/22 14:00:36
＠＠ -1,7 +1,7 ＠＠
 /* Shared event code between X and GTK -- include file.
    Copyright (C) 1991-5, 1997 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 1996, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1996, 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -27,8 +27,6 ＠＠
    included here, not in event-xlike.c.  However, event-xlike.c is always
    X-specific, whereas the following code isn't, in the GTK case. */

-EXFUN (Funicode_to_char, 2);  /* In unicode.c.  */
-
 static int
 #ifdef THIS_IS_GTK
 emacs_gtk_event_pending_p (int how_many)
＠＠ -164,6 +162,8 ＠＠

 #if defined(THIS_IS_X) || !defined(__GDK_KEYS_H__)

+#ifdef MULE
+
 /* Use an appropriate map to Unicode within x_keysym_to_character. Arguments
    are evaluated multiple times.

＠＠ -175,7 +175,13 ＠＠
       && map[keysym - FIRST_KNOWN_##map ]) do				\
     {									\
       keysym -= FIRST_KNOWN_##map ;					\
-      return Funicode_to_char(make_int(map[keysym]), Qnil);		\
+      return make_char (unicode_to_ichar ((int) map[keysym],		\
+					  /* ＠＠#### need to get some sort \
+					     of buffer to compute this off; \
+					     only applies in the old-Mule \
+					     world */			\
+					  get_unicode_precedence (),	\
+					  CONVERR_SUCCEED));		\
     } while (0)

 /* Maps to Unicode for X11 KeySyms, where we don't have a direct internal
＠＠ -184,11 +190,8 ＠＠
    sequences of KeySyms often leave out entries, so you'll have to fill them
    in. Doesn't include support for Hangul, which it should, if the X11
    Hangul keysyms have ever been used anywhere.
+*/

-   I'm not #ifdef'ing this based on wheter MULE is defined, because it's a
-   matter of 324 bytes in a stripped executable, and I want the
-   testing. :-P */
-
 static UINT_16_BIT const TECHNICAL[] = 
   {
     0x23B7,	/* #x08A1 LEFT RADICAL	Technical */
＠＠ -461,6 +464,445 ＠＠
     0x22A3,	/* #x0BFC RIGHT TACK	APL */
   };

+static UINT_16_BIT const HANGUL[] = 
+  {
+#define FIRST_KNOWN_HANGUL 0xEA1
+    0x3131,     /* #x0EA1 Hangul_Kiyeog */
+    0x3132,     /* #x0EA2 Hangul_SsangKiyeog */
+    0x3133,     /* #x0EA3 Hangul_KiyeogSios */
+    0x3134,     /* #x0EA4 Hangul_Nieun */
+    0x3135,     /* #x0EA5 Hangul_NieunJieuj */
+    0x3136,     /* #x0EA6 Hangul_NieunHieuh */
+    0x3137,     /* #x0EA7 Hangul_Dikeud */
+    0x3138,     /* #x0EA8 Hangul_SsangDikeud */
+    0x3139,     /* #x0EA9 Hangul_Rieul */
+    0x313a,     /* #x0EAA Hangul_RieulKiyeog */
+    0x313b,     /* #x0EAB Hangul_RieulMieum */
+    0x313c,     /* #x0EAC Hangul_RieulPieub */
+    0x313d,     /* #x0EAD Hangul_RieulSios */
+    0x313e,     /* #x0EAE Hangul_RieulTieut */
+    0x313f,     /* #x0EAF Hangul_RieulPhieuf */
+    0x3140,     /* #x0EB0 Hangul_RieulHieuh */
+    0x3141,     /* #x0EB1 Hangul_Mieum */
+    0x3142,     /* #x0EB2 Hangul_Pieub */
+    0x3143,     /* #x0EB3 Hangul_SsangPieub */
+    0x3144,     /* #x0EB4 Hangul_PieubSios */
+    0x3145,     /* #x0EB5 Hangul_Sios */
+    0x3146,     /* #x0EB6 Hangul_SsangSios */
+    0x3147,     /* #x0EB7 Hangul_Ieung */
+    0x3148,     /* #x0EB8 Hangul_Jieuj */
+    0x3149,     /* #x0EB9 Hangul_SsangJieuj */
+    0x314a,     /* #x0EBA Hangul_Cieuc */
+    0x314b,     /* #x0EBB Hangul_Khieuq */
+    0x314c,     /* #x0EBC Hangul_Tieut */
+    0x314d,     /* #x0EBD Hangul_Phieuf */
+    0x314e,     /* #x0EBE Hangul_Hieuh */
+    0x314f,     /* #x0EBF Hangul_A */
+    0x3150,     /* #x0EC0 Hangul_AE */
+    0x3151,     /* #x0EC1 Hangul_YA */
+    0x3152,     /* #x0EC2 Hangul_YAE */
+    0x3153,     /* #x0EC3 Hangul_EO */
+    0x3154,     /* #x0EC4 Hangul_E */
+    0x3155,     /* #x0EC5 Hangul_YEO */
+    0x3156,     /* #x0EC6 Hangul_YE */
+    0x3157,     /* #x0EC7 Hangul_O */
+    0x3158,     /* #x0EC8 Hangul_WA */
+    0x3159,     /* #x0EC9 Hangul_WAE */
+    0x315a,     /* #x0ECA Hangul_OE */
+    0x315b,     /* #x0ECB Hangul_YO */
+    0x315c,     /* #x0ECC Hangul_U */
+    0x315d,     /* #x0ECD Hangul_WEO */
+    0x315e,     /* #x0ECE Hangul_WE */
+    0x315f,     /* #x0ECF Hangul_WI */
+    0x3160,     /* #x0ED0 Hangul_YU */
+    0x3161,     /* #x0ED1 Hangul_EU */
+    0x3162,     /* #x0ED2 Hangul_YI */
+    0x3163,     /* #x0ED3 Hangul_I */
+    0x11a8,     /* #x0ED4 Hangul_J_Kiyeog */
+    0x11a9,     /* #x0ED5 Hangul_J_SsangKiyeog */
+    0x11aa,     /* #x0ED6 Hangul_J_KiyeogSios */
+    0x11ab,     /* #x0ED7 Hangul_J_Nieun */
+    0x11ac,     /* #x0ED8 Hangul_J_NieunJieuj */
+    0x11ad,     /* #x0ED9 Hangul_J_NieunHieuh */
+    0x11ae,     /* #x0EDA Hangul_J_Dikeud */
+    0x11af,     /* #x0EDB Hangul_J_Rieul */
+    0x11b0,     /* #x0EDC Hangul_J_RieulKiyeog */
+    0x11b1,     /* #x0EDD Hangul_J_RieulMieum */
+    0x11b2,     /* #x0EDE Hangul_J_RieulPieub */
+    0x11b3,     /* #x0EDF Hangul_J_RieulSios */
+    0x11b4,     /* #x0EE0 Hangul_J_RieulTieut */
+    0x11b5,     /* #x0EE1 Hangul_J_RieulPhieuf */
+    0x11b6,     /* #x0EE2 Hangul_J_RieulHieuh */
+    0x11b7,     /* #x0EE3 Hangul_J_Mieum */
+    0x11b8,     /* #x0EE4 Hangul_J_Pieub */
+    0x11b9,     /* #x0EE5 Hangul_J_PieubSios */
+    0x11ba,     /* #x0EE6 Hangul_J_Sios */
+    0x11bb,     /* #x0EE7 Hangul_J_SsangSios */
+    0x11bc,     /* #x0EE8 Hangul_J_Ieung */
+    0x11bd,     /* #x0EE9 Hangul_J_Jieuj */
+    0x11be,     /* #x0EEA Hangul_J_Cieuc */
+    0x11bf,     /* #x0EEB Hangul_J_Khieuq */
+    0x11c0,     /* #x0EEC Hangul_J_Tieut */
+    0x11c1,     /* #x0EED Hangul_J_Phieuf */
+    0x11c2,     /* #x0EEE Hangul_J_Hieuh */
+    0x316d,     /* #x0EEF Hangul_RieulYeorinHieuh */
+    0x3171,     /* #x0EF0 Hangul_SunkyeongeumMieum */
+    0x3178,     /* #x0EF1 Hangul_SunkyeongeumPieub */
+    0x317f,     /* #x0EF2 Hangul_PanSios */
+    0x3181,     /* #x0EF3 Hangul_KkogjiDalrinIeung */
+    0x3184,     /* #x0EF4 Hangul_SunkyeongeumPhieuf */
+    0x3186,     /* #x0EF5 Hangul_YeorinHieuh */
+    0x318d,     /* #x0EF6 Hangul_AraeA */
+    0x318e,     /* #x0EF7 Hangul_AraeAE */
+    0x11eb,     /* #x0EF8 Hangul_J_PanSios */
+    0x11f0,     /* #x0EF9 Hangul_J_KkogjiDalrinIeung */
+    0x11f9,     /* #x0EFA Hangul_J_YeorinHieuh */
+    0x0000,     /* #x0EFB */
+    0x0000,     /* #x0EFC */
+    0x0000,     /* #x0EFD */
+    0x0000,     /* #x0EFE */
+    0x20a9,     /* #x0EFF Korean_Won */
+  };
+
+static UINT_16_BIT const ARMENIAN[] = 
+  {
+#define FIRST_KNOWN_ARMENIAN 0x14A1
+    0x0000,     /* #x14A1 Armenian_eternity */
+    0x0587,     /* #x14A2 Armenian_ligature_ew */
+    0x0589,     /* #x14A3 Armenian_verjaket */
+    0x0029,     /* #x14A4 Armenian_parenright */
+    0x0028,     /* #x14A5 Armenian_parenleft */
+    0x00bb,     /* #x14A6 Armenian_guillemotright */
+    0x00ab,     /* #x14A7 Armenian_guillemotleft */
+    0x2014,     /* #x14A8 Armenian_em_dash */
+    0x002e,     /* #x14A9 Armenian_mijaket */
+    0x055d,     /* #x14AA Armenian_but */
+    0x002c,     /* #x14AB Armenian_comma */
+    0x2013,     /* #x14AC Armenian_en_dash */
+    0x058a,     /* #x14AD Armenian_yentamna */
+    0x2026,     /* #x14AE Armenian_ellipsis */
+    0x055c,     /* #x14AF Armenian_amanak */
+    0x055b,     /* #x14B0 Armenian_shesht */
+    0x055e,     /* #x14B1 Armenian_paruyk */
+    0x0531,     /* #x14B2 Armenian_AYB */
+    0x0561,     /* #x14B3 Armenian_ayb */
+    0x0532,     /* #x14B4 Armenian_BEN */
+    0x0562,     /* #x14B5 Armenian_ben */
+    0x0533,     /* #x14B6 Armenian_GIM */
+    0x0563,     /* #x14B7 Armenian_gim */
+    0x0534,     /* #x14B8 Armenian_DA */
+    0x0564,     /* #x14B9 Armenian_da */
+    0x0535,     /* #x14BA Armenian_YECH */
+    0x0565,     /* #x14BB Armenian_yech */
+    0x0536,     /* #x14BC Armenian_ZA */
+    0x0566,     /* #x14BD Armenian_za */
+    0x0537,     /* #x14BE Armenian_E */
+    0x0567,     /* #x14BF Armenian_e */
+    0x0538,     /* #x14C0 Armenian_AT */
+    0x0568,     /* #x14C1 Armenian_at */
+    0x0539,     /* #x14C2 Armenian_TO */
+    0x0569,     /* #x14C3 Armenian_to */
+    0x053a,     /* #x14C4 Armenian_ZHE */
+    0x056a,     /* #x14C5 Armenian_zhe */
+    0x053b,     /* #x14C6 Armenian_INI */
+    0x056b,     /* #x14C7 Armenian_ini */
+    0x053c,     /* #x14C8 Armenian_LYUN */
+    0x056c,     /* #x14C9 Armenian_lyun */
+    0x053d,     /* #x14CA Armenian_KHE */
+    0x056d,     /* #x14CB Armenian_khe */
+    0x053e,     /* #x14CC Armenian_TSA */
+    0x056e,     /* #x14CD Armenian_tsa */
+    0x053f,     /* #x14CE Armenian_KEN */
+    0x056f,     /* #x14CF Armenian_ken */
+    0x0540,     /* #x14D0 Armenian_HO */
+    0x0570,     /* #x14D1 Armenian_ho */
+    0x0541,     /* #x14D2 Armenian_DZA */
+    0x0571,     /* #x14D3 Armenian_dza */
+    0x0542,     /* #x14D4 Armenian_GHAT */
+    0x0572,     /* #x14D5 Armenian_ghat */
+    0x0543,     /* #x14D6 Armenian_TCHE */
+    0x0573,     /* #x14D7 Armenian_tche */
+    0x0544,     /* #x14D8 Armenian_MEN */
+    0x0574,     /* #x14D9 Armenian_men */
+    0x0545,     /* #x14DA Armenian_HI */
+    0x0575,     /* #x14DB Armenian_hi */
+    0x0546,     /* #x14DC Armenian_NU */
+    0x0576,     /* #x14DD Armenian_nu */
+    0x0547,     /* #x14DE Armenian_SHA */
+    0x0577,     /* #x14DF Armenian_sha */
+    0x0548,     /* #x14E0 Armenian_VO */
+    0x0578,     /* #x14E1 Armenian_vo */
+    0x0549,     /* #x14E2 Armenian_CHA */
+    0x0579,     /* #x14E3 Armenian_cha */
+    0x054a,     /* #x14E4 Armenian_PE */
+    0x057a,     /* #x14E5 Armenian_pe */
+    0x054b,     /* #x14E6 Armenian_JE */
+    0x057b,     /* #x14E7 Armenian_je */
+    0x054c,     /* #x14E8 Armenian_RA */
+    0x057c,     /* #x14E9 Armenian_ra */
+    0x054d,     /* #x14EA Armenian_SE */
+    0x057d,     /* #x14EB Armenian_se */
+    0x054e,     /* #x14EC Armenian_VEV */
+    0x057e,     /* #x14ED Armenian_vev */
+    0x054f,     /* #x14EE Armenian_TYUN */
+    0x057f,     /* #x14EF Armenian_tyun */
+    0x0550,     /* #x14F0 Armenian_RE */
+    0x0580,     /* #x14F1 Armenian_re */
+    0x0551,     /* #x14F2 Armenian_TSO */
+    0x0581,     /* #x14F3 Armenian_tso */
+    0x0552,     /* #x14F4 Armenian_VYUN */
+    0x0582,     /* #x14F5 Armenian_vyun */
+    0x0553,     /* #x14F6 Armenian_PYUR */
+    0x0583,     /* #x14F7 Armenian_pyur */
+    0x0554,     /* #x14F8 Armenian_KE */
+    0x0584,     /* #x14F9 Armenian_ke */
+    0x0555,     /* #x14FA Armenian_O */
+    0x0585,     /* #x14FB Armenian_o */
+    0x0556,     /* #x14FC Armenian_FE */
+    0x0586,     /* #x14FD Armenian_fe */
+    0x055a,     /* #x14FE Armenian_apostrophe */
+    0x00a7,     /* #x14FF Armenian_section_sign */
+  };
+
+static UINT_16_BIT const GEORGIAN[] = 
+  {
+#define FIRST_KNOWN_GEORGIAN 0x15D0
+    0x10d0,     /* #x15D0 Georgian_an */
+    0x10d1,     /* #x15D1 Georgian_ban */
+    0x10d2,     /* #x15D2 Georgian_gan */
+    0x10d3,     /* #x15D3 Georgian_don */
+    0x10d4,     /* #x15D4 Georgian_en */
+    0x10d5,     /* #x15D5 Georgian_vin */
+    0x10d6,     /* #x15D6 Georgian_zen */
+    0x10d7,     /* #x15D7 Georgian_tan */
+    0x10d8,     /* #x15D8 Georgian_in */
+    0x10d9,     /* #x15D9 Georgian_kan */
+    0x10da,     /* #x15DA Georgian_las */
+    0x10db,     /* #x15DB Georgian_man */
+    0x10dc,     /* #x15DC Georgian_nar */
+    0x10dd,     /* #x15DD Georgian_on */
+    0x10de,     /* #x15DE Georgian_par */
+    0x10df,     /* #x15DF Georgian_zhar */
+    0x10e0,     /* #x15E0 Georgian_rae */
+    0x10e1,     /* #x15E1 Georgian_san */
+    0x10e2,     /* #x15E2 Georgian_tar */
+    0x10e3,     /* #x15E3 Georgian_un */
+    0x10e4,     /* #x15E4 Georgian_phar */
+    0x10e5,     /* #x15E5 Georgian_khar */
+    0x10e6,     /* #x15E6 Georgian_ghan */
+    0x10e7,     /* #x15E7 Georgian_qar */
+    0x10e8,     /* #x15E8 Georgian_shin */
+    0x10e9,     /* #x15E9 Georgian_chin */
+    0x10ea,     /* #x15EA Georgian_can */
+    0x10eb,     /* #x15EB Georgian_jil */
+    0x10ec,     /* #x15EC Georgian_cil */
+    0x10ed,     /* #x15ED Georgian_char */
+    0x10ee,     /* #x15EE Georgian_xan */
+    0x10ef,     /* #x15EF Georgian_jhan */
+    0x10f0,     /* #x15F0 Georgian_hae */
+    0x10f1,     /* #x15F1 Georgian_he */
+    0x10f2,     /* #x15F2 Georgian_hie */
+    0x10f3,     /* #x15F3 Georgian_we */
+    0x10f4,     /* #x15F4 Georgian_har */
+    0x10f5,     /* #x15F5 Georgian_hoe */
+    0x10f6,     /* #x15F6 Georgian_fi */
+  };
+
+static UINT_16_BIT const AZERI_ETC[] = 
+  {
+#define FIRST_KNOWN_AZERI_ETC 0x16A2
+    0x0000,     /* #x16A2 Ccedillaabovedot */
+    0x1e8a,     /* #x16A3 Xabovedot */
+    0x0000,     /* #x16A4 */
+    0x0000,     /* #x16A5 Qabovedot */
+    0x012c,     /* #x16A6 Ibreve */
+    0x0000,     /* #x16A7 IE */
+    0x0000,     /* #x16A8 UO */
+    0x01b5,     /* #x16A9 Zstroke */
+    0x01e6,     /* #x16AA Gcaron */
+    0x0000,     /* #x16AB */
+    0x0000,     /* #x16AC */
+    0x0000,     /* #x16AD */
+    0x0000,     /* #x16AE */
+    0x019f,     /* #x16AF Obarred */
+    0x0000,     /* #x16B0 */
+    0x0000,     /* #x16B1 */
+    0x0000,     /* #x16B2 ccedillaabovedot */
+    0x1e8b,     /* #x16B3 xabovedot */
+    0x0000,     /* #x16B4 Ocaron */
+    0x0000,     /* #x16B5 qabovedot */
+    0x012d,     /* #x16B6 ibreve */
+    0x0000,     /* #x16B7 ie */
+    0x0000,     /* #x16B8 uo */
+    0x01b6,     /* #x16B9 zstroke */
+    0x01e7,     /* #x16BA gcaron */
+    0x0000,     /* #x16BB */
+    0x0000,     /* #x16BC */
+    0x01d2,     /* #x16BD ocaron */
+    0x0000,     /* #x16BE */
+    0x0275,     /* #x16BF obarred */
+    0x0000,     /* #x16C0 */
+    0x0000,     /* #x16C1 */
+    0x0000,     /* #x16C2 */
+    0x0000,     /* #x16C3 */
+    0x0000,     /* #x16C4 */
+    0x0000,     /* #x16C5 */
+    0x018f,     /* #x16C6 SCHWA */
+    0x0000,     /* #x16C7 */
+    0x0000,     /* #x16C8 */
+    0x0000,     /* #x16C9 */
+    0x0000,     /* #x16CA */
+    0x0000,     /* #x16CB */
+    0x0000,     /* #x16CC */
+    0x0000,     /* #x16CD */
+    0x0000,     /* #x16CE */
+    0x0000,     /* #x16CF */
+    0x0000,     /* #x16D0 */
+    0x1e36,     /* #x16D1 Lbelowdot */
+    0x0000,     /* #x16D2 Lstrokebelowdot */
+    0x0000,     /* #x16D3 Gtilde */
+    0x0000,     /* #x16D4 */
+    0x0000,     /* #x16D5 */
+    0x0000,     /* #x16D6 */
+    0x0000,     /* #x16D7 */
+    0x0000,     /* #x16D8 */
+    0x0000,     /* #x16D9 */
+    0x0000,     /* #x16DA */
+    0x0000,     /* #x16DB */
+    0x0000,     /* #x16DC */
+    0x0000,     /* #x16DD */
+    0x0000,     /* #x16DE */
+    0x0000,     /* #x16DF */
+    0x0000,     /* #x16E0 */
+    0x1e37,     /* #x16E1 lbelowdot */
+    0x0000,     /* #x16E2 lstrokebelowdot */
+    0x0000,     /* #x16E3 gtilde */
+    0x0000,     /* #x16E4 */
+    0x0000,     /* #x16E5 */
+    0x0000,     /* #x16E6 */
+    0x0000,     /* #x16E7 */
+    0x0000,     /* #x16E8 */
+    0x0000,     /* #x16E9 */
+    0x0000,     /* #x16EA */
+    0x0000,     /* #x16EB */
+    0x0000,     /* #x16EC */
+    0x0000,     /* #x16ED */
+    0x0000,     /* #x16EE */
+    0x0000,     /* #x16EF */
+    0x0000,     /* #x16F0 */
+    0x0000,     /* #x16F1 */
+    0x0000,     /* #x16F2 */
+    0x0000,     /* #x16F3 */
+    0x0000,     /* #x16F4 */
+    0x0000,     /* #x16F5 */
+    0x0259,     /* #x16F6 schwa */
+  };
+
+static UINT_16_BIT const VIETNAMESE[] = 
+  {
+#define FIRST_KNOWN_VIETNAMESE 0x1E9F
+    0x0303,     /* #x1E9F combining_tilde */
+    0x1ea0,     /* #x1EA0 Abelowdot */
+    0x1ea1,     /* #x1EA1 abelowdot */
+    0x1ea2,     /* #x1EA2 Ahook */
+    0x1ea3,     /* #x1EA3 ahook */
+    0x1ea4,     /* #x1EA4 Acircumflexacute */
+    0x1ea5,     /* #x1EA5 acircumflexacute */
+    0x1ea6,     /* #x1EA6 Acircumflexgrave */
+    0x1ea7,     /* #x1EA7 acircumflexgrave */
+    0x1ea8,     /* #x1EA8 Acircumflexhook */
+    0x1ea9,     /* #x1EA9 acircumflexhook */
+    0x1eaa,     /* #x1EAA Acircumflextilde */
+    0x1eab,     /* #x1EAB acircumflextilde */
+    0x1eac,     /* #x1EAC Acircumflexbelowdot */
+    0x1ead,     /* #x1EAD acircumflexbelowdot */
+    0x1eae,     /* #x1EAE Abreveacute */
+    0x1eaf,     /* #x1EAF abreveacute */
+    0x1eb0,     /* #x1EB0 Abrevegrave */
+    0x1eb1,     /* #x1EB1 abrevegrave */
+    0x1eb2,     /* #x1EB2 Abrevehook */
+    0x1eb3,     /* #x1EB3 abrevehook */
+    0x1eb4,     /* #x1EB4 Abrevetilde */
+    0x1eb5,     /* #x1EB5 abrevetilde */
+    0x1eb6,     /* #x1EB6 Abrevebelowdot */
+    0x1eb7,     /* #x1EB7 abrevebelowdot */
+    0x1eb8,     /* #x1EB8 Ebelowdot */
+    0x1eb9,     /* #x1EB9 ebelowdot */
+    0x1eba,     /* #x1EBA Ehook */
+    0x1ebb,     /* #x1EBB ehook */
+    0x1ebc,     /* #x1EBC Etilde */
+    0x1ebd,     /* #x1EBD etilde */
+    0x1ebe,     /* #x1EBE Ecircumflexacute */
+    0x1ebf,     /* #x1EBF ecircumflexacute */
+    0x1ec0,     /* #x1EC0 Ecircumflexgrave */
+    0x1ec1,     /* #x1EC1 ecircumflexgrave */
+    0x1ec2,     /* #x1EC2 Ecircumflexhook */
+    0x1ec3,     /* #x1EC3 ecircumflexhook */
+    0x1ec4,     /* #x1EC4 Ecircumflextilde */
+    0x1ec5,     /* #x1EC5 ecircumflextilde */
+    0x1ec6,     /* #x1EC6 Ecircumflexbelowdot */
+    0x1ec7,     /* #x1EC7 ecircumflexbelowdot */
+    0x1ec8,     /* #x1EC8 Ihook */
+    0x1ec9,     /* #x1EC9 ihook */
+    0x1eca,     /* #x1ECA Ibelowdot */
+    0x1ecb,     /* #x1ECB ibelowdot */
+    0x1ecc,     /* #x1ECC Obelowdot */
+    0x1ecd,     /* #x1ECD obelowdot */
+    0x1ece,     /* #x1ECE Ohook */
+    0x1ecf,     /* #x1ECF ohook */
+    0x1ed0,     /* #x1ED0 Ocircumflexacute */
+    0x1ed1,     /* #x1ED1 ocircumflexacute */
+    0x1ed2,     /* #x1ED2 Ocircumflexgrave */
+    0x1ed3,     /* #x1ED3 ocircumflexgrave */
+    0x1ed4,     /* #x1ED4 Ocircumflexhook */
+    0x1ed5,     /* #x1ED5 ocircumflexhook */
+    0x1ed6,     /* #x1ED6 Ocircumflextilde */
+    0x1ed7,     /* #x1ED7 ocircumflextilde */
+    0x1ed8,     /* #x1ED8 Ocircumflexbelowdot */
+    0x1ed9,     /* #x1ED9 ocircumflexbelowdot */
+    0x1eda,     /* #x1EDA Ohornacute */
+    0x1edb,     /* #x1EDB ohornacute */
+    0x1edc,     /* #x1EDC Ohorngrave */
+    0x1edd,     /* #x1EDD ohorngrave */
+    0x1ede,     /* #x1EDE Ohornhook */
+    0x1edf,     /* #x1EDF ohornhook */
+    0x1ee0,     /* #x1EE0 Ohorntilde */
+    0x1ee1,     /* #x1EE1 ohorntilde */
+    0x1ee2,     /* #x1EE2 Ohornbelowdot */
+    0x1ee3,     /* #x1EE3 ohornbelowdot */
+    0x1ee4,     /* #x1EE4 Ubelowdot */
+    0x1ee5,     /* #x1EE5 ubelowdot */
+    0x1ee6,     /* #x1EE6 Uhook */
+    0x1ee7,     /* #x1EE7 uhook */
+    0x1ee8,     /* #x1EE8 Uhornacute */
+    0x1ee9,     /* #x1EE9 uhornacute */
+    0x1eea,     /* #x1EEA Uhorngrave */
+    0x1eeb,     /* #x1EEB uhorngrave */
+    0x1eec,     /* #x1EEC Uhornhook */
+    0x1eed,     /* #x1EED uhornhook */
+    0x1eee,     /* #x1EEE Uhorntilde */
+    0x1eef,     /* #x1EEF uhorntilde */
+    0x1ef0,     /* #x1EF0 Uhornbelowdot */
+    0x1ef1,     /* #x1EF1 uhornbelowdot */
+    0x0300,     /* #x1EF2 combining_grave */
+    0x0301,     /* #x1EF3 combining_acute */
+    0x1ef4,     /* #x1EF4 Ybelowdot */
+    0x1ef5,     /* #x1EF5 ybelowdot */
+    0x1ef6,     /* #x1EF6 Yhook */
+    0x1ef7,     /* #x1EF7 yhook */
+    0x1ef8,     /* #x1EF8 Ytilde */
+    0x1ef9,     /* #x1EF9 ytilde */
+  
+    0x01a0,     /* #x1EFA Ohorn */
+    0x01a1,     /* #x1EFB ohorn */
+    0x01af,     /* #x1EFC Uhorn */
+    0x01b0,     /* #x1EFD uhorn */
+  
+    0x0309,     /* #x1EFE combining_hook */
+    0x0323,     /* #x1EFF combining_belowdot */
+  };
+#endif /* MULE */
+
 /* For every key on the keyboard that has a known character correspondence,
    we define the character-of-keysym property of its XEmacs keysym, and make
    the default binding for the key be self-insert-command.
＠＠ -490,132 +932,268 ＠＠

 #ifndef THIS_IS_GTK
 static Lisp_Object
-x_keysym_to_character(KeySym keysym)
+x_keysym_to_character (KeySym keysym)
 #else
 Lisp_Object
-gtk_keysym_to_character(guint keysym)
+gtk_keysym_to_character (guint keysym)
 #endif
 {
+#ifdef MULE
   Lisp_Object charset = Qzero;
   int code = 0;
+#endif /* MULE */

+  /* ＠＠#### Add support for 0xFE?? and 0xFF?? keysyms
+     Add support for KOI8-U extensions in the 0x06?? range
+
+     See http://www.cl.cam.ac.uk/~mgk25/ucs/keysyms.txt
+ */
+
   /* Markus Kuhn's spec says keysyms in the range #x01000100 to #x0110FFFF
      and only those should correspond directly to Unicode code points, in
      the range #x100-#x10FFFF; actual implementations can have the Latin 1
      code points do the same thing with keysyms
      #x010000A0-#x01000100. */

+#ifndef MULE
+  if (keysym >= 0x010000A0 && keysym <= 0x010000FF)
+    return make_char (keysym & 0xFFFFFF);
+#else
   if (keysym >= 0x010000A0 && keysym <= 0x0110FFFF)
-    return Funicode_to_char (make_int(keysym & 0xffffff), Qnil);
+    return make_char (unicode_to_ichar ((int) (keysym & 0xFFFFFF),
+					/* ＠＠####
+					 need to get some sort of buffer
+					 to compute this off; only
+					 applies in the old-Mule world */
+					get_unicode_precedence (),
+					CONVERR_SUCCEED));
+#endif /* not MULE */

   if ((keysym & 0xff) < 0xa0)
     return Qnil;

+#ifdef MULE
   switch (keysym >> 8)
     {
-
-#define USE_CHARSET(var,cs) \
-  ((var) = charset_by_leading_byte (LEADING_BYTE_##cs))
-
     case 0: /* ASCII + Latin1 */
-      USE_CHARSET (charset, LATIN_ISO8859_1);
-      code = keysym & 0x7f;
+      charset = Vcharset_latin_iso8859_1;
+      code = keysym & 0xff;
       break;
     case 1: /* Latin2 */
-      USE_CHARSET (charset, LATIN_ISO8859_2);
-      code = keysym & 0x7f;
+      charset = Vcharset_latin_iso8859_2;
+      code = keysym & 0xff;
       break;
     case 2: /* Latin3 */
-      USE_CHARSET (charset, LATIN_ISO8859_3);
-      code = keysym & 0x7f;
+      charset = Vcharset_latin_iso8859_3;
+      code = keysym & 0xff;
       break;
     case 3: /* Latin4 */
-      USE_CHARSET (charset, LATIN_ISO8859_4);
-      code = keysym & 0x7f;
+      charset = Vcharset_latin_iso8859_4;
+      code = keysym & 0xff;
       break;
     case 4: /* Katakana */
-      USE_CHARSET (charset, KATAKANA_JISX0201);
+      charset = Vcharset_katakana_jisx0201;
       if ((keysym & 0xff) > 0xa0)
-	code = keysym & 0x7f;
+	code = keysym & 0xff;
       break;
     case 5: /* Arabic */
-      USE_CHARSET (charset, ARABIC_ISO8859_6);
-      code = keysym & 0x7f;
+      charset = Vcharset_arabic_iso8859_6;
+      code = keysym & 0xff;
       break;
     case 6: /* Cyrillic */
       {
-	static UExtbyte const cyrillic[] = /* 0x20 - 0x7f */
-	{0x00, 0x72, 0x73, 0x71, 0x74, 0x75, 0x76, 0x77,
-	 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x00, 0x7e, 0x7f,
-	 0x70, 0x22, 0x23, 0x21, 0x24, 0x25, 0x26, 0x27,
-	 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x00, 0x2e, 0x2f,
-	 0x6e, 0x50, 0x51, 0x66, 0x54, 0x55, 0x64, 0x53,
-	 0x65, 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e,
-	 0x5f, 0x6f, 0x60, 0x61, 0x62, 0x63, 0x56, 0x52,
-	 0x6c, 0x6b, 0x57, 0x68, 0x6d, 0x69, 0x67, 0x6a,
-	 0x4e, 0x30, 0x31, 0x46, 0x34, 0x35, 0x44, 0x33,
-	 0x45, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e,
-	 0x3f, 0x4f, 0x40, 0x41, 0x42, 0x43, 0x36, 0x32,
-	 0x4c, 0x4b, 0x37, 0x48, 0x4d, 0x49, 0x47, 0x4a};
-	USE_CHARSET (charset, CYRILLIC_ISO8859_5);
+
+	/* ＠＠#### Support me:
+
+0x0680   U0492   u   # Cyrillic_GHE_bar
+0x0681   U0496   u   # Cyrillic_ZHE_descender
+0x0682   U049a   u   # Cyrillic_KA_descender
+0x0683   U049c   u   # Cyrillic_KA_vertstroke
+0x0684   U04a2   u   # Cyrillic_EN_descender
+0x0685   U04ae   u   # Cyrillic_U_straight
+0x0686   U04b0   u   # Cyrillic_U_straight_bar
+0x0687   U04b2   u   # Cyrillic_HA_descender
+0x0688   U04b6   u   # Cyrillic_CHE_descender
+0x0689   U04b8   u   # Cyrillic_CHE_vertstroke
+0x068a   U04ba   u   # Cyrillic_SHHA
+0x068c   U04d8   u   # Cyrillic_SCHWA
+0x068d   U04e2   u   # Cyrillic_I_macron
+0x068e   U04e8   u   # Cyrillic_O_bar
+0x068f   U04ee   u   # Cyrillic_U_macron
+0x0690   U0493   u   # Cyrillic_ghe_bar
+0x0691   U0497   u   # Cyrillic_zhe_descender
+0x0692   U049b   u   # Cyrillic_ka_descender
+0x0693   U049d   u   # Cyrillic_ka_vertstroke
+0x0694   U04a3   u   # Cyrillic_en_descender
+0x0695   U04af   u   # Cyrillic_u_straight
+0x0696   U04b1   u   # Cyrillic_u_straight_bar
+0x0697   U04b3   u   # Cyrillic_ha_descender
+0x0698   U04b7   u   # Cyrillic_che_descender
+0x0699   U04b9   u   # Cyrillic_che_vertstroke
+0x069a   U04bb   u   # Cyrillic_shha
+0x069c   U04d9   u   # Cyrillic_schwa
+0x069d   U04e3   u   # Cyrillic_i_macron
+0x069e   U04e9   u   # Cyrillic_o_bar
+0x069f   U04ef   u   # Cyrillic_u_macron
+
+0x06ad   U0491   .   # Ukrainian_ghe_with_upturn
+0x06bd   U0490   .   # Ukrainian_GHE_WITH_UPTURN
+*/
+
+	static UExtbyte const cyrillic[] = /* 0xa0 - 0xff */
+	{0x00, 0xf2, 0xf3, 0xf1, 0xf4, 0xf5, 0xf6, 0xf7,
+	 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0x00, 0xfe, 0xff,
+	 0xf0, 0xa2, 0xa3, 0xa1, 0xa4, 0xa5, 0xa6, 0xa7,
+	 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0x00, 0xae, 0xaf,
+	 0xee, 0xd0, 0xd1, 0xe6, 0xd4, 0xd5, 0xe4, 0xd3,
+	 0xe5, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde,
+	 0xdf, 0xef, 0xe0, 0xe1, 0xe2, 0xe3, 0xd6, 0xd2,
+	 0xec, 0xeb, 0xd7, 0xe8, 0xed, 0xe9, 0xe7, 0xea,
+	 0xce, 0xb0, 0xb1, 0xc6, 0xb4, 0xb5, 0xc4, 0xb3,
+	 0xc5, 0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe,
+	 0xbf, 0xcf, 0xc0, 0xc1, 0xc2, 0xc3, 0xb6, 0xb2,
+	 0xcc, 0xcb, 0xb7, 0xc8, 0xcd, 0xc9, 0xc7, 0xca};
+	charset = Vcharset_cyrillic_iso8859_5;
 	code = cyrillic[(keysym & 0x7f) - 0x20];
 	break;
       }
     case 7: /* Greek */
       {
-	static UExtbyte const greek[] = /* 0x20 - 0x7f */
-	{0x00, 0x36, 0x38, 0x39, 0x3a, 0x5a, 0x00, 0x3c,
-	 0x3e, 0x5b, 0x00, 0x3f, 0x00, 0x00, 0x35, 0x2f,
-	 0x00, 0x5c, 0x5d, 0x5e, 0x5f, 0x7a, 0x40, 0x7c,
-	 0x7d, 0x7b, 0x60, 0x7e, 0x00, 0x00, 0x00, 0x00,
-	 0x00, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
-	 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f,
-	 0x50, 0x51, 0x53, 0x00, 0x54, 0x55, 0x56, 0x57,
-	 0x58, 0x59, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-	 0x00, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
-	 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f,
-	 0x70, 0x71, 0x73, 0x72, 0x74, 0x75, 0x76, 0x77,
-	 0x78, 0x79, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
-	USE_CHARSET (charset, GREEK_ISO8859_7);
+	static UExtbyte const greek[] = /* 0xa0 - 0xff */
+	{0x00, 0xb6, 0xb8, 0xb9, 0xba, 0xda, 0x00, 0xbc,
+	 0xbe, 0xdb, 0x00, 0xbf, 0x00, 0x00, 0xb5, 0xaf,
+	 0x00, 0xdc, 0xdd, 0xde, 0xdf, 0xfa, 0xc0, 0xfc,
+	 0xfd, 0xfb, 0xe0, 0xfe, 0x00, 0x00, 0x00, 0x00,
+	 0x00, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7,
+	 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
+	 0xd0, 0xd1, 0xd3, 0x00, 0xd4, 0xd5, 0xd6, 0xd7,
+	 0xd8, 0xd9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	 0x00, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
+	 0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef,
+	 0xf0, 0xf1, 0xf3, 0xf2, 0xf4, 0xf5, 0xf6, 0xf7,
+	 0xf8, 0xf9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+	charset = Vcharset_greek_iso8859_7;
 	code = greek[(keysym & 0x7f) - 0x20];
 	break;
       }
     case 8: 
-      USE_UNICODE_MAP(keysym, TECHNICAL);
+      USE_UNICODE_MAP (keysym, TECHNICAL);
       break;
     case 9: 
-      USE_UNICODE_MAP(keysym, SPECIAL);
+      USE_UNICODE_MAP (keysym, SPECIAL);
       break;
     case 10:
-      USE_UNICODE_MAP(keysym, PUBLISHING);
+      USE_UNICODE_MAP (keysym, PUBLISHING);
       break;
     case 11:
-      USE_UNICODE_MAP(keysym, APL);
+      USE_UNICODE_MAP (keysym, APL);
       break;
     case 12: /* Hebrew */
-      USE_CHARSET (charset, HEBREW_ISO8859_8);
-      code = keysym & 0x7f;
+      charset = Vcharset_hebrew_iso8859_8;
+      code = keysym & 0xff;
       break;
     case 13: /* Thai */
       /* #### This needs to deal with character composition.
-                  Are you sure we can't leave it to the X server? */
-      USE_CHARSET (charset, THAI_TIS620);
-      code = keysym & 0x7f;
-      break;
-    case 14: /* Korean Hangul. Would like some information on whether this
-		is worth doing--there don't appear to be any Korean keyboard
-		layouts in the XKB data files. */
+	 Are you sure we can't leave it to the X server? */
+      charset = Vcharset_thai_tis620;
+      code = keysym & 0xff;
       break;
-
+    case 14: /* Korean Hangul. */
+      USE_UNICODE_MAP (keysym, HANGUL);
+      break;
+    case 18: /* Latin 8 - ISO8859-14. */
+      charset = Ffind_charset (intern ("latin-iso8859-14"));
+      code = keysym & 0xff;
+      break;
     case 19: /* Latin 9 - ISO8859-15. */
-      USE_CHARSET (charset, LATIN_ISO8859_15);
-      code = keysym & 0x7f;
+      charset = Vcharset_latin_iso8859_15;
+      code = keysym & 0xff;
+      break;
+    case 20: /* Armenian. */
+      USE_UNICODE_MAP (keysym, ARMENIAN);
+      break;
+    case 21: /* Georgian. */
+      USE_UNICODE_MAP (keysym, GEORGIAN);
+      break;
+    case 22: /* Azeri (and other Turkic or Caucasian languages of ex-USSR) */
+      USE_UNICODE_MAP (keysym, AZERI_ETC);
+      break;
+    case 30: /* Vietnamese */
+      USE_UNICODE_MAP (keysym, VIETNAMESE);
       break;
     case 32: /* Currency. The lower sixteen bits of these keysyms happily
 		correspond exactly to the Unicode code points of the
 		associated characters */
-      return Funicode_to_char(make_int(keysym & 0xffff), Qnil);
-      break;
+      return make_char (unicode_to_ichar ((int) (keysym & 0xffff),
+					  /* ＠＠####
+					     need to get some sort of buffer
+					     to compute this off; only
+					     applies in the old-Mule world */
+					  get_unicode_precedence (),
+					  CONVERR_SUCCEED));
+
+/* ＠＠#### Support me!
+
+   Actually, these are somewhat already supported by x-init.el/x-compose.el,
+   but only acute, grave, circum(flex), cedilla, diaeresis, tilde.  We
+   should try to eliminate that code and use general Unicode support for
+   converting to precomposed sequences.
+
+0xfe50   U0300   f   # dead_grave
+0xfe51   U0301   f   # dead_acute
+0xfe52   U0302   f   # dead_circumflex
+0xfe53   U0303   f   # dead_tilde
+0xfe54   U0304   f   # dead_macron
+0xfe55   U0306   f   # dead_breve
+0xfe56   U0307   f   # dead_abovedot
+0xfe57   U0308   f   # dead_diaeresis
+0xfe58   U030a   f   # dead_abovering
+0xfe59   U030b   f   # dead_doubleacute
+0xfe5a   U030c   f   # dead_caron
+0xfe5b   U0327   f   # dead_cedilla
+0xfe5c   U0328   f   # dead_ogonek
+0xfe5d   U0345   f   # dead_iota
+0xfe5e   U3099   f   # dead_voiced_sound
+0xfe5f   U309a   f   # dead_semivoiced_sound
+0xfe60   U0323   f   # dead_belowdot
+0xfe61   U0309   f   # dead_hook
+0xfe62   U031b   f   # dead_horn
+
+What about these?  We don't have to convert these to ASCII but make sure we
+Handle all of the KP-foo things and get them to behave like plain foo when
+KP-foo isn't bound (which includes self-inserting the associated character
+if necessary). DOCUMENT the existing system that does this.
+
+0xff08   U0008   f   # BackSpace	/- back space, back char -/
+0xff09   U0009   f   # Tab
+0xff0a   U000a   f   # Linefeed	/- Linefeed, LF -/
+0xff0b   U000b   f   # Clear
+0xff0d   U000d   f   # Return	/- Return, enter -/
+0xff13   U0013   f   # Pause	/- Pause, hold -/
+0xff14   U0014   f   # Scroll_Lock
+0xff15   U0015   f   # Sys_Req
+0xff1b   U001b   f   # Escape
+0xff80   U0020   f   # KP_Space	/- space -/
+0xff89   U0009   f   # KP_Tab
+0xff8d   U000d   f   # KP_Enter	/- enter -/
+0xffaa   U002a   f   # KP_Multiply
+0xffab   U002b   f   # KP_Add
+0xffac   U002c   f   # KP_Separator	/- separator, often comma -/
+0xffad   U002d   f   # KP_Subtract
+0xffae   U002e   f   # KP_Decimal
+0xffaf   U002f   f   # KP_Divide
+0xffb0   U0030   f   # KP_0
+0xffb1   U0031   f   # KP_1
+0xffb2   U0032   f   # KP_2
+0xffb3   U0033   f   # KP_3
+0xffb4   U0034   f   # KP_4
+0xffb5   U0035   f   # KP_5
+0xffb6   U0036   f   # KP_6
+0xffb7   U0037   f   # KP_7
+0xffb8   U0038   f   # KP_8
+0xffb9   U0039   f   # KP_9
+0xffbd   U003d   f   # KP_Equal	/- equals -/
+*/
     default:
       break;
     }
＠＠ -623,10 +1201,17 ＠＠
   if (code == 0)
     return Qnil;

-#ifdef MULE
-  return make_char (make_ichar (charset, code, 0));
+  if (!NILP (charset))
+    {
+      Ichar ich = charset_codepoint_to_ichar (charset, 0, code, CONVERR_FAIL);
+      if (ich >= 0)
+	return make_char (ich);
+    }
+  return Qnil;
 #else
-  return make_char (code + 0x80);
+  if (keysym >= 0x100)
+    return Qnil;
+  return make_char (keysym);
 #endif
 }

Index: src/extents.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/extents.c,v
retrieving revision 1.62
diff -u -r1.62 extents.c
--- src/extents.c	2005/10/25 11:16:23	1.62
+++ src/extents.c	2005/11/22 14:00:38
＠＠ -470,15 +470,6 ＠＠
 static Lisp_Object canonicalize_extent_property (Lisp_Object prop,
 						 Lisp_Object value);

-typedef struct
-{
-  Lisp_Object key, value;
-} Lisp_Object_pair;
-typedef struct
-{
-  Dynarr_declare (Lisp_Object_pair);
-} Lisp_Object_pair_dynarr;
-
 static void extent_properties (EXTENT e, Lisp_Object_pair_dynarr *props);

 Lisp_Object Vextent_face_memoize_hash_table;
＠＠ -1044,9 +1035,9 ＠＠
   { XD_BLOCK_PTR, offsetof (Gap_Array, markers), 1,
     { &gap_array_marker_description }, XD_FLAG_NO_KKCC },
   { XD_BLOCK_ARRAY, offsetof (Gap_Array, array), XD_INDIRECT (0, 0),
-    { &lisp_object_description } },
+    { &Lisp_Object_description } },
   { XD_BLOCK_ARRAY, XD_INDIRECT (1, offsetof (Gap_Array, array)),
-    XD_INDIRECT (2, 0), { &lisp_object_description } },
+    XD_INDIRECT (2, 0), { &Lisp_Object_description } },
   { XD_END }
 };

Index: src/faces.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/faces.c,v
retrieving revision 1.48
diff -u -r1.48 faces.c
--- src/faces.c	2005/10/24 10:07:36	1.48
+++ src/faces.c	2005/11/22 14:00:38
＠＠ -1,7 +1,7 ＠＠
 /* "Face" primitives
    Copyright (C) 1994 Free Software Foundation, Inc.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
-   Copyright (C) 1995, 1996, 2001, 2002 Ben Wing.
+   Copyright (C) 1995, 1996, 2001, 2002, 2005 Ben Wing.
    Copyright (C) 1995 Sun Microsystems, Inc.

 This file is part of XEmacs.
＠＠ -30,6 +30,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "charset.h"
 #include "device-impl.h"
 #include "elhash.h"
 #include "extents-impl.h" /* for extent_face */
＠＠ -567,6 +568,13 ＠＠
 				       domain, errb, no_fallback, depth);
   if (UNBOUNDP (retval))
     {
+      /* Do the "second stage"; used under Windows.  This call to
+	 specifier_instance_no_quit(), and the previous one, will end up
+	 calling font_instantiate() if the property in a question is a font
+	 (currently, this means EQ (property, Qfont), because only the
+	 face property named `font' contains a font object).  See the
+	 comments there. */
+
       if (CONSP (matchspec))
 	  Fsetcdr (matchspec, Qt);
       retval = specifier_instance_no_quit (Fget (face, property, Qnil),
＠＠ -1044,13 +1052,6 ＠＠
     {
       struct face_cachel *cachel = Dynarr_atp (elements, elt);

-      {
-	int i;
-
-	for (i = 0; i < NUM_LEADING_BYTES; i++)
-	  if (!NILP (cachel->font[i]) && !UNBOUNDP (cachel->font[i]))
-	    mark_object (cachel->font[i]);
-      }
       mark_object (cachel->face);
       mark_object (cachel->foreground);
       mark_object (cachel->background);
＠＠ -1069,12 +1070,13 ＠＠
   Lisp_Object new_val;
   Lisp_Object face = cachel->face;
   int bound = 1;
-  int offs = XCHARSET_LEADING_BYTE (charset) - MIN_LEADING_BYTE;
+  int offs = FACE_CACHEL_OFFSET_ENSURE (cachel, charset);

-  if (!UNBOUNDP (cachel->font[offs])
-      && cachel->font_updated[offs])
-    return cachel->font[offs];

+  if (!UNBOUNDP (Stynarr_at (cachel->font, offs).value)
+      && Stynarr_at (cachel->font_updated, offs))
+    return Stynarr_at (cachel->font, offs).value;
+
   if (UNBOUNDP (face))
     {
       /* a merged face. */
＠＠ -1082,38 +1084,44 ＠＠
       struct window *w = XWINDOW (domain);

       new_val = Qunbound;
-      cachel->font_specified[offs] = 0;
-      for (i = 0; i < cachel->nfaces; i++)
+      Stynarr_at (cachel->font_specified, offs) = 0;
+      for (i = 0; i < Stynarr_length (cachel->merged_faces); i++)
 	{
-	  struct face_cachel *oth;
+	  struct face_cachel *oth =
+	    Dynarr_atp (w->face_cachels,
+			FACE_CACHEL_FINDEX_UNSAFE (cachel, i));
+	  int off2;

-	  oth = Dynarr_atp (w->face_cachels,
-			    FACE_CACHEL_FINDEX_UNSAFE (cachel, i));
 	  /* Tout le monde aime la recursion */
 	  ensure_face_cachel_contains_charset (oth, domain, charset);
+	  off2 = FACE_CACHEL_OFFSET_ENSURE (oth, charset);

-	  if (oth->font_specified[offs])
+	  if (Stynarr_at (oth->font_specified, off2))
 	    {
-	      new_val = oth->font[offs];
-	      cachel->font_specified[offs] = 1;
+	      new_val = Stynarr_at (oth->font, off2).value;
+	      Stynarr_at (cachel->font_specified, offs) = 1;
 	      break;
 	    }
 	}

-      if (!cachel->font_specified[offs])
+      if (!Stynarr_at (cachel->font_specified, offs))
 	/* need to do the default face. */
 	{
 	  struct face_cachel *oth =
 	    Dynarr_atp (w->face_cachels, DEFAULT_INDEX);
+	  int off2;
+
 	  ensure_face_cachel_contains_charset (oth, domain, charset);
+	  off2 = FACE_CACHEL_OFFSET_ENSURE (oth, charset);

-	  new_val = oth->font[offs];
+	  new_val = Stynarr_at (oth->font, off2).value;
 	}

-      if (!UNBOUNDP (cachel->font[offs]) && !EQ (cachel->font[offs],
new_val))
+      if (!UNBOUNDP (Stynarr_at (cachel->font, offs).value) &&
+	  !EQ (Stynarr_at (cachel->font, offs).value, new_val))
 	cachel->dirty = 1;
-      cachel->font_updated[offs] = 1;
-      cachel->font[offs] = new_val;
+      Stynarr_at (cachel->font_updated, offs) = 1;
+      Stynarr_at (cachel->font, offs).value = new_val;
       return new_val;
     }

＠＠ -1130,63 +1138,71 ＠＠
 						 ERROR_ME_DEBUG_WARN, 0,
 						 Qzero);
     }
-  if (!UNBOUNDP (cachel->font[offs]) && !EQ (new_val, cachel->font[offs]))
+  if (!UNBOUNDP (Stynarr_at (cachel->font, offs).value) &&
+      !EQ (new_val, Stynarr_at (cachel->font, offs).value))
     cachel->dirty = 1;
-  cachel->font_updated[offs] = 1;
-  cachel->font[offs] = new_val;
-  cachel->font_specified[offs] = (bound || EQ (face, Vdefault_face));
+  Stynarr_at (cachel->font_updated, offs) = 1;
+  Stynarr_at (cachel->font, offs).value = new_val;
+  Stynarr_at (cachel->font_specified, offs) =
+    (bound || EQ (face, Vdefault_face));
   return new_val;
 }

+static Lisp_Object_dynarr *face_charset_dynarr;
+
 /* Ensure that the given cachel contains updated fonts for all
-   the charsets specified. */
+   the given characters. */

 void
 ensure_face_cachel_complete (struct face_cachel *cachel,
-			     Lisp_Object domain, unsigned char *charsets)
+			     Lisp_Object domain, Ichar *ptr,
+			     Charcount len)
 {
   int i;

-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    if (charsets[i])
-      {
-	Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
-	assert (CHARSETP (charset));
-	ensure_face_cachel_contains_charset (cachel, domain, charset);
-      }
+  if (!face_charset_dynarr)
+    face_charset_dynarr = Dynarr_new (Lisp_Object);
+  Dynarr_reset (face_charset_dynarr);
+  find_charsets_in_ichar_string (face_charset_dynarr, ptr, len);
+  for (i = 0; i < Dynarr_length (face_charset_dynarr); i++)
+    ensure_face_cachel_contains_charset (cachel, domain,
+					 Dynarr_at (face_charset_dynarr, i));
 }

 void
-face_cachel_charset_font_metric_info (struct face_cachel *cachel,
-				      unsigned char *charsets,
-				      struct font_metric_info *fm)
+face_cachel_char_font_metric_info (struct face_cachel *cachel,
+				   Lisp_Object domain,
+				   Ichar *ptr, Charcount len,
+				   struct font_metric_info *fm)
 {
   int i;

+  ensure_face_cachel_complete (cachel, domain, ptr, len);
+
   fm->width = 1;
   fm->height = fm->ascent = 1;
   fm->descent = 0;
   fm->proportional_p = 0;

-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    {
-      if (charsets[i])
-	{
-	  Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
-	  Lisp_Object font_instance = FACE_CACHEL_FONT (cachel, charset);
-	  Lisp_Font_Instance *fi = XFONT_INSTANCE (font_instance);
-
-	  assert (CHARSETP (charset));
-	  assert (FONT_INSTANCEP (font_instance));
-
-	  if (fm->ascent  < (int) fi->ascent)  fm->ascent  = (int) fi->ascent;
-	  if (fm->descent < (int) fi->descent) fm->descent = (int) fi->descent;
-	  fm->height = fm->ascent + fm->descent;
-	  if (fi->proportional_p)
-	    fm->proportional_p = 1;
-	  if (EQ (charset, Vcharset_ascii))
-	    fm->width = fi->width;
-	}
+  if (!face_charset_dynarr)
+    face_charset_dynarr = Dynarr_new (Lisp_Object);
+  Dynarr_reset (face_charset_dynarr);
+  find_charsets_in_ichar_string (face_charset_dynarr, ptr, len);
+  for (i = 0; i < Dynarr_length (face_charset_dynarr); i++)
+    {
+      Lisp_Object charset = Dynarr_at (face_charset_dynarr, i);
+      Lisp_Object font_instance = FACE_CACHEL_FONT (cachel, charset);
+      Lisp_Font_Instance *fi = XFONT_INSTANCE (font_instance);
+
+      assert (FONT_INSTANCEP (font_instance));
+
+      if (fm->ascent  < (int) fi->ascent)  fm->ascent  = (int)
fi->ascent;
+      if (fm->descent < (int) fi->descent) fm->descent = (int)
fi->descent;
+      fm->height = fm->ascent + fm->descent;
+      if (fi->proportional_p)
+	fm->proportional_p = 1;
+      if (EQ (charset, Vcharset_ascii))
+	fm->width = fi->width;
     }
 }

＠＠ -1251,8 +1267,7 ＠＠
   if (must_finish_frobbing)
     {
       int default_face = EQ (face, Vdefault_face);
-      struct face_cachel *cachel
-	= Dynarr_atp (w->face_cachels, Dynarr_length (w->face_cachels) - 1);
+      struct face_cachel *cachel = Dynarr_lastp (w->face_cachels);

       FROB (background_pixmap);
       MAYBE_UNFROB_BACKGROUND_PIXMAP;
＠＠ -1394,12 +1409,17 ＠＠
   FROB (blinking);
   /* And do ASCII, of course. */
   {
-    int offs = LEADING_BYTE_ASCII - MIN_LEADING_BYTE;
-
-    if (!cachel->font_specified[offs] && FINDEX_FIELD (font_specified[offs]))
+    int off1 = FACE_CACHEL_OFFSET_ENSURE (cachel, Vcharset_ascii);
+    int off2 = FACE_CACHEL_OFFSET (Dynarr_atp (w->face_cachels, findex),
+				   Vcharset_ascii);
+    int spec1 = Stynarr_at (cachel->font_specified, off1);
+    int spec2 = off2 >= 0 && Stynarr_at (Dynarr_atp (w->face_cachels,
findex)
+					 ->font_specified, off2);
+    if (!spec1 && spec2)
       {
-	cachel->font[offs] = FINDEX_FIELD (font[offs]);
-	cachel->font_specified[offs] = 1;
+	Stynarr_at (cachel->font, off1).value =
+	  Stynarr_at (Dynarr_atp (w->face_cachels, findex)->font, off2).value;
+	Stynarr_at (cachel->font_specified, off1) = 1;
 	cachel->dirty = 1;
       }
   }
＠＠ -1417,16 +1437,12 ＠＠
 {
   xzero (*cachel);
   cachel->face = Qunbound;
-  cachel->nfaces = 0;
-  cachel->merged_faces = 0;
+  Stynarr_init (cachel->merged_faces);
+  Stynarr_init (cachel->font);
+  Stynarr_init (cachel->font_specified);
+  Stynarr_init (cachel->font_updated);
   cachel->foreground = Qunbound;
   cachel->background = Qunbound;
-  {
-    int i;
-
-    for (i = 0; i < NUM_LEADING_BYTES; i++)
-      cachel->font[i] = Qunbound;
-  }
   cachel->display_table = Qunbound;
   cachel->background_pixmap = Qunbound;
 }
＠＠ -1473,8 +1489,10 ＠＠
       for (i = 0; i < Dynarr_length (w->face_cachels); i++)
 	{
 	  struct face_cachel *cachel = Dynarr_atp (w->face_cachels, i);
-	  if (cachel->merged_faces)
-	    Dynarr_free (cachel->merged_faces);
+	  Stynarr_free (cachel->merged_faces);
+	  Stynarr_free (cachel->font);
+	  Stynarr_free (cachel->font_specified);
+	  Stynarr_free (cachel->font_updated);
 	}
       Dynarr_reset (w->face_cachels);
       get_builtin_face_cache_index (w, Vdefault_face);
＠＠ -1503,8 +1521,8 ＠＠
       int i;

       cachel->updated = 0;
-      for (i = 0; i < NUM_LEADING_BYTES; i++)
-	cachel->font_updated[i] = 0;
+      for (i = 0; i < Stynarr_length (cachel->font_updated); i++)
+	Stynarr_at (cachel->font_updated, i) = 0;
     }
 }

＠＠ -1523,9 +1541,10 ＠＠
       total += Dynarr_memory_usage (face_cachels, ovstats);
       for (i = 0; i < Dynarr_length (face_cachels); i++)
 	{
-	  int_dynarr *merged = Dynarr_at (face_cachels, i).merged_faces;
-	  if (merged)
-	    total += Dynarr_memory_usage (merged, ovstats);
+	  struct face_cachel *cachel = Dynarr_atp (face_cachels, i);
+	  /* #### Hack; look inside of the Stynarr struct */
+	  if (cachel->merged_faces.els)
+	    total += Dynarr_memory_usage (cachel->merged_faces.els, ovstats);
 	}
     }

＠＠ -1555,10 +1574,10 ＠＠
   int i;

   if (!EQ (cachel1->face, cachel2->face)
-      || cachel1->nfaces != cachel2->nfaces)
+      || FACE_CACHEL_NFACES (cachel1) != FACE_CACHEL_NFACES (cachel2))
     return 0;

-  for (i = 0; i < cachel1->nfaces; i++)
+  for (i = 0; i < FACE_CACHEL_NFACES (cachel1); i++)
     if (FACE_CACHEL_FINDEX_UNSAFE (cachel1, i)
 	!= FACE_CACHEL_FINDEX_UNSAFE (cachel2, i))
       return 0;
＠＠ -1621,14 +1640,13 ＠＠
       for (i = len - 1; i >= 0; i--)
 	{
 	  EXTENT current = Dynarr_at (ef->extents, i);
-	  int has_findex = 0;
 	  Lisp_Object face = extent_face (current);

 	  if (FACEP (face))
 	    {
 	      findex = get_builtin_face_cache_index (w, face);
-	      has_findex = 1;
 	      merge_face_cachel_data (w, findex, &cachel);
+	      Stynarr_add (cachel.merged_faces, findex);
 	    }
 	  /* remember, we're called from within redisplay
 	     so we can't error. */
＠＠ -1639,34 +1657,10 ＠＠
 		{
 		  findex = get_builtin_face_cache_index (w, one_face);
 		  merge_face_cachel_data (w, findex, &cachel);
-
-		  /* code duplication here but there's no clean
-		     way to avoid it. */
-		  if (cachel.nfaces >= NUM_STATIC_CACHEL_FACES)
-		    {
-		      if (!cachel.merged_faces)
-			cachel.merged_faces = Dynarr_new (int);
-		      Dynarr_add (cachel.merged_faces, findex);
-		    }
-		  else
-		    cachel.merged_faces_static[cachel.nfaces] = findex;
-		  cachel.nfaces++;
+		  Stynarr_add (cachel.merged_faces, findex);
 		}
 	      face = XCDR (face);
 	    }
-
-	  if (has_findex)
-	    {
-	      if (cachel.nfaces >= NUM_STATIC_CACHEL_FACES)
-		{
-		  if (!cachel.merged_faces)
-		    cachel.merged_faces = Dynarr_new (int);
-		  Dynarr_add (cachel.merged_faces, findex);
-		}
-	      else
-		cachel.merged_faces_static[cachel.nfaces] = findex;
-	      cachel.nfaces++;
-	    }
 	}

       /* Now finally merge in the default face. */
＠＠ -1674,13 +1668,16 ＠＠
       merge_face_cachel_data (w, findex, &cachel);

       findex = get_merged_face_cache_index (w, &cachel);
-      if (cachel.merged_faces &&
+      /* #### Hack; look inside of the Stynarr struct */
+      if (cachel.merged_faces.els &&
 	  /* merged_faces did not get stored and available via return value */
-	  Dynarr_at (w->face_cachels, findex).merged_faces !=
-	  cachel.merged_faces)
+	  Dynarr_at (w->face_cachels, findex).merged_faces.els !=
+	  cachel.merged_faces.els)
 	{
-	  Dynarr_free (cachel.merged_faces);
-	  cachel.merged_faces = 0;
+	  Stynarr_free (cachel.merged_faces);
+	  Stynarr_free (cachel.font);
+	  Stynarr_free (cachel.font_specified);
+	  Stynarr_free (cachel.font_updated);
 	}
       return findex;
     }
Index: src/faces.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/faces.h,v
retrieving revision 1.14
diff -u -r1.14 faces.h
--- src/faces.h	2005/10/24 10:07:36	1.14
+++ src/faces.h	2005/11/22 14:00:39
＠＠ -1,6 +1,6 ＠＠
 /* Face data structures.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
-   Copyright (C) 1995, 2002 Ben Wing
+   Copyright (C) 1995, 2002, 2005 Ben Wing

 This file is part of XEmacs.

＠＠ -24,8 +24,6 ＠＠
 #ifndef INCLUDED_faces_h_
 #define INCLUDED_faces_h_

-#include "charset.h" /* for NUM_LEADING_BYTES */
-
 /* a Lisp_Face is the C object corresponding to a face.  There is one
    of these per face.  It basically contains all of the specifiers for
    the built-in face properties, plus the plist of user-specified
＠＠ -142,9 +140,8 ＠＠

      The order of the faces here is decreasing extent priority. */
   Lisp_Object face;
-  int merged_faces_static[NUM_STATIC_CACHEL_FACES];
-  int_dynarr *merged_faces;
-  int nfaces;
+
+  Stynarr_declare (merged_faces, int, NUM_STATIC_CACHEL_FACES);

   /* The values stored here are computed by calling specifier_instance()
      on the appropriate specifiers.  This means that we will have either
＠＠ -156,14 +153,15 ＠＠

   Lisp_Object foreground;
   Lisp_Object background;
-  /* There are currently 128 or 129 possible charsets under Mule.  For the
-     moment we just take the easy way out and allocate space for each
-     of them.  This avoids messing with Dynarrs.
-
-     #### We should look into this and probably clean it up
-     to use Dynarrs.  This may be a big space hog as is. */
-  Lisp_Object font[NUM_LEADING_BYTES];

+  /* Static dynarr, mapping charsets to font objects.  Note: The use of an
+     unordered list like this make lookup O(n), potentially slower than a
+     hash table; but the associated constant will be very small, and it's
+     rare that there will be very many charsets associated with this cachel
+     (at most 128 in the previous scheme, and many fewer in practice, often
+     only one). */
+  Stynarr_declare (font, Lisp_Object_pair, NUM_STATIC_CACHEL_FACES);
+
   Lisp_Object display_table;
   Lisp_Object background_pixmap;

＠＠ -176,8 +174,7 ＠＠

   /* Used when merging to tell if the above field represents an actual
      value of this face or a fallback value. */
-  /* #### Of course we should use a bit array or something. */
-  unsigned char font_specified[NUM_LEADING_BYTES];
+  Stynarr_declare (font_specified, unsigned_char, NUM_STATIC_CACHEL_FACES);
   unsigned int foreground_specified :1;
   unsigned int background_specified :1;
   unsigned int display_table_specified :1;
＠＠ -218,8 +215,7 ＠＠
      storing a "blank font" if the instantiation fails. */
   unsigned int dirty :1;
   unsigned int updated :1;
-  /* #### Of course we should use a bit array or something. */
-  unsigned char font_updated[NUM_LEADING_BYTES];
+  Stynarr_declare (font_updated, unsigned_char, NUM_STATIC_CACHEL_FACES);
 };

 DECLARE_LRECORD (face, Lisp_Face);
＠＠ -232,14 +228,15 ＠＠
 						 Lisp_Object domain,
 						 Lisp_Object charset);
 void ensure_face_cachel_complete (struct face_cachel *cachel,
-				  Lisp_Object domain,
-				  unsigned char *charsets);
+				  Lisp_Object domain, Ichar *ptr,
+				  Charcount len);
 void update_face_cachel_data (struct face_cachel *cachel,
 			      Lisp_Object domain,
 			      Lisp_Object face);
-void face_cachel_charset_font_metric_info (struct face_cachel *cachel,
-					   unsigned char *charsets,
-					   struct font_metric_info *fm);
+void face_cachel_char_font_metric_info (struct face_cachel *cachel,
+					Lisp_Object domain,
+					Ichar *ptr, Charcount len,
+					struct font_metric_info *fm);
 void mark_face_cachels (face_cachel_dynarr *elements);
 void mark_face_cachels_as_clean (struct window *w);
 void mark_face_cachels_as_not_updated (struct window *w);
＠＠ -277,47 +274,103 ＠＠
 				    int *height, int *width);
 void default_face_height_and_width_1 (Lisp_Object domain,
 				      int *height, int *width);
+
+/* Return the font corresponding to CHARSET in CACHEL, or Qunbound if not
+   found.  Could just use FACE_CACHEL_OFFSET() below. */

-#define FACE_CACHEL_FONT(cachel, charset) \
-  (cachel->font[XCHARSET_LEADING_BYTE (charset) - MIN_LEADING_BYTE])
+DECLARE_INLINE_HEADER (
+Lisp_Object
+FACE_CACHEL_FONT (struct face_cachel *cachel, Lisp_Object charset)
+)
+{
+  int i;
+  for (i = 0; i < Stynarr_length (cachel->font); i++)
+    {
+      Lisp_Object_pair *el = Stynarr_atp (cachel->font, i);
+      if (EQ (el->key, charset))
+	return el->value;
+    }
+  return Qunbound;
+}
+
+/* Return the offset corresponding to CHARSET in CACHEL, or -1 if not
+   found. */
+
+DECLARE_INLINE_HEADER (
+int
+FACE_CACHEL_OFFSET (struct face_cachel *cachel, Lisp_Object charset)
+)
+{
+  int i;
+  for (i = 0; i < Stynarr_length (cachel->font); i++)
+    {
+      Lisp_Object_pair *el = Stynarr_atp (cachel->font, i);
+      if (EQ (el->key, charset))
+	return i;
+    }
+  return -1;
+}
+
+/* Return the offset corresponding to CHARSET in CACHEL; if necessary, add
+   the charset to cachel, with font value of Qunbound. */
+
+DECLARE_INLINE_HEADER (
+int
+FACE_CACHEL_OFFSET_ENSURE (struct face_cachel *cachel, Lisp_Object charset)
+)
+{
+  int i = FACE_CACHEL_OFFSET (cachel, charset);
+  if (i == -1)
+    {
+      Lisp_Object_pair lop;
+      lop.key = charset;
+      lop.value = Qunbound;
+      i = Stynarr_length (cachel->font);
+      Stynarr_add (cachel->font, lop);
+      Stynarr_add (cachel->font_specified, 0);
+      Stynarr_add (cachel->font_updated, 0);
+    }
+  return i;
+}

 #define WINDOW_FACE_CACHEL(window, index) \
   Dynarr_atp ((window)->face_cachels, index)

-#define FACE_CACHEL_FINDEX_UNSAFE(cachel, offset)			     \
-  ((offset) < NUM_STATIC_CACHEL_FACES					     \
-   ? (cachel)->merged_faces_static[offset]				     \
-: Dynarr_at ((cachel)->merged_faces, (offset) - NUM_STATIC_CACHEL_FACES))
+/* UNSAFE because it evaluates OFFSET and (CACHEL) multiply */
+#define FACE_CACHEL_FINDEX_UNSAFE(cachel, offset)                       \
+  Stynarr_at (cachel->merged_faces, offset)
+#define FACE_CACHEL_NFACES(cachel)                                      \
+  Stynarr_length (cachel->merged_faces)

-#define WINDOW_FACE_CACHEL_FACE(window, index)				\
+#define WINDOW_FACE_CACHEL_FACE(window, index)                          \
   (WINDOW_FACE_CACHEL (window, index)->face)
-#define WINDOW_FACE_CACHEL_FOREGROUND(window, index)			\
+#define WINDOW_FACE_CACHEL_FOREGROUND(window, index)                    \
   (WINDOW_FACE_CACHEL (window, index)->foreground)
-#define WINDOW_FACE_CACHEL_BACKGROUND(window, index)			\
+#define WINDOW_FACE_CACHEL_BACKGROUND(window, index)                    \
   (WINDOW_FACE_CACHEL (window, index)->background)
 /* #### This can be referenced by various functions,
    but face_cachels isn't initialized for the stream device.
    Since it doesn't need the value we just return nil here to avoid
    blowing up in multiple places. */
-#define WINDOW_FACE_CACHEL_FONT(window, index, charset)			\
-  ((window)->face_cachels					\
-   ? FACE_CACHEL_FONT (WINDOW_FACE_CACHEL (window, index), charset)	\
+#define WINDOW_FACE_CACHEL_FONT(window, index, charset)                 \
+  ((window)->face_cachels                                               \
+   ? FACE_CACHEL_FONT (WINDOW_FACE_CACHEL (window, index), charset)     \
    : Qnil)
-#define WINDOW_FACE_CACHEL_DISPLAY_TABLE(window, index)			\
+#define WINDOW_FACE_CACHEL_DISPLAY_TABLE(window, index)                 \
   (WINDOW_FACE_CACHEL (window, index)->display_table)
-#define WINDOW_FACE_CACHEL_BACKGROUND_PIXMAP(window, index)		\
+#define WINDOW_FACE_CACHEL_BACKGROUND_PIXMAP(window, index)             \
   (WINDOW_FACE_CACHEL (window, index)->background_pixmap)
-#define WINDOW_FACE_CACHEL_DIRTY(window, index)				\
+#define WINDOW_FACE_CACHEL_DIRTY(window, index)                         \
   (WINDOW_FACE_CACHEL (window, index)->dirty)
-#define WINDOW_FACE_CACHEL_UNDERLINE_P(window, index)			\
+#define WINDOW_FACE_CACHEL_UNDERLINE_P(window, index)                   \
   (WINDOW_FACE_CACHEL (window, index)->underline)
-#define WINDOW_FACE_CACHEL_HIGHLIGHT_P(window, index)			\
+#define WINDOW_FACE_CACHEL_HIGHLIGHT_P(window, index)                   \
   (WINDOW_FACE_CACHEL (window, index)->highlight)
-#define WINDOW_FACE_CACHEL_DIM_P(window, index)				\
+#define WINDOW_FACE_CACHEL_DIM_P(window, index)                         \
   (WINDOW_FACE_CACHEL (window, index)->dim)
-#define WINDOW_FACE_CACHEL_BLINKING_P(window, index)			\
+#define WINDOW_FACE_CACHEL_BLINKING_P(window, index)                    \
   (WINDOW_FACE_CACHEL (window, index)->blinking)
-#define WINDOW_FACE_CACHEL_REVERSE_P(window, index)			\
+#define WINDOW_FACE_CACHEL_REVERSE_P(window, index)                     \
   (WINDOW_FACE_CACHEL (window, index)->reverse)

 #define FACE_PROPERTY_SPECIFIER(face, property) Fget (face, property, Qnil)
Index: src/file-coding.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/file-coding.c,v
retrieving revision 1.49
diff -u -r1.49 file-coding.c
--- src/file-coding.c	2005/10/25 11:16:24	1.49
+++ src/file-coding.c	2005/11/22 14:00:40
＠＠ -285,8 +285,7 ＠＠
 {
   Lisp_Coding_System *c = XCODING_SYSTEM (obj);
   if (print_readably)
-    printing_unreadable_object
-      ("printing unreadable object #<coding-system 0x%x>",
c->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

   write_fmt_string_lisp (printcharfun, "#<coding-system %s ", 1,
c->name);
   print_coding_system_properties (obj, printcharfun);
＠＠ -1059,6 +1058,10 ＠＠
      Convert CRLF sequences or CR to LF.
 `shift-jis'
      Shift-JIS (a Japanese encoding commonly used in PC operating systems).
+<<<<<<< file-coding.c
+`mbcs'
+     An encoding that directly encodes the indices of one or more charsets
+     with one or two bytes.
 `unicode'
      Any Unicode encoding (UCS-4, UTF-8, UTF-16, etc.).
 `mswindows-unicode-to-multibyte'
＠＠ -1079,7 +1082,7 ＠＠
 `ccl'
      The conversion is performed using a user-written pseudo-code
      program.  CCL (Code Conversion Language) is the name of this
-     pseudo-code.
+     pseudo-code.  Not available when (featurep 'unicode-internal).
 `gzip'
      GZIP compression format.
 `internal'
＠＠ -1261,6 +1264,12 ＠＠
      or Control-1 character sets; this is explicitly disallowed by the
      ISO2022 standard.

+`iso2022-preserve'

+     If non-nil, preserve round-trip conversion even when Unicode is used
+     as an internal representation, by using private characters from the
+     Unicode space.  WARNING: This will make such characters unusable for
+     normal editing purposes.
+
 `input-charset-conversion'
      A list of conversion specifications, specifying conversion of
      characters in one charset to another when decoding is performed.
＠＠ -1321,6 +1330,12 ＠＠
      be used with UTF-8.  That is the term used in the standard. ]]

+The following additional property is recognized if TYPE is `mbcs':
+
+`charsets'
+     List of charsets encoded using this coding system.
+
+
 The following additional properties are recognized if TYPE is
 `mswindows-multibyte':

＠＠ -1345,7 +1360,7 ＠＠

 The following additional properties are recognized if TYPE is `undecided':
-[[ Doesn't GNU use \"detect-*\" for the following two? ]]
+\[[ Doesn't GNU use \"detect-*\" for the following two? ]]

 `do-eol'
      Do EOL detection.
＠＠ -1949,7 +1964,8 ＠＠
   struct coding_stream *str = CODING_STREAM_DATA (stream);
   MAYBE_XCODESYSMETH (str->codesys, rewind_coding_stream, (str));

-  str->ch = 0;
+  str->ch = -1;
+  str->pind_remaining = 0;
   Dynarr_reset (str->convert_to);
   Dynarr_reset (str->convert_from);
   return Lstream_rewind (str->other_end);
＠＠ -2064,6 +2080,7 ＠＠
     }
   str->orig_codesys = codesys;
   str->codesys = coding_system_real_canonical (codesys);
+  str->ch = -1;

   if (str->data)
     {
＠＠ -2102,6 +2119,7 ＠＠
   xzero (*str);
   str->codesys = Qnil;
   str->orig_codesys = Qnil;
+  str->ch = -1;
   str->us = lstr;
   str->other_end = stream;
   str->convert_to = Dynarr_new (unsigned_char);
＠＠ -2333,7 +2351,7 ＠＠
 static const struct memory_description chain_coding_system_description[] = {
   { XD_INT, offsetof (struct chain_coding_system, count) },
   { XD_BLOCK_PTR, offsetof (struct chain_coding_system, chain),
-    XD_INDIRECT (0, 0), { &lisp_object_description } },
+    XD_INDIRECT (0, 0), { &Lisp_Object_description } },
   { XD_LISP_OBJECT, offsetof (struct chain_coding_system,
 			      canonicalize_after_coding) },
   { XD_END }
＠＠ -2342,7 +2360,7 ＠＠
 static const struct memory_description chain_coding_stream_description_1 [] = {
   { XD_INT, offsetof (struct chain_coding_stream, lstream_count) },
   { XD_BLOCK_PTR, offsetof (struct chain_coding_stream, lstreams),
-    XD_INDIRECT (0, 0), { &lisp_object_description } },
+    XD_INDIRECT (0, 0), { &Lisp_Object_description } },
   { XD_END }
 };

＠＠ -2709,7 +2727,6 ＠＠
 		       unsigned_char_dynarr *dst, Bytecount n)
 {
   UExtbyte c;
-  unsigned int ch     = str->ch;
   Bytecount orign = n;

   if (str->direction == CODING_DECODE)
＠＠ -2720,9 +2737,6 ＠＠

 	  DECODE_ADD_BINARY_CHAR (c, dst);
 	}
-
-      if (str->eof)
-	DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
     }
   else
     {
＠＠ -2731,40 +2745,26 ＠＠
 	{
 	  c = *src++;
 	  if (byte_ascii_p (c))
-	    {
-	      assert (ch == 0);
-	      Dynarr_add (dst, c);
-	    }
+	    Dynarr_add (dst, c);
 #ifdef MULE
-	  else if (ibyte_leading_byte_p (c))
-	    {
-	      assert (ch == 0);
-	      if (c == LEADING_BYTE_LATIN_ISO8859_1 ||
-		  c == LEADING_BYTE_CONTROL_1)
-		ch = c;
-	      else
-		/* #### This is just plain unacceptable. */
-		Dynarr_add (dst, '~'); /* untranslatable character */
-	    }
 	  else
 	    {
-	      if (ch == LEADING_BYTE_LATIN_ISO8859_1)
-		Dynarr_add (dst, c);
-	      else if (ch == LEADING_BYTE_CONTROL_1)
+	      COPY_PARTIAL_CHAR_BYTE (c, str);
+	      if (!str->pind_remaining)
 		{
-		  assert (c < 0xC0);
-		  Dynarr_add (dst, c - 0x20);
+		  Ichar ch = non_ascii_itext_ichar (str->partial);
+		  if (ch < 256)
+		    Dynarr_add (dst, (unsigned char) ch);
+		  else
+		    /* #### This is just plain unacceptable. */
+		    /* untranslatable character */
+		    Dynarr_add (dst, CANT_CONVERT_CHAR_WHEN_ENCODING);
 		}
-	      /* else it should be the second or third byte of an
-		 untranslatable character, so ignore it */
-	      ch = 0;
 	    }
 #endif /* MULE */
-
 	}
     }

-  str->ch    = ch;
   return orign;
 }

Index: src/file-coding.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/file-coding.h,v
retrieving revision 1.27
diff -u -r1.27 file-coding.h
--- src/file-coding.h	2005/10/24 10:07:37	1.27
+++ src/file-coding.h	2005/11/22 14:00:40
＠＠ -2,7 +2,7 ＠＠
    #### rename me to coding-system.h
    Copyright (C) 1991, 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2000, 2001, 2002 Ben Wing.
+   Copyright (C) 2000, 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -230,7 +230,8 ＠＠
   ccl_coding_system,
   shift_jis_coding_system,
   big5_coding_system,
-  unicode_coding_system
+  unicode_coding_system,
+  mbcs_coding_system
 };

 struct coding_system_methods
＠＠ -640,6 +641,11 ＠＠

 #define MAX_BYTES_PROCESSED_FOR_DETECTION 65536

+/* ~~#### Must be a Lisp object because some detection states need a pointer
+   back to the stream in which this detection is operating. (Maybe just move
+   the stream up and make it a general property?  That way we could avoid,
+   for the moment, at least, any need to have mark methods for the various
+   detector parts). */
 struct detection_state
 {
   int seen_non_ascii;
＠＠ -892,14 +898,19 ＠＠
      data. */
   unsigned_char_dynarr *convert_from;

-  /* If set, this is the last chunk of data being processed.  When this is
-     finished, output any necessary terminating control characters, escape
-     sequences, etc. */
-  unsigned int eof:1;
-  
-  /* CH holds a partially built-up character.  This is really part of the
-     state-dependent data and should be moved there. */
-  unsigned int ch;
+  /* Hold a partially built-up character.  This is in some respects part
+     of the state-dependent data, but it is used in all coding methods. */
+  Ibyte partial[MAX_ICHAR_LEN];
+
+  /* Index into partially built-up character. */
+  int pind;
+
+  /* Number of bytes remaining to be built up in partially built-up char. */
+  int pind_remaining;
+
+  /* CH holds a partially built-up character, or -1 for none.  This is
+     really part of the state-dependent data and should be moved there. */
+  int ch;

   /* Coding-system-specific data holding extra state about the
      conversion.  Logically a struct TYPE_coding_stream; a pointer
＠＠ -914,6 +925,11 ＠＠

   enum encode_decode direction;

+  /* If set, this is the last chunk of data being processed.  When this is
+     finished, output any necessary terminating control characters, escape
+     sequences, etc. */
+  unsigned int eof:1;
+
   /* If set, don't close the stream at the other end when being closed. */
   unsigned int no_close_other:1;
   /* If set, read only one byte at a time from other end to avoid any
＠＠ -937,50 +953,18 ＠＠
 # define CODING_STREAM_TYPE_DATA(s, type) \
   ((struct type##_coding_stream *) (s)->data)
 #endif
-
-/* C should be a binary character in the range 0 - 255; convert
-   to internal format and add to Dynarr DST. */
-
-#ifdef MULE

-#define DECODE_ADD_BINARY_CHAR(c, dst)			\
-do {							\
-  if (byte_ascii_p (c))					\
-    Dynarr_add (dst, c);				\
-  else if (byte_c1_p (c))				\
-    {							\
-      Dynarr_add (dst, LEADING_BYTE_CONTROL_1);		\
-      Dynarr_add (dst, c + 0x20);			\
-    }							\
-  else							\
-    {							\
-      Dynarr_add (dst, LEADING_BYTE_LATIN_ISO8859_1);	\
-      Dynarr_add (dst, c);				\
-    }							\
-} while (0)
-
-#else /* not MULE */
-
-#define DECODE_ADD_BINARY_CHAR(c, dst)		\
+#define DECODE_OUTPUT_PARTIAL_CHAR(str, dst)	\
 do {						\
-  Dynarr_add (dst, c);				\
-} while (0)
-
-#endif /* MULE */
-
-#define DECODE_OUTPUT_PARTIAL_CHAR(ch, dst)	\
-do {						\
-  if (ch)					\
+  if (str->eof && str->ch >= 0)			\
     {						\
-      DECODE_ADD_BINARY_CHAR (ch, dst);		\
-      ch = 0;					\
+      DECODE_ADD_BINARY_CHAR (str->ch, dst);	\
+      str->ch = -1;				\
     }						\
 } while (0)

 #ifdef MULE
-/* Convert shift-JIS code (sj1, sj2) into internal string
-   representation (c1, c2). (The leading byte is assumed.) */
-
+/* Convert shift-JIS code (sj1, sj2) into JISX0208 position codes (c1, c2). */
 #define DECODE_SHIFT_JIS(sj1, sj2, c1, c2)		\
 do {							\
   int I1 = sj1, I2 = sj2;				\
＠＠ -990,15 +974,15 ＠＠
   else							\
     c1 = (I1 << 1) - ((I1 >= 0xe0) ? 0xe1 : 0x61),	\
     c2 = I2 + ((I2 >= 0x7f) ? 0x60 : 0x61);		\
+  c1 -= 0x80; c2 -= 0x80;				\
 } while (0)

-/* Convert the internal string representation of a Shift-JIS character
-   (c1, c2) into Shift-JIS code (sj1, sj2).  The leading byte is
-   assumed. */
+/* Convert the JISX0208 position codes (c1, c2) into Shift-JIS code
+   (sj1, sj2). */

 #define ENCODE_SHIFT_JIS(c1, c2, sj1, sj2)		\
 do {							\
-  int I1 = c1, I2 = c2;					\
+  int I1 = c1 + 0x80, I2 = c2 + 0x80;			\
   if (I1 & 1)						\
     sj1 = (I1 >> 1) + ((I1 < 0xdf) ? 0x31 : 0x71),	\
     sj2 = I2 - ((I2 >= 0xe0) ? 0x60 : 0x61);		\
＠＠ -1008,6 +992,24 ＠＠
 } while (0)
 #endif /* MULE */

+/* Copy the byte C in string representation into the accumulated partial
+   character in coding_stream STR. */
+#define COPY_PARTIAL_CHAR_BYTE(c, str)				\
+do {								\
+  if (ibyte_first_byte_p (c))					\
+    {								\
+      str->partial[0] = c;					\
+      str->pind = 1;						\
+      str->pind_remaining = rep_bytes_by_first_byte (c) - 1;	\
+    }								\
+  else								\
+    {								\
+      str->partial[str->pind++] = c;				\
+      str->pind_remaining--;					\
+    }								\
+ }                                                              \
+while (0)
+
 DECLARE_CODING_SYSTEM_TYPE (no_conversion);
 DECLARE_CODING_SYSTEM_TYPE (convert_eol);
 #if 0
＠＠ -1021,8 +1023,11 ＠＠
 #endif

 #ifdef MULE
+DECLARE_CODING_SYSTEM_TYPE (mbcs);
 DECLARE_CODING_SYSTEM_TYPE (iso2022);
+#ifdef HAVE_CCL
 DECLARE_CODING_SYSTEM_TYPE (ccl);
+#endif /* HAVE_CCL */
 DECLARE_CODING_SYSTEM_TYPE (shift_jis);
 DECLARE_CODING_SYSTEM_TYPE (big5);
 #endif
＠＠ -1043,7 +1048,8 ＠＠
 void set_coding_stream_coding_system (Lstream *stream,
 				      Lisp_Object codesys);
 Lisp_Object detect_coding_stream (Lisp_Object stream);
-Ichar decode_big5_char (int o1, int o2);
+void big5_char_to_fake_codepoint (int b1, int b2, Lisp_Object *charset,
+				  int *c1, int *c2);
 void add_entry_to_coding_system_type_list (struct coding_system_methods *m);
 Lisp_Object make_internal_coding_system (Lisp_Object existing,
 					 Ascbyte *prefix,
Index: src/fns.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/fns.c,v
retrieving revision 1.64
diff -u -r1.64 fns.c
--- src/fns.c	2005/10/25 11:16:24	1.64
+++ src/fns.c	2005/11/22 14:00:41
＠＠ -1,6 +1,6 ＠＠
 /* Random utility Lisp functions.
    Copyright (C) 1985, 86, 87, 93, 94, 95 Free Software Foundation, Inc.
-   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -41,6 +41,7 ＠＠

 #include "buffer.h"
 #include "bytecode.h"
+#include "casetab.h"
 #include "device.h"
 #include "events.h"
 #include "extents.h"
＠＠ -359,22 +360,16 ＠＠

 DEFUN ("string-lessp", Fstring_lessp, 2, 2, 0, /*
 Return t if first arg string is less than second in lexicographic order.
-Comparison is simply done on a character-by-character basis using the
-numeric value of a character. (Note that this may not produce
-particularly meaningful results under Mule if characters from
-different charsets are being compared.)
-
-Symbols are also allowed; their print names are used instead.
-
-Currently we don't do proper language-specific collation or handle
-multiple character sets.  This may be changed when Unicode support
-is implemented.
+Under old-Mule, comparison of chars within a charset is will-defined but
+comparison of chars from different charsets is produce well-defined
+results; however, ASCII, Control-1 and Latin-1 are guaranteed to compare in
+the expected fashion and will be below all other charsets.  Under
+Unicode-internal, comparison will happen according to the Unicode codepoint
+assigned to each character.
 */
        (string1, string2))
 {
   Lisp_Object p1, p2;
-  Charcount end, len2;
-  int i;

   if (SYMBOLP (string1))
     p1 = XSYMBOL (string1)->name;
＠＠ -392,33 +387,17 ＠＠
       p2 = string2;
     }

-  end  = string_char_length (p1);
-  len2 = string_char_length (p2);
-  if (end > len2)
-    end = len2;
-
-  {
-    Ibyte *ptr1 = XSTRING_DATA (p1);
-    Ibyte *ptr2 = XSTRING_DATA (p2);
-
-    /* #### It is not really necessary to do this: We could compare
-       byte-by-byte and still get a reasonable comparison, since this
-       would compare characters with a charset in the same way.  With
-       a little rearrangement of the leading bytes, we could make most
-       inter-charset comparisons work out the same, too; even if some
-       don't, this is not a big deal because inter-charset comparisons
-       aren't really well-defined anyway. */
-    for (i = 0; i < end; i++)
-      {
-	if (itext_ichar (ptr1) != itext_ichar (ptr2))
-	  return itext_ichar (ptr1) < itext_ichar (ptr2) ? Qt : Qnil;
-	INC_IBYTEPTR (ptr1);
-	INC_IBYTEPTR (ptr2);
-      }
-  }
-  /* Can't do i < len2 because then comparison between "foo" and
"foo^＠"
-     won't work right in I18N2 case */
-  return end < len2 ? Qt : Qnil;
+  /* Since we've assigned Control-1 and Latin-1 the two lowest leading
+     bytes, the statement above about them and their ordering w.r.t.  other
+     charsets is guaranteed.  Also, UTF-8 preserves Unicode character order
+     when comparing byte-by-byte.  So need no to do an actual char-by-char
+     comparison. */
+
+  if (qxememcmp4 (XSTRING_DATA (p1), XSTRING_LENGTH (p1),
+		  XSTRING_DATA (p2), XSTRING_LENGTH (p2)) < 0)
+    return Qt;
+  else
+    return Qnil;
 }

 DEFUN ("string-modified-tick", Fstring_modified_tick, 1, 1, 0, /*
Index: src/font-lock.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/font-lock.c,v
retrieving revision 1.16
diff -u -r1.16 font-lock.c
--- src/font-lock.c	2005/01/24 23:33:55	1.16
+++ src/font-lock.c	2005/11/22 14:00:41
＠＠ -561,6 +561,8 ＠＠
 	      }
             else if (context_cache.context == context_none)
 	      {
+		/* Calling syntax_match() in the syntax table, not the mirror
+		   table, is correct; see syntax_match() */
 		Lisp_Object stringtermobj =
 		  syntax_match (scache->syntax_table, c);
 		Ichar stringterm;
Index: src/frame-impl.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/frame-impl.h,v
retrieving revision 1.6
diff -u -r1.6 frame-impl.h
--- src/frame-impl.h	2005/10/24 10:07:37	1.6
+++ src/frame-impl.h	2005/11/22 14:00:41
＠＠ -112,6 +112,8 ＠＠
   display_line_dynarr *current_display_lines[4];
   display_line_dynarr *desired_display_lines[4];

+  /* ~~#### Instead of doing this, attach this data to the end of the same
+     structure; avoids the need to create new TTY, X, etc. Lisp objects */
   /* A structure of auxiliary data specific to the device type.  For
      example, struct x_frame is for X window frames; defined in
      console-x-impl.h. */
Index: src/frame-x.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/frame-x.c,v
retrieving revision 1.70
diff -u -r1.70 frame-x.c
--- src/frame-x.c	2005/04/27 09:01:48	1.70
+++ src/frame-x.c	2005/11/22 14:00:42
＠＠ -29,6 +29,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "charset.h" /* for Vcharset_ascii */
 #include "device-impl.h"
 #include "events.h"
 #include "extents.h"
Index: src/frame.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/frame.c,v
retrieving revision 1.73
diff -u -r1.73 frame.c
--- src/frame.c	2005/10/25 11:16:24	1.73
+++ src/frame.c	2005/11/22 14:00:43
＠＠ -241,8 +241,7 ＠＠
   struct frame *frm = XFRAME (obj);

   if (print_readably)
-    printing_unreadable_object ("#<frame %s 0x%x>",
-				XSTRING_DATA (frm->name), frm->header.uid);
+    printing_unreadable_lcrecord (obj, XSTRING_DATA (frm->name));

   write_fmt_string (printcharfun, "#<%s-frame ", !FRAME_LIVE_P (frm) ?
"dead":
 		    FRAME_TYPE_NAME (frm));
Index: src/general-slots.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/general-slots.h,v
retrieving revision 1.16
diff -u -r1.16 general-slots.h
--- src/general-slots.h	2005/07/03 21:48:01	1.16
+++ src/general-slots.h	2005/11/22 14:00:43
＠＠ -124,6 +124,7 ＠＠
 SYMBOL (Qexternal);
 SYMBOL (Qface);
 SYMBOL (Qfaces);
+SYMBOL (Qfail);
 SYMBOL (Qfallback);
 SYMBOL (Qfile);
 SYMBOL_MODULE_API (Qfile_name);
＠＠ -207,6 +208,7 ＠＠
 SYMBOL (Qnothing);
 SYMBOL_MODULE_API (Qnotice);
 SYMBOL (Qobject);
+SYMBOL (Qoffset);
 SYMBOL (Qok);
 SYMBOL (Qold_assoc);
 SYMBOL (Qold_delete);
＠＠ -261,8 +263,10 ＠＠
 SYMBOL (Qstream);
 SYMBOL (Qstring);
 SYMBOL_KEYWORD (Q_style);
-SYMBOL_KEYWORD (Q_suffix);
+SYMBOL (Qsubstitute);
 SYMBOL (Qsubtype);
+SYMBOL (Qsucceed);
+SYMBOL_KEYWORD (Q_suffix);
 SYMBOL (Qsymbol);
 SYMBOL (Qsyntax);
 SYMBOL (Qsystem_default);
Index: src/glyphs-msw.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/glyphs-msw.c,v
retrieving revision 1.56
diff -u -r1.56 glyphs-msw.c
--- src/glyphs-msw.c	2005/09/27 05:48:26	1.56
+++ src/glyphs-msw.c	2005/11/22 14:00:44
＠＠ -31,6 +31,7 ＠＠
 #include <config.h>
 #include "lisp.h"

+#include "charset.h"
 #include "device-impl.h"
 #include "elhash.h"
 #include "faces.h"
＠＠ -1818,8 +1819,10 ＠＠
   for (p = XSTRING_DATA (text); *p;)
     {
       Ichar c = itext_ichar (p);
-      if (!EQ (ichar_charset (c), Vcharset_ascii))
-	return ichar_charset (c);
+      /* ＠＠#### fix me */
+      Lisp_Object charset = ichar_charset_obsolete_me_baby_please (c);
+      if (!EQ (charset, Vcharset_ascii))
+	return charset;
       INC_IBYTEPTR (p);
     }
 #endif /* MULE */
Index: src/glyphs-x.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/glyphs-x.c,v
retrieving revision 1.83
diff -u -r1.83 glyphs-x.c
--- src/glyphs-x.c	2005/09/27 05:48:26	1.83
+++ src/glyphs-x.c	2005/11/22 14:00:45
＠＠ -2341,9 +2341,46 ＠＠
 #ifdef HAVE_X_WIDGETS

 /************************************************************************/
-/*                            widgets                            */
+/*                                widgets                               */
 /************************************************************************/

+static Lisp_Object
+very_bogusly_return_only_the_first_needed_font (Lisp_Object string,
+						Lisp_Object face,
+						Lisp_Object domain)
+{
+  int i;
+
+  struct face_cachel frame_cachel;
+  struct face_cachel *cachel;
+  Lisp_Object frame = DOMAIN_FRAME (domain);
+  Ichar_dynarr *buf = Dynarr_new (Ichar);
+
+  convert_ibyte_string_into_ichar_dynarr
+    (XSTRING_DATA (string), XSTRING_LENGTH (string), buf);
+
+  reset_face_cachel (&frame_cachel);
+  update_face_cachel_data (&frame_cachel, frame, face);
+  cachel = &frame_cachel;
+
+  ensure_face_cachel_complete (cachel, domain,
+			       Dynarr_atp (buf, 0),
+			       Dynarr_length (buf));
+
+  Dynarr_free (buf);
+
+  /* ＠＠#### This is majorly bogus.  We are just returning the first font
+     we find, which will be wrong when there are multiple fonts needed. */
+  for (i = 0; i < Stynarr_length (cachel->font); i++)
+    {
+      Lisp_Object font = Stynarr_at (cachel->font, i).value;
+      assert (!UNBOUNDP (font));
+      return font;
+    }
+
+  return Qnil;			/* NOT REACHED */
+}
+
 static void
 update_widget_face (widget_value* wv, Lisp_Image_Instance *ii,
 		    Lisp_Object domain)
＠＠ -2365,17 +2402,20 ＠＠
   lw_add_widget_value_arg (wv, XtNbackground, bcolor.pixel);

 #ifdef LWLIB_WIDGETS_MOTIF
+  /* ＠＠#### Fix me.  This should extract all fonts. */
   fontList = XmFontListCreate
     (FONT_INSTANCE_X_FONT
-     (XFONT_INSTANCE (query_string_font
+     (XFONT_INSTANCE (very_bogusly_return_only_the_first_needed_font
 		      (IMAGE_INSTANCE_WIDGET_TEXT (ii),
 		       IMAGE_INSTANCE_WIDGET_FACE (ii),
 		       domain))),  XmSTRING_DEFAULT_CHARSET);
   lw_add_widget_value_arg (wv, XmNfontList, (XtArgVal)fontList);
 #endif
+  /* ＠＠#### Fix me.  This should somehow convey all fonts needed when
+     not Motif. */
   lw_add_widget_value_arg
     (wv, XtNfont, (XtArgVal)FONT_INSTANCE_X_FONT
-     (XFONT_INSTANCE (query_string_font
+     (XFONT_INSTANCE (very_bogusly_return_only_the_first_needed_font
 		      (IMAGE_INSTANCE_WIDGET_TEXT (ii),
 		       IMAGE_INSTANCE_WIDGET_FACE (ii),
 		       domain))));
Index: src/glyphs.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/glyphs.c,v
retrieving revision 1.54
diff -u -r1.54 glyphs.c
--- src/glyphs.c	2005/10/24 10:07:37	1.54
+++ src/glyphs.c	2005/11/22 14:00:46
＠＠ -991,8 +991,7 ＠＠
   Lisp_Image_Instance *ii = XIMAGE_INSTANCE (obj);

   if (print_readably)
-    printing_unreadable_object ("#<image-instance 0x%x>",
-           ii->header.uid);
+    printing_unreadable_lcrecord (obj, 0);
   write_fmt_string_lisp (printcharfun, "#<image-instance (%s) ", 1,
 			 Fimage_instance_type (obj));
   if (!NILP (ii->name))
＠＠ -2374,10 +2373,10 ＠＠
    helper that is used elsewhere for calculating text geometry. */
 void
 query_string_geometry (Lisp_Object string, Lisp_Object face,
-		       int* width, int* height, int* descent, Lisp_Object domain)
+		       int *width, int *height, int *descent,
+		       Lisp_Object domain)
 {
   struct font_metric_info fm;
-  unsigned char charsets[NUM_LEADING_BYTES];
   struct face_cachel frame_cachel;
   struct face_cachel *cachel;
   Lisp_Object frame = DOMAIN_FRAME (domain);
＠＠ -2387,10 +2386,11 ＠＠
   /* Compute height */
   if (height)
     {
+      Ichar_dynarr *buf = Dynarr_new (Ichar);
+      convert_ibyte_string_into_ichar_dynarr
+	(XSTRING_DATA (string), XSTRING_LENGTH (string), buf);
+
       /* Compute string metric info */
-      find_charsets_in_ibyte_string (charsets,
-				       XSTRING_DATA   (string),
-				       XSTRING_LENGTH (string));

       /* Fallback to the default face if none was provided. */
       if (!NILP (face))
＠＠ -2405,61 +2405,24 ＠＠
 				       DEFAULT_INDEX);
 	}

-      ensure_face_cachel_complete (cachel, domain, charsets);
-      face_cachel_charset_font_metric_info (cachel, charsets, &fm);
+      face_cachel_char_font_metric_info (cachel, domain, Dynarr_atp (buf, 0),
+					 Dynarr_length (buf), &fm);

       *height = fm.ascent + fm.descent;
       /* #### descent only gets set if we query the height as well. */
       if (descent)
 	*descent = fm.descent;
+      Dynarr_free (buf);
     }

   /* Compute width */
   if (width)
-    {
-      if (!NILP (face))
-	*width = redisplay_frame_text_width_string (XFRAME (frame),
-						    face,
-						    0, string, 0, -1);
-      else
-	*width = redisplay_frame_text_width_string (XFRAME (frame),
-						    Vdefault_face,
-						    0, string, 0, -1);
-    }
-}
-
-Lisp_Object
-query_string_font (Lisp_Object string, Lisp_Object face, Lisp_Object domain)
-{
-  unsigned char charsets[NUM_LEADING_BYTES];
-  struct face_cachel frame_cachel;
-  struct face_cachel *cachel;
-  int i;
-  Lisp_Object frame = DOMAIN_FRAME (domain);
-
-  /* Compute string font info */
-  find_charsets_in_ibyte_string (charsets,
-				   XSTRING_DATA   (string),
-				   XSTRING_LENGTH (string));
-
-  reset_face_cachel (&frame_cachel);
-  update_face_cachel_data (&frame_cachel, frame, face);
-  cachel = &frame_cachel;
-
-  ensure_face_cachel_complete (cachel, domain, charsets);
-
-  for (i = 0; i < NUM_LEADING_BYTES; i++)
     {
-      if (charsets[i])
-	{
-	  return FACE_CACHEL_FONT (cachel,
-				   charset_by_leading_byte (i +
-							    MIN_LEADING_BYTE));
-
-	}
+      *width = redisplay_frame_text_width_string (XFRAME (frame),
+						  !NILP (face) ? face:
+						  Vdefault_face,
+						  0, string, 0, -1);
     }
-
-  return Qnil;			/* NOT REACHED */
 }

 static void
＠＠ -3680,7 +3643,7 ＠＠
   Lisp_Glyph *glyph = XGLYPH (obj);

   if (print_readably)
-    printing_unreadable_object ("#<glyph 0x%x>", glyph->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

   write_fmt_string_lisp (printcharfun, "#<glyph (%s", 1, Fglyph_type
(obj));
   write_fmt_string_lisp (printcharfun, ") %S", 1, glyph->image);
Index: src/glyphs.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/glyphs.h,v
retrieving revision 1.40
diff -u -r1.40 glyphs.h
--- src/glyphs.h	2005/10/24 10:07:37	1.40
+++ src/glyphs.h	2005/11/22 14:00:46
＠＠ -1055,8 +1055,6 ＠＠
 void query_string_geometry ( Lisp_Object string, Lisp_Object face,
 			     int* width, int* height, int* descent,
 			     Lisp_Object domain);
-Lisp_Object query_string_font (Lisp_Object string,
-			       Lisp_Object face, Lisp_Object domain);
 Lisp_Object add_glyph_animated_timeout (EMACS_INT tickms, Lisp_Object device);
 void disable_glyph_animated_timeout (int i);

Index: src/gui-msw.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/gui-msw.c,v
retrieving revision 1.10
diff -u -r1.10 gui-msw.c
--- src/gui-msw.c	2004/11/04 23:06:34	1.10
+++ src/gui-msw.c	2005/11/22 14:00:46
＠＠ -26,14 +26,15 ＠＠

 #include <config.h>
 #include "lisp.h"
+
+#include "casetab.h"
 #include "console-msw-impl.h"
-#include "redisplay.h"
-#include "gui.h"
-#include "glyphs.h"
-#include "frame-impl.h"
 #include "elhash.h"
 #include "events.h"
-#include "buffer.h"
+#include "frame-impl.h"
+#include "glyphs.h"
+#include "gui.h"
+#include "redisplay.h"

 /*
  * Return value is Qt if we have dispatched the command,
Index: src/gui.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/gui.c,v
retrieving revision 1.31
diff -u -r1.31 gui.c
--- src/gui.c	2005/10/24 10:07:37	1.31
+++ src/gui.c	2005/11/22 14:00:46
＠＠ -28,8 +28,8 ＠＠
 #include <config.h>
 #include "lisp.h"

-#include "buffer.h"
 #include "bytecode.h"
+#include "casetab.h"
 #include "elhash.h"
 #include "gui.h"
 #include "menubar.h"
＠＠ -694,7 +694,7 ＠＠
   Lisp_Gui_Item *g = XGUI_ITEM (obj);

   if (print_readably)
-    printing_unreadable_object ("#<gui-item 0x%x>", g->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

   write_fmt_string (printcharfun, "#<gui-item 0x%x>", g->header.uid);
 }
Index: src/gutter.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/gutter.c,v
retrieving revision 1.19
diff -u -r1.19 gutter.c
--- src/gutter.c	2005/10/25 11:16:25	1.19
+++ src/gutter.c	2005/11/22 14:00:47
＠＠ -1,5 +1,6 ＠＠
 /* Gutter implementation.
    Copyright (C) 1999, 2000 Andy Piper.
+   Copyright (C) 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -300,7 +301,7 ＠＠
       /* grab coordinates of last line  */
       if (Dynarr_length (ddla))
 	{
-	  dl = Dynarr_atp (ddla, Dynarr_length (ddla) - 1);
+	  dl = Dynarr_lastp (ddla);
 	  size = (dl->ypos + dl->descent - dl->clip) 
 	    - (Dynarr_atp (ddla, 0)->ypos - Dynarr_atp (ddla, 0)->ascent);
 	}
＠＠ -456,7 +457,7 ＠＠
       /* grab coordinates of last line and blank after it. */
       if (Dynarr_length (ddla) > 0)
 	{
-	  dl = Dynarr_atp (ddla, Dynarr_length (ddla) - 1);
+	  dl = Dynarr_lastp (ddla);
 	  ypos = dl->ypos + dl->descent - dl->clip;
 	}
       else
Index: src/indent.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/indent.c,v
retrieving revision 1.21
diff -u -r1.21 indent.c
--- src/indent.c	2005/10/25 11:16:25	1.21
+++ src/indent.c	2005/11/22 14:00:47
＠＠ -169,11 +169,7 ＠＠
 		  - (displayed_glyphs->begin_columns
 		     + displayed_glyphs->end_columns));
 #else /* XEmacs */
-#ifdef MULE
-	  col += XCHARSET_COLUMNS (ichar_charset (c));
-#else
-	  col ++;
-#endif /* MULE */
+	  col += ichar_columns (c);
 #endif /* XEmacs */
 	}
     }
＠＠ -226,11 +222,7 ＠＠
       else if (c == '\n')
 	break;
       else
-#ifdef MULE
-	  col += XCHARSET_COLUMNS (ichar_charset (c));
-#else
-	  col ++;
-#endif /* MULE */
+	col += ichar_columns (c);
     }

   if (tab_seen)
＠＠ -456,11 +448,7 ＠＠
 		  - (displayed_glyphs->begin_columns
 		     + displayed_glyphs->end_columns));
 #else /* XEmacs */
-#ifdef MULE
-	  col += XCHARSET_COLUMNS (ichar_charset (c));
-#else
-	  col ++;
-#endif /* MULE */
+	  col += ichar_columns (c);
 #endif /* XEmacs */
 	}

Index: src/intl-win32.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/intl-win32.c,v
retrieving revision 1.16
diff -u -r1.16 intl-win32.c
--- src/intl-win32.c	2005/09/16 08:51:26	1.16
+++ src/intl-win32.c	2005/11/22 14:00:47
＠＠ -35,6 +35,7 ＠＠
 #include <config.h>
 #include "lisp.h"

+#include "charset.h"
 #include "elhash.h"
 #include "faces.h"
 #include "file-coding.h"
Index: src/intl.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/intl.c,v
retrieving revision 1.10
diff -u -r1.10 intl.c
--- src/intl.c	2005/09/24 16:31:39	1.10
+++ src/intl.c	2005/11/22 14:00:47
＠＠ -207,4 +207,8 ＠＠
 #ifdef MULE
   Fprovide (intern ("mule"));
 #endif /* MULE */
+
+#ifdef UNICODE_INTERNAL
+  Fprovide (intern ("unicode-internal"));
+#endif /* UNICODE_INTERNAL */
 }
Index: src/keymap.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/keymap.c,v
retrieving revision 1.59
diff -u -r1.59 keymap.c
--- src/keymap.c	2005/10/25 11:16:25	1.59
+++ src/keymap.c	2005/11/22 14:00:49
＠＠ -2,7 +2,7 ＠＠
    Copyright (C) 1985, 1991-1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2001, 2002 Ben Wing.
+   Copyright (C) 2001, 2002, 2005 Ben Wing.
    Totally redesigned by jwz in 1991.

 This file is part of XEmacs.
＠＠ -260,7 +260,7 ＠＠
   /* This function can GC */
   Lisp_Keymap *keymap = XKEYMAP (obj);
   if (print_readably)
-    printing_unreadable_object ("#<keymap 0x%x>",
keymap->header.uid);
+    printing_unreadable_lcrecord (obj, 0);
   write_c_string (printcharfun, "#<keymap ");
   if (!NILP (keymap->name))
     {
＠＠ -3411,34 +3411,63 ＠＠
     {
       p += set_itext_ichar (p, c);
     }
-  else if (c < 040 && ctl_p)
+  else if (c < 32 && ctl_p)
     {
       *p++ = '^';
-      *p++ = c + 64;		/* 'A' - 1 */
+      *p++ = c + 'A' - 1; /* 1 -> 'A' */
     }
-  else if (c == 0177)
+  else if (c == 127)
     {
       *p++ = '^';
       *p++ = '?';
     }
-  else if (c >= 0200 || c < 040)
+  else if (c >= 128 || c < 32)
     {
       *p++ = '\\';
 #ifdef MULE
-      /* !!#### This syntax is not readable.  It will
-	 be interpreted as a 3-digit octal number rather
-	 than a 7-digit octal number. */
-      if (c >= 0400)
-	{
-	  *p++ = '0' + ((c & 07000000) >> 18);
-	  *p++ = '0' + ((c & 0700000) >> 15);
-	  *p++ = '0' + ((c & 070000) >> 12);
-	  *p++ = '0' + ((c & 07000) >> 9);
+#ifdef UNICODE_INTERNAL
+      /* Output Unicode codes directly */
+#define FROB(x) ((Ibyte) ((x) >= 10 ? (x) + 'A' - 10 : (x) + '0'))
+      if (c >= 65536)
+	{
+	  *p++ = 'U';
+	  *p++ = FROB (c >> 28);
+	  *p++ = FROB (c >> 24 & 0x0F);
+	  *p++ = FROB (c >> 20 & 0x0F);
+	  *p++ = FROB (c >> 16 & 0x0F);
+	  *p++ = FROB (c >> 12 & 0x0F);
+	  *p++ = FROB (c >>  8 & 0x0F);
+	  *p++ = FROB (c >>  4 & 0x0F);
+	  *p++ = FROB (c & 0x0F);
+	}
+      else if (c >= 256)
+	{
+	  *p++ = 'u';
+	  *p++ = FROB (c >> 12 & 0x0F);
+	  *p++ = FROB (c >>  8 & 0x0F);
+	  *p++ = FROB (c >>  4 & 0x0F);
+	  *p++ = FROB (c & 0x0F);
+	}
+#undef FROB
+#else /* not UNICODE_INTERNAL */
+      /* Don't output Unicode because it is lossy. */
+      if (c >= 256)
+	{
+	  Ibyte hexbuf[200]; /* Way more than enough */
+	  Ibyte *hp = hexbuf;
+	  qxesprintf (hexbuf, "%X", c);
+	  *p++ = 'x';
+	  while (*hp)
+	    *p++ = *hp++;
 	}
-#endif
-      *p++ = '0' + ((c & 0700) >> 6);
-      *p++ = '0' + ((c & 0070) >> 3);
-      *p++ = '0' + ((c & 0007));
+#endif /* not UNICODE_INTERNAL */
+      else
+#endif /* MULE */
+	{
+	  *p++ = '0' + ((c & 0700) >> 6);
+	  *p++ = '0' + ((c & 0070) >> 3);
+	  *p++ = '0' + ((c & 0007));
+	}
     }
   else
     {
＠＠ -4406,7 +4435,7 ＠＠
 This can be any form recognized as a single key specifier.
 To disable the meta-prefix-char, set it to a negative number.
 */ );
-  Vmeta_prefix_char = make_char (033);
+  Vmeta_prefix_char = make_char (0x1B);

   DEFVAR_LISP ("mouse-grabbed-buffer", &Vmouse_grabbed_buffer /*
 A buffer which should be consulted first for all mouse activity.
Index: src/lisp-disunion.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/lisp-disunion.h,v
retrieving revision 1.18
diff -u -r1.18 lisp-disunion.h
--- src/lisp-disunion.h	2002/11/18 06:52:40	1.18
+++ src/lisp-disunion.h	2005/11/22 14:00:49
＠＠ -1,6 +1,6 ＠＠
 /* Fundamental definitions for XEmacs Lisp interpreter -- non-union objects.
    Copyright (C) 1985, 1986, 1987, 1992, 1993 Free Software Foundation, Inc.
-   Copyright (C) 2001, 2002 Ben Wing.
+   Copyright (C) 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -82,7 +82,9 ＠＠
 #define VALMASK (((1UL << VALBITS) - 1UL) << GCTYPEBITS)
 #define XTYPE(x) ((enum Lisp_Type) (((EMACS_UINT)(x)) & ~VALMASK))
 #define XPNTRVAL(x) (x) /* This depends on Lisp_Type_Record == 0 */
-#define XCHARVAL(x) ((x) >> GCBITS)
+/* A character is always >= 0, so get 30 bits out of it by treating it as
+   unsigned */
+#define XCHARVAL(x) ((EMACS_UINT)(x) >> GCBITS)
 #define XREALINT(x) ((x) >> INT_GCBITS)
 #define XUINT(x) ((EMACS_UINT)(x) >> INT_GCBITS)

Index: src/lisp.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/lisp.h,v
retrieving revision 1.132
diff -u -r1.132 lisp.h
--- src/lisp.h	2005/10/24 10:07:38	1.132
+++ src/lisp.h	2005/11/22 14:00:50
＠＠ -114,58 +114,281 ＠＠
    control this using configure, but you can manually stick in a define as
    necessary. */

+/* How these work:
+
+   The most common classes will be `text' and `type', followed by
`structure'.
+   `text' is for problems related to bad textual format.  `type' is for
+   problems related to wrongly typed arguments, structure fields, etc.
+   `structure' is for bad data inside of a structure.  Sometimes these are
+   used "incorrectly", e.g. `type' is often used for structure-checking.
+   Consider `text':
+
+   `text_checking_assert() will assert() only when ERROR_CHECK_TEXT is defined;
+   otherwise it's a no-op.  text_checking_assert_at_line() is similar, but
+   allows you to override the file name and line number normally supplied in
+   the message.  This is especially useful in inline header functions, and
+   so there's a special inline_text_checking_assert() for this; this works
+   like text_checking_assert() but supplies the file and line of the calling
+   function.  In order for this to work, you need to declare your inline
+   function with INLINE_TEXT_CHECK_ARGS at the end of its argument list,
+   and give its function name a _1 extension or similar.  Then create a
+   macro that calls your inline function and includes INLINE_TEXT_CHECK_CALL
+   at the end of the parameter list.  This will arrange to pass in and receive
+   the file and line (__FILE__, __LINE__) at place where the call occurs in
+   the calling function; but nothing will get passed in when ERROR_CHECK_TEXT
+   is not defined.
+
+#ifdef ERROR_CHECK_TEXT
+#define text_checking_assert(assertion) assert (assertion)
+#define text_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_text_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_TEXT_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_TEXT_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define text_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else // not ERROR_CHECK_TEXT
+#define text_checking_assert(assertion) disabled_assert (assertion)
+#define text_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_text_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_TEXT_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_TEXT_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define text_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif // ERROR_CHECK_TEXT
+*/
+
+
 #ifdef ERROR_CHECK_STRUCTURES
 /* Check for problems with the catch list and specbind stack */
 #define ERROR_CHECK_CATCH
 /* Check for insufficient use of call_trapping_problems(), particularly
    due to glyph-related changes causing eval or QUIT within redisplay */
 #define ERROR_CHECK_TRAPPING_PROBLEMS
-#endif
+#endif /* ERROR_CHECK_STRUCTURES */
+
+#define INLINE_ERROR_CHECK_ARGS , const char *__file__, int __line__
+#define INLINE_ERROR_CHECK_CALL , __FILE__, __LINE__
+#define DISABLED_INLINE_ERROR_CHECK_ARGS
+#define DISABLED_INLINE_ERROR_CHECK_CALL
+
+/* For assertions in inline header functions which will report the file and
+   line of the calling function */
+#define inline_assert(assertion) assert_at_line (assertion, __file__, __line__)
+#define disabled_inline_assert(assertion) \
+  disabled_assert_at_line (assertion, __file__, __line__)

+#ifdef ERROR_CHECK_TEXT
+#define text_checking_assert(assertion) assert (assertion)
+#define text_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_text_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_TEXT_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_TEXT_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define text_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else /* not ERROR_CHECK_TEXT */
+#define text_checking_assert(assertion) disabled_assert (assertion)
+#define text_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_text_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_TEXT_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_TEXT_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define text_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_TEXT */
+
 #ifdef ERROR_CHECK_TYPES
 #define type_checking_assert(assertion) assert (assertion)
 #define type_checking_assert_at_line(assertion, file, line) \
   assert_at_line (assertion, file, line)
+#define inline_type_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_TYPE_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_TYPE_CHECK_CALL INLINE_ERROR_CHECK_CALL
 #define type_checking_assert_with_message(assertion, msg) \
   assert_with_message (assertion, msg)
-#else
-#define type_checking_assert(assertion)
-#define type_checking_assert_at_line(assertion, file, line)
-#define type_checking_assert_with_message(assertion, msg)
-#endif
+#else /* not ERROR_CHECK_TYPES */
+#define type_checking_assert(assertion) disabled_assert (assertion)
+#define type_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_type_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_TYPE_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_TYPE_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define type_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_TYPES */
+
+#ifdef ERROR_CHECK_STRUCTURES
+#define structure_checking_assert(assertion) assert (assertion)
+#define structure_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_structure_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_STRUCTURE_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_STRUCTURE_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define structure_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else /* not ERROR_CHECK_STRUCTURES */
+#define structure_checking_assert(assertion) disabled_assert (assertion)
+#define structure_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_structure_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_STRUCTURE_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_STRUCTURE_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define structure_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_STRUCTURES */
+
 #ifdef ERROR_CHECK_GC
 #define gc_checking_assert(assertion) assert (assertion)
 #define gc_checking_assert_at_line(assertion, file, line) \
   assert_at_line (assertion, file, line)
+#define inline_gc_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_GC_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_GC_CHECK_CALL INLINE_ERROR_CHECK_CALL
 #define gc_checking_assert_with_message(assertion, msg) \
   assert_with_message (assertion, msg)
-#else
-#define gc_checking_assert(assertion)
-#define gc_checking_assert_at_line(assertion, file, line)
-#define gc_checking_assert_with_message(assertion, msg)
-#endif
-#ifdef ERROR_CHECK_TEXT
-#define text_checking_assert(assertion) assert (assertion)
-#define text_checking_assert_at_line(assertion, file, line) \
+#else /* not ERROR_CHECK_GC */
+#define gc_checking_assert(assertion) disabled_assert (assertion)
+#define gc_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_gc_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_GC_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_GC_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define gc_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_GC */
+
+#ifdef ERROR_CHECK_DISPLAY
+#define display_checking_assert(assertion) assert (assertion)
+#define display_checking_assert_at_line(assertion, file, line) \
   assert_at_line (assertion, file, line)
-#define text_checking_assert_with_message(assertion, msg) \
+#define inline_display_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_DISPLAY_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_DISPLAY_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define display_checking_assert_with_message(assertion, msg) \
   assert_with_message (assertion, msg)
-#else
-#define text_checking_assert(assertion)
-#define text_checking_assert_at_line(assertion, file, line)
-#define text_checking_assert_with_message(assertion, msg)
-#endif
+#else /* not ERROR_CHECK_DISPLAY */
+#define display_checking_assert(assertion) disabled_assert (assertion)
+#define display_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_display_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_DISPLAY_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_DISPLAY_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define display_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_DISPLAY */
+
+#ifdef ERROR_CHECK_GLYPHS
+#define glyph_checking_assert(assertion) assert (assertion)
+#define glyph_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_glyph_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_GLYPH_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_GLYPH_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define glyph_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else /* not ERROR_CHECK_GLYPHS */
+#define glyph_checking_assert(assertion) disabled_assert (assertion)
+#define glyph_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_glyph_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_GLYPH_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_GLYPH_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define glyph_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_GLYPHS */
+
+#ifdef ERROR_CHECK_EXTENTS
+#define extent_checking_assert(assertion) assert (assertion)
+#define extent_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_extent_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_EXTENT_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_EXTENT_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define extent_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else /* not ERROR_CHECK_EXTENTS */
+#define extent_checking_assert(assertion) disabled_assert (assertion)
+#define extent_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_extent_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_EXTENT_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_EXTENT_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define extent_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_EXTENTS */
+
+#ifdef ERROR_CHECK_MALLOC
+#define malloc_checking_assert(assertion) assert (assertion)
+#define malloc_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_malloc_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_MALLOC_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_MALLOC_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define malloc_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else /* not ERROR_CHECK_MALLOC */
+#define malloc_checking_assert(assertion) disabled_assert (assertion)
+#define malloc_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_malloc_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_MALLOC_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_MALLOC_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define malloc_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_MALLOC */
+
+#ifdef ERROR_CHECK_BYTE_CODE
+#define byte_code_checking_assert(assertion) assert (assertion)
+#define byte_code_checking_assert_at_line(assertion, file, line) \
+  assert_at_line (assertion, file, line)
+#define inline_byte_code_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_BYTE_CODE_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_BYTE_CODE_CHECK_CALL INLINE_ERROR_CHECK_CALL
+#define byte_code_checking_assert_with_message(assertion, msg) \
+  assert_with_message (assertion, msg)
+#else /* not ERROR_CHECK_BYTE_CODE */
+#define byte_code_checking_assert(assertion) disabled_assert (assertion)
+#define byte_code_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_byte_code_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_BYTE_CODE_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_BYTE_CODE_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define byte_code_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_BYTE_CODE */
+
 #ifdef ERROR_CHECK_TRAPPING_PROBLEMS
 #define trapping_problems_checking_assert(assertion) assert (assertion)
 #define trapping_problems_checking_assert_at_line(assertion, file, line) \
   assert_at_line (assertion, file, line)
+#define inline_trapping_problems_checking_assert(assertion) inline_assert (assertion)
+#define INLINE_TRAPPING_PROBLEMS_CHECK_ARGS INLINE_ERROR_CHECK_ARGS
+#define INLINE_TRAPPING_PROBLEMS_CHECK_CALL INLINE_ERROR_CHECK_CALL
 #define trapping_problems_checking_assert_with_message(assertion, msg) \
   assert_with_message (assertion, msg)
-#else
-#define trapping_problems_checking_assert(assertion)
-#define trapping_problems_checking_assert_at_line(assertion, file, line)
-#define trapping_problems_checking_assert_with_message(assertion, msg)
-#endif
+#else /* not ERROR_CHECK_TRAPPING_PROBLEMS */
+#define trapping_problems_checking_assert(assertion) disabled_assert (assertion)
+#define trapping_problems_checking_assert_at_line(assertion, file, line) \
+  disabled_assert_at_line (assertion, file, line)
+#define inline_trapping_problems_checking_assert(assertion) \
+  disabled_inline_assert (assertion)
+#define INLINE_TRAPPING_PROBLEMS_CHECK_ARGS DISABLED_INLINE_ERROR_CHECK_ARGS
+#define INLINE_TRAPPING_PROBLEMS_CHECK_CALL DISABLED_INLINE_ERROR_CHECK_CALL
+#define trapping_problems_checking_assert_with_message(assertion, msg) \
+  disabled_assert_with_message (assertion, msg)
+#endif /* ERROR_CHECK_TRAPPING_PROBLEMS */

 /************************************************************************/
 /**                     Definitions of basic types                     **/
＠＠ -231,6 +454,30 ＠＠
 #define EFFICIENT_UINT_128_BIT UINT_128_BIT
 #endif

+/* These are easily computable using `dc'.
+   (Just in case you cared, the maximum 256-bit unsigned int is
+   115792089237316195423570985008687907853269984665640564039457584007913 \
+   129639935.  You can get this with
+
+   echo '2 256^ 1-p' | dc
+   )
+*/
+
+#define INT_16_BIT_MAX 32767
+#define INT_32_BIT_MAX 2147483647
+#define INT_64_BIT_MAX 9223372036854775807
+#define INT_128_BIT_MAX 170141183460469231731687303715884105727
+
+#define UINT_16_BIT_MAX 65535
+#define UINT_32_BIT_MAX 4294967295
+#define UINT_64_BIT_MAX 18446744073709551615
+#define UINT_128_BIT_MAX 340282366920938463463374607431768211455
+
+#define INT_16_BIT_MIN -32768
+#define INT_32_BIT_MIN -2147483648
+#define INT_64_BIT_MIN -9223372036854775808
+#define INT_128_BIT_MIN -170141183460469231731687303715884105728
+
 #ifdef HAVE_INTTYPES_H
 #include <inttypes.h>
 #elif defined (HAVE_INTPTR_T_IN_SYS_TYPES_H)
＠＠ -1041,6 +1288,15 ＠＠
 MODULE_API void assert_failed (const Ascbyte *, int, const Ascbyte *);
 #define ABORT() (assert_failed (__FILE__, __LINE__, "ABORT()"))

+/* This used to be ((void) (0)) but that triggers lots of unused variable
+   warnings.  It's pointless to force all that code to be rewritten, with
+   added ifdefs.  Any reasonable compiler will eliminate an expression with
+   no effects.  We keep this abstracted out like this in case we want to
+   change it in the future. */
+#define disabled_assert(x) ((void) (x))
+#define disabled_assert_with_message(x, msg) disabled_assert (x)
+#define disabled_assert_at_line(x, file, line) disabled_assert (x)
+
 #ifdef USE_ASSERTIONS
 # define assert(x) ((x) ? (void) 0 : assert_failed (__FILE__, __LINE__, #x))
 # define assert_with_message(x, msg) \
＠＠ -1052,13 +1308,9 ＠＠
 # define assert_with_message(x, msg) assert (x)
 # define assert_at_line(x, file, line) assert (x)
 #else
-/* This used to be ((void) (0)) but that triggers lots of unused variable
-   warnings.  It's pointless to force all that code to be rewritten, with
-   added ifdefs.  Any reasonable compiler will eliminate an expression with
-   no effects. */
-# define assert(x) ((void) (x))
-# define assert_with_message(x, msg) assert (x)
-# define assert_at_line(x, file, line) assert (x)
+# define assert(x) disabled_assert (x)
+# define assert_with_message(x, msg) disabled_assert_with_message (x, msg)
+# define assert_at_line(x, file, line) disabled_assert_at_line (x, file, line)
 #endif

 /************************************************************************/
＠＠ -1206,6 +1458,7 ＠＠
 #define alloca_itexts(num) alloca_array (Itext, num)
 #define alloca_ibytes(num) alloca_array (Ibyte, num)
 #define alloca_extbytes(num) alloca_array (Extbyte, num)
+#define alloca_chbytes(num) alloca_array (Chbyte, num)
 #define alloca_rawbytes(num) alloca_array (Rawbyte, num)
 #define alloca_binbytes(num) alloca_array (Binbyte, num)
 #define alloca_ascbytes(num) alloca_array (Ascbyte, num)
＠＠ -1226,6 +1479,26 ＠＠
   memcpy (*_bsta_, _bsta_2, 1 + _bsta_3);	\
 } while (0)

+/* Make an alloca'd copy of a Extbyte * */
+#define EXTBYTE_STRING_TO_ALLOCA(p, lval)	\
+do {						\
+  Extbyte **_esta_ = (Extbyte **) &(lval);	\
+  const Extbyte *_esta_2 = (p);			\
+  Bytecount _esta_3 = strlen (_esta_2);		\
+  *_esta_ = alloca_ibytes (1 + _esta_3);	\
+  memcpy (*_esta_, _esta_2, 1 + _esta_3);	\
+} while (0)
+
+/* Make an alloca'd copy of a char * */
+#define C_STRING_TO_ALLOCA(p, lval)		\
+do {						\
+  Chbyte **_csta_ = (Chbyte **) &(lval);	\
+  const Chbyte *_csta_2 = (p);			\
+  Bytecount _csta_3 = strlen (_csta_2);		\
+  *_csta_ = alloca_ibytes (1 + _csta_3);	\
+  memcpy (*_csta_, _csta_2, 1 + _csta_3);	\
+} while (0)
+
 /* ----------------- convenience functions for reallocation --------------- */

 #define XREALLOC_ARRAY(ptr, type, len) \
＠＠ -1258,14 +1531,14 ＠＠
   type *base;			\
   int locked;			\
   int elsize;			\
-  int cur;			\
+  int len;			\
   int largest;			\
   int max
 #else
 #define Dynarr_declare(type)	\
   type *base;			\
   int elsize;			\
-  int cur;			\
+  int len;			\
   int largest;			\
   int max
 #endif /* ERROR_CHECK_STRUCTURES */
＠＠ -1277,19 +1550,102 ＠＠

 MODULE_API void *Dynarr_newf (int elsize);
 MODULE_API void Dynarr_resize (void *dy, Elemcount size);
-MODULE_API void Dynarr_insert_many (void *d, const void *el, int len, int start);
+MODULE_API void Dynarr_insert_many (void *d, const void *el, int len,
+				    int start);
 MODULE_API void Dynarr_delete_many (void *d, int start, int len);
 MODULE_API void Dynarr_free (void *d);

+#ifdef ERROR_CHECK_TYPES
+DECLARE_INLINE_HEADER (
+int
+Dynarr_verify_pos_at (void *d, int pos, const Ascbyte *file, int line)
+)
+{
+  Dynarr *dy = (Dynarr *) d;
+  /* We use `largest', not `len', because the redisplay code often
+     accesses stuff between len and largest. */
+  assert_at_line (pos >= 0 && pos < dy->largest, file, line);
+  return pos;
+}
+#else
+#define Dynarr_verify_pos(d, pos, file, line) (pos)
+#endif /* ERROR_CHECK_TYPES */
+
+#ifdef ERROR_CHECK_TYPES
+DECLARE_INLINE_HEADER (
+int
+Dynarr_verify_pos_atp (void *d, int pos, const Ascbyte *file, int line)
+)
+{
+  Dynarr *dy = (Dynarr *) d;
+  /* We use `largest', not `len', because the redisplay code often
+     accesses stuff between len and largest. */
+  /* Code will often do something like ...
+
+     val = make_bit_vector_from_byte_vector (Dynarr_atp (dyn, 0),
+	                                     Dynarr_length (dyn));
+
+     which works fine when the Dynarr_length is non-zero, but when zero,
+     the result of Dynarr_atp() not only points past the end of the
+     allocated array, but the array may not have ever been allocated and
+     hence the return value is NULL.  But the length of 0 causes the
+     pointer to never get checked.  These can occur throughout the code
+     so we put in a special check. */
+  if (pos == 0 && dy->len == 0)
+    return pos;
+  /* #### It's vaguely possible that some code could legitimately want to
+     retrieve a pointer to the position just past the end of dynarr memory.
+     This could happen with Dynarr_atp() but not Dynarr_at().  If so, it
+     will trigger this assert().  In such cases, it should be obvious that
+     the code wants to do this; rather than relaxing the assert, we should
+     probably create a new macro Dynarr_atp_allow_end() which is like
+     Dynarr_atp() but which allows for pointing at invalid addresses -- we
+     really want to check for cases of accessing just past the end of
+     memory, which is a likely off-by-one problem to occur and will usually
+     not trigger a protection fault (instead, you'll just get random
+     behavior, possibly overwriting other memory, which is bad). */
+  assert_at_line (pos >= 0 && pos < dy->largest, file, line);
+  return pos;
+}
+
+DECLARE_INLINE_HEADER (
+int
+Dynarr_verify_pos_atp_allow_end (void *d, int pos, const Ascbyte *file,
+				 int line)
+)
+{
+  Dynarr *dy = (Dynarr *) d;
+  /* We use `largest', not `len', because the redisplay code often
+     accesses stuff between len and largest.
+     We also allow referencing the very end, past the end of allocated
+     legitimately space.  See comments in Dynarr_verify_pos_atp.()*/
+  assert_at_line (pos >= 0 && pos <= dy->largest, file, line);
+  return pos;
+}
+
+#else
+#define Dynarr_verify_pos_at(d, pos, file, line) (pos)
+#define Dynarr_verify_pos_atp(d, pos, file, line) (pos)
+#define Dynarr_verify_pos_atp_allow_end(d, pos, file, line) (pos)
+#endif /* ERROR_CHECK_TYPES */
+
 #define Dynarr_new(type) ((type##_dynarr *) Dynarr_newf (sizeof (type)))
 #define Dynarr_new2(dynarr_type, type) \
   ((dynarr_type *) Dynarr_newf (sizeof (type)))
-#define Dynarr_at(d, pos) ((d)->base[pos])
-#define Dynarr_atp(d, pos) (&Dynarr_at (d, pos))
-#define Dynarr_begin(d) Dynarr_atp (d, 0)
-#define Dynarr_end(d) Dynarr_atp (d, Dynarr_length (d) - 1)
-#define Dynarr_sizeof(d) ((d)->cur * (d)->elsize)

+#define Dynarr_at(d, pos) \
+  ((d)->base[Dynarr_verify_pos_at (d, pos, __FILE__, __LINE__)])
+#define Dynarr_atp_allow_end(d, pos) \
+  (&((d)->base[Dynarr_verify_pos_atp_allow_end (d, pos, __FILE__, __LINE__)]))
+#define Dynarr_atp(d, pos) \
+  (&((d)->base[Dynarr_verify_pos_atp (d, pos, __FILE__, __LINE__)]))
+
+/* Old #define Dynarr_atp(d, pos) (&Dynarr_at (d, pos)) */
+#define Dynarr_firstp(d) Dynarr_atp (d, 0)
+#define Dynarr_lastp(d) Dynarr_atp (d, Dynarr_length (d) - 1)
+#define Dynarr_past_lastp(d) Dynarr_atp_allow_end (d, Dynarr_length (d))
+#define Dynarr_sizeof(d) ((d)->len * (d)->elsize)
+
 #ifdef ERROR_CHECK_STRUCTURES
 DECLARE_INLINE_HEADER (
 Dynarr *
＠＠ -1297,7 +1653,7 ＠＠
 )
 {
   Dynarr *dy = (Dynarr *) d;
-  assert_at_line (dy->cur >= 0 && dy->cur <= dy->largest
&&
+  assert_at_line (dy->len >= 0 && dy->len <= dy->largest
&&
 		  dy->largest <= dy->max, file, line);
   return dy;
 }
＠＠ -1309,7 +1665,7 ＠＠
 {
   Dynarr *dy = (Dynarr *) d;
   assert_at_line (!dy->locked, file, line);
-  assert_at_line (dy->cur >= 0 && dy->cur <= dy->largest
&&
+  assert_at_line (dy->len >= 0 && dy->len <= dy->largest
&&
 		  dy->largest <= dy->max, file, line);
   return dy;
 }
＠＠ -1325,10 +1681,9 ＠＠
 #define Dynarr_unlock(d)
 #endif /* ERROR_CHECK_STRUCTURES */

-#define Dynarr_length(d) (Dynarr_verify (d)->cur)
+#define Dynarr_length(d) (Dynarr_verify (d)->len)
 #define Dynarr_largest(d) (Dynarr_verify (d)->largest)
-#define Dynarr_reset(d) (Dynarr_verify_mod (d)->cur = 0)
-#define Dynarr_add_many(d, el, len) Dynarr_insert_many (d, el, len, (d)->cur)
+#define Dynarr_reset(d) (Dynarr_verify_mod (d)->len = 0)
 #define Dynarr_insert_many_at_start(d, el, len)	\
   Dynarr_insert_many (d, el, len, 0)
 #define Dynarr_add_literal_string(d, s) Dynarr_add_many (d, s, sizeof (s) - 1)
＠＠ -1345,19 +1700,48 ＠＠
 } while (0)

 #define Dynarr_add(d, el) (						     \
-  Dynarr_verify_mod (d)->cur >= (d)->max ? Dynarr_resize ((d), (d)->cur+1):
\
+  Dynarr_verify_mod (d)->len >= (d)->max ? Dynarr_resize ((d), (d)->len+1):
\
       (void) 0,								     \
-  ((d)->base)[(d)->cur++] = (el),					     \
-  (d)->cur > (d)->largest ? (d)->largest = (d)->cur : (int) 0)
+  ((d)->base)[(d)->len++] = (el),					     \
+  (d)->len > (d)->largest ? (d)->largest = (d)->len : (int) 0)
+
+/* Add LEN contiguous elements to a Dynarr */
+
+DECLARE_INLINE_HEADER (
+void
+Dynarr_add_many (void *d, const void *el, int len)
+)
+{
+  /* This duplicates Dynarr_insert_many to some extent; but since it is
+     called so often, it seemed useful to remove the unnecessary stuff
+     from that function and to make it inline */
+  Dynarr *dy = (Dynarr *) Dynarr_verify (d);
+
+  if (dy->len + len > dy->max)
+    Dynarr_resize (dy, dy->len + len);
+  /* Some functions call us with a value of 0 to mean "reserve space but
+     don't write into it" */
+  if (el)
+    memcpy ((char *) dy->base + dy->len*dy->elsize, el, len*dy->elsize);
+  dy->len += len;

+  if (dy->len > dy->largest)
+    dy->largest = dy->len;
+}
+
 /* The following defines will get you into real trouble if you aren't
    careful.  But they can save a lot of execution time when used wisely. */
-#define Dynarr_increment(d) (Dynarr_verify_mod (d)->cur++)
-#define Dynarr_set_size(d, n) (Dynarr_verify_mod (d)->cur = n)
+#define Dynarr_increment(d) (Dynarr_verify_mod (d)->len++)
+#define Dynarr_set_size(d, n)						\
+do {									\
+  Bytecount _dss_n = (n);						\
+  structure_checking_assert (_dss_n >= 0 && _dss_n <= (d)->largest);	\
+  Dynarr_verify_mod (d)->len = _dss_n;					\
+} while (0)

 #define Dynarr_pop(d)					\
-  (assert ((d)->cur > 0), Dynarr_verify_mod (d)->cur--,	\
-   Dynarr_at (d, (d)->cur))
+  (assert ((d)->len > 0), Dynarr_verify_mod (d)->len--,	\
+   Dynarr_at (d, (d)->len))
 #define Dynarr_delete(d, i) Dynarr_delete_many (d, i, 1)
 #define Dynarr_delete_by_pointer(d, p) \
   Dynarr_delete_many (d, (p) - ((d)->base), 1)
＠＠ -1378,6 +1762,102 ＠＠
 Bytecount Dynarr_memory_usage (void *d, struct overhead_stats *stats);
 #endif

+/* --------------------------- static dynarrs ------------------------- */
+
+/* A static Dynarr is, besides being an oxymoron, a combination of a
+   small static array with a Dynarr.  Typical size of the small array is
+   4 or 6.  This is used when you rarely expect your array to grow beyond
+   a certain size, but you would like to allow for this.  Stretchy arrays
+   are sometimes used for this, but they require that your whole data object
+   be resized, and handling them with pdump is difficult.  Typically a static
+   Dynarr is declared as one field of a struct, or it can be a local array.
+
+   #### It might be simpler to use *either* the static array *or* the
+   Dynarr, but not both at the same time, as we do currently.  We'd have
+   to either modify the pdump handling to involve a union, or zero out
+   the elements in the static array when we switch to the Dynarr. */
+
+/* Declare a static Dynarr variable declaration NAME, containing elements of
+   type TYPE, with NUM_STATIC static elements.  If you never use more than
+   these, no allocation will occur.  Before using, initialize with
+   Stynarr_init(d). */
+#define Stynarr_declare(name, type, num_static)	\
+struct						\
+{						\
+  type##_dynarr *els;				\
+  int nels;					\
+  type els_static[num_static];			\
+} name
+
+typedef struct
+{
+  void *els;
+  int nels;
+} Stynarr;
+
+#ifdef ERROR_CHECK_TYPES
+DECLARE_INLINE_HEADER (
+int
+Stynarr_verify_pos (void *st, int pos, const Ascbyte *file, int line)
+)
+{
+  Stynarr *sty = (Stynarr *) st;
+  /* #### See comment above in Dynarr_verify_pos() about accessing just
+     past end of the real used memory block using Stynarr_atp(). */
+  assert_at_line (pos >= 0 && pos < sty->nels, file, line);
+  return pos;
+}
+#else
+#define Stynarr_verify_pos(st, pos, file, line) (pos)
+#endif /* ERROR_CHECK_TYPES */
+
+#define Stynarr_init(d) (xzero (d))
+#define Stynarr_reset(d) ((d).nels = 0)
+#define Stynarr_free(d)				\
+do {						\
+  if ((d).els)					\
+    {						\
+      Dynarr_free ((d).els);			\
+      (d).els = 0;				\
+    }						\
+  (d).nels = 0;					\
+} while (0)
+#define Stynarr_num_static(d) countof ((d).els_static)
+#define Stynarr_elsize(d) sizeof ((d).els_static[0])
+/* WARNING! The following two macros evaluate POS multiply.
+   We write them this way so that Stynarr_at() is an lvalue. */
+#define Stynarr_atp(d, pos)						\
+  (Stynarr_verify_pos (&d, pos, __FILE__, __LINE__) < Stynarr_num_static (d) \
+   ? &((d).els_static[pos]) :						\
+   Dynarr_atp ((d).els, pos - Stynarr_num_static (d)))
+#define Stynarr_at(d, pos) (*(Stynarr_atp (d, pos)))
+#define Stynarr_add(d, el)						\
+do {									\
+  if ((d).nels < Stynarr_num_static (d))				\
+    (d).els_static[(d).nels++] = (el);					\
+  else									\
+    {									\
+      if (!(d).els)							\
+	VOIDP_CAST ((d).els) = Dynarr_newf (Stynarr_elsize (d));	\
+      Dynarr_add ((d).els, el);						\
+      (d).nels++;							\
+    }									\
+} while (0)
+
+#define Stynarr_length(d) ((d).nels)
+
+MODULE_API void Stynarr_insert_many_1 (void *d, const void *els, int len,
+				       int start, int num_static,
+				       int elsize, int staticoff);
+
+#define Stynarr_insert_many(d, els, len, start)		\
+  Stynarr_insert_many_1 (&d, els, len, start,		\
+			 Stynarr_num_static (d),	\
+			 Stynarr_elsize (d),		\
+			 offsetof (d, (d).els_static))
+
+/* ---------------------- stack-like malloc ----------------------- */
+
 void *stack_like_malloc (Bytecount size);
 void stack_like_free (void *val);

＠＠ -1678,6 +2158,16 ＠＠
   Dynarr_declare (Lisp_Object *);
 } Lisp_Object_ptr_dynarr;

+typedef struct
+{
+  Lisp_Object key, value;
+} Lisp_Object_pair;
+
+typedef struct
+{
+  Dynarr_declare (Lisp_Object_pair);
+} Lisp_Object_pair_dynarr;
+
 /* Close your eyes now lest you vomit or spontaneously combust ... */

 #define HACKEQ_UNSAFE(obj1, obj2)				\
＠＠ -3189,9 +3679,25 ＠＠

 #define LISP_HASH(obj) ((unsigned long) LISP_TO_VOID (obj))
 Hashcode memory_hash (const void *xv, Bytecount size);
-Hashcode internal_hash (Lisp_Object obj, int depth);
+Hashcode internal_hash_1 (Lisp_Object obj, int depth);
 Hashcode internal_array_hash (Lisp_Object *arr, int size, int depth);

+DECLARE_INLINE_HEADER (
+Hashcode
+internal_hash (Lisp_Object obj, int depth)
+)
+{
+  /* This catches all non-lrecords (integers, chars) and all lrecords
+     with no hash function.  The latter macro requires three memory reads,
+     but I assume that for extremely common objects such as Qunbound, Qnil
+     and Qt, the appropriate memory will be in the L1 cache and such access
+     will be quite fast.  Putting various conditional checks here would
+     probably slow it down due to the branch-prediction problems. */
+  if (!LRECORDP (obj) || !XRECORD_LHEADER_IMPLEMENTATION (obj)->hash)
+    return LISP_HASH (obj);
+  return internal_hash_1 (obj, depth);
+}
+

 /************************************************************************/
 /*			 String translation				*/
＠＠ -3885,6 +4391,17 ＠＠
      PRINTF_ARGS (2, 3);

+/* Defined in dynarr.c */
+extern const struct sized_memory_description int_description;
+extern const struct sized_memory_description int_dynarr_description;
+extern const struct sized_memory_description unsigned_char_description;
+extern const struct sized_memory_description unsigned_char_dynarr_description;
+extern const struct sized_memory_description Lisp_Object_description;
+extern const struct sized_memory_description Lisp_Object_dynarr_description;
+extern const struct sized_memory_description Lisp_Object_pair_description;
+extern const struct sized_memory_description Lisp_Object_pair_dynarr_description;
+void mark_Lisp_Object_dynarr (Lisp_Object_dynarr *dyn);
+
 /* Defined in editfns.c */
 EXFUN (Fbobp, 1);
 EXFUN (Fbolp, 1);
＠＠ -3947,7 +4464,7 ＠＠
 extern Fixnum emacs_priority;
 extern int suppress_early_error_handler_backtrace;
 void debug_break (void);
-int debug_can_access_memory (void *ptr, Bytecount len);
+int debug_can_access_memory (const void *ptr, Bytecount len);
 DECLARE_DOESNT_RETURN (really_abort (void));
 void zero_out_command_line_status_vars (void);

＠＠ -4086,9 +4603,6 ＠＠
 						 Lisp_Object frob));
 DECLARE_DOESNT_RETURN (stack_overflow (const CIbyte *reason,
 				       Lisp_Object frob));
-MODULE_API DECLARE_DOESNT_RETURN (printing_unreadable_object (const CIbyte *,
-							      ...))
-       PRINTF_ARGS (1, 2);

 Lisp_Object signal_void_function_error (Lisp_Object);
 Lisp_Object signal_invalid_function_error (Lisp_Object);
＠＠ -4265,6 +4779,7 ＠＠
 				...) PRINTF_ARGS (3, 4);
 extern int backtrace_with_internal_sections;

+extern Lisp_Object Vdebug_on_error;
 extern Lisp_Object Vstack_trace_on_error;

 /* Defined in event-stream.c */
＠＠ -4314,13 +4829,11 ＠＠
 EXFUN (Fcoding_system_property, 2);
 EXFUN (Fcoding_system_type, 1);
 EXFUN (Fcopy_coding_system, 2);
-EXFUN (Fdecode_big5_char, 1);
 EXFUN (Fdecode_coding_region, 4);
 EXFUN (Fdecode_shift_jis_char, 1);
 EXFUN (Fdefine_coding_system_alias, 2);
 EXFUN (Fdetect_coding_region, 3);
 EXFUN (Fdefault_encoding_detection_enabled_p, 0);
-EXFUN (Fencode_big5_char, 1);
 EXFUN (Fencode_coding_region, 4);
 EXFUN (Fencode_shift_jis_char, 1);
 EXFUN (Ffind_coding_system, 1);
＠＠ -4692,6 +5205,11 ＠＠
 						 Lisp_Object, Lisp_Object);
 void float_to_string (char *, double);
 void internal_object_printer (Lisp_Object, Lisp_Object, int);
+MODULE_API DECLARE_DOESNT_RETURN (printing_unreadable_object (const CIbyte *,
+							      ...))
+       PRINTF_ARGS (1, 2);
+DECLARE_DOESNT_RETURN (printing_unreadable_lcrecord (Lisp_Object obj,
+						     const Ibyte *name));

 /* Defined in rangetab.c */
 EXFUN (Fclear_range_table, 1);
＠＠ -4812,12 +5330,12 ＠＠
 void seed_random (long arg);

 /* Defined in text.c */
-void find_charsets_in_ibyte_string (unsigned char *charsets,
-				      const Ibyte *str,
-				      Bytecount len);
-void find_charsets_in_ichar_string (unsigned char *charsets,
-				     const Ichar *str,
-				     Charcount len);
+void find_charsets_in_ibyte_string (Lisp_Object_dynarr *charsets,
+				    const Ibyte *USED_IF_MULE (str),
+				    Bytecount USED_IF_MULE (len));
+void find_charsets_in_ichar_string (Lisp_Object_dynarr *charsets,
+				    const Ichar *USED_IF_MULE (str),
+				    Charcount USED_IF_MULE (len));
 int ibyte_string_displayed_columns (const Ibyte *str, Bytecount len);
 int ichar_string_displayed_columns (const Ichar *str, Charcount len);
 Charcount ibyte_string_nonascii_chars (const Ibyte *str, Bytecount len);
＠＠ -4911,9 +5429,11 ＠＠
 Bytexpos buffer_or_string_clip_to_absolute_byte (Lisp_Object object,
 						 Bytexpos pos);

+Lisp_Object get_charset_octets (Lisp_Object charset, Lisp_Object arg1,
+				Lisp_Object arg2, int *a1, int *a2);
+enum converr decode_handle_error (Lisp_Object err);

 #ifdef ENABLE_COMPOSITE_CHARS
-
 Ichar lookup_composite_char (Ibyte *str, int len);
 Lisp_Object composite_char_string (Ichar ch);
 #endif /* ENABLE_COMPOSITE_CHARS */
＠＠ -4922,6 +5442,7 ＠＠
 EXFUN (Fget_charset, 1);
 EXFUN (Fcharset_list, 0);

+#ifdef MULE
 extern Lisp_Object Vcharset_ascii;
 extern Lisp_Object Vcharset_control_1;
 extern Lisp_Object Vcharset_latin_iso8859_1;
＠＠ -4936,6 +5457,8 ＠＠
 extern Lisp_Object Vcharset_latin_jisx0201;
 extern Lisp_Object Vcharset_cyrillic_iso8859_5;
 extern Lisp_Object Vcharset_latin_iso8859_9;
+extern Lisp_Object Vcharset_latin_iso8859_15;
+extern Lisp_Object Vcharset_chinese_sisheng;
 extern Lisp_Object Vcharset_japanese_jisx0208_1978;
 extern Lisp_Object Vcharset_chinese_gb2312;
 extern Lisp_Object Vcharset_japanese_jisx0208;
＠＠ -4943,9 +5466,15 ＠＠
 extern Lisp_Object Vcharset_japanese_jisx0212;
 extern Lisp_Object Vcharset_chinese_cns11643_1;
 extern Lisp_Object Vcharset_chinese_cns11643_2;
+#ifdef UNICODE_INTERNAL
+extern Lisp_Object Vcharset_chinese_big5;
+extern Lisp_Object Vcharset_japanese_shift_jis;
+#else
 extern Lisp_Object Vcharset_chinese_big5_1;
 extern Lisp_Object Vcharset_chinese_big5_2;
+#endif /* UNICODE_INTERNAL */
 extern Lisp_Object Vcharset_composite;
+#endif /* MULE */

 Ichar Lstream_get_ichar_1 (Lstream *stream, int first_char);
 int Lstream_fput_ichar (Lstream *stream, Ichar ch);
＠＠ -5191,12 +5720,20 ＠＠
 MODULE_API int find_pos_of_existing_active_alloca_convert (const char *
 							   srctext);

+#ifdef UNICODE_INTERNAL
+extern int firstbyte_mask[];
+extern unsigned int utf8_offsets_by_rep_bytes[];
+#endif /* UNICODE_INTERNAL */
+
 /* Defined in unicode.c */
 extern const struct sized_memory_description to_unicode_description;
 extern const struct sized_memory_description from_unicode_description;
 void init_charset_unicode_tables (Lisp_Object charset);
 void free_charset_unicode_tables (Lisp_Object charset);
+Lisp_Object_dynarr *get_unicode_precedence (void);
 void recalculate_unicode_precedence (void);
+Lisp_Object_dynarr *
+convert_charset_list_to_precedence_dynarr (Lisp_Object charsets);
 extern Lisp_Object Qunicode;
 extern Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7;
 #ifdef MEMORY_USAGE_STATS
＠＠ -5205,6 +5742,9 ＠＠
 Bytecount compute_to_unicode_table_size (Lisp_Object charset,
 					    struct overhead_stats *stats);
 #endif /* MEMORY_USAGE_STATS */
+void initialize_ascii_control_1_latin_1_unicode_translation (void);
+int decode_unicode (Lisp_Object unicode);
+void free_precedence_dynarr (Lisp_Object_dynarr *dynarr);

 /* Defined in undo.c */
 EXFUN (Fundo_boundary, 0);
Index: src/lread.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/lread.c,v
retrieving revision 1.76
diff -u -r1.76 lread.c
--- src/lread.c	2005/07/12 23:26:49	1.76
+++ src/lread.c	2005/11/22 14:00:51
＠＠ -1,7 +1,7 ＠＠
 /* Lisp parsing and input streams.
    Copyright (C) 1985-1989, 1992-1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Tinker Systems.
-   Copyright (C) 1996, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1996, 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -1683,8 +1683,8 ＠＠
     {
     case 'a': return '\007';
     case 'b': return '\b';
-    case 'd': return 0177;
-    case 'e': return 033;
+    case 'd': return 0x7F;
+    case 'e': return 0x1B;
     case 'f': return '\f';
     case 'n': return '\n';
     case 'r': return '\r';
＠＠ -1703,7 +1703,7 ＠＠
 	signal_error (Qend_of_file, 0, READCHARFUN_MAYBE (readcharfun));
       if (c == '\\')
 	c = read_escape (readcharfun);
-      return c | 0200;
+      return c | 0x80;

       /* Originally, FSF_KEYS provided a degree of FSF Emacs
 	 compatibility by defining character "modifiers" alt, super,
＠＠ -1736,9 +1736,9 ＠＠
       /* FSFmacs junk for non-ASCII controls.
 	 Not used here. */
       if (c == '?')
-	return 0177;
+	return 0x7F;
       else
-        return c & (0200 | 037);
+        return c & (0x80 | 0x1F);

     case '0':
     case '1':
＠＠ -1762,21 +1762,22 ＠＠
 		break;
 	      }
 	  }
-	if (i >= 0400)
+	if (i >= 256)
 	  syntax_error ("Attempt to create non-ASCII/ISO-8859-1 character",
 			make_int (i));
 	return i;
       }

     case 'x':
-      /* A hex escape, as in ANSI C, except that we only allow latin-1
+      /* [[ A hex escape, as in ANSI C, except that we only allow latin-1
 	 characters to be read this way.  What is "\x4e03" supposed to
 	 mean, anyways, if the internal representation is hidden?
-         This is also consistent with the treatment of octal escapes. */
+         This is also consistent with the treatment of octal escapes.]]
+
+         If someone really wants to, let them. --ben */
       {
 	REGISTER Ichar i = 0;
-	REGISTER int count = 0;
-	while (++count <= 2)
+	while (1)
 	  {
 	    c = readchar (readcharfun);
 	    /* Remember, can't use isdigit(), isalpha() etc. on Ichars */
＠＠ -1789,12 +1790,51 ＠＠
 		break;
 	      }
 	  }
+	if (!valid_ichar_p (i))
+	  {
+	    syntax_error ("Attempt to read invalid hex character",
+			  emacs_sprintf_string ("#x%X", i));
+	  }
+
 	return i;
       }

 #ifdef MULE
-      /* #### need some way of reading an extended character with
-	 an escape sequence. */
+    case 'U':
+    case 'u':
+      /* A four-digit Unicode character. */
+      {
+	REGISTER EMACS_INT i = 0;
+	REGISTER int count;
+	int capu = c == 'U';
+	for (count = 0; count < (capu ? 8 : 4); count++)
+	  {
+	    c = readchar (readcharfun);
+	    /* Remember, can't use isdigit(), isalpha() etc. on Ichars */
+	    if      (c >= '0' && c <= '9')  i = (i << 4) +
(c - '0');
+	    else if (c >= 'a' && c <= 'f')  i = (i << 4) +
(c - 'a') + 10;
+            else if (c >= 'A' && c <= 'F')  i = (i <<
4) + (c - 'A') + 10;
+	    else
+	      {
+		syntax_error (capu ? "Invalid character in \\U... spec" :
+			      "Invalid character in \\u... spec",
+			      make_char (c));
+	      }
+	  }
+	{
+	  Ichar ch;
+
+	  if (!valid_unicode_codepoint_p (i))
+	    syntax_error ("Invalid Unicode codepoint",
+			  emacs_sprintf_string ("#x%lX", i));
+
+	  ch = unicode_to_ichar (i, get_unicode_precedence (), CONVERR_FAIL);
+	  if (ch < 0)
+	    syntax_error ("Unicode character can't be converted to a charset",
+			  emacs_sprintf_string ("#x%lX", i));
+	  return ch;
+	}
+      }
 #endif

     default:
＠＠ -1814,7 +1854,7 ＠＠

   *saw_a_backslash = 0;

-  while (c > 040	/* #### - comma should be here as should backquote */
+  while (c > ' '	/* #### - comma should be here as should backquote */
          && !(c == '\"' || c == '\'' || c ==
';'
               || c == '(' || c == ')'
               || c == '[' || c == ']' || c == '#'
＠＠ -2062,8 +2102,7 ＠＠
   st.instantiate = instantiate;
   Dynarr_add (the_structure_type_dynarr, st);

-  return Dynarr_atp (the_structure_type_dynarr,
-		     Dynarr_length (the_structure_type_dynarr) - 1);
+  return Dynarr_lastp (the_structure_type_dynarr);
 }

 void
＠＠ -2203,7 +2242,7 ＠＠
     default:
       {
 	/* Ignore whitespace and control characters */
-	if (c <= 040)
+	if (c <= ' ')
 	  goto retry;
 	return c;
       }
＠＠ -2656,7 +2695,7 ＠＠
     default:
       {
 	/* Ignore whitespace and control characters */
-	if (c <= 040)
+	if (c <= ' ')
 	  goto retry;
 	return read_atom (readcharfun, c, 0);
       }
Index: src/lrecord.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/lrecord.h,v
retrieving revision 1.42
diff -u -r1.42 lrecord.h
--- src/lrecord.h	2005/10/26 13:01:42	1.42
+++ src/lrecord.h	2005/11/22 14:00:52
＠＠ -120,9 +120,9 ＠＠
      debugging. */
   unsigned int free :1;

-  /* The `uid' field is just for debugging/printing convenience.
-     Having this slot doesn't hurt us much spacewise, since the
-     bits are unused anyway. */
+  /* The `uid' field is just for debugging/printing convenience.  Having
+     this slot doesn't hurt us spacewise, since the bits are unused
+     anyway. */
   unsigned int uid :22;

 #else /* not MC_ALLOC */
＠＠ -220,9 +220,6 ＠＠
   lrecord_type_cons,                      /*  6 */
   lrecord_type_vector,                    /*  7 */
   lrecord_type_string,                    /*  8 */
-#ifndef MC_ALLOC
-  lrecord_type_lcrecord_list,
-#endif /* not MC_ALLOC */
   lrecord_type_compiled_function,         /*  9 */
   lrecord_type_weak_list,                 /* 10 */
   lrecord_type_bit_vector,                /* 11 */
＠＠ -233,16 +230,54 ＠＠
   lrecord_type_charset,                   /* 16 */
   lrecord_type_coding_system,             /* 17 */
   lrecord_type_char_table,                /* 18 */
-  lrecord_type_char_table_entry,          /* 19 */
-  lrecord_type_range_table,               /* 20 */
-  lrecord_type_opaque,                    /* 21 */
-  lrecord_type_opaque_ptr,                /* 22 */
-  lrecord_type_buffer,                    /* 23 */
-  lrecord_type_extent,                    /* 24 */
-  lrecord_type_extent_info,               /* 25 */
-  lrecord_type_extent_auxiliary,          /* 26 */
-  lrecord_type_marker,                    /* 27 */
-  lrecord_type_event,                     /* 28 */
+#ifndef USE_PLAIN_ARRAYS_FOR_SUB_TABLES
+  lrecord_type_char_subtable,
+#endif
+  lrecord_type_range_table,               /* 19 */
+  lrecord_type_opaque,                    /* 20 */
+  lrecord_type_opaque_ptr,                /* 21 */
+  lrecord_type_buffer,                    /* 22 */
+  lrecord_type_extent,                    /* 23 */
+  lrecord_type_extent_info,               /* 24 */
+  lrecord_type_extent_auxiliary,          /* 25 */
+  lrecord_type_marker,                    /* 26 */
+  lrecord_type_event,                     /* 27 */
+  lrecord_type_keymap,                    /* 28 */
+  lrecord_type_command_builder,           /* 29 */
+  lrecord_type_timeout,                   /* 30 */
+  lrecord_type_specifier,                 /* 31 */
+  lrecord_type_console,                   /* 32 */
+  lrecord_type_device,                    /* 33 */
+  lrecord_type_frame,                     /* 34 */
+  lrecord_type_window,                    /* 35 */
+  lrecord_type_window_mirror,             /* 36 */
+  lrecord_type_window_configuration,      /* 37 */
+  lrecord_type_gui_item,                  /* 38 */
+  lrecord_type_popup_data,                /* 39 */
+  lrecord_type_toolbar_button,            /* 40 */
+  lrecord_type_scrollbar_instance,        /* 41 */
+  lrecord_type_color_instance,            /* 42 */
+  lrecord_type_font_instance,             /* 43 */
+  lrecord_type_image_instance,            /* 44 */
+  lrecord_type_glyph,                     /* 45 */
+  lrecord_type_face,                      /* 46 */
+  lrecord_type_database,                  /* 47 */
+  lrecord_type_tooltalk_message,          /* 48 */
+  lrecord_type_tooltalk_pattern,          /* 49 */
+  lrecord_type_ldap,                      /* 50 */
+  lrecord_type_pgconn,                    /* 51 */
+  lrecord_type_pgresult,                  /* 52 */
+  lrecord_type_devmode,                   /* 53 */
+  lrecord_type_mswindows_dialog_id,       /* 54 */
+  lrecord_type_case_table,                /* 55 */
+  lrecord_type_emacs_ffi,                 /* 56 */
+  lrecord_type_emacs_gtk_object,          /* 57 */
+  lrecord_type_emacs_gtk_boxed,           /* 58 */
+  lrecord_type_weak_box,                  /* 59 */
+  lrecord_type_ephemeron,                 /* 60 */
+  lrecord_type_bignum,                    /* 61 */
+  lrecord_type_ratio,                     /* 62 */
+  lrecord_type_bigfloat,                  /* 63 */
 #ifdef EVENT_DATA_AS_OBJECTS /* not defined */
   lrecord_type_key_data,
   lrecord_type_button_data,
＠＠ -254,47 +289,12 ＠＠
   lrecord_type_magic_eval_data,
   lrecord_type_magic_data,
 #endif /* EVENT_DATA_AS_OBJECTS */
-  lrecord_type_keymap,                    /* 29 */
-  lrecord_type_command_builder,           /* 30 */
-  lrecord_type_timeout,                   /* 31 */
-  lrecord_type_specifier,                 /* 32 */
-  lrecord_type_console,                   /* 33 */
-  lrecord_type_device,                    /* 34 */
-  lrecord_type_frame,                     /* 35 */
-  lrecord_type_window,                    /* 36 */
-  lrecord_type_window_mirror,             /* 37 */
-  lrecord_type_window_configuration,      /* 38 */
-  lrecord_type_gui_item,                  /* 39 */
-  lrecord_type_popup_data,                /* 40 */
-  lrecord_type_toolbar_button,            /* 41 */
-  lrecord_type_scrollbar_instance,        /* 42 */
-  lrecord_type_color_instance,            /* 43 */
-  lrecord_type_font_instance,             /* 44 */
-  lrecord_type_image_instance,            /* 45 */
-  lrecord_type_glyph,                     /* 46 */
-  lrecord_type_face,                      /* 47 */
-  lrecord_type_database,                  /* 48 */
-  lrecord_type_tooltalk_message,          /* 49 */
-  lrecord_type_tooltalk_pattern,          /* 50 */
-  lrecord_type_ldap,                      /* 51 */
-  lrecord_type_pgconn,                    /* 52 */
-  lrecord_type_pgresult,                  /* 53 */
-  lrecord_type_devmode,                   /* 54 */
-  lrecord_type_mswindows_dialog_id,       /* 55 */
-  lrecord_type_case_table,                /* 56 */
-  lrecord_type_emacs_ffi,                 /* 57 */
-  lrecord_type_emacs_gtk_object,          /* 58 */
-  lrecord_type_emacs_gtk_boxed,           /* 59 */
-  lrecord_type_weak_box,                  /* 60 */
-  lrecord_type_ephemeron,                 /* 61 */
-  lrecord_type_bignum,                    /* 62 */
-  lrecord_type_ratio,                     /* 63 */
-  lrecord_type_bigfloat,                  /* 64 */
 #ifndef MC_ALLOC
+  lrecord_type_lcrecord_list,
   lrecord_type_free, /* only used for "free" lrecords */
   lrecord_type_undefined, /* only used for debugging */
 #endif /* not MC_ALLOC */
-  lrecord_type_last_built_in_type         /* 65 */ /* must be last */
+  lrecord_type_last_built_in_type         /* 64 */ /* must be last */
 };

 extern MODULE_API int lrecord_type_count;
＠＠ -655,20 +655,20 ＠＠
      ...
      { XD_INT,		offsetof (Lisp_Foo, count) },
      { XD_BLOCK_PTR,	offsetof (Lisp_Foo, objects),
-       XD_INDIRECT (0, 0), { &lisp_object_description } },
+       XD_INDIRECT (0, 0), { &Lisp_Object_description } },
      ...
    };

-   lisp_object_description is declared in alloc.c, like this:
+   Lisp_Object_description is declared in dynarr.c, like this:

-   static const struct memory_description lisp_object_description_1[] = {
+   static const struct memory_description Lisp_Object_description_1[] = {
      { XD_LISP_OBJECT, 0 },
      { XD_END }
    };

-   const struct sized_memory_description lisp_object_description = {
+   const struct sized_memory_description Lisp_Object_description = {
      sizeof (Lisp_Object),
-     lisp_object_description_1
+     Lisp_Object_description_1
    };

    Another example of XD_BLOCK_PTR:
＠＠ -1069,8 +1069,6 ＠＠

 };

-extern const struct sized_memory_description lisp_object_description;
-
 #define XD_INDIRECT(val, delta) (-1 - (Bytecount) ((val) | ((delta) << 8)))

 #define XD_IS_INDIRECT(code) ((code) < 0)
＠＠ -1079,7 +1077,7 ＠＠

 #define XD_DYNARR_DESC(base_type, sub_desc)				      \
   { XD_BLOCK_PTR, offsetof (base_type, base), XD_INDIRECT(1, 0), {sub_desc} },\
-  { XD_INT,        offsetof (base_type, cur) },				      \
+  { XD_INT,        offsetof (base_type, len) },				      \
   { XD_INT_RESET,  offsetof (base_type, max), XD_INDIRECT(1, 0) }	      \

 /* DEFINE_LRECORD_IMPLEMENTATION is for objects with constant size.
Index: src/md5.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/md5.c,v
retrieving revision 1.15
diff -u -r1.15 md5.c
--- src/md5.c	2002/06/05 09:56:26	1.15
+++ src/md5.c	2005/11/22 14:00:52
＠＠ -23,14 +23,12 ＠＠

 /* XEmacs frontend written by Ben Wing, Jareth Hein and Hrvoje Niksic.  */

-#ifdef HAVE_CONFIG_H
-# include <config.h>
-#endif
+#include <config.h>
+#include "lisp.h"

-#include <sys/types.h>
-#include <string.h>
-#include <stdio.h>
-#include <limits.h>
+#include "buffer.h"
+#include "lstream.h"
+#include "file-coding.h"

 /* The following contortions are an attempt to use the C preprocessor
    to determine an unsigned integral type that is 32 bits wide.  An
＠＠ -74,11 +72,6 ＠＠
 #  endif
 # endif
 #endif
-
-#include "lisp.h"
-#include "buffer.h"
-#include "lstream.h"
-# include "file-coding.h"

 /* Structure to save state of computation between the single steps.  */
 struct md5_ctx
Index: src/menubar-msw.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/menubar-msw.c,v
retrieving revision 1.44
diff -u -r1.44 menubar-msw.c
--- src/menubar-msw.c	2005/01/24 23:34:03	1.44
+++ src/menubar-msw.c	2005/11/22 14:00:52
＠＠ -83,6 +83,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "commands.h"
 #include "console-msw-impl.h"
 #include "elhash.h"
Index: src/menubar.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/menubar.c,v
retrieving revision 1.30
diff -u -r1.30 menubar.c
--- src/menubar.c	2005/10/25 11:16:26	1.30
+++ src/menubar.c	2005/11/22 14:00:52
＠＠ -33,7 +33,7 ＠＠
 #include <config.h>
 #include "lisp.h"

-#include "buffer.h"
+#include "casetab.h"
 #include "device-impl.h"
 #include "frame-impl.h"
 #include "gui.h"
＠＠ -130,7 +130,7 ＠＠
 }

 Lisp_Object
-current_frame_menubar (const struct frame* f)
+current_frame_menubar (const struct frame *f)
 {
   struct window *w = XWINDOW (FRAME_LAST_NONMINIBUF_WINDOW (f));
   return symbol_value_in_buffer (Qcurrent_menubar, w->buffer);
Index: src/minibuf.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/minibuf.c,v
retrieving revision 1.21
diff -u -r1.21 minibuf.c
--- src/minibuf.c	2004/11/04 23:06:42	1.21
+++ src/minibuf.c	2005/11/22 14:00:53
＠＠ -30,6 +30,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "commands.h"
 #include "console-stream.h"
 #include "events.h"
Index: src/mule-ccl.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/mule-ccl.c,v
retrieving revision 1.28
diff -u -r1.28 mule-ccl.c
--- src/mule-ccl.c	2005/06/26 19:05:07	1.28
+++ src/mule-ccl.c	2005/11/22 14:00:53
＠＠ -1,6 +1,6 ＠＠
 /* CCL (Code Conversion Language) interpreter.
    Copyright (C) 1995, 1997 Electrotechnical Laboratory, JAPAN.
-   Copyright (C) 2002 Ben Wing.
+   Copyright (C) 2002, 2005 Ben Wing.
    Licensed to the Free Software Foundation.

 This file is part of XEmacs.
＠＠ -30,6 +30,26 ＠＠
 #include "mule-ccl.h"
 #include "file-coding.h"

+#ifdef UNICODE_INTERNAL
+/* The problem is that CCL contains built into it the concept of "charset
+   ID for a charset" and of a character as a combination of charset ID
+   and one or two octets of a national character set .  CCL sticks the
+   charset ID and octets of a character into separate numeric registers.
+   Some CCL programs even have the particular internal charset ID codes
+   hard-coded into them, which is extremely bad, although others are at
+   least wise enough to call `charset-id' to get the charset ID.  Fixing
+   this would require changing the nature of CCL, and it turns out that
+   this just isn't worth it, because with the adoption of internal Unicode
+   support and the expansion of the concept of a charset to cover any
+   character set indexed by one or two bytes, all of the current major CCL
+   applications can be done in other fashions.  If people start complaining
+   that they have complicated conversions that can't be done efficiently
+   other than through CCL, we may rethink this, but no sense in creating
+   general mechanisms that have no use.
+*/
+#error This file not currently compilable with Unicode-internal
+#endif
+
 Lisp_Object Qccl_error;

 /* This contains all code conversion map available to CCL.  */
＠＠ -728,25 +748,15 ＠＠
 	  }							\
 	else							\
 	  {							\
-	    Ibyte work[MAX_ICHAR_LEN];				\
-	    int len;						\
-	    len = non_ascii_set_itext_ichar (work, ch);		\
-	    Dynarr_add_many (destination, work, len);		\
+	    /* !!#### This used to write the internal		\
+	       representation out, which is totally wrong */	\
+	    Dynarr_add (destination,				\
+			CANT_CONVERT_CHAR_WHEN_ENCODING);	\
 	  }							\
       }								\
     else							\
       {								\
-	if (!ichar_multibyte_p(ch))				\
-	  {							\
-	    Dynarr_add (destination, ch);			\
-	  }							\
-	else							\
-	  {							\
-	    Ibyte work[MAX_ICHAR_LEN];				\
-	    int len;						\
-	    len = non_ascii_set_itext_ichar (work, ch);		\
-	    Dynarr_add_many (destination, work, len);		\
-	  }							\
+	Dynarr_add_ichar (destination, ch);			\
       }								\
   } while (0)

＠＠ -755,7 +765,6 ＠＠
    cannot handle a multibyte string except for Control-1 characters. */
 #define CCL_WRITE_STRING(len)					\
   do {								\
-    Ibyte work[MAX_ICHAR_LEN];					\
     int ch;							\
     if (!destination)						\
       CCL_INVALID_CMD;						\
＠＠ -783,8 +792,10 ＠＠
 	      }							\
 	    else						\
 	      {							\
-		non_ascii_set_itext_ichar (work, ch);		\
-		Dynarr_add_many (destination, work, len);	\
+		/* !!#### This used to write the internal	\
+		   representation out, which is totally wrong */\
+		Dynarr_add (destination,			\
+			    CANT_CONVERT_CHAR_WHEN_ENCODING);	\
 	      }							\
 	  }							\
       }								\
＠＠ -794,15 +805,7 ＠＠
 	  {							\
 	    ch = ((XINT (ccl_prog[ic + (i / 3)]))		\
...
> ((2 - (i % 3)) * 8)) & 0xFF;		\ -	    if
(!ichar_multibyte_p(ch))				\
-	      {							\
-		Dynarr_add (destination, ch);			\
-	      }							\
-	    else						\
-	      {							\
-		non_ascii_set_itext_ichar (work, ch);		\
-		Dynarr_add_many (destination, work, len);	\
-	      }							\
+	    Dynarr_add_ichar (destination, ch);			\
 	  }							\
       }								\
   } while (0)
＠＠ -826,39 +829,41 ＠＠
       }							\
   } while (0)

-#define POSSIBLE_LEADING_BYTE_P(leading_byte) \
-  ((leading_byte > MIN_LEADING_BYTE) && \
-   (leading_byte - MIN_LEADING_BYTE) < NUM_LEADING_BYTES)
-
-/* Set C to the character code made from CHARSET and CODE.  This is
-   like make_ichar but check the validity of CHARSET and CODE.  If they
-   are not valid, set C to (CODE & 0xFF) because that is usually the
-   case that CCL_ReadMultibyteChar2 read an invalid code and it set
-   CODE to that invalid byte.  */
+#define ENCODABLE_ID_P(id)				\
+  ((id) >= MIN_ENCODABLE_CHARSET_ID && (id) <= MAX_ENCODABLE_CHARSET_ID)

-/* On XEmacs, TranslateCharacter is not supported.  Thus, this
-   macro is not used.  */
-#if 0
-#define CCL_MAKE_CHAR(charset, code, c)				\
-  do {								\
-    if ((charset) == CHARSET_ASCII)				\
-      (c) = (code) & 0xFF;						\
-    else if (CHARSET_DEFINED_P (charset)			\
-	     && ((code) & 0x7F) >= 32				\
-	     && ((code) < 256 || ((code >> 7) & 0x7F) >= 32))	\
-      {								\
-	int c1 = (code) & 0x7F, c2 = 0;				\
-								\
-	if ((code) >= 256)					\
-	  c2 = c1, c1 = ((code) >> 7) & 0x7F;			\
-	(c) = make_ichar (charset, c1, c2);			\
-      }								\
-    else							\
-      (c) = (code) & 0xFF;						\
-  } while (0)
-#endif
+/* Set C to the character code made from CHARSET and CODE.  This splits up
+   CODE into two position codes and checks the validity of CHARSET and
+   CODE.  If they are not valid, set C to (CODE & 0xFF) because that is
+   usually the case that CCL_ReadMultibyteChar2 read an invalid code and it
+   set CODE to that invalid byte.  */

+inline static Ichar
+ccl_make_char (int csid, int code)
+{
+  Lisp_Object charset;

+  if (ENCODABLE_ID_P (csid) &&
+      !NILP (charset = charset_by_encodable_id (csid)) &&
+      /* The following exclusion, and resulting (code & 0xFF) at the end,
+	 comes directly out of the previous code. */
+      !EQ (charset, Vcharset_ascii))
+    {
+      int c1, c2;
+      c1 = (code >> 7) & 0x7F, c2 = code & 0x7F;
+      if (XCHARSET_OFFSET (charset, 0) >= 128)
+	c1 += 128;
+      if (XCHARSET_OFFSET (charset, 1) >= 128)
+	c2 += 128;
+      if (EQ (charset, Vcharset_control_1))
+	c2 -= 0x20;
+      if (valid_charset_codepoint_p (charset, c1, c2))
+	return charset_codepoint_to_ichar (charset, c1, c2, CONVERR_SUCCEED);
+    }
+
+  return (Ichar) (code & 0xFF);
+}
+
 /* Execute CCL code on SRC_BYTES length text at SOURCE.  The resulting
    text goes to a place pointed by DESTINATION, the length of which
    should not exceed DST_BYTES.  The bytes actually processed is
＠＠ -1257,13 +1262,11 ＠＠
 	      /* DECODE_SHIFT_JIS set MSB for internal format
 		 as opposed to Emacs.  */
 	      DECODE_SHIFT_JIS (i, j, reg[rrr], reg[7]);
-	      reg[rrr] &= 0x7F;
-	      reg[7] &= 0x7F;
 	      break;
 	    case CCL_ENCODE_SJIS:
 	      /* ENCODE_SHIFT_JIS assumes MSB of SHIFT-JIS-char is set
 		 as opposed to Emacs.  */
-	      ENCODE_SHIFT_JIS (i | 0x80, j | 0x80, reg[rrr], reg[7]);
+	      ENCODE_SHIFT_JIS (i, j, reg[rrr], reg[7]);
 	      break;
 	    default: CCL_INVALID_CMD;
 	    }
＠＠ -1284,123 +1287,55 ＠＠
 	    case CCL_ReadMultibyteChar2:
 	      if (!src)
 		CCL_INVALID_CMD;
-
-		if (src >= src_end)
-		  {
-		    src++;
-		    goto ccl_read_multibyte_character_suspend;
-		  }
-
-		i = *src++;
-		if (i < 0x80)
-		  {
-		    /* ASCII */
-		    reg[rrr] = i;
-		    reg[RRR] = LEADING_BYTE_ASCII;
-		  }
-		/* Previously, these next two elses were reversed in order,
-		   which should have worked fine, but is more fragile than
-		   this order. */
-		else if (LEADING_BYTE_CONTROL_1 == i)
-		  {
-		    if (src >= src_end)
-		      goto ccl_read_multibyte_character_suspend;
-		    reg[RRR] = i;
-		    reg[rrr] = (*src++ - 0xA0);
-		  }
-		else if (i <= MAX_LEADING_BYTE_OFFICIAL_1)
-		  {
-		    if (src >= src_end)
-		      goto ccl_read_multibyte_character_suspend;
-		    reg[RRR] = i;
-		    reg[rrr] = (*src++ & 0x7F);
-		  }
-		else if (i <= MAX_LEADING_BYTE_OFFICIAL_2)
-		  {
-		    if ((src + 1) >= src_end)
-		      goto ccl_read_multibyte_character_suspend;
-		    reg[RRR] = i;
-		    i = (*src++ & 0x7F);
-		    reg[rrr] = ((i << 7) | (*src & 0x7F));
-		    src++;
-		  }
-		else if (i == PRE_LEADING_BYTE_PRIVATE_1)
-		  {
-		    if ((src + 1) >= src_end)
-		      goto ccl_read_multibyte_character_suspend;
-		    reg[RRR] = *src++;
-		    reg[rrr] = (*src++ & 0x7F);
-		  }
-		else if (i == PRE_LEADING_BYTE_PRIVATE_2)
+	      {
+		Bytecount len;
+		if (src >= src_end ||
+		    src + (len = itext_ichar_len (src)) > src_end)
 		  {
-		    if ((src + 2) >= src_end)
-		      goto ccl_read_multibyte_character_suspend;
-		    reg[RRR] = *src++;
-		    i = (*src++ & 0x7F);
-		    reg[rrr] = ((i << 7) | (*src & 0x7F));
-		    src++;
+		    if (ccl->last_block)
+		      {
+			ic = ccl->eof_ic;
+			goto ccl_repeat;
+		      }
+		    else
+		      CCL_SUSPEND (CCL_STAT_SUSPEND_BY_SRC);
 		  }
 		else
 		  {
-		    /* INVALID CODE.  Return a single byte character.  */
-		    reg[RRR] = LEADING_BYTE_ASCII;
-		    reg[rrr] = i;
-		  }
-	      break;
+		    int c1, c2;
+		    Lisp_Object charset;
+		    /* ＠＠#### Better error-handling? */
+		    itext_to_charset_codepoint (src, get_unicode_precedence(),
+						&charset, &c1, &c2,
+						CONVERR_SUCCEED);
+		    src += len;
+		    c1 &= 127;
+		    c2 &= 127;

-	    ccl_read_multibyte_character_suspend:
-	      src--;
-	      if (ccl->last_block)
-		{
-		  ic = ccl->eof_ic;
-		  goto ccl_repeat;
-		}
-	      else
-		CCL_SUSPEND (CCL_STAT_SUSPEND_BY_SRC);
-
+		    reg[RRR] = XCHARSET_ID (charset);
+		    reg[rrr] = (c1 << 7) + c2;
+		  }
+	      }
 	      break;

 	    case CCL_WriteMultibyteChar2:
-	      i = reg[RRR]; /* charset */
-	      if (i == LEADING_BYTE_ASCII) 
-		i = reg[rrr] & 0xFF;
-	      else if (LEADING_BYTE_CONTROL_1 == i)
-		i = ((reg[rrr] & 0xFF) - 0xA0);
-	      else if (POSSIBLE_LEADING_BYTE_P(i) &&
-		       !NILP(charset_by_leading_byte(i)))
-		{
-		  if (XCHARSET_DIMENSION (charset_by_leading_byte (i)) == 1)
-		    i = (((i - FIELD2_TO_OFFICIAL_LEADING_BYTE) << 7)
-			 | (reg[rrr] & 0x7F));
-		  else if (i < MAX_LEADING_BYTE_OFFICIAL_2)
-		    i = ((i - FIELD1_TO_OFFICIAL_LEADING_BYTE) << 14) 
-		      | reg[rrr];
-		  else
-		    i = ((i - FIELD1_TO_PRIVATE_LEADING_BYTE) << 14) | reg[rrr];
-		}
-	      else 
-		{
-		  /* No charset we know about; use U+3012 GETA MARK */
-		  i = make_ichar
-		    (charset_by_leading_byte(LEADING_BYTE_JAPANESE_JISX0208),
-		     34, 46);
-		}
-
-	      CCL_WRITE_CHAR (i);
-
+	      {
+		Ichar ich = ccl_make_char (reg[RRR], reg[rrr]);
+		CCL_WRITE_CHAR (ich);
+	      }
 	      break;

 	    case CCL_TranslateCharacter:
 #if 0
 	      /* XEmacs does not have translate_char, and its
 		 equivalent nor.  We do nothing on this operation. */
-	      CCL_MAKE_CHAR (reg[RRR], reg[rrr], i);
+	      i = ccl_make_char (reg[RRR], reg[rrr]);
 	      op = translate_char (GET_TRANSLATION_TABLE (reg[Rrr]),
 				   i, -1, 0, 0);
 	      SPLIT_CHAR (op, reg[RRR], i, j);
 	      if (j != -1)
 		i = (i << 7) | j;
-
+	      
 	      reg[rrr] = i;
 #endif
 	      break;
＠＠ -1411,7 +1346,7 ＠＠
                  do nothing on this operation. */
 	      op = XINT (ccl_prog[ic]); /* table */
 	      ic++;
-	      CCL_MAKE_CHAR (reg[RRR], reg[rrr], i);
+	      i = ccl_make_char (reg[RRR], reg[rrr]);
 	      op = translate_char (GET_TRANSLATION_TABLE (op), i, -1, 0, 0);
 	      SPLIT_CHAR (op, reg[RRR], i, j);
 	      if (j != -1)
＠＠ -2294,6 +2229,8 ＠＠
 If the font is single-byte font, the register R2 is not used.
 */ );
   Vfont_ccl_encoder_alist = Qnil;
+
+  Fprovide (intern ("ccl"));
 }

 #endif  /* emacs */
Index: src/mule-charset.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/mule-charset.c,v
retrieving revision 1.46
diff -u -r1.46 mule-charset.c
--- src/mule-charset.c	2005/10/25 11:16:26	1.46
+++ src/mule-charset.c	2005/11/22 14:00:54
＠＠ -33,7 +33,9 ＠＠
 #include "device.h"
 #include "faces.h"
 #include "lstream.h"
+#ifdef HAVE_CCL
 #include "mule-ccl.h"
+#endif /* HAVE_CCL */
 #include "objects.h"

 /* The various pre-defined charsets. */
＠＠ -53,6 +55,7 ＠＠
 Lisp_Object Vcharset_cyrillic_iso8859_5;
 Lisp_Object Vcharset_latin_iso8859_9;
 Lisp_Object Vcharset_latin_iso8859_15;
+Lisp_Object Vcharset_chinese_sisheng;
 Lisp_Object Vcharset_japanese_jisx0208_1978;
 Lisp_Object Vcharset_chinese_gb2312;
 Lisp_Object Vcharset_japanese_jisx0208;
＠＠ -60,14 +63,116 ＠＠
 Lisp_Object Vcharset_japanese_jisx0212;
 Lisp_Object Vcharset_chinese_cns11643_1;
 Lisp_Object Vcharset_chinese_cns11643_2;
+#ifdef UNICODE_INTERNAL
+Lisp_Object Vcharset_chinese_big5;
+Lisp_Object Vcharset_japanese_shift_jis;
+#else
 Lisp_Object Vcharset_chinese_big5_1;
 Lisp_Object Vcharset_chinese_big5_2;
+#endif /* UNICODE_INTERNAL */
 Lisp_Object Vcharset_composite;

+#ifndef UNICODE_INTERNAL
+
+/* GNU Emacs creates charsets called `mule-unicode-0100-24ff',
+   `mule-unicode-2500-33ff', and `mule-unicode-e000-ffff'.
+   Although it's conceivable we could also start from 0x100 on the
+   assumption that ASCII and Latin-1 can always be represented otherwise,
+   it's not quite so clean; and furthermore, by starting at 0, we end
+   up with chunks that perfectly end right before the start of the
+   surrogate space.  We just pick up right afterwards and don't have to
+   worry about having a charset that crosses the surrogate space or
+   overlaps one side of it.
+
+   Since we're not being compatible with their charsets in our ranges,
+   there's no sense in being compatible in other ways. */
+
+#define FROB(low, high)				\
+Lisp_Object Vcharset_unicode_##low##_##high;	\
+Lisp_Object Qunicode_##low##_##high
+
+FROB  (   0,  23ff);
+FROB  (2400,  47ff);
+FROB  (4800,  6bff);
+FROB  (6c00,  8fff);
+FROB  (9000,  b3ff);
+FROB  (b400,  d7ff);
+FROB  (e000, 103ff);
+FROB (10400, 127ff);
+FROB (12800, 14bff);
+FROB (14c00, 16fff);
+FROB (17000, 193ff);
+FROB (19400, 1b7ff);
+FROB (1b800, 1dbff);
+FROB (1dc00, 1ffff);
+FROB (20000, 223ff);
+FROB (22400, 247ff);
+FROB (24800, 26bff);
+FROB (26c00, 28fff);
+FROB (29000, 2b3ff);
+FROB (2b400, 2d7ff);
+FROB (2d800, 2fbff);
+FROB (2fc00, 31fff);
+/* WARNING: Any changes to this list need to be propagated to at least two
+   other places in this file (in syms_of_mule_charset() and
+   complex_vars_of_mule_charset(); maybe also to the CHARSET_ID enum just
+   below. */
+
+#undef FROB
+
+/* Non-public definitions of charset IDs */
+enum CHARSET_ID_OFFICIAL
+{
+  /* WARNING!!!  If you change this, you *MUST* change
+     rep_bytes_by_first_byte[] in text.c correspondingly. */
+  CHARSET_ID_LATIN_ISO8859_2 =
+    CHARSET_ID_LATIN_ISO8859_1 + 1,  /* 0x82 Right half of ISO 8859-2 */
+  CHARSET_ID_LATIN_ISO8859_3,        /* 0x83 Right half of ISO 8859-3 */
+  CHARSET_ID_LATIN_ISO8859_4,        /* 0x84 Right half of ISO 8859-4 */
+  CHARSET_ID_CYRILLIC_ISO8859_5,     /* 0x85 Right half of ISO 8859-5 */
+  CHARSET_ID_ARABIC_ISO8859_6,       /* 0x86 Right half of ISO 8859-6 */
+  CHARSET_ID_GREEK_ISO8859_7,        /* 0x87 Right half of ISO 8859-7 */
+  CHARSET_ID_HEBREW_ISO8859_8,       /* 0x88 Right half of ISO 8859-8 */
+  CHARSET_ID_LATIN_ISO8859_9,        /* 0x89 Right half of ISO 8859-9 */
+  CHARSET_ID_LATIN_ISO8859_15,       /* 0x8A Right half of ISO 8859-15 */
+  CHARSET_ID_LATIN_JISX0201,         /* 0x8B Left  half of JIS X0201-1976 */
+  CHARSET_ID_KATAKANA_JISX0201,      /* 0x8C Right half of JIS X0201-1976 */
+  /* The following is dimension-2 with ENABLE_COMPOSITE_CHARS but
+     dimension-1 otherwise.  All above are dimension-1; all below are
+     dimension-2. */
+  CHARSET_ID_COMPOSITE,              /* 0x8D Composite characters or a
+					  fake set that replaces ESC 0 -
+					  ESC 4 in a buffer */
+  CHARSET_ID_JAPANESE_JISX0208_1978, /* 0x8E Japanese JIS X0208-1978 */
+  CHARSET_ID_CHINESE_GB2312,         /* 0x8F Chinese Hanzi GB2312-1980 */
+  CHARSET_ID_JAPANESE_JISX0208,      /* 0x90 Japanese JIS X0208-1983 */
+  CHARSET_ID_KOREAN_KSC5601,         /* 0x91 Hangul KS C5601-1987 */
+  CHARSET_ID_JAPANESE_JISX0212,      /* 0x92 Japanese JIS X0212-1990 */
+  CHARSET_ID_CHINESE_CNS11643_1,     /* 0x93 Chinese CNS11643 Set 1 */
+  CHARSET_ID_CHINESE_CNS11643_2,     /* 0x94 Chinese CNS11643 Set 2 */
+  CHARSET_ID_CHINESE_BIG5_1,         /* 0x95 Big5 Level 1 */
+  CHARSET_ID_CHINESE_BIG5_2,         /* 0x96 Big5 Level 2 */
+  CHARSET_ID_UNICODE_0_23FF,	     /* 0x97 Unicode 0-23FF */
+  CHARSET_ID_UNICODE_2400_47FF,      /* 0x98 Unicode 2400-47FF */
+  CHARSET_ID_UNICODE_4800_6BFF,      /* 0x99 Unicode 4800-6BFF */
+  CHARSET_ID_UNICODE_6C00_8FFF,      /* 0x9A Unicode 6C00-8FFF */
+  CHARSET_ID_UNICODE_9000_B3FF,      /* 0x9B Unicode 9000-B3FF */
+  CHARSET_ID_UNICODE_B400_D7FF,      /* 0x9C Unicode B400-D7FF */
+  CHARSET_ID_UNICODE_E000_103FF,     /* 0x9D Unicode E000-103FF */
+};
+
+#endif /* not UNICODE_INTERNAL */
+
 struct charset_lookup *chlook;

 static const struct memory_description charset_lookup_description_1[] = {
-  { XD_LISP_OBJECT_ARRAY, offsetof (struct charset_lookup, charset_by_leading_byte),
NUM_LEADING_BYTES+4*128*2 },
+  { XD_LISP_OBJECT_ARRAY,
+    offsetof (struct charset_lookup, charset_by_attributes), 4*128*2 },
+#ifndef UNICODE_INTERNAL
+  { XD_LISP_OBJECT_ARRAY,
+    offsetof (struct charset_lookup, charset_by_encodable_id),
+    NUM_ENCODABLE_CHARSET_IDS },
+#endif /* not UNICODE_INTERNAL */
   { XD_END }
 };

＠＠ -76,6 +181,8 ＠＠
   charset_lookup_description_1
 };

+static int next_charset_id;
+
 Lisp_Object Qcharsetp;

 /* Qdoc_string, Qdimension, Qchars defined in general.c */
＠＠ -107,13 +214,19 ＠＠
   Qjapanese_jisx0212,
   Qchinese_cns11643_1,
   Qchinese_cns11643_2,
+#ifdef UNICODE_INTERNAL
+  Qchinese_big5,
+  Qjapanese_shift_jis,
+#else /* not UNICODE_INTERNAL */
   Qchinese_big5_1,
   Qchinese_big5_2,
+#endif /* UNICODE_INTERNAL */
+  Qchinese_sisheng,
   Qcomposite;

 Lisp_Object Ql2r, Qr2l;

-Lisp_Object Vcharset_hash_table;
+Lisp_Object Vcharset_hash_table, Vcharset_id_table;

 /************************************************************************/
＠＠ -129,7 +242,9 ＠＠
   mark_object (cs->long_name);
   mark_object (cs->doc_string);
   mark_object (cs->registry);
+#ifdef HAVE_CCL
   mark_object (cs->ccl_program);
+#endif /* HAVE_CCL */
   return cs->name;
 }

＠＠ -137,29 +252,28 ＠＠
 print_charset (Lisp_Object obj, Lisp_Object printcharfun,
 	       int UNUSED (escapeflag))
 {
-  Lisp_Charset *cs = XCHARSET (obj);
-
   if (print_readably)
-    printing_unreadable_object ("#<charset %s 0x%x>",
-				XSTRING_DATA (XSYMBOL (CHARSET_NAME (cs))->
-					     name),
-				cs->header.uid);
+    printing_unreadable_lcrecord
+      (obj, XSTRING_DATA (XSYMBOL (XCHARSET_NAME (obj))->name));

   write_fmt_string_lisp (printcharfun, "#<charset %s %S %S %S", 4,
-			 CHARSET_NAME (cs), CHARSET_SHORT_NAME (cs),
-			 CHARSET_LONG_NAME (cs), CHARSET_DOC_STRING (cs));
-  write_fmt_string (printcharfun, " %s %s cols=%d g%d final='%c'
reg=",
-		    CHARSET_TYPE (cs) == CHARSET_TYPE_94    ? "94" :
-		    CHARSET_TYPE (cs) == CHARSET_TYPE_96    ? "96" :
-		    CHARSET_TYPE (cs) == CHARSET_TYPE_94X94 ? "94x94" :
-		    "96x96",
-		    CHARSET_DIRECTION (cs) == CHARSET_LEFT_TO_RIGHT ? "l2r" :
+			 XCHARSET_NAME (obj), XCHARSET_SHORT_NAME (obj),
+			 XCHARSET_LONG_NAME (obj), XCHARSET_DOC_STRING (obj));
+  if (XCHARSET_DIMENSION (obj) == 1)
+    write_fmt_string (printcharfun, " %d", XCHARSET_CHARS (obj, 1));
+  else
+    write_fmt_string (printcharfun, " %dx%d", XCHARSET_CHARS (obj, 0),
+		      XCHARSET_CHARS (obj, 1));
+  write_fmt_string (printcharfun, " %s cols=%d g%d ",
+		    XCHARSET_DIRECTION (obj) == CHARSET_LEFT_TO_RIGHT ? "l2r" :
 		    "r2l",
-		    CHARSET_COLUMNS (cs),
-		    CHARSET_GRAPHIC (cs),
-		    CHARSET_FINAL (cs));
-  print_internal (CHARSET_REGISTRY (cs), printcharfun, 0);
-  write_fmt_string (printcharfun, " 0x%x>", cs->header.uid);
+		    XCHARSET_COLUMNS (obj),
+		    XCHARSET_GRAPHIC (obj));
+  if (XCHARSET_FINAL (obj))
+    write_fmt_string (printcharfun, "final='%c' ", XCHARSET_FINAL
(obj));
+  write_fmt_string (printcharfun, "reg=");
+  print_internal (XCHARSET_REGISTRY (obj), printcharfun, 0);
+  write_fmt_string (printcharfun, " 0x%x>", XCHARSET (obj)->header.uid);
 }

 static const struct memory_description charset_description[] = {
＠＠ -171,9 +285,16 ＠＠
   { XD_LISP_OBJECT, offsetof (Lisp_Charset, short_name) },
   { XD_LISP_OBJECT, offsetof (Lisp_Charset, long_name) },
   { XD_LISP_OBJECT, offsetof (Lisp_Charset, reverse_direction_charset) },
+#ifdef HAVE_CCL
   { XD_LISP_OBJECT, offsetof (Lisp_Charset, ccl_program) },
+#endif /* HAVE_CCL */
   { XD_UNION, offsetof (Lisp_Charset, to_unicode_table),
-    XD_INDIRECT (0, 0), { &to_unicode_description }, XD_FLAG_NO_KKCC },
+#ifdef MAXIMIZE_UNICODE_TABLE_DEPTH
+    2,
+#else
+    XD_INDIRECT (0, 0),
+#endif /* MAXIMIZE_UNICODE_TABLE_DEPTH */
+    { &to_unicode_description }, XD_FLAG_NO_KKCC },
   { XD_UNION, offsetof (Lisp_Charset, from_unicode_table),
     XD_INDIRECT (1, 0), { &from_unicode_description }, XD_FLAG_NO_KKCC },
   { XD_END }
＠＠ -183,145 +304,154 ＠＠
 			       1, /* dumpable flag */
                                mark_charset, print_charset, 0,
 			       0, 0, charset_description, Lisp_Charset);
-/* Make a new charset. */
-/* #### SJT Should generic properties be allowed? */
-static Lisp_Object
-make_charset (int id, Lisp_Object name, int rep_bytes,
-	      int type, int columns, int graphic,
-	      Ibyte final, int direction,  Lisp_Object short_name,
-	      Lisp_Object long_name, Lisp_Object doc,
-	      Lisp_Object reg, int overwrite)
-{
-  Lisp_Object obj;
-  Lisp_Charset *cs;

-  if (!overwrite)
-    {
-      cs = ALLOC_LCRECORD_TYPE (Lisp_Charset, &lrecord_charset);
-      obj = wrap_charset (cs);
+#ifndef UNICODE_INTERNAL

-      if (final)
-	{
-	  /* some charsets do not have final characters.  This includes
-	     ASCII, Control-1, Composite, and the two faux private
-	     charsets. */
-	  assert (NILP (chlook->
-			charset_by_attributes[type][final][direction]));
-	  chlook->charset_by_attributes[type][final][direction] = obj;
-	}
-
-      assert (NILP (chlook->charset_by_leading_byte[id - MIN_LEADING_BYTE]));
-      chlook->charset_by_leading_byte[id - MIN_LEADING_BYTE] = obj;
+static int
+get_unallocated_charset_id (int dimension)
+{
+  if (dimension == 1)
+    {
+      if (chlook->next_allocated_private_dim1_id > MAX_PRIVATE_DIM1_CHARSET_ID)
+	return 0;
+      else
+	return chlook->next_allocated_private_dim1_id++;
     }
   else
     {
-      Lisp_Object ret;
-      /* Actually overwrite the properties of the existing charset.
-	 We do this because until now charsets could never be "deleted",
-	 so parts of the code don't bother to GC charsets. */
-      obj = chlook->charset_by_leading_byte[id - MIN_LEADING_BYTE];
-      cs = XCHARSET (obj);
-      assert (EQ (chlook->charset_by_attributes[type][final][direction],
-		  obj));
-      
-      ret = Fremhash (XCHARSET_NAME (obj), Vcharset_hash_table);
-      assert (!NILP (ret));
+      text_checking_assert (dimension == 2);
+      if (chlook->next_allocated_private_dim2_id > MAX_PRIVATE_DIM2_CHARSET_ID)
+	return 0;
+      else
+	return chlook->next_allocated_private_dim2_id++;
     }

-  CHARSET_ID		(cs) = id;
-  CHARSET_NAME		(cs) = name;
-  CHARSET_SHORT_NAME	(cs) = short_name;
-  CHARSET_LONG_NAME	(cs) = long_name;
-  CHARSET_REP_BYTES	(cs) = rep_bytes;
-  CHARSET_DIRECTION	(cs) = direction;
-  CHARSET_TYPE		(cs) = type;
-  CHARSET_COLUMNS	(cs) = columns;
-  CHARSET_GRAPHIC	(cs) = graphic;
-  CHARSET_FINAL		(cs) = final;
-  CHARSET_DOC_STRING	(cs) = doc;
-  CHARSET_REGISTRY	(cs) = reg;
-  CHARSET_CCL_PROGRAM	(cs) = Qnil;
-  CHARSET_REVERSE_DIRECTION_CHARSET (cs) = Qnil;
-
-  CHARSET_DIMENSION (cs) = (CHARSET_TYPE (cs) == CHARSET_TYPE_94 ||
-			    CHARSET_TYPE (cs) == CHARSET_TYPE_96) ? 1 : 2;
-  CHARSET_CHARS (cs) = (CHARSET_TYPE (cs) == CHARSET_TYPE_94 ||
-			CHARSET_TYPE (cs) == CHARSET_TYPE_94X94) ? 94 : 96;
+}

-  if (id == LEADING_BYTE_ASCII || id == LEADING_BYTE_CONTROL_1
-#ifdef ENABLE_COMPOSITE_CHARS
-      || id == LEADING_BYTE_COMPOSITE
-#endif
-      )
-    assert (!overwrite);
-  else
-    {
-      if (overwrite)
-	free_charset_unicode_tables (obj);
-      init_charset_unicode_tables (obj);
-    }
+#endif /* not UNICODE_INTERNAL */

-  /* Some charsets are "faux" and don't have names or really exist at
-     all except in the leading-byte table. */
-  if (!NILP (name))
+
+/************************************************************************/
+/*                      Basic charset Lisp functions                    */
+/************************************************************************/
+
+/* Return the inclusive limits of the indexing bounds on the charset. */
+
+void
+get_charset_limits (Lisp_Object charset, int *low0, int *high0,
+		    int *low1, int *high1)
+{
+  *low0 = XCHARSET_OFFSET (charset, 0);
+  *low1 = XCHARSET_OFFSET (charset, 1);
+  *high0 = *low0 + XCHARSET_CHARS (charset, 0) - 1;
+  *high1 = *low1 + XCHARSET_CHARS (charset, 1) - 1;
+#ifdef ERROR_CHECK_STRUCTURES
+  if (XCHARSET_DIMENSION (charset) == 1)
     {
-      assert (NILP (Fgethash (name, Vcharset_hash_table, Qnil)));
-      Fputhash (name, obj, Vcharset_hash_table);
+      assert (*low0 == 0);
+      assert (*high0 == 0);
     }
-
-  recalculate_unicode_precedence ();
-  return obj;
+#endif /* ERROR_CHECK_STRUCTURES */
 }

+/* Return the ISO-2022 type of the charset, one of CHARSET_TYPE_94,
+   CHARSET_TYPE_96, CHARSET_TYPE_94X94, CHARSET_TYPE_96X96, or -1.
+   Charset must be of the right size with the right offsets, and have
+   a final. */
+
 static int
-get_unallocated_leading_byte (int dimension)
+get_charset_iso2022_type_1 (int dimension, int size0, int size1, int offset0,
+			    int offset1, int final)
 {
-  int lb;
+  int type;

+  offset0 &= 127;
+  offset1 &= 127;
+
   if (dimension == 1)
     {
-      if (chlook->next_allocated_1_byte_leading_byte >
-	  MAX_LEADING_BYTE_PRIVATE_1)
-	lb = 0;
+      assert (size0 == 1);
+      assert (offset0 == 0);
+      if (size1 == 94 && offset1 == 33)
+	type = CHARSET_TYPE_94;
+      else if (size1 == 96 && offset1 == 32)
+	type = CHARSET_TYPE_96;
       else
-	lb = chlook->next_allocated_1_byte_leading_byte++;
+	return -1;
     }
   else
     {
-      /* awfully fragile, but correct */
-#if MAX_LEADING_BYTE_PRIVATE_2 == 255
-      if (chlook->next_allocated_2_byte_leading_byte == 0)
-#else
-      if (chlook->next_allocated_2_byte_leading_byte >
-	  MAX_LEADING_BYTE_PRIVATE_2)
-#endif
-	lb = 0;
+      text_checking_assert (dimension == 2);
+      if (size0 == 94 && offset0 == 33 &&
+	  size1 == 94 && offset1 == 33)
+	type = CHARSET_TYPE_94X94;
+      else if (size0 == 96 && offset0 == 32 &&
+	       size1 == 96 && offset1 == 32)
+	type = CHARSET_TYPE_96X96;
       else
-	lb = chlook->next_allocated_2_byte_leading_byte++;
+	return -1;
     }
-
-  if (!lb)
-    invalid_operation
-      ("No more character sets free for this dimension", make_int
(dimension));

-  return lb;
+  if (final)
+    return type;
+  else
+    return -1;
 }

-
-/************************************************************************/
-/*                      Basic charset Lisp functions                    */
-/************************************************************************/
+/* Return the ISO-2022 type (94, 96, 94x94 or 96x96) charset, or -1 if not
+   ISO-2022 compatible.  NOTE: This is *NOT* the same as whether an
+   existing charset can be encoded in old-Mule.  That can only be
+   determined by looking at the charset ID and seeing if it's in the
+   encodable range (i.e. <= MAX_ENCODABLE_CHARSET_ID).  Encodable
+   charsets may not be ISO-2022 compatible (e.g. no final) and ISO-2022
+   compatible charsets may not encodable (e.g. no more charset ID's
+   in the encodable range). */
+
+int
+get_charset_iso2022_type (Lisp_Object cs)
+{
+  if (EQ (cs, Vcharset_ascii))
+    return CHARSET_TYPE_94; /* Yes, even though it has 128 characters in it */
+  return get_charset_iso2022_type_1 (XCHARSET_DIMENSION (cs),
+				     XCHARSET_CHARS (cs, 0),
+				     XCHARSET_CHARS (cs, 1),
+				     XCHARSET_OFFSET (cs, 0),
+				     XCHARSET_OFFSET (cs, 1),
+				     XCHARSET_FINAL (cs));
+}
+
+#if defined (MULE) && !defined (UNICODE_INTERNAL)
+
+/* Return true if a charset can be encoded as a string or character in the
+   old-Mule encoding, given the dimension, size and offset of the charset.
+   Currently, this is similar to but not the same as whether it can be
+   encoded as ISO-2022, since the presence of a final byte is
+   unnecessary. ＠＠#### In reality, any charset no bigger than 96x96 can be
+   so encoded; we should relax this restriction. */

-void
-get_charset_limits (Lisp_Object charset, int *low, int *high)
+static int
+old_mule_charset_encodable (int dimension, int size0, int size1, int offset0,
+			    int offset1)
 {
-  Lisp_Charset *cs = XCHARSET (charset);
+  int size0_ok, size1_ok;
+
+  offset0 &= 127;
+  offset1 &= 127;

-  if      (EQ (charset, Vcharset_ascii))     *low =  0, *high = 127;
-  else if (EQ (charset, Vcharset_control_1)) *low =  0, *high =  31;
-  else if (CHARSET_CHARS (cs) == 94)         *low = 33, *high = 126;
-  else	/* CHARSET_CHARS (cs) == 96) */	     *low = 32, *high = 127;
+  size0_ok = (size0 == 94 && offset0 == 33) || (size0 == 96 && offset0 ==
32);
+  size1_ok = (size1 == 94 && offset1 == 33) || (size1 == 96 && offset1 ==
32);
+
+  
+  if (dimension == 1)
+    return size1_ok;
+  else
+    {
+      text_checking_assert (dimension == 2);
+      return size0_ok && size1_ok;
+    }
 }
+
+#endif /* defined (MULE) && !defined (UNICODE_INTERNAL) */

 DEFUN ("charsetp", Fcharsetp, 1, 1, 0, /*
 Return non-nil if OBJECT is a charset.
＠＠ -372,7 +502,6 ＠＠
 add_charset_to_list_mapper (Lisp_Object UNUSED (key), Lisp_Object value,
 			    void *charset_list_closure)
 {
-  /* This function can GC */
   struct charset_list_closure *chcl =
     (struct charset_list_closure*) charset_list_closure;
   Lisp_Object *charset_list = chcl->charset_list;
＠＠ -387,14 +516,11 ＠＠
        ())
 {
   Lisp_Object charset_list = Qnil;
-  struct gcpro gcpro1;
   struct charset_list_closure charset_list_closure;

-  GCPRO1 (charset_list);
   charset_list_closure.charset_list = &charset_list;
   elisp_maphash (add_charset_to_list_mapper, Vcharset_hash_table,
 		 &charset_list_closure);
-  UNGCPRO;

   return charset_list;
 }
＠＠ -407,10 +533,247 ＠＠
   return XCHARSET_NAME (Fget_charset (charset));
 }

+static void
+validate_charset_offset_or_size (Lisp_Object keyword, Lisp_Object value,

+				 int *dim0, int *dim1)
+{
+  int minval, maxval;
+
+  if (EQ (keyword, Qchars))
+    minval = 1, maxval = 256;
+  else
+    {
+      assert (EQ (keyword, Qoffset));
+      minval = 0, maxval = 255;
+    }
+
+  if (INTP (value))
+    {
+      *dim0 = *dim1 = XINT (value);
+      if (*dim0 < minval || *dim0 > maxval)
+	goto bzzzzt;
+    }
+  else
+    {
+      int len = 0;
+      Lisp_Object tem;
+      {
+	EXTERNAL_LIST_LOOP_1 (value)
+	  len++;
+      }
+      if (len < 1 || len > 2)
+	invalid_constant_2
+	  ("Invalid value for property (list of 1 or 2 integers)",
+	   keyword, value);
+      tem = X1ST (value);
+      CHECK_INT (tem);
+      *dim0 = *dim1 = XINT (tem);
+      if (*dim0 < minval || *dim0 > maxval)
+	{
+	  value = tem;
+	  goto bzzzzt;
+	}
+      if (len == 2)
+	{
+	  tem = X2ND (value);
+	  CHECK_INT (tem);
+	  *dim1 = XINT (tem);
+	  if (*dim1 < minval || *dim1 > maxval)
+	    {
+	      value = tem;
+	      goto bzzzzt;
+	    }
+	}
+    }
+
+  return;
+
+ bzzzzt:
+  if (maxval == 256)
+    invalid_constant_2 ("Invalid value for property (1-256)", keyword,
+			value);
+  else
+    invalid_constant_2 ("Invalid value for property (0-255)", keyword,
+			value);
+}
+
+Lisp_Object
+charset_by_id (int id)
+{
+  return Fgethash (make_int (id), Vcharset_id_table, Qnil);
+}
+		    
+/* Make a new charset.  ID is the charset ID, or -1 to find a new one. */
 /* #### SJT Should generic properties be allowed? */
+static Lisp_Object
+make_charset (int id, int no_init_unicode_tables,
+	      Lisp_Object name, int dimension,
+	      int size0, int size1, int offset0, int offset1, int columns,
+	      int graphic, Ibyte final, int direction,  Lisp_Object short_name,
+	      Lisp_Object long_name, Lisp_Object doc_string,
+	      Lisp_Object registry, int overwrite,
+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+	      int algo_low
+#else
+	      int UNUSED (algo_low)
+#endif
+	      )
+{
+  Lisp_Object obj;
+  int type = get_charset_iso2022_type_1 (dimension, size0, size1,
+					 offset0, offset1, final);
+
+  /* Major hack; ISO-2022 pretends that ASCII is a 94-byte charset when it's
+     obviously not. */
+
+  if (EQ (name, Qascii))
+    type = CHARSET_TYPE_94;
+
+  if (id < 0)
+    {
+#ifdef UNICODE_INTERNAL
+      id = next_charset_id++;
+#else
+      if (!old_mule_charset_encodable (dimension, size0, size1,
+				       offset0, offset1))
+	/* Make sure the ID's are above all encodable ID's */
+	id = 1 + MAX_ENCODABLE_CHARSET_ID + next_charset_id++;
+      else
+	{
+	  id = get_unallocated_charset_id (dimension);
+	  if (id == 0)
+	    id = 1 + MAX_ENCODABLE_CHARSET_ID + next_charset_id++;
+	}
+#endif /* not UNICODE_INTERNAL */
+    }
+
+  /* If this is a temporary place-holder charset, assign a temporary name
+     based on the id. */
+  if (EQ (name, Qunbound))
+    {
+      Ibyte tempname[80];
+
+      qxesprintf (tempname, "___temporary___%d__", id);
+      name = intern_int (tempname);
+    }
+
+  /* Set certain other default values.  SHORT_NAME cannot be computed
+     until the actual name is generated (just above, for temporaries). */
+  if (NILP (doc_string))
+    doc_string = build_string ("");
+  if (NILP (registry))
+    registry = build_string ("");
+  if (NILP (short_name))
+    short_name = XSYMBOL (name)->name;
+  if (NILP (long_name))
+    long_name = doc_string;
+  if (columns == -1)
+    columns = dimension;
+
+  if (!overwrite)
+    {
+      obj = wrap_charset (ALLOC_LCRECORD_TYPE (Lisp_Charset, &lrecord_charset));
+
+      if (final)
+	{
+	  /* some charsets do not have final characters.  This includes
+	     ASCII, Control-1, Composite, the two faux private charsets,
+	     and various others, in the new, better world. */
+	  assert (NILP (charset_by_attributes (type, final, direction)));
+	  chlook->charset_by_attributes[type][final][direction] = obj;
+	}
+
+#ifndef UNICODE_INTERNAL
+      if (id <= MAX_ENCODABLE_CHARSET_ID)
+	{
+	  assert (NILP (chlook->charset_by_encodable_id
+			[id - MIN_ENCODABLE_CHARSET_ID]));
+	  chlook->charset_by_encodable_id[id - MIN_ENCODABLE_CHARSET_ID] = obj;
+	}
+#endif /* not UNICODE_INTERNAL */
+    }
+  else
+    {
+      Lisp_Object ret;
+      /* We should only be called this way from the ISO-2022 code */
+      text_checking_assert (final > 0);
+      /* Actually overwrite the properties of the existing charset.
+	 We do this because until now charsets could never be "deleted",
+	 so parts of the code don't bother to GC charsets. */
+      obj = charset_by_attributes (type, final, direction);
+#ifndef UNICODE_INTERNAL
+      if (id <= MAX_ENCODABLE_CHARSET_ID)
+	{
+	  assert (EQ (chlook->charset_by_encodable_id
+		      [id - MIN_ENCODABLE_CHARSET_ID],
+		      obj));
+	}
+#endif /* not UNICODE_INTERNAL */
+
+      ret = Fremhash (XCHARSET_NAME (obj), Vcharset_hash_table);
+      assert (!NILP (ret));
+      ret = Fremhash (make_int (id), Vcharset_id_table);
+      assert (!NILP (ret));
+    }
+
+  XCHARSET_ID		(obj) = id;
+  XCHARSET_NAME		(obj) = name;
+  XCHARSET_SHORT_NAME	(obj) = short_name;
+  XCHARSET_LONG_NAME	(obj) = long_name;
+  XCHARSET_DIRECTION	(obj) = direction;
+  XCHARSET_CHARS        (obj, 0) = size0;
+  XCHARSET_CHARS        (obj, 1) = size1;
+  XCHARSET_OFFSET       (obj, 0) = offset0;
+  XCHARSET_OFFSET       (obj, 1) = offset1;
+  XCHARSET_COLUMNS	(obj) = columns;
+  XCHARSET_GRAPHIC	(obj) = graphic;
+  XCHARSET_FINAL	(obj) = final;
+  XCHARSET_DOC_STRING	(obj) = doc_string;
+  XCHARSET_REGISTRY	(obj) = registry;
+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+  XCHARSET_ALGO_LOW	(obj) = algo_low;
+#endif /* ALLOW_ALGORITHMIC_CONVERSION_TABLES */
+#ifdef HAVE_CCL
+  XCHARSET_CCL_PROGRAM	(obj) = Qnil;
+#endif /* HAVE_CCL */
+  XCHARSET_REVERSE_DIRECTION_CHARSET (obj) = Qnil;
+
+  XCHARSET_DIMENSION (obj) = dimension;
+
+  if (no_init_unicode_tables)
+    assert (!overwrite);
+  else
+    {
+      if (overwrite)
+	free_charset_unicode_tables (obj);
+      init_charset_unicode_tables (obj);
+    }
+
+  assert (NILP (Fgethash (make_int (id), Vcharset_id_table, Qnil)));
+  Fputhash (make_int (id), obj, Vcharset_id_table);
+
+  /* [[ Some charsets are "faux" and don't have names or really exist at
+     all except in the charset ID table. ]]
+     #### Explain this comment! --ben */
+  assert (!NILP (name)); /* ＠＠#### to figure out what's going on */
+  if (!NILP (name))
+    {
+      assert (NILP (Fgethash (name, Vcharset_hash_table, Qnil)));
+      Fputhash (name, obj, Vcharset_hash_table);
+    }
+
+  recalculate_unicode_precedence ();
+  return obj;
+}
+
+/* #### SJT Should generic properties be allowed? */
 DEFUN ("make-charset", Fmake_charset, 3, 3, 0, /*
-Define a new character set.
-This function is for use with Mule support.
+Define a new national character set.
+This function is for use with international support.
+With a Unicode-based engine, these charsets are used mostly in the codecs
+that read-in and write-out text formatting according to one or another
+national character sets.
+
 NAME is a symbol, the name by which the character set is normally referred.
 DOC-STRING is a string describing the character set.
 PROPS is a property list, describing the specific nature of the
＠＠ -419,33 +782,43 ＠＠
 `short-name'	Short version of the charset name (ex: Latin-1)
 `long-name'	Long version of the charset name (ex: ISO8859-1 (Latin-1))
 `registry'	A regular expression matching the font registry field for
-		this character set.
+		this character set, under X Windows.
 `dimension'	Number of octets used to index a character in this charset.
 		Either 1 or 2.  Defaults to 1.
+`chars'		Number of characters in each dimension.  This should be a
+                single integer or a list of two integers, when dimension == 2.
+                For a square character set, a single integer may be specified
+                even when dimension == 2.  The largest possible value for
+                each dimension is 256.  Defaults to 94.
+`offset'        Minimum index in each dimension.  This should be a single
+                integer or a list of integers, as for `chars'.  The possible
+                indices in the appropriate dimension, as used in `make-char'
+                and returned in `split-char', are between OFFSET and
+                OFFSET + CHARS - 1, inclusive.  The default is derived from
+		the size of this dimension and from the `graphic' property;
+		if the `graphic' is 0, the default is 33 if the size
+                of this dimension is <= 94, 32 if the size is 95 or 96, and
+                0 otherwise.  If the `graphic' is 1, the defaults are 161,
+		160 and 128, respectively.
 `columns'	Number of columns used to display a character in this charset.
 		Only used in TTY mode. (Under X, the actual width of a
 		character can be derived from the font used to display the
 		characters.) If unspecified, defaults to the dimension
 		(this is almost	always the correct value).
-`chars'		Number of characters in each dimension (94 or 96).
-		Defaults to 94.  Note that if the dimension is 2, the
-		character set thus described is 94x94 or 96x96.
-`final'		Final byte of ISO 2022 escape sequence.  Must be
-		supplied.  Each combination of (DIMENSION, CHARS) defines a
-		separate namespace for final bytes.  Note that ISO
-		2022 restricts the final byte to the range
-		0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if
-		dimension == 2.  Note also that final bytes in the range
-		0x30 - 0x3F are reserved for user-defined (not official)
-		character sets.
-`graphic'	0 (use left half of font on output) or 1 (use right half
-		of font on output).  Defaults to 0.  For example, for
-		a font whose registry is ISO8859-1, the left half
-		(octets 0x20 - 0x7F) is the `ascii' character set, while
-		the right half (octets 0xA0 - 0xFF) is the `latin-1'
-		character set.  With `graphic' set to 0, the octets
-		will have their high bit cleared; with it set to 1,
-		the octets will have their high bit set.
+`final'		Relevant only for ISO-2022-compatible charsets (those of size
+                94, 96, 94x94 or 96x96), and allowed only for these charsets.
+		Indicates the final byte of ISO 2022 escape sequence used to
+		select this charset.  Note that each of these four sizes
+		defines a separate namespace for final bytes.  Note also that
+		ISO-2022 restricts the final byte to the range 0x30 - 0x7E if
+                dimension == 1, and 0x30 - 0x5F if dimension == 2.  Also,
+		final bytes in the range 0x30 - 0x3F are reserved for user-
+		defined (not official) character sets.
+`graphic'	Relevant mostly to ISO-2022.  Value should be 0 (use bytes
+                with the high bit cleared) or 1 (use bytes with the high bit
+		set).  This controls which register the charset will be
+		selected into.  Generally, this should match up with the
+		`offset' property.
 `direction'	`l2r' (left-to-right) or `r2l' (right-to-left).
 		Defaults to `l2r'.
 `ccl-program'	A compiled CCL program used to convert a character in
＠＠ -453,19 +826,23 ＠＠
 		addition to the `graphic' property.  The CCL program
 		is passed the octets of the character, with the high
 		bit cleared and set depending upon whether the value
-		of the `graphic' property is 0 or 1.
+		of the `graphic' property is 0 or 1.  CCL is not available
+		when (featurep 'unicode-internal).
 */
        (name, doc_string, props))
 {
-  int id, dimension = 1, chars = 94, graphic = 0, columns = -1;
+  int id = -1, dimension = 1, size0 = 94, size1 = 94, graphic = 0,
+    columns = -1, offset0 = -1, offset1 = -1;
   Ibyte final = 0;
   int direction = CHARSET_LEFT_TO_RIGHT;
   int type;
   Lisp_Object registry = Qnil;
   Lisp_Object charset = Qnil;
+#ifdef HAVE_CCL
   Lisp_Object ccl_program = Qnil;
+#endif /* HAVE_CCL */
   Lisp_Object short_name = Qnil, long_name = Qnil;
-  Lisp_Object existing_charset;
+  Lisp_Object existing_charset = Qnil;
   int temporary = UNBOUNDP (name);

   /* NOTE: name == Qunbound is a directive from the iso2022 code to
＠＠ -505,15 +882,15 ＠＠
 	    if (dimension < 1 || dimension > 2)
 	      invalid_constant ("Invalid value for `dimension'", value);
 	  }
-
 	else if (EQ (keyword, Qchars))
 	  {
-	    CHECK_INT (value);
-	    chars = XINT (value);
-	    if (chars != 94 && chars != 96)
-	      invalid_constant ("Invalid value for `chars'", value);
+	    validate_charset_offset_or_size (Qchars, value, &size0, &size1);
 	  }
-
+	else if (EQ (keyword, Qoffset))
+	  {
+	    validate_charset_offset_or_size (Qoffset, value, &offset0,
+					     &offset1);
+	  }
 	else if (EQ (keyword, Qcolumns))
 	  {
 	    CHECK_INT (value);
＠＠ -554,6 +931,7 ＠＠
 	      invalid_constant ("Invalid value for `final'", value);
 	  }

+#ifdef HAVE_CCL
 	else if (EQ (keyword, Qccl_program))
 	  {
 	    struct ccl_program test_ccl;
＠＠ -562,73 +940,80 ＠＠
 	      invalid_argument ("Invalid value for `ccl-program'", value);
 	    ccl_program = value;
 	  }
+#endif /* HAVE_CCL */
 	else
 	  invalid_constant ("Unrecognized property", keyword);
       }
   }

-  if (!final)
-    invalid_argument ("`final' must be specified", Qunbound);
   if (dimension == 2 && final > 0x5F)
     invalid_constant
       ("Final must be in the range 0x30 - 0x5F for dimension == 2",
        make_char (final));

+  /* Dimension-1 charsets are of size [1xN]; doing this way, rather than
+     [NX1], greatly simplifies many things. */
   if (dimension == 1)
-    type = (chars == 94) ? CHARSET_TYPE_94    : CHARSET_TYPE_96;
-  else
-    type = (chars == 94) ? CHARSET_TYPE_94X94 : CHARSET_TYPE_96X96;
-
-  existing_charset = charset_by_attributes (type, final, direction);
+    {
+      size0 = 1;
+      offset0 = 0;
+    }

-  if (!NILP (existing_charset) && !XCHARSET (existing_charset)->temporary)
+  if (offset0 < 0)
+    offset0 = graphic == 0 ?
+      (size0 <= 94 ? 33 : size0 == 95 || size0 == 96 ? 32 : 0) :
+      (size0 <= 94 ? 161 : size0 == 95 || size0 == 96 ? 160 : 128);
+  if (offset1 < 0)
+    offset1 = graphic == 0 ?
+      (size1 <= 94 ? 33 : size1 == 95 || size1 == 96 ? 32 : 0) :
+      (size1 <= 94 ? 161 : size1 == 95 || size1 == 96 ? 160 : 128);
+  if (size0 + offset0 > 256 || size1 + offset1 > 256)
     invalid_argument
-      ("Character set already defined for this DIMENSION/CHARS/FINAL/DIRECTION
combo",
-       existing_charset);
-
-  if (!NILP (existing_charset))
-    /* Reuse same leading byte */
-    id = XCHARSET_ID (existing_charset);
-  else
-    id = get_unallocated_leading_byte (dimension);
+      ("Offset + size cannot exceed 256 in any dimension", name);

-  if (temporary)
+  type = get_charset_iso2022_type_1 (dimension, size0, size1,
+				     offset0, offset1, final);
+  if (final)
     {
-      Ibyte tempname[80];
-
-      qxesprintf (tempname, "___temporary___%d__", id);
-      name = intern_int (tempname);
+      if (type == -1)
+	invalid_argument ("Final can only be specified for ISO-2022 charsets",
+			  make_char (final));
+      existing_charset =
+	charset_by_attributes (type, final, direction);
+
+      if (!NILP (existing_charset) && !XCHARSET
(existing_charset)->temporary)
+	invalid_argument
+	  ("Charset already defined for this DIMENSION/CHARS/FINAL/DIRECTION combo",
+	   existing_charset);
+
+      if (!NILP (existing_charset))
+	/* Reuse same charset ID */
+	id = XCHARSET_ID (existing_charset);
     }
-  if (NILP (doc_string))
-    doc_string = build_string ("");
-  if (NILP (registry))
-    registry = build_string ("");
-  if (NILP (short_name))
-    short_name = XSYMBOL (name)->name;
-  if (NILP (long_name))
-    long_name = doc_string;
-  if (columns == -1)
-    columns = dimension;

-  charset = make_charset (id, name, dimension + 2, type, columns, graphic,
+  charset = make_charset (id, 0, name, dimension, size0, size1, offset0,
+			  offset1, columns, graphic,
 			  final, direction, short_name, long_name,
-			  doc_string, registry, !NILP (existing_charset));
+			  doc_string, registry, !NILP (existing_charset), -1);

   XCHARSET (charset)->temporary = temporary;
+#ifdef HAVE_CCL
   if (!NILP (ccl_program))
     XCHARSET_CCL_PROGRAM (charset) = ccl_program;
+#endif /* HAVE_CCL */

-  {
-    Lisp_Object revdircs =
-      charset_by_attributes (type, final,
-			     direction == CHARSET_LEFT_TO_RIGHT ?
-			     CHARSET_RIGHT_TO_LEFT : CHARSET_LEFT_TO_RIGHT);
-    if (!NILP (revdircs))
-      {
-	XCHARSET_REVERSE_DIRECTION_CHARSET (revdircs) = charset;
-	XCHARSET_REVERSE_DIRECTION_CHARSET (charset) = revdircs;
-      }
-  }
+  if (final)
+    {
+      Lisp_Object revdircs =
+	charset_by_attributes (type, final,
+			       direction == CHARSET_LEFT_TO_RIGHT ?
+			       CHARSET_RIGHT_TO_LEFT : CHARSET_LEFT_TO_RIGHT);
+      if (!NILP (revdircs))
+	{
+	  XCHARSET_REVERSE_DIRECTION_CHARSET (revdircs) = charset;
+	  XCHARSET_REVERSE_DIRECTION_CHARSET (charset) = revdircs;
+	}
+    }

   return charset;
 }
＠＠ -641,9 +1026,9 ＠＠
        (charset, new_name))
 {
   Lisp_Object new_charset = Qnil;
-  int id, dimension, columns, graphic;
+  int dimension, columns, graphic;
   Ibyte final;
-  int direction, type;
+  int direction;
   Lisp_Object registry, doc_string, short_name, long_name;
   Lisp_Charset *cs;

＠＠ -658,10 +1043,8 ＠＠

   cs = XCHARSET (charset);

-  type      = CHARSET_TYPE      (cs);
   columns   = CHARSET_COLUMNS   (cs);
   dimension = CHARSET_DIMENSION (cs);
-  id = get_unallocated_leading_byte (dimension);

   graphic = CHARSET_GRAPHIC (cs);
   final = CHARSET_FINAL (cs);
＠＠ -673,9 +1056,11 ＠＠
   long_name = CHARSET_LONG_NAME (cs);
   registry = CHARSET_REGISTRY (cs);

-  new_charset = make_charset (id, new_name, dimension + 2, type, columns,
-			      graphic, final, direction, short_name, long_name,
-			      doc_string, registry, 0);
+  new_charset = make_charset (-1, 0, new_name, dimension,
+			      CHARSET_CHARS (cs, 0), CHARSET_CHARS (cs, 1),
+			      CHARSET_OFFSET (cs, 0), CHARSET_OFFSET (cs, 1),
+			      columns, graphic, final, direction, short_name,
+			      long_name, doc_string, registry, 0, -1);

   CHARSET_REVERSE_DIRECTION_CHARSET (cs) = new_charset;
   XCHARSET_REVERSE_DIRECTION_CHARSET (new_charset) = charset;
＠＠ -698,8 +1083,10 ＠＠

 DEFUN ("charset-from-attributes", Fcharset_from_attributes, 3, 4, 0, /*
 Return a charset with the given DIMENSION, CHARS, FINAL, and DIRECTION.
-If DIRECTION is omitted, both directions will be checked (left-to-right
-will be returned if character sets exist for both directions).
+This applies to ISO-2022 conversion only.  DIMENSION should be 1 or 2.
+CHARS must be 94 or 96.  DIRECTION is `r2l' or `l2r'.  If DIRECTION is
+omitted, both directions will be checked (left-to-right will be returned if
+character sets exist for both directions).
 */
        (dimension, chars, final, direction))
 {
＠＠ -715,7 +1102,7 ＠＠
   CHECK_INT (chars);
   ch = XINT (chars);
   if (ch != 94 && ch != 96)
-    invalid_constant ("Invalid value for CHARS", chars);
+    invalid_constant ("Invalid value for CHARS, must be 94 or 96", chars);

   CHECK_CHAR_COERCE_INT (final);
   fi = XCHAR (final);
＠＠ -805,9 +1192,26 ＠＠
   if (EQ (prop, Qcolumns))     return make_int (CHARSET_COLUMNS (cs));
   if (EQ (prop, Qgraphic))     return make_int (CHARSET_GRAPHIC (cs));
   if (EQ (prop, Qfinal))       return make_char (CHARSET_FINAL (cs));
-  if (EQ (prop, Qchars))       return make_int (CHARSET_CHARS (cs));
+  if (EQ (prop, Qchars))
+    {
+      if (CHARSET_DIMENSION (cs) == 1)
+	return make_int (CHARSET_CHARS (cs, 1));
+      else
+	return list2 (make_int (CHARSET_CHARS (cs, 0)),
+		      make_int (CHARSET_CHARS (cs, 1)));
+    }
+  if (EQ (prop, Qoffset))
+    {
+      if (CHARSET_DIMENSION (cs) == 1)
+	return make_int (CHARSET_OFFSET (cs, 1));
+      else
+	return list2 (make_int (CHARSET_OFFSET (cs, 0)),
+		      make_int (CHARSET_OFFSET (cs, 1)));
+    }
   if (EQ (prop, Qregistry))    return CHARSET_REGISTRY (cs);
+#ifdef HAVE_CCL
   if (EQ (prop, Qccl_program)) return CHARSET_CCL_PROGRAM (cs);
+#endif /* HAVE_CCL */
   if (EQ (prop, Qdirection))
     return CHARSET_DIRECTION (cs) == CHARSET_LEFT_TO_RIGHT ? Ql2r : Qr2l;
   if (EQ (prop, Qreverse_direction_charset))
＠＠ -822,12 +1226,20 ＠＠

 DEFUN ("charset-id", Fcharset_id, 1, 1, 0, /*
 Return charset identification number of CHARSET.
+When configured with `--with-unicode-internal' (see `make-char'), this is
+simply an arbitrary value, retained for compatibility.  With old-Mule, this
+is the internal charset ID of the charset, which is significant in how the
+internal string and character encodings are constructed.  This function is
+normally used only by CCL (which isn't available under Unicode-internal,
+anyway).
 */
 	(charset))
 {
-  return make_int (XCHARSET_LEADING_BYTE (Fget_charset (charset)));
+  return make_int (XCHARSET_ID (Fget_charset (charset)));
 }

+#ifdef HAVE_CCL
+
 /* #### We need to figure out which properties we really want to
    allow to be set. */

＠＠ -846,6 +1258,8 ＠＠
   return Qnil;
 }

+#endif /* HAVE_CCL */
+
 /* Japanese folks may want to (set-charset-registry 'ascii "jisx0201") */
 DEFUN ("set-charset-registry", Fset_charset_registry, 2, 2, 0, /*
 Set the `registry' property of CHARSET to REGISTRY.
＠＠ -950,7 +1364,9 ＠＠
   DEFSUBR (Fcharset_dimension);
   DEFSUBR (Fcharset_property);
   DEFSUBR (Fcharset_id);
+#ifdef HAVE_CCL
   DEFSUBR (Fset_charset_ccl_program);
+#endif /* HAVE_CCL */
   DEFSUBR (Fset_charset_registry);

 #ifdef MEMORY_USAGE_STATS
＠＠ -987,6 +1403,7 ＠＠
   DEFSYMBOL (Qcyrillic_iso8859_5);
   DEFSYMBOL (Qlatin_iso8859_9);
   DEFSYMBOL (Qlatin_iso8859_15);
+  DEFSYMBOL (Qchinese_sisheng);
   DEFSYMBOL (Qjapanese_jisx0208_1978);
   DEFSYMBOL (Qchinese_gb2312);
   DEFSYMBOL (Qjapanese_jisx0208);
＠＠ -994,10 +1411,49 ＠＠
   DEFSYMBOL (Qjapanese_jisx0212);
   DEFSYMBOL (Qchinese_cns11643_1);
   DEFSYMBOL (Qchinese_cns11643_2);
+#ifdef UNICODE_INTERNAL
+  DEFSYMBOL (Qchinese_big5);
+  DEFSYMBOL (Qjapanese_shift_jis);
+#else /* not UNICODE_INTERNAL */
   DEFSYMBOL (Qchinese_big5_1);
   DEFSYMBOL (Qchinese_big5_2);
+#endif /* UNICODE_INTERNAL */

   DEFSYMBOL (Qcomposite);
+
+#ifndef UNICODE_INTERNAL
+#define FROB(low, high)				\
+  DEFSYMBOL (Qunicode_##low##_##high)
+
+FROB  (   0,  23ff);
+FROB  (2400,  47ff);
+FROB  (4800,  6bff);
+FROB  (6c00,  8fff);
+FROB  (9000,  b3ff);
+FROB  (b400,  d7ff);
+FROB  (e000, 103ff);
+FROB (10400, 127ff);
+FROB (12800, 14bff);
+FROB (14c00, 16fff);
+FROB (17000, 193ff);
+FROB (19400, 1b7ff);
+FROB (1b800, 1dbff);
+FROB (1dc00, 1ffff);
+FROB (20000, 223ff);
+FROB (22400, 247ff);
+FROB (24800, 26bff);
+FROB (26c00, 28fff);
+FROB (29000, 2b3ff);
+FROB (2b400, 2d7ff);
+FROB (2d800, 2fbff);
+FROB (2fc00, 31fff);
+/* WARNING: Any changes to this list need to be propagated to at least two
+   other places in this file (at the top and
+   complex_vars_of_mule_charset(); maybe also to the CHARSET_ID enum near
+   the top. */
+
+#undef FROB
+#endif
 }

 void
＠＠ -1008,22 +1464,28 ＠＠
   chlook = xnew_and_zero (struct charset_lookup); /* zero for Purify. */
   dump_add_root_block_ptr (&chlook, &charset_lookup_description);

-  /* Table of charsets indexed by leading byte. */
-  for (i = 0; i < countof (chlook->charset_by_leading_byte); i++)
-    chlook->charset_by_leading_byte[i] = Qnil;
-
   /* Table of charsets indexed by type/final-byte/direction. */
   for (i = 0; i < countof (chlook->charset_by_attributes); i++)
     for (j = 0; j < countof (chlook->charset_by_attributes[0]); j++)
       for (k = 0; k < countof (chlook->charset_by_attributes[0][0]); k++)
 	chlook->charset_by_attributes[i][j][k] = Qnil;

-  chlook->next_allocated_1_byte_leading_byte = MIN_LEADING_BYTE_PRIVATE_1;
-  chlook->next_allocated_2_byte_leading_byte = MIN_LEADING_BYTE_PRIVATE_2;
+#ifndef UNICODE_INTERNAL
+  /* Table of encodable charsets indexed by charset ID. */
+  for (i = 0; i < countof (chlook->charset_by_encodable_id); i++)
+    chlook->charset_by_encodable_id[i] = Qnil;
+
+  chlook->next_allocated_private_dim1_id = MIN_PRIVATE_DIM1_CHARSET_ID;
+  chlook->next_allocated_private_dim2_id = MIN_PRIVATE_DIM2_CHARSET_ID;
+#endif /* not UNICODE_INTERNAL */

   staticpro (&Vcharset_hash_table);
   Vcharset_hash_table =
     make_lisp_hash_table (50, HASH_TABLE_NON_WEAK, HASH_TABLE_EQ);
+
+  staticpro (&Vcharset_id_table);
+  Vcharset_id_table =
+    make_lisp_hash_table (50, HASH_TABLE_NON_WEAK, HASH_TABLE_EQ);
 }

 void
＠＠ -1032,255 +1494,377 ＠＠
   /* Predefined character sets.  We store them into variables for
      ease of access. */

+#ifdef UNICODE_INTERNAL
+#define MAKE_CSID(foo) -1
+#else
+#define MAKE_CSID(foo) CHARSET_ID_##foo
+#endif
+
   staticpro (&Vcharset_ascii);
   Vcharset_ascii =
-    make_charset (LEADING_BYTE_ASCII, Qascii, 1,
-		  CHARSET_TYPE_94, 1, 0, 'B',
+    make_charset (MAKE_CSID (ASCII), 0, Qascii, 1,
+		  1, 128, 0, 0, 1, 0, 'B',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("ASCII"),
 		  build_msg_string ("ASCII"),
 		  build_msg_string ("ASCII (ISO646 IRV)"),
-		  build_string ("\\(iso8859-[0-9]*\\|-ascii\\)"), 0);
+		  build_string ("\\(iso8859-[0-9]*\\|-ascii\\)"), 0, -1);
   staticpro (&Vcharset_control_1);
   Vcharset_control_1 =
-    make_charset (LEADING_BYTE_CONTROL_1, Qcontrol_1, 2,
-		  CHARSET_TYPE_94, 1, 1, 0,
+    make_charset (MAKE_CSID (CONTROL_1), 0, Qcontrol_1, 1,
+		  1, 32, 0, 128, 1, 1, 0,
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("C1"),
 		  build_msg_string ("Control characters"),
-		  build_msg_string ("Control characters 128-191"),
-		  build_string (""), 0);
+		  build_msg_string ("Control characters 128-159"),
+		  build_string (""), 0, -1);
   staticpro (&Vcharset_latin_iso8859_1);
   Vcharset_latin_iso8859_1 =
-    make_charset (LEADING_BYTE_LATIN_ISO8859_1, Qlatin_iso8859_1, 2,
-		  CHARSET_TYPE_96, 1, 1, 'A',
+    make_charset (MAKE_CSID (LATIN_ISO8859_1), 0, Qlatin_iso8859_1, 1,
+		  1, 96, 0, 160, 1, 1, 'A',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Latin-1"),
 		  build_msg_string ("ISO8859-1 (Latin-1)"),
 		  build_msg_string ("ISO8859-1 (Latin-1)"),
-		  build_string ("iso8859-1"), 0);
+		  build_string ("iso8859-1"), 0, -1);
   staticpro (&Vcharset_latin_iso8859_2);
   Vcharset_latin_iso8859_2 =
-    make_charset (LEADING_BYTE_LATIN_ISO8859_2, Qlatin_iso8859_2, 2,
-		  CHARSET_TYPE_96, 1, 1, 'B',
+    make_charset (MAKE_CSID (LATIN_ISO8859_2), 0, Qlatin_iso8859_2, 1,
+		  1, 96, 0, 160, 1, 1, 'B',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Latin-2"),
 		  build_msg_string ("ISO8859-2 (Latin-2)"),
 		  build_msg_string ("ISO8859-2 (Latin-2)"),
-		  build_string ("iso8859-2"), 0);
+		  build_string ("iso8859-2"), 0, -1);
   staticpro (&Vcharset_latin_iso8859_3);
   Vcharset_latin_iso8859_3 =
-    make_charset (LEADING_BYTE_LATIN_ISO8859_3, Qlatin_iso8859_3, 2,
-		  CHARSET_TYPE_96, 1, 1, 'C',
+    make_charset (MAKE_CSID (LATIN_ISO8859_3), 0, Qlatin_iso8859_3, 1,
+		  1, 96, 0, 160, 1, 1, 'C',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Latin-3"),
 		  build_msg_string ("ISO8859-3 (Latin-3)"),
 		  build_msg_string ("ISO8859-3 (Latin-3)"),
-		  build_string ("iso8859-3"), 0);
+		  build_string ("iso8859-3"), 0, -1);
   staticpro (&Vcharset_latin_iso8859_4);
   Vcharset_latin_iso8859_4 =
-    make_charset (LEADING_BYTE_LATIN_ISO8859_4, Qlatin_iso8859_4, 2,
-		  CHARSET_TYPE_96, 1, 1, 'D',
+    make_charset (MAKE_CSID (LATIN_ISO8859_4), 0, Qlatin_iso8859_4, 1,
+		  1, 96, 0, 160, 1, 1, 'D',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Latin-4"),
 		  build_msg_string ("ISO8859-4 (Latin-4)"),
 		  build_msg_string ("ISO8859-4 (Latin-4)"),
-		  build_string ("iso8859-4"), 0);
-  staticpro (&Vcharset_thai_tis620);
-  Vcharset_thai_tis620 =
-    make_charset (LEADING_BYTE_THAI_TIS620, Qthai_tis620, 2,
-		  CHARSET_TYPE_96, 1, 1, 'T',
-		  CHARSET_LEFT_TO_RIGHT,
-		  build_string ("TIS620"),
-		  build_msg_string ("TIS620 (Thai)"),
-		  build_msg_string ("TIS620.2529 (Thai)"),
-		  build_string ("tis620"),0);
+		  build_string ("iso8859-4"), 0, -1);
   staticpro (&Vcharset_greek_iso8859_7);
   Vcharset_greek_iso8859_7 =
-    make_charset (LEADING_BYTE_GREEK_ISO8859_7, Qgreek_iso8859_7, 2,
-		  CHARSET_TYPE_96, 1, 1, 'F',
+    make_charset (MAKE_CSID (GREEK_ISO8859_7), 0, Qgreek_iso8859_7, 1,
+		  1, 96, 0, 160, 1, 1, 'F',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("ISO8859-7"),
 		  build_msg_string ("ISO8859-7 (Greek)"),
 		  build_msg_string ("ISO8859-7 (Greek)"),
-		  build_string ("iso8859-7"), 0);
+		  build_string ("iso8859-7"), 0, -1);
   staticpro (&Vcharset_arabic_iso8859_6);
   Vcharset_arabic_iso8859_6 =
-    make_charset (LEADING_BYTE_ARABIC_ISO8859_6, Qarabic_iso8859_6, 2,
-		  CHARSET_TYPE_96, 1, 1, 'G',
+    make_charset (MAKE_CSID (ARABIC_ISO8859_6), 0, Qarabic_iso8859_6, 1,
+		  1, 96, 0, 160, 1, 1, 'G',
 		  CHARSET_RIGHT_TO_LEFT,
 		  build_string ("ISO8859-6"),
 		  build_msg_string ("ISO8859-6 (Arabic)"),
 		  build_msg_string ("ISO8859-6 (Arabic)"),
-		  build_string ("iso8859-6"), 0);
+		  build_string ("iso8859-6"), 0, -1);
   staticpro (&Vcharset_hebrew_iso8859_8);
   Vcharset_hebrew_iso8859_8 =
-    make_charset (LEADING_BYTE_HEBREW_ISO8859_8, Qhebrew_iso8859_8, 2,
-		  CHARSET_TYPE_96, 1, 1, 'H',
+    make_charset (MAKE_CSID (HEBREW_ISO8859_8), 0, Qhebrew_iso8859_8, 1,
+		  1, 96, 0, 160, 1, 1, 'H',
 		  CHARSET_RIGHT_TO_LEFT,
 		  build_string ("ISO8859-8"),
 		  build_msg_string ("ISO8859-8 (Hebrew)"),
 		  build_msg_string ("ISO8859-8 (Hebrew)"),
-		  build_string ("iso8859-8"), 0);
+		  build_string ("iso8859-8"), 0, -1);
   staticpro (&Vcharset_katakana_jisx0201);
   Vcharset_katakana_jisx0201 =
-    make_charset (LEADING_BYTE_KATAKANA_JISX0201, Qkatakana_jisx0201, 2,
-		  CHARSET_TYPE_94, 1, 1, 'I',
+    make_charset (MAKE_CSID (KATAKANA_JISX0201), 0, Qkatakana_jisx0201, 1,
+		  1, 94, 0, 161, 1, 1, 'I',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("JISX0201 Kana"),
 		  build_msg_string ("JISX0201.1976 (Japanese Kana)"),
 		  build_msg_string ("JISX0201.1976 Japanese Kana"),
-		  build_string ("jisx0201.1976"), 0);
+		  build_string ("jisx0201.1976"), 0, -1);
   staticpro (&Vcharset_latin_jisx0201);
   Vcharset_latin_jisx0201 =
-    make_charset (LEADING_BYTE_LATIN_JISX0201, Qlatin_jisx0201, 2,
-		  CHARSET_TYPE_94, 1, 0, 'J',
+    make_charset (MAKE_CSID (LATIN_JISX0201), 0, Qlatin_jisx0201, 1,
+		  1, 94, 0, 33, 1, 0, 'J',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("JISX0201 Roman"),
 		  build_msg_string ("JISX0201.1976 (Japanese Roman)"),
 		  build_msg_string ("JISX0201.1976 Japanese Roman"),
-		  build_string ("jisx0201.1976"), 0);
+		  build_string ("jisx0201.1976"), 0, -1);
   staticpro (&Vcharset_cyrillic_iso8859_5);
   Vcharset_cyrillic_iso8859_5 =
-    make_charset (LEADING_BYTE_CYRILLIC_ISO8859_5, Qcyrillic_iso8859_5, 2,
-		  CHARSET_TYPE_96, 1, 1, 'L',
+    make_charset (MAKE_CSID (CYRILLIC_ISO8859_5), 0, Qcyrillic_iso8859_5, 1,
+		  1, 96, 0, 160, 1, 1, 'L',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("ISO8859-5"),
 		  build_msg_string ("ISO8859-5 (Cyrillic)"),
 		  build_msg_string ("ISO8859-5 (Cyrillic)"),
-		  build_string ("iso8859-5"), 0);
+		  build_string ("iso8859-5"), 0, -1);
   staticpro (&Vcharset_latin_iso8859_9);
   Vcharset_latin_iso8859_9 =
-    make_charset (LEADING_BYTE_LATIN_ISO8859_9, Qlatin_iso8859_9, 2,
-		  CHARSET_TYPE_96, 1, 1, 'M',
+    make_charset (MAKE_CSID (LATIN_ISO8859_9), 0, Qlatin_iso8859_9, 1,
+		  1, 96, 0, 160, 1, 1, 'M',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Latin-5"),
 		  build_msg_string ("ISO8859-9 (Latin-5)"),
 		  build_msg_string ("ISO8859-9 (Latin-5)"),
-		  build_string ("iso8859-9"), 0);
+		  build_string ("iso8859-9"), 0, -1);
   staticpro (&Vcharset_latin_iso8859_15);
   Vcharset_latin_iso8859_15 =
-    make_charset (LEADING_BYTE_LATIN_ISO8859_15, Qlatin_iso8859_15, 2,
-		  CHARSET_TYPE_96, 1, 1, 'b',
+    make_charset (MAKE_CSID (LATIN_ISO8859_15), 0, Qlatin_iso8859_15, 1,
+		  1, 96, 0, 160, 1, 1, 'b',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Latin-9"),
 		  build_msg_string ("ISO8859-15 (Latin-9)"),
 		  build_msg_string ("ISO8859-15 (Latin-9)"),
-		  build_string ("iso8859-15"), 0);
+		  build_string ("iso8859-15"), 0, -1);
   staticpro (&Vcharset_japanese_jisx0208_1978);
   Vcharset_japanese_jisx0208_1978 =
-    make_charset (LEADING_BYTE_JAPANESE_JISX0208_1978, Qjapanese_jisx0208_1978, 3,
-		  CHARSET_TYPE_94X94, 2, 0, '＠',
+    make_charset (MAKE_CSID (JAPANESE_JISX0208_1978), 0,
+		  Qjapanese_jisx0208_1978, 2,
+		  94, 94, 33, 33, 2, 0, '＠',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("JISX0208.1978"),
 		  build_msg_string ("JISX0208.1978 (Japanese)"),
 		  build_msg_string
 		  ("JISX0208.1978 Japanese Kanji (so called \"old JIS\")"),
-		  build_string ("\\(jisx0208\\|jisc6226\\)\\.1978"), 0);
+		  build_string ("\\(jisx0208\\|jisc6226\\)\\.1978"), 0, -1);
   staticpro (&Vcharset_chinese_gb2312);
   Vcharset_chinese_gb2312 =
-    make_charset (LEADING_BYTE_CHINESE_GB2312, Qchinese_gb2312, 3,
-		  CHARSET_TYPE_94X94, 2, 0, 'A',
+    make_charset (MAKE_CSID (CHINESE_GB2312), 0, Qchinese_gb2312, 2,
+		  94, 94, 33, 33, 2, 0, 'A',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("GB2312"),
 		  build_msg_string ("GB2312)"),
 		  build_msg_string ("GB2312 Chinese simplified"),
-		  build_string ("gb2312"), 0);
+		  build_string ("gb2312"), 0, -1);
   staticpro (&Vcharset_japanese_jisx0208);
   Vcharset_japanese_jisx0208 =
-    make_charset (LEADING_BYTE_JAPANESE_JISX0208, Qjapanese_jisx0208, 3,
-		  CHARSET_TYPE_94X94, 2, 0, 'B',
+    make_charset (MAKE_CSID (JAPANESE_JISX0208), 0, Qjapanese_jisx0208, 2,
+		  94, 94, 33, 33, 2, 0, 'B',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("JISX0208"),
 		  build_msg_string ("JISX0208.1983/1990 (Japanese)"),
 		  build_msg_string ("JISX0208.1983/1990 Japanese Kanji"),
-		  build_string ("jisx0208.19\\(83\\|90\\)"), 0);
+		  build_string ("jisx0208.19\\(83\\|90\\)"), 0, -1);
   staticpro (&Vcharset_korean_ksc5601);
   Vcharset_korean_ksc5601 =
-    make_charset (LEADING_BYTE_KOREAN_KSC5601, Qkorean_ksc5601, 3,
-		  CHARSET_TYPE_94X94, 2, 0, 'C',
+    make_charset (MAKE_CSID (KOREAN_KSC5601), 0, Qkorean_ksc5601, 2,
+		  94, 94, 33, 33, 2, 0, 'C',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("KSC5601"),
 		  build_msg_string ("KSC5601 (Korean"),
 		  build_msg_string ("KSC5601 Korean Hangul and Hanja"),
-		  build_string ("ksc5601"), 0);
+		  build_string ("ksc5601"), 0, -1);
   staticpro (&Vcharset_japanese_jisx0212);
   Vcharset_japanese_jisx0212 =
-    make_charset (LEADING_BYTE_JAPANESE_JISX0212, Qjapanese_jisx0212, 3,
-		  CHARSET_TYPE_94X94, 2, 0, 'D',
+    make_charset (MAKE_CSID (JAPANESE_JISX0212), 0, Qjapanese_jisx0212, 2,
+		  94, 94, 33, 33, 2, 0, 'D',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("JISX0212"),
 		  build_msg_string ("JISX0212 (Japanese)"),
 		  build_msg_string ("JISX0212 Japanese Supplement"),
-		  build_string ("jisx0212"), 0);
+		  build_string ("jisx0212"), 0, -1);

 #define CHINESE_CNS_PLANE_RE(n) "cns11643[.-]\\(.*[.-]\\)?" n "$"
   staticpro (&Vcharset_chinese_cns11643_1);
   Vcharset_chinese_cns11643_1 =
-    make_charset (LEADING_BYTE_CHINESE_CNS11643_1, Qchinese_cns11643_1, 3,
-		  CHARSET_TYPE_94X94, 2, 0, 'G',
+    make_charset (MAKE_CSID (CHINESE_CNS11643_1), 0, Qchinese_cns11643_1, 2,
+		  94, 94, 33, 33, 2, 0, 'G',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("CNS11643-1"),
 		  build_msg_string ("CNS11643-1 (Chinese traditional)"),
 		  build_msg_string
 		  ("CNS 11643 Plane 1 Chinese traditional"),
-		  build_string (CHINESE_CNS_PLANE_RE("1")), 0);
+		  build_string (CHINESE_CNS_PLANE_RE("1")), 0, -1);
   staticpro (&Vcharset_chinese_cns11643_2);
   Vcharset_chinese_cns11643_2 =
-    make_charset (LEADING_BYTE_CHINESE_CNS11643_2, Qchinese_cns11643_2, 3,
-		  CHARSET_TYPE_94X94, 2, 0, 'H',
+    make_charset (MAKE_CSID (CHINESE_CNS11643_2), 0, Qchinese_cns11643_2, 2,
+		  94, 94, 33, 33, 2, 0, 'H',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("CNS11643-2"),
 		  build_msg_string ("CNS11643-2 (Chinese traditional)"),
 		  build_msg_string
 		  ("CNS 11643 Plane 2 Chinese traditional"),
-		  build_string (CHINESE_CNS_PLANE_RE("2")), 0);
+		  build_string (CHINESE_CNS_PLANE_RE("2")), 0, -1);
+#ifdef UNICODE_INTERNAL
+  /* We can support Big5 directly. */
+  staticpro (&Vcharset_chinese_big5);
+  Vcharset_chinese_big5 =
+    /* Big5 claims to be a 94x157 charset, but with gaps in the middle.
+       In particular, the rows are (theoretically) indexed from A1 - FE
+       and the columns from 40 - 7E and A1 - FE.  In fact, there are gaps
+       in the rows as well (rows C7 and C8 are missing, as well as rows
+       FA - FE), but that appears to be due to accident -- i.e. they just
+       ran out of chars and/or wanted to make room for expansion.  Note
+       also that the gap at C7 and C8 is due to the Level-1/Level-2
+       division of Big5 (see below).  The 94 rows are those between
+       A1 and FE, inclusive.  The 157 columns count the sum of the columns
+       in each disjoint set.  For us, we need to use the size of the range
+       [40, FE], which is 191. */
+    make_charset (-1, 0, Qchinese_big5, 2,
+		  94, 191, 161, 64, 2, 0, 0,
+		  CHARSET_LEFT_TO_RIGHT,
+		  build_string ("Big5"),
+		  build_msg_string ("Big5"),
+		  build_msg_string
+		  ("Big5 Chinese traditional"),
+		  build_string ("big5"), 0, -1);
+#else /* not UNICODE_INTERNAL */
+  /* Old Mule situation; we can only handle up to 96x96 charsets.
+     So we split it into two charsets.  According to Ken Lunde's CJKV
+     book, Big5 itself is split into "Big Five Level 1" (rows A1-C6)
+     and "Big Five Level 2" (rows C9-F9), with the latter containing
+     less used characters.  We split the same way then coerce the
+     result into a 94x94 block. */
   staticpro (&Vcharset_chinese_big5_1);
   Vcharset_chinese_big5_1 =
-    make_charset (LEADING_BYTE_CHINESE_BIG5_1, Qchinese_big5_1, 3,
-		  CHARSET_TYPE_94X94, 2, 0, '0',
+    make_charset (MAKE_CSID (CHINESE_BIG5_1), 0, Qchinese_big5_1, 2,
+		  94, 94, 33, 33, 2, 0, '0',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Big5"),
 		  build_msg_string ("Big5 (Level-1)"),
 		  build_msg_string
 		  ("Big5 Level-1 Chinese traditional"),
-		  build_string ("big5"), 0);
+		  build_string ("big5"), 0, -1);
   staticpro (&Vcharset_chinese_big5_2);
   Vcharset_chinese_big5_2 =
-    make_charset (LEADING_BYTE_CHINESE_BIG5_2, Qchinese_big5_2, 3,
-		  CHARSET_TYPE_94X94, 2, 0, '1',
+    make_charset (MAKE_CSID (CHINESE_BIG5_2), 0, Qchinese_big5_2, 2,
+		  94, 94, 33, 33, 2, 0, '1',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Big5"),
 		  build_msg_string ("Big5 (Level-2)"),
 		  build_msg_string
 		  ("Big5 Level-2 Chinese traditional"),
-		  build_string ("big5"), 0);
-
+		  build_string ("big5"), 0, -1);
+#endif /* UNICODE_INTERNAL */
+#ifdef UNICODE_INTERNAL
+  /* We can support Shift-JIS directly.*/
+  staticpro (&Vcharset_japanese_shift_jis);
+  Vcharset_japanese_shift_jis =
+    /* See comments in mule-coding.c.
+       First byte is in the range [80-9F], [E0-EF]; second byte is in the
+       range [40-7E], [80-FC] */
+    make_charset (MAKE_CSID (SHIFT_JIS), 0, Qjapanese_shift_jis, 2,
+		  112, 189, 128, 64, 2, 0, 0,
+		  CHARSET_LEFT_TO_RIGHT,
+		  build_string ("Shift-JIS"),
+		  build_msg_string ("Shift-JIS"),
+		  build_msg_string
+		  ("Shift-JIS Japanese encoding of JIS X 0208:1997"),
+		  /* #### This is the X registry; is it right? */
+		  build_string ("sjis"), 0, -1);
+#endif /* UNICODE_INTERNAL */

 #ifdef ENABLE_COMPOSITE_CHARS
   /* #### For simplicity, we put composite chars into a 96x96 charset.
      This is going to lead to problems because you can run out of
      room, esp. as we don't yet recycle numbers. */
+  /* #### Figure out what should go into the Unicode translation table,
+     if we even support this crap at all */
   staticpro (&Vcharset_composite);
   Vcharset_composite =
-    make_charset (LEADING_BYTE_COMPOSITE, Qcomposite, 3,
-		  CHARSET_TYPE_96X96, 2, 0, 0,
+    make_charset (MAKE_CSID (COMPOSITE), 0, Qcomposite, 2,
+		  96, 96, 32, 32, 2, 0, 0,
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Composite"),
 		  build_msg_string ("Composite characters"),
 		  build_msg_string ("Composite characters"),
-		  build_string (""), 0);
+		  build_string (""), 0, -1);
 #else
   /* We create a hack so that we have a way of storing ESC 0 and ESC 1
      sequences as "characters", so that they will be output correctly. */
   staticpro (&Vcharset_composite);
   Vcharset_composite =
-    make_charset (LEADING_BYTE_COMPOSITE_REPLACEMENT, Qcomposite, 2,
-		  CHARSET_TYPE_96, 1, 1, '|',
+    make_charset (MAKE_CSID (COMPOSITE), 0, Qcomposite, 1,
+		  1, 96, 0, 32, 1, 0, '|',
 		  CHARSET_LEFT_TO_RIGHT,
 		  build_string ("Composite hack"),
 		  build_msg_string ("Composite characters hack"),
 		  build_msg_string ("Composite characters hack"),
-		  build_string (""), 0);
+		  build_string (""), 0, -1);
 #endif /* ENABLE_COMPOSITE_CHARS */
+
+  /* These two are in the private charset ID space. */
+  staticpro (&Vcharset_thai_tis620);
+  Vcharset_thai_tis620 =
+    make_charset (-1, 0, Qthai_tis620, 1,
+		  1, 96, 0, 160, 1, 1, 'T',
+		  CHARSET_LEFT_TO_RIGHT,
+		  build_string ("TIS620"),
+		  build_msg_string ("TIS620 (Thai)"),
+		  build_msg_string ("TIS620.2529 (Thai)"),
+		  build_string ("tis620"), 0, -1);
+  staticpro (&Vcharset_chinese_sisheng);
+  Vcharset_chinese_sisheng =
+    make_charset (-1, 0, Qchinese_sisheng, 1,
+		  1, 94, 0, 33, 1, 0, '0',
+		  CHARSET_LEFT_TO_RIGHT,
+		  build_string ("SiSheng"),
+		  build_msg_string ("SiSheng (PinYin/ZhuYin)"),
+		  build_msg_string ("SiSheng characters for PinYin/ZhuYin"),
+		  build_string ("sisheng_cwnn\\|OMRON_UDC_ZH"), 0, -1);
+
+#ifndef UNICODE_INTERNAL
+
+#define FROB_1(id, lclow, lchigh, uclow, uchigh)			\
+  staticpro (&Vcharset_unicode_##lclow##_##lchigh);			\
+  Vcharset_unicode_##lclow##_##lchigh =					\
+    make_charset (id, 1,						\
+		  Qunicode_##lclow##_##lchigh, 2,			\
+		  96, 96, 32, 32, 2, 0, 0,				\
+		  CHARSET_LEFT_TO_RIGHT,				\
+		  build_string ("Unicode " #uclow "-" #uchigh),		\
+		  build_msg_string ("Unicode subset (U+" #uclow "..U+" #uchigh), \
+		  build_msg_string ("Unicode subset (U+" #uclow "..U+" #uchigh
"used for maintaining round-trip\n" \
+"compatibility for Unicode characters that have no representation in any\n" \
+"other charset."),							\
+		  build_string ("ISO10646-1"), 0, 0x##lclow)
+
+#define FROB_OFFICIAL(lclow, lchigh, uclow, uchigh) \
+  FROB_1 (MAKE_CSID (UNICODE_##uclow##_##uchigh), lclow, lchigh, uclow, uchigh)
+
+#define FROB_PRIVATE(lclow, lchigh, uclow, uchigh) \
+  FROB_1 (-1, lclow, lchigh, uclow, uchigh)
+
+  FROB_OFFICIAL (0, 23ff, 0, 23FF);
+  FROB_OFFICIAL (2400, 47ff, 2400, 47FF);
+  FROB_OFFICIAL (4800, 6bff, 4800, 6BFF);
+  FROB_OFFICIAL (6c00, 8fff, 6C00, 8FFF);
+  FROB_OFFICIAL (9000, b3ff, 9000, B3FF);
+  FROB_OFFICIAL (b400, d7ff, B400, D7FF);
+  FROB_OFFICIAL (e000, 103ff, E000, 103FF);
+  FROB_PRIVATE (10400, 127ff, 10400, 127FF);
+  FROB_PRIVATE (12800, 14bff, 12800, 14BFF);
+  FROB_PRIVATE (14c00, 16fff, 14C00, 16FFF);
+  FROB_PRIVATE (17000, 193ff, 17000, 193FF);
+  FROB_PRIVATE (19400, 1b7ff, 19400, 1B7FF);
+  FROB_PRIVATE (1b800, 1dbff, 1B800, 1DBFF);
+  FROB_PRIVATE (1dc00, 1ffff, 1DC00, 1FFFF);
+  FROB_PRIVATE (20000, 223ff, 20000, 223FF);
+  FROB_PRIVATE (22400, 247ff, 22400, 247FF);
+  FROB_PRIVATE (24800, 26bff, 24800, 26BFF);
+  FROB_PRIVATE (26c00, 28fff, 26C00, 28FFF);
+  FROB_PRIVATE (29000, 2b3ff, 29000, 2B3FF);
+  FROB_PRIVATE (2b400, 2d7ff, 2B400, 2D7FF);
+  FROB_PRIVATE (2d800, 2fbff, 2D800, 2FBFF);
+  FROB_PRIVATE (2fc00, 31fff, 2FC00, 31FFF);
+/* WARNING: Any changes to this list need to be propagated to at least two
+   other places in this file (at the top and syms_of_mule_charset(); maybe
+   also to the CHARSET_ID enum near the top. */
+
+#undef FROB_OFFICIAL
+#undef FROB_PRIVATE
+#undef FROB_1
+#endif /* not UNICODE_INTERNAL */
+
+  initialize_ascii_control_1_latin_1_unicode_translation ();
 }
Index: src/mule-coding.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/mule-coding.c,v
retrieving revision 1.35
diff -u -r1.35 mule-coding.c
--- src/mule-coding.c	2005/06/19 21:08:31	1.35
+++ src/mule-coding.c	2005/11/22 14:00:55
＠＠ -1,7 +1,7 ＠＠
 /* Conversion functions for I18N encodings, but not Unicode (in separate file).
    Copyright (C) 1991, 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2000, 2001, 2002 Ben Wing.
+   Copyright (C) 2000, 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -34,20 +34,265 ＠＠
 #include "lisp.h"

 #include "charset.h"
-#include "mule-ccl.h"
+#include "elhash.h"
 #include "file-coding.h"
+
+#ifdef HAVE_CCL
+#include "mule-ccl.h"
+#endif /* HAVE_CCL */

-Lisp_Object Qshift_jis, Qiso2022, Qbig5, Qccl;
+#if defined (ENABLE_COMPOSITE_CHARS) && defined (UNICODE_INTERNAL)
+#error "No prayer of getting these two working in its current shape"
+#endif

+Lisp_Object Qshift_jis, Qiso2022, Qbig5, Qccl, Qmbcs;
+
 Lisp_Object Qcharset_g0, Qcharset_g1, Qcharset_g2, Qcharset_g3;
 Lisp_Object Qforce_g0_on_output, Qforce_g1_on_output;
 Lisp_Object Qforce_g2_on_output, Qforce_g3_on_output;
-Lisp_Object Qno_iso6429;
+Lisp_Object Qno_iso6429, Qiso2022_preserve;
 Lisp_Object Qinput_charset_conversion, Qoutput_charset_conversion;
 Lisp_Object Qshort, Qno_ascii_eol, Qno_ascii_cntl, Qseven, Qlock_shift;

 Lisp_Object Qiso_7, Qiso_8_designate, Qiso_8_1, Qiso_8_2, Qiso_lock_shift;

+Lisp_Object Qcharsets;
+
+static Lisp_Object_dynarr *shift_jis_precedence, *big5_precedence;
+
+
+/************************************************************************/
+/*                           MBCS coding system                         */
+/************************************************************************/
+
+struct mbcs_coding_system
+{
+  Lisp_Object_dynarr *charsets;
+};
+
+#define CODING_SYSTEM_MBCS_CHARSETS(codesys) \
+  (CODING_SYSTEM_TYPE_DATA (codesys, mbcs)->charsets)
+#define XCODING_SYSTEM_MBCS_CHARSETS(codesys) \
+  CODING_SYSTEM_MBCS_CHARSETS (XCODING_SYSTEM (codesys))
+
+/* ~~#### Must be converted to Lisp object.  struct mbcs_coding_system
+   does not need to be because it actually forms the latter part of a
+   coding system object.  The struct mbcs_coding_streams are currently
+   separate objects with a pointer to them in the struct coding_stream,
+   which itself is the latter part of an Lstream object.  It cannot be
+   converted into an "extended lump" on the end of the struct coding_stream
+   (then it'd be a lump on a lump, so to speak) because the size and type
+   of these data changes during the life of the stream, esp. when the
+   autodetection detects a particular coding system and switches to the
+   appropriate foo_coding_stream structure. */
+
+struct mbcs_coding_stream
+{
+  int foo; /* unused */
+};
+
+static const struct memory_description mbcs_coding_system_description[] = {
+  { XD_BLOCK_PTR, offsetof (struct mbcs_coding_system, charsets),
+    1, { &Lisp_Object_dynarr_description} },
+  { XD_END }
+};
+
+DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (mbcs);
+
+/* See if we can derive a character out of the specified charsets
+   that is of the right dimension, is valid according to the bounds, and
+   can be made into an Ichar. */
+
+static Ichar
+try_to_derive_character (int c1, int c2, int dimension,
+			 Lisp_Object_dynarr *charsets,
+			 enum converr handling)
+{
+  int i;
+  Ichar ich = -1;
+
+  for (i = 0; i < Dynarr_length (charsets); i++)
+    {
+      Lisp_Object charset = Dynarr_at (charsets, i);
+      if (XCHARSET_DIMENSION (charset) == dimension &&
+	  valid_charset_codepoint_p (charset, c1, c2) &&
+	  (ich = charset_codepoint_to_ichar (charset, c1, c2, handling)) >= 0)
+	break;
+    }
+
+  return ich;
+}
+
+
+static Bytecount
+mbcs_convert (struct coding_stream *str, const UExtbyte *src,
+	      unsigned_char_dynarr *dst, Bytecount n)
+{
+  Lisp_Object_dynarr *charsets = XCODING_SYSTEM_MBCS_CHARSETS (str->codesys);
+  Bytecount orign = n;
+
+  if (str->direction == CODING_DECODE)
+    {
+      while (n--)
+	{
+	  UExtbyte c = *src++;
+	  Ichar ich = -1;
+
+	  if (str->ch >= 0)
+	    {
+	      /* See if we can derive a legimitate character out of
+		 the specified charsets */
+	      ich = try_to_derive_character (str->ch, c, 2, charsets,
+					     CONVERR_FAIL);
+	      if (ich < 0)
+		/* See if we can derive an "illegitimate" two-byte
+		   character out of the charsets -- one that's within
+		   range but happens not to have a Unicode
+		   translation. */
+		ich = try_to_derive_character (str->ch, c, 2, charsets,
+					       CONVERR_SUCCEED);
+	      if (ich < 0)
+		/* See if we can derive an "illegitimate" one-byte
+		   character from the first byte. */
+		ich = try_to_derive_character (0, str->ch, 1, charsets,
+					       CONVERR_SUCCEED);
+
+	      str->ch = -1;
+
+	      if (ich >= 0)
+		{
+		  Dynarr_add_ichar (dst, ich);
+		  goto retry_one_byte;
+		}
+	      else
+		ich = CANT_CONVERT_CHAR_WHEN_DECODING;
+	    }
+	  else
+	    {
+	      retry_one_byte:
+	      /* See if we can derive a legimitate character out of
+		 the specified charsets */
+	      ich = try_to_derive_character (0, c, 1, charsets,
+					     CONVERR_FAIL);
+	      /* Don't try and derive an illegitimate character yet;
+		 that happens after we try to find legitimate two-byte
+		 characters. */
+	      if (ich < 0)
+		{
+		  str->ch = c;
+		  continue;
+		}
+	    }
+
+	  Dynarr_add_ichar (dst, ich);
+	}
+
+      if (str->eof)
+	{
+	  /* See if we can derive an "illegitimate" one-byte
+	     character from the last, straggling byte. */
+	  Ichar ich = try_to_derive_character (0, str->ch, 1, charsets,
+					       CONVERR_SUCCEED);
+	  if (ich >= 0)
+	    Dynarr_add_ichar (dst, ich);
+	  else
+	    DECODE_OUTPUT_PARTIAL_CHAR (str, dst);
+	}
+    }
+  else
+    {
+      while (n--)
+	{
+	  Ibyte c = *src++;
+	  COPY_PARTIAL_CHAR_BYTE (c, str);
+	  if (!str->pind_remaining)
+	    {
+	      Lisp_Object charset;
+	      int c1, c2;
+	      itext_to_charset_codepoint (str->partial, charsets, &charset,
+					  &c1, &c2, CONVERR_SUCCEED);
+	      if (XCHARSET_DIMENSION (charset) == 2)
+		Dynarr_add (dst, c1);
+	      Dynarr_add (dst, c2);
+	    }
+	}
+    }
+  
+  return orign;
+}
+
+static void
+mbcs_init (Lisp_Object codesys)
+{
+  XCODING_SYSTEM_MBCS_CHARSETS (codesys) = Dynarr_new (Lisp_Object);
+}
+
+static void
+mbcs_mark (Lisp_Object codesys)
+{
+  mark_Lisp_Object_dynarr (XCODING_SYSTEM_MBCS_CHARSETS (codesys));
+}
+
+static void
+mbcs_finalize (Lisp_Object cs)
+{
+  if (XCODING_SYSTEM_MBCS_CHARSETS (cs))
+    {
+      Dynarr_free (XCODING_SYSTEM_MBCS_CHARSETS (cs));
+      XCODING_SYSTEM_MBCS_CHARSETS (cs) = 0;
+    }
+}
+
+static int
+mbcs_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
+{
+  if (EQ (key, Qcharsets))
+    {
+      Lisp_Object_dynarr *charsets = XCODING_SYSTEM_MBCS_CHARSETS (codesys);
+      Dynarr_reset (charsets);
+      EXTERNAL_LIST_LOOP_2 (elt, value)
+	{
+	  Lisp_Object charset = Fget_charset (elt);
+	  Dynarr_add (charsets, charset);
+	}
+    }
+  else
+    return 0;
+  return 1;
+}
+
+static Lisp_Object
+mbcs_getprop (Lisp_Object codesys, Lisp_Object prop)
+{
+  if (EQ (prop, Qcharsets))
+    {
+      Lisp_Object_dynarr *charsets = XCODING_SYSTEM_MBCS_CHARSETS (codesys);
+      Lisp_Object list = Qnil;
+      int i;
+
+      for (i = 0; i < Dynarr_length (charsets); i++)
+	list = Fcons (Dynarr_at (charsets, i), list);
+      return Fnreverse (list);
+    }
+  return Qunbound;
+}
+
+static void
+mbcs_print (Lisp_Object codesys, Lisp_Object printcharfun,
+	    int UNUSED (escapeflag))
+{
+  Lisp_Object_dynarr *charsets = XCODING_SYSTEM_MBCS_CHARSETS (codesys);
+  int i;
+  
+  for (i = 0; i < Dynarr_length (charsets); i++)
+    write_fmt_string_lisp (printcharfun, i == 0 ? "(%s" : " %s", 1,
+			   XCHARSET_NAME (Dynarr_at (charsets, i)));
+  write_c_string (printcharfun, ")");
+}
+
+/* ＠＠#### Need MBCS detector; but probably need to redo the whole detection
+   system to accommodate this properly */
+

 /************************************************************************/
 /*                          Shift-JIS methods                           */
＠＠ -61,7 +306,7 ＠＠
    as is.  A character of JISX0201-Kana (DIMENSION1_CHARS94 character set) is
    encoded by "position-code + 0x80".  A character of JISX0208
    (DIMENSION2_CHARS94 character set) is encoded in 2-byte but two
-   position-codes are divided and shifted so that it fit in the range
+   position-codes are divided and shifted so that it fits in the range
    below.

    --- CODE RANGE of Shift-JIS ---
＠＠ -102,7 +347,6 ＠＠
 shift_jis_convert (struct coding_stream *str, const UExtbyte *src,
 		   unsigned_char_dynarr *dst, Bytecount n)
 {
-  unsigned int ch     = str->ch;
   Bytecount orign = n;

   if (str->direction == CODING_DECODE)
＠＠ -111,41 +355,37 ＠＠
 	{
 	  UExtbyte c = *src++;

-	  if (ch)
+	  if (str->ch >= 0)
 	    {
 	      /* Previous character was first byte of Shift-JIS Kanji char. */
 	      if (byte_shift_jis_two_byte_2_p (c))
 		{
-		  Ibyte e1, e2;
+		  int e1, e2;

-		  Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
-		  DECODE_SHIFT_JIS (ch, c, e1, e2);
-		  Dynarr_add (dst, e1);
-		  Dynarr_add (dst, e2);
+		  DECODE_SHIFT_JIS (str->ch, c, e1, e2);
+		  non_ascii_charset_codepoint_to_dynarr
+		    (Vcharset_japanese_jisx0208, e1, e2, dst, CONVERR_SUCCEED);
 		}
 	      else
 		{
-		  DECODE_ADD_BINARY_CHAR (ch, dst);
+		  DECODE_ADD_BINARY_CHAR (str->ch, dst);
 		  DECODE_ADD_BINARY_CHAR (c, dst);
 		}
-	      ch = 0;
+	      str->ch = -1;
 	    }
 	  else
 	    {
 	      if (byte_shift_jis_two_byte_1_p (c))
-		ch = c;
+		str->ch = c;
 	      else if (byte_shift_jis_katakana_p (c))
-		{
-		  Dynarr_add (dst, LEADING_BYTE_KATAKANA_JISX0201);
-		  Dynarr_add (dst, c);
-		}
+		  non_ascii_charset_codepoint_to_dynarr
+		    (Vcharset_katakana_jisx0201, 0, c, dst, CONVERR_SUCCEED);
 	      else
 		DECODE_ADD_BINARY_CHAR (c, dst);
 	    }
 	}

-      if (str->eof)
-	DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+      DECODE_OUTPUT_PARTIAL_CHAR (str, dst);
     }
   else
     {
＠＠ -153,38 +393,33 ＠＠
 	{
 	  Ibyte c = *src++;
 	  if (byte_ascii_p (c))
-	    {
-	      Dynarr_add (dst, c);
-	      ch = 0;
-	    }
-	  else if (ibyte_leading_byte_p (c))
-	    ch = (c == LEADING_BYTE_KATAKANA_JISX0201 ||
-		  c == LEADING_BYTE_JAPANESE_JISX0208_1978 ||
-		  c == LEADING_BYTE_JAPANESE_JISX0208) ? c : 0;
-	  else if (ch)
+	    Dynarr_add (dst, c);
+	  else
 	    {
-	      if (ch == LEADING_BYTE_KATAKANA_JISX0201)
-		{
-		  Dynarr_add (dst, c);
-		  ch = 0;
-		}
-	      else if (ch == LEADING_BYTE_JAPANESE_JISX0208_1978 ||
-		       ch == LEADING_BYTE_JAPANESE_JISX0208)
-		ch = c;
-	      else
+	      COPY_PARTIAL_CHAR_BYTE (c, str);
+	      if (!str->pind_remaining)
 		{
-		  UExtbyte j1, j2;
-		  ENCODE_SHIFT_JIS (ch, c, j1, j2);
-		  Dynarr_add (dst, j1);
-		  Dynarr_add (dst, j2);
-		  ch = 0;
+		  Lisp_Object charset;
+		  int c1, c2;
+		  non_ascii_itext_to_charset_codepoint_raw
+		    (str->partial, shift_jis_precedence, &charset, &c1, &c2);
+		  if (EQ (charset, Vcharset_katakana_jisx0201))
+		    Dynarr_add (dst, c2);
+		  else if (EQ (charset, Vcharset_japanese_jisx0208) ||
+			   EQ (charset, Vcharset_japanese_jisx0208_1978))
+		    {
+		      UExtbyte j1, j2;
+		      ENCODE_SHIFT_JIS (c1, c2, j1, j2);
+		      Dynarr_add (dst, j1);
+		      Dynarr_add (dst, j2);
+		    }
+		  else
+		    Dynarr_add (dst, CANT_CONVERT_CHAR_WHEN_ENCODING);
 		}
 	    }
 	}
     }

-  str->ch    = ch;
-
   return orign;
 }

＠＠ -206,8 +441,8 ＠＠
       byte_shift_jis_two_byte_2_p (s2))
     {
       DECODE_SHIFT_JIS (s1, s2, c1, c2);
-      return make_char (make_ichar (Vcharset_japanese_jisx0208,
-				     c1 & 0x7F, c2 & 0x7F));
+      return make_char (charset_codepoint_to_ichar
+			(Vcharset_japanese_jisx0208, c1, c2, CONVERR_SUCCEED));
     }
   else
     return Qnil;
＠＠ -223,10 +458,12 ＠＠
   int c1, c2, s1, s2;

   CHECK_CHAR_COERCE_INT (character);
-  BREAKUP_ICHAR (XCHAR (character), charset, c1, c2);
-  if (EQ (charset, Vcharset_japanese_jisx0208))
+  ichar_to_charset_codepoint (XCHAR (character), shift_jis_precedence,
+			      &charset, &c1, &c2);
+  if (EQ (charset, Vcharset_japanese_jisx0208) ||
+      EQ (charset, Vcharset_japanese_jisx0208_1978))
     {
-      ENCODE_SHIFT_JIS (c1 | 0x80, c2 | 0x80, s1, s2);
+      ENCODE_SHIFT_JIS (c1, c2, s1, s2);
       return Fcons (make_int (s1), make_int (s2));
     }
   else
＠＠ -327,6 +564,8 ＠＠
 /* BIG5 (used for Mandarin in Taiwan). */
 DEFINE_CODING_SYSTEM_TYPE (big5);

+#ifndef UNICODE_INTERNAL
+
 /* BIG5 is a coding system encoding two character sets: ASCII and
    Big5.  An ASCII character is encoded as is.  Big5 is a two-byte
    character set and is encoded in two-byte.
＠＠ -345,20 +584,6 ＠＠
    contains frequently used characters and the latter contains less
    frequently used characters.  */

-inline static int
-byte_big5_two_byte_1_p (int c)
-{
-  return c >= 0xA1 && c <= 0xFE;
-}
-
-/* Is this the second byte of a Shift-JIS two-byte char? */
-
-inline static int
-byte_big5_two_byte_2_p (int c)
-{
-  return (c >= 0x40 && c <= 0x7E) || (c >= 0xA1 && c <=
0xFE);
-}
-
 /* Number of Big5 characters which have the same code in 1st byte.  */

 #define BIG5_SAME_ROW (0xFF - 0xA1 + 0x7F - 0x40)
＠＠ -372,8 +597,8 ＠＠
    capital letters in variables except in a very formalized way
    (e.g. Qstring). */

-/* Convert Big5 code (b1, b2) into its internal string representation
-   (lb, c1, c2). */
+/* Convert Big5 code (b1, b2) into a charset codepoint (c1, c2) in the
+   pseudo-charsets `chinese-big5-1' or `chinese-big5-2'. */

 /* There is a much simpler way to split the Big5 charset into two.
    For the moment I'm going to leave the algorithm as-is because it
＠＠ -397,7 +622,7 ＠＠
    and coerce the result into a 94x94 space.
    */

-#define DECODE_BIG5(b1, b2, lb, c1, c2) do				\
+#define DECODE_BIG5(b1, b2, charset, c1, c2) do				\
 {									\
   int B1 = b1, B2 = b2;							\
   int I									\
＠＠ -405,25 +630,25 ＠＠
 									\
   if (B1 < 0xC9)							\
     {									\
-      lb = LEADING_BYTE_CHINESE_BIG5_1;					\
+      charset = Vcharset_chinese_big5_1;				\
     }									\
   else									\
     {									\
-      lb = LEADING_BYTE_CHINESE_BIG5_2;					\
+      charset = Vcharset_chinese_big5_2;				\
       I -= (BIG5_SAME_ROW) * (0xC9 - 0xA1);				\
     }									\
-  c1 = I / (0xFF - 0xA1) + 0xA1;					\
-  c2 = I % (0xFF - 0xA1) + 0xA1;					\
+  c1 = I / (0xFF - 0xA1) + 0x21;					\
+  c2 = I % (0xFF - 0xA1) + 0x21;					\
 } while (0)

-/* Convert the internal string representation of a Big5 character
-   (lb, c1, c2) into Big5 code (b1, b2). */
+/* Convert charset codepoint in the pseudo-charsets `chinese-big5-1' and
+   `chinese-big5-2' (c1, c2) into Big5 code (b1, b2). */

-#define ENCODE_BIG5(lb, c1, c2, b1, b2) do				\
+#define ENCODE_BIG5(charset, c1, c2, b1, b2) do				\
 {									\
-  int I = ((c1) - 0xA1) * (0xFF - 0xA1) + ((c2) - 0xA1);		\
+  int I = ((c1) - 0x21) * (0xFF - 0xA1) + ((c2) - 0x21);		\
 									\
-  if (lb == LEADING_BYTE_CHINESE_BIG5_2)				\
+  if (EQ (charset, Vcharset_chinese_big5_2))				\
     {									\
       I += BIG5_SAME_ROW * (0xC9 - 0xA1);				\
     }									\
＠＠ -432,13 +657,28 ＠＠
   b2 += b2 < 0x3F ? 0x40 : 0x62;					\
 } while (0)

+#endif /* not UNICODE_INTERNAL */
+
+inline static int
+byte_big5_two_byte_1_p (int c)
+{
+  return c >= 0xA1 && c <= 0xFE;
+}
+
+/* Is this the second byte of a Shift-JIS two-byte char? */
+
+inline static int
+byte_big5_two_byte_2_p (int c)
+{
+  return (c >= 0x40 && c <= 0x7E) || (c >= 0xA1 && c <=
0xFE);
+}
+
 /* Convert Big5 data to internal format. */

 static Bytecount
 big5_convert (struct coding_stream *str, const UExtbyte *src,
 	      unsigned_char_dynarr *dst, Bytecount n)
 {
-  unsigned int ch     = str->ch;
   Bytecount orign = n;

   if (str->direction == CODING_DECODE)
＠＠ -446,35 +686,40 ＠＠
       while (n--)
 	{
 	  UExtbyte c = *src++;
-	  if (ch)
+	  if (str->ch >= 0)
 	    {
 	      /* Previous character was first byte of Big5 char. */
 	      if (byte_big5_two_byte_2_p (c))
 		{
-		  Ibyte b1, b2, b3;
-		  DECODE_BIG5 (ch, c, b1, b2, b3);
-		  Dynarr_add (dst, b1);
-		  Dynarr_add (dst, b2);
-		  Dynarr_add (dst, b3);
+#ifdef UNICODE_INTERNAL
+		  non_ascii_charset_codepoint_to_dynarr
+		    (Vcharset_chinese_big5, str->ch, c, dst,
+		     CONVERR_SUCCEED);
+#else /* not UNICODE_INTERNAL */
+		  Lisp_Object charset;
+		  int b1, b2;
+		  DECODE_BIG5 (str->ch, c, charset, b1, b2);
+		  non_ascii_charset_codepoint_to_dynarr
+		    (charset, b1, b2, dst, CONVERR_SUCCEED);
+#endif /* UNICODE_INTERNAL */
 		}
 	      else
 		{
-		  DECODE_ADD_BINARY_CHAR (ch, dst);
+		  DECODE_ADD_BINARY_CHAR (str->ch, dst);
 		  DECODE_ADD_BINARY_CHAR (c, dst);
 		}
-	      ch = 0;
+	      str->ch = -1;
 	    }
 	  else
 	    {
 	      if (byte_big5_two_byte_1_p (c))
-		ch = c;
+		str->ch = c;
 	      else
 		DECODE_ADD_BINARY_CHAR (c, dst);
 	    }
 	}

-      if (str->eof)
-	DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+      DECODE_OUTPUT_PARTIAL_CHAR (str, dst);
     }
   else
     {
＠＠ -482,65 +727,85 ＠＠
 	{
 	  Ibyte c = *src++;
 	  if (byte_ascii_p (c))
-	    {
-	      /* ASCII. */
 	      Dynarr_add (dst, c);
-	    }
-	  else if (ibyte_leading_byte_p (c))
+	  else
 	    {
-	      if (c == LEADING_BYTE_CHINESE_BIG5_1 ||
-		  c == LEADING_BYTE_CHINESE_BIG5_2)
+	      COPY_PARTIAL_CHAR_BYTE (c, str);
+	      if (!str->pind_remaining)
 		{
-		  /* A recognized leading byte. */
-		  ch = c;
-		  continue;	/* not done with this character. */
+		  Lisp_Object charset;
+		  int c1, c2;
+		  non_ascii_itext_to_charset_codepoint_raw 
+		    (str->partial, big5_precedence, &charset, &c1, &c2);
+#ifdef UNICODE_INTERNAL
+		  if (EQ (charset, Vcharset_chinese_big5))
+		    {
+		      Dynarr_add (dst, c1);
+		      Dynarr_add (dst, c2);
+		    }
+#else /* not UNICODE_INTERNAL */
+		  if (EQ (charset, Vcharset_chinese_big5_1) ||
+		      EQ (charset, Vcharset_chinese_big5_2))
+		    {
+		      UExtbyte b1, b2;
+		      ENCODE_BIG5 (charset, c1, c2, b1, b2);
+		      Dynarr_add (dst, b1);
+		      Dynarr_add (dst, b2);
+		    }
+#endif /* UNICODE_INTERNAL */
+		  else
+		    Dynarr_add (dst, CANT_CONVERT_CHAR_WHEN_ENCODING);
 		}
-	      /* otherwise just ignore this character. */
-	    }
-	  else if (ch == LEADING_BYTE_CHINESE_BIG5_1 ||
-		   ch == LEADING_BYTE_CHINESE_BIG5_2)
-	    {
-	      /* Previous char was a recognized leading byte. */
-	      ch = (ch << 8) | c;
-	      continue;		/* not done with this character. */
-	    }
-	  else if (ch)
-	    {
-	      /* Encountering second byte of a Big5 character. */
-	      UExtbyte b1, b2;
-
-	      ENCODE_BIG5 (ch >> 8, ch & 0xFF, c, b1, b2);
-	      Dynarr_add (dst, b1);
-	      Dynarr_add (dst, b2);
 	    }
-
-	  ch = 0;
 	}
     }

-  str->ch    = ch;
   return orign;
 }

-Ichar
-decode_big5_char (int b1, int b2)
+static Ichar
+decode_big5_char (int b1, int b2, enum converr handling)
 {
+#ifdef UNICODE_INTERNAL
+  return charset_codepoint_to_ichar (Vcharset_chinese_big5, b1, b2,
+				     handling);
+#else /* not UNICODE_INTERNAL */
   if (byte_big5_two_byte_1_p (b1) &&
       byte_big5_two_byte_2_p (b2))
     {
-      int leading_byte;
       Lisp_Object charset;
       int c1, c2;

-      DECODE_BIG5 (b1, b2, leading_byte, c1, c2);
-      charset = charset_by_leading_byte (leading_byte);
-      return make_ichar (charset, c1 & 0x7F, c2 & 0x7F);
+      DECODE_BIG5 (b1, b2, charset, c1, c2);
+      return charset_codepoint_to_ichar (charset, c1, c2, handling);
     }
   else
-    return -1;
+    return old_mule_handle_bad_ichar (handling);
+#endif /* UNICODE_INTERNAL */
 }
+
+#ifndef UNICODE_INTERNAL

-DEFUN ("decode-big5-char", Fdecode_big5_char, 1, 1, 0, /*
+void
+big5_char_to_fake_codepoint (int b1, int b2, Lisp_Object *charset, int *c1,
+			     int *c2)
+{
+  if (byte_big5_two_byte_1_p (b1) &&
+      byte_big5_two_byte_2_p (b2))
+    {
+      DECODE_BIG5 (b1, b2, *charset, *c1, *c2);
+    }
+  else
+    {
+      *charset = Qnil;
+      *c1 = 0;
+      *c2 = 0;
+    }
+}
+
+#endif /* not UNICODE_INTERNAL */
+
+DEFUN ("decode-big5-char", Fdecode_big5_char, 1, 2, 0, /*
 Convert Big Five character codes in CODE into a character.
 CODE is a cons of two integers specifying the codepoints in Big Five.
 Return the corresponding character, or nil if the codepoints are out of range.
＠＠ -549,15 +814,18 ＠＠
 representation of the character in the external Big Five encoding, and thus
 converting them to a character is analogous to any other operation that
 decodes an external representation.
+
+HANDLE-ERROR specifies error handling for illegal characters.
 */
-       (code))
+       (code, handle_error))
 {
   Ichar ch;
+  enum converr handl = decode_handle_error (handle_error);

   CHECK_CONS (code);
   CHECK_INT (XCAR (code));
   CHECK_INT (XCDR (code));
-  ch = decode_big5_char (XINT (XCAR (code)), XINT (XCDR (code)));
+  ch = decode_big5_char (XINT (XCAR (code)), XINT (XCDR (code)), handl);
   if (ch == -1)
     return Qnil;
   else
＠＠ -573,17 +841,25 ＠＠
        (character))
 {
   Lisp_Object charset;
-  int c1, c2, b1, b2;
+  int c1, c2;

   CHECK_CHAR_COERCE_INT (character);
-  BREAKUP_ICHAR (XCHAR (character), charset, c1, c2);
+  ichar_to_charset_codepoint (XCHAR (character), big5_precedence,
+			      &charset, &c1, &c2);
+#ifdef UNICODE_INTERNAL
+  if (EQ (charset, Vcharset_chinese_big5))
+    {
+      return Fcons (make_int (c1), make_int (c2));
+    }
+#else /* not UNICODE_INTERNAL */
   if (EQ (charset, Vcharset_chinese_big5_1) ||
       EQ (charset, Vcharset_chinese_big5_2))
     {
-      ENCODE_BIG5 (XCHARSET_LEADING_BYTE (charset), c1 | 0x80, c2 | 0x80,
-		   b1, b2);
+      int b1, b2;
+      ENCODE_BIG5 (charset, c1, c2, b1, b2);
       return Fcons (make_int (b1), make_int (b2));
     }
+#endif /* UNICODE_INTERNAL */
   else
     return Qnil;
 }
＠＠ -671,6 +947,12 ＠＠
   ISO_ESC_2_4,		/* We've seen ESC $.  This indicates
 			   that we're designating a multi-byte, rather
 			   than a single-byte, character set. */
+  ISO_ESC_2_5,		/* We've seen ESC %.  This indicates the beginning
+			   of an extended segment. */
+  ISO_ESC_2_5_2F,	/* We've seen ESC % 0x2F. */
+  ISO_ESC_2_5_2F_30,	/* We've seen ESC % 0x2F 0x3[01234]. */
+  ISO_ESC_2_5_2F_30_M,	/* We've seen ESC % 0x2F 0x3[01234] size1. */
+  ISO_ESC_2_5_2F_30_M_L,/* We've seen ESC % 0x2F 0x3[01234] size1 size2. */
   ISO_ESC_2_8,		/* We've seen ESC 0x28, i.e. ESC (.
 			   This means designate a 94-character
 			   character set into G0. */
＠＠ -751,12 +1033,17 ＠＠
 /* If set, we're currently processing a composite character (i.e. a
    character constructed by overstriking two or more characters). */
 #define ISO_STATE_COMPOSITE	(1 << 5)
+/* If set, we're processing XFree86-style UTF-8. */
+#define ISO_STATE_XFREE86_UTF8	(1 << 6)
+/* If set, we're processing an X extended segment. */
+#define ISO_STATE_X_EXTENDED	(1 << 7)

 /* ISO_STATE_LOCK is the mask of flags that remain on until explicitly
    turned off when in the ISO2022 encoder/decoder.  Other flags are turned
    off at the end of processing each character or escape sequence. */
 # define ISO_STATE_LOCK \
-  (ISO_STATE_COMPOSITE | ISO_STATE_R2L)
+  (ISO_STATE_COMPOSITE | ISO_STATE_R2L | ISO_STATE_XFREE86_UTF8 | \
+   ISO_STATE_X_EXTENDED)

 typedef struct charset_conversion_spec
 {
＠＠ -791,6 +1078,7 ＠＠
   unsigned int lock_shift	:1;
   unsigned int no_iso6429	:1;
   unsigned int escape_quoted	:1;
+  unsigned int iso2022_preserve :1;
 };

 #define CODING_SYSTEM_ISO2022_INITIAL_CHARSET(codesys, g) \
＠＠ -811,6 +1099,8 ＠＠
   (CODING_SYSTEM_TYPE_DATA (codesys, iso2022)->no_iso6429)
 #define CODING_SYSTEM_ISO2022_ESCAPE_QUOTED(codesys) \
   (CODING_SYSTEM_TYPE_DATA (codesys, iso2022)->escape_quoted)
+#define CODING_SYSTEM_ISO2022_ISO2022_PRESERVE(codesys) \
+  (CODING_SYSTEM_TYPE_DATA (codesys, iso2022)->iso2022_preserve)
 #define CODING_SYSTEM_ISO2022_INPUT_CONV(codesys) \
   (CODING_SYSTEM_TYPE_DATA (codesys, iso2022)->input_conv)
 #define CODING_SYSTEM_ISO2022_OUTPUT_CONV(codesys) \
＠＠ -834,11 +1124,14 ＠＠
   CODING_SYSTEM_ISO2022_NO_ISO6429 (XCODING_SYSTEM (codesys))
 #define XCODING_SYSTEM_ISO2022_ESCAPE_QUOTED(codesys) \
   CODING_SYSTEM_ISO2022_ESCAPE_QUOTED (XCODING_SYSTEM (codesys))
+#define XCODING_SYSTEM_ISO2022_ISO2022_PRESERVE(codesys) \
+  CODING_SYSTEM_ISO2022_ISO2022_PRESERVE (XCODING_SYSTEM (codesys))
 #define XCODING_SYSTEM_ISO2022_INPUT_CONV(codesys) \
   CODING_SYSTEM_ISO2022_INPUT_CONV (XCODING_SYSTEM (codesys))
 #define XCODING_SYSTEM_ISO2022_OUTPUT_CONV(codesys) \
   CODING_SYSTEM_ISO2022_OUTPUT_CONV (XCODING_SYSTEM (codesys))

+/* ~~#### Must be converted to Lisp object */
 /* Additional information used by the ISO2022 decoder and detector. */
 struct iso2022_coding_stream
 {
＠＠ -874,54 +1167,21 ＠＠
   unsigned_char_dynarr *composite_chars;
 #endif

-  /* If we saw an invalid designation sequence for a particular
-     register, we flag it here and switch to ASCII.  The next time we
-     see a valid designation for this register, we turn off the flag
-     and do the designation normally, but pretend the sequence was
-     invalid.  The effect of all this is that (most of the time) the
-     escape sequences for both the switch to the unknown charset, and
-     the switch back to the known charset, get inserted literally into
-     the buffer and saved out as such.  The hope is that we can
-     preserve the escape sequences so that the resulting written out
-     file makes sense.  If we don't do any of this, the designation
-     to the invalid charset will be preserved but that switch back
-     to the known charset will probably get eaten because it was
-     the same charset that was already present in the register. */
-  unsigned char invalid_designated[4];
-
-  /* We try to do similar things as above for direction-switching
-     sequences.  If we encountered a direction switch while an
-     invalid designation was present, or an invalid designation
-     just after a direction switch (i.e. no valid designation
-     encountered yet), we insert the direction-switch escape
-     sequence literally into the output stream, and later on
-     insert the corresponding direction-restoring escape sequence
-     literally also. */
-  unsigned int switched_dir_and_no_valid_charset_yet :1;
-  unsigned int invalid_switch_dir :1;
-
-  /* Tells the decoder to output the escape sequence literally
-     even though it was valid.  Used in the games we play to
-     avoid lossage when we encounter invalid designations. */
-  unsigned int output_literally :1;
-  /* We encountered a direction switch followed by an invalid
-     designation.  We didn't output the direction switch
-     literally because we didn't know about the invalid designation;
-     but we have to do so now. */
-  unsigned int output_direction_sequence :1;
-
   /**************** for encoding ****************/

   /* Whether we need to explicitly designate the charset in the
      G? register before using it.  It is initialized from the
      array FORCE_CHARSET_ON_OUTPUT in CODESYS. */
   unsigned char force_charset_on_output[4];
+
+  /* List of invalid chars that we attempted to output */
+  Lisp_Object warned_chars;

-  /* Other state variables that need to be preserved across
-     invocations. */
-  Lisp_Object current_charset;
-  int current_half;
-  int current_char_boundary;
+  /* Unicode precedence used for this conversion.  We put in this list only
+     charsets that can be encoded using ISO2022 and preferring currently
+     designated charsets.  ＠＠#### We should be still smarter, making use of
+     the language of the buffer we're coming from. */
+  Lisp_Object_dynarr *unicode_precedence;
 };

 static const struct memory_description ccs_description_1[] =
＠＠ -961,6 +1221,230 ＠＠

 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (iso2022);

+/* NOTE NOTE NOTE:
+
+   A full description of ISO-2022 is available through ECMA, where it is
+   known as ECMA-35.  See
+
+   http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-035.pdf
+
+   In general, see
+
+   http://www.ecma-international.org/publications/standards/Standard.htm
+
+   ISO makes you pay for their bloody standards, but ECMA provides the same
+   ones for free.
+
+   See also
+
+   http://www.iana.org/assignments/character-sets
+
+   for a listing of lots of character sets and alternative names.  But this
+   doesn't seem to list everything; the following does a better job in some
+   cases:
+
+   http://www.dataparksearch.org/dpsearch-international.en.html
+
+   The registry of character sets and their associated final characters is
+   in
+
+   http://www.itscj.ipsj.or.jp/ISO-IR/
+
+   Although this is extremely confusing and difficult to figure out.
+
+   Here's some more info I dug up somewhere; this is partial:
+
+   Values of ID for 94-character sets include: 
+   
+   B US-ASCII (= left half of ISO 8859 sets) 
+   I right half of JIS X0201-1976 (katakana) 
+   J left half of JIS X0201-1976 (JIS-Roman) 
+   
+   Values of ID for multi-byte 94-character sets are: 
+   
+   ＠ JIS C 6226-1978 
+   A GB 2312-1980 
+   B JIS X0208-1990 
+   C KSC 5601-1987 
+   D JIS X0212-1990 
+   E GB 2312-1980 plus GB 8565-1989 
+   G CNS 11643-1986 level 1 
+   H CNS 11643-1986 level 2 
+   I CNS 11643-1992 plane 3 
+   J CNS 11643-1992 plane 4 
+   K CNS 11643-1992 plane 5 
+   L CNS 11643-1992 plane 6 
+   M CNS 11643-1992 plane 7 
+   
+   Values of ID for 96-character sets include: 
+   
+   A right half of ISO 8859-1:1987 (ISO Latin-1) 
+   B right half of ISO 8859-2:1987 (ISO Latin-2) 
+   C right half of ISO 8859-3:1988 (ISO Latin-3) 
+   D right half of ISO 8859-4:1988 (ISO Latin-4) 
+   F right half of ISO 8859-7:1987 (ISO Latin-Greek) 
+   G right half of ISO 8859-6:1988 (ISO Latin-Arabic) 
+   H right half of ISO 8859-8:1988 (ISO Latin-Hebrew) 
+   L right half of ISO 8859-5:1989 (ISO Latin-Cyrillic) 
+   M right half of ISO 8859-9:1989 (ISO Latin-5) 
+   T Thai character set TIS 620-2533:1990 
+   V right half of ISO 8859-10:1992 (ISO Latin-6) 
+
+
+   Here is something from an old 1993 email, which tries to give a complete
+   listing of character sets and final bytes, but it's old:
+
+ ------------------------------------------------------------------
+ 
+ Character set types
+   ( ): 94-char set w/ 3-char sequence
+   (!): 94-char set w/ 4-char sequence
+   (-): 96-char set
+   ($): multiple-byte set
+   (%): non-2022 system with standard return
+   (/): non-2022 system without standard return
+ 
+ ISO#      Sponsor Source(s)                     GL  Esc Description
+       Size                                          HEX
+                                       type --------*
+    2    94 ISO   646:1983                           40  646-old
+    4    94 GB    BS 4730                            41  646-GB
+    6    94 US    ANSI X3.4:1968                     42  646-IRV (ASCII)
+    8-1  94 SE    NATS                               43  news SE-FI
+    8-2  14 SE    NATS                               44  news SE-FI extra
+    9-1  94 SE    NATS                               45  news DK-NO
+    9-2  14 SE    NATS                               46  news DK-NO extra
+   10    94 SE    SEN 850200 B                       47  646-SE
+   11    94 SE    SEN 850200 C                       48  646-SE names
+   13    63 JP    JIS C 6220:1969, JIS X 0201    14  49  Japanese Katakana
+   14    94 JP    JIS C 6220:1969, JIS X 0201        4A  646-JP
+   15    94 ECMA  Olivetti                           59  646-IT
+   16    94 ECMA  Olivetti                           4C  646-PT
+   17    94 ECMA  Olivetti                           5A  646-ES
+   18    94 ECMA  Olivetti                           5B  Greek
+   19    94 ECMA  Olivetti                           5C  Latin+Greek
+   21    94 DE    DIN 66003                          4B  646-DE
+   27    94 ECMA  Honeywell-Bull                     55  Latin+Greek
+   31    82 ISO   5428                               58  Greek biblio.
+   37    94 ISO   5427                               4E  Cyrillic
+   38    77 DE    DIN 31624                          4F  biblio.
+   39    59 DE    DIN 31625, ISO 6438                4D  African
+   42  6802 JP    JIS C 6226:1978                   $40  Japanese Kanji
+   47    94 GB    BBC                                56  viewdata
+   49    81 IAEA  INIS                               57  646 subset
+   50    38 IAEA  INIS                           49  5D  symbols
+   51    82 IAEA  INIS                           49  5E  Cyrillic
+   53    76 ISO   5426:1980                       2  50  biblio.
+   54    42 ISO   5427:1981                      37  51  Cyrillic
+   55    73 ISO   5428:1980                      31  53  Greek biblio.
+   57    94 CN    GB 1988-80                         54  646-CN
+   58  7445 CN    GB 2312-80                        $41  Chinese Hanzi
+   59    94 MA    CODAR-U                            5F  Arabic (Morocco)
+   60    94 NO    NS 4551-1                          60  646-NO
+   68    94 CA    APL WG                             65  APL Canadian
+   69    94 FR    NF Z 62-010:1982                   66  646-FR
+   70    85 CCITT                                    62  videotex suppl.
+   71    94 CCITT                                    63  videotex mosaic G1
+   84    94 ECMA  IBM                                67  646-PT
+   85    94 ECMA  IBM                                68  646-ES
+   86    94 HU    MSZ 7795/3                         69  646-HU
+   87  6877 JP    JIS C 6226:1983 = JIS X 0208      $42  Japanese Kanji
+   89    86 ASMO  ASMO 449, ISO 9036                 6B  Arabic-7
+   90    83 ISO   6937/2                          2  6C  extra
+   91    59 JP    JIS C 6229:1984                    6D  OCR a
+   92    92 JP    JIS C 6229:1984                    6E  OCR b
+   93     4 JP    JIS C 6229:1984                92  6F  OCR b extra
+   94    64 JP    JIS C 6229:1984                    70  OCR hand
+   95     1 JP    JIS C 6229:1984                94  71  OCR hand extra
+   96    51 JP    JIS C 6229:1984                    72  OCR Katakana
+   98    14 JP    ISO 2033:1983                      73  E13B magnetic
+  100    96 ISO   8859/1:1987, ECMA-94            6 -41  Latin-1
+  101    96 ISO   8859/2:1987, ECMA-94            6 -42  Latin-2
+  102    88 CCITT T.61                               75  teletex
+  103    67 CCITT T.61                          102  76  teletex extra
+  108   605 CA    CSA T 500:1983                    %41  NAPLPS
+  109    96 ISO   8859/3:1988, ECMA-94            6 -43  Latin-3
+  110    96 ISO   8859/4:1988, ECMA-94            6 -44  Latin-4
+  111    96 ECMA  ECMA-113:1986                   6 -40  Cyrillic
+  121    94 CA    CSA Z243.4:1985                    77  Canadian 1
+  122    94 CA    CSA Z243.4:1985                    78  Canadian 2
+  123    96 CA    CSA Z243.4:1985                 ? -45  extra
+  125   256 ISO   9040 + 9041                       /42  virt. term.
+  126    90 ISO   8859/7:1987, ECMA-118           6 -46  Latin/Greek
+  127    51 ISO   8859/6:1987, ASMO 708           6 -47  Latin/Arabic
+  128    94 CCITT T.101                              7C  DS 3 - G2
+  129    65 CCITT T.101                           ? -7D  DS 3 - G3
+  131       CCITT T.101                             %43  Data syntax I
+  137    59 CCITT T.101                              79  DS 1 - mosaic
+  138    58 ISO   8859/8:1988, ECMA-121           6 -48  Latin/Hebrew
+  139    96 CS    CSN 369103                      2 -49  Czech, Slovak
+  141    94 YU    JUS I.B1.002                       7A  Croatian, Slovenian
+  142    87 GB    ISO 6937/2 + addendum           2 -4A  extra
+  143    96 NL    IEC P27-1                       6 -4B  symbols
+  144    96 ISO   8859/5:1988, ECMA-113           6 -4C  Latin/Cyrillic
+  145    96 CCITT T.101                             %44  Data syntax II
+  146    94 YU    JUS I.B1.003                       7B  Serbian
+  147    94 YU    JUS I.B1.004                       7D  Macedonian
+  148    96 ISO   8859/9:1989, ECMA-128           6 -4D  Latin-5
+  149  8224 KR    KSC 5601:1987                     $43  Korean Hanja
+  150    94 CCITT                                   !40  Greek
+  151    94 CS    NC NC99-10:81                     !41  646-CU
+  152    25 CEN   ISO 6937/2:1983                 6 -4E  extra
+  153    68 SU    GOST 19768:1987                 ? -4F  Cyrillic
+  154    90 ECMA                                  6 -50  Latin 1-5 G3
+  155    29 ISO   10367 (not final)               6 -51  box drawing
+  156    87 ISO   6937:1992 (not final)             -52  suppl 6937
+  157    96 SE    ISO 4873, ISO 8859/10           6 -56  Latin-6
+  158    42 SE    ISO 4873                        6 -58  Latin/Lapp supp
+  159  6067 JP    JIS X 0212:1990                   $44  supp. to 0208
+  160       CCITT T.101                             %45  Videotex syntax
+  161       CCITT T.101                             %46  Audio syntax
+  162       ISO   10646 (not finished)              /40  10646 2-oct lev 1
+  163       ISO   10646 (not finished)              /41  10646 4-oct lev 1
+  164    27 CCITT                                 ? -53  Hebrew
+  165  8443 CCITT                                   $45  Chinese comb.
+  166    88 TH    TIS 620-2533:1990               6 -54  Thai
+  167    82 ECMA                                  6 -55  Arab, Fr, Ger
+  168  6879 JP    JIS X 0208:1990 (repl #87)        $42  new Kanji set
+  169  2304 CA    Blissymbolics Comm Int'l          $46  Blissymbol
+  170    82 DK    ISO 646:1992                      !42  invariant 646
+  171  6085 ECMA  CNS 11643:1986                    $47  Chinese set 1
+  172  7650 ECMA  CNS 11643:1986                    $48  Chinese set 2
+  173    92 CCITT (repl #72)                         64  videotex mosaic G3
+  174       ISO   10646 (not finished)              /43  10646 2-oct lev 2
+  175       ISO   10646 (not finished)              /44  10646 4-oct lev 2
+  176       ISO   10646 (not finished)              /45  10646 2-oct lev 3
+  177       ISO   10646 (not finished)              /46  10646 4-oct lev 3
+  178       ISO   10646 (not finished)              %42 10646 UTF-1
+  179 --------- still pending ------------------------------------------
+
+
+*/
+
+/* ＠＠#### NOTE: We should implement extended segments in compound text.
+
+   XFree86 uses ESC 0x25 0x47 to switch into UTF8 mode, and ESC 0x25 0x40 to
+   switch out.  X also defines a more general mechanism for switching to
+   extended segments:
+
+   ESC 0x25 0x2F 0x30 M L name-of-encoding 0x02 ... [variable-length]
+   ESC 0x25 0x2F 0x31 M L name-of-encoding 0x02 ... [1 byte per char]
+   ESC 0x25 0x2F 0x32 M L name-of-encoding 0x02 ... [2 bytes per char]
+   ESC 0x25 0x2F 0x33 M L name-of-encoding 0x02 ... [3 bytes per char]
+   ESC 0x25 0x2F 0x34 M L name-of-encoding 0x02 ... [4 bytes per char]
+
+   where the name is encoded in ISO 8859-1 and M and L indicate the length
+   in bytes of the extended segment, including the name and terminating
+   0x02, and the length is computed as ((M - 128) * 128) + (L - 128).
+
+   According to emacs-unicode, (at least) the following names exist:
+
+  '(("big5-0" big5 2 (chinese-big5-1 chinese-big5-2))
+    ("ISO8859-14" iso-8859-14 1 latin-iso8859-14)
+    ("ISO8859-15" iso-8859-15 1 latin-iso8859-15))
+
+*/
+
 /* The following note taken directly from FSF 21.0.103. */

 /* The following note describes the coding system ISO2022 briefly.
＠＠ -1136,7 +1620,9 ＠＠
   COMPOSITION_WITH_ALTCHARS:
 	ESC 3 ALTCHAR [ ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
   COMPOSITION_WITH_RULE_ALTCHARS:
-	ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1 */
+	ESC 4 ALTCHAR [ RULE ALTCHAR ] ESC 0 CHAR [ CHAR ] ESC 1
+
+ */

 static void
 reset_iso2022_decode (Lisp_Object coding_system,
＠＠ -1166,6 +1652,60 ＠＠
       Dynarr_reset (data->composite_chars);
     }
 #endif
+  data->warned_chars = Qnil;
+}
+
+/* ＠＠#### This is inefficient as it is O(N^2).  Perhaps doesn't matter
+   as number of charsets will not be that big. */
+
+static void
+add_to_dynarr_if_necessary (Lisp_Object_dynarr *dyn, Lisp_Object charset)
+{
+  int i;
+  for (i = 0; i < Dynarr_length (dyn); i++)
+    if (EQ (charset, Dynarr_at (dyn, i)))
+      return;
+  Dynarr_add (dyn, charset);
+}
+
+/* Called for each pair of (symbol, charset) in the hash table tracking
+   charsets.  Handle ISO2022-compatible charsets.  We don't add
+   non-ISO2022-compatible charsets because there's no point. */
+static int
+riup_mapper (Lisp_Object UNUSED (key), Lisp_Object value,
+	     void *closure)
+{
+  if (get_charset_iso2022_type (value) != -1)
+    add_to_dynarr_if_necessary ((Lisp_Object_dynarr *) closure, value);
+  return 0;
+}
+
+/* Recreate the Unicode precedence array.  We want the following:
+
+   (1) Charsets currently designated should be at the top of the list.
+   (2) Then ASCII and Control-1, if not already there. (Hack)
+   (3) Then any remaining ISO2022-compatible charsets */
+
+static void
+reset_iso2022_unicode_precedence (struct iso2022_coding_stream *data)
+{
+  int i;
+
+  if (data->unicode_precedence)
+    Dynarr_reset (data->unicode_precedence);
+  else
+    data->unicode_precedence = Dynarr_new (Lisp_Object);
+  for (i = 0; i < 4; i++)
+    {
+      if (CHARSETP (data->charset[i]))
+	add_to_dynarr_if_necessary (data->unicode_precedence,
+				    data->charset[i]);
+    }
+  /* Also ASCII and Control-1 */
+  add_to_dynarr_if_necessary (data->unicode_precedence, Vcharset_ascii);
+  add_to_dynarr_if_necessary (data->unicode_precedence, Vcharset_control_1);
+  /* Add ISO2022-compatible charsets to unicode_precedence */
+  elisp_maphash (riup_mapper, Vcharset_hash_table, data->unicode_precedence);
 }

 static void
＠＠ -1184,8 +1724,9 ＠＠
 	XCODING_SYSTEM_ISO2022_FORCE_CHARSET_ON_OUTPUT (coding_system, i);
     }
   data->register_right = 1;
-  data->current_charset = Qnil;
-  data->current_char_boundary = 1;
+  data->warned_chars = Qnil;
+
+  reset_iso2022_unicode_precedence (data);
 }

 static void
＠＠ -1200,13 +1741,25 ＠＠
 }

 static void
+iso2022_mark_coding_stream (struct coding_stream *str)
+{
+  int i;
+  struct iso2022_coding_stream *data = CODING_STREAM_TYPE_DATA (str, iso2022);
+  mark_object (data->warned_chars);
+  for (i = 0; i < 4; i++)
+    mark_object (data->charset[i]);
+  if (data->unicode_precedence)
+    mark_Lisp_Object_dynarr (data->unicode_precedence);
+}
+
+static void
 iso2022_rewind_coding_stream (struct coding_stream *str)
 {
   iso2022_init_coding_stream (str);
 }

 static int
-fit_to_be_escape_quoted (unsigned char c)
+fit_to_be_escape_quoted (int c)
 {
   switch (c)
     {
＠＠ -1304,9 +1857,6 ＠＠
        escape-state variables. */
     iso->esc = ISO_ESC_NOTHING;

-  iso->output_literally = 0;
-  iso->output_direction_sequence = 0;
-
   switch (iso->esc)
     {
     case ISO_ESC_NOTHING:
＠＠ -1405,6 +1955,11 ＠＠
 	  iso->esc = ISO_ESC_5_11;
 	  goto not_done;

+	  /**** extended segments ****/
+	case '%':
+	  iso->esc = ISO_ESC_2_5;
+	  goto not_done;
+
 	  /**** designation ****/

 	case '$':	/* multibyte charset prefix */
＠＠ -1475,40 +2030,16 ＠＠

     directionality:
       iso->esc = ISO_ESC_DIRECTIONALITY;
-      /* Various junk here to attempt to preserve the direction sequences
-	 literally in the text if they would otherwise be swallowed due
-	 to invalid designations that don't show up as actual charset
-	 changes in the text. */
-      if (iso->invalid_switch_dir)
-	{
-	  /* We already inserted a direction switch literally into the
-	     text.  We assume (#### this may not be right) that the
-	     next direction switch is the one going the other way,
-	     and we need to output that literally as well. */
-	  iso->output_literally = 1;
-	  iso->invalid_switch_dir = 0;
-	}
-      else
-	{
-	  int jj;
+      return 1;

-	  /* If we are in the thrall of an invalid designation,
-	   then stick the directionality sequence literally into the
-	   output stream so it ends up in the original text again. */
-	  for (jj = 0; jj < 4; jj++)
-	    if (iso->invalid_designated[jj])
-	      break;
-	  if (jj < 4)
-	    {
-	      iso->output_literally = 1;
-	      iso->invalid_switch_dir = 1;
-	    }
-	  else
-	    /* Indicate that we haven't yet seen a valid designation,
-	       so that if a switch-dir is directly followed by an
-	       invalid designation, both get inserted literally. */
-	    iso->switched_dir_and_no_valid_charset_yet = 1;
-	}
+
+      /**** extended segments ****/
+
+    case ISO_ESC_2_5:
+      /* if (c == 0x47) */
+      /* ＠＠#### implement me */
+      iso->esc = ISO_ESC_NOTHING;
+      *flags &= ISO_STATE_LOCK;
       return 1;

＠＠ -1601,28 +2132,9 ＠＠
  designated:
   if (NILP (cs) && check_invalid_charsets)
     {
+      /* This should never happen, since we automatically create temporary
+	 charsets as necessary. --ben */
       ABORT ();
-      /* #### This should never happen now that we automatically create
-	 temporary charsets as necessary.  We should probably remove
-	 this code. --ben */
-      iso->invalid_designated[reg] = 1;
-      iso->charset[reg] = Vcharset_ascii;
-      iso->esc = ISO_ESC_DESIGNATE;
-      *flags &= ISO_STATE_LOCK;
-      iso->output_literally = 1;
-      if (iso->switched_dir_and_no_valid_charset_yet)
-	{
-	  /* We encountered a switch-direction followed by an
-	     invalid designation.  Ensure that the switch-direction
-	     gets outputted; otherwise it will probably get eaten
-	     when the text is written out again. */
-	  iso->switched_dir_and_no_valid_charset_yet = 0;
-	  iso->output_direction_sequence = 1;
-	  /* And make sure that the switch-dir going the other
-	     way gets outputted, as well. */
-	  iso->invalid_switch_dir = 1;
-	}
-      return 1;
     }
   /* This function is called with CODESYS equal to nil when
      doing coding-system detection. */
＠＠ -1647,13 +2159,6 ＠＠
   iso->charset[reg] = cs;
   iso->esc = ISO_ESC_DESIGNATE;
   *flags &= ISO_STATE_LOCK;
-  if (iso->invalid_designated[reg])
-    {
-      iso->invalid_designated[reg] = 0;
-      iso->output_literally = 1;
-    }
-  if (iso->switched_dir_and_no_valid_charset_yet)
-    iso->switched_dir_and_no_valid_charset_yet = 0;
   return 1;
 }

＠＠ -1728,7 +2233,7 ＠＠
 iso2022_decode (struct coding_stream *str, const UExtbyte *src,
 		unsigned_char_dynarr *dst, Bytecount n)
 {
-  unsigned int ch     = str->ch;
+  int ch     = str->ch;
 #ifdef ENABLE_COMPOSITE_CHARS
   unsigned_char_dynarr *real_dst = dst;
 #endif
＠＠ -1764,24 +2269,19 ＠＠
 		  break;
 		case ISO_ESC_END_COMPOSITE:
 		  {
-		    Ibyte comstr[MAX_ICHAR_LEN];
-		    Bytecount len;
 		    Ichar emch = lookup_composite_char (Dynarr_atp (dst, 0),
 							 Dynarr_length (dst));
 		    dst = real_dst;
-		    len = set_itext_ichar (comstr, emch);
-		    Dynarr_add_many (dst, comstr, len);
+		    Dynarr_add_ichar (dst, emch);
 		    break;
 		  }
 #else
 		case ISO_ESC_START_COMPOSITE:
 		  {
-		    Ibyte comstr[MAX_ICHAR_LEN];
-		    Bytecount len;
-		    Ichar emch = make_ichar (Vcharset_composite, c - '0' + ' ',
-					     0);
-		    len = set_itext_ichar (comstr, emch);
-		    Dynarr_add_many (dst, comstr, len);
+		    /* !!#### Handle error? */
+		    charset_codepoint_to_dynarr
+		      (Vcharset_composite, 0, c - '0' + ' ',
+		       dst, CONVERR_SUCCEED);
 		    break;
 		  }
 #endif /* ENABLE_COMPOSITE_CHARS */
＠＠ -1795,32 +2295,17 ＠＠
 		  break;
 		}
 	    }
-
-	  /* Attempted error recovery. */
-	  if (data->output_direction_sequence)
-	    ensure_correct_direction (flags & ISO_STATE_R2L ?
-				      CHARSET_RIGHT_TO_LEFT :
-				      CHARSET_LEFT_TO_RIGHT,
-				      str->codesys, dst, 0, 1);
-	  /* More error recovery. */
-	  if (!retval || data->output_literally)
+	  else
 	    {
+	      /* Error recovery. */
 	      /* Output the (possibly invalid) sequence */
 	      int i;
 	      for (i = 0; i < data->esc_bytes_index; i++)
 		DECODE_ADD_BINARY_CHAR (data->esc_bytes[i], dst);
 	      flags &= ISO_STATE_LOCK;
-	      if (!retval)
-		n++, src--;/* Repeat the loop with the same character. */
-	      else
-		{
-		  /* No sense in reprocessing the final byte of the
-		     escape sequence; it could mess things up anyway.
-		     Just add it now. */
-		  DECODE_ADD_BINARY_CHAR (c, dst);
-		}
+	      n++, src--;/* Repeat the loop with the same character. */
 	    }
-	  ch = 0;
+	  ch = -1;
 	}
       else if (byte_c0_p (c) || byte_c1_p (c))
 	{ /* Control characters */
＠＠ -1829,7 +2314,9 ＠＠

 	  /* If we were in the middle of a character, dump out the
 	     partial character. */
-	  DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+	  if (ch >= 0)
+	    DECODE_ADD_BINARY_CHAR (ch, dst);
+	  ch = -1;

 	  /* If we just saw a single-shift character, dump it out.
 	     This may dump out the wrong sort of single-shift character,
＠＠ -1856,7 +2343,6 ＠＠
       else
 	{			/* Graphic characters */
 	  Lisp_Object charset;
-	  int lb;
 	  int reg;

 	  /* Now determine the charset. */
＠＠ -1868,18 +2354,19 ＠＠

 	  /* Error checking: */
 	  if (! CHARSETP (charset)
-	      || data->invalid_designated[reg]
 	      || (((c & 0x7F) == ' ' || (c & 0x7F) == ISO_CODE_DEL)
-		  && XCHARSET_CHARS (charset) == 94))
+		  && (XCHARSET_CHARS (charset, 0) == 94 ||
+                      XCHARSET_CHARS (charset, 1) == 94)))
 	    /* Mrmph.  We are trying to invoke a register that has no
 	       or an invalid charset in it, or trying to add a character
 	       outside the range of the charset.  Insert that char literally
 	       to preserve it for the output. */
 	    {
-	      DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+	      if (ch >= 0)
+		DECODE_ADD_BINARY_CHAR (ch, dst);
+	      ch = -1;
 	      DECODE_ADD_BINARY_CHAR (c, dst);
 	    }
-
 	  else
 	    {
 	      /* Things are probably hunky-dorey. */
＠＠ -1897,67 +2384,56 ＠＠
 		    charset = new_charset;
 		}

-	      lb = XCHARSET_LEADING_BYTE (charset);
-	      switch (XCHARSET_REP_BYTES (charset))
+	      if (XCHARSET_DIMENSION (charset) == 2 && ch < 0)
+		ch = c;
+	      else
 		{
-		case 1:	/* ASCII */
-		  DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
-		  Dynarr_add (dst, c & 0x7F);
-		  break;
+		  int c1, c2;
+		  c1 = XCHARSET_DIMENSION (charset) == 2 ? ch & 0x7F : 0;
+		  c2 = c & 0x7F;
+
+		  if (XCHARSET_OFFSET (charset, 0) >= 128)
+		    c1 += 128;
+		  if (XCHARSET_OFFSET (charset, 1) >= 128)
+		    c2 += 128;

-		case 2:	/* one-byte official */
-		  DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
-		  Dynarr_add (dst, lb);
-		  Dynarr_add (dst, c | 0x80);
-		  break;
-
-		case 3:	/* one-byte private or two-byte official */
-		  if (XCHARSET_PRIVATE_P (charset))
-		    {
-		      DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
-		      Dynarr_add (dst, PRE_LEADING_BYTE_PRIVATE_1);
-		      Dynarr_add (dst, lb);
-		      Dynarr_add (dst, c | 0x80);
-		    }
-		  else
+#ifdef UNICODE_INTERNAL
+                  if (XCODING_SYSTEM_ISO2022_ISO2022_PRESERVE (str->codesys))
 		    {
-		      if (ch)
+		      if (EQ (charset, Vcharset_ascii))
+			Dynarr_add (dst, (unsigned char) c2);
+		      else
 			{
-			  Dynarr_add (dst, lb);
-			  Dynarr_add (dst, ch | 0x80);
-			  Dynarr_add (dst, c | 0x80);
-			  ch = 0;
+			  int priv =
+			    charset_codepoint_to_private_unicode (charset, c1,
+								  c2);
+			  Dynarr_add_ichar (dst, (Ichar) priv);
 			}
-		      else
-			ch = c;
 		    }
-		  break;
-
-		default:	/* two-byte private */
-		  if (ch)
+		  else
+#endif /* UNICODE_INTERNAL */
 		    {
-		      Dynarr_add (dst, PRE_LEADING_BYTE_PRIVATE_2);
-		      Dynarr_add (dst, lb);
-		      Dynarr_add (dst, ch | 0x80);
-		      Dynarr_add (dst, c | 0x80);
-		      ch = 0;
+		      /* !!#### Handle error differenly? Especially here! This
+			 is the main place where we convert an ISO-2022-encoded
+			 char in a national charset to Unicode. */
+		      charset_codepoint_to_dynarr
+			(charset, c1, c2, dst, CONVERR_SUCCEED);
 		    }
-		  else
-		    ch = c;
+		  ch = -1;
 		}
 	    }

-	  if (!ch)
+	  if (ch < 0)
 	    flags &= ISO_STATE_LOCK;
 	}

     }

-  if (str->eof)
-    DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
-
   data->flags = flags;
   str->ch    = ch;
+
+  DECODE_OUTPUT_PARTIAL_CHAR (str, dst);
+
   return orign;
 }

＠＠ -1972,21 +2448,26 ＠＠
 {
   static const char inter94[] = "()*+";
   static const char inter96[] = ",-./";
-  int type;
   unsigned char final;
   struct iso2022_coding_stream *data =
     CODING_STREAM_TYPE_DATA (str, iso2022);
   Lisp_Object old_charset = data->charset[reg];
+  int type;

   data->charset[reg] = charset;
+  if (!EQ (old_charset, charset))
+    /* If we are changing the set of designated charsets, recalculate
+       the Unicode precedence used to convert characters to ISO2022. */
+    reset_iso2022_unicode_precedence (data);
   if (!CHARSETP (charset))
     /* charset might be an initial nil or t. */
     return;
-  type = XCHARSET_TYPE (charset);
+  type = get_charset_iso2022_type (charset);
+  text_checking_assert (type >= 0);
   final = XCHARSET_FINAL (charset);
   if (!data->force_charset_on_output[reg] &&
       CHARSETP (old_charset) &&
-      XCHARSET_TYPE (old_charset) == type &&
+      get_charset_iso2022_type (old_charset) == type &&
       XCHARSET_FINAL (old_charset) == final)
     return;

＠＠ -2066,17 +2547,10 ＠＠
 iso2022_encode (struct coding_stream *str, const Ibyte *src,
 		unsigned_char_dynarr *dst, Bytecount n)
 {
-  unsigned char charmask;
-  Ibyte c;
-  unsigned char char_boundary;
-  unsigned int ch             = str->ch;
-  Lisp_Object codesys         = str->codesys;
+  Lisp_Object codesys = str->codesys;
   int i;
-  Lisp_Object charset;
-  int half;
-  struct iso2022_coding_stream *data =
-    CODING_STREAM_TYPE_DATA (str, iso2022);
-  unsigned int flags          = data->flags;
+  struct iso2022_coding_stream *data = CODING_STREAM_TYPE_DATA (str, iso2022);
+  unsigned int flags = data->flags;
   Bytecount orign = n;

 #ifdef ENABLE_COMPOSITE_CHARS
＠＠ -2087,21 +2561,15 ＠＠
   int in_composite = 0;
 #endif /* ENABLE_COMPOSITE_CHARS */

-  char_boundary = data->current_char_boundary;
-  charset = data->current_charset;
-  half = data->current_half;
-
 #ifdef ENABLE_COMPOSITE_CHARS
  back_to_square_n:
 #endif
   while (n--)
     {
-      c = *src++;
+      Ibyte c = *src++;

       if (byte_ascii_p (c))
 	{		/* Processing ASCII character */
-	  ch = 0;
-
 	  restore_left_to_right_direction (codesys, dst, &flags, 0);

 	  /* Make sure G0 contains ASCII */
＠＠ -2143,199 +2611,196 ＠＠
 	      && fit_to_be_escape_quoted (c))
 	    Dynarr_add (dst, ISO_CODE_ESC);
 	  Dynarr_add (dst, c);
-	  char_boundary = 1;
 	}
-
-      else if (ibyte_leading_byte_p (c) || ibyte_leading_byte_p (ch))
-	{ /* Processing Leading Byte */
-	  ch = 0;
-	  charset = charset_by_leading_byte (c);
-	  if (leading_byte_prefix_p (c))
-	    ch = c;
-	  else if (!EQ (charset, Vcharset_control_1)
-		   && !EQ (charset, Vcharset_composite))
-	    {
-	      int reg;
-
-	      ensure_correct_direction (XCHARSET_DIRECTION (charset),
-					codesys, dst, &flags, 0);
-
-	      /* Now determine which register to use. */
-	      reg = -1;
-	      for (i = 0; i < 4; i++)
+      else
+	{
+	  COPY_PARTIAL_CHAR_BYTE (c, str);
+	  if (!str->pind_remaining)
+	    {
+	      /* We have a full char. */
+	      Lisp_Object charset;
+	      int c1, c2;
+	      int half = 0;
+
+	      /* Convert character to a charset codepoint. */
+	      non_ascii_itext_to_charset_codepoint_raw
+		(str->partial, data->unicode_precedence, &charset, &c1, &c2);
+	      /* Now, find the register containing this charset.
+		 If none, put this charset in an appropriate register and
+		 output an appropriate escape sequence to designate that
+		 the charset is in the register. */
+	      if (!EQ (charset, Vcharset_control_1)
+		  && !EQ (charset, Vcharset_composite))
 		{
-		  if (EQ (charset, data->charset[i]) ||
-		      EQ (charset,
-			  XCODING_SYSTEM_ISO2022_INITIAL_CHARSET (codesys, i)))
-		    {
-		      reg = i;
-		      break;
-		    }
-		}
+		  int reg;

-	      if (reg == -1)
-		{
-		  if (XCHARSET_GRAPHIC (charset) != 0)
+		  /* We should have only ISO2022-compatible charsets in
+		     str->unicode_precedence, except control-1. */
+		  text_checking_assert
+		    (NILP (charset) ||
+		     get_charset_iso2022_type (charset) != -1);
+		  if (NILP (charset))
 		    {
-		      if (!NILP (data->charset[1]) &&
-			  (!XCODING_SYSTEM_ISO2022_SEVEN (codesys) ||
-			   XCODING_SYSTEM_ISO2022_LOCK_SHIFT (codesys)))
-			reg = 1;
-		      else if (!NILP (data->charset[2]))
-			reg = 2;
-		      else if (!NILP (data->charset[3]))
-			reg = 3;
-		      else
-			reg = 0;
+		      Lisp_Object chr = make_char (itext_ichar (str->partial));
+		      if (NILP (memq_no_quit (chr, data->warned_chars)))
+			{
+			  warn_when_safe_lispobj
+			    (intern ("encoding"),
+			     Qwarning,
+			     emacs_sprintf_string_lisp
+			     ("Unable to encode character #x`%x'",
+			      Qnil, 1, chr));
+			  data->warned_chars = Fcons (chr, data->warned_chars);
+			}
+		      charset = Vcharset_ascii;
+		      c2 = CANT_CONVERT_CHAR_WHEN_ENCODING;
 		    }
-		  else
-		    reg = 0;
-		}

-	      iso2022_designate (charset, reg, str, dst);
+		  ensure_correct_direction (XCHARSET_DIRECTION (charset),
+					    codesys, dst, &flags, 0);

-	      /* Now invoke that register. */
-	      switch (reg)
-		{
-		case 0:
-		  ensure_normal_shift (str, dst);
-		  half = 0;
-		  break;
-
-		case 1:
-		  if (XCODING_SYSTEM_ISO2022_SEVEN (codesys))
+		  /* Now determine which register to use. */
+		  reg = -1;
+		  for (i = 0; i < 4; i++)
 		    {
-		      ensure_shift_out (str, dst);
-		      half = 0;
+		      if (EQ (charset, data->charset[i]) ||
+			  EQ (charset,
+			      XCODING_SYSTEM_ISO2022_INITIAL_CHARSET
+			      (codesys, i)))
+			{
+			  reg = i;
+			  break;
+			}
 		    }
-		  else
-		    half = 1;
-		  break;

-		case 2:
-		  if (XCODING_SYSTEM_ISO2022_SEVEN (str->codesys))
-		    {
-		      Dynarr_add (dst, ISO_CODE_ESC);
-		      Dynarr_add (dst, 'N');
-		      half = 0;
-		    }
-		  else
+		  if (reg == -1)
 		    {
-		      Dynarr_add (dst, ISO_CODE_SS2);
-		      half = 1;
+		      if (XCHARSET_GRAPHIC (charset) != 0)
+			{
+			  if (!NILP (data->charset[1]) &&
+			      (!XCODING_SYSTEM_ISO2022_SEVEN (codesys) ||
+			       XCODING_SYSTEM_ISO2022_LOCK_SHIFT (codesys)))
+			    reg = 1;
+			  else if (!NILP (data->charset[2]))
+			    reg = 2;
+			  else if (!NILP (data->charset[3]))
+			    reg = 3;
+			  else
+			    reg = 0;
+			}
+		      else
+			reg = 0;
 		    }
-		  break;
+
+		  iso2022_designate (charset, reg, str, dst);

-		case 3:
-		  if (XCODING_SYSTEM_ISO2022_SEVEN (str->codesys))
+		  /* Now invoke that register. */
+		  switch (reg)
 		    {
-		      Dynarr_add (dst, ISO_CODE_ESC);
-		      Dynarr_add (dst, 'O');
+		    case 0:
+		      ensure_normal_shift (str, dst);
 		      half = 0;
-		    }
-		  else
-		    {
-		      Dynarr_add (dst, ISO_CODE_SS3);
-		      half = 1;
+		      break;
+
+		    case 1:
+		      if (XCODING_SYSTEM_ISO2022_SEVEN (codesys))
+			{
+			  ensure_shift_out (str, dst);
+			  half = 0;
+			}
+		      else
+			half = 1;
+		      break;
+
+		    case 2:
+		      if (XCODING_SYSTEM_ISO2022_SEVEN (str->codesys))
+			{
+			  Dynarr_add (dst, ISO_CODE_ESC);
+			  Dynarr_add (dst, 'N');
+			  half = 0;
+			}
+		      else
+			{
+			  Dynarr_add (dst, ISO_CODE_SS2);
+			  half = 1;
+			}
+		      break;
+
+		    case 3:
+		      if (XCODING_SYSTEM_ISO2022_SEVEN (str->codesys))
+			{
+			  Dynarr_add (dst, ISO_CODE_ESC);
+			  Dynarr_add (dst, 'O');
+			  half = 0;
+			}
+		      else
+			{
+			  Dynarr_add (dst, ISO_CODE_SS3);
+			  half = 1;
+			}
+		      break;
+		  
+		    default:
+		      ABORT ();
 		    }
-		  break;
+		}

-		default:
-		  ABORT ();
+	      if (EQ (charset, Vcharset_control_1))
+		{
+		  if (XCODING_SYSTEM_ISO2022_ESCAPE_QUOTED (codesys)
+		      && fit_to_be_escape_quoted (c2))
+		    Dynarr_add (dst, ISO_CODE_ESC);
+		  /* you asked for it ... */
+		  Dynarr_add (dst, c2);
 		}
-	    }
-	  char_boundary = 0;
-	}
-      else
-	{			/* Processing Non-ASCII character */
-	  charmask = (half == 0 ? 0x7F : 0xFF);
-	  char_boundary = 1;
-	  if (EQ (charset, Vcharset_control_1))
-	    {
-	      if (XCODING_SYSTEM_ISO2022_ESCAPE_QUOTED (codesys)
-		  && fit_to_be_escape_quoted (c))
-		Dynarr_add (dst, ISO_CODE_ESC);
-	      /* you asked for it ... */
-	      Dynarr_add (dst, c - 0x20);
-	    }
 #ifndef ENABLE_COMPOSITE_CHARS
-	  else if (EQ (charset, Vcharset_composite))
-	    {
-	      if (c >= 160 || c <= 164) /* Someone might have stuck in
-					   something else */
+	      else if (EQ (charset, Vcharset_composite))
 		{
-		  Dynarr_add (dst, ISO_CODE_ESC);
-		  Dynarr_add (dst, c - 160 + '0');
+		  c2 &= 127;
+		  if (c2 >= 32 || c2 <= 36) /* Someone might have stuck in
+					       something else */
+		    {
+		      Dynarr_add (dst, ISO_CODE_ESC);
+		      Dynarr_add (dst, c2 - 32 + '0');
+		    }
 		}
-	    }
 #endif
-	  else
-	    {
-	      switch (XCHARSET_REP_BYTES (charset))
+	      else
 		{
-		case 2:
-		  Dynarr_add (dst, c & charmask);
-		  break;
-		case 3:
-		  if (XCHARSET_PRIVATE_P (charset))
-		    {
-		      Dynarr_add (dst, c & charmask);
-		      ch = 0;
-		    }
-		  else if (ch)
-		    {
+		  /* Processing Non-ASCII character */
 #ifdef ENABLE_COMPOSITE_CHARS
-		      if (EQ (charset, Vcharset_composite))
+		  if (EQ (charset, Vcharset_composite))
+		    {
+		      if (in_composite)
 			{
-			  if (in_composite)
-			    {
-			      /* #### Bother! We don't know how to
-				 handle this yet. */
-			      Dynarr_add (dst, '~');
-			    }
-			  else
-			    {
-			      Ichar emch = make_ichar (Vcharset_composite,
-						       ch & 0x7F, c & 0x7F);
-			      Lisp_Object lstr = composite_char_string (emch);
-			      saved_n = n;
-			      saved_src = src;
-			      in_composite = 1;
-			      src = XSTRING_DATA   (lstr);
-			      n   = XSTRING_LENGTH (lstr);
-			      Dynarr_add (dst, ISO_CODE_ESC);
-			      Dynarr_add (dst, '0'); /* start composing */
-			    }
+			  /* #### Bother! We don't know how to
+			     handle this yet. */
+			  Dynarr_add (dst, CANT_CONVERT_CHAR_WHEN_ENCODING);
 			}
 		      else
-#endif /* ENABLE_COMPOSITE_CHARS */
 			{
-			  Dynarr_add (dst, ch & charmask);
-			  Dynarr_add (dst, c & charmask);
+			  Ichar emch =
+			    charset_codepoint_to_ichar
+			    (Vcharset_composite, c1, c2, CONVERR_SUCCEED);
+			  Lisp_Object lstr =
+			    composite_char_string (emch);
+			  saved_n = n;
+			  saved_src = src;
+			  in_composite = 1;
+			  src = XSTRING_DATA   (lstr);
+			  n   = XSTRING_LENGTH (lstr);
+			  Dynarr_add (dst, ISO_CODE_ESC);
+			  Dynarr_add (dst, '0'); /* start composing */
 			}
-		      ch = 0;
-		    }
-		  else
-		    {
-		      ch = c;
-		      char_boundary = 0;
 		    }
-		  break;
-		case 4:
-		  if (ch)
-		    {
-		      Dynarr_add (dst, ch & charmask);
-		      Dynarr_add (dst, c & charmask);
-		      ch = 0;
-		    }
 		  else
+#endif /* ENABLE_COMPOSITE_CHARS */
 		    {
-		      ch = c;
-		      char_boundary = 0;
+		      int offset = (half == 0 ? 0 : 0x80);
+		      if (XCHARSET_DIMENSION (charset) == 2)
+			Dynarr_add (dst, (c1 & 127) + offset);
+		      Dynarr_add (dst, (c2 & 127) + offset);
 		    }
-		  break;
-		default:
-		  ABORT ();
 		}
 	    }
 	}
＠＠ -2353,7 +2818,7 ＠＠
     }
 #endif /* ENABLE_COMPOSITE_CHARS */

-  if (char_boundary && str->eof)
+  if (!str->pind_remaining && str->eof)
     {
       restore_left_to_right_direction (codesys, dst, &flags, 0);
       ensure_normal_shift (str, dst);
＠＠ -2366,10 +2831,6 ＠＠
     }

   data->flags = flags;
-  str->ch    = ch;
-  data->current_char_boundary = char_boundary;
-  data->current_charset = charset;
-  data->current_half = half;

   /* Verbum caro factum est! */
   return orign;
＠＠ -2434,6 +2895,15 ＠＠
     }
 }

+static Lisp_Object
+get_valid_iso2022_charset (Lisp_Object value)
+{
+  Lisp_Object charset = Fget_charset (value);
+  if (get_charset_iso2022_type (charset) < 0)
+    invalid_argument ("Charset cannot be used with ISO-2022", value);
+  return charset;
+}
+  
 /* Given a list of charset conversion specs as specified in a Lisp
    program, parse it into STORE_HERE. */

＠＠ -2448,9 +2918,9 ＠＠

       if (!CONSP (car) || !CONSP (XCDR (car)) || !NILP (XCDR (XCDR (car))))
 	invalid_argument ("Invalid charset conversion spec", car);
-      from = Fget_charset (XCAR (car));
-      to = Fget_charset (XCAR (XCDR (car)));
-      if (XCHARSET_TYPE (from) != XCHARSET_TYPE (to))
+      from = get_valid_iso2022_charset (XCAR (car));
+      to = get_valid_iso2022_charset (XCAR (XCDR (car)));
+      if (get_charset_iso2022_type (from) != get_charset_iso2022_type (to))
 	invalid_operation_2
 	  ("Attempted conversion between different charset types",
 	   from, to);
＠＠ -2493,9 +2963,10 ＠＠
 		 Lisp_Object key,
 		 Lisp_Object value)
 {
-#define FROB_INITIAL_CHARSET(charset_num) \
-  XCODING_SYSTEM_ISO2022_INITIAL_CHARSET (codesys, charset_num) = \
-    ((EQ (value, Qt) || EQ (value, Qnil)) ? value : Fget_charset (value))
+ #define FROB_INITIAL_CHARSET(charset_num)				\
+  XCODING_SYSTEM_ISO2022_INITIAL_CHARSET (codesys, charset_num) =	\
+    ((EQ (value, Qt) || EQ (value, Qnil)) ? value :			\
+     get_valid_iso2022_charset (value))

   if      (EQ (key, Qcharset_g0)) FROB_INITIAL_CHARSET (0);
   else if (EQ (key, Qcharset_g1)) FROB_INITIAL_CHARSET (1);
＠＠ -2514,13 +2985,14 ＠＠
 #define FROB_BOOLEAN_PROPERTY(prop) \
   XCODING_SYSTEM_ISO2022_##prop (codesys) = !NILP (value)

-  else if (EQ (key, Qshort))         FROB_BOOLEAN_PROPERTY (SHORT);
-  else if (EQ (key, Qno_ascii_eol))  FROB_BOOLEAN_PROPERTY (NO_ASCII_EOL);
-  else if (EQ (key, Qno_ascii_cntl)) FROB_BOOLEAN_PROPERTY (NO_ASCII_CNTL);
-  else if (EQ (key, Qseven))         FROB_BOOLEAN_PROPERTY (SEVEN);
-  else if (EQ (key, Qlock_shift))    FROB_BOOLEAN_PROPERTY (LOCK_SHIFT);
-  else if (EQ (key, Qno_iso6429))    FROB_BOOLEAN_PROPERTY (NO_ISO6429);
-  else if (EQ (key, Qescape_quoted)) FROB_BOOLEAN_PROPERTY (ESCAPE_QUOTED);
+  else if (EQ (key, Qshort))            FROB_BOOLEAN_PROPERTY (SHORT);
+  else if (EQ (key, Qno_ascii_eol))     FROB_BOOLEAN_PROPERTY (NO_ASCII_EOL);
+  else if (EQ (key, Qno_ascii_cntl))    FROB_BOOLEAN_PROPERTY (NO_ASCII_CNTL);
+  else if (EQ (key, Qseven))            FROB_BOOLEAN_PROPERTY (SEVEN);
+  else if (EQ (key, Qlock_shift))       FROB_BOOLEAN_PROPERTY (LOCK_SHIFT);
+  else if (EQ (key, Qno_iso6429))       FROB_BOOLEAN_PROPERTY (NO_ISO6429);
+  else if (EQ (key, Qescape_quoted))    FROB_BOOLEAN_PROPERTY (ESCAPE_QUOTED);
+  else if (EQ (key, Qiso2022_preserve)) FROB_BOOLEAN_PROPERTY (ISO2022_PRESERVE);

   else if (EQ (key, Qinput_charset_conversion))
     {
＠＠ -2605,13 +3077,14 ＠＠
 #define LISP_BOOLEAN(prop) \
   (XCODING_SYSTEM_ISO2022_##prop (coding_system) ? Qt : Qnil)

-  else if (EQ (prop, Qshort))         return LISP_BOOLEAN (SHORT);
-  else if (EQ (prop, Qno_ascii_eol))  return LISP_BOOLEAN (NO_ASCII_EOL);
-  else if (EQ (prop, Qno_ascii_cntl)) return LISP_BOOLEAN (NO_ASCII_CNTL);
-  else if (EQ (prop, Qseven))         return LISP_BOOLEAN (SEVEN);
-  else if (EQ (prop, Qlock_shift))    return LISP_BOOLEAN (LOCK_SHIFT);
-  else if (EQ (prop, Qno_iso6429))    return LISP_BOOLEAN (NO_ISO6429);
-  else if (EQ (prop, Qescape_quoted)) return LISP_BOOLEAN (ESCAPE_QUOTED);
+  else if (EQ (prop, Qshort))            return LISP_BOOLEAN (SHORT);
+  else if (EQ (prop, Qno_ascii_eol))     return LISP_BOOLEAN (NO_ASCII_EOL);
+  else if (EQ (prop, Qno_ascii_cntl))    return LISP_BOOLEAN (NO_ASCII_CNTL);
+  else if (EQ (prop, Qseven))            return LISP_BOOLEAN (SEVEN);
+  else if (EQ (prop, Qlock_shift))       return LISP_BOOLEAN (LOCK_SHIFT);
+  else if (EQ (prop, Qno_iso6429))       return LISP_BOOLEAN (NO_ISO6429);
+  else if (EQ (prop, Qescape_quoted))    return LISP_BOOLEAN (ESCAPE_QUOTED);
+  else if (EQ (prop, Qiso2022_preserve)) return LISP_BOOLEAN (ISO2022_PRESERVE);

   else if (EQ (prop, Qinput_charset_conversion))
     return
＠＠ -2656,6 +3129,9 ＠＠
   FROB (Qlock_shift);
   FROB (Qno_iso6429);
   FROB (Qescape_quoted);
+  FROB (Qiso2022_preserve);
+
+#undef FROB

   {
     Lisp_Object val =
＠＠ -2663,14 +3139,16 ＠＠
       (XCODING_SYSTEM_ISO2022_INPUT_CONV (cs), 1);
     if (!NILP (val))
       {
-	write_fmt_string_lisp (printcharfun, ", input-charset-conversion=%s", 1,
val);
+	write_fmt_string_lisp (printcharfun, ", input-charset-conversion=%s",
+			       1, val);
       }
     val =
       unparse_charset_conversion_specs
       (XCODING_SYSTEM_ISO2022_OUTPUT_CONV (cs), 1);
     if (!NILP (val))
       {
-	write_fmt_string_lisp (printcharfun, ", output-charset-conversion=%s", 1,
val);
+	write_fmt_string_lisp (printcharfun, ", output-charset-conversion=%s",
+			       1, val);
       }
     write_c_string (printcharfun, ")");
   }
＠＠ -2696,6 +3174,9 ＠＠
 /* ISO2022 system using locking shift */
 DEFINE_DETECTOR_CATEGORY (iso2022, iso_lock_shift);

+/* ~~#### Must properly mark the coding stream in here; these form parts of
+   the struct detection_state, so it in turn must be a Lisp object and
+   have mark methods for the appropriate parts. */
 struct iso2022_detector
 {
   int initted;
＠＠ -2940,6 +3421,8 ＠＠
     xfree (data->iso, struct iso2022_coding_stream *);
 }

+#ifdef HAVE_CCL
+

 /************************************************************************/
 /*                               CCL methods                            */
＠＠ -3086,6 +3569,8 ＠＠
     return Qunbound;
 }

+#endif /* HAVE_CCL */
+

 /************************************************************************/
 /*                             Initialization                           */
＠＠ -3103,6 +3588,7 ＠＠
   DEFSYMBOL (Qshift_jis);
   DEFSYMBOL (Qccl);
   DEFSYMBOL (Qiso2022);
+  DEFSYMBOL (Qmbcs);

   DEFSYMBOL (Qcharset_g0);
   DEFSYMBOL (Qcharset_g1);
＠＠ -3112,7 +3598,6 ＠＠
   DEFSYMBOL (Qforce_g1_on_output);
   DEFSYMBOL (Qforce_g2_on_output);
   DEFSYMBOL (Qforce_g3_on_output);
-  DEFSYMBOL (Qno_iso6429);
   DEFSYMBOL (Qinput_charset_conversion);
   DEFSYMBOL (Qoutput_charset_conversion);

＠＠ -3121,22 +3606,36 ＠＠
   DEFSYMBOL (Qno_ascii_cntl);
   DEFSYMBOL (Qseven);
   DEFSYMBOL (Qlock_shift);
+  DEFSYMBOL (Qno_iso6429);
+  DEFSYMBOL (Qiso2022_preserve);

   DEFSYMBOL (Qiso_7);
   DEFSYMBOL (Qiso_8_designate);
   DEFSYMBOL (Qiso_8_1);
   DEFSYMBOL (Qiso_8_2);
   DEFSYMBOL (Qiso_lock_shift);
+
+  DEFSYMBOL (Qcharsets);
 }

 void
 coding_system_type_create_mule_coding (void)
 {
+  INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (mbcs, "mbcs-coding-system-p");
+  CODING_SYSTEM_HAS_METHOD (mbcs, convert);
+  CODING_SYSTEM_HAS_METHOD (mbcs, init);
+  CODING_SYSTEM_HAS_METHOD (mbcs, mark);
+  CODING_SYSTEM_HAS_METHOD (mbcs, finalize);
+  CODING_SYSTEM_HAS_METHOD (mbcs, putprop);
+  CODING_SYSTEM_HAS_METHOD (mbcs, getprop);
+  CODING_SYSTEM_HAS_METHOD (mbcs, print);
+
   INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (iso2022,
"iso2022-coding-system-p");
   CODING_SYSTEM_HAS_METHOD (iso2022, mark);
   CODING_SYSTEM_HAS_METHOD (iso2022, convert);
   CODING_SYSTEM_HAS_METHOD (iso2022, finalize_coding_stream);
   CODING_SYSTEM_HAS_METHOD (iso2022, init_coding_stream);
+  CODING_SYSTEM_HAS_METHOD (iso2022, mark_coding_stream);
   CODING_SYSTEM_HAS_METHOD (iso2022, rewind_coding_stream);
   CODING_SYSTEM_HAS_METHOD (iso2022, init);
   CODING_SYSTEM_HAS_METHOD (iso2022, print);
＠＠ -3153,6 +3652,7 ＠＠
   INITIALIZE_DETECTOR_CATEGORY (iso2022, iso_8_2);
   INITIALIZE_DETECTOR_CATEGORY (iso2022, iso_lock_shift);

+#ifdef HAVE_CCL
   INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (ccl, "ccl-coding-system-p");
   CODING_SYSTEM_HAS_METHOD (ccl, mark);
   CODING_SYSTEM_HAS_METHOD (ccl, convert);
＠＠ -3161,6 +3661,7 ＠＠
   CODING_SYSTEM_HAS_METHOD (ccl, rewind_coding_stream);
   CODING_SYSTEM_HAS_METHOD (ccl, putprop);
   CODING_SYSTEM_HAS_METHOD (ccl, getprop);
+#endif /* HAVE_CCL */

   INITIALIZE_CODING_SYSTEM_TYPE (shift_jis, "shift-jis-coding-system-p");
   CODING_SYSTEM_HAS_METHOD (shift_jis, convert);
＠＠ -3181,7 +3682,9 ＠＠
 reinit_coding_system_type_create_mule_coding (void)
 {
   REINITIALIZE_CODING_SYSTEM_TYPE (iso2022);
+#ifdef HAVE_CCL
   REINITIALIZE_CODING_SYSTEM_TYPE (ccl);
+#endif /* HAVE_CCL */
   REINITIALIZE_CODING_SYSTEM_TYPE (shift_jis);
   REINITIALIZE_CODING_SYSTEM_TYPE (big5);
 }
＠＠ -3194,4 +3697,30 ＠＠
 void
 vars_of_mule_coding (void)
 {
+}
+
+void
+complex_vars_of_mule_coding (void)
+{
+  /* #### Hack!  These should use actual shift-jis and big5 charsets.
+     There should be a general multibyte codec to handle both of these,
+     and similar variants. */
+  shift_jis_precedence =
+    convert_charset_list_to_precedence_dynarr
+    (list3 (Vcharset_japanese_jisx0208, Vcharset_japanese_jisx0208_1978,
+	    Vcharset_katakana_jisx0201));
+  dump_add_root_block_ptr (&shift_jis_precedence,
+			   &Lisp_Object_dynarr_description);
+
+#ifdef UNICODE_INTERNAL
+  big5_precedence =
+    convert_charset_list_to_precedence_dynarr
+    (list1 (Vcharset_chinese_big5));
+#else /* not UNICODE_INTERNAL */
+  big5_precedence =
+    convert_charset_list_to_precedence_dynarr
+    (list2 (Vcharset_chinese_big5_1, Vcharset_chinese_big5_2));
+#endif /* UNICODE_INTERNAL */
+  dump_add_root_block_ptr (&big5_precedence,
+			   &Lisp_Object_dynarr_description);
 }
Index: src/mule-wnnfns.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/mule-wnnfns.c,v
retrieving revision 1.26
diff -u -r1.26 mule-wnnfns.c
--- src/mule-wnnfns.c	2004/11/04 23:06:43	1.26
+++ src/mule-wnnfns.c	2005/11/22 14:00:55
＠＠ -1,6 +1,7 ＠＠
 /* -*- coding: iso-2022-jp -*-
    Copyright (C) 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
+   Copyright (C) 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -286,13 +287,13 ＠＠
 #define WNNSERVER_K 3

 int check_wnn_server_type (void);
-void w2m (w_char *wp, unsigned char *mp, unsigned char lb);
-void m2w (unsigned char *mp, w_char *wp);
+void w2m (w_char *wp, Ibyte *mp, Lisp_Object charset);
+void m2w (Ibyte *mp, w_char *wp);
 void w2y (w_char *w);
-void c2m (unsigned char *cp, unsigned char *mp, unsigned char lb);
+void c2m (UExtbyte *cp, Ibyte *mp, Lisp_Object charset);
 static void puts2 (char *s);
 static int dai_end (int no, int server);
-static int yes_or_no (unsigned char *s);
+static int yes_or_no (UExtbyte *s);

  /* Why doesn't wnn have a prototype for these? */
 typedef unsigned int letter;
＠＠ -303,8 +304,7 ＠＠
 static struct wnn_env *wnnfns_env_norm[NSERVER];
 static struct wnn_env *wnnfns_env_rev[NSERVER];
 static int wnnfns_norm;
-static unsigned char lb_wnn_server_type[NSERVER] =
-{LEADING_BYTE_JAPANESE_JISX0208, LEADING_BYTE_CHINESE_GB2312, LEADING_BYTE_THAI_TIS620,
LEADING_BYTE_KOREAN_KSC5601};
+static Lisp_Object charset_wnn_server_type[NSERVER];

 /* Lisp Variables and Constants Definition */
 Lisp_Object	Qjserver;
＠＠ -321,7 +321,6 ＠＠
 Lisp_Object	Vcwnn_zhuyin;
 Lisp_Object	Vwnnenv_sticky;
 Lisp_Object	Vwnn_uniq_level;
-Fixnum		lb_sisheng;

 /* Lisp functions definition */

＠＠ -332,12 +331,11 ＠＠
 */
      (hname, lname))
 {
-  char *envname;
-  char *langname;
-  char *hostname;
+  Extbyte *envname;
+  Ascbyte *langname;
+  Extbyte *hostname;
   int	snum;
   int size;
-  CHECK_STRING (lname);

   snum = check_wnn_server_type ();
   switch (snum)
＠＠ -360,35 +358,31 ＠＠
     default:
       return Qnil;
     }
-  size = XSTRING_LENGTH (lname) > 1024 ? 1026 : XSTRING_LENGTH (lname) + 2;
-  /* !!#### */
-  envname = (char *) ALLOCA (size);
-  strncpy (envname, (char *) XSTRING_DATA (lname), size-2);
-  envname[size-2] = '\0';
+  /* #### This is extremely stupid.  I'm sure these alloca() copies are
+     unnecessary, but the old code went out of its way to do this. --ben */
+  CHECK_STRING (lname);
+  EXTBYTE_STRING_TO_ALLOCA (NEW_LISP_STRING_TO_EXTERNAL (lname, Qnative),
+			    envname);
   if (NILP (hname)) hostname = "";
   else
     {
       CHECK_STRING (hname);
-      size = XSTRING_LENGTH(hname) > 1024 ? 1025 : XSTRING_LENGTH(hname) + 1;
-
-      hostname = (char *) ALLOCA (size);
-      strncpy (hostname, (char *) XSTRING_DATA (hname), size-1);
-      hostname[size-1] = '\0';
+      EXTBYTE_STRING_TO_ALLOCA (NEW_LISP_STRING_TO_EXTERNAL (hname, Qnative),
+				hostname);
     }
-  CHECK_STRING (lname);
   /* 97/4/16 jhod(a)po.iijnet.or.jp
    * libwnn uses SIGALRM, so we need to stop and start interrupts.
    */
-  stop_interrupts();
+  stop_interrupts ();
   if (!(wnnfns_buf[snum] = jl_open_lang (envname, hostname, langname,
 					 0, 0, 0, EGG_TIMEOUT)))
     {
-      start_interrupts();
+      start_interrupts ();
       return Qnil;
     }
   if (!jl_isconnect (wnnfns_buf[snum]))
     {
-      start_interrupts();
+      start_interrupts ();
       return Qnil;
     }
   wnnfns_env_norm[snum] = jl_env_get (wnnfns_buf[snum]);
＠＠ -398,12 +392,12 ＠＠
   if (!(wnnfns_env_rev[snum] = jl_connect_lang (envname, hostname, langname,
 						0, 0, 0, EGG_TIMEOUT)))
     {
-      start_interrupts();
+      start_interrupts ();
       return Qnil;
     }
 /*  if (Vwnnenv_sticky == Qt) jl_env_sticky_e (wnnfns_env_rev[snum]);
     else jl_env_un_sticky_e (wnnfns_env_rev[snum]);*/
-  start_interrupts();
+  start_interrupts ();
   return Qt;
 }

＠＠ -455,14 +449,16 ＠＠
   GCPRO1 (*args);
   gcpro1.nvars = nargs;
   if (jl_dic_add (wnnfns_buf[snum],
-		  XSTRING_DATA (args[0]),
-		  XSTRING_DATA (args[1]),
+		  NEW_LISP_STRING_TO_EXTERNAL (args[0], Qfile_name),
+		  NEW_LISP_STRING_TO_EXTERNAL (args[1], Qfile_name),
 		  wnnfns_norm ? WNN_DIC_ADD_NOR : WNN_DIC_ADD_REV,
 		  XINT (args[2]),
 		  NILP (args[3]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
 		  NILP (args[4]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
-		  NILP (args[5]) ? 0 : XSTRING_DATA (args[5]),
-		  NILP (args[6]) ? 0 : XSTRING_DATA (args[6]),
+		  NILP (args[5]) ? 0 :
+		  NEW_LISP_STRING_TO_EXTERNAL (args[5], Qfile_name),
+		  NILP (args[6]) ? 0 :
+		  NEW_LISP_STRING_TO_EXTERNAL (args[6], Qfile_name),
 		  yes_or_no,
 		  puts2 ) < 0)
     {
＠＠ -495,13 +491,13 ＠＠
 {
   WNN_DIC_INFO	*dicinfo;
   int		cnt, i;
-  unsigned char	comment[1024];
+  Ibyte		comment[1024];
   Lisp_Object	val;
   int	snum;
-  unsigned char lb;
+  Lisp_Object charset;

   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
 #ifdef	WNN6
   if((cnt = jl_fi_dic_list (wnnfns_buf[snum], 0x3f, &dicinfo)) < 0)
＠＠ -513,14 +509,11 ＠＠
   for (i = 0, dicinfo += cnt; i < cnt; i++)
     {
       dicinfo--;
-      w2m (dicinfo->comment, comment, lb);
-      /* #### The following has not been Mule-ized!!
-         fname and comment must be ASCII strings! */
+      w2m (dicinfo->comment, comment, charset);
       val =
 	Fcons (Fcons (make_int (dicinfo->dic_no),
-		      list4 (make_string ((Ibyte *) (dicinfo->fname),
-					  strlen (dicinfo->fname)),
-			     make_string (comment, strlen ((char *) comment)),
+		      list4 (build_ext_string (dicinfo->fname, Qfile_name),
+			     build_intstring (comment),
 			     make_int (dicinfo->gosuu),
 			     make_int (dicinfo->nice))), val);
     }
＠＠ -630,17 +623,17 ＠＠
 */
      (kouhoNo))
 {
-  unsigned char	kanji_buf[256];
+  Ibyte		kanji_buf[256];
   w_char	wbuf[256];
   int	snum;
-  unsigned char lb;
+  Lisp_Object charsetl
   CHECK_INT (kouhoNo);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   jl_get_zenkouho_kanji (wnnfns_buf[snum], XINT (kouhoNo), wbuf);
-  w2m (wbuf, kanji_buf, lb);
-  return make_string (kanji_buf, strlen ((char *) kanji_buf));
+  w2m (wbuf, kanji_buf, charset);
+  return build_intstring (kanji_buf);
 }

 DEFUN ("wnn-server-zenkouho-bun", Fwnn_zenkouho_bun, 0, 0, 0, /*
＠＠ -739,14 +732,14 ＠＠
      (bunNo))
 {
   Lisp_Object		val;
-  unsigned char		cbuf[512];
+  Ibyte			cbuf[512];
   w_char		wbuf[256];
   int			bun_no, yomilen, jirilen, i;
   int	snum;
-  unsigned char		lb;
+  Lisp_Object		charset;
   CHECK_INT (bunNo);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   bun_no = XINT (bunNo);
   val = Qnil;
＠＠ -762,11 +755,11 ＠＠
   jirilen = wnnfns_buf[snum]->bun[bun_no]->jirilen;
   for (i = yomilen; i >= jirilen; i--) wbuf[i+1] = wbuf[i];
   wbuf[jirilen] = '+';
-  w2m (wbuf, cbuf, lb);
-  val = Fcons (make_string (cbuf, strlen ((char *) cbuf)), val);
+  w2m (wbuf, cbuf, charset);
+  val = Fcons (build_intstring (cbuf), val);
   jl_get_kanji (wnnfns_buf[snum], bun_no, bun_no + 1, wbuf);
-  w2m (wbuf, cbuf, lb);
-  return Fcons (make_string (cbuf, strlen ((char *) cbuf)), val);
+  w2m (wbuf, cbuf, charset);
+  return Fcons (build_intstring (cbuf), val);
 }

＠＠ -787,20 +780,19 ＠＠
      (bunNo))
 {
   int		no;
-  unsigned char		kanji_buf[256];
+  Ibyte			kanji_buf[256];
   w_char		wbuf[256];
   int			kanji_len;
   int			snum;
-  unsigned char		lb;
+  Lisp_Object		charset;
   CHECK_INT (bunNo);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   no = XINT (bunNo);
   kanji_len = jl_get_kanji (wnnfns_buf[snum], no, no + 1, wbuf);
-  w2m (wbuf, kanji_buf, lb);
-  return Fcons (make_string (kanji_buf, strlen ((char *) kanji_buf)),
-		make_int (kanji_len));
+  w2m (wbuf, kanji_buf, charset);
+  return Fcons (build_intstring (kanji_buf), make_int (kanji_len));
 }

 DEFUN ("wnn-server-bunsetu-yomi", Fwnn_bunsetu_yomi, 1, 1, 0, /*
＠＠ -809,20 +801,19 ＠＠
      (bunNo))
 {
   int		no;
-  unsigned char		yomi_buf[256];
+  Ibyte			yomi_buf[256];
   w_char		wbuf[256];
   int			yomi_len;
   int			snum;
-  unsigned char		lb;
+  Lisp_Object		charset;
   CHECK_INT (bunNo);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   no = XINT (bunNo);
   yomi_len = jl_get_yomi (wnnfns_buf[snum], no, no + 1, wbuf);
-  w2m (wbuf, yomi_buf, lb);
-  return Fcons (make_string (yomi_buf, strlen ((char *) yomi_buf)),
-		make_int (yomi_len));
+  w2m (wbuf, yomi_buf, charset);
+  return Fcons (build_intstring (yomi_buf), make_int (yomi_len));
 }

 DEFUN ("wnn-server-bunsetu-suu", Fwnn_bunsetu_suu, 0, 0, 0, /*
＠＠ -925,13 +916,13 ＠＠
 {
   Lisp_Object		val;
   struct wnn_jdata	*info_buf;
-  unsigned char		cbuf[512];
+  Ibyte			cbuf[512];
   int			snum;
-  unsigned char		lb;
+  Lisp_Object		charset;
   CHECK_INT (no);
   CHECK_INT (serial);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   if ((info_buf =  jl_word_info (wnnfns_buf[snum],
 				 XINT (no), XINT (serial))) != NULL)
＠＠ -943,12 +934,12 ＠＠
       val = Qnil;
       val = Fcons (make_int (info_buf->hinshi), val);
       val = Fcons (make_int (info_buf->hindo), val);
-      w2m (info_buf->com, cbuf, lb);
-      val = Fcons (make_string (cbuf, strlen ((char *) cbuf)), val);
-      w2m (info_buf->kanji, cbuf, lb);
-      val = Fcons (make_string (cbuf, strlen ((char *) cbuf)), val);
-      w2m (info_buf->yomi, cbuf, lb);
-      val = Fcons (make_string (cbuf, strlen ((char *) cbuf)), val);
+      w2m (info_buf->com, cbuf, charset);
+      val = Fcons (build_intstring (cbuf), val);
+      w2m (info_buf->kanji, cbuf, charset);
+      val = Fcons (build_intstring (cbuf), val);
+      w2m (info_buf->yomi, cbuf, charset);
+      val = Fcons (build_intstring (cbuf), val);
       return val;
     }
 }
＠＠ -984,13 +975,13 ＠＠
   Lisp_Object		val;
   struct wnn_jdata	*wordinfo;
   int			i, count;
-  w_char			wbuf[256];
-  unsigned char		kanji_buf[256];
+  w_char		wbuf[256];
+  Ibyte			kanji_buf[256];
   int			snum;
-  unsigned char		lb;
+  Lisp_Object		charset;
   CHECK_STRING (yomi);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   m2w (XSTRING_DATA (yomi), wbuf);
   if (snum == WNNSERVER_C)
＠＠ -1002,8 +993,8 ＠＠
   for (i = 0, wordinfo += count; i < count; i++)
     {
       wordinfo--;
-      w2m (wordinfo->kanji, kanji_buf, lb);
-      val = Fcons (Fcons (make_string (kanji_buf, strlen ((char *) kanji_buf)),
+      w2m (wordinfo->kanji, kanji_buf, charset);
+      val = Fcons (Fcons (build_intstring (kanji_buf),
 			  list4 (make_int (wordinfo->hinshi),
 				 make_int (wordinfo->hindo),
 				 make_int (wordinfo->dic_no),
＠＠ -1134,14 +1125,14 ＠＠
 */
      ())
 {
-  unsigned char mbuf[256];
-  char 			*msgp;
-  int			snum;
-  unsigned char		lb;
-  char  langname[32];
+  Ibyte		mbuf[256];
+  char 		*msgp;
+  int		snum;
+  Lisp_Object	charset;
+  char		langname[32];
 /*  CHECK_INT (errno);*/
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   switch (snum)
     {
     case WNNSERVER_J:
＠＠ -1162,8 +1153,8 ＠＠
   if (!wnnfns_buf[snum]) return Qnil;
 /*  msgp = msg_get (wnn_msg_cat, XINT (errno), 0, 0);*/
   msgp = wnn_perror_lang (langname);
-  c2m ((unsigned char *) msgp, mbuf, lb);
-  return make_string (mbuf, strlen ((char *) mbuf));
+  c2m ((UExtbyte *) msgp, mbuf, charset);
+  return build_intstring (mbuf);
 }

＠＠ -1176,7 +1167,8 ＠＠
   CHECK_STRING (file);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
   if (!wnnfns_buf[snum]) return Qnil;
-  if (jl_fuzokugo_set (wnnfns_buf[snum], XSTRING_DATA (file)) < 0)
+  if (jl_fuzokugo_set (wnnfns_buf[snum],
+		       NEW_LISP_STRING_TO_EXTERNAL (file, Qfile_name)) < 0)
     return Qnil;
   return Qt;
 }
＠＠ -1191,7 +1183,7 ＠＠
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
   if (!wnnfns_buf[snum]) return Qnil;
   if (jl_fuzokugo_get (wnnfns_buf[snum], fname) < 0) return Qnil;
-  return make_string ((Ibyte *) fname, strlen (fname));
+  return build_ext_string (fname, Qfile_name);
 }

＠＠ -1237,15 +1229,15 ＠＠
 {
   int		cnt;
   Lisp_Object	val;
-  w_char		wbuf[256];
-  w_char		**area;
-  unsigned char	cbuf[512];
+  w_char	wbuf[256];
+  w_char	**area;
+  Ibyte		cbuf[512];
   int		snum;
-  unsigned char lb;
+  Lisp_Object	charset;
   CHECK_INT (dicno);
   CHECK_STRING (name);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   m2w (XSTRING_DATA (name), wbuf);
   if ((cnt = jl_hinsi_list (wnnfns_buf[snum], XINT (dicno), wbuf, &area)) < 0)
＠＠ -1255,8 +1247,8 ＠＠
   for (area += cnt; cnt > 0; cnt--)
     {
       area--;
-      w2m (*area, cbuf, lb);
-      val = Fcons (make_string (cbuf, strlen ((char *) cbuf)), val);
+      w2m (*area, cbuf, charset);
+      val = Fcons (build_intstring (cbuf), val);
     }
   return val;
 }
＠＠ -1266,17 +1258,17 ＠＠
 */
      (no))
 {
-  unsigned char	name[256];
-  w_char		*wname;
-  int			snum;
-  unsigned char		lb;
+  Ibyte		name[256];
+  w_char	*wname;
+  int		snum;
+  Lisp_Object	charset;
   CHECK_INT (no);
   if ((snum = check_wnn_server_type ()) == -1) return Qnil;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   if (!wnnfns_buf[snum]) return Qnil;
   if ((wname = jl_hinsi_name (wnnfns_buf[snum], XINT (no))) == 0) return Qnil;
-  w2m (wname, name, lb);
-  return make_string (name, strlen ((char *) name));
+  w2m (wname, name, charset);
+  return build_intstring (name);
 }
 #ifdef	WNN6
 DEFUN ("wnn-server-fisys-dict-add", Fwnn_fisys_dict_add, 3, MANY, 0, /*
＠＠ -1290,24 +1282,26 ＠＠
   int   snum;
   CHECK_STRING (args[0]);
   CHECK_STRING (args[1]);
-  if (! NILP (args[3])) CHECK_STRING (args[3]);
+  if (!NILP (args[3])) CHECK_STRING (args[3]);
   if ((snum = check_wnn_server_type()) == -1) return Qnil;
-  if(!wnnfns_buf[snum]) return Qnil;
+  if (!wnnfns_buf[snum]) return Qnil;
   GCPRO1 (*args);
   gcpro1.nvars = nargs;
-  if(jl_fi_dic_add(wnnfns_buf[snum],
-		   XSTRING_DATA (args[0]),
-		   XSTRING_DATA (args[1]),
-		   WNN_FI_SYSTEM_DICT,
-		   WNN_DIC_RDONLY,
-		   NILP (args[2]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
-		   0,
-		   NILP (args[3]) ? 0 : XSTRING_DATA (args[3]),
-		   yes_or_no,
-		   puts2 ) < 0) {
-    UNGCPRO;
-    return Qnil;
-  }
+  if (jl_fi_dic_add (wnnfns_buf[snum],
+		     NEW_LISP_STRING_TO_EXTERNAL (args[0], Qfile_name),
+		     NEW_LISP_STRING_TO_EXTERNAL (args[1], Qfile_name),
+		     WNN_FI_SYSTEM_DICT,
+		     WNN_DIC_RDONLY,
+		     NILP (args[2]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
+		     0,
+		     NILP (args[3]) ? 0 :
+		     NEW_LISP_STRING_TO_EXTERNAL (args[3], Qfile_name),
+		     yes_or_no,
+		     puts2) < 0)
+    {
+      UNGCPRO;
+      return Qnil;
+    }
   UNGCPRO;
   return Qt;
 }
＠＠ -1323,25 +1317,28 ＠＠
   int   snum;
   CHECK_STRING (args[0]);
   CHECK_STRING (args[1]);
-  if (! NILP (args[4])) CHECK_STRING (args[4]);
-  if (! NILP (args[5])) CHECK_STRING (args[5]);
+  if (!NILP (args[4])) CHECK_STRING (args[4]);
+  if (!NILP (args[5])) CHECK_STRING (args[5]);
   if ((snum = check_wnn_server_type()) == -1) return Qnil;
-  if(!wnnfns_buf[snum]) return Qnil;
+  if (!wnnfns_buf[snum]) return Qnil;
   GCPRO1 (*args);
   gcpro1.nvars = nargs;
-  if(jl_fi_dic_add(wnnfns_buf[snum],
-		   XSTRING_DATA (args[0]),
-		   XSTRING_DATA (args[1]),
-		   WNN_FI_USER_DICT,
-		   NILP (args[2]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
-		   NILP (args[3]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
-		   NILP (args[4]) ? 0 : XSTRING_DATA (args[4]),
-		   NILP (args[5]) ? 0 : XSTRING_DATA (args[5]),
-		   yes_or_no,
-		   puts2 ) < 0) {
-    UNGCPRO;
-    return Qnil;
-  }
+  if (jl_fi_dic_add (wnnfns_buf[snum],
+		     NEW_LISP_STRING_TO_EXTERNAL (args[0], Qfile_name),
+		     NEW_LISP_STRING_TO_EXTERNAL (args[1], Qfile_name),
+		     WNN_FI_USER_DICT,
+		     NILP (args[2]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
+		     NILP (args[3]) ? WNN_DIC_RDONLY : WNN_DIC_RW,
+		     NILP (args[4]) ? 0 :
+		     NEW_LISP_STRING_TO_EXTERNAL (args[4], Qfile_name),
+		     NILP (args[5]) ? 0 :
+		     NEW_LISP_STRING_TO_EXTERNAL (args[5], Qfile_name),
+		     yes_or_no,
+		     puts2) < 0)
+    {
+      UNGCPRO;
+      return Qnil;
+    }
   UNGCPRO;
   return Qt;
 }
＠＠ -1370,37 +1367,47 ＠＠
   else
       cur_env = wnnfns_env_rev[snum];
   dic_no = js_get_autolearning_dic(cur_env, WNN_MUHENKAN_LEARNING);
-  if (dic_no == WNN_NO_LEARNING) {
-      if((dic_no = jl_dic_add(wnnfns_buf[snum],
-			      XSTRING_DATA (args[0]),
-			      0,
-			      wnnfns_norm ? WNN_DIC_ADD_NOR : WNN_DIC_ADD_REV,
-			      XINT(args[1]),
-			      WNN_DIC_RW, WNN_DIC_RW,
-			      NILP (args[3]) ? 0 : XSTRING_DATA (args[3]),
-			      0,
-			      yes_or_no,
-			      puts2)) < 0) {
+  if (dic_no == WNN_NO_LEARNING)
+    {
+      if ((dic_no = jl_dic_add (wnnfns_buf[snum],
+				NEW_LISP_STRING_TO_EXTERNAL (args[0],
+							     Qfile_name),
+				0,
+				wnnfns_norm ? WNN_DIC_ADD_NOR :
+				WNN_DIC_ADD_REV,
+				XINT (args[1]),
+				WNN_DIC_RW, WNN_DIC_RW,
+				NILP (args[3]) ? 0 :
+				NEW_LISP_STRING_TO_EXTERNAL (args[3],
+							     Qfile_name),
+				0,
+				yes_or_no,
+				puts2)) < 0)
+	{
 	  UNGCPRO;
 	  return Qnil;
-      }
-      js_set_autolearning_dic(cur_env, WNN_MUHENKAN_LEARNING, dic_no);
-  }
-  if(!js_is_loaded_temporary_dic(cur_env)) {
-      if(js_temporary_dic_add(cur_env,
-			      wnnfns_norm ? WNN_DIC_ADD_NOR : WNN_DIC_ADD_REV) < 0) {
+	}
+      js_set_autolearning_dic (cur_env, WNN_MUHENKAN_LEARNING, dic_no);
+    }
+  if (!js_is_loaded_temporary_dic (cur_env))
+    {
+      if (js_temporary_dic_add (cur_env,
+				wnnfns_norm ? WNN_DIC_ADD_NOR :
+				WNN_DIC_ADD_REV) < 0)
+	{
 	  UNGCPRO;
           return Qnil;
-      }
-  }
+	}
+    }
   vmask |= WNN_ENV_MUHENKAN_LEARN_MASK;
   henv.muhenkan_flag = NILP (args[2]) ? WNN_DIC_RDONLY : WNN_DIC_RW;
-  if(jl_set_henkan_env(wnnfns_buf[snum],
-                       vmask,
-                       &henv) < 0) {
+  if (jl_set_henkan_env (wnnfns_buf[snum],
+			 vmask,
+			 &henv) < 0)
+    {
       UNGCPRO;
       return Qnil;
-  }
+    }
   UNGCPRO;
   return Qt;
 }
＠＠ -1429,37 +1436,47 ＠＠
   else
       cur_env = wnnfns_env_rev[snum];
   dic_no = js_get_autolearning_dic(cur_env, WNN_BUNSETSUGIRI_LEARNING);
-  if (dic_no == WNN_NO_LEARNING) {
-      if((dic_no = jl_dic_add(wnnfns_buf[snum],
-                              XSTRING_DATA (args[0]),
-                              0,
-                              wnnfns_norm ? WNN_DIC_ADD_NOR : WNN_DIC_ADD_REV,
-                              XINT(args[1]),
-                              WNN_DIC_RW, WNN_DIC_RW,
-                              NILP (args[3]) ? 0 : XSTRING_DATA (args[3]),
-                              0,
-                              yes_or_no,
-                              puts2)) < 0) {
+  if (dic_no == WNN_NO_LEARNING)
+    {
+      if ((dic_no = jl_dic_add (wnnfns_buf[snum],
+				NEW_LISP_STRING_TO_EXTERNAL (args[0],
+							     Qfile_name),
+				0,
+				wnnfns_norm ? WNN_DIC_ADD_NOR :
+				WNN_DIC_ADD_REV,
+				XINT(args[1]),
+				WNN_DIC_RW, WNN_DIC_RW,
+				NILP (args[3]) ? 0 :
+				NEW_LISP_STRING_TO_EXTERNAL (args[3],
+							     Qfile_name),
+				0,
+				yes_or_no,
+				puts2)) < 0)
+	{
           UNGCPRO;
           return Qnil;
-      }
-      js_set_autolearning_dic(cur_env, WNN_BUNSETSUGIRI_LEARNING, dic_no);
-  }
-  if(!js_is_loaded_temporary_dic(cur_env)) {
-      if(js_temporary_dic_add(cur_env,
-                              wnnfns_norm ? WNN_DIC_ADD_NOR : WNN_DIC_ADD_REV) < 0) {
+	}
+      js_set_autolearning_dic (cur_env, WNN_BUNSETSUGIRI_LEARNING, dic_no);
+    }
+  if (!js_is_loaded_temporary_dic (cur_env))
+    {
+      if (js_temporary_dic_add (cur_env,
+				wnnfns_norm ? WNN_DIC_ADD_NOR :
+				WNN_DIC_ADD_REV) < 0)
+	{
           UNGCPRO;
           return Qnil;
-      }
-  }
+	}
+    }
   vmask |= WNN_ENV_BUNSETSUGIRI_LEARN_MASK;
   henv.bunsetsugiri_flag = NILP (args[2]) ? WNN_DIC_RDONLY : WNN_DIC_RW;
-  if(jl_set_henkan_env(wnnfns_buf[snum],
-                       vmask,
-                       &henv) < 0) {
+  if (jl_set_henkan_env (wnnfns_buf[snum],
+			 vmask,
+			 &henv) < 0)
+    {
       UNGCPRO;
       return Qnil;
-  }
+    }
   UNGCPRO;
   return Qt;
 }
＠＠ -1886,14 +1903,16 ＠＠
       wnnfns_env_norm[i] = (struct wnn_env *) 0;
       wnnfns_env_rev[i] = (struct wnn_env *) 0;
     }
+
+  charset_wnn_server_type[0] = Vcharset_japanese_jisx0208;
+  charset_wnn_server_type[1] = Vcharset_chinese_gb2312;
+  charset_wnn_server_type[2] = Vcharset_thai_tis620;
+  charset_wnn_server_type[3] = Vcharset_korean_ksc5601;
 }

 void
 vars_of_mule_wnn (void)
 {
-  DEFVAR_INT ("lb-sisheng", &lb_sisheng /*
-Leading character for Sisheng.
-*/ );
   DEFVAR_LISP ("wnn-server-type", &Vwnn_server_type /*
 *jserver, cserver ..
 */ );
＠＠ -1916,8 +1935,11 ＠＠
   Fprovide (intern ("wnn"));
 }

+/* Convert from the wide-char format expected for wnn to the XEmacs string
+   format. */
+
 void
-w2m (w_char *wp, unsigned char *mp, unsigned char lb)
+w2m (w_char *wp, Ibyte *mp, Lisp_Object charset)
 {
   w_char	wc;
   w_char	pzy[10];
＠＠ -1937,84 +1959,80 ＠＠
 	      for (i = 0; i < len; i++)
 		{
 		  if (pzy[i] & 0x80)
-		    {
-		      *mp++ = PRE_LEADING_BYTE_PRIVATE_1; /* #### Not sure about this one... */
-		      *mp++ = lb_sisheng;
-		    }
-		  *mp++ = pzy[i];
+		    mp += charset_codepoint_to_itext
+		      (Vcharset_chinese_sisheng, 0, pzy[i] & 0x7f, mp,
+		       CONVERR_SUCCEED);
+		  else
+		    /* !!#### Correct? */
+		    mp += charset_codepoint_to_itext
+		      (Vcharset_ascii, 0, pzy[i] & 0x7f, mp,
+		       CONVERR_SUCCEED);
 		}
 	    }
 	  else
-	    {
-	      *mp++ = LEADING_BYTE_KATAKANA_JISX0201;
-	      *mp++ = (wc & 0xff);
-	    }
+	    mp += charset_codepoint_to_itext (Vcharset_katakana_jisx0201,
+					      0, wc & 0x7f, mp,
+					      CONVERR_SUCCEED);
 	  break;
 	case 0x8080:
-	  *mp++ = lb;
-	  *mp++ = (wc & 0xff00) >> 8;
-	  *mp++ = wc & 0x00ff;
+	  mp += charset_codepoint_to_itext (charset, (wc & 0x7f00) >> 8,
+					    wc & 0x007f, mp,
+					    CONVERR_SUCCEED);
 	  break;
 	case 0x8000:
-	  if (lb == LEADING_BYTE_JAPANESE_JISX0208)
-	    *mp++ = LEADING_BYTE_JAPANESE_JISX0212;
-	  else if (lb == LEADING_BYTE_CHINESE_BIG5_1)
-	    *mp++ = LEADING_BYTE_CHINESE_BIG5_2;
-	  else
-	    *mp++ = lb;
-	  *mp++ = (wc & 0xff00) >> 8;
-	  *mp++ = (wc & 0x00ff) | 0x80;
-	  break;
+	  {
+	    Lisp_Object newchar = charset;
+	    if (EQ (charset, Vcharset_japanese_jisx0208))
+	      newchar = Vcharset_japanese_jisx0212;
+#ifndef UNICODE_INTERNAL
+	    /* ＠＠#### Something very strange about this */
+	    else if (EQ (charset, Vcharset_chinese_big5_1))
+	      newchar = Vcharset_chinese_big5_2;
+#endif /* not UNICODE_INTERNAL */
+	    mp += charset_codepoint_to_itext (newchar, (wc & 0x7f00) >> 8,
+					      wc & 0x007f, mp,
+					      CONVERR_SUCCEED);
+	    break;
+	  }
 	default:
-	  *mp++ = wc & 0x00ff;
+	  mp += set_itext_ichar (mp, wc & 0x00ff);
 	  break;
 	}
     }
   *mp = 0;
 }

+/* Convert XEmacs string format to the wide-char format expected for wnn. */
 void
-m2w (unsigned char *mp, w_char *wp)
+m2w (Ibyte *mp, w_char *wp)
 {
-  int ch;
-
-  while ((ch = *mp++) != 0)
+  while (*mp)
     {
-      if (ibyte_leading_byte_p (ch))
-	{
-	  switch (ch)
-	    {
-	    case LEADING_BYTE_KATAKANA_JISX0201:
-	      *wp++ = *mp++; break;
-	    case LEADING_BYTE_LATIN_JISX0201:
-	      *wp++ = *mp++ & 0x7F; break;
-	    case LEADING_BYTE_JAPANESE_JISX0208_1978:
-	    case LEADING_BYTE_CHINESE_GB2312:
-	    case LEADING_BYTE_JAPANESE_JISX0208:
-	    case LEADING_BYTE_KOREAN_KSC5601:
-	      /* case LEADING_BYTE_TW: */
-	      ch = *mp++;
-	      *wp++ = (ch << 8) | *mp++;
-	      break;
-	    case LEADING_BYTE_JAPANESE_JISX0212:
-	      ch = *mp++;
-	      *wp++ = (ch << 8) | (*mp++ & 0x7f);
-	      break;
-	    case PRE_LEADING_BYTE_PRIVATE_1: /* #### Not sure about this one... */
-	      ch = *mp++;
-	      if (ch == lb_sisheng)
-		*wp++ = 0x8e80 | *mp++;
-	      else
-		mp++;
-	      break;
-	    default:			/* ignore this character */
-	      mp += rep_bytes_by_first_byte(ch) - 1;
-	    }
-	}
-      else
-        {
-	  *wp++ = ch;
-	}
+      Lisp_Object charset;
+      int c1, c2;
+      int ch;
+
+      itext_to_charset_codepoint (mp, get_unicode_precedence (),
+				  &charset, &c1, &c2, CONVERR_FAIL);
+      INC_IBYTEPTR (mp);
+      if (EQ (charset, Vcharset_ascii) ||
+	  EQ (charset, Vcharset_latin_jisx0201) ||
+	  EQ (charset, Vcharset_katakana_jisx0201))
+	ch = c2;
+      else if (EQ (charset, Vcharset_japanese_jisx0208) ||
+	       EQ (charset, Vcharset_japanese_jisx0208_1978) ||
+	       EQ (charset, Vcharset_chinese_gb2312) ||
+	       EQ (charset, Vcharset_korean_ksc5601)
+	       /* || other 2-byte charsets??? */
+	       )
+	ch = ((c1 | 0x80) << 8) + (c2 | 0x80);
+      else if (EQ (charset, Vcharset_japanese_jisx0212))
+	ch = ((c1 | 0x80) << 8) + c2;
+      else if (EQ (charset, Vcharset_chinese_sisheng))
+	ch = 0x8e80 | c2;
+      else /* Ignore character */
+	continue;
+      *wp++ = (w_char) ch;
     }
   *wp = 0;
 }
＠＠ -2051,18 +2069,26 ＠＠
     }
 }

+/* Converts text in the multi-byte locale-specific format returned by some
+   WNN functions into XEmacs-internal.  This format appears to be a simple
+   MBCS encoding with a single locale, and we could use probably existing
+   coding systems to handle it. */
+
 void
-c2m (unsigned char *cp, unsigned char *mp, unsigned char lb)
+c2m (UExtbyte *cp, Ibyte *mp, Lisp_Object charset)
 {
-  unsigned char	ch;
+  UExtbyte	ch;
   while ((ch = *cp) != 0)
     {
       if (ch & 0x80)
 	{
-	  *mp++ = lb;
-	  *mp++ = *cp++;
+	  mp += charset_codepoint_to_itext (charset, cp[0] & 0x7f,
+					    cp[1] & 0x7f, mp,
+					    CONVERR_SUCCEED);
+	  cp += 2;
 	}
-      *mp++ = *cp++;
+      else
+	*mp++ = *cp++; /* Guaranteed ASCII */
     }
   *mp = 0;
 }
＠＠ -2076,18 +2102,18 ＠＠
 }

 static int
-yes_or_no (unsigned char *s)
+yes_or_no (UExtbyte *s)
 {
-  unsigned char		mbuf[512];
-  unsigned char		lb;
+  Ibyte			mbuf[512];
+  Lisp_Object		charset;
   int			len;
   int			snum;
   if ((snum  = check_wnn_server_type ()) == -1) return 0;
-  lb = lb_wnn_server_type[snum];
+  charset = charset_wnn_server_type[snum];
   /* if no message found, create file without query */
   /* if (wnn_msg_cat->msg_bd == 0) return 1;*/
   if (*s == 0) return 1;
-  c2m (s, mbuf, lb);
+  c2m (s, mbuf, charset);
   /* truncate "(Y/N)" */
   for (len = 0; (mbuf[len]) && (len < 512); len++);
   for (; (mbuf[len] != '(') && (len > 0); len--);
＠＠ -2097,7 +2123,7 ＠＠

      str = make_string (mbuf, len);
      GCPRO1 (str);
-     yes = call1(Qyes_or_no_p, str);
+     yes = call1 (Qyes_or_no_p, str);
      UNGCPRO;
      if (NILP (yes)) return 0;
      else return (1);
＠＠ -2108,20 +2134,13 ＠＠
 puts2 (char *s)
 {
 #if 0 /* jhod: We don't really need this echoed... */
-#if 0
-  Lisp_Object		args[1];
-  char			mbuf[512];
-  unsigned char		lb;
-  extern Lisp_Object 	Fmessage ();
+  Ibyte			mbuf[512];
+  Lisp_Object		charset;
   int			snum;
   if ((snum = check_wnn_server_type ()) == -1) return;
-  lb = lb_wnn_server_type[snum];
-  c2m (s, mbuf, lb);
-  args[0] = make_string (mbuf, strlen (mbuf));
-  Fmessage (1, args);
-#else
-  message("%s",s);
-#endif
+  charset = charset_wnn_server_type[snum];
+  c2m (s, mbuf, charset);
+  message ("%s", mbuf);
 #endif
 }

Index: src/objects-impl.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/objects-impl.h,v
retrieving revision 1.5
diff -u -r1.5 objects-impl.h
--- src/objects-impl.h	2005/10/24 10:07:39	1.5
+++ src/objects-impl.h	2005/11/22 14:00:55
＠＠ -106,6 +106,8 ＠＠
   /* See comment in struct console about console variants. */
   enum console_variant color_instance_type;

+  /* ~~#### Instead of doing this, attach this data to the end of the same
+     structure; avoids the need to create new TTY, X, etc. Lisp objects */
   /* console-type-specific data */
   void *data;
 };
＠＠ -137,6 +139,8 ＠＠
   unsigned short height;
   int proportional_p;

+  /* ~~#### Instead of doing this, attach this data to the end of the same
+     structure; avoids the need to create new TTY, X, etc. Lisp objects */
   /* console-type-specific data */
   void *data;
 };
Index: src/objects-msw.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/objects-msw.c,v
retrieving revision 1.47
diff -u -r1.47 objects-msw.c
--- src/objects-msw.c	2005/01/28 02:58:51	1.47
+++ src/objects-msw.c	2005/11/22 14:00:56
＠＠ -2,7 +2,7 ＠＠
    Copyright (C) 1993, 1994 Free Software Foundation, Inc.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
    Copyright (C) 1995 Tinker Systems.
-   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2004 Ben Wing.
+   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2004, 2005 Ben Wing.
    Copyright (C) 1995 Sun Microsystems, Inc.
    Copyright (C) 1997 Jonathan Harris.

＠＠ -27,8 +27,9 ＠＠

 /* Authorship:

-   Jamie Zawinski, Chuck Thompson, Ben Wing
-   Rewritten for mswindows by Jonathan Harris, November 1997 for 21.0.
+   This file created by Jonathan Harris, November 1997 for 21.0; based
+   heavily on objects-x.c (see authorship there).  Much further work
+   by Ben Wing.
  */

 /* This function Mule-ized by Ben Wing, 3-24-02. */
＠＠ -2014,6 +2015,8 ＠＠

 /*

+#### The following comment is old and probably not applicable any longer.
+
 1. handle standard mapping and inheritance vectors properly in Face-frob-property.
 2. finish impl of mswindows-charset-registry.
 3. see if everything works under fixup, now that i copied the stuff over.
＠＠ -2091,31 +2094,25 ＠＠
     }

   {
-    int lowlim, highlim;
-    int dim, j, cp = -1;
+    int l1, h1, l2, h2;
+    int j, cp = -1;

     /* Try to find a Unicode char in the charset.  #### This is somewhat
        bogus.  See below.

        #### Cache me baby!!!!!!!!!!!!!
     */
-    get_charset_limits (charset, &lowlim, &highlim);
-    dim = XCHARSET_DIMENSION (charset);
+    get_charset_limits (charset, &l1, &h1, &l2, &h2);

-    if (dim == 1)
-      {
-	for (i = lowlim; i <= highlim; i++)
-	  if ((cp = ichar_to_unicode (make_ichar (charset, i, 0))) >= 0)
-	    break;
-      }
-    else
-      {
-	for (i = lowlim; i <= highlim; i++)
-	  for (j = lowlim; j <= highlim; j++)
-	    if ((cp = ichar_to_unicode (make_ichar (charset, i, j))) >= 0)
-	      break;
-      }
-      
+    /* ＠＠#### This needs major fixing.  We need to be passed the character,
+       not the charset. */
+    for (i = l1; i <= h1; i++)
+      for (j = l2; j <= h2; j++)
+	if ((cp = charset_codepoint_to_unicode (charset, i, j, CONVERR_FAIL))
+	    >= 0)
+	  goto multi_break;
+
+  multi_break:
     if (cp < 0)
       return 0;

Index: src/objects-x.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/objects-x.c,v
retrieving revision 1.28
diff -u -r1.28 objects-x.c
--- src/objects-x.c	2005/01/28 02:58:52	1.28
+++ src/objects-x.c	2005/11/22 14:00:56
＠＠ -24,7 +24,9 ＠＠

 /* Synched up with: Not in FSF. */

-/* Authors: Jamie Zawinski, Chuck Thompson, Ben Wing */
+/* Authors: Ben Wing, in its current form; based on earlier
+   font/color-handling code by Jamie Zawinski.  Some code by Chuck
+   Thompson. */

 /* This file Mule-ized by Ben Wing, 7-10-00. */

Index: src/objects.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/objects.c,v
retrieving revision 1.29
diff -u -r1.29 objects.c
--- src/objects.c	2005/10/24 10:07:39	1.29
+++ src/objects.c	2005/11/22 14:00:56
＠＠ -1,7 +1,7 ＠＠
 /* Generic Objects and Functions.
    Copyright (C) 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
-   Copyright (C) 1995, 1996, 2002, 2004 Ben Wing.
+   Copyright (C) 1995, 1996, 2002, 2004, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -43,7 +43,8 ＠＠
    If we leave in the Qunbound value, we will probably get crashes. */
 Lisp_Object Vthe_null_color_instance, Vthe_null_font_instance;

-/* Authors: Ben Wing, Chuck Thompson */
+/* Author: Ben Wing; some earlier code from Chuck Thompson, Jamie
+   Zawinski. */

 DOESNT_RETURN
 finalose (void *ptr)
＠＠ -98,8 +99,7 ＠＠
 {
   Lisp_Color_Instance *c = XCOLOR_INSTANCE (obj);
   if (print_readably)
-    printing_unreadable_object ("#<color-instance 0x%x>",
-           c->header.uid);
+    printing_unreadable_lcrecord (obj, 0);
   write_fmt_string_lisp (printcharfun, "#<color-instance %s", 1,
c->name);
   write_fmt_string_lisp (printcharfun, " on %s", 1, c->device);
   if (!NILP (c->device)) /* Vthe_null_color_instance */
＠＠ -310,7 +310,7 ＠＠
 {
   Lisp_Font_Instance *f = XFONT_INSTANCE (obj);
   if (print_readably)
-    printing_unreadable_object ("#<font-instance 0x%x>",
f->header.uid);
+    printing_unreadable_lcrecord (obj, 0);
   write_fmt_string_lisp (printcharfun, "#<font-instance %S", 1,
f->name);
   write_fmt_string_lisp (printcharfun, " on %s", 1, f->device);
   if (!NILP (f->device))
＠＠ -799,6 +799,32 ＠＠

 #endif /* MULE */

+/* It's a little non-obvious what's going on here.  Specifically:
+
+   MATCHSPEC is a somewhat bogus way in the specifier mechanism of passing
+   in additional information needed to instantiate some object.  For fonts,
+   it's a cons of (CHARSET . SECOND-STAGE-P).  SECOND-STAGE-P, if set,
+   means "try harder to find an appropriate font" and is a very bogus way
+   of dealing with the fact that it may not be possible to may a charset
+   directly onto a font; it's used esp. under Windows.  ＠＠#### We need to
+   change this so that MATCHSPEC is just a character.
+
+   When redisplay is building up its structure, and needs font info, it
+   calls functions in faces.c such as ensure_face_cachel_complete() (map
+   fonts needed for a string of text) or
+   ensure_face_cachel_contains_charset() (map fonts needed for a charset
+   derived from a single character).  The former function calls the latter;
+   the latter calls face_property_matching_instance(); this constructs the
+   MATCHSPEC and calls specifier_instance_no_quit() twice (first stage and
+   second stage, updating MATCHSPEC appropriately).  That function, in
+   turn, looks up the appropriate specifier method to do the instantiation,
+   which, lo and behold, is this function here (because we set it in
+   initialization using `SPECIFIER_HAS_METHOD (font, instantiate);').  We
+   in turn call the device method `find_charset_font', which maps to
+   mswindows_find_charset_font(), x_find_charset_font(), or similar, in
+   objects-msw.c or the like.
+
+   --ben */

 static Lisp_Object
 font_instantiate (Lisp_Object UNUSED (specifier),
Index: src/print.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/print.c,v
retrieving revision 1.56
diff -u -r1.56 print.c
--- src/print.c	2005/10/25 08:32:48	1.56
+++ src/print.c	2005/11/22 14:00:57
＠＠ -1,6 +1,6 ＠＠
 /* Lisp object printing and output streams.
    Copyright (C) 1985, 1986, 1988, 1992-1995 Free Software Foundation, Inc.
-   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -1449,13 +1449,41 ＠＠
   UNGCPRO;
 }

-static void
-default_object_printer (Lisp_Object obj, Lisp_Object printcharfun,
-			int UNUSED (escapeflag))
+DOESNT_RETURN
+printing_unreadable_object (const CIbyte *fmt, ...)
 {
+  Lisp_Object obj;
+  va_list args;
+
+  va_start (args, fmt);
+  obj = emacs_vsprintf_string (CGETTEXT (fmt), args);
+  va_end (args);
+
+  /* Fsignal GC-protects its args */
+  signal_error (Qprinting_unreadable_object, 0, obj);
+}
+
+DOESNT_RETURN
+printing_unreadable_lcrecord (Lisp_Object obj, const Ibyte *name)
+{
   struct LCRECORD_HEADER *header = (struct LCRECORD_HEADER *) XPNTR (obj);

-  if (print_readably)
+#ifndef MC_ALLOC
+  /* This must be a real lcrecord */
+  assert (!LHEADER_IMPLEMENTATION (&header->lheader)->basic_p);
+#endif
+
+  if (name)
+    printing_unreadable_object
+      ("#<%s %s 0x%x>",
+#ifdef MC_ALLOC
+       LHEADER_IMPLEMENTATION (header)->name,
+#else /* not MC_ALLOC */
+       LHEADER_IMPLEMENTATION (&header->lheader)->name,
+#endif /* not MC_ALLOC */
+       name,
+       header->uid);
+  else
     printing_unreadable_object
       ("#<%s 0x%x>",
 #ifdef MC_ALLOC
＠＠ -1464,7 +1492,22 ＠＠
        LHEADER_IMPLEMENTATION (&header->lheader)->name,
 #endif /* not MC_ALLOC */
        header->uid);
+}
+
+static void
+default_object_printer (Lisp_Object obj, Lisp_Object printcharfun,
+			int UNUSED (escapeflag))
+{
+  struct LCRECORD_HEADER *header = (struct LCRECORD_HEADER *) XPNTR (obj);
+
+#ifndef MC_ALLOC
+  /* This must be a real lcrecord */
+  assert (!LHEADER_IMPLEMENTATION (&header->lheader)->basic_p);
+#endif

+  if (print_readably)
+    printing_unreadable_lcrecord (obj, 0);
+
   write_fmt_string (printcharfun, "#<%s 0x%x>",
 #ifdef MC_ALLOC
 		    LHEADER_IMPLEMENTATION (header)->name,
＠＠ -1478,6 +1521,9 ＠＠
 internal_object_printer (Lisp_Object obj, Lisp_Object printcharfun,
 			 int UNUSED (escapeflag))
 {
+  /* Internal objects shouldn't normally escape to the Lisp level;
+     that's why we say "XEmacs bug?".  This can happen, however, when
+     printing backtraces. */
   write_fmt_string (printcharfun,
 		    "#<INTERNAL OBJECT (XEmacs bug?) (%s) 0x%lx>",
 		    XRECORD_LHEADER_IMPLEMENTATION (obj)->name,
＠＠ -1488,25 +1534,31 ＠＠
 {
   BADNESS_INTEGER_OBJECT,
   BADNESS_POINTER_OBJECT,
+  BADNESS_POINTER_OBJECT_WITH_DATA,
   BADNESS_NO_TYPE
 };

 static void
 printing_major_badness (Lisp_Object printcharfun,
 			Ascbyte *badness_string, int type, void *val,
-			enum printing_badness badness)
+			void *val2, enum printing_badness badness)
 {
   Ibyte buf[666];

   switch (badness)
     {
     case BADNESS_INTEGER_OBJECT:
-      qxesprintf (buf, "%s %d object %ld", badness_string, type,
+      qxesprintf (buf, "%s type %d object %ld", badness_string, type,
 		  (EMACS_INT) val);
       break;

     case BADNESS_POINTER_OBJECT:
-      qxesprintf (buf, "%s %d object %p", badness_string, type, val);
+      qxesprintf (buf, "%s type %d object %p", badness_string, type, val);
+      break;
+
+    case BADNESS_POINTER_OBJECT_WITH_DATA:
+      qxesprintf (buf, "%s type %d object %p data %p", badness_string, type,
+		  val, val2);
       break;

     case BADNESS_NO_TYPE:
＠＠ -1522,12 +1574,14 ＠＠
       ABORT ();
 #else  /* not ERROR_CHECK_TYPES */
       if (print_readably)
-	signal_ferror (Qinternal_error, "printing %s", buf);
+	signal_ferror (Qinternal_error, "SERIOUS XEMACS BUG: printing %s; "
+		       "save your buffers immediately and please report "
+		       "this bug", buf);
 #endif /* not ERROR_CHECK_TYPES */
     }
   write_fmt_string (printcharfun,
-		    "#<EMACS BUG: %s Save your buffers immediately and "
-		    "please report this bug>", buf);
+		    "#<SERIOUS XEMACS BUG: %s Save your buffers immediately "
+		    "and please report this bug>", buf);
 }

 void
＠＠ -1547,6 +1601,13 ＠＠
   /* Just to be safe ... */
   GCPRO2 (obj, printcharfun);

+  /* WARNING WARNING WARNING!!!  Don't put anything here that might
+     dereference memory.  Instead, put it down inside of
+     the case Lisp_Type_Record, after the appropriate checks to make sure
+     we're not dereferencing bad memory.  The idea is that, ideally,
+     calling debug_print() should *NEVER* make the program crash, even when
+     something very bad has happened. --ben */
+
 #ifdef I18N3
   /* #### Both input and output streams should have a flag associated
      with them indicating whether output to that stream, or strings
＠＠ -1559,23 +1620,6 ＠＠
      output. */
 #endif

-  /* Detect circularities and truncate them.
-     No need to offer any alternative--this is better than an error.  */
-  if (CONSP (obj) || VECTORP (obj) || COMPILED_FUNCTIONP (obj))
-    {
-      int i;
-      for (i = 0; i < print_depth; i++)
-	if (EQ (obj, being_printed[i]))
-	  {
-	    char buf[DECIMAL_PRINT_SIZE (long) + 1];
-	    *buf = '#';
-	    long_to_string (buf + 1, i);
-	    write_c_string (printcharfun, buf);
-	    UNGCPRO;
-	    return;
-	  }
-    }
-
   being_printed[print_depth] = obj;

   /* Avoid calling internal_bind_int, which conses, when called from
＠＠ -1585,7 +1629,8 ＠＠
       specdepth = internal_bind_int (&print_depth, print_depth + 1);

       if (print_depth > PRINT_CIRCLE)
-	signal_error (Qstack_overflow, "Apparently circular structure being printed",
Qunbound);
+	signal_error (Qstack_overflow,
+		      "Apparently circular structure being printed", Qunbound);
     }

   switch (XTYPE (obj))
＠＠ -1668,83 +1713,143 ＠＠
       {
 	struct lrecord_header *lheader = XRECORD_LHEADER (obj);

-	/* Try to check for various sorts of bogus pointers if we're in a
-	   situation where it may be likely -- i.e. called from
-	   debug_print() or we're already crashing.  In such cases,
-	   (further) crashing is counterproductive. */
+	/* Try to check for various sorts of bogus pointers or bad memory
+	   if we're in a situation where it may be likely -- i.e. called
+	   from debug_print() or we're already crashing.  In such cases,
+	   (further) crashing is counterproductive.
+
+	   We don't normally do these because they may be expensive or
+	   weird (e.g. under Unix we typically have to set a SIGSEGV
+	   handler and try to trigger a seg fault). */

+	if (!lheader)
+	  {
+	    /* i.e. EQ Qnull_pointer */
+	    printing_major_badness (printcharfun, "NULL POINTER LRECORD", 0,
+				    0, 0, BADNESS_NO_TYPE);
+	    break;
+	  }
+
+	/* First check to see if the lrecord header itself is garbage. */
 	if (inhibit_non_essential_conversion_operations &&
 	    !debug_can_access_memory (lheader, sizeof (*lheader)))
-	    {
-	      write_fmt_string (printcharfun, "#<EMACS BUG: BAD MEMORY %p>",
-				lheader);
-	      break;
-	    }
-
-	if (CONSP (obj) || VECTORP (obj))
 	  {
-	    /* If deeper than spec'd depth, print placeholder.  */
-	    if (INTP (Vprint_level)
-		&& print_depth > XINT (Vprint_level))
-	      {
-		write_c_string (printcharfun, "...");
-		break;
-	      }
+	    printing_major_badness (printcharfun,
+				    "BAD MEMORY in LRECORD HEADER", 0,
+				    lheader, 0, BADNESS_NO_TYPE);
+	      break;
 	  }

+	/* Check to see if the lrecord type is garbage. */
 #ifndef MC_ALLOC
 	if (lheader->type == lrecord_type_free)
 	  {
-	    printing_major_badness (printcharfun, "freed lrecord", 0,
-				    lheader, BADNESS_NO_TYPE);
+	    printing_major_badness (printcharfun, "FREED LRECORD", 0,
+				    lheader, 0, BADNESS_NO_TYPE);
 	    break;
 	  }
-	else if (lheader->type == lrecord_type_undefined)
+	if (lheader->type == lrecord_type_undefined)
 	  {
-	    printing_major_badness (printcharfun, "lrecord_type_undefined", 0,
-				    lheader, BADNESS_NO_TYPE);
+	    printing_major_badness (printcharfun, "LRECORD_TYPE_UNDEFINED", 0,
+				    lheader, 0, BADNESS_NO_TYPE);
 	    break;
 	  }
 #endif /* not MC_ALLOC */
-	else if ((int) (lheader->type) >= lrecord_type_count)
+	if ((int) (lheader->type) >= lrecord_type_count)
 	  {
-	    printing_major_badness (printcharfun, "illegal lrecord type",
+	    printing_major_badness (printcharfun, "ILLEGAL LRECORD TYPE",
 				    (int) (lheader->type),
-				    lheader, BADNESS_POINTER_OBJECT);
+				    lheader, 0, BADNESS_POINTER_OBJECT);
 	    break;
 	  }
+
+	/* Check to see if the lrecord implementation is missing or garbage. */
+	{
+	  const struct lrecord_implementation *imp =
+	    LHEADER_IMPLEMENTATION (lheader);
+
+	  if (!imp)
+	    {
+	      printing_major_badness
+		(printcharfun, "NO IMPLEMENTATION FOR LRECORD TYPE",
+		 (int) (lheader->type),
+		 lheader, 0, BADNESS_POINTER_OBJECT);
+	      break;
+	    }

-	/* Further checks for bad memory in critical situations.  We don't
-	   normally do these because they may be expensive or weird
-	   (e.g. under Unix we typically have to set a SIGSEGV handler and
-	   try to trigger a seg fault). */
+	  if (inhibit_non_essential_conversion_operations)
+	    {
+	      if (!debug_can_access_memory (imp, sizeof (*imp)))
+		{
+		  printing_major_badness
+		    (printcharfun, "BAD MEMORY IN LRECORD IMPLEMENTATION",
+		     (int) (lheader->type),
+		     lheader, 0, BADNESS_POINTER_OBJECT);
+		}
+	    }
+	}
+
+	/* Check to see if any of the memory of the lrecord is inaccessible.
+	   Note that we already checked above to see if the first part of
+	   the lrecord (the header) is inaccessible, which will catch most
+	   cases of a totally bad pointer.  */

 	if (inhibit_non_essential_conversion_operations)
 	  {
 	    if (!debug_can_access_memory
 		(lheader, detagged_lisp_object_size (lheader)))
 	      {
-		write_fmt_string (printcharfun,
-				  "#<EMACS BUG: type %s BAD MEMORY %p>",
-				  LHEADER_IMPLEMENTATION (lheader)->name,
-				  lheader);
+		printing_major_badness (printcharfun,
+					"BAD MEMORY IN LRECORD",
+					(int) (lheader->type),
+					lheader, 0, BADNESS_POINTER_OBJECT);
 		break;
 	      }

+	    /* For strings, also check the data of the string itself. */
 	    if (STRINGP (obj))
 	      {
 		Lisp_String *l = (Lisp_String *) lheader;
 		if (!debug_can_access_memory (l->data_, l->size_))
 		  {
-		    write_fmt_string
-		      (printcharfun,
-		       "#<EMACS BUG: %p (BAD STRING DATA %p)>",
-		       lheader, l->data_);
+		    printing_major_badness (printcharfun,
+		       "BAD STRING DATA", (int) (lheader->type),
+					    lheader, l->data_,
+					    BADNESS_POINTER_OBJECT_WITH_DATA);
 		    break;
 		  }
 	      }
 	  }

+	/* Detect circularities and truncate them.
+	   No need to offer any alternative--this is better than an error.  */
+	if (CONSP (obj) || VECTORP (obj) || COMPILED_FUNCTIONP (obj))
+	  {
+	    int i;
+	    for (i = 0; i < print_depth - 1; i++)
+	      if (EQ (obj, being_printed[i]))
+		{
+		  Ascbyte buf[DECIMAL_PRINT_SIZE (long) + 1];
+		  *buf = '#';
+		  long_to_string (buf + 1, i);
+		  write_c_string (printcharfun, buf);
+		  break;
+		}
+	    if (i < print_depth - 1) /* Did we print something? */
+	      break;
+	  }
+
+	if (CONSP (obj) || VECTORP (obj))
+	  {
+	    /* If deeper than spec'd depth, print placeholder.  */
+	    if (INTP (Vprint_level)
+		&& print_depth > XINT (Vprint_level))
+	      {
+		write_c_string (printcharfun, "...");
+		break;
+	      }
+	  }
+
 	if (LHEADER_IMPLEMENTATION (lheader)->printer)
 	  ((LHEADER_IMPLEMENTATION (lheader)->printer)
 	   (obj, printcharfun, escapeflag));
＠＠ -1756,8 +1861,9 ＠＠
     default:
       {
 	/* We're in trouble if this happens! */
-	printing_major_badness (printcharfun, "illegal data type", XTYPE (obj),
-				LISP_TO_VOID (obj), BADNESS_INTEGER_OBJECT);
+	printing_major_badness (printcharfun, "ILLEGAL LISP OBJECT TAG TYPE",
+				XTYPE (obj), LISP_TO_VOID (obj), 0,
+				BADNESS_INTEGER_OBJECT);
 	break;
       }
     }
Index: src/process.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/process.c,v
retrieving revision 1.69
diff -u -r1.69 process.c
--- src/process.c	2005/10/25 11:16:27	1.69
+++ src/process.c	2005/11/22 14:00:58
＠＠ -42,6 +42,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "commands.h"
 #include "device.h"
 #include "events.h"
＠＠ -158,12 +159,12 ＠＠
 }

 static void
-print_process (Lisp_Object object, Lisp_Object printcharfun, int escapeflag)
+print_process (Lisp_Object obj, Lisp_Object printcharfun, int escapeflag)
 {
-  Lisp_Process *process = XPROCESS (object);
+  Lisp_Process *process = XPROCESS (obj);

   if (print_readably)
-    printing_unreadable_object ("#<process %s>", XSTRING_DATA
(process->name));
+    printing_unreadable_lcrecord (obj, XSTRING_DATA (process->name));

   if (!escapeflag)
     {
＠＠ -171,7 +172,7 ＠＠
     }
   else
     {
-      int netp = network_connection_p (object);
+      int netp = network_connection_p (obj);
       write_c_string (printcharfun,
 		      netp ? GETTEXT ("#<network connection ") :
 		      GETTEXT ("#<process "));
Index: src/redisplay-gtk.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/redisplay-gtk.c,v
retrieving revision 1.17
diff -u -r1.17 redisplay-gtk.c
--- src/redisplay-gtk.c	2005/01/24 23:34:06	1.17
+++ src/redisplay-gtk.c	2005/11/22 14:00:58
＠＠ -2,7 +2,7 ＠＠
    Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
    Copyright (C) 1994 Lucid, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2002, 2003 Ben Wing.
+   Copyright (C) 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -32,6 +32,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "charset.h"
 #include "debug.h"
 #include "device-impl.h"
 #include "faces.h"
＠＠ -49,7 +50,7 ＠＠

 #include "sysproc.h" /* for select() */

-#ifdef MULE
+#ifdef HAVE_CCL
 #include "mule-ccl.h"
 #endif

＠＠ -95,32 +96,48 ＠＠
   int dimension;
 };

-/* Separate out the text in DYN into a series of textual runs of a
-   particular charset.  Also convert the characters as necessary into
-   the format needed by XDrawImageString(), XDrawImageString16(), et
-   al.  (This means converting to one or two byte format, possibly
-   tweaking the high bits, and possibly running a CCL program.) You
-   must pre-allocate the space used and pass it in. (This is done so
-   you can ALLOCA () the space.)  You need to allocate (2 * len) bytes
-   of TEXT_STORAGE and (len * sizeof (struct textual_run)) bytes of
-   RUN_STORAGE, where LEN is the length of the dynarr.
+/* Separate out the text in STR (an array of Ichars, not a string
+   representation) of length LEN into a series of runs, stored in RUNS.
+   RUNS is guaranteed to hold enough space for all runs that could be
+   generated from this text.  Each run points to the a stretch of text
+   given simply by the position codes TEXT_STORAGE into a series of textual
+   runs of a particular charset.  Also convert the characters as necessary
+   into the format needed by XDrawImageString(), XDrawImageString16(), et
+   al.  (This means converting to one or two byte format, possibly tweaking
+   the high bits, and possibly running a CCL program.) You must
+   pre-allocate the space used and pass it in. (This is done so you can
+   ALLOCA () the space.)  You need to allocate (2 * len) bytes of
+   TEXT_STORAGE and (len * sizeof (struct textual_run)) bytes of
+   RUNS, where LEN is the length of the dynarr.

    Returns the number of runs actually used. */

+/* #### Completely copied from redisplay-x.c. */
+
 static int
 separate_textual_runs (unsigned char *text_storage,
-		       struct textual_run *run_storage,
-		       CONST Ichar *str, Charcount len)
+		       struct textual_run *runs,
+		       const Ichar *str, Charcount len)
 {
+#ifndef MULE
+  int i;
+  for (i = 0; i < len; i++)
+    text_storage[i++] = (unsigned char) (*str);
+  runs[0].ptr       = text_storage;
+  runs[0].charset   = Vcharset_ascii;
+  runs[0].dimension = 1;
+  runs[0].len       = len;
+  return 1;
+#else /* MULE */
   Lisp_Object prev_charset = Qunbound; /* not Qnil because that is a
 					  possible valid charset when
 					  MULE is not defined */
   int runs_so_far = 0;
   int i;
-#ifdef MULE
+#ifdef HAVE_CCL
   struct ccl_program char_converter;
   int need_ccl_conversion = 0;
-#endif
+#endif /* HAVE_CCL */

   for (i = 0; i < len; i++)
     {
＠＠ -128,48 +145,41 ＠＠
       Lisp_Object charset;
       int byte1, byte2;
       int dimension;
-      int graphic;

-      BREAKUP_ICHAR (ch, charset, byte1, byte2);
+      ichar_to_charset_codepoint (ch, get_unicode_precedence(), &charset,
+				  &byte1, &byte2);
       dimension = XCHARSET_DIMENSION (charset);
-      graphic   = XCHARSET_GRAPHIC   (charset);
+      /* We swap here rather than handling below because we also take CCL
+	 input, whigh does it the other way */
+      if (dimension == 1)
+	byte1 = byte2;

       if (!EQ (charset, prev_charset))
 	{
-	  run_storage[runs_so_far].ptr       = text_storage;
-	  run_storage[runs_so_far].charset   = charset;
-	  run_storage[runs_so_far].dimension = dimension;
+	  runs[runs_so_far].ptr       = text_storage;
+	  runs[runs_so_far].charset   = charset;
+	  runs[runs_so_far].dimension = dimension;

 	  if (runs_so_far)
 	    {
-	      run_storage[runs_so_far - 1].len =
-		text_storage - run_storage[runs_so_far - 1].ptr;
-	      if (run_storage[runs_so_far - 1].dimension == 2)
-		run_storage[runs_so_far - 1].len >>= 1;
+	      runs[runs_so_far - 1].len =
+		text_storage - runs[runs_so_far - 1].ptr;
+	      if (runs[runs_so_far - 1].dimension == 2)
+		runs[runs_so_far - 1].len >>= 1;
 	    }
 	  runs_so_far++;
 	  prev_charset = charset;
-#ifdef MULE
+#ifdef HAVE_CCL
 	  {
 	    Lisp_Object ccl_prog = XCHARSET_CCL_PROGRAM (charset);
-	    need_ccl_conversion = !NILP (ccl_prog);
-	    if (need_ccl_conversion)
-	      setup_ccl_program (&char_converter, ccl_prog);
+	    if ((!NILP (ccl_prog))
+		  && (setup_ccl_program (&char_converter, ccl_prog) >= 0))
+	      need_ccl_conversion = 1;
 	  }
 #endif
 	}

-      if (graphic == 0)
-	{
-	  byte1 &= 0x7F;
-	  byte2 &= 0x7F;
-	}
-      else if (graphic == 1)
-	{
-	  byte1 |= 0x80;
-	  byte2 |= 0x80;
-	}
-#ifdef MULE
+#ifdef HAVE_CCL
       if (need_ccl_conversion)
 	{
 	  char_converter.reg[0] = XCHARSET_ID (charset);
＠＠ -187,13 +197,14 ＠＠

   if (runs_so_far)
     {
-      run_storage[runs_so_far - 1].len =
-	text_storage - run_storage[runs_so_far - 1].ptr;
-      if (run_storage[runs_so_far - 1].dimension == 2)
-	run_storage[runs_so_far - 1].len >>= 1;
+      runs[runs_so_far - 1].len =
+	text_storage - runs[runs_so_far - 1].ptr;
+      if (runs[runs_so_far - 1].dimension == 2)
+	runs[runs_so_far - 1].len >>= 1;
     }

   return runs_so_far;
+#endif /* not MULE */
 }

 /****************************************************************************/
＠＠ -320,7 +331,8 ＠＠
       xpos = rb->xpos;
       width = 0;
       if (rb->type == RUNE_CHAR)
-	charset = ichar_charset (rb->object.chr.ch);
+	/* ＠＠#### fix me */
+	charset = ichar_charset_obsolete_me_baby_please (rb->object.chr.ch);
     }

   if (end < 0)
＠＠ -333,7 +345,9 ＠＠

       if (rb->findex == findex && rb->type == RUNE_CHAR
 	  && rb->object.chr.ch != '\n' && rb->cursor_type !=
CURSOR_ON
-	  && EQ (charset, ichar_charset (rb->object.chr.ch)))
+	  /* ＠＠#### fix me */
+	  && EQ (charset,
+		 ichar_charset_obsolete_me_baby_please (rb->object.chr.ch)))
 	{
 	  Dynarr_add (buf, rb->object.chr.ch);
 	  width += rb->width;
＠＠ -356,7 +370,9 ＠＠
 	    {
 	      findex = rb->findex;
 	      xpos = rb->xpos;
-	      charset = ichar_charset (rb->object.chr.ch);
+	      /* ＠＠#### fix me */
+	      charset =
+		ichar_charset_obsolete_me_baby_please (rb->object.chr.ch);

 	      if (rb->cursor_type == CURSOR_ON)
 		{
Index: src/redisplay-msw.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/redisplay-msw.c,v
retrieving revision 1.42
diff -u -r1.42 redisplay-msw.c
--- src/redisplay-msw.c	2005/01/24 23:34:07	1.42
+++ src/redisplay-msw.c	2005/11/22 14:00:58
＠＠ -97,11 +97,13 ＠＠
   if (len == 0)
     return 0;

-  prev_charset = ichar_charset (str[0]);
+  /* ＠＠#### fix me */
+  prev_charset = ichar_charset_obsolete_me_baby_please (str[0]);

   for (i = 1; i <= len; i++)
     {
-      if (i == len || !EQ (ichar_charset (str[i]), prev_charset))
+      if (i == len || !EQ (ichar_charset_obsolete_me_baby_please (str[i]),
+			   prev_charset))
 	{
 	  int j;
 	  Ibyte *int_storage =
＠＠ -132,7 +134,7 ＠＠
 	  runs_so_far++;
 	  runbegin = i;
 	  if (i < len)
-	    prev_charset = ichar_charset (str[i]);
+	    prev_charset = ichar_charset_obsolete_me_baby_please (str[i]);
 	}
     }

＠＠ -930,7 +932,8 ＠＠
   xpos = rb->xpos;
   width = 0;
   if (rb->type == RUNE_CHAR)
-    charset = ichar_charset (rb->object.chr.ch);
+    /* ＠＠#### fix me */
+    charset = ichar_charset_obsolete_me_baby_please (rb->object.chr.ch);

   if (end < 0)
     end = Dynarr_length (rba);
＠＠ -942,7 +945,9 ＠＠

       if (rb->findex == findex && rb->type == RUNE_CHAR
 	  && rb->object.chr.ch != '\n' && rb->cursor_type !=
CURSOR_ON
-	  && EQ (charset, ichar_charset (rb->object.chr.ch)))
+	  /* ＠＠#### fix me */
+	  && EQ (charset,
+		 ichar_charset_obsolete_me_baby_please (rb->object.chr.ch)))
 	{
 	  Dynarr_add (buf, rb->object.chr.ch);
 	  width += rb->width;
＠＠ -964,7 +969,9 ＠＠
 	    {
 	      findex = rb->findex;
 	      xpos = rb->xpos;
-	      charset = ichar_charset (rb->object.chr.ch);
+	      /* ＠＠#### fix me */
+	      charset =
+		ichar_charset_obsolete_me_baby_please (rb->object.chr.ch);

 	      if (rb->cursor_type == CURSOR_ON)
 		{
Index: src/redisplay-output.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/redisplay-output.c,v
retrieving revision 1.26
diff -u -r1.26 redisplay-output.c
--- src/redisplay-output.c	2005/01/24 23:34:07	1.26
+++ src/redisplay-output.c	2005/11/22 14:00:59
＠＠ -1,6 +1,6 ＠＠
 /* Synchronize redisplay structures and output changes.
    Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
-   Copyright (C) 1995, 1996, 2002, 2003 Ben Wing.
+   Copyright (C) 1995, 1996, 2002, 2003, 2005 Ben Wing.
    Copyright (C) 1996 Chuck Thompson.
    Copyright (C) 1999, 2002 Andy Piper.

＠＠ -510,8 +510,8 ＠＠
   block_end =
     (!Dynarr_length (ddb->runes)
      ? 0
-: (Dynarr_atp (ddb->runes, Dynarr_length (ddb->runes) - 1)->xpos +
-	Dynarr_atp (ddb->runes, Dynarr_length (ddb->runes) - 1)->width));
+: (Dynarr_lastp (ddb->runes)->xpos +
+	Dynarr_lastp (ddb->runes)->width));
 #endif

   /* If the new block type is not text and the cursor status is
＠＠ -687,7 +687,8 ＠＠
       cdba = NULL;
     }

-  ddl = Dynarr_atp (ddla, line);      /* assert line < Dynarr_length (ddla) */
+  /* The following will assert line < Dynarr_length (ddla) */
+  ddl = Dynarr_atp (ddla, line);
   ddba = ddl->display_blocks;

   if (force_start >= 0 && force_start >= ddl->bounds.left_out)
＠＠ -1547,16 +1548,15 ＠＠
 			struct display_line dl;	/* this is fake */
 			Lisp_Object string =
 			  IMAGE_INSTANCE_TEXT_STRING (childii);
-			unsigned char charsets[NUM_LEADING_BYTES];
-			struct face_cachel *cachel = WINDOW_FACE_CACHEL (w, findex);
+			struct face_cachel *cachel =
+			  WINDOW_FACE_CACHEL (w, findex);

-			find_charsets_in_ibyte_string (charsets,
-							 XSTRING_DATA (string),
-							 XSTRING_LENGTH (string));
-			ensure_face_cachel_complete (cachel, window, charsets);
-
 			convert_ibyte_string_into_ichar_dynarr
-			  (XSTRING_DATA (string), XSTRING_LENGTH (string), buf);
+			  (XSTRING_DATA (string), XSTRING_LENGTH (string),
+			   buf);
+			ensure_face_cachel_complete (cachel, window,
+						     Dynarr_atp (buf, 0),
+						     Dynarr_length (buf));

 			redisplay_normalize_display_box (&cdb, &cdga);
 			/* Offsets are now +ve again so be careful
＠＠ -1568,15 +1568,17 ＠＠
 			dl.ascent = glyph_ascent (child, image_instance);
 			dl.descent = glyph_descent (child, image_instance);
 			dl.top_clip = cdga.yoffset;
-			dl.clip = (dl.ypos + dl.descent) - (cdb.ypos + cdb.height);
+			dl.clip = (dl.ypos + dl.descent) -
+			  (cdb.ypos + cdb.height);
 			/* output_string doesn't understand offsets in
 			   the same way as other routines - we have to
 			   add the offset to the width so that we
 			   output the full string. */
-			MAYBE_DEVMETH (d, output_string, (w, &dl, buf, cdb.xpos,
-							  cdga.xoffset, cdb.xpos,
-							  cdga.width + cdga.xoffset,
-							  findex, 0, 0, 0, 0));
+			MAYBE_DEVMETH (d, output_string,
+				       (w, &dl, buf, cdb.xpos,
+					cdga.xoffset, cdb.xpos,
+					cdga.width + cdga.xoffset,
+					findex, 0, 0, 0, 0));
 			Dynarr_reset (buf);
 		      }
 		  }
Index: src/redisplay-x.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/redisplay-x.c,v
retrieving revision 1.40
diff -u -r1.40 redisplay-x.c
--- src/redisplay-x.c	2005/01/24 23:34:07	1.40
+++ src/redisplay-x.c	2005/11/22 14:00:59
＠＠ -2,7 +2,7 ＠＠
    Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
    Copyright (C) 1994 Lucid, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2002, 2003 Ben Wing.
+   Copyright (C) 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -31,6 +31,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "charset.h"
 #include "debug.h"
 #include "device-impl.h"
 #include "faces.h"
＠＠ -41,7 +42,7 ＠＠
 #include "sysdep.h"
 #include "window.h"

-#ifdef MULE
+#ifdef HAVE_CCL
 #include "mule-ccl.h"
 #endif

＠＠ -88,7 +89,9 ＠＠
 	    Solaris only support the Japanese locale if you get the
 	    special Asian-language version of the OS.  Yuck yuck
 	    yuck.  Linux doesn't support the Japanese locale at
-	    all.
+	    all (or didn't, at the time this was written, sometime
+	    in the 90's.  As of 2005, there is much better CJK
+	    support in Linux, as well as Unicode support).
 	 3) The locale support in X only exists in R5, not in R4.
 	    (Not sure how big of a problem this is: how many
 	    people are using R4?)
＠＠ -97,7 +100,7 ＠＠
 	    different OS's?  It's not even documented anywhere that
 	    I can find what the multi-byte text format for the
 	    Japanese locale under SunOS and Solaris is, but I assume
-	    it's EUC.
+	    it's EUC-JP.
       */

 struct textual_run
＠＠ -108,32 +111,46 ＠＠
   int dimension;
 };

-/* Separate out the text in DYN into a series of textual runs of a
-   particular charset.  Also convert the characters as necessary into
-   the format needed by XDrawImageString(), XDrawImageString16(), et
-   al.  (This means converting to one or two byte format, possibly
-   tweaking the high bits, and possibly running a CCL program.) You
-   must pre-allocate the space used and pass it in. (This is done so
-   you can ALLOCA () the space.)  You need to allocate (2 * len) bytes
-   of TEXT_STORAGE and (len * sizeof (struct textual_run)) bytes of
-   RUN_STORAGE, where LEN is the length of the dynarr.
+/* Separate out the text in STR (an array of Ichars, not a string
+   representation) of length LEN into a series of runs, stored in RUNS.
+   RUNS is guaranteed to hold enough space for all runs that could be
+   generated from this text.  Each run points to the a stretch of text
+   given simply by the position codes TEXT_STORAGE into a series of textual
+   runs of a particular charset.  Also convert the characters as necessary
+   into the format needed by XDrawImageString(), XDrawImageString16(), et
+   al.  (This means converting to one or two byte format, possibly tweaking
+   the high bits, and possibly running a CCL program.) You must
+   pre-allocate the space used and pass it in. (This is done so you can
+   ALLOCA () the space.)  You need to allocate (2 * len) bytes of
+   TEXT_STORAGE and (len * sizeof (struct textual_run)) bytes of
+   RUNS, where LEN is the length of the dynarr.

    Returns the number of runs actually used. */

 static int
 separate_textual_runs (unsigned char *text_storage,
-		       struct textual_run *run_storage,
+		       struct textual_run *runs,
 		       const Ichar *str, Charcount len)
 {
+#ifndef MULE
+  int i;
+  for (i = 0; i < len; i++)
+    text_storage[i++] = (unsigned char) (*str);
+  runs[0].ptr       = text_storage;
+  runs[0].charset   = Vcharset_ascii;
+  runs[0].dimension = 1;
+  runs[0].len       = len;
+  return 1;
+#else /* MULE */
   Lisp_Object prev_charset = Qunbound; /* not Qnil because that is a
 					  possible valid charset when
 					  MULE is not defined */
   int runs_so_far = 0;
   int i;
-#ifdef MULE
+#ifdef HAVE_CCL
   struct ccl_program char_converter;
   int need_ccl_conversion = 0;
-#endif
+#endif /* HAVE_CCL */

   for (i = 0; i < len; i++)
     {
＠＠ -141,28 +158,31 ＠＠
       Lisp_Object charset;
       int byte1, byte2;
       int dimension;
-      int graphic;

-      BREAKUP_ICHAR (ch, charset, byte1, byte2);
+      ichar_to_charset_codepoint (ch, get_unicode_precedence(), &charset,
+				  &byte1, &byte2);
       dimension = XCHARSET_DIMENSION (charset);
-      graphic   = XCHARSET_GRAPHIC   (charset);
+      /* We swap here rather than handling below because we also take CCL
+	 input, whigh does it the other way */
+      if (dimension == 1)
+	byte1 = byte2;

       if (!EQ (charset, prev_charset))
 	{
-	  run_storage[runs_so_far].ptr       = text_storage;
-	  run_storage[runs_so_far].charset   = charset;
-	  run_storage[runs_so_far].dimension = dimension;
+	  runs[runs_so_far].ptr       = text_storage;
+	  runs[runs_so_far].charset   = charset;
+	  runs[runs_so_far].dimension = dimension;

 	  if (runs_so_far)
 	    {
-	      run_storage[runs_so_far - 1].len =
-		text_storage - run_storage[runs_so_far - 1].ptr;
-	      if (run_storage[runs_so_far - 1].dimension == 2)
-		run_storage[runs_so_far - 1].len >>= 1;
+	      runs[runs_so_far - 1].len =
+		text_storage - runs[runs_so_far - 1].ptr;
+	      if (runs[runs_so_far - 1].dimension == 2)
+		runs[runs_so_far - 1].len >>= 1;
 	    }
 	  runs_so_far++;
 	  prev_charset = charset;
-#ifdef MULE
+#ifdef HAVE_CCL
 	  {
 	    Lisp_Object ccl_prog = XCHARSET_CCL_PROGRAM (charset);
 	    if ((!NILP (ccl_prog))
＠＠ -172,17 +192,7 ＠＠
 #endif
 	}

-      if (graphic == 0)
-	{
-	  byte1 &= 0x7F;
-	  byte2 &= 0x7F;
-	}
-      else if (graphic == 1)
-	{
-	  byte1 |= 0x80;
-	  byte2 |= 0x80;
-	}
-#ifdef MULE
+#ifdef HAVE_CCL
       if (need_ccl_conversion)
 	{
 	  char_converter.reg[0] = XCHARSET_ID (charset);
＠＠ -200,13 +210,14 ＠＠

   if (runs_so_far)
     {
-      run_storage[runs_so_far - 1].len =
-	text_storage - run_storage[runs_so_far - 1].ptr;
-      if (run_storage[runs_so_far - 1].dimension == 2)
-	run_storage[runs_so_far - 1].len >>= 1;
+      runs[runs_so_far - 1].len =
+	text_storage - runs[runs_so_far - 1].ptr;
+      if (runs[runs_so_far - 1].dimension == 2)
+	runs[runs_so_far - 1].len >>= 1;
     }

   return runs_so_far;
+#endif /* not MULE */
 }

 /****************************************************************************/
＠＠ -343,7 +354,8 ＠＠
   findex = rb->findex;
   xpos = rb->xpos;
   if (rb->type == RUNE_CHAR)
-    charset = ichar_charset (rb->object.chr.ch);
+    /* ＠＠#### fix me */
+    charset = ichar_charset_obsolete_me_baby_please (rb->object.chr.ch);

   if (end < 0)
     end = Dynarr_length (rba);
＠＠ -355,7 +367,9 ＠＠

       if (rb->findex == findex && rb->type == RUNE_CHAR
 	  && rb->object.chr.ch != '\n' && rb->cursor_type !=
CURSOR_ON
-	  && EQ (charset, ichar_charset (rb->object.chr.ch)))
+	  /* ＠＠#### fix me */
+	  && EQ (charset,
+		 ichar_charset_obsolete_me_baby_please (rb->object.chr.ch)))
 	{
 	  Dynarr_add (buf, rb->object.chr.ch);
 	  width += rb->width;
＠＠ -378,7 +392,9 ＠＠
 	    {
 	      findex = rb->findex;
 	      xpos = rb->xpos;
-	      charset = ichar_charset (rb->object.chr.ch);
+	      /* ＠＠#### fix me */
+	      charset =
+		ichar_charset_obsolete_me_baby_please (rb->object.chr.ch);

 	      if (rb->cursor_type == CURSOR_ON)
 		{
Index: src/redisplay.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/redisplay.c,v
retrieving revision 1.97
diff -u -r1.97 redisplay.c
--- src/redisplay.c	2005/10/25 11:16:27	1.97
+++ src/redisplay.c	2005/11/22 14:01:02
＠＠ -49,6 +49,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "charset.h"
 #include "commands.h"
 #include "debug.h"
 #include "device-impl.h"
＠＠ -630,13 +631,8 ＠＠
 redisplay_text_width_ichar_string (struct window *w, int findex,
 				    Ichar *str, Charcount len)
 {
-  unsigned char charsets[NUM_LEADING_BYTES];
-  Lisp_Object window;
-
-  find_charsets_in_ichar_string (charsets, str, len);
-  window = wrap_window (w);
-  ensure_face_cachel_complete (WINDOW_FACE_CACHEL (w, findex), window,
-			       charsets);
+  ensure_face_cachel_complete (WINDOW_FACE_CACHEL (w, findex), wrap_window (w),
+			       str, len);
   return DEVMETH (XDEVICE (FRAME_DEVICE (XFRAME (WINDOW_FRAME (w)))),
 		  text_width, (XFRAME (WINDOW_FRAME (w)),
 			       WINDOW_FACE_CACHEL (w, findex), str, len));
＠＠ -667,7 +663,6 ＠＠
 				   Ibyte *nonreloc, Lisp_Object reloc,
 				   Bytecount offset, Bytecount len)
 {
-  unsigned char charsets[NUM_LEADING_BYTES];
   Lisp_Object frame;
   struct face_cachel cachel;

＠＠ -679,11 +674,12 ＠＠
   if (STRINGP (reloc))
     nonreloc = XSTRING_DATA (reloc);
   convert_ibyte_string_into_ichar_dynarr (nonreloc, len, rtw_ichar_dynarr);
-  find_charsets_in_ibyte_string (charsets, nonreloc, len);
   reset_face_cachel (&cachel);
   cachel.face = face;
   frame = wrap_frame (f);
-  ensure_face_cachel_complete (&cachel, frame, charsets);
+  ensure_face_cachel_complete (&cachel, frame,
+			       Dynarr_atp (rtw_ichar_dynarr, 0),
+			       Dynarr_length (rtw_ichar_dynarr));
   return DEVMETH (XDEVICE (FRAME_DEVICE (f)),
 		  text_width, (f, &cachel, Dynarr_atp (rtw_ichar_dynarr, 0),
 			       Dynarr_length (rtw_ichar_dynarr)));
＠＠ -1098,7 +1094,8 ＠＠
     }
   else
     {
-      Lisp_Object charset = ichar_charset (data->ch);
+      /* ＠＠#### fix me */
+      Lisp_Object charset = ichar_charset_obsolete_me_baby_please (data->ch);
       if (!EQ (charset, data->last_charset) ||
 	  data->findex != data->last_findex)
 	{
＠＠ -1121,7 +1118,7 ＠＠
 	  fi = XFONT_INSTANCE (font_instance);
 	  if (!fi->proportional_p || data->font_is_bogus)
 	    {
-	      Ichar ch = data->font_is_bogus ? '~' : data->ch;
+	      Ichar ch = data->font_is_bogus ? CANT_DISPLAY_CHAR : data->ch;

 	      data->last_char_width =
 		redisplay_text_width_ichar_string (XWINDOW (data->window), 
＠＠ -1154,7 +1151,7 ＠＠

   if (Dynarr_length (data->db->runes) < Dynarr_largest (data->db->runes))
     {
-      crb = Dynarr_atp (data->db->runes, Dynarr_length (data->db->runes));
+      crb = Dynarr_past_lastp (data->db->runes);
       local = 0;
     }
   else
＠＠ -1183,7 +1180,7 ＠＠
     /* Text but not in buffer */
     crb->charpos = 0;
   crb->type = RUNE_CHAR;
-  crb->object.chr.ch = data->font_is_bogus ? '~' : data->ch;
+  crb->object.chr.ch = data->font_is_bogus ? CANT_DISPLAY_CHAR : data->ch;
   crb->endpos = 0;

   if (data->cursor_type == CURSOR_ON)
＠＠ -2431,7 +2428,7 ＠＠
 		 appropriate processing to not get a continuation
 		 glyph. */
 	      if (*prop != ADD_FAILED 
-		  && Dynarr_atp (*prop, 0)->type == PROP_GLYPH
+		  && Dynarr_firstp (*prop)->type == PROP_GLYPH
 		  && data.ch == '\n')
 		{ 
 		  /* If there are no more glyphs then do the normal
＠＠ -2442,8 +2439,8 ＠＠
 		     this we would have to carry the index around
 		     which might be problematic since the fragment is
 		     recalculated for each line. */
-		  if (EQ (Dynarr_end (tmpglyphs)->glyph,
-			  Dynarr_atp (*prop, 0)->data.p_glyph.glyph))
+		  if (EQ (Dynarr_lastp (tmpglyphs)->glyph,
+			  Dynarr_firstp (*prop)->data.p_glyph.glyph))
 		    {
 		      Dynarr_free (*prop);
 		      *prop = 0;
＠＠ -2908,7 +2905,7 ＠＠
   db->start_pos = dl->bounds.left_in;
   if (Dynarr_length (db->runes))
     {
-      struct rune *rb = Dynarr_atp (db->runes, Dynarr_length (db->runes) - 1);
+      struct rune *rb = Dynarr_lastp (db->runes);

       db->end_pos = rb->xpos + rb->width;
     }
＠＠ -3823,8 +3820,7 ＠＠

   if (Dynarr_length (db->runes))
     {
-      struct rune *rb =
-        Dynarr_atp (db->runes, Dynarr_length (db->runes) - 1);
+      struct rune *rb = Dynarr_lastp (db->runes);
       c_pixpos = rb->xpos + rb->width;
     }
   else
＠＠ -3865,8 +3861,8 ＠＠
           if (Dynarr_atp (db->runes, elt)->type == RUNE_CHAR)
             {
               len += (set_itext_ichar
-                      (strdata + len, Dynarr_atp (db->runes,
-                                                  elt)->object.chr.ch));
+                      (strdata + len,
+		       Dynarr_atp (db->runes, elt)->object.chr.ch));
             }
         }

＠＠ -5152,7 +5148,7 ＠＠
   db->start_pos = dl->bounds.left_in;
   if (Dynarr_length (db->runes))
     {
-      struct rune *rb = Dynarr_atp (db->runes, Dynarr_length (db->runes) - 1);
+      struct rune *rb = Dynarr_lastp (db->runes);

       db->end_pos = rb->xpos + rb->width;
     }
＠＠ -6099,7 +6095,7 ＠＠
 	{
 	  if (!MINI_WINDOW_P (w) && scroll_on_clipped_lines)
 	    {
-	      dl = Dynarr_atp (dla, Dynarr_length (dla) - 1);
+	      dl = Dynarr_lastp (dla);

 	      if (point >= (dl->charpos + dl->offset)
 		  && point <= (dl->end_charpos + dl->offset))
＠＠ -7578,7 +7574,7 ＠＠
   if (gba)
     {
       glyph_block *gb = Dynarr_atp (gba, 0);
-      glyph_block *gb_last = Dynarr_atp (gba, Dynarr_length (gba));
+      glyph_block *gb_last = Dynarr_past_lastp (gba);

       for (; gb < gb_last; gb++)
 	{
＠＠ -7595,20 +7591,20 ＠＠
 void
 mark_redisplay_structs (display_line_dynarr *dla)
 {
-  display_line *dl = Dynarr_atp (dla, 0);
-  display_line *dl_last = Dynarr_atp (dla, Dynarr_length (dla));
+  display_line *dl = Dynarr_firstp (dla);
+  display_line *dl_last = Dynarr_past_lastp (dla);

   for (; dl < dl_last; dl++)
     {
       display_block_dynarr *dba = dl->display_blocks;
-      display_block *db = Dynarr_atp (dba, 0);
-      display_block *db_last = Dynarr_atp (dba, Dynarr_length (dba));
+      display_block *db = Dynarr_firstp (dba);
+      display_block *db_last = Dynarr_past_lastp (dba);

       for (; db < db_last; db++)
 	{
 	  rune_dynarr *ra = db->runes;
-	  rune *r = Dynarr_atp (ra, 0);
-	  rune *r_last = Dynarr_atp (ra, Dynarr_length (ra));
+	  rune *r = Dynarr_firstp (ra);
+	  rune *r_last = Dynarr_past_lastp (ra);

 	  for (; r < r_last; r++)
 	    {
＠＠ -7727,7 +7723,7 ＠＠
   if (!Dynarr_length (cache))
     return -1;
   else
-    return Dynarr_atp (cache, Dynarr_length (cache) - 1)->end;
+    return Dynarr_lastp (cache)->end;
 }

 /* Return the index of the line POINT is contained within in window
＠＠ -8364,9 +8360,8 ＠＠
 	  return;
 	}

-      start = Dynarr_atp (internal_cache, 0)->start;
-      end =
-	Dynarr_atp (internal_cache, Dynarr_length (internal_cache) - 1)->end;
+      start = Dynarr_firstp (internal_cache)->start;
+      end = Dynarr_lastp (internal_cache)->end;

       /* We aren't allowed to generate additional information to fill in
          gaps, so if the DESIRED structs don't overlap the cache, reset the
＠＠ -8528,11 +8523,11 ＠＠

   /* Readjust the high_bound to account for any changes made while
      correcting the low_bound. */
-  high_bound = Dynarr_atp (cache, Dynarr_length (cache) - 1)->end;
+  high_bound = Dynarr_lastp (cache)->end;

   if (to > high_bound)
     {
-      Charbpos startp = Dynarr_atp (cache, Dynarr_length (cache) - 1)->end + 1;
+      Charbpos startp = Dynarr_lastp (cache)->end + 1;

       do
 	{
＠＠ -8544,7 +8539,7 ＠＠

 	  Dynarr_add_many (cache, Dynarr_atp (internal_cache, 0),
 			   Dynarr_length (internal_cache));
-	  high_bound = Dynarr_atp (cache, Dynarr_length (cache) - 1)->end;
+	  high_bound = Dynarr_lastp (cache)->end;
 	  startp = high_bound + 1;
 	}
       while (to > high_bound);
＠＠ -8634,7 +8629,7 ＠＠

       if (Dynarr_length (dla))
 	{
-	  struct display_line *dl = Dynarr_atp (dla, Dynarr_length (dla) - 1);
+	  struct display_line *dl = Dynarr_lastp (dla);
 	  *pix_y = dl->ypos + dl->descent - dl->clip;
 	}
       else
＠＠ -9051,13 +9046,9 ＠＠
 		  else
 		    {
 		      if (dl->modeline)
-			*modeline_closest =
-			  Dynarr_atp (db->runes,
-				      Dynarr_length (db->runes) - 1)->charpos;
+			*modeline_closest = Dynarr_lastp (db->runes)->charpos;
 		      else
-			*closest =
-			  Dynarr_atp (db->runes,
-				      Dynarr_length (db->runes) - 1)->charpos;
+			*closest = Dynarr_lastp (db->runes)->charpos;
 		    }

 		  if (dl->modeline)
Index: src/regex.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/regex.c,v
retrieving revision 1.56
diff -u -r1.56 regex.c
--- src/regex.c	2005/03/09 04:59:31	1.56
+++ src/regex.c	2005/11/22 14:01:04
＠＠ -5,7 +5,7 ＠＠

    Copyright (C) 1993, 1994, 1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 1995, 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 1995, 2001, 2002, 2003, 2005 Ben Wing.

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
＠＠ -60,6 +60,7 ＠＠

 #include "lisp.h"
 #include "buffer.h"
+#include "casetab.h"
 #include "syntax.h"

 #if (defined (DEBUG_XEMACS) && !defined (DEBUG))
＠＠ -640,7 +641,9 ＠＠
   ,charset_mule, /* Matches any character belonging to specified set.
 		    The set is stored in "unified range-table
 		    format"; see rangetab.c.  Unlike the `charset'
-		    opcode, this can handle arbitrary characters. */
+		    opcode, this can handle arbitrary characters.
+		    NOTE: This has nothing to do with the `charset' object,
+		    despite its name. */

   charset_mule_not   /* Same parameters as charset_mule, but match any
 			character that is not one of those specified.  */
＠＠ -3647,31 +3650,49 ＠＠
   if (range_start > range_end)
     return syntax & RE_NO_EMPTY_RANGES ? REG_ERANGE : REG_NOERROR;

+#ifndef UNICODE_INTERNAL
   /* Can't have ranges spanning different charsets, except maybe for
-     ranges entirely within the first 256 chars. */
+     ranges entirely within the first 256 chars. (The intent of this is that
+     the effect of such a range would be unpredictable, since there is no
+     well-defined ordering over charsets and the particular assignment of
+     charset ID's is arbitrary.) This does not apply to Unicode, with
+     well-defined character values. */

   if ((range_start >= 0x100 || range_end >= 0x100)
-      && ichar_leading_byte (range_start) !=
-      ichar_leading_byte (range_end))
+      && !EQ (old_mule_ichar_charset (range_start),
+	      old_mule_ichar_charset (range_end)))
     return REG_ERANGESPAN;
+#endif /* not UNICODE_INTERNAL */

-  /* #### This might be way inefficient if the range encompasses 10,000
-     chars or something.  To be efficient, you'd have to do something like
-     this:
-
-     range_table a;
-     range_table b;
-     map over translation table in [range_start, range_end] of
-       (put the mapped range in a;
-        put the translation in b)
-     invert the range in a and truncate to [range_start, range_end]
-     compute the union of a, b
-     union the result into rtab
-   */
-  for (this_char = range_start; this_char <= range_end; this_char++)
+  if (TRANSLATE_P (translate))
     {
-      SET_RANGETAB_BIT (RE_TRANSLATE (this_char));
+      /* #### This might be way inefficient if the range encompasses 10,000
+	 chars or something.  To be efficient, you'd have to do something like
+	 this:
+
+	 range_table a
+	 range_table b;
+	 map_char_table (translation table, [range_start, range_end]) of
+           lambda (ch, translation):
+             put (ch, Qt) in a
+             put (translation, Qt) in b
+	 invert the range in a and truncate to [range_start, range_end]
+	 put the union of a, b in rtab
+
+	 This is to say, we want to map every character that has a translation
+	 to its translation, and other characters to themselves.
+
+	 This assumes, as is reasonable in practice, that a translation
+	 table maps individual characters to their translation, and does
+	 not generally map multiple characters to the same translation.
+   */
+      for (this_char = range_start; this_char <= range_end; this_char++)
+	{
+	  SET_RANGETAB_BIT (RE_TRANSLATE (this_char));
+	}
     }
+  else
+    put_range_table (rtab, range_start, range_end, Qt);

   if (this_char <= range_end)
     put_range_table (rtab, this_char, range_end, Qt);
＠＠ -3802,8 +3823,9 ＠＠
 	  for (j = *p * BYTEWIDTH; j < 0x80; j++)
             fastmap[j] = 1;
 	  /* And all extended characters must be allowed, too. */
-	  for (j = 0x80; j < 0xA0; j++)
-	    fastmap[j] = 1;
+	  for (j = 0x80; j < 0x100; j++)
+	    if (ibyte_first_byte_p (j))
+	      fastmap[j] = 1;
 #else /* not MULE */
 	  for (j = *p * BYTEWIDTH; j < (1 << BYTEWIDTH); j++)
             fastmap[j] = 1;
＠＠ -3818,18 +3840,18 ＠＠
 	case charset_mule:
 	  {
 	    int nentries;
-	    int i;

 	    nentries = unified_range_table_nentries (p);
-	    for (i = 0; i < nentries; i++)
+	    for (j = 0; j < nentries; j++)
 	      {
 		EMACS_INT first, last;
 		Lisp_Object dummy_val;
 		int jj;
 		Ibyte strr[MAX_ICHAR_LEN];

-		unified_range_table_get_range (p, i, &first, &last,
+		unified_range_table_get_range (p, j, &first, &last,
 					       &dummy_val);
+#ifndef UNICODE_INTERNAL
 		for (jj = first; jj <= last && jj < 0x80; jj++)
 		  fastmap[jj] = 1;
 		/* Ranges below 0x100 can span charsets, but there
＠＠ -3842,24 +3864,47 ＠＠
 		    set_itext_ichar (strr, last);
 		    fastmap[*strr] = 1;
 		  }
+#else
+		/* Ranges can span charsets.  We depend on the fact that
+		   lead bytes are monotonically non-decreasing as
+		   character values increase.  ＠＠#### This is a fairly
+		   reasonable assumption in general (but DOES NOT WORK in
+		   old Mule due to the ordering of private dimension-1
+		   chars before official dimension-2 chars), and introduces
+		   a dependency on the particular representation. */
+		{
+		  Ibyte strrlast[MAX_ICHAR_LEN];
+		  set_itext_ichar (strr, first);
+		  set_itext_ichar (strrlast, last);
+		  for (jj = *strr; jj <= *strrlast; jj++)
+		    fastmap[*strr] = 1;
+		}
+#endif /* not UNICODE_INTERNAL */
 	      }
+	    /* If it's not a possible first byte, it can't be in the fastmap.
+	       In UTF-8, lead bytes are not contiguous with ASCII, so a
+	       range spanning the ASCII/non-ASCII boundary will put
+	       extraneous bytes in the range [0x80 - 0xBF] in the fastmap. */
+	    for (j = 0x80; j < 0x100; j++)
+	      if (!ibyte_first_byte_p (j))
+		fastmap[j] = 0;
 	  }
 	  break;

 	case charset_mule_not:
 	  {
 	    int nentries;
-	    int i;
+	    int smallest_prev = 0;

 	    nentries = unified_range_table_nentries (p);
-	    for (i = 0; i < nentries; i++)
+#ifdef UNICODE_INTERNAL
+	    for (j = 0; j < nentries; j++)
 	      {
 		EMACS_INT first, last;
 		Lisp_Object dummy_val;
 		int jj;
-		int smallest_prev = 0;

-		unified_range_table_get_range (p, i, &first, &last,
+		unified_range_table_get_range (p, j, &first, &last,
 					       &dummy_val);
 		for (jj = smallest_prev; jj < first && jj < 0x80; jj++)
 		  fastmap[jj] = 1;
＠＠ -3867,11 +3912,62 ＠＠
 		if (smallest_prev >= 0x80)
 		  break;
 	      }
-	    /* Calculating which leading bytes are actually allowed
+	    /* Calculating which lead bytes are actually allowed
 	       here is rather difficult, so we just punt and allow
-	       all of them. */
-	    for (i = 0x80; i < 0xA0; i++)
-	      fastmap[i] = 1;
+	       all of them.
+	    */
+	    for (j = 0x80; j < 0x100; j++)
+	      if (ibyte_first_byte_p (j))
+		fastmap[j] = 1;
+#else
+	    for (j = 0; j < nentries; j++)
+	      {
+		EMACS_INT first, last;
+		/* This denotes a range of lead bytes that are not
+		   in the fastmap. */
+		int firstlead, lastlead;
+		Lisp_Object dummy_val;
+		int jj;
+
+		unified_range_table_get_range (p, j, &first, &last,
+					       &dummy_val);
+		/* With Unicode-internal, lead bytes that are entirely
+		   within the range and not including the beginning or end
+		   are definitely not in the fastmap.  Leading bytes that
+		   include the beginning or ending characters will be in
+		   the fastmap unless the beginning or ending characters
+		   are the first or last character, respectively, that uses
+		   this lead byte. ＠＠#### We should try to determine
+		   whether this is the case.  Currently we just assume it's
+		   not. */
+		if (first < 0x80)
+		  firstlead = first;
+		else
+		  {
+		    Ibyte strr[MAX_ICHAR_LEN];
+		    set_itext_ichar (strr, first);
+		    firstlead = *strr + 1;
+		  }
+		if (last < 0x80)
+		  lastlead = last;
+		else
+		  {
+		    Ibyte strr[MAX_ICHAR_LEN];
+		    set_itext_ichar (strr, last);
+		    lastlead = *strr - 1;
+		  }
+		for (jj = smallest_prev; jj < firstlead; jj++)
+		  fastmap[jj] = 1;
+		smallest_prev = last + 1;
+	      }
+	    /* If it's not a possible first byte, it can't be in the fastmap.
+	       In UTF-8, lead bytes are not contiguous with ASCII, so a
+	       range spanning the ASCII/non-ASCII boundary will put
+	       extraneous bytes in the range [0x80 - 0xBF] in the fastmap. */
+	    for (j = 0x80; j < 0x100; j++)
+	      if (!ibyte_first_byte_p (j))
+		fastmap[j] = 0;
+#endif /* UNICODE_INTERNAL */
 	  }
 	  break;
 #endif /* MULE */
＠＠ -3885,8 +3981,9 ＠＠
 #ifdef MULE
 	    /* "anything" only includes bytes that can be the
 	       first byte of a character. */
-	    for (j = 0; j < 0xA0; j++)
-	      fastmap[j] = 1;
+	    for (j = 0; j < 0x100; j++)
+	      if (ibyte_first_byte_p (j))
+		fastmap[j] = 1;
 #else
 	    for (j = 0; j < (1 << BYTEWIDTH); j++)
 	      fastmap[j] = 1;
＠＠ -3951,25 +4048,12 ＠＠
 		(XCHAR_TABLE (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf)), j) ==
 		(enum syntaxcode) k)
 	      fastmap[j] = 1;
-	  for (j = 0x80; j < 0xA0; j++)
-	    {
-	      if (leading_byte_prefix_p ((unsigned char) j))
-		/* too complicated to calculate this right */
-		fastmap[j] = 1;
-	      else
-		{
-		  int multi_p;
-		  Lisp_Object cset;
-
-		  cset = charset_by_leading_byte (j);
-		  if (CHARSETP (cset))
-		    {
-		      if (charset_syntax (lispbuf, cset, &multi_p)
-			  == Sword || multi_p)
-			fastmap[j] = 1;
-		    }
-		}
-	    }
+	  /* ＠＠#### To be correct, we need to set the fastmap for any
+	     lead byte any of whose characters can have this syntax code.
+	     This is hard to calculate so we just punt for now. */
+	  for (j = 0x80; j < 0x100; j++)
+	    if (ibyte_first_byte_p (j))
+	      fastmap[j] = 1;
 #else /* not MULE */
 	  for (j = 0; j < (1 << BYTEWIDTH); j++)
 	    if (SYNTAX
＠＠ -3990,25 +4074,12 ＠＠
 		 (BUFFER_MIRROR_SYNTAX_TABLE (lispbuf)), j) !=
 		(enum syntaxcode) k)
 	      fastmap[j] = 1;
-	  for (j = 0x80; j < 0xA0; j++)
-	    {
-	      if (leading_byte_prefix_p ((unsigned char) j))
-		/* too complicated to calculate this right */
-		fastmap[j] = 1;
-	      else
-		{
-		  int multi_p;
-		  Lisp_Object cset;
-
-		  cset = charset_by_leading_byte (j);
-		  if (CHARSETP (cset))
-		    {
-		      if (charset_syntax (lispbuf, cset, &multi_p)
-			  != Sword || multi_p)
-			fastmap[j] = 1;
-		    }
-		}
-	    }
+	  /* ＠＠#### To be correct, we need to set the fastmap for any
+	     lead byte all of whose characters do not have this syntax code.
+	     This is hard to calculate so we just punt for now. */
+	  for (j = 0x80; j < 0x100; j++)
+	    if (ibyte_first_byte_p (j))
+	      fastmap[j] = 1;
 #else /* not MULE */
 	  for (j = 0; j < (1 << BYTEWIDTH); j++)
 	    if (SYNTAX
Index: src/search.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/search.c,v
retrieving revision 1.46
diff -u -r1.46 search.c
--- src/search.c	2005/01/24 23:34:09	1.46
+++ src/search.c	2005/11/22 14:01:05
＠＠ -1,7 +1,7 ＠＠
 /* String search routines for XEmacs.
    Copyright (C) 1985, 1986, 1987, 1992-1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2001, 2002 Ben Wing.
+   Copyright (C) 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -30,18 +30,15 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "insdel.h"
 #include "opaque.h"
+#include "regex.h"
 #ifdef REGION_CACHE_NEEDS_WORK
 #include "region-cache.h"
 #endif
 #include "syntax.h"

-#include <sys/types.h>
-#include "regex.h"
-#include "casetab.h"
-#include "chartab.h"
-
 #define TRANSLATE(table, pos)	\
  (!NILP (table) ? TRT_TABLE_OF (table, (Ichar) pos) : pos)

＠＠ -54,7 +51,7 ＠＠
   struct regexp_cache *next;
   Lisp_Object regexp;
   struct re_pattern_buffer buf;
-  char fastmap[0400];
+  char fastmap[256];
   /* Nonzero means regexp was compiled to do full POSIX backtracking.  */
   char posix;
 };
＠＠ -131,7 +128,8 ＠＠
 static Charbpos boyer_moore (struct buffer *buf, Ibyte *base_pat,
 			     Bytecount len, Bytebpos pos, Bytebpos lim,
 			     EMACS_INT n, Lisp_Object trt,
-			     Lisp_Object inverse_trt, int charset_base);
+			     Lisp_Object inverse_trt, Ibyte *char_base,
+                             int char_base_len);
 static Charbpos search_buffer (struct buffer *buf, Lisp_Object str,
 			       Charbpos charbpos, Charbpos buflim, EMACS_INT n,
 			       int RE, Lisp_Object trt,
＠＠ -818,7 +816,7 ＠＠
      directly.  For other characters, we do it the "hard" way.
      Note that this way works for all characters but the other
      way is faster. */
-  if (target >= 0200)
+  if (target >= 128)
     {
       while (st < lim && count > 0)
 	{
＠＠ -871,7 +869,7 ＠＠
   REGISTER Ichar c;
   /* We store the first 256 chars in an array here and the rest in
      a range table. */
-  unsigned char fastmap[0400];
+  unsigned char fastmap[256];
   int negate = 0;
   REGISTER int i;
   Charbpos limit;
＠＠ -912,7 +910,7 ＠＠
       INC_IBYTEPTR (p);
       if (syntaxp)
 	{
-	  if (c < 0400 && syntax_spec_code[c] < (unsigned char) Smax)
+	  if (c < 256 && syntax_spec_code[c] < (unsigned char) Smax)
 	    fastmap[c] = 1;
 	  else
 	    invalid_argument ("Invalid syntax designator", make_char (c));
＠＠ -933,7 +931,7 ＠＠
 	      p++;
 	      if (p == pend) break;
 	      cend = itext_ichar (p);
-	      while (c <= cend && c < 0400)
+	      while (c <= cend && c < 256)
 		{
 		  fastmap[c] = 1;
 		  c++;
＠＠ -945,7 +943,7 ＠＠
 	    }
 	  else
 	    {
-	      if (c < 0400)
+	      if (c < 256)
 		fastmap[c] = 1;
 	      else
 		Fput_range_table (make_int (c), make_int (c), Qt,
＠＠ -1018,7 +1016,7 ＠＠
 	    while (pos < limit)
 	      {
 		Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
-		if ((ch < 0400) ? fastmap[ch] :
+		if ((ch < 256) ? fastmap[ch] :
 		    (NILP (Fget_range_table (make_int (ch),
 					     Vskip_chars_range_table,
 					     Qnil))
＠＠ -1040,7 +1038,7 ＠＠

 		DEC_BYTEBPOS (buf, prev_pos_byte);
 		ch = BYTE_BUF_FETCH_CHAR (buf, prev_pos_byte);
-		if ((ch < 0400) ? fastmap[ch] :
+		if ((ch < 256) ? fastmap[ch] :
 		    (NILP (Fget_range_table (make_int (ch),
 					     Vskip_chars_range_table,
 					     Qnil))
＠＠ -1338,7 +1336,8 ＠＠
     }
   else				/* non-RE case */
     {
-      int charset_base = -1;
+      int char_base_len = -1;
+      Ibyte char_base[MAX_ICHAR_LEN];
       int boyer_moore_ok = 1;
       Ibyte *pat = 0;
       Ibyte *patbuf = alloca_ibytes (len * MAX_ICHAR_LEN);
＠＠ -1364,21 +1363,35 ＠＠
 	  inverse = TRANSLATE (inverse_trt, c);

 	  orig_bytelen = itext_ichar_len (base_pat);
-	  inv_bytelen = set_itext_ichar (tmp_str, inverse);
+	  inv_bytelen = ichar_len (inverse);
 	  new_bytelen = set_itext_ichar (tmp_str, translated);

 	  if (new_bytelen != orig_bytelen || inv_bytelen != orig_bytelen)
 	    boyer_moore_ok = 0;
 	  if (translated != c || inverse != c)
 	    {
-	      /* Keep track of which character set row
-		 contains the characters that need translation.  */
-	      int charset_base_code = c & ~ICHAR_FIELD3_MASK;
-	      if (charset_base == -1)
-		charset_base = charset_base_code;
-	      else if (charset_base != charset_base_code)
-		/* If two different rows appear, needing translation,
-		   then we cannot use boyer_moore search.  */
+	      /* Track the original character in string char representation
+		 (minus final byte); we will compare it against each other
+		 character (again minus final byte), to see if they're the
+		 same. */
+	      if (char_base_len == -1)
+		{
+		  char_base_len = orig_bytelen;
+		  if (char_base_len > 1)
+		    memcpy (char_base, base_pat, char_base_len - 1);
+		}
+	      else if (char_base_len != orig_bytelen ||
+		       /* Are two strings different? When we have only a
+			  single byte to compare, don't try calling memcmp()
+			  with zero size, to just the zero-size strings are
+			  the same */
+		       (char_base_len > 1 ?
+			memcmp (char_base, base_pat, char_base_len - 1) : 0))
+		/* If two different characters appear, needing translation
+		   but differing in one of the non-final bytes, then we
+		   cannot use boyer_moore search. #### Either explain why
+		   it's not possible or not worth it to extend Boyer-moore to
+		   eliminate this restriction, or go ahead and eliminate it. */
 		boyer_moore_ok = 0;
 	    }
 	  memcpy (pat, tmp_str, new_bytelen);
＠＠ -1404,7 +1417,7 ＠＠
       pat = base_pat = patbuf;
       if (boyer_moore_ok)
 	return boyer_moore (buf, base_pat, len, pos, lim, n,
-			    trt, inverse_trt, charset_base);
+			    trt, inverse_trt, char_base, char_base_len);
       else
 	return simple_search (buf, base_pat, len, pos, lim, n, trt);
     }
＠＠ -1545,7 +1558,8 ＠＠
 static Charbpos
 boyer_moore (struct buffer *buf, Ibyte *base_pat, Bytecount len,
 	     Bytebpos pos, Bytebpos lim, EMACS_INT n, Lisp_Object trt,
-	     Lisp_Object inverse_trt, int USED_IF_MULE (charset_base))
+	     Lisp_Object inverse_trt, Ibyte *USED_IF_MULE (char_base),
+	     int USED_IF_MULE (char_base_len))
 {
   /* &&#### needs some 8-bit work here */
   /* #### Someone really really really needs to comment the workings
＠＠ -1586,14 +1600,14 ＠＠
   REGISTER EMACS_INT i, j;
   Ibyte *pat, *pat_end;
   REGISTER Ibyte *cursor, *p_limit, *ptr2;
-  Ibyte simple_translate[0400];
+  Ibyte simple_translate[256];
   REGISTER int direction = ((n > 0) ? 1 : -1);
 #ifdef MULE
-  Ibyte translate_prev_byte = 0;
-  Ibyte translate_anteprev_byte = 0;
+  Ibyte translate_prev[MAX_ICHAR_LEN];
+  Ibyte translate_prev_num = 0;
 #endif
 #ifdef C_ALLOCA
-  EMACS_INT BM_tab_space[0400];
+  EMACS_INT BM_tab_space[256];
   BM_tab = &BM_tab_space[0];
 #else
   BM_tab = alloca_array (EMACS_INT, 256);
＠＠ -1635,7 +1649,7 ＠＠
   if (direction < 0)
     base_pat = pat_end - 1;
   BM_tab_base = BM_tab;
-  BM_tab += 0400;
+  BM_tab += 256;
   j = dirlen;		/* to get it in a register */
   /* A character that does not appear in the pattern induces a
      stride equal to the pattern length. */
＠＠ -1647,10 +1661,10 ＠＠
       *--BM_tab = j;
     }
   /* We use this for translation, instead of TRT itself.  We
-     fill this in to handle the characters that actually occur
+     fill this in to handle the bytes that actually occur
      in the pattern.  Others don't matter anyway!  */
   xzero (simple_translate);
-  for (i = 0; i < 0400; i++)
+  for (i = 0; i < 256; i++)
     simple_translate[i] = (Ibyte) i;
   i = 0;

＠＠ -1663,7 +1677,7 ＠＠
       if (!NILP (trt))
 	{
 #ifdef MULE
-	  Ichar ch, untranslated;
+	  Ichar ch;
 	  int this_translated = 1;

 	  /* Is *PTR the last byte of a character?  */
＠＠ -1672,16 +1686,24 ＠＠
 	      Ibyte *charstart = ptr;
 	      while (!ibyte_first_byte_p (*charstart))
 		charstart--;
-	      untranslated = itext_ichar (charstart);
-	      if (charset_base == (untranslated & ~ICHAR_FIELD3_MASK))
+
+	      if (char_base_len != itext_ichar_len (charstart) ||
+		  /* Are two strings different? When we have only a
+		     single byte to compare, don't try calling memcmp()
+		     with zero size, to just the zero-size strings are
+		     the same */
+		  (char_base_len > 1 ?
+		   memcmp (char_base, charstart, char_base_len - 1) : 0))
 		{
+		  Ibyte *ptr2 = ptr;
+		  Ichar untranslated = itext_ichar (charstart);
 		  ch = TRANSLATE (trt, untranslated);
-		  if (!ibyte_first_byte_p (*ptr))
-		    {
-		      translate_prev_byte = ptr[-1];
-		      if (!ibyte_first_byte_p (translate_prev_byte))
-			translate_anteprev_byte = ptr[-2];
-		    }
+		  /* We set everything to zero.  Since we use translate_prev
+		     only for storing parts of multi-byte characters, there
+		     won't be any zero's in them. */
+		  xzero (translate_prev);
+		  while (!ibyte_first_byte_p (*ptr2))
+		    translate_prev[translate_prev_num++] = *--ptr2;
 		}
 	      else
 		{
＠＠ -1694,8 +1716,8 ＠＠
 	      ch = *ptr;
 	      this_translated = 0;
 	    }
-	  if (ch > 0400)
-	    j = ((unsigned char) ch | 0200);
+	  if (ch > 256)
+	    j = ((unsigned char) ch | 128);
 	  else
 	    j = (unsigned char) ch;

＠＠ -1711,8 +1733,8 ＠＠
 	      while (1)
 		{
 		  ch = TRANSLATE (inverse_trt, ch);
-		  if (ch > 0400)
-		    j = ((unsigned char) ch | 0200);
+		  if (ch > 256)
+		    j = ((unsigned char) ch | 128);
 		  else
 		    j = (unsigned char) ch;

＠＠ -1845,15 +1867,29 ＠＠
 		  while ((i -= direction) + direction != 0)
 		    {
 #ifdef MULE
+		      int dotrans;
 		      Ichar ch;
 		      cursor -= direction;
 		      /* Translate only the last byte of a character.  */
-		      if ((cursor == tail_end_ptr
-			   || ibyte_first_byte_p (cursor[1]))
-			  && (ibyte_first_byte_p (cursor[0])
-			      || (translate_prev_byte == cursor[-1]
-				  && (ibyte_first_byte_p (translate_prev_byte)
-				      || translate_anteprev_byte == cursor[-2]))))
+		      dotrans = (cursor == tail_end_ptr
+				 || ibyte_first_byte_p (cursor[1]));
+		      if (dotrans)
+			{
+			  int i = 0;
+			  Ibyte *curs2 = cursor;
+			  while (1)
+			    {
+			      if (ibyte_first_byte_p (*curs2))
+				break;
+			      if (translate_prev[i++] != *--curs2)
+				{
+				  dotrans = 0;
+				  break;
+				}
+			    }
+			}
+
+		      if (dotrans)
 			ch = simple_translate[*cursor];
 		      else
 			ch = *cursor;
＠＠ -1934,16 +1970,28 ＠＠
 #ifdef MULE
 		  Ichar ch;
 		  Ibyte *ptr;
-#endif
+		  int dotrans;
+
 		  pos -= direction;
-#ifdef MULE
 		  ptr = BYTE_BUF_BYTE_ADDRESS_NO_VERIFY (buf, pos);
-		  if ((ptr == tail_end_ptr
-		       || ibyte_first_byte_p (ptr[1]))
-		      && (ibyte_first_byte_p (ptr[0])
-			  || (translate_prev_byte == ptr[-1]
-			      && (ibyte_first_byte_p (translate_prev_byte)
-				  || translate_anteprev_byte == ptr[-2]))))
+		  dotrans = (ptr == tail_end_ptr
+			     || ibyte_first_byte_p (ptr[1]));
+		  if (dotrans)
+		    {
+		      int i = 0;
+		      Ibyte *ptr2 = ptr;
+		      while (1)
+			{
+			  if (ibyte_first_byte_p (*ptr2))
+			    break;
+			  if (translate_prev[i++] != *--ptr2)
+			    {
+			      dotrans = 0;
+			      break;
+			    }
+			}
+		    }
+		  if (dotrans)
 		    ch = simple_translate[*ptr];
 		  else
 		    ch = *ptr;
＠＠ -1951,6 +1999,7 ＠＠
 		    break;

 #else
+		  pos -= direction;
 		  if (pat[i] !=
 		      TRANSLATE (trt,
 				 *BYTE_BUF_BYTE_ADDRESS_NO_VERIFY (buf, pos)))
＠＠ -2037,7 +2086,7 ＠＠
   Charcount i, len;
   EMACS_INT punct_count = 0, word_count = 0;
   struct buffer *buf = decode_buffer (buffer, 0);
-  Lisp_Object syntax_table = buf->mirror_syntax_table;
+  Lisp_Object syntax_table = BUFFER_MIRROR_SYNTAX_TABLE (buf);

   CHECK_STRING (string);
   len = string_char_length (string);
＠＠ -2459,7 +2508,7 ＠＠
       buf = XBUFFER (buffer);
     }

-  syntax_table = buf->mirror_syntax_table;
+  syntax_table = BUFFER_MIRROR_SYNTAX_TABLE (buf);

   case_action = nochange;	/* We tried an initialization */
 				/* but some C compilers blew it */
Index: src/select-x.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/select-x.c,v
retrieving revision 1.23
diff -u -r1.23 select-x.c
--- src/select-x.c	2005/03/02 18:31:57	1.23
+++ src/select-x.c	2005/11/22 14:01:05
＠＠ -1,6 +1,6 ＠＠
 /* X Selection processing for XEmacs
    Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
-   Copyright (C) 2001, 2002 Ben Wing.
+   Copyright (C) 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -326,11 +326,10 ＠＠
 		continue;
 	      }

-	    if ((*ptr) == LEADING_BYTE_LATIN_ISO8859_1 ||
-		(*ptr) == LEADING_BYTE_CONTROL_1)
+	    if (itext_ichar (ptr) < 256)
 	      {
 		chartypes = LATIN_1;
-		ptr += 2;
+		INC_IBYTEPTR (ptr);
 		continue;
 	      }

＠＠ -1389,11 +1388,10 ＠＠
 	  continue;
 	}

-      if ((*ptr) == LEADING_BYTE_LATIN_ISO8859_1 ||
-	  (*ptr) == LEADING_BYTE_CONTROL_1)
+      if (itext_ichar (ptr) < 256)
 	{
 	  chartypes = LATIN_1;
-	  ptr += 2;
+	  INC_IBYTEPTR (ptr);
 	  continue;
 	}

Index: src/specifier.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/specifier.h,v
retrieving revision 1.18
diff -u -r1.18 specifier.h
--- src/specifier.h	2005/10/24 10:07:41	1.18
+++ src/specifier.h	2005/11/22 14:01:05
＠＠ -131,8 +131,9 ＠＠
      If this function is not present, then Fcopy_tree is used. */
   Lisp_Object (*copy_instantiator_method) (Lisp_Object instantiator);

-  /* Validate-matchspec method: Given a matchspec, verify that it's
-     valid for this specifier type.  If not, signal an error.
+  /* Validate-matchspec method: Given a matchspec (see
+     Fspecifier_matching_instance), verify that it's valid for this
+     specifier type.  If not, signal an error.

      If this function is not present, *no* matchspecs are considered
      valid.  Note that this differs from validate_method(). */
Index: src/symsinit.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/symsinit.h,v
retrieving revision 1.53
diff -u -r1.53 symsinit.h
--- src/symsinit.h	2005/10/04 17:51:24	1.53
+++ src/symsinit.h	2005/11/22 14:01:05
＠＠ -1,6 +1,6 ＠＠
 /* Various initialization function prototypes.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
-   Copyright (C) 2001, 2002 Ben Wing.
+   Copyright (C) 2001, 2002, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -140,9 +140,7 ＠＠
 void syms_of_lread (void);
 void syms_of_macros (void);
 void syms_of_marker (void);
-#ifdef MC_ALLOC
 void syms_of_mc_alloc (void);
-#endif /* MC_ALLOC */
 void syms_of_md5 (void);
 void syms_of_menubar (void);
 void syms_of_menubar_mswindows (void);
＠＠ -462,6 +460,7 ＠＠
 void complex_vars_of_faces (void);
 void complex_vars_of_mule_charset (void);
 void complex_vars_of_file_coding (void);
+void complex_vars_of_mule_coding (void);
 void complex_vars_of_intl_win32 (void);
 void complex_vars_of_glyphs (void);
 void complex_vars_of_glyphs_x (void);
Index: src/syntax.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/syntax.c,v
retrieving revision 1.24
diff -u -r1.24 syntax.c
--- src/syntax.c	2005/10/25 11:16:28	1.24
+++ src/syntax.c	2005/11/22 14:01:06
＠＠ -1,7 +1,7 ＠＠
 /* XEmacs routines to deal with syntax tables; also word and list parsing.
    Copyright (C) 1985-1994 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2001, 2002, 2003 Ben Wing.
+   Copyright (C) 2001, 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -58,11 +58,6 ＠＠

 Lisp_Object Vsyntax_designator_chars_string;

-Lisp_Object Vtemp_table_for_use_updating_syntax_tables;
-
-/* A value that is guaranteed not be in a syntax table. */
-Lisp_Object Vbogus_syntax_table_value;
-
 static void syntax_cache_table_was_changed (struct buffer *buf);

 /* This is the internal form of the parse state used in parse-partial-sexp.  */
＠＠ -177,6 +172,8 ＠＠

 #ifdef DEBUG_XEMACS

+#ifdef MIRROR_TABLE
+
 DEFUN ("mirror-syntax-table", Fmirror_syntax_table, 0, 1, 0, /*
 Return the current mirror syntax table, for debugging purposes.
 This is the one specified by the current buffer, or by BUFFER if it
＠＠ -187,6 +184,8 ＠＠
   return decode_buffer (buffer, 0)->mirror_syntax_table;
 }

+#endif /* MIRROR_TABLE */
+
 DEFUN ("syntax-cache-info", Fsyntax_cache_info, 0, 1, 0, /*
 Return info about the syntax cache in BUFFER.
 BUFFER defaults to the current buffer if nil.
＠＠ -232,7 +231,9 ＠＠
   struct buffer *buf = decode_buffer (buffer, 0);
   syntax_table = check_syntax_table (syntax_table, Qnil);
   buf->syntax_table = syntax_table;
+#ifdef MIRROR_TABLE
   buf->mirror_syntax_table = XCHAR_TABLE (syntax_table)->mirror_table;
+#endif /* MIRROR_TABLE */
   syntax_cache_table_was_changed (buf);
   /* Indicate that this buffer now has a specified syntax table.  */
   buf->local_var_flags |= XINT (buffer_local_flags.syntax_table);
＠＠ -250,8 +251,10 ＠＠
   cache->no_syntax_table_prop = 1;
   cache->syntax_table =
     BUFFER_SYNTAX_TABLE (cache->buffer);
+#ifdef MIRROR_TABLE
   cache->mirror_table =
     BUFFER_MIRROR_SYNTAX_TABLE (cache->buffer);
+#endif /* MIRROR_TABLE */
   cache->start = Qnil;
   cache->end = Qnil;
   if (infinite)
＠＠ -303,7 +306,9 ＠＠
   { XD_LISP_OBJECT, offsetof (struct syntax_cache, object) },
   { XD_LISP_OBJECT, offsetof (struct syntax_cache, buffer) },
   { XD_LISP_OBJECT, offsetof (struct syntax_cache, syntax_table) },
+#ifdef MIRROR_TABLE
   { XD_LISP_OBJECT, offsetof (struct syntax_cache, mirror_table) },
+#endif /* MIRROR_TABLE */
   { XD_LISP_OBJECT, offsetof (struct syntax_cache, start) },
   { XD_LISP_OBJECT, offsetof (struct syntax_cache, end) },
   { XD_END }
＠＠ -324,7 +329,9 ＠＠
   if (cache->buffer)
     mark_object (wrap_buffer (cache->buffer));
   mark_object (cache->syntax_table);
+#ifdef MIRROR_TABLE
   mark_object (cache->mirror_table);
+#endif /* MIRROR_TABLE */
   mark_object (cache->start);
   mark_object (cache->end);
 }
＠＠ -350,7 +357,9 ＠＠
   cache->buffer = buf;
   cache->no_syntax_table_prop = 1;
   cache->syntax_table = BUFFER_SYNTAX_TABLE (cache->buffer);
+#ifdef MIRROR_TABLE
   cache->mirror_table = BUFFER_MIRROR_SYNTAX_TABLE (cache->buffer);
+#endif /* MIRROR_TABLE */
   cache->start = Fmake_marker ();
   cache->end = Fmake_marker ();
   reset_buffer_cache_range (cache, cache->object);
＠＠ -372,8 +381,10 ＠＠
     {
       cache->syntax_table =
 	BUFFER_SYNTAX_TABLE (buf);
+#ifdef MIRROR_TABLE
       cache->mirror_table =
 	BUFFER_MIRROR_SYNTAX_TABLE (buf);
+#endif /* MIRROR_TABLE */
     }
 }

＠＠ -500,7 +511,9 ＠＠
     {
       cache->use_code = 0;
       cache->syntax_table = tmp_table;
+#ifdef MIRROR_TABLE
       cache->mirror_table = XCHAR_TABLE (tmp_table)->mirror_table;
+#endif /* MIRROR_TABLE */
       cache->no_syntax_table_prop = 0;
 #ifdef NOT_WORTH_THE_EFFORT
       update_mirror_syntax_if_dirty (cache->mirror_table);
＠＠ -517,7 +530,9 ＠＠
       cache->use_code = 0;
       cache->no_syntax_table_prop = 1;
       cache->syntax_table = BUFFER_SYNTAX_TABLE (cache->buffer);
+#ifdef MIRROR_TABLE
       cache->mirror_table = BUFFER_MIRROR_SYNTAX_TABLE (cache->buffer);
+#endif /* MIRROR_TABLE */
 #ifdef NOT_WORTH_THE_EFFORT
       update_mirror_syntax_if_dirty (cache->mirror_table);
 #endif /* NOT_WORTH_THE_EFFORT */
＠＠ -573,8 +588,6 ＠＠
 */
        (character, syntax_table))
 {
-  Lisp_Object mirrortab;
-
   if (NILP (character))
     {
       character = make_char ('\000');
＠＠ -582,24 +595,16 ＠＠
   CHECK_CHAR_COERCE_INT (character);
   syntax_table = check_syntax_table (syntax_table,
 				     current_buffer->syntax_table);
-  mirrortab = XCHAR_TABLE (syntax_table)->mirror_table;
-  return make_char (syntax_code_spec[(int) SYNTAX (mirrortab,
+#ifdef MIRROR_TABLE
+  return make_char
+    (syntax_code_spec[(int) SYNTAX (XCHAR_TABLE (syntax_table)->mirror_table,
+				    XCHAR (character))]);
+#else
+  return make_char (syntax_code_spec[(int) SYNTAX (syntax_table,
 						   XCHAR (character))]);
+#endif /* MIRROR_TABLE */
 }

-#ifdef MULE
-
-enum syntaxcode
-charset_syntax (struct buffer *UNUSED (buf), Lisp_Object UNUSED (charset),
-		int *multi_p_out)
-{
-  *multi_p_out = 1;
-  /* !!#### get this right */
-  return Spunct;
-}
-
-#endif
-
 Lisp_Object
 syntax_match (Lisp_Object syntax_table, Ichar ch)
 {
＠＠ -621,14 +626,16 ＠＠
 */
        (character, syntax_table))
 {
-  Lisp_Object mirrortab;
   enum syntaxcode code;

   CHECK_CHAR_COERCE_INT (character);
   syntax_table = check_syntax_table (syntax_table,
 				     current_buffer->syntax_table);
-  mirrortab = XCHAR_TABLE (syntax_table)->mirror_table;
-  code = SYNTAX (mirrortab, XCHAR (character));
+#ifdef MIRROR_TABLE
+  code = SYNTAX (XCHAR_TABLE (syntax_table)->mirror_table, XCHAR (character));
+#else
+  code = SYNTAX (syntax_table, XCHAR (character));
+#endif /* MIRROR_TABLE */
   if (code == Sopen || code == Sclose || code == Sstring)
     return syntax_match (syntax_table, XCHAR (character));
   return Qnil;
＠＠ -644,6 +651,8 ＠＠
 #define WORD_BOUNDARY_P(c1, c2)			\
   (!(ichar_ascii_p (c1) && ichar_ascii_p (c2))	\
    && word_boundary_p (c1, c2))
+#else
+#define WORD_BOUNDARY_P(c1, c2) 0
 #endif

 /* Return the position across COUNT words from FROM.
＠＠ -689,15 +698,9 ＠＠
 	  code = SYNTAX_FROM_CACHE (scache, ch1);
 	  if (!(words_include_escapes
 		&& (code == Sescape || code == Scharquote)))
-	    if (code != Sword
-#ifdef MULE
-		|| WORD_BOUNDARY_P (ch0, ch1)
-#endif
-		)
+	    if (code != Sword || WORD_BOUNDARY_P (ch0, ch1))
 	      break;
-#ifdef MULE
 	  ch0 = ch1;
-#endif
 	  from++;
 	}
       count--;
＠＠ -734,15 +737,9 ＠＠

 	  if (!(words_include_escapes
 		&& (code == Sescape || code == Scharquote)))
-	    if (code != Sword
-#ifdef MULE
-		|| WORD_BOUNDARY_P (ch0, ch1)
-#endif
-		)
+	    if (code != Sword || WORD_BOUNDARY_P (ch0, ch1))
 	      break;
-#ifdef MULE
 	  ch1 = ch0;
-#endif
 	  from--;
 	}
       count++;
＠＠ -2191,6 +2188,7 ＠＠
   return val;
 }

+#ifdef MIRROR_TABLE

 /* Updating of the mirror syntax table.

＠＠ -2210,7 +2208,7 ＠＠
    */

 static int
-copy_to_mirrortab (struct chartab_range *range, Lisp_Object UNUSED (table),
+copy_to_mirrortab (Lisp_Object UNUSED (table), Ichar ch,
 		   Lisp_Object val, void *arg)
 {
   Lisp_Object mirrortab = VOID_TO_LISP (arg);
＠＠ -2218,13 +2216,12 ＠＠
   if (CONSP (val))
     val = XCAR (val);
   if (SYNTAX_FROM_CODE (XINT (val)) != Sinherit)
-    put_char_table (mirrortab, range, val);
+    put_char_table_1 (mirrortab, ch, val);
   return 0;
 }

 static int
-copy_if_not_already_present (struct chartab_range *range,
-			     Lisp_Object UNUSED (table),
+copy_if_not_already_present (Lisp_Object UNUSED (table), Ichar ch,
 			     Lisp_Object val, void *arg)
 {
   Lisp_Object mirrortab = VOID_TO_LISP (arg);
＠＠ -2232,24 +2229,10 ＠＠
     val = XCAR (val);
   if (SYNTAX_FROM_CODE (XINT (val)) != Sinherit)
     {
-      Lisp_Object existing =
-	updating_mirror_get_range_char_table (range, mirrortab,
-					      Vbogus_syntax_table_value);
-      if (NILP (existing))
+      Lisp_Object existing = get_char_table_raw (ch, mirrortab);
+      if (UNBOUNDP (existing))
 	/* nothing at all */
-	put_char_table (mirrortab, range, val);
-      else if (!EQ (existing, Vbogus_syntax_table_value))
-	/* full */
-	;
-      else
-	{
-	  Freset_char_table (Vtemp_table_for_use_updating_syntax_tables);
-	  copy_char_table_range
-	    (mirrortab, Vtemp_table_for_use_updating_syntax_tables, range);
-	  put_char_table (mirrortab, range, val);
-	  copy_char_table_range
-	    (Vtemp_table_for_use_updating_syntax_tables, mirrortab, range);
-	}
+	put_char_table_1 (mirrortab, ch, val);
     }

   return 0;
＠＠ -2271,7 +2254,6 ＠＠
      entries don't already exist in that table. (The copying step requires
      another mapping.)
      */
-
   map_char_table (table, &range, copy_to_mirrortab, LISP_TO_VOID (mirrortab));
   /* second clause catches bootstrapping problems when initializing the
      standard syntax table */
＠＠ -2283,6 +2265,8 ＠＠
   XCHAR_TABLE (mirrortab)->dirty = 0;
 }

+#endif /* MIRROR_TABLE */
+
 /* Called from chartab.c when a change is made to a syntax table.
    If this is the standard syntax table, we need to recompute
    *all* syntax tables (yuck).  Otherwise we just recompute this
＠＠ -2291,6 +2275,7 ＠＠
 void
 update_syntax_table (Lisp_Object table)
 {
+#ifdef MIRROR_TABLE
   Lisp_Object nonmirror = XCHAR_TABLE (table)->mirror_table;
   assert (XCHAR_TABLE (table)->mirror_table_p);
   if (EQ (nonmirror, Vstandard_syntax_table))
＠＠ -2303,6 +2288,7 ＠＠
     }
   else
     update_just_this_syntax_table (nonmirror);
+#endif /* MIRROR_TABLE */
 }

＠＠ -2319,7 +2305,9 ＠＠
   DEFSUBR (Fsyntax_table_p);
   DEFSUBR (Fsyntax_table);
 #ifdef DEBUG_XEMACS
+#ifdef MIRROR_TABLE
   DEFSUBR (Fmirror_syntax_table);
+#endif /* MIRROR_TABLE */
   DEFSUBR (Fsyntax_cache_info);
 #endif /* DEBUG_XEMACS */
   DEFSUBR (Fstandard_syntax_table);
＠＠ -2367,9 +2355,6 ＠＠
   words_include_escapes = 0;

   no_quit_in_re_search = 0;
-
-  Vbogus_syntax_table_value = make_float (0.0);
-  staticpro (&Vbogus_syntax_table_value);
 }

 static void
＠＠ -2391,9 +2376,6 ＠＠
   Vstandard_syntax_table = Fcopy_syntax_table (Qnil);
   staticpro (&Vstandard_syntax_table);

-  Vtemp_table_for_use_updating_syntax_tables = Fmake_char_table (Qgeneric);
-  staticpro (&Vtemp_table_for_use_updating_syntax_tables);
-  
   Vsyntax_designator_chars_string = make_string_nocopy (syntax_code_spec,
 							Smax);
   staticpro (&Vsyntax_designator_chars_string);
Index: src/syntax.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/syntax.h,v
retrieving revision 1.12
diff -u -r1.12 syntax.h
--- src/syntax.h	2003/02/20 08:19:43	1.12
+++ src/syntax.h	2005/11/22 14:01:06
＠＠ -1,6 +1,6 ＠＠
 /* Declarations having to do with XEmacs syntax tables.
    Copyright (C) 1985, 1992, 1993 Free Software Foundation, Inc.
-   Copyright (C) 2002, 2003 Ben Wing.
+   Copyright (C) 2002, 2003, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -24,6 +24,7 ＠＠
 #ifndef INCLUDED_syntax_h_
 #define INCLUDED_syntax_h_

+#include "buffer.h"
 #include "chartab.h"

 /* A syntax table is a type of char table.
＠＠ -35,17 +36,30 ＠＠
 integers and chars.  The lowest 7 bits of the integer are the syntax
 class.  If this is Sinherit, then the actual syntax value needs to
 be retrieved from the standard syntax table.
+*/
+
+#ifdef MIRROR_TABLE
+
+/*

-Since the logic involved in finding the actual integer isn't very
+[[ Since the logic involved in finding the actual integer isn't very
 complex, you'd think the time required to retrieve it is not a
 factor.  If you thought that, however, you'd be wrong, due to the
 high number of times (many per character) that the syntax value is
 accessed in functions such as scan_lists().  To speed this up,
 we maintain a mirror syntax table that contains the actual
 integers.  We can do this successfully because syntax tables are
-now an abstract type, where we control all access.
+now an abstract type, where we control all access. ]]
+
+I'm not sure I believe this any more, and maintaining these tables is
+more difficult with the new char tables, and is time consuming in and of
+itself.  For the moment, we're just disabling this; redo if profiling shows
+the need. --ben
+
 */

+#endif /* MIRROR_TABLE */
+
 enum syntaxcode
 {
   Swhitespace,	/* whitespace character */
＠＠ -68,12 +82,48 ＠＠
 		     other side by a char with the same syntaxcode.  */
   Smax	 /* Upper bound on codes that are meaningful */
 };
+
+/* Retrieve the syntax table from a buffer. */
+DECLARE_INLINE_HEADER (
+Lisp_Object
+BUFFER_SYNTAX_TABLE (struct buffer *buf)
+)
+{
+  return buf ? buf->syntax_table : Vstandard_syntax_table;
+}
+
+/* Retrieve the mirror syntax table from a buffer. */
+DECLARE_INLINE_HEADER (
+Lisp_Object
+BUFFER_MIRROR_SYNTAX_TABLE (struct buffer *buf)
+)
+{
+#ifdef MIRROR_TABLE
+  return buf ? buf->mirror_syntax_table :
+    XCHAR_TABLE (Vstandard_syntax_table)->mirror_table;
+#else
+  return BUFFER_SYNTAX_TABLE (buf);
+#endif /* MIRROR_TABLE */
+}
+

-enum syntaxcode charset_syntax (struct buffer *buf, Lisp_Object charset,
-				int *multi_p_out);
+#ifdef MULE

+/* Retrieve the category table from a buffer. */
+DECLARE_INLINE_HEADER (
+Lisp_Object
+BUFFER_CATEGORY_TABLE (struct buffer *buf)
+)
+{
+  return buf ? buf->category_table : Vstandard_category_table;
+}
+
+#endif /* MULE */
+
 void update_syntax_table (Lisp_Object table);

+#ifdef MIRROR_TABLE
+
 DECLARE_INLINE_HEADER (
 void
 update_mirror_syntax_if_dirty (Lisp_Object table)
＠＠ -83,6 +133,10 ＠＠
     update_syntax_table (table);
 }

+#endif /* MIRROR_TABLE */
+
+#define SYNTAX_FROM_CODE(code) ((enum syntaxcode) ((code) & 0177))
+
 /* Return the syntax code for a particular character and mirror table. */

 DECLARE_INLINE_HEADER (
＠＠ -90,9 +144,28 ＠＠
 SYNTAX_CODE (Lisp_Object table, Ichar c)
 )
 {
+  Lisp_Object code;
+#ifdef MIRROR_TABLE
   type_checking_assert (XCHAR_TABLE (table)->mirror_table_p);
   update_mirror_syntax_if_dirty (table);
   return XINT (get_char_table_1 (c, table));
+#else
+  code = get_char_table (c, table);
+
+  /* #### It's possible this code will be time consuming because of getting
+     run in an inner-loop.  But it's all inlined. */
+
+  if (CONSP (code))
+    code = XCAR (code);
+  if (SYNTAX_FROM_CODE (XINT (code)) == Sinherit)
+    {
+      code = get_char_table (c, Vstandard_syntax_table);
+      if (CONSP (code))
+	code = XCAR (code);
+    }
+
+  return XINT (code);
+#endif /* MIRROR_TABLE */
 }

 #ifdef NOT_WORTH_THE_EFFORT
＠＠ -110,8 +183,6 ＠＠

 #endif /* NOT_WORTH_THE_EFFORT */

-#define SYNTAX_FROM_CODE(code) ((enum syntaxcode) ((code) & 0177))
-
 #define SYNTAX(table, c) SYNTAX_FROM_CODE (SYNTAX_CODE (table, c))

 DECLARE_INLINE_HEADER (
＠＠ -319,7 +390,9 ＠＠
 					   always be the same buffer. */
   int syntax_code;			/* Syntax code of current char. */
   Lisp_Object syntax_table;		/* Syntax table for current pos. */
+#ifdef MIRROR_TABLE
   Lisp_Object mirror_table;		/* Mirror table for this table. */
+#endif /* MIRROR_TABLE */
   Lisp_Object start, end;		/* Markers to keep track of the
 					   known region in a buffer.
 					   Formerly we used an internal
＠＠ -388,9 +461,15 ＠＠
 #define SYNTAX_FROM_CACHE(cache, c)			\
    SYNTAX_FROM_CODE (SYNTAX_CODE_FROM_CACHE (cache, c))

+#ifdef MIRROR_TABLE
 #define SYNTAX_CODE_FROM_CACHE(cache, c)				\
   ((cache)->use_code ? (cache)->syntax_code				\
    : SYNTAX_CODE ((cache)->mirror_table, c))
+#else
+#define SYNTAX_CODE_FROM_CACHE(cache, c)				\
+  ((cache)->use_code ? (cache)->syntax_code				\
+: SYNTAX_CODE ((cache)->syntax_table, c))
+#endif /* MIRROR_TABLE */

 #ifdef NOT_WORTH_THE_EFFORT
 /* If we really cared about the theoretical performance hit of the dirty
Index: src/sysfile.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/sysfile.h,v
retrieving revision 1.21
diff -u -r1.21 sysfile.h
--- src/sysfile.h	2005/10/25 11:16:28	1.21
+++ src/sysfile.h	2005/11/22 14:01:06
＠＠ -47,13 +47,6 ＠＠
 # include <fcntl.h>
 #endif /* INCLUDED_FCNTL */

-/* The anonymous voice of the past says:
-   In some systems loading it twice is suicidal.  */
-#ifndef INCLUDED_SYS_TYPES
-# define INCLUDED_SYS_TYPES
-# include <sys/types.h>		/* some typedefs are used in sys/file.h */
-#endif /* INCLUDED_SYS_TYPES */
-
 #ifndef WIN32_NATIVE
 # include <sys/file.h>
 #endif
Index: src/sysproc.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/sysproc.h,v
retrieving revision 1.18
diff -u -r1.18 sysproc.h
--- src/sysproc.h	2005/10/25 11:16:28	1.18
+++ src/sysproc.h	2005/11/22 14:01:06
＠＠ -46,7 +46,6 ＠＠
 #endif

 #ifdef HAVE_SOCKETS	/* TCP connection support, if kernel can do it */
-# include <sys/types.h>  /* AJK */
 # ifndef WIN32_NATIVE
 #  include <sys/socket.h>
 #  include <netdb.h>
Index: src/syswait.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/syswait.h,v
retrieving revision 1.5
diff -u -r1.5 syswait.h
--- src/syswait.h	2004/09/20 19:20:02	1.5
+++ src/syswait.h	2005/11/22 14:01:06
＠＠ -22,8 +22,6 ＠＠
 #ifndef INCLUDED_syswait_h_
 #define INCLUDED_syswait_h_

-#include <sys/types.h>
-
 #ifdef HAVE_SYS_WAIT_H
 #include <sys/wait.h>
 #endif
Index: src/text.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/text.c,v
retrieving revision 1.26
diff -u -r1.26 text.c
--- src/text.c	2005/09/27 05:29:44	1.26
+++ src/text.c	2005/11/22 14:01:08
＠＠ -1,6 +1,6 ＠＠
 /* Text manipulation primitives for XEmacs.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2003, 2004 Ben Wing.
+   Copyright (C) 1995, 1996, 2000, 2001, 2002, 2003, 2004, 2005 Ben Wing.
    Copyright (C) 1999 Martin Buchholz.

 This file is part of XEmacs.
＠＠ -29,6 +29,7 ＠＠
 #include "lisp.h"

 #include "buffer.h"
+#include "casetab.h"
 #include "charset.h"
 #include "file-coding.h"
 #include "lstream.h"
＠＠ -180,7 +181,7 ＠＠
    etc.  However, we really need to represent Unicode characters internally
    as-is, rather than converting to some language-specific character set.
    For efficiency, we should represent Unicode characters using 3 bytes
-   rather than 4.  This means we need to find leading bytes for Unicode.
+   rather than 4.  This means we need to find charset ID's for Unicode.
    Given that there are 65,536 characters in Unicode and we can attach
    96x96 = 9,216 characters per leading byte, we need eight leading bytes
    for Unicode.  We currently have four free (0x9A - 0x9D), and with a
＠＠ -215,13 +216,14 ＠＠
       this to 2048, and further shrinkage would become uncomfortable.
       No such problems exist in XEmacs.

-      Composite characters could be represented as 0x8D C1 C2 C3,
-      where each C[1-3] is in the range 0xA0 - 0xFF.  This allows
-      for slightly under 2^20 (one million) composite characters
-      over the XEmacs process lifetime, and you only need to
-      increase the size of a Mule character from 19 to 21 bits.
-      Or you could use 0x8D C1 C2 C3 C4, allowing for about
-      85 million (slightly over 2^26) composite characters.
+      Composite characters could be represented as 0x8D C1 C2 C3, where
+      each C[1-3] is in the range 0xA0 - 0xFF.  This allows for slightly
+      under 2^20 (one million) composite characters over the XEmacs process
+      lifetime, and you only need to increase the size of a Mule character
+      from 19 to 21 bits.  [Actually, Mule characters are now 21 bytes in
+      any case, so this might require 22 bits] Or you could use 0x8D C1 C2
+      C3 C4, allowing for about 85 million (slightly over 2^26) composite
+      characters.

    ==========================================================================
                                10. Internal API's
＠＠ -229,72 +231,63 ＠＠

    All of these are documented in more detail in text.h.

-＠enumerate
-＠item
-Basic internal-format API's
-
-These are simple functions and macros to convert between text
-representation and characters, move forward and back in text, etc.
-
-＠item
-The DFC API
-
-This is for conversion between internal and external text.  Note that
-there is also the "new DFC" API, which *returns* a pointer to the
-converted text (in alloca space), rather than storing it into a
-variable.
-
-＠item
-The Eistring API
-
-(This API is currently under-used) When doing simple things with
-internal text, the basic internal-format API's are enough.  But to do
-things like delete or replace a substring, concatenate various strings,
-etc. is difficult to do cleanly because of the allocation issues.
-The Eistring API is designed to deal with this, and provides a clean
-way of modifying and building up internal text. (Note that the former
-lack of this API has meant that some code uses Lisp strings to do
-similar manipulations, resulting in excess garbage and increased
-garbage collection.)
-
-NOTE: The Eistring API is (or should be) Mule-correct even without
-an ASCII-compatible internal representation.
-＠end enumerate
+   Basic internal-format API's
+   
+   These are simple functions and macros to convert between text
+   representation and characters, move forward and back in text, etc.
+   
+   The DFC API
+   
+   This is for conversion between internal and external text.  Note that
+   there is also the "new DFC" API, which *returns* a pointer to the
+   converted text (in alloca space), rather than storing it into a
+   variable.
+   
+   The Eistring API
+   
+   (This API is currently under-used) When doing simple things with
+   internal text, the basic internal-format API's are enough.  But to do
+   things like delete or replace a substring, concatenate various strings,
+   etc. is difficult to do cleanly because of the allocation issues.
+   The Eistring API is designed to deal with this, and provides a clean
+   way of modifying and building up internal text. (Note that the former
+   lack of this API has meant that some code uses Lisp strings to do
+   similar manipulations, resulting in excess garbage and increased
+   garbage collection.)
+   
+   NOTE: The Eistring API is (or should be) Mule-correct even without
+   an ASCII-compatible internal representation.

    ==========================================================================
                       11. Other Sources of Documentation
    ==========================================================================

    man/lispref/mule.texi
-＠enumerate
-＠item
-another intro to characters, encodings, etc; #### Merge with the
-above info
-＠item
-documentation of ISO-2022
-＠item
-The charset and coding-system Lisp API's
-＠item
-The CCL conversion language for writing encoding conversions
-＠item
-The Latin-Unity package for unifying Latin charsets
-＠end enumerate
+
+      another intro to characters, encodings, etc; #### Merge with the
+      above info
+      
+      documentation of ISO-2022
+      
+      The charset and coding-system Lisp API's
+      
+      The CCL conversion language for writing encoding conversions
+      
+      The Latin-Unity package for unifying Latin charsets

    man/internals/internals.texi (the Internals manual)
-＠enumerate
-＠item
-"Coding for Mule" -- how to write Mule-aware code
-＠item
-"Modules for Internationalization"
-＠item
-"The Text in a Buffer" -- more about the different ways of
-viewing buffer positions; #### Merge with the above info
-＠item
-"MULE Character Sets and Encodings" -- yet another intro
-to characters, encodings, etc; #### Merge with the
-above info; also some documentation of Japanese EUC and JIS7,
-and CCL internals
-＠end enumerate
+
+      "Coding for Mule" -- how to write Mule-aware code
+      
+      "Modules for Internationalization"
+      
+      "The Text in a Buffer" -- more about the different ways of
+      viewing buffer positions; #### Merge with the above info
+      
+      "MULE Character Sets and Encodings" -- yet another intro
+      to characters, encodings, etc; #### Merge with the
+      above info; also some documentation of Japanese EUC and JIS7,
+      and CCL internals

    text.h -- info about specific XEmacs-C API's for handling internal and
              external text
＠＠ -324,191 +317,191 ＠＠
                    - Mule design issues (ben)
    ==========================================================================

-circa 1999
-
-Here is a more detailed list of Mule-related projects that we will be
-working on.  They are more or less ordered according to how we will
-proceed, but it's not exact.  In particular, there will probably be
-time overlap among adjacent projects.
-
-＠enumerate
-＠item
-Modify the internal/external conversion macros to allow for
-MS Windows support.
-
-＠item
-Modify the buffer macros to allow for more than one internal
-representation, e.g. fixed width and variable width.
-
-＠item
-Review the existing Mule code, especially the lisp code, for code
-quality issues and improve the cleanliness of it.  Also work on
-creating a specification for the Mule API.
-
-＠item
-Write some more automated mule tests.
-
-＠item
-Integrate Tomohiko's UTF-2000 code, fixing it up so that nothing is
-broken when the UTF-2000 configure option is not enabled.
-
-＠item
-Fix up the MS Windows code to be Mule-correct, so that you can
-compile with Mule support under MS windows and have a working
-XEmacs, at least just with Latin-1.
-
-＠item
-Implement a scheme to guarantee no corruption of files, even with
-an incorrect coding system - in particular, guarantee no corruption
-of binary files.
-
-＠item
-Make the text property support in XEmacs robust with respect to
-string and text operations, so that the `no corruption' support in
-the previous entry works properly, even if a lot of cutting and
-pasting is done.
-
-＠item
-Improve the handling of auto-detection so that, when there is any
-possibility at all of mistake, the user is informed of the detected
-encoding and given the choice of choosing other possibilities.
-
-＠item
-Improve the support for different language environments in XEmacs,
-for example, the priority of coding systems used in auto-detection
-should properly reflect the language environment.  This probably
-necessitates rethinking the current `coding system priority'
-scheme.
-
-＠item
-Do quality work to improve the existing UTF-2000 implementation.
-
-＠item
-Implement preliminary support for 8-bit fixed width
-representation.  First, we will only implement 7-bit support, and
-will fall back to variable width as soon as any non-ASCII
-character is encountered.  Then we will improve the support to
-handle an arbitrary character set in the upper half of the 8-bit space.
-
-＠item
-Investigate any remaining hurdles to making --with-mule be the
-default configure option.
-＠end enumerate
-
+   circa 1999
+   
+   Here is a more detailed list of Mule-related projects that we will be
+   working on.  They are more or less ordered according to how we will
+   proceed, but it's not exact.  In particular, there will probably be
+   time overlap among adjacent projects.
+   
+   ＠enumerate
+   ＠item
+   Modify the internal/external conversion macros to allow for
+   MS Windows support.
+   
+   ＠item
+   Modify the buffer macros to allow for more than one internal
+   representation, e.g. fixed width and variable width.
+   
+   ＠item
+   Review the existing Mule code, especially the lisp code, for code
+   quality issues and improve the cleanliness of it.  Also work on
+   creating a specification for the Mule API.
+   
+   ＠item
+   Write some more automated mule tests.
+   
+   ＠item
+   Integrate Tomohiko's UTF-2000 code, fixing it up so that nothing is
+   broken when the UTF-2000 configure option is not enabled.
+   
+   ＠item
+   Fix up the MS Windows code to be Mule-correct, so that you can
+   compile with Mule support under MS windows and have a working
+   XEmacs, at least just with Latin-1.
+   
+   ＠item
+   Implement a scheme to guarantee no corruption of files, even with
+   an incorrect coding system - in particular, guarantee no corruption
+   of binary files.
+   
+   ＠item
+   Make the text property support in XEmacs robust with respect to
+   string and text operations, so that the `no corruption' support in
+   the previous entry works properly, even if a lot of cutting and
+   pasting is done.
+   
+   ＠item
+   Improve the handling of auto-detection so that, when there is any
+   possibility at all of mistake, the user is informed of the detected
+   encoding and given the choice of choosing other possibilities.
+   
+   ＠item
+   Improve the support for different language environments in XEmacs,
+   for example, the priority of coding systems used in auto-detection
+   should properly reflect the language environment.  This probably
+   necessitates rethinking the current `coding system priority'
+   scheme.
+   
+   ＠item
+   Do quality work to improve the existing UTF-2000 implementation.
+   
+   ＠item
+   Implement preliminary support for 8-bit fixed width
+   representation.  First, we will only implement 7-bit support, and
+   will fall back to variable width as soon as any non-ASCII
+   character is encountered.  Then we will improve the support to
+   handle an arbitrary character set in the upper half of the 8-bit space.
+   
+   ＠item
+   Investigate any remaining hurdles to making --with-mule be the
+   default configure option.
+   ＠end enumerate
+   
    ==========================================================================
                    - Mule design issues (stephen)
    ==========================================================================
-
-What I see as Mule priorities (in rough benefit order, I am not taking
-account of difficulty, nor the fact that some - eg 8 & 10 - will
-probably come as packages):
-
-＠enumerate
-＠item
-Fix the autodetect problem (by making the coding priority list
-user-configurable, as short as he likes, even null, with "binary"
-as the default).
-＠item
-Document the language environments and other Mule "APIs" as
-implemented (since there is no real design spec).  Check to see 
-how and where they are broken.
-＠item
-Make the Mule menu useful to non-ISO-2022-literate folks.
-＠item
-Redo the lstreams stuff to make it easy and robust to "pipeline",
-eg, libz | gnupg | jis2mule.
-＠item
-Make Custom Mule-aware.  (This probably depends on a sensible
-fonts model.)
-＠item
-Implement the "literal byte stream" memory feature.
-＠item
-Study the FSF implementation of Mule for background for 7 & 8.
-＠item
-Identify desirable Mule features (eg, i18n-ized messages as above, 
-collating tables by language environment, etc).  (New features
-might have priority as high as 9.)
-＠item
-Specify Mule UIs, APIs, etc, and design and (re)implement them.
-＠item
-Implement the 8-bit-wide buffer optimization.
-＠item
-Move the internal encoding to UTF-32 (subject to Olivier's caveats 
-regarding compose characters), with the variable-width char
-buffers using UTF-8.
-＠item
-Implement the 16- and 32-bit-wide buffer optimizations.
-＠end enumerate

+   What I see as Mule priorities (in rough benefit order, I am not taking
+   account of difficulty, nor the fact that some - eg 8 & 10 - will
+   probably come as packages):
+   
+   ＠enumerate
+   ＠item
+   Fix the autodetect problem (by making the coding priority list
+   user-configurable, as short as he likes, even null, with "binary"
+   as the default).
+   ＠item
+   Document the language environments and other Mule "APIs" as
+   implemented (since there is no real design spec).  Check to see 
+   how and where they are broken.
+   ＠item
+   Make the Mule menu useful to non-ISO-2022-literate folks.
+   ＠item
+   Redo the lstreams stuff to make it easy and robust to "pipeline",
+   eg, libz | gnupg | jis2mule.
+   ＠item
+   Make Custom Mule-aware.  (This probably depends on a sensible
+   fonts model.)
+   ＠item
+   Implement the "literal byte stream" memory feature.
+   ＠item
+   Study the FSF implementation of Mule for background for 7 & 8.
+   ＠item
+   Identify desirable Mule features (eg, i18n-ized messages as above, 
+   collating tables by language environment, etc).  (New features
+   might have priority as high as 9.)
+   ＠item
+   Specify Mule UIs, APIs, etc, and design and (re)implement them.
+   ＠item
+   Implement the 8-bit-wide buffer optimization.
+   ＠item
+   Move the internal encoding to UTF-32 (subject to Olivier's caveats 
+   regarding compose characters), with the variable-width char
+   buffers using UTF-8.
+   ＠item
+   Implement the 16- and 32-bit-wide buffer optimizations.
+   ＠end enumerate
+   
    ==========================================================================
                    - Mule design issues "short term" (ben)
    ==========================================================================
-
-＠enumerate
-＠item
-Finish changes in fixup/directory, get in CVS.
-
-(Test with and without "quick-build", to see if really faster)
-(need autoconf)
-
-＠item
-Finish up Windows/Mule changes.  Outline of this elsewhere;  Do
-*minimal* effort.
-
-＠item
-Continue work on Windows stability, e.g. go through existing notes
-on Windows Mule-ization + extract all info.
-
-＠item
-Get Unicode translation tables integrated.
-
-Finish UCS2/UTF16 coding system.
-

-＠item
-Make sure coding system priority list is language-environment specific.

-＠item
-Consider moving language selection Menu up to be parallel with Mule menu.
-
-＠item
-Check to make sure we grok the default locale at startup under
-Windows and understand the Windows locales.  Finish implementation
-of mswindows-multibyte and make sure it groks all the locales.
-
-＠item
-Do the above as best as we can without using Unicode tables.
-
-＠item
-Start tagging all text with a language text property,
-indicating the current language environment when the text was input.
-
-＠item
-Make sure we correctly accept input of non-ASCII chars
-(probably already do!)
-
-＠item
-Implement active language/keyboard switching under Windows.
-
-＠item
-Look into implementing support for "MS IME" protocol (Microsoft
-fancy built-in Asian input methods).
-
-＠item
-Redo implementation of mswindows-multibyte and internal display to
-entirely use translation to/from Unicode for increased accuracy.
-
-＠item
-Implement buf<->char improvements from FSF.  Also implement
-my string byte<->char optimization structure.
-
-＠item
-Integrate all Mule DOCS from 20.6 or 21.0.  Try to add sections
-for what we've added.
-
-＠item
-Implement 8-bit fixed width optimizations.  Then work on 16-bit.
-＠end enumerate
-
+   ＠enumerate
+   ＠item
+   Finish changes in fixup/directory, get in CVS.
+   
+   (Test with and without "quick-build", to see if really faster)
+   (need autoconf)
+   
+   ＠item
+   Finish up Windows/Mule changes.  Outline of this elsewhere;  Do
+   *minimal* effort.
+   
+   ＠item
+   Continue work on Windows stability, e.g. go through existing notes
+   on Windows Mule-ization + extract all info.
+   
+   ＠item
+   Get Unicode translation tables integrated.
+   
+   Finish UCS2/UTF16 coding system.
+   
+   ＠item
+   Make sure coding system priority list is language-environment specific.
+   
+   ＠item
+   Consider moving language selection Menu up to be parallel with Mule menu.
+   
+   ＠item
+   Check to make sure we grok the default locale at startup under
+   Windows and understand the Windows locales.  Finish implementation
+   of mswindows-multibyte and make sure it groks all the locales.
+   
+   ＠item
+   Do the above as best as we can without using Unicode tables.
+   
+   ＠item
+   Start tagging all text with a language text property,
+   indicating the current language environment when the text was input.
+   
+   ＠item
+   Make sure we correctly accept input of non-ASCII chars
+   (probably already do!)
+   
+   ＠item
+   Implement active language/keyboard switching under Windows.
+   
+   ＠item
+   Look into implementing support for "MS IME" protocol (Microsoft
+   fancy built-in Asian input methods).
+   
+   ＠item
+   Redo implementation of mswindows-multibyte and internal display to
+   entirely use translation to/from Unicode for increased accuracy.
+   
+   ＠item
+   Implement buf<->char improvements from FSF.  Also implement
+   my string byte<->char optimization structure.
+   
+   ＠item
+   Integrate all Mule DOCS from 20.6 or 21.0.  Try to add sections
+   for what we've added.
+   
+   ＠item
+   Implement 8-bit fixed width optimizations.  Then work on 16-bit.
+   ＠end enumerate
+   
    ==========================================================================
                    - Mule design issues (more) (ben)
    ==========================================================================
＠＠ -543,90 +536,90 ＠＠
    ==========================================================================
                    - Mule design discussion
    ==========================================================================
-
---------------------------------------------------------------------------
-
-Ben

-April 11, 2000
-
-Well yes, this was the whole point of my "no lossage" proposal of being
-able to undo any coding-system transformation on a buffer.  The idea was
-to figure out which transformations were definitely reversable, and for
-all the others, cache the original text in a text property.  This way, you
-could probably still do a fairly good job at constructing a good reversal
-even after you've gone into the text and added, deleted, and rearranged
-some things.
-
-But you could implement it much more simply and usefully by just
-determining, for any text being decoded into mule-internal, can we go back
-and read the source again?  If not, remember the entire file (GNUS
-message, etc) in text properties.  Then, implement the UI interface (like
-Netscape's) on top of that.  This way, you have something that at least
-works, but it might be inefficient.  All we would need to do is work on
-making the
-underlying implementation more efficient.
-
-Are you interested in doing this?  It would be a huge win for users.
-Hrvoje Niksic wrote:
-
-> Ben Wing <ben(a)666.com&gt; writes:
->
-> > let me know exactly what "rethink" functionality you want and
i'll
-> > come up with an interface.  perhaps you just want something like
-> > netscape's encoding menu, where if you switch encodings, it reloads
-> > and reencodes?
->
-> It might be a bit more complex than that.  In many cases, it's hard or
-> impossible to meaningfully "reload" -- for instance, this
-> functionality should be available while editing a Gnus message, as
-> well as while visiting a file.
->
-> For the special case of Latin-N <-> Latin-M conversion, things could
-> be done easily -- to convert from N to M, you only need to convert
-> internal representation back to N, and then convert it forth to M.
-
---------------------------------------------------------------------------
-April 11, 2000
-
-Well yes, this was the whole point of my "no lossage" proposal of being
-able to undo any coding-system transformation on a buffer.  The idea was
-to figure out which transformations were definitely reversable, and for
-all the others, cache the original text in a text property.  This way, you
-could probably still do a fairly good job at constructing a good reversal
-even after you've gone into the text and added, deleted, and rearranged
-some things.
-
-But you could implement it much more simply and usefully by just
-determining, for any text being decoded into mule-internal, can we go back
-and read the source again?  If not, remember the entire file (GNUS
-message, etc) in text properties.  Then, implement the UI interface (like
-Netscape's) on top of that.  This way, you have something that at least
-works, but it might be inefficient.  All we would need to do is work on
-making the
-underlying implementation more efficient.
-
-Are you interested in doing this?  It would be a huge win for users.
-Hrvoje Niksic wrote:
-
-> Ben Wing <ben(a)666.com&gt; writes:
->
-> > let me know exactly what "rethink" functionality you want and
i'll
-> > come up with an interface.  perhaps you just want something like
-> > netscape's encoding menu, where if you switch encodings, it reloads
-> > and reencodes?
->
-> It might be a bit more complex than that.  In many cases, it's hard or
-> impossible to meaningfully "reload" -- for instance, this
-> functionality should be available while editing a Gnus message, as
-> well as while visiting a file.
->
-> For the special case of Latin-N <-> Latin-M conversion, things could
-> be done easily -- to convert from N to M, you only need to convert
-> internal representation back to N, and then convert it forth to M.
-
-
-------------------------------------------------------------------------
+   --------------------------------------------------------------------------
+   
+   Ben
+   
+   April 11, 2000
+   
+   Well yes, this was the whole point of my "no lossage" proposal of being
+   able to undo any coding-system transformation on a buffer.  The idea was
+   to figure out which transformations were definitely reversable, and for
+   all the others, cache the original text in a text property.  This way, you
+   could probably still do a fairly good job at constructing a good reversal
+   even after you've gone into the text and added, deleted, and rearranged
+   some things.
+   
+   But you could implement it much more simply and usefully by just
+   determining, for any text being decoded into mule-internal, can we go back
+   and read the source again?  If not, remember the entire file (GNUS
+   message, etc) in text properties.  Then, implement the UI interface (like
+   Netscape's) on top of that.  This way, you have something that at least
+   works, but it might be inefficient.  All we would need to do is work on
+   making the
+   underlying implementation more efficient.
+   
+   Are you interested in doing this?  It would be a huge win for users.
+   Hrvoje Niksic wrote:
+   
+   > Ben Wing <ben(a)666.com&gt; writes:
+   >
+   > > let me know exactly what "rethink" functionality you want and
i'll
+   > > come up with an interface.  perhaps you just want something like
+   > > netscape's encoding menu, where if you switch encodings, it reloads
+   > > and reencodes?
+   >
+   > It might be a bit more complex than that.  In many cases, it's hard or
+   > impossible to meaningfully "reload" -- for instance, this
+   > functionality should be available while editing a Gnus message, as
+   > well as while visiting a file.
+   >
+   > For the special case of Latin-N <-> Latin-M conversion, things could
+   > be done easily -- to convert from N to M, you only need to convert
+   > internal representation back to N, and then convert it forth to M.
+   
+   --------------------------------------------------------------------------
+   April 11, 2000
+   
+   Well yes, this was the whole point of my "no lossage" proposal of being
+   able to undo any coding-system transformation on a buffer.  The idea was
+   to figure out which transformations were definitely reversable, and for
+   all the others, cache the original text in a text property.  This way, you
+   could probably still do a fairly good job at constructing a good reversal
+   even after you've gone into the text and added, deleted, and rearranged
+   some things.
+   
+   But you could implement it much more simply and usefully by just
+   determining, for any text being decoded into mule-internal, can we go back
+   and read the source again?  If not, remember the entire file (GNUS
+   message, etc) in text properties.  Then, implement the UI interface (like
+   Netscape's) on top of that.  This way, you have something that at least
+   works, but it might be inefficient.  All we would need to do is work on
+   making the
+   underlying implementation more efficient.
+   
+   Are you interested in doing this?  It would be a huge win for users.
+   Hrvoje Niksic wrote:
+   
+   > Ben Wing <ben(a)666.com&gt; writes:
+   >
+   > > let me know exactly what "rethink" functionality you want and
i'll
+   > > come up with an interface.  perhaps you just want something like
+   > > netscape's encoding menu, where if you switch encodings, it reloads
+   > > and reencodes?
+   >
+   > It might be a bit more complex than that.  In many cases, it's hard or
+   > impossible to meaningfully "reload" -- for instance, this
+   > functionality should be available while editing a Gnus message, as
+   > well as while visiting a file.
+   >
+   > For the special case of Latin-N <-> Latin-M conversion, things could
+   > be done easily -- to convert from N to M, you only need to convert
+   > internal representation back to N, and then convert it forth to M.
+   
+   
+   ------------------------------------------------------------------------

    ==========================================================================
    - Redoing translation macros [old]
＠＠ -826,138 +819,138 ＠＠
    ==========================================================================
                    - UTF-16 compatible representation
    ==========================================================================
-
-NOTE: One possible default internal representation that was compatible
-with UTF16 but allowed all possible chars in UCS4 would be to take a
-more-or-less unused range of 2048 chars (not from the private area
-because Microsoft actually uses up most or all of it with EUDC chars).
-Let's say we picked A400 - ABFF.  Then, we'd have:
-
-0000 - FFFF    Simple chars
-
-D[8-B]xx D[C-F]xx  Surrogate char, represents 1M chars
-
-A[4-B]xx D[C-F]xx D[C-F]xx   Surrogate char, represents 2G chars
-
-This is exactly the same number of chars as UCS-4 handles, and it follows the
-same property as UTF8 and Mule-internal:
-
-＠enumerate
-＠item
-There are two disjoint groupings of units, one representing leading units
-and one representing non-leading units.
-＠item
-Given a leading unit, you immediately know how many units follow to make
-up a valid char, irrespective of any other context.
-＠end enumerate

-Note that A4xx is actually currently assigned to Yi.  Since this is an
-internal representation, we could just move these elsewhere.
-
-An alternative is to pick two disjoint ranges, e.g. 2D00 - 2DFF and
-A500 - ABFF.
-
+ NOTE: One possible default internal representation that was compatible
+ with UTF16 but allowed all possible chars in UCS4 would be to take a
+ more-or-less unused range of 2048 chars (not from the private area
+ because Microsoft actually uses up most or all of it with EUDC chars).
+ Let's say we picked A400 - ABFF.  Then, we'd have:
+ 
+ 0000 - FFFF    Simple chars
+ 
+ D[8-B]xx D[C-F]xx  Surrogate char, represents 1M chars
+ 
+ A[4-B]xx D[C-F]xx D[C-F]xx   Surrogate char, represents 2G chars
+ 
+ This is exactly the same number of chars as UCS-4 handles, and it follows the
+ same property as UTF8 and Mule-internal:
+ 
+ ＠enumerate
+ ＠item
+ There are two disjoint groupings of units, one representing leading units
+ and one representing non-leading units.
+ ＠item
+ Given a leading unit, you immediately know how many units follow to make
+ up a valid char, irrespective of any other context.
+ ＠end enumerate
+ 
+ Note that A4xx is actually currently assigned to Yi.  Since this is an
+ internal representation, we could just move these elsewhere.
+ 
+ An alternative is to pick two disjoint ranges, e.g. 2D00 - 2DFF and
+ A500 - ABFF.
+ 
    ==========================================================================
                         New API for char->font mapping
    ==========================================================================
-- ; supersedes charset-registry and CCL;
-  supports all windows systems; powerful enough for Unicode; etc.
-
-  (charset-font-mapping charset)
-
-font-mapping-specifier  string
-
-char-font-mapping-table
-
-  char-table, specifier; elements of char table are either strings (which
-  specify a registry or comparable font property, or vectors of a string
-  (same) followed by keyword-value pairs (optional).  The only allowable
-  keyword currently is :ccl-program, which specifies a CCL program to map
-  the characters into font indices.  Other keywords may be added
-  e.g. allowing Elisp fragments instead of CCL programs, also allowed is
-  [inherit], which inherits from the next less-specific char-table in the
-  specifier.
-
-  The preferred interface onto this mapping (which should be portable
-  across Emacsen) is
-
-  (set-char-font-mapping key value &optional locale tag-set how-to-add)
-
-  where key is a char, range or charset (as for put-char-table), value is
-  as above, and the other arguments are standard for specifiers.  This
-  automatically creates a char table in the locale, as necessary (all
-  elements default to [inherit]).  On GNU Emacs, some specifiers arguments
-  may be unimplemented.
-
- (char-font-mapping key value &optional locale)
-works vaguely like get-specifier?   But does inheritance processing.
-locale should clearly default here to current-buffer
-
-#### should get-specifier as well?  Would make it work most like
-#### buffer-local variables.
-
-NB.  set-charset-registry and set-charset-ccl-program are obsoleted.
-

+ - ; supersedes charset-registry and CCL;
+   supports all windows systems; powerful enough for Unicode; etc.
+ 
+   (charset-font-mapping charset)
+ 
+ font-mapping-specifier  string
+ 
+ char-font-mapping-table
+ 
+   char-table, specifier; elements of char table are either strings (which
+   specify a registry or comparable font property, or vectors of a string
+   (same) followed by keyword-value pairs (optional).  The only allowable
+   keyword currently is :ccl-program, which specifies a CCL program to map
+   the characters into font indices.  Other keywords may be added
+   e.g. allowing Elisp fragments instead of CCL programs, also allowed is
+   [inherit], which inherits from the next less-specific char-table in the
+   specifier.
+ 
+   The preferred interface onto this mapping (which should be portable
+   across Emacsen) is
+ 
+   (set-char-font-mapping key value &optional locale tag-set how-to-add)
+ 
+   where key is a char, range or charset (as for put-char-table), value is
+   as above, and the other arguments are standard for specifiers.  This
+   automatically creates a char table in the locale, as necessary (all
+   elements default to [inherit]).  On GNU Emacs, some specifiers arguments
+   may be unimplemented.
+ 
+  (char-font-mapping key value &optional locale)
+ works vaguely like get-specifier?   But does inheritance processing.
+ locale should clearly default here to current-buffer
+ 
+ #### should get-specifier as well?  Would make it work most like
+ #### buffer-local variables.
+ 
+ NB.  set-charset-registry and set-charset-ccl-program are obsoleted.
+ 
    ==========================================================================
                  Implementing fixed-width 8,16,32 bit buffer optimizations
    ==========================================================================
-
-Add set-buffer-optimization (buffer &rest keywords) for
-controlling these things.
-
-Also, put in hack so that correct arglist can be retrieved by
-Lisp code.

-Look at the way keyword primitives are currently handled; make
-sure it works and is documented, etc.
-
-Implement 8-bit fixed width optimization.  Take the things that
-know about the actual implementation and put them in a single
-file, in essence creating an abstraction layer to allow
-pluggable internal representations.  Implement a fairly general
-scheme for mapping between character codes in the 8 bits or 16
-bits representation and on actual charset characters.  As part of
-set-buffer-optimization, you can specify a list of character sets
-to be used in the 8 bit to 16 bit, etc. world.  You can also
-request that the buffer be in 8, 16, etc. if possible.
-
--> set defaults wrt this.
--> perhaps this should be just buffer properties.
--> this brings up the idea of default properties on an object.
--> Implement default-put, default-get, etc.
-
-What happens when a character not assigned in the range gets
-added?  Then, must convert to variable width of some sort.
-
-Note: at first, possibly we just convert whole hog to get things
-right.  Then we'd have to poy alternative to characters that got
-added + deleted that were unassigned in the fixed width.  When
-this goes to zero and there's been enough time (heuristics), we
-go back to fixed.
-
-Side note:  We could dynamically build up the set of assigned
-chars as they go.  Conceivably this could even go down to the
-single char level: Just keep a big array of mapping from 16 bit
-values to chars, and add empty time, a char has been encountered
-that wasn't there before.  Problem need inverse mapping.
-
--> Possibility; chars are actual objects, not just numbers.
-Then you could keep track of such info in the chars itself.
-*Think about this.*
-
-Eventually, we might consider allowing mixed fixed-width,
-variable-width buffer encodings.  Then, we use range tables to
-indicate which sections are fixed and which variable and INC_CHAR does
-something like this: binary search to find the current range, which
-indicates whether it's fixed or variable, and tells us what the
-increment is.  We can cache this info and use it next time to speed
-up.
-
--> We will then have two partially shared range tables - one for
-overall fixed width vs. variable width, and possibly one containing
-this same info, but partitioning the variable width in one.  Maybe
-need fancier nested range table model.
-
+ Add set-buffer-optimization (buffer &rest keywords) for
+ controlling these things.
+ 
+ Also, put in hack so that correct arglist can be retrieved by
+ Lisp code.
+ 
+ Look at the way keyword primitives are currently handled; make
+ sure it works and is documented, etc.
+ 
+ Implement 8-bit fixed width optimization.  Take the things that
+ know about the actual implementation and put them in a single
+ file, in essence creating an abstraction layer to allow
+ pluggable internal representations.  Implement a fairly general
+ scheme for mapping between character codes in the 8 bits or 16
+ bits representation and on actual charset characters.  As part of
+ set-buffer-optimization, you can specify a list of character sets
+ to be used in the 8 bit to 16 bit, etc. world.  You can also
+ request that the buffer be in 8, 16, etc. if possible.
+ 
+ -> set defaults wrt this.
+ -> perhaps this should be just buffer properties.
+ -> this brings up the idea of default properties on an object.
+ -> Implement default-put, default-get, etc.
+ 
+ What happens when a character not assigned in the range gets
+ added?  Then, must convert to variable width of some sort.
+ 
+ Note: at first, possibly we just convert whole hog to get things
+ right.  Then we'd have to poy alternative to characters that got
+ added + deleted that were unassigned in the fixed width.  When
+ this goes to zero and there's been enough time (heuristics), we
+ go back to fixed.
+ 
+ Side note:  We could dynamically build up the set of assigned
+ chars as they go.  Conceivably this could even go down to the
+ single char level: Just keep a big array of mapping from 16 bit
+ values to chars, and add empty time, a char has been encountered
+ that wasn't there before.  Problem need inverse mapping.
+ 
+ -> Possibility; chars are actual objects, not just numbers.
+ Then you could keep track of such info in the chars itself.
+ *Think about this.*
+ 
+ Eventually, we might consider allowing mixed fixed-width,
+ variable-width buffer encodings.  Then, we use range tables to
+ indicate which sections are fixed and which variable and INC_CHAR does
+ something like this: binary search to find the current range, which
+ indicates whether it's fixed or variable, and tells us what the
+ increment is.  We can cache this info and use it next time to speed
+ up.
+ 
+ -> We will then have two partially shared range tables - one for
+ overall fixed width vs. variable width, and possibly one containing
+ this same info, but partitioning the variable width in one.  Maybe
+ need fancier nested range table model.
+ 
    ==========================================================================
         Expansion of display table and case mapping table support for all
                            chars, not just ASCII/Latin1.
＠＠ -1267,7 +1260,6 ＠＠

 */

-

 /************************************************************************/
 /*                              declarations                            */
＠＠ -1275,22 +1267,20 ＠＠

 Eistring the_eistring_zero_init, the_eistring_malloc_zero_init;

+#ifdef MULE
+
 #define MAX_CHARBPOS_GAP_SIZE_3 (65535/3)
 #define MAX_BYTEBPOS_GAP_SIZE_3 (3 * MAX_CHARBPOS_GAP_SIZE_3)

-short three_to_one_table[1 + MAX_BYTEBPOS_GAP_SIZE_3];

-#ifdef MULE
+short three_to_one_table[1 + MAX_BYTEBPOS_GAP_SIZE_3];

 /* Table of number of bytes in the string representation of a character
    indexed by the first byte of that representation.
-
-   rep_bytes_by_first_byte(c) is more efficient than the equivalent
-   canonical computation:
-
-   XCHARSET_REP_BYTES (charset_by_leading_byte (c)) */
+*/

-const Bytecount rep_bytes_by_first_byte[0xA0] =
+#ifdef UNICODE_INTERNAL
+const Bytecount rep_bytes_by_first_byte[256] =
 { /* 0x00 - 0x7f are for straight ASCII */
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
＠＠ -1300,14 +1290,70 ＠＠
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
-  /* 0x80 - 0x8f are for Dimension-1 official charsets */
+  /* 0x80 - 0xbf are not valid first bytes */
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF,
+  /* 0xc0 - 0xdf for 2-byte sequences */
+  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
   2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
-  /* 0x90 - 0x9d are for Dimension-2 official charsets */
+  /* 0xe0 - 0xef for 3-byte sequences */
+  3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
+  /* 0xf0 - 0xf7 for 4-byte sequences;
+     0xf8 - 0xfb for 5-byte sequences;
+     0xfc - 0xfd for 6-byte sequences;
+     0xfe, 0xff not allowed
+   */
+  4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 0xDEADBEEF, 0xDEADBEEF
+};
+#else
+/* #### Maybe this table should be derived programmatically, at least
+   the parts from 0x80 - 0x9D. */
+const Bytecount rep_bytes_by_first_byte[0xA0] =
+{ /* 0x00 - 0x7f are for straight ASCII */
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+  /* 0x80 - 0x8c/8d are for Dimension-1 official charsets */
+  2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
+#ifndef ENABLE_COMPOSITE_CHARS
+  2,
+#else
+  3,
+#endif
+  /* 0x8d/8e - 0x9d are for Dimension-2 official charsets */
+  3, 3,
   /* 0x9e is for Dimension-1 private charsets */
   /* 0x9f is for Dimension-2 private charsets */
   3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4
 };
+#endif /* UNICODE_INTERNAL */

+
+#ifdef UNICODE_INTERNAL
+/* For UTF-8 conversion */
+unsigned int utf8_offsets_by_rep_bytes[7] = {0, 0, 0x3080, 0xE2080, 0x03C82080,
+					     0xFA082080U, 0x82082080U};
+#endif /* UNICODE_INTERNAL */
+
 #ifdef ENABLE_COMPOSITE_CHARS

 /* Hash tables for composite chars.  One maps string representing
＠＠ -1320,14 +1366,408 ＠＠
 static int composite_char_col_next;

 #endif /* ENABLE_COMPOSITE_CHARS */
-
 #endif /* MULE */

+/* For UTF-8 conversion */
+
+int firstbyte_mask[7] = {0, 0, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC};
+
 Lisp_Object QSin_char_byte_conversion;
 Lisp_Object QSin_internal_external_conversion;

+Lisp_Object /* Qfail, Qsubstitute, */ Qsubstitute_negated, Quse_private;
+

 /************************************************************************/
+/*                       Basic Ichar functions                          */
+/************************************************************************/
+
+#ifdef MULE
+
+/* Convert a non-ASCII Mule character C into a one-character Mule-encoded
+   string in STR.  Returns the number of bytes stored.
+   Do not call this directly.  Use the macro set_itext_ichar() instead.
+ */
+
+Bytecount
+non_ascii_set_itext_ichar (Ibyte *str, Ichar c)
+{
+#ifdef UNICODE_INTERNAL
+  /* #### This code is duplicated in encode_unicode_char_1 in unicode.c.
+     There should be a better way. */
+  register Bytecount bytes;
+
+  text_checking_assert (c >= 0x80);
+  text_checking_assert (valid_ichar_p (c));
+  if (c <= 0x7ff) bytes = 2;
+  else if (c <= 0xffff) bytes = 3;
+  else if (c <= 0x1fffff) bytes = 4;
+  else if (c <= 0x3ffffff) bytes = 5;
+  else bytes = 6;
+    
+  str += bytes;
+  switch (bytes)
+    {
+    case 6:*--str = (c | 0x80) & 0xBF; c >>= 6;
+    case 5:*--str = (c | 0x80) & 0xBF; c >>= 6;
+    case 4:*--str = (c | 0x80) & 0xBF; c >>= 6;
+    case 3:*--str = (c | 0x80) & 0xBF; c >>= 6;
+    case 2:*--str = (c | 0x80) & 0xBF; c >>= 6;
+    case 1:*--str = c | firstbyte_mask[bytes];
+    }
+
+  assert_valid_ibyteptr (str);
+  return bytes;
+#else
+  Lisp_Object charset;
+  int c1, c2;
+  Bytecount bytes;
+
+  old_mule_non_ascii_ichar_to_charset_codepoint_raw (c, &charset, &c1, &c2);
+  bytes = old_mule_non_ascii_charset_codepoint_to_itext_raw (charset, c1, c2,
+							     str);
+  text_checking_assert (bytes > 0);
+  return bytes;
+#endif /* UNICODE_INTERNAL */
+}
+
+/* Return the first character from a Mule-encoded string in STR,
+   assuming it's non-ASCII.  Do not call this directly.
+   Use the macro itext_ichar() instead. */
+
+Ichar
+non_ascii_itext_ichar (const Ibyte *str)
+{
+#ifdef UNICODE_INTERNAL
+  /* #### Conversion from UTF8 also happens in unicode_convert in unicode.c.
+     The algorithm is very different, however (this algorithm is the optimized
+     kind from Appendix A of the Unicode Standard, V 2.0).  The other situation
+     would be tricky to convert using this algorithm since (a) it gets
+     arbitrarily buffered data, which may cross character boundaries; (b)
+     there's no guarantee the UTF8 is correct (maybe we misguessed and we're
+     reading totally random bytes and trying to interpret them as UTF8). */
+  unsigned int ch = 0;
+  int bytes = rep_bytes_by_first_byte (*str);
+  Ichar ich;
+
+  /* The ASCII case should already have been filtered out. */
+  text_checking_assert (!byte_ascii_p (*str));
+  assert_valid_ibyteptr (str);
+
+  switch (bytes)
+    {
+    case 6: ch += *str++; ch <<= 6;
+    case 5: ch += *str++; ch <<= 6;
+    case 4: ch += *str++; ch <<= 6;
+    case 3: ch += *str++; ch <<= 6;
+    case 2: ch += *str++; ch <<= 6;
+    case 1: ch += *str;
+    }
+
+  ich = (Ichar) (ch - utf8_offsets_by_rep_bytes[bytes]);
+  text_checking_assert (valid_ichar_p (ich));
+  return ich;
+#else /* not UNICODE_INTERNAL */
+  Lisp_Object charset;
+  int c1, c2;
+  Ichar ich;
+
+  old_mule_non_ascii_itext_to_charset_codepoint_raw (str, &charset, &c1,
&c2);
+  ich = old_mule_non_ascii_charset_codepoint_to_ichar_raw (charset, c1, c2);
+  text_checking_assert (ich >= 0);
+  return ich;
+#endif /* UNICODE_INTERNAL */
+}
+
+#ifndef UNICODE_INTERNAL
+
+/* Return whether CH is a valid Ichar, assuming it's >= 0x100.
+   Do not call this directly.  Use the macro valid_ichar_p() instead. */
+
+int
+old_mule_non_ascii_valid_ichar_p (Ichar ch)
+{
+  int f1, f2, f3;
+
+  /* Must have only lowest 21 bits set */
+  if (ch & ~0x1FFFFF)
+    return 0;
+
+  f1 = ichar_field1 (ch);
+  f2 = ichar_field2 (ch);
+  f3 = ichar_field3 (ch);
+
+  if (f1 == 0)
+    {
+      /* dimension-1 char */
+      Lisp_Object charset =
+	charset_by_encodable_id (f2 + FIELD2_TO_CHARSET_ID);
+      /* charset exists and of the correct dimension */
+      if (NILP (charset) || XCHARSET_DIMENSION (charset) == 2)
+	return 0;
+      /* octet not out of range */
+      if (f3 < 0x20)
+	return 0;
+      /* check range as per size (94 or 96) of charset */
+      return ((f3 > 0x20 && f3 < 0x7f) || XCHARSET_CHARS (charset, 1) ==
96);
+    }
+  else
+    {
+      /* dimension-2 char */
+      Lisp_Object charset =
+	charset_by_encodable_id (f1 <= MAX_ICHAR_FIELD1_OFFICIAL ?
+				 f1 + FIELD1_TO_OFFICIAL_CHARSET_ID :
+				 f1 + FIELD1_TO_PRIVATE_CHARSET_ID);
+      /* charset exists and of the correct dimension */
+      if (NILP (charset) || XCHARSET_DIMENSION (charset) == 1)
+	return 0;
+      /* octets not out of range */
+      if (f2 < 0x20 || f3 < 0x20)
+	return 0;
+#ifdef ENABLE_COMPOSITE_CHARS
+      if (EQ (charset, Vcharset_composite))
+	{
+	  if (UNBOUNDP (Fgethash (make_int (ch),
+				  Vcomposite_char_char2string_hash_table,
+				  Qunbound)))
+	    return 0;
+	  return 1;
+	}
+#endif /* ENABLE_COMPOSITE_CHARS */
+      /* check range as per size (94x94 or 96x96) of charset */
+      return
+	(((f2 != 0x20 && f2 != 0x7F) || XCHARSET_CHARS (charset, 0) == 96) &&
+	 ((f3 != 0x20 && f3 != 0x7F) || XCHARSET_CHARS (charset, 1) == 96));
+    }
+}
+
+/* Separate an Ichar into its components.  The charset of character C is
+   stored in CHARSET, and the position-codes of C are stored in C1 and C2.
+   A dimension-1 character has a C2 of 0.  */
+
+void
+old_mule_non_ascii_ichar_to_charset_codepoint_raw (Ichar c,
+						   Lisp_Object *charset,
+						   int *c1, int *c2)
+{
+  text_checking_assert (c >= 0x80);
+  text_checking_assert (valid_ichar_p (c));
+  if (c <= 0x9F)
+    {
+      *charset = Vcharset_control_1;
+      *c1 = 0;
+      *c2 = (int) c;
+    }
+  else
+    {
+      *charset = charset_by_encodable_id (old_mule_ichar_charset_id (c));
+      *c1 = XCHARSET_DIMENSION (*charset) == 1 ? 0 : ichar_field2 (c);
+      *c2 = ichar_field3 (c);
+      if (XCHARSET_OFFSET (*charset, 0) >= 128)
+	*c1 += 128;
+      if (XCHARSET_OFFSET (*charset, 1) >= 128)
+	*c2 += 128;
+    }
+  text_checking_assert (valid_charset_codepoint_p (*charset, *c1, *c2));
+}
+
+/* Convert a character in the internal string representation (guaranteed
+   not to be ASCII) into a charset codepoint. */
+void
+old_mule_non_ascii_itext_to_charset_codepoint_raw (const Ibyte *ptr,
+						   Lisp_Object *charset,
+						   int *c1,
+						   int *c2)
+{
+  int id;
+
+  text_checking_assert (!byte_ascii_p (*ptr));
+  assert_valid_ibyteptr (ptr);
+
+  if (*ptr == LEAD_BYTE_PRIVATE_1)
+    id = byte_id_to_private_charset_id (*++ptr, 1);
+  else if (*ptr == LEAD_BYTE_PRIVATE_2)
+    id = byte_id_to_private_charset_id (*++ptr, 2);
+  else
+    id = byte_id_to_official_charset_id (*ptr);
+
+  *charset = charset_by_encodable_id (id);
+  if (XCHARSET_DIMENSION (*charset) == 2)
+    *c1 = *++ptr & 0x7F;
+  else
+    *c1 = 0;
+  *c2 = *++ptr & 0x7F;
+  if (XCHARSET_OFFSET (*charset, 0) >= 128)
+    *c1 += 128;
+  if (XCHARSET_OFFSET (*charset, 1) >= 128)
+    *c2 += 128;
+  if (EQ (*charset, Vcharset_control_1))
+    *c2 -= 0x20;
+
+  text_checking_assert (valid_charset_codepoint_p (*charset, *c1, *c2));
+}
+
+Ichar
+old_mule_non_ascii_charset_codepoint_to_ichar_raw (Lisp_Object charset,
+						   int c1, int c2)
+{
+  Ichar retval;
+
+  text_checking_assert (!EQ (charset, Vcharset_ascii));
+  text_checking_assert (valid_charset_codepoint_p (charset, c1, c2));
+  if (EQ (charset, Vcharset_control_1))
+    retval = c2;
+  else
+    {
+      int id = XCHARSET_ID (charset);
+      if (id > MAX_ENCODABLE_CHARSET_ID)
+	return -1;
+      c1 &= 127;
+      c2 &= 127;
+      if (XCHARSET_DIMENSION (charset) == 1)
+	retval = ((id - FIELD2_TO_CHARSET_ID) << 7) | (c2);
+      else
+	retval =
+	  ((id - (id >= MIN_PRIVATE_CHARSET_ID ?
+		  FIELD1_TO_PRIVATE_CHARSET_ID :
+		  FIELD1_TO_OFFICIAL_CHARSET_ID)) << 14) |
+	  ((c1) << 7) | (c2);
+    }
+
+  text_checking_assert (valid_ichar_p (retval));
+  return retval;
+}
+
+/* Convert a charset codepoint (guaranteed not to be ASCII) into a
+   character in the old-Mule internal string representation.  Return number
+   of bytes written out. */
+
+Bytecount
+old_mule_non_ascii_charset_codepoint_to_itext_raw (Lisp_Object charset, int c1,
+						   int c2, Ibyte *ptr)
+{
+  Ibyte *p = ptr;
+  int id = XCHARSET_ID (charset);
+
+  text_checking_assert (!EQ (charset, Vcharset_ascii));
+  text_checking_assert (valid_charset_codepoint_p (charset, c1, c2));
+
+  /* We are only called as a result of breaking a character into charset
+     and octets; so non-encodable charsets that cannot form a character
+     should never occur */
+  text_checking_assert (id <= MAX_ENCODABLE_CHARSET_ID);
+
+  if (EQ (charset, Vcharset_control_1))
+    c2 += 0x20;
+
+  if (id >= MIN_PRIVATE_CHARSET_ID)
+    {
+    *p++ = (XCHARSET_DIMENSION (charset) == 1 ? LEAD_BYTE_PRIVATE_1:
+	    LEAD_BYTE_PRIVATE_2);
+    *p++ = private_charset_id_to_byte_id (id, XCHARSET_DIMENSION (charset));
+    }
+  else
+    *p++ = official_charset_id_to_byte_id (id);
+  if (XCHARSET_DIMENSION (charset) == 2)
+    *p++ = c1 | 0x80;
+  *p++ = c2 | 0x80;
+
+  assert_valid_ibyteptr (ptr);
+  return (p - ptr);
+}
+
+int
+old_mule_ichar_columns (Ichar c)
+{
+  return XCHARSET_COLUMNS (charset_by_encodable_id
+			   (old_mule_ichar_charset_id (c)));
+}
+
+int
+old_mule_ichar_to_unicode (Ichar chr, enum converr fail)
+{
+  text_checking_assert (valid_ichar_p (chr));
+
+  /* This shortcut depends on the representation of an Ichar, see text.c. */
+  if (chr < 256)
+    return (int) chr;
+
+  {
+    int c1, c2;
+    Lisp_Object charset;
+
+    old_mule_non_ascii_ichar_to_charset_codepoint_raw (chr, &charset, &c1,
+						       &c2);
+    return charset_codepoint_to_unicode (charset, c1, c2, fail);
+  }
+}
+
+/* Convert a Unicode codepoint to an Ichar.  Return value will
+   correspond to FAIL-  possibly (Ichar) -1, a substituted character, or
+   something else. */
+
+Ichar
+old_mule_unicode_to_ichar (int code, Lisp_Object_dynarr *charsets,
+			   enum converr fail)
+{
+  Lisp_Object charset;
+  int c1, c2;
+
+  /* This shortcut depends on the representation of an Ichar, see text.c.
+     Note that it may _not_ be extended to U+00A0 to U+00FF (many ISO 8859
+     coded character sets have points that map into that region, so this
+     function is many-valued).
+
+     #### We should be smarter about this to allow for jis-roman and such;
+     basically, we want to know if ASCII is in CHARSETS, with a high
+     precedence. */
+  if (code < 0xA0)
+    return (Ichar) code;
+
+  non_ascii_unicode_to_charset_codepoint (code, charsets, &charset, &c1,
&c2);
+  if (NILP (charset))
+    return old_mule_handle_bad_ichar (fail);
+  return charset_codepoint_to_ichar (charset, c1, c2, CONVERR_FAIL);
+}
+
+#endif /* not UNICODE_INTERNAL */
+
+
+/* Convert a charset codepoint (guaranteed not to be ASCII) into a
+   character in the internal string representation.  Return number
+   of bytes written out.  FAIL controls what happens when the charset
+   codepoint cannot be converted to Unicode. */
+Bytecount
+non_ascii_charset_codepoint_to_itext (Lisp_Object charset, int c1, int c2,
+				      Ibyte *ptr, enum converr fail)
+{
+  Ichar ch;
+
+  text_checking_assert (!EQ (charset, Vcharset_ascii));
+  ch = charset_codepoint_to_ichar (charset, c1, c2, fail);
+
+  if (ch < 0)
+    return 0;
+  /* We can't rely on the converted character being non-ASCII.  For
+     example, JISX0208 codepoint (33, 64) == Unicode 0x5C (ASCII
+     backslash). */
+  return set_itext_ichar (ptr, ch);
+}
+
+#endif /* MULE */
+
+/****************************************************************************/
+/*--------------------------------------------------------------------------*/
+/*                                                                          */
+/*              Everything above here knows about the specifics of          */
+/*              the internal character and text formats.  Nothing           */
+/*              below or anywhere else knows, except text.h.                */
+/*                                                                          */
+/*--------------------------------------------------------------------------*/
+/****************************************************************************/
+
+
+/************************************************************************/
 /*                          qxestr***() functions                       */
 /************************************************************************/

＠＠ -1752,7 +2192,7 ＠＠

 void
 convert_ichar_string_into_ibyte_dynarr (Ichar *arr, int nels,
-					  Ibyte_dynarr *dyn)
+					Ibyte_dynarr *dyn)
 {
   Ibyte str[MAX_ICHAR_LEN];
   int i;
＠＠ -1772,7 +2212,7 ＠＠

 Ibyte *
 convert_ichar_string_into_malloced_string (Ichar *arr, int nels,
-					    Bytecount *len_out)
+					   Bytecount *len_out)
 {
   /* Damn zero-termination. */
   Ibyte *str = alloca_ibytes (nels * MAX_ICHAR_LEN + 1);
＠＠ -1792,53 +2232,53 ＠＠
   return str;
 }

-#define COPY_TEXT_BETWEEN_FORMATS(srcfmt, dstfmt)			 \
-do									 \
-{									 \
-  if (dst)								 \
-    {									 \
-      Ibyte *dstend = dst + dstlen;					 \
-      Ibyte *dstp = dst;						 \
-      const Ibyte *srcend = src + srclen;				 \
-      const Ibyte *srcp = src;					 \
-									 \
-      while (srcp < srcend)						 \
-	{								 \
-	  Ichar ch = itext_ichar_fmt (srcp, srcfmt, srcobj);	 \
-	  Bytecount len = ichar_len_fmt (ch, dstfmt);			 \
-									 \
-	    if (dstp + len <= dstend)					 \
-	      {								 \
-		(void) set_itext_ichar_fmt (dstp, ch, dstfmt, dstobj);	 \
-		dstp += len;						 \
-	      }								 \
-	    else							 \
-	      break;							 \
-	  INC_IBYTEPTR_FMT (srcp, srcfmt);				 \
-	}								 \
-      text_checking_assert (srcp <= srcend);				 \
-      if (src_used)							 \
-	*src_used = srcp - src;						 \
-      return dstp - dst;						 \
-    }									 \
-  else									 \
-    {									 \
-      const Ibyte *srcend = src + srclen;				 \
-      const Ibyte *srcp = src;					 \
-      Bytecount total = 0;						 \
-									 \
-      while (srcp < srcend)						 \
-	{								 \
-	  total += ichar_len_fmt (itext_ichar_fmt (srcp, srcfmt,	 \
+#define COPY_TEXT_BETWEEN_FORMATS(srcfmt, dstfmt)			\
+do									\
+{									\
+  if (dst)								\
+    {									\
+      Ibyte *dstend = dst + dstlen;					\
+      Ibyte *dstp = dst;						\
+      const Ibyte *srcend = src + srclen;				\
+      const Ibyte *srcp = src;						\
+									\
+      while (srcp < srcend)						\
+	{								\
+	  Ichar ch = itext_ichar_fmt (srcp, srcfmt, srcobj);		\
+	  Bytecount len = ichar_len_fmt (ch, dstfmt);			\
+									\
+	    if (dstp + len <= dstend)					\
+	      {								\
+		(void) set_itext_ichar_fmt (dstp, ch, dstfmt, dstobj);	\
+		dstp += len;						\
+	      }								\
+	    else							\
+	      break;							\
+	  INC_IBYTEPTR_FMT (srcp, srcfmt);				\
+	}								\
+      text_checking_assert (srcp <= srcend);				\
+      if (src_used)							\
+	*src_used = srcp - src;						\
+      return dstp - dst;						\
+    }									\
+  else									\
+    {									\
+      const Ibyte *srcend = src + srclen;				\
+      const Ibyte *srcp = src;						\
+      Bytecount total = 0;						\
+									\
+      while (srcp < srcend)						\
+	{								\
+	  total += ichar_len_fmt (itext_ichar_fmt (srcp, srcfmt,	\
 						       srcobj), dstfmt); \
-	  INC_IBYTEPTR_FMT (srcp, srcfmt);				 \
-	}								 \
-      text_checking_assert (srcp == srcend);				 \
-      if (src_used)							 \
-	*src_used = srcp - src;						 \
-      return total;							 \
-    }									 \
-}									 \
+	  INC_IBYTEPTR_FMT (srcp, srcfmt);				\
+	}								\
+      text_checking_assert (srcp == srcend);				\
+      if (src_used)							\
+	*src_used = srcp - src;						\
+      return total;							\
+    }									\
+}									\
 while (0)

 /* Copy as much text from SRC/SRCLEN to DST/DSTLEN as will fit, converting
＠＠ -1945,57 +2385,94 ＠＠
 /*                    charset properties of strings                     */
 /************************************************************************/

+#if 0 /* Not currently used */
+
 void
-find_charsets_in_ibyte_string (unsigned char *charsets,
+find_charsets_in_ibyte_string (Lisp_Object_dynarr *charsets,
 			       const Ibyte *USED_IF_MULE (str),
 			       Bytecount USED_IF_MULE (len))
 {
 #ifndef MULE
-  /* Telescope this. */
-  charsets[0] = 1;
+  Dynarr_add (charsets, Vcharset_ascii);
 #else
   const Ibyte *strend = str + len;
-  memset (charsets, 0, NUM_LEADING_BYTES);
+  Lisp_Object prev_charset = Qunbound;
+  Lisp_Object_dynarr *prec;

   /* #### SJT doesn't like this. */
   if (len == 0)
     {
-      charsets[XCHARSET_LEADING_BYTE (Vcharset_ascii) - MIN_LEADING_BYTE] = 1;
+      Dynarr_add (charsets, Vcharset_ascii);
       return;
     }
+
+  prec = get_unicode_precedence (); /* #### FIXME */
+
+  while (str < strend)
+    {
+      Lisp_Object charset;
+      int c1, c2;
+      itext_to_charset_codepoint (str, prec, &charset, &c1, &c2,
CONVERR_FAIL);
+      if (!NILP (charset) && !EQ (charset, prev_charset))
+	{
+	  int i;
+
+	  prev_charset = charset;
+	  for (i = 0; i < Dynarr_length (charsets); i++)
+	    {
+	      if (EQ (Dynarr_at (charsets, i), charset))
+		break;
+	    }
+	  if (i == Dynarr_length (charsets))
+	    Dynarr_add (charsets, charset);
+	}

-  while (str < strend)
-    {
-      charsets[ichar_leading_byte (itext_ichar (str)) - MIN_LEADING_BYTE] =
-	1;
       INC_IBYTEPTR (str);
     }
 #endif
 }

+#endif /* 0 */
+
 void
-find_charsets_in_ichar_string (unsigned char *charsets,
+find_charsets_in_ichar_string (Lisp_Object_dynarr *charsets,
 			       const Ichar *USED_IF_MULE (str),
 			       Charcount USED_IF_MULE (len))
 {
 #ifndef MULE
-  /* Telescope this. */
-  charsets[0] = 1;
+  Dynarr_add (charsets, Vcharset_ascii);
 #else
-  int i;
-
-  memset (charsets, 0, NUM_LEADING_BYTES);
+  Lisp_Object prev_charset = Qunbound;
+  Lisp_Object_dynarr *prec;
+  int j;

   /* #### SJT doesn't like this. */
   if (len == 0)
     {
-      charsets[XCHARSET_LEADING_BYTE (Vcharset_ascii) - MIN_LEADING_BYTE] = 1;
+      Dynarr_add (charsets, Vcharset_ascii);
       return;
     }

-  for (i = 0; i < len; i++)
+  prec = get_unicode_precedence (); /* ＠＠#### FIXME */
+
+  for (j = 0; j < len; j++)
     {
-      charsets[ichar_leading_byte (str[i]) - MIN_LEADING_BYTE] = 1;
+      Lisp_Object charset;
+      int c1, c2;
+      ichar_to_charset_codepoint (str[j], prec, &charset, &c1, &c2);
+      if (!NILP (charset) && !EQ (charset, prev_charset))
+	{
+	  int i;
+
+	  prev_charset = charset;
+	  for (i = 0; i < Dynarr_length (charsets); i++)
+	    {
+	      if (EQ (Dynarr_at (charsets, i), charset))
+		break;
+	    }
+	  if (i == Dynarr_length (charsets))
+	    Dynarr_add (charsets, charset);
+	}
     }
 #endif
 }
＠＠ -2008,12 +2485,7 ＠＠

   while (str < end)
     {
-#ifdef MULE
-      Ichar ch = itext_ichar (str);
-      cols += XCHARSET_COLUMNS (ichar_charset (ch));
-#else
-      cols++;
-#endif
+      cols += ichar_columns (itext_ichar (str));
       INC_IBYTEPTR (str);
     }

＠＠ -2028,7 +2500,7 ＠＠
   int i;

   for (i = 0; i < len; i++)
-    cols += XCHARSET_COLUMNS (ichar_charset (str[i]));
+    cols += ichar_columns (str[i]);

   return cols;
 #else  /* not MULE */
＠＠ -2295,9 +2767,14 ＠＠
 	break;
       {
 	/* Optimize for successive characters from the same charset */
-	Ibyte leading_byte = *ptr;
-	int bytes = rep_bytes_by_first_byte (leading_byte);
-	while (ptr < end && *ptr == leading_byte)
+	/* This still sort of works in the Unicode world -- 0 - 7F take one
+	   byte, 80 - 7FF take two bytes, 800 - FFFF take three bytes.
+	   Latin, Greek, Cyrillic, Hebrew and Arabic scripts end up
+	   (mostly) in the two-byte ranges, and Indian and CJK scripts in
+	   the three-byte ranges. */
+	Ibyte lead_byte = *ptr;
+	int bytes = rep_bytes_by_first_byte (lead_byte);
+	while (ptr < end && *ptr == lead_byte)
 	  ptr += bytes, count++;
       }
     }
＠＠ -2326,9 +2803,11 ＠＠
 	break;
       {
 	/* Optimize for successive characters from the same charset */
-	Ibyte leading_byte = *newptr;
-	int bytes = rep_bytes_by_first_byte (leading_byte);
-	while (len > 0 && *newptr == leading_byte)
+	/* This still sort of works in the Unicode world -- see
+	   comment in bytecount_to_charcount_fun(). */
+	Ibyte lead_byte = *newptr;
+	int bytes = rep_bytes_by_first_byte (lead_byte);
+	while (len > 0 && *newptr == lead_byte)
 	  newptr += bytes, len--;
       }
     }
＠＠ -4021,8 +4500,8 ＠＠

 typedef struct
 {
-  Dynarr_declare (Ibyte_dynarr *);
-} Ibyte_dynarr_dynarr;
+  Dynarr_declare (unsigned_char_dynarr *);
+} unsigned_char_dynarr_dynarr;

 typedef struct
 {
＠＠ -4030,7 +4509,7 ＠＠
 } Extbyte_dynarr_dynarr;

 static Extbyte_dynarr_dynarr *conversion_out_dynarr_list;
-static Ibyte_dynarr_dynarr *conversion_in_dynarr_list;
+static unsigned_char_dynarr_dynarr *conversion_in_dynarr_list;

 static int dfc_convert_to_external_format_in_use;
 static int dfc_convert_to_internal_format_in_use;
＠＠ -4102,11 +4581,19 ＠＠
 	const Ibyte *end;
 	for (end = ptr + len; ptr < end;)
 	  {
-	    Ibyte c =
-	      (byte_ascii_p (*ptr))		   ? *ptr :
-	      (*ptr == LEADING_BYTE_CONTROL_1)	   ? (*(ptr+1) - 0x20) :
-	      (*ptr == LEADING_BYTE_LATIN_ISO8859_1) ? (*(ptr+1)) :
-	      '~';
+	    Ibyte c;
+	    if (byte_ascii_p (*ptr))
+	      c = *ptr;
+	    else
+	      {
+		Ichar ich = non_ascii_itext_ichar (ptr);
+		if (ich < 256)
+		  c = (Ibyte) ich;
+		else
+		  /* #### This is just plain unacceptable. */
+		  /* untranslatable character */
+		  c = CANT_CONVERT_CHAR_WHEN_ENCODING;
+	      }

 	    Dynarr_add (conversion_out_dynarr, (Extbyte) c);
 	    INC_IBYTEPTR (ptr);
＠＠ -4252,7 +4739,7 ＠＠
      esp. given that this code conversion occurs in many very hidden
      places. */
   int count;
-  Ibyte_dynarr *conversion_in_dynarr;
+  unsigned_char_dynarr *conversion_in_dynarr;
   Lisp_Object underlying_cs;
   PROFILE_DECLARE ();

＠＠ -4270,7 +4757,7 ＠＠

   if (Dynarr_length (conversion_in_dynarr_list) <=
       dfc_convert_to_internal_format_in_use)
-    Dynarr_add (conversion_in_dynarr_list, Dynarr_new (Ibyte));
+    Dynarr_add (conversion_in_dynarr_list, Dynarr_new (unsigned_char));
   conversion_in_dynarr = Dynarr_at (conversion_in_dynarr_list,
 				    dfc_convert_to_internal_format_in_use);
   Dynarr_reset (conversion_in_dynarr);
＠＠ -4312,18 +4799,7 ＠＠
         {
           Ibyte c = *ptr;

-	  if (byte_ascii_p (c))
-	    Dynarr_add (conversion_in_dynarr, c);
-	  else if (byte_c1_p (c))
-	    {
-	      Dynarr_add (conversion_in_dynarr, LEADING_BYTE_CONTROL_1);
-	      Dynarr_add (conversion_in_dynarr, c + 0x20);
-	    }
-	  else
-	    {
-	      Dynarr_add (conversion_in_dynarr, LEADING_BYTE_LATIN_ISO8859_1);
-	      Dynarr_add (conversion_in_dynarr, c);
-	    }
+	  DECODE_ADD_BINARY_CHAR (c, conversion_in_dynarr);
         }
 #else
       Dynarr_add_many (conversion_in_dynarr, source->data.ptr, source->data.len);
＠＠ -4369,20 +4845,7 ＠＠
 	{
           Ibyte c = *ptr;

-	  if (byte_ascii_p (c))
-	    Dynarr_add (conversion_in_dynarr, c);
-#ifdef MULE
-	  else if (byte_c1_p (c))
-	    {
-	      Dynarr_add (conversion_in_dynarr, LEADING_BYTE_CONTROL_1);
-	      Dynarr_add (conversion_in_dynarr, c + 0x20);
-	    }
-	  else
-	    {
-	      Dynarr_add (conversion_in_dynarr, LEADING_BYTE_LATIN_ISO8859_1);
-	      Dynarr_add (conversion_in_dynarr, c);
-	    }
-#endif /* MULE */
+	  DECODE_ADD_BINARY_CHAR (c, conversion_in_dynarr);
         }
     }
 #endif /* WIN32_ANY */
＠＠ -4628,164 +5091,6 ＠＠

 /************************************************************************/
-/*                       Basic Ichar functions                         */
-/************************************************************************/
-
-#ifdef MULE
-
-/* Convert a non-ASCII Mule character C into a one-character Mule-encoded
-   string in STR.  Returns the number of bytes stored.
-   Do not call this directly.  Use the macro set_itext_ichar() instead.
- */
-
-Bytecount
-non_ascii_set_itext_ichar (Ibyte *str, Ichar c)
-{
-  Ibyte *p;
-  Ibyte lb;
-  int c1, c2;
-  Lisp_Object charset;
-
-  p = str;
-  BREAKUP_ICHAR (c, charset, c1, c2);
-  lb = ichar_leading_byte (c);
-  if (leading_byte_private_p (lb))
-    *p++ = private_leading_byte_prefix (lb);
-  *p++ = lb;
-  if (EQ (charset, Vcharset_control_1))
-    c1 += 0x20;
-  *p++ = c1 | 0x80;
-  if (c2)
-    *p++ = c2 | 0x80;
-
-  return (p - str);
-}
-
-/* Return the first character from a Mule-encoded string in STR,
-   assuming it's non-ASCII.  Do not call this directly.
-   Use the macro itext_ichar() instead. */
-
-Ichar
-non_ascii_itext_ichar (const Ibyte *str)
-{
-  Ibyte i0 = *str, i1, i2 = 0;
-  Lisp_Object charset;
-
-  if (i0 == LEADING_BYTE_CONTROL_1)
-    return (Ichar) (*++str - 0x20);
-
-  if (leading_byte_prefix_p (i0))
-    i0 = *++str;
-
-  i1 = *++str & 0x7F;
-
-  charset = charset_by_leading_byte (i0);
-  if (XCHARSET_DIMENSION (charset) == 2)
-    i2 = *++str & 0x7F;
-
-  return make_ichar (charset, i1, i2);
-}
-
-/* Return whether CH is a valid Ichar, assuming it's non-ASCII.
-   Do not call this directly.  Use the macro valid_ichar_p() instead. */
-
-int
-non_ascii_valid_ichar_p (Ichar ch)
-{
-  int f1, f2, f3;
-
-  /* Must have only lowest 19 bits set */
-  if (ch & ~0x7FFFF)
-    return 0;
-
-  f1 = ichar_field1 (ch);
-  f2 = ichar_field2 (ch);
-  f3 = ichar_field3 (ch);
-
-  if (f1 == 0)
-    {
-      /* dimension-1 char */
-      Lisp_Object charset;
-
-      /* leading byte must be correct */
-      if (f2 < MIN_ICHAR_FIELD2_OFFICIAL ||
-	  (f2 > MAX_ICHAR_FIELD2_OFFICIAL && f2 < MIN_ICHAR_FIELD2_PRIVATE) ||
-	   f2 > MAX_ICHAR_FIELD2_PRIVATE)
-	return 0;
-      /* octet not out of range */
-      if (f3 < 0x20)
-	return 0;
-      /* charset exists */
-      /*
-	 NOTE: This takes advantage of the fact that
-	 FIELD2_TO_OFFICIAL_LEADING_BYTE and
-	 FIELD2_TO_PRIVATE_LEADING_BYTE are the same.
-	 */
-      charset = charset_by_leading_byte (f2 + FIELD2_TO_OFFICIAL_LEADING_BYTE);
-      if (EQ (charset, Qnil))
-	return 0;
-      /* check range as per size (94 or 96) of charset */
-      return ((f3 > 0x20 && f3 < 0x7f) || XCHARSET_CHARS (charset) == 96);
-    }
-  else
-    {
-      /* dimension-2 char */
-      Lisp_Object charset;
-
-      /* leading byte must be correct */
-      if (f1 < MIN_ICHAR_FIELD1_OFFICIAL ||
-	  (f1 > MAX_ICHAR_FIELD1_OFFICIAL && f1 < MIN_ICHAR_FIELD1_PRIVATE) ||
-	  f1 > MAX_ICHAR_FIELD1_PRIVATE)
-	return 0;
-      /* octets not out of range */
-      if (f2 < 0x20 || f3 < 0x20)
-	return 0;
-
-#ifdef ENABLE_COMPOSITE_CHARS
-      if (f1 + FIELD1_TO_OFFICIAL_LEADING_BYTE == LEADING_BYTE_COMPOSITE)
-	{
-	  if (UNBOUNDP (Fgethash (make_int (ch),
-				  Vcomposite_char_char2string_hash_table,
-				  Qunbound)))
-	    return 0;
-	  return 1;
-	}
-#endif /* ENABLE_COMPOSITE_CHARS */
-
-      /* charset exists */
-      if (f1 <= MAX_ICHAR_FIELD1_OFFICIAL)
-	charset =
-	  charset_by_leading_byte (f1 + FIELD1_TO_OFFICIAL_LEADING_BYTE);
-      else
-	charset =
-	  charset_by_leading_byte (f1 + FIELD1_TO_PRIVATE_LEADING_BYTE);
-
-      if (EQ (charset, Qnil))
-	return 0;
-      /* check range as per size (94x94 or 96x96) of charset */
-      return ((f2 != 0x20 && f2 != 0x7F && f3 != 0x20 && f3 !=
0x7F) ||
-	      XCHARSET_CHARS (charset) == 96);
-    }
-}
-
-/* Copy the character pointed to by SRC into DST.  Do not call this
-   directly.  Use the macro itext_copy_ichar() instead.
-   Return the number of bytes copied.  */
-
-Bytecount
-non_ascii_itext_copy_ichar (const Ibyte *src, Ibyte *dst)
-{
-  Bytecount bytes = rep_bytes_by_first_byte (*src);
-  Bytecount i;
-  for (i = bytes; i; i--, dst++, src++)
-    *dst = *src;
-  return bytes;
-}
-
-#endif /* MULE */
-
-
-/************************************************************************/
 /*                        streams of Ichars                            */
 /************************************************************************/

＠＠ -4835,17 +5140,155 ＠＠
 /************************************************************************/
 /*              Lisp primitives for working with characters             */
 /************************************************************************/
+
+static int
+check_coerce_octet (Lisp_Object arg, int low, int high)
+{
+  int retval;
+  CHECK_INT (arg);
+  /* It is useful (and safe, according to Olivier Galibert) to strip
+     the 8th bit off ARG1 and ARG2 because it allows programmers to
+     write (make-char 'latin-iso8859-2 CODE) where code is the actual
+     Latin 2 code of the character.  */
+  retval = XINT (arg);
+  if (high < 128 && retval - 128 >= low && retval - 128 <=
high)
+    retval -= 128;
+  if (low >= 128 && retval + 128 >= low && retval + 128 <=
high)
+    retval += 128;
+  if (retval < low || retval > high)
+    args_out_of_range_3 (arg, make_int (low), make_int (high));
+  return retval;
+}
+
+#ifdef MULE
+
+Lisp_Object
+get_charset_octets (Lisp_Object charset, Lisp_Object arg1, Lisp_Object arg2,
+		    int *a1, int *a2)
+{
+  int low1, high1, low2, high2;
+
+  charset = Fget_charset (charset);
+  get_charset_limits (charset, &low1, &high1, &low2, &high2);
+
+
+  if (XCHARSET_DIMENSION (charset) == 1)
+    {
+      if (!NILP (arg2))
+	invalid_argument
+	  ("Charset is of dimension one; second octet must be nil", arg2);
+      /* For one dimension a1 is always 0; this is a way where the
+	 external and internal arguments are reversed for dimension-1
+	 charset codepoints */
+      *a1 = 0;
+      *a2 = check_coerce_octet (arg1, low2, high2);
+    }
+  else
+    {
+      *a1 = check_coerce_octet (arg1, low1, high1);
+      *a2 = check_coerce_octet (arg2, low2, high2);
+    }
+  return charset;
+}
+
+#endif

-DEFUN ("make-char", Fmake_char, 2, 3, 0, /*
-Make a character from CHARSET and octets ARG1 and ARG2.
-ARG2 is required only for characters from two-dimensional charsets.
-
-Each octet should be in the range 32 through 127 for a 96 or 96x96
-charset and 33 through 126 for a 94 or 94x94 charset. (Most charsets
-are either 96 or 94x94.) Note that this is 32 more than the values
-typically given for 94x94 charsets.  When two octets are required, the
-order is "standard" -- the same as appears in ISO-2022 encodings,
-reference tables, etc.
+enum converr
+decode_handle_error (Lisp_Object err)
+{
+  CHECK_SYMBOL (err);
+  if (NILP (err) || EQ (err, Qfail))
+    return CONVERR_FAIL;
+  if (EQ (err, Qabort))
+    return CONVERR_ABORT;
+  if (EQ (err, Qsucceed))
+    return CONVERR_SUCCEED;
+  if (EQ (err, Qsubstitute))
+    return CONVERR_SUBSTITUTE;
+  if (EQ (err, Quse_private))
+    return CONVERR_USE_PRIVATE;
+  invalid_constant
+    ("Must be nil, `fail', `succeed', `substitute', or
`use-private'", err);
+  /* Not reached */
+}
+
+DEFUN ("make-char", Fmake_char, 1, 4, 0, /*
+Make a character from charset and octets (OCTET1 and OCTET2).
+
+The arguments can be of two forms:
+
+\(make-char CHARSET OCTET1 &optional OCTET2)
+\(make-char UNICODE-CODEPOINT &optional CHARSET-PRECEDENCE-LIST)
+
+Attempts to generate a character from a particular codepoint in a national
+character set.  OCTET2 is either required or disallowed, depending on
+whether CHARSET is of one or two dimensions (see `make-charset').
+
+Note that there are three different internal formats for characters:
+
+1. ("non-Mule", configured without `--with-mule') An integer between 0 and
255.
+
+2. ("Unicode-internal", configured with `--with-mule' and
+   `--with-unicode-internal') An integer representing a Unicode codepoint.
+
+3. ("old Mule-internal", configured with `--with-mule' and without
+   `--with-unicode-internal') An integer that internally encodes a national
+   character set (e.g. ISO-8859-1 or GB-2312) and associated codepoint.
+   This is the old Mule representation.  This is a flawed representation
+   because what is the same character from a logical standpoint can have
+   multiple representations. (This is a particular problem with accented
+   Latin characters.)
+
+Note that all three representations more or less agree in encoding ASCII in
+the range 0-127 and ISO-8859-1 in the range 128-255. ("More or less"
+because representation #1 does not really care what the actual significance
+of the characters is.  Under X Windows at least, XEmacs without Mule
+support could be made to support various character sets with appropriate
+font settings.)
+
+XEmacs tries to hide the internal representation of characters as much as
+possible, but it is not completely possible to hide the difference between
+representations #2 and #3 because of the explicit encoding of a charset or
+lack thereof in the character.  Conversion from Unicode codepoints to
+charset codepoints is a one-to-many operation, and requires a charset
+precedence list to determine which of many charsets to choose from.  This
+means:
+
+-- In a Unicode-internal world, conversion from a character to a charset
+   codepoint (`char-charset', `char-octet', `split-char') uses a charset
+   precedence list.
+
+-- In an old Mule-internal world, conversion from a Unicode codepoint to
+   a character (`make-char') uses a charset precedence list.
+
+In all cases, the precedence list is optional; when not specified, a
+default list, corresponding to the current language, is used. (See
+`unicode-precedence-list'.)
+
+OCTET2 is required only for characters from two-dimensional charsets.
+
+Each octet should be in the range corresponding to the offset and size
+for that dimension, as defined in the charset.  For a typical one-dimensional
+charset of size 96, such as ISO-8859-1 (aka Latin-1), the range will be
+\[32, 127].  For a typical two-dimensional charset of size 94x94, such as
+GB-2312 (Simplified Chinese), JISX-0208 (Japanese), or KS-5601 (Korean),
+the range will be [33, 126] in each dimension.  Big5 and certain other
+large charsets have a range of [33, 126] in the first dimension and
+\[64, 253] or something similar in the second dimension.  Other charsets
+may have other dimensions; e.g. ASCII has the range [0, 127].
+
+Note that the ranges as used for 94x94 charsets are 32 more in each dimension
+than the "ku-ten" representation often seen for these charsets, with a range
+of [1, 94] in each dimension.
+
+If the allowed values in a particular dimension are entirely in the range
+\[0, 127] or [128, 255], octet values offset by 128 are allowed and will be
+converted appropriately.  Hence, either \(make-char 'latin-iso8859-2 185)
+or (make-char 'latin-iso8859-2 57) will return the Latin 2 character s with
+caron.
+
+When two octets are required, the order is "standard" -- the same as
+appears in ISO-2022 encodings, reference tables, etc.

 \(Note the following non-obvious result: Computerized translation
 tables often encode the two octets as the high and low bytes,
＠＠ -4854,19 +5297,16 ＠＠
 the two cases differently when calling make-char: One is (make-char
 CHARSET HIGH LOW), the other is (make-char CHARSET LOW).)

-For example, (make-char 'latin-iso8859-2 185) or (make-char
-'latin-iso8859-2 57) will return the Latin 2 character s with caron.
+As an example, the Japanese character for "kawa" (stream), looks something
+like this:

-As another example, the Japanese character for "kawa" (stream), which
-looks something like this:
-
    |     |
    |  |  |
    |  |  |
    |  |  |
   /      |

-appears in the Unicode Standard (version 2.0) on page 7-287 with the
+It appears in the Unicode Standard (version 2.0) on page 7-287 with the
 following values (see also page 7-4):

 U 5DDD     (Unicode)
＠＠ -4902,128 +5342,335 ＠＠
 `control-1' -- ARG1 should be in the range 128 through 159.
  else -- ARG1 is coerced to be between 0 and 255, and then the high
          bit is set.
+
+ `int-to-char of the resulting ARG1' is returned, and ARG2 is always ignored.

- `int-to-char of the resulting ARG1' is returned, and ARG2 is always ignored. 
+Note that we provide the following sets of operations to deal with the
+different possible underlying representations of a character:
+
+1. `make-char', or alternatively `charset-codepoint-to-char', generates a
+    character from a charset codepoint (a national charset and one or two
+    position codes, or octets); `unicode-to-char' generates a character
+    from a Unicode codepoint.  `char-to-charset-codepoint' converts a
+    character into a charset codepoint (charset and one or two octets), and
+    `char-to-unicode' converts a character to a Unicode codepoint.
+    Depending on the particular representation used for the character, a
+    charset precedence list may be required for some of these operations.
+2. `int-to-char' and `char-to-int' convert between a character and its
+    raw representation as an integer.  With Unicode-internal, this will
+    simply be a Unicode value.  With old-Mule-internal, this will be some
+    complex number in which the charset and octets have been encoded in the
+    bit fields.
+3. `charset-codepoint-to-unicode' and `unicode-to-charset-codepoint' are
+    for converting directly between charset codepoints and Unicode codepoints,
+    regardless of the particular representation of a character.
+
+HANDLE-ERROR controls error behavior stemming from inability to translate.
+Currently, this happens only with Unicode-internal:
+
+nil or `fail'	Return nil
+`abort'		Signal an error
+`succeed'	Same as `use-private'
+`substitute'	Substitute the Unicode replacement char (0xFFFD)
+`use-private'	Encode using private Unicode space
 */
-       (charset, arg1, USED_IF_MULE (arg2)))
+       (charset, octet1, octet2, handle_error))
 {
 #ifdef MULE
-  Lisp_Charset *cs;
+  enum converr fail = decode_handle_error (handle_error);
   int a1, a2;
-  int lowlim, highlim;
+  Ichar ch;

   charset = Fget_charset (charset);
-  cs = XCHARSET (charset);
-
-  get_charset_limits (charset, &lowlim, &highlim);
-
-  CHECK_INT (arg1);
-  /* It is useful (and safe, according to Olivier Galibert) to strip
-     the 8th bit off ARG1 and ARG2 because it allows programmers to
-     write (make-char 'latin-iso8859-2 CODE) where code is the actual
-     Latin 2 code of the character.  */
-  a1 = XINT (arg1) & 0x7f;
-  if (a1 < lowlim || a1 > highlim)
-    args_out_of_range_3 (arg1, make_int (lowlim), make_int (highlim));
-
-  if (CHARSET_DIMENSION (cs) == 1)
-    {
-      if (!NILP (arg2))
-        invalid_argument
-          ("Charset is of dimension one; second octet must be nil", arg2);
-      return make_char (make_ichar (charset, a1, 0));
-    }
-
-  CHECK_INT (arg2);
-  a2 = XINT (arg2) & 0x7f;
-  if (a2 < lowlim || a2 > highlim)
-    args_out_of_range_3 (arg2, make_int (lowlim), make_int (highlim));
-
-  return make_char (make_ichar (charset, a1, a2));
+  get_charset_octets (charset, octet1, octet2, &a1, &a2);
+  ch = charset_codepoint_to_ichar (charset, a1, a2, fail);
+  if (ch < 0)
+    return Qnil;
+  return make_char (ch);
 #else
   int a1;
   int lowlim, highlim;

-  if      (EQ (charset, Qascii))     lowlim =  0, highlim = 127;
-  else if (EQ (charset, Qcontrol_1)) lowlim =  0, highlim =  31;
-  else	                             lowlim =  0, highlim = 127;
+  CHECK_SYMBOL (charset);
+  if (EQ (charset, Qascii))
+    lowlim =  0, highlim = 127;
+  else if (EQ (charset, Qcontrol_1))
+    lowlim =  0, highlim =  31;
+  else
+    lowlim =  0, highlim = 127;

-  CHECK_INT (arg1);
-  /* It is useful (and safe, according to Olivier Galibert) to strip
-     the 8th bit off ARG1 and ARG2 because it allows programmers to
-     write (make-char 'latin-iso8859-2 CODE) where code is the actual
-     Latin 2 code of the character.  */
-  a1 = XINT (arg1) & 0x7f;
-  if (a1 < lowlim || a1 > highlim)
-    args_out_of_range_3 (arg1, make_int (lowlim), make_int (highlim));
+  a1 = check_coerce_octet (octet1, lowlim, highlim);

-  if (EQ (charset, Qascii))
+  if (EQ (charset_or_unicode_codepoint, Qascii))
     return make_char (a1);
   return make_char (a1 + 128);
 #endif /* MULE */
 }

+DEFUN ("char-to-unicode", Fchar_to_unicode, 1, 2, 0, /*
+Convert character to Unicode codepoint.
+When there is no international support (i.e. the 'mule feature is not
+present), this function simply does `char-to-int'.
+
+HANDLE-ERROR controls error behavior:
+
+nil or `fail'	Return nil
+`abort'		Signal an error
+`succeed'	Same as `use-private'
+`substitute'	Substitute the Unicode replacement char (0xFFFD)
+`use-private'	Encode using private Unicode space
+*/
+       (character, handle_error))
+{
+  enum converr fail = decode_handle_error (handle_error);
+  CHECK_CHAR (character);
+  
+#ifdef MULE
+  return make_int (ichar_to_unicode (XCHAR (character), fail));
+#else
+  return Fchar_to_int (character);
+#endif /* MULE */
+}
+
+DEFUN ("unicode-to-char", Funicode_to_char, 1, 3, 0, /*
+Convert Unicode codepoint to char.
+
+Attempts to generate a character from a particular Unicode codepoint, which
+should be a non-negative integer.  When the old Mule-internal
+representation is used, there are multiple possible return values; the
+particular value returned will reflect the given charset precedence list,
+or the default precedence.  If PRECEDENCE-LIST is given, it should
+be a list of charsets, and only those charsets will be consulted, in the
+given order, for a translation.  Otherwise, the default ordering of all
+charsets will be given (see `set-unicode-charset-precedence').  When there
+is no international support \(i.e. the `mule' feature is not present), this
+function simply does `int-to-char' and ignores the PRECEDENCE-LIST
+argument. (Redisplay will work on the sjt-xft branch, but not with
+server-side X11 fonts as is the default.)
+
+HANDLE-ERROR controls error behavior:
+
+nil or `fail'	Return nil
+`abort'		Signal an error
+`succeed'	Same as `substitute'
+`substitute'	Substitute a replacement character
+*/
+       (unicode, precedence_list, handle_error))
+{
+#if defined (MULE) && !defined (UNICODE_INTERNAL)
+  int c = decode_unicode (unicode);
+  Lisp_Object_dynarr *dyn =
+    convert_charset_list_to_precedence_dynarr (precedence_list);
+  enum converr fail = decode_handle_error (handle_error);
+  Ichar ret = unicode_to_ichar (c, dyn, fail);
+
+  free_precedence_dynarr (dyn);
+  if (ret == -1)
+    return Qnil;
+  return make_char (ret);
+#else
+  decode_unicode (unicode);
+  return Fint_to_char (unicode);
+#endif /* (not) defined (MULE) && !defined (UNICODE_INTERNAL) */
+}
+
 #ifdef MULE
+
+/* Like ichar_to_charset_codepoint() but takes a CHARSETS list and converts
+   it into an internal dynarr. */
+static void
+ichar_to_charset_codepoint_helper (Ichar ch, Lisp_Object
+				   USED_IF_UNICODE_INTERNAL (charsets),
+				   Lisp_Object *charset, int *c1, int *c2)
+{
+#ifdef UNICODE_INTERNAL
+  Lisp_Object_dynarr *dyn =
+    convert_charset_list_to_precedence_dynarr (charsets);
+  ichar_to_charset_codepoint (ch, dyn, charset, c1, c2);
+  free_precedence_dynarr (dyn);
+#else
+  ichar_to_charset_codepoint (ch, 0, charset, c1, c2);
+#endif /* UNICODE_INTERNAL */
+}
+
+DEFUN ("char-to-charset-codepoint", Fchar_to_charset_codepoint, 1, 2, 0, /*
+Return list of charset and one or two position-codes of char CH.
+Use this function in place of `split-char'.
+
+When a Unicode internal representation is used (--with-unicode-internal
+option to configure), CH will be converted according to CHARSETS, a
+charset precedence list (see `make-char').  The return value will
+be the highest-precedence charset in which the character is found, and
+the corresponding position codes (octets) in the representation in this
+charset.  The return value will be nil if no equivalent national charset
+representation can be found.
+*/
+       (ch, USED_IF_MULE (charsets)))
+{
+  Lisp_Object charset;
+  int c1, c2;
+
+  CHECK_CHAR_COERCE_INT (ch);
+  ichar_to_charset_codepoint_helper (XCHAR (ch), charsets, &charset, &c1,
&c2);
+
+  if (NILP (charset))
+    return Qnil;
+
+  if (XCHARSET_DIMENSION (charset) == 2)
+    return list3 (XCHARSET_NAME (charset), make_int (c1), make_int (c2));
+  else
+    return list2 (XCHARSET_NAME (charset), make_int (c2));
+}
+
+/* Maaaaaaaaybe it makes sense to provide some of these not under Mule; it
+   could be argued that, since we provide `make-char' that can take a
+   pseudo-charset, we should provide `split-char'.  Maybe. */
+
+DEFUN ("charset-codepoint-to-unicode", Fcharset_codepoint_to_unicode,
+       2, 4, 0, /*
+Convert a CHARSET and one or two octets to a Unicode codepoint.
+See `make-char' for the format of the octets.
+
+HANDLE-ERROR controls error behavior:
+
+nil or `fail'	Return nil
+`abort'		Signal an error
+`succeed'	Same as `use-private'
+`substitute'	Substitute the Unicode replacement char (0xFFFD)
+`use-private'	Encode using private Unicode space
+*/
+       (charset, arg1, arg2, handle_error))
+{
+  int a1, a2;
+  charset = get_charset_octets (charset, arg1, arg2, &a1, &a2);
+  enum converr err = decode_handle_error (handle_error);
+  int code = charset_codepoint_to_unicode (charset, a1, a2, err);
+  if (code == -1)
+    return Qnil;
+  return make_int (code);
+}
+
+DEFUN ("unicode-to-charset-codepoint", Funicode_to_charset_codepoint,
+       1, 2, 0, /*
+Convert a Unicode codepoint and a charset precedence list (see `make-char')
+into a list of charset and one or two octets, codepoints in the charset.
+Return nil if no conversion available.
+*/
+       (code, charsets))
+{
+  Lisp_Object_dynarr *dyn =
+    convert_charset_list_to_precedence_dynarr (charsets);
+  int c = decode_unicode (code);
+  Lisp_Object charset;
+  int a1, a2;

-DEFUN ("char-charset", Fchar_charset, 1, 1, 0, /*
+  unicode_to_charset_codepoint (c, dyn, &charset, &a1, &a2);
+  free_precedence_dynarr (dyn);
+  if (NILP (charset))
+    return Qnil;
+
+  if (XCHARSET_DIMENSION (Fget_charset (charset)) == 2)
+    return list3 (XCHARSET_NAME (charset), make_int (a1), make_int (a2));
+  else
+    return list2 (XCHARSET_NAME (charset), make_int (a2));
+}
+
+DEFUN ("char-charset", Fchar_charset, 1, 2, 0, /*
 Return the character set of char CH.
+When a Unicode internal representation is used (--with-unicode-internal
+option to configure), CH will be converted according to CHARSETS, a
+charset precedence list (see `make-char').  The return value will
+be the highest-precedence charset in which the character is found, or nil
+if no equivalent national charset representation can be found.
 */
-       (ch))
+       (ch, charsets))
 {
+  Lisp_Object charset;
+  int c1, c2;
+
   CHECK_CHAR_COERCE_INT (ch);
+  ichar_to_charset_codepoint_helper (XCHAR (ch), charsets, &charset, &c1,
&c2);
+
+  if (NILP (charset))
+    return Qnil;

-  return XCHARSET_NAME (charset_by_leading_byte
-			(ichar_leading_byte (XCHAR (ch))));
+  return XCHARSET_NAME (charset);
 }

-DEFUN ("char-octet", Fchar_octet, 1, 2, 0, /*
+DEFUN ("char-octet", Fchar_octet, 1, 3, 0, /*
 Return the octet numbered N (should be 0 or 1) of char CH.
 N defaults to 0 if omitted.
+This function is for compatibility; consider using `char-to-charset-codepoint'
+instead.
+This function is not very useful when a Unicode internal representation is
+used (--with-unicode-internal option to configure). (Specifically, this
+function is more or less equivalent to (nth (1+ N) (split-char CH)), but
+returns 0 instead of nil.)
 */
-       (ch, n))
+       (ch, n, charsets))
 {
   Lisp_Object charset;
-  int octet0, octet1;
+  int c1, c2;

   CHECK_CHAR_COERCE_INT (ch);
+  ichar_to_charset_codepoint_helper (XCHAR (ch), charsets, &charset, &c1,
&c2);
+
+  if (NILP (charset))
+    return Qnil;

-  BREAKUP_ICHAR (XCHAR (ch), charset, octet0, octet1);
+  /* Bogus bogus bogus. */
+  if (get_charset_iso2022_type (charset) >= 0)
+    {
+      c1 &= 127;
+      c2 &= 127;
+    }

   if (NILP (n) || EQ (n, Qzero))
-    return make_int (octet0);
+    return make_int (c1);
   else if (EQ (n, make_int (1)))
-    return make_int (octet1);
+    return make_int (c2);
   else
     invalid_constant ("Octet number must be 0 or 1", n);
 }

-DEFUN ("split-char", Fsplit_char, 1, 1, 0, /*
-Return list of charset and one or two position-codes of CHAR.
+DEFUN ("split-char", Fsplit_char, 1, 2, 0, /*
+Return list of charset and one or two position-codes of char CH.
+This function is for compatibility; consider using `char-to-charset-codepoint'
+instead.  This function is like `char-to-charset-codepoint' but its return
+value is hacked up for compatibility purposes: If the returned charset of
+the character is ISO-2022 compatible, the position codes will be coerced into
+the range [0, 127], even if they should be in the range [128, 255].
+
+When a Unicode internal representation is used (--with-unicode-internal
+option to configure), CH will be converted according to CHARSETS, a
+charset precedence list (see `make-char').  The return value will
+be the highest-precedence charset in which the character is found, and
+the corresponding position codes (octets) in the representation in this
+charset.  The return value will be nil if no equivalent national charset
+representation can be found.
 */
-       (character))
+       (ch, USED_IF_MULE (charsets)))
 {
-  /* This function can GC */
-  struct gcpro gcpro1, gcpro2;
-  Lisp_Object charset = Qnil;
-  Lisp_Object rc = Qnil;
+  Lisp_Object charset;
   int c1, c2;

-  GCPRO2 (charset, rc);
-  CHECK_CHAR_COERCE_INT (character);
+  CHECK_CHAR_COERCE_INT (ch);
+  ichar_to_charset_codepoint_helper (XCHAR (ch), charsets, &charset, &c1,
&c2);

-  BREAKUP_ICHAR (XCHAR (character), charset, c1, c2);
+  if (NILP (charset))
+    return Qnil;

-  if (XCHARSET_DIMENSION (Fget_charset (charset)) == 2)
-    {
-      rc = list3 (XCHARSET_NAME (charset), make_int (c1), make_int (c2));
-    }
-  else
+  /* Bogus bogus bogus. */
+  if (get_charset_iso2022_type (charset) >= 0)
     {
-      rc = list2 (XCHARSET_NAME (charset), make_int (c1));
+      c1 &= 127;
+      c2 &= 127;
     }
-  UNGCPRO;

-  return rc;
+  if (XCHARSET_DIMENSION (charset) == 2)
+    return list3 (XCHARSET_NAME (charset), make_int (c1), make_int (c2));
+  else
+    return list2 (XCHARSET_NAME (charset), make_int (c2));
 }

 #endif /* MULE */
＠＠ -5048,8 +5695,10 ＠＠
     {
       if (composite_char_row_next >= 128)
 	invalid_operation ("No more composite chars available", lispstr);
-      emch = make_ichar (Vcharset_composite, composite_char_row_next,
-			composite_char_col_next);
+      emch = charset_codepoint_to_ichar (Vcharset_composite,
+					 composite_char_row_next,
+					 composite_char_col_next,
+					 CONVERR_SUCCEED);
       Fputhash (make_char (emch), lispstr,
 	        Vcomposite_char_char2string_hash_table);
       Fputhash (lispstr, make_char (emch),
＠＠ -5097,7 +5746,7 ＠＠

   CHECK_CHAR (ch);
   emch = XCHAR (ch);
-  if (ichar_leading_byte (emch) != LEADING_BYTE_COMPOSITE)
+  if (!EQ (ichar_charset (emch), Vcharset_composite))
     invalid_argument ("Must be composite char", ch);
   return composite_char_string (emch);
 }
＠＠ -5125,8 +5774,16 ＠＠
 syms_of_text (void)
 {
   DEFSUBR (Fmake_char);
+  DEFSUBR (Fchar_to_unicode);

+  /* Qfail, Qsubstitute in general.c */
+  DEFSYMBOL (Qsubstitute_negated);
+  DEFSYMBOL (Quse_private);
+
 #ifdef MULE
+  DEFSUBR (Fcharset_codepoint_to_unicode);
+  DEFSUBR (Funicode_to_charset_codepoint);
+  DEFSUBR (Fchar_to_charset_codepoint);
   DEFSUBR (Fchar_charset);
   DEFSUBR (Fchar_octet);
   DEFSUBR (Fsplit_char);
＠＠ -5141,15 +5798,19 ＠＠
 void
 reinit_vars_of_text (void)
 {
-  int i;
-
-  conversion_in_dynarr_list = Dynarr_new2 (Ibyte_dynarr_dynarr,
-					   Ibyte_dynarr *);
+  conversion_in_dynarr_list = Dynarr_new2 (unsigned_char_dynarr_dynarr,
+					   unsigned_char_dynarr *);
   conversion_out_dynarr_list = Dynarr_new2 (Extbyte_dynarr_dynarr,
 					    Extbyte_dynarr *);
+
+#ifdef MULE
+  {
+    int i;

-  for (i = 0; i <= MAX_BYTEBPOS_GAP_SIZE_3; i++)
-    three_to_one_table[i] = i / 3;
+    for (i = 0; i <= MAX_BYTEBPOS_GAP_SIZE_3; i++)
+      three_to_one_table[i] = i / 3;
+  }
+#endif /* MULE */
 }

 void
＠＠ -5161,6 +5822,7 ＠＠
     build_msg_string ("(in internal-external conversion)");
   staticpro (&QSin_internal_external_conversion);

+#ifdef MULE
 #ifdef ENABLE_COMPOSITE_CHARS
   /* #### not dumped properly */
   composite_char_row_next = 32;
＠＠ -5173,4 +5835,5 ＠＠
   staticpro (&Vcomposite_char_string2char_hash_table);
   staticpro (&Vcomposite_char_char2string_hash_table);
 #endif /* ENABLE_COMPOSITE_CHARS */
+#endif /* MULE */
 }
Index: src/text.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/text.h,v
retrieving revision 1.28
diff -u -r1.28 text.h
--- src/text.h	2005/01/28 02:36:27	1.28
+++ src/text.h	2005/11/22 14:01:09
＠＠ -1,7 +1,7 ＠＠
 /* Header file for text manipulation primitives and macros.
    Copyright (C) 1985-1995 Free Software Foundation, Inc.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 2000, 2001, 2002, 2003, 2004 Ben Wing.
+   Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005 Ben Wing.

 This file is part of XEmacs.

＠＠ -33,6 +33,21 ＠＠
 #ifndef INCLUDED_text_h_
 #define INCLUDED_text_h_

+/****************************************************************************/
+/*--------------------------------------------------------------------------*/
+/*                                                                          */
+/*          NOTE NOTE NOTE: The specifics of how characters are             */
+/*          represented as Ichars and Ibytes should be *entirely*           */
+/*          contained in the top portions of text.c and text.h, above       */
+/*          the places with a note saying "Everything above here knows      */
+/*          about the specifics of the internal character and text          */
+/*          formats.  Nothing below or anywhere else knows ...".            */
+/*          --ben                                                           */
+/*--------------------------------------------------------------------------*/
+/****************************************************************************/
+
+BEGIN_C_DECLS
+
 #ifdef HAVE_WCHAR_H
 #include <wchar.h>
 #else
＠＠ -45,22 +60,143 ＠＠
 char *strupr (char *);
 #endif

-BEGIN_C_DECLS
+/* Following used for functions that do character conversion and need to
+   handle errors. */
+
+enum converr
+  {
+    /* ---- Basic actions ---- */

+    /* Do nothing upon failure and return a failure indication.
+       Same as what happens when the *_raw() version is called. */
+    CONVERR_FAIL,
+    /* Signal a Lisp error. ＠＠#### Not implemented. */
+    CONVERR_ABORT,
+    /* Try to "recover" and continue processing.  What exactly happens
+       depends on the type of conversion.  When converting from a charset
+       representation to Unicode, this is like CONVERR_USE_PRIVATE,
+       i.e. the unknown charset value is encoded using private Unicode
+       space.  Otherwise, it's generally like CONVERR_SUBSTITUTE, where
+       one of the substitution characters defined below (CANT_CONVERT_*)
+       is used. */
+    CONVERR_SUCCEED,
+
+    /* ---- More specific actions ---- */
+
+    /* Substitute something (0xFFFD, the Unicode replacement character,
+       when converting to Unicode or to a Unicode-internal Ichar, JISX0208
+       GETA mark when converting to non-Mule Ichar). */
+    CONVERR_SUBSTITUTE,
+    /* Use private Unicode space when converting to Unicode. */
+    CONVERR_USE_PRIVATE
+  };
+
+/* ---------------------------------------------------------------------- */
+/*       Characters to use when conversion or display is impossible       */
 /* ---------------------------------------------------------------------- */
+
+#define UNICODE_REPLACEMENT_CHAR 0xFFFD
+
+/* When unable to convert a character when encoding to an external
+   ASCII-compatible format, substitute the following character instead.
+   Should be ASCII. */
+#define CANT_CONVERT_CHAR_WHEN_ENCODING '?'
+#define CANT_CONVERT_CHAR_WHEN_ENCODING_UNICODE UNICODE_REPLACEMENT_CHAR
+/* When unable to convert a character when decoding to internal format,
+   substitute the following character instead.  Value should be an Ichar.
+   Use Unicode replacement char in Unicode-internal world, else JISX0208
+   GETA MARK. */
+#ifdef UNICODE_INTERNAL
+#define CANT_CONVERT_CHAR_WHEN_DECODING UNICODE_REPLACEMENT_CHAR
+#elif defined (MULE)
+#define CANT_CONVERT_CHAR_WHEN_DECODING					\
+  charset_codepoint_to_ichar (Vcharset_japanese_jisx0208, 34, 46,	\
+			      /* Shouldn't matter, we're old-Mule */	\
+			      CONVERR_SUCCEED)
+#else
+#define CANT_CONVERT_CHAR_WHEN_DECODING '?'
+#endif /* UNICODE_INTERNAL */
+/* When unable to display a character in a buffer, substitute the following
+   character instead.  #### Possibly we should use some sort of box, e.g.
+   the full-height open rectangular box often used for this. */
+#define CANT_DISPLAY_CHAR '~'
+
+/* ---------------------------------------------------------------------- */
 /*                     Super-basic character properties                   */
 /* ---------------------------------------------------------------------- */

 /* These properties define the specifics of how our current encoding fits
-   in the basic model used for the encoding.  Because this model is the same
-   as is used for UTF-8, all these properties could be defined for it, too.
-   This would instantly make the rest of this file work with UTF-8 (with
-   the exception of a few called functions that would need to be redefined).
+   in the basic model used for the encoding.  This model is the same
+   for the old Mule encoding and for UTF-8, used in Unicode-internal,
+   and essentially assumes, given that a logical sequence of characters
+   is represented as a sequence of bytes:
+
+   (1) The byte range(s) used to represent the first byte of a character
+       are disjoint from the byte range(s) used to represent any remaining
+       bytes.  This way, moving backwards over a string is easy (i.e.
+       constant time).
+
+   (2) Given the first byte of a character, you can determine the number
+       of bytes in the character using a table lookup, i.e. without looking
+       at any of the other bytes in the character.  This way, there is no
+       need for a "sentinel" character at the end of a sequence of
+       text. (If, in order to find the end of a character, you had to scan
+       forward to the beginning of the next character, you would be in
+       danger of illegal memory accesses.)
+
+   (3) The representation is ASCII-compatible, i.e. ASCII characters are
+       represented by bytes 00 - 7F. (By property #1, *all* bytes used
+       to represent any other character must be in the range 80 - FF.)
+
+   The old-Mule encoding satisfies this; the first byte of multi-byte
+   sequences is in the range 80-9F, and remaining bytes are A0-FF.
+
+   UTF-8 also satisfies this; the first byte of multi-byte sequences is in
+   the range C0-FD, and remaining bytes are 80-BF. (Bytes FE and FF are not
+   used at all.)
+
+   NOTE: The properties above do *not* require that the range of the first
+   byte of a character is contiguous, and UTF-8 does not satisfy this.
+   For this reason we need to be careful in regex.c.
+
+   NOTE: Ideally, the specifics of the particular encoding should be
+   *completely* encapsulated in a small number of files -- currently,
+   text.h, text.c and charset.h. (Indicated by conditionalization on
+   UNICODE_INTERNAL.) In practice, we need to conditionalize elsewhere
+   due to the fact that old-Mule characters explicitly encode a national
+   charset in their representation (and hence encode the same logical
+   character multiple ways, especially the different accented Latin
+   characters), while Unicode/UTF-8 doesn't do this.  But none of this
+   code assumes (or should assume) anything specific about the encoding
+   used, more than above; e.g. if we wanted to switch the ranges of our
+   UTF-8-like encoding to use 80-BF for the first bytes and C0-FF for
+   the remaining ones, we should be able to change *only* the three
+   files just specified.
+
+   (A fourth pervasive assumption is that characters fit into a 32-bit
+   signed integer.  To change this requires reworking of the char tables
+   and Unicode translation tables.)

-   (UTF-2000 implementers, take note!)
+   --ben
+*/
+
+/* NOTE: There are other functions/macros for working with Ichars in
+   charset.h, for retrieving the charset of an Ichar, the length of an
+   Ichar when converted to text, etc.
 */

-/* If you want more than this, you need to include charset.h */
+DECLARE_INLINE_HEADER (
+int
+valid_unicode_codepoint_p (EMACS_INT ch)
+)
+{
+#if SIZEOF_EMACS_INT > 4
+  /* On 64-bit machines, we could have a value too large */
+  return ch >= 0 && ch <= 0x7FFFFFFF;
+#else
+  return ch >= 0;
+#endif
+}

 #ifndef MULE

＠＠ -68,6 +204,16 ＠＠
 #define byte_ascii_p(byte) 1
 #define MAX_ICHAR_LEN 1

+/* This appears to work both for values > 255 and < 0. */
+#define valid_ichar_p(ch) (! (ch & ~0xFF))
+#define ichar_len(ch) 1
+#define ichar_columns(ch) 1
+
+#define DECODE_ADD_BINARY_CHAR(c, dst)		\
+do {						\
+  Dynarr_add (dst, c);				\
+} while (0)
+
 #else /* MULE */

 /* These are carefully designed to work if BYTE is signed or unsigned. */
＠＠ -79,6 +225,12 ＠＠

 /* Does BYTE represent the first byte of a character? */

+#ifdef UNICODE_INTERNAL
+#define ibyte_first_byte_p_2(byte) (((byte) & 0xC0) != 0x80)
+#else
+#define ibyte_first_byte_p_2(byte) ((byte) < 0xA0)
+#endif /* UNICODE_INTERNAL */
+
 #ifdef ERROR_CHECK_TEXT

 DECLARE_INLINE_HEADER (
＠＠ -87,47 +239,60 ＠＠
 )
 {
   assert_at_line (byte >= 0 && byte < 256, file, line);
-  return byte < 0xA0;
+  return ibyte_first_byte_p_2 (byte);
 }

 #define ibyte_first_byte_p(byte) \
   ibyte_first_byte_p_1 (byte, __FILE__, __LINE__) 

-#else
+#else /* not ERROR_CHECK_TEXT */

-#define ibyte_first_byte_p(byte) ((byte) < 0xA0)
+#define ibyte_first_byte_p(byte) ibyte_first_byte_p_2 (byte)

-#endif
+#endif /* ERROR_CHECK_TEXT */

-#ifdef ERROR_CHECK_TEXT
+#if 0 /* Unused currently 11-19-05 ben */

-/* Does BYTE represent the first byte of a multi-byte character? */
+/* Does BYTE represent the first byte of a multi-byte character?  (Usually
+   such a byte is called a "lead byte" and the remaining bytes the "fill
+   bytes".) */
+
+# ifdef UNICODE_INTERNAL
+# define ibyte_lead_byte_p_2(byte) ((byte) >= 0xC0)
+# else
+# define ibyte_lead_byte_p_2(byte) byte_c1_p (byte)
+# endif /* UNICODE_INTERNAL */

+# ifdef ERROR_CHECK_TEXT
+
 DECLARE_INLINE_HEADER (
 int
-ibyte_leading_byte_p_1 (int byte, const char *file, int line)
+ibyte_lead_byte_p_1 (int byte, const char *file, int line)
 )
 {
   assert_at_line (byte >= 0 && byte < 256, file, line);
-  return byte_c1_p (byte);
+  return ibyte_lead_byte_p_2 (byte);
 }

-#define ibyte_leading_byte_p(byte) \
-  ibyte_leading_byte_p_1 (byte, __FILE__, __LINE__) 
+# define ibyte_lead_byte_p(byte) \
+  ibyte_lead_byte_p_1 (byte, __FILE__, __LINE__) 

-#else
+# else /* not ERROR_CHECK_TEXT */

-#define ibyte_leading_byte_p(byte) byte_c1_p (byte)
+# define ibyte_lead_byte_p(byte) ibyte_lead_byte_p_2 (byte)

-#endif
+# endif /* ERROR_CHECK_TEXT */
+
+#endif /* 0 */

 /* Table of number of bytes in the string representation of a character
    indexed by the first byte of that representation.
-
-   This value can be derived in other ways -- e.g. something like
-   XCHARSET_REP_BYTES (charset_by_leading_byte (first_byte))
-   but it's faster this way. */
+*/
+#ifdef UNICODE_INTERNAL
+extern MODULE_API const Bytecount rep_bytes_by_first_byte[256];
+#else
 extern MODULE_API const Bytecount rep_bytes_by_first_byte[0xA0];
+#endif

 /* Number of bytes in the string representation of a character. */

＠＠ -138,7 +303,12 ＠＠
 rep_bytes_by_first_byte_1 (int fb, const char *file, int line)
 )
 {
+#ifdef UNICODE_INTERNAL
+  assert_at_line ((fb >= 0 && fb < 0x80) ||
+		  (fb >= 0xC0 && fb <= 0xFD), file, line);
+#else
   assert_at_line (fb >= 0 && fb < 0xA0, file, line);
+#endif
   return rep_bytes_by_first_byte[fb];
 }

＠＠ -162,10 +332,327 ＠＠
  any format.
  */

+#ifdef UNICODE_INTERNAL
+#define MAX_ICHAR_LEN 6
+#else
 #define MAX_ICHAR_LEN 4
+#endif

-#endif /* not MULE */
+#ifndef UNICODE_INTERNAL
+MODULE_API int old_mule_non_ascii_valid_ichar_p (Ichar ch);
+#endif
+
+/* Return whether the given Ichar is valid.
+ */
+
+DECLARE_INLINE_HEADER (
+int
+valid_ichar_p (Ichar ch)
+)
+{
+#ifdef UNICODE_INTERNAL
+  return valid_unicode_codepoint_p ((EMACS_INT) ch);
+#else
+  return (! (ch & ~0xFF)) || old_mule_non_ascii_valid_ichar_p (ch);
+#endif /* UNICODE_INTERNAL */
+}
+
+#ifndef UNICODE_INTERNAL
+
+/************************************************************************/
+/*              Definition of charset ID's and lead bytes               */
+/************************************************************************/
+
+  /* CHARSET_ID_LATIN_ISO8859_1 *MUST* be equal to 0x81; or more correctly,
+      */
+
+/* The following three are used elsewhere in this file and are generally
+   "special".  All other predefined charset ID's are defined and accessed
+   only in mule-charset.c.  CHARSET_ID_ASCII is not used to represent
+   text in a buffer.  CHARSET_ID_LATIN_ISO8859_1 - FIELD2_TO_CHARSET_ID
+   *MUST* equal 1.  Be very careful if you are considering changing the
+   value of any of these three; but the other values in mule-charset.c
+   can be changed without problem as long as they are distinct and within
+   the range 0x80 - 0x9D. */
+
+#define CHARSET_ID_ASCII           0x7F /* Not used except in arrays
+					   indexed by charset ID */
+#define CHARSET_ID_CONTROL_1       0x80
+#define CHARSET_ID_LATIN_ISO8859_1 0x81 /* 0x81 Right half of ISO 8859-1 */
+
+/** The following are for 1- and 2-byte characters in a private charset. **/
+
+#define LEAD_BYTE_PRIVATE_1	0x9E	/* 1-byte char-set */
+#define LEAD_BYTE_PRIVATE_2	0x9F	/* 2-byte char-set */
+
+/* We can have up to 96 private charsets of dimension 1 and 96 of dimension 2,
+   currently */
+#define MIN_PRIVATE_DIM1_CHARSET_ID	0xA0
+#define MAX_PRIVATE_DIM1_CHARSET_ID	0xFF
+#define MIN_PRIVATE_DIM2_CHARSET_ID	0x100
+#define MAX_PRIVATE_DIM2_CHARSET_ID	0x15F
+
+#define MIN_PRIVATE_CHARSET_ID	MIN_PRIVATE_DIM1_CHARSET_ID
+#define MAX_PRIVATE_CHARSET_ID	MAX_PRIVATE_DIM2_CHARSET_ID
+
+#define MIN_ENCODABLE_CHARSET_ID	CHARSET_ID_ASCII
+#define MAX_ENCODABLE_CHARSET_ID	MAX_PRIVATE_CHARSET_ID
+#define NUM_ENCODABLE_CHARSET_IDS \
+ (MAX_ENCODABLE_CHARSET_ID - MIN_ENCODABLE_CHARSET_ID + 1)
+
+/************************************************************************/
+/*                Old-Mule-internal character manipulation              */
+/************************************************************************/
+
+/* The bit fields of character are divided into 3 parts:
+   FIELD1(7bits):FIELD2(7bits):FIELD3(7bits) */
+
+/* Macros to access each field of a character code of C.  */
+
+#define ichar_field1(c) (((c) >> 14) & 0x7F)
+#define ichar_field2(c) (((c) >> 7) & 0x7F)
+#define ichar_field3(c)  ((c) & 0x7F)
+
+/* Field 1, if non-zero, usually holds a charset ID for a dimension-2
+   charset.  Field 2, if non-zero, usually holds a charset ID for a
+   dimension-1 charset. */
+
+/* Converting between field values and charset ID's.  */
+
+#define FIELD2_TO_CHARSET_ID  0x80
+#define FIELD1_TO_OFFICIAL_CHARSET_ID  0x80
+#define FIELD1_TO_PRIVATE_CHARSET_ID  0xE0
+
+#define MIN_ICHAR_FIELD2_OFFICIAL 1
+#define MIN_ICHAR_FIELD1_OFFICIAL 1
+
+#define MAX_ICHAR_FIELD2_OFFICIAL 0x1F /* properly 0x1D, but be safe */
+#define MAX_ICHAR_FIELD1_OFFICIAL 0x1F /* properly 0x1D, but be safe */
+
+#define MIN_ICHAR_FIELD2_PRIVATE 0x20
+#define MIN_ICHAR_FIELD1_PRIVATE 0x20
+
+#define MAX_ICHAR_FIELD2_PRIVATE 0x7F
+#define MAX_ICHAR_FIELD1_PRIVATE 0x7F
+
+/* Min/max values of all types (official/private dim 1/2) of character.
+
+  Ordered in character space so that
+
+  official-1 < private-1 < official-2 < private-2 */
+
+#define MIN_CHAR_OFFICIAL_DIM1     (MIN_ICHAR_FIELD2_OFFICIAL <<  7)
+#define MIN_CHAR_PRIVATE_DIM1      (MIN_ICHAR_FIELD2_PRIVATE  <<  7)
+#define MIN_CHAR_OFFICIAL_DIM2     (MIN_ICHAR_FIELD1_OFFICIAL << 14)
+#define MIN_CHAR_PRIVATE_DIM2      (MIN_ICHAR_FIELD1_PRIVATE  << 14)
+
+#define MAX_CHAR_OFFICIAL_DIM1     ((MAX_ICHAR_FIELD2_OFFICIAL << 7) | 0x7F)
+#define MAX_CHAR_PRIVATE_DIM1      ((MAX_ICHAR_FIELD2_PRIVATE  << 7) | 0x7F)
+#define MAX_CHAR_OFFICIAL_DIM2     ((MAX_ICHAR_FIELD1_OFFICIAL << 14) | 0x3FFF)
+#define MAX_CHAR_PRIVATE_DIM2      ((MAX_ICHAR_FIELD1_PRIVATE  << 14) | 0x3FFF)
+
+
+#define byte_id_to_official_charset_id(x) (x)
+#define byte_id_to_private_charset_id(x, dim) ((dim) == 2 ? (x) + 0x60: (x))
+
+#define official_charset_id_to_byte_id(x) ((Ibyte) (x))
+#define private_charset_id_to_byte_id(x, dim) \
+  ((Ibyte) ((dim) == 2 ? (x) - 0x60 : (x)))
+
+/* Leading byte of a character.
+   */
+
+DECLARE_INLINE_HEADER (
+int
+old_mule_ichar_charset_id (Ichar c)
+)
+{
+  text_checking_assert (valid_ichar_p (c));
+  if (ichar_ascii_p (c))
+    return CHARSET_ID_ASCII;
+  else if (c < 0xA0)
+    return CHARSET_ID_CONTROL_1;
+  else if (c <= MAX_CHAR_PRIVATE_DIM1)
+    return ichar_field2 (c) + FIELD2_TO_CHARSET_ID;
+  else if (c <= MAX_CHAR_OFFICIAL_DIM2)
+    return ichar_field1 (c) + FIELD1_TO_OFFICIAL_CHARSET_ID;
+  else
+    {
+      text_checking_assert (c <= MAX_CHAR_PRIVATE_DIM2);
+      return ichar_field1 (c) + FIELD1_TO_PRIVATE_CHARSET_ID;
+    }
+}
+
+#endif /* not UNICODE_INTERNAL */
+
+/************************************************************************/
+/*                         Other char functions                         */
+/************************************************************************/
+
+/* Return the length of an Ichar in internal string format, in bytes */
+
+DECLARE_INLINE_HEADER (
+Bytecount
+ichar_len (Ichar c)
+)
+{
+  text_checking_assert (valid_ichar_p (c));
+  if (ichar_ascii_p (c))
+    return 1;
+#ifdef UNICODE_INTERNAL
+  else if (c <= 0x7FF)
+    return 2;
+  else if (c <= 0xFFFF)
+    return 3;
+  else if (c <= 0x1FFFFF)
+    return 4;
+  else if (c <= 0x3FFFFFF)
+    return 5;
+  else
+    return 6;
+#else /* not UNICODE_INTERNAL */
+  else if (c <= MAX_CHAR_OFFICIAL_DIM1)
+    return 2;
+  else if (c <= MAX_CHAR_OFFICIAL_DIM2)
+    return 3; /* dimension-2 official or dimension-1 private */
+  else
+    {
+      text_checking_assert (c <= MAX_CHAR_PRIVATE_DIM2);
+      return 4;
+    }
+#endif /* UNICODE_INTERNAL */
+}
+
+int unicode_char_columns (int code);
+int old_mule_ichar_columns (Ichar c);
+
+/* Number of columns of C, in a TTY representation */
+DECLARE_INLINE_HEADER (
+int
+ichar_columns (Ichar c)
+)
+{
+  text_checking_assert (valid_ichar_p (c));
+#ifdef UNICODE_INTERNAL
+  return unicode_char_columns ((int) c);
+#else
+  return old_mule_ichar_columns (c);
+#endif
+}
+
+/* C should be a binary character in the range 0 - 255; convert
+   to internal format and add to Dynarr DST. */
+
+/* We place this here because of its dependencies on the particular format
+   of UTF-8.  We'd like to gather all such dependencies in this file, as
+   much as possible.
+
+   Note: These could be rewritten in a more general fashion using the
+   functions in charset.h, but it's faster, and not much more code, to
+   just do it manually. */
+
+#ifdef UNICODE_INTERNAL
+
+DECLARE_INLINE_HEADER (
+void
+DECODE_ADD_BINARY_CHAR (Ibyte c, unsigned_char_dynarr *dst)
+)
+{
+  if (byte_ascii_p (c))
+    Dynarr_add (dst, c);
+  else if (c <= 0xBF)
+    {
+      Dynarr_add (dst, 0xC2);
+      Dynarr_add (dst, c);
+    }
+  else
+    {
+      Dynarr_add (dst, 0xC3);
+      Dynarr_add (dst, c - 0x40);
+    }
+}
+
+#else
+
+DECLARE_INLINE_HEADER (
+void
+DECODE_ADD_BINARY_CHAR (Ibyte c, unsigned_char_dynarr *dst)
+)
+{
+  if (byte_ascii_p (c))
+    Dynarr_add (dst, c);
+  else if (byte_c1_p (c))
+    {
+      Dynarr_add (dst, CHARSET_ID_CONTROL_1);
+      Dynarr_add (dst, c + 0x20);
+    }
+  else
+    {
+      Dynarr_add (dst, CHARSET_ID_LATIN_ISO8859_1);
+      Dynarr_add (dst, c);
+    }
+}
+
+#endif /* UNICODE_INTERNAL */
+
+/************************************************************************/
+/*                         Unicode conversion                           */
+/************************************************************************/
+
+int old_mule_ichar_to_unicode (Ichar chr, enum converr fail);
+Ichar old_mule_unicode_to_ichar (int code, Lisp_Object_dynarr *charsets,
+				 enum converr fail);
+Ichar old_mule_handle_bad_ichar (enum converr fail);
+
+/* Convert an Ichar to a Unicode codepoint.
+   Return value will be -1 if cannot convert. */
+DECLARE_INLINE_HEADER (
+int
+ichar_to_unicode (Ichar chr, enum converr fail)
+)
+{
+  text_checking_assert (valid_ichar_p (chr));
+#ifdef UNICODE_INTERNAL
+  return (int) chr;
+#else
+  return old_mule_ichar_to_unicode (chr, fail);
+#endif /* UNICODE_INTERNAL */
+}
+
+/* Convert a Unicode codepoint to an Ichar.  Return value will be (Ichar) -1
+   if no conversion can be found. */
+
+DECLARE_INLINE_HEADER (
+Ichar
+unicode_to_ichar (int code, Lisp_Object_dynarr *
+		  USED_IF_MULE_NOT_UNICODE_INTERNAL (charsets),
+		  enum converr fail)
+)
+{
+  text_checking_assert (valid_unicode_codepoint_p (code));
+#ifdef UNICODE_INTERNAL
+  return (Ichar) code;
+#else
+  return old_mule_unicode_to_ichar (code, charsets, fail);
+#endif /* UNICODE_INTERNAL */
+}

+#endif /* MULE */
+
+/****************************************************************************/
+/*--------------------------------------------------------------------------*/
+/*                                                                          */
+/*              Everything above here knows about the specifics of          */
+/*              the internal character and text formats.  Nothing           */
+/*              below or anywhere else knows, except text.h.                */
+/*                                                                          */
+/*--------------------------------------------------------------------------*/
+/****************************************************************************/
+
+/* ---------------------------------------------------------------------- */
+/*                     Working with non-default formats                   */
+/* ---------------------------------------------------------------------- */
+
 /* For more discussion, see text.c, "handling non-default formats" */

 typedef enum internal_format
＠＠ -200,6 +687,27 ＠＠
 /* Convert the other way. */
 #define raw_32_bit_fixed_to_ichar(ch, object) ((Ichar) (ch))

+/* Return the length of an Ichar in the specified string format, in bytes */
+
+DECLARE_INLINE_HEADER (
+Bytecount
+ichar_len_fmt (Ichar c, Internal_Format fmt)
+)
+{
+  switch (fmt)
+    {
+    case FORMAT_DEFAULT:
+      return ichar_len (c);
+    case FORMAT_16_BIT_FIXED:
+      return 2;
+    case FORMAT_32_BIT_FIXED:
+      return 4;
+    default:
+      text_checking_assert (fmt == FORMAT_8_BIT_FIXED);
+      return 1;
+    }
+}
+
 /* Return the "raw value" of a character as stored in the buffer.  In the
    default format, this is just the same as the character.  In fixed-width
    formats, this is the actual value in the buffer, which will be limited
＠＠ -214,6 +722,7 ＠＠
 	      Lisp_Object UNUSED (object))
 )
 {
+  text_checking_assert (valid_ichar_p (ch));
   switch (fmt)
     {
     case FORMAT_DEFAULT:
＠＠ -239,6 +748,8 ＠＠
 		      Lisp_Object UNUSED (object))
 )
 {
+  text_checking_assert (valid_ichar_p (ch));
+
   switch (fmt)
     {
     case FORMAT_DEFAULT:
＠＠ -269,6 +780,7 ＠＠

 #else

+#define ichar_len_fmt(ch, fmt) 1
 #define ichar_to_raw(ch, fmt, object) ((Raw_Ichar) (ch))
 #define ichar_fits_in_format(ch, fmt, object) 1
 #define objects_have_same_internal_representation(srcobj, dstobj) 1
＠＠ -287,6 +799,45 ＠＠
     return strlen ((char *) ptr);
 }

+/* ---------------------------------------------------------------------- */
+/*                     The Lisp_Object character type                     */
+/* ---------------------------------------------------------------------- */
+
+DECLARE_INLINE_HEADER (
+Lisp_Object
+make_char (Ichar val)
+)
+{
+  type_checking_assert (valid_ichar_p (val));
+  return make_char_1 (val);
+}
+
+#define CHAR_INTP(x) (INTP (x) && valid_ichar_p (XINT (x)))
+
+#define CHAR_OR_CHAR_INTP(x) (CHARP (x) || CHAR_INTP (x))
+
+DECLARE_INLINE_HEADER (
+Ichar
+XCHAR_OR_CHAR_INT (Lisp_Object obj)
+)
+{
+  return CHARP (obj) ? XCHAR (obj) : XINT (obj);
+}
+
+/* Signal an error if CH is not a valid character or integer Lisp_Object.
+   If CH is an integer Lisp_Object, convert it to a character Lisp_Object,
+   but merely by repackaging, without performing tests for char validity.
+   */
+
+#define CHECK_CHAR_COERCE_INT(x) do {		\
+  if (CHARP (x))				\
+     ;						\
+  else if (CHAR_INTP (x))			\
+    x = make_char (XINT (x));			\
+  else						\
+    x = wrong_type_argument (Qcharacterp, x);	\
+} while (0)
+

 /************************************************************************/
 /*									*/
＠＠ -365,7 +916,7 ＠＠
 */

 /* ---------------------------------------------------------------------- */
-/*      Working with itext's (pointers to internally-formatted text)    */
+/*       Working with itext's (pointers to internally-formatted text)     */
 /* ---------------------------------------------------------------------- */

 /* Given an itext, does it point to the beginning of a character?
＠＠ -378,10 +929,28 ＠＠
 #endif

 /* If error-checking is enabled, assert that the given itext points to
-   the beginning of a character.  Otherwise, do nothing.
-   */
+   the beginning of a character and make sure the rest of the character
+   is valid.  Otherwise, do nothing. */
+
+#ifdef ERROR_CHECK_TEXT
+
+DECLARE_INLINE_HEADER (
+void
+assert_valid_ibyteptr (const Ibyte *ptr)
+)
+{
+  Bytecount len;
+  assert (ibyte_first_byte_p (*ptr));
+  len = rep_bytes_by_first_byte (*ptr);
+  /* For total char of length 3, we check only the 2
+     bytes following the first char, hence the predec. */
+  while (--len > 0)
+    assert (!ibyte_first_byte_p (*++ptr));
+}

-#define assert_valid_ibyteptr(ptr) text_checking_assert (valid_ibyteptr_p (ptr))
+#else
+#define assert_valid_ibyteptr(ptr) disabled_assert (ptr)
+#endif /* ERROR_CHECK_TEXT */

 /* Given a itext (assumed to point at the beginning of a character),
    modify that pointer so it points to the beginning of the next character.
＠＠ -480,7 +1049,7 ＠＠
 /* Note that this reads the byte at *PTR! */

 #define VALIDATE_IBYTEPTR_BACKWARD(ptr) do {	\
-  while (!valid_ibyteptr_p (ptr)) ptr--;		\
+  while (!valid_ibyteptr_p (ptr)) ptr--;	\
 } while (0)

 /* Make sure that PTR is pointing to the beginning of a character.  If not,
＠＠ -503,8 +1072,8 ＠＠
 } while (0)

 #else /* not MULE */
-#define VALIDATE_IBYTEPTR_BACKWARD(ptr)
-#define VALIDATE_IBYTEPTR_FORWARD(ptr)
+#define VALIDATE_IBYTEPTR_BACKWARD(ptr) DO_NOTHING
+#define VALIDATE_IBYTEPTR_FORWARD(ptr) DO_NOTHING
 #endif /* not MULE */

 #ifdef MULE
＠＠ -755,7 +1324,7 ＠＠
 /* Return the length of the first character at PTR.  Equivalent to
    charcount_to_bytecount (ptr, 1).

-   [Since charcount_to_bytecount() is Written as inline, a smart compiler
+   [Since charcount_to_bytecount() is written as inline, a smart compiler
    should really optimize charcount_to_bytecount (ptr, 1) to the same as
    the following, with no error checking.  But since this idiom occurs so
    often, we'll be helpful and define a special macro for it.]
＠＠ -812,20 +1381,60 ＠＠
 } while (0)

 /* -------------------------------------------------------------------- */
-/*      Retrieving or changing the character pointed to by a itext    */
+/*        Retrieving or changing the character pointed to by itext      */
 /* -------------------------------------------------------------------- */

+#ifdef ERROR_CHECK_TEXT
+DECLARE_INLINE_HEADER (
+Ichar
+simple_itext_ichar (const Ibyte *ptr)
+)
+{
+  Ichar retval = ((Ichar) (ptr)[0]);
+  assert_valid_ibyteptr (ptr);
+  assert (byte_ascii_p (retval));
+  return retval;
+}
+
+DECLARE_INLINE_HEADER (
+Bytecount
+simple_set_itext_ichar (Ibyte *ptr, Ichar x)
+)
+{
+  assert (valid_ichar_p (x));
+  assert (byte_ascii_p (x));
+  (ptr)[0] = (Ibyte) (x);
+  return 1;
+}
+#else
 #define simple_itext_ichar(ptr)		((Ichar) (ptr)[0])
 #define simple_set_itext_ichar(ptr, x) \
-	((ptr)[0] = (Ibyte) (x), (Bytecount) 1)
+        ((ptr)[0] = (Ibyte) (x), (Bytecount) 1)
+#endif /* ERROR_CHECK_TEXT */
+
 #define simple_itext_copy_ichar(src, dst) \
 	((dst)[0] = *(src), (Bytecount) 1)

 #ifdef MULE

+/* Copy the character pointed to by SRC into DST.  Do not call this
+   directly.  Use the macro itext_copy_ichar() instead.
+   Return the number of bytes copied.  */
+
+DECLARE_INLINE_HEADER (
+Bytecount
+non_ascii_itext_copy_ichar (const Ibyte *src, Ibyte *dst)
+)
+{
+  Bytecount bytes = rep_bytes_by_first_byte (*src);
+  assert_valid_ibyteptr (src);
+  text_checking_assert (bytes > 1); /* ASCII should have been filtered out */
+  memcpy (dst, src, bytes);
+  return bytes;
+}
+
 MODULE_API Ichar non_ascii_itext_ichar (const Ibyte *ptr);
 MODULE_API Bytecount non_ascii_set_itext_ichar (Ibyte *ptr, Ichar c);
-MODULE_API Bytecount non_ascii_itext_copy_ichar (const Ibyte *src, Ibyte *dst);

 /* Retrieve the character pointed to by PTR as an Ichar. */

＠＠ -987,6 +1596,16 ＠＠
     non_ascii_itext_copy_ichar (src, dst);
 }

+DECLARE_INLINE_HEADER (
+void
+Dynarr_add_ichar (unsigned_char_dynarr *dyn, Ichar ich)
+)
+{
+  Ibyte work[MAX_ICHAR_LEN];
+  Bytecount len = set_itext_ichar (work, ich);
+  Dynarr_add_many (dyn, work, len);
+}
+
 #else /* not MULE */

 # define itext_ichar(ptr) simple_itext_ichar (ptr)
＠＠ -996,6 +1615,7 ＠＠
 # define set_itext_ichar(ptr, x) simple_set_itext_ichar (ptr, x)
 # define set_itext_ichar_fmt(ptr, x, fmt, obj) set_itext_ichar (ptr, x)
 # define itext_copy_ichar(src, dst) simple_itext_copy_ichar (src, dst)
+# define Dynarr_add_ichar(dyn, ich) Dynarr_add (dyn, (unsigned char) (ich))

 #endif /* not MULE */

＠＠ -1006,73 +1626,6 ＠＠
 #define itext_ichar_n(ptr, offset) \
   itext_ichar (itext_n_addr (ptr, offset))

-
-/* ---------------------------- */
-/*      Working with Ichars     */
-/* ---------------------------- */
-
-/* NOTE: There are other functions/macros for working with Ichars in
-   charset.h, for retrieving the charset of an Ichar, the length of an
-   Ichar when converted to text, etc.
-*/
-
-#ifdef MULE
-
-MODULE_API int non_ascii_valid_ichar_p (Ichar ch);
-
-/* Return whether the given Ichar is valid.
- */
-
-DECLARE_INLINE_HEADER (
-int
-valid_ichar_p (Ichar ch)
-)
-{
-  return (! (ch & ~0xFF)) || non_ascii_valid_ichar_p (ch);
-}
-
-#else /* not MULE */
-
-#define valid_ichar_p(ch) (! (ch & ~0xFF))
-
-#endif /* not MULE */
-
-DECLARE_INLINE_HEADER (
-Lisp_Object
-make_char (Ichar val)
-)
-{
-  type_checking_assert (valid_ichar_p (val));
-  return make_char_1 (val);
-}
-
-#define CHAR_INTP(x) (INTP (x) && valid_ichar_p (XINT (x)))
-
-#define CHAR_OR_CHAR_INTP(x) (CHARP (x) || CHAR_INTP (x))
-
-DECLARE_INLINE_HEADER (
-Ichar
-XCHAR_OR_CHAR_INT (Lisp_Object obj)
-)
-{
-  return CHARP (obj) ? XCHAR (obj) : XINT (obj);
-}
-
-/* Signal an error if CH is not a valid character or integer Lisp_Object.
-   If CH is an integer Lisp_Object, convert it to a character Lisp_Object,
-   but merely by repackaging, without performing tests for char validity.
-   */
-
-#define CHECK_CHAR_COERCE_INT(x) do {		\
-  if (CHARP (x))				\
-     ;						\
-  else if (CHAR_INTP (x))			\
-    x = make_char (XINT (x));			\
-  else						\
-    x = wrong_type_argument (Qcharacterp, x);	\
-} while (0)
-
-

 /************************************************************************/
 /*									*/
＠＠ -1092,7 +1645,7 ＠＠

 #define ASSERT_VALID_BYTE_STRING_INDEX_UNSAFE(s, x) do {		\
   text_checking_assert ((x) >= 0 && x <= XSTRING_LENGTH (s));		\
-  text_checking_assert (valid_ibyteptr_p (string_byte_addr (s, x)));	\
+  assert_valid_ibyteptr (string_byte_addr (s, x));			\
 } while (0)

 /* Convert offset I in string S to a pointer to text there. */
Index: src/toolbar-common.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/toolbar-common.c,v
retrieving revision 1.11
diff -u -r1.11 toolbar-common.c
--- src/toolbar-common.c	2005/01/24 23:34:12	1.11
+++ src/toolbar-common.c	2005/11/22 14:01:09
＠＠ -1,7 +1,7 ＠＠
 /* toolbar implementation -- "Generic" (X or GTK) redisplay interface.
    Copyright (C) 1995 Board of Trustees, University of Illinois.
    Copyright (C) 1995 Sun Microsystems, Inc.
-   Copyright (C) 1995, 1996, 2002 Ben Wing.
+   Copyright (C) 1995, 1996, 2002, 2005 Ben Wing.
    Copyright (C) 1996 Chuck Thompson.

 This file is part of XEmacs.
＠＠ -272,7 +272,6 ＠＠
 	    WINDOW_FACE_CACHEL (w, DEFAULT_INDEX);
 	  struct display_line dl;
 	  Lisp_Object string = IMAGE_INSTANCE_TEXT_STRING (p);
-	  unsigned char charsets[NUM_LEADING_BYTES];
 	  Ichar_dynarr *buf;
 	  struct font_metric_info fm;

＠＠ -288,10 +287,9 ＠＠
 	  buf = Dynarr_new (Ichar);
 	  convert_ibyte_string_into_ichar_dynarr
 	    (XSTRING_DATA (string), XSTRING_LENGTH (string), buf);
-	  find_charsets_in_ichar_string (charsets, Dynarr_atp (buf, 0),
-					  Dynarr_length (buf));
-	  ensure_face_cachel_complete (cachel, window, charsets);
-	  face_cachel_charset_font_metric_info (cachel, charsets, &fm);
+	  face_cachel_char_font_metric_info (cachel, window,
+					     Dynarr_atp (buf, 0),
+					     Dynarr_length (buf), &fm);

 	  dl.ascent = fm.ascent;
 	  dl.descent = fm.descent;
Index: src/tooltalk.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/tooltalk.c,v
retrieving revision 1.36
diff -u -r1.36 tooltalk.c
--- src/tooltalk.c	2005/10/25 11:16:28	1.36
+++ src/tooltalk.c	2005/11/22 14:01:09
＠＠ -172,10 +172,9 ＠＠
   Lisp_Tooltalk_Message *p = XTOOLTALK_MESSAGE (obj);

   if (print_readably)
-    printing_unreadable_object ("#<tooltalk_message 0x%x>",
-				p->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

-  write_fmt_string (printcharfun, "#<tooltalk_message id:0x%lx 0x%x>",
+  write_fmt_string (printcharfun, "#<tooltalk-message id:0x%lx 0x%x>",
 		    (long) (p->m), p->header.uid);
 }

＠＠ -250,10 +249,9 ＠＠
   Lisp_Tooltalk_Pattern *p = XTOOLTALK_PATTERN (obj);

   if (print_readably)
-    printing_unreadable_object ("#<tooltalk_pattern 0x%x>",
-				p->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

-  write_fmt_string (printcharfun, "#<tooltalk_pattern id:0x%lx 0x%x>",
+  write_fmt_string (printcharfun, "#<tooltalk-pattern id:0x%lx 0x%x>",
 		    (long) (p->p), p->header.uid);
 }

Index: src/ui-gtk.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/ui-gtk.c,v
retrieving revision 1.24
diff -u -r1.24 ui-gtk.c
--- src/ui-gtk.c	2005/10/24 10:07:41	1.24
+++ src/ui-gtk.c	2005/11/22 14:01:09
＠＠ -310,7 +310,7 ＠＠
 		    int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<ffi %p>", XFFI
(obj)->function_ptr);
+    printing_unreadable_lcrecord (obj, 0);

   write_fmt_string_lisp (printcharfun, "#<ffi %S", 1, XFFI
(obj)->function_name);
   if (XFFI (obj)->n_args)
＠＠ -780,7 +780,7 ＠＠
 			  int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<GtkObject %p>", XGTK_OBJECT
(obj)->object);
+    printing_unreadable_lcrecord (obj, 0);

   write_c_string (printcharfun, "#<GtkObject (");
   if (XGTK_OBJECT (obj)->alive_p)
＠＠ -1099,7 +1099,7 ＠＠
 			 int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<GtkBoxed %p>", XGTK_BOXED
(obj)->object);
+    printing_unreadable_lcrecord (obj, 0);

   write_c_string (printcharfun, "#<GtkBoxed (");
   write_c_string (printcharfun, gtk_type_name (XGTK_BOXED (obj)->object_type));
Index: src/unicode.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/unicode.c,v
retrieving revision 1.31
diff -u -r1.31 unicode.c
--- src/unicode.c	2005/10/25 11:16:29	1.31
+++ src/unicode.c	2005/11/22 14:01:10
＠＠ -38,6 +38,7 ＠＠
 #include "lisp.h"

 #include "charset.h"
+#include "elhash.h"
 #include "file-coding.h"
 #include "opaque.h"

＠＠ -52,24 +53,25 ＠＠

    We currently use the following format for tables:

-   If dimension == 1, to_unicode_table is a 96-element array of ints
-   (Unicode code points); else, it's a 96-element array of int * pointers,
-   each of which points to a 96-element array of ints.  If no elements in a
-   row have been filled in, the pointer will point to a default empty
-   table; that way, memory usage is more reasonable but lookup still fast.
+   If dimension == 1, to_unicode_table is a CHARSET_MAX_SIZE-element array
+   of ints (Unicode code points); else, it's a CHARSET_MAX_SIZE-element
+   array of int * pointers, each of which points to a
+   CHARSET_MAX_SIZE-element array of ints.  If no elements in a row have
+   been filled in, the pointer will point to a default empty table; that
+   way, memory usage is more reasonable but lookup still fast.

    -- If from_unicode_levels == 1, from_unicode_table is a 256-element
-   array of shorts (octet 1 in high byte, octet 2 in low byte; we don't
+   array of UINT_16_BITs (octet 1 in high byte, octet 2 in low byte; we don't
    store Ichars directly to save space).

    -- If from_unicode_levels == 2, from_unicode_table is a 256-element
-   array of short * pointers, each of which points to a 256-element array
-   of shorts.
+   array of UINT_16_BIT * pointers, each of which points to a 256-element array
+   of UINT_16_BITs.

    -- If from_unicode_levels == 3, from_unicode_table is a 256-element
-   array of short ** pointers, each of which points to a 256-element array
-   of short * pointers, each of which points to a 256-element array of
-   shorts.
+   array of UINT_16_BIT ** pointers, each of which points to a 256-element
+   array of UINT_16_BIT * pointers, each of which points to a 256-element
+   array of UINT_16_BITs.

    -- If from_unicode_levels == 4, same thing but one level deeper.

＠＠ -200,20 +202,41 ＠＠

 Lisp_Object Qutf_8_bom;

+extern int firstbyte_mask[];
+
 #ifdef MULE 
+
+/* There is no single badval that will work for all cases with the from tables,
+   because we allow arbitrary 256x256 charsets. #### This is a real problem;
+   need a better fix.  One possibility is to compute a bad value that is
+   outside the range of a particular charset, and have separate blank tables
+   for each charset.  This still chokes on 256x256, but not anywhere else.
+   The value of 0x0001 will not be valid in any dimension-1 charset (they
+   always are of the form 0xXX00), not valid in a ku-ten style charset, and
+   not valid in any ISO-2022-like charset, or Shift-JIS, Big5, JOHAB, etc.;
+   or any related charset, all of which try to avoid using the control
+   character ranges.  Of course it *is* valid in Unicode, if someone tried
+   to create a national unicode charset; but if we chose a value that is
+   invalid in Unicode, it's likely to be valid for many other charsets; no
+   win. */
+#define BADVAL_FROM_TABLE ((UINT_16_BIT) 1)
+/* For the to tables we are safe, because -1 is never a valid Unicode
+   codepoint. */
+#define BADVAL_TO_TABLE (-1)

-/* #### Using ints for to_unicode is OK (as long as they are >= 32 bits).
-   However, shouldn't the shorts below be unsigned?
+/* We use int for to_unicode; Unicode codepoints always fit into a signed
+   32-bit value.

-   Answer: Doesn't matter because the values being converted to are only
-   96x96. */
+   We use UINT_16_BIT to store a charset codepoint, up to 256x256,
+   unsigned to avoid problems.
+*/
 static int *to_unicode_blank_1;
 static int **to_unicode_blank_2;

-static short *from_unicode_blank_1;
-static short **from_unicode_blank_2;
-static short ***from_unicode_blank_3;
-static short ****from_unicode_blank_4;
+/* The lowest table is always null.  We do it this way so that the table index
+   corresponds to the number of levels of the table, i.e. how many indices
+   before you get an actual value rather than a pointer. */
+static void *from_unicode_blank[5];

 static const struct memory_description to_unicode_level_0_desc_1[] = {
   { XD_END }
＠＠ -224,7 +247,7 ＠＠
 };

 static const struct memory_description to_unicode_level_1_desc_1[] = {
-  { XD_BLOCK_PTR, 0, 96, { &to_unicode_level_0_desc } },
+  { XD_BLOCK_PTR, 0, CHARSET_MAX_SIZE, { &to_unicode_level_0_desc } },
   { XD_END }
 };

＠＠ -233,8 +256,8 ＠＠
 };

 static const struct memory_description to_unicode_description_1[] = {
-  { XD_BLOCK_PTR, 1, 96, { &to_unicode_level_0_desc } },
-  { XD_BLOCK_PTR, 2, 96, { &to_unicode_level_1_desc } },
+  { XD_BLOCK_PTR, 1, CHARSET_MAX_SIZE, { &to_unicode_level_0_desc } },
+  { XD_BLOCK_PTR, 2, CHARSET_MAX_SIZE, { &to_unicode_level_1_desc } },
   { XD_END }
 };

＠＠ -246,7 +269,7 ＠＠

 /* Used only for to_unicode_blank_2 */
 static const struct memory_description to_unicode_level_2_desc_1[] = {
-  { XD_BLOCK_PTR, 0, 96, { &to_unicode_level_1_desc } },
+  { XD_BLOCK_PTR, 0, CHARSET_MAX_SIZE, { &to_unicode_level_1_desc } },
   { XD_END }
 };

＠＠ -255,7 +278,7 ＠＠
 };

 static const struct sized_memory_description from_unicode_level_0_desc = {
-   sizeof (short), from_unicode_level_0_desc_1
+   sizeof (UINT_16_BIT), from_unicode_level_0_desc_1
 };

 static const struct memory_description from_unicode_level_1_desc_1[] = {
＠＠ -299,7 +322,7 ＠＠
   sizeof (void *), from_unicode_description_1
 };

-/* Used only for from_unicode_blank_4 */
+/* Used only for from_unicode_blank[4] */
 static const struct memory_description from_unicode_level_4_desc_1[] = {
   { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_3_desc } },
   { XD_END }
＠＠ -307,27 +330,29 ＠＠

 static Lisp_Object_dynarr *unicode_precedence_dynarr;

-static const struct memory_description lod_description_1[] = {
-  XD_DYNARR_DESC (Lisp_Object_dynarr, &lisp_object_description),
-  { XD_END }
-};
-
-static const struct sized_memory_description lisp_object_dynarr_description = {
-  sizeof (Lisp_Object_dynarr),
-  lod_description_1
-};
-
 Lisp_Object Vlanguage_unicode_precedence_list;
 Lisp_Object Vdefault_unicode_precedence_list;
+/* Used internally in the conversion of a precedence list into a dynarr */
+Lisp_Object Vprecedence_calculation_hash;

 Lisp_Object Qignore_first_column;

-
-/************************************************************************/
-/*                        Unicode implementation                        */
-/************************************************************************/
+/* Break up a 32-bit character code into 8-bit parts. */

-#define BREAKUP_UNICODE_CODE(val, u1, u2, u3, u4, levels)	\
+#ifdef MAXIMIZE_UNICODE_TABLE_DEPTH
+#define TO_TABLE_SIZE_FROM_CHARSET(charset) 2
+#define UNICODE_BREAKUP_CHAR_CODE(val, u1, u2, u3, u4, levels)	\
+do {								\
+  int buc_val = (val);						\
+								\
+  (u1) = buc_val >> 24;						\
+  (u2) = (buc_val >> 16) & 255;					\
+  (u3) = (buc_val >> 8) & 255;					\
+  (u4) = buc_val & 255;						\
+} while (0)
+#else /* not MAXIMIZE_UNICODE_TABLE_DEPTH */
+#define TO_TABLE_SIZE_FROM_CHARSET(charset) XCHARSET_DIMENSION (charset)
+#define UNICODE_BREAKUP_CHAR_CODE(val, u1, u2, u3, u4, levels)	\
 do {								\
   int buc_val = (val);						\
 								\
＠＠ -340,36 +365,56 ＠＠
 	      buc_val <= 0xFFFFFF ? 3 :				\
 	      4);						\
 } while (0)
+#endif /* (not) MAXIMIZE_UNICODE_TABLE_DEPTH */
+#endif /* MULE */
+
+
+/************************************************************************/
+/*                        Unicode implementation                        */
+/************************************************************************/

+/* Given a Lisp_Object that is supposed to represent a Unicode codepoint,
+   make sure it does, and return it. */
+
+int
+decode_unicode (Lisp_Object unicode)
+{
+  EMACS_INT val;
+  CHECK_INT (unicode);
+  val = XINT (unicode);
+  if (!valid_unicode_codepoint_p (val))
+    invalid_argument ("Invalid unicode codepoint (range 0 .. 2^31 - 1)",
+		      unicode);
+  return (int) val;
+}
+
+#ifdef MULE
+
 static void
 init_blank_unicode_tables (void)
 {
   int i;

-  from_unicode_blank_1 = xnew_array (short, 256);
-  from_unicode_blank_2 = xnew_array (short *, 256);
-  from_unicode_blank_3 = xnew_array (short **, 256);
-  from_unicode_blank_4 = xnew_array (short ***, 256);
+  from_unicode_blank[0] = NULL;
+  from_unicode_blank[1] = xnew_array (UINT_16_BIT, 256);
+  from_unicode_blank[2] = xnew_array (UINT_16_BIT *, 256);
+  from_unicode_blank[3] = xnew_array (UINT_16_BIT **, 256);
+  from_unicode_blank[4] = xnew_array (UINT_16_BIT ***, 256);
   for (i = 0; i < 256; i++)
     {
-      /* #### IMWTK: Why does using -1 here work? Simply because there are
-         no existing 96x96 charsets?
+      /* See comment above on BADVAL_FROM_TABLE */
+      ((UINT_16_BIT *) from_unicode_blank[1])[i] = BADVAL_FROM_TABLE;
+      ((void **) from_unicode_blank[2])[i] = from_unicode_blank[1];
+      ((void **) from_unicode_blank[3])[i] = from_unicode_blank[2];
+      ((void **) from_unicode_blank[4])[i] = from_unicode_blank[3];
+    }

-         Answer: I don't understand the concern.  -1 indicates there is no
-         entry for this particular codepoint, which is always the case for
-	 blank tables. */
-      from_unicode_blank_1[i] = (short) -1;
-      from_unicode_blank_2[i] = from_unicode_blank_1;
-      from_unicode_blank_3[i] = from_unicode_blank_2;
-      from_unicode_blank_4[i] = from_unicode_blank_3;
-    }
-
-  to_unicode_blank_1 = xnew_array (int, 96);
-  to_unicode_blank_2 = xnew_array (int *, 96);
-  for (i = 0; i < 96; i++)
+  to_unicode_blank_1 = xnew_array (int, CHARSET_MAX_SIZE);
+  to_unicode_blank_2 = xnew_array (int *, CHARSET_MAX_SIZE);
+  for (i = 0; i < CHARSET_MAX_SIZE; i++)
     {
-      /* Here -1 is guaranteed OK. */
-      to_unicode_blank_1[i] = -1;
+      /* Likewise for BADVAL_TO_TABLE */
+      to_unicode_blank_1[i] = BADVAL_TO_TABLE;
       to_unicode_blank_2[i] = to_unicode_blank_1;
     }
 }
＠＠ -377,38 +422,14 ＠＠
 static void *
 create_new_from_unicode_table (int level)
 {
-  switch (level)
-    {
-      /* WARNING: If you are thinking of compressing these, keep in
-	 mind that sizeof (short) does not equal sizeof (short *). */
-    case 1:
-      {
-	short *newtab = xnew_array (short, 256);
-	memcpy (newtab, from_unicode_blank_1, 256 * sizeof (short));
-	return newtab;
-      }
-    case 2:
-      {
-	short **newtab = xnew_array (short *, 256);
-	memcpy (newtab, from_unicode_blank_2, 256 * sizeof (short *));
-	return newtab;
-      }
-    case 3:
-      {
-	short ***newtab = xnew_array (short **, 256);
-	memcpy (newtab, from_unicode_blank_3, 256 * sizeof (short **));
-	return newtab;
-      }
-    case 4:
-      {
-	short ****newtab = xnew_array (short ***, 256);
-	memcpy (newtab, from_unicode_blank_4, 256 * sizeof (short ***));
-	return newtab;
-      }
-    default:
-      ABORT ();
-      return 0;
-    }
+  /* WARNING: sizeof (UINT_16_BIT) != sizeof (UINT_16_BIT *). */
+  Bytecount size = level == 1 ? sizeof (UINT_16_BIT) : sizeof (void *);
+  void *newtab;
+
+  text_checking_assert (level >= 1 && level <= 4);
+  newtab = xmalloc (256 * size);
+  memcpy (newtab, from_unicode_blank[level], 256 * size);
+  return newtab;
 }

 /* Allocate and blank the tables.
＠＠ -416,63 +437,41 ＠＠
 void
 init_charset_unicode_tables (Lisp_Object charset)
 {
-  if (XCHARSET_DIMENSION (charset) == 1)
+  if (TO_TABLE_SIZE_FROM_CHARSET (charset) == 1)
     {
-      int *to_table = xnew_array (int, 96);
-      memcpy (to_table, to_unicode_blank_1, 96 * sizeof (int));
+      int *to_table = xnew_array (int, CHARSET_MAX_SIZE);
+      memcpy (to_table, to_unicode_blank_1, CHARSET_MAX_SIZE * sizeof (int));
       XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
     }
   else
     {
-      int **to_table = xnew_array (int *, 96);
-      memcpy (to_table, to_unicode_blank_2, 96 * sizeof (int *));
+      int **to_table = xnew_array (int *, CHARSET_MAX_SIZE);
+      memcpy (to_table, to_unicode_blank_2, CHARSET_MAX_SIZE * sizeof (int *));
       XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
     }

-  {
-    XCHARSET_FROM_UNICODE_TABLE (charset) =
-      create_new_from_unicode_table (1);
-    XCHARSET_FROM_UNICODE_LEVELS (charset) = 1;
-  }
+#ifdef MAXIMIZE_UNICODE_TABLE_DEPTH
+  XCHARSET_FROM_UNICODE_TABLE (charset) = create_new_from_unicode_table (4);
+  XCHARSET_FROM_UNICODE_LEVELS (charset) = 4;
+#else
+  XCHARSET_FROM_UNICODE_TABLE (charset) = create_new_from_unicode_table (1);
+  XCHARSET_FROM_UNICODE_LEVELS (charset) = 1;
+#endif /* MAXIMIZE_UNICODE_TABLE_DEPTH */
 }

 static void
 free_from_unicode_table (void *table, int level)
 {
-  int i;
-
-  switch (level)
+  if (level >= 2)
     {
-    case 2:
-      {
-	short **tab = (short **) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_1)
-	      free_from_unicode_table (tab[i], 1);
-	  }
-	break;
-      }
-    case 3:
-      {
-	short ***tab = (short ***) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_2)
-	      free_from_unicode_table (tab[i], 2);
-	  }
-	break;
-      }
-    case 4:
-      {
-	short ****tab = (short ****) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_3)
-	      free_from_unicode_table (tab[i], 3);
-	  }
-	break;
-      }
+      void **tab = (void **) table;
+      int i;
+
+      for (i = 0; i < 256; i++)
+	{
+	  if (tab[i] != from_unicode_blank[level - 1])
+	    free_from_unicode_table (tab[i], level - 1);
+	}
     }

   xfree (table, void *);
＠＠ -486,7 +485,7 ＠＠
       int i;
       int **tab = (int **) table;

-      for (i = 0; i < 96; i++)
+      for (i = 0; i < CHARSET_MAX_SIZE; i++)
 	{
 	  if (tab[i] != to_unicode_blank_1)
 	    free_to_unicode_table (tab[i], 1);
＠＠ -500,7 +499,7 ＠＠
 free_charset_unicode_tables (Lisp_Object charset)
 {
   free_to_unicode_table (XCHARSET_TO_UNICODE_TABLE (charset),
-			 XCHARSET_DIMENSION (charset));
+			 TO_TABLE_SIZE_FROM_CHARSET (charset));
   free_from_unicode_table (XCHARSET_FROM_UNICODE_TABLE (charset),
 			   XCHARSET_FROM_UNICODE_LEVELS (charset));
 }
＠＠ -511,45 +510,22 ＠＠
 compute_from_unicode_table_size_1 (void *table, int level,
 				   struct overhead_stats *stats)
 {
-  int i;
   Bytecount size = 0;

-  switch (level)
+  if (level >= 2)
     {
-    case 2:
-      {
-	short **tab = (short **) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_1)
-	      size += compute_from_unicode_table_size_1 (tab[i], 1, stats);
-	  }
-	break;
-      }
-    case 3:
-      {
-	short ***tab = (short ***) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_2)
-	      size += compute_from_unicode_table_size_1 (tab[i], 2, stats);
-	  }
-	break;
-      }
-    case 4:
-      {
-	short ****tab = (short ****) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_3)
-	      size += compute_from_unicode_table_size_1 (tab[i], 3, stats);
-	  }
-	break;
-      }
+      int i;
+      void **tab = (void **) table;
+      for (i = 0; i < 256; i++)
+	{
+	  if (tab[i] != from_unicode_blank[level - 1])
+	    size += compute_from_unicode_table_size_1 (tab[i], level - 1,
+						       stats);
+	}
     }

   size += malloced_storage_size (table,
-				 256 * (level == 1 ? sizeof (short) :
+				 256 * (level == 1 ? sizeof (UINT_16_BIT) :
 					sizeof (void *)),
 				 stats);
   return size;
＠＠ -566,7 +542,7 ＠＠
       int i;
       int **tab = (int **) table;

-      for (i = 0; i < 96; i++)
+      for (i = 0; i < CHARSET_MAX_SIZE; i++)
 	{
 	  if (tab[i] != to_unicode_blank_1)
 	    size += compute_to_unicode_table_size_1 (tab[i], 1, stats);
＠＠ -574,8 +550,9 ＠＠
     }

   size += malloced_storage_size (table,
-				 96 * (level == 1 ? sizeof (int) :
-				       sizeof (void *)),
+				 CHARSET_MAX_SIZE *
+				 (level == 1 ? sizeof (int) :
+				  sizeof (void *)),
 				 stats);
   return size;
 }
＠＠ -596,7 +573,7 ＠＠
 {
   return (compute_to_unicode_table_size_1
 	  (XCHARSET_TO_UNICODE_TABLE (charset),
-	   XCHARSET_DIMENSION (charset),
+	   TO_TABLE_SIZE_FROM_CHARSET (charset),
 	   stats));
 }

＠＠ -619,10 +596,10 ＠＠
 static void
 assert_not_any_blank_table (void *tab)
 {
-  assert (tab != from_unicode_blank_1);
-  assert (tab != from_unicode_blank_2);
-  assert (tab != from_unicode_blank_3);
-  assert (tab != from_unicode_blank_4);
+  assert (tab != from_unicode_blank[1]);
+  assert (tab != from_unicode_blank[2]);
+  assert (tab != from_unicode_blank[3]);
+  assert (tab != from_unicode_blank[4]);
   assert (tab != to_unicode_blank_1);
   assert (tab != to_unicode_blank_2);
   assert (tab);
＠＠ -638,65 +615,47 ＠＠
     {
     case 1:
       {
-	short *tab = (short *) table;
+	UINT_16_BIT *tab = (UINT_16_BIT *) table;
 	for (i = 0; i < 256; i++)
 	  {
-	    if (tab[i] != -1)
+	    if (tab[i] != BADVAL_FROM_TABLE)
 	      {
-		Lisp_Object char_charset;
 		int c1, c2;

-		assert (valid_ichar_p (tab[i]));
-		BREAKUP_ICHAR (tab[i], char_charset, c1, c2);
-		assert (EQ (charset, char_charset));
-		if (XCHARSET_DIMENSION (charset) == 1)
+		c1 = tab[i] >> 8;
+		c2 = tab[i] & 0xFF;
+		assert_codepoint_in_range (charset, c1, c2);
+		if (TO_TABLE_SIZE_FROM_CHARSET (charset) == 1)
 		  {
 		    int *to_table =
 		      (int *) XCHARSET_TO_UNICODE_TABLE (charset);
 		    assert_not_any_blank_table (to_table);
-		    assert (to_table[c1 - 32] == (codetop << 8) + i);
+		    assert (to_table[c2 - CHARSET_MIN_OFFSET] ==
+			    (codetop << 8) + i);
 		  }
 		else
 		  {
 		    int **to_table =
 		      (int **) XCHARSET_TO_UNICODE_TABLE (charset);
 		    assert_not_any_blank_table (to_table);
-		    assert_not_any_blank_table (to_table[c1 - 32]);
-		    assert (to_table[c1 - 32][c2 - 32] == (codetop << 8) + i);
+		    assert_not_any_blank_table
+		      (to_table[c1 - CHARSET_MIN_OFFSET]);
+		    assert (to_table[c1 - CHARSET_MIN_OFFSET]
+			    [c2 - CHARSET_MIN_OFFSET] == (codetop << 8) + i);
 		  }
 	      }
 	  }
 	break;
       }
     case 2:
-      {
-	short **tab = (short **) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_1)
-	      sledgehammer_check_from_table (charset, tab[i], 1,
-					     (codetop << 8) + i);
-	  }
-	break;
-      }
     case 3:
-      {
-	short ***tab = (short ***) table;
-	for (i = 0; i < 256; i++)
-	  {
-	    if (tab[i] != from_unicode_blank_2)
-	      sledgehammer_check_from_table (charset, tab[i], 2,
-					     (codetop << 8) + i);
-	  }
-	break;
-      }
     case 4:
       {
-	short ****tab = (short ****) table;
+	void **tab = (void **) table;
 	for (i = 0; i < 256; i++)
 	  {
-	    if (tab[i] != from_unicode_blank_3)
-	      sledgehammer_check_from_table (charset, tab[i], 3,
+	    if (tab[i] != from_unicode_blank[level - 1])
+	      sledgehammer_check_from_table (charset, tab[i], level - 1,
 					     (codetop << 8) + i);
 	  }
 	break;
＠＠ -711,6 +670,9 ＠＠
 			     int codetop)
 {
   int i;
+  int low1, high1, low2, high2;
+
+  get_charset_limits (charset, &low1, &high1, &low2, &high2);

   switch (level)
     {
＠＠ -718,56 +680,62 ＠＠
       {
 	int *tab = (int *) table;

-	if (XCHARSET_CHARS (charset) == 94)
+	if (TO_TABLE_SIZE_FROM_CHARSET (charset) == 2)
+	  /* This means we're traversing a nested table */
+	  low1 = low2, high1 = high2;
+	for (i = 0; i < CHARSET_MAX_SIZE; i++)
 	  {
-	    assert (tab[0] == -1);
-	    assert (tab[95] == -1);
-	  }
-
-	for (i = 0; i < 96; i++)
-	  {
-	    if (tab[i] != -1)
+	    /* Make sure no out-of-bounds characters were set */
+	    if (i + CHARSET_MIN_OFFSET < low1 ||
+		i + CHARSET_MIN_OFFSET > high1)
+	      assert (tab[i] == BADVAL_TO_TABLE);
+	    if (tab[i] != BADVAL_TO_TABLE)
 	      {
 		int u4, u3, u2, u1, levels;
-		Ichar ch;
-		Ichar this_ch;
-		short val;
+		UINT_16_BIT val;
 		void *frtab = XCHARSET_FROM_UNICODE_TABLE (charset);

-		if (XCHARSET_DIMENSION (charset) == 1)
-		  this_ch = make_ichar (charset, i + 32, 0);
-		else
-		  this_ch = make_ichar (charset, codetop + 32, i + 32);
-
 		assert (tab[i] >= 0);
-		BREAKUP_UNICODE_CODE (tab[i], u4, u3, u2, u1, levels);
+		UNICODE_BREAKUP_CHAR_CODE (tab[i], u4, u3, u2, u1, levels);
+#ifdef MAXIMIZE_UNICODE_TABLE_DEPTH
+		levels = 4;
+#endif /* MAXIMIZE_UNICODE_TABLE_DEPTH */
 		assert (levels <= XCHARSET_FROM_UNICODE_LEVELS (charset));

 		switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
 		  {
-		  case 1: val = ((short *) frtab)[u1]; break;
-		  case 2: val = ((short **) frtab)[u2][u1]; break;
-		  case 3: val = ((short ***) frtab)[u3][u2][u1]; break;
-		  case 4: val = ((short ****) frtab)[u4][u3][u2][u1]; break;
+		  case 1: val = ((UINT_16_BIT *) frtab)[u1]; break;
+		  case 2: val = ((UINT_16_BIT **) frtab)[u2][u1]; break;
+		  case 3: val = ((UINT_16_BIT ***) frtab)[u3][u2][u1]; break;
+		  case 4: val = ((UINT_16_BIT ****) frtab)[u4][u3][u2][u1];
+		    break;
 		  default: ABORT ();
 		  }

-		ch = make_ichar (charset, val >> 8, val & 0xFF);
-		assert (ch == this_ch);
+		if (TO_TABLE_SIZE_FROM_CHARSET (charset) == 1)
+		  {
+		    assert (i + CHARSET_MIN_OFFSET == (val >> 8));
+		    assert (0 == (val & 0xFF));
+		  }
+		else
+		  {
+		    assert (codetop + CHARSET_MIN_OFFSET == (val >> 8));
+		    assert (i + CHARSET_MIN_OFFSET == (val & 0xFF));
+		  }

 		switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
 		  {
 		  case 4:
 		    assert_not_any_blank_table (frtab);
-		    frtab = ((short ****) frtab)[u4];
+		    frtab = ((UINT_16_BIT ****) frtab)[u4];
 		    /* fall through */
 		  case 3:
 		    assert_not_any_blank_table (frtab);
-		    frtab = ((short ***) frtab)[u3];
+		    frtab = ((UINT_16_BIT ***) frtab)[u3];
 		    /* fall through */
 		  case 2:
 		    assert_not_any_blank_table (frtab);
-		    frtab = ((short **) frtab)[u2];
+		    frtab = ((UINT_16_BIT **) frtab)[u2];
 		    /* fall through */
 		  case 1:
 		    assert_not_any_blank_table (frtab);
＠＠ -781,15 +749,13 ＠＠
     case 2:
       {
 	int **tab = (int **) table;
-
-	if (XCHARSET_CHARS (charset) == 94)
-	  {
-	    assert (tab[0] == to_unicode_blank_1);
-	    assert (tab[95] == to_unicode_blank_1);
-	  }

-	for (i = 0; i < 96; i++)
+	for (i = 0; i < CHARSET_MAX_SIZE; i++)
 	  {
+	    /* Make sure no out-of-bounds characters were set */
+	    if (i + CHARSET_MIN_OFFSET < low1 ||
+		i + CHARSET_MIN_OFFSET > high1)
+	      assert (tab[i] == to_unicode_blank_1);
 	    if (tab[i] != to_unicode_blank_1)
 	      sledgehammer_check_to_table (charset, tab[i], 1, i);
 	  }
＠＠ -810,15 +776,16 ＠＠

   for (i = 0; i < 256; i++)
     {
-      assert (from_unicode_blank_1[i] == (short) -1);
-      assert (from_unicode_blank_2[i] == from_unicode_blank_1);
-      assert (from_unicode_blank_3[i] == from_unicode_blank_2);
-      assert (from_unicode_blank_4[i] == from_unicode_blank_3);
+      assert (((UINT_16_BIT *) from_unicode_blank[1])[i] ==
+	      BADVAL_FROM_TABLE);
+      assert (((void **) from_unicode_blank[2])[i] == from_unicode_blank[1]);
+      assert (((void **) from_unicode_blank[3])[i] == from_unicode_blank[2]);
+      assert (((void **) from_unicode_blank[4])[i] == from_unicode_blank[3]);
     }

-  for (i = 0; i < 96; i++)
+  for (i = 0; i < CHARSET_MAX_SIZE; i++)
     {
-      assert (to_unicode_blank_1[i] == -1);
+      assert (to_unicode_blank_1[i] == BADVAL_TO_TABLE);
       assert (to_unicode_blank_2[i] == to_unicode_blank_1);
     }

＠＠ -830,134 +797,114 ＠＠

   sledgehammer_check_to_table (charset,
 			       XCHARSET_TO_UNICODE_TABLE (charset),
-			       XCHARSET_DIMENSION (charset), 0);
+			       TO_TABLE_SIZE_FROM_CHARSET (charset), 0);
 }

 #endif /* SLEDGEHAMMER_CHECK_UNICODE */

 static void
-set_unicode_conversion (Ichar chr, int code)
+set_unicode_conversion (int code, Lisp_Object charset, int c1, int c2)
 {
-  Lisp_Object charset;
-  int c1, c2;
+  text_checking_assert (valid_charset_codepoint_p (charset, c1, c2));
+  text_checking_assert (valid_unicode_codepoint_p (code));

-  BREAKUP_ICHAR (chr, charset, c1, c2);
-
   /* I tried an assert on code > 255 || chr == code, but that fails because
      Mule gives many Latin characters separate code points for different
      ISO 8859 coded character sets.  Obvious in hindsight.... */
-  assert (!EQ (charset, Vcharset_ascii) || chr == code);
-  assert (!EQ (charset, Vcharset_latin_iso8859_1) || chr == code);
-  assert (!EQ (charset, Vcharset_control_1) || chr == code);
+  text_checking_assert (!EQ (charset, Vcharset_ascii) || code == c2);
+  text_checking_assert (!EQ (charset, Vcharset_control_1) || code == c2);
+  text_checking_assert (!EQ (charset, Vcharset_latin_iso8859_1) ||
+			code == c2);

   /* This assert is needed because it is simply unimplemented. */
-  assert (!EQ (charset, Vcharset_composite));
+  text_checking_assert (!EQ (charset, Vcharset_composite));

 #ifdef SLEDGEHAMMER_CHECK_UNICODE
   sledgehammer_check_unicode_tables (charset);
 #endif

-  if (EQ(charset, Vcharset_ascii) || EQ(charset, Vcharset_control_1))
-    return;
-
   /* First, the char -> unicode translation */

-  if (XCHARSET_DIMENSION (charset) == 1)
+  if (TO_TABLE_SIZE_FROM_CHARSET (charset) == 1)
     {
       int *to_table = (int *) XCHARSET_TO_UNICODE_TABLE (charset);
-      to_table[c1 - 32] = code;
+      to_table[c2 - CHARSET_MIN_OFFSET] = code;
     }
   else
     {
       int **to_table_2 = (int **) XCHARSET_TO_UNICODE_TABLE (charset);
       int *to_table_1;

-      assert (XCHARSET_DIMENSION (charset) == 2);
-      to_table_1 = to_table_2[c1 - 32];
+      to_table_1 = to_table_2[c1 - CHARSET_MIN_OFFSET];
       if (to_table_1 == to_unicode_blank_1)
 	{
-	  to_table_1 = xnew_array (int, 96);
-	  memcpy (to_table_1, to_unicode_blank_1, 96 * sizeof (int));
-	  to_table_2[c1 - 32] = to_table_1;
+	  to_table_1 = xnew_array (int, CHARSET_MAX_SIZE);
+	  memcpy (to_table_1, to_unicode_blank_1,
+		  CHARSET_MAX_SIZE * sizeof (int));
+	  to_table_2[c1 - CHARSET_MIN_OFFSET] = to_table_1;
 	}
-      to_table_1[c2 - 32] = code;
+      to_table_1[c2 - CHARSET_MIN_OFFSET] = code;
     }

   /* Then, unicode -> char: much harder */

   {
-    int charset_levels;
+    int levels;
     int u4, u3, u2, u1;
+#ifndef MAXIMIZE_UNICODE_TABLE_DEPTH
     int code_levels;
-    BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
+#endif /* not MAXIMIZE_UNICODE_TABLE_DEPTH */
+    UNICODE_BREAKUP_CHAR_CODE (code, u4, u3, u2, u1, code_levels);

-    charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
+    levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
+    text_checking_assert (levels >= 1 && levels <= 4);

+#ifndef MAXIMIZE_UNICODE_TABLE_DEPTH
+    text_checking_assert (code_levels <= 4);
     /* Make sure the charset's tables have at least as many levels as
        the code point has: Note that the charset is guaranteed to have
        at least one level, because it was created that way */
-    if (charset_levels < code_levels)
+    if (levels < code_levels)
       {
 	int i;

-	assert (charset_levels > 0);
 	for (i = 2; i <= code_levels; i++)
 	  {
-	    if (charset_levels < i)
+	    if (levels < i)
 	      {
 		void *old_table = XCHARSET_FROM_UNICODE_TABLE (charset);
 		void *table = create_new_from_unicode_table (i);
 		XCHARSET_FROM_UNICODE_TABLE (charset) = table;
-		  
-		switch (i)
-		  {
-		  case 2:
-		    ((short **) table)[0] = (short *) old_table;
-		    break;
-		  case 3:
-		    ((short ***) table)[0] = (short **) old_table;
-		    break;
-		  case 4:
-		    ((short ****) table)[0] = (short ***) old_table;
-		    break;
-		  default: ABORT ();
-		  }
+		((void **) table)[0] = old_table;
 	      }
 	  }

-	charset_levels = code_levels;
+	levels = code_levels;
 	XCHARSET_FROM_UNICODE_LEVELS (charset) = code_levels;
       }
+#endif /* not MAXIMIZE_UNICODE_TABLE_DEPTH */

     /* Now, make sure there is a non-default table at each level */
     {
       int i;
       void *table = XCHARSET_FROM_UNICODE_TABLE (charset);

-      for (i = charset_levels; i >= 2; i--)
+      for (i = levels; i >= 2; i--)
 	{
+	  int ind;
+
 	  switch (i)
 	    {
-	    case 4:
-	      if (((short ****) table)[u4] == from_unicode_blank_3)
-		((short ****) table)[u4] =
-		  ((short ***) create_new_from_unicode_table (3));
-	      table = ((short ****) table)[u4];
-	      break;
-	    case 3:
-	      if (((short ***) table)[u3] == from_unicode_blank_2)
-		((short ***) table)[u3] =
-		  ((short **) create_new_from_unicode_table (2));
-	      table = ((short ***) table)[u3];
-	      break;
-	    case 2:
-	      if (((short **) table)[u2] == from_unicode_blank_1)
-		((short **) table)[u2] =
-		  ((short *) create_new_from_unicode_table (1));
-	      table = ((short **) table)[u2];
-	      break;
-	    default: ABORT ();
+	    case 4: ind = u4; break;
+	    case 3: ind = u3; break;
+	    case 2: ind = u2; break;
+	    default: ind = 0; ABORT ();
 	    }
+
+	  if (((void **) table)[ind] == from_unicode_blank[i - 1])
+	    ((void **) table)[ind] =
+	      ((void *) create_new_from_unicode_table (i - 1));
+	  table = ((void **) table)[ind];
 	}
     }

＠＠ -965,14 +912,18 ＠＠

     {
       void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
-      switch (charset_levels)
+#ifndef MAXIMIZE_UNICODE_TABLE_DEPTH
+      switch (levels)
 	{
-	case 1: ((short *) table)[u1] = (c1 << 8) + c2; break;
-	case 2: ((short **) table)[u2][u1] = (c1 << 8) + c2; break;
-	case 3: ((short ***) table)[u3][u2][u1] = (c1 << 8) + c2; break;
-	case 4: ((short ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2; break;
+	case 4: ((UINT_16_BIT ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2; break;
+	case 3: ((UINT_16_BIT ***) table)[u3][u2][u1] = (c1 << 8) + c2; break;
+	case 2: ((UINT_16_BIT **) table)[u2][u1] = (c1 << 8) + c2; break;
+	case 1: ((UINT_16_BIT *) table)[u1] = (c1 << 8) + c2; break;
 	default:  ABORT ();
 	}
+#else /* MAXIMIZE_UNICODE_TABLE_DEPTH */
+      ((UINT_16_BIT ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2;
+#endif /* not MAXIMIZE_UNICODE_TABLE_DEPTH */
     }
   }

＠＠ -980,93 +931,255 ＠＠
   sledgehammer_check_unicode_tables (charset);
 #endif
 }
-
-int
-ichar_to_unicode (Ichar chr)
-{
-  Lisp_Object charset;
-  int c1, c2;

-  type_checking_assert (valid_ichar_p (chr));
-  /* This shortcut depends on the representation of an Ichar, see text.c. */
-  if (chr < 256)
-    return (int) chr;
+/* Convert a Unicode codepoint to a charset codepoint. */

-  BREAKUP_ICHAR (chr, charset, c1, c2);
-  if (EQ (charset, Vcharset_composite))
-    return -1; /* #### don't know how to handle */
-  else if (XCHARSET_DIMENSION (charset) == 1)
-    return ((int *) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32];
-  else
-    return ((int **) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32][c2 - 32];
-}
-
-static Ichar
-unicode_to_ichar (int code, Lisp_Object_dynarr *charsets)
+void
+non_ascii_unicode_to_charset_codepoint (int code, Lisp_Object_dynarr *charsets,
+					Lisp_Object *charset, int *c1, int *c2)
 {
   int u1, u2, u3, u4;
+#ifndef MAXIMIZE_UNICODE_TABLE_DEPTH
   int code_levels;
+#endif
   int i;
   int n = Dynarr_length (charsets);

-  type_checking_assert (code >= 0);
-  /* This shortcut depends on the representation of an Ichar, see text.c.
-     Note that it may _not_ be extended to U+00A0 to U+00FF (many ISO 8859
-     coded character sets have points that map into that region, so this
-     function is many-valued). */
-  if (code < 0xA0)
-    return (Ichar) code;
+  text_checking_assert (valid_unicode_codepoint_p (code));
+  text_checking_assert (code >= 128);

-  BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
+  /* ＠＠#### This optimization is not necessarily correct.  See comment in
+     unicode_to_charset_codepoint(). */
+  if (code <= 0x9F)
+    {
+      *charset = Vcharset_control_1;
+      *c1 = 0;
+      *c2 = code;
+      goto done;
+    }
+
+  UNICODE_BREAKUP_CHAR_CODE (code, u4, u3, u2, u1, code_levels);

   for (i = 0; i < n; i++)
     {
-      Lisp_Object charset = Dynarr_at (charsets, i);
-      int charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
-      if (charset_levels >= code_levels)
+      *charset = Dynarr_at (charsets, i);
+      void *table = XCHARSET_FROM_UNICODE_TABLE (*charset);
+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+      if (!table)
+	{
+	  int algo_low = XCHARSET_ALGO_LOW (*charset);
+	  text_checking_assert (algo_low >= 0);
+	  if (code >= algo_low &&
+	      code < algo_low +
+	      XCHARSET_CHARS (*charset, 0) * XCHARSET_CHARS (*charset, 1))
+	    {
+	      code -= algo_low;
+	      *c1 = code / XCHARSET_CHARS (*charset, 1);
+	      *c2 = code % XCHARSET_CHARS (*charset, 1);
+	      *c1 += XCHARSET_OFFSET (*charset, 0);
+	      *c2 += XCHARSET_OFFSET (*charset, 1);
+	      goto done;
+	    }
+	  continue;
+	}
+#endif /* ALLOW_ALGORITHMIC_CONVERSION_TABLES */
+#ifdef MAXIMIZE_UNICODE_TABLE_DEPTH
+      UINT_16_BIT retval = ((UINT_16_BIT ****) table)[u4][u3][u2][u1];
+      if (retval != BADVAL_FROM_TABLE)
+	{
+	  *c1 = retval >> 8;
+	  *c2 = retval & 0xFF;
+	  goto done;
+	}
+#else
+      int levels = XCHARSET_FROM_UNICODE_LEVELS (*charset);
+      if (levels >= code_levels)
 	{
-	  void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
-	  short retval;
+	  UINT_16_BIT retval;

-	  switch (charset_levels)
+	  switch (levels)
 	    {
-	    case 1: retval = ((short *) table)[u1]; break;
-	    case 2: retval = ((short **) table)[u2][u1]; break;
-	    case 3: retval = ((short ***) table)[u3][u2][u1]; break;
-	    case 4: retval = ((short ****) table)[u4][u3][u2][u1]; break;
+	    case 1: retval = ((UINT_16_BIT *) table)[u1]; break;
+	    case 2: retval = ((UINT_16_BIT **) table)[u2][u1]; break;
+	    case 3: retval = ((UINT_16_BIT ***) table)[u3][u2][u1]; break;
+	    case 4: retval = ((UINT_16_BIT ****) table)[u4][u3][u2][u1]; break;
 	    default: ABORT (); retval = 0;
 	    }

-	  if (retval != -1)
-	    return make_ichar (charset, retval >> 8, retval & 0xFF);
+	  if (retval != BADVAL_FROM_TABLE)
+	    {
+	      c1 = retval >> 8;
+	      c2 = retval & 0xFF;
+	      goto done;
+	    }
 	}
+#endif /* MAXIMIZE_UNICODE_TABLE_DEPTH */
+    }
+
+  /* Unable to convert; try the private codepoint range */
+  private_unicode_to_charset_codepoint (code, charset, c1, c2);
+  return;
+
+done:
+  text_checking_assert (valid_charset_codepoint_p (*charset, *c1, *c2));
+  return;
+}
+
+/* ＠＠####
+ there has to be a way for Lisp callers to *always* request the private
+ codepoint if they want it.
+
+ Possibly, we want always-private characters to behave somewhat like
+ their normal equivalents, when such exists; e.g. when retrieving
+ a property from a char table using such a char, try to retrieve a
+ normal Unicode value and use its properties instead.  When setting,
+ do a similar switcheroo.  Maybe at redisplay time also, and anywhere
+ else we access properties of a char, do similar switching.
+ This might be necessary to get recompilation to really work right,
+ I'm not sure.  Hold off on this until necessary.
+*/
+
+int
+charset_codepoint_to_private_unicode (Lisp_Object charset, int c1, int c2)
+{
+  /* If the charset is ISO2022-compatible, try to use a representation
+     that is portable, in case we are writing out to an external
+     Unicode representation.  Otherwise, just do something.
+
+     What we use is this:
+
+     If ISO2022-compatible:
+
+       23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+        1 <---> <------------------> <------------------>
<------------------>
+            1            2                    3                    4
+
+     Field 1 is the type (94, 96, 94x94, 96x96).
+     Field 2 is the final byte.
+     Field 3 is the first octet.
+     Field 4 is the second octet.
+     Bit 23 is set so that we are above all possible UTF-16 chars.
+
+ <-... 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
+ <-..------------------------> <--------------------->
<--------------------->
+              1                           2                       3
+
+     If non-ISO2022-compatible:
+
+     Field 1 (extends up to 31 bits) is the charset ID + 256, to place it
+     above all the others. (NOTE: It's true that non-encodable charset ID's
+     are always >= 256; but the correspondence between encodable charsets
+     and ISO2022-compatible charsets is not one-to-one in either direction;
+     see get_charset_iso2022_type().)  Fields 2 and 3 are the octet values.
+  */
+  int type = get_charset_iso2022_type (charset);
+
+  if (type >= 0)
+    {
+      /* The types are defined between 0 and 3 in charset.h; make sure
+	 someone doesn't change them */
+      text_checking_assert (type <= 3);
+      c1 &= 127;
+      c2 &= 127;
+      return 0x800000 + (type << 21) + (XCHARSET_FINAL (charset) << 14)
+	+ (c1 << 7) + c2;
+    }
+  else
+    {
+      int retval = ((256 + XCHARSET_ID (charset)) << 16) + (c1 << 8) + c2;
+      /* Check for overflow */
+      if (!valid_unicode_codepoint_p (retval))
+	retval = CANT_CONVERT_CHAR_WHEN_ENCODING_UNICODE;
+      return retval;
     }
+}

-  return (Ichar) -1;
+void
+private_unicode_to_charset_codepoint (int priv, Lisp_Object *charset,
+				      int *c1, int *c2)
+{
+  if (priv >= 0x1000000)
+    {
+      *charset = charset_by_id ((priv >> 16) - 256);
+      *c1 = (priv >> 8) & 0xFF;
+      *c2 = priv & 0xFF;
+    }
+  else if (priv >= 0x800000)
+    {
+      int type;
+      int final;
+      
+      priv -= 0x800000;
+      type = (priv >> 21) & 3;
+      final = (priv >> 14) & 0x7F;
+      *c1 = (priv >> 7) & 0x7F;
+      *c2 = priv & 0x7F;
+      *charset = charset_by_attributes (type, final, CHARSET_LEFT_TO_RIGHT);
+      if (NILP (*charset))
+	*charset = charset_by_attributes (type, final, CHARSET_RIGHT_TO_LEFT);
+      if (!NILP (*charset))
+	{
+	  if (XCHARSET_OFFSET (*charset, 0) >= 128)
+	    *c1 += 128;
+	  if (XCHARSET_OFFSET (*charset, 1) >= 128)
+	    *c2 += 128;
+	}
+    }
+  else
+    {
+      /* ＠＠#### Better error recovery? */
+      *charset = Qnil;
+      *c1 = 0;
+      *c2 = 0;
+    }
+  if (!NILP (*charset) && !valid_charset_codepoint_p (*charset, *c1, *c2))
+    /* ＠＠#### Better error recovery? */
+    {
+      /* ＠＠#### Better error recovery? */
+      *charset = Qnil;
+      *c1 = 0;
+      *c2 = 0;
+    }
 }

+/****************** code to handle Unicode precedence lists ******************/
+
 /* Add charsets to precedence list.
    LIST must be a list of charsets.  Charsets which are in the list more
    than once are given the precedence implied by their earliest appearance.
-   Later appearances are ignored. */
+   Later appearances are ignored.  This makes use of the hash table in
+   Vprecedence_calculation_hash, which is not cleared at the beginning;
+   you must do that. */
 static void
-add_charsets_to_precedence_list (Lisp_Object list, int *lbs,
-				 Lisp_Object_dynarr *dynarr)
+add_charsets_to_precedence_list (Lisp_Object list, Lisp_Object_dynarr *dynarr)
 {
   {
     EXTERNAL_LIST_LOOP_2 (elt, list)
       {
 	Lisp_Object charset = Fget_charset (elt);
-	int lb = XCHARSET_LEADING_BYTE (charset);
-	if (lbs[lb - MIN_LEADING_BYTE] == 0)
+	if (NILP (Fgethash (charset, Vprecedence_calculation_hash, Qnil)))
 	  {
 	    Dynarr_add (dynarr, charset);
-	    lbs[lb - MIN_LEADING_BYTE] = 1;
+	    Fputhash (charset, Qt, Vprecedence_calculation_hash);
 	  }
       }
   }
 }

+/* Called for each pair of (symbol, charset) in the hash table tracking
+   charsets. */
+static int
+rup_mapper (Lisp_Object UNUSED (key), Lisp_Object value,
+	    void * UNUSED (closure))
+{
+  if (NILP (Fgethash (value, Vprecedence_calculation_hash, Qnil)))
+    {
+      Dynarr_add (unicode_precedence_dynarr, value);
+      Fputhash (value, Qt, Vprecedence_calculation_hash);
+    }
+  return 0;
+}
+
 /* Rebuild the charset precedence array.
    The "charsets preferred for the current language" get highest precedence,
    followed by the "charsets preferred by default", ordered as in
＠＠ -1075,28 +1188,63 ＠＠
 void
 recalculate_unicode_precedence (void)
 {
-  int lbs[NUM_LEADING_BYTES];
-  int i;
-
-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    lbs[i] = 0;
-
   Dynarr_reset (unicode_precedence_dynarr);

+  /* Assume precedence-calculation-hash was cleared at the end of any
+     previous operation */
   add_charsets_to_precedence_list (Vlanguage_unicode_precedence_list,
-				   lbs, unicode_precedence_dynarr);
+				   unicode_precedence_dynarr);
   add_charsets_to_precedence_list (Vdefault_unicode_precedence_list,
-				   lbs, unicode_precedence_dynarr);
+				   unicode_precedence_dynarr);

-  for (i = 0; i < NUM_LEADING_BYTES; i++)
-    {
-      if (lbs[i] == 0)
-	{
-	  Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
-	  if (!NILP (charset))
-	    Dynarr_add (unicode_precedence_dynarr, charset);
-	}
-    }
+
+  /* Now add all remaining charsets to unicode_precedence_dynarr */
+  elisp_maphash (rup_mapper, Vcharset_hash_table, NULL);
+
+  /* Maintain invariant assumption as described above */
+  Fclrhash (Vprecedence_calculation_hash);
+}
+
+Lisp_Object_dynarr *
+get_unicode_precedence (void)
+{
+  return unicode_precedence_dynarr;
+}
+
+void
+free_precedence_dynarr (Lisp_Object_dynarr *dynarr)
+{
+  if (dynarr != unicode_precedence_dynarr)
+    Dynarr_free (dynarr);
+}
+
+/* Convert the given list of charsets (not previously validated) into a
+   precedence dynarr for use with unicode_to_charset_codepoint().  When
+   done, free the dynarr with free_precedence_dynarr(). */
+
+Lisp_Object_dynarr *
+convert_charset_list_to_precedence_dynarr (Lisp_Object charsets)
+{
+  Lisp_Object_dynarr *dyn;
+
+  if (NILP (charsets))
+    return get_unicode_precedence ();
+
+  /* Must validate before allocating (or use unwind-protect) */
+  {
+    EXTERNAL_LIST_LOOP_2 (elt, charsets)
+      Fget_charset (elt);
+  }
+
+  dyn = Dynarr_new (Lisp_Object);
+
+  /* Assume precedence-calculation-hash was cleared at the end of any
+     previous operation */
+  add_charsets_to_precedence_list (charsets, dyn);
+  /* Maintain invariant assumption as described above */
+  Fclrhash (Vprecedence_calculation_hash);
+
+  return dyn;
 }

 DEFUN ("unicode-precedence-list", 
＠＠ -1124,9 +1272,10 ＠＠
 {
   int i;
   Lisp_Object list = Qnil;
+  Lisp_Object_dynarr *preclist = get_unicode_precedence();

-  for (i = Dynarr_length (unicode_precedence_dynarr) - 1; i >= 0; i--)
-    list = Fcons (Dynarr_at (unicode_precedence_dynarr, i), list);
+  for (i = Dynarr_length (preclist) - 1; i >= 0; i--)
+    list = Fcons (Dynarr_at (preclist, i), list);
   return list;
 }

＠＠ -1209,154 +1358,66 ＠＠
 }

 DEFUN ("set-unicode-conversion", Fset_unicode_conversion,
-       2, 2, 0, /*
-Add conversion information between Unicode codepoints and characters.
-Conversions for U+0000 to U+00FF are hardwired to ASCII, Control-1, and
-Latin-1.  Attempts to set these values will raise an error.
-
-CHARACTER is one of the following:
-
--- A character (in which case CODE must be a non-negative integer; values
-   above 2^20 - 1 are allowed for the purpose of specifying private
-   characters, but are illegal in standard Unicode---they will cause errors
-   when converted to utf-16)
--- A vector of characters (in which case CODE must be a vector of integers
-   of the same length)
+       3, 4, 0, /*
+Add conversion information between Unicode and charset codepoints.
+A single Unicode codepoint may have multiple corresponding charset
+codepoints, but each codepoint in a charset corresponds to only one
+Unicode codepoint.  Further calls to this function with the same
+values for (CHARSET, C1 [, C2]) and a different value for UNICODE
+will overwrite the previous value.
+
+Note that the Unicode codepoints corresponding to the ASCII, Control-1,
+and Latin-1 charsets are hard-wired.  Attempts to set these values
+will raise an error.
+
+C2 either must or may not be specified, depending on the dimension of
+CHARSET (see `make-char').
 */
-       (character, code))
+       (unicode, charset, c1, c2))
 {
-  Lisp_Object charset;
-  int ichar, unicode;
-
-  CHECK_CHAR (character);
-  CHECK_NATNUM (code);
-
-  unicode = XINT (code);
-  ichar = XCHAR (character);
-  charset = ichar_charset (ichar);
-
+  int a1, a2;
+  int ucp = decode_unicode (unicode);
+  charset = get_charset_octets (charset, c1, c2, &a1, &a2);
+  
   /* The translations of ASCII, Control-1, and Latin-1 code points are
      hard-coded in ichar_to_unicode and unicode_to_ichar.

-     Checking unicode < 256 && ichar != unicode is wrong because Mule gives
-     many Latin characters code points in a few different character sets. */
-  if ((EQ (charset, Vcharset_ascii) ||
-       EQ (charset, Vcharset_control_1) ||
-       EQ (charset, Vcharset_latin_iso8859_1))
-      && unicode != ichar)
-    signal_error (Qinvalid_argument, "Can't change Unicode translation for
ASCII, Control-1 or Latin-1 character",
-		  character);
+     #### But they shouldn't be; see comments elsewhere.

+     Checking for all unicode < 256 && c1 | 0x80 != unicode is wrong
+     because Mule gives many Latin characters code points in a few
+     different character sets. */
+  if ((EQ (charset, Vcharset_ascii) ||
+       EQ (charset, Vcharset_latin_iso8859_1) ||
+       EQ (charset, Vcharset_control_1)))
+    {
+      if (ucp != (a2 | 0x80))
+	invalid_argument
+	  ("Can't change Unicode translation for ASCII, Control-1 or Latin-1
character",
+           unicode);
+      return Qnil;
+    }
+  
   /* #### Composite characters are not properly implemented yet. */
   if (EQ (charset, Vcharset_composite))
-    signal_error (Qinvalid_argument, "Can't set Unicode translation for
Composite char",
-		  character);
-
-  set_unicode_conversion (ichar, unicode);
-  return Qnil;
-}
-
-#endif /* MULE */
-
-DEFUN ("char-to-unicode", Fchar_to_unicode, 1, 1, 0, /*
-Convert character to Unicode codepoint.
-When there is no international support (i.e. the `mule' feature is not
-present), this function simply does `char-to-int'.
-*/
-       (character))
-{
-  CHECK_CHAR (character);
-#ifdef MULE
-  return make_int (ichar_to_unicode (XCHAR (character)));
-#else
-  return Fchar_to_int (character);
-#endif /* MULE */
-}
+    invalid_argument ("Can't set Unicode translation for Composite char",
+		      unicode);

-DEFUN ("unicode-to-char", Funicode_to_char, 1, 2, 0, /*
-Convert Unicode codepoint to character.
-CODE should be a non-negative integer.
-If CHARSETS is given, it should be a list of charsets, and only those
-charsets will be consulted, in the given order, for a translation.
-Otherwise, the default ordering of all charsets will be given (see
-`set-unicode-charset-precedence').
-
-When there is no international support (i.e. the `mule' feature is not
-present), this function simply does `int-to-char' and ignores the CHARSETS
-argument.
-
-Note that the current XEmacs internal encoding has no mapping for many
-Unicode code points, and if you use characters that are vaguely obscure with
-XEmacs' Unicode coding systems, you will lose data.
-
-To add support for some desired code point in the short term--note that our
-intention is to move to a Unicode-compatible internal encoding soon, for
-some value of soon--if you are a distributor, add something like the
-following to `site-start.el.'
-
-(make-charset 'distro-name-private 
-	      "Private character set for DISTRO"
-	      '(dimension 1
-		chars 96
-		columns 1
-		final ?5 ;; Change this--see docs for make-charset
-		long-name "Private charset for some Unicode char support."
-		short-name "Distro-Private"))
-
-(set-unicode-conversion 
- (make-char 'distro-name-private #x20) #x263A) ;; WHITE SMILING FACE
-
-(set-unicode-conversion 
- (make-char 'distro-name-private #x21) #x3030) ;; WAVY DASH
-
-;; ... 
-;;; Repeat as necessary. 
-
-Redisplay will work on the sjt-xft branch, but not with server-side X11
-fonts as is the default.  However, data read in will be preserved when they
-are written out again.
-
-*/
-       (code, USED_IF_MULE (charsets)))
-{
-#ifdef MULE
-  Lisp_Object_dynarr *dyn;
-  int lbs[NUM_LEADING_BYTES];
-  int c;
-
-  CHECK_NATNUM (code);
-  c = XINT (code);
-  {
-    EXTERNAL_LIST_LOOP_2 (elt, charsets)
-      Fget_charset (elt);
-  }
-
-  if (NILP (charsets))
+#ifdef ALLOW_ALGORITHMIC_CONVERSION_TABLES
+  if (!XCHARSET_FROM_UNICODE_TABLE (charset))
     {
-      Ichar ret = unicode_to_ichar (c, unicode_precedence_dynarr);
-      if (ret == -1)
-	return Qnil;
-      return make_char (ret);
+      text_checking_assert (XCHARSET_ALGO_LOW (charset) >= 0);
+      invalid_argument
+	("Can't set Unicode translation of charset with automatic translation",
+	 charset);
     }
+#endif

-  dyn = Dynarr_new (Lisp_Object);
-  memset (lbs, 0, NUM_LEADING_BYTES * sizeof (int));
-  add_charsets_to_precedence_list (charsets, lbs, dyn);
-  {
-    Ichar ret = unicode_to_ichar (c, dyn);
-    Dynarr_free (dyn);
-    if (ret == -1)
-      return Qnil;
-    return make_char (ret);
-  }
-#else
-  CHECK_NATNUM (code);
-  return Fint_to_char (code);
-#endif /* MULE */
+  set_unicode_conversion (ucp, charset, a1, a2);
+  return Qnil;
 }

-#ifdef MULE
-
+/* "cerrar el fulano" = close the so-and-so */
 static Lisp_Object
 cerrar_el_fulano (Lisp_Object fulano)
 {
＠＠ -1394,6 +1455,7 ＠＠
 `big5'
   The charset codepoints are Big Five codepoints; convert it to the
   hacked-up Mule codepoint in `chinese-big5-1' or `chinese-big5-2'.
+  Not when (featurep 'unicode-internal).
 */
      (filename, charset, start, end, offset, flags))
 {
＠＠ -1401,9 +1463,11 ＠＠
   FILE *file;
   struct gcpro gcpro1;
   char line[1025];
-  int fondo = specpdl_depth ();
+  int fondo = specpdl_depth (); /* "fondo" = depth */
   int ignore_first_column = 0;
+#ifndef UNICODE_INTERNAL
   int big5 = 0;
+#endif /* not UNICODE_INTERNAL */

   CHECK_STRING (filename);
   charset = Fget_charset (charset);
＠＠ -1431,8 +1495,10 ＠＠
       {
 	if (EQ (elt, Qignore_first_column))
 	  ignore_first_column = 1;
+#ifndef UNICODE_INTERNAL
 	else if (EQ (elt, Qbig5))
 	  big5 = 1;
+#endif /* not UNICODE_INTERNAL */
 	else
 	  invalid_constant
 	    ("Unrecognized `load-unicode-mapping-table' flag", elt);
＠＠ -1451,6 +1517,7 ＠＠
       int cp1, cp2, endcount;
       int cp1high, cp1low;
       int dummy;
+      int scanf_count, garbage_after_scanf;

       while (*p) /* erase all comments out of the line */
 	{
＠＠ -1471,19 +1538,31 ＠＠
 	 interpretation.

 	 Also, the return value does NOT include %n storage. */
-      if ((!ignore_first_column ?
-	   sscanf (p, "%i %i%n", &cp1, &cp2, &endcount) < 2 :
-	   sscanf (p, "%i %i %i%n", &dummy, &cp1, &cp2, &endcount)
< 3)
-	  /* #### Temporary code!  Cygwin newlib fucked up scanf() handling
-	     of numbers beginning 0x0... starting in 04/2004, in an attempt
-	     to fix another bug.  A partial fix for this was put in in
-	     06/2004, but as of 10/2004 the value of ENDCOUNT returned in
-	     such case is still wrong.  If this gets fixed soon, remove
-	     this code. --ben */
+      scanf_count =
+	(!ignore_first_column ?
+	 sscanf (p, "%i %i%n", &cp1, &cp2, &endcount) :
+	 sscanf (p, "%i %i %i%n", &dummy, &cp1, &cp2, &endcount) -
1);
+      /* #### Temporary code!  Cygwin newlib fucked up scanf() handling
+	 of numbers beginning 0x0... starting in 04/2004, in an attempt
+	 to fix another bug.  A partial fix for this was put in in
+	 06/2004, but as of 10/2004 the value of ENDCOUNT returned in
+	 such case is still wrong.  If this gets fixed soon, remove
+	 this code. --ben */
 #ifndef CYGWIN_SCANF_BUG
-	  || *(p + endcount + strspn (p + endcount, " \t\n\r\f"))
+      garbage_after_scanf =
+	*(p + endcount + strspn (p + endcount, " \t\n\r\f"));
+#else
+      garbage_after_scanf = 0;
 #endif
-	  )
+
+      /* #### Hack.  A number of the CP###.TXT files from Microsoft contain
+	 lines with a charset codepoint and no corresponding Unicode
+	 codepoint, representing undefined values in the code page.
+
+	 Skip them so we don't get a raft of warnings. */
+      if (scanf_count == 1 && !garbage_after_scanf)
+	continue;
+      if (scanf_count < 2 || garbage_after_scanf)
 	{
 	  warn_when_safe (Qunicode, Qwarning,
 			  "Unrecognized line in translation file %s:\n%s",
＠＠ -1506,40 +1585,33 ＠＠
 	  cp1high = cp1 >> 8;
 	  cp1low = cp1 & 255;

+#ifndef UNICODE_INTERNAL
 	  if (big5)
 	    {
-	      Ichar ch = decode_big5_char (cp1high, cp1low);
-	      if (ch == -1)
-
+	      Lisp_Object fake_charset;
+	      int c1, c2;
+	      big5_char_to_fake_codepoint (cp1high, cp1low, &fake_charset,
+					   &c1, &c2);
+	      if (NILP (fake_charset))
 		warn_when_safe (Qunicode, Qwarning,
 				"Out of range Big5 codepoint 0x%x in "
 				"translation file %s:\n%s",
 				cp1, XSTRING_DATA (filename), line);
 	      else
-		set_unicode_conversion (ch, cp2);
+		set_unicode_conversion (cp2, fake_charset, c1, c2);
 	    }
 	  else
+#endif /* not UNICODE_INTERNAL */
 	    {
 	      int l1, h1, l2, h2;
-	      Ichar emch;
+	      int c1 = cp1high, c2 = cp1low;

-	      switch (XCHARSET_TYPE (charset))
-		{
-		case CHARSET_TYPE_94: l1 = 33; h1 = 126; l2 = 0; h2 = 0; break;
-		case CHARSET_TYPE_96: l1 = 32; h1 = 127; l2 = 0; h2 = 0; break;
-		case CHARSET_TYPE_94X94: l1 = 33; h1 = 126; l2 = 33; h2 = 126;
-		  break;
-		case CHARSET_TYPE_96X96: l1 = 32; h1 = 127; l2 = 32; h2 = 127;
-		  break;
-		default: ABORT (); l1 = 0; h1 = 0; l2 = 0; h2 = 0;
-		}
+	      get_charset_limits (charset, &l1, &h1, &l2, &h2);

-	      if (cp1high < l2 || cp1high > h2 || cp1low < l1 || cp1low > h1)
+	      if (c1 < l1 || c1 > h1 || c2 < l2 || c2 > h2)
 		goto out_of_range;

-	      emch = (cp1high == 0 ? make_ichar (charset, cp1low, 0) :
-		      make_ichar (charset, cp1high, cp1low));
-	      set_unicode_conversion (emch, cp2);
+	      set_unicode_conversion (cp2, charset, c1, c2);
 	    }
 	}
     }
＠＠ -1556,6 +1628,47 ＠＠

 /************************************************************************/
+/*                      Properties of Unicode chars                     */
+/************************************************************************/
+
+
+int
+unicode_char_columns (int code)
+{
+#if defined (HAVE_WCWIDTH) && defined (__STDC_ISO_10646__)
+  return wcwidth ((wchar_t) code);
+#else
+  /* #### We need to do a much better job here.  Although maybe wcwidth()
+     is available everywhere we care.  ＠＠#### Copy the source for wcwidth().
+     Also check under Windows for an equivalent. */
+  /* #### Use a range table for this! */
+  if (
+      /* Tibetan */
+      (code >= 0x0F00 && code <= 0x0FFF) ||
+      /* Ethiopic, Ethiopic Supplement */
+      (code >= 0x1200 && code <= 0x139F) ||
+      /* Unified Canadian Aboriginal Syllabic */
+      (code >= 0x1400 && code <= 0x167F) ||
+      /* Ethiopic Extended */
+      (code >= 0x2D80 && code <= 0x2DDF) ||
+      /* Do not combine the previous range with this one, as
+	 0x2E00 .. 0x2E7F is Supplemental Punctuation (Ancient Greek, etc.) */
+      /* CJK Radicals Supplement ... Hangul Syllables */
+      (code >= 0x2E80 && code <= 0xD7AF) ||
+      /* CJK Compatibility Ideographs */
+      (code >= 0xF900 && code <= 0xFAFF) ||
+      /* CJK Compatibility Forms */
+      (code >= 0xFE30 && code <= 0xFE4F) ||
+      /* CJK Unified Ideographs Extension B, CJK Compatibility Ideographs
+	 Supplement, any other crap in this region */
+      (code >= 0x20000 && code <= 0x2FFFF))
+    return 2;
+  return 1;
+#endif /* defined (HAVE_WCWIDTH) && defined (__STDC_ISO_10646__) */
+}
+
+
+/************************************************************************/
 /*                         Unicode coding system                        */
 /************************************************************************/

＠＠ -1594,9 +1707,9 ＠＠
   /* decode */
   unsigned char counter;
   int seen_char;
+  int first_surrogate;
+  unsigned int slop;
   /* encode */
-  Lisp_Object current_charset;
-  int current_char_boundary;
   int wrote_bom;
 };

＠＠ -1606,62 +1719,67 ＠＠

 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (unicode);

-/* Decode a UCS-2 or UCS-4 character into a buffer.  If the lookup fails, use
-   <GETA MARK> (U+3013) of JIS X 0208, which means correct character
-   is not found, instead.
-   #### do something more appropriate (use blob?)
-        Danger, Will Robinson!  Data loss.  Should we signal user? */
+/* Decode a UCS-2 or UCS-4 character (or -1 for error) into a buffer. */
 static void
 decode_unicode_char (int ch, unsigned_char_dynarr *dst,
 		     struct unicode_coding_stream *data,
 		     unsigned int ignore_bom)
 {
+  text_checking_assert (ch >= 0);
   if (ch == 0xFEFF && !data->seen_char && ignore_bom)
     ;
   else
     {
 #ifdef MULE
-      Ichar chr = unicode_to_ichar (ch, unicode_precedence_dynarr);
-
-      if (chr != -1)
-	{
-	  Ibyte work[MAX_ICHAR_LEN];
-	  int len;
-
-	  len = set_itext_ichar (work, chr);
-	  Dynarr_add_many (dst, work, len);
-	}
-      else
-	{
-	  Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
-	  Dynarr_add (dst, 34 + 128);
-	  Dynarr_add (dst, 46 + 128);
-	}
+      /* ＠＠#### If the lookup fails, we will currently use a replacement char
+	 (e.g. <GETA MARK> (U+3013) of JIS X 0208).
+	 #### Danger, Will Robinson!  Data loss. Should we signal user? */
+      Ichar chr = unicode_to_ichar (ch, get_unicode_precedence (),
+				    CONVERR_SUCCEED);
+      Dynarr_add_ichar (dst, chr);
 #else
-      Dynarr_add (dst, (Ibyte) ch);
+      if (ch < 256)
+	Dynarr_add (dst, (Ibyte) ch);
+      else
+	/* This is OK since we are non-Mule and this becomes a single byte */
+	Dynarr_add (dst, CANT_CONVERT_CHAR_WHEN_DECODING);
 #endif /* MULE */
     }

   data->seen_char = 1;
 }

+inline static void
+add_16_bit_char (int code, unsigned_char_dynarr *dst, int little_endian)
+{
+  if (little_endian)
+    {
+      Dynarr_add (dst, (unsigned char) (code & 255));
+      Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
+    }
+  else
+    {
+      Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
+      Dynarr_add (dst, (unsigned char) (code & 255));
+    }
+}
+
 static void
-encode_unicode_char_1 (int code, unsigned_char_dynarr *dst,
-		       enum unicode_type type, unsigned int little_endian)
+encode_unicode_char (int code, unsigned_char_dynarr *dst,
+		     enum unicode_type type, unsigned int little_endian)
 {
+  text_checking_assert (code >= 0);
   switch (type)
     {
     case UNICODE_UTF_16:
-      if (little_endian)
+      /* Handle surrogates */
+      if (code > 0xFFFF)
 	{
-	  Dynarr_add (dst, (unsigned char) (code & 255));
-	  Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
+	  add_16_bit_char (0xD7C0 + (code >> 10), dst, little_endian);
+	  add_16_bit_char (0xDC00 | (code & 0x3FF), dst, little_endian);
 	}
       else
-	{
-	  Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
-	  Dynarr_add (dst, (unsigned char) (code & 255));
-	}
+	add_16_bit_char (code, dst, little_endian);
       break;

     case UNICODE_UCS_4:
＠＠ -1682,90 +1800,48 ＠＠
       break;

     case UNICODE_UTF_8:
-      if (code <= 0x7f)
-	{
+      {
+	/* #### This code is duplicated in non_ascii_set_itext_ichar() in
+	   text.c.  There should be a better way. */
+	if (code <= 0x7f)
 	  Dynarr_add (dst, (unsigned char) code);
-	}
-      else if (code <= 0x7ff)
-	{
-	  Dynarr_add (dst, (unsigned char) ((code >> 6) | 0xc0));
-	  Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
-	}
-      else if (code <= 0xffff)
-	{
-	  Dynarr_add (dst, (unsigned char) ((code >> 12) | 0xe0));
-	  Dynarr_add (dst, (unsigned char) (((code >>  6) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) ((code        & 0x3f) | 0x80));
-	}
-      else if (code <= 0x1fffff)
-	{
-	  Dynarr_add (dst, (unsigned char) ((code >> 18) | 0xf0));
-	  Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) (((code >>  6) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) ((code        & 0x3f) | 0x80));
-	}
-      else if (code <= 0x3ffffff)
-	{
-	  Dynarr_add (dst, (unsigned char) ((code >> 24) | 0xf8));
-	  Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) (((code >>  6) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) ((code        & 0x3f) | 0x80));
-	}
-      else
-	{
-	  Dynarr_add (dst, (unsigned char) ((code >> 30) | 0xfc));
-	  Dynarr_add (dst, (unsigned char) (((code >> 24) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) (((code >>  6) & 0x3f) | 0x80));
-	  Dynarr_add (dst, (unsigned char) ((code        & 0x3f) | 0x80));
-	}
-      break;
+	else
+	  {
+	    register int bytes;
+	    register unsigned char *str;

+	    if (code <= 0x7ff) bytes = 2;
+	    else if (code <= 0xffff) bytes = 3;
+	    else if (code <= 0x1fffff) bytes = 4;
+	    else if (code <= 0x3ffffff) bytes = 5;
+	    else bytes = 6;
+
+	    Dynarr_add_many (dst, 0, bytes);
+	    str = Dynarr_past_lastp (dst);
+	    switch (bytes)
+	      {
+	      case 6:*--str = (code | 0x80) & 0xBF; code >>= 6;
+	      case 5:*--str = (code | 0x80) & 0xBF; code >>= 6;
+	      case 4:*--str = (code | 0x80) & 0xBF; code >>= 6;
+	      case 3:*--str = (code | 0x80) & 0xBF; code >>= 6;
+	      case 2:*--str = (code | 0x80) & 0xBF; code >>= 6;
+	      case 1:*--str = code | firstbyte_mask[bytes];
+	      }
+	  }
+  
+	break;
+      }
+
     case UNICODE_UTF_7: ABORT ();

     default: ABORT ();
     }
 }

-static void
-encode_unicode_char (Lisp_Object USED_IF_MULE (charset), int h,
-		     int USED_IF_MULE (l), unsigned_char_dynarr *dst,
-		     enum unicode_type type, unsigned int little_endian)
-{
-#ifdef MULE
-  int code = ichar_to_unicode (make_ichar (charset, h & 127, l & 127));
-
-  if (code == -1)
-    {
-      if (type != UNICODE_UTF_16 &&
-	  XCHARSET_DIMENSION (charset) == 2 &&
-	  XCHARSET_CHARS (charset) == 94)
-	{
-	  unsigned char final = XCHARSET_FINAL (charset);
-
-	  if (('＠' <= final) && (final < 0x7f))
-	    code = (0xe00000 + (final - '＠') * 94 * 94
-		    + ((h & 127) - 33) * 94 + (l & 127) - 33);
-	  else
-	    code = '?';
-	}
-      else
-	code = '?';
-    }
-#else
-  int code = h;
-#endif /* MULE */
-
-  encode_unicode_char_1 (code, dst, type, little_endian);
-}
-
 static Bytecount
 unicode_convert (struct coding_stream *str, const UExtbyte *src,
 		 unsigned_char_dynarr *dst, Bytecount n)
 {
-  unsigned int ch    = str->ch;
   struct unicode_coding_stream *data = CODING_STREAM_TYPE_DATA (str, unicode);
   enum unicode_type type =
     XCODING_SYSTEM_UNICODE_TYPE (str->codesys);
＠＠ -1777,11 +1853,14 ＠＠
   if (str->direction == CODING_DECODE)
     {
       unsigned char counter = data->counter;
+      int slop    = data->slop;

       while (n--)
 	{
 	  UExtbyte c = *src++;

+	  /* #### This duplicates code elsewhere.  Maybe it is possible
+	     to combine them efficiently. */
 	  switch (type)
 	    {
 	    case UNICODE_UTF_8:
＠＠ -1790,76 +1869,97 ＠＠
 		case 0:
 		  if (c >= 0xfc)
 		    {
-		      ch = c & 0x01;
+		      slop = c & 0x01;
 		      counter = 5;
 		    }
 		  else if (c >= 0xf8)
 		    {
-		      ch = c & 0x03;
+		      slop = c & 0x03;
 		      counter = 4;
 		    }
 		  else if (c >= 0xf0)
 		    {
-		      ch = c & 0x07;
+		      slop = c & 0x07;
 		      counter = 3;
 		    }
 		  else if (c >= 0xe0)
 		    {
-		      ch = c & 0x0f;
+		      slop = c & 0x0f;
 		      counter = 2;
 		    }
 		  else if (c >= 0xc0)
 		    {
-		      ch = c & 0x1f;
+		      slop = c & 0x1f;
 		      counter = 1;
 		    }
 		  else
 		    decode_unicode_char (c, dst, data, ignore_bom);
 		  break;
 		case 1:
-		  ch = (ch << 6) | (c & 0x3f);
-		  decode_unicode_char (ch, dst, data, ignore_bom);
-		  ch = 0;
+		  slop = (slop << 6) | (c & 0x3f);
+		  decode_unicode_char (slop, dst, data, ignore_bom);
+		  slop = 0;
 		  counter = 0;
 		  break;
 		default:
-		  ch = (ch << 6) | (c & 0x3f);
+		  slop = (slop << 6) | (c & 0x3f);
 		  counter--;
 		}
 	      break;

 	    case UNICODE_UTF_16:
 	      if (little_endian)
-		ch = (c << counter) | ch;
+		slop = (c << counter) | slop;
 	      else
-		ch = (ch << 8) | c;
+		slop = (slop << 8) | c;
 	      counter += 8;
-	      if (counter == 16)
+	      if (counter == 32)
 		{
-		  int tempch = ch;
-		  ch = 0;
+		  /* Surrogate processing */
 		  counter = 0;
-		  decode_unicode_char (tempch, dst, data, ignore_bom);
+		  if ((slop & 0xFC00) == 0xDC00)
+		    decode_unicode_char
+		      (((data->first_surrogate - 0xD800) << 10) +
+		       slop - 0xDC00 + 0x10000, dst, data, ignore_bom);
+		  else
+		    /* Houston, we have a problem. */
+		    /* !!#### error handling? */
+		    {
+		      DECODE_ADD_BINARY_CHAR
+			((data->first_surrogate >> 8) & 255, dst);
+		      DECODE_ADD_BINARY_CHAR
+			(data->first_surrogate & 255, dst);
+		      DECODE_ADD_BINARY_CHAR
+			((slop >> 8) & 255, dst);
+		      DECODE_ADD_BINARY_CHAR
+			(slop & 255, dst);
+		    }
 		}
+	      else if (counter == 16)
+		{
+		  if ((slop & 0xFC00) == 0xD800)
+		    data->first_surrogate = slop;
+		  else
+		    {
+		      counter = 0;
+		      decode_unicode_char (slop, dst, data, ignore_bom);
+		    }
+		  slop = 0;
+		}
 	      break;

 	    case UNICODE_UCS_4:
 	      if (little_endian)
-		ch = (c << counter) | ch;
+		slop = (c << counter) | slop;
 	      else
-		ch = (ch << 8) | c;
+		slop = (slop << 8) | c;
 	      counter += 8;
 	      if (counter == 32)
 		{
-		  int tempch = ch;
-		  ch = 0;
 		  counter = 0;
-		  if (tempch < 0)
-		    {
-		      /* !!#### indicate an error */
-		      tempch = '~';
-		    }
-		  decode_unicode_char (tempch, dst, data, ignore_bom);
+		  /* !!#### indicate an error if slop < 0 */
+		  decode_unicode_char (slop, dst, data, ignore_bom);
+		  slop = 0;
 		}
 	      break;

＠＠ -1871,16 +1971,24 ＠＠
 	    }

 	}
-      if (str->eof)
-	DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+      if (str->eof && counter > 0)
+	{
+	  if (type == UNICODE_UTF_16 && counter > 16)
+	    slop = (data->first_surrogate << 8) + slop;
+	  switch (counter)
+	    {
+	      /* fall through */
+	    case 24: DECODE_ADD_BINARY_CHAR ((slop >> 16) & 255, dst);
+	    case 16: DECODE_ADD_BINARY_CHAR ((slop >> 8) & 255, dst);
+	    case 8: DECODE_ADD_BINARY_CHAR (slop & 255, dst);
+	    }
+	}

+      data->slop = slop;
       data->counter = counter;
     }
   else
     {
-      unsigned char char_boundary = data->current_char_boundary;
-      Lisp_Object charset = data->current_charset;
-
 #ifdef ENABLE_COMPOSITE_CHARS
       /* flags for handling composite chars.  We do a little switcheroo
 	 on the source while we're outputting the composite char. */
＠＠ -1893,7 +2001,7 ＠＠

       if (XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys) &&
!data->wrote_bom)
 	{
-	  encode_unicode_char_1 (0xFEFF, dst, type, little_endian);
+	  encode_unicode_char (0xFEFF, dst, type, little_endian);
 	  data->wrote_bom = 1;
 	}

＠＠ -1904,104 +2012,56 ＠＠
 #ifdef MULE
 	  if (byte_ascii_p (c))
 #endif /* MULE */
-	    {			/* Processing ASCII character */
-	      ch = 0;
-	      encode_unicode_char (Vcharset_ascii, c, 0, dst, type,
-				   little_endian);
-
-	      char_boundary = 1;
-	    }
+	    encode_unicode_char (c, dst, type, little_endian);
 #ifdef MULE
-	  else if (ibyte_leading_byte_p (c) || ibyte_leading_byte_p (ch))
-	    {			/* Processing Leading Byte */
-	      ch = 0;
-	      charset = charset_by_leading_byte (c);
-	      if (leading_byte_prefix_p(c))
-		ch = c;
-	      char_boundary = 0;
-	    }
 	  else
-	    {			/* Processing Non-ASCII character */
-	      char_boundary = 1;
-	      if (EQ (charset, Vcharset_control_1))
-		/* See:
-
-		   (Info-goto-node "(internals)Internal String Encoding")
-
-		   for the rationale behind subtracting #xa0 from the
-		   character's code. */
-		encode_unicode_char (Vcharset_control_1, c - 0xa0, 0, dst,
-				     type, little_endian);
-	      else
+	    {
+	      COPY_PARTIAL_CHAR_BYTE (c, str);
+	      if (!str->pind_remaining)
 		{
-		  switch (XCHARSET_REP_BYTES (charset))
-		    {
-		    case 2:
-		      encode_unicode_char (charset, c, 0, dst, type,
-					   little_endian);
-		      break;
-		    case 3:
-		      if (XCHARSET_PRIVATE_P (charset))
-			{
-			  encode_unicode_char (charset, c, 0, dst, type,
-					       little_endian);
-			  ch = 0;
-			}
-		      else if (ch)
-			{
+#ifdef UNICODE_INTERNAL
+		  encode_unicode_char (non_ascii_itext_ichar (str->partial),
+				       dst, type, little_endian);
+#else
+		  Lisp_Object charset;
+		  int c1, c2;
+		  non_ascii_itext_to_charset_codepoint_raw (str->partial, 0,
+							    &charset, &c1,
+							    &c2);
 #ifdef ENABLE_COMPOSITE_CHARS
-			  if (EQ (charset, Vcharset_composite))
-			    {
-			      if (in_composite)
-				{
-				  /* #### Bother! We don't know how to
-				     handle this yet. */
-				  encode_unicode_char (Vcharset_ascii, '~', 0,
-						       dst, type,
-						       little_endian);
-				}
-			      else
-				{
-				  Ichar emch = make_ichar (Vcharset_composite,
-							   ch & 0x7F,
-							   c & 0x7F);
-				  Lisp_Object lstr =
-				    composite_char_string (emch);
-				  saved_n = n;
-				  saved_src = src;
-				  in_composite = 1;
-				  src = XSTRING_DATA   (lstr);
-				  n   = XSTRING_LENGTH (lstr);
-				}
-			    }
-			  else
-#endif /* ENABLE_COMPOSITE_CHARS */
-			    encode_unicode_char (charset, ch, c, dst, type,
-						 little_endian);
-			  ch = 0;
-			}
-		      else
-			{
-			  ch = c;
-			  char_boundary = 0;
-			}
-		      break;
-		    case 4:
-		      if (ch)
+		  if (EQ (charset, Vcharset_composite))
+		    {
+		      if (in_composite)
 			{
-			  encode_unicode_char (charset, ch, c, dst, type,
-					       little_endian);
-			  ch = 0;
+			  /* #### Bother! We don't know how to
+			     handle this yet. */
+			  encode_unicode_char
+			    (CANT_CONVERT_CHAR_WHEN_ENCODING_UNICODE,
+			     dst, type, little_endian);
 			}
 		      else
 			{
-			  ch = c;
-			  char_boundary = 0;
+			  Ichar emch =
+			    charset_codepoint_to_unicode
+			    (Vcharset_composite, c1, c2,
+			     CONVERR_SUCCEED);
+			  Lisp_Object lstr = composite_char_string (emch);
+			  saved_n = n;
+			  saved_src = src;
+			  in_composite = 1;
+			  src = XSTRING_DATA   (lstr);
+			  n   = XSTRING_LENGTH (lstr);
 			}
-		      break;
-		    default:
-		      ABORT ();
 		    }
+		  else
+#endif /* ENABLE_COMPOSITE_CHARS */
+		    {
+		      int code =
+			charset_codepoint_to_unicode
+			(charset, c1, c2, CONVERR_SUCCEED);
+		      encode_unicode_char (code, dst, type, little_endian);
+		    }
+#endif /* UNICODE_INTERNAL */
 		}
 	    }
 #endif /* MULE */
＠＠ -2017,15 +2077,11 ＠＠
 	}
 #endif /* ENABLE_COMPOSITE_CHARS */

-      data->current_char_boundary = char_boundary;
-      data->current_charset = charset;
-
       /* La palabra se hizo carne! */
       /* A palavra fez-se carne! */
       /* Whatever. */
     }

-  str->ch    = ch;
   return orign;
 }

＠＠ -2336,7 +2392,6 ＠＠
   struct unicode_coding_stream *data =
     CODING_STREAM_TYPE_DATA (str, unicode);
   xzero (*data);
-  data->current_charset = Qnil;
 }

 static void
＠＠ -2432,7 +2487,24 ＠＠
 /*                             Initialization                           */
 /************************************************************************/

+#ifdef MULE
+
 void
+initialize_ascii_control_1_latin_1_unicode_translation (void)
+{
+  int i;
+
+  for (i = 0; i < 128; i++)
+    set_unicode_conversion (i, Vcharset_ascii, 0, i);
+  for (i = 128; i < 160; i++)
+    set_unicode_conversion (i, Vcharset_control_1, 0, i);
+  for (i = 160; i < 256; i++)
+    set_unicode_conversion (i, Vcharset_latin_iso8859_1, 0, i);
+}
+
+#endif
+
+void
 syms_of_unicode (void)
 {
 #ifdef MULE
＠＠ -2448,9 +2520,6 ＠＠
   DEFSYMBOL (Qignore_first_column);
 #endif /* MULE */

-  DEFSUBR (Fchar_to_unicode);
-  DEFSUBR (Funicode_to_char);
-
   DEFSYMBOL (Qunicode);
   DEFSYMBOL (Qucs_4);
   DEFSYMBOL (Qutf_16);
＠＠ -2514,9 +2583,13 ＠＠
   staticpro (&Vdefault_unicode_precedence_list);
   Vdefault_unicode_precedence_list = Qnil;

+  staticpro (&Vprecedence_calculation_hash);
+  Vprecedence_calculation_hash =
+    make_lisp_hash_table (20, HASH_TABLE_NON_WEAK, HASH_TABLE_EQUAL);
+
   unicode_precedence_dynarr = Dynarr_new (Lisp_Object);
   dump_add_root_block_ptr (&unicode_precedence_dynarr,
-			    &lisp_object_dynarr_description);
+			    &Lisp_Object_dynarr_description);

   init_blank_unicode_tables ();

＠＠ -2532,13 +2605,13 ＠＠
   dump_add_root_block (&to_unicode_blank_2, sizeof (void *),
 		       to_unicode_level_2_desc_1);

-  dump_add_root_block (&from_unicode_blank_1, sizeof (void *),
+  dump_add_root_block (&from_unicode_blank[1], sizeof (void *),
 		       from_unicode_level_1_desc_1);
-  dump_add_root_block (&from_unicode_blank_2, sizeof (void *),
+  dump_add_root_block (&from_unicode_blank[2], sizeof (void *),
 		       from_unicode_level_2_desc_1);
-  dump_add_root_block (&from_unicode_blank_3, sizeof (void *),
+  dump_add_root_block (&from_unicode_blank[3], sizeof (void *),
 		       from_unicode_level_3_desc_1);
-  dump_add_root_block (&from_unicode_blank_4, sizeof (void *),
+  dump_add_root_block (&from_unicode_blank[4], sizeof (void *),
 		       from_unicode_level_4_desc_1);
 #endif /* MULE */
 }
Index: src/win32.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/win32.c,v
retrieving revision 1.25
diff -u -r1.25 win32.c
--- src/win32.c	2005/01/28 02:36:28	1.25
+++ src/win32.c	2005/11/22 14:01:10
＠＠ -21,7 +21,7 ＠＠
 #include <config.h>
 #include "lisp.h"

-#include "buffer.h"
+#include "casetab.h"
 #include "console-msw.h"
 #include "hash.h"
 #include "profile.h"
Index: src/window.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/window.c,v
retrieving revision 1.88
diff -u -r1.88 window.c
--- src/window.c	2005/10/25 11:16:30	1.88
+++ src/window.c	2005/11/22 14:01:12
＠＠ -1,7 +1,7 ＠＠
 /* Window creation, deletion and examination for XEmacs.
    Copyright (C) 1985-1987, 1992-1995 Free Software Foundation, Inc.
    Copyright (C) 1994, 1995 Board of Trustees, University of Illinois.
-   Copyright (C) 1995, 1996, 2002 Ben Wing.
+   Copyright (C) 1995, 1996, 2002, 2005 Ben Wing.
    Copyright (C) 1996 Chuck Thompson.

 This file is part of XEmacs.
＠＠ -150,32 +150,23 ＠＠

-static const struct memory_description int_description_1[] = {
-  { XD_END }
-};
-
-static const struct sized_memory_description int_description = {
-  sizeof (int),
-  int_description_1
-};
-
-static const struct memory_description int_dynarr_description_1[] = {
-  XD_DYNARR_DESC (int_dynarr, &int_description),
-  { XD_END }
-};
-
-static const struct sized_memory_description int_dynarr_description = {
-  sizeof (int_dynarr),
-  int_dynarr_description_1
-};
-
 static const struct memory_description face_cachel_description_1[] = {
-  { XD_BLOCK_PTR, offsetof (face_cachel, merged_faces),
+  /* #### Hack; look inside of the Stynarr structs */
+  { XD_BLOCK_PTR, offsetof (face_cachel, merged_faces.els),
     1, { &int_dynarr_description } },
+  { XD_BLOCK_PTR, offsetof (face_cachel, font_specified.els),
+    1, { &unsigned_char_dynarr_description } },
+  { XD_BLOCK_PTR, offsetof (face_cachel, font_updated.els),
+    1, { &unsigned_char_dynarr_description } },
+  /* Even if the whole static part isn't filled, this is OK, because those
+     values will be set to Qzero */
+  { XD_LISP_OBJECT_ARRAY, offsetof (face_cachel, font),
+    countof (((struct face_cachel *) 0)->font.els_static) },
+  { XD_BLOCK_PTR, offsetof (face_cachel, font.els),
+    1, { &Lisp_Object_pair_dynarr_description } },
   { XD_LISP_OBJECT, offsetof (face_cachel, face) },
   { XD_LISP_OBJECT, offsetof (face_cachel, foreground) },
   { XD_LISP_OBJECT, offsetof (face_cachel, background) },
-  { XD_LISP_OBJECT_ARRAY, offsetof (face_cachel, font), NUM_LEADING_BYTES },
   { XD_LISP_OBJECT, offsetof (face_cachel, display_table) },
   { XD_LISP_OBJECT, offsetof (face_cachel, background_pixmap) },
   { XD_END }
＠＠ -269,7 +260,7 ＠＠
 	      int UNUSED (escapeflag))
 {
   if (print_readably)
-    printing_unreadable_object ("#<window 0x%x>", XWINDOW
(obj)->header.uid);
+    printing_unreadable_lcrecord (obj, 0);

   write_c_string (printcharfun, "#<window");
   if (!NILP (XWINDOW (obj)->buffer))
＠＠ -298,11 +289,10 ＠＠
       for (i = 0; i < Dynarr_length (w->face_cachels); i++)
 	{
 	  struct face_cachel *cachel = Dynarr_atp (w->face_cachels, i);
-	  if (cachel->merged_faces)
-	    {
-	      Dynarr_free (cachel->merged_faces);
-	      cachel->merged_faces = 0;
-	    }
+	  Stynarr_free (cachel->merged_faces);
+	  Stynarr_free (cachel->font);
+	  Stynarr_free (cachel->font_specified);
+	  Stynarr_free (cachel->font_updated);
 	}
       Dynarr_free (w->face_cachels);
       w->face_cachels = 0;
＠＠ -4145,7 +4135,7 ＠＠
 	    ypos1 = WINDOW_TEXT_TOP (w);
 	  else
 	    {
-	      dl = Dynarr_atp (dla, Dynarr_length (dla) - 1);
+	      dl = Dynarr_lastp (dla);
 	      /* If this line is clipped then we know that there is no
                  blank room between eob and the modeline.  If we are
                  scrolling on clipped lines just know off the clipped
＠＠ -4167,7 +4157,7 ＠＠
 	num_lines--;

       if (scroll_on_clipped_lines
-	  && Dynarr_atp (dla, Dynarr_length (dla) - 1)->clip)
+	  && Dynarr_lastp (dla)->clip)
 	num_lines--;
     }

cvs server: Diffing src/m
cvs server: Diffing src/s
cvs server: Diffing tests
cvs server: Diffing tests/DLL
cvs server: Diffing tests/Dnd
cvs server: Diffing tests/autoconf
cvs server: Diffing tests/automated
cvs server: Diffing tests/gtk
cvs server: Diffing tests/mule
cvs server: Diffing tests/tooltalk

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

Re: Is there any more information about unicode-internal?