NOTE: This patch has been committed.
The warnings about data loss follow a discussion with Stephen; the
addition of mule-ucs.el is entirely my own initiative.
mule-packages/mule-ucs/ChangeLog addition:
2005-02-28 Aidan Kehoe <kehoea(a)parhasard.net>
* doc/mule-ucs.texi (Top):
"latin-unity may be of use" -> "latin-unity will probably be of
use"
* doc/mule-ucs.texi (Overview):
Mention that Mule-UCS will probably trash data.
* doc/mule-ucs.texi (Configuration):
Re-iterate that Mule-UCS will probably trash data. Comment out
the hope that advanced features will be documented; I don't
anticipate that happening.
mule-packages/mule-ucs/lisp/ChangeLog addition:
2005-02-28 Aidan Kehoe <kehoea(a)parhasard.net>
* unicode.el (ucs-to-char):
Document that many code points won't have an XEmacs mapping.
* unicode.el (char-to-ucs):
Add a docstring.
* mule-ucs.el: Add a Lisp library that just does a (require
'un-define), on the assumption that people may hear about Mule-UCS
as the way to get Unicode support in 21.4 and try M-: (require
'mule-ucs) RET before going and reading docs.
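For anyone wondering what the nil return from ucs-to-char means in practice, here is a minimal sketch of a round-trip check. It assumes Mule-UCS is installed and on the load-path, and that char-to-ucs hands back the code point as an integer; `my-ucs-round-trips-p' is a hypothetical helper for illustration only, not something this patch adds.

```elisp
;; Sketch only; assumes the Mule-UCS package is installed.
(require 'mule-ucs)                     ; the new library, which just loads un-define

(defun my-ucs-round-trips-p (codepoint)
  "Return non-nil if CODEPOINT survives Unicode -> Mule -> Unicode.
Hypothetical helper for illustration; not part of Mule-UCS."
  (let ((char (ucs-to-char codepoint)))
    ;; ucs-to-char returns nil when Mule has no mapping for CODEPOINT.
    (and char
         (equal (char-to-ucs char) codepoint))))

(my-ucs-round-trips-p #x41)             ; LATIN CAPITAL LETTER A
```

A nil result for a given code point suggests that text containing it is likely to be damaged when read through the Mule-UCS coding systems, since they share a backend with ucs-to-char.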
XEmacs Packages source patch:
Diff command: cvs -q diff -Nu
Files affected: mule-packages/mule-ucs/lisp/unicode.el
mule-packages/mule-ucs/lisp/mule-ucs.el mule-packages/mule-ucs/doc/mule-ucs.texi
Index: mule-packages/mule-ucs/doc/mule-ucs.texi
===================================================================
RCS file: /pack/xemacscvs/XEmacs/packages/mule-packages/mule-ucs/doc/mule-ucs.texi,v
retrieving revision 1.5
diff -u -u -r1.5 mule-ucs.texi
--- mule-packages/mule-ucs/doc/mule-ucs.texi 2005/01/30 18:43:09 1.5
+++ mule-packages/mule-ucs/doc/mule-ucs.texi 2005/02/28 21:47:23
@@ -108,6 +108,14 @@
@c You can find the latest version of this document on the web at
@c @uref{http://www.xemacs.org/}.
+IMPORTANT NOTE: Mule-UCS translates from Unicode to XEmacs' internal
+Mule encoding, and vice-versa. This internal encoding does not have a
+mapping for every Unicode code point, so if you are using any code point
+that is remotely obscure, there's a good chance it will be trashed, and
+you will lose data. Examples of such code points are U+263A WHITE
+SMILING FACE and U+201A SINGLE LOW-9 QUOTATION MARK, the latter often
+used in Central Europe.
+
@ifhtml
@c This manual is also available as a @uref{mule-ucs_ja.html, a Japanese
@c translation}.
@@ -233,7 +241,7 @@
character sets, where by international standard as well as common
practice characters common to more than one character set are considered
identical (not "unified" as for the Han characters in Unicode), the
-@file{latin-unity} package may be of use.
+@file{latin-unity} package will probably be of use.
@c #### need examples of un-define-change-charset-order usage
@@ -241,7 +249,8 @@
translate multilingual texts into non-Unicode encodings such as ISO 2022
will have to be done by hand.)
-That is all that most users of Mule-UCS need to know.
+That is all that most users of Mule-UCS need to know---but make sure
+you've read the warning at the start of this document about losing data!
Mule-UCS is still under development and any problems you encounter,
trivial or major, should be reported to the Mule-UCS developers. Use
@@ -359,14 +368,15 @@
implemented in XEmacs itself. Mule-UCS provides some utilities in the
@file{un-tools} library, but these are of unknown reliability.
-That is all that most users of Mule-UCS need to know. The rest of this
-section documents various advanced features which allow Mule-UCS to be
-tuned to resolve ambiguities (such as the unification of the Han
-characters across several languages) more appropriately.
+That is all that most users of Mule-UCS need to know---but make sure
+you've read the warning at the start of this document about losing data!
-@c #### FIXME!
-Well, it will once it's written. @code{:-P}
+@c The rest of this section documents various advanced features which allow
+@c Mule-UCS to be tuned to resolve ambiguities (such as the unification of
+@c the Han characters across several languages) more appropriately.
+@c #### FIXME!
+@c Well, it will once it's written. @code{:-P}
@node Design of Mule-UCS, , Configuration, Top
@chapter Design goal
Index: mule-packages/mule-ucs/lisp/mule-ucs.el
===================================================================
RCS file: mule-ucs.el
diff -N mule-ucs.el
--- /dev/null Mon Feb 28 22:47:21 2005
+++ mule-packages/mule-ucs/lisp/mule-ucs.el Mon Feb 28 22:47:23 2005
@@ -0,0 +1,37 @@
+;;; mule-ucs.el --- Create an entry point for Mule-UCS on the pattern of
+;;; every other Lisp package out there.
+
+;; Copyright (C) 2005 Free Software Foundation
+
+;; Author: Aidan Kehoe
+;; Keywords: mule, multilingual, unicode
+;; Created: 2005-02-28
+
+;; This file is part of XEmacs
+
+;; XEmacs is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 2, or (at your option) any later
+;; version.
+
+;; XEmacs is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+;; FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
+;; details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with this program; see the file COPYING. If not, write to the
+;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+;; Boston, MA 02111-1307, USA.
+
+;; Comment:
+
+;; I don't advocate pointing people towards this file; this is purely to
+;; make the thing Just Work for people who hear about Mule-UCS as the means
+;; to Unicode support in 21.4 and guess that it works on the model of every
+;; Lisp library out there.
+
+(require 'un-define)
+
+(provide 'mule-ucs)
+
Index: mule-packages/mule-ucs/lisp/unicode.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/packages/mule-packages/mule-ucs/lisp/unicode.el,v
retrieving revision 1.2
diff -u -u -r1.2 unicode.el
--- mule-packages/mule-ucs/lisp/unicode.el 2002/03/18 09:29:02 1.2
+++ mule-packages/mule-ucs/lisp/unicode.el 2005/02/28 21:47:24
@@ -89,9 +89,19 @@
;;;
(defun ucs-to-char (codepoint)
+ "Convert Unicode code point CODEPOINT to an XEmacs character.
+CODEPOINT should be a non-negative integer.
+
+If CODEPOINT cannot be represented as an XEmacs character--that is, if
+Mule-UCS doesn't know of any registered translation from it to some
+character in the multiple-character-set non-Unified Mule model--return nil.
+Beware that *many* Unicode codepoints have no representation in the Mule
+model, and since the Mule-UCS coding systems have the same backend as does
+this function, they will tend to trash data."
(ucs-representation-decoding-backend 'ucs codepoint nil))
(defun char-to-ucs (char)
+ "Convert character CHAR to its Unicode codepoint."
(ucs-representation-encoding-backend char 'ucs nil))
;;;