src/ChangeLog addition:
2006-11-17 Aidan Kehoe <kehoea(a)parhasard.net>
* text.c:
* text.c (Fmake_char):
`Octet' is incorrect; only the low seven bits are used. Mention
the MIME use of the term `charset' and cross-reference to
coding-system-p for the XEmacs implementation of the same
idea. Add an example call to (decode-char 'ucs ...), move the
decode-big5-char arguments to hex.
* text.c (Fsplit_char):
BREAKUP_ICHAR assigns charset the character set object, never its
name, so the get-charset call is superfluous.
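
As an illustration of the "low seven bits" point above, here is a minimal sketch in C. It is not code from text.c: the helper names (low_seven_bits, position_in_range, table_index_to_arg) are hypothetical and exist only to show the arithmetic the revised docstring describes.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of text.c: keep only the low seven
   bits of an argument, as the revised docstring describes. */
static int
low_seven_bits (int arg)
{
  return arg & 0x7F;
}

/* Hypothetical range check.  chars_per_dimension is 94 or 96; the
   ranges are the ones given in the docstring (33..126 and 32..127). */
static bool
position_in_range (int arg, int chars_per_dimension)
{
  int v = low_seven_bits (arg);

  if (chars_per_dimension == 94)
    return v >= 33 && v <= 126;
  if (chars_per_dimension == 96)
    return v >= 32 && v <= 127;
  return false;                 /* unknown charset size */
}

/* Hypothetical conversion from the index typically printed in 94x94
   reference tables to the value make-char expects (32 more). */
static int
table_index_to_arg (int table_index)
{
  return table_index + 32;
}
```

Under these assumptions, low_seven_bits (0xB9) yields 57, which is why the docstring's two latin-iso8859-2 calls name the same character, and table_index_to_arg applied to 32 and 78 yields the 64 and 110 used in the japanese-jisx0208 example.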
XEmacs Trunk source patch:
Diff command: cvs -q diff -u
Files affected: src/text.c
===================================================================
Index: src/text.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/text.c,v
retrieving revision 1.29
diff -u -r1.29 text.c
--- src/text.c 2006/08/24 21:21:36 1.29
+++ src/text.c 2006/11/17 16:18:32
@@ -4840,24 +4840,33 @@
/************************************************************************/
DEFUN ("make-char", Fmake_char, 2, 3, 0, /*
-Make a character from CHARSET and octets ARG1 and ARG2.
+Make a character from CHARSET and integers ARG1 and ARG2.
ARG2 is required only for characters from two-dimensional charsets.
-Each octet should be in the range 32 through 127 for a 96 or 96x96
-charset and 33 through 126 for a 94 or 94x94 charset. (Most charsets
-are either 96 or 94x94.) Note that this is 32 more than the values
-typically given for 94x94 charsets. When two octets are required, the
-order is "standard" -- the same as appears in ISO-2022 encodings,
-reference tables, etc.
+An XEmacs `charset' is a very distinct concept from a MIME charset,
+and unfortunately for the XEmacs documentation, the MIME
+interpretation is today the better known of the two. For information
+on how we implement the concept that MIME describes using the term,
+see `coding-system-p'. XEmacs takes its terminology from ISO/IEC
+2022, a much older standard that was mostly implemented in East Asia,
+and which the ECMA makes freely available as Ecma-035.pdf.
+
+The low-order seven bits of each integer should be in the range 32
+through 127 for a 96- or 96x96-character charset and 33 through 126
+for a 94- or 94x94-character charset--most XEmacs charsets are either
+96 or 94x94. This is 32 more than the values typically given for
+94x94 charsets. When two integers are required, the order is that
+which appears in ISO-2022 encodings and in the standard documents'
+reference tables.
\(Note the following non-obvious result: Computerized translation
-tables often encode the two octets as the high and low bytes,
-respectively, of a hex short, while when there's only one octet, it
+tables often encode the two integers as the high and low bytes,
+respectively, of a hex short, while when there's only one integer, it
goes in the low byte. When decoding such a value, you need to treat
the two cases differently when calling make-char: One is (make-char
CHARSET HIGH LOW), the other is (make-char CHARSET LOW).)
-For example, (make-char 'latin-iso8859-2 185) or (make-char
+For example, \(make-char 'latin-iso8859-2 #xB9) or \(make-char
'latin-iso8859-2 57) will return the Latin 2 character s with caron.
As another example, the Japanese character for "kawa" (stream), which
@@ -4882,20 +4891,19 @@
These are equivalent to:
+\(decode-char 'ucs #x5DDD)
\(make-char 'chinese-gb2312 52 40)
\(make-char 'japanese-jisx0208 64 110)
\(make-char 'korean-ksc5601 116 57)
+\(decode-big5-char '(#xA4 . #x74))
\(make-char 'chinese-cns11643-1 76 87)
-\(decode-big5-char '(164 . 116))
-\(All codes above are two decimal numbers except for Big Five and ANSI
-Z39.64, which we don't support. We add 32 to each of the decimal
-numbers. Big Five is split in a rather hackish fashion into two
-charsets, `big5-1' and `big5-2', due to its excessive size -- 94x157,
-with the first codepoint in the range 0xA1 to 0xFE and the second in
-the range 0x40 to 0x7E or 0xA1 to 0xFE. `decode-big5-char' is used to
-generate the char from its codes, and `encode-big5-char' extracts the
-codes.)
+\(We add 32 to each of the decimal numbers passed to make-char. Big
+Five is split in a rather hackish fashion into two charsets, `big5-1'
+and `big5-2', due to its excessive size -- 94x157, with the first
+codepoint in the range 0xA1 to 0xFE and the second in the range 0x40
+to 0x7E or 0xA1 to 0xFE. `decode-big5-char' is used to generate the
+char from its codes, and `encode-big5-char' extracts the codes.)
When compiled without MULE, this function does not do much, but it's
provided for compatibility. In this case, the following CHARSET symbols
@@ -4933,7 +4941,7 @@
{
if (!NILP (arg2))
invalid_argument
- ("Charset is of dimension one; second octet must be nil", arg2);
+ ("Charset is of dimension one; second integer must be nil", arg2);
return make_char (make_ichar (charset, a1, 0));
}
@@ -5005,7 +5013,13 @@
*/
(character))
{
- /* This function can GC */
+ /* [ This function can GC ]
+
+ Apart from argument checking, I disagree, but that's mostly
+ irrelevant, because if you're calling this function often
+ enough that it matters, talk to us, and we'll probably
+ implement something for you in C. Aidan Kehoe, Fri Nov 17
+ 16:24:30 2006. */
struct gcpro gcpro1, gcpro2;
Lisp_Object charset = Qnil;
Lisp_Object rc = Qnil;
@@ -5016,7 +5030,7 @@
BREAKUP_ICHAR (XCHAR (character), charset, c1, c2);
- if (XCHARSET_DIMENSION (Fget_charset (charset)) == 2)
+ if (XCHARSET_DIMENSION (charset) == 2)
{
rc = list3 (XCHARSET_NAME (charset), make_int (c1), make_int (c2));
}
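
The make-char docstring in the patch above notes that translation tables often pack a two-dimensional code into the high and low bytes of a 16-bit value, and a one-dimensional code into the low byte alone. A minimal sketch of that unpacking follows, in C. The struct and function names are hypothetical and are not part of XEmacs; the caller is assumed to know the charset's dimension already.

```c
#include <stdint.h>

/* Hypothetical container for the arguments one would pass to
   make-char; has_arg2 is 0 when ARG2 should be omitted. */
struct make_char_args
{
  int arg1;
  int arg2;
  int has_arg2;
};

/* Hypothetical helper: split a 16-bit table value according to the
   charset dimension (1 or 2), which the caller must supply. */
static struct make_char_args
split_table_value (uint16_t value, int dimension)
{
  struct make_char_args args;

  if (dimension == 2)
    {
      /* (make-char CHARSET HIGH LOW) */
      args.arg1 = (value >> 8) & 0xFF;
      args.arg2 = value & 0xFF;
      args.has_arg2 = 1;
    }
  else
    {
      /* (make-char CHARSET LOW): the code sits in the low byte. */
      args.arg1 = value & 0xFF;
      args.arg2 = 0;
      args.has_arg2 = 0;
    }
  return args;
}
```

With this sketch, the value 0x406E and dimension 2 give 64 and 110, matching the japanese-jisx0208 call in the docstring, while 0x00B9 and dimension 1 give the single argument 185 (#xB9).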
--
Santa Maradona, priez pour moi!