Prevent infinite recursion with undecided coding system
17 years, 4 months
Michael Sperber
This has been haunting me forever. I'll apply it on Thursday if nobody
objects.
2007-07-31 Mike Sperber <mike(a)xemacs.org>
* file-coding.c (undecided_convert): Kludge to prevent infinite
recursion.
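Here is a minimal C model of the kludge. Everything in it is hypothetical: enum coding_type, detect () and choose_actual () stand in for XEmacs coding-system objects, detect_coding_type () and detected_coding_system (), and the detection rule is invented purely for illustration. It shows only the shape of the guard: the converter never gets "undecided" handed back to it, so it cannot re-enter itself.

```c
#include <assert.h>

/* Hypothetical, much-simplified model; these are not XEmacs types. */
enum coding_type { CODING_UNDECIDED, CODING_RAW_TEXT, CODING_UTF_8 };

/* Stand-in for detect_coding_type () followed by
   detected_coding_system ().  For illustration only, detection
   "succeeds" on pure ASCII and stays undecided otherwise. */
static enum coding_type
detect (const unsigned char *src, int n)
{
  for (int i = 0; i < n; i++)
    if (src[i] >= 0x80)
      return CODING_UNDECIDED;
  return CODING_UTF_8;
}

/* The shape of the guard: if detection comes back as "undecided",
   substitute raw-text.  Returning "undecided" to the converter would
   make it dispatch to itself again, without end. */
static enum coding_type
choose_actual (const unsigned char *src, int n)
{
  enum coding_type actual = detect (src, n);
  if (actual == CODING_UNDECIDED)
    actual = CODING_RAW_TEXT;
  return actual;
}
```

With this model, choose_actual ((const unsigned char *) "\xff", 1) yields CODING_RAW_TEXT, while pure ASCII input yields CODING_UTF_8.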
--
Cheers =8-} Mike
Peace, international understanding, and all that blah blah
Index: src/file-coding.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/file-coding.c,v
retrieving revision 1.57
diff -u -r1.57 file-coding.c
--- src/file-coding.c 22 Jul 2007 22:04:13 -0000 1.57
+++ src/file-coding.c 31 Jul 2007 15:46:41 -0000
@@ -3869,6 +3869,9 @@
random result when doing subprocess detection. */
detect_coding_type (data->st, src, n);
data->actual = detected_coding_system (data->st);
+ /* kludge to prevent infinite recursion */
+ if (XCODING_SYSTEM(data->actual)->methods->enumtype == undecided_coding_system)
+ data->actual = Fget_coding_system (Qraw_text);
}
}
/* We need to set the detected coding system if we actually have
_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches
[COMMIT] Preserve invalid UTF-8, UTF-16 sequences on encoding, decoding.
17 years, 4 months
Aidan Kehoe
SUPERSEDES 18082.9316.913823.211586(a)parhasard.net
APPROVE COMMIT
NOTE: This patch has been committed.
Remaining to be done on this: moving the error codes to a separate Mule
character set, so they can be searched for consistently; then documenting
what happens with invalid sequences, for the sake of applications written by
people like David Kastrup (doing things in that order means people won't
need to revise their code); adding code to save-buffer that checks for
invalid sequences and notifies the user about them; extensive automated
tests; and adding language environment code that makes clear to users which
characters invalid sequences would map to in windows-1252, koi8-r or
windows-1250, as appropriate.
lisp/ChangeLog addition:
2007-08-04 Aidan Kehoe <kehoea(a)parhasard.net>
* unicode.el:
* unicode.el (utf-32):
* unicode.el (utf-32-little-endian):
Add UTF-32 coding systems.
* unicode.el (decode-char):
Only accept valid Unicode in this function.
src/ChangeLog addition:
2007-08-04 Aidan Kehoe <kehoea(a)parhasard.net>
* charset.h:
* charset.h (enum unicode_type):
Add UNICODE_UTF_32.
* lisp.h:
Add Qutf_32.
* lread.c (read_unicode_escape):
Error on an invalid Unicode escape; error on no mapping, as GNU does.
* mule-coding.c:
* mule-coding.c (dynarr_add_2022_one_dimension):
* mule-coding.c (dynarr_add_2022_two_dimensions):
* mule-coding.c (struct iso2022_coding_stream):
* mule-coding.c (decode_unicode_char):
* mule-coding.c (indicate_invalid_utf_8):
* mule-coding.c (iso2022_decode):
* unicode.c:
* unicode.c (struct unicode_coding_stream):
* unicode.c (decode_unicode_char):
* unicode.c (DECODE_ERROR_OCTET):
* unicode.c (indicate_invalid_utf_8):
* unicode.c (encode_unicode_char_1):
* unicode.c (encode_unicode_char):
* unicode.c (unicode_convert):
* unicode.c (unicode_putprop):
* unicode.c (unicode_getprop):
* unicode.c (syms_of_unicode):
Make UTF-8 and UTF-16 handling more robust; indicate error
sequences when decoding, passing the octets as distinct from the
corresponding ISO 8859-1 characters, and (by default) writing them
to disk on encoding. Don't accept over-long UTF-8 sequences, codes
>= #x110000, or UTF-16 surrogates on reading in the utf-8 coding
system; represent them as error sequences.
Do accept code points above #x110000 in the ISO IR 196 handling,
since we decode Unicode error sequences to "Unicode" code points
starting at 0x200000, and will need to save them as such in
escape-quoted. Do not accept over-long UTF-8 sequences or UTF-16
surrogates in escape-quoted.
This change means that when a non-UTF-8 file is opened as UTF-8,
edited once, and immediately saved, its non-ASCII characters are
not corrupted. In Europe, this is a distinct win.
Add UCS-4, UTF-32 as coding systems.
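A minimal C sketch of the scheme described above, under stated assumptions: UNICODE_ERROR_OCTET_RANGE_START is taken from the patch's charset.h, while decode_utf8 (), encode_utf8 () and error_octet () are hypothetical names. It only handles sequences of up to four bytes, and, unlike the patch's decoder, it re-examines a byte that interrupts a sequence as a possible new lead byte. It also tests for surrogates by range rather than with the valid_utf_16_surrogate mask.

```c
#include <assert.h>
#include <stddef.h>

/* From the patch's charset.h: an octet that is not part of valid UTF-8
   is represented as this value plus the octet. */
#define UNICODE_ERROR_OCTET_RANGE_START 0x200000

/* Hypothetical helper: the code point that stands for a raw octet. */
static int
error_octet (unsigned char octet)
{
  return UNICODE_ERROR_OCTET_RANGE_START + octet;
}

/* Decode SRC[0..N) as strict UTF-8 into OUT (room for N ints); return the
   number of code points.  Stray continuation bytes, truncated or over-long
   sequences, surrogates and values above U+10FFFF come out as error
   octets, one per input byte. */
static size_t
decode_utf8 (const unsigned char *src, size_t n, int *out)
{
  static const int min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t i = 0, k = 0;

  while (i < n)
    {
      unsigned char c = src[i];
      int len, cp;

      if (c < 0x80)
        { out[k++] = c; i++; continue; }
      else if ((c & 0xE0) == 0xC0)
        { len = 2; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0)
        { len = 3; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0)
        { len = 4; cp = c & 0x07; }
      else
        {
          /* Continuation byte with no lead byte, or 0xF8..0xFF. */
          out[k++] = error_octet (c);
          i++;
          continue;
        }

      /* Gather continuation bytes; J counts the bytes consumed. */
      int j = 1;
      while (j < len && i + j < n && (src[i + j] & 0xC0) == 0x80)
        {
          cp = (cp << 6) | (src[i + j] & 0x3F);
          j++;
        }

      if (j == len && cp >= min_for_len[len] && cp <= 0x10FFFF
          && !(cp >= 0xD800 && cp <= 0xDFFF))
        out[k++] = cp;
      else
        for (int m = 0; m < j; m++)
          out[k++] = error_octet (src[i + m]);
      i += j;
    }
  return k;
}

/* Encode CPS[0..N) (as produced by decode_utf8) into OUT (room for 4 * N
   bytes); error octets are written back as the original raw byte. */
static size_t
encode_utf8 (const int *cps, size_t n, unsigned char *out)
{
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    {
      int cp = cps[i];
      if (cp >= UNICODE_ERROR_OCTET_RANGE_START
          && cp < UNICODE_ERROR_OCTET_RANGE_START + 0x100)
        out[k++] = (unsigned char) (cp - UNICODE_ERROR_OCTET_RANGE_START);
      else if (cp < 0x80)
        out[k++] = (unsigned char) cp;
      else if (cp < 0x800)
        {
          out[k++] = (unsigned char) (0xC0 | (cp >> 6));
          out[k++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else if (cp < 0x10000)
        {
          out[k++] = (unsigned char) (0xE0 | (cp >> 12));
          out[k++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[k++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else
        {
          out[k++] = (unsigned char) (0xF0 | (cp >> 18));
          out[k++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
          out[k++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[k++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
  return k;
}
```

Because a stray byte becomes a code point outside Unicode rather than the ISO 8859-1 character with the same value, the encoder can tell a typed é (U+00E9, written as C3 A9) apart from a raw 0xE9 read from a Latin-1 file, and writes the latter back unchanged.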
XEmacs Trunk source patch:
Diff command: cvs -q diff -Nu
Files affected: src/unicode.c src/mule-coding.c src/lread.c src/lisp.h src/charset.h lisp/unicode.el
Index: lisp/unicode.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/unicode.el,v
retrieving revision 1.21
diff -u -u -r1.21 unicode.el
--- lisp/unicode.el 2007/07/28 08:02:16 1.21
+++ lisp/unicode.el 2007/08/04 19:44:13
@@ -233,6 +233,26 @@
little-endian t))
(make-coding-system
+ 'utf-32 'unicode
+ "UTF-32"
+ '(mnemonic "UTF32"
+ documentation
+ "UTF-32 Unicode encoding -- fixed-width four-byte encoding,
+characters greater than #x10FFFF are not supported. "
+ unicode-type utf-32))
+
+(make-coding-system
+ 'utf-32-little-endian 'unicode
+ "UTF-32 Little Endian"
+ '(mnemonic "UTF32-LE"
+ documentation
+ "Little-endian version of UTF-32 Unicode encoding.
+
+A fixed-width four-byte encoding, characters greater than #x10FFFF are not
+supported. "
+ unicode-type ucs-4 little-endian t))
+
+(make-coding-system
'utf-8 'unicode
"UTF-8"
'(mnemonic "UTF8"
@@ -274,6 +294,10 @@
(defun decode-char (quote-ucs code &optional restriction)
"FSF compatibility--return Mule character with Unicode codepoint CODE.
The second argument must be 'ucs, the third argument is ignored. "
+ ;; We're prepared to accept invalid Unicode in unicode-to-char, but not in
+ ;; this function, which is the API that should actually be used, since
+ ;; it's available in GNU and in Mule-UCS.
+ (check-argument-range code #x0 #x10FFFF)
(assert (eq quote-ucs 'ucs) t
"Sorry, decode-char doesn't yet support anything but the UCS. ")
(unicode-to-char code))
Index: src/charset.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/charset.h,v
retrieving revision 1.16
diff -u -u -r1.16 charset.h
--- src/charset.h 2006/11/12 13:40:07 1.16
+++ src/charset.h 2007/08/04 19:44:13
@@ -567,12 +567,20 @@
UNICODE_UTF_16,
UNICODE_UTF_8,
UNICODE_UTF_7,
- UNICODE_UCS_4
+ UNICODE_UCS_4,
+ UNICODE_UTF_32
};
void encode_unicode_char (Lisp_Object USED_IF_MULE (charset), int h,
int USED_IF_MULE (l), unsigned_char_dynarr *dst,
- enum unicode_type type, unsigned int little_endian);
+ enum unicode_type type, unsigned int little_endian,
+ int write_error_characters_as_such);
+
+#define UNICODE_ERROR_OCTET_RANGE_START 0x200000
+
+#define valid_utf_16_first_surrogate(ch) (((ch) & 0xFC00) == 0xD800)
+#define valid_utf_16_last_surrogate(ch) (((ch) & 0xFC00) == 0xDC00)
+#define valid_utf_16_surrogate(ch) (((ch) & 0xF800) == 0xD800)
void set_charset_registries(Lisp_Object charset, Lisp_Object registries);
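The three surrogate predicates moved into charset.h test only the low 16 bits of their argument. That is fine for UTF-16 code units, but it also means valid_utf_16_surrogate (0x1D800) is true. The UTF-8 decoders in this patch appear to apply it to the fully assembled code point, which would seem to reject valid four-byte sequences for U+1D800..U+1DFFF (and the corresponding ranges in later planes); that may be worth double-checking. The sketch below shows the difference; code_point_is_surrogate () is a hypothetical helper, not part of the patch.

```c
#include <assert.h>

/* The predicates as the patch moves them into charset.h.  They inspect
   only the low 16 bits of CH. */
#define valid_utf_16_first_surrogate(ch) (((ch) & 0xFC00) == 0xD800)
#define valid_utf_16_last_surrogate(ch) (((ch) & 0xFC00) == 0xDC00)
#define valid_utf_16_surrogate(ch) (((ch) & 0xF800) == 0xD800)

/* Hypothetical variant for full code points (not in the patch): true only
   for U+D800..U+DFFF, so e.g. U+1D800 is not misclassified. */
static int
code_point_is_surrogate (int ch)
{
  return ch >= 0xD800 && ch <= 0xDFFF;
}
```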
Index: src/lisp.h
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/lisp.h,v
retrieving revision 1.145
diff -u -u -r1.145 lisp.h
--- src/lisp.h 2007/05/26 18:28:23 1.145
+++ src/lisp.h 2007/08/04 19:44:16
@@ -5488,7 +5488,7 @@
void free_charset_unicode_tables (Lisp_Object charset);
void recalculate_unicode_precedence (void);
extern Lisp_Object Qunicode;
-extern Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7;
+extern Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7, Qutf_32;
#ifdef MEMORY_USAGE_STATS
Bytecount compute_from_unicode_table_size (Lisp_Object charset,
struct overhead_stats *stats);
Index: src/lread.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/lread.c,v
retrieving revision 1.81
diff -u -u -r1.81 lread.c
--- src/lread.c 2006/11/07 15:58:24 1.81
+++ src/lread.c 2007/08/04 19:44:17
@@ -1694,24 +1694,26 @@
}
}
+ if (i > 0x110000 || i < 0)
+ {
+ syntax_error ("Not a Unicode code point", make_int(i));
+ }
+
lisp_char = Funicode_to_char(make_int(i), Qnil);
if (EQ(Qnil, lisp_char))
{
- /* This is ugly and horrible and trashes the user's data, but
- it's what unicode.c does. In the future, unicode-to-char
- should not return nil. */
-#ifdef MULE
- i = make_ichar (Vcharset_japanese_jisx0208, 34 + 128, 46 + 128);
-#else
- i = '~';
-#endif
- return i;
- }
- else
- {
- return XCHAR(lisp_char);
+ /* Will happen on non-Mule. Silent corruption is what happens
+ elsewhere, and we used to do that to be consistent, but GNU error,
+ so people writing portable code need to be able to handle that, and
+ given a choice I prefer that behaviour.
+
+ An undesirable aspect to this error is that the code point is shown
+ as a decimal integer, which is mostly unreadable. */
+ syntax_error ("Unsupported Unicode code point", make_int(i));
}
+
+ return XCHAR(lisp_char);
}
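A hedged C sketch of the kind of check read_unicode_escape () now performs: parse the hex digits of an escape and refuse anything outside U+0000..U+10FFFF. parse_unicode_escape () is a hypothetical name, and its -1 return stands in for the syntax_error () calls. Note that the range test in the hunk above, i > 0x110000, appears to admit 0x110000 itself; the sketch uses the inclusive upper bound 0x10FFFF instead.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical sketch: interpret COUNT hex DIGITS taken from a Unicode
   escape.  Returns the code point, or -1 where the reader would signal an
   error: a non-hex digit, or a value outside U+0000..U+10FFFF. */
static long
parse_unicode_escape (const char *digits, int count)
{
  long value = 0;

  for (int i = 0; i < count; i++)
    {
      unsigned char d = (unsigned char) digits[i];
      if (!isxdigit (d))
        return -1;
      value = value * 16 + (isdigit (d) ? d - '0' : tolower (d) - 'a' + 10);
      /* Checking as we go also keeps VALUE from overflowing. */
      if (value > 0x10FFFF)
        return -1;
    }
  return value;
}
```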
Index: src/mule-coding.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/mule-coding.c,v
retrieving revision 1.39
diff -u -u -r1.39 mule-coding.c
--- src/mule-coding.c 2006/11/23 13:43:19 1.39
+++ src/mule-coding.c 2007/08/04 19:44:18
@@ -104,7 +104,7 @@
if (XCHARSET_ENCODE_AS_UTF_8 (charset))
{
encode_unicode_char (charset, c & charmask, 0,
- dst, UNICODE_UTF_8, 0);
+ dst, UNICODE_UTF_8, 0, 0);
}
else
{
@@ -123,7 +123,7 @@
encode_unicode_char (charset,
ch & charmask,
c & charmask, dst,
- UNICODE_UTF_8, 0);
+ UNICODE_UTF_8, 0, 0);
}
else
{
@@ -969,6 +969,7 @@
/* Used for handling UTF-8. */
unsigned char counter;
+ unsigned char indicated_length;
};
static const struct memory_description ccs_description_1[] =
@@ -1804,6 +1805,39 @@
}
}
+/* Note that this name conflicts with a function in unicode.c. */
+static void
+decode_unicode_char (int ucs, unsigned_char_dynarr *dst)
+{
+ Ibyte work[MAX_ICHAR_LEN];
+ int len;
+ Lisp_Object chr;
+
+ chr = Funicode_to_char(make_int(ucs), Qnil);
+ assert (!NILP(chr));
+ len = set_itext_ichar (work, XCHAR(chr));
+ Dynarr_add_many (dst, work, len);
+}
+
+#define DECODE_ERROR_OCTET(octet, dst) \
+ decode_unicode_char ((octet) + UNICODE_ERROR_OCTET_RANGE_START, dst)
+
+static inline void
+indicate_invalid_utf_8 (unsigned char indicated_length,
+ unsigned char counter,
+ int ch, unsigned_char_dynarr *dst)
+{
+ Binbyte stored = indicated_length - counter;
+ Binbyte mask = "\x00\x00\xC0\xE0\xF0\xF8\xFC"[indicated_length];
+
+ while (stored > 0)
+ {
+ DECODE_ERROR_OCTET (((ch >> (6 * (stored - 1))) & 0x3f) | mask,
+ dst);
+ mask = 0x80, stored--;
+ }
+}
+
/* Convert ISO2022-format data to internal format. */
static Bytecount
@@ -1907,9 +1941,7 @@
else if (flags & ISO_STATE_UTF_8)
{
unsigned char counter = data->counter;
- Ibyte work[MAX_ICHAR_LEN];
- int len;
- Lisp_Object chr;
+ unsigned char indicated_length = data->indicated_length;
if (ISO_CODE_ESC == c)
{
@@ -1919,74 +1951,127 @@
data->esc_bytes_index = 1;
continue;
}
-
- switch (counter)
- {
- case 0:
- if (c >= 0xfc)
- {
- ch = c & 0x01;
- counter = 5;
- }
- else if (c >= 0xf8)
- {
- ch = c & 0x03;
- counter = 4;
- }
- else if (c >= 0xf0)
- {
- ch = c & 0x07;
- counter = 3;
- }
- else if (c >= 0xe0)
- {
- ch = c & 0x0f;
- counter = 2;
- }
- else if (c >= 0xc0)
- {
- ch = c & 0x1f;
- counter = 1;
- }
- else
- /* ASCII, or the lower control characters.
-
- Perhaps we should signal an error if the character is in
- the range 0x80-0xc0; this is illegal UTF-8. */
- Dynarr_add (dst, (c & 0x7f));
-
- break;
- case 1:
- ch = (ch << 6) | (c & 0x3f);
- chr = Funicode_to_char(make_int(ch), Qnil);
-
- if (!NILP (chr))
- {
- assert(CHARP(chr));
- len = set_itext_ichar (work, XCHAR(chr));
- Dynarr_add_many (dst, work, len);
- }
- else
- {
- /* Shouldn't happen, this code should only be enabled in
- XEmacsen with support for all of Unicode. */
- Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
- Dynarr_add (dst, 34 + 128);
- Dynarr_add (dst, 46 + 128);
- }
-
- ch = 0;
- counter = 0;
- break;
- default:
- ch = (ch << 6) | (c & 0x3f);
- counter--;
- }
- if (str->eof)
- DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+ if (0 == counter)
+ {
+ if (0 == (c & 0x80))
+ {
+ /* ASCII. */
+ decode_unicode_char (c, dst);
+ }
+ else if (0 == (c & 0x40))
+ {
+ /* Highest bit set, second highest not--there's
+ something wrong. */
+ DECODE_ERROR_OCTET (c, dst);
+ }
+ else if (0 == (c & 0x20))
+ {
+ ch = c & 0x1f;
+ counter = 1;
+ indicated_length = 2;
+ }
+ else if (0 == (c & 0x10))
+ {
+ ch = c & 0x0f;
+ counter = 2;
+ indicated_length = 3;
+ }
+ else if (0 == (c & 0x08))
+ {
+ ch = c & 0x0f;
+ counter = 3;
+ indicated_length = 4;
+ }
+ /* We support lengths longer than 4 here, since we want to
+ represent UTF-8 error chars as distinct from the
+ corresponding ISO 8859-1 characters in escape-quoted.
+
+ However, we can't differentiate UTF-8 error chars as
+ written to disk, and UTF-8 errors in escape-quoted. This
+ is not a big problem;
+ non-Unicode-chars-encoded-as-UTF-8-in-ISO-2022 is not
+ deployed, in practice, so if such a sequence of octets
+ occurs, XEmacs generated it. */
+ else if (0 == (c & 0x04))
+ {
+ ch = c & 0x03;
+ counter = 4;
+ indicated_length = 5;
+ }
+ else if (0 == (c & 0x02))
+ {
+ ch = c & 0x01;
+ counter = 5;
+ indicated_length = 6;
+ }
+ else
+ {
+ /* #xFF is not a valid leading byte in any form of
+ UTF-8. */
+ DECODE_ERROR_OCTET (c, dst);
+
+ }
+ }
+ else
+ {
+ /* counter != 0 */
+ if ((0 == (c & 0x80)) || (0 != (c & 0x40)))
+ {
+ indicate_invalid_utf_8(indicated_length,
+ counter,
+ ch, dst);
+ if (c & 0x80)
+ {
+ DECODE_ERROR_OCTET (c, dst);
+ }
+ else
+ {
+ /* The character just read is ASCII. Treat it as
+ such. */
+ decode_unicode_char (c, dst);
+ }
+ ch = 0;
+ counter = 0;
+ }
+ else
+ {
+ ch = (ch << 6) | (c & 0x3f);
+ counter--;
+
+ /* Just processed the final byte. Emit the character. */
+ if (!counter)
+ {
+ /* Don't accept over-long sequences, or surrogates. */
+ if ((ch < 0x80) ||
+ ((ch < 0x800) && indicated_length > 2) ||
+ ((ch < 0x10000) && indicated_length > 3) ||
+ /* We accept values above #x110000 in
+ escape-quoted, though not in UTF-8. */
+ /* (ch > 0x110000) || */
+ valid_utf_16_surrogate(ch))
+ {
+ indicate_invalid_utf_8(indicated_length,
+ counter,
+ ch, dst);
+ }
+ else
+ {
+ decode_unicode_char (ch, dst);
+ }
+ ch = 0;
+ }
+ }
+ }
+
+ if (str->eof && ch)
+ {
+ DECODE_ERROR_OCTET (ch, dst);
+ ch = 0;
+ }
data->counter = counter;
+ data->indicated_length = indicated_length;
}
else if (byte_c0_p (c) || byte_c1_p (c))
{ /* Control characters */
Index: src/unicode.c
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/src/unicode.c,v
retrieving revision 1.37
diff -u -u -r1.37 unicode.c
--- src/unicode.c 2007/05/13 11:11:30 1.37
+++ src/unicode.c 2007/08/04 19:44:19
@@ -146,13 +146,6 @@
(1) User-defined charsets: It would be inconvenient to require all
dumped user-defined charsets to be reloaded at init time.
- (2) Starting up in a non-ISO-8859-1 directory. If we load at run-time,
- we don't load the tables until after we've parsed the current
- directories, and we run into a real bootstrapping problem, if the
- directories themselves are non-ISO-8859-1. This is potentially fixable
- once we switch to using Unicode internally, so we don't have to do any
- conversion (other than the automatic kind, e.g. UTF-16 to UTF-8).
-
NB With run-time loading, we load in init-mule-at-startup, in
mule-cmds.el. This is called from startup.el, which is quite late in
the initialization process -- but data-directory isn't set until then.
@@ -192,7 +185,7 @@
convert them back.) */
Lisp_Object Qunicode;
-Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7;
+Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7, Qutf_32;
Lisp_Object Qneed_bom;
Lisp_Object Qutf_16_little_endian, Qutf_16_bom;
@@ -218,10 +211,6 @@
trail = 0xDC00 + (__ctu16s_code & 0x3FF); \
} while (0)
-#define valid_utf_16_first_surrogate(ch) (((ch) & 0xFC00) == 0xD800)
-#define valid_utf_16_last_surrogate(ch) (((ch) & 0xFC00) == 0xDC00)
-#define valid_utf_16_surrogate(ch) (((ch) & 0xF800) == 0xD800)
-
#ifdef MULE
/* Using ints for to_unicode is OK (as long as they are >= 32 bits).
@@ -1703,6 +1692,7 @@
{
/* decode */
unsigned char counter;
+ unsigned char indicated_length;
int seen_char;
/* encode */
Lisp_Object current_charset;
@@ -1716,11 +1706,6 @@
DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (unicode);
-/* Decode a UCS-2 or UCS-4 character into a buffer. If the lookup fails, use
- <GETA MARK> (U+3013) of JIS X 0208, which means correct character
- is not found, instead.
- #### do something more appropriate (use blob?)
- Danger, Will Robinson! Data loss. Should we signal user? */
static void
decode_unicode_char (int ch, unsigned_char_dynarr *dst,
struct unicode_coding_stream *data,
@@ -1755,9 +1740,32 @@
data->seen_char = 1;
}
+#define DECODE_ERROR_OCTET(octet, dst, data, ignore_bom) \
+ decode_unicode_char ((octet) + UNICODE_ERROR_OCTET_RANGE_START, \
+ dst, data, ignore_bom)
+
+static inline void
+indicate_invalid_utf_8 (unsigned char indicated_length,
+ unsigned char counter,
+ int ch, unsigned_char_dynarr *dst,
+ struct unicode_coding_stream *data,
+ unsigned int ignore_bom)
+{
+ Binbyte stored = indicated_length - counter;
+ Binbyte mask = "\x00\x00\xC0\xE0\xF0\xF8\xFC"[indicated_length];
+
+ while (stored > 0)
+ {
+ DECODE_ERROR_OCTET (((ch >> (6 * (stored - 1))) & 0x3f) | mask,
+ dst, data, ignore_bom);
+ mask = 0x80, stored--;
+ }
+}
+
static void
encode_unicode_char_1 (int code, unsigned_char_dynarr *dst,
- enum unicode_type type, unsigned int little_endian)
+ enum unicode_type type, unsigned int little_endian,
+ int write_error_characters_as_such)
{
switch (type)
{
@@ -1767,53 +1775,105 @@
if (code < 0x10000) {
Dynarr_add (dst, (unsigned char) (code & 255));
Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
- } else {
- /* Little endian; least significant byte first. */
- int first, second;
-
- CODE_TO_UTF_16_SURROGATES(code, first, second);
-
- Dynarr_add (dst, (unsigned char) (first & 255));
- Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
-
- Dynarr_add (dst, (unsigned char) (second & 255));
- Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
- }
+ } else if (write_error_characters_as_such &&
+ code >= UNICODE_ERROR_OCTET_RANGE_START &&
+ code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
+ {
+ Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
+ }
+ else if (code < 0x110000)
+ {
+ /* Little endian; least significant byte first. */
+ int first, second;
+
+ CODE_TO_UTF_16_SURROGATES(code, first, second);
+
+ Dynarr_add (dst, (unsigned char) (first & 255));
+ Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
+
+ Dynarr_add (dst, (unsigned char) (second & 255));
+ Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
+ }
+ else
+ {
+ /* Not valid Unicode. Pass U+FFFD, least significant byte
+ first. */
+ Dynarr_add (dst, (unsigned char) 0xFD);
+ Dynarr_add (dst, (unsigned char) 0xFF);
+ }
}
else
{
if (code < 0x10000) {
Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
Dynarr_add (dst, (unsigned char) (code & 255));
- } else {
- /* Big endian; most significant byte first. */
- int first, second;
-
- CODE_TO_UTF_16_SURROGATES(code, first, second);
-
- Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
- Dynarr_add (dst, (unsigned char) (first & 255));
-
- Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
- Dynarr_add (dst, (unsigned char) (second & 255));
- }
+ } else if (write_error_characters_as_such &&
+ code >= UNICODE_ERROR_OCTET_RANGE_START &&
+ code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
+ {
+ Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
+ }
+ else if (code < 0x110000)
+ {
+ /* Big endian; most significant byte first. */
+ int first, second;
+
+ CODE_TO_UTF_16_SURROGATES(code, first, second);
+
+ Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
+ Dynarr_add (dst, (unsigned char) (first & 255));
+
+ Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
+ Dynarr_add (dst, (unsigned char) (second & 255));
+ }
+ else
+ {
+ /* Not valid Unicode. Pass U+FFFD, most significant byte
+ first. */
+ Dynarr_add (dst, (unsigned char) 0xFF);
+ Dynarr_add (dst, (unsigned char) 0xFD);
+ }
}
break;
case UNICODE_UCS_4:
+ case UNICODE_UTF_32:
if (little_endian)
{
- Dynarr_add (dst, (unsigned char) (code & 255));
- Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
- Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
- Dynarr_add (dst, (unsigned char) (code >> 24));
+ if (write_error_characters_as_such &&
+ code >= UNICODE_ERROR_OCTET_RANGE_START &&
+ code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
+ {
+ Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
+ }
+ else
+ {
+ /* We generate and accept incorrect sequences here, which is
+ okay, in the interest of preservation of the user's
+ data. */
+ Dynarr_add (dst, (unsigned char) (code & 255));
+ Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
+ Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
+ Dynarr_add (dst, (unsigned char) (code >> 24));
+ }
}
else
{
- Dynarr_add (dst, (unsigned char) (code >> 24));
- Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
- Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
- Dynarr_add (dst, (unsigned char) (code & 255));
+ if (write_error_characters_as_such &&
+ code >= UNICODE_ERROR_OCTET_RANGE_START &&
+ code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
+ {
+ Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
+ }
+ else
+ {
+ /* We generate and accept incorrect sequences here, which is okay,
+ in the interest of preservation of the user's data. */
+ Dynarr_add (dst, (unsigned char) (code >> 24));
+ Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
+ Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
+ Dynarr_add (dst, (unsigned char) (code & 255));
+ }
}
break;
@@ -1842,11 +1902,25 @@
}
else if (code <= 0x3ffffff)
{
- Dynarr_add (dst, (unsigned char) ((code >> 24) | 0xf8));
- Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
- Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
- Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
- Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
+
+#if !(UNICODE_ERROR_OCTET_RANGE_START > 0x1fffff \
+ && UNICODE_ERROR_OCTET_RANGE_START < 0x3ffffff)
+#error "This code needs to be rewritten. "
+#endif
+ if (write_error_characters_as_such &&
+ code >= UNICODE_ERROR_OCTET_RANGE_START &&
+ code < (UNICODE_ERROR_OCTET_RANGE_START }
}
- if (str->eof)
- DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
+
+ if (str->eof && ch)
+ {
+ switch (type)
+ {
+ case UNICODE_UTF_8:
+ indicate_invalid_utf_8(indicated_length,
+ counter, ch, dst, data,
+ ignore_bom);
+ break;
+
+ case UNICODE_UTF_16:
+ case UNICODE_UCS_4:
+ case UNICODE_UTF_32:
+ if (8 == counter)
+ {
+ DECODE_ERROR_OCTET (ch, dst, data, ignore_bom);
+ }
+ else if (16 == counter)
+ {
+ if (little_endian)
+ {
+ DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
+ DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
+ ignore_bom);
+ }
+ else
+ {
+ DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
+ ignore_bom);
+ DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
+ }
+ }
+ else if (24 == counter)
+ {
+ if (little_endian)
+ {
+ DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
+ ignore_bom);
+ DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
+ DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
+ ignore_bom);
+ }
+ else
+ {
+ DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
+ ignore_bom);
+ DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
+ ignore_bom);
+ DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
+ ignore_bom);
+ }
+ }
+ else assert(0);
+ break;
+ }
+ ch = 0;
+ }
data->counter = counter;
+ data->indicated_length = indicated_length;
}
else
{
@@ -2054,7 +2275,7 @@
if (XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys) && !data->wrote_bom)
{
- encode_unicode_char_1 (0xFEFF, dst, type, little_endian);
+ encode_unicode_char_1 (0xFEFF, dst, type, little_endian, 1);
data->wrote_bom = 1;
}
@@ -2068,7 +2289,7 @@
{ /* Processing ASCII character */
ch = 0;
encode_unicode_char (Vcharset_ascii, c, 0, dst, type,
- little_endian);
+ little_endian, 1);
char_boundary = 1;
}
@@ -2092,20 +2313,20 @@
for the rationale behind subtracting #xa0 from the
character's code. */
encode_unicode_char (Vcharset_control_1, c - 0xa0, 0, dst,
- type, little_endian);
+ type, little_endian, 1);
else
{
switch (XCHARSET_REP_BYTES (charset))
{
case 2:
encode_unicode_char (charset, c, 0, dst, type,
- little_endian);
+ little_endian, 1);
break;
case 3:
if (XCHARSET_PRIVATE_P (charset))
{
encode_unicode_char (charset, c, 0, dst, type,
- little_endian);
+ little_endian, 1);
ch = 0;
}
else if (ch)
@@ -2119,7 +2340,7 @@
handle this yet. */
encode_unicode_char (Vcharset_ascii, '~', 0,
dst, type,
- little_endian);
+ little_endian, 1);
}
else
{
@@ -2138,7 +2359,7 @@
else
#endif /* ENABLE_COMPOSITE_CHARS */
encode_unicode_char (charset, ch, c, dst, type,
- little_endian);
+ little_endian, 1);
ch = 0;
}
else
@@ -2151,7 +2372,7 @@
if (ch)
{
encode_unicode_char (charset, ch, c, dst, type,
- little_endian);
+ little_endian, 1);
ch = 0;
}
else
@@ -2521,6 +2742,8 @@
type = UNICODE_UTF_7;
else if (EQ (value, Qucs_4))
type = UNICODE_UCS_4;
+ else if (EQ (value, Qutf_32))
+ type = UNICODE_UTF_32;
else
invalid_constant ("Invalid Unicode type", key);
@@ -2546,6 +2769,7 @@
case UNICODE_UTF_8: return Qutf_8;
case UNICODE_UTF_7: return Qutf_7;
case UNICODE_UCS_4: return Qucs_4;
+ case UNICODE_UTF_32: return Qutf_32;
default: ABORT ();
}
}
@@ -2620,6 +2844,7 @@
DEFSYMBOL (Qunicode);
DEFSYMBOL (Qucs_4);
DEFSYMBOL (Qutf_16);
+ DEFSYMBOL (Qutf_32);
DEFSYMBOL (Qutf_8);
DEFSYMBOL (Qutf_7);
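The new UTF-16 branches of encode_unicode_char_1 () follow a simple precedence: BMP characters first, then error octets (when write_error_characters_as_such is set), then surrogate pairs, and U+FFFD for anything else. The C sketch below restates the big-endian case as a standalone function; encode_utf16be () is a hypothetical name, and OUT is assumed to have room for four bytes.

```c
#include <assert.h>
#include <stddef.h>

#define UNICODE_ERROR_OCTET_RANGE_START 0x200000

/* Sketch of the big-endian UTF-16 policy: BMP -> 2 bytes, error octet ->
   1 raw byte (if requested), U+10000..U+10FFFF -> surrogate pair,
   anything else -> U+FFFD.  Returns the number of bytes written. */
static size_t
encode_utf16be (int code, unsigned char *out, int write_error_characters_as_such)
{
  if (code < 0x10000)
    {
      out[0] = (unsigned char) (code >> 8);
      out[1] = (unsigned char) (code & 0xFF);
      return 2;
    }
  if (write_error_characters_as_such
      && code >= UNICODE_ERROR_OCTET_RANGE_START
      && code < UNICODE_ERROR_OCTET_RANGE_START + 0x100)
    {
      out[0] = (unsigned char) (code & 0xFF);
      return 1;
    }
  if (code < 0x110000)
    {
      int c = code - 0x10000;
      int lead = 0xD800 + (c >> 10), trail = 0xDC00 + (c & 0x3FF);
      out[0] = (unsigned char) (lead >> 8);
      out[1] = (unsigned char) (lead & 0xFF);
      out[2] = (unsigned char) (trail >> 8);
      out[3] = (unsigned char) (trail & 0xFF);
      return 4;
    }
  /* Not valid Unicode: substitute U+FFFD REPLACEMENT CHARACTER. */
  out[0] = 0xFF;
  out[1] = 0xFD;
  return 2;
}
```

For example, encode_utf16be (0x1F600, out, 1) writes D8 3D DE 00, while 0x2000E9 becomes the single byte E9 when write_error_characters_as_such is set and FF FD when it is not.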
--
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)
Re: [Bug: 21.5-b28] w3 fails on font-spatial-to-canonical
17 years, 4 months
Aidan Kehoe
On the thirtieth day of July, Mike FABIAN wrote:
> Giacomo Boffi <giacomo.boffi(a)polimi.it> wrote:
>
> > w3 (w3-find-file) fails on a call of font-spatial-to-canonical, the
> > lisp stack follows
> >
> > please note that this happens only when xemacs runs in X, w3 on a
> > tty is ok, afaict
>
> I have already reported the same:
>
> http://thread.gmane.org/gmane.emacs.xemacs.beta/25191
>
> > Debugger entered--Lisp error: (wrong-type-argument numberp nil)
> > +(nil 12)
> > font-spatial-to-canonical("+12pt")
> > css-expand-length("+12pt" t)
> > css-expand-value(height "+12pt")
> > css-parse-args(238 257)
>
> [...]
>
> It happens only with the Xft build, i.e. if you omit '--with-xft'
> there is no such problem.
If you have the opportunity, please try the patch below and tell me whether
the problem goes away.
Stephen, does it really bring value to also accept XLFDs on the XFT build? I
get the feeling that using different X resources and separate face- and
font-handling functions for the two builds would be cleaner and more usable.
lisp/ChangeLog addition:
2007-08-03 Aidan Kehoe <kehoea(a)parhasard.net>
* font.el (x-font-create-object):
When handed an XFT font name string, parse it as such.
* font.el (font-xft-font-regexp):
Don't check for the existence of xft-font-regexp; accept escaped
dashes and colons in font family names.
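The new font-xft-font-regexp is easier to reason about once its pieces are named. The sketch below transliterates it into POSIX extended regular expression syntax (an assumption: Emacs regexp syntax differs, e.g. \( \| \) versus ( | )) and checks names with the standard regcomp () and regexec (); looks_like_xft_font_name () is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* POSIX ERE rendering of font-xft-font-regexp (a transliteration, not
   the Lisp original):
     ^(\\-|\\:|[^:-])*      optional foundry and family, allowing the
                            escaped dash and colon
     (-[0-9]*(\.[0-9]*)?)?  optional point size
     (:[^:]*)*$             optional properties  */
static const char xft_name_ere[] =
  "^(\\\\-|\\\\:|[^:-])*(-[0-9]*(\\.[0-9]*)?)?(:[^:]*)*$";

/* Hypothetical helper: 1 if NAME has the shape of a fontconfig font
   name, 0 if not, -1 if the pattern fails to compile. */
static int
looks_like_xft_font_name (const char *name)
{
  regex_t re;
  int matched;

  if (regcomp (&re, xft_name_ere, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  matched = (regexec (&re, name, 0, NULL, 0) == 0);
  regfree (&re);
  return matched;
}
```

With this pattern, Foo\-Bar-12 and "DejaVu Sans-12:weight=bold" match, whereas the unescaped Foo-Bar-12 and an XLFD such as -*-courier-medium-r-normal--12-*-*-*-*-*-iso8859-1 do not.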
XEmacs Trunk source patch:
Diff command: cvs -q diff -u
Files affected: lisp/font.el
Index: lisp/font.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/font.el,v
retrieving revision 1.20
diff -u -r1.20 font.el
--- lisp/font.el 2006/04/25 14:01:53 1.20
+++ lisp/font.el 2007/08/03 13:31:05
@@ -587,7 +587,13 @@
(let ((case-fold-search t))
(if (or (not (stringp fontname))
(not (string-match font-x-font-regexp fontname)))
- (make-font)
+ (if (and (stringp fontname)
+ (string-match font-xft-font-regexp fontname))
+ ;; Return an XFT font.
+ (xft-font-create-object fontname)
+ ;; It's unclear how to parse the font; return an unspecified
+ ;; one.
+ (make-font))
(let ((family nil)
(size nil)
(weight (match-string 1 fontname))
@@ -751,16 +757,15 @@
;;; #### FIXME actually, this section should be fc-*, right?
(defvar font-xft-font-regexp
- ;; #### FIXME what the fuck?!?
- (when (and (boundp 'xft-font-regexp) xft-font-regexp)
- (concat "\\`"
- "[^:-]*" ; optional foundry and family
- ; incorrect, escaping exists
- "\\(-[0-9]*\\(\\.[0-9]*\\)?\\)?" ; optional size (points)
- "\\(:[^:]*\\)*" ; optional properties
+ (concat "\\`"
+ #r"\(\\-\|\\:\|[^:-]\)*" ; optional foundry and family
+ ; (allows for escaped colons,
+ ; dashes.)
+ "\\(-[0-9]*\\(\\.[0-9]*\\)?\\)?" ; optional size (points)
+ "\\(:[^:]*\\)*" ; optional properties
; not necessarily key=value!!
"\\'"
- )))
+ ))
(defvar font-xft-family-mappings
;; #### FIXME this shouldn't be needed or used for Xft
Re: [Q] Prevent infinite recursion with undecided coding system
17 years, 4 months
Aidan Kehoe
On the first day of August, Stephen J. Turnbull wrote:
> Aidan Kehoe writes:
>
> > It looks like Qbinary would be more correct in both places, but then
> > handling code will be more ready to deal with Qraw_text, since it can
> > return that already.
>
> I don't understand this comment. This function can already return
> Qbinary, no?
Right, if the coding cookie specifies that, for example. But
detected_coding_system returns Qraw_text on no match, as Michael says, and
in the context of ‘no match’ that will be a little more expected. My comment
was unclear to the extent that it was wrong.
[C] Typo fix in setting for `emacs-roots'
17 years, 4 months
Michael Sperber
This has been sitting in my workspace forever:
2007-08-02 Mike Sperber <mike(a)xemacs.org>
* startup.el (startup-setup-paths): Fix typo in init expression
for `emacs-roots'.
Index: lisp/startup.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/startup.el,v
retrieving revision 1.58
diff -u -r1.58 startup.el
--- lisp/startup.el 25 May 2007 15:47:56 -0000 1.58
+++ lisp/startup.el 2 Aug 2007 06:35:43 -0000
@@ -1452,7 +1452,7 @@
t)))
(setq emacs-roots (paths-find-emacs-roots invocation-directory invocation-name
- #'paths-emacs-data-root-p))
+ #'paths-emacs-root-p))
(setq emacs-data-roots (paths-find-emacs-roots invocation-directory invocation-name
#'paths-emacs-data-root-p))
[COMMIT] Add some more locale information, Windows-1253, fix a make-8-bit-coding-system bug
17 years, 4 months
Aidan Kehoe
APPROVE COMMIT
NOTE: This patch has been committed.
2007-08-01 Aidan Kehoe <kehoea(a)parhasard.net>
* mule/cyrillic.el:
* mule/cyrillic.el ("Russian"):
* mule/cyrillic.el ("Ukrainian"):
* mule/cyrillic.el ("Bulgarian"):
* mule/cyrillic.el ("Belarusian"):
Add POSIX locale information for all four languages. Remove
information about specific coding systems in the docstrings, since
this information is inaccurate if a variant language environment
is being used.
* mule/greek.el:
* mule/latin.el:
Add POSIX locale information, provide Windows-1253 as well.
* mule/mule-coding.el (make-8-bit-generate-helper):
Fix a bug that was biting me with windows-1251. I need to include
tests in tests/automated/mule-tests.el that check that all the
coding-systems created with make-8-bit-coding-system are
reversible, since all of them should be.
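The entry above calls for tests that every coding system created with make-8-bit-coding-system is reversible. Here is a minimal C sketch of that property. It assumes the table is a list of upper-half octet to code point pairs over an ASCII-compatible lower half, and it checks that decoding each octet and re-encoding the resulting code point gives back the original octet. The struct, the function names and the five-entry sample are hypothetical; the sample values are copied from the Windows-1253 table in the greek.el hunk of this patch. The real tests would belong in tests/automated/mule-tests.el and would be written in Emacs Lisp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One upper-half entry of a hypothetical 8-bit table: octet -> code point. */
struct octet_map { uint8_t octet; uint32_t ucs; };

/* Five Windows-1253 entries copied from the greek.el hunk of this patch. */
static const struct octet_map sample_1253[] = {
  {0x80, 0x20AC}, {0xA2, 0x0386}, {0xC1, 0x0391},
  {0xF2, 0x03C2}, {0xFE, 0x03CE},
};

/* Encode: return the first mapped octet that decodes to UCS, or -1. */
static int encode_ucs(const uint32_t decode[256], const int mapped[256],
                      uint32_t ucs)
{
  for (int o = 0; o < 256; o++)
    if (mapped[o] && decode[o] == ucs)
      return o;
  return -1;
}

/* Return 1 if every mapped octet survives decode followed by encode. */
static int table_is_reversible(const struct octet_map *map, size_t n)
{
  uint32_t decode[256] = {0};
  int mapped[256] = {0};

  for (int o = 0; o < 0x80; o++) {      /* ASCII-compatible lower half */
    decode[o] = (uint32_t) o;
    mapped[o] = 1;
  }
  for (size_t i = 0; i < n; i++) {      /* table entries, upper half */
    decode[map[i].octet] = map[i].ucs;
    mapped[map[i].octet] = 1;
  }
  for (int o = 0; o < 256; o++)
    if (mapped[o] && encode_ucs(decode, mapped, decode[o]) != o)
      return 0;                         /* two octets share a code point */
  return 1;
}
```

A table that maps an upper-half octet onto an ASCII character, or two octets onto the same code point, fails the check, because re-encoding picks a different octet.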
XEmacs Trunk source patch:
Diff command: cvs -q diff -u
Files affected: lisp/mule/mule-coding.el lisp/mule/latin.el lisp/mule/greek.el lisp/mule/cyrillic.el
Index: lisp/mule/cyrillic.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/cyrillic.el,v
retrieving revision 1.15
diff -u -r1.15 cyrillic.el
--- lisp/mule/cyrillic.el 2007/07/22 22:03:48 1.15
+++ lisp/mule/cyrillic.el 2007/08/01 13:40:14
@@ -115,6 +115,8 @@
charset-g3 t
mnemonic "ISO8/Cyr"))
+;; Provide this locale, but don't allow it to be picked up from the Unix
+;; locale (it has no locale entry in the alist); we leave that to Russian.
(set-language-info-alist
"Cyrillic-ISO" '((charset cyrillic-iso8859-5)
(tutorial . "TUTORIAL.ru")
@@ -271,23 +273,26 @@
;; Create a corresponding language environment.
(set-language-info-alist
- "Cyrillic-KOI8" '((charset cyrillic-iso8859-5)
- (coding-system koi8-r)
- (native-coding-system koi8-r)
- (coding-priority koi8-r)
- (input-method . "cyrillic-yawerty")
- (features cyril-util)
- (locale "ru")
- (mswindows-locale . "RUSSIAN")
- (tutorial . "TUTORIAL.ru")
- (sample-text . "Russian (,L@caaZXY(B) ,L7T`PRabRcYbU(B!")
- (documentation . "Support for Cyrillic KOI8-R."))
+ "Russian" '((charset cyrillic-iso8859-5)
+ (coding-system koi8-r)
+ (native-coding-system koi8-r)
+ (coding-priority koi8-r)
+ (input-method . "cyrillic-yawerty")
+ (features cyril-util)
+ (locale "ru")
+ (mswindows-locale . "RUSSIAN")
+ (tutorial . "TUTORIAL.ru")
+ (sample-text . "Russian (,L@caaZXY(B) ,L7T`PRabRcYbU(B!")
+ (documentation . "Support for Russian."))
'("Cyrillic"))
-;; Alias it to Russian.
+;; Provide Cyrillic-KOI8 for old times' sake too, but don't allow it to be
+;; selected by the Unix locale. A variant language environment called
+;; "Cyrillic-KOI8 (UTF-8)" just looks too odd.
+
(set-language-info-alist
- "Russian"
- (cdr (assoc "Cyrillic-KOI8" language-info-alist))
+ "Cyrillic-KOI8"
+ (remassq 'locale (copy-list (cdr (assoc "Russian" language-info-alist))))
'("Cyrillic"))
;; KOI8-U, for Ukrainian.
@@ -444,13 +449,15 @@
(set-language-info-alist
"Ukrainian" '((coding-system koi8-u)
(coding-priority koi8-u)
+ (locale "uk")
(input-method . "cyrillic-ukrainian")
(documentation
- . "Support for Ukrainian with KOI8-U character set."))
+ . "Support for Ukrainian."))
'("Cyrillic"))
-;; Windows 1251 may be provide automatically on Windows, in which case
-;; we don't need to.
+;; Windows 1251 may be provided automatically on Windows, in which case we
+;; don't need to provide it.
+;; #### (Though we should provide the CP1251 alias.)
(unless (find-coding-system 'windows-1251)
(make-8-bit-coding-system
'windows-1251
@@ -594,18 +601,20 @@
"Bulgarian" '((coding-system windows-1251)
(coding-priority windows-1251)
(input-method . "bulgarian-bds")
+ (locale "bg")
(documentation
- . "Support for Bulgarian with windows-1251 character set.")
+ . "Support for Bulgarian. ")
(tutorial . "TUTORIAL.bg"))
'("Cyrillic"))
(set-language-info-alist
"Belarusian" '((coding-system windows-1251)
(coding-priority windows-1251)
+ (locale "be")
(input-method . "belarusian")
(documentation
- . "Support for Belarusian with windows-1251 character set.
-\(The name Belarusian replaced Byelorussian in the early 1990s.)"))
+ . "Support for Belarusian. \(The name Belarusian replaced \
+Byelorussian in the early 1990s.)"))
'("Cyrillic"))
;;; Alternativnyj
@@ -890,17 +899,6 @@
'(mnemonic ",L@C(B"
aliases (cp21866)))
-(set-language-info-alist
- "Cyrillic-KOI8RU" '((charset cyrillic-iso8859-5)
- (coding-system koi8-ru)
- (native-coding-system koi8-ru)
- (coding-priority koi8-ru)
- (input-method . "cyrillic-yawerty")
- (tutorial . "TUTORIAL.ru")
- (sample-text . "Russian (,L@caaZXY(B) ,L7T`PRabRcYbU(B!")
- (documentation . "Support for Cyrillic ALTERNATIVNYJ."))
- '("Cyrillic"))
-
;; We should provide an input method and the corresponding language
;; environments for the next three coding systems.
@@ -1339,4 +1337,4 @@
(provide 'cyrillic)
-;;; cyrillic.el ends here
\ No newline at end of file
+;;; cyrillic.el ends here
Index: lisp/mule/greek.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/greek.el,v
retrieving revision 1.8
diff -u -r1.8 greek.el
--- lisp/mule/greek.el 2006/12/30 17:04:32 1.8
+++ lisp/mule/greek.el 2007/08/01 13:40:14
@@ -126,14 +126,141 @@
charset-g3 t
mnemonic "Grk"))
+;; Windows 1253 may be provided automatically on Windows, in which case
+;; we don't need to provide it.
+(unless (find-coding-system 'windows-1253)
+ (make-8-bit-coding-system
+ 'windows-1253
+ '((#x80 ?\u20AC) ;; EURO SIGN
+ (#x82 ?\u201A) ;; SINGLE LOW-9 QUOTATION MARK
+ (#x83 ?\u0192) ;; LATIN SMALL LETTER F WITH HOOK
+ (#x84 ?\u201E) ;; DOUBLE LOW-9 QUOTATION MARK
+ (#x85 ?\u2026) ;; HORIZONTAL ELLIPSIS
+ (#x86 ?\u2020) ;; DAGGER
+ (#x87 ?\u2021) ;; DOUBLE DAGGER
+ (#x89 ?\u2030) ;; PER MILLE SIGN
+ (#x8B ?\u2039) ;; SINGLE LEFT-POINTING ANGLE QUOTATION MARK
+ (#x91 ?\u2018) ;; LEFT SINGLE QUOTATION MARK
+ (#x92 ?\u2019) ;; RIGHT SINGLE QUOTATION MARK
+ (#x93 ?\u201C) ;; LEFT DOUBLE QUOTATION MARK
+ (#x94 ?\u201D) ;; RIGHT DOUBLE QUOTATION MARK
+ (#x95 ?\u2022) ;; BULLET
+ (#x96 ?\u2013) ;; EN DASH
+ (#x97 ?\u2014) ;; EM DASH
+ (#x99 ?\u2122) ;; TRADE MARK SIGN
+ (#x9B ?\u203A) ;; SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
+ (#xA0 ?\u00A0) ;; NO-BREAK SPACE
+ (#xA1 ?\u0385) ;; GREEK DIALYTIKA TONOS
+ (#xA2 ?\u0386) ;; GREEK CAPITAL LETTER ALPHA WITH TONOS
+ (#xA3 ?\u00A3) ;; POUND SIGN
+ (#xA4 ?\u00A4) ;; CURRENCY SIGN
+ (#xA5 ?\u00A5) ;; YEN SIGN
+ (#xA6 ?\u00A6) ;; BROKEN BAR
+ (#xA7 ?\u00A7) ;; SECTION SIGN
+ (#xA8 ?\u00A8) ;; DIAERESIS
+ (#xA9 ?\u00A9) ;; COPYRIGHT SIGN
+ (#xAB ?\u00AB) ;; LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
+ (#xAC ?\u00AC) ;; NOT SIGN
+ (#xAD ?\u00AD) ;; SOFT HYPHEN
+ (#xAE ?\u00AE) ;; REGISTERED SIGN
+ (#xAF ?\u2015) ;; HORIZONTAL BAR
+ (#xB0 ?\u00B0) ;; DEGREE SIGN
+ (#xB1 ?\u00B1) ;; PLUS-MINUS SIGN
+ (#xB2 ?\u00B2) ;; SUPERSCRIPT TWO
+ (#xB3 ?\u00B3) ;; SUPERSCRIPT THREE
+ (#xB4 ?\u0384) ;; GREEK TONOS
+ (#xB5 ?\u00B5) ;; MICRO SIGN
+ (#xB6 ?\u00B6) ;; PILCROW SIGN
+ (#xB7 ?\u00B7) ;; MIDDLE DOT
+ (#xB8 ?\u0388) ;; GREEK CAPITAL LETTER EPSILON WITH TONOS
+ (#xB9 ?\u0389) ;; GREEK CAPITAL LETTER ETA WITH TONOS
+ (#xBA ?\u038A) ;; GREEK CAPITAL LETTER IOTA WITH TONOS
+ (#xBB ?\u00BB) ;; RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
+ (#xBC ?\u038C) ;; GREEK CAPITAL LETTER OMICRON WITH TONOS
+ (#xBD ?\u00BD) ;; VULGAR FRACTION ONE HALF
+ (#xBE ?\u038E) ;; GREEK CAPITAL LETTER UPSILON WITH TONOS
+ (#xBF ?\u038F) ;; GREEK CAPITAL LETTER OMEGA WITH TONOS
+ (#xC0 ?\u0390) ;; GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+ (#xC1 ?\u0391) ;; GREEK CAPITAL LETTER ALPHA
+ (#xC2 ?\u0392) ;; GREEK CAPITAL LETTER BETA
+ (#xC3 ?\u0393) ;; GREEK CAPITAL LETTER GAMMA
+ (#xC4 ?\u0394) ;; GREEK CAPITAL LETTER DELTA
+ (#xC5 ?\u0395) ;; GREEK CAPITAL LETTER EPSILON
+ (#xC6 ?\u0396) ;; GREEK CAPITAL LETTER ZETA
+ (#xC7 ?\u0397) ;; GREEK CAPITAL LETTER ETA
+ (#xC8 ?\u0398) ;; GREEK CAPITAL LETTER THETA
+ (#xC9 ?\u0399) ;; GREEK CAPITAL LETTER IOTA
+ (#xCA ?\u039A) ;; GREEK CAPITAL LETTER KAPPA
+ (#xCB ?\u039B) ;; GREEK CAPITAL LETTER LAMDA
+ (#xCC ?\u039C) ;; GREEK CAPITAL LETTER MU
+ (#xCD ?\u039D) ;; GREEK CAPITAL LETTER NU
+ (#xCE ?\u039E) ;; GREEK CAPITAL LETTER XI
+ (#xCF ?\u039F) ;; GREEK CAPITAL LETTER OMICRON
+ (#xD0 ?\u03A0) ;; GREEK CAPITAL LETTER PI
+ (#xD1 ?\u03A1) ;; GREEK CAPITAL LETTER RHO
+ (#xD3 ?\u03A3) ;; GREEK CAPITAL LETTER SIGMA
+ (#xD4 ?\u03A4) ;; GREEK CAPITAL LETTER TAU
+ (#xD5 ?\u03A5) ;; GREEK CAPITAL LETTER UPSILON
+ (#xD6 ?\u03A6) ;; GREEK CAPITAL LETTER PHI
+ (#xD7 ?\u03A7) ;; GREEK CAPITAL LETTER CHI
+ (#xD8 ?\u03A8) ;; GREEK CAPITAL LETTER PSI
+ (#xD9 ?\u03A9) ;; GREEK CAPITAL LETTER OMEGA
+ (#xDA ?\u03AA) ;; GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
+ (#xDB ?\u03AB) ;; GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
+ (#xDC ?\u03AC) ;; GREEK SMALL LETTER ALPHA WITH TONOS
+ (#xDD ?\u03AD) ;; GREEK SMALL LETTER EPSILON WITH TONOS
+ (#xDE ?\u03AE) ;; GREEK SMALL LETTER ETA WITH TONOS
+ (#xDF ?\u03AF) ;; GREEK SMALL LETTER IOTA WITH TONOS
+ (#xE0 ?\u03B0) ;; GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+ (#xE1 ?\u03B1) ;; GREEK SMALL LETTER ALPHA
+ (#xE2 ?\u03B2) ;; GREEK SMALL LETTER BETA
+ (#xE3 ?\u03B3) ;; GREEK SMALL LETTER GAMMA
+ (#xE4 ?\u03B4) ;; GREEK SMALL LETTER DELTA
+ (#xE5 ?\u03B5) ;; GREEK SMALL LETTER EPSILON
+ (#xE6 ?\u03B6) ;; GREEK SMALL LETTER ZETA
+ (#xE7 ?\u03B7) ;; GREEK SMALL LETTER ETA
+ (#xE8 ?\u03B8) ;; GREEK SMALL LETTER THETA
+ (#xE9 ?\u03B9) ;; GREEK SMALL LETTER IOTA
+ (#xEA ?\u03BA) ;; GREEK SMALL LETTER KAPPA
+ (#xEB ?\u03BB) ;; GREEK SMALL LETTER LAMDA
+ (#xEC ?\u03BC) ;; GREEK SMALL LETTER MU
+ (#xED ?\u03BD) ;; GREEK SMALL LETTER NU
+ (#xEE ?\u03BE) ;; GREEK SMALL LETTER XI
+ (#xEF ?\u03BF) ;; GREEK SMALL LETTER OMICRON
+ (#xF0 ?\u03C0) ;; GREEK SMALL LETTER PI
+ (#xF1 ?\u03C1) ;; GREEK SMALL LETTER RHO
+ (#xF2 ?\u03C2) ;; GREEK SMALL LETTER FINAL SIGMA
+ (#xF3 ?\u03C3) ;; GREEK SMALL LETTER SIGMA
+ (#xF4 ?\u03C4) ;; GREEK SMALL LETTER TAU
+ (#xF5 ?\u03C5) ;; GREEK SMALL LETTER UPSILON
+ (#xF6 ?\u03C6) ;; GREEK SMALL LETTER PHI
+ (#xF7 ?\u03C7) ;; GREEK SMALL LETTER CHI
+ (#xF8 ?\u03C8) ;; GREEK SMALL LETTER PSI
+ (#xF9 ?\u03C9) ;; GREEK SMALL LETTER OMEGA
+ (#xFA ?\u03CA) ;; GREEK SMALL LETTER IOTA WITH DIALYTIKA
+ (#xFB ?\u03CB) ;; GREEK SMALL LETTER UPSILON WITH DIALYTIKA
+ (#xFC ?\u03CC) ;; GREEK SMALL LETTER OMICRON WITH TONOS
+ (#xFD ?\u03CD) ;; GREEK SMALL LETTER UPSILON WITH TONOS
+ (#xFE ?\u03CE)) ;; GREEK SMALL LETTER OMEGA WITH TONOS
+ "Microsoft's Code Page 1253, for monotonic Greek. "
+ '(mnemonic "GrkW"
+ documentation
+ "This ASCII-compatible encoding is slightly incompatible with
+ISO-8859-7; it provides several widely-used punctuation marks in the C1
+ISO-2022 area, which makes it incompatible with the latter standard, but
+that latter standard is not used in Greece."
+ aliases (cp1253))))
+
(set-language-info-alist
"Greek" '((charset greek-iso8859-7)
(coding-system iso-8859-7)
(coding-priority iso-8859-7)
(native-coding-system iso-8859-7)
- (locale "el_GR.iso88597" "el_GR.greek8" "el_GR" "greek" "el")
+ (locale "el")
(input-method . "greek")
(sample-text . "Greek (,FGkk]mija(B) ,FCei\(B ,Fsar(B")
(documentation . t)))
+
+;; Greek (WINDOWS-1253) will be generated automatically under Unix.
;;; greek.el ends here
Index: lisp/mule/latin.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/latin.el,v
retrieving revision 1.7
diff -u -r1.7 latin.el
--- lisp/mule/latin.el 2007/07/22 22:03:49 1.7
+++ lisp/mule/latin.el 2007/08/01 13:40:14
@@ -645,7 +645,7 @@
Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.")
(("Danish" "da")
("Dutch" "nl" "TUTORIAL.nl")
- ("Faeroese")
+ ("Faeroese" "fo")
("Finnish" "fi")
("French" "fr" "TUTORIAL.fr" "Bonjour, ,Ag(Ba va?")
("German" "de" "TUTORIAL.de" "\
@@ -666,7 +666,7 @@
" Albanian, Czech, English, German, Hungarian, Polish, Romanian,
Serbian, Croatian, Slovak, Slovene, Sorbian (upper and lower),
and Swedish.") ;; " added because fontification got screwed up, CVS-20061203.
- (("Albanian" nil)
+ (("Albanian" "sq")
("Croatian" ("hrvatski" "hr") "TUTORIAL.hr")
("Czech" ("cs" "cz") "TUTORIAL.cs" "P,Bx(Bejeme v,Ba(Bm hezk,B}(B den!"
"latin-2-postfix")
@@ -685,15 +685,15 @@
German, Italian, Maltese, Spanish, and Turkish.")
(("Afrikaans" "af")
("Catalan" ("catalan" "ca"))
- ("Esperanto")
- ("Galician")
- ("Maltese")))
+ ("Esperanto" "eo")
+ ("Galician" "gl")
+ ("Maltese" "mt")))
((latin-iso8859-4 iso-8859-4 "latin-4-prefix" "Latin-4" "ISO-8859-4"
" Danish, English, Estonian, Finnish, German, Greenlandic, Lappish,
Latvian, Lithuanian, and Norwegian.")
(("Estonian" "et")
- ("Greenlandic")
- ("Lappish")
+ ("Greenlandic" "kl")
+ ("Lappish" "se")
("Latvian" "lv")
("Lithuanian" "li")))
((latin-iso8859-9 iso-8859-9 "latin-5-prefix" "Latin-5" "ISO-8859-9")
Index: lisp/mule/mule-coding.el
===================================================================
RCS file: /pack/xemacscvs/XEmacs/xemacs/lisp/mule/mule-coding.el,v
retrieving revision 1.22
diff -u -r1.22 mule-coding.el
--- lisp/mule/mule-coding.el 2007/07/28 09:32:32 1.22
+++ lisp/mule/mule-coding.el 2007/08/01 13:40:14
@@ -315,7 +315,7 @@
(when worth-trying
(setq other-charset-vector (make-vector 256 encode-failure-octet))
(loop for i from charset-lower to charset-upper
- do (aset other-charset-vector (+ #x80 i)
+ do (aset other-charset-vector i
(gethash (encode-char (make-char worth-trying i)
'ucs) encode-table)))
(setq encode-program