Re: [COMMIT] Generally make the language environments and coding systems a little more sane.

Tuesday, 28 August 2007

 On the twenty-fifth day of August, Aidan Kehoe wrote: 

...
 [...]

 +  ;; Make them available to user code.
 +  (defvar unicode-error-sequence-zero
 +    (aref (decode-coding-string "\xd8\x00\x01\x00" 'utf-16-be) 3)
 +    "The XEmacs character representing an invalid zero octet in Unicode.
 +
 +Subtract this character from each XEmacs character in an invalid sequence to
 +get the octet on disk. E.g.
 +
 +\(- (aref (decode-coding-string ?\\x80 'utf-8) 0)
 +   unicode-error-characters-zero)
 +=> ?\\x80
 +
 +You can search for invalid sequences using
 +`unicode-error-sequence-regexp-range', which see.  ")
 [...] 
Note to everyone: the subtraction described in that docstring doesn’t work,
since the integer values of the relevant characters are not numerically
contiguous. For example, for me on this build, 

  (decode-coding-string "\xD0" 'utf-8)

gives (U+2000D0 jit-ucs-charset-0 35 32), numerical value 1069472, while 

  (decode-coding-string "\xCF" 'utf-8)

gives (U+2000CF jit-ucs-charset-0 34 127), numerical value 1069439, so the
two characters are 33 numeric values apart instead of one. 
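
The gap of 33 is consistent with one reading of the position octets shown
above: if each position octet occupies a 7-bit field but only the 96 values
32..127 are used, then stepping from position (34 127) to (35 32) skips the
32 unused values. That layout is an assumption made for illustration, not
something stated in the patch; the sketch below only checks the arithmetic.

```elisp
;; Assumption: each position octet sits in its own 7-bit field, and only
;; the values 32..127 are used within a field.
(let ((cf (+ (* 34 128) 127))    ; position (34 127), i.e. U+2000CF
      (d0 (+ (* 35 128) 32)))    ; position (35 32),  i.e. U+2000D0
  (list (- d0 cf)                ; gap implied by the assumed layout
        (- 1069472 1069439)))    ; gap between the values reported above
;; => (33 33)
```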

The approach I intend to take to fix this is to create a char table mapping
from the characters that represent error octets to characters with the
on-disk values, and to advise using this char table with translate-region to
get the octets. 
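
A rough, untested sketch of what such a table might look like follows. The
names my-unicode-error-octet-table and my-unicode-error-octets-in-region are
hypothetical and not part of the patch. It also assumes that the trick used
in the quoted defvar (taking the fourth character of an invalid UTF-16BE
sequence) yields the error character for any final octet, not just zero.

```elisp
;; Hypothetical sketch, untested; the names are illustrative only.
(defvar my-unicode-error-octet-table
  (let ((table (make-char-table 'generic))
        (octet 0))
    (while (< octet 256)
      ;; Assumption: as in the quoted defvar, the fourth character of
      ;; this invalid UTF-16BE sequence represents the final octet.
      (put-char-table
       (aref (decode-coding-string (format "\xd8\x00\x01%c" octet)
                                   'utf-16-be)
             3)
       (int-char octet)          ; the character with the on-disk value
       table)
      (setq octet (1+ octet)))
    table)
  "Char table mapping error characters to characters with the on-disk values.")

(defun my-unicode-error-octets-in-region (start end)
  "Replace error characters between START and END with their octets."
  (translate-region start end my-unicode-error-octet-table))
```

Because the table is a lookup rather than a subtraction, it does not depend
on the error characters being numerically contiguous.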

-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)

_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches
