Re: #'query-coding-region and invalid Unicode sequences.

Sunday, 18 January 2009

        On 2009-01-16, David Kastrup <dak(a)gnu.org> wrote:

...
 There is a transparent way, however.  Note that utf-8 is an encoding
 scheme that can, even within 4-byte values, encode more than just legal
 utf-8.  Pick a 256-byte code page from there (either beyond the
 2^21-something threshold, or, saving one byte but being more obfuscated,
 in the Unicode pages reserved for utf-16 surrogates and thus left free).
 Now this is our XEmacs-internal code page.  _Any_ bytes that are not
 part of valid codes in a particular encoding (and this _includes_
 non-minimal code sequences in utf-8 and utf-16) are encoded using this
 XEmacs-internal encoding into "bad byte of value xxx" and are displayed
 as \xxx octal escapes byte by byte.  When writing out, such "byte" code
 points get encoded back into single bytes.

Neat! I think I might add that to my native-Unicode 21.4 (which is on
hold while I do some work, sigh).
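
For illustration only, here is a minimal sketch of the round trip David
describes, written in Python rather than XEmacs C or Lisp. It relies on
Python's built-in "surrogateescape" error handler, which happens to use the
second option above: each undecodable byte 0x80-0xFF is mapped to a code
point in the surrogate range U+DC80-U+DCFF and is turned back into the same
single byte on output. The helper names (decode_lossless, encode_lossless,
display) are made up for this example, and none of this is meant to describe
how XEmacs does or would implement it.

```python
def decode_lossless(raw: bytes) -> str:
    # Invalid bytes (including the bytes of non-minimal sequences) become
    # lone surrogates U+DC80..U+DCFF instead of raising an error.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # The "bad byte" code points are written back out as single bytes.
    return text.encode("utf-8", errors="surrogateescape")


def display(text: str) -> str:
    # Show each escaped byte as a \xxx octal escape, byte by byte.
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("\\%03o" % (cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


# 0xFF is never valid in utf-8; 0xC0 0x80 is a non-minimal encoding of NUL.
raw = b"ok \xff and \xc0\x80"
text = decode_lossless(raw)
shown = display(text)
round_trip = encode_lossless(text)
```

With this input, shown contains the escapes \377, \300 and \200 in place of
the three bad bytes, and round_trip is byte-for-byte identical to raw, which
is the transparency property the scheme is after.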

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta
