RE: proposed Eistring interface

Saturday, 22 April 2000

...
 From: Bill Tutt 

 > From: Stephen J. Turnbull [mailto:turnbull＠sk.tsukuba.ac.jp]
 > 
 > See my other message for further discussion.
 > 

I beg your pardon, I thought you meant one you'd written previously, not one
you wrote today.
Please ignore what I wrote about why you want more than one internal
representation.

...
 From: Stephen J. Turnbull 
...
 However, there are uses for the huge UCS-4 range.  Tomo and his
 buddies are already playing with the so-called "konjaku-mojikyo"
 pseudo-charsets, which have over 70,000 code points already assigned.
 These are quite popular with Japanese Windows users, too.  Anyway,
 they will eat up most of the UTF-16 private space if used in the
 obvious way.   
...
 Nor do we want to encourage "Japanese exceptionalists"
 to borrow not-yet-standardized parts of the UTF-16 space. 
So why not just do this in two phases? Use the UTF-16 private space until more
CJK glyphs get added into UTF-16 properly; you can then alter the encoding
transforms from Asian encodings to pick the new CJK glyphs.

The only annoying thing here is that ELisp programming that cares about
those kinds of characters would break down.

Another possible alternative (from the Unicode FAQ):
"For a particular implementation, if someone really, really wanted a
representation that encoded more characters in a series of 16-bit code units
then a series of private-use characters would work. For example, suppose you
use a representation that consisted of one BMP private-use character followed
by one private-use surrogate pair (e.g. three 16-bit units). With such a
representation, you can encode 6400 x 131,072 ( = 838,860,800) private use
code points."
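
To make the arithmetic in that FAQ passage concrete, here is a rough Python
sketch of one way such a three-unit scheme could be laid out. The function
names (encode_private, decode_private) and the choice to treat the private
code point as a plain integer index are my own assumptions for illustration;
the FAQ does not prescribe any particular mapping. Note also that the 131,072
figure counts every code point in planes 15 and 16, including the last two of
each plane, which are noncharacters; the sketch does not exclude them.

```python
# Hypothetical layout: one BMP private-use unit + one private-use surrogate pair.
BMP_PUA_START = 0xE000     # U+E000..U+F8FF: 6400 BMP private-use code points
BMP_PUA_COUNT = 6400
SUPP_PUA_START = 0xF0000   # planes 15 and 16 (U+F0000..U+10FFFF)
SUPP_PUA_COUNT = 0x20000   # 131,072 code points, reached via surrogate pairs
TOTAL = BMP_PUA_COUNT * SUPP_PUA_COUNT


def encode_private(index):
    """Map an integer index to three 16-bit code units (assumed scheme)."""
    if not 0 <= index < TOTAL:
        raise ValueError("index out of range for this scheme")
    lead, rest = divmod(index, SUPP_PUA_COUNT)
    # Standard UTF-16 surrogate computation for the supplementary code point.
    v = (SUPP_PUA_START + rest) - 0x10000
    high = 0xD800 + (v >> 10)      # lands in U+DB80..U+DBFF
    low = 0xDC00 + (v & 0x3FF)     # lands in U+DC00..U+DFFF
    return (BMP_PUA_START + lead, high, low)


def decode_private(units):
    """Inverse of encode_private; rejects units outside the private ranges."""
    first, high, low = units
    if not (0xE000 <= first <= 0xF8FF
            and 0xDB80 <= high <= 0xDBFF
            and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a private-use triple")
    cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return (first - BMP_PUA_START) * SUPP_PUA_COUNT + (cp - SUPP_PUA_START)
```

For example, decode_private(encode_private(70000)) gives back 70000, so a
70,000-entry pseudo-charset would fit many times over. The cost is that every
such character takes three 16-bit units, and any code that assumes one unit
(or one surrogate pair) per character would need to know about the scheme.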

...
 Also, although _we_ should not support "language-tagged character"
 encodings (pace, Olivier) by default, we should permit third party
 libraries and extension modules to do so.  This could be easily done
 using UCS-4 private space. 
This doesn't sound like the Unicode technical report I mentioned in my other
email.
Could you explain what you mean here?

Thanks,
Bill
