surrogates can only encode 1,000,000 chars.  ucs-4 encodes 4,000,000,000 chars.
is there another extension mechanism to handle the rest?
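
For reference, the arithmetic behind the 1,000,000 figure can be sketched in Python. This is only an illustration: the names to_surrogates, from_surrogates and PAIR_CAPACITY are made up here and do not come from XEmacs or any library. It shows that surrogate pairs cover exactly 1024 * 1024 = 1,048,576 code points (U+10000 through U+10FFFF), so anything above U+10FFFF would need some other mechanism.

```python
# Illustrative sketch; all names here are hypothetical, not an existing API.
HIGH_BASE = 0xD800   # first high (leading) surrogate
LOW_BASE = 0xDC00    # first low (trailing) surrogate

# 1024 high surrogates x 1024 low surrogates
PAIR_CAPACITY = 1024 * 1024


def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-pair range")
    v = cp - 0x10000                      # 20-bit offset
    return HIGH_BASE + (v >> 10), LOW_BASE + (v & 0x3FF)


def from_surrogates(hi, lo):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((hi - HIGH_BASE) << 10) + (lo - LOW_BASE)


if __name__ == "__main__":
    hi, lo = to_surrogates(0xE0020)
    print(hex(hi), hex(lo), PAIR_CAPACITY)
```

For U+000E0020, the example character in the quoted message below, to_surrogates(0xE0020) gives the pair 0xDB40, 0xDC20, each of which is a valid 16-bit unit on its own.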
Bill Tutt wrote:
 > From: Ben Wing [mailto:ben@666.com]
 > "Stephen J. Turnbull" wrote:
 >
 > > (3) We may want to be a little bit careful with the notion of the
 > >     default internal representation.  I can see that a default
 > >     internal representation of UCS-2 (UTF-16, I presume is what you
 > >     really mean?) would be attractive.  So what happens if you have
 > >     data that is not representable in the default internal
 > >     representation?  Do we just tell those users to get lost?
 > >
 > >     It would be kind of weird if the default internal representation
 > >     that Eistrings dealt with was UCS-2 but UTF-8 representation was
 > >     available in buffers, which you don't rule out.
 >
 > by its nature, the default int. rep. must be able to represent all
 > chars.  that would rule out utf16 if we have more than 1,000,000 and
 > some chars.  but it doesn't rule out ucs4, or some utf16 extension
 > that could encode gigs o' chars, etc.
 >
 To clarify, UTF-16 can represent all characters in UCS-4. UTF-16, just like
 UTF-8, breaks that annoying simplification that all characters are fixed
 width. As a happy coincidence, the only difference between UTF-16 and UCS-2
 is knowing where the character boundaries are. A UTF-16 encoding of a
 Unicode character (e.g. U+000E0020) is itself two valid UCS-2 characters.
 This is what the surrogate pair range in the Unicode code space is for.
 Making things completely Unicode aware isn't as easy as some people think;
 have a gander at some of the stuff on www.unicode.org if you haven't
 recently (esp. the technical reports).
 E.g., implementing a regular expression engine that supports a good chunk of
 Unicode's "features" is very non-trivial, especially if you don't want it
 to take forever.
 Bill
 Not a MS PR guy, etc... 
--
Ben
In order to save my hands, I am cutting back on my mail.  I also write
as succinctly as possible -- please don't be offended.  If you send me
mail, you _will_ get a response, but please be patient, especially for
XEmacs-related mail.  If you need an immediate response and it is not
apparent in your message, please say so.  Thanks for your understanding.
See also http://www.666.com/ben/typing.html.