>>>> "Bill" == Bill Tutt
<billtut(a)microsoft.com> writes:
Bill> "Proposal to restrict the range of code positions to the
Bill> values up to U-0010FFFF"
I was aware of this proposal. Although I didn't know it had been
formally moved and approved, I expected it would be.
OK, so we're free to treat all code points above U-0010FFFF as private
space, so long as we don't try to feed them to an application which
insists on complying strictly with Unicode/ISO-10646.
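
A minimal sketch of that rule in Python, for concreteness. The
function name and the three labels are hypothetical, chosen only for
illustration; the range boundaries are the Private Use ranges that
Unicode defines (the BMP area plus planes 15 and 16). It does not
distinguish unassigned code points or noncharacters.

```python
UNICODE_MAX = 0x10FFFF  # upper limit of the restricted code space

# Private Use ranges defined by Unicode: BMP area, plane 15, plane 16.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def classify_code_point(cp: int) -> str:
    """Label a code point as 'unicode', 'private-use', or 'out-of-range'."""
    if cp < 0:
        raise ValueError("code points are non-negative")
    if cp > UNICODE_MAX:
        # Above U-0010FFFF: usable internally, but never valid Unicode.
        return "out-of-range"
    for low, high in PRIVATE_USE_RANGES:
        if low <= cp <= high:
            return "private-use"
    return "unicode"
```

Anything labelled "out-of-range" or "private-use" here would need to
be mapped or rejected before being handed to a strictly conforming
application.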
But strict compliance is a moot point, since I'm sure Microsoft and
other companies will produce software that emits data containing code
points in the private space without announcing, let alone negotiating,
it, thus fragmenting the Unicode "standard". The Tower of Babel
rises again.
In a separate message:
Bill> (HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\EUDCCodeRange)
Whatever that means....
Bill> Is there any particular reason this method of encoding isn't
Bill> a good enough extension for unassigned UTF-16 safe ranges,
You mean "private space", or "reserved space"? Vendors shouldn't
use either in external data without some sort of negotiation protocol
IMO. See below.
Bill> or do you just not want to bother touching the code when we
Bill> discover the loads of alien races that are out there?
The aliens is _us_.
I am assuming (for safety) that this will be as badly designed and
implemented as the Microsoft MIME mailer, which regularly emits
(still, with users claiming to use Windows 2000) mail containing
arbitrary characters (some with bit 7 set) but labelled
"charset=ISO-2022-JP". (No, it's not Unicode, either. It's
Shift-JIS, and KOI8, and Windows-12xx code page stuff.)
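
The mislabelling is cheap to detect, because ISO-2022-JP is a 7-bit
encoding: a body containing any byte with bit 7 set cannot be what the
label claims. A rough sketch in Python follows; the function name is
hypothetical, and the check relies only on the standard "iso2022_jp"
codec. Passing the check does not prove the label is right, only that
it is not obviously wrong.

```python
def looks_like_iso2022jp(raw: bytes) -> bool:
    """Return False if raw cannot be ISO-2022-JP, True if it plausibly is."""
    # ISO-2022-JP never uses bytes with the high bit set.
    if any(byte >= 0x80 for byte in raw):
        return False
    # The escape sequences and double-byte runs must also decode cleanly.
    try:
        raw.decode("iso2022_jp")
    except UnicodeDecodeError:
        return False
    return True
```

For example, the same Japanese text passes when encoded as ISO-2022-JP
and fails when it is really Shift-JIS carrying the wrong label.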
Thus, I see no reason to suppose that that mapping won't simply be
assumed by software as diverse as Outlook Express and Word, and that
mail and documents produced by Microsoft software will often include
"private space" characters that are only understood by software that
complies with the Microsoft de facto standard. This is a misuse of
private space; it basically requires an assumption that the encoding
is unsafe if you are not communicating with a Microsoft system.
Unless those data formats contain an ISO-2022-like designation
syntax---which I doubt, since it defeats the purpose of Unicode.
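
A receiver (or a careful sender) can at least notice such characters
before trusting the data. The following Python sketch, with a
hypothetical function name, reports every character whose Unicode
general category is "Co" (private use), using the standard unicodedata
module. It cannot say what the characters were meant to be; that is
exactly the information a negotiation protocol would have to supply.

```python
import unicodedata


def find_private_use(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(text)
        if unicodedata.category(char) == "Co"  # Co = private use
    ]
```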
I'm naming Microsoft explicitly only because I've seen the spoor of
Microsoft on mislabelled MIME mail and other documents many times.
Microsoft software produces by far the majority of the non-compliant
data that I have observed. NEC is also an occasional
offender with its precomposed Roman numerals (extending the JIS X 0208
character set), and I've heard that Apple put its logo into Unicode at
U+E000, although I've never seen it.
I consider this kind of design a subversion of the standard: Microsoft,
NEC, and Apple users have often told me _my_ software is defective
because I'm the only person who complains about their spew. This kind
of design _encourages_ emission of defective documents, by users who
should not have to understand the ins and outs of the standards. By
contrast, users of VM and Gnus (the dominant messaging agents in the
XEmacs world) _cannot_ produce defective mail using _only_ the
operations and options provided by that software. The reason is that
before being encapsulated, the mail is checked for character sets and
properly MIME-formatted.
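
The general shape of such a check can be sketched in Python. This is
not the code VM or Gnus actually use; the candidate list, its order,
and the function names are assumptions made for illustration. The idea
is simply that the label is derived from the content, so the two
cannot disagree.

```python
from email.message import EmailMessage

# Candidate charsets, tried in order. This list is an assumption for
# illustration, not what any particular mail agent uses.
CANDIDATE_CHARSETS = ("us-ascii", "iso-8859-1", "iso-2022-jp", "utf-8")


def pick_charset(body: str) -> str:
    """Return the first candidate charset that can represent body."""
    for charset in CANDIDATE_CHARSETS:
        try:
            body.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can represent the body")


def build_message(body: str) -> EmailMessage:
    """Build a text/plain message whose charset label matches its content."""
    message = EmailMessage()
    # set_content encodes the body and writes the Content-Type and
    # Content-Transfer-Encoding headers to match.
    message.set_content(body, charset=pick_charset(body))
    return message
```

With this structure the user never chooses a label by hand, so there
is no option that produces a body in one encoding under another's
name.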
--
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Institute of Policy and Planning Sciences Tel/fax: +81 (298) 53-5091
_________________ _________________ _________________ _________________
What are those straight lines for? "XEmacs rules."