Glynn Clements writes:
I was referring mainly to the technical issues, e.g. the
non-reversibility of encoding conversions.
Like everything else you mention, AFAICS that's not caused by Unicode,
and even with Mule code or TRON code you have issues like "packed EUC
or 7-bit ISO-2022" which the charset codes can't help with. Rather,
programmers tend to throw away information that they don't see an
immediate use for (perhaps in the name of efficiency).
> > If you want to retrieve a filename from the OS then pass it back at a
> > later point, you need to retain the raw data. If you can't get at the
> > raw data, you lose.
>
> That's exactly the conclusion the Python people just came to.
Which conclusion? "Retain the raw data" or "you lose"?
"Retain the raw data or lose." There's no third alternative, although
sufficiently creative programmers can (and do) have their data and
lose anyway. ;-)
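
To make the "retain or lose" point concrete, here is a minimal Python
sketch. The sample filename and the helper round_trips() are made up for
illustration, and the surrogateescape error handler is only one way of
carrying the raw bytes along; keeping the original bytes object next to
the decoded string works just as well.

```python
def round_trips(raw: bytes, errors: str) -> bool:
    """Return True if decoding then re-encoding as UTF-8 gives back raw."""
    text = raw.decode("utf-8", errors=errors)
    return text.encode("utf-8", errors=errors) == raw


# Hypothetical filename in Latin-1; the byte 0xE9 is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"

# "replace" substitutes U+FFFD for the bad byte, so the original is gone.
lossy_ok = round_trips(raw_name, "replace")

# "surrogateescape" smuggles the bad byte through as a lone surrogate,
# so the exact bytes can be handed back to the OS later.
kept_ok = round_trips(raw_name, "surrogateescape")
```

With the first handler the name can be displayed but never passed back
to the OS; with the second it can, at the price of a string that is not
strictly valid Unicode.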
Unfortunately, this canonicalisation frequently doesn't happen. It
isn't too surprising, given the way that Unicode is so often touted
as eliminating these sorts of problems.
Not by the Unicode Consortium though. Rather by the same lazy or
overworked programmers you've been citing throughout.
The fact is that the problem is the Tower of Babel. One ISO standard
is not going to turn back God's wrath (in fact, it probably just made
Her madder!). Unicode is a major step toward making the world safe for
low energy/high burden programmers, at least in a restricted area of
multilingual and/or localized text processing. But as usual, the 10%
of corner cases involve 90% of the work, and also as usual, those of
us who care about the corner cases are going to have to bear the
burden of dealing with them.
Surely that doesn't surprise you. ;-)