Aidan Kehoe writes:
We could disable the charsets-in-region tests on the Unicode builds. Or
have a defined order we expect depending on the current language
environment; but to be honest, with unicode-internal, the output of
charsets-in-region isn’t something the user is going to care about, I
would lean more towards the former.
Users like me who use diff on ISO-2022 (or windows-xxxx for that
matter) care about getting the same charsets out that they put in. If
we get that right, charsets-in-* will follow.
Ben’s choice. We do need more than 2^21, for our invalid sequence
characters, but we certainly don’t need the full 2^30.
Um, 2^21 leaves us with 15 non-Unicode planes (17-31). Surely that's
enough? Am I missing something? There's also the Python PEP 383
strategy (use singleton low surrogates), which allows you to live
within the 17 Unicode planes.
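
To make the arithmetic concrete, here is a rough Python sketch. It is
not XEmacs code: PLANE_BITS, UNICODE_PLANES and spare_planes are names
made up for illustration. The second half shows the lone-surrogate
idea as Python exposes it through its "surrogateescape" error handler,
assuming UTF-8 as the external encoding.

```python
# Illustrative only: these names are hypothetical, not XEmacs internals.
PLANE_BITS = 16        # each plane holds 2^16 code points
UNICODE_PLANES = 17    # planes 0-16, i.e. U+0000..U+10FFFF


def spare_planes(total_bits):
    """Count the planes left above Unicode in a 2^total_bits code space."""
    return (1 << (total_bits - PLANE_BITS)) - UNICODE_PLANES


# A 21-bit space has 32 planes, so 15 remain (planes 17-31).
print(spare_planes(21))

# Lone-surrogate strategy: an undecodable byte 0x80+n is mapped to the
# low surrogate U+DC80+n, which lives inside plane 0.
raw = b"abc\xff"                      # 0xFF can never occur in valid UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))                    # ascii() avoids printing a lone surrogate
print(roundtrip == raw)
```

Only bytes 0x80-0xFF can be undecodable, so the escapes occupy just
U+DC80..U+DCFF and no extra planes are needed for them. Whether that
fits our invalid sequence characters depends on what else we need to
encode there.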
Steve