On the twenty-fourth day of December, Stephen J. Turnbull wrote:
> So after dealing with the test all characters issue[1] I noticed that
> charsets-in-region and charsets-in-string were failing. I suspect the
> reason is that in the Unicode build the precedence lists are not tuned
> right. I guess a real fix would need to be some intelligent approach
> to modifying the precedence list according to the charsets used at
> read time or input (although that won't work if the input is
> Unicode). The other problem is that "intelligent" really means
> constructing some kind of precedence graph, and it could easily be
> impossible (eg if both Chinese and Japanese were used in the same
> file, you'd need a language tag to disambiguate).
>
> I guess the first thing I'd try is ensuring that ISO charsets come
> before the windows-xxx and IBM CPxxx versions.
>
> Any other suggestions?
We could disable the charsets-in-region tests on the Unicode builds, or define
the order we expect for the current language environment. To be honest,
though, with unicode-internal the output of charsets-in-region isn’t something
the user is going to care about, so I would lean towards the former.
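This is a toy Python model, not XEmacs code, showing why the precedence order changes what charsets-in-region reports in a Unicode build. The model assumes that each character is attributed to the first charset in the precedence list whose repertoire contains it. `iso_first` is a hypothetical helper that sketches Stephen's idea of putting ISO charsets ahead of the windows-xxx and IBM CPxxx ones. The charset names, the tiny repertoires, and the "contains `iso`" test are illustrative assumptions only.

```python
# Toy model only: names and repertoires are hypothetical, not XEmacs charsets.

def iso_first(precedence):
    """Reorder so ISO charsets come first; sorted() is stable, so each
    group keeps its original relative order."""
    return sorted(precedence, key=lambda name: 0 if "iso" in name else 1)

def charset_of(char, precedence, repertoires):
    """Return the first charset in PRECEDENCE whose repertoire has CHAR."""
    for name in precedence:
        if char in repertoires[name]:
            return name
    return None

def charsets_in_string(s, precedence, repertoires):
    """Model of charsets-in-string: the set of charsets the chars map to."""
    return {charset_of(c, precedence, repertoires) for c in s}

# Made-up repertoires: both contain 'é', only windows-1252 has '€'.
repertoires = {
    "windows-1252": set("abcé€"),
    "latin-iso8859-1": set("abcé"),
}
precedence = ["windows-1252", "latin-iso8859-1"]

print(charsets_in_string("é", precedence, repertoires))             # {'windows-1252'}
print(charsets_in_string("é", iso_first(precedence), repertoires))  # {'latin-iso8859-1'}
```

The same text yields a different answer depending only on list order. A test that hard-codes the expected charsets is therefore fragile unless it also pins down the precedence list.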
> Footnotes:
> [1] By the way, why does the Unicode build have a 2^30 repertoire?
> ISO 10646 has a 2^31 repertoire IIRC (maybe 2^32?), but Unicode only
> has 2^21 (precisely, 2^20+2^16).
Ben’s choice. We do need more than 2^21 for our invalid-sequence
characters, but we certainly don’t need the full 2^30.
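The following Python snippet checks the repertoire sizes in the footnote. It relies only on the fact that Unicode code points run from U+0000 to U+10FFFF, which is 17 planes of 2^16 code points each. How many invalid-sequence characters XEmacs actually needs is not stated here, so the snippet shows only how much room a 21-bit space leaves above Unicode.

```python
# Unicode's codespace: U+0000..U+10FFFF, i.e. 17 planes of 2**16 code points.
unicode_codespace = 17 * 2**16
assert unicode_codespace == 2**20 + 2**16 == 0x110000 == 1_114_112

# Bits needed to represent the highest code point, U+10FFFF.
bits_needed = (0x10FFFF).bit_length()
print(bits_needed)  # 21

# Values left over in a 21-bit space above the last Unicode code point.
headroom_21 = 2**21 - unicode_codespace
print(headroom_21)  # 983040
```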
--
‘As I sat looking up at the Guinness ad, I could never figure out /
How your man stayed up on the surfboard after forty pints of stout’
(C. Moore)