I noticed a problem with some Croatian characters when I set XKB up
with a Croatian keyboard layout. The "hr_US" layout is intended for
programming: it uses the standard US layout and adds the Croatian
characters on AltGr, each one on the key where it is found on a
Croatian keyboard.

I tested it in XEmacs and it seemed to work nicely. Then I noticed
that the "š" character was misbehaving: M-b and M-f were treating it
as a word separator and M-u wouldn't capitalize it. WTF? I was
pretty sure that character properties had worked correctly in Mule for
at least several years; was this a regression?

It turned out that it wasn't: an š produced by any of the Quail input
methods behaved correctly. Pressing the key that generates the scaron
keysym simply inserts a /different character/. For š it inserts the
character with the Mule code 0x14a8, and for ž the character with the
Mule code 0x14b8. The Latin-2 š and ž have Mule codes 0x139 and 0x13e,
respectively. The impostors have identical glyphs, so they cannot be
told apart from the real ones visually.

`char-charset' reports that those characters belong to the
`latin-iso8859-15' charset, which came as a surprise. I didn't know
ISO 8859-15 even *contained* š and ž! But it does, and XEmacs prefers
that charset to ISO 8859-2 when inserting the keysyms "scaron" and
"zcaron". For example:

  (char-charset (get 'scaron 'ascii-character))
    -> latin-iso8859-15
  (char-charset (get 'ccaron 'ascii-character))
    -> latin-iso8859-2

I know I could use latin-unity to convert the Latin 9 characters to
Latin 2 afterwards, but it seems more reasonable to insert the correct
characters in the first place. If the user's environment is based on
Latin 2, wouldn't it make sense to prefer Latin 2 characters when
converting keysyms to characters?
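
Until XEmacs prefers Latin 2 on its own, a stopgap might be to
override the keysym-to-character mapping by hand. This is only a
sketch resting on an assumption I have not verified: that XEmacs takes
the inserted character from the keysym's `ascii-character' property,
which is what the `get' calls above read. The code points come from
the ISO 8859-2 table (š 0xB9, ž 0xBE, Š 0xA9, Ž 0xAE); ISO 8859-15
puts the same letters at 0xA8, 0xB8, 0xA6 and 0xB4, which matches the
low bytes of the 0x14a8 and 0x14b8 impostors.

```elisp
;; ASSUMPTION: the character inserted for a keysym is taken from the
;; keysym symbol's `ascii-character' property.  If so, pointing the
;; caron keysyms at Latin 2 characters should make them insert the
;; same characters the Quail input methods produce.
(put 'scaron 'ascii-character (make-char 'latin-iso8859-2 #xb9)) ; s caron
(put 'zcaron 'ascii-character (make-char 'latin-iso8859-2 #xbe)) ; z caron
(put 'Scaron 'ascii-character (make-char 'latin-iso8859-2 #xa9)) ; S caron
(put 'Zcaron 'ascii-character (make-char 'latin-iso8859-2 #xae)) ; Z caron
```

If the override takes effect, `(char-charset (get 'scaron
'ascii-character))' should report `latin-iso8859-2', and M-b, M-f and
M-u should treat the keyboard-entered š like the Quail one. This only
papers over the problem for one user, though; it doesn't answer the
question of which charset XEmacs should prefer by default.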