Latin 9 characters cropping up from XKB usage

Friday, 16 September 2005

I noticed a problem with some Croatian characters when I set up XKB
with a Croatian keyboard layout.  The "hr_US" layout is intended for
programming: it uses the standard US layout, and the Croatian
characters are typed by holding AltGr and pressing the key in the
position where that character sits on a Croatian keyboard.

I tested it in XEmacs and it seemed to work nicely.  Then I noticed
that the "š" character was misbehaving: M-b and M-f were treating it
as a word separator and M-u wouldn't capitalize it.  WTF?  I was
pretty sure that character properties had worked correctly in Mule for
at least several years; was this a regression?

It turned out that it wasn't: an š produced with any of the Quail
input methods behaved correctly.  A key that generates the scaron
keysym simply inserts a /different character/.  For š it inserts the
character with Mule code 0x14a8, and for ž the character with Mule
code 0x14b8.  The Latin 2 š and ž have Mule codes 0x139 and 0x13e,
respectively.  The impostors have identical glyphs, so they are
visually indistinguishable from the real ones.
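The four codes are less arbitrary than they look.  The low seven bits of each Mule code appear to match the character's byte in the corresponding ISO 8859 charset (š is 0xB9 in ISO 8859-2 and 0xA8 in ISO 8859-15), while the high bits are shared by both characters of a charset.  This is inferred from the four numbers above, not from XEmacs documentation.  The sketch below checks that pattern, using Python's "iso8859_2" and "iso8859_15" codecs as stand-ins for the XEmacs charsets:

```python
# Mule codes reported above, keyed by (character, Python codec name).
# Assumption: the codecs stand in for latin-iso8859-2 / latin-iso8859-15.
MULE_CODES = {
    ("š", "iso8859_2"): 0x139,
    ("ž", "iso8859_2"): 0x13E,
    ("š", "iso8859_15"): 0x14A8,
    ("ž", "iso8859_15"): 0x14B8,
}

def charset_byte(char, codec):
    """Return the single byte that encodes CHAR in an 8-bit codec."""
    return char.encode(codec)[0]

for (char, codec), mule_code in MULE_CODES.items():
    byte = charset_byte(char, codec)
    # Low 7 bits: position within the charset; high bits: which charset.
    low7_match = (mule_code & 0x7F) == (byte & 0x7F)
    print(f"{char} {codec:10} byte=0x{byte:02X} mule=0x{mule_code:X} "
          f"high=0x{mule_code >> 7:X} low7 match: {low7_match}")
```

The high part comes out as 0x2 for both Latin 2 characters and 0x29 for both Latin 9 characters, which is why two characters with the same glyph end up with unrelated-looking codes.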

char-charset reports that these characters belong to the
`latin-iso8859-15' (Latin 9) charset, which came as a surprise.  I
didn't know ISO 8859-15 even *contained* š and ž!  But it does, and
XEmacs prefers that charset to ISO 8859-2 when inserting the keysyms
"scaron" and "zcaron".  For example:

(char-charset (get 'scaron 'ascii-character))
  -> latin-iso8859-15
(char-charset (get 'ccaron 'ascii-character))
  -> latin-iso8859-2
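The difference between scaron and ccaron is consistent with what the two charsets contain: ISO 8859-15 has š and ž but none of the other Croatian letters, so for č there is no Latin 9 candidate at all.  A small check with Python's standard codecs (again standing in for the XEmacs charsets) lists which of the two charsets can encode each Croatian letter:

```python
CROATIAN = "čćđšž"
CHARSETS = ["iso8859_2", "iso8859_15"]

def encodable(char, codec):
    """True if CHAR has a code point in the given 8-bit codec."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

for char in CROATIAN:
    # Every charset in which this letter could be represented.
    homes = [codec for codec in CHARSETS if encodable(char, codec)]
    print(char, homes)
```

Only š and ž have two possible homes, which matches the observation that only those two keysyms produce impostors.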

I know I could use latin-unity to convert the Latin 9 characters to
Latin 2, but it seems reasonable to attempt to insert the correct
characters in the first place.  If the user's environment is Latin 2
based, wouldn't it make sense to prefer Latin 2 characters when
converting keysyms to characters?
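The proposal amounts to resolving each keysym's character against a charset preference list taken from the language environment.  The sketch below is hypothetical: `pick_charset` and the two preference lists are illustrative names, not XEmacs functions, and a "Latin 9 first" order is merely one way to reproduce the behaviour observed above, not necessarily how XEmacs actually decides.

```python
def encodable(char, codec):
    """True if CHAR has a code point in the given 8-bit codec."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def pick_charset(char, preferred):
    """Return the first charset in PREFERRED that contains CHAR, or None.

    Hypothetical policy: PREFERRED would come from the user's language
    environment, with the environment's own charset listed first.
    """
    for codec in preferred:
        if encodable(char, codec):
            return codec
    return None

# An order that reproduces the behaviour described above.
latin9_first = ["iso8859_15", "iso8859_2"]
# The order a Latin 2 based environment would presumably want.
latin2_first = ["iso8859_2", "iso8859_15"]

for char in "šžč":
    print(char, pick_charset(char, latin9_first), pick_charset(char, latin2_first))
```

With the Latin 2 order, š and ž resolve to ISO 8859-2 like č does, so characters typed on the keyboard would match those produced by the Quail input methods.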
