On the twenty-second day of July, Stephen J. Turnbull wrote:
Aidan Kehoe writes:
> Not these. Conceptually, to encode a character, these coding systems
> convert the character to Unicode, and then do a hash lookup of the UCS
> code -> octets on disk table. So:
>
> (encode-coding-string (make-char 'japanese-jisx0208 39 107) 'koi8-r)
>
> does the right thing.
And what does
(encode-coding-string (make-char 'japanese-jisx0208 48 108) 'koi8-r)
do?
The right thing; it returns a string consisting of a tilde. (The actual
error octet can be specified at coding system creation, but it defaults to
the ASCII value for tilde. This is inappropriate for EBCDIC.)
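The conceptual flow described above might be sketched like this. This is a simplification, not the actual implementation; `sketch-encode-char' and TABLE are hypothetical names, and I'm assuming `char-to-unicode' as the 21.5 character-to-code-point primitive:

```lisp
;; Simplified sketch of the conceptual encoding path; not the real
;; implementation.  TABLE is a hypothetical hash table mapping Unicode
;; code points to on-disk octet strings for the coding system.
(defun sketch-encode-char (char table &optional error-octet)
  "Encode CHAR via its Unicode mapping, falling back to ERROR-OCTET."
  (let* ((code (char-to-unicode char))        ; 21.5 Unicode API
         (octets (and code (gethash code table))))
    (or octets
        ;; Unmappable character: emit the substitution octet,
        ;; which defaults to tilde as described above.
        (string (or error-octet ?~)))))
```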
The extant koi8-r coding system returns the string "\x92\xB0\xEC", which is
not the right thing, for any sane understanding of “the right thing.”
It seems to me that an API like
(query-coding-region START END CODING-SYSTEM &optional BUFFER)
returning, say, a list of buffer offsets and lengths, is the most
appropriate general way to implement a UI for warning that a given coding
system will not encode a given buffer.
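A caller might use such an API along these lines. This is hypothetical, since the API is only proposed; the return format shown here, a list of (OFFSET . LENGTH) pairs, is one possibility rather than a settled design:

```lisp
;; Hypothetical caller of the proposed query-coding-region API.
(let ((problems (query-coding-region (point-min) (point-max) 'koi8-r)))
  (when problems
    ;; Each entry is assumed to be an (OFFSET . LENGTH) pair
    ;; describing a span that koi8-r cannot encode.
    (warn "koi8-r cannot encode %d span(s); first at offset %d"
          (length problems)
          (car (car problems)))))
```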
GNU’s safe-charset and safe-chars properties don’t work in our context, or
even in theirs, since whether a given internal character can be encoded by a
given coding system will vary from one invocation to the next, and will
often (but not always) be dependent on its Unicode mapping. My
encode-coding-string example gives a string consisting of a single question
mark there, for example. And you’ve come across the problems with the
(decode-coding-region START END CODING-SYSTEM BUFFER FLAGS) API yourself.
I’m not proposing to implement this right now, mind, but it’s an idea.
Is there a reason why this technique should be restricted to coding
systems currently implemented in CCL, or could/should we replace all ISO
8859 coding systems with this stuff?
Well, latin-unity deals with that problem for the 8859 coding systems, and
in a way that’s compatible with 21.4, so I don’t necessarily see any reason
to change that.
> These coding systems are much faster than that implies.
I don't think it's worth worrying about speed of coding systems until
somebody complains. AFAIK nobody's complained about the *speed* of
mule-ucs, so I doubt they'll complain about this either.
Spoken like a true Lisper :-). If it weren’t for its terrible Unicode
support I’d be on SXEmacs right now, because of the terrible speed and
memory usage of 21.5--and without making a complaint here, because the
situation is clear to everyone, right? Lots of people care about
performance, but realise that complaining about it will rarely get them
anywhere, since solving performance problems post facto is hard.
> > If you just mean you're making this distinction properly, hurray!
> > But we should avoid polymorphism in these functions if at all
> > possible.
>
> It’s not possible.
In software, anything's possible. What it sounds like you're saying
is that you're spending time and effort on a subsystem that is so
broken it needs mercy-killing. Is that a good idea?
The API is independent of the implementation. And maybe I wanted to learn
some Lisp for a change.
--
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)
_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches