[PATCH] Re: coding system oddity

Sunday, 7 June 2009

 On the seventh day of June, Stephen J. Turnbull wrote:

...
 Aidan Kehoe writes:
  >  On the seventh day of June, Stephen J. Turnbull wrote:

  >  > Ben's original concept for the new detection mechanism was to assign
  >  > likelihoods and pick the best of the matching coding systems.
  >  > However, it's possible that there would be no match or a tie
  >  > (depending on how the algorithm is tuned), in which case it might be
  >  > natural for the result to be undecided.
  >  > 
  >  > What do you think, Aidan?

  > It may be that coding_stream_detected_coding_system *should* never
  > give undecided, even if no data have been seen, in which case the
  > correct thing to do to fix this bug is to change
  > coding_stream_detected_coding_system to do that.

 I guess it's fair to say your answer is "I'm not ready to say yet",
 then?

‘Yet’ is perhaps a bit strong. Ben might be able to answer the question, if
he remembers. He may not remember, and even if he does, he may not answer
when we ask, for reasons independent of that. In either case the question is
unanswerable, and the decision as to the semantics of c_s_d_c_s is ours.
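
To make the two candidate semantics concrete, here is a minimal sketch of the
likelihood-based scheme described in the quoted text above, including the
no-match and tie cases. It is Python rather than the XEmacs C, and every name
in it (`detect`, `choose_coding_system`, `UNDECIDED`, the scores, the
`binary` default) is hypothetical; none of it is taken from the source tree.

```python
UNDECIDED = "undecided"  # hypothetical sentinel, standing in for the undecided coding system


def detect(scores, threshold=0.0):
    """Return the most likely coding system, or UNDECIDED on no match or a tie.

    `scores` maps a coding system name to a likelihood; how those
    likelihoods would be computed is outside this sketch.
    """
    candidates = {name: s for name, s in scores.items() if s > threshold}
    if not candidates:
        return UNDECIDED  # nothing matched, or no data seen yet
    best = max(candidates.values())
    winners = [name for name, s in candidates.items() if s == best]
    # A unique winner is returned; a tie is reported as undecided.
    return winners[0] if len(winners) == 1 else UNDECIDED


def choose_coding_system(scores, default="binary"):
    """Caller-level policy: the final decision is made outside the detector."""
    detected = detect(scores)
    return default if detected == UNDECIDED else detected
```

With this split, `detect` is free to report undecided, and whether that value
is ever visible to the user depends only on the caller's policy.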

...
 My personal take is that it is a good idea to allow c_s_d_c_s to
 return undecided, because the final decision should be made at the
 Lisp level, not in C.  I think your fix is in the right place, even if
 it's not the final version.

  > It may also be that undecided is a reasonable value there. It’s just
  > not clear, and my patch as written is minimally invasive and fixes the
  > issue.

  > Your warning below risks having people think that autodetection
  > succeeding means that the characters in the buffer reflect what
  > whoever wrote the octets to disk meant them to mean, which is not
  > true, our autodetection gets things wrong all the time.

 That's always true, though.  Ie, AFAIK currently autodetection will
 always return a system that can save all the characters in the buffer
 at the time, if it returns a coding system.

No; this used to be especially untrue. It is less so now, but the
autodetection code may still return a coding system that will lose data if
that coding system is used to decode and then re-encode the entire file.
Consider that the UTF-8 coding systems used to lose data in this context all
the time, even though the autodetection code correctly recognised UTF-8; and
that the ISO-2022-1 coding systems do not preserve some distinct error
sequences on decoding followed by encoding.
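
The kind of loss meant here can be illustrated outside XEmacs. The sketch
below uses Python's own UTF-8 codec, not any XEmacs coding system, and the
helper name `round_trips` is made up for the illustration. It decodes some
octets and encodes the result again, then checks whether the original octets
come back. The same octets survive or do not depending only on how the
decoder treats an invalid sequence.

```python
def round_trips(octets: bytes, encoding: str, errors: str) -> bool:
    """True if decoding and then re-encoding reproduces the original octets."""
    text = octets.decode(encoding, errors=errors)
    return text.encode(encoding, errors=errors) == octets


# Valid UTF-8 ("café ") followed by one octet that is never valid in UTF-8.
data = b"caf\xc3\xa9 \xff"

# "replace" turns the bad octet into U+FFFD, so the original octet is gone.
lossy = round_trips(data, "utf-8", "replace")

# "surrogateescape" keeps the bad octet as a lone surrogate and restores it
# on encoding, so the file comes back unchanged.
preserving = round_trips(data, "utf-8", "surrogateescape")
```

In both cases a detector could reasonably have said "UTF-8"; whether data is
lost is a property of the decode and encode steps, not of the detection.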

...
 [...] That's why I want the warning.

OK; I disagree with you, but the downside of doing it your way is not that
significant, so let’s go with that.

-- 
Where might my nephew Yoghurtu Nghe be now, who had to flee the village in
haste because of the shortage of rhinoceroses?

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-beta
