Re: [Q] Non-Latin-1 escapes can lead to corrupted ELC code.

Monday, 7 May 2007

 On the seventh day of May, Stephen J. Turnbull wrote:

...
 QUERY

 Aidan Kehoe writes:

  > Without this patch, the following test file is compiled incorrectly: 
  > 
  > (defvar Pravda "\u05bf\u05e0\u05d0\u05d2\u05d4\u05d0")

 I VETO that, you Capitalist Lackey!  That should be DEFCONST!! 
Also on the grounds that I generated those codes using XEmacs character
values, not UCS, so it’s actually in Hebrew script, and promoting the
interests of rootless cosmopolitans! (Nonsensically, since I’m pretty sure
ֿנאגהא doesn’t mean anything.)

...
 Non-mandatory suggestion/question:  How hard would it be for non-mule
 to recognize the new Unicode escapes and signal an `unimplemented'
 error?  If that can be done correctly, it would be one small step to a
 non-Mule Unicode-enabled XEmacs.
Non-Mule currently recognises the new Unicode escapes. The thing is that the
Lisp printer doesn’t generate them; when strings are written to compiled
files, they’re written as literals.
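Here is a rough picture of what the escapes mean, as a hypothetical Python stand-in for the reader side (not XEmacs's actual reader). It expands \uXXXX and \UXXXXXXXX into the characters they name. Read as UCS code points, every character of the Pravda string lies above U+00FF. So once the printer writes that string back out as literals, the compiled file contains non-Latin-1 text.

```python
import re

# Hypothetical helper (not XEmacs code): \uXXXX takes exactly four hex
# digits and \UXXXXXXXX exactly eight, as in the regexps in this thread.
ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def expand_unicode_escapes(text: str) -> str:
    """Replace each escape with the character whose UCS code point it names."""
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


if __name__ == "__main__":
    decoded = expand_unicode_escapes(r"\u05bf\u05e0\u05d0\u05d2\u05d4\u05d0")
    # Every code point is above 0xff, so none of these is a Latin-1 character.
    print([hex(ord(c)) for c in decoded])
    # ['0x5bf', '0x5e0', '0x5d0', '0x5d2', '0x5d4', '0x5d0']
```

The sketch ignores escaped backslashes (a literal \\u in the source). chr() raises ValueError for \U values beyond U+10FFFF.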

...
  > +            (let ((case-fold-search nil))
  > +              (re-search-forward 
  > +               (concat "[^\000-\377]" 
  > +                       #r"\\u[0-9a-fA-F]\{4,4\}\|\\U[0-9a-fA-F]\{8,8\}")
  > +               nil t)))

 Don't you need an OR in the regexp? 
Yeah.

I’ll write test cases for this before I commit it, to catch stupid things
like that. (I’ve currently another patch in the pipeline with tests to catch
stupid things like the Turkish language environment referring to a character
set that doesn’t exist, and the Latin-10 language environment referring to
an input method that doesn’t exist, but I think I need to eliminate the
second error from packages before I can run `make check' cleanly.)

...
                (concat "[^\000-\377]" 
                        #r"\|\\u[0-9a-fA-F]\{4,4\}\|\\U[0-9a-fA-F]\{8,8\}")
                           ^
 HERE ---------------------+
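The effect of the missing \| shows up clearly in a rough Python transliteration of the two regexps. This is an illustration only. Python's re module stands in for the XEmacs regexp engine; it is case-sensitive by default, like the (case-fold-search nil) binding. The names BUGGY and FIXED are hypothetical.

Because \| binds loosest, the posted pattern means "a non-Latin-1 character immediately followed by a \uXXXX escape, or a \UXXXXXXXX escape". A lone \u05bf escape in an otherwise ASCII file therefore slips through.

```python
import re

# Transliteration of the regexp in the posted patch: the character class
# and the \u escape are concatenated, so they only match as one sequence.
BUGGY = re.compile(r"[^\x00-\xff]\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}")

# With the alternation added: any character above U+00FF, OR a \uXXXX
# escape, OR a \UXXXXXXXX escape.
FIXED = re.compile(r"[^\x00-\xff]|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}")

if __name__ == "__main__":
    # The test file from the original report, as raw source text.
    source = r'(defvar Pravda "\u05bf\u05e0\u05d0\u05d2\u05d4\u05d0")'
    print(BUGGY.search(source))          # None: the escape is missed
    print(FIXED.search(source).group())  # \u05bf
```

Latin-1 literals such as é match neither pattern, which is the intended behaviour: only text that cannot be written as Latin-1 needs special treatment.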

  > +            ;; Look for any non-Latin-1 literals or Unicode character
  > +            ;; escapes. Also catches them in comments, which is actually
  > +            ;; irrelevant to us, but implementing a more complex algorithm
  > +            ;; is not worth the trade-off.

 Non-mandatory suggestion:  Wouldn't

   (let ((case-fold-search nil)
         (mule-re (concat "[^\000-\377]" 
                          #r"\|\\u[0-9a-fA-F]\{4,4\}\|\\U[0-9a-fA-F]\{8,8\}")))
     (catch 'need-to-escape-quote
       (while (re-search-forward mule-re nil t)
         (skip-chars-backward ";" (point-at-bol)) 
(skip-chars-backward "^;" (point-at-bol)) ?

...
         (if (bolp)
             (throw 'need-to-escape-quote t))
         (forward-line 1))))

 do the trick for avoiding triggering on comments?  
Yes, unless people use the #@COUNT construct in source files, which we
don’t recommend.
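The comment-skipping idea, with skip-chars-backward "^;" as the test, can be sketched in Python roughly as follows. This is a hypothetical transliteration, not the patch. A match counts only if no semicolon precedes it on its line; otherwise the line is assumed to be a comment and scanning resumes at the next line.

Like the Lisp version, it is a heuristic. A semicolon earlier on the same line (inside a string, say) also hides a genuine match.

```python
import re

# Corrected pattern: non-Latin-1 character, \uXXXX or \UXXXXXXXX.
MULE_RE = re.compile(r"[^\x00-\xff]|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}")


def needs_escape_quote(source: str) -> bool:
    """Hypothetical analogue of the catch/throw loop suggested above.

    Return True as soon as a match has no ';' before it on its line.
    A match preceded by ';' is treated as being in a comment, and
    scanning continues from the start of the following line.
    """
    pos = 0
    while True:
        m = MULE_RE.search(source, pos)
        if m is None:
            return False
        bol = source.rfind("\n", 0, m.start()) + 1  # beginning of line
        if ";" not in source[bol:m.start()]:
            return True  # like (bolp) after skipping back over "^;"
        eol = source.find("\n", m.end())
        if eol == -1:
            return False  # comment on the last line; nothing left to scan
        pos = eol + 1  # like (forward-line 1)


if __name__ == "__main__":
    print(needs_escape_quote(";; \\u05bf in a comment\n(defvar y 1)\n"))  # False
    print(needs_escape_quote('(defvar x "\u05bf")\n'))                   # True
```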

...
 Since it's compile-time and essentially one-pass, performance is really
 not that big an issue. Whether this is a good idea is another question;
 I'm of two minds about that.
-- 
On the quay of the little Black Sea port, where the rescued pair came once
more into contact with civilization, Dobrinton was bitten by a dog which was
assumed to be mad, though it may only have been indiscriminating. (Saki)

_______________________________________________
XEmacs-Patches mailing list
XEmacs-Patches(a)xemacs.org
http://calypso.tux.org/cgi-bin/mailman/listinfo/xemacs-patches
