On Thu, Nov 7, 2013 at 3:36 AM, Uwe Brauer <oub(a)mat.ucm.es> wrote:
I would like to have a Lisp package that would replace certain words in a text.
For this I need two things:
- a function which does the replacement and
- a list containing words with and without niqqud.
Concerning the first, I already have a function, loosely based on
iso-accentuate, as well as a function (TeX-to-char) provided by Aidan
some years ago, which replaces LaTeX symbols with their UTF-8
equivalents. I am not sure which code is more efficient; I'll
post the central part of the code later.
However, what bothers me more is the second part. I obtained the
Hebrew Bible in UTF-8 format and could then generate the desired
list. However, it seems to me that this list would be huge: at
least 2000 to 3000 words, if not more.
What is a reasonable size limit for such a list?
Is 2000 words too big, or must I divide the list into several parts
(and files) and write corresponding functions?
To be precise, the goal is to substitute Hebrew words with Hebrew words
carrying vowels, the so-called niqqud.
My first thought was to stuff your list into a hash table, something like this:
(defun replace-words-in-text (words replacements &optional buffer)
  "Replace each word in WORDS with the matching element of REPLACEMENTS.
Operates on BUFFER, defaulting to the current buffer."
  (let ((table (make-hash-table :test #'equal)))
    ;; Build the lookup table: word -> replacement.
    (while words
      (puthash (car words) (car replacements) table)
      (setq words (cdr words)
            replacements (cdr replacements)))
    (with-current-buffer (or buffer (current-buffer))
      (goto-char (point-min))
      (while (< (point) (point-max))
        (skip-syntax-forward "^w")      ; move to the start of the next word
        (let* ((start (point))
               (word (progn (skip-syntax-forward "w")
                            (buffer-substring-no-properties start (point))))
               (replacement (gethash word table)))
          (when replacement
            (delete-region start (point))
            (insert replacement)))))))
But that probably doesn't perform very well. Anyway, the point is
that doing hash table lookups should be much more efficient than
iterating over a list. And the nice thing about a hash table is that
the size of your list doesn't matter (much).
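For what it's worth, a call might look like this. This is just a minimal
sketch: the two word lists are made-up ASCII placeholders standing in for
your real bare/pointed Hebrew pairs, which you would presumably load from
a data file rather than hard-code.

```elisp
;; Two parallel lists: bare words and their pointed (niqqud) forms.
;; Placeholders only -- substitute the real Hebrew/niqqud pairs,
;; e.g. read from a file at load time.
(setq my-bare-words    '("alef" "bet"))
(setq my-pointed-words '("ALEF" "BET"))

;; Run the replacement over a scratch buffer and inspect the result.
(with-temp-buffer
  (insert "alef and bet")
  (replace-words-in-text my-bare-words my-pointed-words)
  (buffer-string))
```

Since the per-word cost is a single gethash, the size of the word list
only affects the one-time cost of building the table, not the scan itself.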
XEmacs-Beta mailing list