large el files and performance: replace words from lists

Friday, 8 November 2013

        Uwe Brauer writes:

...
     - a function which does the replacement and

  (while (search-forward TARGET nil t)
    (replace-match REPLACEMENT))

is extremely fast assuming multiple replacements per TARGET.  If you
expect most of your words to occur many times, I would put this in the
inner loop unless your file is more than a megabyte or two.
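
To spell out the naive version: the loop above goes inside an outer loop
over the word list. A minimal sketch, assuming the list is an alist of
(TARGET . REPLACEMENT) strings; the name `my-replace-from-alist' is made
up for illustration and is not part of XEmacs:

```elisp
(defun my-replace-from-alist (pairs)
  "Replace every occurrence of each TARGET in PAIRS with its REPLACEMENT.
PAIRS is an alist of (TARGET . REPLACEMENT) strings.  Hypothetical
helper written for illustration only."
  (save-excursion
    (dolist (pair pairs)
      ;; Rescan the whole buffer from the top for each word.
      (goto-char (point-min))
      (let ((case-fold-search nil))
        (while (search-forward (car pair) nil t)
          ;; FIXEDCASE and LITERAL: insert the replacement exactly as given.
          (replace-match (cdr pair) t t))))))
```

Note that `search-forward' matches substrings, not whole words, and a
later pair can match inside text inserted by an earlier one, so the
order of the list may matter.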

...
     - a list containing words with and without niqqud.

 However what bothers me more is the second part. I obtained the
 Hebrew Bible in UTF-8 format and could then generate the desired
 list. However it seems to me that this list would be huge, at
 least 2000 to 3000 words if not more.

 What is a reasonable size limit for such a list???

Depends on what you're going to do with it.  If you're just going to
search one file for each word once, 2000 or 3000 is not that big.  I
would guess that the very naive program which loops over words and
searches the whole file for each one as in the while-search-forward
loop would take minutes (maybe up to 30, but I bet much less).

If you think you might do this for other files in the future, and it
is easy to parse words in Hebrew, then Jerry's suggestion of a hash
table with

  (while (re-search-forward HEBREW-WORD-REGEXP nil t)
    (let ((word (gethash (match-string 0) hebrew-word-table)))
      (when word
        (replace-match word))))

will probably get it down to a couple minutes or less.  You can dump
the hash table to a file with

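  (with-temp-file "load-my-table.el"
    ;; String keys need an `equal' test; the default test never matches them.
    (insert "(setq hebrew-word-table (make-hash-table :test 'equal))\n")
    (insert "(mapc (lambda (pair)\n"
            "        (puthash (car pair) (cdr pair) hebrew-word-table))\n"
            "      '(")
    (maphash (lambda (k v)
               ;; `prin1-to-string' writes the strings back with their quotes.
               (insert "(" (prin1-to-string k) " . " (prin1-to-string v) ")\n"))
             hebrew-word-table)
    (insert "))\n"))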

(but make sure to count parentheses, I'm not being terribly careful!!
:-) and then you can just load the table file with `load' after that.
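
For completeness, here is a rough sketch of building the table and
running the single pass. The helper names are made up, and the word
regexp is left as an argument because what counts as a Hebrew word
(with or without niqqud) depends on your text. Both helpers assume
string keys, hence the `equal' test:

```elisp
(defun my-make-word-table (pairs)
  "Return a hash table mapping each TARGET in PAIRS to its REPLACEMENT.
PAIRS is an alist of (TARGET . REPLACEMENT) strings.  Hypothetical helper."
  ;; Keys are strings, so the table must compare them with `equal'.
  (let ((table (make-hash-table :test 'equal)))
    (dolist (pair pairs)
      (puthash (car pair) (cdr pair) table))
    table))

(defun my-replace-from-table (word-regexp table)
  "Replace each match of WORD-REGEXP that is a key in TABLE with its value.
Scans the buffer once from the beginning.  Hypothetical helper.
WORD-REGEXP must not match the empty string."
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward word-regexp nil t)
      (let ((word (gethash (match-string 0) table)))
        (when word
          ;; FIXEDCASE and LITERAL: insert the replacement exactly as given.
          (replace-match word t t))))))
```

With ASCII stand-ins for the Hebrew words, a call would look like
(my-replace-from-table "[a-z]+" (my-make-word-table '(("foo" . "qux")))).
Each word in the buffer costs one table lookup, so the running time
depends on the size of the file rather than on the length of the list.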

Steve

_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta
