Uwe Brauer writes:
> - a function which does the replacement and
  (while (search-forward TARGET nil t)
    (replace-match REPLACEMENT))
is extremely fast assuming multiple replacements per TARGET. If you
expect most of your words to occur many times, I would put this in the
inner loop unless your file is more than a megabyte or two.
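As a rough sketch, the loop above could be wrapped in a function for a single TARGET like this. The name `my-replace-all-in-buffer` is hypothetical. Two small additions are assumptions on my part rather than anything from the thread: `case-fold-search` is bound to nil so matching is exact, and `replace-match` gets FIXEDCASE and LITERAL set to t so the replacement text is inserted verbatim.

```elisp
(defun my-replace-all-in-buffer (target replacement)
  "Replace every occurrence of TARGET with REPLACEMENT in the current buffer.
Return the number of replacements made.  (Hypothetical helper name.)"
  (let ((count 0)
        ;; Assumption: we want exact, case-sensitive matches.
        (case-fold-search nil))
    (save-excursion
      ;; Start at the top so no occurrence before point is missed.
      (goto-char (point-min))
      ;; NOERROR = t: `search-forward' returns nil instead of signalling
      ;; an error once there are no more matches, which ends the loop.
      (while (search-forward target nil t)
        ;; FIXEDCASE = t, LITERAL = t: insert REPLACEMENT as-is, so a
        ;; backslash in it is not treated as a match-group reference.
        (replace-match replacement t t)
        (setq count (1+ count))))
    count))
```

Called as `(my-replace-all-in-buffer "foo" "bar")` in the buffer holding the text, it returns how many replacements it made. Point ends up after each inserted REPLACEMENT, so the loop terminates even if REPLACEMENT contains TARGET.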
> - a list containing words with and without niqqud.
>
> However, what bothers me more is the second part. I obtained the
> Hebrew Bible in UTF-8 format and could then generate the desired
> list. However, it seems to me that this list would be huge, at
> least 2000 to 3000 words if not more.
>
> What is a reasonable size limit for such a list?
Depends on what you're going to do with it. If you're just going to
search one file for each word once, 2000 or 3000 is not that big. I
would guess that the very naive program, which loops over the words and
searches the whole file for each one with the while/search-forward
loop above, would take minutes (maybe up to 30, but I bet much less).
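The naive approach described above might look roughly like the following sketch. It assumes the word list is an alist of (TARGET . REPLACEMENT) strings; that representation and the name `my-replace-words-naively` are illustrative assumptions, not something from the thread. Each pair costs one full pass over the buffer, which is why the running time grows with list size times file size.

```elisp
(defun my-replace-words-naively (pairs)
  "Apply every (TARGET . REPLACEMENT) in PAIRS to the current buffer.
Each pair triggers its own full scan from the top of the buffer.
Return the total number of replacements.  (Hypothetical helper name.)"
  (let ((total 0)
        ;; Assumption: exact, case-sensitive matching.
        (case-fold-search nil))
    (save-excursion
      (dolist (pair pairs)
        ;; One complete pass over the buffer per word in the list.
        (goto-char (point-min))
        (while (search-forward (car pair) nil t)
          ;; Insert the replacement literally (no backslash processing).
          (replace-match (cdr pair) t t)
          (setq total (1+ total)))))
    total))
```

Two caveats with this sketch: `search-forward` ignores word boundaries, so a short TARGET that occurs inside a longer word is replaced there too, and later pairs see the text already rewritten by earlier ones. Ordering the list longest-first, or using a word-matching regexp as in the hash-table approach, avoids some of that.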
If you think you might do this for other files in the future, and it
is easy to parse words in Hebrew, then Jerry's suggestion of a hash
table with
  (while (re-search-forward HEBREW-WORD-REGEXP nil t)
    (let ((word (gethash (match-string 0) hebrew-word-table)))
      (when word
        (replace-match word))))
will probably get it down to a couple minutes or less. You can dump
the hash table to a file with
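  (with-temp-file "load-my-table.el"
    ;; String keys need :test 'equal; the default `eql' test won't match them.
    (insert "(setq hebrew-word-table (make-hash-table :test 'equal))\n")
    (insert "(mapc (lambda (pair)\n"
            "        (puthash (car pair) (cdr pair) hebrew-word-table))\n"
            "      '(")
    ;; prin1-to-string quotes each string, so the pairs read back as strings.
    (maphash (lambda (k v)
               (insert (prin1-to-string (cons k v)) "\n"))
             hebrew-word-table)
    (insert "))\n"))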
(but make sure to count parentheses, I'm not being terribly careful!!
:-) and then you can just load the table file with `load' after that.
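For the hash-table version, a minimal sketch of building the table and doing the single pass might look like this. The function names are hypothetical, and the word regexp is passed in as a parameter because HEBREW-WORD-REGEXP above is only a placeholder. Something like a character class over the Hebrew Unicode block (U+0590 to U+05FF) is one possible choice, but how to write that range depends on your Emacs or XEmacs version, so treat it as an assumption. The important detail is `:test 'equal`: with the default `eql` test, `gethash` on a freshly matched string would never find anything.

```elisp
(defun my-make-word-table (pairs)
  "Return a hash table mapping each TARGET to REPLACEMENT from PAIRS.
PAIRS is a list of (TARGET . REPLACEMENT) strings.  (Hypothetical name.)"
  ;; :test 'equal compares strings by contents, not by identity.
  (let ((table (make-hash-table :test 'equal)))
    (dolist (pair pairs table)
      (puthash (car pair) (cdr pair) table))))

(defun my-replace-words-with-table (table word-regexp)
  "Replace each match of WORD-REGEXP that is a key in TABLE, in one pass.
WORD-REGEXP must not match the empty string, or the loop never advances.
Return the number of replacements.  (Hypothetical name.)"
  (let ((count 0)
        (case-fold-search nil))
    (save-excursion
      (goto-char (point-min))
      ;; A single scan of the buffer, however many words are in TABLE.
      (while (re-search-forward word-regexp nil t)
        (let ((replacement (gethash (match-string 0) table)))
          (when replacement
            (replace-match replacement t t)
            (setq count (1+ count))))))
    count))
```

Compared with the naive loop, the buffer is scanned once and each word costs one hash lookup, so the size of the list matters much less.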
Steve
_______________________________________________
XEmacs-Beta mailing list
XEmacs-Beta(a)xemacs.org
http://lists.xemacs.org/mailman/listinfo/xemacs-beta