[Novalug] Fwd: perl help | filename spaces

Michael Henry lug-user at drmikehenry.com
Thu Mar 18 06:39:20 EDT 2010


On 03/17/2010 10:42 PM, Jon LaBadie wrote:
> On Wed, Mar 17, 2010 at 09:18:18PM -0400, Ken Kauffman wrote:
>> find . \( -name cache -prune \) -o  -name "*.htm*" -print0 | xargs -0 perl
>> -i.save -pe "s/news.html/news\//g;"
>
> You can get the same effect without the -print0, pipe, and extra command.
>
> find . \( -name cache -prune \) -o  -name "*.htm*" \
>      -exec perl -i.save -pe "s/news.html/news\//g;" {} +

In the past, the ``-exec`` option had only one syntax, ``-exec
{} ;``.  Using ``-exec`` was discouraged in cases where the
associated command could accept multiple filenames, because it
spawned a new invocation of the command for every filename.  The
``xargs``-based solution optimized for this case by passing as
many filenames as possible to each invocation of the command,
minimizing the number of spawned processes.  But that downside
was removed with the addition of the "+" delimiter for
``-exec`` (I'm not sure how long ago).
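
As a rough illustration of the difference, the sketch below counts
how many times the command gets spawned under each delimiter.  The
scratch directory and file names are made up for the example, and
``mktemp -d`` is assumed to be available:

```shell
# Scratch directory with five files, one of which has a space in its name.
tmp=$(mktemp -d)
touch "$tmp/a.htm" "$tmp/b.htm" "$tmp/c.htm" "$tmp/d.htm" "$tmp/e e.htm"

# ";" form: one "sh" process per file, so one line of output per file.
per_file=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

# "+" form: filenames are batched, so a single "sh" receives all five.
batched=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')

echo "invocations with ';': $per_file, with '+': $batched"
rm -rf "$tmp"
```

Each spawned ``sh`` prints its argument count on one line, so the
number of output lines equals the number of invocations.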

When using ``-exec {} +`` as Jon has suggested, ``find`` will
batch filenames together and minimize the number of spawned
processes.  In most ways, it's equivalent to the ``xargs``-based
solution, but there are slight performance differences.
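
To check that the two forms really do produce the same matches,
including for filenames with spaces, something like the following
can be used.  The files here are invented for the example, and
``-print0`` / ``xargs -0`` are assumed to be supported (they are in
the GNU and BSD tools):

```shell
# Scratch directory: two files contain the search string, one does not.
tmp=$(mktemp -d)
printf 'bigteststring\n' > "$tmp/plain.txt"
printf 'bigteststring\n' > "$tmp/with space.txt"
printf 'nothing here\n'  > "$tmp/other.txt"

# grep -l lists matching files; sort makes the two listings comparable.
via_exec=$(find "$tmp" -type f -exec grep -l bigteststring {} + | sort)
via_xargs=$(find "$tmp" -type f -print0 | xargs -0 grep -l bigteststring | sort)

if [ "$via_exec" = "$via_xargs" ]; then
    echo "both forms list the same files"
fi
rm -rf "$tmp"
```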

I'd done some benchmarking in the past, and found a result that
seems counter-intuitive at first: spawning the extra ``xargs``
process actually saves time overall.  I believe
this is due to the multiprocessing going on between ``find``
(which can continue to find files in the background) and
``xargs`` (which spawns the desired command on batches of
already-found filenames).

Here's a quick benchmark that's repeatable on my box:

  $ cd ~/projects
  $ time find -type f -exec grep bigteststring {} +

  real    0m0.210s
  user    0m0.073s
  sys     0m0.137s

  $ time find -type f -print0 | xargs -0 grep bigteststring

  real    0m0.172s
  user    0m0.080s
  sys     0m0.140s

It's not a huge difference, but it's repeatable.  It's certainly
not enough to change what you write interactively at the prompt,
but in a script it may be worth considering.  Either way, I find
it an interesting result.

Michael Henry



