[Novalug] Fwd: perl help | filename spaces
Michael Henry
lug-user at drmikehenry.com
Thu Mar 18 06:39:20 EDT 2010
On 03/17/2010 10:42 PM, Jon LaBadie wrote:
> On Wed, Mar 17, 2010 at 09:18:18PM -0400, Ken Kauffman wrote:
>> find . \( -name cache -prune \) -o -name "*.htm*" -print0 | xargs -0 perl
>> -i.save -pe "s/news.html/news\//g;"
>
> You can get the same effect without the -print0, pipe, and extra command.
>
> find . \( -name cache -prune \) -o -name "*.htm*" \
> -exec perl -i.save -pe "s/news.html/news\//g;" {} +
In the past, the ``-exec`` option had only one form, ``-exec
command {} ;``. That form was discouraged whenever the command
could accept multiple filenames, because it spawns a new
invocation of the command for every filename. The ``xargs``-based
solution handles this case by packing as many filenames as
possible into each invocation, minimizing the number of spawned
processes. That downside of ``-exec`` went away with the addition
of the "+" terminator (I'm not sure how long ago).
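To make the difference between the two terminators concrete, here
is a minimal sketch that counts how many times the command gets
spawned. It assumes a POSIX shell and a ``find`` that supports
``-exec ... {} +``; the scratch directory and the three filenames
are made up for illustration.

```shell
# Scratch directory with three files (hypothetical names, one with a space).
tmp=$(mktemp -d)
touch "$tmp/a.htm" "$tmp/b.htm" "$tmp/c d.htm"

# Each spawned "sh" prints one line (its argument count), so the
# number of output lines equals the number of invocations.

# ";" form: one invocation per file.
per_file=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

# "+" form: filenames are batched into as few invocations as possible.
batched=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l | tr -d ' ')

echo "invocations with ';': $per_file"
echo "invocations with '+': $batched"

rm -rf "$tmp"
```

With only three short filenames, the "+" form should need a single
invocation; on a large tree it will need more than one, but still
far fewer than one per file.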
When using ``-exec {} +`` as Jon has suggested, ``find`` will
batch filenames together and minimize the number of spawned
processes. In most ways, it's equivalent to the ``xargs``-based
solution, but there are slight performance differences.
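As a quick sanity check of that equivalence, the sketch below runs
both forms over a small scratch tree and compares what they
report. It assumes a POSIX shell plus a ``find`` and ``xargs`` that
support ``-print0`` and ``-0`` (GNU and BSD versions do); the
directory layout and file contents are invented for the example,
and one path deliberately contains spaces.

```shell
# Scratch tree: one file that matches, one that doesn't (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/sub dir"
printf 'bigteststring\n' > "$tmp/sub dir/hit one.txt"
printf 'nothing here\n'  > "$tmp/miss.txt"

# find batches the filenames itself.
out_exec=$(find "$tmp" -type f -exec grep -l bigteststring {} + | sort)

# find streams NUL-terminated names to xargs, which does the batching.
out_xargs=$(find "$tmp" -type f -print0 | xargs -0 grep -l bigteststring | sort)

if [ "$out_exec" = "$out_xargs" ]; then
    echo "both forms report the same files:"
    echo "$out_exec"
else
    echo "the two forms disagree" >&2
fi

rm -rf "$tmp"
```

Both forms pass each filename as a single argument, so the spaces
in the path need no special quoting.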
I did some benchmarking in the past and got a result that seemed
counter-intuitive at first: spawning the extra ``xargs`` process
actually saves time overall. I believe this is because ``find``
and ``xargs`` run concurrently: ``find`` keeps finding files
while ``xargs`` runs the desired command on batches of
already-found filenames.
Here's a quick benchmark that's repeatable on my box:
$ cd ~/projects
$ time find -type f -exec grep bigteststring {} +
real 0m0.210s
user 0m0.073s
sys 0m0.137s
$ time find -type f -print0 | xargs -0 grep bigteststring
real 0m0.172s
user 0m0.080s
sys 0m0.140s
It's not a huge difference, but it's repeatable. It's certainly
not enough to change what you write interactively at the prompt,
but in a script it may be worth considering. Either way, I find
it an interesting result.
Michael Henry