Re: [Sheflug] Thanks for awk-ward solutions
And Lo! The Great Prophet Jonathan uttered these words of wisdom:
> J.L. White wrote:
>>
>> [deletia]
>>
>> perl -ne 'print unless /Cannot connect/../Graphics support/' testc.inp > testc.out
>> cat testc.inp | awk '/Cannot connect/,/Graphics support/{next};{print $0}' > testc.out
>>
>> Maybe perl 'wins' for being 4 chars shorter than awk?
>> My program loses bigtime for being about a dozen lines,
>> tho it also removes the extra blank line at the end of each block.
>
> [...] Try this:
> sed -e '/Cannot connect/,/Graphics support/d' testc.inp > testc.out
>
Just for those that are interested in performance...
Some (no-frills, simple) benchmarking here has shown the perl version to
be the quickest of the three on an 11 million line file of test data :-)
A short piece of C generated a number of lines starting aaaaa, aaaab,
aaaac, aaaad, ... zzzzz with a suffix that included a line count and a
couple of words - a total of 26^5 lines. Each statement had the range
/abcde/ thru' /bbcde/.
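For anyone wanting to reproduce the test data, here is a rough stand-in for
the generator described above. It is not the original C: it is a small
awk-in-shell sketch, and the function name gen_lines and the suffix text
("line N filler words") are made up for illustration, since all that is
specified is a suffix with a line count and a couple of words.

```shell
#!/bin/sh
# Hypothetical stand-in for the C generator (not the original code).
# Emits every N-letter lowercase prefix in order (aaaaa ... zzzzz for N=5),
# each followed by an assumed suffix: a line count and a couple of words.
gen_lines() {
    # $1 = prefix length; defaults to 5, i.e. 26^5 = 11881376 lines
    awk -v n="${1:-5}" 'BEGIN {
        total = 26 ^ n
        for (i = 0; i < total; i++) {
            s = ""
            v = i
            # Build the prefix as a base-26 number, least significant letter first
            for (j = 0; j < n; j++) {
                s = sprintf("%c", 97 + v % 26) s
                v = int(v / 26)
            }
            printf "%s line %d filler words\n", s, i + 1
        }
    }'
}
```

Running 'gen_lines > bench.inp' would write the full set (bench.inp is just
a placeholder filename), while 'gen_lines 2' gives a 676-line file that is
handy for a quick sanity check before committing to the big run.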
Timings were obtained with '/usr/bin/time -v command infile > outfile'
(yep, I've removed the 'cat |' pipe from the awk above), and the results are:
        Run 1       Run 2       Run 3
Perl:   1m 39.87s   1m 51.37s   1m 43.70s
Sed :   1m 55.82s   1m 53.09s   1m 50.76s
Awk :   2m 37.91s   2m 16.87s   2m 13.59s
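The timing runs above could be scripted along these lines. This is only a
sketch: time_tool, time_all, the TIMER variable and the output filenames are
all hypothetical names, and it assumes GNU time lives at /usr/bin/time (the
-v flag is GNU-specific).

```shell
#!/bin/sh
# Assumed location of GNU time; override TIMER if yours lives elsewhere.
TIMER=${TIMER:-/usr/bin/time}

# time_tool LABEL INFILE COMMAND...
# Runs COMMAND on INFILE under "$TIMER -v", saving the filtered output to
# LABEL.out and the timing report (written to stderr) to LABEL.time.
time_tool() {
    label=$1
    infile=$2
    shift 2
    "$TIMER" -v "$@" "$infile" > "$label.out" 2> "$label.time"
}

# time_all RUN INFILE
# One pass over all three tools, using the /abcde/ to /bbcde/ range.
time_all() {
    time_tool "perl.$1" "$2" perl -ne 'print unless /abcde/../bbcde/'
    time_tool "sed.$1"  "$2" sed -e '/abcde/,/bbcde/d'
    time_tool "awk.$1"  "$2" awk '/abcde/,/bbcde/{next};{print $0}'
}
```

Something like 'for run in 1 2 3; do time_all "$run" bench.inp; done'
followed by 'grep Elapsed *.time' would then pull the wall-clock line out of
each report. Keeping the .out files around also makes it easy to confirm
with cmp that all three tools produced identical output.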
I was quite surprised at how long awk took, actually; I was expecting it to
be somewhere in the same ballpark as Perl at least... (perl 5.8.4, awk
3.1.3). But there ye go. It won't stop me writing awk for quick hacks, but
perl is something I might have to use more... :-) Also, for some reason,
the Perl results seem to be jumping about more than the others, and I'm not
sure why (unless something kicked in to nick some CPU time temporarily).
This is by no means definitive, and probably has the odd flaw in it, but it
may give peeps an idea of the tools to use :-) Bear in mind this was a
large file (~11 million lines, ~300MB in size), and I suspect that this is
rather extreme. In the real world there probably isn't much difference
between the three methods, so it comes down to preference.
Chris...
--
\ Chris Johnson \ NP:
\ cej [at] nightwolf.org.uk \
\ http://cej.nightwolf.org.uk/ \
\ http://redclaw.org.uk/ ~---------------------------------------
___________________________________________________________________
Sheffield Linux User's Group -
http://www.sheflug.co.uk/mailfaq.html
GNU the choice of a complete generation.