
Re: [Sheflug] Thanks for awk-ward solutions



And Lo! The Great Prophet Jonathan uttered these words of wisdom:
> J.L. White wrote:
>>
>> [deletia]
>>
>> perl -ne 'print unless /Cannot connect/../Graphics support/' testc.inp > testc.out
>> cat testc.inp | awk '/Cannot connect/,/Graphics support/{next};{print $0}' > testc.out
>>
>> Maybe perl 'wins' for being 4 chars shorter than awk?  
>> My program loses bigtime for being about a dozen lines, 
>> tho it also removes the extra blank line at the end of each block.
>
> [...] Try this:
> sed -e '/Cannot connect/,/Graphics support/d' testc.inp > testc.out
> 
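
For anyone who wants to see what the three one-liners above are doing, here
is a rough Python sketch of the range deletion. It is only an illustration,
not how any of the tools implement it, and drop_range is a made-up name. It
follows sed's rule of looking for the end pattern from the line after the
opening match; as far as I know, perl's '..' and awk's range pattern also test
the end pattern on the opening line itself, which makes no difference for
input where no single line matches both patterns.

```python
import re


def drop_range(lines, start_pattern, end_pattern):
    """Yield the lines outside each start..end block (both boundary lines are dropped)."""
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    inside = False
    for line in lines:
        if inside:
            # Still inside a block: drop the line, and leave the block
            # once the end pattern has been seen.
            if end.search(line):
                inside = False
            continue
        if start.search(line):
            # A block opens here; the opening line is dropped as well.
            inside = True
            continue
        yield line


sample = [
    "keep 1",
    "Cannot connect to server",
    "some noise",
    "Graphics support disabled",
    "keep 2",
]
print(list(drop_range(sample, "Cannot connect", "Graphics support")))
```

Like the one-liners, this leaves any blank line that follows a block in
place, and a block that never sees its end pattern runs to the end of the
input.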

Just for those that are interested in performance...

Some (no-frills, simple) benchmarking here has shown the perl version to 
be the quickest of the three on an 11 million line file of test data :-)

A short piece of C generated a number of lines starting aaaaa, aaaab, 
aaaac, aaaad, ... zzzzz with a suffix that included a line count and a 
couple of words - a total of 26^5 (11,881,376) lines. Each command used the
range /abcde/ through /bbcde/ in place of the original patterns.
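
The C generator itself wasn't posted, so the following is only a Python
sketch of the same idea. The suffix format is made up (the original only says
it held a line count and a couple of words), and generate_lines and
write_test_file are hypothetical names.

```python
import itertools
import string


def generate_lines(width=5, alphabet=string.ascii_lowercase):
    """Yield test lines whose prefixes run aaaaa, aaaab, ... zzzzz (for width=5)."""
    prefixes = itertools.product(alphabet, repeat=width)
    for count, letters in enumerate(prefixes, start=1):
        prefix = "".join(letters)
        # Assumed suffix: a line count plus a couple of words.
        yield f"{prefix} line {count} of test data\n"


def write_test_file(path, width=5):
    """Write the generated lines to path and return how many were written."""
    written = 0
    with open(path, "w") as out:
        for line in generate_lines(width):
            out.write(line)
            written += 1
    return written
```

Calling write_test_file("testc.inp") with the default width should give
26^5 lines; a smaller width is handy for a quick check before generating the
full file.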

Timings were obtained with '/usr/bin/time -v command infile > outfile'
(yep, I've dropped the 'cat |' pipe from the awk above), and the results are:

      Run 1       Run 2       Run 3
Perl: 1m 39.87s   1m 51.37s   1m 43.70s
Sed : 1m 55.82s   1m 53.09s   1m 50.76s
Awk : 2m 37.91s   2m 16.87s   2m 13.59s
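
Averaging the three runs makes the comparison a little easier to read. A
small Python sketch of the arithmetic, reading the Perl Run 2 figure as
1m 51.37s (seconds rather than minutes); to_seconds and mean_seconds are just
illustrative helper names.

```python
# Timings copied from the table above, one list of three runs per tool.
runs = {
    "Perl": ["1m 39.87s", "1m 51.37s", "1m 43.70s"],
    "Sed": ["1m 55.82s", "1m 53.09s", "1m 50.76s"],
    "Awk": ["2m 37.91s", "2m 16.87s", "2m 13.59s"],
}


def to_seconds(text):
    """Convert a string such as '1m 39.87s' into seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + float(seconds.rstrip("s"))


def mean_seconds(times):
    """Return the mean of a list of 'Xm Y.YYs' timings, in seconds."""
    values = [to_seconds(t) for t in times]
    return sum(values) / len(values)


for tool, times in runs.items():
    print(f"{tool}: {mean_seconds(times):.2f}s")
```

That works out to roughly 105s for perl, 113s for sed and 143s for awk, with
only three runs each, so treat the averages as a rough guide.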

I was quite surprised at how long awk took, actually; I was expecting it to 
be somewhere in the same ballpark as Perl at least... (perl 5.8.4, awk 
3.1.3). But there ye go. It won't stop me writing awk for quick hacks, but 
perl is something I might have to use more... :-) Also, for some reason, 
the Perl results seem to be jumping about more than the others, and I'm not 
sure why (unless something kicked in to nick some CPU time temporarily).

This is by no means definitive, and probably has the odd flaw in it, but it 
may give peeps an idea of the tools to use :-) Bear in mind this was a 
large file (~11 million lines, ~300MB in size), and I suspect that this is 
rather extreme; in the real world there probably isn't much difference 
between the three methods, so it comes down to preference.

Chris...

-- 
\ Chris Johnson                 \ NP: 
 \ cej [at] nightwolf.org.uk          \  
  \ http://cej.nightwolf.org.uk/  \ 
   \ http://redclaw.org.uk/        ~---------------------------------------


___________________________________________________________________

Sheffield Linux User's Group -
http://www.sheflug.co.uk/mailfaq.html

  GNU the choice of a complete generation.