This is based on code I found online. I did not expect the difference to be so large; presumably grep is faster than Perl's pattern matching.
[gzhy@nearby stat]$ wc -l 1
234033 1
[gzhy@nearby stat]$ perl 1.pl
cost 1 seconds
zjtel : 32606
[gzhy@nearby stat]$ perl 2.pl
cost 111 seconds
zjtel : 32606
#!/usr/bin/perl
my $time=time();
open(file,"1");
# count the lines that contain :zjtel:
while(<file>) {
    chomp;
    if(m/:zjtel:/) {
        $zjtel++;
    }
}
close(file);
$time=time()-$time;
print "cost $time seconds\n";
print "zjtel : $zjtel\n";
1.pl
#!/usr/bin/perl
$time=time();
$count=`grep zjtel 1 | wc -l `;
$time=time()-$time;
print "cost $time seconds\n";
print "zjtel : $count\n";
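Test: a folder holds 199 plain-text files, and 265 keywords have to be scanned for. The scan was run two ways, with Perl pattern matching and with grep, and the scan time of each was recorded.
Result: every approach found 6354 lines.
pattern-match: 2173 seconds;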
grep1: 888 seconds;
grep2: 193 seconds;
The reference code is as follows:
pattern-match:
use strict;
use File::Basename;
# Search the text files in a directory for lines containing a keyword,
# recording each hit as <file name>:<line number>:<line content>
my ($dir,$keywords)= @ARGV;
my @filenames=glob "$dir*";
open KEY,"<$keywords" or die "Can't open $keywords:$!";
chomp(my @keywords=<KEY>);    # strip newlines so a keyword can match inside a chomped line
close KEY;
my $num_key=scalar @keywords;
my @match_lines;
my $time=time();
foreach my $file(@filenames){
    eval{
        open FILE,"<$file" or die "Can't open $file:$!"
    };
    if($@){
        print $@;
        next;
    }
    my $n=1;
    while(my $line=<FILE>){
        chomp $line;
        foreach my $key(@keywords){
            if($line=~m/$key/){
                my $context="$file:$n:$line\n";
                push @match_lines,$context;
            }
        }
        $n++;
    }
    close(FILE);
}
open RS,">result_file_pattern";
foreach(@match_lines){
    print RS $_;
}
close RS;
$time=time()-$time;
print "Pattern-match ($num_key keywords) end:$time seconds\n";
grep1: scan for each keyword separately
use strict;
use File::Basename;
# Search the text files in a directory for lines containing a keyword,
# recording each hit as <file name>:<line number>:<line content>
my ($dir,$keywords)= @ARGV;
my @filenames=glob "$dir*";
open KEY,"<$keywords" or die "Can't open $keywords";
my @keywords=<KEY>;
close KEY;
my $num_key=scalar @keywords;
my @match_lines;
my $time=time();
foreach my $file(@filenames){
    foreach my $key(@keywords){
        chomp $key;
        next unless ($key);
        my $m_keyword = "\\<$key\\>\|\[\^a-z\]$key\$\|\^$key\[\^a-z\]\|\[\^a-z\]$key\[\^a-z\]";
        my @sub_match_lines;
        eval{
            # the pattern is quoted so the shell does not treat | as a pipe
            @sub_match_lines=`grep -EnriIHs "$m_keyword" $file` or die "grep Error:$!"
        };
        if($@){
            print "$@";
            next;
        }
        push @match_lines,@sub_match_lines;
    }
}
open RS,">result_file_grep";
foreach(@match_lines){
    print RS $_;
}
close RS;
$time=time()-$time;
print "Grep ($num_key keywords) end : $time seconds\n";
# Would printing $context straight to the RS handle make any difference compared with the current approach?
grep2: put all 265 keywords into a single match string and scan each file once
use strict;
use File::Basename;
my ($dir,$keywords)= @ARGV;
my @filenames=glob "$dir*";
my $num=scalar @filenames;
print $num;
open KEY,"<$keywords" or die "Can't open $keywords";
my $all_keys;
my $i=0;
# join the per-keyword patterns into one alternation
foreach my $key(<KEY>){
    chomp $key;
    next unless($key);
    my $m_keyword = "\\<$key\\>\|\[\^a-z\]$key\$\|\^$key\[\^a-z\]\|\[\^a-z\]$key\[\^a-z\]";
    if($i==0){
        $all_keys.="$m_keyword";
        $i=1;
    }else{
        $all_keys.="\|$m_keyword";
    }
}
close KEY;
my @match_lines;
my $time=time();
foreach my $file(@filenames){
    chomp $file;
    print $file."\n";
    my @sub_match_lines;
    eval{
        my $grep ="grep -EnriIHs \"$all_keys\" $file";
        @sub_match_lines=`$grep` or die "$grep Error:$!";
    };
    if($@){
        print $@;
        next;
    }
    push @match_lines,@sub_match_lines;
}
open RS,">result_file_grep2";
foreach(@match_lines){
    print RS $_;
}
close RS;
$time=time()-$time;
print "Grep end:$time\n";
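The grep2 idea of joining every keyword into one alternation, so that each line is tested only once, can also be tried inside Perl itself. The following is only a minimal sketch and not one of the scripts timed above: the keyword list, the sample lines, and the helper name match_lines are made up for illustration, and it assumes keywords are literal strings (hence quotemeta). It also leaves out the word-boundary alternatives that $m_keyword builds.

```perl
use strict;
use warnings;

# Hypothetical keywords; in the scripts above they come from the keyword file.
my @keywords = ('zjtel', 'foo.bar', 'abc');

# Escape each keyword so it matches literally, join them into one
# alternation, and compile it once. /i mirrors grep's -i option.
my $alternation = join '|', map { quotemeta } @keywords;
my $all_keys_re = qr/(?:$alternation)/i;

# Return "<name>:<line number>:<content>" records for the matching lines.
sub match_lines {
    my ($name, @lines) = @_;
    my @hits;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        push @hits, "$name:$n:$line" if $line =~ $all_keys_re;
    }
    return @hits;
}

my @hits = match_lines('sample',
    'user:ZJTEL:1', 'nothing here', 'fooXbar', 'x foo.bar y');
print "$_\n" for @hits;
```

With these sample lines only the first and the fourth match; 'fooXbar' does not, because quotemeta turns the dot in 'foo.bar' into a literal dot. Whether this is faster or slower than calling grep would have to be measured on the real data.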
The File::Basename module:
File::Basename - Parse file paths into directory, filename and suffix.
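The commonly used functions in File::Basename are fileparse, basename, and dirname.
fileparse returns a list containing the three parts of the path name.
dirname returns the directory portion of the path.
basename returns the file name.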
($name,$path,$suffix) = fileparse($fullname,@suffixlist);
my $filename = fileparse("/foo/bar/baz.txt", qr/\Q.txt\E/);
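To make the three functions concrete, here is a small sketch using the same /foo/bar/baz.txt path as above. The variable names are only illustrative, and the values in the comments assume a Unix-like system.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse basename dirname);

my $fullname = '/foo/bar/baz.txt';

# In list context fileparse returns (name, directory, suffix).
my ($name, $path, $suffix) = fileparse($fullname, qr/\Q.txt\E/);

my $base = basename($fullname);   # file name part: baz.txt
my $dir  = dirname($fullname);    # directory part, no trailing slash: /foo/bar

print "name=$name path=$path suffix=$suffix\n";   # name=baz path=/foo/bar/ suffix=.txt
print "basename=$base dirname=$dir\n";
```

Note that the path returned by fileparse keeps its trailing slash, while dirname drops it.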
Directory handle operations:
opendir,readdir,closedir
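readdir returns only the bare names of the entries in the directory, without the directory path (and the list includes . and ..).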
opendir(DIRHANDLE,$dir) or die "Can't open $dir:$!";
my @filenames=sort readdir(DIRHANDLE);
closedir(DIRHANDLE);
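Because readdir gives back bare names, the directory usually has to be put back in front before the files can be opened. A minimal sketch, assuming a throwaway directory created with File::Temp so that it runs anywhere; the file names a.txt and b.txt are invented for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Build a temporary directory with two files so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt') {
    open my $fh, '>', File::Spec->catfile($dir, $name)
        or die "Can't create $name: $!";
    close $fh;
}

opendir(my $dh, $dir) or die "Can't open $dir: $!";
# readdir yields bare names, including "." and ".."; drop those two.
my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir($dh);
closedir($dh);

# Prefix the directory to get paths that open() can use from any working directory.
my @paths = map { File::Spec->catfile($dir, $_) } @names;

print "$_\n" for @names;
```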
Glob:
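The shell expands a file-name pattern on the command line into all the matching file names; this is called globbing.
In Perl this is done with the glob operator.
my @all_files=glob "dir/*" ;# all file and directory names under dir, except hidden entries whose names start with a dot
For example, if the directory /file/ contains a.txt, b.txt, a folder C, and a folder D, then glob "/file/*" returns /file/a.txt, /file/b.txt, /file/C, /file/D.
The angle-bracket form is equivalent: my @all_files=<dir/*>;
my @file=<FILE>;# FILE is a filehandle, so this reads the file's contents
my @file_dirs=<FILE/*>;# this is a glob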
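The difference between globbing and reading from a filehandle can be checked with a short sketch. It assumes a temporary directory from File::Temp (whose path contains no whitespace, which glob would otherwise split on); the file names, including .hidden, are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two visible files and one hidden file, two lines each.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.txt', 'b.txt', '.hidden') {
    open my $fh, '>', "$dir/$name" or die "Can't create $name: $!";
    print {$fh} "line 1\nline 2\n";
    close $fh;
}

# glob expands the pattern; "*" skips names that start with a dot.
my @all_files = glob "$dir/*";

# Angle brackets around a filehandle read lines instead of globbing.
open my $in, '<', "$dir/a.txt" or die "Can't open a.txt: $!";
my @lines = <$in>;
close $in;

printf "%d files, %d lines\n", scalar @all_files, scalar @lines;   # 2 files, 2 lines
```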
In addition, on Linux, grep -r or -R can recurse through the files under a directory.