Otaku Perks: A Perl Crawler for Downloading Pictures from a Beauty-Photo Site

This morning I wanted to look at some Go, and Google led me to a Go crawler project. I downloaded it and gave it a try, but to my surprise it wasn't very good. Then it occurred to me that this is exactly the kind of job Perl was made for, so I ended up with the following code.

use strict;
use warnings;
use 5.010;
use WWW::Mechanize;
use Bloom::Filter;
#use URI;

$| = 1;    # unbuffer output so progress messages show up immediately

my $dir = "./images/";
if ( !-d $dir ) {
	mkdir $dir;
}

# Bloom filter to remember URLs we have already queued
my $filter = Bloom::Filter->new( capacity => 100000, error_rate => 0.0001 );

my $mech = WWW::Mechanize->new(
	stack_depth => 0,     # keep no page history
	timeout     => 10,
	autocheck   => 0,     # don't die on failed requests
);
$mech->agent_alias('Windows IE 6');
$mech->add_header( Referer => 'https://meizi.us/' );

my @queue;
push @queue, "https://meizi.us/";    # seed URL

while ( my $http = pop(@queue) ) {    # pop() => LIFO, so this crawls depth-first
	say "open url: " . $http;
	$mech->get($http);

	# collect pagination links we haven't seen yet
	for my $link ( $mech->links() ) {
		my $url = $link->url();
		#say "link $url";
		if ( $url =~ /page=/ ) {
			if ( !$filter->check($url) ) {
				$filter->add($url);
				push @queue, $url;
			}
		}
	}
	# download the full-size version of each thumbnail on the page
	for my $img ( $mech->images ) {
		my $url = $img->url;
		#say "image $url";
		if ( $url =~ /small\.jpg/ ) {
			$url =~ s/_small//;    # thumbnail URL -> full-size URL
			say "download image: $url";
			my ($file) = $url =~ m|([^/]+\z)|;    # file name = last path segment
			#say $file;
			my $ff = $dir . $file;
			if ( !-f $ff ) {                      # skip files we already have
				eval {
					$mech->get( $url, ':content_file' => $ff );
				};
				next if ($@);
			}
		}
	}
	say "Queue size: ", scalar(@queue);
	sleep 1;    # be gentle with the server
}
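By the way, since autocheck is turned off, get() won't die when a page fails to load, but the loop will still try to parse whatever came back. If you want to skip dead pages, a small guard right after the get() does the trick. This is my own addition, not part of the script above; success() and status() are standard WWW::Mechanize methods:

	$mech->get($http);
	unless ( $mech->success ) {
		say "skip (HTTP ", $mech->status, "): $http";    # report the status and move on
		next;
	}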

As for the results... I'm a little embarrassed to post them. Let's just say you should have a big hard drive ready.

References

http://www.php-oa.com/2013/05/24/mojo-perl-crawler.html

https://metacpan.org/pod/WWW::Mechanize#new

http://stackoverflow.com/questions/31539687/how-do-i-download-an-image-file-from-a-website-using-wwwmechanize

https://github.com/qibin0506/Meizar – the Go crawler project mentioned above