Linux下利用Shell使PHP併發採集淘寶產品

上次項目中用到<<PHP採集淘寶商品>>php

此方法有一個缺點,就是執行效率問題.一個商品採集平均須要0.8秒.那10000個商品採集完須要2個半小時.html

首先想到的解決辦法是併發.mysql

可是PHP不支持併發(這裏是指經過PHP命令執行PHP文件,若是經過apache或nginx等作服務器是能夠併發的,是併發訪問,不能在程序中實現併發).nginx

經過Shell把對php命令推到後臺執行,以達到併發的效果.sql

總體思路:數據庫

  1.在Shell中鏈接數據庫,取出須要更新的產品apache

  2.Shell中對數據進行循環,把商品id,price,url傳遞給PHP執行,將執行過程推到後臺json

  3.每循環20條使程序暫停5秒,達到控制併發數的目的服務器

  4.php獲得id,price,url參數後,經過URL進行採集,並返回現價併發

  5.將現價和原數據庫中的價格進行比較,若是有變化或下架則更新.

  6.將執行結果寫入日誌文件.

Shell

#!/bin/sh
#updateprice.sh
j=0
currcyline=0;
#查詢數據庫
for i in `/usr/local/bin/mysql -uroot -pshop123 shop -e"SELECT id,price,url FROM s_goods WHERE url!='' AND keyid like 'taobao%' AND is_off_sale=0 ORDER BY id DESC  "` 
do
        if [ $j -gt 2 ]; then
        #前3個循環分別爲id,price,url這至關於表頭,不須要進行操做,因此從第3開始.
                line=$(($j%3))
                case $line in
                0)
                        currcyline=$(($currcyline+1))

                        s=$(($currcyline%20))

                        if [ $s -eq 0 ]; then
                                sleep 5  #每循環20次休息5秒,以此來控制避免產生過多的後臺進行,使服務器壓力過大或死機.
                        fi
                        id=$i;;
                1)
                        price=$i;;

                2)
                        url=$i;
                        #echo id:${id}  price:${price}  url:${url}

                        {
                                                                #調用php命令執行PHP文件.
                                re=`/usr/local/bin/php /var/www/9384shop/cron/goodsupdate.php ${id} ${price} "${url}"` 
                        }&
                        #此&爲推到後臺執行, 關鍵
                esac


        fi
        j=$(($j+1))
done
wait
#等待後臺進行執行完成

 

PHP:

<?php
/*
 ============================================================================
 Name        : goodsupdate.php
 Author      : 風飄無痕
 Version     :
 Copyright   : Your copyright notice
 Description : Collect taobao goods
 ============================================================================
 */

//$argv爲獲取命令中的參數
$s=microtime(1);
$id=$argv[1];
$oldprice=$argv[2];
$price=getPrice($argv[3]);
$t=microtime(1)-$s;
$r=array();
$r[]=date('Y-m-d H:i:s');
$r[]=$id;
$r[]=ceil($t*1000)/1000;
if($price=='soldout'){
        $r[]="OutStock";
        $con=mysql_connect('localhost','shop','shop123');
        mysql_select_db("shop", $con);
        mysql_query("UPDATE s_goods SET is_off_sale=1 WHERE id=".$id);
        mysql_close($con);
}
elseif($price===false) $r[]= 'FALSE';
elseif($price==$oldprice) $r[]='EQUAL';
else{
        $r[]="UPDATE";
        $r[]=$oldprice;
        $r[]=$price;
        $con=mysql_connect('localhost','shop','shop123');
        mysql_select_db("shop", $con);
        mysql_query("UPDATE s_goods SET price=".$price." WHERE id=".$id);
        mysql_close($con);
}
//以日誌的形式保存執行過程
$h=fopen('/home/staff/www/9384shop/log/goodsUpdate.log','a+');

fputcsv($h,$r);
fclose($h);

function getPrice($url,$time=1){
        $des_url='';
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0');
        curl_setopt($ch, CURLOPT_REFERER,'http://www.tmall.com/');//採集淘寶商品必須設置此項
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);//設置輸出方式, 0爲自動輸出返回的內容, 1爲返回輸出的內容,但不自動輸出.
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); //timeout on connect
        curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout on response
        curl_setopt($ch, CURLOPT_HEADER, 1);//是否輸出頭信息,0爲不輸出,非零則輸出
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10 );
        curl_setopt($ch, CURLOPT_URL, $url);
        $content = curl_exec($ch);
        curl_close($ch);
        if(preg_match('/noitem\.htm/',$content)){
                return 'soldout';
        }elseif(preg_match("/'reservePrice'\s*:\s*'([\d\.]+?)',/",$content,$price)){
                $price = (float)$price[1];
        }elseif(preg_match('/price:([\d\.]+?),/',$content,$price)){
                $price = (float)$price[1];
        }
        if(!$price){
                preg_match('/id=(\d+)+/',$url,$temp);
                $url2="http://mdskip.taobao.com/core/initItemDetail.htm?itemId=".$temp[1];
                $ch = curl_init();
                curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
                curl_setopt( $ch, CURLOPT_URL, $url2 );
                curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
                curl_setopt( $ch, CURLOPT_ENCODING, "" );
                curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
                curl_setopt( $ch, CURLOPT_REFERER, 'http://www.tmall.com' );
                curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
                curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false );   
                curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, 10 );
                curl_setopt( $ch, CURLOPT_TIMEOUT, 10 );
                curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
                $price_content = curl_exec( $ch );
                $response = curl_getinfo( $ch );
                curl_close ( $ch );
                $price_content=json_decode(iconv('gbk','utf-8',preg_replace('/(\d{10,}):/','"${1}":',$price_content)),true);
                $priceinfo=$price_content['defaultModel']['itemPriceResultDO']['priceInfo'];
                $price=array();
                if(is_array($priceinfo)){
                foreach ($priceinfo as $v){
                        $price[]=$v['price'];
                        if(is_array($v['promotionList'])){
                                foreach ($v['promotionList'] as $v2){
                                        $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price'];
                                }
                        }
                        if(is_array($v['suggestivePromotionList'])){
                                foreach ($v['suggestivePromotionList'] as $v2){
                                        $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price'];
                                }
                        }
                }
                }
                $price=count($price)>0?min($price):false;

        }
        if($price) return $price;
        elseif($time<3) return getPrice($url,$time+1);//若是沒有取到價格遞歸執行,最多執行3次.
        else return false;
}

執行結果:

tail -10 goodsUpdate.log
"2014-03-21 13:45:34",13357,0.273,EQUAL
"2014-03-21 13:45:35",13380,5.883,EQUAL
"2014-03-21 13:45:35",13343,0.914,EQUAL
"2014-03-21 13:45:35",13344,0.923,EQUAL
"2014-03-21 13:45:35",13347,0.927,UPDATE,599.00,181.00
"2014-03-21 13:45:35",13339,0.908,EQUAL
"2014-03-21 13:45:35",13342,0.93,EQUAL
"2014-03-21 13:45:35",13348,0.933,EQUAL
"2014-03-21 13:45:35",13349,0.946,UPDATE,1547.00,1877.00
"2014-03-21 13:45:35",13338,0.947,EQUAL

此方法比只用PHP更新大大節約了時間,更新2萬個商品大約是半小時.可是線程數很是很差控制

相關文章
相關標籤/搜索