上次項目中用到<<PHP採集淘寶商品>>php
此方法有一個缺點,就是執行效率問題.一個商品採集平均須要0.8秒.那10000個商品採集完須要2個半小時.html
首先想到的解決辦法是併發.mysql
可是PHP不支持併發(這裏是指經過PHP命令執行PHP文件,若是經過apache或nginx等作服務器是能夠併發的,是併發訪問,不能在程序中實現併發).nginx
經過Shell把對php命令推到後臺執行,以達到併發的效果.sql
總體思路:數據庫
1.在Shell中鏈接數據庫,取出須要更新的產品apache
2.Shell中對數據進行循環,把商品id,price,url傳遞給PHP執行,將執行過程推到後臺json
3.每循環20條使程序暫停5秒,達到控制併發數的目的服務器
4.php獲得id,price,url參數後,經過URL進行採集,並返回現價併發
5.將現價和原數據庫中的價格進行比較,若是有變化或下架則更新.
6.將執行結果寫入日誌文件.
Shell
#!/bin/sh #updateprice.sh j=0 currcyline=0; #查詢數據庫 for i in `/usr/local/bin/mysql -uroot -pshop123 shop -e"SELECT id,price,url FROM s_goods WHERE url!='' AND keyid like 'taobao%' AND is_off_sale=0 ORDER BY id DESC "` do if [ $j -gt 2 ]; then #前3個循環分別爲id,price,url這至關於表頭,不須要進行操做,因此從第3開始. line=$(($j%3)) case $line in 0) currcyline=$(($currcyline+1)) s=$(($currcyline%20)) if [ $s -eq 0 ]; then sleep 5 #每循環20次休息5秒,以此來控制避免產生過多的後臺進行,使服務器壓力過大或死機. fi id=$i;; 1) price=$i;; 2) url=$i; #echo id:${id} price:${price} url:${url} { #調用php命令執行PHP文件. re=`/usr/local/bin/php /var/www/9384shop/cron/goodsupdate.php ${id} ${price} "${url}"` }& #此&爲推到後臺執行, 關鍵 esac fi j=$(($j+1)) done wait #等待後臺進行執行完成
PHP:
<?php /* ============================================================================ Name : goodsupdate.php Author : 風飄無痕 Version : Copyright : Your copyright notice Description : Collect taobao goods ============================================================================ */ //$argv爲獲取命令中的參數 $s=microtime(1); $id=$argv[1]; $oldprice=$argv[2]; $price=getPrice($argv[3]); $t=microtime(1)-$s; $r=array(); $r[]=date('Y-m-d H:i:s'); $r[]=$id; $r[]=ceil($t*1000)/1000; if($price=='soldout'){ $r[]="OutStock"; $con=mysql_connect('localhost','shop','shop123'); mysql_select_db("shop", $con); mysql_query("UPDATE s_goods SET is_off_sale=1 WHERE id=".$id); mysql_close($con); } elseif($price===false) $r[]= 'FALSE'; elseif($price==$oldprice) $r[]='EQUAL'; else{ $r[]="UPDATE"; $r[]=$oldprice; $r[]=$price; $con=mysql_connect('localhost','shop','shop123'); mysql_select_db("shop", $con); mysql_query("UPDATE s_goods SET price=".$price." WHERE id=".$id); mysql_close($con); } //以日誌的形式保存執行過程 $h=fopen('/home/staff/www/9384shop/log/goodsUpdate.log','a+'); fputcsv($h,$r); fclose($h); function getPrice($url,$time=1){ $des_url=''; $ch = curl_init(); curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0'); curl_setopt($ch, CURLOPT_REFERER,'http://www.tmall.com/');//採集淘寶商品必須設置此項 curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);//設置輸出方式, 0爲自動輸出返回的內容, 1爲返回輸出的內容,但不自動輸出. curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); //timeout on connect curl_setopt($ch, CURLOPT_TIMEOUT, 30); //timeout on response curl_setopt($ch, CURLOPT_HEADER, 1);//是否輸出頭信息,0爲不輸出,非零則輸出 curl_setopt($ch, CURLOPT_MAXREDIRS, 10 ); curl_setopt($ch, CURLOPT_URL, $url); $content = curl_exec($ch); curl_close($ch); if(preg_match('/noitem\.htm/',$content)){ return 'soldout'; }elseif(preg_match("/'reservePrice'\s*:\s*'([\d\.]+?)',/",$content,$price)){ $price = (float)$price[1]; }elseif(preg_match('/price:([\d\.]+?),/',$content,$price)){ $price = (float)$price[1]; } if(!$price){ preg_match('/id=(\d+)+/',$url,$temp); $url2="http://mdskip.taobao.com/core/initItemDetail.htm?itemId=".$temp[1]; $ch = curl_init(); curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" ); curl_setopt( $ch, CURLOPT_URL, $url2 ); curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true ); curl_setopt( $ch, CURLOPT_ENCODING, "" ); curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true ); curl_setopt( $ch, CURLOPT_REFERER, 'http://www.tmall.com' ); curl_setopt( $ch, CURLOPT_AUTOREFERER, true ); curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false ); curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, 10 ); curl_setopt( $ch, CURLOPT_TIMEOUT, 10 ); curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 ); $price_content = curl_exec( $ch ); $response = curl_getinfo( $ch ); curl_close ( $ch ); $price_content=json_decode(iconv('gbk','utf-8',preg_replace('/(\d{10,}):/','"${1}":',$price_content)),true); $priceinfo=$price_content['defaultModel']['itemPriceResultDO']['priceInfo']; $price=array(); if(is_array($priceinfo)){ foreach ($priceinfo as $v){ $price[]=$v['price']; if(is_array($v['promotionList'])){ foreach ($v['promotionList'] as $v2){ $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price']; } } if(is_array($v['suggestivePromotionList'])){ foreach ($v['suggestivePromotionList'] as $v2){ $price[]=$v2['extraPromPrice']?$v2['extraPromPrice']:$v2['price']; } } } } $price=count($price)>0?min($price):false; } if($price) return $price; elseif($time<3) return getPrice($url,$time+1);//若是沒有取到價格遞歸執行,最多執行3次. else return false; }
執行結果:
tail -10 goodsUpdate.log "2014-03-21 13:45:34",13357,0.273,EQUAL "2014-03-21 13:45:35",13380,5.883,EQUAL "2014-03-21 13:45:35",13343,0.914,EQUAL "2014-03-21 13:45:35",13344,0.923,EQUAL "2014-03-21 13:45:35",13347,0.927,UPDATE,599.00,181.00 "2014-03-21 13:45:35",13339,0.908,EQUAL "2014-03-21 13:45:35",13342,0.93,EQUAL "2014-03-21 13:45:35",13348,0.933,EQUAL "2014-03-21 13:45:35",13349,0.946,UPDATE,1547.00,1877.00 "2014-03-21 13:45:35",13338,0.947,EQUAL
此方法比只用PHP更新大大節約了時間,更新2萬個商品大約是半小時.可是線程數很是很差控制