Five commonly used PHP curl examples


 

1. Fetching a page that has no access control

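<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
// With RETURNTRANSFER, curl_exec() returns the page as a string;
// comment this line out and curl_exec() prints the page directly instead.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if ($result === false) {
    echo 'curl error: ' . curl_error($ch);
}
curl_close($ch);
?>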
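The same request can be wrapped in a small function that also reports failures. When the transfer itself fails (for example, the host cannot be reached), curl_exec() returns false. This is a minimal sketch: fetch_url() is a hypothetical name, and it uses curl_setopt_array() to set the same three options in one call.

```php
<?php
// Sketch: the section-1 request wrapped in a function that reports failures.
// fetch_url() is a hypothetical helper, not part of the curl extension.
function fetch_url(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false, // body only, no response headers
        CURLOPT_RETURNTRANSFER => true,  // curl_exec() returns the body instead of printing it
    ]);
    $result = curl_exec($ch);
    // curl_exec() returns false when the transfer itself fails
    $error = curl_error($ch);
    curl_close($ch);
    if ($result === false) {
        throw new RuntimeException('curl error: ' . $error);
    }
    return $result;
}
```

Calling fetch_url('http://localhost/mytest/phpinfo.php') would return the page as a string. If the transfer fails, it throws a RuntimeException carrying curl_error()'s message.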

 

2. Fetching through a proxy

 

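Why fetch through a proxy? Take Google as an example. If you scrape Google too frequently in a short period, you will stop getting data, because Google restricts your IP address. When that happens, you can switch to a proxy and fetch again.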

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
// The proxy is given as a "host:port" string
curl_setopt($ch, CURLOPT_PROXY, '125.21.23.6:8080');
// If the proxy requires a password, add this line:
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$result = curl_exec($ch);
curl_close($ch);
?>
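You can automate switching proxies when one gets blocked by trying a list of proxies in turn. The sketch below assumes that a blocked proxy shows up either as a transport error or as an HTTP status of 400 or above. The proxy addresses and both function names are hypothetical. The fetcher is passed in as a parameter so the retry logic can be exercised without a network.

```php
<?php
// Sketch: try proxies in order until one returns a usable response.
// curl_via_proxy() and fetch_with_proxies() are hypothetical helpers.
function curl_via_proxy(string $url, string $proxy)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_PROXY          => $proxy, // "host:port"
        CURLOPT_CONNECTTIMEOUT => 10,     // give up on dead proxies quickly
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    // Assumption: transport errors and HTTP errors (e.g. 403/429) mean "blocked"
    return ($body === false || $status >= 400) ? false : $body;
}

function fetch_with_proxies(string $url, array $proxies, ?callable $fetch = null): string
{
    $fetch = $fetch ?? 'curl_via_proxy';
    foreach ($proxies as $proxy) {
        $body = $fetch($url, $proxy);
        if ($body !== false) {
            return $body; // first proxy that works wins
        }
    }
    throw new RuntimeException('all proxies failed');
}
```

In real use you would call fetch_with_proxies($url, $proxyList) and let it fall back to curl_via_proxy().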

 

3. POSTing data and fetching the response

 

Submitting data deserves its own section. When you use curl you very often need to exchange data with the server, so this case matters.

<?php
$ch = curl_init();
/* Note: the values submitted here must not be nested (two-dimensional or deeper) arrays.
 * array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010') causes an error;
 * flatten nested values into strings first, e.g.
 * array('name'=>serialize(array('tank','zhang')),'sex'=>1,'birth'=>'20101010') */
$data = array('name' => 'test', 'sex' => 1, 'birth' => '20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_exec($ch); // no RETURNTRANSFER, so upload.php's output is printed directly
curl_close($ch);
?>

 

In upload.php, call print_r($_POST). curl then fetches what upload.php outputs: Array ( [name] => test [sex] => 1 [birth] => 20101010 )
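If you do need to send nested arrays, one option is to encode the form yourself with http_build_query(). It flattens them into name[0]=...&name[1]=... pairs, which PHP on the receiving side turns back into arrays in $_POST. Here is a sketch with hypothetical helper names. This changes the request body from multipart/form-data (what curl sends when CURLOPT_POSTFIELDS is an array) to application/x-www-form-urlencoded (what it sends for a string).

```php
<?php
// Sketch: encode (possibly nested) form data yourself before POSTing.
// encode_form() and post_form() are hypothetical helpers.
function encode_form(array $data): string
{
    // array('name' => array('tank', 'zhang')) becomes name%5B0%5D=tank&name%5B1%5D=zhang
    return http_build_query($data);
}

function post_form(string $url, array $data): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // a string body is sent as application/x-www-form-urlencoded
        // (an array would be sent as multipart/form-data instead)
        CURLOPT_POSTFIELDS     => encode_form($data),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $result = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($result === false) {
        throw new RuntimeException('curl error: ' . $error);
    }
    return $result;
}
```

With the upload.php above, post_form('http://localhost/mytest/curl/upload.php', array('name' => array('tank', 'zhang'))) should return print_r output in which name is a nested array.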

 

4. Fetching pages protected by access control

 

I wrote an earlier post on three ways to implement page access control; have a look if you are interested.

 

If you fetch such a page with the methods above, you get the following error:

 

You are not authorized to view this page

You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.

 

In that case, we need CURLOPT_USERPWD to authenticate.

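<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD supplies a "username:password" pair for pages behind
 * HTTP access control, such as the directories we usually protect with htpasswd.
 * Replace the value with your own credentials. */
curl_setopt($ch, CURLOPT_USERPWD, '231144:2091XTAjmd=');
// Let curl pick whichever auth scheme the server offers (Basic, Digest, NTLM, ...)
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
?>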
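htpasswd-protected Apache directories typically use HTTP Basic authentication. Under that scheme, CURLOPT_USERPWD results in an Authorization header carrying base64("username:password"). The sketch below shows that header explicitly. It also wraps the request in a function that treats HTTP 401 as a failed login. Both helper names are hypothetical.

```php
<?php
// Sketch: what CURLOPT_USERPWD amounts to under HTTP Basic authentication.
// basic_auth_header() and fetch_protected() are hypothetical helpers.
function basic_auth_header(string $user, string $password): string
{
    // Basic auth sends base64("user:password"): encoding, not encryption
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

function fetch_protected(string $url, string $user, string $password): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => $user . ':' . $password,
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,
    ]);
    $body = curl_exec($ch);
    $error = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . $error);
    }
    if ($status === 401) {
        // the server rejected the credentials
        throw new RuntimeException('authentication failed (HTTP 401)');
    }
    return $body;
}
```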

 

5. Simulating a login to Sina

 

The data we want to fetch may only be available after logging in. In that case we need curl to simulate the login.

<?php
function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com.cn/index.html");
    curl_setopt($ch, CURLOPT_HEADER, true); // keep headers: the Location header is checked below
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR); // received cookies are saved here when the handle is released
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com.cn/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "logintype=uid&u=" . urlencode($user) . "&psw=" . urlencode($password));
    $contents = curl_exec($ch);
    curl_close($ch);
    // A successful login answers with a redirect to .../cgi/index.php?check_time=...
    if (!preg_match("/Location: (.*)\\/cgi\\/index\\.php\\?check_time=(.*)\n/", $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}

// Fall back to a fixed user agent when run from the command line (no HTTP_USER_AGENT)
define("USERAGENT", isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : 'Mozilla/5.0');
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);

echo checklogin("zhangying215", "xtaj227");
?>
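Here are two possible follow-ups to the login example, sketched with hypothetical helper names. The first reads the Location header out of a raw response, which is more tolerant than matching the exact Sina redirect URL. The second makes a later request that sends the cookies saved in the jar, using CURLOPT_COOKIEFILE to read them and CURLOPT_COOKIEJAR to write updates back.

```php
<?php
// Sketch: hypothetical helpers for working with a logged-in session.

// Return the first Location header value in a raw response
// (fetched with CURLOPT_HEADER => true), or null if there is none.
function extract_location(string $rawResponse): ?string
{
    // /m: ^ and $ match per line; /i: header names are case-insensitive
    if (preg_match('/^Location:[ \t]*(\S+)[ \t]*\r?$/mi', $rawResponse, $m)) {
        return $m[1];
    }
    return null;
}

// Later request that sends the cookies saved by the login request.
function fetch_with_cookies(string $url, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from the jar...
        CURLOPT_COOKIEJAR      => $cookieFile, // ...and save updated ones back to it
    ]);
    $body = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . $error);
    }
    return $body;
}
```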
 
Open the cookie file under /tmp and take a look:
 
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
 
mail.sina.com.cn    FALSE    /    FALSE    0    SINAMAIL-WEBFACE-SESSID    65223c4bd8900284ed463d2a3e1ac182
#HttpOnly_.sina.com.cn    TRUE    /    FALSE    0    SUE    es%3D8d96db0820c6c79922ad57d422f575e8%26ev%3Dv0%26es2%3Dcddfb8400dc5ca95902367ddcd7f57dd
.sina.com.cn    TRUE    /    FALSE    0    SUP    cv%3D1%26bt%3D1286900433%26et%3D1286986833%26lt%3D1%26uid%3D1445632344%26user%3D%25E5%25BC%25A0%25E6%2598%25A02001%26ag%3D2%26name%3Dzhangying20015%2540sina.com%26nick%3D%25E5%25BC%25A0%25E6%2598%25A02001%26sex%3D1%26ps%3D0%26email%3Dzhangying20015%2540sina.com%26dob%3D1982-07-18
#HttpOnly_.sina.com.cn    TRUE    /    FALSE    0    SID    BihcallomxMx-QZxzGrOlcSQx%2F0B%2F0cmr.NyQ%2F0B%2FcmGGalmarlmcHrcGlSmrmxmfxal_CBZ%2F_afugCmmGirBYHm0Bc%40fr5ciZiGG5i
#HttpOnly_.sina.com.cn    TRUE    /    FALSE    0    SPRIAL    bfb4102951fd5892a3fd5b42d442cd26
#HttpOnly_.sina.com.cn    TRUE    /    FALSE    0    SINA_USER    %D5%C5%D2001
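The jar uses the Netscape cookie file format. Each line holds one cookie with seven fields: domain, include-subdomains flag, path, secure flag, expiry, name, value. Lines prefixed with #HttpOnly_ are cookies marked HttpOnly, not comments. Below is a small parser sketch that assumes that layout; parse_cookie_jar() is hypothetical. libcurl writes the fields tab-separated, but the sketch splits on any whitespace so that copies like the listing above, where the separators appear as spaces, still parse.

```php
<?php
// Sketch: parse a Netscape-format cookie jar into name => value.
// parse_cookie_jar() is a hypothetical helper.
function parse_cookie_jar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        $line = trim($line);
        if (strpos($line, '#HttpOnly_') === 0) {
            // "#HttpOnly_" marks an HttpOnly cookie; strip it and parse normally
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or real comment
        }
        // domain, include-subdomains, path, secure, expiry, name, value
        $fields = preg_split('/\s+/', $line, 7);
        if (count($fields) >= 6) {
            $cookies[$fields[5]] = $fields[6] ?? ''; // empty value if the 7th field is missing
        }
    }
    return $cookies;
}
```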