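Crawling Pages That Need to Run JS with PHP (Run JS While Grabbing Web Page With PHP)

Install PhantomJS as follows.

Taking CentOS as an example, download the Linux 64-bit build (make sure to pick the 32-bit or 64-bit build that matches your system):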

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2

Extract the archive:

tar xvf phantomjs-2.1.1-linux-x86_64.tar.bz2

Copy the executable into the bin directory:

cp phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin

Write a small JS script to check that the installation works (saved here as helloworld.js):

/**
 * Created by liwei on 2017/3/6.
 */
console.log(' hello world');
// Without this call PhantomJS keeps running after the script finishes
phantom.exit();

Run the command:

phantomjs helloworld.js
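
Because the PHP script further down points at /usr/local/bin/phantomjs, the same sanity check can also be made from PHP before sending any request. The following is only a minimal sketch: `phantomjsVersion` is a hypothetical helper name, and it assumes the binary answers to the `--version` flag and that `shell_exec` is enabled on the host.

```php
<?php
/**
 * Return the version string printed by the PhantomJS binary at $path,
 * or null when the file is missing, not executable, or prints nothing.
 * Hypothetical helper, for illustration only.
 */
function phantomjsVersion(string $path): ?string
{
    if (!is_executable($path)) {
        return null;
    }

    // escapeshellarg() guards against spaces or shell metacharacters in the path
    $output = shell_exec(escapeshellarg($path) . ' --version');
    $version = is_string($output) ? trim($output) : '';

    return $version === '' ? null : $version;
}

// Path used by the copy step above
$version = phantomjsVersion('/usr/local/bin/phantomjs');
echo $version === null ? 'phantomjs not found' : "phantomjs $version", PHP_EOL;
```

If this prints "phantomjs not found", the copy step probably did not place the binary where the PHP client expects it.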

You can also refer to

https://laravel-china.org/topics/3590/php-crawls-the-page-that-needs-to-run-js-run-js-grabing-web-page-with-php-while

and the official site http://jonnnnyw.github.io/php-phantomjs/4.0/

 

<?php
/**
 * Created by PhpStorm.
 * User: liwei
 * Date: 2017/3/6
 * Time: 4:18 PM
 */

require __DIR__ . '/vendor/autoload.php';


use JonnyW\PhantomJs\Client;

$client = Client::getInstance();

// Path to the PhantomJS binary installed earlier
$client->getEngine()->setPath('/usr/local/bin/phantomjs');

/**
 * @see JonnyW\PhantomJs\Http\Request
 **/
$request = $client->getMessageFactory()->createRequest('http://www.****.com/ckplayer/js/play.php?v=45554f535e1b140c5b0f4a520342544e0259144042575e5102150c5b5e56544f4b480f4a4c4c575b03494f4e411e4d4a491c53131d130f5b1c4b591546597efd2a&t=qq', 'GET');
// Send a Referer header along with the request
$request->setHeaders(["Referer" => "http://www.****.com/play/45453-0-1.html"]);


/**
 * @see JonnyW\PhantomJs\Http\Response
 **/
$response = $client->getMessageFactory()->createResponse();

// Send the request
$client->send($request, $response);

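$html = $response->getContent();

// Look for the URL that follows "#url=" in the rendered page
$reg = '|\#url=(http[^"]+)|';
if (preg_match($reg, $html, $matches)) {

    $url = urldecode($matches[1]);
    echo $url;
    //var_dump($matches);
    exit;
}

// No URL matched: fall back to dumping the page if the request succeeded
if ($response->getStatus() === 200) {

    // Dump the requested page content
    echo $response->getContent();
}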
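
The extraction step can be tried on its own, without PhantomJS, by feeding the same regular expression a sample string. This is only a sketch: `extractPlayerUrl` is a hypothetical helper name and the sample markup is made up for illustration. Only the pattern itself comes from the script above.

```php
<?php
/**
 * Return the decoded URL that follows "#url=" in $html, or null if absent.
 * Hypothetical helper, for illustration only.
 */
function extractPlayerUrl(string $html): ?string
{
    // Same pattern as the script above: capture up to the next double quote
    if (preg_match('|\#url=(http[^"]+)|', $html, $matches)) {
        return urldecode($matches[1]);
    }

    return null;
}

// Made-up sample input, not taken from a real page
$sample = '<embed src="player.swf#url=http%3A%2F%2Fexample.com%2Fvideo.mp4">';
echo extractPlayerUrl($sample), PHP_EOL;
```

Keeping the pattern in a function like this makes it easy to adjust when the target page changes its markup, since it can be tested without a network round trip.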