PhantomJS inter-process communication approaches

Date: 2019-11-18


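PhantomJS^[1] is a headless WebKit browser that can be used for automated web page testing. A recent project of mine required PhantomJS to communicate with other processes, so this post describes how another process can call PhantomJS and use it as a data interface.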

Goal: call PhantomJS from another program, using Java as the example.

1. Command-line approach

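PhantomJS can be started from the command line; in Java this is done with Runtime.getRuntime().exec(String command). There are plenty of examples of this online, so I will not go into detail here. With this approach every call starts a new PhantomJS process, which exits when the call finishes. Because starting a PhantomJS process is relatively slow, this approach is not suitable for production.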
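For completeness, here is a minimal sketch of the command-line approach using ProcessBuilder rather than Runtime.exec. The class name PhantomCommandLine and its methods are made up for illustration. It assumes that phantomjs is on the PATH and that the script ends by calling phantom.exit(); otherwise the process never terminates and the read loop never returns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs one PhantomJS process per call.
class PhantomCommandLine {

    // Builds the argument list: the executable, the script, then the script's arguments.
    static List<String> buildCommand(String scriptPath, String... scriptArgs) {
        List<String> command = new ArrayList<>();
        command.add("phantomjs");   // assumption: phantomjs is on the PATH
        command.add(scriptPath);
        command.addAll(Arrays.asList(scriptArgs));
        return command;
    }

    // Starts the process, collects everything it prints, and waits for it to exit.
    static String runOnce(String scriptPath, String... scriptArgs)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(scriptPath, scriptArgs));
        builder.redirectErrorStream(true);   // merge stderr into stdout so only one pipe must be drained
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java PhantomCommandLine.java <script.js> [args...]");
            return;
        }
        String[] scriptArgs = Arrays.copyOfRange(args, 1, args.length);
        System.out.print(runOnce(args[0], scriptArgs));
    }
}
```

Passing the command as a list avoids the whitespace splitting that Runtime.exec(String) performs, which matters when a script path or URL contains spaces.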

2. Driver approach

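Selenium provides PhantomJSDriver, which exposes a set of methods for driving PhantomJS directly from Java. With this approach you can only call the methods the driver provides and cannot run your own js files directly. That was not flexible enough for this project, so I did not use it. If you need it, see the PhantomJSDriver^[2] documentation.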

3. Webserver approach

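PhantomJS ships a Webserver^[3] module that can be used to build an HTTP server. With the Webserver module listening on a port and Java sending HTTP requests to it, the two processes can communicate.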

PhantomJS acts as the server: it loads the site at a given URL, extracts its title, and returns the title.

    var webserver = require('webserver').create();
    var page = require('webpage').create();
    var system = require('system');

    var port = system.args[1];    // the second argument is the port number

    webserver.listen(port, function(request, response) {
        var url = request.postRaw;        // the POST data is the URL to look up
        page.open(url, function(status) {
            var title = page.evaluate(function() {
                return document.title;
            });
            response.write(title);
            response.close();
        });
    });

Java acts as the client and looks up the title for each URL in a list.

    public class Demo {
        public static void main(String[] arg) {
            // URLs to look up
            String[] urls = new String[]{
                "http://www.baidu.com/",
                "http://www.cnblogs.com/",
                "http://www.w3school.com.cn/"
            };
            for (int i=0; i<urls.length; i++) {
                // The Http class is not shown here; it can wrap HttpURLConnection or HttpClient
                Http http = new Http("http://127.0.0.1:9999");    // port opened by PhantomJS
                http.setParam(urls[i]);        // set the POST parameter (the URL)
                http.post();                   // send the POST request
                System.out.println(http.getResponse());
            }
        }
    }

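Save the PhantomJS script as D:/script.js and load it with PhantomJS (phantomjs D:/script.js 9999). When Java sends a request, PhantomJS takes the POST data of the request as the URL to look up, gets that site's title, and returns it in the response. Java then prints the title from the response to the console.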
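The Http class used by the Java client is deliberately left out. As a rough idea of what it could look like on top of HttpURLConnection, here is a minimal sketch. The class name PhantomHttpClient, its post method, and the timeout values are my own assumptions, not part of the original project. The Content-Type is set to application/x-www-form-urlencoded because, as far as I understand the Webserver documentation, request.postRaw is only filled in for that content type.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the Http class that the Demo client refers to.
class PhantomHttpClient {

    // Sends body as the raw POST data and returns the response body as text.
    static String post(String endpoint, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);            // required before writing a request body
        conn.setConnectTimeout(5000);      // arbitrary values; tune for real page load times
        conn.setReadTimeout(30000);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toString("UTF-8");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java PhantomHttpClient.java <endpoint> <url-to-look-up>");
            return;
        }
        System.out.println(post(args[0], args[1]));
    }
}
```

With the script above listening on port 9999, a call such as PhantomHttpClient.post("http://127.0.0.1:9999", "http://www.baidu.com/") would play the role of the setParam, post, and getResponse calls in the Demo class.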

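In the latest version at the time of writing, the Webserver module is not very mature, especially where concurrency is concerned, so I do not recommend this approach under high concurrency.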

4. std (standard streams) approach

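This is the most basic form of inter-process communication, and it is somewhat more stable than the Webserver approach. Like that approach, though, it is not suited to high concurrency.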

Start with the Java side. PhantomjsConnector maintains the std streams between Java and PhantomJS.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.io.PrintWriter;

    public class PhantomjsConnector {
        private String pid;        // PID of the PhantomJS process
        private OutputStream out;
        private PrintWriter writer;
        private InputStream in;
        private InputStreamReader inReader;
        private BufferedReader reader;

        public PhantomjsConnector() {
            try {
                Process process = Runtime.getRuntime().exec("phantomjs D:/script.js");    // start phantomjs from the command line
                // set up the IO streams
                in = process.getInputStream();
                inReader = new InputStreamReader(in, "utf-8");
                reader = new BufferedReader(inReader);
                pid = reader.readLine();        // the phantomjs script prints its own PID first
                out = process.getOutputStream();
                writer = new PrintWriter(out);
            } catch (Exception e) {
                close();
                e.printStackTrace();
            }
        }

        // End the process maintained by this connector
        public void kill() {
            try {
                close();    // close the IO streams first
                Runtime.getRuntime().exec("taskkill /F /PID " + pid);    // Windows command; on Linux use kill -9 pid
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // Run a query
        public String exec(String url) throws IOException {
            writer.println(url);         // write the url to phantomjs
            writer.flush();              // send it immediately
            return reader.readLine();    // read the output from phantomjs
        }

        // Close the IO streams
        private void close() {
            try {
                if (in!=null) in.close();
                if (inReader!=null) inReader.close();
                if (reader!=null) reader.close();
                if (out!=null) out.close();
                if (writer!=null) writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

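On instantiation, Java starts the PhantomJS process from the command line and keeps the IO streams open. To run a query, it writes a line (the URL) to the process's stdin and then reads a line (the title returned by PhantomJS) from its stdout. When the program is done, it can end the PhantomJS process by its pid.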
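One thing to keep in mind with this read-a-line protocol is that reader.readLine() blocks until PhantomJS prints something. If that is a concern, the query can be wrapped in a timeout. The sketch below is only one possible way to do it; TimedCall and callWithTimeout are hypothetical names, and the helper works with any Callable, so it could wrap a call such as connector.exec(url).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a blocking query with an upper bound on the wait.
class TimedCall {

    // Returns the query result, or null if it did not finish within timeoutMillis.
    static String callWithTimeout(Callable<String> query, long timeoutMillis) throws Exception {
        // Daemon thread, so a read that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "phantomjs-query");
            thread.setDaemon(true);
            return thread;
        });
        Future<String> future = executor.submit(query);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // interrupts the worker; a blocked stream read may ignore this
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for connector.exec(url): one fast query and one that is too slow.
        System.out.println(callWithTimeout(() -> "fast title", 1000));
        System.out.println(callWithTimeout(() -> {
            Thread.sleep(2000);
            return "late title";
        }, 100));
    }
}
```

After a timeout the connector is out of step with PhantomJS, because a late answer would be read as the reply to the next URL. The safest reaction is probably to call kill() and create a new connector.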

The main class only needs to call PhantomjsConnector's exec method in a loop.

    import java.io.IOException;

    public class Demo {
        public static void main(String[] arg) throws IOException {
            // URLs to look up
            String[] urls = new String[]{
                "http://www.baidu.com/",
                "http://www.cnblogs.com/",
                "http://www.w3school.com.cn/"
            };
            PhantomjsConnector connector = new PhantomjsConnector();

            for (int i=0; i<urls.length; i++) {
                String title = connector.exec(urls[i]);
                System.out.println(title);
            }

            connector.kill();    // end the process when done
        }
    }

Now the PhantomJS side. The js script first prints the pid of its own process and then listens in a loop for input on std.

    var system = require("system");
    console.log(system.pid);    // pid of this process

    // listen for std input
    var listen = function() {
        var url = system.stdin.readLine();    // the line read from std is the url
        var page = require('webpage').create();
        page.open(url, function(status) {
            var title = page.evaluate(function() {
                return document.title;
            });
            system.stdout.writeLine(title);    // send the title back through stdout
            system.stdout.flush();             // send it immediately

            // wait briefly before listening for the next url
            setTimeout(function() {
                listen();
            }, 100);
        });
    };

    listen();

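With only one process running, PhantomJS can execute only one query at a time; it cannot listen for the next URL until the current query has finished. In a multithreaded scenario, the Java side can start several PhantomJS processes, one per PhantomjsConnector instance, and hand the instances out to threads as needed (a BlockingQueue can do this). I will not go into the details here.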
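To give a rough idea of the BlockingQueue suggestion, here is a minimal sketch of a pool that lends connectors to threads. ConnectorPool, Task, and withConnector are hypothetical names, and the pool is generic so that the example runs without PhantomJS; in practice T would be PhantomjsConnector.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pool: each connector is used by at most one thread at a time.
class ConnectorPool<T> {

    // A task that uses a borrowed connector and may throw, e.g. c -> c.exec(url).
    interface Task<T, R> {
        R run(T connector) throws Exception;
    }

    private final BlockingQueue<T> idle;

    ConnectorPool(Collection<T> connectors) {
        // Capacity equals the number of connectors, so returning one never blocks.
        idle = new ArrayBlockingQueue<>(connectors.size(), false, connectors);
    }

    // Waits for a free connector, runs the task with it, then hands it back.
    <R> R withConnector(Task<T, R> task) throws Exception {
        T connector = idle.take();   // blocks while every connector is busy
        try {
            return task.run(connector);
        } finally {
            idle.put(connector);
        }
    }

    // Number of connectors currently not in use.
    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws Exception {
        // Strings stand in for PhantomjsConnector instances here.
        ConnectorPool<String> pool = new ConnectorPool<>(Arrays.asList("connector-1", "connector-2"));
        String result = pool.withConnector(c -> c + " handled http://www.baidu.com/");
        System.out.println(result);
    }
}
```

Each worker thread would then call something like pool.withConnector(c -> c.exec(url)). The number of connectors in the pool caps how many PhantomJS processes run at once.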

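Besides these, there are integrated drivers for other languages, such as the phantomjs-node module for nodejs. The approaches above are only the basic ones; which one to choose depends on the business requirements.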

References:

[1] PhantomJS: PhantomJS official site.
http://phantomjs.org/

[2] PhantomJSDriver: GitHub. GhostDriver.
https://github.com/detro/ghostdriver

[3] Webserver module: PhantomJS official site. PhantomJS API.
http://phantomjs.org/api/webserver/