A First Look at Puppeteer

Date: 2019-11-08

Puppeteer, the puppet

A friendlier Node API for Headless Chrome
Puppets have hearts too (=･ω･=)

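What is Puppeteer?

Puppeteer is a Node library that provides a high-level API for controlling headless Chrome or Chromium over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.

Through Puppeteer's API you can drive Chrome directly and simulate most user actions, either for UI testing or as a crawler that visits pages to collect data.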

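Why did Puppeteer come about?

Front-end developers have needed headless browsers for a long time. The two most common use cases are:

UI automation testing: no more manually browsing and clicking through pages to confirm that features work
Crawling: handling pages whose content is loaded asynchronously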

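Before headless Chrome and Puppeteer appeared, the available headless browsers included:

PhantomJS, based on WebKit
SlimerJS, based on Gecko
HtmlUnit, based on Rhino
TrifleJS, based on Trident
Splash, based on WebKit

All of them shared the same problems: the environment was complicated to set up and the APIs were unfriendly.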

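In 2017 the Chrome team made two major releases in quick succession: Headless Chrome and its Node.js API, Puppeteer. As a direct result, the maintainers of PhantomJS and of Selenium IDE for Firefox announced that they were suspending maintenance of their projects, and the PhantomJS developer went so far as to say he was out of a job.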

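What can Puppeteer do?

Most things you can do manually in the browser can also be done with Puppeteer.
For example:

Generate screenshots and PDFs of pages.
Crawl a SPA (single-page application) and generate pre-rendered content (i.e. "SSR", server-side rendering).
Automate form submission, UI testing, keyboard input, and so on.
Create an up-to-date automated testing environment: run tests directly in the latest version of Chrome, using the latest JavaScript and browser features.
Capture a timeline trace of your site to help diagnose performance issues.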

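Getting started

Install Puppeteer

npm install puppeteer
or
yarn add puppeteer

Puppeteer requires at least Node v6.4.0, but async/await is only supported in Node v7.6.0 or later, so the examples in this article need v7.6.0 or later.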
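A script can check the running Node version before relying on async/await. The following is only a minimal sketch: `meetsMinimum` is a hypothetical helper written for this illustration (it is not part of Puppeteer), and the thresholds are the two versions quoted above.

```javascript
// Compare a dotted version string such as "7.6.0" against a minimum version.
// `meetsMinimum` is a hypothetical helper, not a Puppeteer or Node API.
function meetsMinimum(version, minimum) {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y; // first differing component decides
  }
  return true; // all components are equal
}

// process.versions.node holds the running version without a leading "v".
const current = process.versions.node;
if (!meetsMinimum(current, '6.4.0')) {
  console.error('Node ' + current + ' is older than the minimum stated for Puppeteer');
} else if (!meetsMinimum(current, '7.6.0')) {
  console.warn('Node ' + current + ' has no async/await; use promise chains instead');
}
```

Comparing the components as numbers rather than as strings matters here: a plain string comparison would rank "7.10.0" below "7.6.0".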

Example 1: Saving a screenshot

Navigate to https://example.com and save a screenshot as example.png:

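const puppeteer = require('puppeteer');

// Open `url` and save a screenshot to `<path><name>.png`.
async function screenShot(url, path, name) {
    const file = path + name + '.png';
    console.log('Screen Shot ... ');
    console.log('Save path: ' + file);
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        await page.goto(url);
        await page.screenshot({path: file});
    } finally {
        // Close the browser even if navigation or the screenshot fails.
        await browser.close();
    }
}

screenShot('https://example.com', './', 'example').catch(console.error);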

Puppeteer's default page size is 800x600; it can be changed with page.setViewport().
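As an illustration of changing the size before capturing, the sketch below resizes the viewport and then takes a full-page screenshot. The function name `screenshotAtSize` and its parameters are assumptions made for this example; `page` is expected to be a Puppeteer page obtained from `browser.newPage()`.

```javascript
// Resize the page, then capture it.
// `screenshotAtSize` is a hypothetical helper; `page` is a Puppeteer Page.
async function screenshotAtSize(page, width, height, file) {
  await page.setViewport({ width, height });
  // fullPage: true captures the whole scrollable page, not only the viewport.
  await page.screenshot({ path: file, fullPage: true });
  return { width, height, file };
}

// Usage (requires the puppeteer package):
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   await page.goto('https://example.com');
//   await screenshotAtSize(page, 1280, 720, 'example-1280.png');
//   await browser.close();
```

The viewport is set before the screenshot because the page lays itself out for the current viewport width.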

Example 2: Creating a PDF

const puppeteer = require('puppeteer');

// Open `url` and save the page as `<path><name>.pdf`.
async function downloadPdf(url, path, name) {
    const file = path + name + '.pdf';
    console.log('Download Pdf ... ');
    console.log('Save path: ' + file);
    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();
        // networkidle2: consider navigation finished when there are no more
        // than 2 network connections for at least 500 ms.
        await page.goto(url, {waitUntil: 'networkidle2'});
        await page.pdf({path: file, format: 'A4'});
    } finally {
        // Close the browser even if navigation or PDF generation fails.
        await browser.close();
    }
}

Example 3: Running code in the rendered page

const puppeteer = require('puppeteer');

async function getDimension(url) {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.goto(url);

    // Get the "viewport" of the page, as reported by the page.
    const dimensions = await page.evaluate(() => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio
        };
    });

    console.log('Dimensions:', dimensions);

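    // Left commented out so the visible browser window stays open for inspection;
    // uncomment to close the browser and let the script exit.
    // await browser.close();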
}
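The callback given to page.evaluate is executed inside the page rather than in Node, so it cannot read Node-side variables through closure; values have to be passed as extra arguments. A minimal sketch of that pattern follows. The helper name `countElements` is hypothetical.

```javascript
// Count the elements matching `selector` inside the page.
// `countElements` is a hypothetical helper; `page` is a Puppeteer Page.
async function countElements(page, selector) {
  return page.evaluate(
    // This function runs in the browser; `sel` receives the value of `selector`.
    (sel) => document.querySelectorAll(sel).length,
    selector
  );
}

// Usage (inside an async function, after page.goto):
//   const links = await countElements(page, 'a');
//   console.log('Number of links:', links);
```

Arguments and return values cross the boundary between Node and the browser, so they should be plain serializable data such as strings, numbers, and simple objects.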

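Going further

page.type: focuses an input field and types text into it

page.keyboard.press: simulates pressing a key; key combinations currently do not work on macOS, which is a known bug

page.waitFor: makes the page wait for a duration, an element, or a function (later Puppeteer releases deprecate it in favor of page.waitForSelector and page.waitForFunction)

page.frames(): returns all frames of the current page; you can then pick out the exact iframe you want by its name

iframe.$('.srchsongst'): gets an element inside the iframe

iframe.evaluate(): executes a function in the browser, much like running it in the console, and returns a Promise

Array.from: converts an array-like object into an array

page.click(): clicks an element

iframe.$eval(): equivalent to running document.querySelector in the iframe to get the matching element, which is passed to the callback as its first argument

iframe.$$eval: equivalent to running document.querySelectorAll in the iframe to get an array of matching elements, which is passed to the callback as its first argument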
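The calls listed above can be combined into a small search flow, sketched below. This is an illustration under stated assumptions, not code from the referenced demos: the selectors `#search-input` and `.result-item`, the frame name `results`, and the function name `searchAndCollect` are all placeholders.

```javascript
// `page` is a Puppeteer Page. Every selector and the frame name used here is a
// placeholder for illustration; replace them with values from the real page.
async function searchAndCollect(page, keyword) {
  await page.type('#search-input', keyword); // focus the input and type
  await page.keyboard.press('Enter');        // submit with the Enter key
  await page.waitForSelector('iframe');      // wait until an iframe exists

  // Pick one iframe by name out of all frames on the page.
  const frame = page.frames().find((f) => f.name() === 'results');
  if (!frame) {
    throw new Error('results frame not found');
  }

  await frame.waitForSelector('.result-item');
  // $$eval runs querySelectorAll in the frame and hands the matches to the callback.
  return frame.$$eval('.result-item', (nodes) =>
    Array.from(nodes).map((node) => node.textContent.trim())
  );
}

// Usage (inside an async function, after page.goto):
//   const titles = await searchAndCollect(page, 'puppeteer');
```

Waiting for the selector inside the frame before calling `$$eval` avoids reading the list while the results are still loading.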

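Better yet, read the article: its author wrote two demo examples, and a look at the code is enough to understand the basic usage listed above.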

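Default settings and debugging tips

1. Headless mode

By default Puppeteer launches Chromium in headless mode. To launch the full Chromium instead, which makes it easier to see how the page actually loads, pass headless: false when launching: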

const browser = await puppeteer.launch({headless: false}); // default is true

2. Run a locally installed Chrome or Chromium

const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
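The headless and executablePath options can be combined and driven from the environment, so the same script runs unchanged on different machines. The sketch below is one possible arrangement; the environment variable names `CHROME_PATH` and `SHOW_BROWSER` and the helper `buildLaunchOptions` are assumptions made for this example, not Puppeteer conventions.

```javascript
// Build puppeteer.launch() options from environment variables.
// CHROME_PATH and SHOW_BROWSER are hypothetical names chosen for this sketch.
function buildLaunchOptions(env) {
  // Stay headless unless SHOW_BROWSER=1 is set.
  const options = { headless: env.SHOW_BROWSER !== '1' };
  if (env.CHROME_PATH) {
    // Use a locally installed browser instead of the bundled Chromium.
    options.executablePath = env.CHROME_PATH;
  }
  return options;
}

// Usage (requires the puppeteer package, inside an async function):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(process.env));
```

Leaving executablePath unset when the variable is absent lets Puppeteer fall back to the Chromium it downloaded at install time.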

3. Slow down Puppeteer operations

const browser = await puppeteer.launch({
    headless: false,
    slowMo: 250 // slow each Puppeteer operation down by 250 ms
});

4. Capture console output

You can listen for the page's console event, and you can also trigger console output inside the page through evaluate:

page.on('console', msg => console.log('PAGE LOG:', msg.text()));

await page.evaluate(() => console.log(`url is ${location.href}`));
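Building on the listener above, the message type (log, warning, error, and so on) can be included, and uncaught page errors can be forwarded as well. This is a sketch only; `formatConsoleMessage` and `forwardPageLogs` are hypothetical helper names.

```javascript
// Format a Puppeteer ConsoleMessage for the Node terminal.
// Both helpers are hypothetical names introduced for this sketch.
function formatConsoleMessage(msg) {
  return '[page ' + msg.type() + '] ' + msg.text();
}

// Forward page console output and uncaught page errors to `log`.
function forwardPageLogs(page, log = console.log) {
  page.on('console', (msg) => log(formatConsoleMessage(msg)));
  page.on('pageerror', (err) => log('[page error] ' + err.message));
}

// Usage (inside an async function, before page.goto):
//   forwardPageLogs(page);
//   await page.goto('https://example.com');
```

Registering the listeners before navigation means messages logged while the page loads are captured too.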

5. Set the page viewport size

await page.setViewport({
    width: 1366,
    height: 768 * 2
});

References

任乃千, "Puppeteer的入門教程和實踐" (Puppeteer: a getting-started tutorial and practice): https://www.jianshu.com/p/2f0...

Official documentation: https://github.com/GoogleChro...