Building a dynamic web page crawler with .NET Core + headless Chrome

Date: 2019-12-18

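Ordinary HTTP request libraries can only fetch a page's static content. To scrape content that is generated dynamically by JavaScript, you can use a browser library that has no GUI. Many people used to rely on PhantomJS as their headless browser, but the PhantomJS team has announced that it has stopped maintaining the project, so a replacement is needed. This article therefore uses headless Chrome to scrape dynamic page content.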

The crawler is implemented as follows:

1. Reference the following NuGet packages in the .NET Core project

Selenium.WebDriver
Selenium.WebDriver.ChromeDriver

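Note: After you reference Selenium.WebDriver.ChromeDriver, a chromedriver.exe file is copied into the code directory. The .exe file runs only on Windows, so we need to download the latest Linux build of chromedriver from http://chromedriver.storage.googleapis.com/index.html, add it to the project, and set its property to copy to the output directory. Only then will the published program run correctly on both Linux and Windows.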

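Note 2: The server hosting the crawler must have a Chrome version installed that matches the chromedriver version (installing the latest version of both is enough).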
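The "copy to output directory" property from the first note can also be set by hand in the project file. The fragment below is a minimal sketch, assuming the downloaded Linux binary was saved under the name chromedriver in the same folder as the .csproj; adjust the path if you stored it elsewhere.

```xml
<ItemGroup>
  <!-- Assumption: the Linux binary is named "chromedriver" and sits next to the .csproj -->
  <None Update="chromedriver">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
```

On Linux the copied file also needs execute permission (for example via chmod +x chromedriver) before the driver can be started.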

2. Crawler code

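using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main(string[] args)
    {
        ChromeOptions op = new ChromeOptions();
        op.AddArgument("--headless");   // run Chrome without a GUI
        op.AddArgument("--no-sandbox"); // disable the sandbox so Chrome runs properly on Linux
        // Look for the chromedriver binary in the current working directory; allow up to 180 s per command.
        ChromeDriver cd = new ChromeDriver(Environment.CurrentDirectory, op, TimeSpan.FromSeconds(180));
        try
        {
            cd.Navigate().GoToUrl("http://chart.icaile.com/sd11x5.php");
            // FindElement(By.Id(...)) also works on Selenium 4, where FindElementById was removed.
            string text = cd.FindElement(By.Id("fixedtable")).Text;
            Console.WriteLine(text);
        }
        finally
        {
            cd.Quit(); // always shut down the browser and the driver process
        }
        Console.Read();
    }
}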
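Because the target table is generated by JavaScript, it may not be in the DOM yet at the moment GoToUrl returns. The following is a minimal sketch of one way to handle that, not part of the original article: WaitForElement is a hypothetical helper (it is not a Selenium API) that polls FindElements until the element shows up or a timeout expires. The 10-second timeout and 500 ms polling interval are assumed values chosen for illustration.

```csharp
using System;
using System.Threading;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

static class CrawlerSketch
{
    // Hypothetical helper (not part of Selenium): polls until an element matching
    // `by` appears, or throws once `timeout` has elapsed.
    static IWebElement WaitForElement(IWebDriver driver, By by, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            // FindElements returns an empty collection instead of throwing when nothing matches.
            var found = driver.FindElements(by);
            if (found.Count > 0) return found[0];
            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException($"Element {by} did not appear within {timeout}.");
            Thread.Sleep(500); // assumed polling interval
        }
    }

    static void Main()
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless");
        options.AddArgument("--no-sandbox");
        // Disposing the driver at the end of the using block closes the browser session.
        using (var driver = new ChromeDriver(Environment.CurrentDirectory, options, TimeSpan.FromSeconds(180)))
        {
            driver.Navigate().GoToUrl("http://chart.icaile.com/sd11x5.php");
            IWebElement table = WaitForElement(driver, By.Id("fixedtable"), TimeSpan.FromSeconds(10));
            Console.WriteLine(table.Text);
        }
    }
}
```

Polling with FindElements keeps the sketch within the core Selenium.WebDriver API; Selenium's own wait classes could serve the same purpose if they are available in your version.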
