Node.js crawler: the http, cheerio, and mysql modules

Date: 2019-11-07


Relevant Node.js modules

Fetching page content (http, request, superagent, etc.)

Filtering page information (cheerio)

Printing or storing the results (console, fs, mongodb, mysql, etc.)

1. Using the request module to fetch page content

var request = require('request');

// Read the content of http://cnodejs.org/ with a GET request
request('http://cnodejs.org/', function (error, response, body) {
    if (!error && response.statusCode === 200) {
        // Print the page content
        console.log(body);
    }
});

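To use a different request method, or to set request headers and other options, pass an object as the first argument instead of a URL string. For example: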

var request = require('request');

request({
    url:    'http://cnodejs.org/',   // URL to request
    method: 'GET',                   // request method
    headers: {                       // request headers
        'Accept-Language': 'zh-CN,zh;q=0.8',         // set Accept-Language
        'Cookie': '__utma=4454.11221.455353.21.143;' // set Cookie
    }
}, function (error, response, body) {
    if (!error && response.statusCode === 200) {
        console.log(body); // print the page content
    }
});
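
The title also mentions the built-in http module, which the examples above do not show. As a rough sketch (not part of the original post), the same GET request can be made without a third-party dependency, which may be useful now that the request package is deprecated. The helper name fetchPage is hypothetical, and the sketch only handles plain http:// URLs; an https:// URL would need the https module instead.

```javascript
const http = require('http');

// Hypothetical helper: fetch a page over plain HTTP and resolve with its body.
function fetchPage(url) {
    return new Promise(function (resolve, reject) {
        http.get(url, function (response) {
            if (response.statusCode !== 200) {
                response.resume(); // discard the body so the socket is freed
                reject(new Error('Unexpected status code: ' + response.statusCode));
                return;
            }
            response.setEncoding('utf8');
            let body = '';
            // The body arrives in chunks; collect them until the stream ends
            response.on('data', function (chunk) { body += chunk; });
            response.on('end', function () { resolve(body); });
        }).on('error', reject);
    });
}
```

A call such as fetchPage('http://cnodejs.org/').then(console.log, console.error) would then print either the page body or the error.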

2. Using the cheerio module to extract data from a page

cheerio implements a subset of jQuery Core: the DOM manipulation API that does not depend on a browser. Here is a simple example:

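var cheerio = require('cheerio');

// load() parses the HTML and returns a jQuery-like object
var $ = cheerio.load('<h2 class="title">Hello world</h2>');

// From here on, the usual jQuery syntax works
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

console.log($.html());
// cheerio 0.x prints: <h2 class="title welcome">Hello there!</h2>
// cheerio 1.0 and later wrap the fragment in <html><head></head><body>...</body></html>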

3. Using the mysql module to store data in a database

The mysql module has built-in connection pooling. Here is a simple usage example:

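var mysql = require('mysql');

// Create a database connection pool
var pool = mysql.createPool({
  host:            'localhost', // database host
  user:            'root',      // database user
  password:        '',          // that user's password
  database:        'example',   // database name
  connectionLimit: 10           // maximum number of connections (default: 10)
});

// Before running a SQL query, call pool.getConnection() to obtain a connection
pool.getConnection(function (err, connection) {
  if (err) throw err;

  // connection is now a usable database connection

  // Return the connection to the pool once you are done with it
  connection.release();
});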
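
The snippet above stops once a connection is obtained. As a sketch of the storing step, the helper below inserts scraped rows through a pool like the one created above. The helper name saveTopics, the table name topics, and its columns title and url are assumptions made for illustration. The `?` placeholder lets the mysql module escape the values, and a nested array is expanded into one value group per row.

```javascript
// Hypothetical helper: bulk-insert scraped rows using a mysql connection pool.
// Assumes a table "topics" with columns "title" and "url" already exists.
function saveTopics(pool, topics, callback) {
    if (topics.length === 0) {
        // "VALUES ?" with no rows would produce invalid SQL, so skip the query
        return callback(null, null);
    }

    pool.getConnection(function (err, connection) {
        if (err) return callback(err);

        // One inner array per row: [['t1', 'u1'], ['t2', 'u2']]
        const rows = topics.map(function (topic) {
            return [topic.title, topic.url];
        });

        connection.query(
            'INSERT INTO topics (title, url) VALUES ?',
            [rows],
            function (queryErr, result) {
                connection.release(); // hand the connection back to the pool
                callback(queryErr, result);
            }
        );
    });
}
```

Taking the pool as a parameter keeps the helper independent of the connection settings, so it can be called as saveTopics(pool, topics, function (err, result) { ... }) with whatever pool the program has already created.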

References

Summary of jQuery selectors: https://www.cnblogs.com/xiaxuexiaoab/p/7091527.html
Node.js crawler: https://www.cnblogs.com/xiaxuexiaoab/p/7124956.html
