node.js: garbled Chinese text when crawling web pages

Date: 2019-11-11

Tags: node.js, node, crawling Chinese pages. Category: Node.js


Thanks to the author of this library. It works well:

https://github.com/sylvinus/node-crawler

I followed the example on the project page, and as soon as I fetched a Chinese page, the text came out garbled.

I searched around on Bing and Baidu for a while.

The results weren't very helpful, but they did tell me the keyword to look for: iconv.
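To see why the page came out garbled, here is a minimal sketch (not from the original post) that takes the GB2312 bytes for "中文" and reads them as UTF-8, which is what happens when a GB2312 page is decoded with the wrong encoding. The byte values are a hand-picked sample, not captured from a real page.

```javascript
// "中文" encoded in GB2312: two bytes per character (hand-picked sample bytes).
const gbBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// The same two characters encoded in UTF-8: three bytes per character.
const utf8Bytes = Buffer.from('中文', 'utf8');

// Reading GB2312 bytes as if they were UTF-8 does not reproduce the text;
// the invalid byte sequences become replacement characters (U+FFFD).
const garbled = gbBytes.toString('utf8');

console.log(gbBytes.length);   // 4
console.log(utf8Bytes.length); // 6
console.log(garbled);
```

The fix is therefore not to "clean up" the garbled string afterwards but to decode the original bytes with the correct source encoding, which is what iconv-style conversion does.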

I took a look at the crawler.js source code, and it turns out this is already handled.

You just need to set two options when creating the crawler: forceUTF8 to true and incomingEncoding to gb2312.
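Conceptually, those two options mean "decode the raw response bytes from the declared source encoding into a Unicode string". Below is a minimal sketch of that step. As an assumption, it uses Node's built-in TextDecoder (which needs the full ICU data that official Node builds ship) in place of the iconv conversion mentioned above, and toUtf8 is a hypothetical helper, not part of node-crawler.

```javascript
// Hypothetical helper: decode raw bytes from a known source encoding.
// Under the WHATWG Encoding standard, 'gb2312' is a label for the GBK decoder.
function toUtf8(body, incomingEncoding) {
  const decoder = new TextDecoder(incomingEncoding);
  return decoder.decode(body);
}

const gbBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]); // "中文" in GB2312
const text = toUtf8(gbBytes, 'gb2312');
console.log(text); // 中文

// Writing the string back out as UTF-8, as the crawler example does with fs.writeFileSync.
const utf8 = Buffer.from(text, 'utf8');
console.log(utf8.length); // 6
```

Note that this only works when you know the source encoding in advance, which is exactly why the post hard-codes gb2312 for this site.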

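The source code also has a TODO comment about detecting the encoding by parsing the HTML content itself. I hope whoever reads this post will implement it.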
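One possible take on that TODO, as a rough sketch: sniffCharset and decodeHtml are hypothetical helpers, not node-crawler APIs. They check for a charset in the Content-Type header first, then look for a <meta> charset declaration in the first 1024 bytes, and otherwise fall back to UTF-8. The regex approach is a simplification of the prescan algorithm browsers use, so it will miss unusual markup.

```javascript
// Hypothetical helper: guess the page encoding from the header, then from <meta>.
function sniffCharset(body, contentType) {
  const fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (fromHeader) return fromHeader[1].toLowerCase();

  // <meta> tags are ASCII, so a latin1 view of the first bytes is enough to find them.
  const head = body.subarray(0, 1024).toString('latin1');
  const fromMeta = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (fromMeta) return fromMeta[1].toLowerCase();

  return 'utf-8';
}

// Hypothetical helper: decode the raw body using the sniffed encoding.
function decodeHtml(body, contentType) {
  const charset = sniffCharset(body, contentType);
  let decoder;
  try {
    decoder = new TextDecoder(charset);
  } catch (e) {
    // Unknown or unsupported label: fall back to UTF-8.
    decoder = new TextDecoder('utf-8');
  }
  return decoder.decode(body);
}

// Usage: a tiny GB2312 page that declares its charset in a <meta> tag.
const page = Buffer.concat([
  Buffer.from('<html><head><meta charset="gb2312"></head><body>'),
  Buffer.from([0xd6, 0xd0, 0xce, 0xc4]), // "中文" in GB2312
  Buffer.from('</body></html>'),
]);
console.log(sniffCharset(page)); // gb2312
console.log(decodeHtml(page).includes('中文')); // true
```

For this to be useful inside a crawler, it has to run on the raw, undecoded response bytes, before anything converts them to a string.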

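var Crawler = require('crawler');
var url = require('url');
var fs = require('fs');

var c = new Crawler({
    maxConnections: 1,
    debug: true,
    forceUTF8: true,            // convert the fetched page to UTF-8
    incomingEncoding: 'gb2312', // the page's source encoding
    // (error, result, $) is the callback signature of the node-crawler version used in this post
    callback: function (error, result, $) {
        if (error) {
            console.error(error);
            return;
        }
        //console.log($.html());
        // Save the (now UTF-8) page to disk
        fs.writeFileSync('sina.html', $.html(), 'utf8');

        $('a').each(function (index, a) {
            var href = $(a).attr('href');
            //console.log(href);
        });
    }
});

// Hypothetical start URL: the post only implies a Sina page via the output file name
c.queue('http://www.sina.com.cn/');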