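Whatever state you are in, change what you can change; being better than before is enough. Here is a small novel downloader.
Namespace imports (the HtmlAgilityPack package must be installed):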
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using HtmlAgilityPack;
Scraping the content:
class Program
{
    // Output file that the chapter text is appended to.
    static string m_strTextPath = "D:\\333.txt";

    static void Main(string[] args)
    {
        try
        {
            LoadNovel();
        }
        catch (Exception e)
        {
            Console.WriteLine(e.ToString());
            throw;
        }
        Console.ReadKey();
    }

    /// <summary>
    /// Downloads the index page, then every chapter it links to.
    /// </summary>
    // Plain synchronous method: nothing is awaited, and "async void" would
    // let exceptions escape the try/catch in Main.
    static void LoadNovel()
    {
        string l_strURL = "http://www.vodtw.com/html/book/34/34009/";
        WebClient wc = new WebClient();
        wc.BaseAddress = l_strURL;
        // On .NET Core / .NET 5+, call
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first,
        // otherwise "gb2312" is not available.
        wc.Encoding = Encoding.GetEncoding("gb2312");
        HtmlDocument doc = new HtmlDocument();
        string html = wc.DownloadString("index.html");
        doc.LoadHtml(html);
        HtmlNode navNode1 = doc.DocumentNode.SelectSingleNode("/html/body/div[7]/div[5]/dl/dd/ul");
        HtmlNodeCollection CNodes1 = navNode1.SelectNodes("child::li");
        // Collect the URL of the <a> tag inside every <li> into a list.
        List<string> list = new List<string>();
        foreach (HtmlNode item in CNodes1)
        {
            list.Add(item.FirstChild.Attributes["href"].Value);
        }
        // Download each URL in the list and append its text to D:\333.txt.
        foreach (string l_str in list)
        {
            string html1 = wc.DownloadString(l_str);
            // Requests that are sent too frequently cause problems, so pause briefly.
            Thread.Sleep(100);
            doc.LoadHtml(html1);
            // Chapter title; it can be skipped because it also appears in the content.
            //HtmlNode CaptureName = doc.DocumentNode.SelectSingleNode("html/body/div[3]/div[1]/b");
            //File.AppendAllText("D:\\333.txt", CaptureName.InnerText + "\r\n");
            // Load the chapter content.
            HtmlNode navNode2 = doc.DocumentNode.SelectSingleNode("html/body/div[4]/div[4]");
            HtmlNodeCollection CNodes2 = navNode2.SelectNodes("child::p");
            foreach (HtmlNode ddd in CNodes2)
            {
                // This separator line marks the end of the chapter text.
                if (ddd.InnerText == "===========================") break;
                File.AppendAllText(m_strTextPath, ddd.InnerText + "\r\n");
            }
        }
    }
}
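The code comment about overly frequent requests suggests the site sometimes rejects or drops downloads. A fixed `Thread.Sleep(100)` spaces the requests out but does not recover from a failed one. Below is a minimal sketch of one possible approach, retrying with a growing delay. The `Retry` class, the `WithBackoff` name, and the default attempt count and delays are all hypothetical choices for illustration, not part of the original program.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of the original program.
static class Retry
{
    /// <summary>
    /// Runs <paramref name="action"/> and retries it when a WebException is thrown,
    /// doubling the wait between attempts. The last failure is rethrown.
    /// </summary>
    public static T WithBackoff<T>(Func<T> action, int maxAttempts = 3, int initialDelayMs = 200)
    {
        int delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            // Only swallow the error while attempts remain; otherwise let it propagate.
            catch (WebException) when (attempt < maxAttempts)
            {
                Thread.Sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```

In the download loop, the call `wc.DownloadString(l_str)` could then be wrapped as `Retry.WithBackoff(() => wc.DownloadString(l_str))`. Whether three attempts and a 200 ms starting delay suit this site is an assumption that would need to be tuned.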
Afterwards, the scraped data still needs to be checked for completeness and for any junk mixed into the text.
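As a starting point for that check, here is a rough sketch that scans the scraped lines for a few signs of trouble: leftover markup, the `===========================` separator that the scraper is supposed to stop at, and the Unicode replacement character that usually indicates a decoding problem. The `NovelTextChecker` class and the list of junk markers are assumptions made for illustration; the real markers depend on what the site actually injects.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical checker, not part of the original program.
static class NovelTextChecker
{
    // Assumed junk markers; adjust after inspecting real output.
    static readonly string[] JunkMarkers = { "&nbsp;", "<br", "<script", "http://", "=====" };

    /// <summary>
    /// Returns one human-readable message per problem found in the scraped lines.
    /// An empty list means none of the checks fired.
    /// </summary>
    public static List<string> FindProblems(IReadOnlyList<string> lines)
    {
        var problems = new List<string>();
        if (lines.Count == 0)
        {
            problems.Add("no lines were scraped");
            return problems;
        }
        for (int i = 0; i < lines.Count; i++)
        {
            string line = lines[i];
            foreach (string marker in JunkMarkers)
            {
                if (line.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0)
                    problems.Add($"line {i + 1}: contains \"{marker}\"");
            }
            // U+FFFD appears when bytes could not be decoded with the chosen encoding.
            if (line.IndexOf('\uFFFD') >= 0)
                problems.Add($"line {i + 1}: contains an undecodable character");
        }
        return problems;
    }
}
```

It could be run over the output with `NovelTextChecker.FindProblems(File.ReadAllLines("D:\\333.txt"))` (which needs `using System.IO;`). This only catches junk; checking completeness, for example comparing the number of chapters written against the number of links in the index, would need a separate count that the scraper does not record yet.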