Mid-Year Decadence - Novels

What's wrong with me? Change what I can change; being better than before is good enough. Here is a small downloader for a novel.

Namespaces to import (requires the HtmlAgilityPack NuGet package):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using HtmlAgilityPack;
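HtmlAgilityPack's basic workflow is to load markup into an HtmlDocument and query it with XPath through SelectSingleNode and SelectNodes. The following is a minimal offline sketch of that workflow for a table-of-contents page. The TocParser class is a hypothetical name, and the relative XPath //ul/li/a[@href] is an assumed simplification of the real site's layout.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: pull chapter links out of a table-of-contents page.
public static class TocParser
{
    public static List<string> ChapterLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        // Every <a> with an href that sits directly inside an <li> of a <ul>
        // (assumed layout, not the site's confirmed structure).
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//ul/li/a[@href]");
        if (anchors == null) // SelectNodes returns null, not an empty collection, on no match
            return links;

        foreach (HtmlNode a in anchors)
            links.Add(a.GetAttributeValue("href", ""));
        return links;
    }
}
```

A relative XPath like this tends to survive small layout changes better than an absolute path such as /html/body/div[7]/div[5]/dl/dd/ul, at the cost of possibly matching unrelated lists elsewhere on the page.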

Scraping the content:

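class Program
{
    // Output file for the downloaded text
    static string m_strTextPath = "D:\\333.txt";

    static void Main(string[] args)
    {
        try
        {
            LoadNovel();
        }
        catch (Exception e)
        {
            // Print the error and fall through to ReadKey instead of rethrowing,
            // so the console window stays open long enough to read it
            Console.WriteLine(e.ToString());
        }
        Console.ReadKey();
    }

    /// <summary>
    /// Download the table-of-contents page, then every chapter it links to
    /// </summary>
    static void LoadNovel()
    {
        string l_strURL = "http://www.vodtw.com/html/book/34/34009/";
        // The site serves GB2312. On .NET Core / .NET 5+, call
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first
        // (System.Text.Encoding.CodePages package); .NET Framework has it built in.
        using (WebClient wc = new WebClient())
        {
            wc.BaseAddress = l_strURL;
            wc.Encoding = Encoding.GetEncoding("gb2312");

            HtmlDocument doc = new HtmlDocument();
            string html = wc.DownloadString("index.html");
            doc.LoadHtml(html);

            // Chapter list: a <ul> whose <li> children each wrap one chapter link
            HtmlNode navNode1 = doc.DocumentNode.SelectSingleNode("/html/body/div[7]/div[5]/dl/dd/ul");
            HtmlNodeCollection CNodes1 = navNode1.SelectNodes("child::li");

            // Collect the href of the <a> inside each <li>
            // (FirstChild may be a whitespace text node, so select the <a> explicitly)
            List<string> list = new List<string>();
            foreach (HtmlNode item in CNodes1)
            {
                HtmlNode link = item.SelectSingleNode("a");
                if (link != null)
                    list.Add(link.GetAttributeValue("href", ""));
            }

            // Download each chapter and append its text to m_strTextPath
            foreach (string l_str in list)
            {
                string html1 = wc.DownloadString(l_str);
                // Throttle: requesting too frequently causes problems with the site
                Thread.Sleep(100);
                doc.LoadHtml(html1);

                // Chapter title: optional, since it also appears in the body text
                //HtmlNode CaptureName = doc.DocumentNode.SelectSingleNode("html/body/div[3]/div[1]/b");
                //File.AppendAllText(m_strTextPath, CaptureName.InnerText + "\r\n");

                // Chapter body: one <p> per paragraph
                HtmlNode navNode2 = doc.DocumentNode.SelectSingleNode("html/body/div[4]/div[4]");
                HtmlNodeCollection CNodes2 = navNode2?.SelectNodes("child::p");
                if (CNodes2 == null)
                {
                    // Layout did not match; report and move on instead of crashing
                    Console.WriteLine("No content found: " + l_str);
                    continue;
                }

                foreach (HtmlNode ddd in CNodes2)
                {
                    // A line of '=' marks the end of the chapter text
                    if (ddd.InnerText == "===========================")
                        break;
                    File.AppendAllText(m_strTextPath, ddd.InnerText + "\r\n");
                }
            }
        }
    }
}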
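The using list already includes System.Net.Http and System.Threading.Tasks, while the download loop uses the synchronous WebClient and Thread.Sleep. A minimal sketch of an asynchronous, throttled alternative follows, assuming HttpClient as the transport. ThrottledDownloader and its methods are hypothetical names; the fetch delegate is injected so the loop can run without network access, and pages are decoded with an explicit encoding instead of trusting the response's charset header.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical async counterpart of the WebClient + Thread.Sleep loop.
public static class ThrottledDownloader
{
    // Fetches each URL in order, decodes it with the given encoding, and waits
    // `delay` between requests without blocking a thread.
    public static async Task<List<string>> DownloadAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<byte[]>> fetch,
        Encoding encoding,
        TimeSpan delay)
    {
        var pages = new List<string>();
        foreach (string url in urls)
        {
            byte[] bytes = await fetch(url);
            pages.Add(encoding.GetString(bytes)); // explicit decode, e.g. GB2312
            await Task.Delay(delay);              // non-blocking replacement for Thread.Sleep
        }
        return pages;
    }

    // Wiring to a real HttpClient (needs network access; not exercised offline).
    // Relative URLs resolve against client.BaseAddress, which should end with '/'.
    // On .NET Core / .NET 5+, "gb2312" requires CodePagesEncodingProvider to be registered.
    public static Task<List<string>> DownloadWithHttpClientAsync(
        HttpClient client, IEnumerable<string> relativeUrls)
    {
        return DownloadAllAsync(
            relativeUrls,
            url => client.GetByteArrayAsync(url),
            Encoding.GetEncoding("gb2312"),
            TimeSpan.FromMilliseconds(100));
    }
}
```

Injecting the fetch function keeps the throttling and decoding logic independent of HttpClient, so it can be checked with canned bytes before pointing it at the live site.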

Next step: verify that the scraped data is complete and check it for stray junk text.
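One way to start on that check, sketched under assumptions: decode HTML entities that InnerText can still contain (such as &nbsp;), drop blank lines and lines that look like site noise, and flag chapters whose cleaned text is suspiciously short. ChapterValidator is a hypothetical name, and the junk pattern and minimum-length threshold are illustrative heuristics, not rules taken from the site.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical post-download sanity checks for scraped chapter text.
public static class ChapterValidator
{
    // Assumed noise markers: URLs, "www." ads, and '=' separator lines.
    static readonly Regex Junk = new Regex(@"https?://|www\.|={5,}", RegexOptions.IgnoreCase);

    // Decode entities, trim (Trim also removes the U+00A0 produced by &nbsp;),
    // then drop empty lines and lines matching the junk pattern.
    public static List<string> CleanLines(IEnumerable<string> rawLines)
    {
        return rawLines
            .Select(l => WebUtility.HtmlDecode(l ?? "").Trim())
            .Where(l => l.Length > 0 && !Junk.IsMatch(l))
            .ToList();
    }

    // Returns the indexes of chapters whose total cleaned text is shorter than
    // minChars, i.e. chapters that were probably not downloaded completely.
    public static List<int> SuspiciousChapters(IList<List<string>> chapters, int minChars)
    {
        var bad = new List<int>();
        for (int i = 0; i < chapters.Count; i++)
        {
            int total = chapters[i].Sum(line => line.Length);
            if (total < minChars)
                bad.Add(i);
        }
        return bad;
    }
}
```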
