Scraping Zhihu Data with goquery

Using goquery

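For my graduation project I built a website modeled on Zhihu, and it needed some data, so I decided to scrape a little from Zhihu itself. I originally planned to write the crawler in Python, but it turns out Go also has a very handy scraping library: goquery. If you have done any front-end work, you can write a crawler with goquery in under half an hour.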

goquery is similar to jQuery: it is essentially a Go implementation of jQuery's selection and traversal API, which makes it easy to process HTML.

It can filter data by HTML element name as well as by ID selector, class selector, and attribute selector.

GitHub: https://github.com/PuerkitoBio/goquery

Below is the demo code I used to scrape Zhihu data.

package main

import (
    "fmt"
    "log"
    "net/http"
    "strconv"
    "strings"

    "github.com/PuerkitoBio/goquery"
    _ "github.com/go-sql-driver/mysql"
)

func ExampleScrape() {

    for i := 321450693; i > 321450680; i-- {
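        res, err := http.Get("https://www.zhihu.com/question/" + strconv.Itoa(i))
        if err != nil {
            continue // request failed; try the next question ID
        }
        if res.StatusCode != http.StatusOK {
            res.Body.Close() // release the connection before skipping this ID
            continue
        }

        doc, err := goquery.NewDocumentFromReader(res.Body)
        res.Body.Close() // the document is fully parsed, so the body can be closed
        if err != nil {
            log.Fatal(err)
        }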

        doc.Find(".QuestionHeader .QuestionHeader-content .QuestionHeader-main").Each(func(i int, s *goquery.Selection) {
            questionTitle := s.Find(".QuestionHeader-title").Text()
            questionContent := s.Find(".QuestionHeader-detail").Text()
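            // Drop the trailing 12 bytes of the detail text; the length check
            // avoids a slice-bounds panic when the text is shorter than that.
            if len(questionContent) >= 12 {
                questionContent = questionContent[:len(questionContent)-12]
            }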

            fmt.Println("questionTitle:", questionTitle)
            fmt.Println("questionContent:", questionContent)
        })

        doc.Find(".ContentItem-actions").Each(func(i int, s *goquery.Selection) {

        })
        doc.Find(".ListShortcut .List .List-item ").Each(func(i int, s *goquery.Selection) {
            headURL, _ := s.Find("a img").Attr("src")
            author := s.Find(".AuthorInfo-head").Text()
            fmt.Println("head_url:", headURL)
            fmt.Println("author:", author)

            voters := s.Find(".Voters").Text()
            voters = strings.Split(voters, " ")[0]
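            content := s.Find(".RichContent-inner").Text() // use Html() instead to keep the tags
            createTime := s.Find(".ContentItem-time").Text()
            // Keep the second space-separated field; skip the split when it is
            // missing so an empty or unexpected string cannot cause an index panic.
            if parts := strings.Split(createTime, " "); len(parts) > 1 {
                createTime = parts[1]
            }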

            commentCount := s.Find(".ContentItem-actions span").Text()
            fmt.Println("voters:", voters)
            fmt.Println("content:", content)
            fmt.Println("createTime:", createTime)
            fmt.Println("commentCount : ", commentCount)
        })

    }

}

func main() {
    ExampleScrape()
}