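By convention, I should first introduce what Jsoup is, then walk through how to use it, and finish with a demo, and that is exactly the plan. I spent a whole day today going from learning Jsoup to actually parsing pages, and it finally worked out, aha. But a programmer who can't write up what he learned is not a handsome programmer, aha, which means I must be a handsome one.
----------------------------------------------------------------------------------------------------------------------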
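1. What is Jsoup?
Official site: http://jsoup.org/
The jar can be downloaded from the official site.
Put simply, Jsoup is a tool for parsing web pages. Now let's look at the official description:
The official description just sounds a lot fancier~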
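2. Basic usage of Jsoup (http://www.open-open.com/jsoup/parsing-a-document.htm)
The site explains everything in detail, and smart readers like you will get it after one look at the documentation... Right, as they say, good-looking people understand everything..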
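As a rough illustration of the basic usage that documentation covers, here is a minimal sketch that parses an HTML string (no network needed) and reads elements with a CSS selector. The HTML snippet, the class name JsoupBasics, and the helper linkTexts are made up for this example; only Jsoup.parse, select, title, text, and attr are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not taken from the jsoup documentation.
class JsoupBasics {

    // Made-up HTML used only for this sketch.
    static final String HTML =
            "<html><head><title>Demo page</title></head><body>"
          + "<ul class=\"menu\">"
          + "<li><a href=\"/a\">First</a></li>"
          + "<li><a href=\"/b\">Second</a></li>"
          + "</ul></body></html>";

    // Returns "text -> href" for every link directly inside ul.menu > li.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("ul.menu > li > a")) {
            out.add(a.text() + " -> " + a.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.title());
        for (String line : linkTexts(HTML)) {
            System.out.println(line);
        }
    }
}
```

Parsing from a string like this is handy for trying out selectors before pointing the code at a live page with Jsoup.connect(url).get().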
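3. Demo. URL to parse: http://sex.guokr.com/
A note up front: ignore the content of the link. I just happened to find a handy site to practice on~ aha, don't get the wrong idea.
1. Parsing a ul -> li
Let's look at the source for this part of the page:
That gives us the rough structure, so now let's write the code: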
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Parse a URL with Jsoup
 * @tag: url: http://sex.guokr.com/
 * Created by monster on 2015/12/11.
 */
public class JsoupZX {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/";
        try {
            Document doc = Jsoup.connect(url).get();

            // Narrow down step by step: container -> module-list -> clearfix items
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());
            Elements module = containerDoc.getElementsByClass("module-list");
            Document moduleDoc = Jsoup.parse(module.toString());

            //Elements clearfix = moduleDoc.getElementsByClass("clearfix"); // DOM style
            Elements clearfix = moduleDoc.select(".clearfix"); // selector style

            for (Element clearfixli : clearfix) {
                Document clearfixliDoc = Jsoup.parse(clearfixli.toString());
                Elements kind = clearfixliDoc.select(".board-tag"); // selector style
                Elements title = clearfixliDoc.select(".tit-post");
                Elements author = clearfixliDoc.select("span a");
                System.out.println("Category: " + kind.text());          // category
                System.out.println("Title: " + title.text());            // title
                System.out.println("Author: " + author.text());          // author
                System.out.println("Detail link: " + title.attr("href")); // link under the title
                System.out.println("=====================");
            }
            // String title = clearfixli.getElementsByTag("a").text();
            // System.out.println(clearfix);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Result:
=================================================================================================
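A side note on the list-parsing code above: it converts each fragment back to a string and re-parses it with Jsoup.parse at every level. That works, but select can be called directly on any Element, and a re-parsed fragment has no base URI, so relative links cannot be resolved. Below is a minimal sketch of the same extraction without re-parsing. The HTML is an assumed stand-in for the real page (only the class names container, module-list, clearfix, board-tag, and tit-post come from the code above), and ListSketch, rows, and BASE_URI are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the markup below is assumed, not copied from the site.
class ListSketch {

    // Placeholder base URI so relative hrefs can be resolved in this sketch.
    static final String BASE_URI = "http://example.com/";

    static final String HTML =
            "<div class=\"container\"><ul class=\"module-list\">"
          + "<li class=\"clearfix\">"
          + "<a class=\"board-tag\" href=\"/board/1/\">Board A</a>"
          + "<a class=\"tit-post\" href=\"/post/1/\">First post</a>"
          + "<span><a href=\"/user/1/\">alice</a></span></li>"
          + "<li class=\"clearfix\">"
          + "<a class=\"board-tag\" href=\"/board/2/\">Board B</a>"
          + "<a class=\"tit-post\" href=\"/post/2/\">Second post</a>"
          + "<span><a href=\"/user/2/\">bob</a></span></li>"
          + "</ul></div>";

    // Returns one "category | title | author | absolute link" row per list item.
    static List<String> rows(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> rows = new ArrayList<>();
        // One descendant selector replaces the chain of re-parsed documents.
        for (Element li : doc.select(".container .module-list .clearfix")) {
            Element title = li.select(".tit-post").first();
            if (title == null) {
                continue; // skip items without a title link
            }
            String kind = li.select(".board-tag").text();
            String author = li.select("span a").text();
            // absUrl resolves the href against the document's base URI.
            rows.add(kind + " | " + title.text() + " | " + author
                    + " | " + title.absUrl("href"));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : rows(HTML, BASE_URI)) {
            System.out.println(row);
        }
    }
}
```

When the document comes from Jsoup.connect(url).get(), the base URI is already set to the fetched URL, so absUrl("href") should work there without passing one explicitly.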
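2. Parsing the detail page and comments
Link: http://sex.guokr.com/post/1100992/
The above is what the page looks like.
Then let's look at the source:
Content:
Comments:
After reading the source, let's write the code: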
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Parse a post's details and comments with Jsoup
 * @tag: url: http://sex.guokr.com/post/1100992/
 * Created by monster on 2015/12/11.
 */
public class JSoupDetail {
    public static void main(String[] args) {
        final String url = "http://sex.guokr.com/post/1100992/";
        try {
            Document doc = Jsoup.connect(url).get();
            Elements container = doc.getElementsByClass("container");
            Document containerDoc = Jsoup.parse(container.toString());

            // Post header: title, author, publish time, author avatar
            String articleTitle = containerDoc.getElementById("articleTitle").text();
            String authorName = containerDoc.getElementById("authorName").text();
            String time = containerDoc.select("span").first().text();
            String imgphotoUrl = containerDoc.select("img").get(1).attr("src");
            System.out.println("Title: " + articleTitle);            // title
            System.out.println("Author: " + authorName);             // author
            System.out.println("Published: " + time);                // publish time
            System.out.println("Author avatar URL: " + imgphotoUrl); // author avatar URL

            // Post body: one line per <p>
            Element articleContent = containerDoc.getElementById("articleContent");
            Document articleContentDoc = Jsoup.parse(articleContent.toString());
            Elements paragraphs = articleContentDoc.select("p");
            System.out.println("Paragraph count: " + paragraphs.size());
            System.out.println("Post content:");
            for (Element p : paragraphs) {
                System.out.println(p.text());
            }
            System.out.println("================================================");

            // Comments
            System.out.println("Comment section (ordered by floor)");
            Elements cmts = containerDoc.getElementsByClass("cmts");
            Document cmtsDoc = Jsoup.parse(cmts.toString());
            System.out.println("Comment floors: " + cmtsDoc.select("span").first().text());
            Elements cmtslist = cmtsDoc.getElementsByClass("cmts-list");
            for (Element clearfix : cmtslist) {
                String user = clearfix.select("a").get(1).text();
                String userPhotoUrl = clearfix.select("img").get(0).attr("src");
                String replyTime = clearfix.select("a").get(3).text();
                String floor = clearfix.select("span").text();
                System.out.println("Commenter: " + user + "\n"
                        + "Commenter avatar URL: " + userPhotoUrl + "\n"
                        + "Reply time: " + replyTime + "\n"
                        + "Floor: " + floor);

                Document replyContentDoc = Jsoup.parse(clearfix.toString());
                Elements replyContent = replyContentDoc.getElementsByClass("cmt-content");
                System.out.println("Comment content:");
                for (Element p : replyContent.select("p")) {
                    System.out.println(p.text());
                }
                System.out.println("================================================");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Output:
--------->
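One thing to watch in the detail-page code: getElementById and first() return null when nothing matches, and get(1) throws if there are fewer than two matches, so a small change in the page layout would crash the program. Here is a minimal sketch of two defensive helpers, assuming that an empty string is an acceptable fallback. SafeExtract, textOf, attrOfNth, and the sample HTML are hypothetical names and data for illustration only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class; the fallback-to-"" policy is an assumption.
class SafeExtract {

    // Made-up fragment: it has a title and one image, but no #authorName.
    static final String HTML =
            "<div class=\"container\">"
          + "<h1 id=\"articleTitle\">Hello</h1>"
          + "<img src=\"a.png\">"
          + "</div>";

    // Text of the first element matching css, or "" when nothing matches.
    static String textOf(Element root, String css) {
        Element e = root.select(css).first();
        return e == null ? "" : e.text();
    }

    // Attribute of the n-th match (0-based), or "" when there is no such match.
    static String attrOfNth(Element root, String css, int n, String attr) {
        Elements found = root.select(css);
        return (n >= 0 && n < found.size()) ? found.get(n).attr(attr) : "";
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println("Title: " + textOf(doc, "#articleTitle"));
        System.out.println("Author: " + textOf(doc, "#authorName"));
        System.out.println("Second image: " + attrOfNth(doc, "img", 1, "src"));
    }
}
```

Whether to fall back to an empty string or to fail loudly depends on the use case; for a quick scraping demo like this one, printing a blank field is usually easier to debug than a stack trace.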
That's my demo. It's a bit simple, so I hope you'll bear with me, aha~
Also: you're welcome to follow my blog.