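I recently got a three-month trial membership on Jikexueyuan (極客學院), so I browsed the site and found the courses fairly good. I happen to be learning Android at the moment, so I went looking for videos there and discovered that only paid members can download them. On a whim I looked at the page source and found the video URL sitting right there. Then I remembered that jsoup makes extracting page elements easy, so with nothing better to do I wrote a demo.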
<source src="http://cv3.jikexueyuan.com/201508081934/f8f3f9f8088f1ba0a6c75594448d96ab/course/1501-1600/1557/video/4278_b_h264_sd_960_540.mp4" type="video/mp4"></source>
We fetch the full HTML source and pull the <source> element out of it, which makes the download link easy to obtain.
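As a rough illustration of that extraction step, here is a minimal sketch that uses only the JDK's regex classes instead of jsoup. It is a stand-in, not the demo's actual method: the class name SourceSrcSketch, the helper firstSourceSrc, and the example.com URL are all hypothetical, and the pattern assumes a double-quoted src attribute. For real pages a proper HTML parser such as jsoup is the more robust choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original project.
class SourceSrcSketch {
    // Matches a <source ...> tag and captures the value of a double-quoted src attribute.
    private static final Pattern SOURCE_SRC = Pattern.compile(
            "<source\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the src of the first <source> tag in the HTML, if there is one. */
    static Optional<String> firstSourceSrc(String html) {
        Matcher m = SOURCE_SRC.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up markup with the same shape as the <source> tag shown above.
        String html = "<video><source src=\"http://example.com/video/demo.mp4\""
                + " type=\"video/mp4\"></source></video>";
        System.out.println(firstSourceSrc(html).orElse("(no <source> tag found)"));
    }
}
```

Returning an Optional keeps the "no video on this page" case explicit, which matters because some lesson pages may not expose a <source> tag at all.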
Next, by analyzing the page, we can get the information for every video in a course. The page source looks like this:
<dl class="lessonvideo-list">
    <dd class="playing">
        <h2>
            <span class="sm-icon "></span>
            <a href="http://www.jikexueyuan.com/course/1748_1.html?ss=1" jktag="&posGP=103001&posArea=0002&posOper=8005&posColumn=1.1">1.編寫本身的自定義 View(上)</a>
            <span class="lesson-time">00:10:24</span>
        </h2>
        <blockquote>
            本課時主要講解最簡單的自定義 View,而後加入繪製元素(文字、圖形等),而且能夠像使用系統控件同樣在佈局中使用。
        </blockquote>
    </dd>
    <dd>
        <h2>
            <span class="sm-icon "></span>
            <a href="http://www.jikexueyuan.com/course/1748_2.html?ss=1" jktag="&posGP=103001&posArea=0002&posOper=8005&posColumn=2.2">2.編寫本身的自定義 View(下)</a>
            <span class="lesson-time">00:12:05</span>
        </h2>
        <blockquote>
            本課時主要講解最簡單的自定義 View,而後加入繪製元素(文字、圖形等),而且能夠像使用系統控件同樣在佈局中使用。
        </blockquote>
    </dd>
    <dd>
        <h2>
            <span class="sm-icon "></span>
            <a href="http://www.jikexueyuan.com/course/1748_3.html?ss=1" jktag="&posGP=103001&posArea=0002&posOper=8005&posColumn=3.3">3.加入邏輯線程</a>
            <span class="lesson-time">00:20:34</span>
        </h2>
        <blockquote>
            本課時須要讓繪製的元素動起來,可是又不阻塞主線程,因此引入邏輯線程。在子線程更新 UI 是不被容許的,可是 View 提供了方法。讓咱們來看看吧。
        </blockquote>
    </dd>
    <dd>
        <h2>
            <span class="sm-icon "></span>
            <a href="http://www.jikexueyuan.com/course/1748_4.html?ss=1" jktag="&posGP=103001&posArea=0002&posOper=8005&posColumn=4.4">4.提取和封裝自定義 View</a>
            <span class="lesson-time">00:15:41</span>
        </h2>
        <blockquote>
            本課時主要講解在上個課程的基礎上,進行提取代碼來構造自定義 View 的基類,主要目的是:建立新的自定義 View 時,只需繼承此類並只關心繪製和邏輯,其餘工做由父類完成。這樣既減小重複編碼,也簡化了邏輯。
        </blockquote>
    </dd>
    <dd>
        <h2>
            <span class="sm-icon "></span>
            <a href="http://www.jikexueyuan.com/course/1748_5.html?ss=1" jktag="&posGP=103001&posArea=0002&posOper=8005&posColumn=5.5">5.在 xml 中定義樣式來影響顯示效果</a>
            <span class="lesson-time">00:14:05</span>
        </h2>
        <blockquote>
            本課時主要講解的是在 xml 中定義樣式及其屬性,怎麼來影響自定義 View 中的顯示效果的過程和步驟。
        </blockquote>
    </dd>
</dl>
With Elements results1 = doc.getElementsByClass("lessonvideo-list"); we obtain the video list. We then iterate over that list with jsoup, open each lesson page in turn, and extract its video link.
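That is the main idea. One caveat: a Document fetched with a plain jsoup get() call carries no cookie state, and some videos only expose their playback URL to a logged-in VIP member. So we use HttpClient to simulate a logged-in user.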
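To make the cookie state concrete, here is a minimal sketch of what simulating a logged-in user comes down to: remember the name=value part of every Set-Cookie response header and send the pairs back in a Cookie request header. The class CookieJarSketch and its methods are hypothetical and use only the JDK. The project's own code keeps cookies as one concatenated string rather than a map, so treat this as an illustration of the idea, not a copy of it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, not part of the original project.
class CookieJarSketch {
    // Insertion-ordered map so the rebuilt header is deterministic.
    private final Map<String, String> cookies = new LinkedHashMap<>();

    /** Stores the name=value pair of one Set-Cookie header value, ignoring its attributes. */
    void accept(String setCookieValue) {
        String pair = setCookieValue.split(";", 2)[0].trim();
        int eq = pair.indexOf('=');
        if (eq <= 0) {
            return; // no '=' or an empty cookie name: skip it
        }
        cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }

    /** Builds the value to send back in the Cookie request header. */
    String toCookieHeader() {
        StringJoiner joiner = new StringJoiner("; ");
        cookies.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) {
        CookieJarSketch jar = new CookieJarSketch();
        // Made-up header values, for illustration only.
        jar.accept("uid=12345; Path=/; HttpOnly");
        jar.accept("token=abc; Max-Age=3600");
        jar.accept("uid=67890; Path=/"); // a later value replaces the earlier one
        System.out.println(jar.toCookieHeader()); // prints "uid=67890; token=abc"
    }
}
```

Keying by cookie name means a refreshed cookie overwrites the stale one, whereas plain string concatenation would send both.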
Below is the full project source.
package com.debughao.bean;

/** One lesson link scraped from a course page. */
public class Course {

    /** URL the link points to. */
    private String linkHref;
    /** Title text of the link. */
    private String linkText;

    public String getLinkHref() {
        return linkHref;
    }

    public void setLinkHref(String linkHref) {
        this.linkHref = linkHref;
    }

    public String getLinkText() {
        return linkText;
    }

    public void setLinkText(String linkText) {
        this.linkText = linkText;
    }

    @Override
    public String toString() {
        return "Course [linkHref=" + linkHref + ", linkText=" + linkText + "]";
    }

}
package com.debughao.down;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHeaders;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

@SuppressWarnings("deprecation")
public class HttpUtils {
    private static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36";

    /** Cookies sent with every request, in "name=value;" form. */
    private String cookieStr = "";
    /** Response of the most recent Get, Post or GetLoginCookie call. */
    private CloseableHttpResponse response = null;

    public HttpUtils() {
    }

    public HttpUtils(String cookieStr) {
        this.cookieStr = cookieStr;
    }

    public String getCookieStr() {
        return cookieStr;
    }

    public CloseableHttpResponse getResponse() {
        return response;
    }

    /** Sends a GET request with the stored cookies; returns the body, or null on failure. */
    public String Get(String url) {
        HttpGet httpget = new HttpGet(url);
        httpget.setHeader("Cookie", cookieStr);
        httpget.setHeader(HttpHeaders.USER_AGENT, USER_AGENT);
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            response = httpclient.execute(httpget);
            return EntityUtils.toString(response.getEntity(), "UTF-8");
        } catch (Exception e) {
            System.err.println(String.format("HTTP GET error %s", e.getMessage()));
            return null;
        }
    }

    /**
     * Sends a POST request. The part of the url before '?' is the endpoint and
     * the part after it is sent as the form body. Cookies set by the response
     * are appended to the stored cookie string.
     */
    public String Post(String url) {
        String[] parts = url.split("\\?", 2);
        // Guard against a url without a query part instead of failing on parts[1].
        String body = parts.length > 1 ? parts[1] : "";
        HttpPost httppost = new HttpPost(parts[0]);
        StringEntity reqEntity = new StringEntity(body, StandardCharsets.UTF_8);
        reqEntity.setContentType("application/x-www-form-urlencoded;charset=UTF-8");
        httppost.setEntity(reqEntity);
        httppost.setHeader("Cookie", cookieStr);
        httppost.setHeader(HttpHeaders.USER_AGENT, USER_AGENT);
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            response = httpclient.execute(httppost);
            for (Header h : response.getAllHeaders()) {
                if ("Set-Cookie".equalsIgnoreCase(h.getName())) {
                    cookieStr += subCookie(h.getValue());
                }
            }
            return EntityUtils.toString(response.getEntity(), "UTF-8");
        } catch (Exception e) {
            System.err.println(String.format("HTTP POST error %s", e.getMessage()));
            return null;
        }
    }

    /** Requests the url and stores the first cookie the server sets; returns "4" on failure. */
    public String GetLoginCookie(String url) {
        HttpGet httpget = new HttpGet(url);
        httpget.setHeader("Cookie", cookieStr);
        httpget.setHeader(HttpHeaders.USER_AGENT, USER_AGENT);
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            response = httpclient.execute(httpget);
            for (Header h : response.getAllHeaders()) {
                if ("Set-Cookie".equalsIgnoreCase(h.getName())) {
                    cookieStr = subCookie(h.getValue());
                    return cookieStr;
                }
            }
        } catch (Exception e) {
            System.err.println(String.format("HTTP GET error %s", e.getMessage()));
        }
        return "4"; // error code
    }

    /** Keeps only the leading "name=value;" part of a Set-Cookie header value. */
    public String subCookie(String value) {
        int end = value.indexOf(";");
        // A cookie without attributes has no ';', so append one instead of returning "".
        return end < 0 ? value + ";" : value.substring(0, end + 1);
    }

    /**
     * Fetches a binary resource such as an image. Returns the open content
     * stream, or null if the request failed; the caller must read and close it.
     */
    public InputStream GetImage(String url) {
        InputStream is = null;
        HttpClient httpclient = new DefaultHttpClient();
        HttpGet httpGet = new HttpGet(url);
        if (cookieStr != null)
            httpGet.setHeader("Cookie", cookieStr);
        try {
            HttpResponse response = httpclient.execute(httpGet);
            if (HttpStatus.SC_OK == response.getStatusLine().getStatusCode()) {
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    // entity.isStreaming() can be used to check for a file data stream.
                    is = entity.getContent();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return is;
    }
}
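The Post helper above takes a single string and treats everything before the ? as the endpoint and everything after it as the form body. The sketch below shows that calling convention in isolation, using only the JDK. The class PostUrlSketch, its methods, and the example.com login URL with its field names are all hypothetical; the real login endpoint and parameters are not given in this post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, not part of the original project.
class PostUrlSketch {
    /** Splits "endpoint?form-body" at the first '?'; the body is empty when there is no '?'. */
    static String[] splitEndpointAndBody(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return new String[] { url, "" };
        }
        return new String[] { url.substring(0, q), url.substring(q + 1) };
    }

    /** URL-encodes one form field so that '&', '=' and spaces in the value cannot break the body. */
    static String formPair(String name, String value) {
        try {
            return URLEncoder.encode(name, "UTF-8") + "=" + URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Made-up endpoint and field names, for illustration only.
        String url = "http://example.com/login?" + formPair("user", "alice")
                + "&" + formPair("password", "a b&c");
        String[] parts = splitEndpointAndBody(url);
        System.out.println("endpoint: " + parts[0]);
        System.out.println("body:     " + parts[1]);
    }
}
```

Encoding each field before joining matters here because the body is passed through as-is: an unescaped & inside a password would otherwise be read as a field separator.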
package com.debughao.down;

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import com.debughao.bean.Course;

public class Test {

    public static void main(String[] args) {
        // Cookie string copied from a logged-in browser session.
        HttpUtils http = new HttpUtils("stat_uuid=1436867409341663197461; uname=qq_rwe4zg5t; uid=3812752; code=LZ8XF1; "
                + "authcode=b809MIxLGp8syQcnuAAdIT9PuCEH2%2FuiyvRuuLALSxb6z6iGoM3xcihNJKzHK%2BAZWzVIGFAW0QrBYiSLmHN1qnhi0YQLmBeWeqkJHXh5xsoylWuRCFmRDJZyUtAGr3U; "
                + "level_id=3; is_expire=0; domain=debughao; stat_fromWebUrl=; stat_ssid=1439813138264;"
                + " connect.sid=s%3A5xux57xcLyCBheevR40DUa0beJD_ok-S.0aTnwfjSvm7A49zydLGbtXy7vdCGfH7lB7MwmZURppQ; "
                + "QINGCLOUDELB=37e16e60f0cd051b754b0acf9bdfd4b5d562b81daa2a899c46d3a1e304c7eb2b|VcWiq|VcWiq; "
                + "_ga=GA1.2.889563867.1436867384; _gat=1; Hm_lvt_f3c68d41bda15331608595c98e9c3915=1438945833,1438947627,1438995076,1438995133;"
                + " Hm_lpvt_f3c68d41bda15331608595c98e9c3915=1439015591; MECHAT_LVTime=1439015591174; MECHAT_CKID=cookieVal=006600143686858016573509; "
                + "undefined=; stat_isNew=0");

        // Read the course page URL from standard input.
        Scanner sc = new Scanner(System.in);
        String url = sc.nextLine();
        sc.close();

        String res = http.Get(url);
        if (res == null) {
            System.err.println("Failed to fetch " + url);
            return;
        }
        Document doc = getDocByRes(res);

        // Print every lesson title, then the video link of every lesson.
        List<Course> videos = getVideoList(doc);
        for (Course video : videos) {
            System.out.println(video.getLinkText());
        }
        for (Course video : videos) {
            String res2 = http.Get(video.getLinkHref());
            if (res2 == null) {
                continue; // Get already logged the error; skip this lesson
            }
            getVideoLink(getDocByRes(res2));
        }
    }

    private static Document getDocByRes(String res) {
        return Jsoup.parse(res);
    }

    /** Collects the link and title of every lesson in the "lessonvideo-list" block. */
    public static List<Course> getVideoList(Element doc) {
        List<Course> courses = new ArrayList<Course>();
        Elements results1 = doc.getElementsByClass("lessonvideo-list");
        String title = doc.getElementsByTag("title").text();
        System.out.println(title);
        for (Element element : results1) {
            Elements links = element.getElementsByTag("a");
            for (Element link : links) {
                Course course = new Course();
                course.setLinkHref(link.attr("href"));
                course.setLinkText(link.text());
                courses.add(course);
            }
        }
        return courses;
    }

    /** Prints the src attribute of the first <source> element on a lesson page. */
    public static void getVideoLink(Document doc) {
        Elements results2 = doc.select("source");
        String mp4Links = results2.attr("src");
        System.out.println(mp4Links);
    }
}
http://www.jikexueyuan.com/course/1748.html
自定義 View 基礎和原理-極客學院
1.編寫本身的自定義 View(上)
2.編寫本身的自定義 View(下)
3.加入邏輯線程
4.提取和封裝自定義 View
5.在 xml 中定義樣式來影響顯示效果
http://cv3.jikexueyuan.com/201508082007/99549fa37069a39a2e128278ee60768c/course/1501-1600/1557/video/4278_b_h264_sd_960_540.mp4
http://cv3.jikexueyuan.com/201508082007/a068be74f7f31900e128f109523b0925/course/1501-1600/1557/video/4279_b_h264_sd_960_540.mp4
http://cv3.jikexueyuan.com/201508082008/bf216e06770e9a9b0adda34ea4d01dfc/course/1501-1600/1557/video/4280_b_h264_sd_960_540.mp4
http://cv3.jikexueyuan.com/201508082008/75b51573a75458848136e61e848d1ae7/course/1501-1600/1557/video/4281_b_h264_sd_960_540.mp4
http://cv3.jikexueyuan.com/201508082008/ca20fad3e1bc622aa64bbfa7d2b768dd/course/1501-1600/1557/video/5159_b_h264_sd_960_540.mp4