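I've recently been watching the video tutorials on Jikexueyuan (極客學院). They're quite good, so I wanted to download them to my own machine. Downloading by hand takes a lot of time and effort, so I decided to dig into it and write a program that does the downloading automatically. It finally paid off! Here's a screenshot as proof: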
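Since the downloads are meant to be automatic, there's no way around crawling Jikexueyuan's pages to get the video URLs. After some experimenting, I found that most videos can only be watched when you're logged in, and only with a membership. So the key problems to solve are:
1. Having a membership
2. Simulating the login
3. Parsing the HTML
4. Resuming downloads (in case something goes wrong)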
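(1) Getting a Jikexueyuan membership is quite easy (skip this step if you're already a member). Inviting friends earns you a 30-day trial, as shown in the figure below, and 30 days is enough to download every video.
(2) With membership sorted, the biggest difficulty is simulating the login. Observant readers will notice that Jikexueyuan provides an ajax login endpoint, as shown in the figure below:
When we click the login button, the page sends an ajax request. A packet-capture tool reveals the login URL, http://passport.jikexueyuan.com/submit/login?is_ajax=1, and the captcha URL, http://passport.jikexueyuan.com/sso/verify.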
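The login endpoint accepts an ordinary form POST. Below is a minimal sketch of how such a request could be built with the standard System.Net.Http classes, in place of the author's own HttpProvider helper. The field names are copied from the LoginDo method further down. The JikeLogin class and its Create method are hypothetical names for illustration. FormUrlEncodedContent percent-encodes the values itself, so they are passed in unencoded.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;

// Hypothetical helper: builds (but does not send) the ajax login request.
public static class JikeLogin
{
    // Login endpoint captured with the packet sniffer (from the article).
    public const string LoginUrl = "http://passport.jikexueyuan.com/submit/login?is_ajax=1";

    // Build the form POST; field names mirror the author's LoginDo method.
    public static HttpRequestMessage Create(string userName, string password, string verifyCode)
    {
        var fields = new Dictionary<string, string>
        {
            ["referer"] = "http://www.jikexueyuan.com/",
            ["uname"] = userName,
            ["password"] = password,
            ["is_ajax"] = "1",
            ["verify"] = verifyCode
        };

        // FormUrlEncodedContent applies application/x-www-form-urlencoded encoding.
        return new HttpRequestMessage(HttpMethod.Post, LoginUrl)
        {
            Content = new FormUrlEncodedContent(fields)
        };
    }
}
```

The request still has to go out through an HttpClient whose handler carries the session cookie from the captcha request. This sketch only shows the shape of the request.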
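As the saying goes, "everything is ready except the east wind." Now that both URLs are known, what could stop us from logging in? Here is my final login implementation: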
```csharp
/// <summary>
/// Fetch the captcha image
/// </summary>
void GetVerifyCode()
{
    VerifyCode.Source = null;

    // 1. Request the login page first to obtain the current visitor's session cookie
    HttpResponseParameter responseParameter = HttpProvider.Excute(new HttpRequestParameter
    {
        IsPost = false,
        Url = "http://passport.jikexueyuan.com/sso/login"
    });
    SessionCookie = responseParameter.Cookie;

    // 2. Request the captcha image with the session cookie, so the captcha belongs to this session
    HttpProvider.Excute(new HttpRequestParameter
    {
        // The timestamp query parameter only prevents a cached image from being returned
        Url = string.Format("http://passport.jikexueyuan.com/sso/verify?t={0}", DateTime.Now.ToString("yyyyMMddHHmmss")),
        Cookie = SessionCookie,
        ResponseEnum = HttpResponseEnum.Stream,
        StreamAction = x =>
        {
            using (MemoryStream ms = new MemoryStream())
            {
                // Buffer the whole response body
                x.CopyTo(ms);
                ms.Position = 0;

                BitmapImage bmp = new BitmapImage();
                bmp.BeginInit();
                // Decode immediately so the stream can be disposed afterwards
                bmp.CacheOption = BitmapCacheOption.OnLoad;
                bmp.StreamSource = ms;
                bmp.EndInit();

                VerifyCode.Source = bmp;
            }
        }
    });
}
```
```csharp
/// <summary>
/// Log in
/// </summary>
/// <param name="userName">Account name</param>
/// <param name="userPwd">Password</param>
/// <param name="verifyCode">Captcha</param>
void LoginDo(string userName, string userPwd, string verifyCode)
{
    // 1. Log in, sending the session cookie obtained in GetVerifyCode
    IDictionary<string, string> postData = new Dictionary<string, string>();
    postData.Add("referer", HttpUtility.UrlEncode("http://www.jikexueyuan.com/"));
    postData.Add("uname", userName);
    postData.Add("password", userPwd);
    postData.Add("is_ajax", "1");
    postData.Add("verify", verifyCode);
    HttpResponseParameter responseParameter = HttpProvider.Excute(new HttpRequestParameter
    {
        Url = "http://passport.jikexueyuan.com/submit/login?is_ajax=1",
        IsPost = true,
        Parameters = postData,
        Cookie = SessionCookie
    });

    LoginResultEntity loginResult = responseParameter.Body.DeserializeObject<LoginResultEntity>();
    if (loginResult != null && loginResult.status == 1)
    {
        // 2. Login succeeded: keep the logged-in cookie for the download requests
        CookieStoreInstance.CurrentCookie = responseParameter.Cookie;
    }
}
```
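LoginDo treats status == 1 in the JSON reply as a successful login. Below is a hedged sketch of the same check using System.Text.Json instead of the author's DeserializeObject extension. The original code only reveals the status field, so the LoginResult class is a hypothetical stand-in for LoginResultEntity. Any other fields in the reply are ignored. A malformed reply, such as an HTML error page, counts as a failed login instead of throwing.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for LoginResultEntity: only "status" is known from the article.
public class LoginResult
{
    public int status { get; set; }
}

public static class LoginResponse
{
    // True when the ajax login reply reports success (status == 1).
    public static bool IsSuccess(string body)
    {
        if (string.IsNullOrWhiteSpace(body)) return false;
        try
        {
            LoginResult result = JsonSerializer.Deserialize<LoginResult>(body);
            return result != null && result.status == 1;
        }
        catch (JsonException)
        {
            // Not JSON (or wrong shape): treat as a failed login
            return false;
        }
    }
}
```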
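How it works: first request a Jikexueyuan page to obtain the visitor's session cookie. The code above uses the login page, http://passport.jikexueyuan.com/sso/login, and at this point the cookie is still a logged-out one. Then fetch the captcha with that cookie, assemble the form data, and send the login request. After a successful login, keep the cookie in a local file or a global variable so the video downloads can reuse it. This example uses a global, CookieStoreInstance.CurrentCookie.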
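The same flow can be sketched with the standard library alone, under some assumptions. A single CookieContainer shared by the HttpClientHandler replays the session cookie on the captcha and login requests automatically. After login, the cookies can be written to a local file and read back before downloading. The CookieFile class, the one-"name=value"-per-line file format, and the ".jikexueyuan.com" domain used when reloading are all assumptions. None of them come from the original program.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;

// Hypothetical helper: shared session jar plus a simple "name=value per line" cookie file.
public static class CookieFile
{
    // Every request made through this client reads and writes cookies in `jar`,
    // so the cookie set by /sso/login is resent to /sso/verify and to the login POST.
    public static HttpClient CreateClient(CookieContainer jar)
    {
        return new HttpClient(new HttpClientHandler { CookieContainer = jar, UseCookies = true });
    }

    // Write the cookies the jar would send to `site`, one "name=value" per line.
    public static void Save(CookieContainer jar, Uri site, string path)
    {
        var lines = jar.GetCookies(site).Cast<Cookie>().Select(c => c.Name + "=" + c.Value);
        File.WriteAllLines(path, lines);
    }

    // Rebuild a jar from a file written by Save, scoping each cookie to `domain`
    // (a leading-dot domain such as ".jikexueyuan.com" also matches www. and passport.).
    public static CookieContainer Load(string domain, string path)
    {
        var jar = new CookieContainer();
        foreach (string line in File.ReadAllLines(path))
        {
            int eq = line.IndexOf('=');
            if (eq <= 0) continue; // skip blank or malformed lines
            jar.Add(new Cookie(line.Substring(0, eq), line.Substring(eq + 1), "/", domain));
        }
        return jar;
    }
}
```

Scoping the reloaded cookies to the parent domain is a design choice. It matters because the login happens on passport.jikexueyuan.com while the videos are listed on www.jikexueyuan.com.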
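(3) Once logged in, the next step is getting the video download URL. Take http://www.jikexueyuan.com/course/360.html as an example. Visiting it shows the page below:
View the page source and you'll find a revealing snippet: <source src="http://cv4.jikexueyuan.com/ae892b3b4a8c63fa579af4d2c6f6bb03/201512151558/csharp/course_360/01/video/c360b_01_h264_sd_960_540.mp4" type="video/mp4" />. Its src attribute is exactly the video URL we need, and the tag only appears when you are logged in. So the plan is as follows. First, parse this page to collect the links in the lesson list. Then request each lesson page in turn over HTTP, remembering to send the cookie, to get its video URL. Finally, download the video and save it locally.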
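The loop described above can be sketched independently of any HTTP library by passing the three steps in as delegates. This is hypothetical wiring, not the author's code. In the real program, the delegates would wrap GetLessonEntities, GetVideoUrl and DownloadVideo, and Lesson stands in for LessonEntity. Numbering the files 01.mp4, 02.mp4, ... is an assumed naming scheme. A lesson whose page yields no <source> URL (for example, because the cookie is not logged in) is skipped rather than aborting the whole course.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the author's LessonEntity.
public class Lesson
{
    public Lesson(string title, string href) { Title = title; Href = href; }
    public string Title { get; }
    public string Href { get; }
}

public static class CoursePipeline
{
    // Lessons -> video URL -> download; returns how many videos were handed to `download`.
    public static int Run(
        string courseUrl,
        Func<string, IEnumerable<Lesson>> listLessons,   // e.g. wraps GetLessonEntities
        Func<string, string> resolveVideoUrl,             // e.g. wraps GetVideoUrl
        Action<string, string> download,                  // (videoUrl, filePath), e.g. wraps DownloadVideo
        string targetFolder)
    {
        int downloaded = 0;
        int index = 0;
        foreach (Lesson lesson in listLessons(courseUrl))
        {
            index++;
            string videoUrl = resolveVideoUrl(lesson.Href);
            if (string.IsNullOrEmpty(videoUrl)) continue; // no <source> tag on this page

            string filePath = Path.Combine(targetFolder, index.ToString("D2") + ".mp4");
            download(videoUrl, filePath);
            downloaded++;
        }
        return downloaded;
    }
}
```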
```csharp
public List<LessonEntity> GetLessonEntities(string url, HttpCookieType sessionCookie)
{
    // Request the lesson list page, parse its HTML and extract each lesson's title and link
    HttpResponseParameter responseParameter = HttpProvider.Excute(new HttpRequestParameter()
    {
        IsPost = false,
        Url = url,
        Encoding = Encoding.UTF8,
        Cookie = sessionCookie
    });

    List<LessonEntity> results = new List<LessonEntity>();

    HtmlDocument htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(responseParameter.Body);
    HtmlNode rootNode = htmlDocument.DocumentNode;
    HtmlNodeCollection lessonNodes = rootNode.SelectNodes("//div[@id=\"pager\"]/div[3]/div[2]/div[2]/ul/li");
    if (lessonNodes == null)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches,
        // e.g. if the page layout changes or the session has expired
        return results;
    }

    foreach (HtmlNode lessonNode in lessonNodes)
    {
        HtmlNode aNode = lessonNode.SelectSingleNode("div[1]/h2[1]/a[1]");
        if (aNode != null)
        {
            results.Add(new LessonEntity
            {
                // Decode entities such as &amp; in the title text
                Title = HtmlEntity.DeEntitize(aNode.InnerText).Trim(),
                Href = aNode.GetAttributeValue("href", string.Empty)
            });
        }
    }

    return results;
}
```
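The scraped titles are a natural basis for local file names. However, they may contain characters that file names cannot hold: / on every platform, plus \ : * ? " < > | on Windows. Below is a minimal sketch that replaces anything Path.GetInvalidFileNameChars() rejects with an underscore. It assumes a hypothetical "NN Title.mp4" naming scheme, and LessonFileName is an illustrative name, not part of the original program.

```csharp
using System;
using System.IO;

public static class LessonFileName
{
    // Build "NN Title.mp4" from a 1-based lesson index and a scraped title,
    // replacing every character the current OS forbids in file names with '_'.
    public static string For(int index, string title)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        char[] chars = (title ?? string.Empty).Trim().ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (Array.IndexOf(invalid, chars[i]) >= 0) chars[i] = '_';
        }
        return string.Format("{0:D2} {1}.mp4", index, new string(chars));
    }
}
```

For example, a lesson titled "C#/.NET basics" at position 3 becomes "03 C#_.NET basics.mp4".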
```csharp
public string GetVideoUrl(string url, HttpCookieType sessionCookie)
{
    HttpResponseParameter responseParameter = HttpProvider.Excute(new HttpRequestParameter
    {
        Url = url,
        Cookie = sessionCookie
    });

    // Extract the video file URL from the src attribute of the <source> tag;
    // an empty string means the tag is missing (e.g. not logged in)
    Match match = Regex.Match(responseParameter.Body, "<source[^>]*\\ssrc=\"([^\"]+)\"", RegexOptions.IgnoreCase);
    return match.Success ? match.Groups[1].Value : string.Empty;
}
```
```csharp
public void DownloadVideo(string filePath, string url, HttpCookieType sessionCookie, Action action = null)
{
    string folder = Path.GetDirectoryName(filePath);
    if (!string.IsNullOrEmpty(folder) && !Directory.Exists(folder))
    {
        // Create the target folder if it does not exist yet
        Directory.CreateDirectory(folder);
    }

    // Optional callback, invoked just before the download starts
    if (action != null)
    {
        action();
    }

    HttpProvider.Excute(new HttpRequestParameter
    {
        Url = url,
        ResponseEnum = HttpResponseEnum.Stream,
        Cookie = sessionCookie,
        StreamAction = x => NetExtensions.WriteFile(filePath, x)
    });

    // Alternative without the cookie-aware HttpProvider:
    // using (WebClient webClient = new WebClient()) { webClient.DownloadFile(new Uri(url, UriKind.RelativeOrAbsolute), filePath); }
}
```
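DownloadVideo above requests the whole file every time, so an interrupted download starts over. Resuming is key point 4 in the list at the top. Below is a hedged sketch of one common approach with System.Net.Http: request only the missing bytes with an HTTP Range header, then append if the server answers 206 Partial Content and start over otherwise. Whether the video host honours range requests is an assumption that has not been checked here. ResumableDownload is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;

public static class ResumableDownload
{
    // Build a GET for `url` that asks only for the bytes not yet on disk.
    public static HttpRequestMessage CreateRequest(string url, string filePath)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        long existing = File.Exists(filePath) ? new FileInfo(filePath).Length : 0;
        if (existing > 0)
        {
            request.Headers.Range = new RangeHeaderValue(existing, null); // "bytes=N-"
        }
        return request;
    }

    // Open the target file: append on 206 Partial Content, otherwise overwrite from scratch
    // (a 200 OK means the server ignored the Range header and is sending the whole file).
    public static FileStream OpenTarget(string filePath, HttpStatusCode status)
    {
        FileMode mode = status == HttpStatusCode.PartialContent ? FileMode.Append : FileMode.Create;
        return new FileStream(filePath, mode, FileAccess.Write);
    }
}
```

In use, the request would be sent through the cookie-carrying HttpClient. The response stream would then be copied into the stream returned by OpenTarget(filePath, response.StatusCode).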
Each course's downloadable code resources can be fetched the same way, through the downloadRes endpoint:
```csharp
public void DownloadCode(string filePath, string courseId, HttpCookieType sessionCookie)
{
    // Already downloaded: skip
    if (File.Exists(filePath)) return;

    HttpResponseParameter responseParameter = HttpProvider.Excute(new HttpRequestParameter
    {
        Url = string.Format("http://www.jikexueyuan.com/course/downloadRes?course_id={0}", courseId),
        Cookie = sessionCookie
    });
    CodeDownloadResultEntity result = responseParameter.Body.DeserializeObject<CodeDownloadResultEntity>();
    if (result != null && result.code == 200 && result.data != null)
    {
        string folder = Path.GetDirectoryName(filePath);
        if (!string.IsNullOrEmpty(folder) && !Directory.Exists(folder))
        {
            // Create the target folder if it does not exist yet
            Directory.CreateDirectory(folder);
        }

        using (WebClient webClient = new WebClient())
        {
            webClient.DownloadFile(new Uri(result.data.url, UriKind.RelativeOrAbsolute), filePath);
        }
    }
}
```
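(4) At this point the video download feature is complete. Batch-downloading Jikexueyuan's 【Career Path】 course videos (http://ke.jikexueyuan.com/zhiye/) is now quite simple: parse the pages level by level to reach each video's playback page URL, then download the video. Page parsing mainly relies on regular expressions and the HtmlAgilityPack library.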
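When hundreds of videos are fetched in one batch, a single network hiccup should not stop the whole run (key point 4 again). Below is a minimal sketch of a generic retry wrapper. The attempt count and delay are left to the caller, and Retry.Run is a hypothetical helper, not part of the original program or of any library.

```csharp
using System;
using System.Threading;

public static class Retry
{
    // Run `action` up to `attempts` times, sleeping `delay` between failures.
    // The exception from the final attempt propagates to the caller.
    public static T Run<T>(Func<T> action, int attempts, TimeSpan delay)
    {
        if (attempts < 1) throw new ArgumentOutOfRangeException(nameof(attempts));
        for (int i = 1; ; i++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (i < attempts)
            {
                // Not the last attempt yet: wait, then try again
                Thread.Sleep(delay);
            }
        }
    }
}
```

For example, Retry.Run(() => GetVideoUrl(href, cookie), 3, TimeSpan.FromSeconds(5)) would try a lesson page up to three times before giving up.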
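(5) And with that, the whole Jikexueyuan Career Path catalogue is yours.
As the saying goes, "if I ever get rich, I won't forget my friends." If you found this article useful, even if you usually just lurk, please give it a like!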