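The tool used here is Charles, mainly to capture the API traffic the app sends. Why call this a beginner-level approach? Because it does not solve the mas signature, so you can only pull about 50 new items before the captured mas value stops working.
An earlier post, 抖音API分析 (Douyin API analysis), roughly worked out how the video address is obtained. I got lazy and stopped there; my interest has recently come back, so this post picks up where that one left off.
Now that the API has been analyzed, the next step is to imitate the client, fetch the data, and download the videos.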
    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    import org.apache.http.HttpEntity;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    // The JSON classes are assumed to be Alibaba fastjson; the method names used below match its API.
    public class FeedDemo { // class name is arbitrary

        private static String url = "https://aweme.snssdk.com/aweme/v1/feed/?iid=32142611788&ac=4G&os_api=18&app_name=aweme&channel=App%20Store&idfa=67642C64-6404-403A-8B0D-31A059C3A2BD&device_platform=iphone&build_number=17909&vid=9D61EDED-6680-471A-A134-D1C96399BB83&openudid=9a661cd28951ab44f0870508f7af64dfb9b5dc36&device_type=iPhone8,2&app_version=1.7.9&device_id=50862505508&version_code=1.7.9&os_version=10.2.1&screen_width=1125&aid=1128&count=6&feed_style=0&max_cursor=0&min_cursor=0&pull_type=0&type=0&user_id=96840867747&volume=0.00&mas=000171d64eb699219ac45f410bcc83d1accd3ee629e6ec51f8ceb1&as=a1859114c0ed4b50731900&ts=1531121872";

        public static void main(String[] args) throws Exception {
            RequestConfig requestConfig = RequestConfig.custom()
                    .setSocketTimeout(15000)
                    .setConnectTimeout(15000)
                    .build();
            HttpGet get = new HttpGet(url);
            get.setConfig(requestConfig);
            get.setHeader("Accept", "*/*");
            get.setHeader("User-Agent", "Aweme/1.7.9 (iPhone; iOS 10.2.1; Scale/3.00)");
            get.setHeader("Cookie", "put your own cookie here");

            // try-with-resources closes both the client and the response
            try (CloseableHttpClient httpClient = HttpClients.createDefault();
                 CloseableHttpResponse response = httpClient.execute(get)) {
                HttpEntity entity = response.getEntity();
                // JSON bodies are UTF-8; decoding as gbk would garble non-ASCII fields
                String content = EntityUtils.toString(entity, "UTF-8");
                JSONObject jsonObject = JSON.parseObject(content);
                if (jsonObject.getInteger("status_code") == 0) {
                    JSONArray jsonArray = jsonObject.getJSONArray("aweme_list");
                    // iterate over what was actually returned instead of a fixed count
                    for (int i = 0; i < jsonArray.size(); i++) {
                        JSONObject detail = jsonArray.getJSONObject(i);
                        String playUrl = detail.getJSONObject("video")
                                .getJSONObject("play_addr_lowbr")
                                .getJSONArray("url_list")
                                .getString(0);
                        System.out.println(playUrl);
                    }
                } else {
                    System.out.println("status_code is not 0");
                }
            }
        }
    }
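What comes back is JSON.
So how do we use it? Take a look at the corresponding API.
In effect, it uses a video's unique identifier to look up that video's real address. I originally planned to use jsoup, but for something this simple a regular expression will do.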
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String str = "<a href=\"http://v3-dy-x.ixigua.com/1fc1320a2829de5164add4678abf2192/5b431eef/video/m/220785bb6e0882346e8aff9b7613756f4e71158d06e000055b44e3f1db4/\">Found</a>.";
    // Lazily capture everything between href=" and the closing ">
    String regEx = "href=\"(.*?)\">";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        System.out.println(matcher.group(1));
    }
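A small detour: when I made the request with HttpClient, it automatically followed the redirect after receiving the 302. No wonder such simple code felt slow to run. Disabling redirect following fixes it.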
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public class VideoDownloadDemo { // class name is arbitrary

        private static String filePath = "/Users/xingzhe/douyin";
        private static String url = "https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200f660000bctojepcgf31ghmrsmdg&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0";

        public static void main(String[] args) throws Exception {
            RequestConfig requestConfig = RequestConfig.custom()
                    .setSocketTimeout(15000)
                    .setConnectTimeout(15000)
                    .setRedirectsEnabled(false) // stop at the 302 instead of following it
                    .build();
            HttpGet get = new HttpGet(url);
            get.setConfig(requestConfig);
            get.setHeader("Accept", "*/*");
            get.setHeader("User-Agent", "Aweme/1.7.9 (iPhone; iOS 10.2.1; Scale/3.00)");

            try (CloseableHttpClient httpClient = HttpClients.createDefault();
                 CloseableHttpResponse response = httpClient.execute(get)) {
                int status = response.getStatusLine().getStatusCode();
                System.out.println(status);
                if (status != 302) {
                    return;
                }
                String content = EntityUtils.toString(response.getEntity(), "UTF-8");
                String detail = getVideo(content);
                if (detail == null) {
                    System.out.println("no href found in the 302 body");
                    return;
                }
                download(detail);
            }
        }

        // Extract the real video address from the <a href="..."> in the 302 body
        private static String getVideo(String str) {
            String regEx = "href=\"(.*?)\">";
            // Compile the regular expression
            Pattern pattern = Pattern.compile(regEx);
            // Case-insensitive variant:
            // Pattern pat = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(str);
            if (matcher.find()) {
                return matcher.group(1);
            }
            return null;
        }

        private static void download(String videoUrl) throws Exception {
            URL url = new URL(videoUrl);
            URLConnection con = url.openConnection();
            // Connection timeout of 5 s
            con.setConnectTimeout(5 * 1000);
            File dir = new File(filePath);
            dir.mkdirs(); // make sure the target directory exists
            // File name is the current timestamp in seconds
            long time = System.currentTimeMillis() / 1000;
            // try-with-resources closes both streams even if the copy fails
            try (InputStream is = con.getInputStream();
                 OutputStream os = new FileOutputStream(new File(dir, time + ".mp4"))) {
                byte[] bs = new byte[1024]; // 1 KB buffer
                int len;                    // number of bytes read
                while ((len = is.read(bs)) != -1) {
                    os.write(bs, 0, len);
                }
            }
        }
    }
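As an alternative to parsing the `<a href>` in the body, a 302 response normally carries its target in the `Location` header. The following is a minimal sketch of reading that header. It uses the JDK's `HttpURLConnection` instead of HttpClient so that it has no dependencies, and the class and method names (`RedirectResolver`, `resolve`) are made up for illustration. Whether this endpoint always sets `Location` has not been checked here, so treat it as an assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectResolver { // hypothetical helper, not part of the original code

    /** Returns the Location header of a 3xx response, or null if the response is not a redirect. */
    public static String resolve(String playUrl, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(playUrl).openConnection();
        // Same idea as setRedirectsEnabled(false): stop at the 302 instead of following it
        conn.setInstanceFollowRedirects(false);
        conn.setConnectTimeout(15000);
        conn.setReadTimeout(15000);
        conn.setRequestProperty("User-Agent", userAgent);
        try {
            int status = conn.getResponseCode();
            if (status < 300 || status >= 400) {
                return null;
            }
            return conn.getHeaderField("Location");
        } finally {
            conn.disconnect();
        }
    }
}
```

A call would look like `RedirectResolver.resolve(url, "Aweme/1.7.9 (iPhone; iOS 10.2.1; Scale/3.00)")`, and a `null` result means falling back to the regex on the body.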
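The code above is enough to download a video. It is written casually, just to get the job done: the file name is simply a timestamp. Batch downloading is easy too, just wrap it in a for loop. I only threw this together quickly.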
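One thing to watch when looping: a file name based on a timestamp in seconds will collide whenever two downloads finish within the same second. A possible workaround, sketched below, is to name the file after the `video_id` query parameter of the play URL. The helper name `VideoFileNames.fileNameFor` is hypothetical, and it assumes every play URL carries a `video_id` parameter like the one shown earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VideoFileNames { // hypothetical helper

    // Matches video_id=<value> anywhere in the query string
    private static final Pattern VIDEO_ID = Pattern.compile("[?&]video_id=([^&]+)");

    /** Builds a file name from the video_id parameter, falling back to a millisecond timestamp. */
    public static String fileNameFor(String playUrl) {
        Matcher m = VIDEO_ID.matcher(playUrl);
        if (m.find()) {
            return m.group(1) + ".mp4";
        }
        // Fallback when no video_id is present (assumption: this should be rare)
        return System.currentTimeMillis() + ".mp4";
    }
}
```

Naming by `video_id` also makes it easy to skip videos that are already on disk.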
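One small problem worth recording: at first I downloaded with url.openConnection(), the most primitive approach, and after downloading for a while I started getting 403 responses. I switched back to downloading with HttpClient and added headers to the GET request, mainly:
User-Agent: Aweme/1.7.9 (iPhone; iOS 10.2.1; Scale/3.00)
After that the 403 did not come back.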
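The fix described above went through HttpClient. The sketch below shows the same idea, sending the app's `User-Agent` with the download request, using only the JDK so that it stays self-contained. It is not the code that was actually used: the class name `HeaderDownload` is made up, and the assumption is that the missing header was what triggered the 403.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class HeaderDownload { // hypothetical helper

    static final String USER_AGENT = "Aweme/1.7.9 (iPhone; iOS 10.2.1; Scale/3.00)";

    /** Downloads videoUrl to target and returns the number of bytes written. */
    public static long download(String videoUrl, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(videoUrl).openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(15000);
        conn.setRequestProperty("Accept", "*/*");
        // Present the same User-Agent as the app
        conn.setRequestProperty("User-Agent", USER_AGENT);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("unexpected status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Throwing on any non-200 status makes a 403 visible immediately instead of silently saving an error page as an .mp4 file.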
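Everywhere the code needs to hit a URL it simply creates a new HttpClient; no other objects are reused either, there is no deduplication, and downloads are not multithreaded. This was written for fun. The most important part of scraping Douyin is still generating as and mas: in my tests, after roughly 60 or so downloads the API returns "status_code": 2151, and I have to capture packets again to get fresh mas and as values. That is clearly unacceptable for efficiency. When I have time I will look into how as is generated. That's it for now.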
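For the missing deduplication and multithreading, one possible shape is a fixed thread pool plus a concurrent set of URLs already seen. This is only a sketch: `BatchDownloader` and its `run` method are hypothetical names, and the actual download is passed in as a callback so the example does not depend on any HTTP library.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BatchDownloader { // hypothetical helper

    // URLs already handed to the pool; shared across calls to run()
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Submits each not-yet-seen URL to a thread pool and returns how many were submitted. */
    public int run(List<String> playUrls, int threads, Consumer<String> downloader)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int submitted = 0;
        for (String playUrl : playUrls) {
            // Set.add returns false for duplicates, so each URL is downloaded once
            if (seen.add(playUrl)) {
                pool.submit(() -> downloader.accept(playUrl));
                submitted++;
            }
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        pool.awaitTermination(10, TimeUnit.MINUTES);
        return submitted;
    }
}
```

This does nothing about the 2151 status: once as and mas expire, the feed request itself fails, so the pool only helps with the downloads in between.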