Sorting Alibaba Try (阿里試用) listings
Sorry, I had somehow gitignored the config file earlier. It's fixed now, sorry!
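Backstory
It's embarrassing: I've completely lost face as a card-carrying "steel straight guy". Last night, a little after 11 p.m., I was happily working on a contract job (a WeChat mini program for China Mobile; yes, I freelance, hey). Then my girlfriend had a sudden brainwave: "Can you sort this thing for me? I want to grab some freebies. Is that technically doable?"
I gave it a rather helpless look. Alibaba Try? What on earth is that? Oh, oh, oh, this thing. Just scrape it with a crawler. Mind you, I'm a front-end developer...
I replied: "No problem, a crawler will do it."
She: "Wow, how long will it take?"
Me: "I'm busy right now, so 1 to 2 hours."
She: "Forget it, stop what you're doing and make this for me right now!"
I looked at her face and, ashamed, minimized WeChat DevTools...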
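The page in question
If you think this is another ad, you flatter me far too much.
Let's get crawling
For a NodeJS crawler, a quick Baidu search turns up ready-made code everywhere, so I won't go through it in detail. Here's a snippet from Jianshu, by 埃米莉Emily: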
```javascript
const express = require('express');
const superagent = require('superagent');
const cheerio = require('cheerio');

// express is a function: calling it with no arguments returns an express app instance,
// which we assign to `app`.
const app = express();

app.get('/', (req, res, next) => {
  console.log(req)
  superagent.get('https://www.v2ex.com/')
    .end((err, sres) => {
      // Standard error handling
      if (err) {
        return next(err);
      }
      // sres.text holds the page's HTML. Passing it to cheerio.load gives us
      // an object that implements a jQuery-like interface, conventionally named `$`.
      // Everything after that is plain jQuery.
      let $ = cheerio.load(sres.text);
      let items = [];
      $('.item_title a').each((idx, element) => {
        let $element = $(element);
        items.push({
          title: $element.text(),
          href: $element.attr('href')
        });
      });
      res.send(items);
    });
});

app.listen(3000, function () {
  console.log('app is listening at port 3000');
});
```
Well, nobody who uses NodeJS can be unfamiliar with express. Think of superagent as simply a way to make outbound HTTP requests from Node. And cheerio is, well, jQuery for Node.
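First crawl
Replace the request URL above with https://try.taobao.com/ and inspect the page's markup to find the selector we want: `.tb-try-wd-item-info > .detail`. Swap it in for the `.item_title a` selector above, and off we go: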
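...I'd rather not show the result: only six items came back, while the page actually displays 10. After a lot of digging, I found two problems:
First, the 6 items the crawler got are the recommendations (damn it), not the list below them;
Second, the list below is data fetched afterwards by a separate POST request. It looks like the handiwork of some framework's SSR: only the recommendations come pre-rendered in the HTML.
So plain scraping won't work, and it's time for a different strategy.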
Simulating the POST
OK, since it's a POST, this is easy: dig out the URL and parameters, then replay the request with superagent:
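```javascript
const superagent = require('superagent')

// Fetch one page of the list by replaying the page's POST request
const fetchPage = payload =>
  new Promise((resolve, reject) => {
    superagent
      .post(
        `https://try.taobao.com/api3/call?what=show&page=${payload.page}&pageSize&api=x%2Fsearch`
      )
      .set('content-type', 'application/x-www-form-urlencoded; charset=UTF-8')
      .end((err, sres) => {
        // Standard error handling
        if (err) {
          return reject(err)
        }
        // The body is JSON; `result` holds the data tree
        const result = JSON.parse(sres.text).result
        resolve(result)
      })
  })
```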
The `content-type` value comes from the original request:
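Heh, you guessed right: it failed, like this:
On reflection, that was inevitable: how could they let just anyone call it? So what next? Study it? Nonono. This old-timer just opens fire with everything. It's only headers like Content-Type, right?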
```javascript
const superagent = require('superagent')

// Same request as before, now replaying the browser's request headers.
// The cookie and x-csrf-token values presumably have to come from your own logged-in session.
const fetchPage = payload =>
  new Promise((resolve, reject) => {
    superagent
      .post(
        `https://try.taobao.com/api3/call?what=show&page=${payload.page}&pageSize&api=x%2Fsearch`
      )
      .set(
        'user-agent',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
      )
      .set('accept', 'application/json, text/javascript, */*; q=0.01')
      .set('accept-encoding', 'gzip, deflate, br')
      .set(
        'accept-language',
        'zh-CN,zh;q=0.9,en;q=0.8,la;q=0.7,zh-TW;q=0.6,da;q=0.5'
      )
      // .set('content-length', '8') // enabling this makes the request time out
      .set('content-type', 'application/x-www-form-urlencoded; charset=UTF-8')
      .set('cookie', 'your cookie')
      .set('origin', 'https://try.taobao.com')
      .set('referer', 'https://try.taobao.com')
      .set('x-csrf-token', 'f0b8e7443eb7e')
      .set('x-requested-with', 'XMLHttpRequest')
      .end((err, sres) => {
        // Standard error handling
        if (err) {
          return reject(err)
        }
        const result = JSON.parse(sres.text).result
        resolve(result)
      })
  })
```
This is what I based it on:
It's just headers, an origin and a user agent. Did you really think HTTPS would stop me?
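Copying each header into its own `.set()` call by hand is tedious and typo-prone. As a sketch, under the assumption that you paste the "name: value" lines from the browser's DevTools, a small hypothetical helper `parseRawHeaders` could turn them into an object. It drops HTTP/2 pseudo-headers (the ones starting with ":") and `content-length`:

```javascript
// Hypothetical helper: convert a raw header block copied from the browser's
// DevTools ("name: value", one per line) into a plain object.
const parseRawHeaders = raw => {
  const headers = {}
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim()
    // Skip blank lines and HTTP/2 pseudo-headers such as ":authority"
    if (!trimmed || trimmed.startsWith(':')) continue
    const sep = trimmed.indexOf(':')
    if (sep <= 0) continue // not a "name: value" line
    const name = trimmed.slice(0, sep).trim().toLowerCase()
    // Drop content-length: the client should derive it from the body it actually sends
    if (name === 'content-length') continue
    headers[name] = trimmed.slice(sep + 1).trim()
  }
  return headers
}

const copied = `
:authority: try.taobao.com
accept: application/json, text/javascript, */*; q=0.01
origin: https://try.taobao.com
content-length: 8
`
console.log(parseRawHeaders(copied))
// → only accept and origin remain; the pseudo-header and content-length are dropped
```

superagent's `.set()` also accepts an object of header fields, so the result could be applied in one call, e.g. `.set(parseRawHeaders(copied))`.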
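Note the `.set('content-length', '8')` line above. I'm not sure what the server does with it, but adding it makes the request time out... Most likely this is because no body is actually sent, so the server sits waiting for 8 bytes that never arrive.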
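A declared Content-Length must equal the number of bytes actually sent in the body. The request above sends no body at all. If you did want to send the page as a form body instead, here is a minimal sketch of computing a matching value, assuming a UTF-8 `application/x-www-form-urlencoded` body. `encodeForm` is a hypothetical helper:

```javascript
// Hypothetical helper: encode fields as an application/x-www-form-urlencoded
// body and compute the Content-Length that matches it (a byte count).
const encodeForm = fields => {
  const body = new URLSearchParams(fields).toString()
  return { body, contentLength: Buffer.byteLength(body, 'utf8') }
}

const form = encodeForm({ page: 1 })
console.log(form.body, form.contentLength) // page=1 6
```

In practice it is usually simpler to let the HTTP client compute this header from the body it sends than to hard-code it.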
And so it confessed:
```json
{
  "pages": {
    "paging": { "n": 2182, "page": 1, "pages": 219 },
    "items": [
      { "shopUserId": "2450112357", "title": "凱度高端款嵌入式蒸烤箱", "status": 1, "totalNum": 1, "requestNum": 15530, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "casdon凱度旗艦店", "showId": "2561626", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34530215", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1ycS2eMDqK1RjSZSyXXaxEVXa.jpg", "shopItemId": "559771706359", "price": 13850 },
      { "shopUserId": "3189770892", "title": "皇家美素佳兒老包裝2段400g", "status": 1, "totalNum": 50, "requestNum": 2079, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "皇家美素佳兒旗艦店", "showId": "2551240", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34396042", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1YrSZaVYqK1RjSZLeXXbXppXa.jpg", "shopItemId": "547114874458", "price": 189 },
      { "shopUserId": "1077716829", "title": "關注店鋪優先審水密碼幻彩隔離", "status": 1, "totalNum": 10, "requestNum": 6907, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "水密碼旗艦店", "showId": "2568391", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34784086", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB16_4ChmzqK1RjSZPxXXc4tVXa.jpg", "shopItemId": "559005882880", "price": 599 },
      { "shopUserId": "725786863", "title": "精品皮草派克大衣", "status": 1, "totalNum": 1, "requestNum": 11793, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "美瑞蓓特", "showId": "2557886", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34574078", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1zVLMdCrqK1RjSZK9XXXyypXa.jpg", "shopItemId": "577418950477", "price": 5980 },
      { "shopUserId": "3000840351", "title": "保友智能新品Pofit電腦椅", "status": 1, "totalNum": 1, "requestNum": 12895, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "保友辦公傢俱旗艦店", "showId": "2557100", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34528042", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1bYZEg6TpK1RjSZKPXXa3UpXa.png", "shopItemId": "577598687971", "price": 5408 },
      { "shopUserId": "791732485", "title": "TEK手持吸塵器A8", "status": 1, "totalNum": 1, "requestNum": 17195, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "泰怡凱旗艦店", "showId": "2552265", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34444014", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1D6bWbhTpK1RjSZFGXXcHqFXa.jpg", "shopItemId": "547653053965", "price": 5199 },
      { "shopUserId": "3229583972", "title": "椰富海南冷炸椰子油食用油1L", "status": 1, "totalNum": 20, "requestNum": 4451, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "椰富食品專營店", "showId": "2561698", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34532250", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1VjLSePDpK1RjSZFrXXa78VXa.jpg", "shopItemId": "578653506446", "price": 256 },
      { "shopUserId": "855223948", "title": "卡西歐立式家用電鋼琴PX770", "status": 1, "totalNum": 1, "requestNum": 16762, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "世紀音緣樂器專營店", "showId": "2551326", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34420041", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1CC6aa9zqK1RjSZFpXXakSXXa.jpg", "shopItemId": "562405126383", "price": 4838 },
      { "shopUserId": "4065939832", "title": "關注寶貝送輕奢沙發牀", "status": 1, "totalNum": 1, "requestNum": 17436, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "貝兮旗艦店", "showId": "2559904", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34532170", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1AzxYegHqK1RjSZFPXXcwapXa.jpg", "shopItemId": "577798067313", "price": 4399 },
      { "shopUserId": "807974445", "title": "森海塞爾CX6藍牙耳機", "status": 1, "totalNum": 4, "requestNum": 22557, "acceptNum": 0, "reportNum": 0, "isApplied": false, "shopName": "sennheiser旗艦店", "showId": "2559701", "startTime": 1539619200000, "endTime": 1540220400000, "id": "34532161", "type": 1, "pic": "//img.alicdn.com/bao/uploaded/TB1HET6d7voK1RjSZFwXXciCFXa.jpg", "shopItemId": "564408956766", "price": 999 }
    ]
  }
}
```
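With this response in hand, the core calculation is each item's chance of winning: `totalNum` (units on offer) divided by `requestNum` (applicants). The page further down shows it as a percentage. That is followed by a descending sort. Here is a minimal Node sketch using three items from the response above. `chanceOf` and `sortByChance` are hypothetical names. Treating an item with zero applicants as 100% is an assumption made here to avoid dividing by zero.

```javascript
// Chance of winning a trial, as a percentage: units on offer / applicants.
// Assumption: an item with zero applicants counts as 100%.
const chanceOf = ({ totalNum, requestNum }) =>
  requestNum > 0 ? (totalNum / requestNum) * 100 : 100

// Return a new array sorted by descending chance (the input is not mutated)
const sortByChance = items =>
  items
    .map(item => ({ ...item, rate: chanceOf(item) }))
    .sort((a, b) => b.rate - a.rate)

// Three items taken from the response above
const items = [
  { id: '34530215', title: '凱度高端款嵌入式蒸烤箱', totalNum: 1, requestNum: 15530 },
  { id: '34396042', title: '皇家美素佳兒老包裝2段400g', totalNum: 50, requestNum: 2079 },
  { id: '34784086', title: '關注店鋪優先審水密碼幻彩隔離', totalNum: 10, requestNum: 6907 }
]

for (const item of sortByChance(items)) {
  console.log(item.rate.toFixed(3) + '%', item.title)
}
```

The formula favors items with many units and few applicants, which is why the 50-unit milk powder ranks first even though it is not the cheapest.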
Sharp-eyed readers will have noticed that I didn't send any form data, yet still got the data I needed: `page` rides along in the query string...
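Since `page` lives in the query string, the URL can also be built with the standard `URLSearchParams` instead of string splicing. This also explains the `x%2Fsearch` part: it is simply `x/search` with the slash percent-encoded. This is a sketch with a hypothetical `buildListUrl` helper. One caveat: `URLSearchParams` serializes the empty `pageSize` as `pageSize=`, not the bare `pageSize` in the original URL. That the server treats both the same is an assumption, not something verified here.

```javascript
// Hypothetical helper: build the list URL with `page` in the query string.
// URLSearchParams percent-encodes "/" so "x/search" becomes "x%2Fsearch".
// Caveat: the empty pageSize is serialized as "pageSize=" (assumed equivalent to "pageSize").
const buildListUrl = page => {
  const params = new URLSearchParams({
    what: 'show',
    page: String(page),
    pageSize: '',
    api: 'x/search'
  })
  return `https://try.taobao.com/api3/call?${params}`
}

console.log(buildListUrl(1))
// https://try.taobao.com/api3/call?what=show&page=1&pageSize=&api=x%2Fsearch
```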
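The display part
Once you have the data, the rest is simple: this one endpoint is enough for all the remaining features. That's right, remember, I'm a front-end dev.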
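The page below talks to two routes on its own server. It reads `.pages` from `GET /total` and expects an item array from `POST /table`. The server code for those routes isn't shown in this post. Assuming it simply unwraps the API response above, the mapping could look like the hypothetical adapters `toTotal` and `toTable` below:

```javascript
// Hypothetical adapters (the server side isn't shown in this post): map the
// `result` object returned by the API to the shapes the page below reads.
// /total: the page reads `.pages` to know which page to start from.
const toTotal = result => ({ pages: result?.pages?.paging?.pages ?? 0 })
// /table: the page expects an array; an empty one triggers its "all done" alert.
const toTable = result => result?.pages?.items ?? []

const sample = {
  pages: {
    paging: { n: 2182, page: 1, pages: 219 },
    items: [{ id: '34530215', title: '凱度高端款嵌入式蒸烤箱', totalNum: 1, requestNum: 15530 }]
  }
}
console.log(toTotal(sample)) // { pages: 219 }
console.log(toTable(sample).length) // 1
console.log(toTable(undefined)) // []
```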
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>tb try</title>
  <style>
    .warning { color: red; }
    button { width: 100px; height: 44px; margin-right: 44px; }
    table { border: 1px solid #d8d8d8; border-collapse: collapse; }
    tr { border-bottom: 1px solid #d8d8d8; cursor: pointer; }
    tr:last-child { border: 0; }
  </style>
</head>
<body>
  <button onclick="postPage()">Next page</button>
  <span id="currentPage"></span>
  <table>
    <tbody>
      <tr>
        <th>No. (descending)</th>
        <th>Chance</th>
        <th>Name</th>
      </tr>
    </tbody>
    <tbody id="results"></tbody>
  </table>
  <script>
    let currentPage = 0 // current page
    let allItems = [] // all data collected so far
    let currentTime = 0 // throttle: timestamp of the last accepted request
    const xhr = new XMLHttpRequest()
    const loopInterval = 2 // throttle window, in seconds
    const results = document.querySelector('#results')
    const currentPageText = document.querySelector('#currentPage')

    // Re-render the whole result table from `arr`
    const reFullTBody = arr => {
      let innerHtml = ''
      arr.forEach((item, i) => {
        let tr = `
          <tr onclick="window.open('https://try.taobao.com/item.htm?id=${item.id}')">
            <td>${i + 1}</td>
            <td>${item.rate.toFixed(3) + '%'}</td>
            <td>${item.title}</td>
          </tr>
        `
        // Highlight items whose chance exceeds 5%
        if (item.rate > 5) tr = tr.replace('<tr', '<tr class="warning"')
        innerHtml += tr
      })
      currentPageText.innerText = `Current page: ${currentPage}`
      results.innerHTML = innerHtml
    }

    const postPage = () => {
      // Ignore requests that fall inside the throttle window
      const newTime = new Date().getTime()
      const shouldBack = newTime - currentTime < loopInterval * 1000
      if (shouldBack) {
        alert(`Please don't click repeatedly within ${loopInterval} seconds.`)
        return
      }
      currentTime = newTime
      xhr.onreadystatechange = function () {
        if (this.readyState === 4 && this.status === 200) {
          const res = JSON.parse(this.response)
          if (res.length < 1) {
            alert('Everything ending today has already been listed.')
            return
          }
          // Compute each new item's chance BEFORE sorting; otherwise the new
          // items have no `rate` yet and end up out of order
          res.forEach(item => {
            item.rate = item.totalNum / item.requestNum * 100
          })
          allItems = [...allItems, ...res]
          allItems.sort((a, b) => b.rate - a.rate)
          reFullTBody(allItems)
          currentPage--
        }
      }
      xhr.open('post', '/table')
      xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8')
      // Send the request
      xhr.send('page=' + currentPage)
    }

    // On load: get the total page count, then start from the last page and work backwards
    xhr.onreadystatechange = function () {
      if (this.readyState === 4 && this.status === 200) {
        currentPage = JSON.parse(this.response).pages
        postPage()
      }
    }
    xhr.open('get', '/total')
    xhr.send()
  </script>
</body>
</html>
```
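The click throttle inside `postPage()` can be pulled out into a small pure function that is easy to test. This is a sketch, not the page's actual code. The name `createThrottle` and the injectable `now` clock are assumptions added for testability.

```javascript
// A sketch of the throttle inside postPage(), pulled out as a pure function.
// `now` is injectable (an assumption for testability) so no real waiting is needed.
const createThrottle = (intervalSeconds, now = () => Date.now()) => {
  let last = -Infinity // time of the last accepted call; -Infinity means "never"
  return () => {
    const t = now()
    if (t - last < intervalSeconds * 1000) return false // inside the window: reject
    last = t
    return true // accepted: restart the window
  }
}

// Usage with a fake clock
let fakeTime = 0
const tryClick = createThrottle(2, () => fakeTime)
console.log(tryClick()) // true
fakeTime = 1500
console.log(tryClick()) // false
fakeTime = 2000
console.log(tryClick()) // true
```

As in the page, only accepted clicks reset the window. Keeping the timestamp in a closure replaces the module-level `currentTime`; for a single button, that is mostly a matter of taste.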
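It looks like this:
Look how user-friendly I am: click a row to jump to the item, chances above 5% show in red, it tells you which page you're on, and it even warns you when you click too fast...
It's that handy. If you like it, go try it now!
Live demo: click here to try it
Github: Spider
If you find it useful, don't be stingy with the stars!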