Web Scraping in Practice: CBA Historical Game Data

I had some free time when a friend asked me about web scraping. He wanted the last two seasons of CBA game data for some analysis, maybe even a bit of "big data". That got me interested, so I went ahead and built it. Here is the approach I took.

## 1. Choosing a data source

I don't follow the CBA myself, so I picked the CBA section of a major Chinese portal site as the data source. Its address appears in the code below for anyone who wants to take a look.

## 2. Analyzing the data

Inspecting the page elements shows that the page is rendered on the server, so there is no API endpoint to pull the data from directly. The next step is to look at the markup itself. All of the data sits in tables, which makes things much simpler.

## 3. Settling on an approach

The approach is simple: use a regular expression to extract every table row, filter out the decorative markup, and what remains is the data. I replaced the column delimiter in each row with "," so the result can be saved as CSV.

The filtered data for one game looks like this. It starts with the score by quarter for each team, followed by a line of game information and then one box score per team. The box-score columns are: starter, player, minutes played, two-pointers, three-pointers, free throws, offense (presumably offensive rebounds), rebounds, assists, turnovers, steals, fouls, blocks, and points.

```
球隊,第一節,第二節,第三節,第四節,總比分
廣州,33,37,36,27,133
北控,23,18,17,34,92
2019-01-1619:35:00輪次:31場序309開始比賽  比賽已結束
首發,球員,出場時間,兩分球,三分球,罰球,進攻,籃板,助攻,失誤,搶斷,犯規,蓋帽,得分
,張永鵬,25.8,7-9,0-0,1-1,4,8,3,0,0,1,0,15
,鞠明欣,19.1,2-4,1-2,0-0,2,5,2,2,0,1,0,7
,西熱力江,25.5,1-1,4-8,0-0,1,2,4,1,3,1,0,14
,郭凱,15.5,2-2,0-0,0-0,2,3,0,2,0,2,0,4
,凱爾·弗格,38.1,5-9,5-9,11-11,0,10,12,2,2,4,0,36
,姚天一,12.3,0-1,1-4,0-0,0,1,5,0,0,0,0,3
,科裏·傑弗森,24.0,4-4,2-4,3-4,0,6,0,1,0,1,1,17
,陳盈駿,22.6,1-1,2-7,1-1,0,2,4,2,1,2,0,9
,司坤,19.0,2-2,0-2,0-0,0,5,1,0,1,4,0,4
,孫鳴陽,20.6,2-3,0-0,3-3,1,4,1,2,3,4,0,7
,谷玥灼,7.4,1-1,1-2,0-0,0,0,2,0,0,0,0,5
,鄭準,10.1,3-4,2-3,0-0,0,2,0,0,0,1,0,12
,總計,240.0,30-41(73.2%),18-41(43.9%),19-20(95.0%),10,48,34,12,10,21,1,133
首發,球員,出場時間,兩分球,三分球,罰球,進攻,籃板,助攻,失誤,搶斷,犯規,蓋帽,得分
,於梁,20.8,1-3,0-1,0-0,0,0,2,0,1,5,0,2
,於澍龍,17.9,0-1,1-3,0-0,0,2,1,2,0,1,0,3
,許夢君,46.2,1-3,5-12,0-0,1,6,2,1,0,3,0,17
,托馬斯·羅賓遜,43.4,9-20,0-2,9-14,3,11,5,2,1,3,1,27
,楊敬敏,16.0,3-4,0-3,0-0,0,0,0,2,0,1,0,6
,孫賀男,2.8,0-0,0-0,0-0,0,0,0,1,0,1,0,0
,劉大鵬,28.0,1-1,3-5,0-0,1,4,3,2,2,3,0,11
,張銘浩,8.5,0-0,0-0,1-2,0,0,0,0,1,1,0,1
,張帆,27.5,5-7,1-3,0-0,0,1,6,4,1,2,0,13
,王徵,23.3,3-3,0-0,6-8,0,2,0,0,1,1,1,12
,常亞鬆,5.6,0-0,0-1,0-0,0,1,0,1,2,0,0,0
,總計,240.0,23-42(54.8%),10-30(33.3%),16-24(66.7%),5,27,19,15,9,21,2,92
```

Here is my code:

```
package com.fun

import com.fun.frame.Save
import com.fun.frame.httpclient.FanLibrary
import com.fun.utils.Regex
import com.fun.utils.WriteRead

class sd extends FanLibrary {

    public static void main(String[] args) {
        def total = []
        // Scrape every game id in the range and merge all rows into one CSV
        range(300, 381).forEach { x ->
            total.addAll test(x)
        }
        Save.saveStringList(total, "total4.csv")
        testOver()
    }

    static def test(int i) {
        // Reuse the per-game file if this game was already scraped.
        // The return value must stay on the same line as `return`,
        // otherwise Groovy returns null here.
        if (new File(LONG_Path + "${i}.csv").exists()) return WriteRead.readTxtFileByLine(LONG_Path + "${i}.csv")
        String url = "http://cbadata.sports.sohu.com/game/content/2017/${i}"
        def get = getHttpGet(url)
        def response = getHttpResponse(get)
        // Strip all whitespace so each table row becomes one continuous string
        def string = response.getString("content").replaceAll("\\s", EMPTY)
//        output(string)
        // NOTE: the pattern literals in regexAll(...) and the first two replaceAll(...)
        // calls below look truncated in the published post (HTML tags were probably
        // stripped); they are kept exactly as published.
        def all = Regex.regexAll(string, " <\\/tr>")
        def list = []
        all.forEach { x ->
            def info = x.replaceAll(" ", EMPTY).replaceAll("", ",")
            // Remove any remaining tags
            info = info.replaceAll("<.*?>", EMPTY)
            // Drop a trailing comma
            info = info.charAt(info.length() - 1) == ',' ? info.substring(0, info.length() - 1) : info
            // The totals row has no "starter" cell, so pad it to line up with player rows
            if (info.startsWith("總計")) info = "," + info
            list << info
            output(info)
        }
        Save.saveStringList(list, "${i}.csv")
        return list
    }
}
```
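
The row-extraction idea described earlier (match every table row with a regex, strip the markup, join the cells with commas) can be sketched in plain Java using only the JDK, so it could be called from a Groovy script like the one above. This is a minimal sketch and not the original code: the class name `RowExtractor`, the patterns `<tr.*?</tr>` and `</t[dh]>`, and the sample HTML are all assumptions made for illustration, since the real page markup is not shown in the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the tag patterns are assumptions about the page markup,
// not the patterns used in the original script.
class RowExtractor {

    // Assumed: each data row is a <tr>...</tr> element (non-greedy, may span lines).
    private static final Pattern ROW =
            Pattern.compile("<tr.*?</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Converts the HTML of one table row into a comma-separated line. */
    static String toCsvRow(String rowHtml) {
        String info = rowHtml
                .replaceAll("\\s", "")            // drop all whitespace
                .replaceAll("(?i)</t[dh]>", ",")  // a closing cell tag marks a column boundary
                .replaceAll("<.*?>", "");         // strip every remaining tag
        // Remove the trailing comma left behind by the last cell.
        return info.endsWith(",") ? info.substring(0, info.length() - 1) : info;
    }

    /** Finds every table row in the page and converts each one to a CSV line. */
    static List<String> extractRows(String html) {
        List<String> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(toCsvRow(m.group()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample markup; the numbers mirror the first score row shown earlier.
        String html = "<table>"
                + "<tr><th>team</th><th>q1</th><th>total</th></tr>"
                + "<tr><td>Guangzhou</td><td>33</td><td>133</td></tr>"
                + "</table>";
        extractRows(html).forEach(System.out::println);
    }
}
```

Running `main` prints one comma-separated line per table row. An empty first cell, such as the starter column in the box scores, comes out as a leading comma, which matches the shape of the player rows in the sample data.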