The number of pages on a site gives you a basis for estimating how long your crawl will take.
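As a rough back-of-the-envelope sketch, crawl time is about pages × seconds per request ÷ number of parallel workers. The function name, the per-request time and the worker count below are hypothetical inputs. The estimate ignores retries, rate limiting and bans, so treat it as a lower bound.

```python
def estimate_crawl_days(pages: int, seconds_per_request: float, concurrency: int = 1) -> float:
    """Rough crawl duration in days: pages * seconds_per_request / concurrency.

    Hypothetical helper: assumes a constant time per request and perfectly
    parallel workers, ignoring retries, throttling and bans.
    """
    if pages < 0 or seconds_per_request < 0 or concurrency < 1:
        raise ValueError("pages and seconds_per_request must be >= 0, concurrency >= 1")
    return pages * seconds_per_request / concurrency / 86_400  # 86,400 seconds per day


if __name__ == "__main__":
    # Hypothetical figures: 10 million pages at 1 second per request.
    for workers in (1, 10):
        days = estimate_crawl_days(10_000_000, 1.0, workers)
        print(f"{workers} worker(s): about {days:.1f} days")
```

With these made-up numbers, one worker needs roughly 116 days and ten workers roughly 12. This is why the page count matters before you start.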
The site's technology stack tells you how difficult it will be to crawl.
Go to www.baidu.com and search for site:www.cnblogs.com to see roughly how many pages of cnblogs (博客園) the search engine has indexed: about 10 million.
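If you want to run the same `site:` query from a script, the search URL can be built with the standard library. This sketch assumes Baidu's search page at `https://www.baidu.com/s` with the query in the `wd` parameter. The helper name `site_query_url` is hypothetical. Extracting the result count from the returned page is not shown. That step is fragile, and automated requests may be blocked or met with a CAPTCHA.

```python
from urllib.parse import urlencode


def site_query_url(domain: str) -> str:
    """Build a Baidu search URL for a `site:` query (hypothetical helper)."""
    # urlencode percent-encodes the ':' in "site:..." as %3A.
    return "https://www.baidu.com/s?" + urlencode({"wd": f"site:{domain}"})


if __name__ == "__main__":
    print(site_query_url("www.cnblogs.com"))
```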
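To identify the technology stack, install builtwith:

    pip install builtwith

Then pass it the site's URL:

    >>> import builtwith
    >>> builtwith.parse('http://www.cnblogs.com')
    {'advertising-networks': ['DoubleClick for Publishers (DFP)'], 'javascript-frameworks': ['Vue.js', 'jQuery']}

The result shows that the site uses Vue.js and jQuery.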
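One way to turn this result into a difficulty signal is to check for frameworks that often render content in the browser. A site using them may need a headless browser instead of plain HTTP requests. The set of framework names below is a hypothetical heuristic, and the names are assumed to match builtwith's labels. A framework's presence does not prove the content you want is rendered client-side, so compare against the raw HTML before deciding.

```python
from typing import Dict, List

# Hypothetical heuristic: framework names (as builtwith is assumed to label them)
# that often build page content in the browser.
CLIENT_SIDE_FRAMEWORKS = {"Vue.js", "React", "AngularJS", "Angular", "Ember.js"}


def client_rendering_hints(tech: Dict[str, List[str]]) -> List[str]:
    """Return detected JavaScript frameworks that suggest client-side rendering."""
    frameworks = tech.get("javascript-frameworks", [])
    return [name for name in frameworks if name in CLIENT_SIDE_FRAMEWORKS]


if __name__ == "__main__":
    # Dictionary copied from the builtwith.parse() output shown above.
    sample = {
        "advertising-networks": ["DoubleClick for Publishers (DFP)"],
        "javascript-frameworks": ["Vue.js", "jQuery"],
    }
    hints = client_rendering_hints(sample)
    if hints:
        print("May need a headless browser; found:", ", ".join(hints))
    else:
        print("Plain HTTP requests will probably do.")
```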
To look up who registered the domain and when, install python-whois (it is imported as `whois`):

    pip install python-whois

Then query the domain:

    >>> import whois
    >>> print(whois.whois('www.changeworld.shop'))
    {
      "domain_name": "CHANGEWORLD.SHOP",
      "registrar": "Bizcn.com,Inc",
      "whois_server": null,
      "referral_url": null,
      "updated_date": "2019-04-24 04:22:03",
      "creation_date": "2019-04-15 14:23:58",
      "expiration_date": "2020-04-15 23:59:59",
      "name_servers": [
        "NS1.BDYDNS.CN",
        "NS2.BDYDNS.CN"
      ],
      "status": "clientTransferProhibited https://icann.org/epp#clientTransferProhibited",
      "emails": null,
      "dnssec": "unsigned",
      "name": null,
      "org": null,
      "address": null,
      "city": null,
      "state": "Zhejiang",
      "zipcode": null,
      "country": "CN"
    }

This gives a rough picture of the domain: its registrar, creation and expiration dates, name servers, and the registrant's province and country.
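The record returned by `whois.whois()` is dict-like. Depending on the registry, a date field may hold a single `datetime`, a list of them, or an unparsed value. The sketch below normalizes that before computing how long the domain was registered for. The helper names are hypothetical, and the sample dates are copied from the output shown above.

```python
from datetime import datetime
from typing import Any, Mapping, Optional


def _first(value: Any) -> Any:
    """Return the first element if a whois field holds a list, else the value itself."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def registration_span_days(record: Mapping[str, Any]) -> Optional[int]:
    """Days between creation_date and expiration_date, or None if either is unusable."""
    created = _first(record.get("creation_date"))
    expires = _first(record.get("expiration_date"))
    if not isinstance(created, datetime) or not isinstance(expires, datetime):
        return None
    return (expires - created).days


if __name__ == "__main__":
    # Dates from the whois output above; a live record would come from
    # whois.whois('www.changeworld.shop').
    record = {
        "creation_date": datetime(2019, 4, 15, 14, 23, 58),
        "expiration_date": datetime(2020, 4, 15, 23, 59, 59),
    }
    print(registration_span_days(record))
```

For the sample record this prints 366, because the one-year registration spans 29 February 2020.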