A Brief Look at robots.txt

Date: 2019-11-09

Tags: robots.txt

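Introduction

robots.txt is a convention that well-behaved crawlers are expected to follow. It is a plain-text file served from the root of a site that tells crawlers which paths they may and may not fetch.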

Examples

cnblogs example

https://www.cnblogs.com/robots.txt

User-Agent: *
Allow: /

These rules allow every crawler to fetch any URL on the site.
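As a rough illustration, the two rules above can be checked with Python's standard urllib.robotparser module. This is only a sketch: the crawler name "MyCrawler", the helper is_allowed, and the post URL are made up for the example, and the rules are pasted in as a string rather than downloaded from the site.

```python
from urllib.robotparser import RobotFileParser

# The cnblogs rules from the example, embedded as text (no network access).
CNBLOGS_RULES = """\
User-Agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(CNBLOGS_RULES.splitlines())


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


# "MyCrawler" and the post URL are hypothetical; any agent and path are allowed.
print(is_allowed("MyCrawler", "https://www.cnblogs.com/"))                # True
print(is_allowed("MyCrawler", "https://www.cnblogs.com/some/post.html"))  # True
```

A crawler working against the live site would more likely call set_url() and read() on the parser to download the file instead of embedding the text.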

Baidu example

User-agent: Baiduspider # Baidu's own crawler
Disallow: /baidu # Baidu's own crawler may not fetch paths starting with /baidu, e.g. https://www.baidu.com/baidu.html
Disallow: /s?
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/ # everything under the /home/news/data/ directory

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/
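The per-crawler groups above can be explored the same way. The sketch below feeds the excerpt (without its comments) to urllib.robotparser and compares what Baiduspider and Googlebot may fetch. It only reflects the excerpt shown here, not Baidu's full robots.txt, and the helper name may_fetch and the test URLs are chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the Baidu rules from the example, with comments removed.
BAIDU_RULES = """\
User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/

User-agent: Googlebot
Disallow: /baidu
Disallow: /s?
Disallow: /shifen/
Disallow: /homepage/
Disallow: /cpro
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/
"""

baidu_parser = RobotFileParser()
baidu_parser.parse(BAIDU_RULES.splitlines())


def may_fetch(user_agent: str, path: str) -> bool:
    """Check a path on www.baidu.com against the excerpt for one crawler."""
    return baidu_parser.can_fetch(user_agent, "https://www.baidu.com" + path)


# Disallow values are path prefixes, so /baidu also covers /baidu.html.
print(may_fetch("Baiduspider", "/baidu.html"))         # False
print(may_fetch("Baiduspider", "/home/news/data/x"))   # False
# /homepage/ is listed only in the Googlebot group.
print(may_fetch("Googlebot", "/homepage/"))            # False
print(may_fetch("Baiduspider", "/homepage/"))          # True
```

Each crawler is matched against its own User-agent group, which is why the same path can be blocked for one crawler and allowed for another.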

References

Baidu Webmaster Management
