Web Crawler 1

Date: 2019-12-17


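A web crawler (also called a web spider, web robot, or web chaser) is a program that automatically fetches information from the World Wide Web according to a defined set of rules.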

The simplest web crawler: read every email address on a single page.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebCrawler {

    
    public static void main(String[] args) throws IOException {
        // Target URL
        // URL url = new URL("http://localhost:8080/JavaWeb/index.jsp");
        URL url = new URL("https://www.meizu.com/contact.html");
        URLConnection conn = url.openConnection();

        // Email pattern: word characters, "@", then a dotted domain name
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);

        // Wrap the byte stream in a buffered reader; try-with-resources
        // closes every layer even if reading fails
        try (InputStream is = conn.getInputStream();
             // Assumes the page is UTF-8 encoded
             InputStreamReader isReader = new InputStreamReader(is, "UTF-8");
             BufferedReader bufRead = new BufferedReader(isReader)) {
            // Read the page line by line
            String line;
            while ((line = bufRead.readLine()) != null) {
                // Print every email address found on this line
                Matcher matcher = p.matcher(line);
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        }
    }
}
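
To see what the email pattern matches without touching the network, the matching step can be pulled out into a small helper and run on a plain string. This is only a minimal sketch: the class name EmailExtractor, the method extractEmails, and the sample addresses are hypothetical and do not appear in the original program. Only the regular expression is taken from WebCrawler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractor {

    // Same pattern as in WebCrawler: word characters, "@", dotted domain
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Hypothetical helper: returns every match in order of appearance
    public static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MAIL.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample text, used only for illustration
        String sample = "Contact: sales@example.com, support@mail.example.org.";
        // prints [sales@example.com, support@mail.example.org]
        System.out.println(extractEmails(sample));
    }
}
```

Note that \w covers only letters, digits, and the underscore, so an address such as first.last@example.com is matched only from "last" onward, and a hyphenated domain is cut at the hyphen. A broader pattern would be needed for real-world pages.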
