Web Crawler

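A web crawler (also called a web spider, web robot, or web chaser) is a program that automatically fetches information from the World Wide Web according to a set of rules.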

The simplest web crawler: read all the email addresses on a page.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebCrawler {

    
    public static void main(String[] args) throws IOException {
        // Target URL
        // URL url = new URL("http://localhost:8080/JavaWeb/index.jsp");
        URL url = new URL("https://www.meizu.com/contact.html");
        URLConnection conn = url.openConnection();

        // Email pattern: word characters, "@", then a dotted domain name
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);

        // Wrap the response stream in a buffered reader; try-with-resources
        // closes both on exit. UTF-8 is assumed here for the page encoding.
        try (InputStream is = conn.getInputStream();
             BufferedReader bufRead = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
            String line;
            // Read the page line by line
            while ((line = bufRead.readLine()) != null) {
                // Print every email address found on the current line
                Matcher matcher = p.matcher(line);
                while (matcher.find()) {
                    System.out.println(matcher.group());
                }
            }
        }
    }
}
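
To see what the pattern `\w+@\w+(\.\w+)+` matches without touching the network, the matching loop can be run against plain strings. The following is only a minimal sketch: the class name `EmailRegexDemo`, the helper `extractEmails`, and the sample strings are hypothetical and not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the crawler's email pattern to in-memory text.
class EmailRegexDemo {

    // Same regular expression as in the crawler above
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Hypothetical helper: returns every substring of text that matches the pattern
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MAIL.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for one line of a downloaded page
        String sample = "Contact: sales@example.com, support@mail.example.org.";
        System.out.println(extractEmails(sample));

        // \w does not match ".", so a dotted local part is only partly captured
        System.out.println(extractEmails("first.last@example.com"));
    }
}
```

The second call shows a limitation of this simple pattern: because `\w` covers only letters, digits, and underscores, an address such as `first.last@example.com` is reported as `last@example.com`. A crawler that needs complete addresses would likely want a broader character class for the part before the `@`.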