Java-Based Data Collection (Final Part)

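I have previously written three articles on collecting data with Java and storing it in a database:

Java-Based Data Collection and Storage (1): http://www.cnblogs.com/lichenwei/p/3904715.html

Java-Based Data Collection and Storage (2): http://www.cnblogs.com/lichenwei/p/3905370.html

Java-Based Data Collection and Storage (3): http://www.cnblogs.com/lichenwei/p/3907007.html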

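Between them, they covered:

① Scraping page information and displaying it

② Simple collection and storage in a database

③ Querying the local database

④ Operating it through remote calls (not yet implemented)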

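All of the above runs locally. Sometimes we need to call this kind of functionality remotely, and for that we can use the RMI mechanism that Java provides.

It can of course also be done with web services (PHP version; I will write a Java version when I have time): http://www.cnblogs.com/lichenwei/p/3891297.html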

 

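What is RMI?

RMI stands for Remote Method Invocation. It is a mechanism that lets an object in one Java virtual machine call methods on an object in another Java virtual machine. Any object that can be called this way must implement a remote interface. When such an object is invoked, its arguments are marshalled and sent from the local virtual machine to the remote one, where they are unmarshalled. When the method finishes, the result is marshalled on the remote machine and sent back to the caller's virtual machine. If the call throws an exception, that exception is reported to the caller.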
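The marshalling step can be seen in miniature with plain Java serialization, which is roughly what RMI applies to arguments passed by value. This is only a sketch: the class names MarshalDemo and Greeting are made up for illustration, and real RMI uses its own stream subclasses and a network connection rather than an in-memory byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class MarshalDemo {

    // Hypothetical value type; it must be Serializable to cross a JVM boundary by value.
    static class Greeting implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Greeting(String name) {
            this.name = name;
        }
    }

    // Serialize to bytes, then deserialize: a stand-in for marshal + unmarshal.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Greeting original = new Greeting("Rabbit");
        Greeting copy = (Greeting) roundTrip(original);
        System.out.println(copy.name);        // prints "Rabbit"
        System.out.println(copy == original); // prints "false": the receiver gets a copy
    }
}
```

The second line of output is the practical consequence: the remote side works on a copy of each by-value argument, so changes it makes to that copy are not visible to the caller.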

With that brief introduction to RMI, let's look at a simple implementation.

 

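1. Define the remote interface

First we need a remote interface, IHello, which extends the Remote marker interface.

IHello declares a single method, hello, which the client calls to say hello once it has connected.

Because IHello extends Remote, each of its methods must declare that it throws RemoteException.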

import java.rmi.Remote;
import java.rmi.RemoteException;

public interface IHello extends Remote {

    public String hello(String name) throws RemoteException;
}

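2. Implement the interface

Next we implement the interface's method. The implementation lives on the server side.

The HelloImpl class implements the method declared in IHello.

Note: HelloImpl also extends UnicastRemoteObject. This is required here, because the UnicastRemoteObject constructor is what exports the object for remote calls. Without it the code still compiles, but the server fails at startup when it tries to bind the object.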

import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

/**
 * Extending UnicastRemoteObject is required: the code compiles without it,
 * but the server then fails at runtime.
 * @author Balla_兔子
 */
public class HelloImpl extends UnicastRemoteObject implements IHello {

    protected HelloImpl() throws RemoteException {
        super();
    }

    @Override
    public String hello(String name) {
        String strHello = "Hello! " + name + " is accessing the server";
        // Log the visit on the server console
        System.out.println(name + " is accessing the server");
        return strHello;
    }

}
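If extending UnicastRemoteObject is inconvenient (for example, because the class already has a superclass), the standard library also lets you export an ordinary object explicitly with UnicastRemoteObject.exportObject. The sketch below is not part of the original program: ExportDemo, Greeter, PlainGreeter, the port 6667 and the choice to force the loopback address are all assumptions made so that the example runs in a single JVM.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class ExportDemo {

    // Hypothetical remote interface, shaped like IHello.
    public interface Greeter extends Remote {
        String hello(String name) throws RemoteException;
    }

    // Plain implementation: it does NOT extend UnicastRemoteObject.
    public static class PlainGreeter implements Greeter {
        @Override
        public String hello(String name) {
            return "Hello, " + name;
        }
    }

    static String greetOverRmi(int port, String name) throws Exception {
        // Assumption: advertise the loopback address so the stub is reachable locally.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        PlainGreeter impl = new PlainGreeter();
        // Export explicitly; port 0 lets RMI pick a free port for the object.
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(impl, 0);
        try {
            Registry registry = LocateRegistry.createRegistry(port);
            try {
                registry.rebind("tuzi", stub);
                Greeter found = (Greeter) registry.lookup("tuzi");
                return found.hello(name); // the call goes through the stub
            } finally {
                UnicastRemoteObject.unexportObject(registry, true);
            }
        } finally {
            // Unexport so no RMI threads keep the JVM alive.
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(greetOverRmi(6667, "Rabbit")); // prints "Hello, Rabbit"
    }
}
```

Either way the object has to be exported before it is bound. Extending UnicastRemoteObject simply does the export in the constructor.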

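3. Write the server

RMI remote access works like this: the client looks up the remote object in the RMI registry by its address (the server address) and then calls it through the reference it gets back.

So on the server we create a registry and bind the remote object to the server address, so that clients can find it later.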

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;

public class Server {

    public static void main(String[] args) {
        try {
            IHello hello = new HelloImpl();
            int port = 6666;
            // Create a registry in this JVM, listening on the given port
            LocateRegistry.createRegistry(port);
            String address = "rmi://localhost:" + port + "/tuzi";
            // Bind the remote object to its address
            Naming.bind(address, hello);
            System.out.println(">>>Server started");
            System.out.println(">>>Please start the client to connect..");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}

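4. Write the client

The client also defines the remote address, that is, the server address.

It then looks that address up in the RMI registry. If the address is found, the connection is established.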

import java.net.MalformedURLException;
import java.rmi.Naming;
import java.rmi.NotBoundException;
import java.rmi.RemoteException;
import java.util.Scanner;

public class Client {
    public static void main(String[] args) {

        int port = 6666;
        String address = "rmi://localhost:" + port + "/tuzi";
        try {
            IHello hello = (IHello) Naming.lookup(address);
            System.out.println("<<<Client connected!");
            // Call the remote interface's hello method and print the result
            System.out.println(hello.hello("Rabbit"));
            // Block until the user types something
            Scanner scanner = new Scanner(System.in);
            scanner.next();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (RemoteException e) {
            e.printStackTrace();
        } catch (NotBoundException e) {
            e.printStackTrace();
        }

    }
}



 

Now let's look at our program. For a change, this time we will collect the 2013-2014 regular-season standings.

The data is at: http://nbadata.sports.qq.com/teams_stat.aspx

The rest is code; see the comments for details:

 

IdoAction.java (remote interface declaring the available operations)

package com.lcw.rmi.collection;

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

public interface IdoAction extends Remote {

    // Clear the data table
    public void initData() throws RemoteException;

    // Run the collector and store the results
    public void getAllDatas() throws RemoteException;

    // List all team names
    public List<String> getAllTeams() throws RemoteException;

    // Return the stored values for one team
    public List<String> getTeamInfo(String team) throws RemoteException;

    // Return the stored values for every team, as one flat list
    public List<String> getAllInfo() throws RemoteException;

}

doActionImpl.java (interface implementation)

package com.lcw.rmi.collection;

import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class doActionImpl extends UnicastRemoteObject implements IdoAction {

    private static final long serialVersionUID = 1L;
    private Mysql mysql;
    private ResultSet resultSet;

    public doActionImpl() throws RemoteException {
        mysql = new Mysql();
    }

    @Override
    public void getAllDatas() throws RemoteException {
        // Run the collector to fetch and store all data
        CollectData data = new CollectData();
        data.getAllDatas();
        System.out.println("Data collection finished!");
    }

    @Override
    public List<String> getAllInfo() throws RemoteException {
        // Query every row
        String sql = "select * from data";
        resultSet = mysql.querySQL(sql);
        List<String> list = new ArrayList<String>();
        System.out.println("Running command 5: fetching all NBA (2013-2014) regular-season team data..");
        try {
            // querySQL may return null if the query failed
            while (resultSet != null && resultSet.next()) {
                // Columns 2..16 hold the 15 stored values; column 1 is skipped
                for (int i = 2; i < 17; i++) {
                    list.add(resultSet.getString(i));
                }
            }
            System.out.println("Done; results are shown on the client..");
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return list;
    }

    @Override
    public List<String> getAllTeams() throws RemoteException {
        // Query all team names
        String sql = "select team from data";
        resultSet = mysql.querySQL(sql);
        List<String> list = new ArrayList<String>();
        System.out.println("Running command 3: fetching NBA (2013-2014) regular-season teams..");
        try {
            while (resultSet != null && resultSet.next()) {
                list.add(resultSet.getString("team"));
            }
            System.out.println("Done; results are shown on the client..");
        } catch (SQLException e) {
            System.out.println("No data in the database; run the collection command first");
            e.printStackTrace();
        }
        return list;

    }

    @Override
    public List<String> getTeamInfo(String team) throws RemoteException {
        // Query one team's row by team name.
        // Note: the name is concatenated into the SQL, which is open to SQL
        // injection; a parameterized query would be safer.
        ResultSet resultSet = mysql.querySQL("select * from data where team='"
                + team + "'");
        List<String> list = new ArrayList<String>();
        System.out.println("Running command 4: fetching the team requested by the user..");
        try {
            if (resultSet != null && resultSet.next()) {
                for (int i = 2; i < 17; i++) {
                    list.add(resultSet.getString(i));
                }
            }
            System.out.println("Done; results are shown on the client..");
        } catch (SQLException e) {
            System.out.println("No data in the database; run the collection command first");
            e.printStackTrace();
        }
        return list;
    }

    @Override
    public void initData() throws RemoteException {
        // Reset the database by deleting every row
        String sql = "delete from data";
        try {
            mysql.updateSQL(sql);
            System.out.println("Database initialized!");
        } catch (Exception e) {
            System.out.println("Database initialization failed!");
        }

    }

}

CollectData.java (main collector class)

package com.lcw.rmi.collection;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectData {

    /**
     * Collector: fetches the page, extracts all values, and stores them
     */
    public void getAllDatas() {
        String address = "http://nbadata.sports.qq.com/teams_stat.aspx";// URL to collect from
        String rankRegEx = ">\\d{1,2}</td>";// rank
        String teamRegEx = ">[^<>]*</a>";// team name
        String dataRegEx = ">\\d{1,3}(\\.)\\d{0,2}</td>";// plain numeric values
        String percentRegEX = ">\\d{1,2}(\\.)*(\\d)*%</span></td>";// percentage values
        GetRegExData regExData = new GetRegExData();
        List<String> list = new ArrayList<String>();
        String temp;// current line of the page
        String match;// current regex match
        int flag = 0;
        // Open the URL as a byte stream, decode it as GBK, and buffer it for line reads
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new URL(address).openStream(), "gbk"))) {
            while ((temp = reader.readLine()) != null) {
                // Rank (strings are tested with isEmpty(), not with != "")
                if (!(match = regExData.getData(rankRegEx, temp)).isEmpty()) {
                    list.add(match.substring(1, match.indexOf("</td>")));
                    flag++;
                }
                // Team name. This pattern also matches elsewhere on the page,
                // so a match only counts right after a rank has been found.
                if (!(match = regExData.getData(teamRegEx, temp)).isEmpty()
                        && flag == 1) {
                    list.add(match.substring(1, match.indexOf("</a>")));
                    flag = 0;
                }
                // Plain numeric values
                if (!(match = regExData.getData(dataRegEx, temp)).isEmpty()) {
                    list.add(match.substring(1, match.indexOf("</td>")));
                }
                // Percentage values
                if (!(match = regExData.getData(percentRegEX, temp)).isEmpty()) {
                    list.add(match.substring(1, match.indexOf("</span></td>")));
                }
            }
            Object[] arr = list.toArray();// convert the list to an array
            Mysql mysql = new Mysql();
            // `rank` is quoted because RANK is a reserved word in MySQL 8
            String sql = "insert into data(`rank`,team,chushou1,mingzhong1,chushou2,mingzhong2,chushou3,mingzhong3,qianchang,houchang,zong,zhugong,shiwu,fangui,defen) values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)";
            // Each team is 15 consecutive values; stop at 30 teams or when the values run out
            for (int i = 0; i < 30 && (i + 1) * 15 <= arr.length; i++) {
                Object[] row = Arrays.copyOfRange(arr, i * 15, (i + 1) * 15);
                mysql.insertNewData(sql, row);
                System.out.println("Collecting data.. rows stored so far: " + (i + 1));
            }
        } catch (IOException e) {// also covers MalformedURLException
            e.printStackTrace();
        }
    }

}
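The collector ends up with one flat list in which every 15 consecutive values describe one team. A minimal sketch of that slicing step on its own, assuming a hypothetical RowChunker helper that is not part of the original program (the row width of 2 in main is just to keep the demo short):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowChunker {

    // Split a flat list into rows of `width` values; a trailing partial row is dropped.
    static List<List<String>> chunk(List<String> flat, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        List<List<String>> rows = new ArrayList<>();
        for (int start = 0; start + width <= flat.size(); start += width) {
            rows.add(new ArrayList<>(flat.subList(start, start + width)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> flat = Arrays.asList("1", "A", "2", "B", "3");
        System.out.println(chunk(flat, 2)); // prints [[1, A], [2, B]]
    }
}
```

Bounding the loop by the list size rather than by a fixed row count means a page that yields fewer values than expected produces fewer rows instead of an exception.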

GetRegExData.java (regex filtering helper)

package com.lcw.rmi.collection;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetRegExData {

    // Return the first match of regex in content, or "" if there is none
    public String getData(String regex, String content) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            return matcher.group();
        } else {
            return "";
        }

    }
}
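The program matches the surrounding markup and then trims it off with substring. A capture group can return just the value in one step. The sketch below is a variation, not the author's code: RegexCellDemo is a made-up name, the percentage pattern is slightly tightened, and the sample HTML lines are invented for illustration rather than taken from the real page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCellDemo {

    // Same idea as the rank and percentage patterns, with the value in group 1.
    static final Pattern RANK = Pattern.compile(">(\\d{1,2})</td>");
    static final Pattern PERCENT = Pattern.compile(">(\\d{1,2}(?:\\.\\d+)?%)</span></td>");

    // Return group 1 of the first match, or "" when the line does not match.
    static String firstGroup(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        // Hypothetical sample lines
        System.out.println(firstGroup(RANK, "<td class=\"a\">12</td>"));         // prints 12
        System.out.println(firstGroup(PERCENT, "<td><span>45.3%</span></td>")); // prints 45.3%
        System.out.println(firstGroup(RANK, "<td>abc</td>").isEmpty());           // prints true
    }
}
```

Compiling each pattern once as a constant also avoids recompiling it for every line of the page.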

Mysql.java (database access class)

package com.lcw.rmi.collection;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Mysql {

    // Driver class for MySQL Connector/J 5.x (8.x renamed it to com.mysql.cj.jdbc.Driver)
    private String driver = "com.mysql.jdbc.Driver";
    private String url = "jdbc:mysql://localhost:3306/nba";
    private String user = "root";
    private String password = "";

    private PreparedStatement stmt = null;
    private Connection conn = null;
    private ResultSet resultSet = null;

    /**
     * Loads the JDBC driver and opens a new connection.
     */
    private Connection connect() throws SQLException, ClassNotFoundException {
        Class.forName(driver);
        return DriverManager.getConnection(url, user, password);
    }

    /**
     * Used by the collector to insert one row.
     *
     * @param insertSql insert statement with one placeholder per value
     * @param arr       values to bind, in placeholder order
     */
    public void insertNewData(String insertSql, Object[] arr) {
        try {
            conn = connect();
            stmt = conn.prepareStatement(insertSql);
            // JDBC parameters are numbered from 1
            for (int i = 0; i < arr.length; i++) {
                stmt.setString(i + 1, arr[i].toString());
            }
            stmt.executeUpdate();
            stmt.close();
            conn.close();
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param updateSql statement that modifies the database
     */
    public void updateSQL(String updateSql) {
        try {
            conn = connect();
            stmt = conn.prepareStatement(updateSql);
            // Use the no-argument form: the SQL was already passed to prepareStatement
            stmt.executeUpdate();
            stmt.close();
            conn.close();
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    /**
     * @param searchSql plain query
     * @return the result set, or null if the query failed
     */
    public ResultSet querySQL(String searchSql) {
        resultSet = null;
        try {
            conn = connect();
            stmt = conn.prepareStatement(searchSql);
            // The connection stays open: the caller still has to read the ResultSet
            resultSet = stmt.executeQuery();
        } catch (SQLException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return resultSet;
    }
}
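getTeamInfo passes the team name to the database by concatenating it into the SQL string. A parameterized query is the usual safer alternative. The following is a sketch only: TeamQuery and findTeam are hypothetical names, the caller is assumed to supply an open connection to the nba database, and the loop reads every column after the first instead of the fixed range 2..16.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class TeamQuery {

    // One placeholder instead of a quoted, concatenated value.
    static final String SQL = "select * from data where team = ?";

    // Hypothetical helper: returns the first matching row's values, or an empty list.
    static List<String> findTeam(Connection conn, String team) throws SQLException {
        List<String> values = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, team); // the driver takes care of quoting and escaping
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    int columns = rs.getMetaData().getColumnCount();
                    for (int i = 2; i <= columns; i++) { // skip column 1, as the program does
                        values.add(rs.getString(i));
                    }
                }
            }
        }
        return values;
    }
}
```

Because the statement and result set are opened in try-with-resources, both are closed even when the query throws. The connection is left to the caller.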

Server.java (server class)

package com.lcw.rmi.collection;

import java.net.MalformedURLException;
import java.rmi.AlreadyBoundException;
import java.rmi.Naming;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;

public class Server {

    public static void main(String[] args) {
        try {
            int port = 9797;
            String address = "rmi://localhost:" + port + "/nba";
            System.out.println(">>>Starting server..");
            IdoAction action = new doActionImpl();
            // Create the registry, then bind the remote object to its address
            LocateRegistry.createRegistry(port);
            Naming.bind(address, action);
            System.out.println(">>>Server started!");
            System.out.println(">>>Waiting for clients to connect...");
        } catch (RemoteException | MalformedURLException | AlreadyBoundException e) {
            e.printStackTrace();
        }
    }

}

Client.java (client class)

package com.lcw.rmi.collection;

import java.net.MalformedURLException;
import java.rmi.Naming;
import java.rmi.NotBoundException;
import java.rmi.RemoteException;
import java.util.List;
import java.util.Scanner;

public class Client {

    private static final String LINE = "---------------------------------------------------------";
    private static final String HEADER = "Rank\tTeam\tAtt\tPct\tAtt\tPct\tAtt\tPct\tOff\tDef\tTotal\tAst\tTO\tPF\tPts";

    public static void main(String[] args) {
        int port = 9797;
        String address = "rmi://localhost:" + port + "/nba";

        try {
            System.out.println("Starting client, connecting to server..");
            IdoAction action = (IdoAction) Naming.lookup(address);
            System.out.println("Connected...");
            System.out.println("---------------------------");
            // One Scanner for the whole session
            Scanner scanner = new Scanner(System.in);

            while (true) {
                System.out.println("① Initialize the database: press (1)");
                System.out.println();
                System.out.println("② Collect NBA (2013-2014) regular-season standings: press (2)");
                System.out.println();
                System.out.println("③ List all teams in the NBA (2013-2014) regular-season standings: press (3)");
                System.out.println();
                System.out.println("④ Look up one team's (2013-2014) regular-season standing: press (4)");
                System.out.println();
                System.out.println("⑤ Show full details: press (5)");
                System.out.println();

                String input = scanner.next();
                System.out.println(LINE);

                if (input.equals("1")) {
                    action.initData();
                    System.out.println("Server data initialized; press 2 to collect data..");
                } else if (input.equals("2")) {
                    System.out.println("Collecting data, please wait..");
                    // The remote call blocks until the server has finished collecting
                    action.getAllDatas();
                    System.out.println("Collection finished.. press 3, 4 or 5 to continue");
                } else if (input.equals("3")) {
                    System.out.println("Fetching NBA (2013-2014) regular-season teams, please wait..");
                    System.out.println();
                    printRows(action.getAllTeams(), 5);
                } else if (input.equals("4")) {
                    System.out.println("Enter the team name to look up (e.g. 76人)");
                    String team = scanner.next();
                    System.out.println(HEADER);
                    printRows(action.getTeamInfo(team), 15);
                } else if (input.equals("5")) {
                    System.out.println("Fetching data, please wait...");
                    System.out.println();
                    System.out.println(HEADER);
                    printRows(action.getAllInfo(), 15);
                } else {
                    System.out.println("Unknown option: " + input);
                }
                System.out.println(LINE);
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (RemoteException e) {
            e.printStackTrace();
        } catch (NotBoundException e) {
            e.printStackTrace();
        }
    }

    // Print values tab-separated, starting a new line every perRow values.
    // Looping to list.size() avoids an exception when the server returns no rows.
    private static void printRows(List<String> list, int perRow) {
        for (int i = 0; i < list.size(); i++) {
            if (i % perRow == 0 && i != 0) {
                System.out.println();
            }
            System.out.print(list.get(i) + "\t");
        }
        System.out.println();
    }
}

 

That's it for this series on collecting data with Java. Signing off.
