Java Network Programming

Date: 2019-11-25


　　Package: java.net.*

Network Protocol Stack

Definition:

　　A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to.

An endpoint is the combination of an IP address and a port. Every TCP connection is uniquely identified by its two endpoints, which is what makes it possible to open several connections between the same host and server.
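To make the endpoint idea concrete, here is a minimal sketch that opens two connections to the same loopback server. The class name EndpointDemo and the method twoConnections are hypothetical, chosen only for this illustration. Both connections share the same remote endpoint, but the operating system gives each client socket its own local port, so the two endpoint pairs differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of the original article.
class EndpointDemo {
    /**
     * Opens two connections to one loopback server and returns
     * {local port of client 1, local port of client 2, server port}.
     */
    static int[] twoConnections() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port.
        try (ServerSocket server = new ServerSocket(0, 50, loopback);
             Socket first = new Socket(loopback, server.getLocalPort());
             Socket second = new Socket(loopback, server.getLocalPort())) {
            // Same remote endpoint (loopback:serverPort), different local endpoints.
            return new int[] {
                first.getLocalPort(), second.getLocalPort(), server.getLocalPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling EndpointDemo.twoConnections() should return two different client ports alongside the single server port.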

Code example:

GreetingClient.java

import java.net.*;
import java.io.*;

public class GreetingClient{
    public static void main(String[] args){
        // Usage: java GreetingClient <serverName> <port>
        String serverName = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println("Connecting to " + serverName +
                " on port " + port);
        // try-with-resources closes the socket even if an exception is thrown
        try(Socket client = new Socket(serverName, port)){
            System.out.println("Just connected to "
                    + client.getRemoteSocketAddress());
            // Send a greeting to the server
            OutputStream outToServer = client.getOutputStream();
            DataOutputStream out = new DataOutputStream(outToServer);
            out.writeUTF("Hello from "
                    + client.getLocalSocketAddress());
            // Read and print the server's reply
            InputStream inFromServer = client.getInputStream();
            DataInputStream in =
                new DataInputStream(inFromServer);
            System.out.println("Server says " + in.readUTF());
        }catch(IOException e){
            e.printStackTrace();
        }
    }
}


GreetingServer.java (extends the Thread class so that the server runs in its own thread)

import java.net.*;
import java.io.*;

public class GreetingServer extends Thread{
    private ServerSocket serverSocket;

    public GreetingServer(int port) throws IOException{
        serverSocket = new ServerSocket(port);
        // accept() gives up after 10 seconds without a client
        serverSocket.setSoTimeout(10000);
    }

    public void run(){
        while(true){
            System.out.println("Waiting for client on port " +
                    serverSocket.getLocalPort() + " ...");
            // try-with-resources closes the accepted socket on every path
            try(Socket server = serverSocket.accept()){
                System.out.println("Just connected to "
                        + server.getRemoteSocketAddress());
                // Read the client's greeting
                DataInputStream in = new DataInputStream(server.getInputStream());
                System.out.println(in.readUTF());
                // Send the reply
                DataOutputStream out =
                    new DataOutputStream(server.getOutputStream());
                out.writeUTF("Thank you for connecting to "
                        + server.getLocalSocketAddress() + "\nGoodbye!");
            }catch(SocketTimeoutException s){
                System.out.println("Socket timed out!");
                break;
            }catch(IOException e){
                e.printStackTrace();
                break;
            }
        }
    }

    public static void main(String[] args){
        // Usage: java GreetingServer <port>
        int port = Integer.parseInt(args[0]);
        try{
            Thread t = new GreetingServer(port);
            t.start();
        }catch(IOException e){
            e.printStackTrace();
        }
    }
}


Another example: test a network port (returns true if the port is open)

// Requires: java.io.IOException, java.net.InetSocketAddress, java.net.Socket
public static boolean testInet(String site, int port) {
    Socket sock = new Socket();
    int timeout = 3000; // ms
    InetSocketAddress addr = new InetSocketAddress(site, port);
    try {
        sock.connect(addr, timeout);
        return true;
    } catch (IOException e) {
        return false;
    } finally {
        try { sock.close(); }
        catch (IOException e) {}
    }
}

Payload [wikipedia]

The payload is the part of the transmitted data that carries the actual message the sender wants to deliver.

　　The term is borrowed from transportation, where "payload" refers to the part of the load that pays for transportation.

Example

Here is a piece of JSON data:

{
   "data": { "message": "Hello, world!" }
}

The string "Hello, world!" is the payload; everything else is protocol overhead.
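The split between payload and overhead can be counted directly. The sketch below is only an illustration: the class PayloadDemo and its methods are hypothetical names, and it treats every byte of the message that is not part of the payload string as overhead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class PayloadDemo {
    /** Number of bytes the payload occupies when encoded as UTF-8. */
    static int payloadBytes(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes of the full message that are not payload (framing, keys, quotes). */
    static int overheadBytes(String message, String payload) {
        if (!message.contains(payload)) {
            throw new IllegalArgumentException("payload is not part of the message");
        }
        return message.getBytes(StandardCharsets.UTF_8).length - payloadBytes(payload);
    }
}
```

For the compact form {"data":{"message":"Hello, world!"}}, the payload is 13 bytes and the remaining 23 bytes are overhead.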

TCP/IP

Ethernet

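IP (Internet Protocol) [rfc791]

　　IP provides upper-layer protocols with a stateless, connectionless, unreliable service.

Stateless: the two communicating parties do not synchronize any state about the data being transferred, so every IP datagram is sent and received independently of the others. As a result, IP cannot deal with out-of-order or duplicated datagrams. Contrast this with a connection-oriented protocol such as TCP, which handles out-of-order and duplicated segments itself, so what it delivers to the upper layer is in order and correct. Statelessness also has an advantage: it is simple and efficient, because no kernel data structures have to be allocated to track state.
Connectionless: neither party keeps long-lived information about the other, so the upper-layer protocol must specify the destination IP address every time it sends data.
Unreliable: IP does not guarantee that a datagram reaches the receiver complete and intact. Upper-layer protocols that use IP therefore have to implement their own reliability mechanisms, such as acknowledgments and timeout-based retransmission.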

Transport Layer

　　Ports are defined at the transport layer.

UDP [rfc768]

Format
------

                                    
                  0      7 8     15 16    23 24    31  
                 +--------+--------+--------+--------+ 
                 |     Source      |   Destination   | 
                 |      Port       |      Port       | 
                 +--------+--------+--------+--------+ 
                 |                 |                 | 
                 |     Length      |    Checksum     | 
                 +--------+--------+--------+--------+ 
                 |                                     
                 |          data octets ...            
                 +---------------- ...                 

                      User Datagram Header Format
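The four 16-bit fields in the diagram can be written and read with java.nio.ByteBuffer, which is big-endian (network byte order) by default. This is a minimal sketch rather than a full UDP implementation: the class UdpHeader is a hypothetical name, and the checksum is left at zero, which RFC 768 defines as "no checksum generated".

```java
import java.nio.ByteBuffer;

// Hypothetical class for illustration; it mirrors the RFC 768 layout above.
class UdpHeader {
    static final int HEADER_BYTES = 8;

    final int sourcePort;
    final int destinationPort;
    final int length;   // header plus data, in octets
    final int checksum;

    private UdpHeader(int sourcePort, int destinationPort, int length, int checksum) {
        this.sourcePort = sourcePort;
        this.destinationPort = destinationPort;
        this.length = length;
        this.checksum = checksum;
    }

    /** Builds header + data; the checksum field is left as 0 (not computed). */
    static byte[] encode(int sourcePort, int destinationPort, byte[] data) {
        int length = HEADER_BYTES + data.length;
        ByteBuffer buf = ByteBuffer.allocate(length);
        buf.putShort((short) sourcePort);
        buf.putShort((short) destinationPort);
        buf.putShort((short) length);
        buf.putShort((short) 0);
        buf.put(data);
        return buf.array();
    }

    /** Reads the four header fields from the first 8 bytes of a datagram. */
    static UdpHeader parse(byte[] datagram) {
        if (datagram.length < HEADER_BYTES) {
            throw new IllegalArgumentException("datagram shorter than a UDP header");
        }
        ByteBuffer buf = ByteBuffer.wrap(datagram);
        // Mask with 0xFFFF because Java shorts are signed.
        return new UdpHeader(buf.getShort() & 0xFFFF, buf.getShort() & 0xFFFF,
                             buf.getShort() & 0xFFFF, buf.getShort() & 0xFFFF);
    }
}
```

Encoding three data bytes produces an 11-byte datagram whose Length field is also 11, since the field counts the header together with the data.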

TCP [rfc793]

  TCP Header Format


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

          Note that one tick mark represents one bit position.
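The fixed 20-byte part of this header can be decoded in the same way. The sketch below is an illustration under stated assumptions: TcpHeader is a hypothetical class, it ignores options and the checksum, and it follows the RFC 793 layout shown above, where the six flag bits sit in the low bits of the 16-bit word that also holds Data Offset.

```java
import java.nio.ByteBuffer;

// Hypothetical class for illustration; decodes only the fixed RFC 793 fields.
class TcpHeader {
    static final int FIN = 0x01, SYN = 0x02, RST = 0x04,
                     PSH = 0x08, ACK = 0x10, URG = 0x20;

    final int sourcePort;
    final int destinationPort;
    final long sequenceNumber;
    final long acknowledgmentNumber;
    final int dataOffsetWords; // header length in 32-bit words
    final int flags;
    final int window;

    private TcpHeader(ByteBuffer buf) {
        sourcePort = buf.getShort() & 0xFFFF;
        destinationPort = buf.getShort() & 0xFFFF;
        // Sequence numbers are unsigned 32-bit values, so widen to long.
        sequenceNumber = buf.getInt() & 0xFFFFFFFFL;
        acknowledgmentNumber = buf.getInt() & 0xFFFFFFFFL;
        int offsetAndFlags = buf.getShort() & 0xFFFF;
        dataOffsetWords = offsetAndFlags >>> 12; // top 4 bits
        flags = offsetAndFlags & 0x3F;           // low 6 bits: URG..FIN
        window = buf.getShort() & 0xFFFF;
    }

    static TcpHeader parse(byte[] segment) {
        if (segment.length < 20) {
            throw new IllegalArgumentException("segment shorter than a TCP header");
        }
        return new TcpHeader(ByteBuffer.wrap(segment));
    }

    boolean has(int flag) {
        return (flags & flag) != 0;
    }
}
```

A header whose bytes 12 and 13 are 0x50 0x02 decodes to a data offset of 5 words (20 bytes, no options) with only the SYN flag set.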

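TCP three-way handshake: establishing a connection

A brief explanation:

The client sends the server a SYN segment with sequence number J.
The server replies with its own SYN, sequence number K, and acknowledges the client's SYN with ACK J+1.
The client sends the server an ACK K+1, after which both ends consider the connection established.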
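The arithmetic in these three steps can be sketched as a tiny model. This is not a TCP implementation; HandshakeSketch is a hypothetical name, and the only real protocol detail it encodes is that sequence numbers are 32-bit values that wrap around.

```java
// Hypothetical model of the three segments, for illustration only.
class HandshakeSketch {
    private static final long MOD = 1L << 32; // sequence numbers are 32-bit

    /** Describes the three handshake segments for initial sequence numbers j and k. */
    static String[] segments(long j, long k) {
        return new String[] {
            "client -> server: SYN seq=" + j,
            "server -> client: SYN seq=" + k + ", ACK ack=" + (j + 1) % MOD,
            "client -> server: ACK ack=" + (k + 1) % MOD
        };
    }
}
```

With j = 100 and k = 300, the server acknowledges 101 and the client acknowledges 301; with j = 4294967295 the acknowledgment wraps around to 0.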

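TCP four-way handshake: closing a connection

Brief explanation: the two directions of a connection are independent, so closing each direction takes two steps (a FIN and its ACK).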
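Java exposes this independence through Socket.shutdownOutput(), which closes only the sending direction. The following sketch runs both ends on the loopback interface in a single thread, which works here only because the messages are tiny. HalfCloseDemo is a hypothetical name, and InputStream.readAllBytes() assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class HalfCloseDemo {
    /** The client closes its sending side, then still receives the server's reply. */
    static String run() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort());
             Socket server = listener.accept()) {
            client.getOutputStream().write("ping".getBytes(StandardCharsets.US_ASCII));
            client.shutdownOutput(); // FIN for the client -> server direction

            // The server sees end-of-stream once the client's FIN arrives.
            String request = new String(
                    server.getInputStream().readAllBytes(), StandardCharsets.US_ASCII);

            // The server -> client direction is still open, so the reply gets through.
            server.getOutputStream().write(
                    ("pong after " + request).getBytes(StandardCharsets.US_ASCII));
            server.shutdownOutput(); // FIN for the other direction

            return new String(
                    client.getInputStream().readAllBytes(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

HalfCloseDemo.run() returns "pong after ping": the reply arrives even though the client had already closed its own sending direction.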

HTTP

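　　HTTP is a request/response standard between a client (the user) and a server (the website), normally carried over TCP. Using a web browser, a web crawler, or some other tool, the client sends an HTTP request to a specified port on the server (port 80 by default). This client is called the user agent. The responding server stores resources such as HTML files and images, and is called the origin server. Between the user agent and the origin server there may be several intermediaries, such as proxies, gateways, or tunnels.

　　Although TCP/IP is the most popular protocol suite on the Internet, HTTP does not require it or the layers it provides. In fact, HTTP can be implemented on top of any other Internet protocol, or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that offers this guarantee can be used. That is why, within the TCP/IP suite, HTTP uses TCP as its transport layer.

　　Typically, an HTTP client initiates a request by opening a TCP connection to a specified port on the server (port 80 by default). The HTTP server listens on that port for client requests. Once it receives a request, the server returns a status line such as "HTTP/1.1 200 OK" together with the content, which may be the requested file, an error message, or other information.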
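What the client actually writes to that TCP connection is plain text. As a rough sketch, a minimal HTTP/1.1 GET request could be assembled as below. HttpRequestText is a hypothetical helper, and the Connection: close header is an illustrative choice that asks the server to close the connection after one response.

```java
// Hypothetical helper for illustration only.
class HttpRequestText {
    /** Builds a minimal HTTP/1.1 GET request; lines end with CRLF. */
    static String get(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"       // Host is mandatory in HTTP/1.1
             + "Connection: close\r\n"
             + "\r\n";                         // empty line ends the header section
    }
}
```

The resulting string could be written to the output stream of a Socket connected to port 80 of the host, in the same way GreetingClient writes its greeting.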

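HTTP status codes

　　The first digit of a status code indicates the class of the response:

1xx Informational: the request has been received by the server and processing continues
2xx Success: the request was successfully received, understood, and accepted
3xx Redirection: further action is needed to complete the request
4xx Client error: the request contains bad syntax or cannot be fulfilled
5xx Server error: the server failed while handling an apparently valid request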
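Since the class depends only on the first digit, it can be derived with integer division. A minimal sketch, where StatusClass and describe are hypothetical names:

```java
// Hypothetical helper for illustration only.
class StatusClass {
    /** Maps a status code to its class using the first digit. */
    static String describe(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default:
                throw new IllegalArgumentException("not an HTTP status code: " + code);
        }
    }
}
```

For example, describe(206) yields "success" and describe(404) yields "client error".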

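An HTTP response consists of three parts:
　　Status code: describes the status of the response. It can be used to check whether the request completed successfully and, if it failed, to find out why. If a Servlet does not set a status code, the success code HttpServletResponse.SC_OK is returned by default.
　　HTTP headers: they carry additional information about the response. For example, a header can specify the date after which the response is considered stale, or the encoding used to transfer the entity content safely to the user.
　　Body: it contains the content of the response, which may be HTML code, an image, and so on. The body consists of the data bytes that immediately follow the headers in the HTTP message.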
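The three parts can be pulled out of a raw response with a few string operations. This is a simplified sketch under stated assumptions: HttpResponseText is a hypothetical class, the whole response is assumed to be available as one String, and features such as chunked transfer encoding or repeated headers are ignored.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class for illustration; handles only simple, complete responses.
class HttpResponseText {
    final int statusCode;
    // Header names are case-insensitive, hence the case-insensitive map.
    final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    final String body;

    HttpResponseText(String raw) {
        // The first empty line (CRLF CRLF) separates the head from the body.
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("no header/body separator found");
        }
        String[] head = raw.substring(0, split).split("\r\n");
        // The status line looks like: HTTP/1.1 200 OK
        statusCode = Integer.parseInt(head[0].split(" ", 3)[1]);
        for (int i = 1; i < head.length; i++) {
            int colon = head[i].indexOf(':');
            if (colon < 0) {
                continue; // skip malformed header lines in this sketch
            }
            headers.put(head[i].substring(0, colon).trim(),
                        head[i].substring(colon + 1).trim());
        }
        body = raw.substring(split + 4);
    }
}
```

Parsing "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>hi</p>" gives status code 200, one header, and the body <p>hi</p>.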

Code example: use HttpURLConnection to POST data to a web server

// Requires: java.io.*, java.net.HttpURLConnection, java.net.URL,
//           java.nio.charset.StandardCharsets
public static String executePost(String targetURL, String urlParameters) {
  HttpURLConnection connection = null;

  try {
    // Encode the body once so Content-Length matches the bytes actually sent
    byte[] body = urlParameters.getBytes(StandardCharsets.UTF_8);

    //Create connection
    URL url = new URL(targetURL);
    connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("POST");
    connection.setRequestProperty("Content-Type",
        "application/x-www-form-urlencoded");

    connection.setRequestProperty("Content-Length",
        Integer.toString(body.length));
    connection.setRequestProperty("Content-Language", "en-US");

    connection.setUseCaches(false);
    connection.setDoOutput(true);

    //Send request (write(byte[]) keeps non-ASCII characters intact,
    //unlike writeBytes(String), which drops the high byte of each char)
    DataOutputStream wr = new DataOutputStream (
        connection.getOutputStream());
    wr.write(body);
    wr.close();

    //Get Response
    InputStream is = connection.getInputStream();
    BufferedReader rd = new BufferedReader(new InputStreamReader(is));
    StringBuilder response = new StringBuilder(); // needs Java 5+; use StringBuffer on older versions
    String line;
    while ((line = rd.readLine()) != null) {
      response.append(line);
      response.append('\r');
    }
    rd.close();
    return response.toString();
  } catch (Exception e) {
    e.printStackTrace();
    return null;
  } finally {
    if (connection != null) {
      connection.disconnect();
    }
  }
}


A Simple Web Crawler

　　pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>artificerPi</groupId>
    <artifactId>WebCrawler</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.8.3</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.25</version>
        </dependency>
    </dependencies>
</project>


　　ddl

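-- The Java code refers to this database as `Crawler`; database names are
-- case-sensitive on most Linux installations of MySQL, so use the same casing here.
create database Crawler;

use Crawler;

CREATE TABLE IF NOT EXISTS `Record` (
  `RecordID` INT(11) NOT NULL AUTO_INCREMENT,
  `URL` text NOT NULL,
  PRIMARY KEY (`RecordID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;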


　　DB.java

/**
 * Created by artificerPi on 2016/4/7.
 */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DB {

    public Connection conn = null;

    public DB() {
        try {
            // Load the MySQL Connector/J 5.x driver declared in pom.xml
            Class.forName("com.mysql.jdbc.Driver");
            String url = "jdbc:mysql://localhost:3306/Crawler";
            conn = DriverManager.getConnection(url, "root", "passw0rd");
            System.out.println("conn built");
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Runs a query that returns rows (SELECT)
    public ResultSet runSql(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.executeQuery(sql);
    }

    // Runs any other statement (INSERT, TRUNCATE, ...)
    public boolean runSql2(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.execute(sql);
    }

    // Note: finalize() is deprecated since Java 9; kept here as in the original
    @Override
    protected void finalize() throws Throwable {
        // && rather than ||: calling isClosed() on a null conn would throw
        if (conn != null && !conn.isClosed()) {
            conn.close();
        }
    }
}


　　Main.java

/**
 * Created by artificerPi on 2016/4/7.
 */

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;


public class Main {
    public static DB db = new DB();

    public static void main(String[] args) throws SQLException, IOException {
        db.runSql2("TRUNCATE Record;");
        processPage("http://www.mit.edu");
    }

    public static void processPage(String URL) throws SQLException, IOException {
        //check if the given URL is already in database
        //(a parameterized query keeps URLs containing quotes from breaking the SQL)
        String sql = "select * from Record where URL = ?";
        PreparedStatement check = db.conn.prepareStatement(sql);
        check.setString(1, URL);
        ResultSet rs = check.executeQuery();
        if (rs.next()) {
            return; // already visited
        }

        //store the URL to database to avoid parsing again
        sql = "INSERT INTO  `Crawler`.`Record` " + "(`URL`) VALUES " + "(?);";
        PreparedStatement stmt = db.conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
        stmt.setString(1, URL);
        stmt.execute();

        //get useful information (fetch the page being processed, not a fixed URL)
        Document doc = Jsoup.connect(URL).get();

        // query param: research
        if (doc.text().contains("research")) {
            System.out.println(URL);
        }

        //get all links and recursively call the processPage method
        Elements questions = doc.select("a[href]");
        for (Element link : questions) {
            if (link.attr("href").contains("mit.edu"))
                processPage(link.attr("abs:href"));
        }
    }
}


The crawler uses jsoup (a Java HTML parser); see the official documentation or IBM developerWorks for details.

Resumable downloads in Java (HTTP)

Principle:

HTTP Code 206

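206 Partial Content
The server has successfully processed a partial GET request. HTTP download tools such as FlashGet or Xunlei (Thunder) use this kind of response to resume interrupted downloads, or to split a large file into several segments that are downloaded in parallel.
The request must include a Range header indicating the range the client wants, and may include If-Range to make the request conditional.
The response must include the following header fields:
Content-Range, indicating the range of the content returned in this response. For a multi-part download whose Content-Type is multipart/byteranges, each part should include its own Content-Range field indicating the range of that part. If the response includes Content-Length, its value must match the actual number of bytes in the returned range.
Date
ETag and/or Content-Location, if the same request would have returned them in a 200 response.
Expires, Cache-Control, and/or Vary, if their values might differ from those sent in any previous response for the same variant.
If the request used If-Range with a strong cache validator, the response should not include other entity headers; if it used If-Range with a weak cache validator, the response must not include other entity headers. This prevents inconsistencies between cached entity bodies and updated entity headers. Otherwise, the response should include all the entity header fields that would have been returned with a 200 response.
If the ETag or Last-Modified headers do not match exactly, a client cache must not combine the content of a 206 response with any previously cached content.
Any cache that does not support the Range and Content-Range headers must not cache the content of a 206 response.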
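The Range and Content-Range headers described above have a simple textual form, sketched below. ByteRanges and its method names are hypothetical; the header syntax itself ("bytes=first-" in the request, "bytes first-last/total" in the response) is the standard byte-range syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class ByteRanges {
    private static final Pattern CONTENT_RANGE =
            Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+|\\*)");

    /** Range header value asking for everything from firstByte to the end. */
    static String rangeFrom(long firstByte) {
        return "bytes=" + firstByte + "-";
    }

    /** Parses a Content-Range value into {first, last, total}; total is -1 for "*". */
    static long[] parseContentRange(String value) {
        Matcher m = CONTENT_RANGE.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported Content-Range: " + value);
        }
        long total = m.group(3).equals("*") ? -1 : Long.parseLong(m.group(3));
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), total};
    }

    /** Number of bytes the body must contain for the given range. */
    static long expectedLength(long[] range) {
        return range[1] - range[0] + 1; // both ends are inclusive
    }
}
```

For a response carrying Content-Range: bytes 2000070-2999999/3000000, the body should hold 929930 bytes, which is also the value a matching Content-Length header would carry.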


Demo:

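　　Set up the request: send RANGE: bytes=2000070- (the trailing hyphen means "from this byte to the end of the file")

// Requires: java.io.InputStream, java.net.HttpURLConnection, java.net.URL
URL url = new URL("http://www.sjtu.edu.cn/down.zip");
HttpURLConnection httpConnection = (HttpURLConnection)url.openConnection();

// Set the User-Agent
httpConnection.setRequestProperty("User-Agent","NetFox");
// Set the position at which the download resumes
httpConnection.setRequestProperty("RANGE","bytes=2000070-");
// Get the input stream
InputStream input = httpConnection.getInputStream();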


　　Continue writing the file from the appropriate position

// Requires: java.io.RandomAccessFile
RandomAccessFile oSavedFile = new RandomAccessFile("down.zip","rw");
long nPos = 2000070;
// Move the file pointer to position nPos
oSavedFile.seek(nPos);
byte[] b = new byte[1024];
int nRead;
// Read bytes from the input stream and write them to the file
while((nRead=input.read(b,0,1024)) > 0)
{
    oSavedFile.write(b,0,nRead);
}
oSavedFile.close();


Session & Cookie

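　　A cookie is a piece of information that a web server sends to the browser. The browser stores the cookies for each web server in a local file. From then on, whenever the browser sends a request to a particular web server, it also sends all the cookies stored for that server. The differences between sessions and cookies are:
　　A session should work regardless of how the client browser is configured. The client can choose to disable cookies, but sessions still work, because the client cannot disable server-side sessions (the server typically falls back to URL rewriting to carry the session ID).
　　Sessions and cookies also differ in what they can store. A session can store arbitrary Java objects, whereas a cookie can only store String values.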
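The point that a cookie holds only a String can be seen with java.net.HttpCookie from the standard library, which parses a Set-Cookie header into name, value, and attributes. In this sketch CookieDemo is a hypothetical class, and the header text used in the example is made up for illustration.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical demo class, for illustration only.
class CookieDemo {
    /** Parses a Set-Cookie header and returns the first cookie it contains. */
    static HttpCookie first(String setCookieHeader) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        return cookies.get(0);
    }
}
```

For example, CookieDemo.first("Set-Cookie: JSESSIONID=abc123; Path=/") yields a cookie whose getName() is "JSESSIONID" and whose getValue() is the String "abc123". A servlet container commonly uses a cookie like this to carry the session ID, while the session's Java objects stay on the server.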

References:

https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html

https://zh.wikipedia.org/wiki/%E8%B6%85%E6%96%87%E6%9C%AC%E4%BC%A0%E8%BE%93%E5%8D%8F%E8%AE%AE