Java Network Programming
Package: java.net.*
Network Protocol Stack
Socket
Definition:
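A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to.
An endpoint is the combination of an IP address and a port number. Every TCP connection is uniquely identified by its two endpoints, which is what makes it possible to have several connections between your host and the same server.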
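To make the endpoint idea concrete, here is a minimal sketch that opens two connections to the same listening port on the loopback interface. The class name EndpointDemo and the method twoConnections are hypothetical and not part of the original notes. Both connections share the same remote endpoint, so it is the differing local ports that keep them apart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class, not part of the original article.
final class EndpointDemo {

    // Opens two connections to one listening port on the loopback interface and
    // returns {remotePort1, localPort1, remotePort2, localPort2} as seen by the clients.
    static int[] twoConnections() {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket listener = new ServerSocket(0, 2, InetAddress.getLoopbackAddress());
             Socket first = new Socket(listener.getInetAddress(), listener.getLocalPort());
             Socket second = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
            return new int[] {
                first.getPort(), first.getLocalPort(),
                second.getPort(), second.getLocalPort()
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = twoConnections();
        System.out.println("connection 1: local port " + ports[1] + " -> server port " + ports[0]);
        System.out.println("connection 2: local port " + ports[3] + " -> server port " + ports[2]);
    }
}
```

The two printed lines should show the same server port and two different local ports chosen by the operating system.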
Code example:
GreetingClient.java
import java.io.*;
import java.net.*;

public class GreetingClient {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: java GreetingClient <host> <port>");
            return;
        }
        String serverName = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println("Connecting to " + serverName + " on port " + port);
        // try-with-resources closes the socket even if an exception is thrown
        try (Socket client = new Socket(serverName, port)) {
            System.out.println("Just connected to " + client.getRemoteSocketAddress());

            // Send a greeting; writeUTF prefixes the string with its length
            DataOutputStream out = new DataOutputStream(client.getOutputStream());
            out.writeUTF("Hello from " + client.getLocalSocketAddress());

            // Block until the server's reply arrives
            DataInputStream in = new DataInputStream(client.getInputStream());
            System.out.println("Server says " + in.readUTF());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
GreetingServer.java (extends Thread, so the accept loop runs on its own thread)
import java.io.*;
import java.net.*;

public class GreetingServer extends Thread {
    private final ServerSocket serverSocket;

    public GreetingServer(int port) throws IOException {
        serverSocket = new ServerSocket(port);
        // accept() gives up after 10 seconds without a client
        serverSocket.setSoTimeout(10000);
    }

    @Override
    public void run() {
        while (true) {
            System.out.println("Waiting for client on port "
                    + serverSocket.getLocalPort() + " ...");
            // Each accepted connection is handled and closed before the next accept()
            try (Socket server = serverSocket.accept()) {
                System.out.println("Just connected to " + server.getRemoteSocketAddress());
                DataInputStream in = new DataInputStream(server.getInputStream());
                System.out.println(in.readUTF());
                DataOutputStream out = new DataOutputStream(server.getOutputStream());
                out.writeUTF("Thank you for connecting to "
                        + server.getLocalSocketAddress() + "\nGoodbye!");
            } catch (SocketTimeoutException s) {
                System.out.println("Socket timed out!");
                break;
            } catch (IOException e) {
                e.printStackTrace();
                break;
            }
        }
    }

    public static void main(String[] args) {
        int port = Integer.parseInt(args[0]);
        try {
            Thread t = new GreetingServer(port);
            t.start();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Another example: testing a network port (returns true if the port is open)
// Requires: import java.io.IOException; import java.net.InetSocketAddress; import java.net.Socket;
public static boolean testInet(String site, int port) {
    int timeout = 3000; // ms
    InetSocketAddress addr = new InetSocketAddress(site, port);
    // try-with-resources closes the socket on every path
    try (Socket sock = new Socket()) {
        sock.connect(addr, timeout);
        return true;
    } catch (IOException e) {
        // Refused, unreachable, or timed out: treat the port as closed
        return false;
    }
}
Payload [wikipedia]
The payload is the part of the transmitted data that carries the message you actually intend to send.
The term is borrowed from transportation, where "payload" refers to the part of the load that pays for transportation.
Example
Here is a piece of JSON data:
{
    "data": {
        "message": "Hello, world!"
    }
}
The string "Hello, world!" is the payload; everything else is protocol overhead.
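The split between payload and overhead can be put into numbers. The sketch below is only an illustration: the class name PayloadOverhead is hypothetical, and it assumes the JSON is sent in compact form (no whitespace) and encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original article.
final class PayloadOverhead {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes in the message that are not part of the payload.
    static int overheadBytes(String message, String payload) {
        return byteLength(message) - byteLength(payload);
    }

    public static void main(String[] args) {
        // Assumption: the JSON from the example, serialized without whitespace.
        String message = "{\"data\":{\"message\":\"Hello, world!\"}}";
        String payload = "Hello, world!";
        System.out.println("total bytes:    " + byteLength(message));
        System.out.println("payload bytes:  " + byteLength(payload));
        System.out.println("overhead bytes: " + overheadBytes(message, payload));
    }
}
```

With the compact form assumed here, 13 of the 36 bytes are payload and the remaining 23 are overhead.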
Ethernet
IP (Internet Protocol) [rfc791]
IP provides upper-layer protocols with a stateless, connectionless, unreliable service.
Transport Layer
Ports are defined at the transport layer.
UDP [rfc768]
Format
------

 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|     Source      |   Destination   |
|      Port       |      Port       |
+--------+--------+--------+--------+
|                 |                 |
|     Length      |    Checksum     |
+--------+--------+--------+--------+
|
|          data octets ...
+---------------- ...

     User Datagram Header Format
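The four header fields are each 16 bits wide and arrive in network byte order (big-endian). As a rough sketch of how they could be read from a raw datagram, assuming a hypothetical UdpHeader class that is not part of the original notes:

```java
import java.nio.ByteBuffer;

// Hypothetical value class for the 8-byte UDP header described in RFC 768.
final class UdpHeader {
    final int sourcePort;
    final int destinationPort;
    final int length;    // header plus data, in bytes
    final int checksum;

    private UdpHeader(int sourcePort, int destinationPort, int length, int checksum) {
        this.sourcePort = sourcePort;
        this.destinationPort = destinationPort;
        this.length = length;
        this.checksum = checksum;
    }

    // Reads the four 16-bit fields from the start of a raw datagram.
    static UdpHeader parse(byte[] datagram) {
        if (datagram.length < 8) {
            throw new IllegalArgumentException("a UDP header is 8 bytes long");
        }
        // ByteBuffer is big-endian by default, which matches network byte order.
        ByteBuffer buf = ByteBuffer.wrap(datagram);
        // Java's short is signed, so mask with 0xFFFF to get the unsigned value.
        int src = buf.getShort() & 0xFFFF;
        int dst = buf.getShort() & 0xFFFF;
        int len = buf.getShort() & 0xFFFF;
        int sum = buf.getShort() & 0xFFFF;
        return new UdpHeader(src, dst, len, sum);
    }

    public static void main(String[] args) {
        // Made-up sample: source port 54321, destination port 53, length 12, checksum 0.
        byte[] sample = {(byte) 0xD4, 0x31, 0x00, 0x35, 0x00, 0x0C, 0x00, 0x00, 'p', 'i', 'n', 'g'};
        UdpHeader h = parse(sample);
        System.out.println(h.sourcePort + " -> " + h.destinationPort + ", length " + h.length);
    }
}
```

The mask matters for ports above 32767, which would otherwise come out negative.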
TCP [rfc793]
TCP Header Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        TCP Header Format

      Note that one tick mark represents one bit position.
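Reading the diagram as byte offsets, the Data Offset sits in the upper four bits of byte 12 and the six control flags in the lower six bits of byte 13. The following sketch decodes those two fields from a raw segment. The class TcpHeader and its method names are hypothetical, and the sample bytes are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for two fields of the RFC 793 TCP header.
final class TcpHeader {

    // Flag names ordered from the least significant bit of byte 13 upward.
    private static final String[] FLAG_NAMES = {"FIN", "SYN", "RST", "PSH", "ACK", "URG"};

    // Data Offset counts 32-bit words, so multiply by 4 to get bytes.
    static int headerLengthBytes(byte[] segment) {
        return ((segment[12] & 0xF0) >>> 4) * 4;
    }

    // Returns the names of the control flags that are set, lowest bit first.
    static List<String> flags(byte[] segment) {
        int bits = segment[13] & 0x3F;
        List<String> set = new ArrayList<>();
        for (int i = 0; i < FLAG_NAMES.length; i++) {
            if ((bits & (1 << i)) != 0) {
                set.add(FLAG_NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // Made-up 20-byte header: Data Offset = 5 words, flags = SYN + ACK
        // (the second segment of the three-way handshake).
        byte[] segment = new byte[20];
        segment[12] = 0x50;
        segment[13] = 0x12;
        System.out.println(headerLengthBytes(segment) + " bytes, flags " + flags(segment));
    }
}
```

A header without options has a Data Offset of 5, which is the 20-byte minimum.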
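TCP three-way handshake: establishing a connection
Brief explanation: the client sends a SYN, the server replies with SYN+ACK, and the client answers with an ACK, after which both sides treat the connection as established.
TCP connection close: four-way handshake
Brief explanation: the two ends of a connection are independent, so closing each direction takes its own two-step exchange (a FIN and its ACK).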
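The independence of the two directions is visible from Java through Socket.shutdownOutput(), which closes only the sending side of a socket. The sketch below is an illustration under assumptions: the class HalfCloseDemo is hypothetical, it runs both ends on the loopback interface, and it uses InputStream.readAllBytes(), which requires Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
final class HalfCloseDemo {

    // Sends the request, closes only the client's sending direction,
    // and then still receives the server's reply on the same connection.
    static String roundTrip(String request) {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> echoOnce(listener));
            server.start();
            try (Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
                client.getOutputStream().write(request.getBytes(StandardCharsets.UTF_8));
                // Half-close: the client-to-server direction ends here,
                // while the server-to-client direction stays open.
                client.shutdownOutput();
                byte[] reply = client.getInputStream().readAllBytes();
                server.join();
                return new String(reply, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Reads until the client closes its sending direction, then replies.
    private static void echoOnce(ServerSocket listener) {
        try (Socket peer = listener.accept()) {
            byte[] received = peer.getInputStream().readAllBytes();
            String reply = "echo:" + new String(received, StandardCharsets.UTF_8);
            peer.getOutputStream().write(reply.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The server's read returns only once the client has shut down its output, and the reply still gets through because the other direction has not been closed yet.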
HTTP
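HTTP is a request/response standard between a client (the user) and a server (the website), normally carried over TCP. Using a web browser, a web crawler, or some other tool, the client sends an HTTP request to a specified port on the server (port 80 by default). This client is called the user agent. The responding server stores resources such as HTML files and images, and is called the origin server. Between the user agent and the origin server there may be several intermediaries, such as proxies, gateways, or tunnels.
Although TCP/IP is the most widely used protocol suite on the Internet, HTTP does not require it or the layers it supports. In fact, HTTP can be implemented on top of any Internet protocol, or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that gives this guarantee can be used. This is why, within the TCP/IP suite, HTTP uses TCP as its transport layer.
Typically, the HTTP client initiates a request by opening a TCP connection to a specified port on the server (port 80 by default). The HTTP server listens on that port for client requests. Once it receives a request, the server returns a status line to the client, such as "HTTP/1.1 200 OK", along with the content, which may be the requested file, an error message, or other information.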
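The exchange described above can be reproduced with plain sockets. The following is a minimal sketch rather than a real HTTP server: the class RawHttpExchange is hypothetical, the server side answers a single request with a fixed response, and both ends run on the loopback interface so that no external site is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
final class RawHttpExchange {

    // Sends one GET request over a TCP connection and returns the status line.
    static String fetchStatusLine() {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOnce(listener));
            server.start();
            try (Socket client = new Socket(listener.getInetAddress(), listener.getLocalPort())) {
                // The empty line (\r\n\r\n) marks the end of the request headers.
                String request = "GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n";
                client.getOutputStream().write(request.getBytes(StandardCharsets.US_ASCII));
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
                String statusLine = in.readLine();
                server.join();
                return statusLine;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Accepts one connection, skips the request headers, and sends a fixed response.
    private static void serveOnce(ServerSocket listener) {
        try (Socket peer = listener.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(peer.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // Request line and headers are ignored in this sketch.
            }
            String body = "hello";
            String response = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"
                    + "Content-Length: " + body.length() + "\r\n"
                    + "Connection: close\r\n\r\n"
                    + body;
            peer.getOutputStream().write(response.getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchStatusLine());
    }
}
```

In practice you would use HttpURLConnection or another HTTP client instead of writing the request text by hand; the point here is only to show that HTTP is text sent over an ordinary TCP connection.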
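HTTP status codes
The first digit of the status code indicates the class of the response:
1xx Informational: the request was received and processing continues.
2xx Success: the request was received, understood, and accepted.
3xx Redirection: further action is needed to complete the request.
4xx Client Error: the request contains an error or cannot be fulfilled.
5xx Server Error: the server failed to fulfil an apparently valid request.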
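Because the class is carried by the first digit, it can be derived with an integer division. A small sketch, assuming a hypothetical StatusClass helper and the five standard classes (informational, success, redirection, client error, server error):

```java
// Hypothetical helper, not part of the original article.
final class StatusClass {

    // Maps a three-digit HTTP status code to the name of its class.
    static String describe(int statusCode) {
        switch (statusCode / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {200, 206, 301, 404, 500}) {
            System.out.println(code + " " + describe(code));
        }
    }
}
```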
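An HTTP response consists of three parts:
Status code: describes the status of the response. It can be used to check whether the request completed successfully, and when a request fails, it helps to find the cause of the failure. If a Servlet does not set a status code, the success code HttpServletResponse.SC_OK is returned by default.
HTTP headers: they carry further information about the response. For example, a header can specify the date after which the response is considered expired, or the encoding used to transfer the entity content safely to the user.
Body: it holds the content of the response, which may be HTML, an image, and so on. The body consists of the data bytes that immediately follow the headers in the HTTP message.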
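The three parts can be separated with simple string operations, because the headers end at the first empty line. The sketch below is deliberately simplified: the ParsedResponse class is hypothetical, it assumes the whole response is available as one text string, and it ignores details such as repeated headers or chunked bodies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the three parts of an HTTP response.
final class ParsedResponse {
    final int statusCode;
    final Map<String, String> headers;
    final String body;

    private ParsedResponse(int statusCode, Map<String, String> headers, String body) {
        this.statusCode = statusCode;
        this.headers = headers;
        this.body = body;
    }

    // Splits a raw response into status code, headers, and body.
    static ParsedResponse parse(String raw) {
        // The first empty line separates the headers from the body.
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("missing blank line after headers");
        }
        String[] headLines = raw.substring(0, split).split("\r\n");
        // Status line format: HTTP-version SP status-code SP reason-phrase
        int code = Integer.parseInt(headLines[0].split(" ", 3)[1]);
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < headLines.length; i++) {
            int colon = headLines[i].indexOf(':');
            if (colon > 0) {
                headers.put(headLines[i].substring(0, colon).trim(),
                            headLines[i].substring(colon + 1).trim());
            }
        }
        return new ParsedResponse(code, headers, raw.substring(split + 4));
    }

    public static void main(String[] args) {
        ParsedResponse r = parse(
                "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello");
        System.out.println(r.statusCode + " " + r.headers + " " + r.body);
    }
}
```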
Code example: using HttpURLConnection to POST data to a web server
// Requires: java.io.*, java.net.HttpURLConnection, java.net.URL, java.nio.charset.StandardCharsets
public static String executePost(String targetURL, String urlParameters) {
    HttpURLConnection connection = null;
    // Encode once so that Content-Length matches the bytes actually sent
    byte[] postData = urlParameters.getBytes(StandardCharsets.UTF_8);

    try {
        // Create connection
        URL url = new URL(targetURL);
        connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        connection.setRequestProperty("Content-Length",
                Integer.toString(postData.length));
        connection.setRequestProperty("Content-Language", "en-US");

        connection.setUseCaches(false);
        connection.setDoOutput(true);

        // Send request
        try (OutputStream wr = connection.getOutputStream()) {
            wr.write(postData);
        }

        // Get response
        try (BufferedReader rd = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            // StringBuilder needs Java 5+; use StringBuffer on older versions
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = rd.readLine()) != null) {
                response.append(line);
                response.append('\r');
            }
            return response.toString();
        }
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    } finally {
        if (connection != null) {
            connection.disconnect();
        }
    }
}
A Simple Web Crawler
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>artificerPi</groupId>
    <artifactId>WebCrawler</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.8.3</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.25</version>
        </dependency>
    </dependencies>
</project>
ddl
create database crawler;

use crawler;

CREATE TABLE IF NOT EXISTS `Record` (
  `RecordID` INT(11) NOT NULL AUTO_INCREMENT,
  `URL` text NOT NULL,
  PRIMARY KEY (`RecordID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
DB.java
/**
 * Created by artificerPi on 2016/4/7.
 */
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class DB {

    public Connection conn = null;

    public DB() {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            // Lowercase name matches the DDL; database names are case-sensitive on some platforms
            String url = "jdbc:mysql://localhost:3306/crawler";
            conn = DriverManager.getConnection(url, "root", "passw0rd");
            System.out.println("conn built");
        } catch (SQLException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
    }

    // Runs a query that returns rows (SELECT)
    public ResultSet runSql(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.executeQuery(sql);
    }

    // Runs a statement that returns no rows (e.g. TRUNCATE)
    public boolean runSql2(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.execute(sql);
    }

    @Override
    protected void finalize() throws Throwable {
        // && rather than ||: calling isClosed() on a null connection would throw
        if (conn != null && !conn.isClosed()) {
            conn.close();
        }
    }
}
Main.java
/**
 * Created by artificerPi on 2016/4/7.
 */

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class Main {
    public static DB db = new DB();

    public static void main(String[] args) throws SQLException, IOException {
        db.runSql2("TRUNCATE Record;");
        processPage("http://www.mit.edu");
    }

    public static void processPage(String URL) throws SQLException, IOException {
        // Check whether the given URL is already in the database.
        // A parameterized query avoids breaking on (or being injected by) quotes in the URL.
        String sql = "SELECT * FROM Record WHERE URL = ?";
        try (PreparedStatement check = db.conn.prepareStatement(sql)) {
            check.setString(1, URL);
            try (ResultSet rs = check.executeQuery()) {
                if (rs.next()) {
                    return; // already visited
                }
            }
        }

        // Store the URL in the database to avoid parsing it again
        sql = "INSERT INTO `Record` (`URL`) VALUES (?)";
        try (PreparedStatement stmt = db.conn.prepareStatement(sql)) {
            stmt.setString(1, URL);
            stmt.execute();
        }

        // Fetch the page for this URL (the original always fetched the MIT home page)
        Document doc = Jsoup.connect(URL).get();

        // Query term: research
        if (doc.text().contains("research")) {
            System.out.println(URL);
        }

        // Get all links and recursively call processPage; note that the recursion depth is unbounded
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            if (link.attr("href").contains("mit.edu")) {
                processPage(link.attr("abs:href"));
            }
        }
    }
}
The crawler uses jsoup (a Java HTML parser); see the official documentation or IBM developerWorks for reference.
How it works:
HTTP Code 206
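206 Partial Content: the server has successfully processed a partial GET request. HTTP download tools such as FlashGet or Xunlei (Thunder) use this kind of response to resume interrupted downloads, or to split a large document into several segments that are downloaded at the same time.
The request must include a Range header that states which range of the content the client wants, and may include If-Range as a request condition.
The response must include the following header fields:
Content-Range, which indicates the range of content returned in this response. For a multi-segment download whose Content-Type is multipart/byteranges, each multipart segment should carry its own Content-Range field for that segment. If the response includes Content-Length, its value must match the actual number of bytes in the returned range.
Date
ETag and/or Content-Location, if the same request would have returned them in a 200 response.
Expires, Cache-Control, and/or Vary, if their values might differ from those sent in earlier responses for the same variant.
If the request used If-Range with a strong cache validator, the response should not include other entity headers; if the request used If-Range with a weak cache validator, the response must not include other entity headers. This avoids inconsistencies between cached entity content and updated entity header information. Otherwise, the response should include all the entity header fields that a 200 response would have returned.
If the ETag or Last-Modified headers do not match exactly, a client cache must not combine the content of the 206 response with any previously cached content.
Any cache that does not support the Range and Content-Range headers must not cache the content of a 206 response.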
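As a small illustration of the two headers involved, the sketch below builds an open-ended Range request value and reads the first byte, last byte, and total size out of a Content-Range response value. The ByteRanges class and its methods are hypothetical, the sample values are made up, and only the single-range "bytes first-last/total" form is handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for single byte ranges, not part of the original article.
final class ByteRanges {

    private static final Pattern CONTENT_RANGE =
            Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+|\\*)");

    // Range value asking for everything from firstByte to the end of the resource.
    static String rangeFrom(long firstByte) {
        return "bytes=" + firstByte + "-";
    }

    // Parses e.g. "bytes 2000070-2999999/3000000" into {first, last, total};
    // total is -1 when the server reports it as "*" (unknown).
    static long[] parseContentRange(String value) {
        Matcher m = CONTENT_RANGE.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported Content-Range: " + value);
        }
        long total = m.group(3).equals("*") ? -1 : Long.parseLong(m.group(3));
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), total};
    }

    // Number of bytes in the range; Content-Length must equal this for a 206 response.
    static long expectedContentLength(long[] range) {
        return range[1] - range[0] + 1;
    }

    public static void main(String[] args) {
        System.out.println("Range: " + rangeFrom(2000070));
        long[] r = parseContentRange("bytes 2000070-2999999/3000000");
        System.out.println("bytes in this response: " + expectedContentLength(r));
    }
}
```

Both ends of a byte range are inclusive, which is why the length is last minus first plus one.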
Demo:
Set up the request: send Range: bytes=2000070- (the trailing hyphen means "from this byte to the end")
URL url = new URL("http://www.sjtu.edu.cn/down.zip");
HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();

// Set the User-Agent
httpConnection.setRequestProperty("User-Agent", "NetFox");
// Set the position at which to resume the download (open-ended range)
httpConnection.setRequestProperty("Range", "bytes=2000070-");
// Get the input stream
InputStream input = httpConnection.getInputStream();
Continue writing to the file from the appropriate position
RandomAccessFile oSavedFile = new RandomAccessFile("down.zip", "rw");
long nPos = 2000070;
// Move the file pointer to position nPos
oSavedFile.seek(nPos);
byte[] b = new byte[1024];
int nRead;
// Read bytes from the input stream and write them to the file
while ((nRead = input.read(b, 0, 1024)) > 0) {
    oSavedFile.write(b, 0, nRead);
}
oSavedFile.close();
Session & Cookie
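A cookie is a piece of information that a web server sends to the browser. The browser stores the cookies of each web server in a local file. From then on, whenever the browser sends a request to that web server, it also sends all the cookies stored for that server. The differences between a session and a cookie are:
A session should work regardless of how the client browser is configured. The client can choose to disable cookies, but the session still works, because the client cannot disable the server-side session (the session ID can be carried by URL rewriting instead).
Sessions and cookies also differ in what they can store. A session can hold arbitrary Java objects, whereas a cookie can only hold String values.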
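To see what a cookie looks like on the wire, the sketch below parses a Set-Cookie header with java.net.HttpCookie from the standard library. The class SessionCookieDemo and the sample header are made up for illustration; JSESSIONID is the name servlet containers conventionally use for the session cookie, which is how a server-side session is usually tied to a browser.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical demo class, not part of the original article.
final class SessionCookieDemo {

    // Parses a Set-Cookie header line and returns the first cookie in it.
    static HttpCookie parseSetCookie(String header) {
        List<HttpCookie> cookies = HttpCookie.parse(header);
        return cookies.get(0);
    }

    public static void main(String[] args) {
        // Made-up header of the kind a server sends when it starts a session.
        HttpCookie cookie = parseSetCookie("Set-Cookie: JSESSIONID=abc123; Path=/; HttpOnly");
        // Name and value are plain strings: the cookie carries only the session ID,
        // while the session's Java objects stay on the server.
        System.out.println(cookie.getName() + " = " + cookie.getValue());
    }
}
```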
References:
https://docs.oracle.com/javase/tutorial/networking/sockets/definition.html
https://zh.wikipedia.org/wiki/%E8%B6%85%E6%96%87%E6%9C%AC%E4%BC%A0%E8%BE%93%E5%8D%8F%E8%AE%AE