An Algorithm for Parsing HTTP Chunked-Encoded Data Streams

Date: 2019-11-24

I. Content length in HTTP data transfer

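In HTTP, Content-Length gives the size of the message body. Data sent with a Content-Length is typically an image, a media resource, static text, a web page, or some other document. Both the client and the server can make use of Content-Length.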

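1. On the client side, Content-Length accompanies methods such as POST, PUT, and DELETE that send data to the server, and reports the length of that data to the server.

2. On the server side, Content-Length reports the length of the response body to the client, except for responses that carry no body, such as responses to HEAD requests and HTTP 304 responses.

3. Some data, however, changes dynamically and has no fixed length, for example the output of a dynamic page. The server does not know the total length in advance, only the length of each fragment, so it cannot announce a Content-Length to the client. This is where Transfer-Encoding: chunked comes onto the stage: the server sends the length of each fragment, followed by the fragment itself.

4. It follows that Content-Length must never be used together with chunked. A sender must not include Content-Length in a message that has a Transfer-Encoding header, and a receiver that sees both lets Transfer-Encoding take precedence.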
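The four points above can be condensed into a small decision routine for a response. The following is only a minimal sketch: the class name `BodyFraming`, the `Kind` enum, and the assumption that the headers have already been parsed into a map are all hypothetical and not part of the original article.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

final class BodyFraming {
    enum Kind { CHUNKED, FIXED_LENGTH, UNTIL_CLOSE }

    /** Decides how a response body is delimited, given already-parsed headers. */
    static Kind of(Map<String, String> headers) {
        // Header names are case-insensitive, so copy them into a case-insensitive map.
        Map<String, String> h = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        h.putAll(headers);

        String te = h.get("Transfer-Encoding");
        if (te != null && te.toLowerCase(Locale.ROOT).trim().endsWith("chunked")) {
            // Transfer-Encoding wins; a Content-Length sent alongside it is ignored.
            return Kind.CHUNKED;
        }
        if (h.get("Content-Length") != null) {
            return Kind.FIXED_LENGTH;
        }
        // Neither header: the body runs until the server closes the connection.
        return Kind.UNTIL_CLOSE;
    }

    public static void main(String[] args) {
        System.out.println(of(Map.of("transfer-encoding", "chunked"))); // CHUNKED
        System.out.println(of(Map.of("Content-Length", "37")));         // FIXED_LENGTH
        System.out.println(of(Map.of("Content-Type", "text/plain")));   // UNTIL_CLOSE
    }
}
```

The sketch needs Java 9 or later because of `Map.of`.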

II. The data format of Transfer-Encoding: chunked

HTTP/1.1 200 OK\r\n  
Content-Type: text/plain\r\n  
Transfer-Encoding: chunked\r\n
Connection:keep-Alive\r\n\r\n 
25\r\n
This is the data in the first chunk\r\n
\r\n1A\r\n
and this is the second one
\r\n0\r\n\r\n

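It looks ordinary, but the lines to focus on are the third, fifth, seventh, and ninth.

The third line states that the transfer encoding is chunked. The fifth, seventh, and ninth lines give the length of the chunk that follows, written as a hexadecimal number.

Take the fifth line: without the \r\n, 0x25 is 37 in decimal, and "This is the data in the first chunk\r\n" is exactly 37 bytes long.
Likewise for the seventh line: 0x1A = 26, and "and this is the second one" is 26 bytes long.

The ninth line is special. Its size is 0x0 = 0, which means that no more data follows and tells the client to stop reading. Without this marker, a read on a kept-alive connection would block the thread instead of returning -1.
\r\n0\r\n\r\n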
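To make the relation between the size lines and the chunk data concrete, here is a hedged sketch of the sending side that reproduces the body of the example above. `ChunkedEncoderSketch` and `encode` are hypothetical names chosen for illustration; a real server would stream the chunks rather than build them in memory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ChunkedEncoderSketch {
    /** Encodes each string as one chunk, then appends the terminating zero-size chunk. */
    static byte[] encode(String... chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            byte[] data = chunk.getBytes(StandardCharsets.ISO_8859_1);
            if (data.length == 0) {
                continue; // a size of 0 would be read as the end of the body
            }
            // chunk-size in hexadecimal, CRLF, chunk-data, CRLF
            write(out, Integer.toHexString(data.length).toUpperCase(Locale.ROOT) + "\r\n");
            out.write(data, 0, data.length);
            write(out, "\r\n");
        }
        write(out, "0\r\n\r\n"); // last chunk plus the empty line that ends the body
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        byte[] wire = encode("This is the data in the first chunk\r\n",
                             "and this is the second one");
        System.out.print(new String(wire, StandardCharsets.ISO_8859_1));
    }
}
```

The first chunk is 37 bytes long because its trailing \r\n is part of the data, so the size line is 25; the second chunk is 26 bytes, so its size line is 1A.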

[Figure: diagram illustrating the relationship between the chunk-size lines and the chunk data]

III. Parsing Transfer-Encoding: chunked

Standard parsing pseudocode (as given in RFC 2616, section 19.4.6)

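length := 0 // length of the decoded entity body
read chunk-size, chunk-extension (if any) and CRLF // read the first chunk size
while (chunk-size > 0) { // loop until a chunk size of 0 is read
    read chunk-data and CRLF // read the chunk data, which is followed by CRLF
    append chunk-data to entity-body // append the chunk data to the decoded entity body
    length := length + chunk-size // update the decoded length
    read chunk-size and CRLF // read the next chunk size
}
read entity-header // the lines below read the trailer header fields
while (entity-header not empty) {
    append entity-header to existing header fields
    read entity-header
}
Content-Length := length // add the content length to the header fields
Remove "chunked" from Transfer-Encoding // drop chunked from the Transfer-Encoding header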
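The pseudocode maps almost line for line onto Java. The following is a minimal sketch rather than a hardened parser: `ChunkedDecoderSketch`, `decode`, and `decodeBytes` are hypothetical names, the headers are assumed to live in a plain `Map<String, String>`, and `InputStream.readNBytes(int)` requires Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;

final class ChunkedDecoderSketch {
    /** Reads one line up to LF and returns it without CR or LF. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    /** Decodes a chunked body; trailer fields, if any, are added to headers. */
    static byte[] decode(InputStream in, Map<String, String> headers) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream(); // entity-body
        while (true) {
            String sizeLine = readLine(in);                // chunk-size [;chunk-extension]
            int ext = sizeLine.indexOf(';');
            if (ext >= 0) sizeLine = sizeLine.substring(0, ext);
            int size;
            try {
                size = Integer.parseInt(sizeLine.trim(), 16);
            } catch (NumberFormatException e) {
                throw new IOException("bad chunk size: " + sizeLine);
            }
            if (size < 0) throw new IOException("negative chunk size");
            if (size == 0) break;                          // last chunk
            byte[] data = in.readNBytes(size);             // chunk-data
            if (data.length < size) throw new IOException("truncated chunk");
            body.write(data, 0, data.length);
            readLine(in);                                  // CRLF after chunk-data
        }
        // Trailer fields, terminated by an empty line.
        for (String line = readLine(in); !line.isEmpty(); line = readLine(in)) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        headers.put("Content-Length", String.valueOf(body.size()));
        headers.remove("Transfer-Encoding"); // assumes this exact key spelling and no other codings
        return body.toByteArray();
    }

    /** Convenience wrapper for in-memory data. */
    static byte[] decodeBytes(byte[] wire, Map<String, String> headers) {
        try {
            return decode(new ByteArrayInputStream(wire), headers);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Fed with the example body from section II, `decodeBytes` should return the 63 bytes of the two chunks (37 + 26) and set Content-Length to 63.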

1. Straightforward chunked parsing

We can go straight to the code below.

 
	    URL url=new URL("http://localhost:8080/test/ios.php");
	    EchoURLConnection connection=(EchoURLConnection)url.openConnection();
	    connection.setDoOutput(true);
	    connection.setDoInput(true);
	 
	    PrintWriter pw = new PrintWriter(new OutputStreamWriter(connection.getOutputStream()));
	    pw.write("name=zhangsan&password=123456");
	    pw.flush();
	   
	    InputStream stream = connection.getInputStream();
	    
	    int len = -1;
	    byte[] buf = new byte[1];
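	    byte[] outBuf = new byte[]{0x3A}; // start with a colon so the status line splits like a header with an empty name
	    int count = 0;
	    // extract the header block: stop after four consecutive CR/LF bytes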
	    while((len=stream.read(buf, 0, buf.length))>-1)
	    {
	    	
	    	int outLength = outBuf.length+len;
	    	byte[] tempBuf = outBuf;
	    	outBuf = new byte[outLength];
	    	System.arraycopy(tempBuf, 0, outBuf,0, tempBuf.length);
	    	System.arraycopy(buf, 0, outBuf,outBuf.length-1, len);
	    		
	    	if(buf[0]==0x0D || buf[0]==0x0A)
	    	{
	    		count++;
	    		if(count==4)
		    	{
		    		break;
		    	}
	    	}else{
	    		count = 0;
	    	}
	 
	    }
	    String headerString = new String(outBuf, 0, outBuf.length);
	    String[] splitLine = headerString.trim().split("\r\n");	
	    
	    // load the headers into a Map
	    Map<String, String> headerMap = new HashMap<String, String>();
	    boolean isCkunked = false;
	    for (String statusLine : splitLine) {
	    	String[] nameValue = statusLine.split(":", 2); // split on the first colon only, so values that contain colons stay intact
	    	if(nameValue[0].equals(""))
	    	{
	    		headerMap.put("", nameValue[1]);
	    	}else{
	    		headerMap.put(nameValue[0], nameValue[1]);
	    	}
	    	
	    	if("Transfer-Encoding".equalsIgnoreCase(nameValue[0]))
	    	{
	    		String value = nameValue[1].trim();
	    		isCkunked = value.equalsIgnoreCase("chunked"); // is the body chunked-encoded?
	    	}
		}
	    
	    System.out.println(headerMap);
	   
	    if(isCkunked)
	    {
	    	int chunkedNumber = 0; // counts consecutive CR/LF bytes
	    	byte[] readBuf = null; // buffers the current chunk data plus the following chunk-size line
	    	byte[] contentBuf = new byte[0]; // buffers the decoded content
	    	long currentLength = 0;
	    	
	    	while((len=stream.read(buf, 0,buf.length))>=0)
	    	{
	    		if(readBuf==null)
	    		{
	    			readBuf  = new byte[]{buf[0]};
	    		}
	    		else
	    		{
	    		int outLength = readBuf.length+len;
	    	    	byte[] tempBuf = readBuf;
	    	    	readBuf = new byte[outLength];
	    	    	System.arraycopy(tempBuf, 0, readBuf,0, tempBuf.length);
	    	    	System.arraycopy(buf, 0, readBuf,readBuf.length-1, len);
	    		}
	    		
	    		if(buf[0]==0x0D|| buf[0]==0x0A) // carriage return or line feed
	    		{
	    			chunkedNumber++;
	    			 // readBuf.length>currentLength checks that more bytes were read than the current chunk size
	    			if(chunkedNumber==2&&readBuf.length>currentLength)
	    			{
	    				
	    			byte[] tmpContentBuf = contentBuf;
	    			contentBuf = new byte[(int) (tmpContentBuf.length+currentLength)];
	    			System.arraycopy(tmpContentBuf, 0, contentBuf,0, tmpContentBuf.length);
	    	    	    	System.arraycopy(readBuf, 0, contentBuf,tmpContentBuf.length, (int) currentLength);
	    	    	    	
	    	    	    	String lineNo = "0x"+ (new String(readBuf,(int)currentLength, readBuf.length-(int)currentLength)).trim();
	    	    	    	if(lineNo.equals("0x")) 
	    	    	    	{
	    	    	    		currentLength = 0; // the next chunk-size line has not been read yet
	    	    	    	}else{
	    	    	    		currentLength = Long.decode(lineNo); // parse the hexadecimal chunk size
	    	    	    		if(currentLength==0) // a chunk size of 0 means reading should stop
	    	    	    		{
	    	    	    			System.out.println(new String(contentBuf, 0,contentBuf.length));
	    	    	    			break;
	    	    	    		}
	    	    	    	}
	    	    	    	System.out.println(currentLength+"--"+lineNo);
	    					chunkedNumber = 0;
	    					readBuf = null;
	    			}
	    			
	    		}else{
	    			chunkedNumber = 0; // any byte other than CR/LF resets the counter
	    		}
	    	}
	    }else{
	    
	        // read a body that is not chunked-encoded
	    	StringBuilder sb = new StringBuilder();
	    	
	    	while((len=stream.read(buf, 0,buf.length))>-1) 
	    	{
	    		sb.append(new String(buf, 0, len));
	    	}
	    	System.out.println(sb.toString());
	    }
	    pw.close();
	    stream.close();
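The header handling in the snippet above can be isolated into a small helper. This is a hedged sketch only: `HeaderBlockSketch` and `parse` are hypothetical names, and the Date value in the usage example is made up for illustration. It keeps the article's convention of storing the status line under the empty key, splits each field on its first colon, and uses a case-insensitive map because header names are case-insensitive.

```java
import java.util.Map;
import java.util.TreeMap;

final class HeaderBlockSketch {
    /** Parses "status-line CRLF (name: value CRLF)*" into a case-insensitive map. */
    static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = headerBlock.trim().split("\r\n");
        headers.put("", lines[0]); // status line, stored under the empty key
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                continue; // not a "name: value" line, skip it
            }
            String name = lines[i].substring(0, colon).trim();
            String value = lines[i].substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> h = parse("HTTP/1.1 200 OK\r\n"
                + "Date: Sun, 24 Nov 2019 08:00:00 GMT\r\n" // hypothetical value
                + "Transfer-Encoding: chunked\r\n\r\n");
        System.out.println(h.get(""));
        System.out.println(h.get("date"));
        System.out.println("chunked".equalsIgnoreCase(h.get("transfer-encoding")));
    }
}
```

Splitting on the first colon matters for values such as dates, which contain colons themselves.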

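2. Wrapping the logic in a parser class

The parsing code above is not very portable, so it needs improving. We wrap the InputStream in a proxy class that does the parsing.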

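import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

public class HttpInputStream extends InputStream {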

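	private  InputStream hostStream; // the underlying stream to parse
	private  boolean isCkunked = false;
	private  int readMetaSize = 0; // total number of raw bytes read from the underlying stream
	private  long contentLength = 0; // length of the decoded content read so far
	private  long chunkedNextLength = -1L; // size of the chunk being read; -1 means the size line has not been read yet
	private  long chunkedCurrentLength = 0L; // number of bytes already read from the current chunk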
	
	private final Map<String, Object> httpHeaders = new HashMap<String, Object>();
	
	public HttpInputStream(InputStream inputStream) throws IOException 
	{
		this.hostStream = inputStream;
		parseHeader();
	}
	public int getReadMetaSize() {
		return readMetaSize;
	}
	public long getContentLength() 
	{
		return contentLength;
	}
	/**
	 * Parses the response headers.
	 * @throws IOException
	 */
	private void parseHeader() throws IOException
	{
		if(this.hostStream!=null)
		{
			    int len = -1;
			    byte[] buf = new byte[1];
			    byte[] outBuf = new byte[]{0x3A};
			    int count = 0;
			    while((len=read(buf, 0, buf.length))>-1)
			    {
			    	int outLength = outBuf.length+len;
			    	byte[] tempBuf = outBuf;
			    	outBuf = new byte[outLength];
			    	System.arraycopy(tempBuf, 0, outBuf,0, tempBuf.length);
			    	System.arraycopy(buf, 0, outBuf,outBuf.length-1, len);
			    		
			    	if(buf[0]==0x0D || buf[0]==0x0A)
			    	{
			    		count++;
			    		if(count==4)
				    	{
			    			break;
				    	}
			    	}else{
			    		count = 0;
			    	}
			 
			    }
			    String headerString = new String(outBuf, 0, outBuf.length);
			    String[] splitLine = headerString.trim().split("\r\n");	    
		
			    for (String statusLine : splitLine) {
			    	String[] nameValue = statusLine.split(":", 2); // split on the first colon only
			    	if(nameValue[0].equals(""))
			    	{
			    		httpHeaders.put("", nameValue[1]);
			    	}else{
			    		httpHeaders.put(nameValue[0], nameValue[1]);
			    	}
			    	
			    	if("Transfer-Encoding".equalsIgnoreCase(nameValue[0]))
			    	{
			    		String value = nameValue[1].trim();
			    		isCkunked = value.equalsIgnoreCase("chunked");
			    	}
				}
		}
		
	}
	
	public Map<String, Object> getHttpHeaders() {
		return httpHeaders;
	}
	
	/**
	 * Reads the next byte of the decoded body.
	 */
	@Override
	public int read() throws IOException {
		if(!isCkunked)
		{
			/**
			 * Body is not chunked-encoded: read straight from the underlying stream.
			 */
			return hostStream.read();
		}
		/**
		 * Body is chunked-encoded: decode it.
		 */
		return  readChunked();
	}

	private int readChunked() throws IOException {
		if(chunkedNextLength==0L)
		{
			// the zero-size last chunk has already been seen: report end of stream
			return -1;
		}
		if(chunkedNextLength==-1L) // -1 means the next chunk-size line still has to be read
		{
			String sizeLine = readLine();
			int extension = sizeLine.indexOf(';'); // ignore any chunk-extension
			if(extension>=0)
			{
				sizeLine = sizeLine.substring(0, extension);
			}
			long size;
			try {
				size = Long.parseLong(sizeLine.trim(), 16); // the chunk size is hexadecimal
			} catch (NumberFormatException e) {
				throw new IOException("Invalid chunk size line: "+sizeLine);
			}
			if(size<0L)
			{
				throw new IOException("Negative chunk size: "+sizeLine);
			}
			chunkedNextLength = size;
			chunkedCurrentLength = 0L;
			if(size==0L)
			{
				// last chunk: skip optional trailer lines up to the empty line,
				// then publish the total content length in the headers
				while(!readLine().isEmpty()) { }
				getHttpHeaders().put("Content-Length", ""+contentLength);
				return -1; // the underlying stream may stay open and never return -1, so force -1 here
			}
			contentLength+=size;
		}
		int byteCode = hostStream.read();
		if(byteCode==-1)
		{
			throw new IOException("Stream ended inside a chunk");
		}
		readMetaSize++;
		chunkedCurrentLength++; // one more data byte of the current chunk has been read
		if(chunkedCurrentLength==chunkedNextLength) // the whole chunk has been read
		{
			readLine(); // consume the CRLF that follows the chunk data
			chunkedNextLength = -1L;
			chunkedCurrentLength = 0L;
		}
		return byteCode;
	}

	/**
	 * Reads raw bytes up to and including LF and returns the line without CR/LF.
	 */
	private String readLine() throws IOException {
		StringBuilder line = new StringBuilder();
		int b;
		while((b=hostStream.read())!=-1)
		{
			readMetaSize++;
			if(b==0x0A) break;
			if(b!=0x0D) line.append((char) b);
		}
		return line.toString();
	}


	
	@Override
	public long skip(long n) throws IOException {
		return hostStream.skip(n);
	}
	
	@Override
	public boolean markSupported() {
		return hostStream.markSupported();
	}
	
	@Override
	public int available() throws IOException {
		return hostStream.available();
	}
	
	@Override
	public synchronized void mark(int readlimit) {
		hostStream.mark(readlimit);
	}
	

	@Override
	public void close() throws IOException {
		hostStream.close();
	}
	
	@Override
	public synchronized void reset() throws IOException {
		hostStream.reset();
	}
}

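Quite simple, isn't it? At this point we have implemented roughly 70% of what an HttpResponse needs. The remaining 30% consists of components such as caching, redirects, and HTTP cookies.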
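The class above hands out the body one byte per call. As a design alternative, the sketch below overrides the bulk `read(byte[], int, int)` so that a whole run of chunk data can be copied in one call. It is an illustration under assumptions, not the article's implementation: `ChunkedInputStreamSketch` and `decodeToString` are hypothetical names, the header block is assumed to have been consumed already, trailer fields are skipped, and `readAllBytes` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ChunkedInputStreamSketch extends InputStream {
    private final InputStream in;
    private long remaining = -1; // bytes left in the current chunk; -1 before the first size line
    private boolean done;        // true once the zero-size chunk has been read

    ChunkedInputStreamSketch(InputStream in) { this.in = in; }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (done) return -1;
        if (remaining <= 0) {
            if (remaining == 0) readLine();       // CRLF that closes the previous chunk
            remaining = parseSize(readLine());
            if (remaining == 0) {
                while (!readLine().isEmpty()) { } // skip trailer fields
                done = true;
                return -1;
            }
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n == -1) throw new IOException("stream ended inside a chunk");
        remaining -= n;
        return n;
    }

    @Override
    public void close() throws IOException { in.close(); }

    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') sb.append((char) b);
        }
        return sb.toString();
    }

    private static long parseSize(String line) throws IOException {
        int ext = line.indexOf(';'); // drop any chunk-extension
        String hex = (ext >= 0 ? line.substring(0, ext) : line).trim();
        try {
            long size = Long.parseLong(hex, 16);
            if (size < 0) throw new NumberFormatException();
            return size;
        } catch (NumberFormatException e) {
            throw new IOException("bad chunk size: " + line);
        }
    }

    /** Convenience helper for in-memory data. */
    static String decodeToString(String wire) {
        byte[] raw = wire.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream s = new ChunkedInputStreamSketch(new ByteArrayInputStream(raw))) {
            return new String(s.readAllBytes(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the sketch extends InputStream directly, `skip` falls back to reading through the decoder and `markSupported` returns false, so the chunk framing is never bypassed.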

IV. Usage

    URL url=new URL("http://localhost:8080/test/ios.php");
	    EchoURLConnection connection=(EchoURLConnection)url.openConnection();
	    connection.setDoOutput(true);
	    connection.setDoInput(true);
	 
	    PrintWriter pw = new PrintWriter(new OutputStreamWriter(connection.getOutputStream()));
	    pw.write("name=zhangsan&password=123456");
	    pw.flush();
	   
	    InputStream stream = connection.getInputStream();
	    HttpInputStream his = new HttpInputStream(stream);
	    System.out.println(his.getHttpHeaders());
	    int len = 0;
	    byte[] buf = new byte[127];
	    StringBuilder sb = new StringBuilder();
	    while((len=his.read(buf, 0, buf.length))!=-1)
	    {
	    	sb.append(new String(buf, 0, len));
	    }
	    
	    System.out.println(sb.toString());