Terminology: compression ratio = compressed size / original size. The smaller the value, the better the compression.
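Written as a formula, with a worked example that uses the sizes reported in this article's test output (173 bytes before compression, 150 bytes after without a dictionary, 95 bytes after with one):

$$
r = \frac{S_{\text{compressed}}}{S_{\text{original}}}, \qquad
r_{\text{no dict}} = \frac{150}{173} \approx 0.867, \qquad
r_{\text{dict}} = \frac{95}{173} \approx 0.549
$$

The test code prints `compressAfter * 100 / compressBefore`, which uses integer division and therefore truncates these values to 86% and 54%. A ratio above 1 (100%) means the output is larger than the input.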
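While compressing data with Netty's JdkZlibEncoder, I ran into a problem: it compresses short text (under 2 KB) poorly. The compression ratio was between 80% and 120%, and the shorter the text, the worse the result. The output can even be larger than the input. Deflate saves space by replacing a repeated string with a back-reference to an earlier occurrence. A short message contains few such repeats, yet every message still pays fixed framing overhead such as the zlib header.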
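The short-input overhead is easy to reproduce with `java.util.zip.Deflater`, the JDK class that JdkZlibEncoder is built on. The sketch below is minimal, and its class and method names are illustrative. It compresses a few short strings at level 9, starting a fresh zlib stream each time, and prints the sizes. The 1-byte and 16-byte inputs come out larger than they went in.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class ShortTextDeflate {
    // Compress input into a complete zlib stream and return the output size in bytes.
    static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level); // zlib wrapper: 2-byte header + 4-byte Adler-32 trailer
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buf = new byte[64];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buf);
            }
            return total;
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    public static void main(String[] args) {
        String[] samples = {
                "a",
                "<ext id=\"1005\"/>",
                "<Event attribute=\"TRANSIENT\"><ext id=\"1005\"/></Event>"
        };
        for (String s : samples) {
            byte[] raw = s.getBytes(StandardCharsets.UTF_8);
            int after = deflatedSize(raw, 9);
            System.out.printf("before=%d after=%d ratio=%d%%%n",
                    raw.length, after, after * 100 / raw.length);
        }
    }
}
```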
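It turns out that a preset dictionary can improve compression noticeably. The dictionary pre-loads the compressor's history window, so even the first occurrence of a common string in a message can be encoded as a back-reference into the dictionary. The rest of this post shows how to set it up.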
The text we need to transmit looks like this:
```xml
<?xml version="1.0" encoding="utf-8" ?>
<Event attribute="TRANSIENT">
    <outer id="11" from="1005" to="915880056212" trunk="83057387" callid="24587"/>
    <ext id="1005"/>
</Event>
```
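The rule for building the dictionary is to add the strings that occur repeatedly. Here those are the XML declaration, the tag names, and the attribute names.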
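One way to apply this rule mechanically is to count, over a set of sample messages, how many messages each token appears in. The sketch below does that. `RepeatedTokens` and `candidates` are hypothetical names. The token pattern is an assumption: it treats XML names and words as candidates and skips values that start with a digit, such as ids, because those vary from message to message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedTokens {
    // Assumed token shape: XML names/words; numeric values such as ids are not matched
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    // Returns tokens that occur in at least minSamples messages, most widespread first.
    static List<String> candidates(List<String> samples, int minSamples) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String sample : samples) {
            Set<String> seen = new HashSet<>(); // count each token once per message
            Matcher m = TOKEN.matcher(sample);
            while (m.find()) {
                seen.add(m.group());
            }
            for (String token : seen) {
                docFreq.merge(token, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() >= minSamples) {
                result.add(e.getKey());
            }
        }
        // Most widespread first; ties broken alphabetically for a stable order
        result.sort((a, b) -> {
            int byFreq = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        return result;
    }
}
```

To turn the list into dictionary bytes, you can join the entries. The zlib manual advises placing the most useful strings at the end of the dictionary rather than the front, so reversing this most-widespread-first list before joining follows that advice.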
Applying this rule to the sample message gives the following dictionary:
```java
String[] dictionary = {
        "<?xml version=\"1.0\" encoding=\"utf-8\" ?>",
        "Event", "TRANSIENT", "attribute", "outer", "from", "trunk",
        "callid", "id", "to", "ext"
};
```
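JdkZlibEncoder wraps `java.util.zip.Deflater`, so the effect of this dictionary can be checked with the JDK alone. The sketch below reuses the article's message and dictionary and assumes compression level 9, as in the test code. It compresses the message with and without `Deflater.setDictionary` and prints both sizes. The numbers will not match the Netty output byte for byte. This sketch finishes the stream and writes the Adler-32 trailer, while the Netty encoder keeps the stream open across messages.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class DictionaryCompare {
    // The sample message and dictionary from this article
    static final String XML = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>"
            + "<Event attribute=\"TRANSIENT\">"
            + "<outer id=\"11\" from=\"1005\" to=\"915880056212\" trunk=\"83057387\" callid=\"24587\" />"
            + "<ext id=\"1005\" />"
            + "</Event>";

    static final String[] DICTIONARY = {
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?>",
            "Event", "TRANSIENT", "attribute", "outer", "from", "trunk",
            "callid", "id", "to", "ext"
    };

    // Entries concatenated without separators, as in the test code
    static byte[] dictionaryBytes() {
        return String.join("", DICTIONARY).getBytes(StandardCharsets.UTF_8);
    }

    // Deflate at level 9 into a complete zlib stream; dictionary may be null
    static byte[] deflate(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater(9);
        try {
            if (dictionary != null) {
                deflater.setDictionary(dictionary); // set before the first deflate() call
            }
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // free native zlib state
        }
    }

    public static void main(String[] args) {
        byte[] raw = XML.getBytes(StandardCharsets.UTF_8);
        int plain = deflate(raw, null).length;
        int withDict = deflate(raw, dictionaryBytes()).length;
        System.out.printf("original=%d, no dictionary=%d, with dictionary=%d%n",
                raw.length, plain, withDict);
    }
}
```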
The test cases are built with the EmbeddedChannel API. EmbeddedChannel simulates inbound and outbound data flow, which makes it very useful for testing ChannelHandlers.
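The JdkZlibEncoder constructor accepts the dictionary as a parameter: `JdkZlibEncoder(int compressionLevel, byte[] dictionary)`. The receiving side must pass the same bytes to `JdkZlibDecoder(byte[] dictionary)`, otherwise decompression fails. The handler names `gzipEncoder` and `gzipDecoder` in the test are only labels. The constructors used there produce the zlib format, which is the format that supports preset dictionaries.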
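In the zlib format (RFC 1950), a stream compressed with a preset dictionary records the dictionary's Adler-32 checksum in its header. The decompressor stops and asks for the dictionary before it produces any output. The JDK-only sketch below shows that handshake, assuming it roughly mirrors what the decoder does internally: `java.util.zip.Inflater` reports `needsDictionary()`, and the caller then supplies the bytes with `setDictionary()`. `compress` and `decompress` are hypothetical helper names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

class DictionaryRoundTrip {
    // Compress with a preset dictionary (zlib format, level 9)
    static byte[] compress(byte[] input, byte[] dict) {
        Deflater deflater = new Deflater(9);
        deflater.setDictionary(dict);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(input); // close() finishes the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by the stream
        }
        return bytes.toByteArray();
    }

    // Decompress, supplying the dictionary when the stream header asks for one
    static byte[] decompress(byte[] compressed, byte[] dict) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsDictionary()) {
                    if (dict == null) {
                        throw new IllegalStateException("stream was compressed with a preset dictionary");
                    }
                    inflater.setDictionary(dict); // zlib checks it against the id in the header
                } else if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("truncated stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

Calling `decompress` without the dictionary fails before any data is produced. This is why the encoder and decoder in the test pipeline are given the same `dictionaryBytes`.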
Here is the test code:
```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.embedded.EmbeddedChannel;
import io.netty.handler.codec.compression.JdkZlibDecoder;
import io.netty.handler.codec.compression.JdkZlibEncoder;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;
import org.junit.Test;

import java.nio.charset.StandardCharsets;

public class GzipTest {

    private String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>" +
            "<Event attribute=\"TRANSIENT\">" +
            "<outer id=\"11\" from=\"1005\" to=\"915880056212\" trunk=\"83057387\" callid=\"24587\" />" +
            "<ext id=\"1005\" />" +
            "</Event>";

    private String[] dictionary = {
            "<?xml version=\"1.0\" encoding=\"utf-8\" ?>",
            "Event", "TRANSIENT", "attribute", "outer", "from", "trunk",
            "callid", "id", "to", "ext"
    };

    /**
     * Compression without a dictionary.
     */
    @Test
    public void test1() {
        EmbeddedChannel embeddedChannel = new EmbeddedChannel();
        ChannelPipeline pipeline = embeddedChannel.pipeline();
        // Inbound: zlib -> String; outbound: String -> zlib (level 9)
        pipeline.addLast("gzipDecoder", new JdkZlibDecoder());
        pipeline.addLast("gzipEncoder", new JdkZlibEncoder(9));
        pipeline.addLast("decoder", new StringDecoder());
        pipeline.addLast("encoder", new StringEncoder());

        System.out.println("*******Without dictionary*******");
        int compressBefore = xml.getBytes(StandardCharsets.UTF_8).length;
        System.out.printf("Size before compression: %d%n", compressBefore);
        // Simulate an outbound write and read back the compressed buffer
        embeddedChannel.writeOutbound(xml);
        ByteBuf outboundBuf = embeddedChannel.readOutbound();
        int compressAfter = outboundBuf.readableBytes();
        System.out.printf("Size after compression: %d, compression ratio: %d%%%n", compressAfter,
                compressAfter * 100 / compressBefore);
        outboundBuf.release();                 // ByteBuf is reference-counted
        embeddedChannel.finishAndReleaseAll(); // close the channel, release leftover messages
    }

    /**
     * Compression with a preset dictionary.
     */
    @Test
    public void test2() {
        EmbeddedChannel embeddedChannel = new EmbeddedChannel();
        ChannelPipeline pipeline = embeddedChannel.pipeline();
        // Dictionary: all entries concatenated and encoded as UTF-8
        byte[] dictionaryBytes = String.join("", dictionary)
                .getBytes(StandardCharsets.UTF_8);
        // Encoder and decoder must share the same dictionary bytes
        pipeline.addLast("gzipDecoder", new JdkZlibDecoder(dictionaryBytes));
        pipeline.addLast("gzipEncoder", new JdkZlibEncoder(9, dictionaryBytes));
        pipeline.addLast("decoder", new StringDecoder());
        pipeline.addLast("encoder", new StringEncoder());

        System.out.println("*******With dictionary*******");
        int compressBefore = xml.getBytes(StandardCharsets.UTF_8).length;
        System.out.printf("Size before compression: %d%n", compressBefore);
        // Simulate an outbound write and read back the compressed buffer
        embeddedChannel.writeOutbound(xml);
        ByteBuf outboundBuf = embeddedChannel.readOutbound();
        int compressAfter = outboundBuf.readableBytes();
        System.out.printf("Size after compression: %d, compression ratio: %d%%%n", compressAfter,
                compressAfter * 100 / compressBefore);
        outboundBuf.release();
        embeddedChannel.finishAndReleaseAll();
    }
}
```
Output:
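```
*******Without dictionary*******
Size before compression: 173
Size after compression: 150, compression ratio: 86%
*******With dictionary*******
Size before compression: 173
Size after compression: 95, compression ratio: 54%
```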
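As the output shows, the dictionary lowers the compression ratio from 86% to 54%: the compressed message shrinks from 150 to 95 bytes.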
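If extracting a dictionary by hand is too tedious, try zstd. zstd is a compression library from Facebook, and it ships with a tool that trains a dictionary automatically from sample files. The command is: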
zstd --train ./dictionary/* -o ./dict.bin
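`zstd --train` reads one sample per file, so the messages have to be written out first. The sketch below assumes the messages have already been captured as strings, for example from logs. `writeSamples` and the `sample-NNNNN.xml` naming are hypothetical. Only the directory name is taken from the `./dictionary/*` glob in the command.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class TrainingSamples {
    // Hypothetical helper: write each captured message to its own file so that
    // `zstd --train ./dictionary/*` sees one sample per file. Returns the file count.
    static int writeSamples(Path dir, List<String> messages) {
        try {
            Files.createDirectories(dir);
            for (int i = 0; i < messages.size(); i++) {
                Path file = dir.resolve(String.format("sample-%05d.xml", i));
                Files.write(file, messages.get(i).getBytes(StandardCharsets.UTF_8));
            }
            return messages.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage might look like `TrainingSamples.writeSamples(Paths.get("./dictionary"), capturedMessages)`, followed by the training command. Training typically needs a large number of representative samples. The resulting `dict.bin` is a zstd-format dictionary, meant to be used with zstd itself, for example `zstd -D ./dict.bin <file>`.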