This article introduces the idea behind the BitMap algorithm and the basic usage of two open-source libraries, JavaEWAH and RoaringBitmap.
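A BitMap uses individual bits to mark the value associated with each element: the position of a bit identifies the element, and the bit itself (0 or 1) records its value. Because each element costs a single bit, the approach saves storage space.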
Consider a scenario: we need to store the numbers [3,5,7,1,2], all in the range 0-7, using as little space as possible.
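One approach is to store them in a plain array. But if the data volume grows by a factor of a million or ten million, the memory taken by the array becomes very large.
The other approach is to use a BitMap.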
To represent [3,5,7,1,2] we only need 8 bits: each number is recorded as a 1 at its corresponding position.
| Position 7 | Position 6 | Position 5 | Position 4 | Position 3 | Position 2 | Position 1 | Position 0 |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 |
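As a minimal sketch of the layout in the table above, the following Java class keeps the bits in a `long[]` and sets or tests them with shifts and masks. The class `SimpleBitMap` and its methods are hypothetical names chosen for illustration; they are not part of any library discussed in this article.

```java
// Illustration only: SimpleBitMap is a hypothetical, hand-rolled class.
final class SimpleBitMap {
    private final long[] words; // each long holds 64 positions

    SimpleBitMap(int capacity) {
        // Round up so that `capacity` positions fit into whole 64-bit words.
        this.words = new long[(capacity + 63) >>> 6];
    }

    /** Marks `value` as present by setting the bit at that position. */
    void add(int value) {
        words[value >>> 6] |= 1L << (value & 63);
    }

    /** Returns true if the bit at position `value` is set. */
    boolean contains(int value) {
        return (words[value >>> 6] & (1L << (value & 63))) != 0;
    }

    /** Renders positions capacity-1 down to 0 as a string of 0s and 1s. */
    String toBits(int capacity) {
        StringBuilder sb = new StringBuilder(capacity);
        for (int pos = capacity - 1; pos >= 0; pos--) {
            sb.append(contains(pos) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleBitMap bm = new SimpleBitMap(8);
        for (int v : new int[] {3, 5, 7, 1, 2}) {
            bm.add(v);
        }
        System.out.println(bm.toBits(8));   // 10101110, same as the table row
        System.out.println(bm.contains(5)); // true
        System.out.println(bm.contains(6)); // false
    }
}
```

Each possible value costs one bit no matter how many values are actually stored, so the footprint depends on the largest value rather than on the element count.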
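If we treat the BitMap above as storing a user tag, for example "credit card overdue", and treat each position as a user ID, then finding out which users carry that tag becomes a very simple query and count over the bits.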
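The tag idea above can be sketched with the JDK's own `java.util.BitSet`, using one bitmap per tag and the bit position as the user ID. The second tag (`vip`), the user IDs, and the class and method names are assumptions made up for this example.

```java
import java.util.BitSet;

// Illustration only: the tags and user IDs below are invented sample data.
final class UserTagDemo {

    /** Returns the IDs of users carrying both tags; the inputs are not modified. */
    static BitSet usersWithBoth(BitSet tagA, BitSet tagB) {
        BitSet result = (BitSet) tagA.clone();
        result.and(tagB); // bitwise AND keeps only IDs present in both bitmaps
        return result;
    }

    public static void main(String[] args) {
        // Bit position = user ID; a set bit means the user carries the tag.
        BitSet overdue = new BitSet(8);
        for (int id : new int[] {1, 2, 3, 5, 7}) {
            overdue.set(id);
        }
        BitSet vip = new BitSet(8); // hypothetical second tag
        vip.set(2);
        vip.set(4);
        vip.set(7);

        System.out.println(overdue.get(5));              // true: user 5 is overdue
        System.out.println(overdue.cardinality());       // 5 users carry the tag
        System.out.println(usersWithBoth(overdue, vip)); // {2, 7}
    }
}
```

Membership checks, counts, and tag combinations all reduce to bit operations, which is why this layout suits tag queries.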
Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps.
In short, a BitMap is a fast structure for queries, but an uncompressed one can take up too much memory, and the usual remedy is to compress it.
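A rough back-of-the-envelope sketch of why an uncompressed BitMap can be wasteful: its size is driven by the largest value it has to cover, not by how many values are set. The helper `denseBytes` is a hypothetical name, and the calculation assumes the bits are packed into 64-bit words with no other overhead.

```java
// Illustration only: a simplified cost model, not a measurement of any library.
final class DenseBitmapCost {

    /** Bytes needed by an uncompressed bitmap of 64-bit words covering 0..maxValue. */
    static long denseBytes(long maxValue) {
        long bits = maxValue + 1;      // one bit per possible value
        long words = (bits + 63) / 64; // round up to whole 64-bit words
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(denseBytes(7));        // 8
        System.out.println(denseBytes(1L << 30)); // 134217736, about 128 MiB
        System.out.println(5 * Integer.BYTES);    // 20: five raw ints, for comparison
    }
}
```

For a sparse set such as five values with a maximum of `1 << 30`, the dense bitmap needs about 128 MiB while the five raw ints fit in 20 bytes. That gap is what compressed bitmaps try to close.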
Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, roaring bitmaps can be hundreds of times faster and they often offer significantly better compression. They can even be faster than uncompressed bitmaps.
Note what is being compared: the "hundreds of times faster" figure is relative to other compressed bitmap formats (WAH, EWAH, Concise). Against an uncompressed BitMap, the claim is only that Roaring bitmaps can sometimes be faster.
Add the RoaringBitmap dependency:

    <dependency>
        <groupId>org.roaringbitmap</groupId>
        <artifactId>RoaringBitmap</artifactId>
        <version>0.8.1</version>
    </dependency>

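Test code:

    import java.util.function.Consumer;

    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.roaringbitmap.RoaringBitmap;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.test.context.junit4.SpringRunner;

    @SpringBootTest
    @RunWith(SpringRunner.class)
    public class TestRoaringbitmap {

        @Test
        public void test() {
            // Create rr holding the four values 1, 2, 3 and 1000
            RoaringBitmap rr = RoaringBitmap.bitmapOf(1, 2, 3, 1000);

            // Create an empty RoaringBitmap rr2
            RoaringBitmap rr2 = new RoaringBitmap();
            // Add the range [10000, 12000) to rr2: 2000 values, 12000 itself excluded
            rr2.add(10000L, 12000L);

            // select(j) returns the j-th smallest value, counting from 0:
            // index 0 -> 1, index 1 -> 2, index 2 -> 3, index 3 -> 1000
            rr.select(3);

            // rank(x) returns how many stored values are <= x, so rank(2) is 2
            // (rank(1) is 1, rank(3) is 3); it is a count, not a zero-based index
            rr.rank(2);

            // Check whether 1000 is contained
            rr.contains(1000); // will return true
            // Check whether 7 is contained
            rr.contains(7); // will return false

            // OR the two bitmaps: the values are merged into a new RoaringBitmap, rror
            RoaringBitmap rror = RoaringBitmap.or(rr, rr2);
            // OR rr2 into rr in place; rr now holds the merged result
            rr.or(rr2);

            // rror and rr now hold the same values, so they must be equal
            boolean equals = rror.equals(rr);
            if (!equals) throw new RuntimeException("bug");

            // Number of values in rr: 1, 2, 3, 1000 plus the 2000 values
            // from 10000 to 11999, 2004 in total
            long cardinality = rr.getLongCardinality();
            System.out.println(cardinality);

            // Iterate over the values in rr
            for (int i : rr) {
                System.out.println(i);
            }

            // Alternative traversal with forEach, said to be faster than the loop above.
            // The cast selects the Consumer<Integer> overload, so measure before relying on that.
            rr.forEach((Consumer<? super Integer>) i -> {
                System.out.println(i.intValue());
            });
        }
    }
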
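The semantics of `select` and `rank` are easy to mix up, so here is a plain-JDK sketch of what the two operations compute: `select(j)` is the j-th smallest stored value counting from 0, and `rank(x)` is the number of stored values less than or equal to `x`. It uses `java.util.BitSet` and hypothetical helper names; it illustrates the semantics only and is not how RoaringBitmap implements them.

```java
import java.util.BitSet;

// Illustration only: a naive re-creation of select/rank semantics on a BitSet.
final class SelectRankSketch {

    /** Number of stored values that are less than or equal to x (x >= 0 assumed). */
    static int rank(BitSet bits, int x) {
        // get(from, to) copies the bits in [from, to), so x + 1 includes x itself.
        return bits.get(0, x + 1).cardinality();
    }

    /** The j-th smallest stored value, counting from 0. */
    static int select(BitSet bits, int j) {
        int value = bits.nextSetBit(0);
        for (int k = 0; k < j && value >= 0; k++) {
            value = bits.nextSetBit(value + 1);
        }
        if (j < 0 || value < 0) {
            throw new IllegalArgumentException("j must be in [0, cardinality)");
        }
        return value;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        for (int v : new int[] {1, 2, 3, 1000}) {
            bits.set(v);
        }
        System.out.println(select(bits, 0)); // 1
        System.out.println(select(bits, 3)); // 1000
        System.out.println(rank(bits, 2));   // 2: the values 1 and 2 are <= 2
        System.out.println(rank(bits, 0));   // 0: nothing is <= 0
        System.out.println(rank(bits, 1000)); // 4: equals the cardinality
    }
}
```

For a value that is present in the set, the two are related by `select(rank(x) - 1) == x`.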
Add the JavaEWAH dependency:

    <dependency>
        <groupId>com.googlecode.javaewah</groupId>
        <artifactId>JavaEWAH</artifactId>
        <version>1.1.6</version>
    </dependency>

Test code:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    import com.googlecode.javaewah.EWAHCompressedBitmap;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.test.context.junit4.SpringRunner;

    @SpringBootTest
    @RunWith(SpringRunner.class)
    public class TestJavaEWAH {

        @Test
        public void test() {
            EWAHCompressedBitmap ewahBitmap1 = EWAHCompressedBitmap.bitmapOf(0, 2, 55, 64, 1 << 30);
            EWAHCompressedBitmap ewahBitmap2 = EWAHCompressedBitmap.bitmapOf(1, 3, 64, 1 << 30);

            // Prints: bitmap 1: {0,2,55,64,1073741824}
            System.out.println("bitmap 1: " + ewahBitmap1);
            // Prints: bitmap 2: {1,3,64,1073741824}
            System.out.println("bitmap 2: " + ewahBitmap2);

            // Check whether the value 64 is contained; prints true
            System.out.println(ewahBitmap1.get(64));
            // Number of values stored; prints 5
            System.out.println(ewahBitmap1.cardinality());

            // Iterate over all values
            ewahBitmap1.forEach(integer -> {
                System.out.println(integer);
            });

            // Bitwise OR
            EWAHCompressedBitmap orbitmap = ewahBitmap1.or(ewahBitmap2);
            // Prints: bitmap 1 OR bitmap 2: {0,1,2,3,55,64,1073741824}
            System.out.println("bitmap 1 OR bitmap 2: " + orbitmap);
            // Prints: memory usage: 40 bytes
            System.out.println("memory usage: " + orbitmap.sizeInBytes() + " bytes");

            // Bitwise AND
            EWAHCompressedBitmap andbitmap = ewahBitmap1.and(ewahBitmap2);
            // Prints: bitmap 1 AND bitmap 2: {64,1073741824}
            System.out.println("bitmap 1 AND bitmap 2: " + andbitmap);
            // Prints: memory usage: 32 bytes
            System.out.println("memory usage: " + andbitmap.sizeInBytes() + " bytes");

            // Serialization and deserialization round trip
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ewahBitmap1.serialize(new DataOutputStream(bos));
                EWAHCompressedBitmap ewahBitmap1new = new EWAHCompressedBitmap();
                byte[] bout = bos.toByteArray();
                ewahBitmap1new.deserialize(new DataInputStream(new ByteArrayInputStream(bout)));
                System.out.println("bitmap 1 (recovered) : " + ewahBitmap1new);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

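The sizes reported by `sizeInBytes()` in the test above are only a few dozen bytes even though one of the stored values is `1 << 30`. One way to build intuition for that is run-length encoding: a long run of empty 64-bit words is stored as a count instead of being written out. The following is a toy sketch of that idea only. The class `ToyRunLengthBitmap` and its layout (pairs of "empty words skipped" and "literal word") are invented for illustration and are not JavaEWAH's actual format.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy run-length scheme, NOT the real EWAH encoding.
final class ToyRunLengthBitmap {

    /**
     * Encodes ascending values as pairs {emptyWordsSkipped, literalWord}.
     * Each pair says: skip this many all-zero 64-bit words, then one literal word follows.
     */
    static List<long[]> encode(int[] sortedValues) {
        List<long[]> out = new ArrayList<>();
        long nextWord = 0; // index of the first word not yet emitted
        int i = 0;
        while (i < sortedValues.length) {
            long wordIndex = sortedValues[i] >>> 6;
            long word = 0;
            // Collect every value that falls into the same 64-bit word.
            while (i < sortedValues.length && (sortedValues[i] >>> 6) == wordIndex) {
                word |= 1L << (sortedValues[i] & 63);
                i++;
            }
            out.add(new long[] {wordIndex - nextWord, word});
            nextWord = wordIndex + 1;
        }
        return out;
    }

    /** Walks the pairs until the word that would hold `value` is reached or passed. */
    static boolean contains(List<long[]> encoded, int value) {
        long target = value >>> 6;
        long wordIndex = -1;
        for (long[] entry : encoded) {
            wordIndex += entry[0] + 1;
            if (wordIndex == target) {
                return (entry[1] & (1L << (value & 63))) != 0;
            }
            if (wordIndex > target) {
                return false; // the target word lies inside a skipped empty run
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<long[]> encoded = encode(new int[] {0, 2, 55, 64, 1 << 30});
        System.out.println(encoded.size());                  // 3 literal words
        System.out.println(encoded.size() * 2 * Long.BYTES); // 48 bytes in this toy layout
        System.out.println(contains(encoded, 64));           // true
        System.out.println(contains(encoded, 63));           // false
        System.out.println(contains(encoded, 1 << 30));      // true
    }
}
```

The roughly 16.7 million empty words between position 64 and position `1 << 30` collapse into a single counter, which is why the compressed size tracks the number of non-empty regions rather than the largest value.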