Java-String.intern: An In-Depth Study

When---When do you need to understand String's intern method:

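In interviews (awkward, I know)! I'd rather not admit it, but interviewers often use this kind of fancy question to test whether we really understand the immutability of String, the design of the String constant pool, and what String.intern actually does. In real-world programming, too, we may run into cases where String.intern can improve performance or reduce memory usage; more on that later.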

 

What---What exactly does String.intern do:

Returns a canonical representation for the string object. A pool of strings, initially empty, is maintained privately by the class String. When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned. It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true. All literal strings and string-valued constant expressions are interned. String literals are defined in section 3.10.5 of the The Java™ Language Specification.

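The above is the Javadoc for intern from the JDK source. Simply put, intern returns a string from the constant pool: if the pool already contains an equal string, the reference to that pooled object is returned directly. Otherwise, the object is added to the pool and its reference is returned. The following example explains how interning works: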

Now let's understand how Java handles these strings. When you create two string literals:

```java
String name1 = "Ram";
String name2 = "Ram";
```

In this case, the JVM searches the String constant pool for the value "Ram". If it does not find it there, it allocates new memory, stores the value "Ram", and returns its reference to name1. For name2, it again checks the pool for "Ram", but this time finds it, so it does nothing except return the existing reference to name2. This way of keeping only one copy of each distinct string is called string interning.
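The "Ram" example, together with the Javadoc contract quoted above (s.intern() == t.intern() exactly when s.equals(t)), can be checked with a small sketch. The class name LiteralPoolDemo and the helper fresh() are hypothetical; fresh() simply forces a new heap object with the same contents by going through new String(char[]).

```java
public class LiteralPoolDemo {
    // Hypothetical helper: returns a new heap String equal to `text`, never the pooled one.
    static String fresh(String text) {
        return new String(text.toCharArray());
    }

    public static void main(String[] args) {
        String name1 = "Ram";
        String name2 = "Ram";
        String name3 = fresh("Ram");

        System.out.println(name1 == name2);          // true: both refer to the pooled "Ram"
        System.out.println(name1 == name3);          // false: name3 is a separate heap object
        System.out.println(name1 == name3.intern()); // true: intern() returns the pooled copy

        // The Javadoc contract: s.intern() == t.intern() exactly when s.equals(t).
        String s = fresh("hello");
        String t = fresh("hello");
        System.out.println(s == t);                                 // false
        System.out.println(s.intern() == t.intern());               // true
        System.out.println(s.intern() == fresh("world").intern());  // false
    }
}
```

Note that == compares references, so it is used here only to observe identity; application code should keep comparing strings with equals().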

 

How---How String.intern differs before and after JDK 1.7:

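In short, it comes down to one thing: before JDK 1.7, the string constant pool lived in the PermGen space of the method area; from JDK 1.7 on, it was moved into the heap. As a consequence, when intern() finds no equal string in the pool, JDK 1.7+ records a reference to the existing heap object instead of copying the string into PermGen, which is why several results in the examples below differ between versions.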
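A minimal sketch of the JDK 1.7+ behavior, assuming a HotSpot JVM: a string built at run time from a timestamp is (almost certainly) not in the pool yet, so intern() records and returns the very same heap object. The class name HeapInternDemo and the "intern-demo-" prefix are hypothetical.

```java
public class HeapInternDemo {
    // Returns true if intern() handed back the very object it was called on,
    // i.e. the pool recorded a reference to the existing heap object.
    static boolean internKeepsHeapObject() {
        // Built at run time, so no equal string should be pooled yet
        // (assumption: nothing else has interned this exact value).
        String s = "intern-demo-" + System.nanoTime();
        return s.intern() == s;
    }

    public static void main(String[] args) {
        // Expected on HotSpot JDK 7+: true. On JDK 6, intern() copied the string
        // into PermGen and returned that copy, so the same check gave false.
        System.out.println(internKeepsHeapObject());
    }
}
```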

 

Back---Back to why String was designed this way:

Java's String is designed to be immutable for the following reasons:

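1. The needs of the string constant pool. The pool exists to improve efficiency and reduce memory allocation. Arguably, a large share of programming work is string handling, and the strings we handle are very often duplicates. Only because String is immutable can one pooled instance be safely shared by every reference to it, which makes the pool easy to manage and optimize.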

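2. Security. Because strings are used in so many places, immutability effectively prevents them from being tampered with, intentionally or accidentally. Looking at String in the Java source, the class is declared final, the field holding its characters (value) is private and final, and no method exposes a way to modify it; the only non-final field, hash, is just a lazily computed cache. (Of course, if we really wanted to, we could still modify a supposedly immutable String via reflection or by manipulating memory directly with Unsafe.)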

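3. Usability as a key in hash-based structures such as HashMap and Hashtable. Because a String can never change, its hash code can never change either: a key cannot silently end up in the wrong bucket after insertion, and String can compute its hash code once and cache it in a field, which makes repeated lookups much faster.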
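To see why key stability matters, here is a sketch that contrasts a mutable key with a String key. A List<Character> is used only as a hypothetical stand-in for "any mutable key"; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyDemo {
    // Puts a mutable List key into a HashMap, mutates the key, then looks it up again.
    static String lookupAfterMutation() {
        List<Character> key = new ArrayList<>(List.of('a', 'b'));
        Map<List<Character>, String> map = new HashMap<>();
        map.put(key, "value");
        key.add('c');        // the key's hashCode() changes after insertion
        return map.get(key); // lookup uses the new hash, so the stored entry is not found
    }

    // Same steps with a String key, which cannot be mutated.
    static String lookupStringKey() {
        String key = "ab";
        Map<String, String> map = new HashMap<>();
        map.put(key, "value");
        key.concat("c");     // returns a new String; key itself stays "ab"
        return map.get(key); // same hash as at insertion, so the entry is found
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // null
        System.out.println(lookupStringKey());     // value
    }
}
```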

 

 

Deeper---Let's go straight to examples:

First, check whether the output of the program below matches what you expect:

```java
public class InternExamples {
    public static void main(String[] args) {
        String s1 = new String("aaa");
        String s2 = "aaa";
        System.out.println(s1 == s2);    // false

        s1 = new String("bbb").intern();
        s2 = "bbb";
        System.out.println(s1 == s2);    // true

        s1 = "ccc";
        s2 = "ccc";
        System.out.println(s1 == s2);    // true

        s1 = new String("ddd").intern();
        s2 = new String("ddd").intern();
        System.out.println(s1 == s2);    // true

        s1 = "ab" + "cd";                // folded into the literal "abcd" at compile time
        s2 = "abcd";
        System.out.println(s1 == s2);    // true

        String temp = "hh";
        s1 = "a" + temp;                 // temp is not a constant, so this is built at run time
        // with s1 = s1.intern(); here, the comparison below would print true
        s2 = "ahh";
        System.out.println(s1 == s2);    // false

        temp = "hh".intern();
        s1 = "a" + temp;                 // interning temp does not intern the concatenation result
        s2 = "ahh";
        System.out.println(s1 == s2);    // false

        temp = "hh".intern();
        s1 = ("a" + temp).intern();
        s2 = "ahh";
        System.out.println(s1 == s2);    // true

        s1 = new String("1");            // "1" is in the pool and a separate heap object is created; s1 points to the heap object
        s1.intern();                     // "1" is already pooled: returns the pooled copy, and the result is discarded
        s2 = "1";
        System.out.println(s1 == s2);    // false

        // Four String objects: the pooled "1", two heap "1"s, and the heap "11" that s3 points to.
        // "11" is NOT added to the pool here.
        String s3 = new String("1") + new String("1");
        s3.intern();                     // JDK 1.7+: the pool can hold a reference to an existing heap object, so s3's own reference is recorded
        String s4 = "11";                // JDK 1.7+: the pooled entry is s3 itself (assuming nothing else pooled "11" earlier)
        System.out.println(s3 == s4);    // false before JDK 1.7, true from JDK 1.7 on

        s3 = new String("2") + new String("2");
        s4 = "22";                       // "22" is not pooled yet, so a new pooled "22" object is created
        s3.intern();                     // returns the pooled "22", a different object from s3
        System.out.println(s3 == s4);    // false
    }
}
```

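From these examples we can roughly conclude that a string object is placed in the constant pool in these cases:
1. when String.intern is called explicitly;
2. when a string literal is declared directly, e.g. String a = "aaa";
3. when string literals are concatenated directly, e.g. String c = "aa" + "bb";. If either operand is not a compile-time constant (a literal, or a final variable initialized with one), "aabb" is not added to the pool. For constant operands, the compiler (javac) folds the expression into a single literal, so "aa" and "bb" are not added to the pool separately.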
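Rule 3 can be probed with a short sketch comparing a literal-only concatenation, a final local variable (which the Java Language Specification treats as a constant variable), and a plain local variable. The class and method names are hypothetical.

```java
public class ConstantFoldingDemo {
    static boolean literalPlusLiteral() {
        String c = "aa" + "bb"; // folded by javac into the single literal "aabb"
        return c == "aabb";
    }

    static boolean finalVariablePlusLiteral() {
        final String a = "aa";  // constant variable: final, initialized with a constant
        String c = a + "bb";    // still a compile-time constant expression, so also folded
        return c == "aabb";
    }

    static boolean plainVariablePlusLiteral() {
        String a = "aa";        // not final, so a + "bb" is evaluated at run time
        String c = a + "bb";    // produces a new heap String
        return c == "aabb";
    }

    public static void main(String[] args) {
        System.out.println(literalPlusLiteral());       // true
        System.out.println(finalVariablePlusLiteral()); // true
        System.out.println(plainVariablePlusLiteral()); // false
    }
}
```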

If the results differ from what you expected, let's look at the bytecode in detail:

```java
// Each snippet was compiled on its own, in a separate method, so every listing starts
// at offset 0 and stores into local slot 1. The listings come from an older javac
// (JDK 8 or earlier); since JDK 9, javac compiles non-constant "+" to an invokedynamic
// call instead of an explicit StringBuilder chain.

/**
 * Bytecode:
 *   0:   ldc     #16; //String 11   --- load the string constant "11" from the constant pool
 *   2:   astore_1                   --- store the reference in local 1, i.e. s points to the pooled "11"
 */
String s = "11";

/**
 *   0:   new     #16; //class java/lang/String    --- allocate a new String object on the heap
 *   3:   dup                                      --- duplicate the reference on the operand stack: one copy is consumed by the constructor,
 *                                                     the other is stored in s1, so s1 refers to a heap object separate from the pooled "11"
 *   4:   ldc     #18; //String 11
 *   6:   invokespecial   #20; //Method java/lang/String."<init>":(Ljava/lang/String;)V
 *   9:   astore_1
 */
String s1 = new String("11");

/**
 *   0:   new     #16; //class java/lang/StringBuilder     --- javac optimizes the concatenation by first creating a StringBuilder
 *   3:   dup
 *   4:   new     #18; //class java/lang/String            --- create a String object
 *   7:   dup
 *   8:   ldc     #20; //String 1                          --- load "1" from the constant pool (now there is a pooled "1" and a heap copy)
 *   10:  invokespecial   #22; //Method java/lang/String."<init>":(Ljava/lang/String;)V            --- initialize the String("1") object
 *   13:  invokestatic    #25; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
 *   16:  invokespecial   #29; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V     --- initialize the StringBuilder
 *   19:  new     #18; //class java/lang/String
 *   22:  dup
 *   23:  ldc     #20; //String 1
 *   25:  invokespecial   #22; //Method java/lang/String."<init>":(Ljava/lang/String;)V
 *   28:  invokevirtual   #30; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 *   31:  invokevirtual   #34; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 *   34:  astore_1                                         --- so far only "1" is in the pool; "11" is not
 *   35:  aload_1
 *   36:  invokevirtual   #38; //Method java/lang/String.intern:()Ljava/lang/String;  --- JDK 1.7+: the pool is part of the heap and can hold references,
 *                                                                                        so s2's own reference is recorded
 *   39:  pop                                              --- the value returned by intern() is discarded
 */
String s2 = new String("1") + new String("1");
s2.intern();

/**
 *   0:   ldc     #16; //String abc        --- the class's constant pool holds only "abc", not separate "a", "b", "c"
 *   2:   astore_1
 */
String s3 = "a" + "b" + "c";

/**
 *   0:   new     #16; //class java/lang/StringBuilder
 *   3:   dup
 *   4:   ldc     #18; //String why                --- "why" from the constant pool
 *   6:   invokespecial   #20; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
 *   9:   ldc     #23; //String true               --- "true" from the constant pool
 *   11:  invokevirtual   #25; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
 *   14:  invokevirtual   #29; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
 *   17:  astore_1
 */
String s4 = new StringBuilder("why").append("true").toString();
System.out.println(s4 == s4.intern());   // false before JDK 1.7, true from JDK 1.7 on
```
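The pop at offset 39 means s2.intern() is called only for its side effect. That works above because "11" was not pooled yet; when an equal string is already in the pool, ignoring the return value leaves you holding the heap copy. A small sketch of the difference (class and method names are hypothetical):

```java
public class InternReturnValueDemo {
    static boolean ignoredResult() {
        String s = new String("xyz"); // heap copy; the literal "xyz" is already pooled
        s.intern();                   // returns the pooled "xyz", but the result is thrown away
        return s == "xyz";            // s is still the heap copy
    }

    static boolean usedResult() {
        String s = new String("xyz");
        s = s.intern();               // keep the canonical (pooled) reference
        return s == "xyz";
    }

    public static void main(String[] args) {
        System.out.println(ignoredResult()); // false
        System.out.println(usedResult());    // true
    }
}
```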

Next, let's extend the discussion to optimizing string concatenation:

```java
String a = "1";
for (int i = 0; i < 10; i++) {
    a += i;
}
```

Its bytecode:

```
0: ldc #16; //String 1
2: astore_1
3: iconst_0
4: istore_2                    --- loop starts
5: goto 30
8: new #18; //class java/lang/StringBuilder   --- every iteration creates a new StringBuilder, which costs performance
11: dup
12: aload_1
13: invokestatic #20; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
16: invokespecial #26; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
19: iload_2
20: invokevirtual #29; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
23: invokevirtual #33; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
26: astore_1
27: iinc 2, 1                  --- increment the counter
30: iload_2
31: bipush 10
33: if_icmplt 8
```

The loop

```java
String a = "1";
for (int i = 0; i < 10; i++) {
    a += "1";
}
```

compiles to:

```
0: ldc #16; //String 1
2: astore_1
3: iconst_0
4: istore_2
5: goto 31
8: new #18; //class java/lang/StringBuilder   --- still creates a new StringBuilder on every iteration
11: dup
12: aload_1
13: invokestatic #20; //Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
16: invokespecial #26; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
19: ldc #16; //String 1         --- the only difference from the previous loop: "1" is loaded from the constant pool
21: invokevirtual #29; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
24: invokevirtual #33; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
27: astore_1
28: iinc 2, 1
31: iload_2
32: bipush 10
34: if_icmplt 8
```
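So the real performance cost is that every iteration creates a new StringBuilder object (and a new String from toString()).
Let's optimize it accordingly: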
```java
StringBuilder sb = new StringBuilder("1");
for (int i = 0; i < 10; i++) {
    sb.append("1");
}
```
The corresponding bytecode:
```
0: new #16; //class java/lang/StringBuilder   --- the StringBuilder is created once, before the loop
3: dup
4: ldc #18; //String 1
6: invokespecial #20; //Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
9: astore_1
10: iconst_0
11: istore_2
12: goto 25
15: aload_1
16: ldc #18; //String 1
18: invokevirtual #23; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
21: pop
22: iinc 2, 1
25: iload_2
26: bipush 10
28: if_icmplt 15
```
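As a sanity check that the optimization changes only the allocation pattern and not the result, here is a sketch that builds the same string both ways (using the first loop's a += i form). The class and method names are hypothetical; on JDK 9+ the += version compiles to invokedynamic rather than an explicit StringBuilder, but it still creates a new String on every iteration.

```java
public class ConcatLoopDemo {
    // "+=" inside the loop: a new intermediate object chain and a new String per iteration.
    static String concatInLoop(int n) {
        String a = "1";
        for (int i = 0; i < n; i++) {
            a += i;
        }
        return a;
    }

    // One StringBuilder created before the loop and reused for every append.
    static String builderOutsideLoop(int n) {
        StringBuilder sb = new StringBuilder("1");
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatInLoop(10));       // 10123456789
        System.out.println(builderOutsideLoop(10)); // 10123456789
    }
}
```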

 

Where---Using String.intern:

Let's end our tour of String.intern with an example:

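```java
import java.util.Random;

public class InternUsage {
    // Assumed values: MAX and arr are not defined in the original snippet.
    static final int MAX = 1_000_000;
    static final String[] arr = new String[MAX];

    public static void main(String[] args) {
        Integer[] DB_DATA = new Integer[10];
        Random random = new Random(10 * 10000);
        for (int i = 0; i < DB_DATA.length; i++) {
            DB_DATA[i] = random.nextInt();
        }
        long t = System.currentTimeMillis();
        for (int i = 0; i < MAX; i++) {
            arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length]));             // a new String object is kept every time
            // arr[i] = new String(String.valueOf(DB_DATA[i % DB_DATA.length])).intern(); // many strings, but at most 10 distinct values: interned, they share one object each, greatly reducing memory
        }

        System.out.println((System.currentTimeMillis() - t) + "ms");
        System.gc();
    }
}
```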
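Instead of timing, the memory effect can be observed more directly by counting how many distinct String objects (by identity, not equals) the array ends up holding. This is a sketch under the same setup as the example above; the class name, the method name, and the max parameter are hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Random;
import java.util.Set;

public class InternMemoryDemo {
    // Fills an array like the example above and counts distinct String objects by identity.
    static int distinctObjects(int max, boolean intern) {
        Integer[] dbData = new Integer[10];
        Random random = new Random(10 * 10000);
        for (int i = 0; i < dbData.length; i++) {
            dbData[i] = random.nextInt();
        }
        String[] arr = new String[max];
        for (int i = 0; i < max; i++) {
            String s = new String(String.valueOf(dbData[i % dbData.length]));
            arr[i] = intern ? s.intern() : s; // interned: keep the shared pooled copy
        }
        // Identity-based set: two equal but distinct objects count separately.
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        Collections.addAll(identities, arr);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(100_000, false)); // 100000
        System.out.println(distinctObjects(100_000, true));  // at most 10
    }
}
```

Without intern() every array slot keeps its own object; with it, the array retains at most ten objects and the temporary copies become garbage immediately.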

 

