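When we talk about String, we mostly use it to transmit and store text. In the JDK source the class is declared final, and it is not one of Java's primitive types. A string can be declared directly as a double-quoted literal, e.g. String nameStr="Manna Yang";, or created through a constructor, e.g. String nameStr=new String("Manna Yang");. The rest of this post lifts the veil step by step...
Before exploring the String constant pool, let's first look at a small class and the bytecode that javap -v prints for its compiled class file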
public class TestString{
private String testStr="Manna Yang";
public static int TYPE=0;
public static void main(String[] args){
System.out.println("Manna Yang");
}
}
Classfile /C:/Users/15971/Desktop/TestString.class
Last modified 2019-9-18; size 566 bytes
MD5 checksum 72f3c93ff8293c97a3da06775fa48ba0
Compiled from "TestString.java"
public class TestString
minor version: 0
major version: 52
flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
#1 = Methodref #8.#22 // java/lang/Object."<init>":()V
#2 = String #23 // Manna Yang
#3 = Fieldref #7.#24 // TestString.testStr:Ljava/lang/String;
#4 = Fieldref #25.#26 // java/lang/System.out:Ljava/io/PrintStream;
#5 = Methodref #27.#28 // java/io/PrintStream.println:(Ljava/lang/String;)V
#6 = Fieldref #7.#29 // TestString.TYPE:I
#7 = Class #30 // TestString
#8 = Class #31 // java/lang/Object
#9 = Utf8 testStr
#10 = Utf8 Ljava/lang/String;
#11 = Utf8 TYPE
#12 = Utf8 I
#13 = Utf8 <init>
#14 = Utf8 ()V
#15 = Utf8 Code
#16 = Utf8 LineNumberTable
#17 = Utf8 main
#18 = Utf8 ([Ljava/lang/String;)V
#19 = Utf8 <clinit>
#20 = Utf8 SourceFile
#21 = Utf8 TestString.java
#22 = NameAndType #13:#14 // "<init>":()V
#23 = Utf8 Manna Yang
#24 = NameAndType #9:#10 // testStr:Ljava/lang/String;
#25 = Class #32 // java/lang/System
#26 = NameAndType #33:#34 // out:Ljava/io/PrintStream;
#27 = Class #35 // java/io/PrintStream
#28 = NameAndType #36:#37 // println:(Ljava/lang/String;)V
#29 = NameAndType #11:#12 // TYPE:I
#30 = Utf8 TestString
#31 = Utf8 java/lang/Object
#32 = Utf8 java/lang/System
#33 = Utf8 out
#34 = Utf8 Ljava/io/PrintStream;
#35 = Utf8 java/io/PrintStream
#36 = Utf8 println
#37 = Utf8 (Ljava/lang/String;)V
{
public static int TYPE;
descriptor: I
flags: ACC_PUBLIC, ACC_STATIC
public TestString();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: aload_0
5: ldc #2 // String Manna Yang
7: putfield #3 // Field testStr:Ljava/lang/String;
10: return
LineNumberTable:
line 1: 0
line 2: 4
public static void main(java.lang.String[]);
descriptor: ([Ljava/lang/String;)V
flags: ACC_PUBLIC, ACC_STATIC
Code:
stack=2, locals=1, args_size=1
0: getstatic #4 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #2 // String Manna Yang
5: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
LineNumberTable:
line 6: 0
line 7: 8
static {};
descriptor: ()V
flags: ACC_STATIC
Code:
stack=1, locals=0, args_size=0
0: iconst_0
1: putstatic #6 // Field TYPE:I
4: return
LineNumberTable:
line 3: 0
}
SourceFile: "TestString.java"
Descriptor character | Type |
---|---|
B | byte |
C | char |
D | double |
F | float |
I | int |
J | long |
L | reference type (written as L, the class name, then ;) |
S | short |
Z | boolean |
[ | array type (one [ per dimension) |
V | void |
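To make the table concrete, here is a minimal sketch that asks the JDK to build method descriptors from Class objects, using java.lang.invoke.MethodType. The class name DescriptorDemo and the helper descriptor are made up for this illustration; the printed strings can be compared with the descriptor lines in the javap output above.

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    // Hypothetical helper: returns the JVM method descriptor for the given
    // return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... paramTypes) {
        return MethodType.methodType(returnType, paramTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // No-arg constructor, as in TestString."<init>"
        System.out.println(descriptor(void.class));                           // ()V
        // main(String[] args): '[' marks the array, 'L...;' the reference type
        System.out.println(descriptor(void.class, String[].class));           // ([Ljava/lang/String;)V
        // println(String)
        System.out.println(descriptor(void.class, String.class));             // (Ljava/lang/String;)V
        // int f(long, boolean): J is long, Z is boolean, I is int
        System.out.println(descriptor(int.class, long.class, boolean.class)); // (JZ)I
    }
}
```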
public TestString();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: aload_0
5: ldc #2 // String Manna Yang
7: putfield #3 // Field testStr:Ljava/lang/String;
10: return
LineNumberTable:
line 1: 0
line 2: 4
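descriptor : the method's descriptor (parameter types and return type)
flags : the access modifiers
stack : the maximum depth of the operand stack
locals : the number of local variable slots
args_size : the number of method arguments (for an instance method this includes the implicit this, which is why the no-arg constructor shows 1)
load instructions : common ones are aload, iload, fload and dload. Here aload_0 pushes local variable 0 (this, in an instance method) onto the operand stack. The prefix a means a reference type, while i, d and f stand for int, double and float; the mnemonics generally follow the pattern type + action
const instructions : common ones are iconst, lconst, fconst and dconst. They push a small constant onto the operand stack, and the number after the underscore is the constant value itself, not a variable index. For example, TYPE=0 above compiles to iconst_0 in the static initializer, and an instance field declared as int testType=2; would produce iconst_2 in the constructor
ldc : pushes an int, float or String constant from the constant pool onto the operand stack
putfield : assigns a value to an instance field; its counterpart is getfield
return : returns from a void method; ireturn and freturn return an int or a float
invokespecial : used for instance initialization methods (<init>) and superclass method calls; here it calls the no-arg constructor of the superclass Object
putstatic : assigns a value to a static field; its counterpart is getstatic
1. With the bytecode structure above in mind, let's look at the commonly used string comparison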
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String) anObject;
int n = length();
if (n == anotherString.length()) {
int i = 0;
while (n-- != 0) {
if (charAt(i) != anotherString.charAt(i))
return false;
i++;
}
return true;
}
}
return false;
}
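equals first checks whether the two references point to the same object (this == anObject). If they do not, it checks that the argument is a String of the same length and then compares the characters one by one with charAt(). A few common comparison scenarios follow to make this concrete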
String testStr1="Manna Yang";
String testStr2=new String("Manna Yang");
String testStr3="Manna Yang";
System.out.println(testStr1 == testStr2); //false
System.out.println(testStr1.equals(testStr2)); //true
System.out.println(testStr1 == testStr3); //true
System.out.println(testStr1.equals(testStr3)); //true
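Following the equals method in the JDK, testStr1 and testStr2 are different objects, so == is false and equals falls through to the character-by-character comparison with charAt.
An object created with new lives on the heap, while a double-quoted literal lives in the constant pool: testStr2 refers to a new heap object that copies the pooled "Manna Yang" literal, whereas testStr1 and testStr3 both refer to the pooled instance itself
Next, the charm of the + operator
String testStr0 = new String("Test")+new String("Manna")+new String("Yang"); compiles to the following bytecode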
0: new #2 // class java/lang/StringBuilder
3: dup
4: invokespecial #3 // Method java/lang/StringBuilder."<init>":()V
7: new #4 // class java/lang/String
10: dup
11: ldc #5 // String Test
13: invokespecial #6 // Method java/lang/String."<init>":(Ljava/lang/String;)V
16: invokevirtual #7 //Method java/lang/StringBuilder.append:
(Ljava/lang/String;)Ljava/lang/StringBuilder;
19: new #4 // class java/lang/String
22: dup
23: ldc #8 // String Manna
25: invokespecial #6 // Method java/lang/String."<init>":(Ljava/lang/String;)V
28: invokevirtual #7 //Method java/lang/StringBuilder.append:
(Ljava/lang/String;)Ljava/lang/StringBuilder;
31: new #4 // class java/lang/String
34: dup
35: ldc #9 // String Yang
37: invokespecial #6 // Method java/lang/String."<init>":(Ljava/lang/String;)V
40: invokevirtual #7 //Method java/lang/StringBuilder.append:
(Ljava/lang/String;)Ljava/lang/StringBuilder;
43: invokevirtual #10 //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
46: astore_1
47: return
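The bytecode shows that the + operator also creates one StringBuilder: it is constructed through invokespecial on StringBuilder."<init>", append is called once per operand, and toString() is called at the end. In total there are 4 new instructions and 3 ldc instructions. (This is what a Java 8 compiler emits; since JDK 9, javac by default compiles string concatenation to an invokedynamic call instead.) The + operator is also convenient because javac folds compile-time constants: for String testStr1="Manna"+" Yang"; the class file already contains the single constant "Manna Yang". The same holds for new String(""+""): the literals are joined at compile time, so only one String object is created with new;
2. Next, hashCode. A hash value is mainly used in hash-based storage structures to locate the bucket an object belongs to, as in the commonly used HashMap and Hashtable. If two objects are equal according to equals, their hash values must be equal; conversely, two objects with the same hash value are not necessarily equal. When computing hashes we want as few collisions as possible so that lookups stay fast. Here is the source of String's hashCode() method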
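The difference between compile-time folding and run-time concatenation can be observed with ==. The following is a minimal sketch; the class and method names are invented for this illustration. It relies on the language rule that a constant expression (literals, or final variables initialized with literals) is folded by the compiler and shares the pooled instance, while a concatenation involving a non-final variable produces a new object at run time.

```java
public class ConcatDemo {
    // Two literals: folded by javac into the single constant "Manna Yang".
    static boolean literalFolded() {
        String folded = "Manna" + " Yang";
        return folded == "Manna Yang";
    }

    // 'first' is not final, so the concatenation happens at run time
    // and yields a new String object distinct from the pooled literal.
    static boolean runtimeConcat() {
        String first = "Manna";
        String joined = first + " Yang";
        return joined == "Manna Yang";
    }

    // A final variable initialized with a literal is a constant variable,
    // so the expression is folded at compile time again.
    static boolean finalFolded() {
        final String first = "Manna";
        String joined = first + " Yang";
        return joined == "Manna Yang";
    }

    public static void main(String[] args) {
        System.out.println(literalFolded()); // true
        System.out.println(runtimeConcat()); // false
        System.out.println(finalFolded());   // true
    }
}
```

In all three cases equals would return true; only reference identity differs.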
public int hashCode() {
int h = hash;
final int len = length();
if (h == 0 && len > 0) {
for (int i = 0; i < len; i++) {
h = 31 * h + charAt(i);
}
hash = h;
}
return h;
}
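As I understand it, the coefficient 31 is chosen mainly so that hashes are spread more evenly and collisions are less likely; it is an odd prime, and 31 * h can be computed as (h << 5) - h. The source Javadoc also gives the formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], where s[i] is the i-th character and n is the length of the string. The value returned by charAt is the character's UTF-16 code unit, which equals the ASCII value for ASCII characters; the NUL character has the value 0, and the empty string hashes to 0;
3. About intern() in the String class: the source carries a detailed comment, quoted here from JDK 1.8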
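As a quick check of the formula, here is a small sketch that recomputes the polynomial by hand and compares it with String.hashCode(). The class name HashDemo and the helper polynomialHash are made up for this illustration. It also shows a well-known collision: "Aa" and "BB" are different strings with the same hash, since 65*31+97 and 66*31+66 both equal 2112.

```java
public class HashDemo {
    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1]
    // with the same loop shape as String.hashCode(), relying on int overflow.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String name = "Manna Yang";
        // The hand-rolled value matches the JDK's.
        System.out.println(polynomialHash(name) == name.hashCode()); // true

        // Equal hashes do not imply equal strings.
        System.out.println("Aa".hashCode());   // 2112
        System.out.println("BB".hashCode());   // 2112
        System.out.println("Aa".equals("BB")); // false

        // 31 * h is the same as a shift and a subtraction.
        int h = 12345;
        System.out.println(31 * h == (h << 5) - h); // true
    }
}
```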
When the intern method is invoked, if the pool already contains a
* string equal to this {@code String} object as determined by
* the {@link #equals(Object)} method, then the string from the pool is
* returned. Otherwise, this {@code String} object is added to the
* pool and a reference to this {@code String} object is returned.
// method declaration in the source
public native String intern();
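On concatenation efficiency: when joining literals, simply write ""+""+"", since the compiler folds them into a single constant. When joining String objects at run time, use StringBuffer if the buffer is shared between threads, because its public methods are synchronized; otherwise use StringBuilder. Both start with a default capacity of 16 characters (or str.length()+16 when constructed from a String), and in JDK 1.8 they grow to twice the old capacity plus 2 when they run out of room. Both delegate to the constructor of their abstract superclass AbstractStringBuilder, whose (decompiled) source is as follows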
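The behavior described in that comment can be observed with ==. This is a minimal sketch with invented class and method names: a string created with new is a separate heap object, but intern() returns the pooled instance that the literal refers to.

```java
public class InternDemo {
    // A String created with new is not the pooled literal itself.
    static boolean heapIsPooled() {
        String literal = "Manna Yang";
        String heap = new String("Manna Yang");
        return heap == literal;
    }

    // intern() returns the pooled instance that equals this string.
    static boolean internedIsPooled() {
        String literal = "Manna Yang";
        String heap = new String("Manna Yang");
        return heap.intern() == literal;
    }

    // The same holds for a string assembled at run time.
    static boolean builtThenInterned() {
        String built = new StringBuilder("Manna").append(" Yang").toString();
        return built.intern() == "Manna Yang";
    }

    public static void main(String[] args) {
        System.out.println(heapIsPooled());      // false
        System.out.println(internedIsPooled());  // true
        System.out.println(builtThenInterned()); // true
    }
}
```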
AbstractStringBuilder(int var1) {
this.value = new char[var1];
}
...
Every expansion allocates a new array and then copies the old contents into it, so it is better to pass the expected final length to the constructor up front
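The effect of pre-sizing can be seen through StringBuilder.capacity(). Below is a minimal sketch; the class name CapacityDemo and the helper capacityAfterAppend are invented for this illustration. The initial capacities (16, and length + 16 when constructed from a String) are documented by the JDK; the exact capacity after an automatic expansion is an implementation detail, so the sketch prints it without assuming a value.

```java
public class CapacityDemo {
    // Hypothetical helper: creates a builder with the given initial capacity,
    // appends the text, and reports the capacity afterwards.
    static int capacityAfterAppend(int initialCapacity, String text) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        sb.append(text);
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Documented defaults.
        System.out.println(new StringBuilder().capacity());        // 16
        System.out.println(new StringBuilder("Manna").capacity()); // 21

        // Pre-sized: the text fits, so no new array is allocated.
        System.out.println(capacityAfterAppend(64, "Manna Yang")); // 64

        // 17 characters do not fit into 16, so the builder must allocate
        // a larger array and copy; the new capacity is at least 17.
        System.out.println(capacityAfterAppend(16, "12345678901234567"));
    }
}
```

When the final length is roughly known, passing it to the constructor avoids the repeated allocate-and-copy steps.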