String Source Code Reading Notes

Date: 2019-11-09

Tags: string, source code reading notes

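Reading the String Source Code

These are my personal study notes, based on reading the source code and other blog posts. My knowledge is limited, so corrections are welcome.

For more detail, see:

Java 7 Source Code Study Series (1): String
Please stop asking "how many String instances does String s = new String("xyz"); create" in interviews
Memory leaks caused by the substring method in Java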

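I. Overview

String is one of the most fundamental and important classes in Java. It is a typical immutable class. (An object is considered immutable if its state cannot change after it has been constructed.) String is declared as a final class, and the field that holds its content is final as well. Being a final class also means that String cannot be subclassed.

The Java language provides special support for the string concatenation operator (+). The + operator can also convert other types to strings; for objects this is done through the toString() method. Because String is immutable, operations such as concatenation and slicing always produce new String objects.

Java also provides the StringBuffer and StringBuilder classes, which handle repeated concatenation without creating a new String object at every step.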
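To make the point about immutability concrete, here is a minimal sketch. The class name ImmutableStringDemo and the helper joinWithBuilder are made up for illustration; only standard String and StringBuilder calls are used.

```java
// Illustrative sketch; class and method names are hypothetical.
class ImmutableStringDemo {

    // Appends into one StringBuilder instead of creating a new String per step.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "Hello";
        String joined = original.concat(" World"); // returns a new String
        String upper = original.toUpperCase();     // returns a new String

        System.out.println(original);           // Hello (unchanged)
        System.out.println(joined);             // Hello World
        System.out.println(upper);              // HELLO
        System.out.println(joinWithBuilder(3)); // 0,1,2,
    }
}
```

The original string is never modified; every operation hands back a different object.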

II. String source code

1. Definition

Opening java.lang.String, you can see that the class is declared as follows:

public final class String implements java.io.Serializable, Comparable<String>, CharSequence

String is declared as a final class and implements the Serializable, Comparable and CharSequence interfaces. The CharSequence interface provides methods such as length() and charAt().

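2. Fields
```
private final char value[];   // JDK 8 and earlier
private final byte value[];   // JDK 9 and later
```
The value[] array stores the content of the String. It is a char array declared final; from JDK 9 on, value[] is declared as a byte array instead. Because the field is final (and the array is private and not modified after construction), a String cannot be changed once it has been initialized.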
```
private int hash;
```
hash caches the string's hashCode value. Its default is 0, and it is filled in the first time hashCode() is called.
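As a small illustration of the value being cached, the sketch below recomputes the hash with the formula documented for String.hashCode(), s[0]*31^(n-1) + ... + s[n-1], using int arithmetic. The class HashCodeDemo and the method manualHash are hypothetical names, not part of the JDK.

```java
// Illustrative sketch; names are hypothetical.
class HashCodeDemo {

    // Recomputes the documented String hash: h = 31 * h + s[i], with int overflow.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(manualHash(s) == s.hashCode()); // true
        System.out.println("".hashCode());                 // 0
    }
}
```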
```
private static final long serialVersionUID = -6849794470754667710L;
private static final ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0];
```
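String implements the Serializable interface, so it supports serialization and deserialization.

Java's serialization mechanism verifies version consistency at run time by checking the class's serialVersionUID. During deserialization, the JVM compares the serialVersionUID in the incoming byte stream with that of the corresponding local class. If they match, the versions are considered consistent and deserialization proceeds; otherwise a version-mismatch exception (InvalidClassException) is thrown.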
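A minimal sketch of a serialization round trip for a String, using only the standard java.io object streams. The class StringSerializationDemo and the method roundTrip are illustrative names; checked exceptions are wrapped to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustrative sketch; names are hypothetical.
class StringSerializationDemo {

    // Serializes the string into a byte buffer and reads it back.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (String) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello").equals("Hello")); // true
    }
}
```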

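JDK 9 added a coder field:
```
private final byte coder;
```
This field identifies the encoding used for the bytes in value, either LATIN1 or UTF16. According to the JDK source comments, the field is trusted by the VM, and it must not be overwritten after construction.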

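3. Constructors

The String class has many constructors (13, not counting the deprecated ones). Here are some of the commonly used ones.
- String() -- no-argument constructor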
```
public String() {       
    this.value = "".value;
}
```

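As shown, calling the no-argument constructor creates an empty string object that reuses the value array of the "" literal.

- String(String original) -- creates a String from another String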

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

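Unlike creating a string directly with a "" double-quoted literal, every object created with new String("") is a new object stored on the heap, while a string created with a double-quoted literal is obtained from the constant pool. That explains the results in the following code: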

public class TestStringCons{
        public static void main(String[] args){
            String abc = "abc";
            String abc2 = new String("abc");
            String abc3 = new String("abc");
            String abc4 = "abc";
            System.out.println(abc == abc2); // false
            System.out.println(abc2 == abc3); // false
            System.out.println(abc == abc4); // true
        }
    }

- String(char value[]), String(char value[], int offset, int count) -- creates a String from a char array

public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }      
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

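When a char array is passed in, Arrays.copyOf and Arrays.copyOfRange are used. These methods copy the contents of the source char array, element by element, into a new array that becomes the String's own value array, so later changes to the caller's array do not affect the String.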
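The effect of this defensive copy can be observed from the outside. The following is a small sketch (CharArrayCopyDemo and buildThenMutate are hypothetical names) that builds two strings from a char array and then changes the array.

```java
// Illustrative sketch; names are hypothetical.
class CharArrayCopyDemo {

    // Builds two Strings from the array, then mutates the caller's array.
    static String[] buildThenMutate(char[] chars) {
        String whole = new String(chars);
        String part = new String(chars, 1, 2); // offset 1, count 2
        chars[1] = 'X';                        // change the source array afterwards
        return new String[] {whole, part};
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c', 'd'};
        String[] result = buildThenMutate(chars);
        System.out.println(result[0]);         // abcd (not affected by the change)
        System.out.println(result[1]);         // bc
        System.out.println(new String(chars)); // aXcd
    }
}
```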

- String(byte bytes[], int offset, int length, Charset charset) -- creates a String from a byte array

public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }

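In Java (up to JDK 8), a String instance holds a char[] array whose elements are stored as Unicode code units. String and char are the in-memory form, while byte is the serialized form used for network transfer or storage. In many transfer and storage scenarios, byte[] arrays and Strings therefore have to be converted into each other, and String provides a series of overloaded constructors that turn a byte array into a String. Whenever byte[] and String are converted, encoding has to be considered: the given byte array is decoded with charset into a Unicode char[] array, from which the new String is constructed.

The bytes here were encoded with some charset. To convert them into a Unicode char[] array without garbling, the charset to decode with must be specified.

If we construct a String from byte[] with one of the constructors that take a charsetName or charset parameter, StringCoding.decode is used for decoding, with the charset we specified. If we do not specify a charset, StringCoding.decode first tries the platform default charset; if that charset is not supported, it falls back to ISO-8859-1 for decoding. The main code is as follows: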

static char[] decode(byte[] ba, int off, int len) {
        String csn = Charset.defaultCharset().name();
        try {
            // use charset name decode() variant which provides caching.
            return decode(csn, ba, off, len);
        } catch (UnsupportedEncodingException x) {
            warnUnsupportedCharset(csn);
        }
        try {
            return decode("ISO-8859-1", ba, off, len);
        } catch (UnsupportedEncodingException x) {
            // If this code is hit during VM initialization, MessageUtils is
            // the only way we will be able to get any kind of error message.
            MessageUtils.err("ISO-8859-1 charset not available: "
                             + x.toString());
            // If we can not find ISO-8859-1 (a required encoding) then things
            // are seriously wrong with the installation.
            System.exit(1);
            return null;
        }
    }
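To see why the charset matters, here is a minimal sketch that decodes the same two bytes with two different charsets through the public constructor discussed above. CharsetDecodeDemo and decode are illustrative names; "\u00e9" is the letter e with an acute accent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch; names are hypothetical.
class CharsetDecodeDemo {

    // Decodes the whole byte array with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, 0, bytes.length, charset);
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8); // 0xC3 0xA9

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(utf8Bytes.length);       // 2
        System.out.println(right.equals("\u00e9")); // true
        System.out.println(wrong.length());         // 2 (two garbled characters)
        System.out.println(Charset.defaultCharset().name()); // platform dependent
    }
}
```

Passing the charset explicitly avoids depending on the platform default.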

In JDK 9, this constructor and the StringCoding.decode method changed somewhat:

public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBoundsOffCount(offset, length, bytes.length);
        StringCoding.Result ret =
            StringCoding.decode(charset, bytes, offset, length);
        this.value = ret.value;
        this.coder = ret.coder;
    }

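In JDK 9, StringCoding.decode no longer returns a char[] array. It returns an instance of Result, a static nested class of StringCoding, and the value and coder of that Result are then assigned to the String.

- String(StringBuffer buffer), String(StringBuilder builder) -- creates a String from a StringBuffer or StringBuilder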

public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }
    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

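- A special package-private constructor

String also provides a constructor declared without an access modifier, which makes it package-private (not protected):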

String(char[] value, boolean share) {
        // assert share : "unshared not supported";
        this.value = value;
    }

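This constructor differs from String(char[] value) in two ways. First, it has an extra boolean parameter, share. The method body never uses this parameter; **the boolean share parameter exists only to distinguish this constructor from String(char[] value).** Second, this constructor assigns the passed-in char array directly to value, whereas String(char[] value) copies the array with Arrays.copyOf().

Advantages of this constructor: good performance, because the array does not have to be copied; and lower memory use, because the same array is shared and no new array has to be allocated.

The constructor is package-private for a reason: if it were public, a caller could keep a reference to the array it passed in, modify it later, and thereby break String's immutability. Since only code inside java.lang can call it, this constructor is safe as well.

Some String methods also use this "fast, memory-saving and safe" constructor, for example replace, concat and valueOf(). The JDK 6 substring method relied on a similar array-sharing constructor, String(int offset, int count, char value[]), which is shown in the next section.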

4. substring

The substring method extracts a substring of a string. In JDK 6, however, substring could cause a memory leak. First, look at the JDK 6 source of substring:

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value); // shares the parent string's char array (value)
    }

    // No new array is created; the original string's char array is reused
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

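Because the returned substring shares the char array of its parent string, a memory leak can result:

    String str = "abcdefghijklmnopqrst";
    String sub = str.substring(1, 3);
    str = null;

In the code above, even though str = null, sub still references the char array used by the string that str pointed to. The whole array behind "abcdefghijklmnopqrst" therefore cannot be reclaimed even though sub only needs two characters, which can lead to a memory leak.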

To fix this problem, the substring method was changed in JDK 7. Here is the JDK 7 source of substring:

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }


public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

public static char[] copyOfRange(char[] original, int from, int to) {
    int newLength = to - from;
    if (newLength < 0)
        throw new IllegalArgumentException(from + " > " + to);
    char[] copy = new char[newLength];   // a new char array is created
    System.arraycopy(original, from, copy, 0,
                     Math.min(original.length - from, newLength));
    return copy;
}

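As you can see, a new char array is created to hold the characters of the substring. The substring and its parent string are no longer tied to each other, so once the references to the parent string are gone, the GC can reclaim the memory it occupied in due course.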
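The observable behaviour of substring can be checked with a short sketch. SubstringDemo and slice are hypothetical names; the identity check reflects the `? this` branch in the source shown above and is an implementation detail rather than a documented guarantee.

```java
// Illustrative sketch; names are hypothetical.
class SubstringDemo {

    // Thin wrapper around String.substring(begin, end).
    static String slice(String s, int begin, int end) {
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String parent = "abcdefghijklmnopqrst";
        String sub = slice(parent, 1, 3);
        System.out.println(sub); // bc

        // Full-range substring: the source above returns `this` in this case.
        String whole = slice(parent, 0, parent.length());
        System.out.println(whole == parent);
    }
}
```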

5. String's overloading of '+'

Java does not support operator overloading; the '+' on String is the only overloaded operator in Java. Consider the following code:

public class TestA{
  public static void main(String[] args){
    String str1 = "Hello";
    String str2 = str1 + "World";
  }
}

Decompiling the code above gives:

public class TestA{
    public static void main(final String[] array) {
        new StringBuilder().append("Hello").append("World").toString();
    }
}

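As you can see, the overloaded '+' on String is implemented with StringBuilder and its toString() method. (This is what javac generates up to JDK 8; from JDK 9 on, it emits by default an invokedynamic call to StringConcatFactory instead.)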
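The equivalence can be sketched in source form as follows. ConcatDemo, plus and viaBuilder are made-up names; viaBuilder spells out by hand what the decompiled code above does.

```java
// Illustrative sketch; names are hypothetical.
class ConcatDemo {

    // Concatenation with '+', evaluated at run time because a and b are variables.
    static String plus(String a, String b) {
        return a + b;
    }

    // The hand-written StringBuilder equivalent of the decompiled code.
    static String viaBuilder(String a, String b) {
        return new StringBuilder().append(a).append(b).toString();
    }

    public static void main(String[] args) {
        String x = plus("Hello", "World");
        String y = viaBuilder("Hello", "World");
        System.out.println(x);           // HelloWorld
        System.out.println(x.equals(y)); // true
        System.out.println(x == y);      // false (two distinct objects)
    }
}
```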

6. The difference between String.valueOf() and Integer.toString()

1.int i = 5;
    2.String i1 = "" + i;
    3.String i2 = String.valueOf(i);
    4.String i3 = Integer.toString(i);

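Lines 3 and 4 do not differ, because String.valueOf(i) is itself implemented by calling Integer.toString().

Line 2 is really String i1 = (new StringBuilder()).append(i).toString(). It first creates a StringBuilder object, appends i to it, and then calls toString().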
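All three conversions yield strings with the same content, as this small sketch shows. IntToStringDemo and its three helper methods are hypothetical names wrapping the expressions from the listing above.

```java
// Illustrative sketch; names are hypothetical.
class IntToStringDemo {

    static String viaConcat(int i) {
        return "" + i;               // line 2 of the listing above
    }

    static String viaValueOf(int i) {
        return String.valueOf(i);    // line 3
    }

    static String viaToString(int i) {
        return Integer.toString(i);  // line 4
    }

    public static void main(String[] args) {
        int i = 5;
        System.out.println(viaConcat(i));                        // 5
        System.out.println(viaValueOf(i).equals(viaToString(i))); // true
    }
}
```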

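7. The intern() method

The intern() method does two things:

First, if the string pool does not yet contain a string equal to this one, the string is added to the pool.
Second, it returns the reference to the pooled string.

Consider the following code first: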

String str1 = "Hello";
    String str2 = new String("Hello");
    String str3 = new String("Hello").intern();
    System.out.println(str1 == str2); // false
    System.out.println(str1 == str3); // true

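First, a few key terms need to be understood:

Runtime constant pool. The JVM has several kinds of constant pools:
- The constant pool in the class file
  
  It mainly holds literals and symbolic references. After the class is loaded, this content is stored in the runtime constant pool in the method area.
- The runtime constant pool in the method area
  
  Besides holding the contents of the class file's constant pool, the runtime constant pool differs from it in being dynamic: new constants can also be added to the pool at run time.

To reduce the number of string objects created in the JVM, the String class maintains a string pool. The class-file constant pool, for its part, mainly stores the literals and symbolic references generated at compile time.

Literals

For example text strings and constant values declared final.

Symbolic references

1. Fully qualified names of classes and interfaces; 2. field names and descriptors; 3. method names and descriptors.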

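To explain the result of the code above, first look at how new String("Hello") creates its object.

At compile time, the literal Hello is added to the constant pool of the class file. After the class is loaded (for the exact moment, see: "When does the "literal" in new String("literal") enter the string constant pool in Java?"), the literal enters the string pool. Not every literal adds a new entry, though: if an equal string already exists in the pool, it is not added again.

At run time, when new String("Hello") is executed, a new String object is created on the Java heap. The literal that this object corresponds to is kept in the string pool, but the variable is assigned the address of the newly created heap object. So the following holds: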

String s1 = new String("Hello");
String s2 = new String("Hello");
System.out.println(s1 == s2); // false

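Because s1 and s2 refer to two different objects on the heap, s1 == s2 is false. The memory layout is roughly as shown in the figure (a rough sketch):

The relationship between the Java heap and the string pool differs between JDK versions; for ease of explanation, they are drawn here as two separate physical regions.

How many objects does new String("Hello") create?

So it is clear that executing new String("Hello") involves two objects in total: one is the heap object that s1 refers to, the other is the object in the string pool (created only if the pool does not already contain "Hello").

The JVM does not require that constants be put into the pool only at compile time; they can also be added at run time, and String's intern() method makes use of exactly this.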

Now back to the code from the beginning:

String str1 = "Hello";
String str2 = new String("Hello");
String str3 = new String("Hello").intern();
System.out.println(str1 == str2); // false
System.out.println(str1 == str3); // true

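At this point the memory layout should look like this:

Consider new String("Hello"): if intern() were not called afterwards, str2 and str3 would both refer to objects in the heap, that is, the green region in the figure. Since these are two different regions with different addresses, str1 == str2 is false, and str2 == str3 would be false as well.

But here str3 is assigned new String("Hello").intern(). intern() returns the reference from the string pool, which is what str3 receives (because of the assignment). Since str1 has already caused the "Hello" literal to be created in the pool, the reference that intern() returns to str3 is the same one str1 holds, so str1 == str3 is true.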

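When creating a new string object, we usually use one of the following two approaches:

One is a double-quoted literal: String str = "Hello";
The other is the constructor: String str = new String("Hello");

With either approach, creating the string first checks whether the literal is already in the string pool, and puts it there if it is not. Does that make intern() useless?

Using the intern() method

As mentioned in the section on '+', concatenating strings with '+' really creates a StringBuilder object and then calls toString(). But when two string variables are concatenated, the new string produced by the concatenation is not in the string pool.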

String s1 = "Hello";
String s2 = "World";
String s3 = s1 + s2;
String s4 = "Hello" + "World";

After decompiling:

String s1 = "Hello";
String s2 = "World";
String s3 = new StringBuilder().append(s1).append(s2).toString();
String s4 = "HelloWorld";

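The reason is that the pool holds literal values that are already determined. In other words, when a concatenation consists purely of literals, the result of the concatenation is stored in the string pool as a constant.

If any operand of the concatenation is a variable rather than a literal, the whole operation is compiled into StringBuilder.append calls. In that case the compiler cannot know the resulting value; it can only be determined at run time.

So for strings that can only be determined at run time, intern() can be used to put them into the string pool and reduce the creation of duplicate strings.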
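A minimal sketch of this difference, assuming the concatenation really happens at run time (the helper concat takes variables, so the compiler cannot fold it). InternDemo and concat are illustrative names.

```java
// Illustrative sketch; names are hypothetical.
class InternDemo {

    // a and b are variables, so this concatenation is evaluated at run time.
    static String concat(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        String literal = "HelloWorld";              // comes from the string pool
        String folded = "Hello" + "World";          // folded at compile time
        String runtime = concat("Hello", "World");  // new object on the heap

        System.out.println(folded == literal);           // true
        System.out.println(runtime == literal);          // false
        System.out.println(runtime.equals(literal));     // true
        System.out.println(runtime.intern() == literal); // true
    }
}
```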

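As mentioned earlier, new String("Hello") also results in Hello being in the string pool, so isn't new String("Hello").intern() redundant? Not really. intern() does two things: it puts the string into the pool, and it returns the pooled reference.

That is, the method has a return value, and what it returns is the reference to the pooled string. Suppose the following code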

String str1 = "Hello";
String str2 = new String("Hello");
String str3 = new String("Hello").intern();
System.out.println(str1 == str2); // false
System.out.println(str1 == str3); // true

is changed to:

    String str1 = "Hello";
    String str2 = new String("Hello");
    String str3 = new String("Hello");
    // receive the returned reference in a new variable
    String str4 = str3.intern();
    System.out.println(str1 == str2); // false
    System.out.println(str1 == str3); // false (true no longer holds here)
    System.out.println(str1 == str4); // true

Writing it this way is not useful in practice, but it helps a great deal in understanding the intern() method and the string pool.