StringBuilder as Seen from the JDK Source

Date: 2019-11-17

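Overview

When working with strings in Java, the String class is the usual choice. A String object's value is in fact a constant: once created, it cannot be changed. Because it is immutable, it cannot be modified in place, and every change has to produce a new String object.

For this reason Java provides StringBuilder, a mutable string class. It is not thread-safe and is meant for single-threaded use only.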
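
As a small illustration of the difference, the sketch below contrasts an immutable String with a mutable StringBuilder. The class name ImmutableVsBuilder and the helper join are made up for this example and are not part of the JDK.

```java
class ImmutableVsBuilder {
    // Hypothetical helper: joins the parts through one mutable buffer
    // instead of creating an intermediate String per concatenation.
    static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "abc";
        String t = s.concat("def"); // returns a new String; s is untouched
        System.out.println(s);      // abc
        System.out.println(t);      // abcdef
        System.out.println(join(new String[] {"a", "b", "c"})); // abc
    }
}
```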

Inheritance hierarchy

--java.lang.Object
  --java.lang.AbstractStringBuilder
    --java.lang.StringBuilder

Class definition

public final class StringBuilder extends AbstractStringBuilder implements java.io.Serializable, CharSequence

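StringBuilder is declared final, so it cannot be subclassed. It extends AbstractStringBuilder and implements the Serializable and CharSequence interfaces.

The Serializable interface marks the class as serializable.

The CharSequence interface exposes information about a sequence of characters. It is defined as follows:

length() returns the length of the character sequence.
charAt(int index) returns the character at the given index.
subSequence(int start, int end) returns the subsequence in the given range.
toString() converts the sequence to a String object.
chars() returns a stream of the int values of the chars in the sequence; the interface provides a default implementation.
codePoints() returns a stream of the int code point values in the sequence; a default implementation is provided as well.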

public interface CharSequence {

    int length();

    char charAt(int index);

    CharSequence subSequence(int start, int end);

    public String toString();

    public default IntStream chars() {
        // implementation omitted
    }

    public default IntStream codePoints() {
        // implementation omitted
    }
}
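
Since StringBuilder implements CharSequence, all of these methods can be called on it through the interface type. The following is a minimal sketch; the class name CharSequenceDemo and the helper countCodePoints are illustrative only.

```java
class CharSequenceDemo {
    // Hypothetical helper: counts code points through the default codePoints() stream.
    static long countCodePoints(CharSequence seq) {
        return seq.codePoints().count();
    }

    public static void main(String[] args) {
        CharSequence seq = new StringBuilder("hello");
        System.out.println(seq.length());          // 5
        System.out.println(seq.charAt(1));         // e
        System.out.println(seq.subSequence(1, 3)); // el
        System.out.println(seq.chars().count());   // 5
        System.out.println(countCodePoints(seq));  // 5
    }
}
```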

Main fields

byte[] value;
byte coder;
int count;

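value is the array that stores the string's contents.
coder identifies the encoding used by the string object.
count is the number of characters currently in use.

Constructors

There are several constructors. A capacity can be passed in; if none is given, the constructor creates an object with a default capacity of 16. If COMPACT_STRINGS is true, that is, the compact layout is in use, the LATIN1 encoding (ISO-8859-1) is chosen and a byte array of length 16 is allocated. With the UTF16 encoding, a byte array of length 32 is allocated instead.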

public StringBuilder() {
        super(16);
    }
    
AbstractStringBuilder(int capacity) {
        if (COMPACT_STRINGS) {
            value = new byte[capacity];
            coder = LATIN1;
        } else {
            value = StringUTF16.newBytesFor(capacity);
            coder = UTF16;
        }
    }
    
public StringBuilder(int capacity) {
        super(capacity);
    }

If the constructor is given a String, it allocates a buffer with a capacity of str.length() + 16 characters and then adds the string to the byte array through the append method.

public StringBuilder(String str) {
        super(str.length() + 16);
        append(str);
    }

A CharSequence argument is handled in the same way.

public StringBuilder(CharSequence seq) {
        this(seq.length() + 16);
        append(seq);
    }
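
The resulting capacities can be observed through the public capacity() method. A small sketch follows; CapacityDemo and capacityFor are names invented for the example.

```java
class CapacityDemo {
    // Hypothetical helper: capacity of a builder created from a String.
    static int capacityFor(String initial) {
        return new StringBuilder(initial).capacity();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity());    // 16
        System.out.println(new StringBuilder(100).capacity()); // 100
        System.out.println(capacityFor("hello"));              // 21, i.e. 5 + 16
    }
}
```

Note that capacity() counts characters, not bytes, so the value is the same under both encodings.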

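Main methods

The append method

There are many append overloads that differ only in the parameter type. A few typical ones are examined below; the rest are handled similarly.

When a String is passed in, the parent class's append method is called to add the string to the StringBuilder's byte array, and this is returned. The logic of append is:

If the String is null, the four characters n, u, l, l are added to the StringBuilder's byte array.
ensureCapacityInternal makes sure there is enough room; if not, a new array is allocated.
putStringAt copies the byte array inside the String into the StringBuilder's byte array, using System.arraycopy.
count, the number of characters in use, is increased by the length of the copied string.
this is returned.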

public StringBuilder append(String str) {
        super.append(str);
        return this;
    }
    
public AbstractStringBuilder append(String str) {
        if (str == null) {
            return appendNull();
        }
        int len = str.length();
        ensureCapacityInternal(count + len);
        putStringAt(count, str);
        count += len;
        return this;
    }
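
The null handling is easy to observe from the outside. A minimal sketch, with AppendNullDemo and appendTwice as made-up names:

```java
class AppendNullDemo {
    // Hypothetical helper: appends two Strings (either may be null) to a fresh builder.
    static String appendTwice(String first, String second) {
        return new StringBuilder().append(first).append(second).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendTwice("a", "b"));           // ab
        System.out.println(appendTwice("a", null));          // anull
        System.out.println(appendTwice("a", null).length()); // 5
    }
}
```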


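The logic of ensureCapacityInternal:

First, the current capacity is obtained.
If the required capacity is larger than the current capacity, the array is enlarged and the old contents are copied over.
newCapacity determines the new capacity: the current capacity is doubled and 2 is added, and if that is still not enough, the required capacity is used directly. If the result overflows to a non-positive value or exceeds the safe bound MAX_ARRAY_SIZE >> coder (MAX_ARRAY_SIZE equals Integer.MAX_VALUE - 8), the decision is handed to hugeCapacity.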

private void ensureCapacityInternal(int minimumCapacity) {
        int oldCapacity = value.length >> coder;
        if (minimumCapacity - oldCapacity > 0) {
            value = Arrays.copyOf(value,
                    newCapacity(minimumCapacity) << coder);
        }
    }

private int newCapacity(int minCapacity) {
        int oldCapacity = value.length >> coder;
        int newCapacity = (oldCapacity << 1) + 2;
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        int SAFE_BOUND = MAX_ARRAY_SIZE >> coder;
        return (newCapacity <= 0 || SAFE_BOUND - newCapacity < 0)
            ? hugeCapacity(minCapacity)
            : newCapacity;
    }
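
The growth rule can be restated as a small pure function and compared with what the public ensureCapacity method does. This is only a sketch: GrowthDemo and grow are hypothetical names, and the overflow branch that leads to hugeCapacity is deliberately left out.

```java
class GrowthDemo {
    // Hypothetical restatement of the core rule: double plus 2, or the
    // requested minimum if that is larger. Overflow handling is omitted.
    static int grow(int oldCapacity, int minCapacity) {
        int newCapacity = (oldCapacity << 1) + 2;
        return Math.max(newCapacity, minCapacity);
    }

    public static void main(String[] args) {
        System.out.println(grow(16, 17));  // 34
        System.out.println(grow(16, 100)); // 100

        StringBuilder sb = new StringBuilder(); // capacity 16
        sb.ensureCapacity(17);                  // one more than currently available
        System.out.println(sb.capacity());      // 34
    }
}
```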

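The logic of putStringAt:

If the String's encoding differs from the StringBuilder's, inflate is called first to convert the StringBuilder to UTF16.
If the StringBuilder is not Latin1-encoded, inflate does nothing.
StringUTF16.newBytesFor allocates the larger array, since UTF16 takes twice the space of Latin1.
StringLatin1.inflate copies the existing value into the enlarged array.
str.getBytes copies the String's value into the StringBuilder.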

private final void putStringAt(int index, String str) {
        if (getCoder() != str.coder()) {
            inflate();
        }
        str.getBytes(value, index, coder);
    }

private void inflate() {
        if (!isLatin1()) {
            return;
        }
        byte[] buf = StringUTF16.newBytesFor(value.length);
        StringLatin1.inflate(value, 0, buf, 0, count);
        this.value = buf;
        this.coder = UTF16;
    }
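
To make the inflate step concrete, here is a rough sketch of widening one-byte Latin1 characters into a two-bytes-per-character array. InflateSketch and its inflate method are hypothetical, and the big-endian byte order is an assumption of this example, not a statement about the JDK's internal layout.

```java
import java.nio.charset.StandardCharsets;

class InflateSketch {
    // Hypothetical helper: widens `count` Latin1 bytes into a new array
    // with two bytes per character (big-endian order is assumed here).
    static byte[] inflate(byte[] latin1, int count) {
        byte[] utf16 = new byte[latin1.length << 1]; // twice the space
        for (int i = 0; i < count; i++) {
            utf16[2 * i] = 0;             // high byte is zero for Latin1 values
            utf16[2 * i + 1] = latin1[i]; // low byte carries the Latin1 value
        }
        return utf16;
    }

    public static void main(String[] args) {
        byte[] latin1 = "abc".getBytes(StandardCharsets.ISO_8859_1);
        byte[] wide = inflate(latin1, latin1.length);
        System.out.println(wide.length);                                 // 6
        System.out.println(new String(wide, StandardCharsets.UTF_16BE)); // abc

        // From the outside, mixing encodings simply works:
        StringBuilder sb = new StringBuilder("abc").append("\u4e2d");
        System.out.println(sb.length()); // 4
    }
}
```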

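When a CharSequence is passed in, several cases are handled. If it is null, the characters n, u, l, l are appended. Otherwise, depending on whether the object is an instance of String or AbstractStringBuilder, the corresponding append method is called.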

public StringBuilder append(CharSequence s) {
        super.append(s);
        return this;
    }
    
public AbstractStringBuilder append(CharSequence s) {
        if (s == null) {
            return appendNull();
        }
        if (s instanceof String) {
            return this.append((String)s);
        }
        if (s instanceof AbstractStringBuilder) {
            return this.append((AbstractStringBuilder)s);
        }
        return this.append(s, 0, s.length());
    }
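
The sketch below feeds each kind of CharSequence through the same call. AppendCharSequenceDemo and appendAll are invented names; java.nio.CharBuffer is used only as an example of a CharSequence that is neither a String nor an AbstractStringBuilder.

```java
import java.nio.CharBuffer;

class AppendCharSequenceDemo {
    // Hypothetical helper: appends every part via append(CharSequence).
    static String appendAll(CharSequence... parts) {
        StringBuilder sb = new StringBuilder();
        for (CharSequence part : parts) {
            sb.append(part); // static type is CharSequence
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharSequence nothing = null;
        String result = appendAll("ab", new StringBuilder("cd"),
                                  CharBuffer.wrap("ef"), nothing);
        System.out.println(result); // abcdefnull
    }
}
```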

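When a char array is passed in, the logic is:

ensureCapacityInternal makes sure the capacity is sufficient.
The append itself is handled differently depending on the encoding.
With Latin1, characters are written one by one, starting at the offset, into the StringBuilder's byte array. Each character is checked to see whether it can be encoded in Latin1; if so, the char is cast to a byte and stored. Otherwise UTF16 is required: inflate() first widens the storage, and StringUTF16.putCharsSB then stores all remaining characters in UTF16.
With UTF16, StringUTF16.putCharsSB adds the char array to the StringBuilder directly.
The count field, the number of characters in use, is updated.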

public StringBuilder append(char[] str) {
        super.append(str);
        return this;
    }

public AbstractStringBuilder append(char[] str) {
        int len = str.length;
        ensureCapacityInternal(count + len);
        appendChars(str, 0, len);
        return this;
    }

private final void appendChars(char[] s, int off, int end) {
        int count = this.count;
        if (isLatin1()) {
            byte[] val = this.value;
            for (int i = off, j = count; i < end; i++) {
                char c = s[i];
                if (StringLatin1.canEncode(c)) {
                    val[j++] = (byte)c;
                } else {
                    this.count = count = j;
                    inflate();
                    StringUTF16.putCharsSB(this.value, j, s, i, end);
                    this.count = count + end - i;
                    return;
                }
            }
        } else {
            StringUTF16.putCharsSB(this.value, count, s, off, end);
        }
        this.count = count + end - off;
    }
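
The per-character check can be pictured as follows. This sketch assumes that "can be encoded in Latin1" boils down to the char value fitting in 8 bits; Latin1Check and both of its methods are hypothetical names.

```java
class Latin1Check {
    // Assumption of this sketch: a char is Latin1-encodable when its value fits in 8 bits.
    static boolean canEncodeLatin1(char c) {
        return (c >>> 8) == 0;
    }

    // Hypothetical helper: index of the first char that would force a switch
    // to UTF16, or -1 if every char fits in Latin1.
    static int firstNonLatin1(char[] chars) {
        for (int i = 0; i < chars.length; i++) {
            if (!canEncodeLatin1(chars[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNonLatin1("abc".toCharArray()));        // -1
        System.out.println(firstNonLatin1("ab\u4e2dc".toCharArray()));  // 2
    }
}
```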

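When a boolean is passed in, the logic is:

ensureCapacityInternal makes sure the capacity is large enough; true and false have lengths 4 and 5 respectively.
With Latin1, the characters t, r, u, e or f, a, l, s, e are written to the StringBuilder's byte array according to the value.
With UTF16, the characters t, r, u, e or f, a, l, s, e are written to the byte array in that encoding.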

public StringBuilder append(boolean b) {
        super.append(b);
        return this;
    }

public AbstractStringBuilder append(boolean b) {
        ensureCapacityInternal(count + (b ? 4 : 5));
        int count = this.count;
        byte[] val = this.value;
        if (isLatin1()) {
            if (b) {
                val[count++] = 't';
                val[count++] = 'r';
                val[count++] = 'u';
                val[count++] = 'e';
            } else {
                val[count++] = 'f';
                val[count++] = 'a';
                val[count++] = 'l';
                val[count++] = 's';
                val[count++] = 'e';
            }
        } else {
            if (b) {
                count = StringUTF16.putCharsAt(val, count, 't', 'r', 'u', 'e');
            } else {
                count = StringUTF16.putCharsAt(val, count, 'f', 'a', 'l', 's', 'e');
            }
        }
        this.count = count;
        return this;
    }

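When an int or long is passed in, the general approach is to first compute how many characters the integer needs and then place them one by one into the StringBuilder's byte array. For example, "789" has length 3, so it takes 3 bytes in Latin1 and 6 bytes in UTF16.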

public StringBuilder append(int i) {
        super.append(i);
        return this;
    }

public StringBuilder append(long lng) {
        super.append(lng);
        return this;
    }
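
The "count the digits, then fill them in" idea can be sketched as below for the Latin1 case. This is not the JDK's code: AppendIntSketch and its methods are hypothetical, and only non-negative values are handled to keep the example short.

```java
import java.nio.charset.StandardCharsets;

class AppendIntSketch {
    // Number of characters needed for a non-negative int.
    static int digitCount(int value) {
        int digits = 1;
        while (value >= 10) {
            value /= 10;
            digits++;
        }
        return digits;
    }

    // Writes the decimal digits of a non-negative int so that the last digit
    // lands at index end - 1, filling backwards from the least significant digit.
    static void putDigits(int value, byte[] buf, int end) {
        int pos = end;
        do {
            buf[--pos] = (byte) ('0' + value % 10);
            value /= 10;
        } while (value != 0);
    }

    // Hypothetical helper tying the two steps together, one byte per digit.
    static String render(int value) {
        byte[] buf = new byte[digitCount(value)];
        putDigits(value, buf, buf.length);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(render(789));                            // 789
        System.out.println(new StringBuilder().append(789).length()); // 3
    }
}
```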

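When a float or double is passed in, the general approach is likewise to first work out how many characters the floating-point number needs and then place them one by one into the StringBuilder's byte array. For example, "789.01" has length 6 (the decimal point also takes a slot), so it takes 6 bytes in Latin1 and 12 bytes in UTF16.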

public StringBuilder append(float f) {
        super.append(f);
        return this;
    }

public StringBuilder append(double d) {
        super.append(d);
        return this;
    }

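The appendCodePoint method

This method appends a code point to the StringBuilder. A code point is the unique integer that Unicode assigns to a character. Unicode has 17 code planes; the Basic Multilingual Plane (BMP) holds the most common characters, and the remaining planes are called supplementary planes.

So the method first uses Character.isBmpCodePoint to check whether the code point lies in the BMP. If it does, 2 bytes are enough, and it is cast to a char and appended. If it lies outside the BMP, 4 bytes are needed to hold the high surrogate and the low surrogate; Character.toChars produces the two chars, which are then appended.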

public StringBuilder appendCodePoint(int codePoint) {
        super.appendCodePoint(codePoint);
        return this;
    }
    
public AbstractStringBuilder appendCodePoint(int codePoint) {
        if (Character.isBmpCodePoint(codePoint)) {
            return append((char)codePoint);
        }
        return append(Character.toChars(codePoint));
    }
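
The difference between the two cases shows up in the builder's length. A minimal sketch, where CodePointDemo and charsNeeded are made-up names and U+1F600 is just an arbitrary supplementary code point:

```java
class CodePointDemo {
    // Hypothetical helper: chars needed to store a code point.
    static int charsNeeded(int codePoint) {
        return Character.isBmpCodePoint(codePoint) ? 1 : 2;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.appendCodePoint(0x41);    // 'A', inside the BMP
        sb.appendCodePoint(0x1F600); // outside the BMP, stored as a surrogate pair
        System.out.println(sb.length());                           // 3
        System.out.println(sb.codePointCount(0, sb.length()));     // 2
        System.out.println(Character.isHighSurrogate(sb.charAt(1))); // true
        System.out.println(Character.isLowSurrogate(sb.charAt(2)));  // true
    }
}
```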



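The delete method

This method removes the characters in a given range. The logic is:

end may not exceed count, the number of characters in use; if it does, it is set to count.
checkRangeSIOOBE validates the range.
shift performs the deletion through System.arraycopy: the characters after end are copied to start, which amounts to removing the characters in between.
count is updated.
this is returned.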

public StringBuilder delete(int start, int end) {
        super.delete(start, end);
        return this;
    }
    
public AbstractStringBuilder delete(int start, int end) {
        int count = this.count;
        if (end > count) {
            end = count;
        }
        checkRangeSIOOBE(start, end, count);
        int len = end - start;
        if (len > 0) {
            shift(end, -len);
            this.count = count - len;
        }
        return this;
    }

private void shift(int offset, int n) {
        System.arraycopy(value, offset << coder,
                         value, (offset + n) << coder, (count - offset) << coder);
    }
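
The same steps can be replayed on a plain char array, which leaves the coder shifts out of the picture. DeleteSketch and its delete method are hypothetical; the range check is a simplified stand-in for checkRangeSIOOBE.

```java
class DeleteSketch {
    // Removes [start, end) from the first `count` chars of buf and
    // returns the new count. Simplified: one char per slot, no coder.
    static int delete(char[] buf, int count, int start, int end) {
        if (end > count) {
            end = count; // clamp end to the used length
        }
        if (start < 0 || start > end) {
            throw new StringIndexOutOfBoundsException("start " + start + ", end " + end);
        }
        int len = end - start;
        if (len > 0) {
            // Move the tail that follows `end` down to `start`.
            System.arraycopy(buf, end, buf, start, count - end);
            count -= len;
        }
        return count;
    }

    public static void main(String[] args) {
        char[] buf = "hello world".toCharArray();
        int count = delete(buf, buf.length, 5, 100);
        System.out.println(new String(buf, 0, count));                   // hello
        System.out.println(new StringBuilder("hello world").delete(5, 100)); // hello
    }
}
```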

The deleteCharAt method

Removes the character at the given index. It works like delete: shift performs the removal and count is updated.

public StringBuilder deleteCharAt(int index) {
        super.deleteCharAt(index);
        return this;
    }

public AbstractStringBuilder deleteCharAt(int index) {
        checkIndex(index, count);
        shift(index + 1, -1);
        count--;
        return this;
    }

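The replace method

This method replaces the characters in a given range with a given string. The logic is:

end may not exceed count, the number of characters in use; if it does, it is set to count.
checkRangeSIOOBE validates the range.
The new count is computed.
ensureCapacityInternal makes sure there is room for the new count.
shift copies the characters after end to position end + (newCount - count).
count is updated.
putStringAt writes the string at start, overwriting the characters that follow.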

public StringBuilder replace(int start, int end, String str) {
        super.replace(start, end, str);
        return this;
    }
    
public AbstractStringBuilder replace(int start, int end, String str) {
        int count = this.count;
        if (end > count) {
            end = count;
        }
        checkRangeSIOOBE(start, end, count);
        int len = str.length();
        int newCount = count + len - (end - start);
        ensureCapacityInternal(newCount);
        shift(end, newCount - count);
        this.count = newCount;
        putStringAt(start, str);
        return this;
    }
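
A few calls show the effect, including the clamping of end and the case where the range is empty. ReplaceDemo and replaceRange are names made up for this sketch.

```java
class ReplaceDemo {
    // Hypothetical helper: replaces [start, end) of text with the given string.
    static String replaceRange(String text, int start, int end, String with) {
        return new StringBuilder(text).replace(start, end, with).toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceRange("hello world", 6, 11, "Java"));   // hello Java
        System.out.println(replaceRange("hello world", 6, 100, "there")); // hello there
        System.out.println(replaceRange("hello world", 5, 5, ","));       // hello, world
    }
}
```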

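The insert method

This method inserts characters into the StringBuilder. There are several insert overloads for different parameter types; they all work similarly, so only one is examined in detail.

When a String is passed in, the logic is:

checkOffset validates the offset.
If the string is null, it is replaced by the string "null".
ensureCapacityInternal makes sure the capacity is sufficient.
shift copies the characters after offset to position offset + len.
count is updated.
str is written at offset, which completes the insertion.
this is returned.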

public StringBuilder insert(int offset, String str) {
        super.insert(offset, str);
        return this;
    }
    
public AbstractStringBuilder insert(int offset, String str) {
        checkOffset(offset, count);
        if (str == null) {
            str = "null";
        }
        int len = str.length();
        ensureCapacityInternal(count + len);
        shift(offset, len);
        count += len;
        putStringAt(offset, str);
        return this;
    }
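
Both the ordinary case and the null case can be checked with a short sketch; InsertDemo and insertAt are illustrative names.

```java
class InsertDemo {
    // Hypothetical helper: inserts str (possibly null) into text at offset.
    static String insertAt(String text, int offset, String str) {
        return new StringBuilder(text).insert(offset, str).toString();
    }

    public static void main(String[] args) {
        System.out.println(insertAt("hello world", 5, ",")); // hello, world
        System.out.println(insertAt("value: ", 7, null));    // value: null
    }
}
```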

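Besides String, insert also accepts boolean, Object, char, char array, float, double, long, int, and CharSequence arguments. Almost all of them are converted to a String first and then inserted.

The indexOf method

This method finds the index of a given string. The search can start from the beginning or from a specified position.

As the code shows, it indirectly calls the indexOf method of the String class. The core logic: if both sides are Latin1, StringLatin1.indexOf does the search, and if both are UTF16, StringUTF16.indexOf does. If the encodings differ, a Latin1 StringBuilder cannot contain a UTF16 target, so -1 is returned directly, while a UTF16 StringBuilder searches for a Latin1 target through StringUTF16.indexOfLatin1.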

public int indexOf(String str) {
        return super.indexOf(str);
    }
    
public int indexOf(String str, int fromIndex) {
        return super.indexOf(str, fromIndex);
    }
    
public int indexOf(String str, int fromIndex) {
        return String.indexOf(value, coder, count, str, fromIndex);
    }
    
static int indexOf(byte[] src, byte srcCoder, int srcCount,
                       String tgtStr, int fromIndex) {
        byte[] tgt    = tgtStr.value;
        byte tgtCoder = tgtStr.coder();
        int tgtCount  = tgtStr.length();

        if (fromIndex >= srcCount) {
            return (tgtCount == 0 ? srcCount : -1);
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (tgtCount == 0) {
            return fromIndex;
        }
        if (tgtCount > srcCount) {
            return -1;
        }
        if (srcCoder == tgtCoder) {
            return srcCoder == LATIN1
                ? StringLatin1.indexOf(src, srcCount, tgt, tgtCount, fromIndex)
                : StringUTF16.indexOf(src, srcCount, tgt, tgtCount, fromIndex);
        }
        if (srcCoder == LATIN1) {   
            return -1;
        }
        return StringUTF16.indexOfLatin1(src, srcCount, tgt, tgtCount, fromIndex);
    }

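The main logic of StringLatin1.indexOf for Latin1: take the first byte of the target string, first, and scan the value array for a byte equal to first. Once one is found, the rest of the target is compared. If everything matches, the target byte array has been found and that index is returned; otherwise scanning continues, and -1 is returned if no match is found.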

public static int indexOf(byte[] value, int valueCount, byte[] str, int strCount, int fromIndex) {
        byte first = str[0];
        int max = (valueCount - strCount);
        for (int i = fromIndex; i <= max; i++) {
            if (value[i] != first) {
                while (++i <= max && value[i] != first);
            }
            if (i <= max) {
                int j = i + 1;
                int end = j + strCount - 1;
                for (int k = 1; j < end && value[j] == str[k]; j++, k++);
                if (j == end) {
                    return i;
                }
            }
        }
        return -1;
    }

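StringUTF16.indexOf for UTF16 follows similar logic, except that two bytes are combined (that is, compared as a char) for each comparison.

In addition, if the source string is UTF16 and the target string is Latin1, StringUTF16.indexOfLatin1 is used. The search logic is similar, except that each one-byte Latin1 value is widened to a two-byte UTF16 value before comparing.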
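
The search strategy described in this section, finding the first byte and then comparing the rest, can be written as a simplified standalone sketch. IndexOfSketch and its indexOf are hypothetical and work on whole byte arrays rather than on a used-length prefix.

```java
import java.nio.charset.StandardCharsets;

class IndexOfSketch {
    // Simplified first-byte-then-rest search over one-byte-per-char data.
    static int indexOf(byte[] value, byte[] target, int fromIndex) {
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (target.length == 0) {
            return Math.min(fromIndex, value.length);
        }
        byte first = target[0];
        int max = value.length - target.length;
        for (int i = fromIndex; i <= max; i++) {
            if (value[i] != first) {
                continue; // keep looking for the first byte
            }
            int k = 1;
            while (k < target.length && value[i + k] == target[k]) {
                k++;
            }
            if (k == target.length) {
                return i; // the rest matched as well
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] text = "hello world".getBytes(StandardCharsets.ISO_8859_1);
        byte[] lo = "lo".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(text, lo, 0)); // 3
        System.out.println(indexOf(text, lo, 4)); // -1

        StringBuilder sb = new StringBuilder("hello world");
        System.out.println(sb.indexOf("o", 5));     // 7
        System.out.println(sb.indexOf("\u4e2d"));   // -1
    }
}
```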

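The lastIndexOf method

This method searches backwards from the end for the index of a given string. The search can start from the very end or from a specified position. The implementation is much like indexOf, only in reverse, so it is not repeated here.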

public int lastIndexOf(String str) {
        return super.lastIndexOf(str);
    }

public int lastIndexOf(String str, int fromIndex) {
        return super.lastIndexOf(str, fromIndex);
    }
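
A short usage sketch of both overloads, with LastIndexOfDemo and lastFrom as invented names:

```java
class LastIndexOfDemo {
    // Hypothetical helper: backward search starting at fromIndex.
    static int lastFrom(String text, String target, int fromIndex) {
        return new StringBuilder(text).lastIndexOf(target, fromIndex);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("banana");
        System.out.println(sb.lastIndexOf("an"));        // 3
        System.out.println(lastFrom("banana", "an", 2)); // 1
        System.out.println(sb.lastIndexOf("x"));         // -1
    }
}
```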

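The reverse method

This method reverses the string. The implementation is shown below: it walks over the StringBuilder's array and swaps elements to reverse it, with LATIN1 and UTF16 handled separately.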

public StringBuilder reverse() {
        super.reverse();
        return this;
    }
    
public AbstractStringBuilder reverse() {
        byte[] val = this.value;
        int count = this.count;
        int coder = this.coder;
        int n = count - 1;
        if (COMPACT_STRINGS && coder == LATIN1) {
            for (int j = (n-1) >> 1; j >= 0; j--) {
                int k = n - j;
                byte cj = val[j];
                val[j] = val[k];
                val[k] = cj;
            }
        } else {
            StringUTF16.reverse(val, count);
        }
        return this;
    }
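
The Latin1 branch is an in-place swap that starts in the middle and moves outwards. It can be tried on a plain byte array as in the sketch below; ReverseSketch and its methods are hypothetical names, and the text is assumed to contain only Latin1 characters.

```java
import java.nio.charset.StandardCharsets;

class ReverseSketch {
    // Swaps val[j] with its mirror val[n - j], walking from the middle to index 0.
    static void reverse(byte[] val, int count) {
        int n = count - 1;
        for (int j = (n - 1) >> 1; j >= 0; j--) {
            int k = n - j;
            byte tmp = val[j];
            val[j] = val[k];
            val[k] = tmp;
        }
    }

    // Hypothetical helper for Latin1-only text.
    static String reversed(String latin1Text) {
        byte[] bytes = latin1Text.getBytes(StandardCharsets.ISO_8859_1);
        reverse(bytes, bytes.length);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(reversed("abcd"));                       // dcba
        System.out.println(reversed("abc"));                        // cba
        System.out.println(new StringBuilder("abcd").reverse());    // dcba
    }
}
```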

The toString method

This method returns a String object, created with new according to the encoding. For UTF16, it first tries to compress to LATIN1; if that fails, the String is created with UTF16.

public String toString() {
        return isLatin1() ? StringLatin1.newString(value, 0, count)
                          : StringUTF16.newString(value, 0, count);
    }
    
public static String newString(byte[] val, int index, int len) {
        return new String(Arrays.copyOfRange(val, index, index + len),
                          LATIN1);
    }
    
public static String newString(byte[] val, int index, int len) {
        if (String.COMPACT_STRINGS) {
            byte[] buf = compress(val, index, len);
            if (buf != null) {
                return new String(buf, LATIN1);
            }
        }
        int last = index + len;
        return new String(Arrays.copyOfRange(val, index << 1, last << 1), UTF16);
    }
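
Because both paths copy the bytes (Arrays.copyOfRange or compress), the returned String is independent of the builder. A minimal sketch, where ToStringDemo and snapshotThenAppend are illustrative names:

```java
class ToStringDemo {
    // Hypothetical helper: takes a snapshot, then keeps mutating the builder.
    static String snapshotThenAppend(StringBuilder sb, String more) {
        String snapshot = sb.toString();
        sb.append(more); // does not affect the snapshot
        return snapshot;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        String snapshot = snapshotThenAppend(sb, "def");
        System.out.println(snapshot); // abc
        System.out.println(sb);       // abcdef
    }
}
```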

The writeObject method

This is the serialization method. It first writes the object using the default mechanism and then writes count and a char array.

private void writeObject(java.io.ObjectOutputStream s)
        throws java.io.IOException {
        s.defaultWriteObject();
        s.writeInt(count);
        char[] val = new char[capacity()];
        if (isLatin1()) {
            StringLatin1.getChars(value, 0, count, val, 0);
        } else {
            StringUTF16.getChars(value, 0, count, val, 0);
        }
        s.writeObject(val);
    }

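The readObject method

This is the deserialization method. It first reads the object using the default mechanism, then reads count and the char array, and finally initializes the object's byte array and coder.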

private void readObject(java.io.ObjectInputStream s)
        throws java.io.IOException, ClassNotFoundException {
        s.defaultReadObject();
        count = s.readInt();
        char[] val = (char[]) s.readObject();
        initBytes(val, 0, val.length);
    }
    
void initBytes(char[] value, int off, int len) {
        if (String.COMPACT_STRINGS) {
            this.value = StringUTF16.compress(value, off, len);
            if (this.value != null) {
                this.coder = LATIN1;
                return;
            }
        }
        this.coder = UTF16;
        this.value = StringUTF16.toBytes(value, off, len);
    }
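
From the caller's side, these two private methods are exercised by ordinary Java serialization. The round trip below is a sketch using in-memory streams; SerializationDemo and roundTrip are made-up names, and checked exceptions are wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class SerializationDemo {
    // Hypothetical helper: serializes a builder to bytes and reads it back.
    static StringBuilder roundTrip(StringBuilder original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original); // triggers writeObject
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (StringBuilder) in.readObject(); // triggers readObject
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder copy = roundTrip(new StringBuilder("hello \u4e2d"));
        System.out.println(copy.length());  // 7
        System.out.println(copy.charAt(0)); // h
    }
}
```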
