Java Common Class Source Code Analysis: String

Date: 2019-11-07

String

Class diagram

Member variables

/** Stores the characters; declared final, so the field cannot be reassigned */
private final char value[];

/** Caches the String's hash code */
private int hash; // Default to 0

The String class has two main member variables, shown above.

Common constructors

`public String( )`

public String() {
		this.value = "".value;
}

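Here the field is assigned directly from "".value, even though value is a private member of the "" String object. Why is that access allowed?

Because Java's access modifiers are enforced per class, not per object. Code inside a class can therefore read the private members of any other instance of the same class.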
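The rule is easy to reproduce outside the JDK. The sketch below uses a hypothetical class, SecretBox, invented purely for illustration; it shows one instance reading the private field of another instance of the same class.

```java
// Hypothetical class for illustration only; it is not part of the JDK.
final class SecretBox {
    private final int secret;

    SecretBox(int secret) {
        this.secret = secret;
    }

    // Reads the private field of a different SecretBox instance.
    // This compiles because access is checked per class, not per object.
    boolean hasSameSecret(SecretBox other) {
        return this.secret == other.secret;
    }

    public static void main(String[] args) {
        SecretBox a = new SecretBox(42);
        SecretBox b = new SecretBox(42);
        System.out.println(a.hasSameSecret(b)); // prints true
    }
}
```

The expression other.secret here plays the same role as "".value in the String constructor.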

`public String(String original)`

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
}

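A string created this way is effectively a copy of original: the new string's value field refers to the same array object as original's value field. So unless an explicit copy is required, there is no need to create strings with this constructor.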

`public String(char value[])`

public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
}

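This constructor creates a string from a character array. It uses Arrays.copyOf so that later modifications to the caller's value array cannot affect the value array inside the newly created string.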
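The effect of the defensive copy can be observed from the outside. The following is a minimal sketch; the class and method names (CharArrayCopyDemo, buildThenMutate) are made up for the example.

```java
final class CharArrayCopyDemo {
    // Builds a String from the array, then mutates the caller's array.
    static String buildThenMutate(char[] source) {
        String s = new String(source); // the constructor copies the array
        source[0] = 'J';               // later change to the original array
        return s;
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        String s = buildThenMutate(chars);
        System.out.println(s);                 // prints hello
        System.out.println(new String(chars)); // prints Jello
    }
}
```

The string keeps its original contents even though the source array was changed afterwards.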

`public String(char value[], int offset, int count)`

public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

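This constructor is similar to the previous one, except that it validates offset and count first and then assigns value using Arrays.copyOfRange, which copies only the specified range.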
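A small sketch of how the bounds checks behave from the caller's side. RangeConstructorDemo and isRejected are hypothetical names used only for this example.

```java
final class RangeConstructorDemo {
    // Returns true if the (offset, count) pair is rejected for this array.
    static boolean isRejected(char[] source, int offset, int count) {
        try {
            new String(source, offset, count);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        System.out.println(new String(chars, 1, 3));           // prints ell
        System.out.println(new String(chars, 5, 0).isEmpty()); // prints true
        System.out.println(isRejected(chars, 3, 5));           // prints true
        System.out.println(isRejected(chars, -1, 2));          // prints true
    }
}
```

A count of 0 with an offset equal to the array length is accepted and yields the empty string, matching the count <= 0 branch in the source above.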

Common methods

`public String substring(int beginIndex, int endIndex)`

public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

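This method returns a substring covering the range [beginIndex, endIndex): the start index is included and the end index is excluded.

Unless the requested range is the whole string, in which case this is returned, the new string is built with the public String(char value[], int offset, int count) constructor.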
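A brief sketch of the half-open range and of the whole-string shortcut. SubstringDemo and isWholeStringReused are hypothetical names. Returning the same object for the full range is what the source above does, but it is an implementation detail rather than something callers should rely on.

```java
final class SubstringDemo {
    // True if asking for the full range hands back the very same object.
    static boolean isWholeStringReused(String s) {
        return s.substring(0, s.length()) == s;
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println(s.substring(1, 3));           // prints el
        System.out.println(s.substring(2, 2).isEmpty()); // prints true
        System.out.println(isWholeStringReused(s));      // true with the source shown above
    }
}
```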

`public boolean equals(Object anObject)`

public boolean equals(Object anObject) {
    // Fast path: same reference
    if (this == anObject) {
        return true;
    }
    // Check that anObject is a String
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        // Compare lengths
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            // Compare the characters one index at a time
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

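String's equals method is a classic override of the Object method. It works in four main steps:

1. Compare the two references for identity (this is all that Object's own equals does)
2. Check whether the argument is an instance of String
3. Compare the lengths
4. Loop over both arrays, comparing the characters at each index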
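The same four steps carry over to any value class backed by an array. The sketch below applies them to a hypothetical class, SimpleText, written only to illustrate the pattern; it is not how String itself is declared.

```java
import java.util.Arrays;

// Hypothetical value class for illustration; not part of the JDK.
final class SimpleText {
    private final char[] value;

    SimpleText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {                        // step 1: same reference
            return true;
        }
        if (!(other instanceof SimpleText)) {       // step 2: type check
            return false;
        }
        SimpleText that = (SimpleText) other;
        if (value.length != that.value.length) {    // step 3: length check
            return false;
        }
        for (int i = 0; i < value.length; i++) {    // step 4: element by element
            if (value[i] != that.value[i]) {
                return false;
            }
        }
        return true;
    }

    // equals and hashCode must agree, so hashCode is overridden too.
    @Override
    public int hashCode() {
        return Arrays.hashCode(value);
    }

    public static void main(String[] args) {
        System.out.println(new SimpleText("abc").equals(new SimpleText("abc"))); // prints true
        System.out.println(new SimpleText("abc").equals(new SimpleText("abd"))); // prints false
    }
}
```

Ordering the checks from cheapest to most expensive means most unequal pairs are rejected before the loop runs.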

`public String replace(char oldChar, char newChar)`

public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */

        // Scan for the first character that needs replacing
        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }
        // Only build a new string if such a character was found
        if (i < len) {
            // Allocate the new array buf and copy in the characters already scanned
            char buf[] = new char[len];
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            // Continue through the rest of the string, writing newChar wherever oldChar appears
            while (i < len) {
                char c = val[i];
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            return new String(buf, true);
        }
    }
    return this;
}

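The method replaces characters in the following steps:

1. The first while loop checks whether the original string contains oldChar at all
2. If it does, the method starts building a new string
3. A new buf[] array is allocated, and the characters scanned so far, none of which equal oldChar, are copied into it
4. Scanning continues from index i to the end; any element equal to oldChar is written as newChar
5. A new String object is created from buf[] and returned

Only the char-based replace is covered here. In this JDK version, the replacement methods that take String arguments perform their matching and replacement through the regular expression classes.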
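A short sketch of the observable behaviour. ReplaceDemo and returnsSameObject are hypothetical names. The last two lines contrast replace(CharSequence, CharSequence), which treats its target as literal text, with replaceAll, which interprets its first argument as a regular expression.

```java
final class ReplaceDemo {
    // True if replace hands back the original object (nothing to replace).
    static boolean returnsSameObject(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar) == s;
    }

    public static void main(String[] args) {
        System.out.println("hello".replace('l', 'L'));          // prints heLLo
        System.out.println(returnsSameObject("hello", 'x', 'y')); // true: no 'x' present
        System.out.println("a.b".replace(".", "-"));            // prints a-b
        System.out.println("a.b".replaceAll(".", "-"));         // prints ---
    }
}
```

When oldChar never occurs, no new array is allocated and the original string is returned, which matches the return this at the end of the source above.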

`public String[] split(String regex)`

public String[] split(String regex) {
        return split(regex, 0);
    }
    
    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a (1)one-char String and this character is not one of the RegEx's meta characters ".$|()[{^?*+\\", or (2)two-char String and the first char is the backslash and the second is not the ascii digit or ascii letter. */
        char ch = 0;
        if ((
                // Length 1: the character must not be a regex metacharacter
                (regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
                // Length 2: the first character is '\' and the second is neither a letter nor a digit
                (regex.length() == 2 && regex.charAt(0) == '\\' && (((ch = regex.charAt(1)) - '0') | ('9' - ch)) < 0
                        && ((ch - 'a') | ('z' - ch)) < 0 && ((ch - 'A') | ('Z' - ch)) < 0))
                // In both cases ch must not be a surrogate character
                && (ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE)) {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            // Walk the String, adding each delimited piece to list
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // The delimiter was never found
            if (off == 0) {
                return new String[]{this};
            }

            // Add the remaining part to list
            if (!limited || list.size() < limit) {
                list.add(substring(off, value.length));
            }

            // Build the result
            int resultSize = list.size();
            if (limit == 0) {
                // Remove trailing empty strings
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        // All other cases fall back to the regular expression engine
        return Pattern.compile(regex).split(this, limit);
    }

The individual steps are annotated in the comments above. The part worth a closer look is the loop that walks the String:

while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }

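When regex is a single character, off is the index up to which the string has already been consumed and next is the index of the next occurrence of regex. After each match, off = next + 1. When two regex characters appear back to back, the next search finds the second one immediately at index off, so next == off and substring(off, next) is the empty string.

So if the single-character regex occurs N times in a row, the last N - 1 occurrences produce N - 1 empty strings in the result (trailing empty strings are then removed when limit is 0).

When regex is longer than one character, the regular-expression path likewise turns consecutive matches of regex into empty strings.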
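The empty-string behaviour is easy to confirm with a few calls. This is a minimal sketch; SplitDemo is a name invented for the example.

```java
import java.util.Arrays;

final class SplitDemo {
    public static void main(String[] args) {
        // Three commas in a row: the last two produce two empty strings.
        System.out.println(Arrays.toString("a,,,b".split(",")));    // prints [a, , , b]

        // With the default limit of 0, trailing empty strings are dropped.
        System.out.println(Arrays.toString("a,,,".split(",")));     // prints [a]

        // A negative limit keeps the trailing empty strings.
        System.out.println(Arrays.toString("a,,,".split(",", -1))); // prints [a, , , ]

        // A two-character regex takes the Pattern path, with the same effect.
        System.out.println(Arrays.toString("a----b".split("--")));  // prints [a, , b]
    }
}
```

Callers that do not want the empty entries usually filter them out afterwards or split on a quantified regex such as ",+".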

Other methods

`public native String intern()`

public native String intern();

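intern is rarely called directly in day-to-day development, but it comes up often when analyzing how strings behave.

What intern does is explained clearly in the JDK comments.

If the constant pool already contains an equal string, intern returns the reference to the string in the pool.

If it does not, the string is added to the pool and a reference to the pooled string is returned.

A short piece of code illustrates this: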

①        String s1 = "Hello";
②        String s2 = "Hello";
③        String s3 = new String("Hello");
④        System.out.println(s1 == s2);//true
⑤        System.out.println(s1 == s3);//false
⑥        s3 = s3.intern();
⑦        System.out.println(s1 == s3);//true

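Step 1 declares a variable s1 on the stack and adds the string "Hello" to the constant pool.

Step 2 declares a variable s2 on the stack that points to the same "Hello" in the constant pool.

Step 3 creates a new object on the heap whose value array is shared with the pooled "Hello"; the variable s3 on the stack points to that heap object.

Therefore s1 == s2 is true, while s1 == s3 is false.

Step 6 calls s3 = s3.intern(), which obtains the reference to "Hello" in the constant pool and makes s3 point to it.

So in step 7, s1 == s3 prints true.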
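The same reasoning applies to strings assembled at run time, not only to those created with new String. The sketch below is a variation on the example above; InternDemo and sameObject are hypothetical names.

```java
final class InternDemo {
    // Reference comparison, wrapped in a method so it can be reused.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Hello";
        // Assembled at run time, so this is a separate object on the heap.
        String built = new StringBuilder("Hel").append("lo").toString();

        System.out.println(literal.equals(built));               // prints true
        System.out.println(sameObject(literal, built));          // prints false
        System.out.println(sameObject(literal, built.intern())); // prints true
    }
}
```

In ordinary code, equals is the right way to compare string contents; the == comparisons here exist only to make the pooling visible.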

String immutability

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {

    /** Stores the characters; declared final, so the field cannot be reassigned */
    private final char value[];

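String is declared final, so the class cannot be subclassed.

The data is stored in the char array value, which is also declared final, so once a String has been constructed the field cannot be pointed at a different array. The elements of an array can in principle still be changed, but value is private and String exposes no method that modifies it, so the contents cannot change after construction either.

These two points together establish String's immutability.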
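From the caller's side, immutability means that every "modifying" method returns a new string and that any array handed out is a copy. A minimal sketch follows; ImmutabilityDemo and afterOperations are names made up for the example.

```java
final class ImmutabilityDemo {
    // Applies several operations to s and returns s itself, untouched.
    static String afterOperations(String s) {
        s.toUpperCase();               // result discarded; s is not changed
        s.concat("!");                 // result discarded; s is not changed
        char[] copy = s.toCharArray(); // a copy, not the internal array
        if (copy.length > 0) {
            copy[0] = 'X';             // only the copy is modified
        }
        return s;
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(s.toUpperCase());    // prints ABC
        System.out.println(afterOperations(s)); // prints abc
    }
}
```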