Some "Small Tricks" in the JDK Source Code

Date: 2019-12-12

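Everything here is excerpted from the JDK source code. My course 《Java基礎教程-手寫JDK》 (Java Basics Tutorial: Hand-Writing the JDK) covers these points in detail, so feel free to drop by :)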

1 i++ vs i--

Line 985 of the String source, in the equals method:

while (n-- != 0) {
    if (v1[i] != v2[i])
        return false;
    i++;
}

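This code checks whether two strings are equal, but the odd part is that it uses n-- != 0 as the loop condition. Don't we usually count up with i++? Why count down with i--, when the number of iterations is the same? The reason is that the i++ form compiles to one more instruction (described here in terms of ARM's status register):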

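The i-- operation itself updates the CPSR (Current Program Status Register), whose common flags are N (result negative), Z (result zero), C (carry) and V (overflow). Whether the counter has reached 0 can be read straight off the Z flag, so no separate compare is needed.
The i++ operation updates the CPSR too, but the flags describe the new value relative to zero, not relative to n, so they are no help for the i < n test. An extra compare instruction is therefore needed, which means one more instruction per iteration.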

Simply put, comparing against 0 saves one instruction. So counting down with i-- in loops is the classy way to do it.
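To make the two loop shapes concrete, here is a minimal sketch that writes the same array comparison both ways. The class and method names (CountdownLoop, equalsCountingDown, equalsCountingUp) are made up for illustration and are not from the JDK. The sketch only shows that the two forms are equivalent in behavior; whether the count-down form really saves an instruction depends on the JIT compiler and the CPU, and is not measured here.

```java
final class CountdownLoop {
    // Count-down form, mirroring the JDK excerpt: the loop test compares against 0.
    static boolean equalsCountingDown(char[] v1, char[] v2) {
        if (v1.length != v2.length) return false;
        int n = v1.length;
        int i = 0;
        while (n-- != 0) {
            if (v1[i] != v2[i]) return false;
            i++;
        }
        return true;
    }

    // Count-up form: the loop test compares the index against the length.
    static boolean equalsCountingUp(char[] v1, char[] v2) {
        if (v1.length != v2.length) return false;
        for (int i = 0; i < v1.length; i++) {
            if (v1[i] != v2[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        char[] a = "hello".toCharArray();
        char[] b = "hello".toCharArray();
        char[] c = "help!".toCharArray();
        System.out.println(equalsCountingDown(a, b)); // true
        System.out.println(equalsCountingDown(a, c)); // false
        System.out.println(equalsCountingUp(a, b));   // true
    }
}
```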

2 Member variables vs. local variables

In almost every method, the JDK source first copies a member variable into a local variable, for example:

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;

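Once initialized, a local variable lives on the method's thread stack, while a member variable lives in heap memory and has to be read through the object each time, so the former is clearly faster. Inside a method, then, we should avoid using member variables directly where possible and work with a local copy instead.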
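The same habit applied to your own class might look like the sketch below. LocalCopy and its two methods are hypothetical names chosen for this example. Both methods return the same result; the second simply reads the field once and then works on locals, the way compareTo does above.

```java
final class LocalCopy {
    // Member variable: stored with the object on the heap.
    private final int[] values;

    LocalCopy(int[] values) {
        this.values = values.clone();
    }

    // Reads the field through `this` on every access.
    int sumViaField() {
        int sum = 0;
        for (int i = 0; i < this.values.length; i++) {
            sum += this.values[i];
        }
        return sum;
    }

    // Copies the field into locals once, then touches only the locals.
    int sumViaLocal() {
        final int[] v = this.values;
        final int len = v.length;
        int sum = 0;
        for (int i = 0; i < len; i++) {
            sum += v[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        LocalCopy c = new LocalCopy(new int[] {1, 2, 3});
        System.out.println(c.sumViaField()); // 6
        System.out.println(c.sumViaLocal()); // 6
    }
}
```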

3 Deliberately loading data into registers && moving expensive work outside the lock

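In the segment-based ConcurrentHashMap, the way a segment gets locked is interesting. It doesn't lock directly; it behaves like a spin lock and tries to acquire the lock over and over. While it is trying, it traverses the linked list, which pulls the data into registers and the CPU cache ahead of time, so the list doesn't have to be traversed cold while the lock is held. Creating the new object is also done outside the lock, which keeps expensive work out of the locked region.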

final V put(K key, int hash, V value, boolean onlyIfAbsent) {
        /** Before writing to this segment, its exclusive lock must be acquired.
           Instead of forcing lock(), it makes an attempt with tryLock() */
        HashEntry<K,V> node = tryLock() ? null :
            scanAndLockForPut(key, hash, value);

Source of scanAndLockForPut():

private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    HashEntry<K,V> first = entryForHash(this, hash);
    HashEntry<K,V> e = first;
    HashEntry<K,V> node = null;
    int retries = -1; // negative while locating node

    // keep trying to acquire the lock
    while (!tryLock()) {
        HashEntry<K,V> f; // to recheck first below
        if (retries < 0) {
            if (e == null) {
                if (node == null) // speculatively create node
                    // nothing at this hash slot: create the node now so put() doesn't have to create it while holding the lock
                    node = new HashEntry<K,V>(hash, key, value, null);
                retries = 0;
            }
            // the key at this hash slot also matches: fall back to plain spinning
            else if (key.equals(e.key))
                retries = 0;
            else
                // walk the list; as a side effect the CPU pulls the list nodes into cache
                e = e.next;
        }
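        // once retries >= 0 this is a pure spin lock. If the retry count exceeds MAX_SCAN_RETRIES (1 on a single core, 64 on multi-core), stop contending and join the blocking queue to wait for the lock
        //    lock() is a blocking call: it returns once the lock is acquired and suspends the thread until then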
        else if (++retries > MAX_SCAN_RETRIES) {
            lock();
            break;
        }
        else if ((retries & 1) == 0 &&
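                 // something significant happened here: a new element entered the list and became the new head
                 //     so the strategy is, in effect, to run scanAndLockForPut over again from the start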
                 (f = entryForHash(this, hash)) != first) {
            e = first = f; // re-traverse if entry changed
            retries = -1;
        }
    }
    return node;
}
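
The same idea can be tried outside the JDK. The following is a minimal sketch using java.util.concurrent.locks.ReentrantLock, not the ConcurrentHashMap implementation itself: SpinThenBlockList, MAX_SPINS and demo are names invented for this example, and the spin bound of 64 is an assumption borrowed from the multi-core value mentioned above. It spins on tryLock(), builds the new entry while waiting, and falls back to a blocking lock() after too many failed attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class SpinThenBlockList {
    private static final int MAX_SPINS = 64; // hypothetical bound for this sketch

    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> entries = new ArrayList<>();

    void put(String key, String value) {
        String entry = null;
        int spins = 0;
        // Try to take the lock without blocking; use the waiting time productively.
        while (!lock.tryLock()) {
            if (entry == null) {
                entry = key + "=" + value; // allocate outside the lock
            } else if (++spins > MAX_SPINS) {
                lock.lock();               // give up spinning and block
                break;
            }
        }
        // The lock is held here, whichever path was taken.
        try {
            if (entry == null) {
                entry = key + "=" + value; // first tryLock() succeeded, nothing prepared yet
            }
            entries.add(entry);
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return entries.size();
        } finally {
            lock.unlock();
        }
    }

    // Runs several writer threads against one list and returns the final size.
    static int demo(int threads, int putsPerThread) {
        SpinThenBlockList list = new SpinThenBlockList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    list.put("k" + id, "v" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }
}
```

Calling SpinThenBlockList.demo(4, 100) should leave 400 entries in the list, since every put runs its add while holding the lock.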

4 Check == first when testing object equality

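When checking whether two objects are equal, try == first: == compares addresses directly and is very fast, whereas equals compares the objects' values and is relatively slow. So where possible, use a==b || a.equals(b) to compare objects (this form assumes a is not null).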
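
A minimal sketch of that check, with a null guard added so it is safe for any pair of references. FastEquals and sameOrEqual are hypothetical names for this example; the standard library's java.util.Objects.equals(a, b) performs the same reference-first comparison and is used here as a cross-check.

```java
import java.util.Objects;

final class FastEquals {
    // Cheap reference comparison first; equals() runs only when the references differ.
    static boolean sameOrEqual(Object a, Object b) {
        return a == b || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        String s = "abc";
        String t = new String("abc"); // equal value, different object
        System.out.println(sameOrEqual(s, s));       // true, decided by ==
        System.out.println(sameOrEqual(s, t));       // true, decided by equals()
        System.out.println(sameOrEqual(null, null)); // true
        System.out.println(sameOrEqual(null, s));    // false
        System.out.println(Objects.equals(s, t));    // true
    }
}
```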

5 About transient

transient is used to keep a field out of serialization, yet in the HashMap source the internal array is declared transient:

/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Doesn't that mean none of the key-value pairs inside can be serialized, and that a HashMap can't be sent over the network? Actually, no.

In Effective Java, 2nd edition, Item 75, Joshua Bloch writes:

For example, consider the case of a hash table. The physical
representation is a sequence of hash buckets containing key-value
entries. The bucket that an entry resides in is a function of the hash
code of its key, which is not, in general, guaranteed to be the same
from JVM implementation to JVM implementation. In fact, it isn't even
guaranteed to be the same from run to run. Therefore, accepting the
default serialized form for a hash table would constitute a serious
bug. Serializing and deserializing the hash table could yield an
object whose invariants were seriously corrupt.

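How should we read this? A look at HashMap.get()/put() shows that reads and writes pick the bucket based on the key's hashCode(). And Object.hashCode() is a native method, which may give different results in different JVMs.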

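For instance, say we store an entry in a HashMap whose key is the string "STRING". In the first Java program, the hashCode() of "STRING" is 1, so it goes into bucket 1; in a second Java program, the hashCode() of "STRING" might be 2, so it belongs in bucket 2. (String.hashCode() is in fact specified and stable across JVMs; treat the string here as a stand-in for a key that relies on the default Object.hashCode().) With default serialization (Entry[] table not marked transient), a HashMap serialized from the first program into the second would keep exactly the same memory layout, and that would be wrong.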

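Seen from the lookup side: store an entry with key="方老司" in a HashMap. In the first Java program, the hashCode() of "方老司" is 1, so it goes into table[1]. Now send the map to another JVM, where the hashCode() of "方老司" might be 2: the lookup goes to table[2], and the value is not there.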

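What HashMap's readObject and writeObject actually do is write out and read back the contents (the key-value pairs), and then rebuild the HashMap from scratch.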
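
The pattern can be sketched on a toy hash set. TinyHashSet below is a made-up class for illustration, not HashMap's code, and it assumes a fixed capacity of 16 with linear probing to stay short. The bucket array is transient; writeObject emits only the element count and the elements, and readObject re-inserts them so each one lands in the bucket that the receiving JVM's hashCode() selects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TinyHashSet implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int CAPACITY = 16; // power of two, fixed for brevity

    // Physical layout: excluded from the default serialized form.
    private transient Object[] table = new Object[CAPACITY];
    private transient int size;

    boolean add(Object key) {
        if (size == CAPACITY - 1) throw new IllegalStateException("full");
        int i = key.hashCode() & (CAPACITY - 1);
        while (table[i] != null) {
            if (table[i].equals(key)) return false;
            i = (i + 1) & (CAPACITY - 1); // linear probing
        }
        table[i] = key;
        size++;
        return true;
    }

    boolean contains(Object key) {
        int i = key.hashCode() & (CAPACITY - 1);
        while (table[i] != null) {
            if (table[i].equals(key)) return true;
            i = (i + 1) & (CAPACITY - 1);
        }
        return false;
    }

    int size() {
        return size;
    }

    // Logical form: the count followed by the elements, no bucket positions.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(size);
        for (Object key : table) {
            if (key != null) out.writeObject(key);
        }
    }

    // Rebuild the table by re-hashing every element on the receiving side.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        table = new Object[CAPACITY]; // field initializers do not run on deserialization
        size = 0;
        int n = in.readInt();
        for (int k = 0; k < n; k++) {
            add(in.readObject());
        }
    }

    // Serializes to a byte array and back, standing in for a trip between JVMs.
    static TinyHashSet roundTrip(TinyHashSet original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TinyHashSet) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After TinyHashSet.roundTrip(set), the copy contains the same elements even though the array itself was never written to the stream.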

6 Don't use char

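A char in Java is a UTF-16 code unit, 2 bytes wide, and 2 bytes cannot represent every character. The characters that fit in 2 bytes make up the BMP (Basic Multilingual Plane); every other character is encoded as a high surrogate plus a low surrogate, 4 bytes in total. Take indexOf in the String source, for example: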

// the character is taken as an int rather than a char, which makes the range check easy
 public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }
        // within the BMP range
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            // otherwise switch to the four-byte (surrogate pair) search
            return indexOfSupplementary(ch, fromIndex);
        }
    }

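So a Java char can only represent the BMP part of UTF-16. Some of the CJK (Chinese, Japanese and Korean unified ideographs) extension blocks cannot be represented by a single char.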

For example, of the blocks in the figure below, char can represent only Ext-A; none of the others fit.
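
A short sketch makes the difference visible. It uses U+20000, the first code point of CJK Ext-B, as the supplementary character and U+3400, the first code point of Ext-A, as the BMP one; SupplementaryDemo and fromCodePoint are names invented for this example. All the calls are standard java.lang.Character and java.lang.String methods.

```java
final class SupplementaryDemo {
    // Builds a one-character string from a code point (one or two chars long).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int extB = 0x20000; // CJK Ext-B, outside the BMP
        int extA = 0x3400;  // CJK Ext-A, inside the BMP
        String s = fromCodePoint(extB);

        System.out.println(s.length());                              // 2 chars...
        System.out.println(s.codePointCount(0, s.length()));         // ...but 1 character
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));   // true
        System.out.println(s.indexOf(extB));                         // 0, via the int overload
        System.out.println(Character.charCount(extA));               // 1, fits in a single char
    }
}
```

This is why indexOf takes an int: a code point such as 0x20000 cannot be passed as a char at all.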