Why Netty's FastThreadLocal Is Fast

Preface

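While reading the Netty source code recently, I came across a class called FastThreadLocal. The JDK already ships its own ThreadLocal, so the name suggests that this class is faster than the JDK one. Where exactly is it faster, and why? A brief analysis follows.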

Performance test

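ThreadLocal is mainly used in multithreaded environments to give convenient access to data that belongs to the current thread, without the caller having to worry about thread safety. To make the comparison concrete, two scenarios are tested: multiple threads operating on the same ThreadLocal, and a single thread operating on many ThreadLocals.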

1. Multiple threads operating on the same ThreadLocal

The same test is run against ThreadLocal and FastThreadLocal. Part of the code is shown below:

public static void test2() throws Exception {
		CountDownLatch cdl = new CountDownLatch(10000);
		ThreadLocal<String> threadLocal = new ThreadLocal<String>();
		long starTime = System.currentTimeMillis();
		for (int i = 0; i < 10000; i++) {
			new Thread(new Runnable() {

				@Override
				public void run() {
					threadLocal.set(Thread.currentThread().getName());
					for (int k = 0; k < 100000; k++) {
						threadLocal.get();
					}
					cdl.countDown();
				}
			}, "Thread" + (i + 1)).start();
		}
		cdl.await();
		System.out.println(System.currentTimeMillis() - starTime + "ms");
	}

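The code above creates 10,000 threads. Each one sets a value on the ThreadLocal and then calls get 100,000 times, and a CountDownLatch is used to measure the total elapsed time. Result: about 1000 ms. The same test for FastThreadLocal looks similar: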

public static void test2() throws Exception {
		CountDownLatch cdl = new CountDownLatch(10000);
		FastThreadLocal<String> threadLocal = new FastThreadLocal<String>();
		long starTime = System.currentTimeMillis();
		for (int i = 0; i < 10000; i++) {
			new FastThreadLocalThread(new Runnable() {

				@Override
				public void run() {
					threadLocal.set(Thread.currentThread().getName());
					for (int k = 0; k < 100000; k++) {
						threadLocal.get();
					}
					cdl.countDown();
				}
			}, "Thread" + (i + 1)).start();
		}

		cdl.await();
		System.out.println(System.currentTimeMillis() - starTime + "ms");
	}

Result: about 1000 ms. In this scenario the two kinds of ThreadLocal show no real difference in performance. Now for the second scenario.

2. Many ThreadLocals in a single thread

The same test is run against ThreadLocal and FastThreadLocal. Part of the code is shown below:

public static void test1() throws InterruptedException {
		int size = 10000;
		ThreadLocal<String> tls[] = new ThreadLocal[size];
		for (int i = 0; i < size; i++) {
			tls[i] = new ThreadLocal<String>();
		}
		
		new Thread(new Runnable() {
			@Override
			public void run() {
				long starTime = System.currentTimeMillis();
				for (int i = 0; i < size; i++) {
					tls[i].set("value" + i);
				}
				for (int i = 0; i < size; i++) {
					for (int k = 0; k < 100000; k++) {
						tls[i].get();
					}
				}
				System.out.println(System.currentTimeMillis() - starTime + "ms");
			}
		}).start();
	}

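The code above creates 10,000 ThreadLocals, then uses a single thread to set a value on each one and call get 100,000 times on each. Result: about 2000 ms. The same test for FastThreadLocal looks similar: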

public static void test1() {
		int size = 10000;
		FastThreadLocal<String> tls[] = new FastThreadLocal[size];
		for (int i = 0; i < size; i++) {
			tls[i] = new FastThreadLocal<String>();
		}
		
		new FastThreadLocalThread(new Runnable() {

			@Override
			public void run() {
				long starTime = System.currentTimeMillis();
				for (int i = 0; i < size; i++) {
					tls[i].set("value" + i);
				}
				for (int i = 0; i < size; i++) {
					for (int k = 0; k < 100000; k++) {
						tls[i].get();
					}
				}
				System.out.println(System.currentTimeMillis() - starTime + "ms");
			}
		}).start();
	}

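Result: about 30 ms. That is a gap of nearly two orders of magnitude, although it only shows up under a very large number of accesses. The rest of this article looks at how ThreadLocal works and why FastThreadLocal is faster.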

How ThreadLocal works

Since set and get are the methods we use most, let's look at their source in turn:

public void set(T value) {
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null)
            map.set(this, value);
        else
            createMap(t, value);
    }
    
    ThreadLocalMap getMap(Thread t) {
        return t.threadLocals;
    }

In outline: get the current thread, then read its threadLocals field, which is a ThreadLocalMap. If the map is null, create a new one; otherwise store the value in it with the current ThreadLocal as the key. The set method of ThreadLocalMap is worth a closer look:

private void set(ThreadLocal<?> key, Object value) {

            // We don't use a fast path as with get() because it is at
            // least as common to use set() to create new entries as
            // it is to replace existing ones, in which case, a fast
            // path would fail more often than not.

            Entry[] tab = table;
            int len = tab.length;
            int i = key.threadLocalHashCode & (len-1);

            for (Entry e = tab[i];
                 e != null;
                 e = tab[i = nextIndex(i, len)]) {
                ThreadLocal<?> k = e.get();

                if (k == key) {
                    e.value = value;
                    return;
                }

                if (k == null) {
                    replaceStaleEntry(key, value, i);
                    return;
                }
            }

            tab[i] = new Entry(key, value);
            int sz = ++size;
            if (!cleanSomeSlots(i, sz) && sz >= threshold)
                rehash();
        }

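In outline: ThreadLocalMap stores its data in an array, much like HashMap. Each ThreadLocal is assigned a threadLocalHashCode when it is created, and the slot is computed as threadLocalHashCode & (len-1), which is equivalent to a modulo because the table length is a power of two. Hash collisions are therefore possible. HashMap handles collisions by chaining (array plus linked list), whereas ThreadLocalMap uses nextIndex to probe the following slots one by one, so a lookup that collides degrades into a linear scan. Now the get method: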

public T get() {
        Thread t = Thread.currentThread();
        ThreadLocalMap map = getMap(t);
        if (map != null) {
            ThreadLocalMap.Entry e = map.getEntry(this);
            if (e != null) {
                @SuppressWarnings("unchecked")
                T result = (T)e.value;
                return result;
            }
        }
        return setInitialValue();
    }

Again, get the current thread, then its ThreadLocalMap, then look up the value in the map with the current ThreadLocal as the key:

private Entry getEntry(ThreadLocal<?> key) {
            int i = key.threadLocalHashCode & (table.length - 1);
            Entry e = table[i];
            if (e != null && e.get() == key)
                return e;
            else
                return getEntryAfterMiss(key, i, e);
        }
        
         private Entry getEntryAfterMiss(ThreadLocal<?> key, int i, Entry e) {
            Entry[] tab = table;
            int len = tab.length;

            while (e != null) {
                ThreadLocal<?> k = e.get();
                if (k == key)
                    return e;
                if (k == null)
                    expungeStaleEntry(i);
                else
                    i = nextIndex(i, len);
                e = tab[i];
            }
            return null;
        }

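As with set, the array index is computed from the hash. If there is no collision the entry is returned directly; otherwise the table is again scanned slot by slot. From this analysis we can conclude:
1. The ThreadLocalMap lives on the Thread, with the ThreadLocal as the key, so multiple threads operating on the same ThreadLocal each insert one entry into their own map and never collide with each other.
2. ThreadLocalMap resolves collisions by scanning, which hurts performance when collisions are frequent.
3. FastThreadLocal avoids the collision problem by other means, and that is where its speedup comes from.
Next, let's see how FastThreadLocal achieves this.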
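Before moving on, the scanning behaviour of ThreadLocalMap can be seen in isolation with a small sketch. This is not the JDK code: LinearProbingSketch and all of its members are hypothetical names, there is no resizing or stale-entry cleanup, and the hashes are passed in by hand so that collisions can be forced. It only illustrates how open addressing with linear probing turns a colliding lookup into a scan.

```java
/** Minimal open-addressing table with linear probing (illustrative, not JDK code). */
public class LinearProbingSketch {
    private final Object[] keys;
    private final Object[] values;
    private int probes; // slots inspected by the most recent get()

    /** capacity must be a power of two so that (hash & (capacity - 1)) acts as a modulo. */
    LinearProbingSketch(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int next(int i) {
        return (i + 1) & (keys.length - 1); // wrap around at the end of the array
    }

    /** The table never resizes, so callers must always leave at least one slot free. */
    void put(Object key, int hash, Object value) {
        int i = hash & (keys.length - 1);
        while (keys[i] != null && keys[i] != key) {
            i = next(i); // slot taken by another key: try the following one
        }
        keys[i] = key;
        values[i] = value;
    }

    Object get(Object key, int hash) {
        probes = 0;
        int i = hash & (keys.length - 1);
        while (keys[i] != null) {
            probes++;
            if (keys[i] == key) {
                return values[i];
            }
            i = next(i);
        }
        return null; // reached an empty slot: key is absent
    }

    int lastProbes() {
        return probes;
    }

    public static void main(String[] args) {
        LinearProbingSketch table = new LinearProbingSketch(16);
        Object a = new Object(), b = new Object(), c = new Object();
        // Hashes 1, 17 and 33 are chosen on purpose: all three map to slot 1.
        table.put(a, 1, "A");
        table.put(b, 17, "B");
        table.put(c, 33, "C");
        System.out.println(table.get(a, 1) + " probes=" + table.lastProbes());  // A probes=1
        System.out.println(table.get(c, 33) + " probes=" + table.lastProbes()); // C probes=3
    }
}
```

The first key is found in its home slot, while the third one needs three probes because two colliding entries sit in front of it.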

Why Netty's FastThreadLocal is fast

Netty provides two classes, FastThreadLocal and FastThreadLocalThread, the latter extending Thread. As before, we analyze the source of the commonly used set and get methods:

public final void set(V value) {
        if (value != InternalThreadLocalMap.UNSET) {
            set(InternalThreadLocalMap.get(), value);
        } else {
            remove();
        }
    }

    public final void set(InternalThreadLocalMap threadLocalMap, V value) {
        if (value != InternalThreadLocalMap.UNSET) {
            if (threadLocalMap.setIndexedVariable(index, value)) {
                addToVariablesToRemove(threadLocalMap, this);
            }
        } else {
            remove(threadLocalMap);
        }
    }

The code first checks whether value is InternalThreadLocalMap.UNSET, and then, much like the JDK, uses an InternalThreadLocalMap to hold the data:

public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }

    private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
        InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
        if (threadLocalMap == null) {
            thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
        }
        return threadLocalMap;
    }
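The dispatch in get() above can be sketched on its own. The names below (DispatchSketch, FastThread, FALLBACK, currentSlots) are hypothetical stand-ins, not Netty classes. slowGet is not shown in this article; the sketch assumes it falls back to an ordinary JDK ThreadLocal that holds the map, which would also explain why the tests above run on FastThreadLocalThread.

```java
public class DispatchSketch {
    /** Hypothetical stand-in for FastThreadLocalThread: the storage is a plain field. */
    static class FastThread extends Thread {
        Object[] slots; // stand-in for InternalThreadLocalMap
        FastThread(Runnable task) {
            super(task);
        }
    }

    // Assumed fallback for ordinary threads: the storage lives in a JDK ThreadLocal.
    private static final ThreadLocal<Object[]> FALLBACK = new ThreadLocal<>();

    /** Returns the calling thread's storage and records which path was taken. */
    static Object[] currentSlots(String[] pathOut) {
        Thread t = Thread.currentThread();
        if (t instanceof FastThread) {
            FastThread ft = (FastThread) t;
            if (ft.slots == null) {
                ft.slots = new Object[8]; // created lazily, as in fastGet
            }
            pathOut[0] = "fast";
            return ft.slots;
        }
        Object[] s = FALLBACK.get();
        if (s == null) {
            s = new Object[8];
            FALLBACK.set(s);
        }
        pathOut[0] = "slow";
        return s;
    }

    /** Calls currentSlots once on the calling thread and once on a FastThread. */
    static String demo() {
        String[] onCaller = new String[1];
        currentSlots(onCaller);
        String[] onFast = new String[1];
        Thread ft = new FastThread(() -> currentSlots(onFast));
        ft.start();
        try {
            ft.join(); // join makes the write to onFast visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return onCaller[0] + " " + onFast[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // slow fast
    }
}
```

On the fast path the storage is one field read away from the current thread; on the assumed slow path it first has to be found through a JDK ThreadLocal lookup.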

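InternalThreadLocalMap is likewise stored on the thread, here FastThreadLocalThread. The difference is that the position is not derived from a hash of the ThreadLocal; instead the index field of FastThreadLocal is used directly. index is initialized when the instance is constructed: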

private final int index;

    public FastThreadLocal() {
        index = InternalThreadLocalMap.nextVariableIndex();
    }

Now into the nextVariableIndex method:

static final AtomicInteger nextIndex = new AtomicInteger();
     
    public static int nextVariableIndex() {
        int index = nextIndex.getAndIncrement();
        if (index < 0) {
            nextIndex.decrementAndGet();
            throw new IllegalStateException("too many thread-local indexed variables");
        }
        return index;
    }

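InternalThreadLocalMap holds a static nextIndex counter that generates array indexes. Because it is static, it is shared by every FastThreadLocal, so each new instance receives the next consecutive index and no two instances share one. Next, how InternalThreadLocalMap implements setIndexedVariable: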

public boolean setIndexedVariable(int index, Object value) {
        Object[] lookup = indexedVariables;
        if (index < lookup.length) {
            Object oldValue = lookup[index];
            lookup[index] = value;
            return oldValue == UNSET;
        } else {
            expandIndexedVariableTableAndSet(index, value);
            return true;
        }
    }

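indexedVariables is an Object array that holds the values, and index is used directly as the array subscript. If index is not less than the array length, the array is expanded first. The get method uses the index of the FastThreadLocal to read the value directly: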

public final V get(InternalThreadLocalMap threadLocalMap) {
        Object v = threadLocalMap.indexedVariable(index);
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }

        return initialize(threadLocalMap);
    }
    
    public Object indexedVariable(int index) {
        Object[] lookup = indexedVariables;
        return index < lookup.length? lookup[index] : UNSET;
    }

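Reading straight from an array subscript is very fast. The drawback is wasted space: each thread's array must be long enough to cover the largest index that thread touches, even if the thread only uses a few FastThreadLocals.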
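The whole idea, including its space cost, fits in a short sketch. IndexedSlotsSketch, Var and Slots are hypothetical names, and the growth policy (round up to the next power of two above the index) is an assumption of this sketch rather than something taken from the Netty code shown above.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexedSlotsSketch {
    static final Object UNSET = new Object(); // marker for "no value stored"
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    /** Hypothetical per-variable handle: it only remembers its globally unique index. */
    static final class Var {
        final int index = NEXT_INDEX.getAndIncrement();
    }

    /** Hypothetical per-thread storage: one array addressed directly by Var.index. */
    static final class Slots {
        private Object[] table = new Object[8];

        Slots() {
            Arrays.fill(table, UNSET);
        }

        void set(Var v, Object value) {
            if (v.index >= table.length) {
                // Assumed policy: grow to the next power of two above the index.
                // (Overflow for indexes of 2^30 and above is ignored in this sketch.)
                int newLength = Integer.highestOneBit(v.index) << 1;
                Object[] bigger = Arrays.copyOf(table, newLength);
                Arrays.fill(bigger, table.length, newLength, UNSET);
                table = bigger;
            }
            table[v.index] = value; // no hashing, no probing
        }

        Object get(Var v) {
            return v.index < table.length ? table[v.index] : UNSET;
        }

        int capacity() {
            return table.length;
        }
    }

    public static void main(String[] args) {
        Var[] vars = new Var[1000];
        for (int i = 0; i < vars.length; i++) {
            vars[i] = new Var();
        }
        Slots slots = new Slots(); // storage for one thread
        Var last = vars[vars.length - 1];
        slots.set(last, "only value");
        System.out.println(slots.get(last) + ", capacity=" + slots.capacity());
    }
}
```

Only one value is stored, yet the table has to grow past index 999 to hold it. That is the space-for-time trade: reads and writes are a single array access, and the price is an array sized by the highest index in use rather than by the number of values.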

Summary

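From the analysis above, performance problems are only likely when a large number of ThreadLocals are being read and written. FastThreadLocal trades space for time to get O(1) reads. One open question remains: why doesn't the JDK simply use a HashMap (array plus linked lists and red-black trees) internally instead of ThreadLocalMap?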
