ReentrantLock的實現網上有不少文章了,本篇文章會簡單介紹下其java層實現,重點放在分析競爭鎖失敗後如何阻塞線程。
因篇幅有限,synchronized的內容將會放到下篇文章。java
ReentrantLock是jdk中經常使用的鎖實現,其實現邏輯主語基於AQS(juc包中的大多數同步類實現都是基於AQS);接下來會簡單介紹AQS的大體原理,關於其實現細節以及各類應用,以後會寫一篇文章具體分析。linux
AQS是類AbstractQueuedSynchronizer.java的簡稱,JUC包下的ReentrantLock、CyclicBarrier、CountdownLatch都使用到了AQS。git
其大體原理以下:github
其中tryAcquire方法是抽象方法,具體實現取決於實現類,咱們常說的公平鎖和非公平鎖的區別就在於該方法的實現。安全
ReentrantLock分爲公平鎖和非公平鎖,咱們只看公平鎖。
ReentrantLock.lock會調用到ReentrantLock#FairSync.lock中:bash
FairSync.java架構
static final class FairSync extends Sync {
final void lock() {
acquire(1);
}
/**
* Fair version of tryAcquire. Don't grant access unless * recursive call or no waiters or is first. */ protected final boolean tryAcquire(int acquires) { final Thread current = Thread.currentThread(); int c = getState(); if (c == 0) { if (!hasQueuedPredecessors() && compareAndSetState(0, acquires)) { setExclusiveOwnerThread(current); return true; } } else if (current == getExclusiveOwnerThread()) { int nextc = c + acquires; if (nextc < 0) throw new Error("Maximum lock count exceeded"); setState(nextc); return true; } return false; } } 複製代碼
AbstractQueuedSynchronizer.javaless
public final void acquire(int arg) {
if (!tryAcquire(arg) &&
acquireQueued(addWaiter(Node.EXCLUSIVE), arg))
selfInterrupt();
}
複製代碼
能夠看到FairSync.lock調用了AQS的acquire
方法,而在acquire
中首先調用tryAcquire
嘗試得到鎖,如下兩種狀況返回true:async
重入
若是tryAcquire
失敗則調用acquireQueued
阻塞當前線程。acquireQueued
最終會調用到LockSupport.park()
阻塞線程。函數
我的認爲,要深刻理解鎖機制,一個很重要的點是理解系統是如何阻塞線程的。
LockSupport.java
public static void park(Object blocker) {
Thread t = Thread.currentThread();
setBlocker(t, blocker);
UNSAFE.park(false, 0L);
setBlocker(t, null);
}
複製代碼
park
方法的參數blocker是用於負責此次阻塞的同步對象,在AQS的調用中,這個對象就是AQS自己。咱們知道synchronized關鍵字是須要指定一個對象的(若是做用於方法上則是當前對象或當前類),與之相似blocker就是LockSupport指定的對象。
park
方法調用了native方法UNSAFE.park
,第一個參數表明第二個參數是不是絕對時間,第二個參數表明最長阻塞時間。
其實現以下,只保留核心代碼,完整代碼看查看unsafe.cpp
Unsafe_Park(JNIEnv *env, jobject unsafe, jboolean isAbsolute, jlong time){
...
thread->parker()->park(isAbsolute != 0, time);
...
}
複製代碼
park方法在os_linux.cpp中(其餘操做系統的實如今os_xxx中)
void Parker::park(bool isAbsolute, jlong time) {
...
//得到當前線程
Thread* thread = Thread::current();
assert(thread->is_Java_thread(), "Must be JavaThread");
JavaThread *jt = (JavaThread *)thread;
//若是當前線程被設置了interrupted標記,則直接返回
if (Thread::is_interrupted(thread, false)) {
return;
}
if (time > 0) {
//unpacktime中根據isAbsolute的值來填充absTime結構體,isAbsolute爲true時,time表明絕對時間且單位是毫秒,不然time是相對時間且單位是納秒
//absTime.tvsec表明了對於時間的秒
//absTime.tv_nsec表明對應時間的納秒
unpackTime(&absTime, isAbsolute, time);
}
//調用mutex trylock方法
if (Thread::is_interrupted(thread, false) || pthread_mutex_trylock(_mutex) != 0) {
return;
}
//_counter是一個許可的數量,跟ReentrantLock裏定義的許可變量基本都是一個原理。 unpack方法調用時會將_counter賦值爲1。
//_counter>0表明已經有人調用了unpark,因此不用阻塞
int status ;
if (_counter > 0) { // no wait needed
_counter = 0;
//釋放mutex鎖
status = pthread_mutex_unlock(_mutex);
return;
}
//設置線程狀態爲CONDVAR_WAIT
OSThreadWaitState osts(thread->osthread(), false /* not Object.wait() */);
...
//等待
_cur_index = isAbsolute ? ABS_INDEX : REL_INDEX;
pthread_cond_timedwait(&_cond[_cur_index], _mutex, &absTime);
...
//釋放mutex鎖
status = pthread_mutex_unlock(_mutex) ;
}
複製代碼
park
方法用POSIX的pthread_cond_timedwait
方法阻塞線程,調用pthread_cond_timedwait
前須要先得到鎖,所以park
主要流程爲:
pthread_mutex_trylock
嘗試得到鎖,若是獲取鎖失敗則直接返回pthread_cond_timedwait
進行等待pthread_mutex_unlock
釋放鎖另外,在阻塞當前線程前,會調用OSThreadWaitState
的構造方法將線程狀態設置爲CONDVAR_WAIT
,在Jvm中Thread狀態枚舉以下
enum ThreadState {
ALLOCATED, // Memory has been allocated but not initialized
INITIALIZED, // The thread has been initialized but yet started
RUNNABLE, // Has been started and is runnable, but not necessarily running
MONITOR_WAIT, // Waiting on a contended monitor lock
CONDVAR_WAIT, // Waiting on a condition variable
OBJECT_WAIT, // Waiting on an Object.wait() call
BREAKPOINTED, // Suspended at breakpoint
SLEEPING, // Thread.sleep()
ZOMBIE // All done, but not reclaimed yet
};
複製代碼
由上文咱們能夠知道LockSupport.park方法最終是由POSIX的pthread_cond_timedwait
的方法實現的。
咱們如今就進一步看看pthread_mutex_trylock
,pthread_cond_timedwait
,pthread_mutex_unlock
這幾個方法是如何實現的。
Linux系統中相關代碼在glibc庫中。
先看trylock的實現,
代碼在glibc的pthread_mutex_trylock.c
文件中,該方法代碼不少,咱們只看主要代碼
//pthread_mutex_t是posix中的互斥鎖結構體
int
__pthread_mutex_trylock (mutex)
pthread_mutex_t *mutex;
{
int oldval;
pid_t id = THREAD_GETMEM (THREAD_SELF, tid);
switch (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex),
PTHREAD_MUTEX_TIMED_NP))
{
case PTHREAD_MUTEX_ERRORCHECK_NP:
case PTHREAD_MUTEX_TIMED_NP:
case PTHREAD_MUTEX_ADAPTIVE_NP:
/* Normal mutex. */
if (lll_trylock (mutex->__data.__lock) != 0)
break;
/* Record the ownership. */
mutex->__data.__owner = id;
++mutex->__data.__nusers;
return 0;
}
}
//如下代碼在lowlevellock.h中
#define __lll_trylock(futex) \
(atomic_compare_and_exchange_val_acq (futex, 1, 0) != 0)
#define lll_trylock(futex) __lll_trylock (&(futex))
複製代碼
mutex默認用的是PTHREAD_MUTEX_NORMAL
類型(與PTHREAD_MUTEX_TIMED_NP
相同);
所以會先調用lll_trylock
方法,lll_trylock
其實是一個cas操做,若是mutex->__data.__lock==0則將其修改成1並返回0,不然返回1。
若是成功,則更改mutex中的owner爲當前線程。
pthread_mutex_unlock.c
int
internal_function attribute_hidden
__pthread_mutex_unlock_usercnt (mutex, decr)
pthread_mutex_t *mutex;
int decr;
{
if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP)
== PTHREAD_MUTEX_TIMED_NP)
{
/* Always reset the owner field. */
normal:
mutex->__data.__owner = 0;
if (decr)
/* One less user. */
--mutex->__data.__nusers;
/* Unlock. */
lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex));
return 0;
}
}
複製代碼
pthread_mutex_unlock
將mutex中的owner清空,並調用了lll_unlock
方法
lowlevellock.h
#define __lll_unlock(futex, private) \
((void) ({ \
int *__futex = (futex); \
int __val = atomic_exchange_rel (__futex, 0); \
\
if (__builtin_expect (__val > 1, 0)) \
lll_futex_wake (__futex, 1, private); \
}))
#define lll_unlock(futex, private) __lll_unlock(&(futex), private)
#define lll_futex_wake(ftx, nr, private) \
({ \
DO_INLINE_SYSCALL(futex, 3, (long) (ftx), \
__lll_private_flag (FUTEX_WAKE, private), \
(int) (nr)); \
_r10 == -1 ? -_retval : _retval; \
})
複製代碼
lll_unlock
分爲兩個步驟:
FUTEX_WAIT
在休眠,因此經過調用系統函數FUTEX_WAKE
喚醒休眠線程FUTEX_WAKE
在上一篇文章有分析,futex機制的核心是當得到鎖時,嘗試cas更改一個int型變量(用戶態操做),若是integer原始值是0,則修改爲功,該線程得到鎖,不然就將當期線程放入到 wait queue中,wait queue中的線程不會被系統調度(內核態操做)。
futex變量的值有3種:0表明當前鎖空閒,1表明有線程持有當前鎖,2表明存在鎖衝突。futex的值初始化時是0;當調用try_lock的時候會利用cas操做改成1(見上面的trylock函數);當調用lll_lock
時,若是不存在鎖衝突,則將其改成1,不然改成2。
#define __lll_lock(futex, private) \
((void) ({ \
int *__futex = (futex); \
if (__builtin_expect (atomic_compare_and_exchange_bool_acq (__futex, \
1, 0), 0)) \
{ \
if (__builtin_constant_p (private) && (private) == LLL_PRIVATE) \
__lll_lock_wait_private (__futex); \
else \
__lll_lock_wait (__futex, private); \
} \
}))
#define lll_lock(futex, private) __lll_lock (&(futex), private)
void
__lll_lock_wait_private (int *futex)
{
//第一次進來的時候futex==1,因此不會走這個if
if (*futex == 2)
lll_futex_wait (futex, 2, LLL_PRIVATE);
//在這裏會把futex設置成2,並調用futex_wait讓當前線程等待
while (atomic_exchange_acq (futex, 2) != 0)
lll_futex_wait (futex, 2, LLL_PRIVATE);
}
複製代碼
pthread_cond_timedwait
用於阻塞線程,實現線程等待,
代碼在glibc的pthread_cond_timedwait.c
文件中,代碼較長,你能夠先簡單過一遍,看完下面的分析再從新讀一遍代碼
int
int
__pthread_cond_timedwait (cond, mutex, abstime)
pthread_cond_t *cond;
pthread_mutex_t *mutex;
const struct timespec *abstime;
{
struct _pthread_cleanup_buffer buffer;
struct _condvar_cleanup_buffer cbuffer;
int result = 0;
/* Catch invalid parameters. */
if (abstime->tv_nsec < 0 || abstime->tv_nsec >= 1000000000)
return EINVAL;
int pshared = (cond->__data.__mutex == (void *) ~0l)
? LLL_SHARED : LLL_PRIVATE;
//1.得到cond鎖
lll_lock (cond->__data.__lock, pshared);
//2.釋放mutex鎖
int err = __pthread_mutex_unlock_usercnt (mutex, 0);
if (err)
{
lll_unlock (cond->__data.__lock, pshared);
return err;
}
/* We have one new user of the condvar. */
//每執行一次wait(pthread_cond_timedwait/pthread_cond_wait),__total_seq就會+1
++cond->__data.__total_seq;
//用來執行futex_wait的變量
++cond->__data.__futex;
//標識該cond還有多少線程在使用,pthread_cond_destroy須要等待全部的操做完成
cond->__data.__nwaiters += 1 << COND_NWAITERS_SHIFT;
/* Remember the mutex we are using here. If there is already a
different address store this is a bad user bug. Do not store
anything for pshared condvars. */
//保存mutex鎖
if (cond->__data.__mutex != (void *) ~0l)
cond->__data.__mutex = mutex;
/* Prepare structure passed to cancellation handler. */
cbuffer.cond = cond;
cbuffer.mutex = mutex;
/* Before we block we enable cancellation. Therefore we have to
install a cancellation handler. */
__pthread_cleanup_push (&buffer, __condvar_cleanup, &cbuffer);
/* The current values of the wakeup counter. The "woken" counter
must exceed this value. */
//記錄futex_wait前的__wakeup_seq(爲該cond上執行了多少次sign操做+timeout次數)和__broadcast_seq(表明在該cond上執行了多少次broadcast)
unsigned long long int val;
unsigned long long int seq;
val = seq = cond->__data.__wakeup_seq;
/* Remember the broadcast counter. */
cbuffer.bc_seq = cond->__data.__broadcast_seq;
while (1)
{
//3.計算要wait的相對時間
struct timespec rt;
{
#ifdef __NR_clock_gettime
INTERNAL_SYSCALL_DECL (err);
int ret;
ret = INTERNAL_VSYSCALL (clock_gettime, err, 2,
(cond->__data.__nwaiters
& ((1 << COND_NWAITERS_SHIFT) - 1)),
&rt);
# ifndef __ASSUME_POSIX_TIMERS
if (__builtin_expect (INTERNAL_SYSCALL_ERROR_P (ret, err), 0))
{
struct timeval tv;
(void) gettimeofday (&tv, NULL);
/* Convert the absolute timeout value to a relative timeout. */
rt.tv_sec = abstime->tv_sec - tv.tv_sec;
rt.tv_nsec = abstime->tv_nsec - tv.tv_usec * 1000;
}
else
# endif
{
/* Convert the absolute timeout value to a relative timeout. */
rt.tv_sec = abstime->tv_sec - rt.tv_sec;
rt.tv_nsec = abstime->tv_nsec - rt.tv_nsec;
}
#else
/* Get the current time. So far we support only one clock. */
struct timeval tv;
(void) gettimeofday (&tv, NULL);
/* Convert the absolute timeout value to a relative timeout. */
rt.tv_sec = abstime->tv_sec - tv.tv_sec;
rt.tv_nsec = abstime->tv_nsec - tv.tv_usec * 1000;
#endif
}
if (rt.tv_nsec < 0)
{
rt.tv_nsec += 1000000000;
--rt.tv_sec;
}
/*---計算要wait的相對時間 end---- */
//是否超時
/* Did we already time out? */
if (__builtin_expect (rt.tv_sec < 0, 0))
{
//被broadcast喚醒,這裏疑問的是,爲何不須要判斷__wakeup_seq?
if (cbuffer.bc_seq != cond->__data.__broadcast_seq)
goto bc_out;
goto timeout;
}
unsigned int futex_val = cond->__data.__futex;
//4.釋放cond鎖,準備wait
lll_unlock (cond->__data.__lock, pshared);
/* Enable asynchronous cancellation. Required by the standard. */
cbuffer.oldtype = __pthread_enable_asynccancel ();
//5.調用futex_wait
/* Wait until woken by signal or broadcast. */
err = lll_futex_timed_wait (&cond->__data.__futex,
futex_val, &rt, pshared);
/* Disable asynchronous cancellation. */
__pthread_disable_asynccancel (cbuffer.oldtype);
//6.從新得到cond鎖,由於又要訪問&修改cond的數據了
lll_lock (cond->__data.__lock, pshared);
//__broadcast_seq值發生改變,表明發生了有線程調用了廣播
if (cbuffer.bc_seq != cond->__data.__broadcast_seq)
goto bc_out;
//判斷是不是被sign喚醒的,sign會增長__wakeup_seq
//第二個條件cond->__data.__woken_seq != val的意義在於
//可能兩個線程A、B在wait,一個線程調用了sign致使A被喚醒,這時B由於超時被喚醒
//對於B線程來講,執行到這裏時第一個條件也是知足的,從而致使上層拿到的result不是超時
//因此這裏須要判斷下__woken_seq(即該cond已經被喚醒的線程數)是否等於__wakeup_seq(sign執行次數+timeout次數)
val = cond->__data.__wakeup_seq;
if (val != seq && cond->__data.__woken_seq != val)
break;
/* Not woken yet. Maybe the time expired? */
if (__builtin_expect (err == -ETIMEDOUT, 0))
{
timeout:
/* Yep. Adjust the counters. */
++cond->__data.__wakeup_seq;
++cond->__data.__futex;
/* The error value. */
result = ETIMEDOUT;
break;
}
}
//一個線程已經醒了因此這裏__woken_seq +1
++cond->__data.__woken_seq;
bc_out:
//
cond->__data.__nwaiters -= 1 << COND_NWAITERS_SHIFT;
/* If pthread_cond_destroy was called on this variable already,
notify the pthread_cond_destroy caller all waiters have left
and it can be successfully destroyed. */
if (cond->__data.__total_seq == -1ULL
&& cond->__data.__nwaiters < (1 << COND_NWAITERS_SHIFT))
lll_futex_wake (&cond->__data.__nwaiters, 1, pshared);
//9.cond數據修改完畢,釋放鎖
lll_unlock (cond->__data.__lock, pshared);
/* The cancellation handling is back to normal, remove the handler. */
__pthread_cleanup_pop (&buffer, 0);
//10.從新得到mutex鎖
err = __pthread_mutex_cond_lock (mutex);
return err ?: result;
}
複製代碼
上面的代碼雖然加了註釋,但相信大多數人第一次看都看不懂。
咱們來簡單梳理下,上面代碼有兩把鎖,一把是mutex鎖,一把cond鎖。另外,在調用pthread_cond_timedwait
先後必須調用pthread_mutex_lock(&mutex);
和pthread_mutex_unlock(&mutex);
加/解mutex鎖。
所以pthread_cond_timedwait
的使用大體分爲幾個流程:
pthread_cond_timedwait
調用前)pthread_cond_timedwait
調用後)看到這裏,你可能有幾點疑問:爲何須要兩把鎖?mutex鎖和cond鎖的做用是什麼?
說mutex鎖的做用以前,咱們回顧一下java的Object.wait的使用。Object.wait必須是在synchronized同步塊中使用。試想下若是不加synchronized也能運行Object.wait的話會存在什麼問題?
Object condObj=new Object();
voilate int flag = 0;
public void waitTest(){
if(flag == 0){
condObj.wait();
}
}
public void notifyTest(){
flag=1;
condObj.notify();
}
複製代碼
如上代碼,A線程調用waitTest,這時flag==0,因此準備調用wait方法進行休眠,這時B線程開始執行,調用notifyTest將flag置爲1,並調用notify方法,注意:此時A線程還沒調用wait,因此notfiy沒有喚醒任何線程。而後A線程繼續執行,調用wait方法進行休眠,而以後不會有人來喚醒A線程,A線程將永久wait下去!
Object condObj=new Object();
voilate int flag = 0;
public void waitTest(){
synchronized(condObj){
if(flag == 0){
condObj.wait();
}
}
}
public void notifyTest(){
synchronized(condObj){
flag=1;
condObj.notify();
}
}
複製代碼
在有鎖保護下的狀況下, 當調用condObj.wait時,flag必定是等於0的,不會存在一直wait的問題。
回到pthread_cond_timedwait
,其須要加mutex鎖的緣由就呼之欲出了:保證wait和其wait條件的原子性
不論是glibc的pthread_cond_timedwait
/pthread_cond_signal
仍是java層的Object.wait
/Object.notify
,Jdk AQS的Condition.await
/Condition.signal
,全部的Condition機制都須要在加鎖環境下才能使用,其根本緣由就是要保證進行線程休眠時,條件變量是沒有被篡改的。
注意下mutex鎖釋放的時機,回顧上文中pthread_cond_timedwait
的流程,在第2步時就釋放了mutex鎖,以後調用futex_wait
進行休眠,爲何要在休眠前就釋放mutex鎖呢?緣由也很簡單:若是不釋放mutex鎖就開始休眠,那其餘線程就永遠沒法調用signal方法將休眠線程喚醒(由於調用signal方法前須要得到mutex鎖)。
在線程被喚醒以後還要在第10步中從新得到mutex鎖是爲了保證鎖的語義(思考下若是不從新得到mutex鎖會發生什麼)。
cond鎖的做用其實很簡單: 保證對象cond->data
的線程安全。
在pthread_cond_timedwait
時須要修改cond->data
的數據,如增長__total_seq(在這個cond上一共執行過多少次wait)增長__nwaiters(如今還有多少個線程在wait這個cond),全部在修改及訪問cond->data
時須要加cond鎖。
這裏我沒想明白的一點是,用mutex鎖也能保證cond->data
修改的線程安全,只要晚一點釋放mutex鎖就好了。爲何要先釋放mutex,從新得到cond來保證線程安全? 是爲了不mutex鎖住的範圍太大嗎?
該問題的答案能夠見評論區@11800222 的回答:
mutex鎖不能保護cond->data修改的線程安全,調用signal的線程沒有用mutex鎖保護修改cond的那段臨界區。
pthread_cond_wait/signal這一對自己用cond鎖同步就能睡眠喚醒。
wait的時候須要傳入mutex是由於睡眠前須要釋放mutex鎖,但睡眠以前又不能有無鎖的空隙,解決辦法是讓mutex鎖在cond鎖上以後再釋放。
而signal前不須要釋放mutex鎖,在持有mutex的狀況下signal,以後再釋放mutex鎖。
喚醒休眠線程的代碼比較簡單,主要就是調用lll_futex_wake。
int
__pthread_cond_signal (cond)
pthread_cond_t *cond;
{
int pshared = (cond->__data.__mutex == (void *) ~0l)
? LLL_SHARED : LLL_PRIVATE;
//由於要操做cond的數據,因此要加鎖
lll_lock (cond->__data.__lock, pshared);
/* Are there any waiters to be woken? */
if (cond->__data.__total_seq > cond->__data.__wakeup_seq)
{
//__wakeup_seq爲執行sign與timeout次數的和
++cond->__data.__wakeup_seq;
++cond->__data.__futex;
...
//喚醒wait的線程
lll_futex_wake (&cond->__data.__futex, 1, pshared);
}
/* We are done. */
lll_unlock (cond->__data.__lock, pshared);
return 0;
}
複製代碼
本文對Java簡單介紹了ReentrantLock實現原理,對LockSupport.park底層實現pthread_cond_timedwait
機制作了詳細分析。
看完這篇文章,你可能還會有疑問:Synchronized鎖的實現和ReentrantLock是同樣的嗎?Thread.sleep/Object.wait休眠線程的原理和LockSupport.park有什麼區別?linux內核層的futex的具體是如何實現的?
原文:Java架構筆記