SpinLocks in Postgres

Date: 2019-11-11

Tags: postgres, spinlock

As we know, a database cannot do concurrency control without various kinds of locks, and PostgreSQL is no exception.

PostgreSQL has three levels of locks, related as follows:

|Upper layer  RegularLock
  |
  |             LWLock
  |
  |Lower layer  SpinLock

Going in order, we start with PostgreSQL's lowest-level lock, the SpinLock.

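As PostgreSQL's lowest-level lock, the SpinLock is fairly simple. It is held only for very short periods, it has no wait queue and no deadlock detection, and it is not released automatically when a transaction ends. For these reasons a SpinLock is generally not used on its own; it serves as the underlying building block for other locks (LWLock).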

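Being the lowest-level lock, its implementation depends on the operating system and the hardware. PostgreSQL therefore provides two SpinLock implementations: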

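A machine-dependent implementation built on the TAS instruction (defined in s_lock.h and s_lock.c);
A machine-independent implementation built on PostgreSQL's own semaphore, PGSemaphore (defined in spin.c).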

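Clearly, the machine-dependent SpinLock is faster than the machine-independent one. So if the machine PostgreSQL runs on supports a TAS instruction, the first implementation is used; otherwise only the second is available.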

How a SpinLock operates is shown in the following figure:

[Figure: SpinLock operation]

Machine-dependent implementation

We know that the machine-dependent implementation relies on the TAS instruction. So what is TAS?

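TAS stands for Test and Set. It is an atomic operation that writes a new value to a memory location and returns the old value. While one process P1 is performing a TAS on a memory location, no other process P2 may perform a TAS on that same location; P2 has to wait until P1's operation completes. This property is what makes TAS usable for mutual exclusion between processes.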
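The TAS semantics described above can be sketched with the C11 `atomic_flag` type. This is only an illustration of the concept, not PostgreSQL's code: the names `demo_lock_t`, `demo_tas`, and `demo_unlock` are hypothetical, and the return convention (0 on success) is chosen to match the TAS macros shown later in this article.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock word for this sketch: clear = free, set = held. */
typedef atomic_flag demo_lock_t;

/*
 * Atomically set the flag and report what it held before.
 * Returns 0 if the lock was free (we now hold it),
 * and 1 if somebody else already held it.
 */
static int demo_tas(demo_lock_t *lock)
{
    assert(lock != NULL);
    return atomic_flag_test_and_set(lock) ? 1 : 0;
}

/* Release the lock by clearing the flag. */
static void demo_unlock(demo_lock_t *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(lock);
}
```

A caller that gets 0 back owns the lock; a caller that gets 1 back has changed nothing and must retry later.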

With that concept in place, let's look at the source code.

The code lives in:

src/include/storage/s_lock.h
src/backend/storage/lmgr/s_lock.c

Although the SpinLock has two underlying implementations, callers in the upper layers use a single interface, found in src/backend/storage/lmgr/s_lock.c:

/*
 * s_lock(lock) - platform-independent portion of waiting for a spinlock.
 */
int
s_lock(volatile slock_t *lock, const char *file, int line, const char *func)
{
    ...
   
    while (TAS_SPIN(lock))   /* call site */
    {
    
    ... 

}

TAS_SPIN(lock) turns out to be a macro:

#define TAS_SPIN(lock)  TAS(lock)

When the TAS-instruction-based lock is in use, we have:

#define TAS(lock) tas(lock)

The machine's TAS instruction is actually used inside the function tas().

static __inline__ int
tas(volatile slock_t *lock)
{
    register slock_t _res = 1;

    /*
     * Use a non-locking test before asserting the bus lock.  Note that the
     * extra test appears to be a small loss on some x86 platforms and a small
     * win on others; it's by no means clear that we should keep it.
     *
     * When this was last tested, we didn't have separate TAS() and TAS_SPIN()
     * macros.  Nowadays it probably would be better to do a non-locking test
     * in TAS_SPIN() but not in TAS(), like on x86_64, but no-one's done the
     * testing to verify that.  Without some empirical evidence, better to
     * leave it alone.
     */
    __asm__ __volatile__(
        "   cmpb    $0,%1   \n"
        "   jne     1f      \n"
        "   lock            \n"
        "   xchgb   %0,%1   \n"
        "1: \n"
:       "+q"(_res), "+m"(*lock)
:       /* no inputs */
:       "memory", "cc");
    return (int) _res;
}

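This inline assembly embedded in C is where the machine-level TAS is performed (here, an xchgb exchange under a lock prefix). Suppose the lock initially holds 0. When P1 requests the lock, it obtains it. If P2 then requests the lock, it has to spin, because P1 has already changed the lock's value to 1.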
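For readers who do not read x86 assembly, the two steps of tas() above (a plain test first, then the locked exchange) can be approximated in portable C11 atomics. This is a rough sketch for illustration, not a drop-in replacement: `demo_slock_t`, `demo_tas_ttas`, and `demo_release` are hypothetical names, and the memory orderings are a conservative assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for slock_t: 0 = free, 1 = held. */
typedef atomic_uchar demo_slock_t;

/* Returns 0 if the lock was acquired, nonzero if it is already held. */
static int demo_tas_ttas(demo_slock_t *lock)
{
    assert(lock != NULL);

    /*
     * Step 1 (the cmpb/jne pair): a plain read. If the lock already looks
     * held, report failure without issuing the expensive locked exchange.
     */
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;

    /*
     * Step 2 (the lock xchgb pair): atomically store 1 and get the old
     * value back. An old value of 0 means we took the lock.
     */
    return (int) atomic_exchange(lock, 1);
}

/* Release the lock by storing 0. */
static void demo_release(demo_slock_t *lock)
{
    assert(lock != NULL);
    atomic_store(lock, 0);
}
```

The plain read in step 1 is an optimization only; correctness rests entirely on the atomic exchange in step 2, because another process may take the lock between the two steps.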

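When implementing a spin lock with TAS, note the use of volatile. volatile tells the compiler that the variable may change outside its view, so the compiler reloads it from memory on every access instead of reusing a value held in a register.

This avoids the problem where several threads of execution update the same variable and the copy in a register falls out of sync with memory, leaving a stale value in use. It also constrains the compiler's optimizations. Note that volatile by itself does not make an operation atomic; the atomicity comes from the locked exchange instruction.

For a detailed walkthrough of the assembly, consult the relevant references.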

In practice, PostgreSQL does not call tas() directly. Instead it goes through:

int s_lock(volatile slock_t *lock, const char *file, int line, const char *func);

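to request a spin lock. The return value reflects how long the caller had to wait (in the PostgreSQL source it is the number of delays performed while waiting).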

Machine-independent implementation

If the machine has no TAS instruction, PostgreSQL implements the SpinLock with PGSemaphores.

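PGSemaphore is built on the operating system's native semaphores. PG wraps them to provide a uniform semaphore interface for use inside the system. PG represents its own semaphore with the PGSemaphore structure and, in the System V case, packs the requested operation into a sembuf that is passed down to the OS.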

The implementation is in:

src/backend/storage/lmgr/spin.c

As we saw, TAS_SPIN(lock) is the abstract definition of the SpinLock:

#define TAS_SPIN(lock)  TAS(lock)

When TAS is not used, we have:

#define TAS(lock)   tas_sema(lock)

That is, the SpinLock is implemented by calling tas_sema(lock):

int
tas_sema(volatile slock_t *lock)
{
    /* Note that TAS macros return 0 if *success* */
    return !PGSemaphoreTryLock(&SpinlockSemaArray[*lock]);
}

For semaphores, PostgreSQL has separate implementations for POSIX semaphores, System V semaphores, and Windows semaphores. The code is in, respectively:

src/backend/port/posix_sema.c
src/backend/port/sysv_sema.c
src/backend/port/win32_sema.c

Here we use System V semaphores as the example.

PGSemaphoreTryLock is defined as:

bool
PGSemaphoreTryLock(PGSemaphore sema)
{
    int         errStatus;
    struct sembuf sops;    /* important: the operation passed to semop() */

    sops.sem_op = -1;           /* decrement */
    sops.sem_flg = IPC_NOWAIT;  /* but don't block */
    sops.sem_num = sema->semNum;

    /*
     * Note: if errStatus is -1 and errno == EINTR then it means we returned
     * from the operation prematurely because we were sent a signal.  So we
     * try and lock the semaphore again.
     */
    do
    {
        errStatus = semop(sema->semId, &sops, 1);
    } while (errStatus < 0 && errno == EINTR);
    
    ...

In other words, the SpinLock is implemented by calling into PGSemaphores.

The PGSemaphore data structure is defined as:

typedef struct PGSemaphoreData
{
    int         semId;          /* semaphore set identifier */
    int         semNum;         /* semaphore number within set */
} PGSemaphoreData;

When System V semaphores are used, we have:

struct sembuf
{
    unsigned short int sem_num; /* semaphore number */
    short int sem_op;           /* semaphore operation */
    short int sem_flg;          /* operation flag */
};

The do ... while loop in PGSemaphoreTryLock is what executes the semop operation.
semop is provided by the OS (declared in the <sys/sem.h> header):

extern int semop(int __semid, struct sembuf *opsptr, size_t nops);

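Clearly, PostgreSQL here wraps the OS-level System V semaphore and then operates on it through the OS's own system functions.

The other two kinds of semaphore work in much the same way, so we will not go into them here.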
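The System V path described above can be tried outside PostgreSQL with a small sketch. It follows the same pattern as PGSemaphoreTryLock and tas_sema (a non-blocking decrement, retried on EINTR, with the result inverted so that 0 means success), but every `demo_` name is hypothetical, and the private one-element semaphore set is an assumption made only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Argument for semctl(); named demo_semun so it compiles whether or not
 * <sys/sem.h> already defines "union semun" on this platform. */
union demo_semun {
    int              val;
    struct semid_ds *buf;
    unsigned short  *array;
};

/* Create a private one-element set with value 1 (unlocked); -1 on error. */
static int demo_sem_create(void)
{
    union demo_semun arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id < 0)
        return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Non-blocking decrement: returns 1 if we got the semaphore, 0 if not. */
static int demo_sem_trylock(int id)
{
    struct sembuf sops;
    int rc;

    sops.sem_num = 0;
    sops.sem_op = -1;            /* decrement */
    sops.sem_flg = IPC_NOWAIT;   /* but don't block */
    do {
        rc = semop(id, &sops, 1);
    } while (rc < 0 && errno == EINTR);   /* interrupted: try again */
    return rc == 0;              /* EAGAIN lands here as "not acquired" */
}

/* Increment the semaphore again, releasing it. */
static void demo_sem_unlock(int id)
{
    struct sembuf sops;

    sops.sem_num = 0;
    sops.sem_op = 1;
    sops.sem_flg = 0;
    semop(id, &sops, 1);
}

/* TAS convention: 0 means the lock was acquired. */
static int demo_tas_sema(int id)
{
    assert(id >= 0);
    return !demo_sem_trylock(id);
}

/* Remove the semaphore set from the system. */
static void demo_sem_destroy(int id)
{
    semctl(id, 0, IPC_RMID);
}
```

Because every attempt is a system call, this path costs far more than a single TAS instruction, which is consistent with the earlier remark that the machine-dependent implementation is the faster one.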

Common operations

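The two SpinLock variants differ in how they are implemented, but some operations on top of them are shared and worth describing. Acquiring a SpinLock does not always succeed on the first attempt. When the caller finds the lock already held, it does not block on the lock. Instead it spins a certain number of times, sleeps for a while, and then tries again. PostgreSQL uses an adaptive algorithm to decide how many times to spin and how long to sleep after each round of spinning.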
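The spin-then-sleep pattern just described can be sketched as follows. This is a simplified illustration, not PostgreSQL's algorithm: all constants and `demo_` names are assumptions, the delay simply doubles up to a cap, and the sleep is injected as a callback so the sketch can be exercised without real waiting (`demo_fake_sleep` stands in for the other lock holder).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative values only; these are not PostgreSQL's settings. */
#define DEMO_SPINS_PER_DELAY 100
#define DEMO_MIN_DELAY_US    1000L
#define DEMO_MAX_DELAY_US    8000L
#define DEMO_MAX_DELAYS      5

typedef void (*demo_sleep_fn)(long usec, void *ctx);

/* Double the delay after every sleep, capped at the maximum. */
static long demo_next_delay(long cur_us)
{
    long next = cur_us * 2;
    return next > DEMO_MAX_DELAY_US ? DEMO_MAX_DELAY_US : next;
}

/*
 * Try to take the lock. Spin up to DEMO_SPINS_PER_DELAY times, then sleep,
 * then spin again. Returns the number of sleeps that were needed, or -1 if
 * the lock was still held after DEMO_MAX_DELAYS sleeps.
 */
static int demo_spin_acquire(atomic_flag *lock, demo_sleep_fn do_sleep, void *ctx)
{
    int  spins = 0;
    int  delays = 0;
    long delay_us = DEMO_MIN_DELAY_US;

    assert(lock != NULL && do_sleep != NULL);
    while (atomic_flag_test_and_set(lock)) {
        if (++spins < DEMO_SPINS_PER_DELAY)
            continue;                 /* keep spinning */
        if (delays >= DEMO_MAX_DELAYS)
            return -1;                /* give up in this sketch */
        do_sleep(delay_us, ctx);      /* back off before the next round */
        delays++;
        delay_us = demo_next_delay(delay_us);
        spins = 0;
    }
    return delays;
}

/* Test double: records sleep time and simulates the holder releasing the
 * lock after release_after sleeps. */
struct demo_holder {
    atomic_flag *lock;
    int          release_after;
    long         slept_us;
};

static void demo_fake_sleep(long usec, void *ctx)
{
    struct demo_holder *h = ctx;

    h->slept_us += usec;
    if (--h->release_after == 0)
        atomic_flag_clear(h->lock);
}
```

In a real program the callback would wrap an actual sleep call; keeping it injectable is purely a convenience of this sketch.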

The following two variables deserve attention: