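This article was written by a senior database architect at the SequoiaDB North America lab. It introduces the design and implementation of SequoiaDB's concurrent malloc. The original was written in English; a Chinese translation follows the English text.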
SequoiaDB Concurrent malloc Implementation
Introduction
In a C/C++ application, the dynamic memory allocation function malloc(3) can have a significant impact on performance. For multi-threaded applications such as a database engine, a sub-optimal memory allocator can also limit scalability. In this paper, we discuss several popular dynamic memory allocators and how SequoiaDB addresses the dynamic memory allocation problem in its database engine.
dlmalloc/ptmalloc
The GNU C library (glibc) uses ptmalloc, an allocator forked from dlmalloc with thread-related improvements. Memory is allocated in chunks, which are 8-byte-aligned data structures containing a header and the usable memory. This means each chunk carries at least 8 or 16 bytes of management overhead. Unallocated memory is grouped by similar size, and each group is maintained as a doubly linked list of chunks.
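To make the per-chunk overhead concrete, here is a minimal sketch of a chunk with a size header, assuming a single `size_t` header field and 8-byte alignment. The names `chunk_header`, `CHUNK_ALIGN` and `chunk_size_for` are hypothetical and are not taken from glibc; the real ptmalloc layout carries more state than this.

```c
#include <stddef.h>

/* Hypothetical, simplified chunk header. Real ptmalloc stores more state. */
typedef struct chunk_header {
    size_t size;   /* total chunk size in bytes, header included */
} chunk_header;

#define CHUNK_ALIGN 8u  /* alignment taken from the description above */

/* Total bytes a chunk occupies for a request of n usable bytes:
   header plus payload, rounded up to the next multiple of CHUNK_ALIGN. */
static size_t chunk_size_for(size_t n)
{
    size_t raw = sizeof(chunk_header) + n;
    return (raw + CHUNK_ALIGN - 1) & ~(size_t)(CHUNK_ALIGN - 1);
}
```

Under these assumptions, even a 1-byte request consumes a full aligned chunk, which is why small allocations are relatively expensive in space.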
jemalloc
Originally developed by Jason Evans in 2005, jemalloc has since been adopted by FreeBSD, Facebook, Mozilla Firefox, MariaDB, Android and others. jemalloc is a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. To avoid lock contention, jemalloc uses separate memory pools, called "arenas", for each CPU, and each thread is assigned to an arena that handles its malloc requests.
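The idea of assigning threads to arenas can be sketched as follows. This is only an illustration under stated assumptions, not jemalloc's actual code: `NUM_ARENAS`, `next_arena` and `arena_for_this_thread` are hypothetical names, and round-robin assignment is an assumed policy.

```c
#include <stdatomic.h>

#define NUM_ARENAS 4  /* assumption: stands in for one arena per CPU */

static atomic_uint next_arena;            /* shared round-robin counter */
static _Thread_local int my_arena = -1;   /* -1 means not yet assigned */

/* Return the arena index for the calling thread, assigning one on first use.
   After the first call, the lookup touches only thread-local state. */
static int arena_for_this_thread(void)
{
    if (my_arena < 0)
        my_arena = (int)(atomic_fetch_add(&next_arena, 1) % NUM_ARENAS);
    return my_arena;
}
```

Because threads are spread across arenas, two threads contend for the same lock only when they happen to share an arena.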
tcmalloc
TCMalloc is a malloc implementation developed by Google. It reduces lock contention in multi-threaded programs by using thread-local storage for small allocations. For large allocations, mmap or sbrk can be used together with fine-grained, efficient spinlocks. It also garbage-collects the local storage of dead threads. For small objects, TCMalloc is very space-efficient: it requires only about one percent space overhead for 8-byte objects.
We ran a test to compare the performance of jemalloc and tcmalloc. The test runs 500 iterations, each performing 1000 memory allocations and then freeing those 1000 blocks. In this test, the two allocators showed very similar performance.
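A test of this shape could look like the sketch below. The original test code is not given, so the function name `bench_malloc_free`, the block size, and the use of `clock()` (CPU time) are all assumptions. To compare allocators, the same program would be run once per allocator, for example by linking against each library in turn.

```c
#include <stdlib.h>
#include <time.h>

/* Run `iterations` rounds. Each round allocates `count` blocks of `size`
   bytes and then frees them all. Returns elapsed CPU seconds, or -1.0 if
   an allocation fails. */
static double bench_malloc_free(int iterations, int count, size_t size)
{
    void **blocks = malloc((size_t)count * sizeof *blocks);
    if (blocks == NULL)
        return -1.0;

    clock_t start = clock();
    for (int it = 0; it < iterations; it++) {
        for (int i = 0; i < count; i++) {
            blocks[i] = malloc(size);
            if (blocks[i] == NULL) {
                /* Undo this round's allocations before bailing out. */
                while (i > 0)
                    free(blocks[--i]);
                free(blocks);
                return -1.0;
            }
        }
        for (int i = 0; i < count; i++)
            free(blocks[i]);
    }
    clock_t end = clock();

    free(blocks);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `bench_malloc_free(500, 1000, 64)` matches the iteration and allocation counts described above; the 64-byte block size is an arbitrary choice for illustration.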
SequoiaDB Implementation
SequoiaDB 3.4 implements its own proprietary memory allocator, which is highly efficient and tailored to memory usage within the SequoiaDB database engine. While jemalloc and tcmalloc are both excellent general-purpose memory allocators, they cannot address all the challenges encountered within SequoiaDB. For example, the ability to trace memory requests is an important requirement in the SequoiaDB engine, and this feature is lacking in existing third-party memory allocators. Figure 2 shows the architecture of the SequoiaDB memory model. There are three layers: thread, pool and OSS (Operating System Services).
OSS Layer
The OSS layer provides a malloc API that requests memory from the underlying operating system. This is also where the pool layer obtains its memory.
Pool Layer
The pool layer is a global memory pool that contains segments of different sizes. A segment is a contiguous memory block allocated from the OSS layer. Each segment is divided into fixed-size chunks. By default, the chunk sizes are 32, 64, 128, … 8092 bytes. Requests above the 8092-byte maximum chunk-size threshold are serviced by the OSS layer.
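Mapping a request to a chunk size can be sketched as below. This is a sketch under assumptions rather than SequoiaDB's code: the function name `chunk_size_class` is hypothetical, the intermediate sizes elided by "…" are assumed to double at each step, and the 8092-byte maximum is used exactly as stated in the text.

```c
#include <stddef.h>

#define MIN_CHUNK 32u
#define MAX_CHUNK 8092u  /* largest chunk size as stated in the text */

/* Return the chunk size used to serve a request of n bytes, or 0 when the
   request exceeds MAX_CHUNK and has to be passed to the OSS layer.
   Assumption: sizes double from MIN_CHUNK and are capped at MAX_CHUNK. */
static size_t chunk_size_class(size_t n)
{
    if (n > MAX_CHUNK)
        return 0;

    size_t c = MIN_CHUNK;
    while (c < n && c < MAX_CHUNK)
        c *= 2;
    return c < MAX_CHUNK ? c : MAX_CHUNK;
}
```

Fixed size classes trade some internal fragmentation (a 33-byte request occupies a 64-byte chunk) for constant-time allocation and easy reuse of freed chunks.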
Thread Layer
The thread layer is a thread-local cache: each thread has its own private cache, so memory can be allocated in a lock-free manner. Cached chunks are grouped by chunk size, with each group implemented as a linked list. Chunks are requested from the pool layer and cached up to a configured threshold. Memory exceeding this threshold is released back to the pool layer, where it can be reused by other threads. This design helps limit the overall memory footprint. In addition, each thread has a single elastic big block, which is used to service requests above the maximum chunk-size threshold. As a result, most requests can be fulfilled in the thread layer, which is fast and efficient.
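The thread-local cache described above can be sketched roughly as follows. All names here (`thread_cache`, `cache_alloc`, `cache_free`, `pool_get`, `pool_put`, `CACHE_LIMIT`) are hypothetical, the pool layer is replaced by plain `malloc`/`free` stand-ins, and only four size classes are modeled to keep the sketch short.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES 4    /* assumption: classes of 32, 64, 128, 256 bytes */
#define CACHE_LIMIT 64   /* hypothetical per-class caching threshold */

/* A free chunk reuses its own first bytes as the list link. */
typedef struct free_chunk { struct free_chunk *next; } free_chunk;

typedef struct {
    free_chunk *head[NUM_CLASSES];   /* one free list per size class */
    size_t      count[NUM_CLASSES];  /* chunks currently cached per class */
} thread_cache;

static _Thread_local thread_cache tcache;  /* private to each thread */

/* Stand-ins for the global pool layer, which would need synchronization. */
static void *pool_get(size_t chunk_size) { return malloc(chunk_size); }
static void  pool_put(void *p)           { free(p); }

static size_t class_size(int cls) { return (size_t)32 << cls; }

/* Allocate a chunk of class cls. No lock is taken on the cached path. */
static void *cache_alloc(int cls)
{
    free_chunk *c = tcache.head[cls];
    if (c != NULL) {
        tcache.head[cls] = c->next;
        tcache.count[cls]--;
        return c;
    }
    return pool_get(class_size(cls));  /* cache miss: ask the pool layer */
}

/* Free a chunk of class cls: cache it, or return it to the pool when the
   thread already holds CACHE_LIMIT chunks of this class. */
static void cache_free(void *p, int cls)
{
    if (tcache.count[cls] >= CACHE_LIMIT) {
        pool_put(p);
        return;
    }
    free_chunk *c = p;
    c->next = tcache.head[cls];
    tcache.head[cls] = c;
    tcache.count[cls]++;
}
```

In this sketch a chunk freed by a thread is handed straight back on that thread's next allocation of the same class, which is the fast path the text refers to. The elastic big block is not modeled here.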
The SequoiaDB memory model also has built-in memory-debugging capability to detect memory corruption, as well as a trace feature that can track down where memory is being requested from. On top of that, it is fully configurable, which allows deployments to be customized to each customer's workload and environment.
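One common way to provide this kind of corruption detection and request tracing is to wrap each allocation with guard words and record the call site. The sketch below illustrates that general technique only; it is not SequoiaDB's implementation, and `dbg_header`, `dbg_malloc`, `dbg_free`, `DBG_MALLOC` and `GUARD_MAGIC` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_MAGIC 0xDEADBEEFu  /* hypothetical guard pattern */

typedef struct {
    uint32_t    magic;  /* front guard, placed just before the payload */
    size_t      size;   /* payload bytes requested */
    const char *file;   /* call site, recorded for tracing */
    int         line;
} dbg_header;

/* Allocate n bytes with a header in front and a guard word behind. */
static void *dbg_malloc(size_t n, const char *file, int line)
{
    if (n > SIZE_MAX - sizeof(dbg_header) - sizeof(uint32_t))
        return NULL;  /* total size would overflow */
    dbg_header *h = malloc(sizeof *h + n + sizeof(uint32_t));
    if (h == NULL)
        return NULL;
    h->magic = GUARD_MAGIC;
    h->size  = n;
    h->file  = file;
    h->line  = line;
    uint32_t tail = GUARD_MAGIC;
    memcpy((char *)(h + 1) + n, &tail, sizeof tail);  /* rear guard */
    return h + 1;
}

/* Free p and report whether its guards were intact:
   0 if both guards match, -1 if corruption was detected. */
static int dbg_free(void *p)
{
    dbg_header *h = (dbg_header *)p - 1;
    int ok = (h->magic == GUARD_MAGIC);
    if (ok) {  /* h->size is only trusted if the front guard is intact */
        uint32_t tail;
        memcpy(&tail, (char *)p + h->size, sizeof tail);
        ok = (tail == GUARD_MAGIC);
    }
    free(h);
    return ok ? 0 : -1;
}

/* Capture the requesting source location automatically. */
#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
```

A write one byte past the end of the payload lands in the rear guard, so `dbg_free` reports it, and the stored `file` and `line` identify where the block was requested.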
如下爲中文譯本
介紹
在 C / C ++ 應用程序中,動態內存分配函數 malloc(3) 會對應用程序的性能產生重大影響。對於諸如數據庫引擎之類的多線程應用程序,優化不足的內存分配器也會限制應用程序的可伸縮性。在本文中,咱們將討論幾種流行的動態內存分配器,以及 SequoiaDB 如何解決其數據庫引擎中的動態內存分配問題。
dlmalloc/ptmalloc
GNU C 庫 (glibc) 使用 ptmalloc,它是從 dlmalloc 派生的具備線程相關改進的分配器。內存被分配爲塊,這是 8byte 對齊的數據結構,其中包含標頭和可用內存。這意味着內存塊管理至少有 8 或 16byte 的開銷。未分配的內存按類似的大小分組,並由塊的雙向連接列表維護。
jemalloc
jemalloc 最初由 Jason Evans 於2005年開發,此後已被 FreeBSD,Facebook,Mozilla Firefox,MariaDB,Android 等採用。jemalloc 是通用的 malloc(3) 實現,主要特色是避免碎片化和可擴展的併發支持。爲了不鎖競爭,jemalloc 爲每一個 CPU 使用單獨的內存池「區域」,而且將線程分配給區域以處理 malloc 請求。
tcmalloc
TCMalloc 是 Google 開發的 malloc。經過利用線程本地存儲進行小的分配,它減小了多線程程序的鎖爭用。對於較大的分配,能夠將 mmap 或 sbrk 與細粒度且高效的自旋鎖一塊兒使用。它還具備垃圾收集功能,用於死線程的本地存儲。對於小對象分配,TCMalloc 僅須要8個字節對象的百分之一的空間開銷,這很是節省空間。
這是一個測試,用於比較 jemalloc 和 tcmalloc 的性能。該測試涉及500次迭代以執行1000個內存分配,而後釋放這1000個內存。如圖所示,它們二者的性能十分接近。
SequoiaDB的實現
在 SequoiaDB 中(以 SequoiaDB v3.4 做爲例子),它實現了本身專有的內存分配器,該分配器高效且針對 SequoiaDB 數據庫引擎中的內存使用量身定製。儘管 jemalloc 和 tcmalloc 都是出色的通用內存分配器,但它們沒法解決 SequoiaDB 內部遇到的全部挑戰。例如,跟蹤內存請求的能力是 SequoiaDB 引擎的一項重要要求,而現有的第三方內存分配器缺乏此功能。圖2顯示了 SequoiaDB 內存模型的體系結構。共有三層-線程,池和 OSS(操做系統服務)。
OSS Layer
OSS 層提供了 malloc API,該 API 向底層操做系統請求內存。這也是 PoolLayer 從中獲取內存的位置。
Pool Layer
Pool Layer 是全局內存池,其中包含不一樣大小的段。段是從 OSS 層分配的連續內存塊。每一個段分爲固定大小的塊。默認狀況下,有32字節,64、128…8092字節的塊大小。超過8092字節最大塊大小閾值的請求將由 OSS 層處理。
Thread Layer
線程層是線程本地緩存,每一個線程都有其本身的專用緩存,所以能夠無鎖方式完成內存分配。內存塊按其塊大小分組在一塊兒,使用連接列表實現。從 Pool Layer 請求內存塊並將其緩存到配置的閾值。對於超過此閾值的內存,它們將釋放回 Pool Layer 並能夠由其餘線程重用。
此設計有助於限制總體內存佔用。此外,每一個線程都有一個彈性大塊,用於服務超過最大塊大小閾值的請求。所以,在大多數狀況下,能夠在線程層中知足請求,這既高效又快速。
此外,SequoiaDB 內存模型還具備內置的內存調試功能,能夠檢測內存損壞。它還具備跟蹤功能,能夠跟蹤從哪裏請求內存。最重要的是,它是徹底可配置的,並容許根據客戶的工做量和環境自定義部署。