A Review of Linux Traditional Huge Pages and Transparent Huge Pages

 

Linux provides two kinds of large pages: standard huge pages (Huge Pages) and transparent huge pages (Transparent Huge Pages). HugePages is also variously rendered as "large pages", "standard huge pages", or "traditional huge pages"; these are simply different names for the same feature, which is worth pointing out so that the terminology does not confuse or mislead anyone. HugePages was introduced with Linux kernel 2.6. Its purpose is to use a larger memory page size to cope with ever-growing system memory, so that the operating system can take advantage of the large-page capabilities of modern hardware. Transparent Huge Pages, abbreviated THP, is a feature introduced with RHEL 6 (and on other branches with SUSE Linux Enterprise Server 11, and Oracle Linux 6 with earlier releases of Oracle Linux Unbreakable Enterprise Kernel 2 (UEK2)); see the official documentation for details. What is the difference between the two? It lies in how the large pages are allocated: standard huge pages are preallocated, while transparent huge pages are allocated dynamically. Quite a few people treat HugePages and Transparent Huge Pages as the same thing. At present, using THP together with traditional HugePages can cause problems, leading to performance issues and system reboots, so Oracle recommends disabling Transparent Huge Pages; in Oracle Linux 6.5, Transparent HugePages have been removed.
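
The two features can be told apart at runtime through /proc and sysfs. A minimal check (the sysfs path shown is the upstream location; RHEL 6 and derivatives expose it as /sys/kernel/mm/redhat_transparent_hugepage/enabled instead):

# HugePages_* lines describe the preallocated (standard) pool, while
# AnonHugePages counts memory that THP has promoted dynamically.
grep -E 'AnonHugePages|HugePages_Total' /proc/meminfo
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
    || cat /sys/kernel/mm/redhat_transparent_hugepage/enabled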

 

An introduction to standard huge pages (HugePages), quoted from the English documentation:

 

HugePages is a feature integrated into the Linux kernel with release 2.6. It is a method to have larger pages where it is useful for working with very large memory. It can be useful for both 32-bit and 64-bit configurations. HugePage sizes vary from 2MB to 256MB, depending on the kernel version and the hardware architecture. For Oracle Databases, using HugePages reduces the operating system maintenance of page states, and increases TLB (Translation Lookaside Buffer) hit ratio.

 

 

The RHEL documentation describes traditional huge pages (Huge Pages) and transparent huge pages (Transparent Huge Pages) as follows (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html):

 

Huge pages can be difficult to manage manually, and often require significant changes to code in order to be used effectively. As such, Red Hat Enterprise Linux 6 also implemented the use of transparent huge pages (THP). THP is an abstraction layer that automates most aspects of creating, managing, and using huge pages.

 

THP hides much of the complexity in using huge pages from system administrators and developers. As the goal of THP is improving performance, its developers (both from the community and Red Hat) have tested and optimized THP across a wide range of systems, configurations, applications, and workloads. This allows the default settings of THP to improve the performance of most system configurations. However, THP is not recommended for database workloads.

 


 

Note: THP can currently only map anonymous memory regions, such as heap and stack space.

 

As we know, the x86 architecture uses a virtual memory model, which allows the addressable range to exceed the physical memory available in the hardware. This is achieved by giving each process its own addressable memory; the process believes this memory is for its exclusive use, and it is called the process's virtual memory. In reality, this memory may be physical memory actually residing in RAM, or it may be stored in a dedicated area on physical disk known as the swap or paging area. The process does not know whether its virtual memory is in RAM or on disk; the memory is managed by the operating system. If the memory required exceeds the available physical memory, the operating system moves some of it out to the paging area. This activity is extremely inefficient and is a common cause of performance problems. Because disk access is far slower than RAM, a process that is being paged suffers a significant performance hit.

 

In addition, as hardware has advanced rapidly, servers carry more and more memory, and the more memory a system uses, the more resources are needed to manage it. On Linux, memory management is implemented through the kswapd process and the page table memory structure (one per process in the system). Each entry records a page of virtual memory used by the process together with its physical location (in RAM or on disk). The processor's TLB (Translation Lookaside Buffer, a small cache in the CPU) assists with this. The operating system uses page table entries to manage the memory used by the processes in the system; in Linux the kernel process that performs this work is kswapd, which is visible in operating system tools. The TLB caches page table entries to improve performance. A typical TLB can hold from 4 to 4,096 entries, which is nowhere near enough when there are millions or even billions of page table entries.

 

When a large amount of memory is used by an Oracle database or other applications, the operating system spends considerable resources translating virtual addresses into physical addresses, and the result is often a very large page table structure. Because each page table entry holds the virtual-to-physical translation for a memory page the process is using, with a very large System Global Area (SGA) the page table entries for each process can be enormous. As an example, on one of our test servers with 64 GB of RAM and SGA_TARGET set to 32 GB, without standard huge pages the page table structure (PageTables) grew to 1,573,080 kB, close to 1.5 GB. You can see that the number of pages to be managed is huge, which causes significant performance overhead.

 

# grep PageTables /proc/meminfo

PageTables:      1573080 kB

 

This is why standard huge pages were introduced. What problem do they solve? Memory is managed in blocks known as pages. On 64-bit Linux, memory is managed by default in 4 KB pages, i.e. one page is 4096 bytes: 1 MB of memory equals 256 pages and 2 MB equals 512 pages, so the cost of managing all of these pages is considerable. The CPU has a built-in memory management unit with a TLB that holds a list of these pages, each referenced by a page table entry. The page table is the in-memory structure that stores the mapping between virtual and physical memory pages; the smaller the page size, the larger the corresponding page table. The default HugePages page size is 2 MB, 512 times the 4 KB default, which greatly reduces the size of the page table. By enabling HugePages, one page table entry can represent one large page instead of many entries representing many small pages, so more memory can be managed, the operating system does less page-state maintenance, and the TLB cache hit ratio improves. Note that Hugepagesize defaults to 2 MB but is tunable; depending on kernel version and hardware it ranges from 2 MB to 256 MB.
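
As a back-of-the-envelope illustration (my own arithmetic, using the 32 GB SGA from the earlier example), the number of page table entries needed to map the SGA drops dramatically with 2 MB pages:

getconf PAGESIZE                                          # default page size, normally 4096 bytes
echo "4 KB pages: $(( 32 * 1024 * 1024 / 4 )) entries"    # 8,388,608 entries to map 32 GB
echo "2 MB pages: $(( 32 * 1024 / 2 )) entries"           # 16,384 entries to map 32 GB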

 

If the explanation above is still not clear or thorough enough, here is another excerpt that explains it:

 

Most operating systems manage memory by segmentation or paging. Segmentation is coarse-grained, while paging is fine-grained and avoids wasting memory space. Correspondingly, there are the concepts of physical addresses and virtual addresses. With either scheme, the CPU must translate a virtual address into a physical address before it can actually access memory. To speed up this translation, the CPU caches the most recently used virtual-to-physical mappings in a mapping table it maintains; to keep memory access as fast as possible, as many mappings as possible need to be held in that table. Linux memory management uses paging: to make full use of physical memory, the kernel swaps infrequently used memory pages out to swap according to an LRU algorithm at appropriate times, keeping frequently used data in physical memory. By default a Linux page is 4 KB, which means that if physical memory is large, the mapping table has an enormous number of entries, hurting CPU lookup efficiency. Since the amount of memory is fixed, the only way to reduce the number of entries is to increase the page size; this is where Hugepages come from. Breaking with the traditional small-page memory management and using large pages of 2 MB, 4 MB and so on greatly reduces the number of mapping entries, and the TLB cache hit ratio rises accordingly.

 

 

Why does Oracle use standard huge pages (Huge Pages) to improve performance? The Oracle database uses shared memory (the SGA) to manage resources that can be shared; for example, the shared pool stores shared SQL statements and execution plans, and the buffer pool stores data blocks. Accessing these resources is essentially Oracle accessing memory through OS APIs. Memory operations should be, and usually are, fast, and in that case the database works normally. But there are situations where performance problems appear:

 

a) If part of the SGA has been swapped out to disk, accessing it again takes a very long time.

 

b) If the OS itself has a very large amount of memory, the work of managing it and reaching the memory we need takes longer.

 

In these situations we often run into problems such as latch/mutex/library cache lock [pin]/row cache lock waits.

 

HugePages on Linux can resolve the performance fluctuations caused by either of the two problems above.

 

As noted earlier, 64-bit Linux manages memory in 4 KB pages by default, and when a system has a great deal of memory the cost of managing those pages becomes significant; HugePages uses 2 MB pages to reduce that management overhead. Memory managed as HugePages cannot be swapped, which avoids database performance problems caused by swapping. So a system that regularly hits swap-induced performance problems should, without question, enable HugePages, and systems with very large amounts of memory should as well. Exactly how much memory makes HugePages a necessity is not settled; some documents have suggested 12 GB and above. We strongly recommend thorough testing in a test environment before deciding to use HugePages in production.
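
A quick way to check whether an instance's memory is actually being swapped out (a sketch: ora_pmon_ORCL is a placeholder process name for your SID, and the VmSwap field requires a reasonably recent kernel):

PMON_PID=$(pgrep -f ora_pmon_ORCL)        # placeholder SID
grep -i vmswap /proc/$PMON_PID/status     # a non-zero value means process memory has been swapped out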

 

Of course, everything has two sides, and HugePages has a few small drawbacks. The first is that it requires extra configuration, which is negligible. The other is that once HugePages is used, the 11g feature AMM (Automatic Memory Management) can no longer be used, although ASMM (Automatic Shared Memory Management) still works.

 

Below are some related terms and characteristics of HugePages, mostly excerpted from the RHEL documentation and My Oracle Support notes:

 

·         Page Table: A page table is the data structure of a virtual memory system in an operating system to store the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, the memory is accessed by first accessing a page table and then accessing the actual memory location implicitly.

·         TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. This is a fixed size buffer being used to do virtual address translation faster.

·         hugetlb: This is an entry in the TLB that points to a HugePage (a large/big page larger than regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The 'hugetlb" term is also (and mostly) used synonymously with a HugePage (See Note 261889.1). In this document the term "HugePage" is going to be used but keep in mind that mostly "hugetlb" refers to the same concept.

·         hugetlbfs: This is a new in-memory filesystem like tmpfs and is presented by 2.6 kernel. Pages allocated on hugetlbfs type filesystem are allocated in HugePages.

 

HugePages in 2.4 Kernels

 

The HugePages feature is backported to some 2.4 kernels. Kernel versions 2.4.21-* have this feature (See Note 311504.1 for the distributions with 2.4.21 kernels) but it is implemented in a different way. The feature is completely available. The difference from the 2.6 implementation is the organization within the source code and the kernel parameters that are used for configuring HugePages. See the Parameters/Setup section below.

 

 

Advantages of HugePages Over Normal Sharing Or AMM (see below)

 

·         Not swappable: HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.

 

 


 

·         Relief of TLB pressure:

o   HugePages use fewer pages to cover the physical address space, so the size of the "book keeping" (mapping from the virtual to the physical address) decreases, so fewer entries are required in the TLB

o   TLB entries will cover a larger part of the address space when HugePages are used, so there will be fewer TLB misses before the entire SGA (or most of it) is mapped

o   Fewer TLB entries for the SGA also means more entries are available for other parts of the address space

 


 

    The TLB is a cache that maps virtual addresses directly to physical addresses; it improves performance by saving the walk through the page table. A large number of TLB misses, however, has a clearly negative effect on system performance, especially for sequential reads. As discussed above, using hugepages greatly reduces the number of PTEs, meaning fewer PTEs are needed to address the same amount of memory; and since TLB slots are limited (often only around 512), fewer PTEs translate directly into a higher TLB hit ratio.

 

·         Decreased page table overhead: Each page table entry can be as large as 64 bytes and if we are trying to handle 50GB of RAM, the page table will be approximately 800MB in size, which practically will not fit in the 880MB lowmem (in 2.4 kernels - the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, this can work with a page table of approximately 40MB in total. See also Document 361468.1.

 


 

·         Eliminated page table lookup overhead: Since the pages are not subject to replacement, page table lookups are not required.

 

   Reduced page table lookup overhead: with far fewer PTEs, many page table lookups are no longer needed at all, and the lookups that remain are faster. On a TLB miss, translating a linear address into a physical address may require up to three additional memory reads.

 

  

·         Faster overall memory performance: On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is clearly avoided.

 


 

HugePages Reservation

 

The HugePages reservation feature is fully implemented in 2.6.17 kernel, and thus EL5 (based on 2.6.18) has this feature. The alloc_huge_page() is improved for this. (See kernel source mm/hugetlb.c)

 

From /usr/share/doc/kernel-doc-2.6.18/Documentation/vm/hugetlbpage.txt:

HugePages_Rsvd is short for "reserved," and is the number of hugepages for which a commitment to allocate from the pool has been made, but no allocation has yet been made. It's vaguely analogous to overcommit.

This feature in the Linux kernel enables the Oracle Database to be able to allocate hugepages for the sublevels of the SGA on-demand. The same behaviour is expected for various Oracle Database versions that are certified on EL5.

 

HugePages and Oracle 11g Automatic Memory Management (AMM)

 

The AMM and HugePages are not compatible. One needs to disable AMM on 11g to be able to use HugePages. See Document 749851.1 for further information.

 

 

Linux中, kswapd是負責內核頁面交換管理的一個守護進程,它的職責是保證Linux內存管理操做的高效。當物理內存不夠時,它就會變得很是aggressive,有些狀況下能佔用單核CPU100%.   kswapd 進程負責確保內存空間老是在被釋放中,它監控內核中的pages_highpages_low閥值。若是空閒內存的數值低於pages_low,則每次 kswapd 進程啓動掃描並嘗試釋放32free pages.並一直重複這個過程,直到空閒內存的數值高於 pages_highkswapd 進程完成如下幾個操做:

o    若是該頁處於未修改狀態,則將該頁放置回空閒列表中.

o    若是該頁處於已修改狀態並可備份迴文件系統,則將頁內容寫入到磁盤.

o    若是該頁處於已修改狀態但沒有任何磁盤備份,則將頁內容寫入到swap device.
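
A quick way to inspect the watermarks kswapd works against (standard procfs interfaces; the exact field names and layout vary by kernel version):

cat /proc/sys/vm/min_free_kbytes                        # basis for the per-zone watermarks
grep -E 'zone|min|low|high' /proc/zoneinfo | head -20   # pages min/low/high for each zone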

 

 

Checking the standard huge page configuration

 

Check the standard huge page (Huge Pages) page size:

 

[root@DB-Server ~]$ grep Hugepagesize /proc/meminfo
Hugepagesize:     2048 kB

 

How to confirm whether standard huge pages (also called traditional or large pages, i.e. HugePages) are configured and in use:

 

[oracle@DB-Server ~]$ cat /proc/sys/vm/nr_hugepages 
0
[oracle@DB-Server ~]$ grep -i HugePages_Total /proc/meminfo 
HugePages_Total:     0

 

If HugePages_Total is 0, standard huge pages are not configured or in use; if nr_hugepages is 0, standard huge pages have not been configured.

 

Some kernel information related to standard huge pages is shown below:

 

[oracle@DB-Server ~]$ more /etc/issue
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
Kernel \r on an \m
 
[oracle@DB-Server ~]$ grep Huge /proc/meminfo
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
 
[root@mylnx02 ~]# more /etc/issue
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel \r on an \m
 
[root@mylnx02 ~]#  grep Huge /proc/meminfo
AnonHugePages:     18432 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

 

AnonHugePages:         The number of anonymous huge pages; this relates to transparent huge pages. This counter was removed in Oracle Linux 6.5.

HugePages_Total:       The number of huge pages allocated; multiplied by Hugepagesize this gives the total memory set aside for the pool.

HugePages_Free:        The number of huge pages that have never been touched. Even if the Oracle SGA has already been allocated from this memory, pages that have not actually been written to still show as Free; this is an easy point of confusion (it is the number of pages in the pool not yet backed by data).

HugePages_Rsvd:        The number of pages that have been reserved but not yet used. Right after Oracle starts, most of the pool should be both Reserved and Free; as the Oracle SGA is used, Reserved and Free both steadily decrease.

HugePages_Surp:        Short for "surplus": the number of HugePages in the pool above the value in /proc/sys/vm/nr_hugepages. The maximum number of surplus HugePages is controlled by /proc/sys/vm/nr_overcommit_hugepages. A value of 0 is very common.

Hugepagesize:          The size of a huge page.

 

HugePages_Free minus HugePages_Rsvd is memory that is not being used by anything; if there is no other Oracle instance, that memory may never be used at all, i.e. it is wasted. HugePages_Total - HugePages_Free + HugePages_Rsvd is the number of pages the running instance currently needs.
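
That arithmetic can be done directly from /proc/meminfo. A minimal sketch using only the standard fields shown above:

awk '/^HugePages_(Total|Free|Rsvd)|^Hugepagesize/ { v[$1] = $2 }
     END {
         used = v["HugePages_Total:"] - v["HugePages_Free:"] + v["HugePages_Rsvd:"]
         printf "Huge pages needed by running instances: %d (%d MB)\n", used, used * v["Hugepagesize:"] / 1024
     }' /proc/meminfo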

 

  

How is the size of the standard huge page (Huge Page) pool set? Normally by adjusting the kernel parameter nr_hugepages, i.e. setting vm.nr_hugepages in /etc/sysctl.conf:

 

# echo "vm.nr_hugepages=512" >> /etc/sysctl.conf
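
To make the setting take effect and check the result (sysctl and /proc are the standard interfaces; 512 pages of 2 MB reserve 1 GB):

sysctl -p                            # reload /etc/sysctl.conf
grep HugePages_Total /proc/meminfo   # should now report 512; on a fragmented system
                                     # the pool may fall short until the next reboot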

 

The following walks through the basic steps for configuring standard huge pages for an Oracle database on a 64-bit Linux server; adjust them to your own environment. For the full procedure, refer to the MOS note HugePages on Oracle Linux 64-bit (Doc ID 361468.1) or the article Configuring HugePages for Oracle on Linux (x86-64).

 

Step 1: Add memlock limits to /etc/security/limits.conf. Note that the value should be slightly less than the physical memory size, in KB. For example, with 64 GB of physical memory it can be set as follows:

 

*   soft  memlock    60397977

*   hard    memlock      60397977
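
A small helper for deriving such a value (my own sketch, not from the MOS note): take a fixed fraction of MemTotal so that memlock stays just below physical RAM.

MEMLOCK_KB=$(awk '/^MemTotal/ { printf "%d", $2 * 0.9 }' /proc/meminfo)
echo "*   soft    memlock    $MEMLOCK_KB"
echo "*   hard    memlock    $MEMLOCK_KB"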

 

Setting this value higher than the SGA requires has no adverse effect. If you use the oracle-validated package on Oracle Linux, or an Exadata DB compute node, the parameter is configured automatically. Now let's look at an actual test environment with 16 GB of memory:

 

[root@mylnx02 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         16077       9520       6556          0         37        766
-/+ buffers/cache:       8716       7361
Swap:        14015          0      14015

 

 

So we edit /etc/security/limits.conf and set memlock to 16384000 KB (just under the 16077 MB of physical memory reported above):

 

 vi /etc/security/limits.conf

 

* soft memlock 16384000

* hard memlock 16384000

 

 

Step 2: Log in again as the account that owns the Oracle installation and verify the memlock setting. In this test environment the account is oracle:

 

[oracle@mylnx02 ~]$ ulimit -l

16384000

 

 

Step 3: Disable AMM on 11g. If Oracle 11g is to use standard huge pages, AMM (Automatic Memory Management) must be disabled; on Oracle 10g this step can be skipped.

 

 

[oracle@DB-Server ~]$ sqlplus / as sysdba
 
SQL*Plus: Release 11.2.0.1.0 Production on Fri Oct 27 14:43:12 2017
 
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> show parameter memory_target;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_target                        big integer 1552M
SQL> show parameter memory_max_target;
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
memory_max_target                    big integer 1552M
SQL> 
 
SQL> alter system set memory_target=0 scope=both;
SQL> alter system set memory_max_target=0 scope=spfile;

 

 

 

From Oracle 11g onward, instances are created by default with the Automatic Memory Management (AMM) feature, and AMM is incompatible with HugePages. AMM must be disabled before configuring HugePages, which is done by setting the initialization parameters MEMORY_TARGET and MEMORY_MAX_TARGET to 0.

 

With AMM, all SGA memory is allocated under /dev/shm, so HugePages are not used when the SGA is allocated. This is the reason AMM and HugePages are incompatible.

 

Note also that an ASM instance uses AMM by default, but because ASM instances do not need a large SGA, there is little point in using HugePages for them.

 

If we want to use HugePages, we must first make sure that MEMORY_TARGET / MEMORY_MAX_TARGET are not set.
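
A minimal check from the shell (sqlplus heredoc; the SGA/PGA sizes in the comments are placeholders for your environment):

sqlplus -s / as sysdba <<'EOF'
show parameter memory_target
show parameter memory_max_target
-- If either is non-zero, disable AMM and size memory explicitly, e.g.:
--   alter system set memory_target=0 scope=spfile;
--   alter system set memory_max_target=0 scope=spfile;
--   alter system set sga_target=12G scope=spfile;
--   alter system set pga_aggregate_target=4G scope=spfile;
EOF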

 

 

Step 4: Make sure all of your Oracle database instances (including ASM instances) are up and running, then run hugepages_settings.sh (from MOS Document 401749.1; the script is reproduced below) to obtain the recommended value for the kernel parameter vm.nr_hugepages.

 

#! /bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
# on Oracle Linux
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support 
# http://support.oracle.com
 
# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments on Oracle Linux. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and 
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size, 
   as the new SGA will not fit in the previous HugePages configuration, 
   it had better disable the whole HugePages, 
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup 
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m
 
 
Press Enter to proceed..."
 
read
 
# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`
 
# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
if [ -z "$HPG_SZ" ];then
    echo "The hugepages may not be supported in the system where the script is being executed."
    exit 1
fi
 
# Initialize the counter
NUM_PG=0
 
# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done
 
RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`
 
# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
    echo "***********"
    echo "** ERROR **"
    echo "***********"
    echo "Sorry! There are not enough total of shared memory segments allocated for 
HugePages configuration. HugePages can only be used for shared memory segments 
that you can list by command:
 
    # ipcs -m
 
of a size that can match an Oracle Database SGA. Please make sure that:
 * Oracle Database instance is up and running 
 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"
    exit 1
fi
 
# Finish with results
case $KERN in
    '2.2') echo "Kernel version $KERN is not supported. Exiting." ;;
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    '3.8') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    '3.10') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    '4.1') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
esac
 
# End

 

 

[root@mylnx02 ~]# ./hugepages_settings.sh 
 
This script is provided by Doc ID 401749.1 from My Oracle Support 
(http://support.oracle.com) where it is intended to compute values for 
the recommended HugePages/HugeTLB configuration for the current shared 
memory segments on Oracle Linux. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and 
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size, 
   as the new SGA will not fit in the previous HugePages configuration, 
   it had better disable the whole HugePages, 
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup 
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m
 
 
Press Enter to proceed...
 
Recommended setting: vm.nr_hugepages = 4098

 


 

 

Step 5: Set the vm.nr_hugepages parameter in /etc/sysctl.conf:

 

[root@mylnx02 ~]# vi /etc/sysctl.conf

 

vm.nr_hugepages=4098

 

 

Step 6: Shut down all database instances and reboot the server.

 

Step 7: Verify that the configuration is correct, as shown below:

 

[oracle@mylnx02 ~]$ grep HugePages /proc/meminfo

HugePages_Total:    4098

HugePages_Free:     3439

HugePages_Rsvd:     3438

HugePages_Surp:        0

 


 

The values in the output will vary. To make sure that the configuration is valid, the HugePages_Free value should be smaller than HugePages_Total and there should be some HugePages_Rsvd. HugePages_Rsvd counts free pages that are reserved for use (requested for an SGA, but not touched/mapped yet).

The sum of Hugepages_Free and HugePages_Rsvd may be smaller than your total combined SGA as instances allocate pages dynamically and proactively as needed.

 

From Oracle 11.2.0.3 onward, you can also verify whether large pages are being used by a database instance by checking the alert log. When the instance starts, you should see something like the following just before the parameter listing:

 

****************** Large Pages Information *****************

 

 Total Shared Global Region in Large Pages = 28 GB (100%)

 

 Large Pages used by this instance: 14497 (28 GB)

Large Pages unused system wide = 1015 (2030 MB) (alloc incr 64 MB)

Large Pages configured system wide = 19680 (38 GB)

Large Page size = 2048 KB
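
To pull that section out of the alert log (the diagnostic path and SID below are placeholders; adjust them to your environment):

ALERT_LOG=$ORACLE_BASE/diag/rdbms/orcl/ORCL/trace/alert_ORCL.log   # placeholder path/SID
grep -A 6 "Large Pages Information" "$ALERT_LOG"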

 

In addition, from 11gR2 onward the database's use of HugePages can be controlled with the following parameter:

use_large_pages = {true/only/false/auto}

The default is true: if HugePages are configured on the system, the SGA uses them first, taking as many as are available. If it is set to false, the SGA does not use HugePages at all. If it is set to only, the instance will not start when there are not enough HugePages. If it is set to auto, the oradism process attempts to reconfigure the Linux kernel at startup to increase the number of HugePages. It is normally left at true.

 

SQL> alter system set use_large_pages=true scope=spfile sid='*';

 

 

Enabling transparent huge pages

 

Enabling transparent huge pages is very simple; see the post "Linux 關於Transparent Hugepages的介紹" for details, which will not be repeated here.
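
For reference, THP can be checked and toggled at runtime through sysfs (upstream path shown; RHEL 6 uses /sys/kernel/mm/redhat_transparent_hugepage/enabled, and a change made this way does not survive a reboot):

cat /sys/kernel/mm/transparent_hugepage/enabled            # e.g. [always] madvise never
echo always > /sys/kernel/mm/transparent_hugepage/enabled  # enable THP
echo never  > /sys/kernel/mm/transparent_hugepage/enabled  # disable THP (what Oracle recommends)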

 

 

 

References:

 

https://help.marklogic.com/Knowledgebase/Article/View/16/0/linux-huge-pages-and-transparent-huge-pages

https://support.oracle.com/epmos/faces/DocumentDisplay?parent=DOCUMENT&sourceId=361468.1&id=401749.1

http://www.oracle.com/technetwork/cn/articles/servers-storage-dev/hugepages-2099009-zhs.html

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=500381499574891&id=361323.1&_afrWindowMode=0&_adf.ctrl-state=lxb6cxp3_100

https://developers.redhat.com/blog/2014/03/10/examining-huge-pages-or-transparent-huge-pages-performance/

https://oracle-base.com/articles/linux/configuring-huge-pages-for-oracle-on-linux-64

 

https://access.redhat.com/documentation/zh-CN/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Large_Memory_Optimization_Big_Pages_and_Huge_Pages-Configuring_Huge_Pages_in_Red_Hat_Enterprise_Linux_4_or_5.html

https://blogs.oracle.com/database4cn/linux-64hugepage

https://docs.oracle.com/cd/E11882_01/install.112/e41961/memry.htm#CBAFIFGJ
