A Detailed Guide to the Valgrind and KCachegrind Performance Analysis Tools

Date: 2021-02-01



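1. Introduction to Valgrind

Valgrind is a suite of program debugging and profiling tools for Linux based on simulation, and an instrumentation framework for building dynamic analysis tools. It includes a set of tools, each of which performs some kind of debugging, profiling, or similar task that helps you improve your programs. Valgrind's architecture is modular, so new tools can be created easily without disturbing the existing structure.

Valgrind mainly includes the following tools:

1. memcheck: detects memory problems in a program, such as leaks, out-of-bounds accesses, and invalid pointers.

2. callgrind: records the function call graph and the cost of each function (measured in executed instructions) in order to analyze program performance.

3. cachegrind: analyzes the CPU cache hit and miss rates, for use in code optimization.

4. helgrind: detects race conditions in multithreaded programs.

5. massif: a heap profiler that reports how much heap memory the program uses, among other information.

There are also some minor tools that most users will not need: Lackey is an example tool that demonstrates some instrumentation basics; Nulgrind is the minimal Valgrind tool, which performs no analysis or instrumentation and is useful only for testing.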

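2. Installing and using Valgrind

Installation

It is recommended to download Valgrind from the official website. At the time of writing, the latest release there is 3.16.1.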

$ mkdir valgrind-inst
$ cd valgrind-inst/
$ wget https://sourceware.org/pub/valgrind/valgrind-3.16.1.tar.bz2

$ ls
valgrind-3.16.1.tar.bz2

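Extract the archive and install it. You can specify an installation directory; if you do, remember to update your environment (for example, add /usr/local/valgrind/bin to PATH).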

$ tar -xvf valgrind-3.16.1.tar.bz2
$ cd valgrind-3.16.1
$ ./configure --prefix=/usr/local/valgrind
$ make
$ make install

Check that the installation succeeded:

$ valgrind --version
valgrind-3.16.1

Using the tool suite

The basic usage is:

usage: valgrind [options] prog-and-args

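It supports many options, which can be listed with valgrind --help.

Only a few of the more commonly used options are covered here.

--tool=<toolname>: the most commonly used option; selects which tool in the Valgrind suite to run. The default is memcheck.

--version: prints the Valgrind version number.

-q/--quiet: runs quietly, printing only error messages.

-v/--verbose: prints more detailed information.

--trace-children=no|yes: whether to trace child processes. The default is no.

--track-fds=no|yes: whether to track open file descriptors. The default is no.

--time-stamp=no|yes: whether to prefix each printed message with a timestamp. The default is no.

--log-file=<file>: writes messages to the specified file.

--default-suppressions=no|yes: whether to load the default suppressions. The default is yes.

--alignment=<number>: sets the minimum alignment, in bytes, of memory allocated by malloc.

The following options apply to the Memcheck tool:

--leak-check=no|summary|full: whether to search for memory leaks at exit. The default is summary.

--show-leak-kinds=kind1,kind2,..: which kinds of leaks to show. By default, definite and possible are shown.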

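3. Valgrind tools in detail

1) Memcheck

The most commonly used tool. It detects memory problems in a program: all reads and writes of memory are checked, and all calls to malloc, free, new, and delete are intercepted. It can therefore detect the following problems:

1. Use of uninitialised memory. If a variable is read before it has been given a value, Memcheck reports an "uninitialised value" error at the point where that value affects the program's behaviour. In practice Valgrind can report many of these errors; if you are only interested in memory leaks, you can turn them off with --undef-value-errors=no.

2. Reading or writing a memory block after it has been freed.

3. Out-of-bounds reads and writes (for example, an array index out of range), that is, reading or writing beyond a block allocated by malloc.

4. Reading or writing inappropriate areas on the stack.

5. Memory leaks, where the pointer to a block of memory is lost forever.

6. Mismatched use of malloc/free or new/delete (double frees, or freeing with a function that does not match the allocation).

7. Overlapping dst and src pointers in memcpy() and related functions.

Usage:

After compiling the program into an executable, run: valgrind --leak-check=full ./program-name

Note: all the test code discussed below is best compiled with the -g option, so that Memcheck's output includes line numbers.

Verification with a test program:

Write the test program (leak.cpp):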

#include <stdlib.h>

void func() {
    char *p = new char[10];
}

int main() {
    func();

    return 0;
}

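Compile it, then check the program with Valgrind.
With --leak-check=full, Memcheck reports in detail where each leaked block was allocated, together with the call stack at the time of allocation (compiling with -g and without -O optimization options gives more detailed function information, accurate down to the source line). The --show-leak-kinds option selects which kinds of leaks are reported in detail. Memcheck merges blocks whose allocation call stacks are identical or similar into a single entry; --leak-resolution controls how strict this "similar" comparison is.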

$ g++ -g -o test leak.cpp
$ valgrind --tool=memcheck --leak-check=full ./test

The result:

==6018== Memcheck, a memory error detector
==6018== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==6018== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==6018== Command: ./test
==6018== 
==6018== 
==6018== HEAP SUMMARY:
==6018==     in use at exit: 10 bytes in 1 blocks
==6018==   total heap usage: 1 allocs, 0 frees, 10 bytes allocated
==6018== 
==6018== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==6018==    at 0x4C2AC58: operator new[](unsigned long) (vg_replace_malloc.c:431)
==6018==    by 0x40062E: func() (leak.cpp:4)
==6018==    by 0x40063D: main (leak.cpp:8)
==6018== 
==6018== LEAK SUMMARY:
==6018==    definitely lost: 10 bytes in 1 blocks
==6018==    indirectly lost: 0 bytes in 0 blocks
==6018==      possibly lost: 0 bytes in 0 blocks
==6018==    still reachable: 0 bytes in 0 blocks
==6018==         suppressed: 0 bytes in 0 blocks
==6018== 
==6018== For lists of detected and suppressed errors, rerun with: -s
==6018== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

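Explanation of the result:

First look at HEAP SUMMARY in the output, which describes how the program allocated memory on the heap: 1 allocs means the program allocated memory once, 0 frees means it freed memory zero times, and 10 bytes allocated means 10 bytes were allocated in total.
Valgrind also reports where in the program the leaked memory was allocated.

The LEAK SUMMARY above prints five categories. The first four are briefly described here:

definitely lost: memory that has definitely been leaked. The program has a memory leak that should be fixed as soon as possible. This is reported when, at program exit, a dynamically allocated block has not been freed and can no longer be reached through any pointer in the program.

indirectly lost: memory lost indirectly. This may be reported when classes or structs with pointer members are used. These leaks do not need to be fixed directly: they always appear together with definitely lost blocks, and fixing the definitely lost blocks resolves them.

possibly lost: memory that may have been lost. In most cases this should be treated like definitely lost and fixed promptly, unless your program deliberately keeps a pointer into the middle of a dynamically allocated block (not its start address), later computes the start address from it, and then frees the block. This is reported when, at program exit, a dynamically allocated block has not been freed and no pointer to its start address remains, but some pointer still points into its interior.

still reachable: memory that is still accessible; it was not lost, but it was not freed either. If the program exits normally this is unlikely to cause a crash, but in a long-running program it may eventually exhaust system resources.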

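For the other cases, write a combined test program (mixed.cpp) to verify them.

// mixed.cpp
#include <string.h>   // memcpy

void func() {
    char *ptr = new char[10];
    ptr[10] = 'a';   // out-of-bounds write

    memcpy(ptr + 1, ptr, 5);   // overlapping source and destination

    delete []ptr;
    delete []ptr; // double free

    char *p;
    *p = 1;   // write through an uninitialised pointer
}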

int main() {
    func();

    return 0;
}

Compile it, then check the program with Valgrind.

$ g++ -g -o test mixed.cpp
$ valgrind --tool=memcheck --leak-check=full ./test

The result:

==22786== Memcheck, a memory error detector
==22786== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==22786== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==22786== Command: ./test
==22786== 
==22786== Invalid write of size 1      // out-of-bounds write
==22786==    at 0x4007FB: func() (mixed.cpp:6)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786==  Address 0x5a2404a is 0 bytes after a block of size 10 alloc'd
==22786==    at 0x4C2AC58: operator new[](unsigned long) (vg_replace_malloc.c:431)
==22786==    by 0x4007EE: func() (mixed.cpp:5)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786== 
==22786== Source and destination overlap in memcpy(0x5a24041, 0x5a24040, 5)  // overlapping copy
==22786==    at 0x4C2E83D: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1033)
==22786==    by 0x400819: func() (mixed.cpp:8)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786== 
==22786== Invalid free() / delete / delete[] / realloc()    // double free
==22786==    at 0x4C2BBAF: operator delete[](void*) (vg_replace_malloc.c:649)
==22786==    by 0x40083F: func() (mixed.cpp:11)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786==  Address 0x5a24040 is 0 bytes inside a block of size 10 free'd
==22786==    at 0x4C2BBAF: operator delete[](void*) (vg_replace_malloc.c:649)
==22786==    by 0x40082C: func() (mixed.cpp:10)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786==  Block was alloc'd at
==22786==    at 0x4C2AC58: operator new[](unsigned long) (vg_replace_malloc.c:431)
==22786==    by 0x4007EE: func() (mixed.cpp:5)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786== 
==22786== Use of uninitialised value of size 8    // invalid pointer
==22786==    at 0x400844: func() (mixed.cpp:14)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786== 
==22786== 
==22786== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==22786==  Bad permissions for mapped region at address 0x4008B0
==22786==    at 0x400844: func() (mixed.cpp:14)
==22786==    by 0x400851: main (mixed.cpp:18)
==22786== 
==22786== HEAP SUMMARY:
==22786==     in use at exit: 0 bytes in 0 blocks
==22786==   total heap usage: 1 allocs, 2 frees, 10 bytes allocated
==22786== 
==22786== All heap blocks were freed -- no leaks are possible
==22786== 
==22786== Use --track-origins=yes to see where uninitialised values come from  
==22786== For lists of detected and suppressed errors, rerun with: -s
==22786== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

As shown, Valgrind detected every one of the cases above.

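2) Callgrind

A profiling tool similar to gprof, but it observes the program's execution more closely and provides more information. Unlike gprof, it does not require special options when compiling the source code, although compiling with debugging information is still recommended. Callgrind collects data while the program runs, builds the function call graph, and can optionally perform cache simulation. When the run ends, it writes the profile data to a file. callgrind_annotate converts the contents of this file into a readable form.

Test program (callgrind.cpp):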

#include <stdio.h>
#include <unistd.h>

void test() {
    sleep(1);
}
void func() {
    for(int i = 0; i < 10; i++) {
        test();
    }
}

int main() {
    func();
    printf("process is over!\n");

    return 0;
}

Compile it, then run the program under Valgrind.

$ g++ -g -o test callgrind.cpp
$ valgrind --tool=callgrind ./test
$ ls
callgrind.cpp  callgrind.out.3490  test

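callgrind.out.3490 is the file generated by Callgrind.

KCachegrind is a graphical tool for viewing this kind of profile data.

KCachegrind official website

Once downloaded and installed, it can be used to analyze the files generated by Callgrind.

[Figure: callgrind.out.3490 opened in KCachegrind]

The graphical view makes it easy to see which parts of the program are expensive and to understand the call relationships between them.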

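3) Cachegrind

A cache profiler. It simulates the CPU's first-level caches and its last-level cache (the second-level cache on machines with two levels) and can pinpoint cache misses and hits in the program. If needed, it can also report the number of cache misses, the number of memory references, and the number of instructions executed for each line of code, each function, each module, and the whole program. This is a great help when optimizing a program.

It is used in the same way: valgrind --tool=cachegrind ./program-name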

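4) Helgrind

It is mainly used to find race problems in multithreaded programs. Helgrind looks for memory locations that are accessed by more than one thread without consistent locking; such locations usually indicate missing synchronization between threads and can lead to errors that are hard to find. Helgrind was originally built on the Eraser race-detection algorithm, with further improvements that reduce the number of false reports; early releases described it as experimental.

Test code (helgrind.cpp):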

#include <stdio.h>
#include <pthread.h>

#define NUM 10
int counter = 0;

void *threadfunc(void*) {
    for (int i = 0; i < NUM; i++) {
        counter += i;
    }
    return NULL;   // a thread function returning void* must return a value
}
int main() {
    pthread_t tid1, tid2;

    pthread_create(&tid1, NULL, &threadfunc, NULL);
    pthread_create(&tid2, NULL, &threadfunc, NULL);

    // wait for thread to terminate
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);

    printf("counter = %d\n", counter);

    return 0;
}

Compile it, then check the program with Valgrind.

$ g++ -g -o test helgrind.cpp -lpthread
$ valgrind --tool=helgrind ./test

The result:

==27722== Helgrind, a thread error detector
==27722== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==27722== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==27722== Command: ./test
==27722== 
==27722== ---Thread-Announcement------------------------------------------
==27722== 
==27722== Thread #3 was created
==27722==    at 0x597589E: clone (in /usr/lib64/libc-2.17.so)
==27722==    by 0x4E43059: do_clone.constprop.4 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4E44569: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4C30CFA: pthread_create_WRK (hg_intercepts.c:425)
==27722==    by 0x4C31DD8: pthread_create@* (hg_intercepts.c:458)
==27722==    by 0x400728: main (helgrind.cpp:17)
==27722== 
==27722== ---Thread-Announcement------------------------------------------
==27722== 
==27722== Thread #2 was created
==27722==    at 0x597589E: clone (in /usr/lib64/libc-2.17.so)
==27722==    by 0x4E43059: do_clone.constprop.4 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4E44569: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x4C30CFA: pthread_create_WRK (hg_intercepts.c:425)
==27722==    by 0x4C31DD8: pthread_create@* (hg_intercepts.c:458)
==27722==    by 0x40070D: main (helgrind.cpp:16)
==27722== 
==27722== ----------------------------------------------------------------
==27722== 
==27722== Possible data race during read of size 4 at 0x601048 by thread #3
==27722== Locks held: none
==27722==    at 0x4006CE: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722== 
==27722== This conflicts with a previous write of size 4 by thread #2
==27722== Locks held: none
==27722==    at 0x4006D9: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722==  Address 0x601048 is 0 bytes inside data symbol "counter"
==27722== 
==27722== ----------------------------------------------------------------
==27722== 
==27722== Possible data race during write of size 4 at 0x601048 by thread #3
==27722== Locks held: none
==27722==    at 0x4006D9: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722== 
==27722== This conflicts with a previous write of size 4 by thread #2
==27722== Locks held: none
==27722==    at 0x4006D9: threadfunc(void*) (helgrind.cpp:9)
==27722==    by 0x4C30EEE: mythread_wrapper (hg_intercepts.c:387)
==27722==    by 0x4E43EA4: start_thread (in /usr/lib64/libpthread-2.17.so)
==27722==    by 0x59758DC: clone (in /usr/lib64/libc-2.17.so)
==27722==  Address 0x601048 is 0 bytes inside data symbol "counter"
==27722== 
counter = 90
==27722== 
==27722== Use --history-level=approx or =none to gain increased speed, at
==27722== the cost of reduced accuracy of conflicting-access information
==27722== For lists of detected and suppressed errors, rerun with: -s
==27722== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

The output above shows that Valgrind detected the data race.

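5) Massif

A heap profiler. It measures how much heap memory a program uses, reporting the size of heap blocks, heap administration overhead, and the stack. Massif can help reduce a program's memory use; on modern systems with virtual memory, this can also speed the program up and make it less likely to end up in swap space.

Massif profiles memory allocation and deallocation. Developers can use it to understand a program's memory behaviour in depth and optimize memory use accordingly. This is especially useful for C++, which has many hidden allocations and deallocations.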

Valgrind also ships Lackey and Nulgrind. Lackey is a small tool that is rarely used; Nulgrind only shows developers how to create a tool. They are not covered here.
