Using Valgrind on Linux for memory leak detection and performance profiling

Date: 2019-11-05

Valgrind is typically used to profile program performance and to find memory leak errors in programs.

I. Overview of the Valgrind tool suite

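Valgrind includes the following tools:

1. memcheck: checks for memory problems in a program, such as leaks, out-of-bounds accesses, and invalid pointers.

2. callgrind: measures the running cost of program code and records the call graph, for performance profiling.

3. cachegrind: analyzes CPU cache hit and miss rates, for code optimization.

4. helgrind: checks multithreaded programs for race conditions.

5. massif: a heap profiler that reports how much heap memory the program uses, among other information.

6. lackey: an example tool that illustrates basic instrumentation.

7. nulgrind: the minimal tool, which performs no analysis (selected with --tool=none).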

Each tool is invoked with the command valgrind --tool=name program-name. When the --tool option is omitted, the default is --tool=memcheck.

II. The Valgrind tools in detail

1.Memcheck

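The most commonly used tool. It detects memory problems in a program: every read and write of memory is checked, and every call to malloc, free, new, and delete is intercepted. It can therefore detect the following problems:

1. Use of uninitialized memory;

2. Reading or writing a memory block after it has been freed;

3. Reading or writing past the end of a malloc'd block;

4. Reading or writing inappropriate areas of the stack;

5. Memory leaks, where the pointer to a block of memory is lost forever;

6. Mismatched use of malloc/free or new/delete;

7. Overlapping dst and src pointers in memcpy() and related functions.

These problems are often the biggest headaches for C/C++ programmers, and Memcheck is a great help here.
For example: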

#include <stdlib.h>
#include <string.h>

void test(void)
{
    int *ptr = malloc(sizeof(int) * 10);

    ptr[10] = 7;              /* out-of-bounds write: valid indices are 0..9 */

    memcpy(ptr + 1, ptr, 5);  /* source and destination overlap */

    free(ptr);
    free(ptr);                /* double free */

    int *p1;
    *p1 = 1;                  /* write through an uninitialized pointer */
}

int main(void)
{
    test();
    return 0;
}

After compiling the program into an executable, run: valgrind --leak-check=full ./program-name
The output is as follows:

==4832== Memcheck, a memory error detector
==4832== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==4832== Using Valgrind-3.6.1 and LibVEX; rerun with -h for copyright info
==4832== Command: ./tmp
==4832==
==4832== Invalid write of size 4      // out-of-bounds write
==4832==    at 0x804843F: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832== Address 0x41a6050 is 0 bytes after a block of size 40 alloc'd
==4832==    at 0x4026864: malloc (vg_replace_malloc.c:236)
==4832==    by 0x8048435: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==
==4832== Source and destination overlap in memcpy(0x41a602c, 0x41a6028, 5) // overlapping copy
==4832==    at 0x4027BD6: memcpy (mc_replace_strmem.c:635)
==4832==    by 0x8048461: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==
==4832== Invalid free() / delete / delete[] // double free
==4832==    at 0x4025BF0: free (vg_replace_malloc.c:366)
==4832==    by 0x8048477: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832== Address 0x41a6028 is 0 bytes inside a block of size 40 free'd
==4832==    at 0x4025BF0: free (vg_replace_malloc.c:366)
==4832==    by 0x804846C: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==
==4832== Use of uninitialised value of size 4 // invalid (uninitialized) pointer
==4832==    at 0x804847B: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==
==4832==
==4832== Process terminating with default action of signal 11 (SIGSEGV) // crash caused by writing through the invalid pointer
==4832== Bad permissions for mapped region at address 0x419FFF4
==4832==    at 0x804847B: test (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==    by 0x804848D: main (in /home/yanghao/Desktop/testC/testmem/tmp)
==4832==
==4832== HEAP SUMMARY:
==4832==     in use at exit: 0 bytes in 0 blocks
==4832==   total heap usage: 1 allocs, 2 frees, 40 bytes allocated
==4832==
==4832== All heap blocks were freed -- no leaks are possible
==4832==
==4832== For counts of detected and suppressed errors, rerun with: -v
==4832== Use --track-origins=yes to see where uninitialised values come from
==4832== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 11 from 6)
Segmentation fault

As the Valgrind output shows, all of these errors were detected.
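For comparison, here is one possible way to fix each of the errors reported above. It is only a sketch and not part of the original program: the name test_fixed and its return value are made up for illustration, and the fixes (staying within bounds, memmove for overlapping regions, a single free, a pointer to real storage) are just one reasonable choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical corrected counterpart of test(); returns 0 on success. */
int test_fixed(void)
{
    int *ptr = malloc(sizeof(int) * 10);
    if (ptr == NULL)
        return -1;

    memset(ptr, 0, sizeof(int) * 10);  /* start from initialized memory */

    ptr[9] = 7;                        /* last valid index is 9, not 10 */

    memmove(ptr + 1, ptr, 5);          /* memmove permits overlapping regions */

    free(ptr);                         /* free exactly once */
    ptr = NULL;                        /* avoid accidental reuse of the stale pointer */

    int value = 0;
    int *p1 = &value;                  /* point at real storage before writing */
    *p1 = 1;

    return value == 1 ? 0 : -1;
}
```

Calling test_fixed() from main and running the result under valgrind --leak-check=full should, under these assumptions, produce none of the four errors shown above.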

2.Callgrind

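A profiling tool similar to gprof, but it observes program execution in finer detail and provides more information. Unlike gprof, it does not require special options when compiling the source, although compiling with debug information is recommended. Callgrind collects data while the program runs, builds the function call graph, and can optionally simulate the cache. When the run ends, it writes the profile data to a file. callgrind_annotate converts the contents of that file into human-readable form.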

Generating a visual graph requires downloading gprof2dot: http://jrfonseca.googlecode.com/svn/trunk/gprof2dot/gprof2dot.py

This is a Python script. After downloading it, make it executable with chmod +x gprof2dot.py and place it in any directory on $PATH. I put it in /usr/bin, so gprof2dot.py can be run directly from the terminal.

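Callgrind can generate graphs for performance analysis. First, a word about profiling tools in general: you can usually use GNU gprof, which works by adding the -pg option when compiling the program. For example: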

#include <stdio.h>
#include <unistd.h>   /* sleep() */

void test(void)
{
    sleep(1);
}

void f(void)
{
    int i;
    for (i = 0; i < 5; i++)
        test();
}

int main(void)
{
    f();
    printf("process is over!\n");
    return 0;
}

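First run gcc -pg -o tmp tmp.c, then run the program with ./tmp. When it finishes, a gmon.out file is generated in the current directory (gprof needs this file to analyze the program).
Then run gprof ./tmp | gprof2dot.py | dot -Tpng -o report.png and open report.png to see the result: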

It shows that test was called 5 times and that test accounts for the largest share of the program's run time.

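Now for generating a call graph with Callgrind. Run valgrind --tool=callgrind ./tmp. When it finishes, a file named "callgrind.out.XXX" (where XXX is the process ID) is created in the current directory; this is the profile data. You can print the results directly with callgrind_annotate callgrind.out.XXX, or generate a graphical result with gprof2dot.py -f callgrind callgrind.out.XXX | dot -Tpng -o report.png: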

The result is very detailed; even function entry points and library function calls are marked.
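One caveat worth illustrating: Callgrind's default cost is the number of instructions executed (Ir), not elapsed time, so a function that mostly sleeps, like test above, will probably show very little cost. The following sketch is a CPU-bound alternative that should make the call graph more informative. The function names sum_squares and run_workload are hypothetical and chosen only for this example.

```c
/* Hypothetical CPU-bound worker: sums i*i for i in [0, n). */
unsigned long long sum_squares(unsigned int n)
{
    unsigned long long total = 0;
    for (unsigned int i = 0; i < n; i++)
        total += (unsigned long long)i * i;
    return total;
}

/* Calls the worker several times so that it dominates the profile. */
unsigned long long run_workload(int rounds, unsigned int n)
{
    unsigned long long total = 0;
    for (int r = 0; r < rounds; r++)
        total += sum_squares(n);
    return total;
}
```

If main calls, say, run_workload(5, 1000000) and the program is compiled with gcc -g and without optimization, valgrind --tool=callgrind should attribute most of the instruction count to sum_squares, with 5 calls recorded from run_workload.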

3.Cachegrind

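A cache profiler. It simulates the CPU's first-level caches (I1 and D1) and the second-level cache, and can pinpoint cache misses and hits in a program. If needed, it can also report the number of cache misses, the number of memory references, and the instruction counts for each line of code, each function, each module, and the whole program. This is a great help when optimizing a program.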

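A bit of advertising: Valgrind itself used this tool to improve its own performance by 25%-30% over the past few months. According to earlier reports, the KDE development team has also thanked Valgrind for its help in improving KDE's performance.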

It is used in the same way: valgrind --tool=cachegrind program-name
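To get a feel for what Cachegrind reports, a small experiment like the following can help. It is a sketch, not something from the original article: the names grid, fill_grid, sum_row_major, and sum_col_major and the size N are all made up. Both functions compute the same sum, but the column-major version strides through memory and would be expected to incur more D1 read misses.

```c
#include <stddef.h>

#define N 512  /* hypothetical grid size, chosen for illustration */

static int grid[N][N];

/* Fill the grid with a known pattern: grid[r][c] = r + c. */
void fill_grid(void)
{
    for (size_t r = 0; r < N; r++)
        for (size_t c = 0; c < N; c++)
            grid[r][c] = (int)(r + c);
}

/* Row-major traversal: consecutive accesses are adjacent in memory. */
long sum_row_major(void)
{
    long total = 0;
    for (size_t r = 0; r < N; r++)
        for (size_t c = 0; c < N; c++)
            total += grid[r][c];
    return total;
}

/* Column-major traversal: consecutive accesses are N * sizeof(int) bytes apart. */
long sum_col_major(void)
{
    long total = 0;
    for (size_t c = 0; c < N; c++)
        for (size_t r = 0; r < N; r++)
            total += grid[r][c];
    return total;
}
```

After calling these from main and running the program under valgrind --tool=cachegrind, cg_annotate on the resulting cachegrind.out.XXX file can show the per-function counts. On newer Valgrind releases, cache simulation may have to be switched on explicitly with --cache-sim=yes.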

4.Helgrind

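It is mainly used to detect races in multithreaded programs. Helgrind looks for memory locations that are accessed by more than one thread but are not consistently protected by a lock. These are often places where synchronization between threads has been lost, and they lead to bugs that are hard to track down. Helgrind implements the race-detection algorithm known as "Eraser" and improves on it to reduce the number of false reports. However, Helgrind is still at an experimental stage.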

First, an example of a race condition:

#include <stdio.h>
#include <pthread.h>

#define NLOOP 50

int counter = 0; /* incremented by threads */

void *threadfn(void *);

int main(int argc, char **argv)
{
    pthread_t tid1, tid2, tid3;

    pthread_create(&tid1, NULL, &threadfn, NULL);
    pthread_create(&tid2, NULL, &threadfn, NULL);
    pthread_create(&tid3, NULL, &threadfn, NULL);

    /* wait for all three threads to terminate */
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    pthread_join(tid3, NULL);

    return 0;
}

void *threadfn(void *vptr)
{
    int i, val;

    for (i = 0; i < NLOOP; i++) {
        val = counter;
        printf("%x: %d \n", (unsigned int)pthread_self(), val + 1);
        counter = val + 1;
    }
    return NULL;
}

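The race in this program is in the loop body of threadfn, where counter is read, printed, and written back. The intended effect is for the 3 threads to each increment the global variable 50 times, so that its final value is 150. Because no lock is taken here, the race clearly prevents the program from reaching that goal. Let's see how Helgrind helps detect the race. First compile the program with gcc -o test thread.c -lpthread, then run valgrind --tool=helgrind ./test. The output is as follows: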
49c0b70: 1
49c0b70: 2
==4666== Thread #3 was created
==4666==    at 0x412E9D8: clone (clone.S:111)
==4666==    by 0x40494B5: pthread_create@@GLIBC_2.1 (createthread.c:256)
==4666==    by 0x4026E2D: pthread_create_WRK (hg_intercepts.c:257)
==4666==    by 0x4026F8B: pthread_create@* (hg_intercepts.c:288)
==4666==    by 0x8048524: main (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==
==4666== Thread #2 was created
==4666==    at 0x412E9D8: clone (clone.S:111)
==4666==    by 0x40494B5: pthread_create@@GLIBC_2.1 (createthread.c:256)
==4666==    by 0x4026E2D: pthread_create_WRK (hg_intercepts.c:257)
==4666==    by 0x4026F8B: pthread_create@* (hg_intercepts.c:288)
==4666==    by 0x8048500: main (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==
==4666== Possible data race during read of size 4 at 0x804a028 by thread #3
==4666==    at 0x804859C: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666== This conflicts with a previous write of size 4 by thread #2
==4666==    at 0x80485CA: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666==
==4666== Possible data race during write of size 4 at 0x804a028 by thread #2
==4666==    at 0x80485CA: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666== This conflicts with a previous read of size 4 by thread #3
==4666==    at 0x804859C: threadfn (in /home/yanghao/Desktop/testC/testmem/a.out)
==4666==    by 0x4026F60: mythread_wrapper (hg_intercepts.c:221)
==4666==    by 0x4048E98: start_thread (pthread_create.c:304)
==4666==    by 0x412E9ED: clone (clone.S:130)
==4666==
49c0b70: 3
......
55c1b70: 51
==4666==
==4666== For counts of detected and suppressed errors, rerun with: -v
==4666== Use --history-level=approx or =none to gain increased speed, at
==4666== the cost of reduced accuracy of conflicting-access information
==4666== ERROR SUMMARY: 8 errors from 2 contexts (suppressed: 99 from 31)
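One common way to remove the race that Helgrind reports is to protect the read-modify-write of the counter with a mutex. The sketch below is an assumption about how that might look, not the original author's fix. The names counter_lock, threadfn_locked, run_locked_counter, and NTHREADS are hypothetical, and the printf call is dropped to keep the example short.

```c
#include <pthread.h>

#define NLOOP    50
#define NTHREADS 3   /* hypothetical constant; the example above uses 3 threads */

static int counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each increment happens while holding counter_lock. */
static void *threadfn_locked(void *arg)
{
    (void)arg;
    for (int i = 0; i < NLOOP; i++) {
        pthread_mutex_lock(&counter_lock);
        counter = counter + 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts NTHREADS threads, waits for them, and returns the final counter
 * value, or -1 if a thread could not be created. */
int run_locked_counter(void)
{
    pthread_t tids[NTHREADS];
    int started = 0;

    counter = 0;
    while (started < NTHREADS &&
           pthread_create(&tids[started], NULL, threadfn_locked, NULL) == 0)
        started++;

    /* Join every thread that was actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == NTHREADS ? counter : -1;
}
```

With every access to counter made under the same lock, run_locked_counter() should return 150, and running a program built from this code (for example with gcc -g -pthread) under valgrind --tool=helgrind would be expected to no longer report the data race on counter.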