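A while ago, during an interview at Sangfor (深信服), the interviewer asked how to debug a segmentation fault, and I honestly did not know how to answer. I had run into segfaults now and then, but each time I simply went back and changed the code after the program reported the fault, without ever trying to understand it in depth.
According to Wikipedia, a fairly complete definition of a segmentation fault is as follows: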
In computing, a segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation; on x86 computers this is a form of general protection fault. In short, a segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (e.g., attempts to write to a read-only location, or to overwrite part of the operating system). Systems based on processors like the Motorola 68000 tend to refer to these events as Address or Bus errors. On Unix-like operating systems, a process that accesses invalid memory receives the SIGSEGV signal. On Microsoft Windows, a process that accesses invalid memory receives the STATUS_ACCESS_VIOLATION exception.
Wikipedia also summarizes some typical causes of segmentation faults:
The following are some typical causes of a segmentation fault:
1. Dereferencing null pointers – this is special-cased by memory management hardware
2. Attempting to access a nonexistent memory address (outside process's address space)
3. Attempting to access memory the program does not have rights to (such as kernel structures in process context)
4. Attempting to write read-only memory (such as code segment)
These in turn are often caused by programming errors that result in invalid memory access:
1. Dereferencing or assigning to an uninitialized pointer (wild pointer, which points to a random memory address)
2. Dereferencing or assigning to a freed pointer (dangling pointer, which points to memory that has been freed/deallocated/deleted)
3. A buffer overflow
4. A stack overflow
5. Attempting to execute a program that does not compile correctly. (Some compilers will output an executable file despite the presence of compile-time errors.)
This part is mainly based on the blog post "你的java/c/c++程序崩潰了?揭祕段錯誤(Segmentation fault)(3)".
The example code is as follows:
// stack.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>


int main(int argc,char** args) {
    char * p = NULL;
    *p = 0x0;   /* line 9 of stack.c: write through a NULL pointer */
}
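Running the program gives the following result (it terminates with a segmentation fault):
Step 1: use strace to check the signal description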
strace -i -x -o segfault.txt ./segfault.o
This yields the following information:
From this we can tell:
1. Signal: SIGSEGV
2. Error code: SEGV_MAPERR
3. Faulting memory address: 0x0
4. The fault occurred at logical address 0x400507.
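We can guess that the program dereferences a null pointer and tries to write to 0x0, which triggers the segmentation fault.
For more on using strace, see the blog post "Linux strace 命令".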
Step 2: use dmesg to inspect the fault scene
dmesg
This gives:
From this we know:
1. Error type: segfault, i.e. a segmentation fault.
2. ip at the time of the fault: 0x400507
3. Error number: 6, i.e. binary 110
Step 3: collect the known conclusions
The error number and the ip are the key here. Check the error number against the following:
/*
 * Page fault error code bits:
 *
 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 */
/*enum x86_pf_error_code {
    PF_PROT  = 1 << 0,
    PF_WRITE = 1 << 1,
    PF_USER  = 1 << 2,
    PF_RSVD  = 1 << 3,
    PF_INSTR = 1 << 4,
};*/
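Checking against this table, we get:
Error number 6 = binary 110 = (PF_USER | PF_WRITE | 0).
That is: a user-mode access, a write-type page fault, and no page corresponding to the given address.
This matches our initial guess.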
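Now, to summarize the conclusions so far:
1. Error type: segfault, i.e. a segmentation fault.
2. ip at the time of the fault: 0x400507
3. Error number: 6, i.e. binary 110
4. Error code: SEGV_MAPERR, i.e. the address is not mapped to an object.
5. Cause: a write to 0x0 triggered the segmentation fault, because 0x0 has no corresponding page (no mapping).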
Step 4: locate the faulting code from the conclusions
gdb ./segfault.o
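Using ip = 0x400507 from the conclusions, we immediately get:
Clearly, this confirms our conclusion: we tried to write the value 0x0 to address 0x0, which triggered a segmentation fault for writing to an unmapped address.
We have also found the faulty code: line 9 of stack.c.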
Besides the methods above, we can also pin down the faulty code by debugging a core dump:
For details on core dumps, see the blog post "Linux Core Dump".
你的java/c/c++程序崩潰了?揭祕段錯誤(Segmentation fault)(1)
你的java/c/c++程序崩潰了?揭祕段錯誤(Segmentation fault)(2)