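crash is currently the most widely used tool for analyzing Linux kernel crash dump files, and knowing how to use it matters a great deal when analyzing and locating kernel crash problems. This article first introduces the basic concepts of crash and how to install it, then explains in detail how to use crash to analyze a kernel crash dump file, including the commonly used debugging commands, and finally works through real cases encountered in day-to-day work to show what crash can do. It combines detailed usage instructions with practical case analysis.
As described earlier, when the Linux kernel crashes, kdump and similar mechanisms can capture the memory as it was before the crash and write it to a dump file, vmcore. By analyzing this vmcore file, kernel developers can diagnose the cause of the crash and improve the operating system code accordingly. crash is a widely used tool for analyzing such kernel crash dump files, and knowing how to use it is very important for locating problems.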
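Because crash is used to debug kernel crash dump files, using it depends on the following conditions:
The kernel image file vmlinux must have been compiled with the -g option, that is, it must contain debugging information.
A memory crash dump file (for example vmcore) is required, or live system memory that can be accessed through /dev/mem or /dev/crash. If no dump file is given on the crash command line, crash uses live system memory by default, which requires root privileges.
The processor platforms supported by crash include x86, x86_64, ia64, ppc64, arm, s390 and s390x (some crash versions also support Alpha and 32-bit PowerPC, but long-term maintenance of these two platforms is not guaranteed).
crash supports Linux kernel versions 2.2.5-15 and later. As the Linux kernel evolves, crash is continually updated to keep up with new kernels.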
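To debug kernel dump files with crash, you need to install the crash tool and the kernel debuginfo packages. Package names differ slightly between distributions; only those for RHEL and SLES are listed here:
Table 1. crash tool and kernel debuginfo packages
System version | crash package | Kernel debuginfo packages |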
---|---|---|
RHEL6.2 | crash | kernel-debuginfo-common kernel-debuginfo |
SLES11SP2 | crash | kernel-default-debuginfo kernel-ppc64-debuginfo |
Taking RHEL as an example, the steps to install crash and the kernel debuginfo packages are as follows:

    rpm -ivh crash-5.1.8-1.el6.ppc64.rpm
    rpm -ivh kernel-debuginfo-common-ppc64-2.6.32-220.el6.ppc64.rpm
    rpm -ivh kernel-debuginfo-2.6.32-220.el6.ppc64.rpm
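To debug a dump file with crash, two arguments are given on the command line: debug kernel and dump file. dump file is the name of the kernel dump file; debug kernel is installed by the kernel debuginfo package, and its location differs slightly between distributions. For RHEL and SLES:

    RHEL6.2:   /usr/lib/debug/lib/modules/2.6.32-220.el6.ppc64/vmlinux
    SLES11SP2: /usr/lib/debug/boot/vmlinux-3.0.13-0.27-ppc64.debug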
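The two locations above can be turned into a small lookup helper. The following is only a sketch: the templates are copied from the two example paths, the function and dictionary names are hypothetical, and other distributions or releases may lay out their debuginfo packages differently.

```python
# Path templates derived from the two example paths in this article.
# Assumption: the layout is the same for other releases of the same distro.
DEBUG_KERNEL_TEMPLATES = {
    "rhel": "/usr/lib/debug/lib/modules/{release}/vmlinux",
    "sles": "/usr/lib/debug/boot/vmlinux-{release}.debug",
}


def debug_kernel_path(distro: str, release: str) -> str:
    """Return the expected debug kernel path for a distro family and kernel release."""
    try:
        template = DEBUG_KERNEL_TEMPLATES[distro.lower()]
    except KeyError:
        raise ValueError(f"no known debug kernel layout for {distro!r}") from None
    return template.format(release=release)


if __name__ == "__main__":
    print(debug_kernel_path("rhel", "2.6.32-220.el6.ppc64"))
    print(debug_kernel_path("sles", "3.0.13-0.27-ppc64"))
```

The release string is the one reported by the crashed kernel, so the same value can later be compared against the RELEASE field that crash prints at startup.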
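Run crash -h or man crash to see the full set of options that crash supports. Only the commonly used ones are described here:
-h: print help information
-d: set the debug level
-S: use /boot/System.map as the default map file
-s: do not display the version and initial debugging information; go straight to the command prompt
-i file: after startup, automatically run the commands in file before accepting user input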
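The -s and -i options combine naturally for unattended analysis. The sketch below only prepares a command file and the argument list; it does not run crash. The helper names are hypothetical, and ending the command file with `exit` so that crash terminates after the last command is an assumption about how you want the session to end.

```python
from pathlib import Path
import tempfile


def write_command_file(path, commands):
    """Write one crash command per line, followed by 'exit' to end the session."""
    lines = list(commands) + ["exit"]
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


def build_crash_argv(debug_kernel, dump_file, command_file):
    """Assemble: crash -s -i <command file> <debug kernel> <dump file>."""
    return ["crash", "-s", "-i", str(command_file), str(debug_kernel), str(dump_file)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cmd_file = write_command_file(Path(tmp) / "crash_cmds.txt", ["bt", "log", "ps"])
        print(" ".join(build_crash_argv("vmlinux", "vmcore", cmd_file)))
```

The resulting list can be handed to `subprocess.run` on a machine where crash and the matching debug kernel are installed.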
After crash starts, it prints a summary report of the dump file, as shown below.

    [root@curlylp1 ~]# crash
    crash 5.1.8-1.el6
    Copyright (C) 2002-2011 Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006 IBM Corporation
    Copyright (C) 1999-2006 Hewlett-Packard Co
    Copyright (C) 2005, 2006 Fujitsu Limited
    Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
    Copyright (C) 2005 NEC Corporation
    Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions. Enter "help copying" to see the conditions.
    This program has absolutely no warranty. Enter "help warranty" for details.
    GNU gdb (GDB) 7.0
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "powerpc64-unknown-linux-gnu"...
          KERNEL: /usr/lib/debug/lib/modules/2.6.32-220.el6.ppc64/vmlinux
        DUMPFILE: /dev/mem
            CPUS: 2
            DATE: Thu Feb 2 00:31:34 2012
          UPTIME: 58 days, 22:52:43
    LOAD AVERAGE: 76.11, 77.40, 77.83
           TASKS: 481
        NODENAME: curlylp1.upt.austin.ibm.com
         RELEASE: 2.6.32-220.el6.ppc64
         VERSION: #1 SMP Wed Nov 9 08:02:37 EST 2011
         MACHINE: ppc64 (5009 Mhz)
          MEMORY: 4 GB
             PID: 30510
         COMMAND: "crash"
            TASK: c00000006ddbe460 [THREAD_INFO: c000000073268000]
             CPU: 0
           STATE: TASK_RUNNING (ACTIVE)
    crash>
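KERNEL: the kernel file that was running when the system crashed
DUMPFILE: the kernel dump file
CPUS: the number of CPUs in the machine
DATE: the time of the crash
TASKS: the number of tasks in memory at the time of the crash
NODENAME: the host name of the crashed system
RELEASE and VERSION: the kernel release and version
MACHINE: the CPU architecture
MEMORY: the physical memory of the crashed host
PANIC: the type of crash. Common types include:
SysRq (System Request): a crash triggered through the magic key combination, normally used for testing. Running echo c > /proc/sysrq-trigger triggers a system crash.
oops: can be thought of as a kernel-level segmentation fault. When an application performs an illegal memory access or executes an illegal instruction, it receives a signal such as SIGSEGV; the default behavior is to dump core, although the application can also catch the signal and handle it itself. When the kernel itself makes this kind of mistake, it prints an oops message.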
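When many dumps have to be triaged, the summary fields described above can be collected programmatically. This is a minimal sketch, assuming the summary has been captured as text; the function name and key set are hypothetical, and the sample lines are taken from the case study summary later in this article.

```python
# Field names that appear in the crash startup summary shown in this article.
SUMMARY_KEYS = {
    "KERNEL", "DUMPFILE", "CPUS", "DATE", "UPTIME", "LOAD AVERAGE", "TASKS",
    "NODENAME", "RELEASE", "VERSION", "MACHINE", "MEMORY", "PANIC", "PID",
    "COMMAND", "TASK", "CPU", "STATE",
}


def parse_summary(text):
    """Return a dict of the known 'KEY: value' summary lines found in text."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in SUMMARY_KEYS:
            fields[key] = value.strip()
    return fields


SAMPLE = """\
      KERNEL: vmlinux-3.0.8-0.11-ppc64
    DUMPFILE: vmcore
        CPUS: 40
     RELEASE: 3.0.8-0.11-ppc64
       PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
"""

if __name__ == "__main__":
    for key, value in parse_summary(SAMPLE).items():
        print(f"{key} = {value}")
```

Banner lines such as the license notice also contain colons, which is why the parser only accepts keys from the known set.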
Once the crash prompt is available, built-in commands can be used to print information about the system as it was before the crash.
bt - backtrace
The bt command shows the stack and related information from before the crash. It is one of the most frequently used and most useful commands in system debugging.
Listing 2. bt command output

    crash> bt
    PID: 2860 TASK: c0000000677e9550 CPU: 0 COMMAND: "bash"
    R0: 0000000000000001  R1: c0000000018978b0  R2: c00000000061c460
    R3: c000000001897920  R4: 0000000000000000  R5: 0000000000000000
    R6: 0000000000019e07  R7: 0000000000000000  R8: 000000000a000000
    R9: c000000072938d80  R10: c0000000006b5d58  R11: c000000000740178
    R12: 0000000000000000  R13: c00000000054ea80  R14: 00000000100d0000
    R15: 0000000000000000  R16: 00000000100e2ab8  R17: 00000000100b0000
    R18: 00000000100d0000  R19: 00000000100d0000  R20: 0000000000000000
    R21: 0000000000000000  R22: 00000000100e8a28  R23: 0000000000000000
    R24: 8000000000009032  R25: 0000000000000000  R26: 0000000000000000
    R27: 0000000000000063  R28: 0000000000000006  R29: 0000000000000000
    R30: c00000000058bfe8  R31: c0000000005a5ed0
    NIP: c00000000009d9b0  MSR: 8000000000001032  OR3: c000000001897ab0
    CTR: c00000000028b6ec  LR: c00000000028b708  XER: 0000000000000005
    CCR: 0000000000000006  MQ: 0000000000000000  DAR: c0000000005a5ed0
    DSISR: c000000001897b10  Syscall Result: 0000000000000000
    NIP [c00000000009d9b0] .crash_kexec
    LR [c00000000028b708] .sysrq_handle_crashdump
    #0 [c0000000018978b0] .crash_kexec at c00000000009d9e0
    #1 [c000000001897a90] .sysrq_handle_crashdump at c00000000028b708
    #2 [c000000001897b10] .__handle_sysrq at c00000000028b1fc
    #3 [c000000001897bc0] .write_sysrq_trigger at c00000000015eadc
    #4 [c000000001897c50] .proc_reg_write at c000000000156670
    #5 [c000000001897cf0] .vfs_write at c0000000000fd490
    #6 [c000000001897d90] .sys_write at c0000000000fdc00
    #7 [c000000001897e30] syscall_exit at c0000000000086a4
    syscall [c00] exception frame:
    R0: 0000000000000004  R1: 00000000ffb6f820  R2: 00000000f7fe95c0
    R3: 0000000000000001  R4: 00000000f7d70000  R5: 0000000000000002
    R6: 0000000000000001  R7: ffffffffffffffff  R8: 0000000000000000
    R9: 0000000000000000  R10: 0000000000000000  R11: 0000000000000000
    R12: 0000000000000000  R13: 00000000100dc8e8  R14: 00000000100d0000
    R15: 0000000000000000  R16: 00000000100e2ab8  R17: 00000000100b0000
    R18: 00000000100d0000  R19: 00000000100d0000  R20: 0000000000000000
    R21: 0000000000000000  R22: 00000000100e8a28  R23: 0000000000000000
    R24: 0000000000000001  R25: 00000000100e9718  R26: 0000000000000000
    R27: 0000000000000002  R28: 000000000ff703f8  R29: 00000000f7d70000
    R30: 000000000ff6fff4  R31: 0000000000000002
    NIP: 000000000fec0988  MSR: 000000000000d032  OR3: 0000000000000001
    CTR: 000000000fe59270  LR: 000000000fe592dc  XER: 0000000000000000
    CCR: 0000000040242442  MQ: 00000000010b6c30  DAR: 00000000f7d70000
    DSISR: 0000000042000000  Syscall Result: 0000000000000000
    crash>

In the output above, the lines that begin with "#" followed by a number form the call stack, that is, the chain of functions the kernel had called before the crash, listed from the most recent call (#0) outward. This makes it possible to see quickly where the kernel crashed.
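The "#N" lines have a regular shape, so the call chain can be pulled out of a saved bt transcript with a short script. The following is a sketch under the assumption that frames look like the ones in the listing above; the regular expression and function name are illustrative, not part of crash.

```python
import re

# Matches frame lines such as:
#   #1 [c000000001897a90] .sysrq_handle_crashdump at c00000000028b708
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")


def parse_frames(bt_text):
    """Return (index, stack_pointer, symbol, address) for every '#N' frame line."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, stack_pointer, symbol, address = match.groups()
            frames.append((int(index), int(stack_pointer, 16), symbol, int(address, 16)))
    return frames


BT_SAMPLE = """\
#0 [c0000000018978b0] .crash_kexec at c00000000009d9e0
#1 [c000000001897a90] .sysrq_handle_crashdump at c00000000028b708
#2 [c000000001897b10] .__handle_sysrq at c00000000028b1fc
#3 [c000000001897bc0] .write_sysrq_trigger at c00000000015eadc
"""

if __name__ == "__main__":
    # Reverse the frames to print the chain in calling order (outermost first).
    print(" -> ".join(frame[2] for frame in reversed(parse_frames(BT_SAMPLE))))
```

Register dump lines and the exception frame header do not match the pattern, so they are skipped.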
log - dump system message buffer
The log command prints the system message buffer, which may contain clues to the cause of the crash. Its output is shown below (some lines have been omitted to save space):
Listing 3. log command output

    crash> log
    Crash kernel location must be 0x2000000
    Using pSeries machine description
    Page orders: linear mapping = 24, virtual = 16, io = 12
    Found initrd at 0xc000000001500000:0xc000000001c90400
    Partition configured for 2 cpus.
    Starting Linux PPC64 #1 SMP Tue Jan 24 20:12:50 EST 2012
    -----------------------------------------------------
    ppc64_pft_size = 0x19
    physicalMemorySize = 0x80000000
    ppc64_caches.dcache_line_size = 0x80
    ppc64_caches.icache_line_size = 0x80
    htab_address = 0x0000000000000000
    htab_hash_mask = 0x3ffff
    -----------------------------------------------------
    Linux version 2.6.18-307.el5 (mockbuild@ppc-001.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)) #1 SMP Tue Jan 24 20:12:50 EST 2012
    [boot]0012 Setup Arch
    Node 0 Memory: 0x0-0x80000000
ps - display process status information
The ps command displays process status. Entries marked with > are the active processes. Its output is shown below (some lines omitted):
Listing 4. ps command output

    crash> ps
       PID  PPID  CPU  TASK              ST  %MEM   VSZ   RSS  COMM
         0     0    0  c00000000054e190  RU   0.0     0     0  [swapper]
         0     1    1  c00000007ff15150  RU   0.0     0     0  [swapper]
         1     0    1  c00000007ff15960  IN   0.1  4672  2688  init
         2     1    0  c00000007ff14940  IN   0.0     0     0  [migration/0]
         3     1    0  c00000007ff14130  IN   0.0     0     0  [ksoftirqd/0]
         4     1    0  c00000007ff13920  IN   0.0     0     0  [watchdog/0]
         5     1    1  c00000007ff13110  IN   0.0     0     0  [migration/1]
         6     1    1  c00000007ff12900  IN   0.0     0     0  [ksoftirqd/1]
         7     1    1  c00000007ff120f0  IN   0.0     0     0  [watchdog/1]
         8     1    0  c00000007ff118e0  IN   0.0     0     0  [events/0]
         9     1    1  c00000007ff1ba20  IN   0.0     0     0  [events/1]
        10     1    1  c00000007ff110d0  IN   0.0     0     0  [khelper]
       139     1    0  c0000000015822f0  IN   0.0     0     0  [kthread]
       143   139    0  c000000001c6eb00  IN   0.0     0     0  [kblockd/0]
       144   139    1  c000000001580ac0  IN   0.0     0     0  [kblockd/1]
       145   139    0  c000000001c6f310  IN   0.0     0     0  [cqueue/0]
       146   139    1  c0000000015802b0  IN   0.0     0     0  [cqueue/1]
       150   139    0  c00000007ff1e270  IN   0.0     0     0  [khubd]
       152   139    0  c00000007ff1ea80  IN   0.0     0     0  [kseriod]
       169     1    1  c000000001c62170  IN   0.0     0     0  [rtasd]
       209   139    0  c00000007f4ca370  IN   0.0     0     0  [khungtaskd]
    > 1771     1    1  c000000001c36a80  RU   0.1  4096  2240  syslogd
dis - disassembling instruction
The dis command disassembles the contents at a given address. Its output is shown below:
Listing 5. dis command output

    crash> dis -l c000000000255900
    /usr/src/debug/kernel-ppc64-3.0.8/linux-3.0/fs/proc/mmu.c: 47
    0xc000000000255900 <.get_vmalloc_info+112>: ld r10,8(r11)

struct - view data struct
The struct command displays the definition of a data structure. Its output is shown below:

    crash> struct -o vm_struct
    struct vm_struct {
       [0] struct vm_struct *next;
       [8] void *addr;
      [16] long unsigned int size;
      [24] long unsigned int flags;
      [32] struct page **pages;
      [40] unsigned int nr_pages;
      [48] phys_addr_t phys_addr;
      [56] void *caller;
    }
    SIZE: 64
The case used here is a system crash on SLES that the author found during real testing work. kdump had been configured and enabled on the system, so after the crash a vmcore file was generated under /var/crash/<date of the crash>/. We analyze that file below.
Listing 6. Starting crash

    # crash vmlinux-3.0.8-0.11-ppc64 vmcore
    crash 5.1.9
    Copyright (C) 2002-2011 Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006 IBM Corporation
    Copyright (C) 1999-2006 Hewlett-Packard Co
    Copyright (C) 2005, 2006 Fujitsu Limited
    Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
    Copyright (C) 2005 NEC Corporation
    Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions. Enter "help copying" to see the conditions.
    This program has absolutely no warranty. Enter "help warranty" for details.
    GNU gdb (GDB) 7.0
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "powerpc64-unknown-linux-gnu"...
          KERNEL: vmlinux-3.0.8-0.11-ppc64
        DUMPFILE: vmcore
            CPUS: 40
            DATE: Wed Nov 16 20:17:11 2011
          UPTIME: 10:37:23
    LOAD AVERAGE: 60.00, 60.00, 60.00
           TASKS: 811
        NODENAME: eellp1
         RELEASE: 3.0.8-0.11-ppc64
         VERSION: #1 SMP Thu Nov 10 16:28:46 UTC 2011 (3cea58b)
         MACHINE: ppc64 (3550 Mhz)
          MEMORY: 4 GB
           PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
             PID: 5563
         COMMAND: "sh"
            TASK: c0000000faac3700 [THREAD_INFO: c0000000f8ce0000]
             CPU: 36
           STATE: TASK_RUNNING (PANIC)
    crash>

The kernel version is 3.0.8-0.11-ppc64, which is a development build of SLES11 SP2.
Listing 7. bt command

    crash> bt
    PID: 5563 TASK: c0000000faac3700 CPU: 36 COMMAND: "sh"
    #0 [c0000000f8ce31b0] .crash_kexec at c0000000001039f8
    #1 [c0000000f8ce33b0] .die at c000000000020158
    #2 [c0000000f8ce3450] .bad_page_fault at c000000000045004
    #3 [c0000000f8ce34d0] handle_page_fault at c000000000005ec8
    Data Access error [300] exception frame:
    R0: 0000000000130000  R1: c0000000f8ce37c0  R2: c000000000f876d8
    R3: c000000001224dc8  R4: 0000000000000001  R5: 0000000000000000
    R6: cfffffffffffffff  R7: 0000000002220000  R8: 2ffffffff1f10000
    R9: d00000000e0f0000  R10: 0000000000000000  R11: 0000000100000000
    R12: 0000000082002424  R13: c000000001f06c00  R14: 000000001003e270
    R15: 0000000000000001  R16: 0000000000000001  R17: 0000000000000000
    R18: 0000000000000000  R19: c0000000f820b4b8  R20: c0000000f8ce3df8
    R21: c000000000fe2400  R22: 00000fffb53d0000  R23: fffffffffffff000
    R24: 0000000000000400  R25: 000000000000ed99  R26: 0000000000002000
    R27: 0000000000002e58  R28: c000000001224dc8  R29: c000000001224dc0
    R30: c000000000ef2658  R31: c0000000f8ce39a0
    NIP: c000000000255900  MSR: 8000000000009032  OR3: c000000000005278
    CTR: c000000000263a08  LR: c0000000002558dc  XER: 0000000000000001
    CCR: 0000000022002444  MQ: 0000000000000001  DAR: 0000000100000008
    DSISR: 0000000040000000  Syscall Result: 0000000000000000
    .....
    #4 [c0000000f8ce37c0] .get_vmalloc_info at c000000000255900
    [Link Register ] [c0000000f8ce37c0] .get_vmalloc_info at c0000000002558dc (unreliable)
    #5 [c0000000f8ce3850] .meminfo_proc_show at c000000000263ad8
    #6 [c0000000f8ce3b40] .seq_read at c00000000020aa44
    #7 [c0000000f8ce3c30] .proc_reg_read at c000000000258ccc
    #8 [c0000000f8ce3ce0] .vfs_read at c0000000001dee60
    #9 [c0000000f8ce3d80] .sys_read at c0000000001df06c
    #10 [c0000000f8ce3e30] syscall_exit at c0000000000097ec
    syscall [c01] exception frame:
    R0: 0000000000000003  R1: 00000ffff3cceb60  R2: 00000fffb5305c40
    R3: 0000000000000008  R4: 00000fffb53d0000  R5: 0000000000000400
    R6: 0000000000000001  R7: 00000fffb5249f88  R8: 800000000200f032
    R9: 0000000000000000  R10: 0000000000000000  R11: 0000000000000000
    R12: 0000000000000000  R13: 00000fffb50b8110
    NIP: 00000fffb523d0c4  MSR: 800000000200f032  OR3: 0000000000000008
    CTR: 00000fffb51dae70  LR: 00000fffb51daeac  XER: 0000000000000001
    CCR: 0000000044002422  MQ: 0000000000000001  DAR: 00000fffb51dcd60
    DSISR: 0000000040000000  Syscall Result: 00000fffb53d0000
    crash>
Listing 8. dis command

    crash> dis -l c000000000255900
    /usr/src/debug/kernel-ppc64-3.0.8/linux-3.0/fs/proc/mmu.c: 47
    0xc000000000255900 <.get_vmalloc_info+112>: ld r10,8(r11)
Listing 9. Linux source code

    21 void get_vmalloc_info(struct vmalloc_info *vmi)
    22 {
    23         struct vm_struct *vma;
    ……
    46         for (vma = vmlist; vma; vma = vma->next) {
    47                 unsigned long addr = (unsigned long) vma->addr;
Use the struct command to view the data structure:
Listing 10. struct command

    crash> struct -o vm_struct
    struct vm_struct {
       [0] struct vm_struct *next;
       [8] void *addr;
      [16] long unsigned int size;
      [24] long unsigned int flags;
      [32] struct page **pages;
      [40] unsigned int nr_pages;
      [48] phys_addr_t phys_addr;
      [56] void *caller;
    }
    SIZE: 64
    crash>
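Comparing the source code with the disassembly, we find that line 47 of the source corresponds to the disassembled instruction

    ld r10,8(r11)    # load the doubleword at offset 8 from the address in r11 into r10

In the struct layout, offset 8 is the addr member, so r11 holds vma and the instruction reads vma->addr. The exception frame in the bt output shows R11: 0000000100000000 and DAR (the faulting data address): 0000000100000008, which is exactly r11 + 8.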
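The address arithmetic behind this instruction can be checked in a few lines. This is a small sketch that uses only values visible in the listings: the member offsets from `struct -o vm_struct`, and R11 and DAR from the Data Access error exception frame. The helper names are hypothetical.

```python
# Member offsets copied from the `struct -o vm_struct` output.
VM_STRUCT_OFFSETS = {
    0: "next", 8: "addr", 16: "size", 24: "flags",
    32: "pages", 40: "nr_pages", 48: "phys_addr", 56: "caller",
}

MASK64 = (1 << 64) - 1  # ppc64 registers are 64 bits wide


def member_at(offset):
    """Return the vm_struct member that starts at the given byte offset, if any."""
    return VM_STRUCT_OFFSETS.get(offset)


def effective_address(base, displacement):
    """Address accessed by a base+displacement load such as `ld r10,8(r11)`."""
    return (base + displacement) & MASK64


R11 = 0x0000000100000000  # from the exception frame
DAR = 0x0000000100000008  # faulting data address from the same frame

if __name__ == "__main__":
    print("member read by the load:", member_at(8))
    print("effective address:", hex(effective_address(R11, 8)))
    print("matches DAR:", effective_address(R11, 8) == DAR)
```

The match between the computed address and DAR supports reading r11 as the vma pointer at the moment of the fault.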
Listing 11. struct command

    crash> struct vm_struct 0000000100000000
    struct: invalid kernel virtual address: 0000000100000000
    crash>

This shows that the contents of r11 have been corrupted: it no longer points to a vm_struct structure.
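From the step-by-step analysis above, we infer that the problem arose as follows: on line 46 of mmu.c, vma = vma->next fetched a bad address, so on line 47 addr = (unsigned long) vma->addr triggered the kernel error. The deeper cause still has to be found by analyzing the code logic to identify what produced the bad pointer in the first place.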
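The failure mode inferred here can be modeled in user space. The sketch below is a simulation, not kernel code: a dictionary stands in for kernel memory, each entry maps a hypothetical vm_struct address to its (next, addr) pair, and a lookup of an address that is not in the dictionary plays the role of the page fault. Only the value 0x0000000100000000 comes from the dump; every other address is made up for illustration.

```python
class BadPointer(Exception):
    """Raised where the real kernel would take a data access fault."""


def walk_vmlist(memory, head):
    """Follow `next` pointers like the loop on line 46, collecting each vma->addr."""
    addrs = []
    vma = head
    while vma:  # the C loop also stops when vma becomes NULL (0)
        if vma not in memory:
            # Corresponds to line 47 dereferencing a pointer that is not a vm_struct.
            raise BadPointer(f"vma = {vma:#018x} is not a valid vm_struct address")
        next_vma, addr = memory[vma]
        addrs.append(addr)
        vma = next_vma
    return addrs


if __name__ == "__main__":
    # Hypothetical list whose second element has a corrupted `next` field.
    memory = {
        0xC000000001000000: (0xC000000001000040, 0xD000000000000000),
        0xC000000001000040: (0x0000000100000000, 0xD000000000010000),
    }
    try:
        walk_vmlist(memory, 0xC000000001000000)
    except BadPointer as err:
        print("fault:", err)
```

The walk succeeds for every element up to the corrupted link, which is why the crash only appears once the loop reaches that element.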
This section lists problems you may run into while using crash, together with ways to resolve them.
Listing 12. Missing debuginfo package

    [root@bondlp1 2012-02-02-01:37]# crash
    crash 5.1.8-1.el5
    Copyright (C) 2002-2011 Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006 IBM Corporation
    Copyright (C) 1999-2006 Hewlett-Packard Co
    Copyright (C) 2005, 2006 Fujitsu Limited
    Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
    Copyright (C) 2005 NEC Corporation
    Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions. Enter "help copying" to see the conditions.
    This program has absolutely no warranty. Enter "help warranty" for details.
    crash: /boot/vmlinuz-2.6.18-307.el5: no debugging data available
    crash: vmlinuz-2.6.18-307.el5.debug: debuginfo file not found
    crash: either install the appropriate kernel debuginfo package, or
           copy vmlinuz-2.6.18-307.el5.debug to this machine

When this happens, install the kernel debuginfo package and run crash again.
Listing 13. vmlinux and vmcore versions do not match

    [root@bondlp1 2012-02-02-01:37]# crash /usr/lib/debug/lib/modules/2.6.18-305.el5/vmlinux vmcore
    crash 5.1.8-1.el5
    Copyright (C) 2002-2011 Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006 IBM Corporation
    Copyright (C) 1999-2006 Hewlett-Packard Co
    Copyright (C) 2005, 2006 Fujitsu Limited
    Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
    Copyright (C) 2005 NEC Corporation
    Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions. Enter "help copying" to see the conditions.
    This program has absolutely no warranty. Enter "help warranty" for details.
    GNU gdb (GDB) 7.0
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "powerpc64-unknown-linux-gnu"...
    WARNING: kernel version inconsistency between vmlinux and dumpfile
    please wait... (gathering module symbol data)
    WARNING: cannot access vmalloc'd module memory
    crash: invalid kernel virtual address: 8000000000b663c8 type: "runqueues entry (per_cpu)"

This means the vmlinux you are using does not match the kernel version that produced the vmcore. The vmcore must be debugged with a vmlinux of exactly the same kernel version.
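A mismatch like this can be caught before crash is even started by comparing the release embedded in the debug kernel path with the release of the kernel that produced the dump. The following is a sketch that assumes the RHEL-style path layout used in this article and assumes the dump came from the 2.6.18-307.el5 kernel that appears in the earlier listings; the function names are hypothetical.

```python
import re


def release_from_debug_path(path):
    """Extract <release> from a path ending in /lib/modules/<release>/vmlinux."""
    match = re.search(r"/lib/modules/([^/]+)/vmlinux$", path)
    return match.group(1) if match else None


def releases_match(debug_kernel_path, dump_release):
    """True only if the path names a release and it equals the dump's release."""
    release = release_from_debug_path(debug_kernel_path)
    return release is not None and release == dump_release


if __name__ == "__main__":
    path = "/usr/lib/debug/lib/modules/2.6.18-305.el5/vmlinux"
    # Assumed release of the crashed kernel, taken from the earlier listings.
    dump_release = "2.6.18-307.el5"
    print(release_from_debug_path(path), "vs", dump_release,
          "->", "match" if releases_match(path, dump_release) else "MISMATCH")
```

A string comparison is deliberately strict here, because even a neighboring build such as 305 versus 307 has different symbol addresses.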
Listing 14. Incomplete core file

    [root@bondlp1 2012-02-02-01:37]# crash /usr/lib/debug/lib/modules/2.6.18-305.el5/vmlinux vmcore
    crash 5.1.8-1.el5
    Copyright (C) 2002-2011 Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006 IBM Corporation
    Copyright (C) 1999-2006 Hewlett-Packard Co
    Copyright (C) 2005, 2006 Fujitsu Limited
    Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
    Copyright (C) 2005 NEC Corporation
    Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions. Enter "help copying" to see the conditions.
    This program has absolutely no warranty. Enter "help warranty" for details.
    WARNING: vmcore: may be truncated or incomplete
             PT_LOAD p_offset: 1610681124
                     p_filesz: 234881024
               bytes required: 1845562148
                dumpfile size: 1638400000

This message means the vmcore file you are using is incomplete. There are several possible causes, such as insufficient disk space or a network interruption during a network dump. In this situation a complete vmcore has to be dumped again before it can be analyzed.
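The numbers in the warning are consistent with a simple check: the segment's file offset plus its size gives the number of bytes the file must contain, and the dump file is shorter than that. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def truncation_report(p_offset, p_filesz, dumpfile_size):
    """Return (bytes_required, bytes_missing) for one PT_LOAD segment."""
    required = p_offset + p_filesz
    missing = max(0, required - dumpfile_size)
    return required, missing


if __name__ == "__main__":
    # Values copied from the warning in the listing above.
    required, missing = truncation_report(1610681124, 234881024, 1638400000)
    print(f"bytes required: {required}")
    print(f"bytes missing:  {missing}")
```

With the values from the listing, the segment needs 1845562148 bytes, so the 1638400000-byte file is 207162148 bytes short.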
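For kernel developers, crash has become an indispensable tool. The kernel is undeniably complex, but with kdump and crash working closely together, many problems become tractable. This article has only introduced the basics of crash; further techniques are left for readers to explore and accumulate in practice.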