How to analyze a vmcore with crash: basic approach, case 1

View the kernel log with dmesg

[2493382.671020] systemd-shutdown[1]: Sending SIGKILL to PID 28975 (docker-containe).
[2493382.671078] systemd-shutdown[1]: Sending SIGKILL to PID 29015 (systemd).
[2493420.208723] EXT4-fs (nvme0n1p1): sb orphan head is 140906170
[2493420.209198] sb_info orphan list:
[2493420.209663]   inode nvme0n1p1:140906170 at ffff88490edabfb8: mode 100666, nlink 0, next 149423507
[2493420.210129]   inode nvme0n1p1:149423507 at ffff8801b99391a8: mode 100666, nlink 0, next 17567381
[2493420.210583]   inode nvme0n1p1:17567381 at ffff8806d4a26998: mode 100744, nlink 0, next 17570510
[2493420.211050]   inode nvme0n1p1:17570510 at ffff886387f82ef8: mode 100644, nlink 0, next 17570503
[2493420.211508]   inode nvme0n1p1:17570503 at ffff886a1f15bfb8: mode 100644, nlink 0, next 241700498
[2493420.211966]   inode nvme0n1p1:241700498 at ffff8877481800e8: mode 100644, nlink 0, next 243138756
[2493420.212431]   inode nvme0n1p1:243138756 at ffff88761ad10518: mode 100644, nlink 0, next 241565954
[2493420.212900]   inode nvme0n1p1:241565954 at ffff8870d64bbfb8: mode 100755, nlink 0, next 241566333
[2493420.213366]   inode nvme0n1p1:241566333 at ffff88721ae74c48: mode 100644, nlink 0, next 241050093
[2493420.213833]   inode nvme0n1p1:241050093 at ffff887704958948: mode 100755, nlink 0, next 241567324
[2493420.214545] ------------[ cut here ]------------
[2493420.219336] kernel BUG at fs/ext4/super.c:879!  <<<====== source file and line where the BUG fired
[2493420.223948] invalid opcode: 0000 [#1] SMP
[2493420.228133] Modules linked in: kpatch_D751550(OE) kpatch_D631237(OE) unix_diag(E) af_packet_diag(E) netlink_diag(E) dccp_diag(E) dccp(E) tcp_diag(E) udp_diag(E) inet_diag(E) [last unloaded: aisqos_hotfixes]
[2493420.246846] CPU: 58 PID: 1 Comm: systemd-shutdow Tainted: G        W  OE K 4.9.79-009.ali3000.alios7.x86_64 #1
[2493420.257009] Hardware name: Inventec     AliServer Thor01-2U             /TB800G4-G1      , BIOS A1.20 03/06/2018
[2493420.267339] task: ffff887e45918000 task.stack: ffffc90000014000
[2493420.273425] RIP: 0010:[<ffffffffa031a8df>]  [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]  <<<======= faulting instruction, given as function+offset/size
[2493420.282593] RSP: 0018:ffffc90000017de8  EFLAGS: 00010206
[2493420.288079] RAX: ffff88490edabf50 RBX: ffff887e43299000 RCX: 00000001949b336d
[2493420.295384] RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000206
[2493420.302682] RBP: ffffc90000017e18 R08: 00000000000081a4 R09: 0000000000000000
[2493420.309988] R10: 0000000000000cb8 R11: 0000000000001e92 R12: ffff887e43299278
[2493420.317293] R13: ffff887e43298800 R14: ffff887e43299278 R15: ffffffffa034ff88
[2493420.324598] FS:  00007f3241ccf840(0000) GS:ffff887e78480000(0000) knlGS:0000000000000000
[2493420.332850] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2493420.338767] CR2: 00007f5e1372fbd0 CR3: 00000004daa52000 CR4: 00000000007606f0
[2493420.346065] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2493420.353361] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[2493420.360660] PKRU: 55555554
[2493420.363536] Stack:
[2493420.365721]  9cbae75a00000000 ffff887e43298800 ffffffffa034a5e0 ffff887e3818c7b8
[2493420.373365]  0000000000000000 ffff887e45918bb0 ffffc90000017e38 ffffffff81244aaf
[2493420.380991]  0000000000000083 ffff887e357b8680 ffffc90000017e58 ffffffff81244e37
[2493420.388617] Call Trace:
[2493420.391239]  [<ffffffff81244aaf>] generic_shutdown_super+0x6f/0x100
[2493420.397676]  [<ffffffff81244e37>] kill_block_super+0x27/0x70
[2493420.403508]  [<ffffffff81244f73>] deactivate_locked_super+0x43/0x70
[2493420.409945]  [<ffffffff8124547a>] deactivate_super+0x5a/0x60
[2493420.415770]  [<ffffffff81264b2f>] cleanup_mnt+0x3f/0x90
[2493420.421169]  [<ffffffff81264bc2>] __cleanup_mnt+0x12/0x20
[2493420.426733]  [<ffffffff810a7b50>] task_work_run+0x80/0xa0
[2493420.432306]  [<ffffffff810032ba>] exit_to_usermode_loop+0xaa/0xb0
[2493420.438572]  [<ffffffff81003baa>] syscall_return_slowpath+0xaa/0xb0
[2493420.445011]  [<ffffffff8171a783>] entry_SYSCALL_64_fastpath+0xc3/0xc5
[2493420.451623] Code: 60 04 00 00 48 8b 80 e0 00 00 <0f> 0b 49 c7 c7 88 ff 34 a0 49 8b
[2493420.459829] RIP  [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
[2493420.466633]  RSP <ffffc90000017de8>
crash>

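From the dmesg log, two lines tell us where the BUG was triggered:

1. [2493420.219336] kernel BUG at fs/ext4/super.c:879!

2. [2493420.273425] RIP: 0010:[<ffffffffa031a8df>]  [<ffffffffa031a8df>] ext4_put_super+0x36f/0x3c0 [ext4]
Here 0x36f is the byte offset of the faulting instruction from the entry of ext4_put_super, and 0x3c0 is the total size of that function in bytes (it is not a base address).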
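
To make the function+offset/size notation concrete, here is a minimal Python sketch that splits the symbol string from the RIP line and derives the start address of the function. The helper name parse_rip_symbol is made up for illustration, and the regular expression assumes the exact oops format shown above.

```python
import re

def parse_rip_symbol(text):
    """Split 'func+0xOFF/0xSIZE' into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        raise ValueError(f"unexpected symbol format: {text!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)

# Values copied from the RIP line of the oops.
rip = 0xffffffffa031a8df
name, offset, size = parse_rip_symbol("ext4_put_super+0x36f/0x3c0")

# The function starts 'offset' bytes before the faulting instruction.
func_start = rip - offset
print(name, offset, size, hex(func_start))
```

The computed start address, 0xffffffffa031a570, is the same address that dis -l ext4_put_super prints for the first instruction of the function later in this walkthrough.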

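Use the second line to find the exact instruction that crashed. The offset 0x36f is 879 in decimal, which is the form bt prints later (ext4_put_super+879). That this equals the source line number 879 is a coincidence: the offset counts bytes, not lines.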

(gdb) p 0x36f
$11 = 879

Disassemble the function and locate that offset:

crash> dis -l ext4_put_super

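Viewing source code in crash

crash can list source code itself, provided the module's debugging symbols have been loaded first. For example:

Load the ext4 module: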

crash> mod -s ext4
crash> mod  <<---- list all modules

Line 879:

crash> l *ext4_put_super+0x36f
0xffffffffa031a8df is in ext4_put_super (fs/ext4/super.c:879).
874              * isn't empty.  The on-disk one can be non-empty if we've
875              * detected an error and taken the fs readonly, but the
876              * in-memory list had better be clean by this point. */
877             if (!list_empty(&sbi->s_orphan))
878                     dump_orphan_list(sb, sbi);
879             J_ASSERT(list_empty(&sbi->s_orphan));
880
881             sync_blockdev(sb->s_bdev);
882             invalidate_bdev(sb->s_bdev);
883             if (sbi->journal_bdev && sbi->journal_bdev != sb->s_bdev) {
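
J_ASSERT is the jbd2 assertion macro; when its condition is false it raises a kernel BUG, and on x86 a BUG is implemented with the ud2 instruction (bytes 0f 0b). That is why the log reports "invalid opcode". In the Code: line of an oops, the byte in angle brackets is the one at RIP. The following Python sketch (the helper name bytes_at_rip is hypothetical) checks this against the Code: line from the log above.

```python
def bytes_at_rip(code_line, count=2):
    """Return `count` opcode bytes starting at the <xx> marker of a Code: line."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marked = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in marked)
    raise ValueError("no <xx> marker found")

# Byte string copied from the 'Code:' line of the oops.
code = "60 04 00 00 48 8b 80 e0 00 00 <0f> 0b 49 c7 c7 88 ff 34 a0 49 8b"

UD2 = bytes([0x0F, 0x0B])  # x86 encoding of the ud2 instruction
is_ud2 = bytes_at_rip(code) == UD2
print(is_ud2)
```

So the CPU trapped on a deliberately planted ud2, which means the crash is an assertion failure rather than a wild pointer dereference.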

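Only after locating the exact source line can we go further and work out why the kernel crashed. For example: what were the values of the function's arguments (possibly a struct) at that moment?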

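Print the stack with bt

The bt output contains [exception RIP: ext4_put_super+879], which again places the fault inside ext4_put_super. Note that +879 is the decimal byte offset (0x36f) into the function, not a source line number, even though both happen to be 879 in this case.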

crash> bt
PID: 1      TASK: ffff887e45918000  CPU: 58  COMMAND: "systemd-shutdow"
 #0 [ffffc90000017a58] machine_kexec at ffffffff810603e8
 #1 [ffffc90000017ab8] __crash_kexec at ffffffff811211cd
 #2 [ffffc90000017b80] __crash_kexec at ffffffff811212a5
 #3 [ffffc90000017b98] crash_kexec at ffffffff811212eb
 #4 [ffffc90000017bb8] oops_end at ffffffff81030905
 #5 [ffffc90000017be0] die at ffffffff81030ddb
 #6 [ffffc90000017c10] do_trap at ffffffff8102df02
 #7 [ffffc90000017c60] do_error_trap at ffffffff8102e2d9
 #8 [ffffc90000017d20] do_invalid_op at ffffffff8102e830
 #9 [ffffc90000017d30] invalid_op at ffffffff8171b63e
    [exception RIP: ext4_put_super+879]
    RIP: ffffffffa031a8df  RSP: ffffc90000017de8  RFLAGS: 00010206
    RAX: ffff88490edabf50  RBX: ffff887e43299000  RCX: 00000001949b336d
    RDX: 0000000000000000  RSI: 0000000000000206  RDI: 0000000000000206
    RBP: ffffc90000017e18   R8: 00000000000081a4   R9: 0000000000000000
    R10: 0000000000000cb8  R11: 0000000000001e92  R12: ffff887e43299278
    R13: ffff887e43298800  R14: ffff887e43299278  R15: ffffffffa034ff88
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
#12 [ffffc90000017e40] kill_block_super at ffffffff81244e37
#13 [ffffc90000017e60] deactivate_locked_super at ffffffff81244f73
#14 [ffffc90000017e80] deactivate_super at ffffffff8124547a
#15 [ffffc90000017e98] cleanup_mnt at ffffffff81264b2f
#16 [ffffc90000017eb0] __cleanup_mnt at ffffffff81264bc2
#17 [ffffc90000017ec0] task_work_run at ffffffff810a7b50
#18 [ffffc90000017f00] exit_to_usermode_loop at ffffffff810032ba
#19 [ffffc90000017f30] syscall_return_slowpath at ffffffff81003baa
#20 [ffffc90000017f50] entry_SYSCALL_64_fastpath at ffffffff8171a783
    RIP: 00007f3241195c47  RSP: 00007fffb3db5438  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 0000560b87fbd920  RCX: 00007f3241195c47
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000560b87fbdd10
    RBP: 0000560b87fbda00   R8: 0000000000000000   R9: 00007f32410e416d
    R10: 0000000000000021  R11: 0000000000000246  R12: 0000560b87fbdd10
    R13: 00007fffb3db5538  R14: 00007fffb3db5523  R15: 0000000000000000
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
crash>

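Disassemble the caller and the callee

Once the faulting source line is known, the next step is to examine the arguments and structs that were passed in.

First look at the prototype of ext4_put_super: static void ext4_put_super(struct super_block *sb). It takes a single argument, a pointer to struct super_block. So what is the value of the sb pointer? It must have been passed in by the calling function, generic_shutdown_super.

The key question is therefore: when generic_shutdown_super calls ext4_put_super (the call that returns to ffffffff81244aaf), what pointer value does it pass?

First, disassemble generic_shutdown_super and find the address ffffffff81244aaf: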

crash> dis -l generic_shutdown_super 
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 436
0xffffffff81244aa0 <generic_shutdown_super+96>: mov    0x30(%r12),%rax
0xffffffff81244aa5 <generic_shutdown_super+101>:        test   %rax,%rax
0xffffffff81244aa8 <generic_shutdown_super+104>:        je     0xffffffff81244aaf <generic_shutdown_super+111>
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 437
0xffffffff81244aaa <generic_shutdown_super+106>:        mov    %rbx,%rdi   <=== rbx is copied into rdi (the first argument), so both hold the same value
0xffffffff81244aad <generic_shutdown_super+109>:        callq  *%rax     <=== the call into the next function happens here
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/include/linux/compiler.h: 243
0xffffffff81244aaf <generic_shutdown_super+111>:        mov    0x608(%rbx),%rax
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/super.c: 439
0xffffffff81244ab6 <generic_shutdown_super+118>:        lea    0x608(%rbx),%rdx
0xffffffff81244abd <generic_shutdown_super+125>:        cmp    %rax,%rdx
0xffffffff81244ac0 <generic_shutdown_super+128>:        jne    0xffffffff81244b1f <generic_shutdown_super+223>
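
Two facts are being used here: on x86-64 Linux the first integer or pointer argument travels in rdi, and the address that bt shows for a frame is the return address, that is, the instruction right after the call. A small Python sketch of both checks follows; the register table is the x86-64 System V calling convention, the addresses are copied from the listings above, and the helper name arg_register is made up.

```python
# Integer/pointer argument registers of the x86-64 System V ABI, in order.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

def arg_register(index):
    """Register that carries the index-th (0-based) integer/pointer argument."""
    if not 0 <= index < len(SYSV_INT_ARG_REGS):
        raise ValueError("argument is passed on the stack, not in a register")
    return SYSV_INT_ARG_REGS[index]

call_addr = 0xffffffff81244aad          # 'callq *%rax' in the disassembly
next_insn = 0xffffffff81244aaf          # instruction following the call
return_addr_in_bt = 0xffffffff81244aaf  # frame #11 in the bt output

print(arg_register(0))                  # register holding 'sb'
print(return_addr_in_bt == next_insn, next_insn - call_addr)
```

Since rdi is loaded from rbx immediately before the call, recovering the caller's rbx is enough to recover sb.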

Next, disassemble ext4_put_super. Its prologue pushes several registers onto the stack:

crash> dis -l ext4_put_super
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 824
0xffffffffa031a570 <ext4_put_super>:    nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa031a575 <ext4_put_super+5>:  push   %rbp
0xffffffffa031a576 <ext4_put_super+6>:  mov    %rsp,%rbp
0xffffffffa031a579 <ext4_put_super+9>:  push   %r15   <=== 1st register pushed
0xffffffffa031a57b <ext4_put_super+11>: push   %r14   <=== 2nd register pushed
0xffffffffa031a57d <ext4_put_super+13>: push   %r13   <=== 3rd register pushed
0xffffffffa031a57f <ext4_put_super+15>: push   %r12   <=== 4th register pushed
0xffffffffa031a581 <ext4_put_super+17>: mov    %rdi,%r13
0xffffffffa031a584 <ext4_put_super+20>: push   %rbx   <=== 5th register pushed (rbx still holds the caller's value, which the caller had just copied into rdi, so this saved slot contains the first argument of ext4_put_super)
0xffffffffa031a585 <ext4_put_super+21>: sub    $0x8,%rsp
0xffffffffa031a589 <ext4_put_super+25>: mov    0x460(%rdi),%rbx
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 826
0xffffffffa031a590 <ext4_put_super+32>: mov    0xe0(%rbx),%r14
/usr/src/debug/kernel-4.9.79-009.ali3000/linux-4.9.79-009.ali3000.alios7.x86_64/fs/ext4/super.c: 830
0xffffffffa031a597 <ext4_put_super+39>: callq  0xffffffffa03133f0 <ext4_unregister_li_request>

Then dump the raw stack words of these frames with bt -f and match them against the pushes above:

crash> bt -f
#10 [ffffc90000017de0] ext4_put_super at ffffffffa031a91c [ext4]
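    ffffc90000017de8: 9cbae75a00000000(slot reserved by sub $0x8,%rsp) ffff887e43298800(5th register pushed: rbx)
    ffffc90000017df8: ffffffffa034a5e0(4th register pushed: r12) ffff887e3818c7b8(3rd register pushed: r13)
    ffffc90000017e08: 0000000000000000(2nd register pushed: r14) ffff887e45918bb0(1st register pushed: r15)
    ffffc90000017e18: ffffc90000017e38 ffffffff81244aaf(saved rbp and return address, not pushed callee-saved registers)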
#11 [ffffc90000017e20] generic_shutdown_super at ffffffff81244aaf
    ffffc90000017e28: 0000000000000083 ffff887e357b8680
    ffffc90000017e38: ffffc90000017e58 ffffffff81244e37
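
The matching of stack slots to pushed registers can be written down mechanically. The sketch below is only an illustration under the assumption that the prologue is exactly the one disassembled above (push %rbp; mov %rsp,%rbp; then five pushes); the helper name saved_registers is hypothetical and the stack words are copied from the bt -f output.

```python
def saved_registers(rbp, push_order, stack):
    """Map each register pushed after 'mov %rsp,%rbp' to its saved value.

    On x86-64 the n-th push (1-based) after the frame is set up
    lands at address rbp - 8*n.
    """
    return {reg: stack[rbp - 8 * n] for n, reg in enumerate(push_order, start=1)}

# Stack words of frame #10, keyed by address.
stack = {
    0xffffc90000017de8: 0x9cbae75a00000000,
    0xffffc90000017df0: 0xffff887e43298800,
    0xffffc90000017df8: 0xffffffffa034a5e0,
    0xffffc90000017e00: 0xffff887e3818c7b8,
    0xffffc90000017e08: 0x0000000000000000,
    0xffffc90000017e10: 0xffff887e45918bb0,
    0xffffc90000017e18: 0xffffc90000017e38,  # saved rbp of the caller
    0xffffc90000017e20: 0xffffffff81244aaf,  # return address
}

rbp = 0xffffc90000017e18  # RBP from the exception register dump
saved = saved_registers(rbp, ["r15", "r14", "r13", "r12", "rbx"], stack)

sb = saved["rbx"]  # caller's rbx, which was copied into rdi before the call
print(hex(sb))
```

The result, ffff887e43298800, agrees with R13 in the exception register dump, which ext4_put_super loaded from rdi at +17.
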

Finally, interpret that address as a struct super_block:

crash> struct super_block ffff887e43298800
struct super_block {
  s_list = {
    next = 0xffffffff81cb3db0 <super_blocks>,    <======= points at the global super_blocks list, which confirms that ffff887e43298800 really is a struct super_block
    prev = 0xffff887e43968800
  },
  s_dev = 271581185,
  s_blocksize_bits = 12 '\f',
  s_blocksize = 4096,
  s_maxbytes = 17592186040320,
  s_type = 0xffffffffa03589c0 <ext4_fs_type>,
  s_op = 0xffffffffa034a5e0 <ext4_sops>,
  dq_op = 0xffffffffa034a720 <ext4_quota_operations>,
  s_qcop = 0xffffffff81843f60 <dquot_quotactl_sysfile_ops>,
  s_export_op = 0xffffffffa034a580 <ext4_export_ops>,
  s_flags = 805371904,
  s_iflags = 1,
  s_magic = 61267,
  s_root = 0x0,
  s_umount = {
    count = {
      counter = -4294967295
    },
    wait_list = {
      next = 0xffff887e43298878,
      prev = 0xffff887e43298878
    },
    wait_lock = {
      raw_lock = {
        val = {
          counter = 0
        }
      }
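
Other fields in this dump can serve as additional sanity checks. The Python sketch below decodes s_magic and s_dev; it assumes the ext4 magic number 0xEF53 and the kernel-internal dev_t layout of major << 20 | minor, and the helper name split_dev is made up.

```python
EXT4_SUPER_MAGIC = 0xEF53  # ext2/3/4 superblock magic
MINORBITS = 20             # kernel-internal dev_t: major << 20 | minor

def split_dev(dev):
    """Split a kernel-internal dev_t into (major, minor)."""
    return dev >> MINORBITS, dev & ((1 << MINORBITS) - 1)

# Values copied from the struct super_block dump.
s_magic = 61267
s_dev = 271581185

magic_ok = s_magic == EXT4_SUPER_MAGIC
major, minor = split_dev(s_dev)
print(magic_ok, major, minor)
```

Major 259 is the extended block device major that NVMe partitions normally use, which fits the nvme0n1p1 device named in the kernel log, and s_op pointing at ext4_sops tells the same story.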

References

https://blog.csdn.net/u013982161/article/details/51347944
