From: liuhangtiant. Link: https://blog.csdn.net/liuhangtiant/article/details/109555795
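Overview
The kprobe mechanism dynamically adds probe points to the kernel, which covers a range of debugging needs. This article traces the execution path of a kprobe: how the CPU traps into the kprobe, and how it returns to the original path and carries on.
Example
Let's get a feel for kprobe through an example first. Linux ships a ready-made one: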
samples/kprobes/kprobe_example.c
Because my test environment is qemu + arm64, I removed the code for other architectures and made a few small changes:
    /*
     * NOTE: This example is works on x86 and powerpc.
     * Here's a sample kernel module showing the use of kprobes to dump a
     * stack trace and selected registers when _do_fork() is called.
     *
     * For more information on theory of operation of kprobes, see
     * Documentation/kprobes.txt
     *
     * You will see the trace data in /var/log/messages and on the console
     * whenever _do_fork() is invoked to create a new process.
     */

    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/kprobes.h>

    #define MAX_SYMBOL_LEN 64
    static char symbol[MAX_SYMBOL_LEN] = "_do_fork";
    module_param_string(symbol, symbol, sizeof(symbol), 0644);

    /* For each probe you need to allocate a kprobe structure */
    static struct kprobe kp = {
        .symbol_name = symbol,
    };

    /* kprobe pre_handler: called just before the probed instruction is executed */
    static int handler_pre(struct kprobe *p, struct pt_regs *regs)
    {
        pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx,"
                " pstate = 0x%lx\n",
                p->symbol_name, p->addr, (long)regs->pc, (long)regs->pstate);
        dump_stack();
        /* A dump_stack() here will give a stack backtrace */
        return 0;
    }

    /* kprobe post_handler: called after the probed instruction is executed */
    static void handler_post(struct kprobe *p, struct pt_regs *regs,
                             unsigned long flags)
    {
        pr_info("<%s> post_handler: p->addr = 0x%p, pstate = 0x%lx\n",
                p->symbol_name, p->addr, (long)regs->pstate);
        dump_stack();
    }

    /*
     * fault_handler: this is called if an exception is generated for any
     * instruction within the pre- or post-handler, or when Kprobes
     * single-steps the probed instruction.
     */
    static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
    {
        pr_info("fault_handler: p->addr = 0x%p, trap #%dn", p->addr, trapnr);
        /* Return 0 because we don't handle the fault. */
        return 0;
    }

    static int __init kprobe_init(void)
    {
        int ret;

        kp.pre_handler = handler_pre;
        kp.post_handler = handler_post;
        kp.fault_handler = handler_fault;

        ret = register_kprobe(&kp);
        if (ret < 0) {
            pr_err("register_kprobe failed, returned %d\n", ret);
            return ret;
        }
        pr_info("Planted kprobe at %p\n", kp.addr);
        return 0;
    }

    static void __exit kprobe_exit(void)
    {
        unregister_kprobe(&kp);
        pr_info("kprobe at %p unregistered\n", kp.addr);
    }

    module_init(kprobe_init)
    module_exit(kprobe_exit)
    MODULE_LICENSE("GPL");
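The code is simple. By default the kprobe installs three hooks on the instruction at _do_fork: one that runs before the instruction executes, one that runs after it, and one that runs when an exception occurs.
After inserting the module, run any command and you will see output like this: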
    [ 19.882832] kprobe_example: loading out-of-tree module taints kernel.
    [ 19.900442] Planted kprobe at (____ptrval____)
    [ 19.908571] <_do_fork> pre_handler: p->addr = 0x(____ptrval____), pc = 0xffff0000080d2c98, pstate = 0x80000005
    [ 19.913657] CPU: 0 PID: 1358 Comm: udevd Tainted: G O 4.18.0 #7
    [ 19.916239] Hardware name: linux,dummy-virt (DT)
    [ 19.918400] Call trace:
    [ 19.919373] dump_backtrace+0x0/0x180
    [ 19.920681] show_stack+0x14/0x20
    [ 19.921817] dump_stack+0x90/0xb4
    [ 19.923678] handler_pre+0x24/0x68 [kprobe_example]
    [ 19.926357] kprobe_breakpoint_handler+0xbc/0x160
    [ 19.926627] brk_handler+0x70/0x88
    [ 19.926802] do_debug_exception+0x94/0x160
    [ 19.927102] el1_dbg+0x18/0x78
    [ 19.927299] _do_fork+0x0/0x358
    [ 19.927465] el0_svc_naked+0x30/0x34
    [ 19.928973] <_do_fork> post_handler: p->addr = 0x(____ptrval____), pstate = 0x80000005
    [ 19.929361] CPU: 0 PID: 1358 Comm: udevd Tainted: G O 4.18.0 #7
    [ 19.929693] Hardware name: linux,dummy-virt (DT)
    [ 19.929962] Call trace:
    [ 19.930102] dump_backtrace+0x0/0x180
    [ 19.930289] show_stack+0x14/0x20
    [ 19.930461] dump_stack+0x90/0xb4
    [ 19.934684] handler_post+0x24/0x30 [kprobe_example]
    [ 19.934968] post_kprobe_handler+0x54/0x98
    [ 19.935234] kprobe_single_step_handler+0x74/0xa8
    [ 19.935389] single_step_handler+0x3c/0xb0
    [ 19.935516] do_debug_exception+0x94/0x160
    [ 19.935642] el1_dbg+0x18/0x78
    [ 19.935965] 0xffff000000ac8004
    [ 19.936067] el0_svc_naked+0x30/0x34
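Both the pre and post hooks ran, which is very helpful for inspecting kernel call stacks.
Digging deeper
Can a kprobe only be placed by symbol_name?
Clearly not. struct kprobe has an addr member, so a kprobe can obviously be placed directly on an address.
Replace this code: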
    #define MAX_SYMBOL_LEN 64
    static char symbol[MAX_SYMBOL_LEN] = "_do_fork";
    module_param_string(symbol, symbol, sizeof(symbol), 0644);

    /* For each probe you need to allocate a kprobe structure */
    static struct kprobe kp = {
        .symbol_name = symbol,
    };
with:
    /* For each probe you need to allocate a kprobe structure */
    static struct kprobe kp = {
        .addr = (kprobe_opcode_t *)0xffff0000080d2c98,
    };
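The effect is the same. Note that 0xffff0000080d2c98 is the pc value printed by pre_handler in the log above, so the address is specific to this kernel build.
How does kprobe add a probe dynamically?
Answering this means reading the code. Fortunately the code is quite simple: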
    register_kprobe
    |------arm_kprobe
    |      |------__arm_kprobe
    |      |      |------arch_arm_kprobe

    /* arm kprobe: install breakpoint in text */
    void __kprobes arch_arm_kprobe(struct kprobe *p)
    {
        patch_text(p->addr, BRK64_OPCODE_KPROBES);
    }
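As the comment makes clear, the instruction at addr is replaced with a brk instruction (this is the ARM64 implementation, of course). Once the CPU executes addr, it raises an exception and traps into the hooks registered with the kprobe.
Why does the post hook use single step?
The call trace above shows that the post hook is actually reached through a single-step exception. Why is single-stepping needed? It is easy to explain. First, let's walk through the kprobe flow: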
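Patch the instruction at addr to a brk instruction
The CPU reaches addr and traps into the pre hook
After pre finishes, the original instruction at addr has to be made available again (on arm64 the kernel appears to step a saved copy of it from a separate slot, which would explain why the post_handler trace above shows 0xffff000000ac8004 rather than an address inside _do_fork)
The CPU goes on to execute the original instruction
The CPU executes post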
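So how does the CPU get to the post hook? Simple: enable single-stepping. Someone will surely suggest replacing the instruction at addr+4 with brk as well, but that does not work. On architectures with mixed 16-bit and 32-bit encodings (Thumb-2 on 32-bit ARM, for example) the next instruction is not necessarily 4 bytes away. And even on ARM64, where every A64 instruction is a fixed 32 bits, the next instruction the CPU executes is not necessarily at addr+4, for example when the instruction at addr is a branch.
When is the fault_handler hook used?
Reading the code shows that when a page fault occurs, the fault_handler hook of the currently running kprobe is called. The faulting code is therefore not necessarily the instruction at addr; it may also be an instruction inside pre or post. I injected a read of address 0 into pre: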
    static void * g_addr=0;

    static int handler_pre(struct kprobe *p, struct pt_regs *regs)
        __attribute__((optimize("O0")));

    static int handler_pre(struct kprobe *p, struct pt_regs *regs)
    {
        pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx,"
                " pstate = 0x%lx\n",
                p->symbol_name, p->addr, (long)regs->pc, (long)regs->pstate);
        printk("%d\n", *(char *)g_addr);
        /* A dump_stack() here will give a stack backtrace */
        return 0;
    }
Testing confirms that the fault_handler hook is indeed called:
    [ 17.272594] kprobe_example: loading out-of-tree module taints kernel.
    [ 17.294266] Planted kprobe at (____ptrval____)
    #
    # ls
    [ 19.072586] <(null)> pre_handler: p->addr = 0x(____ptrval____), pc = 0xffff0000080d2c98, pstate = 0x80000005
    [ 19.073189] fault_handler: p->addr = 0x(____ptrval____), trap #-1778384890n
    [ 19.073568] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
    [ 19.074271] Mem abort info:
    [ 19.074393] ESR = 0x96000006
    [ 19.074641] Exception class = DABT (current EL), IL = 32 bits
    [ 19.074887] SET = 0, FnV = 0
    [ 19.075014] EA = 0, S1PTW = 0
    [ 19.075174] Data abort info:
    [ 19.075324] ISV = 0, ISS = 0x00000006
    [ 19.075455] CM = 0, WnR = 0
    [ 19.075774] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
    [ 19.076005] [0000000000000000] pgd=00000000485c6003, pud=00000000bb2f4003, pmd=0000000000000000
    [ 19.076596] Internal error: Oops: 96000006 [#1] PREEMPT SMP
    [ 19.076924] Modules linked in: kprobe_example(O)
    [ 19.077693] CPU: 0 PID: 1387 Comm: sh Tainted: G O 4.18.0 #7
    [ 19.077927] Hardware name: linux,dummy-virt (DT)
    [ 19.078298] pstate: 400003c5 (nZcv DAIF -PAN -UAO)
    [ 19.078962] pc : handler_pre+0x50/0x70 [kprobe_example]
    [ 19.079149] lr : handler_pre+0x44/0x70 [kprobe_example]
    [ 19.079359] sp : ffff00000ac63c00
    [ 19.079565] x29: ffff00000ac63c00 x28: ffff80007a3c9a80
    [ 19.079821] x27: ffff000008ac1000 x26: 00000000000000dc
    [ 19.080047] x25: ffff80007dfb7788 x24: 0000000000000000
    [ 19.080363] x23: ffff0000080d2c98 x22: ffff00000ac63d70
    [ 19.080621] x21: ffff000000ac2000 x20: 0000800074f02000
    [ 19.080863] x19: ffff0000090b5788 x18: ffffffffffffffff
    [ 19.081197] x17: 0000000000000000 x16: 0000000000000000
    [ 19.081501] x15: ffff0000090d96c8 x14: 3030303030666666
    [ 19.081720] x13: 667830203d206370 x12: ffff0000090d9940
    [ 19.081933] x11: ffff0000085dd8d8 x10: 5f287830203d2072
    [ 19.082189] x9 : 0000000000000017 x8 : 2065746174737020
    [ 19.082455] x7 : 2c38396332643038 x6 : ffff80007dfb8240
    [ 19.082660] x5 : ffff80007dfb8240 x4 : 0000000000000000
    [ 19.082871] x3 : ffff80007dfbf030 x2 : 793b575e486def00
    [ 19.083068] x1 : 0000000000000000 x0 : 0000000000000000
    [ 19.083390] Process sh (pid: 1387, stack limit = 0x(____ptrval____))
    [ 19.083783] Call trace:
    [ 19.084020] handler_pre+0x50/0x70 [kprobe_example]
    [ 19.084470] kprobe_breakpoint_handler+0xbc/0x160
    [ 19.084693] brk_handler+0x70/0x88
    [ 19.084839] do_debug_exception+0x94/0x160
    [ 19.085132] el1_dbg+0x18/0x78
    [ 19.085259] _do_fork+0x0/0x358
    [ 19.085443] el0_svc_naked+0x30/0x34
    [ 19.085939] Code: 95d9a53f b0000000 9101c000 f9400000 (39400000)
    [ 19.086713] ---[ end trace 3bb11c402bc37363 ]---
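But because fault_handler did nothing about the exception, the kernel still crashed.
fault_handler can be used either to report an error or to correct it. Reporting means giving the user custom error information to help analyse the problem; correcting means repairing the error. So how would we correct this particular error?
Allocate memory for g_addr inside fault_handler? That clearly will not work: g_addr has already been loaded into a register, so changing it now is too late. The only option is to change the register value, and since the registers have already been saved on the stack by this point, we have to modify the saved copy on the stack.
Now let's fix up this crash.
According to the crash information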
[ 19.084020] handler_pre+0x50/0x70 [kprobe_example]
the exception occurs at handler_pre+0x50. Disassembling the module shows that the instruction at this offset is:
50: 39400000 ldrb w0, [x0]
x0 holds 0, so this is a read from address 0. Clearly g_addr was loaded into x0, so it is enough to change x0.
Fixup implementation
Modify the fault_handler function:
    static int g_addr1=0x5a;

    static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
    {
        pr_info("fault_handler: p->addr = 0x%p, trap #%dn", p->addr, trapnr);
        /* Point the saved x0 at valid memory so the faulting load can succeed */
        regs->regs[0] = (unsigned long)&g_addr1;
        /* Return 1 to report that the fault has been handled */
        return 1;
    }
Verification
    [ 58.882059] <(null)> pre_handler: p->addr = 0x(____ptrval____), pc = 0xffff0000080d2c98, pstate = 0x80000005
    [ 58.882393] fault_handler: p->addr = 0x(____ptrval____), trap #-1778384890n
    [ 58.882411] 90
    [ 58.882658] <(null)> post_handler: p->addr = 0x(____ptrval____), pstate = 0x80000005
    [ 58.882960] CPU: 1 PID: 1388 Comm: sh Tainted: G O 4.18.0 #7
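After fault_handler ran, pre_handler printed 90, that is 0x5a, as the content at the address g_addr supposedly points to.
Done. We successfully made the kernel "access" address 0 and get back 0x5a.
Of course, it is all fake! The load really read g_addr1, because fault_handler repointed the saved x0 before the instruction ran again.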
(END)