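In the previous post of this series I introduced the basic concepts of processes and threads, along with some related data structures. Now let's look at how Linux manages processes.
The Linux kernel defines a list_head structure as follows: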
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};
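The fields next and prev are the forward and backward pointers of this generic doubly linked list. Note that a list_head pointer holds the address of another list_head field, not the address of the data structure that contains it (as shown in the figure). To get from a list_head back to its enclosing structure, the kernel subtracts the member's offset (the list_entry/container_of macros).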
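To make the "pointers point at the embedded node, not the enclosing struct" idea concrete, here is a minimal user-space sketch modeled on the kernel's include/linux/list.h. It is simplified. struct item and sum_items are hypothetical names for illustration. list_insert_between stands in for the kernel's internal helper. This list_del re-initializes the removed node rather than poisoning it as the kernel does.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space copy of the kernel's generic list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* An empty list is a head whose next and prev point back to itself. */
static void INIT_LIST_HEAD(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link node between two entries that are already adjacent. */
static void list_insert_between(struct list_head *node,
                                struct list_head *prev, struct list_head *next)
{
    next->prev = node;
    node->next = next;
    node->prev = prev;
    prev->next = node;
}

/* Append node just before head, i.e. at the tail of the circular list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    list_insert_between(node, head->prev, head);
}

/* Unlink node by joining its neighbours, then make it a lone node. */
static void list_del(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = node;
}

/* Hypothetical payload type: the list node is embedded, not pointed to. */
struct item {
    int value;
    struct list_head link;
};

/* Walk the embedded nodes and map each one back to its struct item. */
static int sum_items(struct list_head *head)
{
    int sum = 0;
    for (struct list_head *pos = head->next; pos != head; pos = pos->next)
        sum += list_entry(pos, struct item, link)->value;
    return sum;
}
```

Because the node is embedded, one struct item could carry several list_head members and sit on several lists at once. task_struct uses exactly this pattern.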
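The process descriptor (task_struct) introduced in the previous post also contains this structure, and it forms the process list. The process list is a circular doubly linked list that links together the descriptors of all processes. Every task_struct has a field tasks of type list_head, whose prev and next point to the tasks fields of the previous and next task_struct.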
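This list is circular and doubly linked. Initially it contains only init_task, the kernel's first process (process 0). init_task is set up "by hand" in statically allocated memory, whereas every other process is initialized in dynamically allocated memory. Each time a new process is created, the SET_LINKS macro adds its task_struct to this list. Note, however, that when a process creates a thread other than its main thread (a lightweight process), that thread is not added to the list. The for_each_process macro walks all processes starting from init_task. The definition below is the older 2.4 form of this macro, named for_each_task, which follows the next_task pointer; the 2.6 for_each_process performs the same walk over the tasks list. In both versions the loop body never runs for init_task itself.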
#define for_each_task(p) for (p = &init_task ; (p = p->next_task) != &init_task ; )
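The traversal and the "threads are not linked" rule can be sketched in user space as follows. This assumes a stripped-down task_struct with only pid, tgid and tasks. next_task and for_each_process are modeled on the 2.6 macros. link_task is a hypothetical stand-in for SET_LINKS that treats a task as a thread group leader when pid == tgid.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, stripped-down process descriptor. */
struct task_struct {
    int pid;
    int tgid;                 /* pid of the thread group leader */
    struct list_head tasks;   /* links into the global process list */
};

/* Process 0: statically allocated; its list initially points to itself. */
static struct task_struct init_task = {
    .pid = 0, .tgid = 0,
    .tasks = { .next = &init_task.tasks, .prev = &init_task.tasks },
};

/* Follow tasks.next and map it back to the next descriptor. */
#define next_task(p) container_of((p)->tasks.next, struct task_struct, tasks)

/* Visits every process except init_task itself. */
#define for_each_process(p) \
    for (p = &init_task; (p = next_task(p)) != &init_task; )

/* Hypothetical SET_LINKS stand-in: only thread group leaders are linked. */
static void link_task(struct task_struct *p)
{
    if (p->pid != p->tgid)        /* a non-leader thread stays off the list */
        return;
    struct list_head *head = &init_task.tasks;
    p->tasks.prev = head->prev;   /* append at the tail, before init_task */
    p->tasks.next = head;
    head->prev->next = &p->tasks;
    head->prev = &p->tasks;
}

/* Count the processes reachable from init_task (excluding init_task). */
static int count_processes(void)
{
    struct task_struct *p;
    int n = 0;
    for_each_process(p)
        n++;
    return n;
}
```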
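When the kernel looks for a new process to run on a CPU, it only needs to consider runnable processes, i.e. those in the TASK_RUNNING state. In early kernels, the runnable processes are organized into a circular doubly linked list called the run queue (runqueue).
For this purpose, task_struct defines two pointers: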
struct task_struct *next_run, *prev_run;
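The processes that are running or runnable, all of them in state TASK_RUNNING, form a circular doubly linked list: the run_queue, or ready queue. Its forward and backward pointers are next_run and prev_run, and both the head and the tail of the list are init_task (process 0).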
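Unlike list_head, next_run and prev_run point at whole task_struct objects, so no container_of step is needed. The sketch below is a user-space model with init_task as the sentinel head. The task layout and the rq_add, rq_del and nr_runnable helpers are hypothetical illustrations, not the kernel's own functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal descriptor with the old-style run-queue pointers. */
struct task_struct {
    int pid;
    int running;                             /* 1 if TASK_RUNNING */
    struct task_struct *next_run, *prev_run; /* point at whole descriptors */
};

/* init_task (process 0) is both head and tail of the circular run queue. */
static struct task_struct init_task = {
    .pid = 0, .running = 1,
    .next_run = &init_task, .prev_run = &init_task,
};

/* Hypothetical helper: append p just before init_task (at the tail). */
static void rq_add(struct task_struct *p)
{
    p->running = 1;
    p->next_run = &init_task;
    p->prev_run = init_task.prev_run;
    init_task.prev_run->next_run = p;
    init_task.prev_run = p;
}

/* Hypothetical helper: unlink p when it stops being runnable. */
static void rq_del(struct task_struct *p)
{
    p->prev_run->next_run = p->next_run;
    p->next_run->prev_run = p->prev_run;
    p->next_run = p->prev_run = NULL;
    p->running = 0;
}

/* Count runnable processes other than init_task. */
static int nr_runnable(void)
{
    int n = 0;
    for (struct task_struct *p = init_task.next_run; p != &init_task; p = p->next_run)
        n++;
    return n;
}
```

Finding the best process in this scheme means scanning the whole list, which motivates the per-priority lists described next.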
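However, to select the "best" runnable process in constant time, the kernel (2.6) divides the priorities of runnable processes into the range 0-139 and keeps 140 lists of runnable processes to organize the processes in the TASK_RUNNING state, one list per priority.
The Linux kernel defines a structure of type prio_array_t to manage these 140 lists. Every runnable process is on exactly one of them, linked through the run_list field of its process descriptor, which is also of type list_head. enqueue_task inserts a process descriptor into one of the runnable lists, and dequeue_task removes it. The prio_array_t structures themselves live in the arrays member of the runqueue structure, which holds two of them: one active and one expired.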
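The constant-time choice works because prio_array_t also keeps a bitmap with one bit per non-empty list, so picking the next process means finding the first set bit. The sketch below is a user-space model under stated assumptions. The field names nr_active, bitmap and queue follow the 2.6 structure. The bit search is a plain loop standing in for the kernel's optimized sched_find_first_bit. find_first_prio and pick_next are hypothetical helper names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_PRIO 140
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS ((MAX_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal descriptor: a priority and the run_list node. */
struct task_struct {
    int prio;                   /* 0 (highest) .. 139 (lowest) */
    struct list_head run_list;
};

/* Simplified prio_array: one list per priority plus a bitmap of non-empty lists. */
struct prio_array {
    unsigned int nr_active;
    unsigned long bitmap[BITMAP_LONGS];
    struct list_head queue[MAX_PRIO];
};

static void prio_array_init(struct prio_array *a)
{
    a->nr_active = 0;
    for (size_t i = 0; i < BITMAP_LONGS; i++)
        a->bitmap[i] = 0;
    for (int i = 0; i < MAX_PRIO; i++)
        a->queue[i].next = a->queue[i].prev = &a->queue[i];
}

/* Append p to the list for its priority and mark that list non-empty. */
static void enqueue_task(struct task_struct *p, struct prio_array *a)
{
    struct list_head *head = &a->queue[p->prio];
    p->run_list.next = head;
    p->run_list.prev = head->prev;
    head->prev->next = &p->run_list;
    head->prev = &p->run_list;
    a->bitmap[p->prio / BITS_PER_LONG] |= 1UL << (p->prio % BITS_PER_LONG);
    a->nr_active++;
}

/* Unlink p; clear the bit if its priority list became empty. */
static void dequeue_task(struct task_struct *p, struct prio_array *a)
{
    p->run_list.prev->next = p->run_list.next;
    p->run_list.next->prev = p->run_list.prev;
    a->nr_active--;
    struct list_head *head = &a->queue[p->prio];
    if (head->next == head)
        a->bitmap[p->prio / BITS_PER_LONG] &= ~(1UL << (p->prio % BITS_PER_LONG));
}

/* Lowest set bit = highest-priority non-empty list; -1 if all are empty. */
static int find_first_prio(const struct prio_array *a)
{
    for (size_t w = 0; w < BITMAP_LONGS; w++) {
        if (a->bitmap[w] == 0)
            continue;
        for (size_t b = 0; b < BITS_PER_LONG; b++)
            if (a->bitmap[w] & (1UL << b))
                return (int)(w * BITS_PER_LONG + b);
    }
    return -1;
}

/* Take the first task on the highest-priority non-empty list. */
static struct task_struct *pick_next(const struct prio_array *a)
{
    int idx = find_first_prio(a);
    if (idx < 0)
        return NULL;
    return container_of(a->queue[idx].next, struct task_struct, run_list);
}
```

The work done by pick_next depends only on the fixed number of priorities, not on how many processes are runnable.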
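Finding the process descriptor for a given pid by walking the list that links all processes would obviously be inefficient. For faster lookup, the Linux kernel uses four hash tables. There are four so that descriptors can be looked up by different kinds of pid: the process's own pid, the pid of its thread group leader, the pid of its process group leader, and the pid of its session leader. In each table the hash value is computed with the macro pid_hashfn(x). Since a process can be in all four hash tables at the same time, the process descriptor has a member pids of type struct pid (one element per table), through which the process is added to the hash tables. struct pid contains a pid_chain member of type hlist_node, used to resolve hash collisions, and a pid_list member of type list_head, which links together the processes that share the same pid value.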
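struct pid {
    int nr;                       // numeric value of the pid
    struct hlist_node pid_chain;  // collision chain within a hash bucket
    struct list_head pid_list;    // links processes sharing this pid value
};

struct task_struct {
    …
    struct pid pids[4];           // one entry per hash table
    …
};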
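The lookup path can be sketched in user space as below. Several details are assumptions for illustration. The table size is tiny, and pid_hashfn is a plain modulo (the kernel uses hash_long). find_pid mirrors the kernel's name but not its exact code. attach_pid follows the rule that only the first task with a given nr sits in the hash bucket, and later ones hang off its pid_list. task_of and find_task are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

#define PIDHASH_SIZE 16   /* hypothetical: far smaller than the kernel's table */
enum pid_type { PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, PIDTYPE_SID, PIDTYPE_MAX };

struct list_head { struct list_head *next, *prev; };
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct pid {
    int nr;
    struct hlist_node pid_chain;  /* collision chain in one bucket */
    struct list_head pid_list;    /* other tasks with the same nr */
};

struct task_struct { struct pid pids[PIDTYPE_MAX]; };

/* One hash table per pid type. */
static struct hlist_head pid_hash[PIDTYPE_MAX][PIDHASH_SIZE];

/* Hypothetical hash function (simple modulo). */
#define pid_hashfn(nr) ((unsigned)(nr) % PIDHASH_SIZE)

/* Walk the bucket's collision chain looking for nr. */
static struct pid *find_pid(enum pid_type type, int nr)
{
    for (struct hlist_node *n = pid_hash[type][pid_hashfn(nr)].first; n; n = n->next) {
        struct pid *pid = container_of(n, struct pid, pid_chain);
        if (pid->nr == nr)
            return pid;
    }
    return NULL;
}

static void attach_pid(struct task_struct *task, enum pid_type type, int nr)
{
    struct pid *pid = &task->pids[type];
    struct pid *first = find_pid(type, nr);

    pid->nr = nr;
    if (!first) {
        /* first task with this nr: push it onto the front of its bucket */
        struct hlist_head *head = &pid_hash[type][pid_hashfn(nr)];
        pid->pid_chain.next = head->first;
        if (head->first)
            head->first->pprev = &pid->pid_chain.next;
        head->first = &pid->pid_chain;
        pid->pid_chain.pprev = &head->first;
        pid->pid_list.next = pid->pid_list.prev = &pid->pid_list;
    } else {
        /* nr already hashed: only join the existing entry's pid_list */
        pid->pid_chain.next = NULL;
        pid->pid_chain.pprev = NULL;
        pid->pid_list.next = &first->pid_list;
        pid->pid_list.prev = first->pid_list.prev;
        first->pid_list.prev->next = &pid->pid_list;
        first->pid_list.prev = &pid->pid_list;
    }
}

/* Map pids[type] back to the task_struct that embeds it. */
static struct task_struct *task_of(struct pid *pid, enum pid_type type)
{
    return container_of(pid - type, struct task_struct, pids);
}

static struct task_struct *find_task(enum pid_type type, int nr)
{
    struct pid *pid = find_pid(type, nr);
    return pid ? task_of(pid, type) : NULL;
}
```

For example, with this modulo hash, pids 100 and 116 land in the same bucket and are told apart by walking pid_chain. Two threads of one group share a TGID, so only the leader is hashed and the other thread is reached through pid_list.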
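In earlier 2.6 kernels, a security module such as SELinux defined its own struct task_security_struct and attached it to the void *security pointer of task_struct. In 3.x kernels, however, task_struct no longer has a security member. The security-related information has been split out into a structure called cred (introduced in 2.6.29), which holds the process's security context.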
/*
 * The security context of a task
 *
 * The parts of the context break down into two categories:
 *
 *  (1) The objective context of a task.  These parts are used when some other
 *      task is attempting to affect this one.
 *
 *  (2) The subjective context.  These details are used when the task is acting
 *      upon another object, be that a file, a task, a key or whatever.
 *
 * Note that some members of this structure belong to both categories - the
 * LSM security pointer for instance.
 *
 * A task has two security pointers.  task->real_cred points to the objective
 * context that defines that task's actual details.  The objective part of this
 * context is used whenever that task is acted upon.
 *
 * task->cred points to the subjective context that defines the details of how
 * that task is going to act upon another object.  This may be overridden
 * temporarily to point to another security context, but normally points to the
 * same context as task->real_cred.
 */
struct cred {
    atomic_t        usage;
#ifdef CONFIG_DEBUG_CREDENTIALS
    atomic_t        subscribers;    /* number of processes subscribed */
    void            *put_addr;
    unsigned        magic;
#define CRED_MAGIC      0x43736564
#define CRED_MAGIC_DEAD 0x44656144
#endif
    uid_t           uid;            /* real UID of the task */
    gid_t           gid;            /* real GID of the task */
    uid_t           suid;           /* saved UID of the task */
    gid_t           sgid;           /* saved GID of the task */
    uid_t           euid;           /* effective UID of the task */
    gid_t           egid;           /* effective GID of the task */
    uid_t           fsuid;          /* UID for VFS ops */
    gid_t           fsgid;          /* GID for VFS ops */
    unsigned        securebits;     /* SUID-less security management */
    kernel_cap_t    cap_inheritable; /* caps our children can inherit */
    kernel_cap_t    cap_permitted;  /* caps we're permitted */
    kernel_cap_t    cap_effective;  /* caps we can actually use */
    kernel_cap_t    cap_bset;       /* capability bounding set */
#ifdef CONFIG_KEYS
    unsigned char   jit_keyring;    /* default keyring to attach requested
                                     * keys to */
    struct key      *thread_keyring; /* keyring private to this thread */
    struct key      *request_key_auth; /* assumed request_key authority */
    struct thread_group_cred *tgcred; /* thread-group shared credentials */
#endif
#ifdef CONFIG_SECURITY
    void            *security;      /* subjective LSM security */
#endif
    struct user_struct *user;       /* real user ID subscription */
    struct user_namespace *user_ns; /* cached user->user_ns */
    struct group_info *group_info;  /* supplementary groups for euid/fsgid */
    struct rcu_head rcu;            /* RCU deletion hook */
};
Much like the relationship between uid and euid, a task_struct carries two credential pointers:
struct task_struct {
    ...
    /* process credentials */
    const struct cred __rcu *real_cred; /* objective and real subjective task credentials (COW) */
    const struct cred __rcu *cred;      /* effective (overridable) subjective task credentials (COW) */
    ...
};
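The split between the two pointers can be illustrated with a small user-space model. It assumes a cred cut down to just uid and euid. objective_uid, subjective_euid, override_subjective and revert_subjective are all hypothetical helpers. The kernel offers override_creds()/revert_creds() for the temporary override that the comment describes, but this sketch does not reproduce their implementation.

```c
#include <assert.h>

/* Hypothetical, heavily trimmed cred: just two of the real fields. */
struct cred {
    unsigned int uid;   /* real UID */
    unsigned int euid;  /* effective UID */
};

struct task_struct {
    const struct cred *real_cred; /* objective: used when others act on us */
    const struct cred *cred;      /* subjective: used when we act on others */
};

/* Identity other tasks see when they target this task (objective view). */
static unsigned int objective_uid(const struct task_struct *t)
{
    return t->real_cred->uid;
}

/* Identity this task presents when it accesses an object (subjective view). */
static unsigned int subjective_euid(const struct task_struct *t)
{
    return t->cred->euid;
}

/* Temporarily swap only the subjective pointer; return the old one. */
static const struct cred *override_subjective(struct task_struct *t,
                                              const struct cred *tmp)
{
    const struct cred *old = t->cred;
    t->cred = tmp;
    return old;
}

/* Restore the saved subjective pointer. */
static void revert_subjective(struct task_struct *t, const struct cred *old)
{
    t->cred = old;
}
```

While overridden, the task acts with the temporary identity but is still seen by others with its real one. After reverting, both pointers again refer to the same cred, which is the normal state.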
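Let me explain the role of this security context in more detail.
In Linux, a security check is usually performed whenever one object operates on another. For example, when a process operates on a file, the kernel checks whether the process has permission to do so.
The credential mechanism in the Linux kernel is an abstraction of the permissions needed for one object to access another. The subject presents credentials describing its own permissions, and the object presents credentials describing the permissions required to access it. The security check is then made from both sets of credentials together with the requested operation.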
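Credential management terminology:
Object: a system object that userspace programs can operate on directly, such as a process, file, message queue, semaphore, or shared memory segment. Every object has a set of credentials, and different kinds of object have different credential sets.
Object ownership: part of an object's credential set identifies its owner; for a file, for example, the uid identifies the file's owner.
Subject: the entity that operates on an object. Apart from processes, most system objects are not subjects, but some can be in special circumstances. For example, a file on which F_SETOWN has been set can send SIGIO to a process; in that case the file is the subject and the process is the object.
Action: how the subject operates on the object, e.g. reading, writing, or executing a file.
Object context: the set of credentials required to access the object.
Subject context: the set of credentials the subject holds.
Rules: what the security check consults when a subject operates on an object.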
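When a subject operates on an object, the kernel performs a security calculation from the subject context, the object context, and the action. It consults the rules to decide whether the subject is allowed to operate on the object.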
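As a concrete instance of "subject context + object context + action → rule", here is a hypothetical, simplified permission check in the style of classic Unix owner/group/other mode bits. It is not the kernel's generic_permission(). It ignores supplementary groups and ACLs. It treats fsuid 0 as allowed to read and write anything, a simplification of capability-based override, while exec still goes through the mode bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Actions, encoded like the rwx permission bits. */
enum action { ACT_READ = 4, ACT_WRITE = 2, ACT_EXEC = 1 };

/* Subject context: the credentials the acting process presents. */
struct subject_ctx {
    unsigned int fsuid, fsgid;
};

/* Object context: owner, group and mode bits the object requires. */
struct object_ctx {
    unsigned int uid, gid;
    unsigned int mode;        /* e.g. 0640 */
};

/* Rule: pick the owner, group or other bits, then test the action bit. */
static bool may_access(const struct subject_ctx *s, const struct object_ctx *o,
                       enum action act)
{
    unsigned int bits;

    if (s->fsuid == 0 && act != ACT_EXEC)
        return true;          /* simplified root override for read/write */
    if (s->fsuid == o->uid)
        bits = (o->mode >> 6) & 7;
    else if (s->fsgid == o->gid)
        bits = (o->mode >> 3) & 7;
    else
        bits = o->mode & 7;
    return (bits & act) != 0;
}
```

For a file owned by uid 1000 in group 100 with mode 0640:
- the owner may write;
- a group member may read but not write;
- anyone else may do neither;
- fsuid 0 may write.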
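In the process descriptor, the cred and real_cred fields point to the subjective and objective credentials respectively.
Note: I haven't studied kernel pwn yet, so this section only briefly introduces the role the cred structure plays in privilege escalation in kernel exploitation, without a concrete example.
Root privileges (root's uid and gid are both 0) can be obtained by having the kernel execute commit_creds(prepare_kernel_cred(0)) in the context of the current process.
The source code is as follows: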
/* /kernel/cred.c */
/**
 * prepare_kernel_cred - Prepare a set of credentials for a kernel service
 * @daemon: A userspace daemon to be used as a reference
 *
 * Prepare a set of credentials for a kernel service.  This can then be used to
 * override a task's own credentials so that work can be done on behalf of that
 * task that requires a different subjective context.
 *
 * @daemon is used to provide a base for the security record, but can be NULL.
 * If @daemon is supplied, then the security data will be derived from that;
 * otherwise they'll be set to 0 and no groups, full capabilities and no keys.
 *
 * The caller may change these controls afterwards if desired.
 *
 * Returns the new credentials or NULL if out of memory.
 *
 * Does not take, and does not return holding current->cred_replace_mutex.
 */
struct cred *prepare_kernel_cred(struct task_struct *daemon)
{
    const struct cred *old;
    struct cred *new;

    new = kmem_cache_alloc(cred_jar, GFP_KERNEL);
    if (!new)
        return NULL;

    kdebug("prepare_kernel_cred() alloc %p", new);

    if (daemon)
        old = get_task_cred(daemon);
    else
        old = get_cred(&init_cred);

    validate_creds(old);

    *new = *old;
    new->non_rcu = 0;
    atomic_set(&new->usage, 1);
    set_cred_subscribers(new, 0);
    get_uid(new->user);
    get_user_ns(new->user_ns);
    get_group_info(new->group_info);

#ifdef CONFIG_KEYS
    new->session_keyring = NULL;
    new->process_keyring = NULL;
    new->thread_keyring = NULL;
    new->request_key_auth = NULL;
    new->jit_keyring = KEY_REQKEY_DEFL_THREAD_KEYRING;
#endif

#ifdef CONFIG_SECURITY
    new->security = NULL;
#endif
    if (security_prepare_creds(new, old, GFP_KERNEL) < 0)
        goto error;

    put_cred(old);
    validate_creds(new);
    return new;

error:
    put_cred(new);
    put_cred(old);
    return NULL;
}
EXPORT_SYMBOL(prepare_kernel_cred);
prepare_kernel_cred()
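According to the comment in the source, this function returns a cred structure that can replace a process's own cred, so that work requiring a different subjective context can be carried out. If the @daemon argument is supplied, the security data is derived from it. The argument may also be NULL, in which case the code copies init_cred, the credentials of the init task. The uid/gid fields of the result are therefore 0 and the capabilities are full, which amounts to root privileges.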
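The NULL-versus-daemon behaviour can be modeled in user space as follows. Assumptions: the cred keeps only a few fields, including a plain int refcount, and init_cred is modeled as all IDs 0 with every capability bit set. sim_prepare_kernel_cred is a hypothetical name. It copies daemon->real_cred because get_task_cred() returns a task's objective credentials. Keyrings, LSM hooks and the other reference counts are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of struct cred (only a few fields kept). */
struct cred {
    int usage;                          /* reference count */
    unsigned int uid, gid, euid, egid;
    unsigned long long cap_effective;   /* one bit per capability */
};

struct task_struct {
    const struct cred *real_cred;       /* objective */
    const struct cred *cred;            /* subjective */
};

/* Model of init_cred: every ID is 0 and all capability bits are set. */
static struct cred init_cred = {
    .usage = 1, .uid = 0, .gid = 0, .euid = 0, .egid = 0,
    .cap_effective = ~0ULL,
};

/* Copy daemon's objective creds, or init_cred when daemon is NULL. */
static struct cred *sim_prepare_kernel_cred(const struct task_struct *daemon)
{
    const struct cred *old = daemon ? daemon->real_cred : &init_cred;
    struct cred *new = malloc(sizeof(*new));

    if (!new)
        return NULL;    /* out of memory, as in the kernel version */
    *new = *old;        /* start from a full copy of the template */
    new->usage = 1;     /* the caller owns the only reference */
    return new;
}
```

Passing NULL yields a fresh copy with uid 0 and full capabilities. Passing an ordinary task yields a copy of that task's identity.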
/* /kernel/cred.c */
/**
 * commit_creds - Install new credentials upon the current task
 * @new: The credentials to be assigned
 *
 * Install a new set of credentials to the current task, using RCU to replace
 * the old set.  Both the objective and the subjective credentials pointers are
 * updated.  This function may not be called if the subjective credentials are
 * in an overridden state.
 *
 * This function eats the caller's reference to the new credentials.
 *
 * Always returns 0 thus allowing this function to be tail-called at the end
 * of, say, sys_setgid().
 */
int commit_creds(struct cred *new)
{
    struct task_struct *task = current;
    const struct cred *old = task->real_cred;

    kdebug("commit_creds(%p{%d,%d})", new,
           atomic_read(&new->usage),
           read_cred_subscribers(new));

    BUG_ON(task->cred != old);
#ifdef CONFIG_DEBUG_CREDENTIALS
    BUG_ON(read_cred_subscribers(old) < 2);
    validate_creds(old);
    validate_creds(new);
#endif
    BUG_ON(atomic_read(&new->usage) < 1);

    get_cred(new); /* we will require a ref for the subj creds too */

    /* dumpability changes */
    if (!uid_eq(old->euid, new->euid) ||
        !gid_eq(old->egid, new->egid) ||
        !uid_eq(old->fsuid, new->fsuid) ||
        !gid_eq(old->fsgid, new->fsgid) ||
        !cred_cap_issubset(old, new)) {
        if (task->mm)
            set_dumpable(task->mm, suid_dumpable);
        task->pdeath_signal = 0;
        /*
         * If a task drops privileges and becomes nondumpable,
         * the dumpability change must become visible before
         * the credential change; otherwise, a __ptrace_may_access()
         * racing with this change may be able to attach to a task it
         * shouldn't be able to attach to (as if the task had dropped
         * privileges without becoming nondumpable).
         * Pairs with a read barrier in __ptrace_may_access().
         */
        smp_wmb();
    }

    /* alter the thread keyring */
    if (!uid_eq(new->fsuid, old->fsuid))
        key_fsuid_changed(task);
    if (!gid_eq(new->fsgid, old->fsgid))
        key_fsgid_changed(task);

    /* do it
     * RLIMIT_NPROC limits on user->processes have already been checked
     * in set_user().
     */
    alter_cred_subscribers(new, 2);
    if (new->user != old->user)
        atomic_inc(&new->user->processes);
    rcu_assign_pointer(task->real_cred, new);
    rcu_assign_pointer(task->cred, new);
    if (new->user != old->user)
        atomic_dec(&old->user->processes);
    alter_cred_subscribers(old, -2);

    /* send notifications */
    if (!uid_eq(new->uid,   old->uid)  ||
        !uid_eq(new->euid,  old->euid) ||
        !uid_eq(new->suid,  old->suid) ||
        !uid_eq(new->fsuid, old->fsuid))
        proc_id_connector(task, PROC_EVENT_UID);

    if (!gid_eq(new->gid,   old->gid)  ||
        !gid_eq(new->egid,  old->egid) ||
        !gid_eq(new->sgid,  old->sgid) ||
        !gid_eq(new->fsgid, old->fsgid))
        proc_id_connector(task, PROC_EVENT_GID);

    /* release the old obj and subj refs both */
    put_cred(old);
    put_cred(old);
    return 0;
}
EXPORT_SYMBOL(commit_creds);
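As the comment in the source says, this function sets both real_cred and cred of the current process to the new set of credentials. Putting it together: prepare_kernel_cred(0) yields a cred with root privileges, and commit_creds() installs it into the current process. Calling commit_creds(prepare_kernel_cred(0)) therefore escalates the process to root. Note that newer kernels (from Linux 6.2 onward) no longer fall back to init_cred when prepare_kernel_cred() is passed NULL and return NULL instead, so this pattern applies to kernels like the version whose source is shown above.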
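The whole commit_creds(prepare_kernel_cred(0)) sequence, including the reference counting, can be simulated in user space. Assumptions: a plain int refcount, the target task passed explicitly instead of the kernel's current, and no RCU, keyrings or notifications. sim_prepare_kernel_cred, sim_commit_creds and sim_put_cred are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical trimmed cred with a plain reference count. */
struct cred {
    int usage;
    unsigned int uid, euid;
};

struct task_struct {
    const struct cred *real_cred;  /* objective */
    const struct cred *cred;       /* subjective */
};

/* Model of init_cred (root). */
static struct cred init_cred = { .usage = 1, .uid = 0, .euid = 0 };

/* Drop one reference; free heap creds when the count reaches zero. */
static void sim_put_cred(const struct cred *c)
{
    struct cred *m = (struct cred *)c;
    if (--m->usage == 0 && m != &init_cred)
        free(m);
}

/* Copy daemon's objective creds, or init_cred when daemon is NULL. */
static struct cred *sim_prepare_kernel_cred(const struct task_struct *daemon)
{
    const struct cred *old = daemon ? daemon->real_cred : &init_cred;
    struct cred *new = malloc(sizeof(*new));
    if (!new)
        return NULL;
    *new = *old;
    new->usage = 1;
    return new;
}

/* Point both pointers at new, then drop the task's two refs on old. */
static int sim_commit_creds(struct task_struct *task, struct cred *new)
{
    const struct cred *old = task->real_cred;

    assert(task->cred == old);  /* mirrors BUG_ON(task->cred != old) */
    new->usage++;               /* caller's ref now backs real_cred; this one backs cred */
    task->real_cred = new;
    task->cred = new;
    sim_put_cred(old);          /* release the old objective ref */
    sim_put_cred(old);          /* release the old subjective ref */
    return 0;
}
```

After the call, both pointers refer to a single cred whose uid and euid are 0. That cred holds two references, one for each pointer, and the old cred has been freed once its last reference was dropped.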