Linux Kernel Analysis: fork()

Date: 2019-11-18


Let's start with an interesting problem. While job hunting recently, I came across an interview question about fork() on UNIX/Linux:

#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int i;
    int buf[100] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

    for (i = 0; i < 2; i++)
    {
        fork();
        printf("+");
        //write("/home/pi/code/test_fork/test_fork.txt",buf,8);
        write(STDOUT_FILENO, "-", 1);
    }
    return 0;
}

The question asks how many "+" characters the code above prints. I added a write() of "-" after the printf(), so we also count the "-" characters.

The answer first: "+" is printed 8 times and "-" 6 times, for example:

---++--++-++++

The code is short, but it touches a lot: process creation, system calls, unbuffered I/O, and standard (buffered) I/O.

On Linux, creating a process goes through the following call chain:

fork() ----------> sys_fork() ----------> do_fork() ----------> copy_process()

The library functions fork(), vfork(), and __clone() each call clone() with the flags they need, and clone() in turn calls do_fork(). do_fork() does most of the work of creating the process, and it calls copy_process().

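Calling fork() from user space executes a system call, which raises a software interrupt and traps into kernel space, where do_fork() runs. Its main job is to copy the parent's page tables, kernel stack, and so on. To run a different program, the child must also call exec() to load that program's code from disk into memory. A newly created child has no memory of its own yet: it shares the parent's code and data segments, with writable pages mapped read-only. The first write to such a page raises a page fault, and the kernel then allocates a private copy of the page for the writer (copy-on-write). In early kernels such as Linux 0.11, the child also gets its own local descriptor table (LDT) and task state segment (TSS) descriptors in the global descriptor table (GDT). The rest of this post follows this path through the code and then returns to why "+" appears 8 times and "-" 6 times.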
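The separation of parent and child memory described above can be observed from user space. The following is a minimal sketch, not kernel code; the function name parent_value_after_child_write is made up for this illustration. The child overwrites a variable, and the parent checks that its own copy is unchanged.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, lets the child overwrite a variable, and returns the value the
 * parent still sees afterwards. Returns -1 if fork() or waitpid() fails.
 * The function name is hypothetical, chosen for this example only. */
int parent_value_after_child_write(void)
{
    volatile int value = 1;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        value = 2;                  /* writes only to the child's private copy */
        _exit(value == 2 ? 0 : 1);  /* _exit() skips stdio flushing in the child */
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return value;                   /* the parent's copy is untouched */
}
```

Called from any program, the function should return 1: the child's write lands on its own copy of the page, not on the parent's.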

Calling fork() lands in the _syscall0 macro defined in unistd.h:

/* XXX - _foo needs to be __foo, while __NR_bar could be _NR_bar. */
/*
 * Don't remove the .ifnc tests; they are an insurance against
 * any hard-to-spot gcc register allocation bugs.
 */
#define _syscall0(type,name) \
type name(void) \
{ \
  register long __a __asm__ ("r10"); \
  register long __n_ __asm__ ("r9") = (__NR_##name); \
  __asm__ __volatile__ (".ifnc %0%1,$r10$r9\n\t" \
            ".err\n\t" \
            ".endif\n\t" \
            "break 13" \
            : "=r" (__a) \
            : "r" (__n_)); \
  if (__a >= 0) \
     return (type) __a; \
  errno = -__a; \
  return (type) -1; \
}

Expanding the macro as _syscall0(int, fork) gives:

/* XXX - _foo needs to be __foo, while __NR_bar could be _NR_bar. */
/*
 * Don't remove the .ifnc tests; they are an insurance against
 * any hard-to-spot gcc register allocation bugs.
 */
int fork(void)
{
  register long __a __asm__ ("r10");
  register long __n_ __asm__ ("r9") = (__NR_fork);
  __asm__ __volatile__ (".ifnc %0%1,$r10$r9\n\t"
            ".err\n\t"
            ".endif\n\t"
            "break 13"
            : "=r" (__a)
            : "r" (__n_));
  if (__a >= 0)
     return (int) __a;
  errno = -__a;
  return (int) -1;
}

## is the preprocessor's token-pasting operator: it joins the tokens on either side of it into a single token.

If name = fork, then __NR_##name in the macro becomes __NR_fork.

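__NR_##name is the system call number. The actual system call name replaces name and is pasted onto __NR_; the resulting identifier is itself a macro, which is then expanded to the number. For example, with name == ioctl the result is __NR_ioctl.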
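Token pasting can be tried outside the kernel. In the sketch below, the DEMO_NR_* constants and the DEMO_SYSCALL_NR macro are hypothetical stand-ins for the kernel's __NR_* definitions; the values 2 and 54 are the i386 numbers for fork and ioctl.

```c
/* Hypothetical stand-ins for the kernel's __NR_* constants (i386 values). */
#define DEMO_NR_fork  2
#define DEMO_NR_ioctl 54

/* ## pastes DEMO_NR_ and the argument into one identifier, which the
 * preprocessor then expands to its numeric value. */
#define DEMO_SYSCALL_NR(name) (DEMO_NR_##name)

int demo_nr_fork(void)  { return DEMO_SYSCALL_NR(fork); }
int demo_nr_ioctl(void) { return DEMO_SYSCALL_NR(ioctl); }
```

Here demo_nr_fork() returns 2 and demo_nr_ioctl() returns 54, because DEMO_SYSCALL_NR(fork) first becomes DEMO_NR_fork and is then replaced by its definition.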

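I have not fully worked out the inline assembly above. It traps with break 13 and passes the system call number in r9, so it appears to be the CRIS-architecture version of the macro rather than the i386 one. The i386 version, described next, traps with int $0x80.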

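On i386, int $0x80 is the single entry point for all system calls, and fork() is one of them. The input constraint "0" (__NR_fork) in the i386 macro loads fork's index in sys_call_table[], __NR_fork (which is 2), into the eax register. This number is the index of sys_fork() in sys_call_table; every other system call has its own index in the table as well.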
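On a current Linux system the same idea, invoking a system call by its number, is exposed through glibc's syscall() function and the SYS_* constants from <sys/syscall.h>. The sketch below uses getpid rather than fork because a fork system call number is not defined on every architecture; the wrapper name raw_getpid is made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invokes getpid by its system call number instead of through the usual
 * libc wrapper. raw_getpid is a hypothetical name for this example. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

raw_getpid() should return the same value as getpid(), since both end up in the same kernel entry selected by the call number.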

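After int $0x80 returns, the wrapper executes return (type) __a, which expands to return (int) __a. The int $0x80 instruction raises a software interrupt, and the CPU switches from the privilege-level-3 process to privilege-level-0 kernel code. The hardware automatically pushes SS, ESP, EFLAGS, CS, and EIP, in that order, onto the parent's kernel stack. copy_process() later uses these saved values to initialize the task state segment (TSS) of process 1, the child.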

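Once the hardware has pushed these registers, execution jumps to _system_call in system_call.s, which pushes DS, ES, FS, EDX, ECX, and EBX as well (again in preparation for initializing the child's TSS). Finally the kernel uses the value 2 just loaded into eax to index sys_call_table[], finds that this call maps to sys_fork(), and jumps to _sys_fork.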
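The table lookup can be pictured as an array of function pointers indexed by the call number. The following user-space sketch only imitates that dispatch; demo_sys_call_table, demo_dispatch, and the handler functions are invented for illustration and return arbitrary marker values.

```c
#include <errno.h>
#include <stddef.h>

typedef long (*demo_syscall_fn)(void);

/* Hypothetical handlers; the return values are just markers. */
static long demo_sys_exit(void) { return 10; }
static long demo_sys_fork(void) { return 20; }

/* Slot numbers mirror i386: 1 = exit, 2 = fork. Slot 0 is left empty here. */
static const demo_syscall_fn demo_sys_call_table[] = {
    NULL, demo_sys_exit, demo_sys_fork,
};

/* Looks up the handler for nr and calls it, as the kernel entry code does
 * with the value in eax. Unknown numbers yield -ENOSYS. */
long demo_dispatch(unsigned long nr)
{
    size_t n = sizeof demo_sys_call_table / sizeof demo_sys_call_table[0];

    if (nr >= n || demo_sys_call_table[nr] == NULL)
        return -ENOSYS;
    return demo_sys_call_table[nr]();
}
```

With this table, demo_dispatch(2) reaches demo_sys_fork(), and an out-of-range number such as 99 returns -ENOSYS.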

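Note: here a function's arguments are not supplied by an ordinary call. Code outside the function "makes" them by pushing values onto the stack, which is one of the ways low-level operating system code is written differently from application code. In C, a function's arguments live on the stack at run time, so the kernel designers can treat values pushed earlier as the arguments of a function: when the function is called, those values are its arguments.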

The sys_fork function

asmlinkage int sys_fork(void)
{
#ifndef CONFIG_MMU
    /* fork almost works, enough to trick you into looking elsewhere:-( */
    return -EINVAL;
#else
    return do_fork(SIGCHLD, user_stack(__frame), __frame, 0, NULL, NULL);
#endif
}

The do_fork function

/*
 *  Ok, this is the main fork-routine.
 *
 * It copies the process, and if successful kick-starts
 * it and waits for it to finish using the VM if required.
 */
long do_fork(unsigned long clone_flags,
          unsigned long stack_start,
          struct pt_regs *regs,
          unsigned long stack_size,
          int __user *parent_tidptr,
          int __user *child_tidptr)
{
    struct task_struct *p;
    int trace = 0;
    struct pid *pid = alloc_pid();
    long nr;

    if (!pid)
        return -EAGAIN;
    nr = pid->nr;
    if (unlikely(current->ptrace)) {
        trace = fork_traceflag (clone_flags);
        if (trace)
            clone_flags |= CLONE_PTRACE;
    }

    p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, pid);
    /*
     * Do this prior waking up the new thread - the thread pointer
     * might get invalid after that point, if the thread exits quickly.
     */
    if (!IS_ERR(p)) {
        struct completion vfork;

        if (clone_flags & CLONE_VFORK) {
            p->vfork_done = &vfork;
            init_completion(&vfork);
        }

        if ((p->ptrace & PT_PTRACED) || (clone_flags & CLONE_STOPPED)) {
            /*
             * We'll start up with an immediate SIGSTOP.
             */
            sigaddset(&p->pending.signal, SIGSTOP);
            set_tsk_thread_flag(p, TIF_SIGPENDING);
        }

        if (!(clone_flags & CLONE_STOPPED))
            wake_up_new_task(p, clone_flags);
        else
            p->state = TASK_STOPPED;

        if (unlikely (trace)) {
            current->ptrace_message = nr;
            ptrace_notify ((trace << 8) | SIGTRAP);
        }

        if (clone_flags & CLONE_VFORK) {
            freezer_do_not_count();
            wait_for_completion(&vfork);
            freezer_count();
            if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE)) {
                current->ptrace_message = nr;
                ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
            }
        }
    } else {
        free_pid(pid);
        nr = PTR_ERR(p);
    }
    return nr;
}

The copy_process function

/*
 * This creates a new process as a copy of the old one,
 * but does not actually start it yet.
 *
 * It copies the registers, and all the appropriate
 * parts of the process environment (as per the clone
 * flags). The actual kick-off is left to the caller.
 */
static struct task_struct *copy_process(unsigned long clone_flags,
                    unsigned long stack_start,
                    struct pt_regs *regs,
                    unsigned long stack_size,
                    int __user *parent_tidptr,
                    int __user *child_tidptr,
                    struct pid *pid)
{
    int retval;
    struct task_struct *p = NULL;

    if ((clone_flags & (CLONE_NEWNS|CLONE_FS)) == (CLONE_NEWNS|CLONE_FS))
        return ERR_PTR(-EINVAL);

    /*
     * Thread groups must share signals as well, and detached threads
     * can only be started up within the thread group.
     */
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        return ERR_PTR(-EINVAL);

    /*
     * Shared signal handlers imply shared VM. By way of the above,
     * thread groups also imply shared VM. Blocking this case allows
     * for various simplifications in other code.
     */
    if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
        return ERR_PTR(-EINVAL);

    retval = security_task_create(clone_flags);
    if (retval)
        goto fork_out;

    retval = -ENOMEM;
    p = dup_task_struct(current);
    if (!p)
        goto fork_out;

    rt_mutex_init_task(p);

#ifdef CONFIG_TRACE_IRQFLAGS
    DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
    DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
#endif
    retval = -EAGAIN;
    if (atomic_read(&p->user->processes) >=
            p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
        if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
                p->user != &root_user)
            goto bad_fork_free;
    }

    atomic_inc(&p->user->__count);
    atomic_inc(&p->user->processes);
    get_group_info(p->group_info);

    /*
     * If multiple threads are within copy_process(), then this check
     * triggers too late. This doesn't hurt, the check is only there
     * to stop root fork bombs.
     */
    if (nr_threads >= max_threads)
        goto bad_fork_cleanup_count;

    if (!try_module_get(task_thread_info(p)->exec_domain->module))
        goto bad_fork_cleanup_count;

    if (p->binfmt && !try_module_get(p->binfmt->module))
        goto bad_fork_cleanup_put_domain;

    p->did_exec = 0;
    delayacct_tsk_init(p);    /* Must remain after dup_task_struct() */
    copy_flags(clone_flags, p);
    p->pid = pid_nr(pid);
    retval = -EFAULT;
    if (clone_flags & CLONE_PARENT_SETTID)
        if (put_user(p->pid, parent_tidptr))
            goto bad_fork_cleanup_delays_binfmt;

    INIT_LIST_HEAD(&p->children);
    INIT_LIST_HEAD(&p->sibling);
    p->vfork_done = NULL;
    spin_lock_init(&p->alloc_lock);

    clear_tsk_thread_flag(p, TIF_SIGPENDING);
    init_sigpending(&p->pending);

    p->utime = cputime_zero;
    p->stime = cputime_zero;
    p->sched_time = 0;
#ifdef CONFIG_TASK_XACCT
    p->rchar = 0;        /* I/O counter: bytes read */
    p->wchar = 0;        /* I/O counter: bytes written */
    p->syscr = 0;        /* I/O counter: read syscalls */
    p->syscw = 0;        /* I/O counter: write syscalls */
#endif
    task_io_accounting_init(p);
    acct_clear_integrals(p);

    p->it_virt_expires = cputime_zero;
    p->it_prof_expires = cputime_zero;
    p->it_sched_expires = 0;
    INIT_LIST_HEAD(&p->cpu_timers[0]);
    INIT_LIST_HEAD(&p->cpu_timers[1]);
    INIT_LIST_HEAD(&p->cpu_timers[2]);

    p->lock_depth = -1;        /* -1 = no lock */
    do_posix_clock_monotonic_gettime(&p->start_time);
    p->security = NULL;
    p->io_context = NULL;
    p->io_wait = NULL;
    p->audit_context = NULL;
    cpuset_fork(p);
#ifdef CONFIG_NUMA
    p->mempolicy = mpol_copy(p->mempolicy);
    if (IS_ERR(p->mempolicy)) {
        retval = PTR_ERR(p->mempolicy);
        p->mempolicy = NULL;
        goto bad_fork_cleanup_cpuset;
    }
    mpol_fix_fork_child_flag(p);
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
    p->irq_events = 0;
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
    p->hardirqs_enabled = 1;
#else
    p->hardirqs_enabled = 0;
#endif
    p->hardirq_enable_ip = 0;
    p->hardirq_enable_event = 0;
    p->hardirq_disable_ip = _THIS_IP_;
    p->hardirq_disable_event = 0;
    p->softirqs_enabled = 1;
    p->softirq_enable_ip = _THIS_IP_;
    p->softirq_enable_event = 0;
    p->softirq_disable_ip = 0;
    p->softirq_disable_event = 0;
    p->hardirq_context = 0;
    p->softirq_context = 0;
#endif
#ifdef CONFIG_LOCKDEP
    p->lockdep_depth = 0; /* no locks held yet */
    p->curr_chain_key = 0;
    p->lockdep_recursion = 0;
#endif

#ifdef CONFIG_DEBUG_MUTEXES
    p->blocked_on = NULL; /* not blocked yet */
#endif

    p->tgid = p->pid;
    if (clone_flags & CLONE_THREAD)
        p->tgid = current->tgid;

    if ((retval = security_task_alloc(p)))
        goto bad_fork_cleanup_policy;
    if ((retval = audit_alloc(p)))
        goto bad_fork_cleanup_security;
    /* copy all the process information */
    if ((retval = copy_semundo(clone_flags, p)))
        goto bad_fork_cleanup_audit;
    if ((retval = copy_files(clone_flags, p)))
        goto bad_fork_cleanup_semundo;
    if ((retval = copy_fs(clone_flags, p)))
        goto bad_fork_cleanup_files;
    if ((retval = copy_sighand(clone_flags, p)))
        goto bad_fork_cleanup_fs;
    if ((retval = copy_signal(clone_flags, p)))
        goto bad_fork_cleanup_sighand;
    if ((retval = copy_mm(clone_flags, p)))
        goto bad_fork_cleanup_signal;
    if ((retval = copy_keys(clone_flags, p)))
        goto bad_fork_cleanup_mm;
    if ((retval = copy_namespaces(clone_flags, p)))
        goto bad_fork_cleanup_keys;
    retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
    if (retval)
        goto bad_fork_cleanup_namespaces;

    p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
    /*
     * Clear TID on mm_release()?
     */
    p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr: NULL;
    p->robust_list = NULL;
#ifdef CONFIG_COMPAT
    p->compat_robust_list = NULL;
#endif
    INIT_LIST_HEAD(&p->pi_state_list);
    p->pi_state_cache = NULL;

    /*
     * sigaltstack should be cleared when sharing the same VM
     */
    if ((clone_flags & (CLONE_VM|CLONE_VFORK)) == CLONE_VM)
        p->sas_ss_sp = p->sas_ss_size = 0;

    /*
     * Syscall tracing should be turned off in the child regardless
     * of CLONE_PTRACE.
     */
    clear_tsk_thread_flag(p, TIF_SYSCALL_TRACE);
#ifdef TIF_SYSCALL_EMU
    clear_tsk_thread_flag(p, TIF_SYSCALL_EMU);
#endif

    /* Our parent execution domain becomes current domain
       These must match for thread signalling to apply */
    p->parent_exec_id = p->self_exec_id;

    /* ok, now we should be set up.. */
    p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL);
    p->pdeath_signal = 0;
    p->exit_state = 0;

    /*
     * Ok, make it visible to the rest of the system.
     * We dont wake it up yet.
     */
    p->group_leader = p;
    INIT_LIST_HEAD(&p->thread_group);
    INIT_LIST_HEAD(&p->ptrace_children);
    INIT_LIST_HEAD(&p->ptrace_list);

    /* Perform scheduler related setup. Assign this task to a CPU. */
    sched_fork(p, clone_flags);

    /* Need tasklist lock for parent etc handling! */
    write_lock_irq(&tasklist_lock);

    /* for sys_ioprio_set(IOPRIO_WHO_PGRP) */
    p->ioprio = current->ioprio;

    /*
     * The task hasn't been attached yet, so its cpus_allowed mask will
     * not be changed, nor will its assigned CPU.
     *
     * The cpus_allowed mask of the parent may have changed after it was
     * copied first time - so re-copy it here, then check the child's CPU
     * to ensure it is on a valid CPU (and if not, just force it back to
     * parent's CPU). This avoids alot of nasty races.
     */
    p->cpus_allowed = current->cpus_allowed;
    if (unlikely(!cpu_isset(task_cpu(p), p->cpus_allowed) ||
            !cpu_online(task_cpu(p))))
        set_task_cpu(p, smp_processor_id());

    /* CLONE_PARENT re-uses the old parent */
    if (clone_flags & (CLONE_PARENT|CLONE_THREAD))
        p->real_parent = current->real_parent;
    else
        p->real_parent = current;
    p->parent = p->real_parent;

    spin_lock(&current->sighand->siglock);

    /*
     * Process group and session signals need to be delivered to just the
     * parent before the fork or both the parent and the child after the
     * fork. Restart if a signal comes in before we add the new process to
     * it's process group.
     * A fatal signal pending means that current will exit, so the new
     * thread can't slip out of an OOM kill (or normal SIGKILL).
     */
    recalc_sigpending();
    if (signal_pending(current)) {
        spin_unlock(&current->sighand->siglock);
        write_unlock_irq(&tasklist_lock);
        retval = -ERESTARTNOINTR;
        goto bad_fork_cleanup_namespaces;
    }

    if (clone_flags & CLONE_THREAD) {
        p->group_leader = current->group_leader;
        list_add_tail_rcu(&p->thread_group, &p->group_leader->thread_group);

        if (!cputime_eq(current->signal->it_virt_expires,
                cputime_zero) ||
            !cputime_eq(current->signal->it_prof_expires,
                cputime_zero) ||
            current->signal->rlim[RLIMIT_CPU].rlim_cur != RLIM_INFINITY ||
            !list_empty(&current->signal->cpu_timers[0]) ||
            !list_empty(&current->signal->cpu_timers[1]) ||
            !list_empty(&current->signal->cpu_timers[2])) {
            /*
             * Have child wake up on its first tick to check
             * for process CPU timers.
             */
            p->it_prof_expires = jiffies_to_cputime(1);
        }
    }

    if (likely(p->pid)) {
        add_parent(p);
        if (unlikely(p->ptrace & PT_PTRACED))
            __ptrace_link(p, current->parent);

        if (thread_group_leader(p)) {
            p->signal->tty = current->signal->tty;
            p->signal->pgrp = process_group(current);
            set_signal_session(p->signal, process_session(current));
            attach_pid(p, PIDTYPE_PGID, task_pgrp(current));
            attach_pid(p, PIDTYPE_SID, task_session(current));

            list_add_tail_rcu(&p->tasks, &init_task.tasks);
            __get_cpu_var(process_counts)++;
        }
        attach_pid(p, PIDTYPE_PID, pid);
        nr_threads++;
    }

    total_forks++;
    spin_unlock(&current->sighand->siglock);
    write_unlock_irq(&tasklist_lock);
    proc_fork_connector(p);
    return p;

bad_fork_cleanup_namespaces:
    exit_task_namespaces(p);
bad_fork_cleanup_keys:
    exit_keys(p);
bad_fork_cleanup_mm:
    if (p->mm)
        mmput(p->mm);
bad_fork_cleanup_signal:
    cleanup_signal(p);
bad_fork_cleanup_sighand:
    __cleanup_sighand(p->sighand);
bad_fork_cleanup_fs:
    exit_fs(p); /* blocking */
bad_fork_cleanup_files:
    exit_files(p); /* blocking */
bad_fork_cleanup_semundo:
    exit_sem(p);
bad_fork_cleanup_audit:
    audit_free(p);
bad_fork_cleanup_security:
    security_task_free(p);
bad_fork_cleanup_policy:
#ifdef CONFIG_NUMA
    mpol_free(p->mempolicy);
bad_fork_cleanup_cpuset:
#endif
    cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
    delayacct_tsk_free(p);
    if (p->binfmt)
        module_put(p->binfmt->module);
bad_fork_cleanup_put_domain:
    module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:
    put_group_info(p->group_info);
    atomic_dec(&p->user->processes);
    free_uid(p->user);
bad_fork_free:
    free_task(p);
fork_out:
    return ERR_PTR(retval);
}

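The dup_task_struct function. dup_task_struct() mainly creates a kernel stack for the child. It allocates the descriptor with tsk = alloc_task_struct(), and its key copying step is setup_thread_stack(tsk, orig).

It calls alloc_task_struct() to allocate memory. The kernel contains several definitions of alloc_task_struct():

1. # define alloc_task_struct() kmem_cache_alloc(task_struct_cachep, GFP_KERNEL)

2.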

struct task_struct *alloc_task_struct(void)
{
    struct task_struct *p = kmalloc(THREAD_SIZE, GFP_KERNEL);
    if (p)
        atomic_set((atomic_t *)(p+1), 1);
    return p;
}

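3. #define alloc_task_struct() ((struct task_struct *)__get_free_pages(GFP_KERNEL | __GFP_COMP, KERNEL_STACK_SIZE_ORDER))

Of these three ways of obtaining memory, the last is the lowest level: it asks the page allocator for whole pages directly. The first allocates from a dedicated slab cache (task_struct_cachep), and the slab allocator in turn gets its pages from the page allocator. The second uses kmalloc(), a general-purpose interface that is itself built on the slab allocator's size-class caches.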

static struct task_struct *dup_task_struct(struct task_struct *orig)
{
    struct task_struct *tsk;
    struct thread_info *ti;

    prepare_to_copy(orig);

    tsk = alloc_task_struct();
    if (!tsk)
        return NULL;

    ti = alloc_thread_info(tsk);
    if (!ti) {
        free_task_struct(tsk);
        return NULL;
    }

    *tsk = *orig;
    tsk->stack = ti;
    setup_thread_stack(tsk, orig);               // key step: copies the parent's thread_info into the child's

#ifdef CONFIG_CC_STACKPROTECTOR
    tsk->stack_canary = get_random_int();
#endif

    /* One for us, one for whoever does the "release_task()" (usually parent) */
    atomic_set(&tsk->usage,2);
    atomic_set(&tsk->fs_excl, 0);
#ifdef CONFIG_BLK_DEV_IO_TRACE
    tsk->btrace_seq = 0;
#endif
    tsk->splice_pipe = NULL;
    return tsk;
}

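Now let's focus on copy_process(), which does the most important work and shows how Linux creates a child process from its parent.

1. Call dup_task_struct() to create a kernel stack, a thread_info structure, and a task_struct for the child, with the same values as the current process. At this point the child's and the parent's process descriptors are identical.

p = dup_task_struct(current) ----> tsk = alloc_task_struct() (allocates a process descriptor from the slab layer)

2. Check that creating this child does not push the current user's process count over the resource limit.

3. The child begins to differentiate itself from the parent: its task_struct and TSS are customized, and many process descriptor members are cleared or set to initial values. The members that are not inherited are mostly statistics; most of the data in task_struct remains unchanged.

4. Give the child its own page tables, initialized from the parent's page table entries (in Linux 0.11, process 0's entries are copied into the first page table of process 1). In the copy_process() listed above this is done by copy_mm(); copy_fs() and copy_files(), examined next, copy the filesystem context and the open file table instead.

copy_process() ----> copy_fs() ----> __copy_fs_struct(current->fs), where the current pointer refers to the current process, that is, the parent.

copy_fs() gives the child a copy of the parent's filesystem context (root directory, current working directory, and umask), or shares the parent's if CLONE_FS is set.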

static inline int copy_fs(unsigned long clone_flags, struct task_struct * tsk)
{
    if (clone_flags & CLONE_FS) {
        atomic_inc(&current->fs->count);
        return 0;
    }
    tsk->fs = __copy_fs_struct(current->fs);
    if (!tsk->fs)
        return -ENOMEM;
    return 0;
}

__copy_fs_struct()

static inline struct fs_struct *__copy_fs_struct(struct fs_struct *old)
{
    struct fs_struct *fs = kmem_cache_alloc(fs_cachep, GFP_KERNEL);
    /* We don't need to lock fs - think why ;-) */
    if (fs) {
        atomic_set(&fs->count, 1);
        rwlock_init(&fs->lock);
        fs->umask = old->umask;
        read_lock(&old->lock);                         // read-lock old so it cannot change while it is copied
        fs->rootmnt = mntget(old->rootmnt);
        fs->root = dget(old->root);
        fs->pwdmnt = mntget(old->pwdmnt);
        fs->pwd = dget(old->pwd);
        if (old->altroot) {
            fs->altrootmnt = mntget(old->altrootmnt);
            fs->altroot = dget(old->altroot);
        } else {
            fs->altrootmnt = NULL;
            fs->altroot = NULL;
        }
        read_unlock(&old->lock);
    }
    return fs;
}

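The fs_struct data structure holds VFS directory entry (dentry) and mount pointers for the process's root directory and current working directory. Note that a dentry is a filesystem directory entry, not a page directory entry. Like the child's task_struct, fs_struct is allocated from a slab cache: struct fs_struct *fs = kmem_cache_alloc(fs_cachep, GFP_KERNEL);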

struct fs_struct {
    atomic_t count;
    rwlock_t lock;
    int umask;
    struct dentry * root, * pwd, * altroot;                     // struct dentry: VFS directory entry
    struct vfsmount * rootmnt, * pwdmnt, * altrootmnt;
};

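The copy_files() function copies the parent's open file table for the child (or shares it if CLONE_FILES is set), so the child starts with the same files open as the parent. It does not touch page tables.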

static int copy_files(unsigned long clone_flags, struct task_struct * tsk)
{
    struct files_struct *oldf, *newf;
    int error = 0;

    /*
     * A background process may not have any files ...
     */
    oldf = current->files;                         // the parent's open file table
    if (!oldf)
        goto out;

    if (clone_flags & CLONE_FILES) {
        atomic_inc(&oldf->count);
        goto out;
    }

    /*
     * Note: we may be using current for both targets (See exec.c)
     * This works because we cache current->files (old) as oldf. Don't
     * break this.
     */
    tsk->files = NULL;
    newf = dup_fd(oldf, &error);
    if (!newf)
        goto out;

    tsk->files = newf;
    error = 0;
out:
    return error;
}

dup_fd()

/*
 * Allocate a new files structure and copy contents from the
 * passed in files structure.
 * errorp will be valid only when the returned files_struct is NULL.
 */
static struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
{
    struct files_struct *newf;
    struct file **old_fds, **new_fds;
    int open_files, size, i;
    struct fdtable *old_fdt, *new_fdt;

    *errorp = -ENOMEM;
    newf = alloc_files();
    if (!newf)
        goto out;

    spin_lock(&oldf->file_lock);
    old_fdt = files_fdtable(oldf);
    new_fdt = files_fdtable(newf);
    open_files = count_open_files(old_fdt);

    /*
     * Check whether we need to allocate a larger fd array and fd set.
     * Note: we're not a clone task, so the open count won't change.
     */
    if (open_files > new_fdt->max_fds) {
        new_fdt->max_fds = 0;
        spin_unlock(&oldf->file_lock);
        spin_lock(&newf->file_lock);
        *errorp = expand_files(newf, open_files-1);
        spin_unlock(&newf->file_lock);
        if (*errorp < 0)
            goto out_release;
        new_fdt = files_fdtable(newf);
        /*
         * Reacquire the oldf lock and a pointer to its fd table
         * who knows it may have a new bigger fd table. We need
         * the latest pointer.
         */
        spin_lock(&oldf->file_lock);
        old_fdt = files_fdtable(oldf);
    }

    old_fds = old_fdt->fd;
    new_fds = new_fdt->fd;

    memcpy(new_fdt->open_fds->fds_bits,
        old_fdt->open_fds->fds_bits, open_files/8);
    memcpy(new_fdt->close_on_exec->fds_bits,
        old_fdt->close_on_exec->fds_bits, open_files/8);

    for (i = open_files; i != 0; i--) {
        struct file *f = *old_fds++;
        if (f) {
            get_file(f);
        } else {
            /*
             * The fd may be claimed in the fd bitmap but not yet
             * instantiated in the files array if a sibling thread
             * is partway through open().  So make sure that this
             * fd is available to the new process.
             */
            FD_CLR(open_files - i, new_fdt->open_fds);
        }
        rcu_assign_pointer(*new_fds++, f);
    }
    spin_unlock(&oldf->file_lock);

    /* compute the remainder to be cleared */
    size = (new_fdt->max_fds - open_files) * sizeof(struct file *);

    /* This is long word aligned thus could use a optimized version */
    memset(new_fds, 0, size);

    if (new_fdt->max_fds > open_files) {
        int left = (new_fdt->max_fds-open_files)/8;
        int start = open_files / (8 * sizeof(unsigned long));

        memset(&new_fdt->open_fds->fds_bits[start], 0, left);
        memset(&new_fdt->close_on_exec->fds_bits[start], 0, left);
    }

    return newf;

out_release:
    kmem_cache_free(files_cachep, newf);
out:
    return NULL;
}
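dup_fd() copies the descriptor table but only takes a reference (get_file()) on each struct file, so parent and child end up pointing at the same open file, including its file offset. A user-space sketch of that consequence follows; the function name and the /tmp scratch-file template are assumptions made for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after a forked child writes 3 bytes
 * through the inherited descriptor, or -1 on error.
 * The function name and the scratch-file path are hypothetical. */
long offset_after_child_write(void)
{
    char path[] = "/tmp/fork_demo_XXXXXX";
    int fd = mkstemp(path);
    pid_t pid;
    off_t pos;

    if (fd == -1)
        return -1;
    unlink(path);                     /* the file goes away once fd is closed */
    pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0)
        _exit(write(fd, "abc", 3) == 3 ? 0 : 1);
    if (waitpid(pid, NULL, 0) == -1) {
        close(fd);
        return -1;
    }
    pos = lseek(fd, 0, SEEK_CUR);     /* same struct file, so same offset */
    close(fd);
    return (long)pos;
}
```

The parent never writes, yet the function should return 3: the offset lives in the shared struct file, not in the per-process descriptor table.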

The files_struct structure holds the table of all files a process has open; each entry in the table points to a struct file describing one open file.

struct files_struct {
    atomic_t        count;              /* reference count */
    struct fdtable  *fdt;
    struct fdtable  fdtab;
    fd_set      close_on_exec_init;     /* initial set of descriptors to close on exec */
    fd_set      open_fds_init;          /* initial bitmap of open file descriptors */
    struct file         * fd_array[NR_OPEN_DEFAULT];
    spinlock_t      file_lock;  /* Protects concurrent writers.  Nests inside tsk->alloc_lock */
};

The alloc_files() function

static struct files_struct *alloc_files(void)
{
    struct files_struct *newf;
    struct fdtable *fdt;

    newf = kmem_cache_alloc(files_cachep, GFP_KERNEL);
    if (!newf)
        goto out;

    atomic_set(&newf->count, 1);

    spin_lock_init(&newf->file_lock);
    newf->next_fd = 0;
    fdt = &newf->fdtab;
    fdt->max_fds = NR_OPEN_DEFAULT;
    fdt->close_on_exec = (fd_set *)&newf->close_on_exec_init;
    fdt->open_fds = (fd_set *)&newf->open_fds_init;
    fdt->fd = &newf->fd_array[0];
    INIT_RCU_HEAD(&fdt->rcu);
    fdt->next = NULL;
    rcu_assign_pointer(newf->fdt, fdt);
out:
    return newf;
}

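5. The child's state is set to TASK_UNINTERRUPTIBLE so that it will not be scheduled to run yet.

...... I have not yet analyzed the child's customization in detail; I will add to this once I understand it better.

Let's summarize the flow of fork() and then return to the question from the beginning of the post.

From the analysis above, the flow of fork() is roughly: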

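1. p = dup_task_struct(current); creates a kernel stack, thread_info, and task_struct for the new process. Everything is copied from the parent, so up to this point parent and child are identical.

2. Set up the new process's kernel stack in its own memory.

3. Initialize some of the fields in the child's task_struct. Check that the number of processes does not exceed the system-wide maximum; if it does not, start setting the initial values in the process descriptor. From here on, parent and child begin to differ.

4. Copy the relevant information from the parent to the child and set up what they share.

5. Set the child's state to TASK_UNINTERRUPTIBLE so that it cannot be scheduled yet, because many flags and data fields are still not set.

6. Update the flags member through copy_flags(), which clears the superuser-privilege flag (PF_SUPERPRIV), and set the remaining flags.

7. Obtain a valid, unique process identifier (PID) for the child. In the code above do_fork() does this with alloc_pid(); older kernels used get_pid().

8. copy_process() returns a pointer to the child; do_fork() wakes the child up, and it starts running from ret_from_fork.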

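Back to the question from the beginning. From the analysis above, a child is produced by copying the parent's kernel stack and page table entries and by sharing the parent's open files; the memory pages themselves are shared read-only until one side writes to them. So unless the child calls exec() to load a different executable, it runs the same code on a copy of the same data as the parent, which is why a single fork() makes it look as if printf() ran twice. As for why "+" is printed 8 times rather than 6, the cause is standard I/O buffering: printf("+") only places the character in stdout's user-space buffer, and because no newline is written, nothing is flushed before the next fork(). That buffer is part of the process's memory, so the second fork() duplicates the pending "+" into each new child, and all four processes flush two "+" at exit, which accounts for the two extra characters. write() is unbuffered, which is why the test with "-" prints exactly 6.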

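This can be seen from the figure in the original post.

Note: this post draws on Linux Kernel Development and Understanding the Linux Kernel.

Reference posts: http://blog.csdn.net/Always2015/article/details/45008785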

　　　　 http://www.oschina.net/question/195301_62902?sort=default&p=1#answers

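Summary

Linux creates a new process by copying the parent's kernel stack and page table entries. Inside the kernel, the parent's process descriptor is copied first; the child then adjusts the relevant fields for itself, obtains its own PID, and starts running.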

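Follow-up: threads

So far we have discussed creating a process on Linux. Creating a thread follows the same flow; we only need to set flags so that the child shares data with the parent. Linux implements threads in a distinctive way: from the kernel's point of view there is no separate concept of a thread, and every thread is implemented as a process. The kernel provides no special scheduling algorithm and defines no special data structure to represent threads. Instead, a thread is simply a process that shares certain resources with other processes. Each thread has its own task_struct, so to the kernel it looks like an ordinary process that happens to share resources, such as the address space, with other processes.

This makes the Linux approach very different from that of Windows or Sun Solaris, whose kernels provide dedicated mechanisms for threads. On Linux, a thread is just a way of sharing resources.

Creating a thread is similar to creating an ordinary process, except that the call to clone() passes flags naming the resources to share, for example:

CLONE_FILES: parent and child share open files

CLONE_FS: parent and child share filesystem information

......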
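The effect of a sharing flag can be seen with glibc's clone() wrapper. The sketch below compares a child created with CLONE_VM (shared address space, as for a thread) with one created without it (private copy, as for fork()). The names shared_counter, child_fn, and value_seen_by_parent are invented for this example, and it assumes a downward-growing stack, as on x86 and ARM.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile int shared_counter;     /* hypothetical variable the child writes */

/* Entry point of the child: writes to the variable and exits. */
static int child_fn(void *arg)
{
    (void)arg;
    shared_counter = 42;
    return 0;
}

/* Creates a child with the given clone flags and returns the value of
 * shared_counter that the parent sees afterwards, or -1 on error. */
int value_seen_by_parent(int flags)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    pid_t pid;

    if (stack == NULL)
        return -1;
    shared_counter = 0;
    /* Pass the top of the buffer: the stack is assumed to grow downward. */
    pid = clone(child_fn, stack + stack_size, flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return shared_counter;
}
```

value_seen_by_parent(CLONE_VM) should return 42, because the child wrote into the parent's own memory, while value_seen_by_parent(0) should return 0, because the child wrote into its private copy.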

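Follow-up: process termination

A process usually brings about its own destruction. This happens when the process calls the exit() system call, either explicitly or implicitly by returning from main(): the C compiler places a call to exit() after main()'s return point. A process can also be terminated involuntarily, when it receives a signal or exception that it can neither handle nor ignore. In either case do_exit() carries out the termination, which is essentially the process of releasing the resources the process holds.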
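From the parent's side, a child's termination is observed with waitpid(). The minimal sketch below forks a child that exits with a chosen code and returns the code the parent collects; the function name collect_exit_code is invented for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that terminates with the given code and returns the code
 * the parent collects, or -1 on error or abnormal termination.
 * The function name is hypothetical. */
int collect_exit_code(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);                     /* the child terminates itself */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

collect_exit_code(7) should return 7. Only the low 8 bits of the exit code are reported to the parent, so values above 255 would come back truncated.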
