The classic TCP/IP network programming books all describe a model like this:
"The server listens on a well-known port and forks a number of child processes; when a new connection request arrives, a child process obtains the new connection via the accept call and handles it."
This sounds perfectly straightforward, but a little thought raises plenty of questions, for example: "the parent and child belong to two different process address spaces, so how can a port that the parent is listening on be accepted in the child?"
There are also discussions online claiming, for instance, that "when multiple processes accept on the same descriptor, a 'thundering herd' effect occurs."
Suddenly everything looks murky again.
Against this background, this article digs into the question by combining hands-on experiments with a look at the source code.
The server model used in this article is as follows:
int main(int argc, char *argv[])
{
    socket();
    bind();
    listen();
    fork();

    if( parent ){
        accept();
    }
    else if( child ){
        accept();
    }
    else{
        /*error*/
    }

    return 0;
}
This goes one step further than the architecture described at the beginning: we call accept() in the parent process as well, to see what happens.
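The full test program is not reproduced here, so the following is only a minimal reconstruction of that skeleton; the per-connection handling (read until the peer closes, then loop back to accept) is inferred from the log output shown later, and error handling is mostly omitted:

/* Minimal reconstruction of the test server: socket/bind/listen, then fork;
 * BOTH parent and child loop on accept() on the same listening descriptor. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

static void serve(int listenfd)
{
    char buf[1024];

    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0) {
            perror("accept");
            continue;
        }
        printf("server %d accept clientsock %d\n", getpid(), connfd);

        /* Drain the connection until the peer closes it. */
        while (recv(connfd, buf, sizeof(buf), 0) > 0)
            ;
        printf("server %d recv zero\n", getpid());

        close(connfd);
        printf("server %d now do accept!\n", getpid());
    }
}

int main(void)
{
    struct sockaddr_in addr;
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* the test bound the host's LAN address */
    addr.sin_port = htons(54321);               /* port used in the test */

    bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listenfd, 5);

    if (fork() < 0) {
        perror("fork");
        exit(1);
    }
    /* Parent and child both accept on the inherited listening socket. */
    serve(listenfd);
    return 0;
}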
First, start the server:
$ ps -ef | grep server
yyy       6182  3573  0 18:08 pts/3    00:00:00 ./tcp_server_tem
yyy       6183  6182  0 18:08 pts/3    00:00:00 ./tcp_server_tem
$ sudo netstat -antp | grep 54321
tcp        0      0 192.168.31.162:54321    0.0.0.0:*       LISTEN      6182/tcp_server_tem
According to ps, both the parent process (6182) and the child process (6183) have started normally, yet netstat shows only the parent (6182) listening on the specified port (54321).
If only the parent is listening on that port, how does the child manage to accept successfully?
We know that a socket is, at bottom, just a file descriptor, so let's dig into each process's descriptor table:
$ ls -l /proc/6182/fd
total 0
lrwx------ 1 yyy yyy 64 Feb 3 18:08 0 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 1 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 2 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 3 -> socket:[50899]
$ ls -l /proc/6183/fd
total 0
lrwx------ 1 yyy yyy 64 Feb 3 18:08 0 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 1 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 2 -> /dev/pts/3
lrwx------ 1 yyy yyy 64 Feb 3 18:08 3 -> socket:[50899]
$ cat /proc/net/tcp | grep 50899
   1: A21FA8C0:D431 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1001  0 50899 1 0000000000000000 100 0 0 10 -1
As we can see, even though netstat does not show it, both the parent and the child actually hold the listening socket (the child inherited it from the parent through the descriptor copy made at fork time), and both descriptors point to the same inode (50899). Through their respective descriptors, parent and child can therefore both operate on the very same sock object in the kernel.
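This can also be checked from inside the program: on Linux, fstat() on a socket descriptor reports the same inode number that /proc/<pid>/fd and /proc/net/tcp display. A small illustrative helper (not part of the original test program) could be called in both processes after fork():

/* Print the inode backing a socket descriptor; for the test above,
 * both parent and child would print the same number (50899). */
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

static void print_socket_inode(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0)
        printf("pid %d: fd %d -> socket inode %lu\n",
               getpid(), fd, (unsigned long)st.st_ino);
    else
        perror("fstat");
}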
Now let's start the client and see what happens (in the test, the client and the server run on the same physical machine):
server 6183 accept clientsock 4
server 6183 recv zero
server 6183 now do accept!
server 6182 accept clientsock 4
server 6182 recv zero
server 6182 now do accept!
server 6183 accept clientsock 4
server 6183 recv zero
server 6183 now do accept!
server 6182 accept clientsock 4
server 6182 recv zero
server 6182 now do accept!
Remarkably, both the parent and the child can obtain new connections through accept, and they even appear to take turns.
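The client used in the test is not shown either; a simple loop that connects, sends a little data and closes is enough to reproduce the alternation above. A hypothetical sketch (the address and port are taken from the netstat output):

/* Minimal test client: connect, send a short message, close, repeat. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(54321);
    inet_pton(AF_INET, "192.168.31.162", &addr.sin_addr);

    for (int i = 0; i < 4; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            close(fd);
            break;
        }
        send(fd, "hello", 5, 0);
        close(fd);            /* the server's recv() then returns 0 */
        sleep(1);
    }
    return 0;
}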
So how exactly is accept implemented? We have to dig into the protocol stack source code to find out.
The kernel source examined below is version 3.16.1.
In the TCP/IP stack, the accept system call is ultimately implemented by inet_csk_accept:
net/ipv4/inet_connection_sock.c

/*
 * This will accept the next outstanding connection.
 */
struct sock *inet_csk_accept(struct sock *sk, int flags, int *err)
{
        struct inet_connection_sock *icsk = inet_csk(sk);
        struct request_sock_queue *queue = &icsk->icsk_accept_queue;
        struct sock *newsk;
        struct request_sock *req;
        int error;

        lock_sock(sk);

        /* We need to make sure that this socket is listening,
         * and that it has something pending.
         */
        error = -EINVAL;
        if (sk->sk_state != TCP_LISTEN)
                goto out_err;

        /* Find already established connection */
        if (reqsk_queue_empty(queue)) {
                long timeo = sock_rcvtimeo(sk, flags & O_NONBLOCK);

                /* If this is a non blocking socket don't sleep */
                error = -EAGAIN;
                if (!timeo)
                        goto out_err;

                error = inet_csk_wait_for_connect(sk, timeo);
                if (error)
                        goto out_err;
        }
        req = reqsk_queue_remove(queue);
        newsk = req->sk;

        sk_acceptq_removed(sk);
        if (sk->sk_protocol == IPPROTO_TCP && queue->fastopenq != NULL) {
                spin_lock_bh(&queue->fastopenq->lock);
                if (tcp_rsk(req)->listener) {
                        /* We are still waiting for the final ACK from 3WHS
                         * so can't free req now. Instead, we set req->sk to
                         * NULL to signify that the child socket is taken
                         * so reqsk_fastopen_remove() will free the req
                         * when 3WHS finishes (or is aborted).
                         */
                        req->sk = NULL;
                        req = NULL;
                }
                spin_unlock_bh(&queue->fastopenq->lock);
        }
out:
        release_sock(sk);
        if (req)
                __reqsk_free(req);
        return newsk;
out_err:
        newsk = NULL;
        req = NULL;
        *err = error;
        goto out;
}
EXPORT_SYMBOL(inet_csk_accept);
If there is no connection ready to be accepted at the time of the call and the socket is in blocking mode, execution enters inet_csk_wait_for_connect:
/*
 * Wait for an incoming connection, avoid race conditions. This must be called
 * with the socket locked.
 */
static int inet_csk_wait_for_connect(struct sock *sk, long timeo)
{
        struct inet_connection_sock *icsk = inet_csk(sk);
        DEFINE_WAIT(wait);
        int err;

        /*
         * True wake-one mechanism for incoming connections: only
         * one process gets woken up, not the 'whole herd'.
         * Since we do not 'race & poll' for established sockets
         * anymore, the common case will execute the loop only once.
         *
         * Subtle issue: "add_wait_queue_exclusive()" will be added
         * after any current non-exclusive waiters, and we know that
         * it will always _stay_ after any new non-exclusive waiters
         * because all non-exclusive waiters are added at the
         * beginning of the wait-queue. As such, it's ok to "drop"
         * our exclusiveness temporarily when we get woken up without
         * having to remove and re-insert us on the wait queue.
         */
        for (;;) {
                prepare_to_wait_exclusive(sk_sleep(sk), &wait,
                                          TASK_INTERRUPTIBLE);
                release_sock(sk);
                if (reqsk_queue_empty(&icsk->icsk_accept_queue))
                        timeo = schedule_timeout(timeo);
                lock_sock(sk);
                err = 0;
                if (!reqsk_queue_empty(&icsk->icsk_accept_queue))
                        break;
                err = -EINVAL;
                if (sk->sk_state != TCP_LISTEN)
                        break;
                err = sock_intr_errno(timeo);
                if (signal_pending(current))
                        break;
                err = -EAGAIN;
                if (!timeo)
                        break;
        }
        finish_wait(sk_sleep(sk), &wait);
        return err;
}
A wait queue does the job here, and the comment spells it out clearly: a "wake-one" mechanism is used, so the "whole herd" situation, i.e. the thundering herd, does not occur.
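The practical consequence: fork as many workers as you like, all blocking in accept() on the same listening socket, and each incoming connection wakes exactly one of them. A quick self-contained sketch to convince yourself (the worker count and port are arbitrary choices for illustration):

/* Fork several workers that all block in accept() on one listening socket.
 * Thanks to the exclusive ("wake-one") wait queue, each incoming connection
 * wakes a single worker; the others keep sleeping. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in addr;
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(54321);        /* any free port will do */
    bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listenfd, 128);

    for (int i = 0; i < 4; i++) {
        if (fork() == 0) {               /* worker process */
            for (;;) {
                int connfd = accept(listenfd, NULL, NULL);
                printf("worker %d woke up, connfd %d\n", getpid(), connfd);
                close(connfd);
            }
        }
    }
    for (;;)
        pause();                         /* parent just waits */
}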
There is also another model: accept first and then fork, closing the accepted socket in the parent and the listening socket in the child. Its drawback is the performance cost of a fork system call per connection, though fortunately modern fork implementations use copy-on-write, so we won't go into it further here.
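For completeness, the accept-then-fork model just mentioned looks roughly like the sketch below; handle_client is a hypothetical per-connection handler, and SIGCHLD reaping is omitted:

#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical per-connection handler supplied by the application. */
void handle_client(int connfd);

/* accept-then-fork: the parent accepts, forks a child per connection,
 * then closes its copy of the connected socket; the child closes the
 * listening socket and serves the client. */
void accept_then_fork_loop(int listenfd)
{
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);
        if (connfd < 0)
            continue;

        pid_t pid = fork();
        if (pid == 0) {                  /* child */
            close(listenfd);
            handle_client(connfd);
            close(connfd);
            _exit(0);
        }
        close(connfd);                   /* parent keeps accepting */
        /* SIGCHLD handling / waitpid() omitted for brevity. */
    }
}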
The Chinese New Year holiday is just around the corner, so the next article will probably land in the Year of the Monkey. Happy New Year, everyone!