Taking the redis server as an example
When the redis server starts, it calls bind() on file descriptor fd6 to bind port 6379, calls listen() to listen on the port, and then waits for connections via accept().
root@pmghong-VirtualBox:/usr/local/redis/bin# strace -ff -o /data/redis_strace/redis ./redis-server

root@pmghong-VirtualBox:/proc/22330/fd# ls /data/redis_strace/ -l
total 48
-rw-r--r-- 1 root root 34741 3月 14 10:37 redis.25102
-rw-r--r-- 1 root root   134 3月 14 10:37 redis.25105
-rw-r--r-- 1 root root   134 3月 14 10:37 redis.25106
-rw-r--r-- 1 root root   134 3月 14 10:37 redis.25107

root@pmghong-VirtualBox:/proc/22330/fd# vi /data/redis_strace/redis.25102
... ...
epoll_create(1024) = 5
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 6
setsockopt(6, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(6, {sa_family=AF_INET6, sin6_port=htons(6379), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
listen(6, 511) = 0
fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
... ...

root@pmghong-VirtualBox:/proc/25102/fd# ll
total 0
dr-x------ 2 root root 0 3月 14 12:05 ./
dr-xr-xr-x 9 root root 0 3月 14 10:37 ../
lrwx------ 1 root root 64 3月 14 12:28 0 -> /dev/pts/0
lrwx------ 1 root root 64 3月 14 12:28 1 -> /dev/pts/0
lrwx------ 1 root root 64 3月 14 12:05 2 -> /dev/pts/0
lr-x------ 1 root root 64 3月 14 12:28 3 -> pipe:[104062]
l-wx------ 1 root root 64 3月 14 12:28 4 -> pipe:[104062]
lrwx------ 1 root root 64 3月 14 12:28 5 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 3月 14 12:28 6 -> socket:[104063]
lrwx------ 1 root root 64 3月 14 12:28 7 -> socket:[104064]
lrwx------ 1 root root 64 3月 14 12:28 8 -> socket:[256344]
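The socket/bind/listen/fcntl sequence in the strace output above can be sketched with Python's wrappers around the same system calls (a minimal sketch, not Redis's actual code; port 0 is used so it can run anywhere without clashing with a live redis on 6379):

```python
import socket

# Same sequence strace shows for fd6: create a TCP socket, set
# SO_REUSEADDR, bind, listen with backlog 511, switch to non-blocking.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))   # redis binds 6379; 0 lets the OS pick a free port
srv.listen(511)              # mirrors listen(6, 511) = 0
srv.setblocking(False)       # mirrors fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK)

port = srv.getsockname()[1]  # the port the OS assigned
srv.close()
```

After this sequence the process holds a listening fd exactly like fd6 in the /proc listing, ready to be handed to accept() or an event loop.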
Stage 1: BIO (blocking I/O)
After startup, the redis server listens on the kernel socket through file descriptor fd6.
Client1 and Client2 access redis through fd7 and fd8 respectively.
In the BIO scenario, the redis server calls read() and enters a blocked state: it waits until a request arrives on fd7 and is fully handled before it can process any other request.
The drawback of this model is obvious: blocking I/O leads to low efficiency.
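Blocking read() behavior can be demonstrated with a pipe standing in for the client connection (a minimal sketch; the background-writer thread simulates a client that is slow to send):

```python
import os
import threading
import time

# BIO sketch: read() on an empty pipe blocks the caller until data
# arrives, just like the redis server blocking on read(fd7).
r, w = os.pipe()

def late_writer():
    time.sleep(0.2)          # a "client" that takes 200ms to send
    os.write(w, b"PING")

threading.Thread(target=late_writer).start()

start = time.monotonic()
data = os.read(r, 4)         # blocks here ~0.2s; nothing else can run
elapsed = time.monotonic() - start
os.close(r)
os.close(w)
```

While the main thread sits inside os.read(), it cannot serve any other fd; that stall is exactly the BIO inefficiency described above.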
Stage 2: NIO (non-blocking I/O)
The difference from BIO is that when read(fd7) is called and no request data is available, an error is returned to the redis server immediately.
On receiving this type of error, the redis server knows the current connection has no incoming data and can move on to the next request, improving processing efficiency.
bind(6, {sa_family=AF_INET6, sin6_port=htons(6379), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
listen(6, 511) = 0
fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
The problem with this model is that the server must poll with read(fdx) system calls on a timer. When many clients connect, this means frequent switches between kernel mode and user mode, and the context-switch overhead is heavy.
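The immediate-error behavior of a non-blocking fd can be shown with a pipe (a minimal sketch; on Linux the error is EAGAIN, which Python surfaces as BlockingIOError):

```python
import errno
import os

r, w = os.pipe()
os.set_blocking(r, False)    # like fcntl(fd, F_SETFL, O_NONBLOCK)

# NIO sketch: with no data queued, read() fails at once with EAGAIN
# instead of blocking, so the server can move on to the next fd.
try:
    os.read(r, 4)
    err = None
except BlockingIOError as e:
    err = e.errno            # EAGAIN (== EWOULDBLOCK on Linux)

os.write(w, b"PING")
data = os.read(r, 4)         # once data is queued, read succeeds immediately
os.close(r)
os.close(w)
```

The cost is that the server only learns "no data yet" by issuing the syscall, which is why polling many idle fds burns so many mode switches.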
Stage 3: select (synchronous non-blocking)
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

select() and pselect() allow a program to monitor multiple file descriptors, waiting until one or more of the file descriptors become "ready" for some class of I/O operation (e.g., input possible). A file descriptor is considered ready if it is possible to perform a corresponding I/O operation (e.g., read(2) without blocking, or a sufficiently small write(2)).
The goal is to monitor multiple fds at the same time, and only call read() and other system calls to handle business logic once one or more fds become ready, instead of polling x separate read() calls as in the NIO scenario above.
select only solves the event-notification problem (i.e., which connections are readable and which are not). In kernel mode it still has to traverse all x fds to see which clients have I/O, then hand the result set back to the server side, after which the server issues read() system calls on the indicated fds.
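Python's select.select wraps this same syscall, which makes the "monitor many, read only the ready ones" idea easy to demonstrate (a minimal sketch with two pipes standing in for two client connections):

```python
import os
import select

# select sketch: watch several fds at once and only read the ones
# reported ready, instead of polling read() on each fd in turn.
r1, w1 = os.pipe()           # "client 1" -- sends nothing
r2, w2 = os.pipe()           # "client 2" -- sends data

os.write(w2, b"PING")        # only the second client has sent anything

# timeout 0: return immediately with whatever is ready right now
readable, _, _ = select.select([r1, r2], [], [], 0)
data = os.read(readable[0], 4)   # read only the ready fd

for fd in (r1, w1, r2, w2):
    os.close(fd)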
Stage 4: epoll (I/O multiplexing)
The epoll mechanism consists of three system calls: epoll_create / epoll_ctl / epoll_wait.
// epoll_create
// Description
// epoll_create() creates an epoll(7) instance.
// Signature
int epoll_create(int size);
// Return value
// On success, these system calls return a nonnegative file descriptor.
// On error, -1 is returned, and errno is set to indicate the error.

// epoll_ctl
// Description
// This system call performs control operations on the epoll(7) instance
// referred to by the file descriptor epfd. It requests that the operation
// op be performed for the target file descriptor, fd.
// Signature
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
// op types: EPOLL_CTL_ADD / EPOLL_CTL_MOD / EPOLL_CTL_DEL
// Return value
// When successful, epoll_ctl() returns zero. When an error occurs,
// epoll_ctl() returns -1 and errno is set appropriately.

// epoll_wait
// Description
// The epoll_wait() system call waits for events on the epoll(7) instance
// referred to by the file descriptor epfd. The memory area pointed to by
// events will contain the events that will be available for the caller.
// Up to maxevents are returned by epoll_wait(). The maxevents argument
// must be greater than zero.
// Signature
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
// Return value
// When successful, epoll_wait() returns the number of file descriptors
// ready for the requested I/O, or zero if no file descriptor became ready
// during the requested timeout milliseconds. When an error occurs,
// epoll_wait() returns -1 and errno is set appropriately.
epoll_create(1024) = 5
... ...
bind(6, {sa_family=AF_INET6, sin6_port=htons(6379), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
listen(6, 511) = 0
... ...
bind(7, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(7, 511) = 0
... ...
epoll_ctl(5, EPOLL_CTL_ADD, 6, {EPOLLIN, {u32=6, u64=6}}) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 7, {EPOLLIN, {u32=7, u64=7}}) = 0
epoll_ctl(5, EPOLL_CTL_ADD, 3, {EPOLLIN, {u32=3, u64=3}}) = 0
write(...)
read(...)
epoll_wait(5, [], 10128, 0) = 0
1. On startup, the process calls epoll_create() to create an epoll instance; on success it returns a non-negative fd (call it fdn), on failure -1 with an error code.
2. It then calls epoll_ctl(fdn, op, fd6, event type <accept>), where fdn is the fd returned by epoll_create() in the previous step, to register interest in connection events on fd6.
3. It calls epoll_wait() to wait on kernel events; on success, the call returns the ready fds. For example, when c1 requests the redis server, the connection must first be established through fd6; the listener registered on fd6 via epoll_ctl() picks up this request, and epoll_wait() reports fd6 as ready.
4. The redis server obtains the fds needing I/O from epoll_wait(), finds that c1 is requesting a connection through fd6, accepts it and allocates fd7 for it, then registers a listener via epoll_ctl(), e.g. epoll_ctl(fdn, op, fd7, <read>).
This event-notification approach eliminates select's drawback of looping over every fd in kernel mode: the upper-layer program is notified only when an I/O interrupt event actually occurs, which improves efficiency.
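Steps 1-4 above map directly onto Python's select.epoll wrapper, which is Linux-only like the underlying syscalls (a minimal sketch; a pipe stands in for the listening socket fd6):

```python
import os
import select

# epoll sketch following steps 1-4: create an instance (epoll_create),
# register interest (epoll_ctl EPOLL_CTL_ADD), then wait (epoll_wait).
ep = select.epoll()                 # step 1: epoll_create()
r, w = os.pipe()                    # stands in for the listening fd6
ep.register(r, select.EPOLLIN)      # step 2: epoll_ctl(fdn, EPOLL_CTL_ADD, r, EPOLLIN)

events = ep.poll(0)                 # step 3: epoll_wait(); nothing ready yet -> []
os.write(w, b"PING")                # a "client" event arrives on the fd
events_after = ep.poll(1.0)         # now the fd is reported ready: [(r, EPOLLIN)]
data = os.read(r, 4)                # step 4: do I/O only on the reported fd

ep.unregister(r)                    # epoll_ctl(..., EPOLL_CTL_DEL, ...)
ep.close()
os.close(r)
os.close(w)
```

Note that the registered fd set lives inside the kernel across calls, so each epoll_wait() only pays for the fds that actually have events, not for the full list as select does.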