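This post documents the implementation of a simple proxy server built on C sockets (CS:APP Lab 10, the Proxy Lab).
The proxy supports keep-alive connections and saves its access records to a log file.
Github: https://github.com/He11oLiu/proxy
The post is divided into the following parts: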
- CS:APP's requirements for the server
- ... (HTTP/1.0)
- ... the HTTP protocol, and modifying the handler function so that it supports keep-alive
- ... optimizing readn and writen

The requirements quoted from the lab handout:
- ... inet_ntoa, gethostbyname, and gethostbyaddr inside a thread. In particular, the open_clientfd function in csapp.c is thread-unsafe because it calls gethostbyaddr, a Class-3 thread unsafe function (CSAPP 13.7.1). You will need to write a thread-safe version of open_clientfd, called open_clientfd_ts, that uses the lock-and-copy technique (CS:APP 13.7.1) when it calls gethostbyaddr.
- The Rio_readn, Rio_readlineb, and Rio_writen error checking wrappers in csapp.c are not appropriate for a realistic proxy because they terminate the process when they encounter an error. Instead, you should write new wrappers called Rio_readn_w, Rio_readlineb_w, and Rio_writen_w that simply return after printing a warning message when I/O fails. When either of the read wrappers detects an error, it should return 0, as though it encountered EOF on the socket.
- Reads and writes can fail for a variety of reasons. The most common read failure is an errno = ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an errno = EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur for example, when a user hits their browser's Stop button during a long transfer.
- ... SIGPIPE signal whose default action is to terminate the process. To keep your proxy from crashing you can use the SIG_IGN argument to the signal function (CS:APP 8.5.3) to explicitly ignore these SIGPIPE signals.

Implementing a Sequential Web Proxy
Prototype server framework
int main(int argc, char **argv){
    int lisenfd, port;
    unsigned int clientlen;
    clientinfo* client;

    /* Ignore SIGPIPE */
    Signal(SIGPIPE, SIG_IGN);

    if (argc != 2){
        fprintf(stderr, "usage:%s <port>\n", argv[0]);
        exit(1);
    }
    port = atoi(argv[1]);

    /* open log file */
    logfile = fopen("proxylog","w");

    lisenfd = Open_listenfd(port);
    clientlen = sizeof(struct sockaddr_in);

    while (1){
        /* Create a new memory area to pass arguments to doit */
        /* It will be freed by doit */
        client = (clientinfo*)Malloc(sizeof(clientinfo));
        client->socketfd = Accept(lisenfd, (SA *)&client->clientaddr, &clientlen);
        printf("Client %s connected\n",inet_ntoa(client->clientaddr.sin_addr));
        doit(client);
    }
    return 0;
}
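As a first version, I built an iterative server rather than a concurrent one. The framework of this kind of server is relatively simple. This part mainly tests understanding of the expected functionality, and it only handles the case where a single client is connected at a time.
The server framework can be simplified as follows, where doit() is the function that actually handles a client request.
init_server();
while(1){
    accept();
    doit();
}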
doit()
Handling the client's request. For a proxy, the processing steps are clear:
- ... HTTP request
- ... uri
- ... HTTP request
/* doit */
void doit(clientinfo *client){
    int serverfd;
    char buf[MAXLINE],method[MAXLINE],uri[MAXLINE],version[MAXLINE];
    char hostname[MAXLINE],pathname[MAXLINE];
    int port;
    char errorstr[MAXLINE];
    char logstring[MAXLINE];
    rio_t rio;
    ssize_t len = 0;
    int resplen = 0;

    /* init args */
    Rio_readinitb(&rio,client->socketfd);
    Rio_readlineb(&rio,buf,MAXLINE);
    sscanf(buf,"%s %s %s",method,uri,version);

    /* only GET is supported */
    if(strcmp(method,"GET")){
        fprintf(stderr, "error request\n");
        sprintf(errorstr,"%s Not Implement",method);
        clienterror(client->socketfd, method, "501","Not Implement", errorstr);
        Close(client->socketfd);
        free(client);   /* doit owns client, so free it on every exit path */
        return;
    }
    if(parse_uri(uri,hostname,pathname,&port)!=0){
        fprintf(stderr, "parse error\n");
        clienterror(client->socketfd, method, "400","uri error","URI error");
        Close(client->socketfd);
        free(client);
        return;
    }
#ifdef DEBUG
    printf("Finish parse %s %s %s %d\n",uri,hostname,pathname,port);
#endif

    /* connect to server */
    if((serverfd=open_clientfd(hostname,port))<0){
        printf("Cannot connect to server %s %d\n",hostname,port);
        clienterror(client->socketfd, method, "302","Server not found", "Server not found");
        Close(client->socketfd);
        free(client);
        return;
    }

    /* generate and push the request to server */
    if(pathname[0]=='\0') strcpy(pathname,"/");
    if(strcmp("HTTP/1.0",version)!=0) printf("Only support HTTP/1.0\n");
    sprintf(buf,"%s %s HTTP/1.0\r\n",method, pathname);
    Rio_writen(serverfd,buf,strlen(buf));
    sprintf(buf,"Host: %s\r\n",hostname);
    Rio_writen(serverfd,buf,strlen(buf));
    sprintf(buf,"\r\n");
    Rio_writen(serverfd,buf,strlen(buf));

    /* receive the response from server */
    Rio_readinitb(&rio, serverfd);
    /* the assignment is parenthesized so that len holds the byte count,
       not the result of the comparison */
    while((len = rio_readnb(&rio, buf, MAXLINE)) > 0){
        /* forward only the bytes actually read */
        Rio_writen(client->socketfd, buf, len);
        resplen += len;
    }

    format_log_entry(logstring, &client->clientaddr, uri, resplen);
    fprintf(logfile, "%s\n", logstring);
    close(client->socketfd);
    close(serverfd);
    /* free the clientinfo space */
    free(client);
}
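The forwarding loop at the end of doit depends on the assignment being parenthesized before the comparison, and on writing only as many bytes as were read. A minimal sketch of that pattern with plain read and write follows. The function name relay_until_eof is hypothetical and is not part of csapp.c or of the original proxy.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from `from` to `to`
 * until EOF. Returns the number of bytes forwarded, or -1 on error. */
ssize_t relay_until_eof(int from, int to)
{
    char buf[4096];
    ssize_t n, total = 0;

    /* Parentheses matter: assign the byte count first, then compare. */
    while ((n = read(from, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may be short, so loop until all n bytes are out. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(to, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

Like the HTTP/1.0 loop above, this only terminates when the peer closes its end, which is exactly why it stalls on persistent connections.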
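Here I ran into the second problem from the Q&A: HTTP/1.1 is not supported.
I tried configuring this proxy directly in the browser settings. Browsers generally send HTTP/1.1 requests, which also leaves the proxy stuck in read, so this case needs special handling.
However, because the headers sent by the browser ask for keep-alive, read cannot be relied on to return, so I gave up on this approach.
/* Or just copy the HTTP request from client */
Rio_writen_w(serverfd, buf, strlen(buf));
while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
    Rio_writen_w(serverfd, buf,len);
    if (!strcmp(buf, "\r\n")) /* End of request */
        break;
    memset(buf,0,MAXLINE);
}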
A small bug in parse_uri
hostend = strpbrk(hostbegin, " :/\r\n\0");
/* when no ':' show up in the end,hostend may be NULL */
if(hostend == NULL) hostend = hostbegin + strlen(hostbegin);
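To make the fix concrete: strpbrk returns NULL when none of the delimiter characters occurs, which is what happens for a bare URI such as http://www.baidu.com. The sketch below shows the same guard in a small standalone splitter. The function split_host_port is a hypothetical illustration, not the lab's parse_uri.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: split "http://host[:port][/path]" into host and port.
 * Returns 0 on success, -1 on malformed input or a too-small buffer. */
int split_host_port(const char *uri, char *host, size_t hostcap, int *port)
{
    static const char prefix[] = "http://";
    const char *hostbegin, *hostend;
    size_t len;

    if (strncasecmp(uri, prefix, strlen(prefix)) != 0)
        return -1;
    hostbegin = uri + strlen(prefix);

    /* No delimiter after the host means strpbrk returns NULL;
     * in that case the host runs to the end of the string. */
    hostend = strpbrk(hostbegin, " :/\r\n");
    if (hostend == NULL)
        hostend = hostbegin + strlen(hostbegin);

    len = (size_t)(hostend - hostbegin);
    if (len == 0 || len >= hostcap)
        return -1;
    memcpy(host, hostbegin, len);
    host[len] = '\0';

    /* Default HTTP port unless an explicit ":port" follows the host. */
    *port = 80;
    if (*hostend == ':')
        *port = atoi(hostend + 1);
    return 0;
}
```

The trailing \0 in the original delimiter string has no effect, since strpbrk already stops at the terminating NUL of its second argument.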
Setting the http proxy
Trying to connect
Dealing with multiple requests concurrently
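Supporting multiple threads is very simple. The slightly more complicated part is the mutex handling that follows.
First, write a new thread handler function.
void *thread_handler(void *arg){
    doit((clientinfo*)arg);
    return NULL;
}
Then replace the original call to doit with
Pthread_create(&thread, NULL, thread_handler, client);
Pthread_detach(thread);
The server framework now looks like this:
main(){
    init_server();
    while(1){
        accept();
        create_newThread(handler,arg);
    }
}
// per-thread handling
handler(arg){
    initThread();
    doit(arg);
}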
Because sem_init is marked __deprecated on macOS, unnamed in-memory semaphores cannot be used there. I therefore replaced sem_init with the named, file-based sem_open.
/* Mutex semaphores */
sem_t *mutex_host, *mutex_file;
/* sem_open reports failure with SEM_FAILED, not NULL */
if((mutex_host = sem_open("mutex_host",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}
if((mutex_file = sem_open("mutex_file",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}
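The handout points out that open_clientfd calls gethostbyname, so the mutex must be acquired before that call. This requires a new version of open_clientfd.
CSAPP already wraps the P and V primitives, so they can be called directly.
The relevant part of the original open_clientfd is shown below. Enabling P and V at the commented-out places ensures that only one thread at a time is inside the critical section (cs).
/* Fill in the server's IP address and port */
bzero((char *) &serveraddr, sizeof(serveraddr));
serveraddr.sin_family = AF_INET;
//P(mutex_host);
if ((hp = gethostbyname(hostname)) == NULL)
    return -2; /* check h_errno for cause of error; once P is enabled, V(mutex_host) is needed before this return */
bcopy((char *)hp->h_addr_list[0],
      (char *)&serveraddr.sin_addr.s_addr, hp->h_length);
serveraddr.sin_port = htons(port);
//V(mutex_host);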
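The lock-and-copy technique named in the handout can be illustrated on inet_ntoa, which the handout also lists as thread-unsafe and which needs no DNS lookup to try out. This is a sketch under assumptions: it uses a pthread mutex in place of the P and V semaphore wrappers, and the name inet_ntoa_ts is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t ntoa_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical thread-safe wrapper. inet_ntoa may return a pointer to
 * shared static storage, so hold the lock while calling it and copy the
 * result into the caller's buffer before unlocking.
 * Returns 0 on success, -1 if dst is too small. */
int inet_ntoa_ts(struct in_addr addr, char *dst, size_t dstcap)
{
    int rc = -1;

    pthread_mutex_lock(&ntoa_lock);
    const char *shared = inet_ntoa(addr);   /* result lives in shared storage */
    if (strlen(shared) < dstcap) {
        strcpy(dst, shared);                /* private copy for the caller */
        rc = 0;
    }
    pthread_mutex_unlock(&ntoa_lock);       /* unlock on every path */
    return rc;
}
```

The same shape applies to gethostbyname: lock, call, copy the fields that are needed out of the returned struct, unlock, and only then use the copy.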
The log file gets the same treatment:
format_log_entry(logstring, &client->clientaddr, uri, resplen);
P(mutex_file);
fprintf(logfile, "%s\n", logstring);
V(mutex_file);
So that the log file can be opened while the server is running, the file operations are changed as follows:
format_log_entry(logstring, &client->clientaddr, uri, resplen);
P(mutex_file);
logfile = fopen("proxy.log","a");
fprintf(logfile, "%s\n", logstring);
fclose(logfile);
V(mutex_file);
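As a standalone illustration of the open, append, close pattern under a lock, here is a minimal sketch. It assumes a pthread mutex rather than the P and V wrappers, adds the fopen failure check that the snippet above omits, and uses the hypothetical name append_log_line.

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: append one line to the log at `path`.
 * Opening in "a" mode per entry keeps the file closed between entries,
 * so it can be inspected while the server is running.
 * Returns 0 on success, -1 on failure. */
int append_log_line(const char *path, const char *line)
{
    int rc = -1;

    pthread_mutex_lock(&log_lock);
    FILE *fp = fopen(path, "a");
    if (fp != NULL) {
        if (fprintf(fp, "%s\n", line) >= 0)
            rc = 0;
        if (fclose(fp) != 0)    /* buffered data is flushed here */
            rc = -1;
    }
    pthread_mutex_unlock(&log_lock);
    return rc;
}
```

Reopening the file for every entry costs a few system calls per request, a trade-off that seems acceptable for a lab proxy.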
A global variable records the id of the current thread, and it is passed along through clientinfo.
/* thread id */
unsigned long tid = 0;
/* tid is unsigned long, so print it with %lu */
printf("Client %s connected tid = %lu\n",inet_ntoa(client->clientaddr.sin_addr),tid);
client->tid = tid ++;
Pthread_create(&thread, NULL, thread_handler, client);
Rio_xxx_w
Rio_writen and Rio_readnb call unix_error directly when they hit an error, which terminates the process. To keep the server running, they need to be changed to print the error and return.
void Rio_writen_w(int fd, void *usrbuf, size_t n){
    if (rio_writen(fd, usrbuf, n) != n)
        printf("Rio_writen_w error\n");
}

ssize_t Rio_readnb_w(rio_t *rp, void *usrbuf, size_t n){
    ssize_t rc;
    if ((rc = rio_readnb(rp, usrbuf, n)) < 0) {
        printf("Rio_readnb_w error\n");
        rc = 0;   /* report the error as EOF */
    }
    return rc;
}

ssize_t Rio_readlineb_w(rio_t *rp, void *usrbuf, size_t maxlen){
    ssize_t rc;
    if ((rc = rio_readlineb(rp, usrbuf, maxlen)) < 0) {
        printf("Rio_readlineb_w failed\n");
        return 0; /* report the error as EOF */
    }
    return rc;
}
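HTTP/1.1 and the image loading problem
Before solving this, I looked around on Github. Among the few repos I looked at, some sidestepped this part and simply parsed the request and sent an HTTP/1.0 request, as above. Others used readline indiscriminately, so images and similar files still got stuck in read: the complete data could only be read, and read would only return, after the remote server closed the connection, which made pages load extremely slowly.
The rest of this part starts from the HTTP protocol to find a proper way to solve the problem.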
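When the client's request carries
Connection: keep-alive
the server may return the response with Transfer-Encoding: chunked, so that the receiver can tell where the page data ends. This is one way a persistent connection delimits a response, and when chunked is used, content-length is not.
content-length
Sets the size of the response entity body, in bytes. In HTTP terms, this sets the value of the Content-Length response header.
- When the browser and the web server use a persistent (keep-alive) HTTP connection and the web server does not use the chunked transfer encoding, it must send a Content-Length header in every response to give the length of each entity body, so that the client can tell where the previous response ends.
- Without keep-alive, which is the usual short connection, the server simply closes the connection and no length is needed.
- For dynamic content, the server usually does not know the length in advance, so there is no content-length.
- For a static page, content-length is present.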
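The three cases above (explicit length, chunked, or read until the connection closes) can be captured by inspecting the header lines one at a time. The following is a minimal sketch with hypothetical names (body_framing, note_header_line), not code from the original proxy. It lets chunked take precedence when both headers appear, which matches the HTTP/1.1 rule for determining message body length.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

enum body_framing { BODY_UNTIL_CLOSE, BODY_CONTENT_LENGTH, BODY_CHUNKED };

/* Case-insensitive substring test (strcasestr is not portable). */
static int contains_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++)
        if (strncasecmp(hay, needle, n) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: look at one header line and update the framing
 * decision. *framing should start out as BODY_UNTIL_CLOSE. */
void note_header_line(const char *line, enum body_framing *framing,
                      long *content_length)
{
    static const char cl[] = "Content-Length:";
    static const char te[] = "Transfer-Encoding:";

    if (strncasecmp(line, cl, strlen(cl)) == 0) {
        if (*framing != BODY_CHUNKED) {     /* chunked wins over a length */
            *content_length = strtol(line + strlen(cl), NULL, 10);
            *framing = BODY_CONTENT_LENGTH;
        }
    } else if (strncasecmp(line, te, strlen(te)) == 0) {
        if (contains_nocase(line + strlen(te), "chunked"))
            *framing = BODY_CHUNKED;
    }
}
```

Note that the prefix comparisons use strlen, not sizeof: sizeof on a string literal counts the terminating NUL, which would make the comparison one character too long.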
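Therefore, the data returned by the server cannot simply be read blindly: the header has to be parsed. The server's response is processed in the following steps:
- ... \r\n marks the end of the header.
- A Content-Length: entry is the case where the length is given explicitly, and the length has to be recorded.
- A Transfer-Encoding: chunked entry is the chunked encoding case, which is handled later with readline.
- ... body
- With chunked encoding, read directly with readline. Reading 0\r\n means the current body has ended, so exit the loop.
- With a content-length attribute, use read_size = MAXLINE > content_length ? content_length : MAXLINE to compute how many bytes to read each time, then call readnb to read exactly that many bytes. Once the specified number of bytes has been read, the body has ended, so exit the loop.
This solves the problem caused by keep-alive.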
/* Receive response from target server and forward to client */
Rio_readinitb(&rio, serverfd);
/* Read head */
while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
    /* Fix bug of return value when response line exceeds MAXLINE */
    if (len == MAXLINE && buf[MAXLINE - 2] != '\n') --len;
    /* when found "\r\n" means head ends */
    if (!strcmp(buf, "\r\n")){
        Rio_writen_w(client->socketfd, buf, len);
        break;
    }
    if (!strncasecmp(buf, "Content-Length:", 15)) {
        sscanf(buf + 15, "%u", &content_length);
        chunked = False;
    }
    /* compare strlen bytes: sizeof would include the terminating NUL and never match */
    if (!strncasecmp(buf, "Transfer-Encoding:", strlen("Transfer-Encoding:"))) {
        if(strstr(buf,"chunked")!=NULL || strstr(buf,"Chunked")!=NULL)
            chunked = True;
    }
    /* Send the response line to client and count the total len */
    Rio_writen_w(client->socketfd, buf, len);
    recv_len += len;
}
/* Read body */
if(chunked){
    /* Transfer-Encoding: chunked */
    while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
        /* Fix bug of return value when response line exceeds MAXLINE */
        if (len == MAXLINE && buf[MAXLINE - 2] != '\n') --len;
        /* Send the response line to client and count the total len */
        Rio_writen_w(client->socketfd, buf, len);
        recv_len += len;
        /* End of response */
        if (!strcmp(buf, "0\r\n")) {
            /* the "0\r\n" line was already forwarded above; send the final CRLF that closes the chunked body */
            Rio_writen_w(client->socketfd, "\r\n", 2);
            recv_len += 2;
            break;
        }
    }
}
else{
    /* Content-Length: read exactly content_length bytes, at most MAXLINE at a time */
    read_size = MAXLINE > content_length?content_length:MAXLINE;
    while((len = Rio_readnb_w(&rio,buf,read_size))!=0){
        content_length -= len;
        recv_len += len;
        Rio_writen_w(client->socketfd, buf, len);
        if(content_length == 0) break;
        read_size = MAXLINE > content_length?content_length:MAXLINE;
    }
}
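The chunked branch above stops when a line equals 0\r\n. That can misfire if binary chunk data happens to contain such a line. In the chunked format every chunk starts with its size in hexadecimal, so a stricter proxy could parse that size line and then read exactly that many bytes plus the trailing CRLF with readnb. A minimal sketch of the size parsing only, with the hypothetical name parse_chunk_size:

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parse the size line that starts each chunk,
 * e.g. "1a\r\n" or "0\r\n". Returns the chunk size in bytes, or -1 if
 * the line does not begin with a hexadecimal digit. Whatever follows
 * the number (chunk extensions, CRLF) is ignored. */
long parse_chunk_size(const char *line)
{
    if (!isxdigit((unsigned char)line[0]))
        return -1;
    return strtol(line, NULL, 16);
}
```

A size of 0 marks the last chunk. The caller would then forward the remaining lines up to and including the empty line that ends the body.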
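Of course, this is not keep-alive in the true sense. To actually keep the connection open and save a few TCP connection setups, the handler needs a loop that goes back to the step above and reads the next request from the client.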
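Such a loop needs a rule for when to stop. One reasonable rule follows the HTTP defaults: an HTTP/1.1 connection stays open unless the request says Connection: close, while an HTTP/1.0 connection closes unless the request says Connection: keep-alive. A sketch of that decision, using the hypothetical function name wants_keep_alive:

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helper: decide whether to keep the client connection open
 * after one response. `version` is the version token from the request
 * line, e.g. "HTTP/1.1"; `connection` is the value of the Connection
 * header, or NULL when the header is absent. Returns 1 or 0. */
int wants_keep_alive(const char *version, const char *connection)
{
    if (connection != NULL) {
        if (strcasecmp(connection, "close") == 0)
            return 0;
        if (strcasecmp(connection, "keep-alive") == 0)
            return 1;
    }
    /* No explicit preference: HTTP/1.1 defaults to persistent,
     * earlier versions default to closing. */
    return strcmp(version, "HTTP/1.1") == 0;
}
```

The per-client loop would then repeat "read request, forward, relay response" while this returns 1 and neither side has closed the connection.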
Back to the writen and readn functions once more. If the user clicks through to the next page before the current one has finished loading, the current page is closed and writen fails.
Reads and writes can fail for a variety of reasons. The most common read failure is an errno = ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an errno = EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur for example, when a user hits their browser's Stop button during a long transfer.
First, handle this error case separately:
int Rio_writen_w(int fd, void *usrbuf, size_t n){
    if (rio_writen(fd, usrbuf, n) != n){
        printf("Rio_writen_w error\n");
        if(errno == EPIPE)
            /* client have closed this connection */
            return CLIENT_CLOSED;
        return UNKOWN_ERROR;
    }
    return NO_ERROR;
}
Then replace every writen_w call with
if(Rio_writen_w(client->socketfd, buf, len)==CLIENT_CLOSED){
    clienton = False;
    break;
}
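The EPIPE case can be reproduced without a browser: with SIGPIPE ignored, writing to a pipe or socket whose other end is closed fails with errno set to EPIPE instead of killing the process. The sketch below classifies the outcome the same way as the wrapper above. The enum and function names are hypothetical and deliberately differ from the proxy's own constants.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

enum write_status { WS_OK, WS_PEER_CLOSED, WS_ERROR };

/* Hypothetical helper: write n bytes to fd and classify failures.
 * Ignoring SIGPIPE is normally done once at startup; it is repeated
 * here only to keep the sketch self-contained. */
enum write_status write_all_status(int fd, const char *buf, size_t n)
{
    size_t off = 0;

    signal(SIGPIPE, SIG_IGN);
    while (off < n) {
        ssize_t w = write(fd, buf + off, n - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry */
            return errno == EPIPE ? WS_PEER_CLOSED : WS_ERROR;
        }
        off += (size_t)w;
    }
    return WS_OK;
}
```

Calling it on the write end of a pipe whose read end has been closed yields WS_PEER_CLOSED, which corresponds to CLIENT_CLOSED in the wrapper above.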
When clienton is false, the rest of the work can be skipped and execution goes straight to the log.
Likewise, change the read wrappers to
ssize_t Rio_readnb_w(rio_t *rp, void *usrbuf, size_t n,bool *serverstat){
    ssize_t rc;
    if ((rc = rio_readnb(rp, usrbuf, n)) < 0) {
        printf("Rio_readnb_w error\n");
        rc = 0;
        if(errno == ECONNRESET) *serverstat = False;
    }
    return rc;
}

ssize_t Rio_readlineb_w(rio_t *rp, void *usrbuf, size_t maxlen,bool *serverstat){
    ssize_t rc;
    if ((rc = rio_readlineb(rp, usrbuf, maxlen)) < 0) {
        printf("Rio_readlineb_w failed\n");
        rc = 0;
        if(errno == ECONNRESET) *serverstat = False;
    }
    return rc;
}
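Change the readline that reads from the client to
Rio_readlineb_w(&rio, buf, MAXLINE,&clienton)
Change the readline that reads from the server to
Rio_readlineb_w(&rio, buf, MAXLINE,&serveron)
Also add some checks on the server and client state to avoid wasting resources.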
Why is an fd always used to describe a socket?
From the point of view of a unix program, a socket is an open file with a corresponding descriptor.
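A small way to see this for yourself: a connected socket pair can be driven with the same read, write, and close calls that work on regular files. The sketch below uses socketpair so that it needs no network. The function name echo_through_socket is hypothetical.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send n bytes through a connected socket pair using
 * plain write()/read(). Returns the number of bytes received, or -1. */
ssize_t echo_through_socket(const char *msg, size_t n, char *out, size_t outcap)
{
    int sv[2];
    ssize_t got = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write(sv[0], msg, n) == (ssize_t)n)     /* same call as for a file */
        got = read(sv[1], out, outcap);
    close(sv[0]);
    close(sv[1]);
    return got;
}
```

This is why the Rio functions, written against file descriptors, work unchanged on the proxy's client and server sockets.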
Why, with HTTP/1.1, does it take interrupts and a long wait before the data can be read?
Client 127.0.0.1 connected
error request
Client 127.0.0.1 connected
Finish parse http://www.baidu.com www.baidu.com 80
Interrupted and Rebegin
Interrupted and Rebegin
Interrupted and Rebegin
Interrupted and Rebegin
while (nleft > 0) {
    // execution gets stuck at this step????
    if ((nread = read(fd, bufp, nleft)) < 0) {
        if (errno == EINTR) /* interrupted by sig handler return */
            nread = 0;      /* and call read() again */
        else
            return -1;      /* errno set by read() */
    }
    else if (nread == 0)
        break;              /* EOF */
    nleft -= nread;
    bufp += nread;
}
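Observation: with HTTP/1.1, execution does not get out of the read function.
My guess is that 1.1 uses persistent connections, so no EOF arrives and the code has to decide for itself when to leave the while loop.
Solved, see Part3.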
A mutex that does not live in memory still holds its previous value when it is opened
Call sem_unlink first to remove the old name.
sem_unlink("mutex_host");
sem_unlink("mutex_file");
/* sem_open reports failure with SEM_FAILED, not NULL */
if((mutex_host = sem_open("mutex_host",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}
if((mutex_file = sem_open("mutex_file",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex");
}