Network Programming: A Simple Proxy Server in C (proxylab)

Date: 2019-11-11


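This post documents the implementation of a simple proxy server built on C sockets (CS:APP lab 10, proxy lab).

The proxy server supports keep-alive connections and saves access records to a log file.

Github: https://github.com/He11oLiu/proxy

The post is divided into the following parts:

HINT: CS:APP's requirements for the server
Part1: Iterative server implementation & simple request handling (forcing HTTP/1.0)
Part2: Concurrent server & mutexes
Part3: A closer look at the HTTP protocol; modifying the handler to support keep-alive
Part4: Optimizing readn and writen
Q&A: Problems encountered and how they were solved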

HINT

[x] Be careful about memory leaks. When the processing for an HTTP request fails for any reason, the thread must close all open socket descriptors and free all memory resources before terminating.
[x] You will find it very useful to assign each thread a small unique integer ID (such as the current request number) and then pass this ID as one of the arguments to the thread routine. If you display this ID in each of your debugging output statements, then you can accurately track the activity of each thread.
[x] To avoid a potentially fatal memory leak, your threads should run as detached, not joinable (CS:APP 13.3.6).
[x] Since the log file is being written to by multiple threads, you must protect it with mutual exclusion semaphores whenever you write to it (CS:APP 13.5.2 and 13.5.3).
[x] Be very careful about calling thread-unsafe functions such as inet_ntoa, gethostbyname, and gethostbyaddr inside a thread. In particular, the open_clientfd function in csapp.c is thread-unsafe because it calls gethostbyaddr, a Class-3 thread unsafe function (CS:APP 13.7.1). You will need to write a thread-safe version of open_clientfd, called open_clientfd_ts, that uses the lock-and-copy technique (CS:APP 13.7.1) when it calls gethostbyaddr.
[x] Use the RIO (Robust I/O) package (CS:APP 11.4) for all I/O on sockets. Do not use standard I/O on sockets. You will quickly run into problems if you do. However, standard I/O calls such as fopen and fwrite are fine for I/O on the log file.
[x] The Rio_readn, Rio_readlineb, and Rio_writen error checking wrappers in csapp.c are not appropriate for a realistic proxy because they terminate the process when they encounter an error. Instead, you should write new wrappers called Rio_readn_w, Rio_readlineb_w, and Rio_writen_w that simply return after printing a warning message when I/O fails. When either of the read wrappers detects an error, it should return 0, as though it encountered EOF on the socket.
[x] Reads and writes can fail for a variety of reasons. The most common read failure is an errno = ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an errno = EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur, for example, when a user hits their browser's Stop button during a long transfer.
[x] Writing to a connection that has been closed by the peer the first time elicits an error with errno set to EPIPE. Writing to such a connection a second time elicits a SIGPIPE signal whose default action is to terminate the process. To keep your proxy from crashing you can use the SIG_IGN argument to the signal function (CS:APP 8.5.3) to explicitly ignore these SIGPIPE signals.

Part 1

Implementing a Sequential Web Proxy

A first prototype of the `proxy lab`

Server skeleton

int main(int argc, char **argv){
    int lisenfd, port;
    unsigned int clientlen;
    clientinfo* client;


    /* Ignore SIGPIPE */
    Signal(SIGPIPE, SIG_IGN);

    if (argc != 2){
        fprintf(stderr, "usage:%s <port>\n", argv[0]);
        exit(1);
    }
    port = atoi(argv[1]);

    /* open log file */
    logfile = fopen("proxylog","w");

    lisenfd = Open_listenfd(port);
    clientlen = sizeof(struct sockaddr_in);

    while (1){
        /* Allocate a fresh clientinfo to pass the arguments to doit */
        /* doit is responsible for freeing it */
        client = (clientinfo*)Malloc(sizeof(clientinfo));
        client->socketfd = Accept(lisenfd, (SA *)&client->clientaddr, &clientlen);
        printf("Client %s connected\n",inet_ntoa(client->clientaddr.sin_addr));
        doit(client);


    }
    return 0;
}

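As a first version, I built an iterative server rather than a concurrent one. The skeleton of such a server is relatively simple. This part mainly checks that the proxy's basic job is understood, and it only handles one client at a time.

The server skeleton can be reduced to the following, where doit() is the function that actually handles a client request.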

init_server();
while(1){
    accept();
    doit();
}

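`doit()`: handling the client request

The steps the proxy has to perform are clear:

Read the HTTP request sent by the client
Extract the uri from it
Connect to the server and resend the HTTP request
Read the server's response and write it back to the client
Record the access in the log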

/* doit: handle one client request */
void doit(clientinfo *client){
    int serverfd;
    char buf[MAXLINE],method[MAXLINE],uri[MAXLINE],version[MAXLINE];
    char hostname[MAXLINE],pathname[MAXLINE];
    int port;
    char errorstr[MAXLINE];
    char logstring[MAXLINE];
    rio_t rio;
    ssize_t len = 0;
    int resplen = 0;

    /* init args */
    Rio_readinitb(&rio,client->socketfd);
    Rio_readlineb(&rio,buf,MAXLINE);


    sscanf(buf,"%s %s %s",method,uri,version);

    /* doit owns client, so every early return below also frees it */
    if(strcmp(method,"GET")){
        fprintf(stderr, "error request\n");
        sprintf(errorstr,"%s Not Implemented",method);
        clienterror(client->socketfd, method, "501","Not Implemented", errorstr);
        Close(client->socketfd);
        free(client);
        return;
    }
    if(parse_uri(uri,hostname,pathname,&port)!=0){
        fprintf(stderr, "parse error\n");
        clienterror(client->socketfd, method, "400","uri error","URI error");
        Close(client->socketfd);
        free(client);
        return;
    }
#ifdef DEBUG
    printf("Finish parse %s %s %s %d\n",uri,hostname,pathname,port);
#endif

    /* connect to server */

    if((serverfd=open_clientfd(hostname,port))<0){
        printf("Cannot connect to server %s %d\n",hostname,port);
        /* 502 Bad Gateway: the proxy could not reach the upstream server */
        clienterror(client->socketfd, method, "502","Bad Gateway", "Server not found");
        Close(client->socketfd);
        free(client);
        return;
    }

    /* generate and push the request to server */
    if(pathname[0]=='\0') strcpy(pathname,"/");
    if(strcmp("HTTP/1.0",version)!=0) printf("Only support HTTP/1.0");
    sprintf(buf,"%s %s HTTP/1.0\r\n",method, pathname);
    Rio_writen(serverfd,buf,strlen(buf));
    sprintf(buf,"Host: %s\r\n",hostname);
    Rio_writen(serverfd,buf,strlen(buf));
    sprintf(buf,"\r\n");
    Rio_writen(serverfd,buf,strlen(buf));

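    /* receive the response from server */
    Rio_readinitb(&rio, serverfd);
    /* Parenthesize the assignment so len holds the byte count, not the result of the comparison */
    while((len = rio_readnb(&rio, buf, MAXLINE)) > 0){
        /* Forward only the bytes that were actually read */
        Rio_writen(client->socketfd, buf, len);
        resplen += len;
    }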

    format_log_entry(logstring, &client->clientaddr, uri, resplen);
    fprintf(logfile, "%s\n", logstring);
    close(client->socketfd);
    close(serverfd);
    /* free the clientinfo space */
    free(client);
}

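At this point I ran into the second problem listed in the Q&A: HTTP/1.1 is not supported.

I tried pointing the proxy settings directly at this proxy.

Browsers normally send HTTP/1.1 requests, which also leaves the proxy stuck in read, so this case needs special handling.

Another attempt

However, the headers sent by the browser ask for keep-alive, so the read never sees EOF and this approach does not work either. I gave it up.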

/* Or just copy the HTTP request from client */
Rio_writen_w(serverfd, buf, strlen(buf));
while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
    Rio_writen_w(serverfd, buf,len);
    if (!strcmp(buf, "\r\n")) /* End of request */
        break;
    memset(buf,0,MAXLINE);
}

A small bug in `parse_uri`

hostend = strpbrk(hostbegin, " :/\r\n\0");
/* when no ':' show up in the end,hostend may be NULL */
if(hostend == NULL) hostend = hostbegin + strlen(hostbegin);
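
To show where that NULL check sits, here is a minimal sketch of a URI splitter. It is not the author's parse_uri: the name parse_uri_sketch is hypothetical, and keeping the leading '/' in the path is an assumption made so that the result can be pasted straight into a request line.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper, not the author's parse_uri: splits
 * "http://host[:port][/path]" into its parts.
 * hostname and pathname must each hold at least strlen(uri) + 1 bytes.
 * Returns 0 on success, -1 if the URI does not start with "http://". */
int parse_uri_sketch(const char *uri, char *hostname, char *pathname, int *port)
{
    const char *hostbegin, *hostend, *pathbegin;
    size_t len;

    if (strncasecmp(uri, "http://", 7) != 0) {
        hostname[0] = '\0';
        return -1;
    }

    hostbegin = uri + 7;
    hostend = strpbrk(hostbegin, " :/\r\n");
    /* No delimiter after the host: the host runs to the end of the string */
    if (hostend == NULL)
        hostend = hostbegin + strlen(hostbegin);

    len = (size_t)(hostend - hostbegin);
    memcpy(hostname, hostbegin, len);
    hostname[len] = '\0';

    *port = 80;                      /* default HTTP port */
    if (*hostend == ':')
        *port = atoi(hostend + 1);

    pathbegin = strchr(hostend, '/');
    if (pathbegin == NULL)
        pathname[0] = '\0';          /* the caller substitutes "/" */
    else
        strcpy(pathname, pathbegin); /* keeps the leading '/' */
    return 0;
}
```

With a bare URI such as http://www.baidu.com, strpbrk finds no delimiter, so without the check the pointer subtraction would be done on NULL.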

Testing the simple proxy

Set up the HTTP proxy

Try connecting

Part 2

Dealing with multiple requests concurrently

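Multithreading

Adding threads is very simple. The slightly more involved part is the mutex handling that comes afterwards.

First, write a new thread routine.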

void *thread_handler(void *arg){
    doit((clientinfo*)arg);
    return NULL;
}

Then replace the original call to doit with:

Pthread_create(&thread, NULL, thread_handler, client);
Pthread_detach(thread);

The server skeleton now looks like this:

main(){
  init_server();
  while(1){
      accept();
      create_newThread(handler,arg);
  }
}
// per-thread processing
handler(arg){
    initThread();
    doit(arg);
}

Mutexes

Because sem_init is marked __deprecated on macOS, in-memory (unnamed) semaphores are not usable there. The named-semaphore call sem_open is used in place of sem_init.

/* Mutex semaphores */
sem_t *mutex_host, *mutex_file;

/* sem_open reports failure with SEM_FAILED, not NULL */
if((mutex_host = sem_open("mutex_host",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex\n");
}
if((mutex_file = sem_open("mutex_file",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
    fprintf(stderr,"cannot create mutex\n");
}

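The handout points out that open_clientfd calls a thread-unsafe host lookup function (gethostbyname in the code below), so the mutex has to be acquired before that call. This calls for a new version of open_clientfd.

csapp.c already wraps the P and V primitives, so they can be called directly.

The original open_clientfd is implemented as follows. Adding the P and V calls at the commented-out lines ensures that only one thread is inside the critical section at a time. Note that the early return on a failed lookup would also need a V(mutex_host) before it, otherwise the mutex is never released.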

/* Fill in the server's IP address and port */
    bzero((char *) &serveraddr, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    //P(mutex_host);
    if ((hp = gethostbyname(hostname)) == NULL)
        return -2; /* check h_errno for cause of error */
    bcopy((char *)hp->h_addr_list[0],
          (char *)&serveraddr.sin_addr.s_addr, hp->h_length);
    serveraddr.sin_port = htons(port);
    //V(mutex_host);
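
Lock-and-copy can be sketched as follows. This is a minimal illustration rather than the author's open_clientfd_ts: the helper name resolve_ipv4_ts and the lock host_lock are hypothetical, and a pthread mutex stands in for the sem_t and the P/V wrappers used in this post. The lock is held only while the static result of gethostbyname is copied out, and it is released on the failure path too.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical lock guarding every call to gethostbyname */
static pthread_mutex_t host_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: resolve hostname to an IPv4 address.
 * Returns 0 and fills *out on success, -1 on failure. */
int resolve_ipv4_ts(const char *hostname, struct in_addr *out)
{
    struct hostent *hp;
    int rc = -1;

    pthread_mutex_lock(&host_lock);
    hp = gethostbyname(hostname);   /* result lives in static storage */
    if (hp != NULL && hp->h_addrtype == AF_INET &&
        hp->h_length == (int)sizeof(*out)) {
        /* Copy the address into caller-owned memory before unlocking */
        memcpy(out, hp->h_addr_list[0], sizeof(*out));
        rc = 0;
    }
    pthread_mutex_unlock(&host_lock);
    return rc;
}
```

The caller then fills serveraddr.sin_addr from its own private copy, so no thread ever reads the shared hostent outside the lock.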

The log file is handled in a similar way:

format_log_entry(logstring, &client->clientaddr, uri, resplen); 
P(mutex_file);
fprintf(logfile, "%s\n", logstring);
V(mutex_file);

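So that the log file can be opened and read while the server is running, the file operations are changed to the following:

format_log_entry(logstring, &client->clientaddr, uri, resplen);
P(mutex_file);
/* Reopen in append mode for each entry so the file is closed between writes */
logfile = fopen("proxy.log","a");
if(logfile != NULL){
    fprintf(logfile, "%s\n", logstring);
    fclose(logfile);
}
V(mutex_file);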

thread_id

A global variable records the current thread id, which is handed to the thread through clientinfo.

/* thread id */
unsigned long tid = 0;

printf("Client %s connected tid = %lu\n",inet_ntoa(client->clientaddr.sin_addr),tid);
client->tid = tid ++;
Pthread_create(&thread, NULL, thread_handler, client);

`Rio_xxx_w`

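Rio_writen and Rio_readnb call unix_error as soon as they hit an error, which terminates the process. To keep the server running, they need to be changed to print the error and return.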

void Rio_writen_w(int fd, void *usrbuf, size_t n){
    if (rio_writen(fd, usrbuf, n) != n)
        printf("Rio_writen_w error\n");
}

ssize_t Rio_readnb_w(rio_t *rp, void *usrbuf, size_t n){
    ssize_t rc;
    if ((rc = rio_readnb(rp, usrbuf, n)) < 0) {
        printf("Rio_readnb_w error\n");
        rc = 0;
    }
    return rc;
}

ssize_t Rio_readlineb_w(rio_t *rp, void *usrbuf, size_t maxlen){
    ssize_t rc;
    if ((rc = rio_readlineb(rp, usrbuf, maxlen)) < 0) {
        printf("Rio_readlineb_w failed\n");
        return 0;
    }
    return rc;
}

Part3

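Solving the `HTTP/1.1` and image loading problems

Before tackling this I looked around on Github. Of the few repos I read, some sidestep the issue by parsing the request and resending it as HTTP/1.0, as above. Others use readline indiscriminately, so images and other files still get stuck in read: the call only returns with the complete data once the remote server closes the connection, which makes pages load extremely slowly.

Below, starting from the HTTP protocol itself, I look for a sound way to solve the problem.

When the client sends Connection: keep-alive, the server may reply with Transfer-Encoding: chunked so that the end of the page data can be detected. This is one way a persistent connection marks the end of a response, and a chunked response does not use Content-Length.

Content-Length gives the size of the response's entity body in bytes. In HTTP terms, it is the value of the Content-Length response header field.

When the browser and the web server use a persistent (keep-alive) HTTP connection and the server does not use chunked transfer encoding, the server must send a Content-Length header in every response giving the length of the entity body, so that the client can tell where the previous response ends.

When the connection is not keep-alive, i.e. the usual short-lived connection, the server simply closes the connection and no length is needed.

When the content is generated dynamically on the server, there is usually no Content-Length entry.

Static pages do carry Content-Length.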
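
The three cases above (chunked, explicit length, read until close) can be summarised as a small decision function. This is a sketch under assumptions: the names body_framing, note_header and contains_ci are hypothetical, and it only looks at the two header fields discussed here. When both fields are present, chunked takes precedence, as the HTTP/1.1 specification requires.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical names: how the end of the response body is determined */
enum body_framing { BODY_UNTIL_CLOSE, BODY_CONTENT_LENGTH, BODY_CHUNKED };

/* Case-insensitive substring test written by hand to stay portable */
static int contains_ci(const char *s, const char *word)
{
    size_t n = strlen(word);
    for (; *s != '\0'; s++)
        if (strncasecmp(s, word, n) == 0)
            return 1;
    return 0;
}

/* Feed one header line at a time. *framing must start as BODY_UNTIL_CLOSE. */
void note_header(const char *line, enum body_framing *framing,
                 unsigned long *content_length)
{
    static const char cl[] = "Content-Length:";
    static const char te[] = "Transfer-Encoding:";

    if (strncasecmp(line, te, strlen(te)) == 0) {
        if (contains_ci(line + strlen(te), "chunked"))
            *framing = BODY_CHUNKED;
    } else if (strncasecmp(line, cl, strlen(cl)) == 0) {
        *content_length = strtoul(line + strlen(cl), NULL, 10);
        /* Chunked wins if both header fields are present */
        if (*framing != BODY_CHUNKED)
            *framing = BODY_CONTENT_LENGTH;
    }
}
```

If neither field shows up before the blank line, the framing stays BODY_UNTIL_CLOSE and the body simply runs until the server closes the connection.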

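So the data returned by the server cannot be read blindly; the headers have to be parsed. The response is processed in the following steps:

Read the headers. A few pieces of information in them matter:
- \r\n marks the end of the headers.
- A Content-Length: entry means the length is given explicitly; record it.
- A Transfer-Encoding: chunked entry means the body uses chunked encoding and is handled with readline below.
Read the body.
- If the body is chunked, read it with readline. Reading 0\r\n means the current body has ended; leave the loop.
- If there is a Content-Length, compute the number of bytes to read each time as read_size = MAXLINE > content_length ? content_length : MAXLINE, then call readnb to read exactly that many bytes. Once the specified number of bytes has been read, the body has ended; leave the loop.

This solves the problems caused by keep-alive.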

/* Receive response from target server and forward to client */
Rio_readinitb(&rio, serverfd);
/* Read head */
while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
    /* Fix bug of return value when response line exceeds MAXLINE */
    if (len == MAXLINE && buf[MAXLINE - 2] != '\n') --len;
    /* when found "\r\n" means head ends */
    if (!strcmp(buf, "\r\n")){
        Rio_writen_w(client->socketfd, buf, len);
        break;
    }
    if (!strncasecmp(buf, "Content-Length:", 15)) {
        sscanf(buf + 15, "%u", &content_length);
        chunked = False;
    }
    /* Use strlen here: sizeof would count the trailing '\0' and the prefix would never match */
    if (!strncasecmp(buf, "Transfer-Encoding:", strlen("Transfer-Encoding:"))) {
        if(strstr(buf,"chunked")!=NULL || strstr(buf,"Chunked")!=NULL)
            chunked = True;
    }

    /* Send the response line to client and count the total len */
    Rio_writen_w(client->socketfd, buf, len);
    recv_len += len;
}

/* Read body */
if(chunked){
    /* Transfer-Encoding: chunked */
    while ((len = Rio_readlineb_w(&rio, buf, MAXLINE)) != 0) {
        /* Fix bug of return value when response line exceeds MAXLINE */
        if (len == MAXLINE && buf[MAXLINE - 2] != '\n') --len;
        /* Send the response line to client and count the total len */
        Rio_writen_w(client->socketfd, buf, len);
        recv_len += len;
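        /* End of response: the "0\r\n" line itself was already forwarded above */
        if (!strcmp(buf, "0\r\n")) {
            /* Forward the blank line that closes the chunked body
               (assumes the response carries no trailer fields) */
            Rio_writen_w(client->socketfd, "\r\n", 2);
            recv_len += 2;
            break;
        }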
    }
}
else{
    read_size = MAXLINE > content_length?content_length:MAXLINE;
    while((len = Rio_readnb_w(&rio,buf,read_size))!=0){
        content_length -= len;
        recv_len += len;
        Rio_writen_w(client->socketfd, buf, len);
        if(content_length == 0) break;
        read_size = MAXLINE > content_length?content_length:MAXLINE;
    }
}
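
The chunked branch above stops when a line equals "0\r\n". Because chunk data is also forwarded line by line, a body line that happens to read "0\r\n" would end the loop early. A stricter variant would parse each chunk-size line and then forward exactly that many bytes with readnb. The helper below sketches only the first step; the name parse_chunk_size is hypothetical and chunk extensions are accepted but ignored.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical helper: parse a chunk-size line such as "1a2b\r\n" or
 * "0;name=value\r\n". Returns 0 and stores the size, or -1 if the line
 * does not start with hex digits followed by '\r' or ';'. */
int parse_chunk_size(const char *line, unsigned long *size)
{
    unsigned long v = 0;
    const char *p = line;

    if (!isxdigit((unsigned char)*p))
        return -1;
    for (; isxdigit((unsigned char)*p); p++) {
        int d = isdigit((unsigned char)*p)
                    ? *p - '0'
                    : tolower((unsigned char)*p) - 'a' + 10;
        if (v > (ULONG_MAX - (unsigned long)d) / 16)
            return -1;               /* value would overflow */
        v = v * 16 + (unsigned long)d;
    }
    if (*p != '\r' && *p != ';')
        return -1;
    *size = v;
    return 0;
}
```

A size of 0 marks the last chunk; any other value says how many body bytes, followed by "\r\n", belong to the current chunk.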

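Of course, this is not keep-alive in the true sense. To really keep the connection open and save a few TCP handshakes, the handler needs a loop that goes back to the top and reads the next request from the client.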
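
Such a loop needs an exit condition. One possible sketch, not taken from the author's repo, is a helper that decides after each request whether the client expects the connection to stay open. The name wants_keep_alive is hypothetical, and the Connection value is assumed to be a single token with surrounding whitespace already stripped.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical helper. version is e.g. "HTTP/1.1"; connection is the value
 * of the Connection header, or NULL if the header was absent.
 * Returns 1 if the connection should stay open, 0 otherwise. */
int wants_keep_alive(const char *version, const char *connection)
{
    if (connection != NULL) {
        if (strcasecmp(connection, "close") == 0)
            return 0;
        if (strcasecmp(connection, "keep-alive") == 0)
            return 1;
    }
    /* No decisive header: HTTP/1.1 is persistent by default, HTTP/1.0 is not */
    return strcmp(version, "HTTP/1.1") == 0;
}
```

The request loop would keep reading from the client while this returns 1 and both sockets are still healthy.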

Part4

Back to the writen and readn functions once more. When the user clicks through to the next page before the current one has finished loading, the browser closes the current connection and writen fails.

Reads and writes can fail for a variety of reasons. The most common read failure is an errno = ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server.

The most common write failure is an errno = EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur, for example, when a user hits their browser's Stop button during a long transfer.
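
The EPIPE case is easy to reproduce in isolation. The sketch below uses a pipe as a stand-in for the client socket, which is an assumption made to keep the example self-contained; the function name is hypothetical. With SIGPIPE ignored, the failed write returns -1 and sets errno instead of killing the process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno produced by writing to a pipe whose
 * read end is closed, 0 if the write unexpectedly succeeds, or -1 if the
 * pipe cannot be created. */
int errno_after_write_to_closed_pipe(void)
{
    int fds[2];
    int err = 0;

    signal(SIGPIPE, SIG_IGN);    /* otherwise the write below kills the process */
    if (pipe(fds) < 0)
        return -1;
    close(fds[0]);               /* the "peer" goes away */
    if (write(fds[1], "x", 1) < 0)
        err = errno;
    close(fds[1]);
    return err;
}
```

The function returns EPIPE, which is the value the write wrapper has to look for.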

First, handle this error case separately:

int Rio_writen_w(int fd, void *usrbuf, size_t n){
    if (rio_writen(fd, usrbuf, n) != n){
        printf("Rio_writen_w error\n");
        if(errno == EPIPE)
            /* client have closed this connection */
            return CLIENT_CLOSED;
        return UNKOWN_ERROR;
    }
    return NO_ERROR;
}

Then replace every call to writen_w with:

if(Rio_writen_w(client->socketfd, buf, len)==CLIENT_CLOSED){
    clienton = False; 
    break;
}

When clienton is False, the rest of the work can be skipped and control goes straight to logging.

Similarly, change the read wrappers to:

ssize_t Rio_readnb_w(rio_t *rp, void *usrbuf, size_t n,bool *serverstat){
    ssize_t rc;
    if ((rc = rio_readnb(rp, usrbuf, n)) < 0) {
        printf("Rio_readnb_w error\n");
        rc = 0;
        if(errno == ECONNRESET) *serverstat = False;
    }
    return rc;
}

ssize_t Rio_readlineb_w(rio_t *rp, void *usrbuf, size_t maxlen,bool *serverstat){
    ssize_t rc;
    if ((rc = rio_readlineb(rp, usrbuf, maxlen)) < 0) {
        printf("Rio_readlineb_w failed\n");
        rc = 0;
        if(errno == ECONNRESET) *serverstat = False;
    }
    return rc;
}

Change the readline that reads from the client to:

Rio_readlineb_w(&rio, buf, MAXLINE,&clienton)

Change the readline that reads from the server to:

Rio_readlineb_w(&rio, buf, MAXLINE,&serveron)

Also add a few checks on the server and client status to avoid wasting resources.

Q&A

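Why are sockets always described by an fd?

From the point of view of a Unix program, a socket is an open file with a corresponding descriptor.

Why, under HTTP/1.1, does the read only return after a long wait or an interrupt?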

Client 127.0.0.1 connected
error request
Client 127.0.0.1 connected
Finish parse http://www.baidu.com www.baidu.com  80
Interrupted and Rebegin
Interrupted and Rebegin
Interrupted and Rebegin
Interrupted and Rebegin

while (nleft > 0) {
    // stuck at this step?
  if ((nread = read(fd, bufp, nleft)) < 0) {
      if (errno == EINTR) /* interrupted by sig handler return */
          nread = 0;      /* and call read() again */
      else
          return -1;      /* errno set by read() */
  }
  else if (nread == 0)
      break;              /* EOF */
  nleft -= nread;
  bufp += nread;
}

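Observation: under HTTP/1.1, the read call does not return.
My guess was that HTTP/1.1 uses persistent connections, so there is no EOF and the code has to decide for itself when to leave the while loop.

Solved; see Part3.

A named (non in-memory) mutex still holds its previous value when it is reopened

Remove the old name first with sem_unlink.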

sem_unlink("mutex_host");
sem_unlink("mutex_file");
/* sem_open reports failure with SEM_FAILED, not NULL */
if((mutex_host = sem_open("mutex_host",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
  fprintf(stderr,"cannot create mutex\n");
}
if((mutex_file = sem_open("mutex_file",O_CREAT,S_IRUSR | S_IWUSR, 1))==SEM_FAILED){
  fprintf(stderr,"cannot create mutex\n");
}
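
The unlink-then-open pattern can be wrapped in one helper. This is a sketch under assumptions: the name open_fresh_mutex is hypothetical, and O_EXCL is added (the code above does not use it) so that the call fails instead of silently attaching to a semaphore that another process created in between. For portability, POSIX recommends names that start with a slash.

```c
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>

/* Hypothetical helper: create a named semaphore that always starts at 1,
 * even if a stale one with the same name survived a previous run.
 * Returns SEM_FAILED on error. */
sem_t *open_fresh_mutex(const char *name)
{
    sem_unlink(name);   /* failure is ignored: the name may simply not exist */
    return sem_open(name, O_CREAT | O_EXCL, S_IRUSR | S_IWUSR, 1);
}
```

A call such as open_fresh_mutex("/mutex_file") at startup then yields a semaphore whose value is 1, regardless of how the previous run ended.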