[Work] Proxy Server Optimization - Detecting URL Changes on the Target Website

Date: 2019-12-14



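At work, I am responsible for a Proxy module in my team. The module targets OWA (Outlook Web App), the web mail portal of Microsoft Office 365. With it in place, users who want to reach Office 365 OWA no longer enter the Office 365 address. Instead they enter our proxy's address, and we forward their requests to Office 365 OWA, so they reach their mailbox with the same experience as visiting Office 365 OWA directly.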

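The principle behind our proxy is straightforward. We build an HTTP server with Node.js. When it receives a request from the client (in practice, the browser), it forwards the request to Office 365, then sends the response returned by Office 365 back to the client. This is how the proxy functionality is implemented.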
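A minimal sketch of that forwarding loop follows. It assumes HTTPS to a single upstream host and leaves out the cookie handling and URL rewriting that the real module performs. `buildUpstreamOptions`, `createProxyServer`, and the port in the usage comment are hypothetical names, not the module's actual code.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical upstream host; the real module's configuration is not shown in the post.
const UPSTREAM_HOST = 'outlook.office365.com';

// Build the options for the upstream HTTPS request from the incoming client request.
function buildUpstreamOptions(clientReq, upstreamHost) {
  // Copy the headers so the client request is not mutated, then point Host at the upstream.
  const headers = Object.assign({}, clientReq.headers, { host: upstreamHost });
  return {
    hostname: upstreamHost,
    port: 443,
    method: clientReq.method,
    path: clientReq.url,
    headers: headers,
  };
}

// Create an HTTP server that forwards each request upstream and streams the response back.
function createProxyServer(upstreamHost) {
  return http.createServer(function (clientReq, clientRes) {
    const options = buildUpstreamOptions(clientReq, upstreamHost);
    const upstreamReq = https.request(options, function (upstreamRes) {
      // Relay the status code and headers, then pipe the body back to the browser.
      clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(clientRes);
    });

    upstreamReq.on('error', function (err) {
      if (clientRes.headersSent) {
        clientRes.end();                     // response already started; just close it
      } else {
        clientRes.writeHead(502, { 'Content-Type': 'text/plain' });
        clientRes.end('Bad gateway: ' + err.message);
      }
    });

    // Forward the request body (e.g. POST data) to the upstream.
    clientReq.pipe(upstreamReq);
  });
}

module.exports = { buildUpstreamOptions, createProxyServer };

// Usage (not executed here): createProxyServer(UPSTREAM_HOST).listen(8080);
```

Building the upstream options in a separate pure function keeps the header and path logic testable without opening any network connection.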

Of course, the actual implementation involves many more details, such as cookie handling and URL rewriting, which I won't cover here.
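The sketch below illustrates the kind of work cookie handling can involve. It is an assumption, not a description of this module's code. A `Set-Cookie` header from Office 365 may carry a `Domain` attribute for a Microsoft domain, and a browser talking to the proxy's host rejects such a cookie (RFC 6265 requires the request host to domain-match it). One common approach is to strip that attribute so the cookie defaults to the proxy's host. `stripCookieDomain` and `rewriteSetCookieHeaders` are hypothetical helpers.

```javascript
// Hypothetical helper: remove the Domain attribute from one Set-Cookie header value,
// so the browser scopes the cookie to the proxy's own host instead of rejecting it.
function stripCookieDomain(setCookie) {
  return setCookie
    .split(';')
    .map(function (part) { return part.trim(); })
    .filter(function (part) { return !/^domain=/i.test(part); })
    .join('; ');
}

// Node exposes an upstream "set-cookie" header as an array of strings (or undefined).
function rewriteSetCookieHeaders(headers) {
  const cookies = headers['set-cookie'];
  if (Array.isArray(cookies)) {
    headers['set-cookie'] = cookies.map(stripCookieDomain);
  }
  return headers;
}
```

In a forwarding proxy like the one described above, `rewriteSetCookieHeaders(upstreamRes.headers)` would run just before the headers are relayed to the browser.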

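While developing and maintaining this module, however, I ran into a problem. Although we simply forward requests, many of them still need special handling, and many complex requests must be investigated before we can support them. So, as the proxy, I need to know which types of requests Office 365 (the target site) makes. In practice this means knowing which distinct URLs it uses, and different URLs are distinguished by their paths.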
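Request types are told apart by path, so one way to group requests is to drop the query string and key each request by its pathname. This is a minimal sketch that assumes the pathname is the right key. The post does not say whether query strings were stripped, although the log below shows none. `requestKey` is a hypothetical helper name.

```javascript
// Hypothetical helper: reduce a raw request URL (as seen in Node's req.url) to its path.
// req.url is relative, e.g. "/owa/service.svc?action=GetFolder", so the WHATWG URL
// parser needs a placeholder base; only the pathname is kept.
function requestKey(rawUrl) {
  return new URL(rawUrl, 'http://placeholder.invalid').pathname;
}

console.log(requestKey('/owa/service.svc?action=GetFolder'));
console.log(requestKey('/owa/'));
```

Keying by pathname collapses requests that differ only in their parameters into one request type.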

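So I made an optimization. Since the proxy is essentially an HTTP server, I print the URL of every request the client sends to the log, which lets me collect all the URLs from the log. Next to each URL I also log the result received after forwarding it (the response status code). This shows whether the URL is being handled correctly: a 200 means it is OK.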
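A minimal sketch of that logging step follows. It assumes the proxy is a plain `http.createServer` handler and that entries are appended to `URLs.log`, the file the parsing script in this post reads. `formatLogLine` and `logRequest` are hypothetical names. The sketch logs `req.url` as received, including any query string.

```javascript
const fs = require('fs');

// Hypothetical log file name, matching the file the parsing script reads.
const LOG_FILE = 'URLs.log';

// Format one log entry as "<url>, <status code>", the format used in this post's log.
function formatLogLine(url, statusCode) {
  return url + ', ' + statusCode + '\n';
}

// Log the request URL together with the status code actually sent to the client.
// The 'finish' event fires once the response has been fully handed off,
// so res.statusCode holds the final value by then.
function logRequest(req, res, logFile) {
  res.on('finish', function () {
    fs.appendFile(logFile, formatLogLine(req.url, res.statusCode), function (err) {
      if (err) console.error('Failed to write URL log:', err);
    });
  });
}

// Usage inside the proxy's request handler (forwarding logic omitted):
//   http.createServer(function (req, res) {
//     logRequest(req, res, LOG_FILE);
//     /* forward the request to Office 365 and relay the response */
//   });
```

Hooking `finish` rather than logging when the request arrives means that redirects (302) and errors (404) are recorded with the code the browser actually received.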

After adding the logging, I got the following log:

/___/outlook.office365.com/, 302
/owa/, 302
/__/login/login.srf, 200
/owa/prefetch.aspx, 200
/___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/preboot.js, 200
/___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/boot.worldwide.0.mouse.js, 200
/___/outlook.office365.com/GetUserRealm.srf, 200
/___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/boot.worldwide.1.mouse.js, 200
/owa/ev.owa2, 200
/owa/ev.owa2, 200
/___/outlook.office365.com/, 302
/owa/ev.owa2, 200
/owa/, 302
/__/login/login.srf, 200
/owa/ev.owa2, 200
/owa/service.svc, 200
/owa/prefetch.aspx, 200
/___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/preboot.js, 200
/owa/service.svc, 200
/___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/boot.worldwide.0.mouse.js, 200
/owa/ev.owa2, 200
/owa/ev.owa2, 200
/owa/service.svc, 200
/owa/service.svc, 200
/___/outlook.office365.com/GetUserRealm.srf, 200
/___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/boot.worldwide.1.mouse.js, 200
/__/login/ppsecure/post.srf, 200
/owa/, 302

Each line holds the request URL followed by the response status code that request received.
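In the log above, the same boot scripts appear under two different version directories, `16.801.12.1741001` and `16.807.12.1742334`. That is exactly the kind of target-site URL change worth noticing. The following is a hedged sketch and not part of the original script. It replaces any dot-separated, all-numeric path segment with a `{version}` placeholder before de-duplicating, so both builds fall into one pattern while the raw URLs still reveal the version change. `normalizeVersionSegments` and the `{version}` placeholder are hypothetical.

```javascript
// Matches a path segment made only of dot-separated numbers, e.g. "16.801.12.1741001" or "15.1.225".
const VERSION_SEGMENT = /^\d+(\.\d+)+$/;

// Hypothetical helper: replace version-like path segments with "{version}".
function normalizeVersionSegments(path) {
  return path
    .split('/')
    .map(function (segment) {
      return VERSION_SEGMENT.test(segment) ? '{version}' : segment;
    })
    .join('/');
}

const a = '/___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/preboot.js';
const b = '/___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/preboot.js';
console.log(normalizeVersionSegments(a)); // "/___/r1.res.office365.com/owa/prem/{version}/scripts/preboot.js"
console.log(normalizeVersionSegments(a) === normalizeVersionSegments(b)); // true
```

Segments such as `r1.res.office365.com` contain letters, so they are left untouched.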

I also wrote a script to parse the log data, because the entries repeat and need to be de-duplicated and sorted.

The script is as follows:

var lineReader = require('line-reader');   // third-party module: npm install line-reader
var fs         = require('fs');

var fileReadData  = "URLs.log";
var fileWriteData = "result.txt";

// Run as "node script.js true" to omit URLs that returned 200.
// process.argv values are strings, so compare explicitly; otherwise "false" would also be truthy.
var ignoreNormalStatusCode = process.argv[2] === 'true';

console.log("ignoreNormalStatusCode: " + ignoreNormalStatusCode);

// Parse one log line of the form "<url>, <status code>" into a data object.
// Returns null for blank or malformed lines instead of crashing on them.
var createDataObjectFromLine = function (str) {
    var comma = str.lastIndexOf(",");
    if (comma === -1) {
        return null;
    }

    return {
        url: str.slice(0, comma).trim(),
        statusCode: str.slice(comma + 1).trim(),
        number: 1
    };
};

// Return the index of the entry with the same URL and status code, or -1.
var indexOfObjInArray = function (array, obj) {
    for (var i = 0; i < array.length; i++) {
        if (array[i].url === obj.url && array[i].statusCode === obj.statusCode) {
            return i;
        }
    }
    return -1;
};

// Sort by occurrence count, most frequent first.
var compareNumber = function (a, b) {
    return b.number - a.number;
};

// Write the summary to the result file, overwriting the output of any previous run.
var writeResultToFile = function (result, number) {
    var string = "";
    string += "Here is the URL scan result below,\n";
    string += "Original URL number: " + number + "\n";
    string += "Unique URL number: " + result.length + "\n";
    string += "------------------------------------------\n\n";
    string += "req url, response status code (200 is ok), count\n";

    for (var i = 0; i < result.length; i++) {
        string += result[i].url + ", " + result[i].statusCode + ", " + result[i].number + "\n";
    }

    fs.writeFileSync(fileWriteData, string);
};

// Unique (url, status code) entries with their counts
var result = [];

// Total number of URL lines read from the log
var number = 0;

// Main: read the log line by line
lineReader.eachLine(fileReadData, function (line, last) {
    var obj = createDataObjectFromLine(line);

    if (obj) {
        number++;

        var pos = indexOfObjInArray(result, obj);
        if (pos !== -1) {
            // Already seen: just increase its count
            result[pos].number++;
        } else if (!(ignoreNormalStatusCode && obj.statusCode === '200')) {
            // First occurrence (200 responses are skipped when ignoreNormalStatusCode is set)
            result.push(obj);
        }
    }

    if (last) {
        result.sort(compareNumber);
        writeResultToFile(result, number);

        // Returning false tells line-reader to stop reading
        return false;
    }
});

The script uses a Node.js module, line-reader, to read the file one line at a time.
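In the script above, every line triggers a linear `indexOfObjInArray` scan over all unique entries seen so far. That cost grows quickly on large logs. The following sketch performs the same de-duplicate-and-sort step with a `Map` keyed by URL plus status code, and reads the file with Node's built-in `readline` module instead of line-reader. `summarize` and `summarizeFile` are hypothetical names, and this is an alternative rather than the author's code.

```javascript
const fs = require('fs');
const readline = require('readline');

// Count (url, statusCode) pairs and return them sorted by count, most frequent first.
// When ignoreOk is true, 200 entries are counted in the total but left out of the result,
// matching the original script's behavior.
function summarize(lines, ignoreOk) {
  const counts = new Map();
  let total = 0;

  for (const line of lines) {
    const comma = line.lastIndexOf(',');
    if (comma === -1) continue;                     // skip blank or malformed lines
    total++;

    const url = line.slice(0, comma).trim();
    const statusCode = line.slice(comma + 1).trim();
    if (ignoreOk && statusCode === '200') continue;

    const key = JSON.stringify([url, statusCode]);  // unambiguous composite key
    const entry = counts.get(key);
    if (entry) {
      entry.number++;
    } else {
      counts.set(key, { url: url, statusCode: statusCode, number: 1 });
    }
  }

  const result = Array.from(counts.values()).sort(function (a, b) {
    return b.number - a.number;
  });
  return { total: total, result: result };
}

// Read a log file line by line with the built-in readline module, then summarize it.
async function summarizeFile(path, ignoreOk) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });
  const lines = [];
  for await (const line of rl) {
    lines.push(line);
  }
  return summarize(lines, ignoreOk);
}
```

Usage might look like `summarizeFile('URLs.log', true).then(({ total, result }) => console.log(total, result));`. Each `Map` lookup takes constant time on average, so the total cost grows roughly linearly with the number of log lines.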

Running it produces the parsed result:

Here is the URL scan result below,
Original URL number: 142
Unique URL number: 6
------------------------------------------

req url, response status code (200 is ok), count
/owa/, 302, 10
/___/outlook.office365.com/, 302, 5
/owa/auth/15.1.225/themes/resources/segoeui-regular.ttf, 404, 3
/owa/auth/15.1.225/themes/resources/segoeui-semilight.ttf, 404, 1
/___/outlook.office365.com/favicon.ico, 302, 1
/owa/auth/15.1.219/themes/resources/segoeui-semilight.ttf, 404, 1
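This result already surfaces URLs that need attention, such as the `segoeui-*.ttf` font files that return 404 under two different version directories (`15.1.225` and `15.1.219`). To notice when the target site starts or stops using a URL, one possible extension is to compare the URL sets from two scans. This is not part of the original script. `diffUrlSets` is a hypothetical helper, and the two input lists below are illustrative.

```javascript
// Hypothetical helper: compare the URLs seen in an earlier scan with those in a later scan.
// Returns URLs that are new in the later scan and URLs that no longer appear.
function diffUrlSets(previousUrls, currentUrls) {
  const previous = new Set(previousUrls);
  const current = new Set(currentUrls);
  return {
    added: Array.from(current).filter(function (url) { return !previous.has(url); }),
    removed: Array.from(previous).filter(function (url) { return !current.has(url); }),
  };
}

// Illustrative input only: two scans in which the font path's version directory changed.
const earlierScan = ['/owa/', '/owa/auth/15.1.219/themes/resources/segoeui-semilight.ttf'];
const laterScan = ['/owa/', '/owa/auth/15.1.225/themes/resources/segoeui-semilight.ttf'];
console.log(diffUrlSets(earlierScan, laterScan));
```

Feeding it the URL column of two `result.txt` files from different days would list exactly the paths that appeared or disappeared in between.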