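At work I am responsible for a proxy module on my team. The module targets OWA, the mail portal of Microsoft Office 365. Once it is running, users no longer type the Office 365 address to reach Office 365 OWA; they enter the address of our proxy instead, and we forward their requests to Office 365 OWA, so the experience is the same as visiting Office 365 OWA directly.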
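The principle behind our proxy is simple: we use Node.js to build an HTTP server, take the request from the client (in practice, a browser), forward it to Office 365, and send the response that Office 365 returns back to the client. That is how the proxy works.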
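The forwarding idea described above can be sketched roughly as follows. This is only a minimal illustration, not the actual module: the fixed upstream host, the helper names `buildUpstreamOptions` and `createProxyServer`, and the 502 fallback are all assumptions made for the example, and cookie handling and URL rewriting are deliberately left out.

```javascript
const http = require('http');
const https = require('https');

// Assumed upstream host, used only for illustration.
const UPSTREAM_HOST = 'outlook.office365.com';

// Build the options for the upstream request from the incoming request.
function buildUpstreamOptions(req) {
  // Copy the client's headers, but point Host at the upstream, not the proxy.
  const headers = Object.assign({}, req.headers, { host: UPSTREAM_HOST });
  return {
    hostname: UPSTREAM_HOST,
    port: 443,
    method: req.method,
    path: req.url,
    headers: headers
  };
}

// Create (but do not start) the proxy server.
function createProxyServer() {
  return http.createServer(function (clientReq, clientRes) {
    const options = buildUpstreamOptions(clientReq);
    const upstreamReq = https.request(options, function (upstreamRes) {
      // Relay status code, headers and body back to the browser.
      clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(clientRes);
    });

    upstreamReq.on('error', function () {
      // Hypothetical fallback: report a gateway error if the upstream fails.
      if (!clientRes.headersSent) {
        clientRes.writeHead(502);
      }
      clientRes.end();
    });

    // Stream the request body (if any) to the upstream.
    clientReq.pipe(upstreamReq);
  });
}
```

To try it, something like `createProxyServer().listen(8080)` would start the server on an arbitrary local port.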
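The real implementation involves many more details, such as cookie handling and URL rewriting, which I won't go into here.
While developing and maintaining this module, though, I ran into a problem. Although we mostly just forward requests, many of them need special handling, and some complex ones take research before we can support them. As the owner of the proxy I therefore need to know which kinds of requests the target site, Office 365, actually receives. In practice that means knowing which distinct URLs there are, and distinct URLs here really means distinct paths.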
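So I made an improvement. Since the proxy is essentially an HTTP server, I print the URL of every request sent by the client to the log, so that all URLs can be collected there. Alongside each URL I also print the result received after forwarding it (the response status code), which shows whether that URL was handled correctly: a 200 means OK.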
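One way such logging could be wired in is sketched below. It assumes the log line is written once the response has been fully sent, using the `finish` event of Node's `http.ServerResponse`; the helper names `formatLogLine` and `logWhenFinished` and the optional `log` sink are hypothetical and not taken from the real module.

```javascript
// Format one log entry as "<url>, <status code>".
function formatLogLine(url, statusCode) {
  return url + ', ' + statusCode;
}

// Log the request URL together with the status code that was sent back.
// `log` is a hypothetical sink; it defaults to console.log.
function logWhenFinished(req, res, log) {
  const write = log || console.log;
  // 'finish' fires after the response has been handed off for sending,
  // so res.statusCode holds the code the client actually received.
  res.on('finish', function () {
    write(formatLogLine(req.url, res.statusCode));
  });
}
```

Inside a request handler this would be called once per request, for example `logWhenFinished(clientReq, clientRes)` before the request is forwarded.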
With that logging in place, the log looks like this:
    /___/outlook.office365.com/, 302
    /owa/, 302
    /__/login/login.srf, 200
    /owa/prefetch.aspx, 200
    /___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/preboot.js, 200
    /___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/boot.worldwide.0.mouse.js, 200
    /___/outlook.office365.com/GetUserRealm.srf, 200
    /___/r1.res.office365.com/owa/prem/16.801.12.1741001/scripts/boot.worldwide.1.mouse.js, 200
    /owa/ev.owa2, 200
    /owa/ev.owa2, 200
    /___/outlook.office365.com/, 302
    /owa/ev.owa2, 200
    /owa/, 302
    /__/login/login.srf, 200
    /owa/ev.owa2, 200
    /owa/service.svc, 200
    /owa/prefetch.aspx, 200
    /___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/preboot.js, 200
    /owa/service.svc, 200
    /___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/boot.worldwide.0.mouse.js, 200
    /owa/ev.owa2, 200
    /owa/ev.owa2, 200
    /owa/service.svc, 200
    /owa/service.svc, 200
    /___/outlook.office365.com/GetUserRealm.srf, 200
    /___/r1.res.office365.com/owa/prem/16.807.12.1742334/scripts/boot.worldwide.1.mouse.js, 200
    /__/login/ppsecure/post.srf, 200
    /owa/, 302
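Each line holds the URL first, followed by the response status code received for that request.
I also wrote a script of my own to parse the log data. Because entries repeat, they need to be deduplicated and sorted.
The script is as follows: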
    var lineReader = require('line-reader');
    var fs = require('fs');

    var fileReadData = "URLs.log";
    var fileWriteData = "result.txt";

    // Optional first argument: when set, URLs that returned 200 are left out.
    // argv values are strings, so "false" and "0" must be checked explicitly.
    var arg = process.argv[2];
    var ignoreNormalStatusCode = arg !== undefined && arg !== 'false' && arg !== '0';

    console.log("ignoreNormalStatusCode: " + ignoreNormalStatusCode);

    // create a data object from one "<url>, <status code>" line;
    // returns null for blank or malformed lines
    var createDataObjectFromLine = function (str) {
        // split on the last comma so a comma inside the URL does not break parsing
        var comma = str.lastIndexOf(",");
        if (comma === -1) {
            return null;
        }

        return {
            url: str.slice(0, comma).trim(),
            statusCode: str.slice(comma + 1).trim(),
            number: 1
        };
    };

    // get the index of an entry with the same URL and status code, or -1
    var indexOfObjInArray = function (array, obj) {
        for (var i = 0; i < array.length; i++) {
            var e = array[i];

            if (e.url === obj.url && e.statusCode === obj.statusCode) {
                return i;
            }
        }

        return -1;
    };

    // compare by count, most frequent first
    var compare_number = function (a, b) {
        return b.number - a.number;
    };

    // write the array's data to file
    var writeResultToFile = function (result, number) {
        var string = "";
        string += "Here is this URL scan result blow, \n";
        string += "Orignial URL number: " + number + "\n";
        string += "Unrepeat URL number: " + result.length + "\n";
        string += "------------------------------------------\n\n";
        string += "req url, this url's response status code (200 is ok), number statics\n";

        for (var i = 0; i < result.length; i++) {
            string += result[i].url + ", " + result[i].statusCode + ", " + result[i].number + "\n";
        }

        // overwrite rather than append, so repeated runs do not pile up in the file
        fs.writeFileSync(fileWriteData, string);
    };

    // array of unique (url, statusCode) entries
    var result = [];

    // count of original (non-unique) log entries
    var number = 0;

    // main function
    lineReader.eachLine(fileReadData, function (line, last) {
        // parse the data from every line
        var obj = createDataObjectFromLine(line);

        if (obj) {
            number++;

            var pos = indexOfObjInArray(result, obj);
            if (pos !== -1) {
                // this entry already exists in the result array
                result[pos].number++;
            }
            else if (!(ignoreNormalStatusCode && obj.statusCode === '200')) {
                // add this entry to the result
                result.push(obj);
            }
        }

        if (last) {
            // sort the array by count
            result.sort(compare_number);

            // write the result to file
            writeResultToFile(result, number);

            // stop reading lines from the file
            return false;
        }
    });
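As a side note, the deduplication step in the script scans the whole result array for every log line. For larger logs, the same counting can be done with a `Map` keyed by URL and status code. The sketch below is an alternative technique, not the script used for this post; the function name `countEntries` and its `skipOk` parameter are made up for illustration, and it takes an array of lines instead of reading a file so that it stays self-contained.

```javascript
// Count how often each (url, statusCode) pair occurs in an array of log lines.
// When skipOk is true, entries with status code 200 are left out.
function countEntries(lines, skipOk) {
  const counts = new Map();

  for (const line of lines) {
    // Split on the last comma; skip blank or malformed lines.
    const comma = line.lastIndexOf(',');
    if (comma === -1) continue;

    const url = line.slice(0, comma).trim();
    const statusCode = line.slice(comma + 1).trim();
    if (skipOk && statusCode === '200') continue;

    const key = url + ', ' + statusCode;
    const entry = counts.get(key);
    if (entry) {
      entry.number++;
    } else {
      counts.set(key, { url: url, statusCode: statusCode, number: 1 });
    }
  }

  // Most frequent entries first.
  return Array.from(counts.values()).sort(function (a, b) {
    return b.number - a.number;
  });
}
```

The returned array has the same shape as the `result` array in the script (objects with `url`, `statusCode` and `number`), so it could be passed to the same kind of report writer.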
The script uses a Node.js module, line-reader, to read the file line by line.
Running it produces the parsed result:
    Here is this URL scan result blow,
    Orignial URL number: 142
    Unrepeat URL number: 6
    ------------------------------------------

    req url, this url's response status code (200 is ok), number statics
    /owa/, 302, 10
    /___/outlook.office365.com/, 302, 5
    /owa/auth/15.1.225/themes/resources/segoeui-regular.ttf, 404, 3
    /owa/auth/15.1.225/themes/resources/segoeui-semilight.ttf, 404, 1
    /___/outlook.office365.com/favicon.ico, 302, 1
    /owa/auth/15.1.219/themes/resources/segoeui-semilight.ttf, 404, 1
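Note that this result omits URLs whose status code is 200. Those are the URLs the proxy already handles correctly, so there is no need to count or analyze them for now.
The result makes it obvious that there are several 404 URLs that our proxy does not handle correctly. They need further analysis and support in the code. That completes this round of optimization of the product module.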
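When analyzing such a result, it may help to notice that some failing URLs differ only in a version segment (for example `15.1.225` versus `15.1.219` in the font paths). A possible follow-up step, sketched below, folds those into one entry. This is purely an illustrative assumption about how the analysis could continue: the helper names `normalizeVersion` and `groupByNormalizedUrl` and the `<version>` placeholder are invented for the example.

```javascript
// Replace dotted numeric path segments such as "/15.1.225/" with "/<version>/".
function normalizeVersion(url) {
  return url.replace(/\/\d+(?:\.\d+)+(?=\/)/g, '/<version>');
}

// Sum the counts of entries that share a normalized URL and status code.
// `entries` are objects with url, statusCode and number fields.
function groupByNormalizedUrl(entries) {
  const groups = new Map();
  for (const e of entries) {
    const key = normalizeVersion(e.url) + ', ' + e.statusCode;
    groups.set(key, (groups.get(key) || 0) + e.number);
  }
  return groups;
}
```

Applied to the three 404 font entries in the result, this would leave two groups: one for `segoeui-regular.ttf` and one for `segoeui-semilight.ttf`.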
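A small personal reflection: there are many little things at work that are worth sticking with if you believe they are right. A small optimization, as long as it is meaningful, can turn out to be very useful :-)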
Kevin Song
2015-7-22