Web Log Collection in Practice

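To collect website access logs, we built a log collection system that gathers request data with a JS probe. This avoids the large volume of irrelevant entries that comes with collecting the web server's own access log (requests for JS, CSS and other static assets, which make up roughly 70% of the total).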

First, the overall flow diagram:

1: Setting up the application server

Install nginx and edit the configuration file (/etc/nginx/conf.d/default.conf):

server {
  listen 80;
  server_name spark2;

  location / {
    root /data/nginx/app;
    index index.html index.htm;
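    # nginx has no "access_log on;" toggle: "on" would be taken as a log file name.
    # Access logging is enabled by default, so the directive is omitted here.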
  }
}

Add two HTML pages, index.html followed by content.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>首頁</title>
</head>
<body>
<a href="content.html">hello nginx</a>

<script type="text/javascript" src="track.js"></script>
</body>
</html>

content.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>內容</title>
</head>
<body>

<h3>來看內容啊</h3>

<script type="text/javascript" src="track.js"></script>
</body>
</html>
Start nginx (service nginx start).
 
2: Implementing the JS probe

Embed the following JS in each page:

<script type="text/javascript">
    var _maq = _maq || [];
    _maq.push(['_setAccount', 'zx5352']);
 
    (function() {
        var ma = document.createElement('script'); 
        ma.type = 'text/javascript';
        ma.async = true;
        ma.src = 'http://flow.itcast.zx/ma.js';
        var s = document.getElementsByTagName('script')[0]; 
        s.parentNode.insertBefore(ma, s);
    })();
</script>

 

track.js:

(function () {
    var params = {};
    // Data from the document object
    if (document) {
        params.domain = document.domain || '';
        params.url = document.URL || '';
        params.title = document.title || '';
        params.referrer = document.referrer || '';
    }
    // Data from the window object
    if (window && window.screen) {
        params.sh = window.screen.height || 0;
        params.sw = window.screen.width || 0;
        params.cd = window.screen.colorDepth || 0;
    }
    // Data from the navigator object
    if (navigator) {
        params.lang = navigator.language || '';
    }
    // Parse the _maq configuration; the typeof check avoids a ReferenceError
    // when the page has not defined _maq
    if (typeof _maq !== 'undefined' && _maq) {
        for (var i = 0; i < _maq.length; i++) {
            switch (_maq[i][0]) {
                case '_setAccount':
                    params.account = _maq[i][1];
                    break;
                default:
                    break;
            }
        }
    }
    // Build the query string
    var args = '';
    for (var key in params) {
        if (args != '') {
            args += '&';
        }
        args += key + '=' + encodeURIComponent(params[key]);
    }

    // Send the data to the backend by requesting it through an Image object
    var img = new Image(1, 1);
    img.src = 'http://spark3/log.gif?' + args;
})();
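
The parameter assembly in track.js can be pulled out into pure functions, which makes it easy to check outside a browser. The following is a minimal sketch under that assumption; `readAccount` and `buildQuery` are hypothetical helper names, not part of the original script.

```javascript
// Hypothetical helper: read the account id from a _maq-style command queue.
function readAccount(queue) {
    var account;
    for (var i = 0; i < queue.length; i++) {
        if (queue[i][0] === '_setAccount') {
            account = queue[i][1];
        }
    }
    return account;
}

// Hypothetical helper: join params as key=value pairs, percent-encoding each value.
function buildQuery(params) {
    return Object.keys(params).map(function (key) {
        return key + '=' + encodeURIComponent(params[key]);
    }).join('&');
}

// Sample values borrowed from the pages used in this article.
var params = { domain: 'spark2', url: 'http://spark2/content.html', sh: 768 };
params.account = readAccount([['_setAccount', 'zx5352']]);
console.log(buildQuery(params));
```

Keeping the DOM reads separate from the string building means the encoding logic can be exercised with plain objects.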

The URL requested by the JS (shown unencoded for readability):

http://spark3/log.gif?domain=spark2&url=http://spark2/content.html&title=內容&referrer=http://spark2/&sh=768&sw=1366&cd=24&lang=zh-CN&account=hll
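
Because track.js passes every value through encodeURIComponent, the request actually sent is percent-encoded. As a rough illustration, the sketch below uses the standard URL and URLSearchParams APIs (available in browsers and Node.js) to encode two of the fields above and read them back; the `fields` object is made up for the example.

```javascript
// Example values taken from the URL above; the object itself is illustrative.
var fields = { url: 'http://spark2/content.html', title: '內容' };

var query = Object.keys(fields).map(function (key) {
    return key + '=' + encodeURIComponent(fields[key]);
}).join('&');
var requestUrl = 'http://spark3/log.gif?' + query;
console.log(requestUrl);

// URLSearchParams decodes the values again, which is what the log server
// has to do on its side (set_unescape_uri in the nginx configuration).
var parsed = new URL(requestUrl).searchParams;
console.log(parsed.get('url'), parsed.get('title'));
```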

  

3: Setting up the log server

1. Install the dependencies

yum -y install gcc perl pcre-devel openssl openssl-devel

2. Upload LuaJIT-2.0.4.tar.gz and install LuaJIT

tar -zxvf LuaJIT-2.0.4.tar.gz -C /usr/local/src/

cd /usr/local/src/LuaJIT-2.0.4/

make && make install PREFIX=/usr/local/luajit

3. Set the environment variables

export LUAJIT_LIB=/usr/local/luajit/lib

export LUAJIT_INC=/usr/local/luajit/include/luajit-2.0

4. Create a modules directory to hold the nginx modules

mkdir -p /usr/local/nginx/modules

 

5. Upload openresty-1.9.7.3.tar.gz and the modules it depends on: lua-nginx-module-0.10.0.tar.gz, set-misc-nginx-module-0.29.tar.gz, ngx_devel_kit-0.2.19.tar.gz and echo-nginx-module-0.58.tar.gz

 

6. Extract the dependency modules directly into /usr/local/nginx/modules; they do not need to be compiled or installed separately

tar -zxvf lua-nginx-module-0.10.0.tar.gz -C /usr/local/nginx/modules/

tar -zxvf set-misc-nginx-module-0.29.tar.gz -C /usr/local/nginx/modules/

tar -zxvf ngx_devel_kit-0.2.19.tar.gz -C /usr/local/nginx/modules/

tar -zxvf echo-nginx-module-0.58.tar.gz -C /usr/local/nginx/modules/

 

7. Extract openresty-1.9.7.3.tar.gz

tar -zxvf openresty-1.9.7.3.tar.gz -C /usr/local/src/

cd /usr/local/src/openresty-1.9.7.3/

8. Compile and install openresty

./configure --prefix=/usr/local/openresty --with-luajit && make && make install

 

9. Upload nginx and extract it

tar -zxvf nginx-1.8.1.tar.gz -C /usr/local/src/

cd /usr/local/src/nginx-1.8.1/

10. Compile nginx with support for the additional modules

./configure --prefix=/usr/local/nginx \
    --with-ld-opt="-Wl,-rpath,/usr/local/luajit/lib" \
    --add-module=/usr/local/nginx/modules/ngx_devel_kit-0.2.19 \
    --add-module=/usr/local/nginx/modules/lua-nginx-module-0.10.0 \
    --add-module=/usr/local/nginx/modules/set-misc-nginx-module-0.29 \
    --add-module=/usr/local/nginx/modules/echo-nginx-module-0.58

make -j2

make install

 

11. Edit the nginx configuration file

worker_processes  2;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format tick "$msec^A$remote_addr^A$u_domain^A$u_url^A$u_title^A$u_referrer^A$u_sh^A$u_sw^A$u_cd^A$u_lang^A$http_user_agent^A$u_utrace^A$u_account";
    
    access_log  logs/access.log  tick;

    sendfile        on;

    keepalive_timeout  65;

    server {
        listen       80;
        server_name  localhost;
        location /log.gif {
            # Pretend to be a GIF file
            default_type image/gif;
            # Turn off access_log for this location itself; logging is done through a subrequest
            access_log off;
        
            access_by_lua "
                -- The user-tracking cookie is named __utrace
                local uid = ngx.var.cookie___utrace
                if not uid then
                    -- If it is missing, generate a tracking cookie as md5(timestamp + IP + client info)
                    uid = ngx.md5(ngx.now() .. ngx.var.remote_addr .. (ngx.var.http_user_agent or ''))
                end
                ngx.header['Set-Cookie'] = {'__utrace=' .. uid .. '; path=/'}
                if ngx.var.arg_domain then
                    -- Record the log through a subrequest to /i-log, passing the parameters and the tracking cookie
                    ngx.location.capture('/i-log?' .. ngx.var.args .. '&utrace=' .. uid)
                end
            ";
        
            # Do not cache this request
            add_header Expires "Fri, 01 Jan 1980 00:00:00 GMT";
            add_header Pragma "no-cache";
            add_header Cache-Control "no-cache, max-age=0, must-revalidate";
        
            # Return an empty 1×1 GIF image
            empty_gif;
        }   
    
        location /i-log {
            # Internal location; external clients cannot access it directly
            internal;
        
            # Set the variables; note that the values must be unescaped
            set_unescape_uri $u_domain $arg_domain;
            set_unescape_uri $u_url $arg_url;
            set_unescape_uri $u_title $arg_title;
            set_unescape_uri $u_referrer $arg_referrer;
            set_unescape_uri $u_sh $arg_sh;
            set_unescape_uri $u_sw $arg_sw;
            set_unescape_uri $u_cd $arg_cd;
            set_unescape_uri $u_lang $arg_lang;
            set_unescape_uri $u_utrace $arg_utrace;
            set_unescape_uri $u_account $arg_account;
        
            # Enable logging of subrequests
            log_subrequest on;
            # Write the log to track.log using the tick format; in production it is best to add a buffer
            access_log /var/nginx_logs/track.log tick;
        
            # Output an empty string
            echo '';
        }
    }
}
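
The configuration describes the tracking id as md5(timestamp + IP + client information). For readers who want to see that scheme outside nginx, here is a minimal Node.js sketch of the same idea. `makeTraceId` and `resolveTraceId` are hypothetical names, and the timestamp formatting only approximates how Lua would render ngx.now(), so the digests are not expected to match nginx byte for byte.

```javascript
var crypto = require('crypto');

// Hypothetical helper mirroring the Lua logic: md5 over the concatenation of
// a timestamp in seconds, the client IP and the User-Agent string.
function makeTraceId(nowSeconds, remoteAddr, userAgent) {
    var input = String(nowSeconds) + remoteAddr + (userAgent || '');
    return crypto.createHash('md5').update(input, 'utf8').digest('hex');
}

// Reuse the cookie value if the client already has one, otherwise create it.
function resolveTraceId(cookieValue, nowSeconds, remoteAddr, userAgent) {
    return cookieValue || makeTraceId(nowSeconds, remoteAddr, userAgent);
}

var id = resolveTraceId(undefined, Date.now() / 1000, '192.168.154.2', 'Mozilla/5.0');
console.log(id); // 32 hexadecimal characters
```

Because the id is stored in a cookie and only generated when the cookie is absent, repeat visits from the same browser keep the same value, which is why both sample log records carry an identical id.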

Check the log:

1489718383.170^A192.168.154.2^Aspark2^Ahttp://spark2/^A\xE6\xA3\xA3\xE6\xA0\xAD\xE3\x80\x89^A^A768^A1366^A24^Azh-CN^AMozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36^A0f21f45cf2c1ba459e9812ee3de17d8a^Azx5352
1489718385.448^A192.168.154.2^Aspark2^Ahttp://spark2/content.html^A\xE5\x86\x85\xE5\xAE\xB9^Ahttp://spark2/^A768^A1366^A24^Azh-CN^AMozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36^A0f21f45cf2c1ba459e9812ee3de17d8a^Azx5352
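
Each record is a list of fields separated by the literal two-character sequence ^A, in the order given by the tick log format, and nginx writes non-ASCII bytes as \xHH escapes. The sketch below shows one possible way to split a line and decode those escapes in Node.js. `FIELDS`, `decodeEscapes` and `parseLine` are hypothetical names, the User-Agent in the sample is shortened, and the code assumes no field contains ^A itself.

```javascript
// Field order follows the tick log_format.
var FIELDS = ['msec', 'remote_addr', 'domain', 'url', 'title', 'referrer',
              'sh', 'sw', 'cd', 'lang', 'user_agent', 'utrace', 'account'];

// Turn nginx's \xHH escapes back into bytes and decode the result as UTF-8.
function decodeEscapes(value) {
    var bytes = [];
    for (var i = 0; i < value.length; i++) {
        var m = /^\\x([0-9A-Fa-f]{2})/.exec(value.slice(i, i + 4));
        if (m) {
            bytes.push(parseInt(m[1], 16));
            i += 3; // skip the rest of the 4-character escape
        } else {
            // Remaining characters are assumed to be plain ASCII.
            bytes.push(value.charCodeAt(i) & 0xff);
        }
    }
    return Buffer.from(bytes).toString('utf8');
}

// Split one log line on ^A and map the parts onto the field names.
function parseLine(line) {
    var parts = line.split('^A');
    var record = {};
    FIELDS.forEach(function (name, idx) {
        record[name] = decodeEscapes(parts[idx] || '');
    });
    return record;
}

var sample = '1489718385.448^A192.168.154.2^Aspark2^Ahttp://spark2/content.html^A' +
    '\\xE5\\x86\\x85\\xE5\\xAE\\xB9^Ahttp://spark2/^A768^A1366^A24^Azh-CN^A' +
    'Mozilla/5.0^A0f21f45cf2c1ba459e9812ee3de17d8a^Azx5352';
console.log(parseLine(sample));
```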

 

4: Log collection

The logstash configuration file:

input {
  file {
    type => "syslog"
    path => "/var/nginx_logs/track.log"
    discover_interval => 10
    start_position => "beginning" 
  }
    
}
output { stdout { codec => rubydebug } }

[root@spark3 logstash]# bin/logstash -f config/log.conf

The events logstash prints to the screen:

{
       "message" => "1489718383.170^A192.168.154.2^Aspark2^Ahttp://spark2/^A\\xE6\\xA3\\xA3\\xE6\\xA0\\xAD\\xE3\\x80\\x89^A^A768^A1366^A24^Azh-CN^AMozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36^A0f21f45cf2c1ba459e9812ee3de17d8a^Azx5352",
      "@version" => "1",
    "@timestamp" => "2017-03-17T03:12:34.380Z",
          "path" => "/var/nginx_logs/track.log",
          "host" => "spark3",
          "type" => "syslog"
}
{
       "message" => "1489718385.448^A192.168.154.2^Aspark2^Ahttp://spark2/content.html^A\\xE5\\x86\\x85\\xE5\\xAE\\xB9^Ahttp://spark2/^A768^A1366^A24^Azh-CN^AMozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36^A0f21f45cf2c1ba459e9812ee3de17d8a^Azx5352",
      "@version" => "1",
    "@timestamp" => "2017-03-17T03:12:34.906Z",
          "path" => "/var/nginx_logs/track.log",
          "host" => "spark3",
          "type" => "syslog"
}

 

  • You can use logstash filters to filter the logs and an output plugin to write them to Kafka, ES or another storage medium for later processing.
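
As a rough picture of what such a filter stage does, the sketch below is plain JavaScript rather than logstash configuration. It assumes a record has already been split into named fields; `isBot` and `toEvent` are hypothetical helpers, the bot pattern is only an example, and the second sample record is invented to show the filter dropping something.

```javascript
// Hypothetical filter: treat requests whose User-Agent looks like a crawler as noise.
function isBot(userAgent) {
    return /bot|spider|crawler/i.test(userAgent || '');
}

// Hypothetical transform: convert the $msec value (seconds with milliseconds)
// into an ISO 8601 timestamp and the screen sizes into numbers.
function toEvent(record) {
    var millis = Math.round(parseFloat(record.msec) * 1000);
    return {
        timestamp: new Date(millis).toISOString(),
        account: record.account,
        url: record.url,
        screen: { width: Number(record.sw), height: Number(record.sh) }
    };
}

var records = [
    { msec: '1489718383.170', account: 'zx5352', url: 'http://spark2/', sw: '1366', sh: '768',
      user_agent: 'Mozilla/5.0 (Windows NT 6.1; WOW64)' },
    { msec: '1489718390.000', account: 'zx5352', url: 'http://spark2/', sw: '0', sh: '0',
      user_agent: 'ExampleBot/1.0' }
];

var events = records
    .filter(function (r) { return !isBot(r.user_agent); })
    .map(toEvent);

// One JSON document per line, ready to hand to whatever output is used.
events.forEach(function (e) { console.log(JSON.stringify(e)); });
```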