openresty 日誌輸出的處理

時間 2019-12-13

標籤 openresty 日誌輸出處理简体版

原文原文鏈接

最近出了個故障，有個接口的請求竟然出現了長達幾十秒的處理時間，因爲日誌缺少，網絡故障也解除了，就無法再重現這個故障了。爲了能夠在下次出現問題的時候能追查到問題，因此須要添加一些追蹤日誌。
添加這些追蹤日誌，我但願可以達到以下幾點：nginx

一、只有請求超過必定時間才記錄，否則請求太多，系統扛不住api

二、添加的代碼能夠儘可能的少網絡

三、對接口的影響儘可能小，好比不影響實際時延，甚至記錄日誌時出現了錯誤，也不影響系統正常運行併發

openresty這套工具，能夠在nginx處理請求的每個階段介入，編寫代碼進行邏輯處理。其可介入的流程以下圖：
函數

log Phase這個階段，就是openresty能處理的最後階段。到這個階段的時候，實際上請求的響應已經發送給客戶端了。因此使用 log_by_lua (知乎真特麼蛋疼啊，左右下劃線就自動斜體，還沒提供轉義功能) 工具

log Phase這個階段，就是openresty能處理的最後階段。到這個階段的時候，實際上請求的響應已經發送給客戶端了。另外我也測試過了，即便在這個階段發生了錯誤，如 io 錯誤，也不會影響接口的正常響應，因此使用 log_by_lua 非常符合需求。測試

好處不止如此， log_by_lua是一個請求的最後處理階段，那麼只要請求正常進行，好比會走到這一步，所以，在這一步，咱們就知道了這個請求的耗時了。另外，則是咱們的代碼裏有很多的 ngx.exit ，若是是在業務邏輯處理的時候就記錄日誌，那麼每一個出現 ngx.exit 的地方，都須要插入寫日誌到硬盤的操做，大大增長了代碼量。ui

寫日誌到硬盤的這一步操做，能夠在 log_by_lua 這個階段來完成，剩下的另外一個問題就是每一步記錄的日誌如何傳遞到 log_by_lua 這一階段來了。lua

我處理的方式是使用ngx.ctx, 每個請求，都會有本身獨立的 ngx.ctx, 這個 ngx.ctx 會貫穿整個請求的始終，簡單的log函數以下：spa

logger.lua
--------------------------
local _M = {}

function _M.log(format, ...)
   if ngx.ctx.log_slot == nil then
        ngx.ctx.log_slot = {}
   end
   arg = {...}
   local logstr = ""
   if arg == nil then
       logstr = format
   else
       logstr = string.format(format, unpack(arg))
   end
   logstr = logstr .. "\t" .. ngx.now()
   table.insert(ngx.ctx.log_slot, logstr)
end

return _M

到了 log_by_lua 階段要把追蹤日誌寫入到硬盤裏，處理代碼以下：

log_slot.lua
---------------------
local request_time = ngx.var.request_time
if request_time < 1 then
    return  --- 小於1秒的請求不記錄
end
local slot = ngx.ctx.log_slot
if slot == nil or type(slot) ~= "table" then
    return
end
local logs = table.concat(slot, "\n")
local f = assert(io.open("/logs/trace", "a"))
f:write(logs .. "\n")
f:close()

log_by_lua 能夠用在 http 模塊，也能夠用在server模塊，也能直接精確到location模塊，即只到某個請求。因此你能夠在nginx.conf 裏的http裏添加:

http{
    log_by_lua_file '/code/log_slot.lua';
}

也能夠在server的配置裏添加：

server {
    log_by_lua_file '/code/log_slot.lua';
}

更能直接在某個接口裏添加：

/v1/test {
    content_by_lua_file '/code/v1/test.lua';
    log_by_lua_file '/code/log_slot.lua';
}

http裏添加，則對全部的server； server裏添加，則只針對此server；location裏添加，就只針對這個接口。

可是，比較坑爹的是，log_by_lua 不像 access log，能夠多層級使用。log_by_lua 在某層使用了以後，上層的 log_by_lua 就對此一層無效了。好比 /v1/test 接口添加了 log_by_lua，那麼 http 或者 server 裏添加的 log_by_lua 在接受/v1/test接口的請求時都不會被用到。

正是由於這個坑，浪費了我很多的時間來解決。咱們的系統裏，http 模塊是配置了 log_by_lua 的，用來作接口監控，監控返回的錯誤碼，處理的時延等。若是我在 /v1/test 裏添加了只針對 /v1/test 的追蹤日誌，那麼接口監控就沒法正常運行了。

不過天無絕人之路，我想到了一個處理方法以下：

monitor_log.lua
---------------------
local _M = {}

function _M.monitor_log()
    local f = _M.api_monitor_log_func
    if f == nil then
        f, err = loadfile("/code/monitor.lua")
        if f == nil then
            ngx.log(ngx.ERR, "/code/monitor.lua, ", err)
            --- 若是不存在接口監控，直接給一個空函數
            f = function() end
        end
        _M.api_monitor_log_func = f
    end
    local status, err = pcall(f)
    if not status then
        ngx.log(ngx.ERR, "run api monitor /code/monitor.lua failed", err)
    end
end

return _M

修改log_slot.lua代碼以下：

local logger = require "code.monitor"
local request_time = ngx.var.request_time
logger.monitor_log()
if request_time < 1 then
    return  --- 小於1秒的請求不記錄
end
local slot = ngx.ctx.log_slot
if slot == nil or type(slot) ~= "table" then
    return
end
local logs = table.concat(slot, "\n")
local f = assert(io.open("/logs/trace", "a"))
f:write(logs .. "\n")
f:close()

如上，就能夠進行其餘層級的 log_by_lua 代碼運行了，皆大歡喜，問題解決了。當系統併發請求較低的時候，worker夠用，則使用 log_by_lua 能夠說是毫無壞處。固然，一旦 log_by_lua 出現了故障，如死循環，則會長時間佔用worker，形成整個系統崩潰掉。

相關標籤/搜索

每日一句

每一个你不满意的现在，都有一个你没有努力的曾经。