Statistical methods compute various indicators over the time-series data (mean, variance, divergence, kurtosis, and so on) and raise alerts when manually tuned thresholds are crossed. Historical data can also be brought in through period-over-period and same-period-last-cycle comparison strategies, again alerting on manually set thresholds.
By building different statistical indicators — changes in the window mean, changes in the window variance, and so on — the anomalies in panels (1), (2), and (5) of the figure below can be detected well; local extrema pick out the spikes in panel (4); and a time-series forecasting model captures the trends in panels (3) and (6), flagging the points that break the pattern.
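The window-statistic idea can be sketched in a few lines of Python. The function name, window size, and threshold below are illustrative choices for a plain numeric series, not part of any SLS implementation:

```python
import numpy as np

def rolling_zscore_anomalies(series, window=10, threshold=3.0):
    """Flag points deviating from the rolling-window mean by more than
    `threshold` rolling standard deviations."""
    series = np.asarray(series, dtype=float)
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        win = series[i - window:i]
        mu, sigma = win.mean(), win.std()
        if sigma == 0.0:
            flags[i] = series[i] != mu      # any jump off a perfectly flat window
        else:
            flags[i] = abs(series[i] - mu) > threshold * sigma
    return flags

# A flat series with one injected spike: only the spike should be flagged.
data = [10.0] * 30
data[20] = 100.0
print(np.where(rolling_zscore_anomalies(data))[0])  # → [20]
```

The same loop structure works for a rolling-variance indicator: compare each window's variance against the variance of the preceding window instead of comparing a point against the window mean.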
How do we identify an anomaly?
PS:
What an unsupervised method is: whether learning is supervised depends on whether the data being modeled carries labels. If the input data is labeled, it is supervised learning; if not, it is unsupervised learning.
Why introduce unsupervised methods: in the early stage of building a monitoring system, user feedback is scarce and precious. To establish reliable monitoring strategies quickly in the absence of that feedback, unsupervised methods are introduced.
For single-dimension metrics
iForest (Isolation Forest) is an ensemble-based anomaly detection method.
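As a minimal illustration of the iForest idea — using scikit-learn's IsolationForest rather than the SLS implementation, on synthetic 2-D data — points that are easy to isolate with few random splits get flagged:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
# 200 normal points clustered around the origin, plus three distant outliers.
normal = rng.normal(0, 1, size=(200, 2))
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([normal, outliers])

# contamination is the expected anomaly fraction; fit_predict returns
# -1 for anomalies and 1 for normal points.
clf = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
labels = clf.fit_predict(X)
print(np.where(labels == -1)[0])
```

Because the outliers sit far from the cluster, they are isolated after very few splits and receive the lowest anomaly scores, so all three appear in the flagged set.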
A few notes
Paper: "Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications" (WWW 2018)
Labeling anomalies is itself a complicated business.
Commonly used supervised machine learning methods
Time-series analysis
Pattern analysis
Intelligent clustering of massive volumes of text
The concrete SQL logic is as follows:
* |
select time, buffer_cnt, log_cnt, buffer_rate, failed_cnt, first_play_cnt, fail_rate
from (
    select
        date_trunc('minute', time) as time,
        sum(buffer_cnt) as buffer_cnt,
        sum(log_cnt) as log_cnt,
        case when is_nan(sum(buffer_cnt)*1.0 / sum(log_cnt)) then 0.0
             else sum(buffer_cnt)*1.0 / sum(log_cnt) end as buffer_rate,
        sum(failed_cnt) as failed_cnt,
        sum(first_play_cnt) as first_play_cnt,
        case when is_nan(sum(failed_cnt)*1.0 / sum(first_play_cnt)) then 0.0
             else sum(failed_cnt)*1.0 / sum(first_play_cnt) end as fail_rate
    from log
    group by time
    order by time
)
limit 100000
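For prototyping outside SLS, the per-minute aggregation with its is_nan guard can be approximated in pandas. The DataFrame below is a hypothetical stand-in for the `log` source, with the same column names as the query:

```python
import pandas as pd

# Hypothetical raw rows standing in for the `log` source.
df = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 00:00:10",
                            "2023-01-01 00:00:40",
                            "2023-01-01 00:01:05"]),
    "buffer_cnt": [2, 0, 0],
    "log_cnt": [10, 10, 0],
    "failed_cnt": [1, 0, 0],
    "first_play_cnt": [5, 5, 0],
})

# date_trunc('minute', time) + group by + sum.
agg = df.groupby(df["time"].dt.floor("min")).sum(numeric_only=True)
# Mirror the SQL is_nan guard: a 0/0 ratio becomes 0.0 instead of NaN.
agg["buffer_rate"] = (agg["buffer_cnt"] / agg["log_cnt"]).fillna(0.0)
agg["fail_rate"] = (agg["failed_cnt"] / agg["first_play_cnt"]).fillna(0.0)
print(agg[["buffer_rate", "fail_rate"]])
```

In the second minute all counts are zero, so both ratios are 0/0; `fillna(0.0)` plays the role of the SQL `case when is_nan(...) then 0.0` branch.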
The concrete SQL logic is as follows:
* |
select
    time,
    log_cnt_cmp[1] as log_cnt_now,
    log_cnt_cmp[2] as log_cnt_old,
    case when is_nan(buffer_rate_cmp[1]) then 0.0 else buffer_rate_cmp[1] end as buf_rate_now,
    case when is_nan(buffer_rate_cmp[2]) then 0.0 else buffer_rate_cmp[2] end as buf_rate_old,
    case when is_nan(fail_rate_cmp[1]) then 0.0 else fail_rate_cmp[1] end as fail_rate_now,
    case when is_nan(fail_rate_cmp[2]) then 0.0 else fail_rate_cmp[2] end as fail_rate_old
from (
    select
        time,
        ts_compare(log_cnt, 86400) as log_cnt_cmp,
        ts_compare(buffer_rate, 86400) as buffer_rate_cmp,
        ts_compare(fail_rate, 86400) as fail_rate_cmp
    from (
        select
            date_trunc('minute', time - time % 120) as time,
            sum(buffer_cnt) as buffer_cnt,
            sum(log_cnt) as log_cnt,
            sum(buffer_cnt)*1.0 / sum(log_cnt) as buffer_rate,
            sum(failed_cnt) as failed_cnt,
            sum(first_play_cnt) as first_play_cnt,
            sum(failed_cnt)*1.0 / sum(first_play_cnt) as fail_rate
        from log
        group by time
        order by time
    )
    group by time
)
where time is not null
limit 1000000
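The key function here is ts_compare(x, 86400), which pairs each point with its value 86400 seconds (one day) earlier. A rough pandas equivalent of that pairing, assuming the regular 2-minute grid produced by the inner query (values are illustrative):

```python
import pandas as pd

# A regular 2-minute series spanning two days.
idx = pd.date_range("2023-01-01", periods=1440, freq="2min")
s = pd.Series(range(1, 1441), index=idx, dtype=float)

# 86400 s = 720 two-minute steps: align each point with the same
# time of day yesterday, like ts_compare's [current, previous] pairs.
cmp_df = pd.DataFrame({"now": s, "old": s.shift(720)}).dropna()
cmp_df["diff"] = cmp_df["now"] - cmp_df["old"]
print(cmp_df.head(3))
```

The first day has no "yesterday" to compare against, which is why the SQL wraps the result in a `where time is not null` filter; `dropna()` plays the same role here.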
The concrete SQL logic is as follows:
* |
select
    time,
    case when is_nan(buffer_rate) then 0.0 else buffer_rate end as show_index,
    isp as index
from (
    select
        date_trunc('minute', time) as time,
        sum(buffer_cnt)*1.0 / sum(log_cnt) as buffer_rate,
        sum(failed_cnt)*1.0 / sum(first_play_cnt) as fail_rate,
        sum(log_cnt) as log_cnt,
        sum(failed_cnt) as failed_cnt,
        sum(first_play_cnt) as first_play_cnt,
        isp
    from log
    group by time, isp
    order by time
)
limit 200000
* |
select res.name
from (
    select ts_anomaly_filter(province, res[1], res[2], res[3], res[6], 100, 0) as res
    from (
        select
            t1.province as province,
            array_transpose(ts_predicate_arma(t1.time, t1.show_index, 5, 1, 1)) as res
        from (
            select
                province,
                time,
                case when is_nan(buffer_rate) then 0.0 else buffer_rate end as show_index
            from (
                select
                    province,
                    time,
                    sum(buffer_cnt)*1.0 / sum(log_cnt) as buffer_rate,
                    sum(failed_cnt)*1.0 / sum(first_play_cnt) as fail_rate,
                    sum(log_cnt) as log_cnt,
                    sum(failed_cnt) as failed_cnt,
                    sum(first_play_cnt) as first_play_cnt
                from log
                group by province, time
            )
        ) t1
        inner join (
            select DISTINCT province
            from (
                select province, time, sum(log_cnt) as total
                from log
                group by province, time
            )
            where total > 200
        ) t2 on t1.province = t2.province
        group by t1.province
    )
)
limit 100000
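ts_predicate_arma fits a time-series model and scores each point by how far it falls from the model's prediction. A simplified sketch of that prediction-residual idea, substituting a moving-average forecast for the full ARMA fit (synthetic data; window and threshold are illustrative):

```python
import numpy as np

def forecast_residual_anomalies(y, window=12, k=4.0):
    """Forecast each point as the mean of the preceding `window` points
    and flag points whose residual exceeds k residual standard deviations."""
    y = np.asarray(y, dtype=float)
    preds = np.array([y[i - window:i].mean() for i in range(window, len(y))])
    resid = y[window:] - preds
    thresh = k * resid.std()
    return np.where(np.abs(resid) > thresh)[0] + window

# A noisy level series with one injected anomaly.
rng = np.random.RandomState(0)
y = 10.0 + rng.normal(0, 0.5, 200)
y[150] += 8.0
print(forecast_residual_anomalies(y))
```

A full ARMA model would additionally capture trend and autocorrelation, which is what lets the SQL version detect points that "break the pattern" rather than only large level jumps.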
For a detailed walkthrough of this SQL's analysis logic, see the earlier article: SLS機器學習最佳實戰:批量時序異常檢測 (SLS Machine Learning Best Practices: Batch Time-Series Anomaly Detection).
This article is original content from the Yunqi Community (雲棲社區) and may not be reproduced without permission.