MySQL Query Optimization: An Order-of-Magnitude Performance Improvement

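Lately I have been using Kettle for data extraction and reporting, so writing SQL has become routine. Queries of 200+ lines are common, and the longest one I have written ran to more than 500 lines, nearly 600. Most people would not even want to look at SQL that long, and it is painful to read, so I usually split such a query into several sub-queries, test each one, and then assemble them. The more SQL you write, the more likely you are to run into performance problems. SQL can be tuned through many small details, such as adding indexes, but that is no cure-all. Once a query reaches a certain size, optimizing its structure is what really solves the problem, provided of course that the necessary indexes have already been added and most of the local optimizations have already been taken care of.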

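As usual, a new requirement called for extracting data from about 10 tables. Most of them hold around 400,000 rows, and the largest about 1,000,000. Honestly, that is not much data, but after that many joins a poorly structured query can still be very slow. At first I wrote the SQL following my own line of thought, considering only the requirement and getting it done in the shortest possible time.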

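Part of the SQL is shown in the screenshot below; it already exceeds 200 lines:

The execution result is shown in the screenshot below:

The query returned only 38 rows yet took nearly 10 s, which already felt very slow.

Here is the rough structure of the SQL after I trimmed it down: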

 SELECT 
    *
FROM
    (SELECT 
        *
    FROM
        A m
    INNER JOIN B pm ON pm.id_sour = m.pk_id
    LEFT JOIN (SELECT 
        *
    FROM
        C
    WHERE
        is_bring IS NULL OR is_bring = 0
    GROUP BY id_m) pd ON m.pk_id = pd.id_m
    LEFT JOIN (SELECT 
        *
    FROM
        D sd
    INNER JOIN E si ON sd.id_ser = si.pk_id
    GROUP BY sd.id_m) sd ON m.pk_id = sd.id_m
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('') UNION ALL SELECT 
        *
    FROM
        F m
    INNER JOIN G pm ON pm.id_sour = m.pk_id
    LEFT JOIN (SELECT 
        *
    FROM
        H
    WHERE
        is_bring IS NULL OR is_bring = 0
    GROUP BY id_m) pd ON m.pk_id = pd.id_m
    LEFT JOIN (SELECT 
        *
    FROM
        I sd
    INNER JOIN E si ON sd.id_ser = si.pk_id
    GROUP BY sd.id_m) sd ON m.pk_id = sd.id_m
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('') UNION ALL SELECT 
            *
    FROM
        F m
    INNER JOIN G pm ON pm.id_sour = m.pk_id
    LEFT JOIN (SELECT 
        *
    FROM
        H
    WHERE
        is_bring IS NULL OR is_bring = 0
    GROUP BY id_m) pd ON m.pk_id = pd.id_m
    LEFT JOIN (SELECT 
        *
    FROM
        I sd
    INNER JOIN E si ON sd.id_ser = si.pk_id
    GROUP BY sd.id_m) sd ON m.pk_id = sd.id_m
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('')) t1
        LEFT JOIN
    (SELECT 
        *
    FROM
        J sb
    INNER JOIN (SELECT 
        m.pk_id AS pk_id, pm.m_time AS m_time
    FROM
        A m
    INNER JOIN B pm ON pm.id_sour = m.pk_id
    WHERE
        pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND pm.status = '' UNION ALL SELECT 
        m.from_mid_sn AS pk_id,
            pm.m_time AS m_time
    FROM
        F m
    INNER JOIN G pm ON pm.id_sour = m.pk_id
    WHERE
        pm.time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND pm.status = '') mp ON mp.pk_id = sb.id_sour
    WHERE
        sb.c_time <= mp.m_time
    GROUP BY sb.id_sour , mp.m_time) t2 ON t1.id_m = CAST(t2.id_sour AS CHAR)
        AND t1.m_time_cost = t2.m_time

Simplified further, the structure is:

SELECT 
    *
FROM
    (SELECT 
        *
    FROM
        A UNION ALL SELECT 
        *
    FROM
        B UNION ALL SELECT 
        *
    FROM
        C) t1
        LEFT JOIN
    (SELECT 
        *
    FROM
        D d
    INNER JOIN (SELECT 
        *
    FROM
        E UNION ALL SELECT 
        *
    FROM
        F) t2 ON d.id = t2.id) t3 ON t1.tid = t3.id

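Here A, B, C, D, E and F each stand for the result of joining several of the 10 tables. This SQL had in fact already received a simple optimization when we wrote it: t3 could have been moved into t1 and joined separately with A, B and C. But A, B and C each already join many tables, and joining t3 to each of them as well would produce even more intermediate data and be even slower.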
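To make the "join t3 once versus join it inside every branch" trade-off concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it runs anywhere, and the tables `a`, `b` and `t3` with their columns are hypothetical stand-ins, not the real schema. It only shows that the two shapes return the same rows; it says nothing about their relative speed on a real server.

```python
import sqlite3

# In-memory SQLite database; tables and data are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, tid INTEGER);
    CREATE TABLE b (id INTEGER, tid INTEGER);
    CREATE TABLE t3 (id INTEGER, cost INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (3, 10), (4, 30);
    INSERT INTO t3 VALUES (10, 100), (20, 200);
""")

# Shape 1: union the branches first, then join t3 once.
join_once = conn.execute("""
    SELECT t1.id, t3.cost
    FROM (SELECT * FROM a UNION ALL SELECT * FROM b) t1
    LEFT JOIN t3 ON t1.tid = t3.id
    ORDER BY t1.id
""").fetchall()

# Shape 2: join t3 inside every branch, then union the results.
join_per_branch = conn.execute("""
    SELECT id, cost FROM (
        SELECT a.id AS id, t3.cost AS cost FROM a LEFT JOIN t3 ON a.tid = t3.id
        UNION ALL
        SELECT b.id AS id, t3.cost AS cost FROM b LEFT JOIN t3 ON b.tid = t3.id
    )
    ORDER BY id
""").fetchall()

print(join_once == join_per_branch)  # the two shapes agree on this data
```

In shape 2 the join against `t3` is written and executed once per branch, which is the extra work the paragraph above wanted to avoid.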

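Since this is data extraction and the data is simply stored in a designated fact table, the efficiency requirements are not high; anything within a minute is acceptable. I was going to leave it at that, as I had a pile of other work to do. But I happened to have a piece of SQL with similar, though not identical, logic, so I ran it. It turned out to be an order of magnitude faster than mine. Surprised, I decided to look into why.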

The trimmed-down optimized SQL is as follows:

SELECT 
    *
FROM
    (SELECT 
        *
    FROM
        A m
    INNER JOIN (SELECT * FROM B where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('') UNION ALL SELECT 
        *
    FROM
        F m
    INNER JOIN (SELECT * FROM G where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('') UNION ALL SELECT 
        *
    FROM
        F m
    INNER JOIN (SELECT * FROM G where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('')) mm
        LEFT JOIN
    (SELECT 
        *
    FROM
        J sb
    INNER JOIN (SELECT 
        m.pk_id AS pk_id, pm.m_time AS m_time
    FROM
        A m
    INNER JOIN B pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND m.type IN ('')
            AND m.is_del = 0
            AND m.is_mig = 0
            AND pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59' UNION ALL SELECT 
        m.from_mid_sn AS pk_id,
            pm.m_time AS m_time
    FROM
        F m
    INNER JOIN (SELECT * FROM G where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND m.type IN ('')
            AND m.is_del = 0
            AND m.is_mig = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') mp ON mp.pk_id = sb.id_sour
    WHERE
        sb.c_time <= mp.m_time
    GROUP BY sb.id_sour , mp.m_time) cost ON cost.id_sour = mm.id_m
        AND cost.m_time = mm.m_time_cost
        LEFT JOIN
    (SELECT 
        *
    FROM
        D sd
    INNER JOIN E si ON sd.id_ser = si.pk_id
    INNER JOIN (SELECT DISTINCT
        *
    FROM
        A m
    INNER JOIN B pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND m.type IN ('')
            AND m.is_del = 0
            AND m.is_mig = 0
            AND pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') ms ON sd.id_m = ms.pk_id
    GROUP BY sd.id_m UNION ALL SELECT 
        *
    FROM
        I sd
    INNER JOIN E si ON sd.id_ser = si.pk_id
    INNER JOIN (SELECT DISTINCT
        m.pk_id, from_mid_sn, pm.m_time
    FROM
        F m
    INNER JOIN G pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND m.type IN ('')
            AND m.is_del = 0
            AND m.is_mig = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') ms ON sd.id_m = ms.pk_id
    GROUP BY sd.id_m) ser ON ser.id_m = mm.id_m
        AND ser.m_time = mm.m_time_cost
        LEFT JOIN
    (SELECT 
        *
    FROM
        C pd
    INNER JOIN (SELECT DISTINCT
        m.pk_id, pm.m_time
    FROM
        A m
    INNER JOIN B pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND m.type IN ('')
            AND m.is_del = 0
            AND m.is_mig = 0
            AND pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') ms ON ms.pk_id = pd.id_m
    WHERE
        is_bring IS NULL OR is_bring = 0
    GROUP BY pd.id_m , ms.m_time UNION ALL SELECT 
        *
    FROM
        H pd
    INNER JOIN (SELECT DISTINCT
        m.pk_id, pm.m_time, from_mid_sn
    FROM
        F m
    INNER JOIN G pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND m.type IN ('')
            AND m.is_del = 0
            AND m.is_mig = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') ms ON ms.pk_id = pd.id_m
    WHERE
        is_bring IS NULL OR is_bring = 0
    GROUP BY pd.id_m) part ON part.id_m = mm.id_m
        AND part.m_time = mm.m_time_cost

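Running this code gives the following result:

Same result, and a full order of magnitude faster. The efficient SQL I had used as a reference was written by a colleague who is known around my company as the SQL goddess, and the reputation is well deserved. I am impressed, and I want to learn from it.

A close look at the optimized SQL shows that it cleverly exploits a certain regularity, which I call the SQL distributive and associative laws.

The leftmost sub-query (or temporary table, mm) is: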

SELECT 
        *
    FROM
        A m
    INNER JOIN (SELECT * FROM B where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('') UNION ALL SELECT 
        *
    FROM
        F m
    INNER JOIN (SELECT * FROM G where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('') UNION ALL SELECT 
        *
    FROM
        F m
    INNER JOIN (SELECT * FROM G where is_del = 0 AND m_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59') pm ON pm.id_sour = m.pk_id
    WHERE
        pm.status = ''
            AND pm.is_del = 0
            AND pm.s_time BETWEEN '2017-05-25 00:00:00' AND '2017-05-25 23:59:59'
            AND m.type IN ('')

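The 38 rows of the result are in fact already determined by this sub-query, so the LEFT JOINs and INNER JOINs that follow operate on far fewer rows and are naturally fast. In the pre-optimization version, each of these sub-queries also joined the same set of tables. Now that set of identical tables is pulled out to the outermost level and joined once. This is the SQL associative law at work.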
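One way to read this restructuring is "narrow the key set first, then aggregate only the detail rows that can match". The sketch below illustrates that idea under stated assumptions: it runs on SQLite through Python's sqlite3 module instead of MySQL, and the tables `main` and `detail`, their columns, and the row counts are all hypothetical. It compares how many groups each derived table logically has to build; how much time that saves depends on the engine and on whether it materializes the derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main (pk_id INTEGER PRIMARY KEY, m_day TEXT);
    CREATE TABLE detail (id_m INTEGER, amount INTEGER);
""")
# 1000 main rows, only 5 of which fall on the day of interest.
conn.executemany(
    "INSERT INTO main VALUES (?, ?)",
    [(i, "2017-05-25" if i <= 5 else "2017-05-24") for i in range(1, 1001)],
)
# Three detail rows per main row.
conn.executemany(
    "INSERT INTO detail VALUES (?, ?)",
    [(i, k) for i in range(1, 1001) for k in (1, 2, 3)],
)

# Before: aggregate the whole detail table, then join to the filtered rows.
aggregate_all = """
    SELECT m.pk_id, pd.total
    FROM main m
    LEFT JOIN (SELECT id_m, SUM(amount) AS total
               FROM detail GROUP BY id_m) pd ON pd.id_m = m.pk_id
    WHERE m.m_day = '2017-05-25'
    ORDER BY m.pk_id
"""

# After: restrict detail to the filtered keys first, then aggregate.
aggregate_filtered = """
    SELECT m.pk_id, pd.total
    FROM main m
    LEFT JOIN (SELECT d.id_m, SUM(d.amount) AS total
               FROM detail d
               INNER JOIN (SELECT pk_id FROM main
                           WHERE m_day = '2017-05-25') ms ON ms.pk_id = d.id_m
               GROUP BY d.id_m) pd ON pd.id_m = m.pk_id
    WHERE m.m_day = '2017-05-25'
    ORDER BY m.pk_id
"""

rows_before = conn.execute(aggregate_all).fetchall()
rows_after = conn.execute(aggregate_filtered).fetchall()

# Number of groups each derived table logically contains.
groups_before = conn.execute(
    "SELECT COUNT(*) FROM (SELECT id_m FROM detail GROUP BY id_m)"
).fetchone()[0]
groups_after = conn.execute(
    """SELECT COUNT(*) FROM (
           SELECT d.id_m FROM detail d
           INNER JOIN main m ON m.pk_id = d.id_m
           WHERE m.m_day = '2017-05-25'
           GROUP BY d.id_m)"""
).fetchone()[0]

print(rows_before == rows_after, groups_before, groups_after)
```

Both queries return the same rows, but the first derived table contains one group per key in `detail`, while the second contains only the groups for the keys that survive the date filter. This mirrors the `ms` derived tables in the optimized SQL, which are joined to C, H, D and I before the GROUP BY.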

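Please credit the source when reposting.

Summary: when a query is large, a good structure can greatly improve execution efficiency. SQL optimization is also not achieved in one step; it is a gradual process of repeated experimentation. The SQL above is not necessarily optimal, and this post does not cover best practices for the details of SQL syntax. For those, see the link below.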

https://dev.mysql.com/doc/refman/5.7/en/optimization.html
