[Oracle] Diagnosing and Resolving High CPU Utilization in an Oracle Database

Oracle databases frequently run into very high CPU utilization. In most cases the cause is one or more badly performing SQL statements that consume large amounts of CPU and drag down the whole system. The reasons a statement performs badly vary, and each case has to be analyzed on its own terms. The real-world case below shows how to diagnose and resolve this kind of problem.

Database: Oracle 9.2.0.4

Problem description: the on-site engineer reported that the database was extremely slow and that almost no application operation could complete normally.

First, log in to the host and run top. CPU was almost completely exhausted and many processes were each using a lot of CPU, while memory and I/O load were not high:

    last pid: 26136;  load averages: 8.89, 8.91, 8.12
    216 processes: 204 sleeping, 8 running, 4 on cpu
    CPU states: 0.6% idle, 97.3% user, 1.8% kernel, 0.2% iowait, 0.0% swap
    Memory: 8192M real, 1166M free, 14M swap in use, 8179M swap free

      PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
    25725 oracle     1  50    0 4550M 4508M cpu2   12:23 11.23% oracle
    25774 oracle     1  41    0 4550M 4508M run    14:25 10.66% oracle
    26016 oracle     1  31    0 4550M 4508M run     5:41 10.37% oracle
    26010 oracle     1  41    0 4550M 4508M run     4:40  9.81% oracle
    26014 oracle     1  51    0 4550M 4506M cpu6    4:19  9.76% oracle
    25873 oracle     1  41    0 4550M 4508M run    12:10  9.45% oracle
    25723 oracle     1  50    0 4550M 4508M run    15:09  9.40% oracle
    26121 oracle     1  41    0 4550M 4506M cpu0    1:13  9.28% oracle

Next, check the database alert log. It contained no errors and showed the database running normally, which rules out a problem with the database instance itself.

Then find out what the Oracle processes with high CPU usage are actually doing, using the following SQL statement:

    select sql_text, spid, v$session.program, process
    from v$sqlarea, v$session, v$process
    where v$sqlarea.address = v$session.sql_address
    and v$sqlarea.hash_value = v$session.sql_hash_value
    and v$session.paddr = v$process.addr
    and v$process.spid in (PID);

Replace PID in the script with the PIDs of the high-CPU processes shown by top to get the SQL statement each of those Oracle processes is executing. All of the high-CPU processes turned out to be executing the same statement:

    SELECT d.domainname, d.mswitchdomainid, a.SERVICEID, a.SERVICECODE, a.USERTYPE,
           a.STATUS, a.NOTIFYSTATUS,
           to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED,
           VIPFLAG, STATUS2, CUSTOMERTYPE, CUSTOMERID
    FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d
    WHERE b.mswitchdomainid = d.mswitchdomainid
    AND b.gatewaysn = c.gatewaysn
    AND a.ServiceCode like c.code||'%'
    AND a.serviceSpecID=1
    AND a.status!='4'
    AND a.status!='10'
    AND a.servicecode like '010987654321%'
    AND SubsidiaryID=999999999
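The PID substitution step described above can be sketched in Python. This is only an illustrative helper, not part of the original case: the function names `high_cpu_oracle_pids` and `spid_in_list`, the 11-column row layout, and the CPU threshold are all assumptions made for this example. It picks the oracle processes above a CPU threshold out of top output and formats them as a quoted list for the `v$process.spid in (...)` predicate (quoted because SPID is a character column).

```python
# Sample rows copied from the top output in this article.
SAMPLE_TOP = """\
last pid: 26136; load averages: 8.89, 8.91, 8.12
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
25725 oracle 1 50 0 4550M 4508M cpu2 12:23 11.23% oracle
25774 oracle 1 41 0 4550M 4508M run 14:25 10.66% oracle
26016 oracle 1 31 0 4550M 4508M run 5:41 10.37% oracle
26010 oracle 1 41 0 4550M 4508M run 4:40 9.81% oracle
"""


def high_cpu_oracle_pids(top_output, min_cpu=9.0):
    """Return PIDs of 'oracle' processes whose CPU% is at least min_cpu.

    Assumes the 11-column process layout shown in this article:
    PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
    """
    pids = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip summary lines and the header row.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        pid, cpu, command = fields[0], fields[9], fields[10]
        if command != "oracle" or not cpu.endswith("%"):
            continue
        if float(cpu[:-1]) >= min_cpu:
            pids.append(int(pid))
    return pids


def spid_in_list(pids):
    """Format PIDs as a quoted, comma-separated list for an IN (...) clause."""
    return ", ".join("'%d'" % pid for pid in pids)


if __name__ == "__main__":
    print(spid_in_list(high_cpu_oracle_pids(SAMPLE_TOP, min_cpu=10.0)))
```

The printed list can be pasted in place of PID in the diagnostic query. Real top output differs between platforms and versions, so the column positions would need checking before relying on a parser like this.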
It is almost certain that this SQL statement is what is consuming so much of the system's CPU. But why does it consume so much? Start by looking at what the database sessions are waiting on:

    SQL> select sid,event,p1,p1text from v$session_wait;

           SID EVENT                  P1 P1TEXT
    ---------- -------------- ---------- ----------
            12 latch free     4.3982E+12 address
            36 latch free     4.3982E+12 address
            37 latch free     4.3982E+12 address
            84 latch free     4.3982E+12 address
           102 latch free     4.3982E+12 address
           101 latch free     4.3982E+12 address
            85 latch free     4.3982E+12 address
           106 latch free     4.3982E+12 address
           155 latch free     4.3982E+12 address
           151 latch free     4.3982E+12 address
           149 latch free     4.3982E+12 address
           147 latch free     4.3982E+12 address
             1 pmon timer            300 duration

Most of the waits are latch free events. Next, check which processes are producing these latch waits:

    SQL> select spid from v$process where addr in
         (select paddr from v$session where sid in(84,102,101,106,155,151));

    SPID
    ------------
    25774
    26010
    25873
    25725

These are the same processes that top showed with high CPU usage. The sessions running the SQL statement above are all waiting on latch free, and because a session spins on the CPU while it tries to acquire a latch, this contention consumes a large amount of CPU. To see which type of latch accounts for most of the waiting, run the following SQL statement:

    SQL> SELECT latch#, name, gets, misses, sleeps
         FROM v$latch
         WHERE sleeps>0
         ORDER BY sleeps;

        LATCH# NAME                                 GETS     MISSES     SLEEPS
    ---------- ---------------------------- ------------ ---------- ----------
            15 messages                            96876         20          1
           159 library cache pin allocation       407322         43          1
           132 dml lock allocation                194533        213          2
             4 session allocation                 304897         48          3
           115 redo allocation                    238031        286          4
            17 enqueue hash chains                277510         85          5
             7 session idle bit                  2727264        314         16
           158 library cache pin                 3881788       5586         58
           156 shared pool                       2771629       6184        662
           157 library cache                     5637573      25246        801
            98 cache buffers chains           1722750424     758400     109837

The dominant latch wait is cache buffers chains. Waits on this latch indicate contention for individual hot blocks in the buffer cache. Look at the child latches of this latch and their statistics:

    SQL> SELECT addr, latch#, gets, misses, sleeps
         FROM v$latch_children
         WHERE sleeps>0
         and latch# = 98
         ORDER BY sleeps desc;

    ADDR                 LATCH#       GETS     MISSES     SLEEPS
    ---------------- ---------- ---------- ---------- ----------
    000004000A3DFD10         98   10840661      82891        389
    000004000A698C70         98     159510          2        244
    0000040009B21738         98  104269771      34926        209
    0000040009B227A8         98  107604659      35697        185
    000004000A3E0D70         98    5447601      18922        156
    000004000A6C2BD0         98     853375          7        134
    0000040009B24888         98   85538409      25752        106
    ...

Next, find out which objects the child latches with the most sleeps protect (x$bh can only be queried when connected as SYS):

    SQL> select distinct a.owner,a.segment_name,a.segment_type from
         dba_extents a,
         (select dbarfil,dbablk
          from x$bh
          where hladdr in
            (select addr
             from (select addr
                   from v$latch_children
                   order by sleeps desc)
             where rownum < 5)) b
         where a.RELATIVE_FNO = b.dbarfil
         and a.BLOCK_ID <= b.dbablk and a.block_id + a.blocks > b.dbablk;

    OWNER   SEGMENT_NAME               SEGMENT_TYPE
    ------- -------------------------- ------------
    TEST    I_SERVICE_SERVICESPECID    INDEX
    TEST    I_SERVICE_SUBSIDIARYID     INDEX
    TEST    SERVICE                    TABLE
    TEST    MSWITCHDOMAIN              TABLE
    TEST    I_SERVICE_SC_S             INDEX
    ...

Several objects used by the SQL statement found earlier appear in this list, so look at that statement's execution plan:

    SQL> set autotrace trace explain
    SQL> SELECT d.domainname,d.mswitchdomainid, a.SERVICEID,a.SERVICECODE,a.USERTYPE,a.STATUS,a.NOTIFYSTATUS,to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED,VIPFLAG,STATUS2,CUSTOMERTYPE,CUSTOMERID FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10' and a.servicecode like '010987654321%' and SubsidiaryID=999999999;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   NESTED LOOPS
       2    1     NESTED LOOPS
       3    2       NESTED LOOPS
       4    3         TABLE ACCESS (FULL) OF 'SUBBUREAUNUMBER'
       5    3         TABLE ACCESS (BY INDEX ROWID) OF 'GATEWAYLOC'
       6    5           INDEX (UNIQUE SCAN) OF 'PK_GATEWAYLOC' (UNIQUE)
       7    2       TABLE ACCESS (BY INDEX ROWID) OF 'MSWITCHDOMAIN'
       8    7         INDEX (UNIQUE SCAN) OF 'PK_MSWITCHDOMAIN' (UNIQUE)
       9    1     TABLE ACCESS (BY INDEX ROWID) OF 'SERVICE'
      10    9       AND-EQUAL
      11   10         INDEX (RANGE SCAN) OF 'I_SERVICE_SERVICESPECID' (NON-UNIQUE)
      12   10         INDEX (RANGE SCAN) OF 'I_SERVICE_SUBSIDIARYID' (NON-UNIQUE)

Putting together the objects found behind the latch free waits and this execution plan, the indexes on the SERVICE table look suspect: the plan reaches SERVICE through an AND-EQUAL of two single-column indexes and appears to scan far too much. To check, run the same SQL statement against the equivalent database in another city and look at its execution plan:

    SQL> set autotrace trace explain
    SQL> SELECT d.domainname,d.mswitchdomainid, a.SERVICEID,a.SERVICECODE,a.USERTYPE,a.STATUS,a.NOTIFYSTATUS,to_char(a.DATECREATED,'yyyy-mm-dd hh24:mi:ss') DATECREATED,VIPFLAG,STATUS2,CUSTOMERTYPE,CUSTOMERID FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn AND a.ServiceCode like c.code||'%' and a.serviceSpecID=1 and a.status!='4' and a.status!='10' and a.servicecode like '010987654321%' and SubsidiaryID=999999999;

    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   TABLE ACCESS (BY INDEX ROWID) OF 'SERVICE'
       2    1     NESTED LOOPS
       3    2       NESTED LOOPS
       4    3         NESTED LOOPS
       5    4           TABLE ACCESS (FULL) OF 'SUBBUREAUNUMBER'
       6    4           TABLE ACCESS (BY INDEX ROWID) OF 'GATEWAYLOC'
       7    6             INDEX (UNIQUE SCAN) OF 'PK_GATEWAYLOC' (UNIQUE)
       8    3         TABLE ACCESS (BY INDEX ROWID) OF 'MSWITCHDOMAIN'
       9    8           INDEX (UNIQUE SCAN) OF 'PK_MSWITCHDOMAIN' (UNIQUE)
      10    2       INDEX (RANGE SCAN) OF 'I_SERVICE_SC_S' (NON-UNIQUE)

Comparing the two plans shows that the indexes I_SERVICE_SERVICESPECID and I_SERVICE_SUBSIDIARYID should not be used; the healthy database reaches SERVICE through I_SERVICE_SC_S instead. So compare the indexes on the SERVICE table in the two databases.

On the problem database:

    SQL> select index_name from user_indexes where table_name='SERVICE';

    INDEX_NAME
    ------------------------------
    I_SERVICE_ACCOUNTNUM
    I_SERVICE_CID
    I_SERVICE_DATEACTIVATED
    I_SERVICE_PRICEPLANID
    I_SERVICE_SC_S
    I_SERVICE_SERVICECODE
    I_SERVICE_SERVICESPECID
    I_SERVICE_SUBSIDIARYID
    PK_SERVICE_SID

On the other city's database:

    SQL> select index_name from user_indexes where table_name='SERVICE';

    INDEX_NAME
    ------------------------------
    I_SERVICE_ACCOUNTNUM
    I_SERVICE_CID
    I_SERVICE_DATEACTIVATED
    I_SERVICE_SC_S
    I_SERVICE_SERVICECODE
    PK_SERVICE_SID

For reasons unknown, the SERVICE table in the problem database had three extra indexes: I_SERVICE_PRICEPLANID, I_SERVICE_SERVICESPECID and I_SERVICE_SUBSIDIARYID. These indexes led the optimizer to use indexes it should not have used for the SQL statement above, and they were the root cause of the latch free waits and the high CPU usage. After dropping the three indexes and rerunning the SQL statement, the result came back quickly and CPU utilization immediately fell back to normal:

    last pid: 26387;  load averages: 1.61, 1.38, 1.21
    195 processes: 194 sleeping, 1 on cpu
    CPU states: 96.2% idle, 1.6% user, 1.7% kernel, 0.5% iowait, 0.0% swap
    Memory: 8192M real, 1183M free, 14M swap in use, 8179M swap free

      PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
    26383 oracle     1  59    0 4550M 4506M sleep   0:12  4.52% oracle
      409 root      15  59    0 7168K 7008K sleep 173.1H  0.53% picld
    25653 oracle     1  59    0 4550M 4508M sleep   2:12  0.48% oracle
    26384 root       1  59    0 2800K 1912K cpu2    0:00  0.21% top-3.5b8-sun4u
    25569 oracle     1  59    0 4550M 4508M sleep   0:12  0.09% oracle
    25717 oracle     1  59    0 4550M 4507M sleep   0:07  0.05% oracle
    25571 oracle     1  59    0 4550M 4507M sleep   0:10  0.04% oracle
    25681 oracle     1  59    0 4550M 4508M sleep   0:10  0.04% oracle
    25544 oracle     1  58    0 4554M 4501M sleep   0:14  0.03% oracle
    25703 oracle     1  59    0 4550M 4506M sleep   0:23  0.03% oracle
    ...

When high CPU utilization is caused by poorly performing SQL, the problem can generally be diagnosed and resolved along these lines. Each case still needs its own analysis, and there are many other ways to solve such problems. This article is only meant as a starting point; what matters is that the problem ends up solved.
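The index comparison in this case can also be done mechanically. The following is a minimal Python sketch, assuming the two `user_indexes` listings have already been collected as plain lists of names; the helper names `extra_indexes` and `drop_statements` are hypothetical and introduced only for this example. It reports the indexes that exist only on the problem database and prints candidate `DROP INDEX` statements for review, without executing anything.

```python
# Index names copied from the two user_indexes listings in this article.
PROBLEM_DB_INDEXES = [
    "I_SERVICE_ACCOUNTNUM", "I_SERVICE_CID", "I_SERVICE_DATEACTIVATED",
    "I_SERVICE_PRICEPLANID", "I_SERVICE_SC_S", "I_SERVICE_SERVICECODE",
    "I_SERVICE_SERVICESPECID", "I_SERVICE_SUBSIDIARYID", "PK_SERVICE_SID",
]
REFERENCE_DB_INDEXES = [
    "I_SERVICE_ACCOUNTNUM", "I_SERVICE_CID", "I_SERVICE_DATEACTIVATED",
    "I_SERVICE_SC_S", "I_SERVICE_SERVICECODE", "PK_SERVICE_SID",
]


def extra_indexes(problem, reference):
    """Return, sorted, the index names present in problem but not in reference."""
    return sorted(set(problem) - set(reference))


def drop_statements(index_names):
    """Build DROP INDEX statements as text only; nothing is executed here."""
    return ["DROP INDEX %s;" % name for name in index_names]


if __name__ == "__main__":
    for statement in drop_statements(
        extra_indexes(PROBLEM_DB_INDEXES, REFERENCE_DB_INDEXES)
    ):
        print(statement)
```

Whether an extra index is safe to drop still has to be judged by hand, since other statements may depend on it; the sketch only narrows down the candidates.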