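A core transaction database raised an alert: iowait above 30%. It looked like an ordinary alert, but it turned out to hide something more. After logging in to the host, I saw a number of RMAN backup scripts running. A job that normally finishes in under an hour had been running for more than six hours.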
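On Linux, the iowait percentage behind an alert like this is typically computed from two samples of the aggregate cpu line in /proc/stat. Below is a minimal Python sketch of that arithmetic. parse_cpu_line, iowait_percent and read_cpu_line are illustrative names and do not come from any particular monitoring tool. The caller chooses the sampling interval.

```python
from pathlib import Path

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu ...' line into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:]]
    # Older kernels expose fewer columns; pad the missing ones with zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    # guest/guest_nice are already counted inside user/nice, so exclude them.
    keys = FIELDS[:8]
    total = sum(b[k] - a[k] for k in keys)
    if total <= 0:
        return 0.0
    return 100.0 * (b["iowait"] - a["iowait"]) / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    return Path(path).read_text().splitlines()[0]
```

To use it, call read_cpu_line() twice a few seconds apart and pass both lines to iowait_percent. The result is the iowait share for that interval.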
The backup job writes to an NFS mount, so I suspected a problem with NFS.
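Sure enough, in the mount directory even ll would not return and just hung. I then went to the backup server, i.e. the NFS server, to look for anomalies. A monitoring script deployed there earlier shows that load and I/O are normally very low, but its readings for today's incident window looked quite different.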
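Running ll on a hung NFS mount can block an interactive shell indefinitely. One option is to probe the mount from a child process with a deadline, so that only the child gets stuck. Here is a minimal Python sketch, assuming a Linux host with ls on the PATH. run_with_deadline and probe_dir are hypothetical helper names. A process blocked on an unresponsive NFS server may not react to SIGKILL right away, so the sketch kills the child on timeout but does not wait for it to exit.

```python
import subprocess
from typing import Optional, Sequence


def run_with_deadline(argv: Sequence[str], timeout: float) -> Optional[int]:
    """Run argv and return its exit code, or None if it exceeds the deadline."""
    proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Ask the child to die but do not block on reaping it: a process
        # stuck on an unresponsive NFS server may linger until its I/O returns.
        proc.kill()
        return None


def probe_dir(path: str, timeout: float = 5.0) -> bool:
    """True if 'ls -l path' (what the ll alias usually runs) succeeds in time."""
    return run_with_deadline(["ls", "-l", path], timeout) == 0
```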
Back on the backup client, ps -ef | grep nfs showed a large number of cat processes:
[root@trandb1 log]# ps -ef |grep nfs
root 9700 2 0 2017 ? 00:00:00 [nfsv4.0-svc]
oracle 88889 88888 0 10:05 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9868_1
oracle 90224 90223 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9888_1
oracle 90566 90565 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190619_9872_1
oracle 90571 90570 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9869_1
oracle 90576 90575 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190619_9872_1
oracle 90584 90583 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9868_1
oracle 90588 90587 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9884_1
oracle 90593 90592 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9885_1
oracle 90597 90596 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190619_9865_1
oracle 90606 90605 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9881_1
oracle 90616 90615 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9871_1
oracle 90626 90625 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9887_1
oracle 90631 90630 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9888_1
oracle 90641 90640 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9871_1
oracle 90645 90644 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9880_1
oracle 91999 91998 0 10:06 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190620_9883_1
oracle 92488 92487 0 10:06 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190620_9880_1
oracle 93837 93836 0 10:07 ? 00:00:00 cat ./nfs/arch_TRANDB_20190620_9890_1
oracle 94011 94010 0 10:07 ? 00:00:00 cat ./nfs/full_data_TRANDB_20190620_9886_1
oracle 94238 94237 0 10:07 ? 00:00:01 cat ./nfs/full_data_TRANDB_20190619_9865_1
root 98024 17863 0 10:09 pts/7 00:00:00 grep nfs
root 130976 2 0 2017 ? 00:00:00 [nfsiod]
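In the listing above, several backup pieces are being read by more than one cat process. For example, full_data_TRANDB_20190619_9868_1 appears twice. On a long listing, a small Python sketch can group cat processes by file and keep their parent PIDs. It assumes the default ps -ef column layout (UID PID PPID C STIME TTY TIME CMD). cat_readers and shared_files are illustrative names and were not part of the original investigation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cat_readers(ps_output: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each file read by a 'cat' process to its (pid, ppid) pairs.

    Assumes default 'ps -ef' columns: UID PID PPID C STIME TTY TIME CMD.
    """
    readers: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in ps_output.splitlines():
        cols = line.split(None, 7)  # keep CMD and its arguments in one column
        # Skip short lines and the header row, if present.
        if len(cols) < 8 or not (cols[1].isdigit() and cols[2].isdigit()):
            continue
        pid, ppid, cmd = int(cols[1]), int(cols[2]), cols[7].split()
        if len(cmd) >= 2 and cmd[0] == "cat":
            readers[cmd[1]].append((pid, ppid))
    return dict(readers)


def shared_files(ps_output: str) -> Dict[str, List[Tuple[int, int]]]:
    """Only the files that more than one cat process is reading."""
    return {f: r for f, r in cat_readers(ps_output).items() if len(r) > 1}
```

Run against the full listing above, shared_files reports six backup pieces with two readers each: 9868, 9872, 9865 and 9871 from 20190619, and 9888 and 9880 from 20190620.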
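I killed these PIDs from the OS, but new ones were spawned immediately. After I unmounted the directory, they were gone. I have not found the cause yet, so I am recording it here for now.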
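The respawning is still unexplained. One hypothetical next step is to walk each cat process's ancestry before killing it. If some parent is relaunching the child, killing the child alone changes nothing. The Python sketch below reads /proc/<pid>/stat on Linux, and parse_stat and ancestry are illustrative names. The state field also shows whether a process is in uninterruptible sleep (D), which is common for processes waiting on an unresponsive NFS server.

```python
from pathlib import Path
from typing import List, Tuple


def parse_stat(text: str) -> Tuple[int, str, str, int]:
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split around the first '(' and the last ')'.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state, ppid = text[close_paren + 1:].split()[:2]
    return pid, comm, state, int(ppid)


def ancestry(pid: int, proc: str = "/proc") -> List[Tuple[int, str, str]]:
    """Walk from pid up to the top of the visible process tree."""
    chain: List[Tuple[int, str, str]] = []
    while pid > 0:
        try:
            stat = Path(proc, str(pid), "stat").read_text()
        except FileNotFoundError:
            break  # the process exited while we were walking the tree
        this_pid, comm, state, ppid = parse_stat(stat)
        chain.append((this_pid, comm, state))
        pid = ppid
    return chain
```

In the listing above, every cat's PPID is one below its own PID. Calling ancestry on one of those PIDs would show that parent and everything above it, which is where a relaunch loop, if there is one, would appear.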