ASYNC_IO_COMPLETION

項目組有一個數據庫備份的Job運行異常,該Job將備份數據存儲到remote server上,平時5個小時就能完成的備份操做,如今運行19個小時尚未完成,backup命令的Wait type是 ASYNC_IO_COMPLETION:html

根據MSDN 官方文檔的定義:Occurs when a task is waiting for asynchronous I/O operations to finish.sql

該等待類型表示:Task在等待異步IO操做的完成。進程申請了一個IO操做,可是系統(Disk/Network)不能當即完成該IO請求,所以該進程進入等待狀態,等待Async IO完成。數據庫

查看SQL Server的Error Log,發現大量IO請求超時的記錄,從File 的路徑 [D:\xxx\yyy.ndf] 上看到,IO 請求是發送給Local Server,說明SQL Server發出的IO請求,Local Server的Disk不能及時完成:網絡

SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [D:\xxx\yyy.ndf] in database [database_name] (8).  The OS file handle is 0x00000000000010E4.  The offset of the latest long I/O is: 0x00000482f60000session

出現長時間的ASYNC_IO_COMPLETION 等待的緣由多是:app

  1. 同時有其餘Application在進行大量的IO操做,致使用來存儲備份數據的Disk特別忙,備份操做必須等待Disk IO完成。
  2. 若是備份數據存儲到remote server上,那麼多是network被其餘application佔用,致使用於傳輸備份數據的帶寬變小。

我看到的這個異常的備份操做,是將備份數據存儲到遠程Server上,打開ResouceMonitor(點擊Windows+R,輸入resmon),發現Disk Activity是2MB-4MB之間,而Network Activity在18MB-46MB之間,如圖:異步

這說明,Remote Server的network相對比較忙,網絡傳輸極可能是致使備份操做長時間運行的root cause。可是,從SQL Server的Error Log中得知,是本地Disk的不能及時響應IO請求。async

因爲備份操做已經結束,沒有辦法查看Local Server的Disk IO和 Network Activity,而此時,Remote Server上的Disk Activity 和 Network Activity 十分平靜,都在幾百KB水平上,間接說明Remote Server不是backup操做等待異步IO操做完成的root cause,Local Server的Disk IO是瓶頸。post

分析Disk IO的性能,最好的方法是分析Performance Counter。打開Local Server的Performance Monitor(點擊Windows+R,輸入perfmon),沒有記錄Data Collection,這樣沒有辦法進行深刻的分析了,下次遇到時,繼續分析。性能

總之:在數據庫備份或還原的過程當中,出現長時間的ASYNC_IO_COMPLETION 等待,表示backup/restore操做正在等待獲取IO資源,致使backup/restore 進程處於掛起狀態(suspended),只有獲取到IO資源,該操做纔會繼續進行下去。

在執行長時間的IO密集的操做時,SQL Server進程常常會發生ASYNC_IO_COMPLETION等待,同時會在Error Log中記錄 「I/O requests are taking longer than 15 seconds to complete」 消息,IO密集的操做主要有:backup/restore操做,新建一個大的數據庫文件,和數據庫文件增加。長時間執行IO密集操做,原本就須要大量的IO才能完成,可是,長時間處於ASYNC_IO_COMPLETION 等待,在很大程度上,代表IO是系統的瓶頸,Disk的讀寫能力或Network的傳輸能力是系統的短板。

 

Appendix:使用如下腳本查看備份操做的等待類型,已經完成的百分比,對於預估的剩餘時間,只是個預估值,不可信。

select  r.session_id,
        r.blocking_session_id as blocking,
        r.wait_type as Current_Wait_Type,
        r.wait_time/1000/60/60 as wait_h,
        r.status,
        r.command,
        r.total_elapsed_time/1000/60/60 as total_h,
        r.percent_complete,
        r.estimated_completion_time/1000/60/60 as estimated_h
from sys.dm_exec_requests r
outer APPLY sys.dm_exec_sql_text(r.sql_handle) as st
where (r.wait_type<>'MISCELLANEOUS' or r.wait_type is null)
    and r.session_id>50
    and r.session_id<>@@spid

 

推薦閱讀《ASYNC_IO_COMPLETION Wait type》:

Normally this wait type can be seen in backup and restore transactions, and whenever you will see this wait type your backup/restore process will be in suspended state most of the time because the process is waiting to get IO resource to proceed its operation and it will wait till certain time period then moved in suspended state. In that case your process will take more than its usual time to complete or most of the time it will hung or will showing in executing state for unknown time duration.
This wait type occurs when a task is waiting for asynchronous I/Os to finish. This wait type is normally seen with few other wait types like BACKUPBUFFER,BUCKIO etc. This is clear indication of DISK IO issue.You can also get the Average disk queue length or current disk queue length value at the same time when you are getting this wait type.Compare both counters and if these counters have high value then you should look into your storage subsystem. Identify disk bottlenecks, by using Perfmon Counters, Profiler, sys.dm_io_virtual_file_stats and SHOWPLAN.
Any of the following reduces these waits:
  1. Adding additional IO bandwidth.
  2. Balancing IO across other drives.
  3. Reducing IO with appropriate indexing.
  4. Check for bad query plans.
  5. Check for memory pressure
We can also corelate this wait type between Memory pressure and Disk IO subsystem issues.

參考文檔:

ASYNC_IO_COMPLETION Wait type

ASYNC_IO_COMPLETION

本站公眾號
   歡迎關注本站公眾號,獲取更多信息