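A program mapped an open file descriptor into a memory region with mmap and then read and wrote that region. After running for a long time it crashed with a SIGBUS memory-access error, and when the resulting core dump was examined in GDB, some of the memory addresses could not be accessed.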
According to the mmap man page (man mmap), errors occur under the following conditions:
ERRORS
EBADF: fd is not a valid file descriptor (and MAP_ANONYMOUS was not set).
EACCES: A file descriptor refers to a non-regular file. Or MAP_PRIVATE was requested, but fd is not open for reading. Or MAP_SHARED was requested and PROT_WRITE is set, but fd is not open in read/write (O_RDWR) mode. Or PROT_WRITE is set, but the file is append-only.
EINVAL: We don't like start or length or offset. (E.g., they are too large, or not aligned on a PAGESIZE boundary.)
ETXTBSY: MAP_DENYWRITE was set but the object specified by fd is open for writing.
EAGAIN: The file has been locked, or too much memory has been locked.
ENOMEM: No memory is available, or the process's maximum number of mappings would have been exceeded.
ENODEV: The underlying filesystem of the specified file does not support memory mapping.
Use of a mapped region can result in these signals:
SIGSEGV: Attempted write into a region specified to mmap as read-only.
SIGBUS: Attempted access to a portion of the buffer that does not correspond to the file (for example, beyond the end of the file, including the case where another process has truncated the file).
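The error codes in the list are returned by the mmap() call itself: it returns MAP_FAILED and sets errno, so these problems surface when the mapping is created, not later as a signal. The minimal C sketch below, assuming Linux/glibc and a writable /tmp, triggers the EACCES case (MAP_SHARED with PROT_WRITE on a descriptor not opened O_RDWR). The function name and scratch-file path are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Asks mmap() for a writable MAP_SHARED mapping of a descriptor opened
 * O_RDONLY. Returns the errno mmap() reported (EACCES is expected), 0 if the
 * mapping unexpectedly succeeded, or -1 if the scratch file could not be set up. */
int shared_write_on_readonly_fd(void)
{
    char path[] = "/tmp/mmap_eacces_XXXXXX";  /* illustrative scratch path */
    long page = sysconf(_SC_PAGESIZE);

    int wfd = mkstemp(path);                  /* create the scratch file */
    if (wfd < 0)
        return -1;
    if (ftruncate(wfd, (off_t)page) != 0) {   /* give it one page of data */
        close(wfd);
        unlink(path);
        return -1;
    }
    int rfd = open(path, O_RDONLY);           /* second descriptor, read-only */
    close(wfd);
    unlink(path);                             /* data survives while rfd is open */
    if (rfd < 0)
        return -1;

    int result = 0;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, rfd, 0);
    if (p == MAP_FAILED)
        result = errno;                       /* reported here, not via a signal */
    else
        munmap(p, (size_t)page);
    close(rfd);
    return result;
}
```

Calling `shared_write_on_readonly_fd()` should return `EACCES`. A program that checks for MAP_FAILED catches this whole class of problem at mapping time, which is why a SIGBUS during later access points somewhere else.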
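From the description above, a SIGBUS means either that the accessed part of the buffer does not correspond to the file, or that the mapped file has been truncated. In my case, though, the mmap call itself had succeeded; the fault appeared only later, while the buffer was being accessed, so the error list above did not point to a cause. How do you analyze a problem like that?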
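First I worked out the context of the system I was working on: which files were mapped with mmap, and how their memory regions were used. Then, using the core dump, I examined in GDB the stacks of all the threads that accessed that file, and found something surprising: one thread was accessing the mmap'ed buffer while another thread was reopening that same file! Checking the error logs confirmed it: a thread had indeed reopened a file that was already mmap'ed.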
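The article does not say how the second thread reopened the file. One plausible mechanism, assumed here, is opening the path with O_TRUNC: the file shrinks, so the mapped pages no longer correspond to file contents, which is exactly the SIGBUS case from the man page. The sketch below is single-threaded so the result is deterministic; it assumes Linux and a writable /tmp, and the function and handler names are hypothetical. It maps one page, opens the same path again with the given extra flags, and reports whether reading the mapping then raises SIGBUS (caught with sigsetjmp/siglongjmp so the probe can return).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf reopen_env;

static void on_sigbus_reopen(int sig)
{
    (void)sig;
    siglongjmp(reopen_env, 1);               /* leave the faulting read */
}

/* Maps one page of a scratch file, opens the same path again with
 * O_RDWR | extra_flags, then reads the first mapped byte.
 * Returns 1 if the read raised SIGBUS, 0 if not, -1 on setup failure. */
int reopen_then_read_raises_sigbus(int extra_flags)
{
    char path[] = "/tmp/mmap_reopen_XXXXXX";  /* illustrative scratch path */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    char *base = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    int fd2 = open(path, O_RDWR | extra_flags);  /* the "reopen" */
    unlink(path);
    if (fd2 < 0) {
        munmap(base, page);
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus_reopen;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int faulted = 0;
    if (sigsetjmp(reopen_env, 1) == 0)        /* 1: restore signal mask on jump */
        (void)*(volatile char *)base;         /* force a real load from the mapping */
    else
        faulted = 1;

    sigaction(SIGBUS, &old, NULL);            /* restore the previous handler */
    close(fd2);
    munmap(base, page);
    close(fd);
    return faulted;
}
```

On Linux, reopening with plain `O_RDWR` should leave the mapping readable, while `O_RDWR | O_TRUNC` should make the next read fault. In the real program the read and the reopen ran on different threads, so whether they collided depended on timing, which would be consistent with the crash appearing only after long runs.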
I added defensive code right away and reran the tests, and the problem disappeared completely.
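The article does not show the defensive code. The following is a hypothetical sketch of one possible guard, assuming POSIX threads: a small wrapper (`struct guarded_file` and the `gf_*` functions are invented for illustration) that serializes map, unmap and reopen under one mutex and refuses to reopen the file while a mapping is live.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical wrapper: one mutex serializes map, unmap and reopen. */
struct guarded_file {
    pthread_mutex_t lock;
    const char *path;   /* caller keeps this string alive */
    int fd;
    void *addr;         /* NULL while the file is not mapped */
    size_t len;
};

/* All gf_* functions return 0 on success or a negative errno value. */
int gf_open(struct guarded_file *gf, const char *path)
{
    gf->path = path;
    gf->addr = NULL;
    gf->len = 0;
    gf->fd = open(path, O_RDWR);
    if (gf->fd < 0)
        return -errno;
    pthread_mutex_init(&gf->lock, NULL);
    return 0;
}

int gf_map(struct guarded_file *gf, size_t len)
{
    int rc = 0;
    pthread_mutex_lock(&gf->lock);
    if (gf->addr != NULL) {
        rc = -EBUSY;                          /* already mapped */
    } else {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, gf->fd, 0);
        if (p == MAP_FAILED) {
            rc = -errno;
        } else {
            gf->addr = p;
            gf->len = len;
        }
    }
    pthread_mutex_unlock(&gf->lock);
    return rc;
}

int gf_unmap(struct guarded_file *gf)
{
    int rc = 0;
    pthread_mutex_lock(&gf->lock);
    if (gf->addr == NULL) {
        rc = -EINVAL;                         /* nothing to unmap */
    } else if (munmap(gf->addr, gf->len) != 0) {
        rc = -errno;
    } else {
        gf->addr = NULL;
        gf->len = 0;
    }
    pthread_mutex_unlock(&gf->lock);
    return rc;
}

/* The guard itself: refuse to reopen (and possibly truncate) a mapped file. */
int gf_reopen(struct guarded_file *gf, int flags)
{
    int rc = 0;
    pthread_mutex_lock(&gf->lock);
    if (gf->addr != NULL) {
        rc = -EBUSY;
    } else {
        int nfd = open(gf->path, flags);
        if (nfd < 0) {
            rc = -errno;
        } else {
            close(gf->fd);
            gf->fd = nfd;
        }
    }
    pthread_mutex_unlock(&gf->lock);
    return rc;
}

void gf_close(struct guarded_file *gf)
{
    pthread_mutex_lock(&gf->lock);
    if (gf->addr != NULL)
        munmap(gf->addr, gf->len);
    close(gf->fd);
    pthread_mutex_unlock(&gf->lock);
    pthread_mutex_destroy(&gf->lock);
}
```

A guard like this only helps if every code path that opens or truncates the file goes through it. It also does not stop a reader from touching `addr` after another thread has unmapped it; a production version would need reference counting or a similar lifetime scheme.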
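When an mmap access fault like this turns up, solving it requires analyzing the stacks of all the threads in the core dump together, not just the stack of the thread that crashed.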
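When the crash cannot be reproduced on demand, it can also help to record where the SIGBUS hit. With SA_SIGINFO, the handler receives a siginfo_t whose si_addr is the faulting address and whose si_code is BUS_ADRERR ("nonexistent physical address"), which Linux reports for an access past the end of a mapped file. Comparing the fault's offset within the mapping against the file's current size from fstat() shows whether the file shrank underneath the mapping. The sketch below, assuming Linux, provokes the fault on purpose and recovers with siglongjmp so the result can be checked; `struct sigbus_report` and `capture_sigbus_report()` are hypothetical names. A real diagnostic handler would instead log these values with async-signal-safe calls and then let the process die so the core is still produced.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct sigbus_report {
    int code;            /* siginfo si_code, e.g. BUS_ADRERR */
    long fault_offset;   /* fault address minus mapping base */
    long file_size;      /* st_size from fstat() after the fault */
};

static sigjmp_buf diag_env;
static void *volatile diag_addr;
static volatile int diag_code;

static void on_sigbus_info(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    diag_addr = si->si_addr;     /* address whose access faulted */
    diag_code = si->si_code;
    siglongjmp(diag_env, 1);
}

/* Maps two pages, truncates the file to one page, then reads both pages.
 * Fills *out with what the handler saw. Returns 1 if SIGBUS was captured,
 * 0 if not, -1 on setup failure. */
int capture_sigbus_report(struct sigbus_report *out)
{
    char path[] = "/tmp/mmap_diag_XXXXXX";   /* illustrative scratch path */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)(2 * page)) != 0) {
        close(fd);
        return -1;
    }
    char *base = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }
    if (ftruncate(fd, (off_t)page) != 0) {    /* keep page 0, drop page 1 */
        munmap(base, 2 * page);
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus_info;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int captured = 0;
    if (sigsetjmp(diag_env, 1) == 0) {
        (void)*(volatile char *)base;          /* page 0: still inside the file */
        (void)*(volatile char *)(base + page); /* page 1: now past EOF */
    } else {
        struct stat st;
        captured = 1;
        out->code = diag_code;
        out->fault_offset = (long)((char *)diag_addr - base);
        out->file_size = fstat(fd, &st) == 0 ? (long)st.st_size : -1;
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(base, 2 * page);
    close(fd);
    return captured;
}
```

If `fault_offset` is at or beyond `file_size`, the mapped file was shrunk after it was mapped, which narrows the search to whatever code truncates or reopens that file; the thread stacks in the core then show who did it.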