Use maxrregcount with caution

Date: 2019-12-12



I needed to compile a *.cubin file.

Compiling with --ptxas-options=-v showed that the kernel used 36 registers. So I added --maxrregcount=32. The register count then dropped to 32, but at the cost of "8 bytes stack frame, 12 bytes spill stores, 28 bytes spill loads":

nvcc -cubin -m64 -arch sm_35 *.cu --use_fast_math --maxrregcount=32 --ptxas-options=-v -O3 -o *.cubin

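However, after several test runs I found that the floating-point results differed from those of the build without maxrregcount (I did not test integer results).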

So I had run into this bug: maxrregcount can change the final results.

A quick search showed that other people have hit the same problem. One explanation given was:

"Operation order may change with register optimization. Since fp arithmetic is not associative due to finite precision, this may affect the result."
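The explanation above can be illustrated with a minimal sketch in plain C++ (host code only, so it also compiles inside a .cu file). It does not reproduce what ptxas actually does under --maxrregcount; it only shows that regrouping the same float additions changes the rounded result. The function names sum_left, sum_right, sum_forward and sum_backward are hypothetical, and the sketch assumes IEEE 754 single precision evaluated in float (FLT_EVAL_METHOD == 0, the usual case on x86-64) with no fast-math reassociation on the host.

```cpp
// Sketch: float addition is not associative, so a different order of the
// same operations can produce a different rounded result.
// Assumption: IEEE 754 single precision, evaluated in float, no fast-math.
#include <cstddef>

// Groups the three operands as (a + b) + c.
float sum_left(float a, float b, float c) {
    return (a + b) + c;
}

// Same operands, grouped as a + (b + c).
float sum_right(float a, float b, float c) {
    return a + (b + c);
}

// Accumulates x[0], x[1], ..., x[n-1] in index order.
float sum_forward(const float* x, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

// Accumulates the same values in reverse order, x[n-1] down to x[0].
float sum_backward(const float* x, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = n; i > 0; --i) {
        acc += x[i - 1];
    }
    return acc;
}
```

With a = 1.0e8f, b = -1.0e8f and c = 1.0f, sum_left returns 1 while sum_right returns 0: floats near 1e8 are spaced 8 apart, so -1e8f + 1 rounds back to -1e8f. Likewise, for an array holding 1.0e8f followed by eight 1.0f values, sum_forward returns 1.0e8f (each +1 is rounded away), while sum_backward first accumulates 8 exactly and returns 100000008.0f. A kernel whose instruction schedule changes under a tighter register limit can run into the same effect.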

