Floating-Point Anomalies on Mali GPUs

A bug that only shows up on Huawei phones

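Today I was chasing a bloom flickering bug: one of our PBR motorcycle models flickered badly in its specular highlights once bloom was turned on, and the flickering only showed up on Huawei phones (Mali GPUs).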

I captured a frame with RenderDoc. The specular values at the flickering spots were off the charts, as the screenshot below shows:

In the screenshot, the color value marked by the red box has reached 65504. Since we have FP16 HDR enabled, 65504 is exactly the largest finite value FP16 can represent:

0 11110 1111111111 = (-1)^0 * 2^(30-15) * (1 + (1 - 2^-10)) = 2^15 * (2 - 2^-10) = 65504
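To double-check that bit pattern, here is a small Python sketch. It decodes a normal binary16 value by hand, using the formula above with an exponent bias of 15. It then cross-checks the result against Python's built-in half-precision codec, the `struct` format `"e"`. The helper name `decode_half` is just for illustration.

```python
import struct

def decode_half(bits: int) -> float:
    """Decode a *normal* IEEE 754 binary16 bit pattern by hand:
    value = (-1)^sign * 2^(exponent - 15) * (1 + mantissa / 2^10)."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F  # 5-bit exponent, bias 15
    mantissa = bits & 0x3FF         # 10-bit fraction
    if not 0 < exponent < 0x1F:
        raise ValueError("zero, subnormals, inf and NaN are not handled here")
    return (-1) ** sign * 2.0 ** (exponent - 15) * (1 + mantissa / 1024)

MAX_HALF_BITS = 0b0_11110_1111111111  # sign | exponent | mantissa, i.e. 0x7BFF

manual = decode_half(MAX_HALF_BITS)
# Cross-check with Python's own binary16 decoder.
builtin = struct.unpack("<e", MAX_HALF_BITS.to_bytes(2, "little"))[0]
print(manual, builtin)  # 65504.0 65504.0
```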

My gut said this was a floating-point precision problem. Mali GPUs have bitten us plenty of times before. :)

The fix

Plugging the hole is easy: just clamp the final specular value.

Being a bit obsessive, though, I still wanted to find exactly where things went wrong. After some debugging, I traced it to this code:

half perceptualRoughness = SmoothnessToPerceptualRoughness(smoothness);
half roughness = PerceptualRoughnessToRoughness(perceptualRoughness);

half V = SmithJointGGXVisibilityTerm(NoL, NoV, roughness); 
half D = GGXTerm(NoH, roughness);
half specularTerm = V * D * UNITY_PI;

The specular term here is lifted directly from Unity's BRDF1, minus the Fresnel term. In this code, the precision of roughness is what throws off the final specular result.

Let's look at the normal distribution function, GGXTerm:

inline float GGXTerm (float NdotH, float roughness)
{
    float a2 = roughness * roughness;
    float d = (NdotH * a2 - NdotH) * NdotH + 1.0f; // 2 mad
    return UNITY_INV_PI * a2 / (d * d + 1e-7f); 
    // This function is not intended to be running on Mobile,
    // therefore epsilon is smaller than what can be represented by half
}

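Its parameters are all float. The comment at the end of the function also says plainly that it was never meant to run on mobile, because the 1e-7f epsilon does not account for half precision: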

This function is not intended to be running on Mobile, therefore epsilon is smaller than what can be represented by half

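The smallest positive normal half-precision value is 2^-14 ≈ 6.10×10^(-5). Anything smaller can only be stored as a low-precision subnormal, or as zero on hardware that flushes subnormals: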

0 00001 0000000000 = (-1)^0 * 2^(1-15) * (1 + 0) = 2^-14 ≈ 6.10*10^-5
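The Python sketch below shows what actually becomes of Unity's 1e-7 epsilon when it is stored as a half. Strict IEEE 754 rounding keeps it only as a subnormal: 2 * 2^-24 ≈ 1.19e-7, far coarser than intended. If the GPU flushes FP16 subnormals to zero, it becomes exactly 0. That flush behavior is hardware-dependent and is an assumption here. `to_half` and `flush_subnormal` are hypothetical helper names.

```python
import struct

HALF_MIN = 2.0 ** -14  # smallest positive normal half: 0 00001 0000000000

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value (subnormals are kept)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def flush_subnormal(x: float) -> float:
    """ASSUMED flush-to-zero: any half smaller in magnitude than HALF_MIN becomes 0."""
    return 0.0 if abs(x) < HALF_MIN else x

eps = 1e-7
print(HALF_MIN)                       # 6.103515625e-05
print(to_half(eps))                   # 1.1920928955078125e-07, a subnormal (2 * 2^-24)
print(flush_subnormal(to_half(eps)))  # 0.0
```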

So changing roughness from half to float fixes the problem.
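To illustrate the mechanism, here is a rough Python simulation of GGXTerm evaluated two ways. The first uses ordinary floating point. The second rounds every intermediate to FP16 and flushes subnormals to zero.

Running the whole function in half precision is an assumption made for illustration; the actual fix above only changed the type of roughness. `to_fp16_ftz`, `ggx_float` and `ggx_half` are hypothetical names.

Take a very smooth surface (roughness = 0.015) at NoH = 1:

- In half, NoH*a2 - NoH rounds to -1, so d becomes 0.
- The 1e-7 epsilon flushes to 0, so the denominator is 0.
- The division blows up to infinity, while the float version gives a finite value of about 475.

```python
import math
import struct

INV_PI = 1.0 / math.pi
HALF_MIN = 2.0 ** -14  # smallest positive normal FP16 value

def to_fp16_ftz(x: float) -> float:
    """Round x to FP16 (round-to-nearest-even); overflow -> inf; subnormals -> 0.
    Flush-to-zero is an ASSUMPTION about the GPU, not guaranteed behavior."""
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        h = struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):  # too large for binary16
        return math.copysign(math.inf, x)
    return 0.0 if abs(h) < HALF_MIN else h

def ggx_float(n_dot_h: float, roughness: float) -> float:
    """Unity's GGXTerm formula evaluated in double precision."""
    a2 = roughness * roughness
    d = (n_dot_h * a2 - n_dot_h) * n_dot_h + 1.0
    return INV_PI * a2 / (d * d + 1e-7)

def ggx_half(n_dot_h: float, roughness: float) -> float:
    """Same formula, hypothetically run in FP16: round after every operation."""
    h = to_fp16_ftz
    a2 = h(roughness * roughness)
    t = h(n_dot_h * a2)
    t = h(t - n_dot_h)
    t = h(t * n_dot_h)
    d = h(t + 1.0)
    denom = h(h(d * d) + h(1e-7))  # the epsilon flushes to 0 under FTZ
    numer = h(h(INV_PI) * a2)
    if denom == 0.0:  # IEEE semantics: x/0 -> +-inf, 0/0 -> NaN
        return math.nan if numer == 0.0 else math.copysign(math.inf, numer)
    return h(numer / denom)

print(ggx_float(1.0, 0.015))  # about 475
print(ggx_half(1.0, 0.015))   # inf
print(ggx_float(0.9, 0.5), ggx_half(0.9, 0.5))  # both roughly 0.52
```

At moderate roughness the two versions agree closely. Only the near-mirror case collapses, which matches the highlights being the pixels that flicker.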

How URP simplifies the BRDF

Using the Standard pipeline's BRDF1 directly on mobile is somewhat expensive.

Instead, we can borrow the approach of BRDF2, or the simplified DirectBDRF from the URP pipeline:

// Based on Minimalist CookTorrance BRDF
// Implementation is slightly different from original derivation: http://www.thetenthplanet.de/archives/255
//
// * NDF [Modified] GGX  
// * Modified Kelemen and Szirmay-Kalos for Visibility term
// * Fresnel approximated with 1/LdotH 
half3 DirectBDRF(BRDFData brdfData, half3 normalWS, half3 lightDirectionWS, half3 viewDirectionWS)
{
#ifndef _SPECULARHIGHLIGHTS_OFF
    float3 halfDir = SafeNormalize(float3(lightDirectionWS) + float3(viewDirectionWS)); 

    float NoH = saturate(dot(normalWS, halfDir));
    half LoH = saturate(dot(lightDirectionWS, halfDir));

    // GGX Distribution multiplied by combined approximation of Visibility and Fresnel 
    // BRDFspec = (D * V * F) / 4.0
    // D = roughness^2 / ( NoH^2 * (roughness^2 - 1) + 1 )^2 
    // V * F = 1.0 / ( LoH^2 * (roughness + 0.5) )
    // See "Optimizing PBR for Mobile" from Siggraph 2015 moving mobile graphics course
    // https://community.arm.com/events/1155

    // Final BRDFspec = roughness^2 / ( NoH^2 * (roughness^2 - 1) + 1 )^2 * (LoH^2 * (roughness + 0.5) * 4.0)
    // We further optimize a few light invariant terms 
    // brdfData.normalizationTerm = (roughness + 0.5) * 4.0 rewritten as roughness * 4.0 + 2.0 to a fit a MAD. 
    float d = NoH * NoH * brdfData.roughness2MinusOne + 1.00001f;

    half LoH2 = LoH * LoH;    
    half specularTerm = brdfData.roughness2 / ((d * d) * max(0.1h, LoH2) * brdfData.normalizationTerm);

    // On platforms where half actually means something, the denominator has a risk of overflow
    // clamp below was added specifically to "fix" that, but dx compiler (we convert bytecode to metal/gles)
    // sees that specularTerm have only non-negative terms, so it skips max(0,..) in clamp (leaving only min(100,...))
#if defined (SHADER_API_MOBILE) || defined (SHADER_API_SWITCH)
    specularTerm = specularTerm - HALF_MIN;
    specularTerm = clamp(specularTerm, 0.0, 100.0); // Prevent FP16 overflow on mobiles
#endif

    half3 color = specularTerm * brdfData.specular + brdfData.diffuse;
    return color;
#else
    return brdfData.diffuse;  
#endif
}
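For experimenting off-device, here is a minimal Python port of just the specular part of DirectBDRF, computed in double precision. It rests on a few assumptions:

- NoH and LoH are already saturated dot products.
- roughness2, roughness2MinusOne and normalizationTerm are derived as roughness², roughness² - 1 and roughness * 4 + 2, matching the field names and the code comment.
- `mobile=True` mimics the SHADER_API_MOBILE branch.

The function name `urp_specular_term` is hypothetical.

```python
HALF_MIN = 6.103515625e-5  # 2^-14, the same constant the shader uses

def urp_specular_term(no_h: float, lo_h: float, roughness: float, mobile: bool = True) -> float:
    """Specular part of URP's DirectBDRF, before multiplying by brdfData.specular."""
    roughness2 = roughness * roughness
    roughness2_minus_one = roughness2 - 1.0
    normalization_term = roughness * 4.0 + 2.0  # (roughness + 0.5) * 4, written as a MAD

    d = no_h * no_h * roughness2_minus_one + 1.00001
    lo_h2 = lo_h * lo_h
    specular = roughness2 / ((d * d) * max(0.1, lo_h2) * normalization_term)

    if mobile:  # mirrors the SHADER_API_MOBILE / SHADER_API_SWITCH branch
        specular = specular - HALF_MIN
        specular = min(max(specular, 0.0), 100.0)  # clamp(specular, 0, 100)
    return specular

print(urp_specular_term(1.0, 1.0, 0.01, mobile=False))  # about 4051 for a near-mirror surface
print(urp_specular_term(1.0, 1.0, 0.01))                # 100.0 after the mobile clamp
print(urp_specular_term(1.0, 1.0, 0.5))                 # just under 1.0
```

The unclamped near-mirror value is already in the thousands in double precision. In half precision, a smoother surface or a smaller denominator could easily push it toward 65504, which is why the mobile branch caps it at 100.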

As the code comments say, the simplification is based on "Optimizing PBR for Mobile" from the SIGGRAPH 2015 Moving Mobile Graphics course.

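The classic microfacet specular BRDF is:

BRDFspec = (D * G * F) / (4 * NoL * NoV)

Fold the geometry term and the NoL * NoV denominator into a visibility term, V = G / (NoL * NoV). Following Optimizing PBR for Mobile, V * F can then be merged into a single approximation: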

BRDFspec = (D * V * F) / 4.0

D = roughness^2 / ( NoH^2 * (roughness^2 - 1) + 1 )^2

V * F = 1.0 / ( LoH^2 * (roughness + 0.5) )

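Multiplying these together gives the final result:

BRDFspec = roughness^2 / ( ( NoH^2 * (roughness^2 - 1) + 1 )^2 * LoH^2 * (roughness + 0.5) * 4.0 )

Note that the "Final BRDFspec" comment in the URP code writes the last factor as a multiplication. The code itself divides by it, as this formula does.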
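As a quick numeric sanity check of this algebra, the Python sketch below evaluates (D * V * F) / 4 from the separate terms and compares it with the combined formula. It also checks the MAD rewrite (roughness + 0.5) * 4 = roughness * 4 + 2. The sample points are arbitrary, and the function names are hypothetical.

```python
import itertools
import math

def brdf_from_terms(no_h: float, lo_h: float, r: float) -> float:
    d_term = r ** 2 / (no_h ** 2 * (r ** 2 - 1) + 1) ** 2  # D
    vf_term = 1.0 / (lo_h ** 2 * (r + 0.5))                # approximated V * F
    return d_term * vf_term / 4.0

def brdf_final(no_h: float, lo_h: float, r: float) -> float:
    return r ** 2 / ((no_h ** 2 * (r ** 2 - 1) + 1) ** 2 * lo_h ** 2 * (r + 0.5) * 4.0)

for no_h, lo_h, r in itertools.product([0.3, 0.8, 1.0], [0.5, 0.9], [0.1, 0.5, 0.9]):
    assert math.isclose(brdf_from_terms(no_h, lo_h, r), brdf_final(no_h, lo_h, r))
    assert math.isclose((r + 0.5) * 4.0, r * 4.0 + 2.0)  # normalizationTerm rewrite
print("factorization checks out")
```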

Finally, the code above also guards against the limits of half precision:

#define HALF_MIN 6.103515625e-5 // 2^-14, the same value for 10, 11 and 16-bit: https://www.khronos.org/opengl/wiki/Small_Float_Formats

// On platforms where half actually means something, the denominator has a risk of overflow
// clamp below was added specifically to "fix" that, but dx compiler (we convert bytecode to metal/gles)
// sees that specularTerm have only non-negative terms, so it skips max(0,..) in clamp (leaving only min(100,...))

#if defined (SHADER_API_MOBILE) || defined (SHADER_API_SWITCH)
    specularTerm = specularTerm - HALF_MIN;
    specularTerm = clamp(specularTerm, 0.0, 100.0); // Prevent FP16 overflow on mobiles
#endif
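The effect of those two mobile-only lines can be sketched in Python, with Python floats standing in for shader halves. Per the comment, the compiler drops the max(0, ...) half of the clamp because it can prove specularTerm is non-negative. Subtracting HALF_MIN presumably prevents that proof; this is my reading, not something the comment states.

Numerically the subtraction is negligible, and the clamp maps an overflowed +inf to 100. The function name `mobile_specular_clamp` is hypothetical.

```python
import math

HALF_MIN = 6.103515625e-5  # 2^-14, as in the #define above

def mobile_specular_clamp(specular_term: float) -> float:
    """Mirror of the SHADER_API_MOBILE branch: subtract HALF_MIN, then clamp to [0, 100]."""
    specular_term = specular_term - HALF_MIN
    return min(max(specular_term, 0.0), 100.0)

assert HALF_MIN == 2.0 ** -14
print(mobile_specular_clamp(0.0))       # 0.0 (the tiny negative result is clamped back to 0)
print(mobile_specular_clamp(0.5))       # 0.49993896484375
print(mobile_specular_clamp(math.inf))  # 100.0
```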

My homepage

Link to this post on my homepage: baddogzz.github.io/2020/04/27/…

That's it, bye!
