Tile-Based Rendering (the second half covers mobile optimization advice)

Date: 2020-05-08

Tags: tile based rendering, mobile optimization advice

Original article:

https://www.imgtec.com/blog/a-look-at-the-powervr-graphics-architecture-tile-based-rendering/

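Tile-based rendering is a hardware architecture.

The color target is split into tiles.

This reduces memory bandwidth.

Hidden surface removal is done with depth before the fragment shader (FS) runs; it is the same idea as early-Z.

Cache misses go down, because each row of a tile is short.

So anything that defeats early-Z should be avoided, such as an FS that modifies depth.

Examples: discard in the FS (mask or alpha test) and alpha to coverage.

These bypass the on-chip depth and fetch depth from memory instead.

Clear the render target at the start of the frame; that saves one load of the previous frame's contents into the tile buffer.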
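To make the "split into tiles" idea concrete, here is a minimal sketch of binning a primitive's screen-space bounding box into tiles. The 32x32 tile size, the 4-byte color and 4-byte depth/stencil formats, and the function names are assumptions for illustration, not a description of any specific GPU.

```python
TILE = 32  # assumed tile size in pixels; real hardware varies


def tiles_touched(bbox, width, height, tile=TILE):
    """Return (tx, ty) indices of tiles overlapped by a screen-space bounding box.

    bbox is (min_x, min_y, max_x, max_y) in integer pixels, max exclusive.
    """
    min_x, min_y, max_x, max_y = bbox
    # Clamp to the render target so off-screen parts are binned nowhere.
    min_x, min_y = max(min_x, 0), max(min_y, 0)
    max_x, max_y = min(max_x, width), min(max_y, height)
    if min_x >= max_x or min_y >= max_y:
        return []
    first_tx, last_tx = min_x // tile, (max_x - 1) // tile
    first_ty, last_ty = min_y // tile, (max_y - 1) // tile
    return [(tx, ty)
            for ty in range(first_ty, last_ty + 1)
            for tx in range(first_tx, last_tx + 1)]


def tile_buffer_bytes(tile=TILE, bytes_per_pixel=4 + 4):
    """On-chip bytes for one tile: color (4 B) plus depth/stencil (4 B), assumed formats."""
    return tile * tile * bytes_per_pixel
```

With these assumptions a tile needs only a few kilobytes of on-chip storage, which is why color and depth traffic can stay off the external memory bus until the tile is finished.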

========================================

http://aras-p.info/texts/files/FastMobileShaders_siggraph2011.pdf

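These optimization strategies date from 2011 and a lot has changed since, for example ETC2. The GPU architectures covered:

Tiled deferred (TBDR): PowerVR

Tiled: Mali, Adreno

Immediate (classic): Tegra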

1) TBDR: Render everything in tiles, shade only visible pixels

2) Tiled: Render everything in tiles

3) Classic: Render everything

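Because rendering is split into tiles, sampling causes fewer cache misses than with one large framebuffer.

So mipmaps matter less for performance, but they are still worth having for visual quality (aliasing).

Compress texture assets per platform: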

PVRTC for PowerVR; DXT for Tegra; ATC for Adreno

ETC2 for Android with OpenGL ES 3.0

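TBDR (iPad 2):

MSAA is cheaper than on immediate-mode GPUs

4x MSAA: 2-4 ms

Anisotropic filtering: 3 ms

aniso = 2

Turning off mipmaps: iPad 2 loses 2-3 ms

Tegra falls apart

On TBDR there is no per-draw-call GPU time, so GPU timings cannot be collected per draw, which makes optimization harder.

Adreno and Tegra still provide it.

If one frame's vertex data is too large it gets split, which lowers efficiency (it cannot be processed in one pass, so it takes two). 1000 thousand vertices on iPad 2.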

=====================================

Reduce overdraw from alpha blending

PerfHUD ES profiler

============================

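Optimization example: Tegra

Draw the skybox last.

Draw opaque geometry front to back (not very practical in full, since it needs sorting at polygon granularity).

Sort near, large objects this way; group far objects by material to merge batches and reduce render state changes.

(Very clever. I had only seen these two goals as conflicting, and it had not occurred to me to apply one strategy to near objects and the other to far ones.)

Draw the main character first; draw enemies after the scene (they are usually occluded).

Rejecting occluded geometry still costs about 1 ms on Tegra 2 (vertex shader work), so we can set up trigger zones in which the skybox is switched off, which removes the VS cost as well.

Sorting opaque geometry gave a 15 ms improvement.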
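The near/far split described above can be sketched as follows. The `Draw` record, the `near_limit` and `big_radius` thresholds, and the use of a bounding-sphere radius as the "large" test are all hypothetical choices for illustration; the slides do not specify them.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    material: str
    distance: float  # camera distance, hypothetical units
    radius: float    # bounding-sphere radius, used as a rough size measure


def order_opaque(draws, near_limit=20.0, big_radius=2.0):
    """Near, large objects go first, front to back; everything else is grouped by material."""
    def is_occluder(d):
        return d.distance <= near_limit and d.radius >= big_radius

    near_big = sorted((d for d in draws if is_occluder(d)),
                      key=lambda d: d.distance)
    # sorted() is stable, so draws sharing a material keep their submission order.
    rest = sorted((d for d in draws if not is_occluder(d)),
                  key=lambda d: d.material)
    return near_big + rest
```

The near, large objects act as occluders for early depth rejection, while the remaining draws are batched by material so state changes stay low.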

------

Shader optimization

Shader metric: cycles/pixel. Static analysis tools exist; see other posts.

Lighting in a lookup texture (LUT)

sampled with tex2D(N.L, N.H) (I have used a Beckmann one before)
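As a rough illustration of the lookup-texture idea, the sketch below bakes a lighting function into a 2D table indexed by N.L and N.H and then fetches from it. It assumes a Lambert plus Blinn-Phong model, a 64x64 table, an exponent of 32, and nearest-texel sampling; none of these come from the slides, and a Beckmann table would be baked the same way with a different `shade`.

```python
SIZE = 64         # assumed table resolution per axis
SHININESS = 32.0  # assumed Blinn-Phong exponent


def shade(n_dot_l, n_dot_h):
    """Reference lighting: Lambert diffuse plus Blinn-Phong specular."""
    diffuse = max(n_dot_l, 0.0)
    specular = max(n_dot_h, 0.0) ** SHININESS if n_dot_l > 0.0 else 0.0
    return diffuse, specular


def bake_lut(size=SIZE):
    """Precompute shade() over [0, 1] x [0, 1]; row index = N.H, column index = N.L."""
    return [[shade(x / (size - 1), y / (size - 1)) for x in range(size)]
            for y in range(size)]


def sample_lut(lut, n_dot_l, n_dot_h):
    """Nearest-texel fetch with clamp-to-edge addressing, standing in for tex2D."""
    size = len(lut)
    x = min(max(round(n_dot_l * (size - 1)), 0), size - 1)
    y = min(max(round(n_dot_h * (size - 1)), 0), size - 1)
    return lut[y][x]
```

In a shader the per-pixel `pow` is replaced by a single texture fetch, which helps when the shader is ALU bound but not when it is already texture bound.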

----------

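Texture compression: formats supported by the hardware are sampled directly.

Tools

iOS + PowerVR

Unity Profiler

Apple Instruments

PowerVR has its own tool suite (see the official site); the PVRUniSCo shader analyzer can show cycle counts.

Android + Tegra

NVIDIA PerfHUD ES

GPU time per draw call

Shader cycles

2x2 textures and the null view rectangle: these two are very handy for ruling out bottlenecks.

The author likes this tool a lot, but it seems to require a physical dev-kit device, which looks inconvenient.

Mali

Adreno: each vendor has its own tools

Frame capture

Shader analysis and live editing (I really like this feature)

I mostly use the Snapdragon tools.

RenderDoc recently added Android support as well, and it works reasonably well.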

============================================

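Shader optimization: floating-point precision

float/half/fixed correspond to highp/mediump/lowp

Do not trust intuition.

lowp: 8 bits of fractional precision, range -2.0 to +2.0

Use it for colors and normalized vectors; do not scale or swizzle lowp values.

mediump: 16 bit. UVs, 2D vectors, quantities that do not need high precision.

highp: 24 to 32 bit, depending on the platform.

World-space positions, scalars, UVs into large textures, and precision-sensitive values such as offsets.

Precision behavior differs per platform and some GPUs are more sensitive to it than others; in short, read the vendor manual.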
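To get a feel for why large-texture UVs and world positions want highp, the sketch below models the lower precisions on the CPU. It assumes mediump behaves like an IEEE 754 half float and lowp like fixed point with 1/256 steps clamped to [-2, 2]; real GPUs are free to use more precision, so treat this as a worst-case illustration only.

```python
import struct


def to_half(x):
    """Round-trip x through IEEE 754 binary16, an assumed mediump representation."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_lowp(x):
    """Model lowp as fixed point: clamp to [-2, 2], then snap to 1/256 steps."""
    clamped = max(-2.0, min(2.0, x))
    return round(clamped * 256) / 256


def texel_error(u, texture_size):
    """Addressing error in texels when a UV coordinate is stored as a half."""
    return abs(to_half(u) - u) * texture_size
```

Under this model a half cannot represent fractions above 1024 at all (`to_half(1024.3)` comes back as `1024.0`), and a UV near 0.7 into a 4096-texel texture is already off by a sizeable fraction of a texel.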

===============================

Likewise, do not pack 2 UVs into one float4/vec4 varying for PowerVR

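float4 uv, read as uv.xy and uv.zw: do not do this on PowerVR.

Varyings and interpolation

Varying cost differs per platform; check the manual.

Adreno is less sensitive to shader complexity.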

==============

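The next example is an iOS optimization case.

glFinish wait: this shows GPU time; use the profiler to see how long the CPU waited.

Post-processing: bloom and heat distortion took 10+ ms.

Tuned floating-point precision and merged heat distortion with bloom, removing one blit.

Saved 10 ms (I can do this one too: I removed two blits in PPv2 and also saved 10+ ms).

It had a fire wall shader that was used everywhere.

Determine whether it is ALU bound or texture bound.

ALU bound

Floating-point precision, per-vertex computation, lighting lookup texture

Analyze the shader with a tool: PVRUniSCo

Reduce the vertex count: too many vertices caused a scene split, costing 3 ms (Apple’s Instruments show this).

Particle optimization: reduce overdraw, simplify the shader.

The saved budget went to MSAA and anisotropic filtering.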

======================

TBDR

• Hidden Surface Removal

– For opaque only

– Don’t keep alpha-test enabled all the time (use it sparingly; enable it only when needed)

– Don’t keep "discard" keyword in shader source, even if it’s not used (remove any unused discard)

• Group opaque drawcalls together

• Sort on state, not distance

============================

Snapdragon optimization advice

Qualcomm Snapdragon Rendering Tips

• Traditional handling of overdraw (via depth test)

– Cull as much as you can on CPU, to avoid both CPU and GPU cost

– Sort on distance (front to back) to maximize early z-rejection

• The Adreno SIMD is wide

– Check your ALU utilization in the Adreno Profiler and optimize

– Minimize temp register usage

– Use long shaders with a lot of ALU instructions

– Avoid dependent texture fetches (or cover the latency with a lot of ALUs)

==================

Switching FBOs is expensive on tile-based GPUs, because the framebuffer has to be stored to memory.

Expensive to switch Frame Buffer Object on Tile-based GPUs

– Saves the current FBO to RAM

– Reloads the new FBO from RAM

This costs a lot of bandwidth.

Framebuffer Resolve/Restore

• Clear ALL FBO attachments after new frame/rendertarget

– Clear after eglSwapBuffers / glBindFramebuffer

– Avoids reloading FBO from RAM

– NOTE: Do NOT do unnecessary clears on non-tile-based GPUs (e.g. NVIDIA)

• Discard unused attachments before new frame/rendertarget

– Discard before eglSwapBuffers / glBindFramebuffer

– Avoids saving unused FBO attachments to RAM

– glDiscardFramebufferEXT / glInvalidateFramebuffer

All of this is to avoid reading the framebuffer from memory and writing it back.
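A back-of-the-envelope estimate shows what the clear and discard advice is worth. The sketch assumes a 1920x1080 target, RGBA8 color (4 bytes per pixel), a 4-byte depth/stencil attachment, one render target switch per frame, and 60 frames per second; all of these numbers are illustrative assumptions, not measurements.

```python
def attachment_bytes(width, height, bytes_per_pixel):
    """Size of one full-screen attachment in bytes."""
    return width * height * bytes_per_pixel


def bytes_avoided_per_second(width, height, fps,
                             color_bpp=4, depth_stencil_bpp=4):
    """Traffic avoided by clearing on bind and discarding depth/stencil before unbind."""
    # Clearing every attachment skips the restore (RAM -> tile) of color and depth/stencil.
    restore = attachment_bytes(width, height, color_bpp + depth_stencil_bpp)
    # Discarding depth/stencil skips its resolve (tile -> RAM); color is still written out.
    resolve = attachment_bytes(width, height, depth_stencil_bpp)
    return (restore + resolve) * fps


if __name__ == "__main__":
    saved = bytes_avoided_per_second(1920, 1080, 60)
    print(f"{saved / 1e9:.2f} GB/s avoided")
```

Under these assumptions roughly 1.5 GB/s of memory traffic never happens, which is a meaningful share of the bandwidth available on a mobile device.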

=============================================================

https://de45xmedrsdbp.cloudfront.net/Resources/files/GDC2014_Next_Generation_Mobile_Rendering-2033767592.pdf
