Implementing a glTF Extension for GPU Compressed Textures

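Motivation

A long time ago I heard that the WebGL colleagues at my company had looked into GPU compressed textures, and I had done some research of my own. I found that the basis_universal tool can quickly transcode UASTC and ETC1S into whichever compressed texture format the target platform supports, but I did not adopt it because the wasm and the loader JS were too large. Later I found a lighter transcoder implementation and wanted to put it to use.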

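Exploration

Basis-Universal-Transcoders is written by KhronosGroup in AssemblyScript. Compared with the 220+ KB wasm of basis it is very lightweight, but it supports only three transcode target formats and its development is not very active.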

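Later I learned that LayaAir takes a simpler, blunter approach to compressed textures: PVRTC on iOS, ETC1 on Android, and PNG/JPG everywhere else. I had also implemented hdr-prefilter-texture before, and the same idea applies to compressed textures.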

Anything that would need processing at runtime can be preprocessed, so the runtime only has to load the preprocessed output.

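Hence this GPU compressed texture extension: store the output of the basis transcode, and let the runtime download the preprocessed format that matches what the device supports.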

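Background

glTF structure

Since the goal is a glTF extension, we first need to understand the glTF format.

asset: describes the glTF version information
extensionsUsed: tells the parser which extensions are needed to parse the glTF
The rest resembles tables in a relational database, except that relations are expressed as array indices, for example: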

scene: index into scenes
scenes[i].nodes[j]: index into nodes
nodes[i].mesh: index into meshes
meshes[i].primitives[j].material: index into materials
materials[i].normalTexture.index: index into textures
textures[i].source: index into images
images[i].uri: a URL for the image
images[i].bufferView: index into bufferViews
bufferViews[i].buffer: index into buffers
buffers[i].uri: a URL for the binary data
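
To make the index-based linking concrete, here is a minimal sketch that walks from the default scene down to the image URIs of normal maps. The document object and the function name findNormalMapUris are made up for illustration, and child nodes are ignored for brevity.

```javascript
// Hypothetical minimal glTF document; only the fields needed for the lookup.
const gltf = {
  scene: 0,
  scenes: [{ nodes: [0] }],
  nodes: [{ mesh: 0 }],
  meshes: [{ primitives: [{ material: 0 }] }],
  materials: [{ normalTexture: { index: 0 } }],
  textures: [{ source: 0 }],
  images: [{ uri: 'normal.png' }],
};

// Follow the index chain scene -> node -> mesh -> material -> texture -> image.
// Only the root nodes of the scene are visited; node.children is not followed.
function findNormalMapUris(doc) {
  const uris = [];
  const scene = doc.scenes[doc.scene];
  for (const nodeIndex of scene.nodes) {
    const node = doc.nodes[nodeIndex];
    if (node.mesh === undefined) continue; // nodes without a mesh are skipped
    for (const primitive of doc.meshes[node.mesh].primitives) {
      if (primitive.material === undefined) continue;
      const material = doc.materials[primitive.material];
      if (!material.normalTexture) continue;
      const texture = doc.textures[material.normalTexture.index];
      const image = doc.images[texture.source];
      if (image.uri) uris.push(image.uri);
    }
  }
  return uris;
}

console.log(findNormalMapUris(gltf)); // [ 'normal.png' ]
```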

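glTF extensions

With a basic grasp of how glTF links its data together, we can look at how a glTF extension is written. The extension to be implemented can be thought of as a fallback extension, very similar to EXT_texture_webp, which was implemented by Google.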

function GLTFTextureWebPExtension(parser) {
  this.parser = parser;
  this.name = EXTENSIONS.EXT_TEXTURE_WEBP;
  this.isSupported = null;
}

GLTFTextureWebPExtension.prototype.loadTexture = function (textureIndex) {
  var name = this.name;
  var parser = this.parser;
  var json = parser.json;

  var textureDef = json.textures[textureIndex];

  if (!textureDef.extensions || !textureDef.extensions[name]) {
    return null;
  }

  var extension = textureDef.extensions[name];
  var source = json.images[extension.source];

  var loader = parser.textureLoader;
  if (source.uri) {
    var handler = parser.options.manager.getHandler(source.uri);
    if (handler !== null) loader = handler;
  }

  return this.detectSupport().then(function (isSupported) {
    if (isSupported) return parser.loadTextureImage(textureIndex, source, loader);

    if (json.extensionsRequired && json.extensionsRequired.indexOf(name) >= 0) {
      throw new Error('THREE.GLTFLoader: WebP required by asset but unsupported.');
    }

    // Fall back to PNG or JPEG.
    return parser.loadTexture(textureIndex);
  });
};

GLTFTextureWebPExtension.prototype.detectSupport = function () {
  if (!this.isSupported) {
    this.isSupported = new Promise(function (resolve) {
      var image = new Image();

      image.src = 'data:image/webp;base64,UklGRiIAAABXRUJQVlA4IBYAAAAwAQCdASoBAAEADsD+JaQAA3AAAAAA';
      image.onload = image.onerror = function () {
        resolve(image.height === 1);
      };
    });
  }

  return this.isSupported;
};

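There are only two key methods, detectSupport and loadTexture, and the logic of both is easy to follow. loadTexture is invoked by GLTFLoader.

Writing a custom glTF extension turns out to be fairly easy: search GLTFLoader for this._invokeOne to see which hook functions are supported. There are currently five: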

loadMesh
loadBufferView
loadMaterial
loadTexture
getMaterialType
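
The reason a hook can return null to mean "not handled" is that the loader asks each plugin in turn and takes the first truthy result. The following is a simplified stand-in for that dispatch, written for illustration and not copied from the three.js source; the plugin objects and the invokeOne helper are hypothetical.

```javascript
// Ask each plugin in order; the first one that returns a truthy value wins.
function invokeOne(plugins, callHook) {
  for (const plugin of plugins) {
    const result = callHook(plugin);
    if (result) return result;
  }
  return null;
}

// Hypothetical plugins: the first declines, the second handles the texture,
// and the last one stands in for the loader's built-in behaviour.
const plugins = [
  { name: 'EXT_a', loadTexture: () => null },
  { name: 'EXT_b', loadTexture: (index) => `texture ${index} from EXT_b` },
  { name: 'core', loadTexture: (index) => `texture ${index} from core` },
];

const result = invokeOne(plugins, (plugin) => plugin.loadTexture && plugin.loadTexture(0));
console.log(result); // texture 0 from EXT_b
```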

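Implementation

First, an outline of the approach.

glTF extension part

Define the schema of the extension
detectSupport: determined by querying the gl context for supported extensions
loadTexture: load the matching data according to the schema, create a CompressedTexture, and return it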
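
As a rough sketch of the detectSupport step, the function below queries a WebGL context directly with gl.getExtension, which returns null when an extension is unavailable. The extension names are the ones used later in this article; the EXTENSION_NAMES table and the mock context are assumptions made so the example runs outside a browser.

```javascript
// Short format names used in this article, mapped to WebGL extension names.
const EXTENSION_NAMES = {
  astc: ['WEBGL_compressed_texture_astc'],
  bc7: ['EXT_texture_compression_bptc'],
  dxt: ['WEBGL_compressed_texture_s3tc'],
  pvrtc: ['WEBGL_compressed_texture_pvrtc', 'WEBKIT_WEBGL_compressed_texture_pvrtc'],
  etc1: ['WEBGL_compressed_texture_etc1'],
};

// Returns an object such as { astc: false, bc7: false, dxt: true, ... }.
function detectSupport(gl) {
  const support = {};
  for (const [format, names] of Object.entries(EXTENSION_NAMES)) {
    support[format] = names.some((name) => gl.getExtension(name) !== null);
  }
  return support;
}

// Mock context that only reports S3TC support; in a page this would be
// canvas.getContext('webgl').
const fakeGl = {
  getExtension: (name) => (name === 'WEBGL_compressed_texture_s3tc' ? {} : null),
};
console.log(detectSupport(fakeGl).dxt); // true
```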

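Tool part

Load a GLTF/GLB, convert the textures it contains to basis, then transcode them to astc|bc7|dxt|pvrtc|etc1
Store the results according to the schema and export the gltf.

Defining the schema

As EXT_texture_webp shows, an extension's configuration lives under extensions.EXT_texture_webp, so that part of the format is all we need to define.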

{
  "textures": [
    {
      "source": 0,
      "extensions": {
        "EXT_GPU_COMPRESSED_TEXTURE": {
          "astc": 1,
          "bc7": 2,
          "dxt": 3,
          "pvrtc": 4,
          "etc1": 5,
          "width": 2048,
          "height": 2048,
          "hasAlpha": 0,
          "compress": 1
        }
      }
    }
  ],
  "buffers": [
    { "name": "buffer", "byteLength": 207816, "uri": "buffer.bin" },
    { "name": "image3.astc", "byteLength": 48972, "uri": "image3.astc.bin" },
    { "name": "image3.bc7", "byteLength": 50586, "uri": "image3.bc7.bin" },
    { "name": "image3.dxt", "byteLength": 10686, "uri": "image3.dxt.bin" },
    { "name": "image3.pvrtc", "byteLength": 21741, "uri": "image3.pvrtc.bin" },
    { "name": "image3.etc1", "byteLength": 22360, "uri": "image3.etc1.bin" }
  ]
}

The format is simple and self-explanatory: the astc|bc7|dxt|pvrtc|etc1 fields are indices into buffers.
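
A small sketch of how a loader might read this schema: given the support flags of the device, pick the first format that is both supported and present, then resolve its buffer entry. The preference order and the pickBuffer helper are assumptions for illustration; the JSON is a trimmed copy of the example above.

```javascript
// Assumed preference order; the article does not prescribe one.
const PREFERENCE = ['astc', 'bc7', 'dxt', 'pvrtc', 'etc1'];

// Returns { format, buffer } for the first usable format, or null to signal
// that the caller should fall back to the PNG/JPEG image.
function pickBuffer(gltf, textureIndex, support) {
  const extensions = gltf.textures[textureIndex].extensions || {};
  const def = extensions.EXT_GPU_COMPRESSED_TEXTURE;
  if (!def) return null;
  for (const format of PREFERENCE) {
    if (support[format] && def[format] !== undefined) {
      return { format, buffer: gltf.buffers[def[format]] };
    }
  }
  return null;
}

const doc = {
  textures: [{
    source: 0,
    extensions: {
      EXT_GPU_COMPRESSED_TEXTURE: { astc: 1, bc7: 2, dxt: 3, pvrtc: 4, etc1: 5 },
    },
  }],
  buffers: [
    { uri: 'buffer.bin' },
    { uri: 'image3.astc.bin' },
    { uri: 'image3.bc7.bin' },
    { uri: 'image3.dxt.bin' },
    { uri: 'image3.pvrtc.bin' },
    { uri: 'image3.etc1.bin' },
  ],
};

const picked = pickBuffer(doc, 0, { dxt: true, etc1: true });
console.log(picked.format, picked.buffer.uri); // dxt image3.dxt.bin
```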

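Generating a glTF with this structure

For this part you can refer to webgl/texture/index.html in basis: loop over the five compressed texture types, save each output to a bin file, then write the glTF file by hand.

At this point a basic version can be written.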

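import {
  CompressedTexture,
  LinearFilter,
  LinearMipmapLinearFilter,
  UnsignedByteType,
} from 'three';

// typeFormatMap is not shown in this article. From its use below it is assumed to map
// each format name to a pair of three.js texture formats indexed by hasAlpha (0 or 1).
export class GLTFGPUCompressedTexture {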
  constructor(parser) {
    this.name = 'EXT_GPU_COMPRESSED_TEXTURE';
    this.parser = parser;
  }

  detectSupport(renderer) {
    this.supportInfo = {
      astc: renderer.extensions.has('WEBGL_compressed_texture_astc'),
      bc7: renderer.extensions.has('EXT_texture_compression_bptc'),
      dxt: renderer.extensions.has('WEBGL_compressed_texture_s3tc'),
      etc1: renderer.extensions.has('WEBGL_compressed_texture_etc1'),
      etc2: renderer.extensions.has('WEBGL_compressed_texture_etc'),
      pvrtc:
        renderer.extensions.has('WEBGL_compressed_texture_pvrtc') ||
        renderer.extensions.has('WEBKIT_WEBGL_compressed_texture_pvrtc'),
    };
    return this;
  }

  loadTexture(textureIndex) {
    const { parser, name } = this;
    const json = parser.json;
    const textureDef = json.textures[textureIndex];

    if (!textureDef.extensions || !textureDef.extensions[name]) return null;
    
    const extensionDef = textureDef.extensions[name];
    const { width, height, hasAlpha } = extensionDef;

    for (let name in this.supportInfo) {
      if (this.supportInfo[name] && extensionDef[name] !== undefined) {
        return parser
          .getDependency('buffer', extensionDef[name])
          .then(buffer => {
            // TODO: support compressed textures with mipmaps
            // TODO: zstd compression

            const mipmaps = [
              {
                data: new Uint8Array(buffer),
                width,
                height,
              },
            ];


            // For now the buffer can be passed to the GPU as-is
            const texture = new CompressedTexture(
              mipmaps,
              width,
              height,
              typeFormatMap[name][hasAlpha],
              UnsignedByteType,
            );
            texture.minFilter =
              mipmaps.length === 1 ? LinearFilter : LinearMipmapLinearFilter;
            texture.magFilter = LinearFilter;
            texture.generateMipmaps = false;
            texture.needsUpdate = true;

            return texture;
          });
      }
    }

    // Fall back to PNG or JPEG.
    return parser.loadTexture(textureIndex);
  }
}

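Filling in the details

Basis output from ETC1S is small but low quality, while UASTC is high quality but large, so lossless compression is needed on top.
Mipmaps must be supported: mipmaps for GPU compressed textures cannot be generated quickly on the GPU, so mipmap loading has to be implemented.
Since decompression is required, it may need to be accelerated with web workers, wasm, SIMD, and so on.
The CLI conversion tool should support multiple processes, batch processing, and size statistics in its output.
Write performance test cases that compare against the KTX2 + UASTC compressed texture approach, and collect the data into a table.
Compare desktop and mobile browsers, as well as ImageBitmapLoader, texture count, resolution, and so on.
Decode on the UI thread when there are few images, and in workers when there are many.
Complete the resource release logic (dispose).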
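
For the mipmap item, the main work is slicing one buffer into per-level entries of the form { data, width, height }. The sketch below does this for formats that use 4x4 blocks, assuming the levels are simply concatenated from largest to smallest. That layout and both helper names are assumptions for illustration and not necessarily what gltf-gpu-compressed-texture stores.

```javascript
// Size of one mip level for a 4x4 block format.
// bytesPerBlock is 16 for ASTC 4x4 and BC7, and 8 for DXT1 and ETC1.
function mipLevelByteLength(width, height, bytesPerBlock) {
  const blocksWide = Math.ceil(width / 4);
  const blocksHigh = Math.ceil(height / 4);
  return blocksWide * blocksHigh * bytesPerBlock;
}

// Split a Uint8Array holding concatenated levels into mipmap entries.
// Each entry is a view into the original data, not a copy.
function splitMipmaps(data, width, height, bytesPerBlock) {
  const mipmaps = [];
  let offset = 0;
  let w = width;
  let h = height;
  while (offset < data.byteLength) {
    const byteLength = mipLevelByteLength(w, h, bytesPerBlock);
    mipmaps.push({ data: data.subarray(offset, offset + byteLength), width: w, height: h });
    offset += byteLength;
    if (w === 1 && h === 1) break; // smallest level reached
    w = Math.max(1, w >> 1);
    h = Math.max(1, h >> 1);
  }
  return mipmaps;
}

// An 8x8 BC7 texture with a full chain: 64 + 16 + 16 + 16 = 112 bytes.
const levels = splitMipmaps(new Uint8Array(112), 8, 8, 16);
console.log(levels.map((m) => `${m.width}x${m.height}`)); // [ '8x8', '4x4', '2x2', '1x1' ]
```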

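The result is a reasonably complete solution, gltf-gpu-compressed-texture.

A glTF extension for GPU compressed texture fallback, together with a batch CLI conversion tool, for use with the GLTFLoader of THREE. A demo and the extension definition are available.

Performance data

Environment: Chrome 93, Intel I9 10900 (ES) CPU, integrated HD630 graphics
Loading the BC7 format, with ImageBitmapLoader, THREE r129, localhost, disable cache: true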

Model	Parameters	load	render	Total	Model size	Dependency size
banzi_blue	gltf-tc zstd no-mipmap no-worker	36.10ms	1.60ms	37.70ms	506kb	22.3kb
banzi_blue	gltf-tc no-zstd mipmap no-worker	25.80ms	1.50ms	27.30ms	2.2mb	22.3kb
banzi_blue	gltf-tc zstd mipmap no-worker	37.90ms	1.60ms	39.50ms	648kb	22.3kb
banzi_blue	gltf ktx2 uastc	534.70ms	1.70ms	536.40ms	684kb	249.3kb
banzi_blue	glb	32.80ms	6.00ms	38.80ms	443kb
banzi_blue	gltf	27.70ms	4.90ms	32.60ms	446kb
BoomBox	gltf-tc zstd mipmap worker	153.50ms	23.70ms	177.20ms	6.6mb	22.3kb
BoomBox	gltf-tc zstd mipmap no-worker	241.10ms	9.40ms	250.50ms	6.6mb	22.3kb
BoomBox	glb ktx2 uastc	506.10ms	9.30ms	515.40ms	7.1mb	249.3kb
BoomBox	glb	156.10ms	89.50ms	245.60ms	11.3mb
BoomBox	gltf	120.20ms	58.80ms	179.00ms	11.3mb

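Because banzi_blue has fewer than 4 textures, zstd is decoded on the UI thread, since passing data to a worker also takes a fair amount of time.
The KTX2Loader used for comparison does all of its zstd decoding on the UI thread; a PR for decoding in a Web Worker has been submitted.
The 22.3kb dependency size was taken from the online demo, because http-server --gzip did not work well.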

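Compared with the KTX2 + UASTC compressed texture approach, there is a large advantage in load time (for BoomBox, 153.50ms versus 506.10ms) and in dependency size (22.3kb versus 249.3kb), and a smaller advantage in model size.
The load + render time of BoomBox with gltf-tc zstd mipmap worker is also close to plain gltf (177.20ms versus 179.00ms), while the model is much smaller (6.6mb versus 11.3mb).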

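Test data for the MI 8 can be found in the screenshots directory.

In the WeChat webview, BoomBox is faster than glb/gltf in every configuration, which is anomalous; the behaviour in Chrome is normal. banzi_blue is slightly slower, and the KTX2 approach is still very slow.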

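Command-line usage

Before use, make sure zstd and basisu are on your PATH. The tool prints Chinese text: in the help output, 顯示幫助 means "show help" and the -i description reads "convert the textures used by a gltf into GPU compressed textures with fallback support"; in the log, 法線 marks whether an image is a normal map.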

> npm i gltf-gpu-compressed-texture -S
# Show help
> gltf-tc -h

  -h --help                                              顯示幫助
  -i --input [dir] [?outdir] [?compress] [?mipmap]       把gltf所使用紋理轉換爲GPU壓縮紋理並支持fallback

Examples:
  gltf-tc -i ./examples/glb ./examples/zstd
  gltf-tc -i ./examples/glb ./examples/no-zstd 0
  gltf-tc -i ./examples/glb ./examples/no-mipmap 1 false
  gltf-tc -i ./examples/glb ./examples/no-zstd-no-mipmap 0 false

# Run
> gltf-tc -i ./examples/glb ./examples/zstd

done: 6417ms    image3.png      法線:false      sRGB: true
done: 13746ms   image2.png      法線:true       sRGB: false
done: 14245ms   image0.png      法線:false      sRGB: true
done: 14491ms   image1.png      法線:false      sRGB: false
done: 577ms     FINDI_TOUMING01_nomarl1.jpg     法線:true       sRGB: false
done: 568ms     FINDI_TOUMING01_Basecoler.png   法線:false      sRGB: true
done: 1267ms    lanse_banzi-1.jpg       法線:false      sRGB: true
done: 577ms     FINDI_TOUMING01_Basecoler.png   法線:false      sRGB: true
done: 604ms     FINDI_TOUMING01_nomarl1.jpg     法線:true       sRGB: false
done: 1280ms    lvse_banzi-1.jpg        法線:false      sRGB: true

cost: 17.75s
compress: 1, summary:
  bitmap: 11.22MB
  astc  : 7.18MB
  etc1  : 1.85MB
  bc7   : 7.16MB
  dxt   : 3.04MB
  pvrtc : 2.28MB

NPM package usage

import { CompressedTexture, Scene, WebGLRenderer } from 'three-platformize';
import { GLTFLoader } from 'three-platformize/examples/jsm/loaders/GLTFLoader';
import GLTFGPUCompressedTexture from 'gltf-gpu-compressed-texture';

const gltfLoader = new GLTFLoader();
const renderer = new WebGLRenderer();
const scene = new Scene();

gltfLoader.register(parser => {
  return new GLTFGPUCompressedTexture(parser, renderer, {
    CompressedTexture,
  });
});

gltfLoader.loadAsync('./examples/zstd/BoomBox.gltf').then((gltf) => {
  scene.add(gltf.scene);
});

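Things learned along the way

Compressed textures have limited support for minFilter and magFilter.
zstd decodes faster than PNG, which is why the zpng format exists.
az64 is better than zstd, but it is not open source and its real-world performance is unknown.
The zstddec used in KTX2Loader turned out to decode on the UI thread, so I submitted a PR that implements worker pool decoding.
A buffer passed as a transferable cannot be a TypedArray created with an offset, such as new Uint8Array(buffer, dataOffset); it has to be cloned first: Uint8Array.from(new Uint8Array(buffer, dataOffset));
Epic has Oodle, a similar basis-transcode-style solution and compression format, but it is closed source.
zstd could probably also be applied to tf models, although tf has its own data compression.
Huffman decoding has been implemented on the GPU: Massively Parallel Huffman Decoding on GPUs.
The Basis-Universal-Transcoders mentioned at the start are already used by Babylon, though still marked as experimental.
The zstd wasm appears to be a non-SIMD build from last year; I tried building the wasm from the latest version but could not get it to run.
On iOS, uploading textures makes a GIF stutter; with compressed textures it does not.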
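
The transferable point is easier to see in code. A Uint8Array created with an offset is still backed by the whole original ArrayBuffer, so transferring its buffer would hand over (and detach) all of it. The sketch below copies the bytes into a standalone buffer first; it uses slice() in place of Uint8Array.from, both of which copy the bytes. The helper name toTransferable and the worker variable are hypothetical.

```javascript
// Copy the bytes from dataOffset onward into a standalone buffer.
function toTransferable(buffer, dataOffset) {
  const view = new Uint8Array(buffer, dataOffset); // still backed by the whole buffer
  return view.slice(); // new Uint8Array with its own, exactly sized ArrayBuffer
}

const source = new ArrayBuffer(16);
const copy = toTransferable(source, 4);
console.log(copy.byteOffset, copy.buffer.byteLength); // 0 12

// In a page with a worker (hypothetical `worker` variable):
// worker.postMessage({ data: copy }, [copy.buffer]);
```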

Finally

Everyone is welcome to use gltf-gpu-compressed-texture, and stars are welcome too.