WebGPU Learning (10): Introducing "Implementing Particle Effects on the GPU"

Date: 2019-12-29

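Hello everyone. This article introduces the basic idea behind "implementing particle effects on the GPU" and recommends related learning materials.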

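This article studies the webgpu-samples -> computeBoids sample, which shows how to implement a particle effect with a compute shader that simulates the behavior of a flock of birds.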

Previous post:
WebGPU Learning (9): Studying the "fractalCube" sample

Next post:
WebGPU Learning (11): Studying two optimizations: "reuse render command buffer" and "dynamic uniform buffer offset"

Final rendered result:
[Figure: final rendered result of the computeBoids sample]

Why not implement the particle effect on the CPU?

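Although a CPU implementation is more flexible and easier to control, when the number of particles is very large (for example, over a million) and the particles interact with the scene, it is better to implement the effect on the GPU.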

How the sample is implemented

First, run the compute pass

The code is as follows:

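  return function frame() {
    ...

    const commandEncoder = device.createCommandEncoder({});
    {
      // Compute pass: update every particle for this frame.
      const passEncoder = commandEncoder.beginComputePass();
      passEncoder.setPipeline(computePipeline);
      // Alternate between the two bind groups each frame (ping-pong).
      passEncoder.setBindGroup(0, particleBindGroups[t % 2]);
      // Launch one instance per particle.
      passEncoder.dispatch(numParticles);
      passEncoder.endPass();
    }
    ...
  }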

Let's analyze this pass:

The bind group contains two storage buffers: ParticlesA and ParticlesB

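ParticlesA stores the data of all particles from the previous frame. The compute shader reads it, computes the data of all particles for the next frame, and writes the result to ParticlesB. The bind group alternates every frame (particleBindGroups[t % 2]), so the two buffers swap roles; this is a ping-pong operation.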

Note: a storage buffer can be both read and written in a shader, whereas uniform buffers, vertex buffers, and the like can only be read in a shader.

The dispatch launches 1500 instances, and each instance runs the compute shader once

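When the compute shader computes the data of one particle, it has to iterate over all the other particles so that the particles can interact with each other and simulate flocking behavior.
With 1500 particles in total, that is 1500 * 1500 computations per frame. Running them one after another on the CPU takes too long; on the GPU the instances run in parallel and each one only performs 1500 computations, which greatly improves efficiency.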
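The ping-pong scheme and the per-particle loop described above can be sketched on the CPU in plain JavaScript. This is only an illustrative reference, not the sample's shader: the names stepParticles and simulate are hypothetical, and the update rule (steering toward the average position of the other particles) is a placeholder assumption, since the real compute shader is not shown in this article. The four-floats-per-particle layout follows the 4 * 4 byte stride used by the sample's particle buffers.

```javascript
// Each particle is four floats: [posX, posY, velX, velY].
const FLOATS_PER_PARTICLE = 4;

// One simulation step: read only from src, write only to dst.
// The update rule here is a placeholder assumption, not the sample's boids rules.
function stepParticles(src, dst, numParticles, dt) {
  let pairVisits = 0;
  for (let i = 0; i < numParticles; i++) {      // on the GPU: one instance per i
    const base = i * FLOATS_PER_PARTICLE;
    let sumX = 0, sumY = 0;
    for (let j = 0; j < numParticles; j++) {    // on the GPU: the loop inside the shader
      pairVisits++;
      if (j === i) continue;
      sumX += src[j * FLOATS_PER_PARTICLE];
      sumY += src[j * FLOATS_PER_PARTICLE + 1];
    }
    const others = Math.max(numParticles - 1, 1);
    const px = src[base], py = src[base + 1];
    // Steer slightly toward the centre of the other particles.
    const vx = src[base + 2] + (sumX / others - px) * 0.01;
    const vy = src[base + 3] + (sumY / others - py) * 0.01;
    dst[base] = px + vx * dt;
    dst[base + 1] = py + vy * dt;
    dst[base + 2] = vx;
    dst[base + 3] = vy;
  }
  return pairVisits;
}

// Runs several frames, swapping the roles of the two buffers every frame.
function simulate(numParticles, frames) {
  const buffers = [
    new Float32Array(numParticles * FLOATS_PER_PARTICLE),
    new Float32Array(numParticles * FLOATS_PER_PARTICLE),
  ];
  for (let k = 0; k < buffers[0].length; k++) {
    buffers[0][k] = Math.random() * 2 - 1;
  }
  let totalVisits = 0;
  for (let t = 0; t < frames; t++) {
    const src = buffers[t % 2];        // plays the role of ParticlesA this frame
    const dst = buffers[(t + 1) % 2];  // plays the role of ParticlesB this frame
    totalVisits += stepParticles(src, dst, numParticles, 0.04);
  }
  // The buffer written last is the one the render pass would read.
  return { latest: buffers[frames % 2], totalVisits };
}
```

With numParticles = 1500, each call to stepParticles visits 1500 * 1500 pairs in sequence. The GPU version spreads the outer loop across 1500 parallel instances, so each instance runs only the inner loop.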

Then, run the render pass

The code is as follows:

  const renderPipeline = device.createRenderPipeline({
    ...
    vertexState: {
      vertexBuffers: [{
        // instanced particles buffer
        arrayStride: 4 * 4,
        stepMode: "instance",
        attributes: [{
          // instance position
          shaderLocation: 0,
          offset: 0,
          format: "float2"
        }, {
          // instance velocity
          shaderLocation: 1,
          offset: 2 * 4,
          format: "float2"
        }],
      }, {
        // vertex buffer
        arrayStride: 2 * 4,
        stepMode: "vertex",
        attributes: [{
          // vertex positions
          shaderLocation: 2,
          offset: 0,
          format: "float2"
        }],
      }],
    },
    ...
  });
  
  ...

  const vertexBufferData = new Float32Array([-0.01, -0.02, 0.01, -0.02, 0.00, 0.02]);
  const verticesBuffer = device.createBuffer({
    size: vertexBufferData.byteLength,
    usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
  });
  verticesBuffer.setSubData(0, vertexBufferData);
  
  ...

  return function frame() {
    ...

    const commandEncoder = device.createCommandEncoder({});
    ...
    {
      const passEncoder = commandEncoder.beginRenderPass(renderPassDescriptor);
      passEncoder.setPipeline(renderPipeline);
      passEncoder.setVertexBuffer(0, particleBuffers[(t + 1) % 2]);
      passEncoder.setVertexBuffer(1, verticesBuffer);
      passEncoder.draw(3, numParticles, 0, 0);
      passEncoder.endPass();
    }
    ...
  }

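There are two vertex buffers:
ParticlesB uses the "instance" stepMode and is set as the first vertex buffer;
the vertices buffer (which holds the data of 3 vertices, each with an x coordinate and a y coordinate) uses the "vertex" stepMode and is set as the second vertex buffer.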

A single draw call renders 1500 instances (using the data in ParticlesB), each with 3 vertices (using the data in the vertices buffer).

Note: each particle is one instance, drawn as a triangle made of 3 vertices.
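To make the two step modes concrete, the following sketch emulates on the CPU which attributes each vertex shader invocation would receive for draw(3, numParticles, 0, 0). The strides and offsets come from the pipeline descriptor above. The helper names fetchAttributes and emulateDraw are hypothetical and exist only for illustration; the GPU performs this fetch in hardware.

```javascript
// Layouts taken from the vertexBuffers descriptor.
const INSTANCE_STRIDE_FLOATS = 4; // arrayStride 4 * 4 bytes: posX, posY, velX, velY
const VERTEX_STRIDE_FLOATS = 2;   // arrayStride 2 * 4 bytes: x, y

// Hypothetical helper: the attributes seen by one vertex shader invocation.
function fetchAttributes(instanceData, vertexData, instanceIndex, vertexIndex) {
  const i = instanceIndex * INSTANCE_STRIDE_FLOATS; // advances once per instance
  const v = vertexIndex * VERTEX_STRIDE_FLOATS;     // advances once per vertex
  return {
    particlePos: [instanceData[i], instanceData[i + 1]],     // shaderLocation 0, offset 0
    particleVel: [instanceData[i + 2], instanceData[i + 3]], // shaderLocation 1, offset 2 * 4
    vertexPos: [vertexData[v], vertexData[v + 1]],           // shaderLocation 2, offset 0
  };
}

// Emulates the attribute fetch of draw(vertexCount, instanceCount, 0, 0).
function emulateDraw(instanceData, vertexData, vertexCount, instanceCount) {
  const invocations = [];
  for (let inst = 0; inst < instanceCount; inst++) {
    for (let vert = 0; vert < vertexCount; vert++) {
      invocations.push(fetchAttributes(instanceData, vertexData, inst, vert));
    }
  }
  return invocations;
}
```

All three vertices of one triangle share the same particle position and velocity, while the triangle's shape is reused unchanged for every particle. That is why the sample only needs a 3-vertex buffer no matter how many particles are drawn.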