Unity 2017 Game Optimizations (by Chris Dickinson)

1. Pursuing Performance Problems

  Pursuing Performance Problems, provides an exploration of the Unity Profiler and a series of methods to profile our application, detect performance bottlenecks, and perform root cause analysis.

2. Scripting Strategies (read)

  Scripting Strategies, deals with the best practices for our Unity C# Script code, minimizing MonoBehaviour callback overhead, improving interobject communication, and more.

3. The Benefits of Batching

  The Benefits of Batching, explores Unity's Dynamic Batching and Static Batching systems, and how they can be utilized to ease the burden on the Rendering Pipeline.

4. Kickstart Your Art

  Kickstart Your Art, helps you understand the underlying technology behind art assets and learn how to avoid common pitfalls with importing, compression, and encoding.

5. Faster Physics

  Faster Physics, is about investigating the nuances of Unity's internal Physics Engines for both 3D and 2D games, and how to properly organize our physics objects for improved performance.

6. Dynamic Graphics

  Dynamic Graphics, provides an in-depth exploration of the Rendering Pipeline and how to improve applications that suffer rendering bottlenecks on the GPU or the CPU, how to optimize graphical effects such as lighting, shadows, and Particle Effects, ways in which to optimize Shader code, and some specific techniques for mobile devices.

7. Virtual Velocity and Augmented Acceleration

  Virtual Velocity and Augmented Acceleration, focuses on the new entertainment mediums of Virtual Reality (VR) and Augmented Reality (AR), and includes several techniques for optimizing performance that are unique to apps built for these platforms. 

8. Masterful Memory Management

  Masterful Memory Management, examines the inner workings of the Unity Engine, the Mono Framework, and how memory is managed within these components to protect our application from excessive heap allocations and runtime garbage collection.

9. Tactical Tips and Tricks

  Tactical Tips and Tricks, closes the book with a multitude of useful techniques used by Unity professionals to improve project workflow and scene management. 

1. Pursuing Performance Problems

  The Unity Profiler

 

  Launching the Profiler


  Editor or standalone instances


  Connecting to a WebGL instance


  Remote connection to an iOS device


  Remote connection to an Android device


  Editor profiling


  The Profiler window


  Profiler controls


  Add Profiler


  Record


  Deep Profile


  Profile Editor


  Connected Player


  Clear


  Load


  Save


  Frame Selection


  Timeline View


  Breakdown View Controls


  Breakdown View


  The CPU Usage Area


  The GPU Usage Area


  The Rendering Area


  The Memory Area


  The Audio Area


  The Physics 3D and Physics 2D Areas


  The Network Messages and Network Operations Areas


  The Video Area


  The UI and UI Details Areas


  The Global Illumination Area


  Best approaches to performance analysis


  Verifying script presence


  Verifying script count


  Verifying the order of events


  Minimizing ongoing code changes


  Minimizing internal distractions


  Minimizing external distractions


  Targeted profiling of code segments


  Profiler script control


  Custom CPU Profiling


  Final thoughts on Profiling and Analysis


  Understanding the Profiler


  Reducing noise


  Focusing on the issue


  Summary

 

2. Scripting Strategies

In this chapter, we will explore ways of applying performance enhancements to the following areas:

  • Accessing Components
  • Component callbacks (Update(), Awake(), and so on)
  • Coroutines
  • GameObject and Transform usage
  • Interobject communication
  • Mathematical calculations
  • Deserialization such as Scene and Prefab loading

  Obtain Components using the fastest method
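The three GetComponent() overloads this heading refers to can be sketched as follows (a minimal illustration; the performance ranking is the section's point, not measured numbers):

private Rigidbody rbGeneric;
private Rigidbody rbTyped;
private Rigidbody rbString;

void Awake() {
    // Three ways to fetch the same Component. The generic version is the
    // one to prefer; the string-based version is by far the slowest
    // because it must resolve the type by name.
    rbGeneric = GetComponent<Rigidbody>();                  // fastest
    rbTyped   = (Rigidbody)GetComponent(typeof(Rigidbody)); // slower
    rbString  = (Rigidbody)GetComponent("Rigidbody");       // slowest - avoid
}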

  Remove empty callback definitions

  Cache Component references

private Rigidbody _rigidbody;

void Awake() {
    // Cache the Component reference once, instead of calling
    // GetComponent<Rigidbody>() every frame
    _rigidbody = GetComponent<Rigidbody>();
}

void Update() {
    // use the cached _rigidbody reference here, for example:
    // _rigidbody.AddForce(Vector3.up);
}

  Share calculation output

  Update, Coroutines, and InvokeRepeating

void Update() {
    ProcessAI();
}

private float _aiProcessDelay = 0.2f;
private float _timer = 0.0f;

void Update() {
    _timer += Time.deltaTime;
    if (_timer > _aiProcessDelay) {
        ProcessAI();
        _timer -= _aiProcessDelay;
    }
}

void Start() {
    StartCoroutine(ProcessAICoroutine());
}

IEnumerator ProcessAICoroutine() {
    while (true) {
        ProcessAI();
        yield return new WaitForSeconds(_aiProcessDelay);
    }
}

void Start() {
    InvokeRepeating("ProcessAI", 0f, _aiProcessDelay);
}

  Faster GameObject null reference checks

        if (!System.Object.ReferenceEquals(gameObject, null)) {
            // do something
        }

  Avoid retrieving string properties from GameObjects
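A short sketch of the idea: properties such as tag and name return a new managed string copied across the Native-Managed Bridge every time they are read, whereas CompareTag() performs the check natively without the allocation:

void OnTriggerEnter(Collider other) {
    // Allocates a new string each call just to compare it:
    if (other.gameObject.tag == "Player") { /* ... */ }

    // No allocation; the comparison happens on the Native side:
    if (other.gameObject.CompareTag("Player")) { /* ... */ }
}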

  Use appropriate data structures

  Avoid re-parenting Transforms at runtime

GameObject.Instantiate(Object original, Transform parent);

transform.hierarchyCapacity;

  Consider caching Transform changes
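A minimal sketch of the idea (the movement helpers are hypothetical): accumulate all changes in a plain Vector3 during the frame and write to the Transform once, since every write to transform.position can trigger internal change notifications:

private Vector3 _cachedPosition;

void Awake() {
    _cachedPosition = transform.position;
}

void Update() {
    // accumulate changes locally (hypothetical helpers)...
    _cachedPosition += GetMovementThisFrame();
    _cachedPosition += GetKnockbackThisFrame();

    // ...then commit with a single Transform write per frame
    transform.position = _cachedPosition;
}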

  Avoid Find() and SendMessage() at runtime

    Assigning references to pre-existing objects

    Static Classes

using UnityEngine;
public class EnemyCreatorComponent : MonoBehaviour {
    [SerializeField] private int _numEnemies;
    [SerializeField] private GameObject _enemyPrefab;
    [SerializeField] private EnemyManagerComponent _enemyManager;
    void Start() {
        for (int i = 0; i < _numEnemies; ++i) {
            CreateEnemy();
        }
    }
    public void CreateEnemy() {
        _enemyManager.CreateEnemy(_enemyPrefab);
    }
}

    Singleton Components

using UnityEngine;
public class SingletonComponent<T> : MonoBehaviour where T : SingletonComponent<T> {
    private static T __Instance;
    protected static SingletonComponent<T> _Instance {
        get {
            if (!__Instance) {
                T[] managers = GameObject.FindObjectsOfType(typeof(T)) as T[];
                if (managers != null) {
                    if (managers.Length == 1) {
                        __Instance = managers[0];
                        return __Instance;
                    } else if (managers.Length > 1) {
                        Debug.LogError("You have more than one " +
                        typeof(T).Name +
                        " in the Scene. You only need " +
                        "one - it's a singleton!");
                        for (int i = 0; i < managers.Length; ++i) {
                            T manager = managers[i];
                            Destroy(manager.gameObject);
                        }
                    }
                }

                GameObject go = new GameObject(typeof(T).Name, typeof(T));
                __Instance = go.GetComponent<T>();
                DontDestroyOnLoad(__Instance.gameObject);
            }
            return __Instance;
        }
        set {
            __Instance = value as T;
        }
    }
}

public class EnemyManagerSingletonComponent : SingletonComponent<EnemyManagerSingletonComponent> {
    public static EnemyManagerSingletonComponent Instance {
        get { return ((EnemyManagerSingletonComponent)_Instance); }
        set { _Instance = value; }
    }

    public void CreateEnemy(GameObject prefab) {
        // same as StaticEnemyManager
    }

    public void KillAll() {
        // same as StaticEnemyManager
    }
}

    A global Messaging System

public class Message {
    public string type;
    public Message() { type = this.GetType().Name; }
}

Moving on to our MessagingSystem class, we should define its features by the kind of requirements we need it to fulfill:

  • It should be globally accessible
  • Any object (MonoBehaviour or not) should be able to register/deregister as a listener to receive specific message types (that is, the Observer design pattern)
  • Registering objects should provide a method to call when the given message is broadcasted from elsewhere
  • The system should send the message to all listeners within a reasonable time frame, but not choke on too many requests at once

      A globally accessible object

      Registration

public delegate bool MessageHandlerDelegate(Message message);

      Message processing

      Implementing the Messaging System

using System.Collections.Generic;
using UnityEngine;

public class MessagingSystem : SingletonComponent<MessagingSystem> {

    public static MessagingSystem Instance {
        get { return ((MessagingSystem)_Instance); }
        set { _Instance = value; }
    }

    private Dictionary<string, List<MessageHandlerDelegate>> _listenerDict = new Dictionary<string, List<MessageHandlerDelegate>>();

    public bool AttachListener(System.Type type, MessageHandlerDelegate handler) {
        if (type == null) {
            Debug.Log("MessagingSystem: AttachListener failed due to having no " +
            "message type specified");
            return false;
        }

        string msgType = type.Name;

        if (!_listenerDict.ContainsKey(msgType)) {
            _listenerDict.Add(msgType, new List<MessageHandlerDelegate>());
        }

        List<MessageHandlerDelegate> listenerList = _listenerDict[msgType];
        if (listenerList.Contains(handler)) {
            return false; // listener already in list
        }

        listenerList.Add(handler);

        return true;
    }
}

      Message queuing and processing

private Queue<Message> _messageQueue = new Queue<Message>();

public bool QueueMessage(Message msg) {
    if (!_listenerDict.ContainsKey(msg.type)) {
        return false;
    }
    _messageQueue.Enqueue(msg);
    return true;
}

private const float _maxQueueProcessingTime = 16.667f; // per-frame budget in milliseconds (~1 frame at 60 FPS)
private System.Diagnostics.Stopwatch timer = new System.Diagnostics.Stopwatch();

void Update() {
    timer.Reset(); // restart the budget each frame; otherwise Elapsed accumulates forever
    timer.Start();
    while (_messageQueue.Count > 0) {
        if (_maxQueueProcessingTime > 0.0f) {
            if (timer.Elapsed.TotalMilliseconds > _maxQueueProcessingTime) {
                timer.Stop();
                return;
            }
        }

        Message msg = _messageQueue.Dequeue();
        if (!TriggerMessage(msg)) {
            Debug.Log("Error when processing message: " + msg.type);
        }
    }
    timer.Stop();
}

public bool TriggerMessage(Message msg) {
    string msgType = msg.type;

    if (!_listenerDict.ContainsKey(msgType)) {
        Debug.Log("MessagingSystem: Message \"" + msgType + "\" has no listeners!");
        return false;
    }

    List<MessageHandlerDelegate> listenerList = _listenerDict[msgType];
    for (int i = 0; i < listenerList.Count; ++i) {
        if (listenerList[i](msg)) {
            return true; // a delegate returning true consumes the message, stopping further processing
        }
    }

    return true;
}

      Implementing custom messages

public class CreateEnemyMessage : Message { }
public class EnemyCreatedMessage : Message {
    public readonly GameObject enemyObject;
    public readonly string enemyName;
    public EnemyCreatedMessage(GameObject enemyObject, string enemyName) {
        this.enemyObject = enemyObject;
        this.enemyName = enemyName;
    }
}

      Message sending

public class EnemyCreatorComponent : MonoBehaviour {
    void Update() {
        if (Input.GetKeyDown(KeyCode.Space)) {
            MessagingSystem.Instance.QueueMessage(new CreateEnemyMessage());
        }
    }
}

      Message registration

public class EnemyManagerWithMessagesComponent : MonoBehaviour {

    private List<GameObject> _enemies = new List<GameObject>();

    [SerializeField] private GameObject _enemyPrefab;

    void Start() {
        MessagingSystem.Instance.AttachListener(typeof(CreateEnemyMessage), this.HandleCreateEnemy);
    }

    bool HandleCreateEnemy(Message msg) {
        CreateEnemyMessage castMsg = msg as CreateEnemyMessage;
        string[] names = { "Tom", "Dick", "Harry" };
        GameObject enemy = GameObject.Instantiate(_enemyPrefab, 5.0f * Random.insideUnitSphere, Quaternion.identity);
        string enemyName = names[Random.Range(0, names.Length)];
        enemy.gameObject.name = enemyName;
        _enemies.Add(enemy);
        MessagingSystem.Instance.QueueMessage(new EnemyCreatedMessage(enemy, enemyName));
        return true;
    }
}

public class EnemyCreatedListenerComponent : MonoBehaviour {

    void Start() {
        MessagingSystem.Instance.AttachListener(typeof(EnemyCreatedMessage), this.HandleEnemyCreated);
    }

    bool HandleEnemyCreated(Message msg) {
        EnemyCreatedMessage castMsg = msg as EnemyCreatedMessage;
        Debug.Log(string.Format("A new enemy was created! {0}", castMsg.enemyName));
        return true;
    }
}

      Message cleanup

public bool DetachListener(System.Type type, MessageHandlerDelegate handler) {
    if (type == null) {
        Debug.Log("MessagingSystem: DetachListener failed due to having no message type specified");
        return false;
    }

    string msgType = type.Name;

    if (!_listenerDict.ContainsKey(msgType)) {
        return false;
    }

    List<MessageHandlerDelegate> listenerList = _listenerDict[msgType];
    if (!listenerList.Contains(handler)) {
        return false;
    }

    listenerList.Remove(handler);
    return true;
}

void OnDestroy() {
    if (MessagingSystem.IsAlive) {
        // detach the same type/handler pair that was attached in Start()
        MessagingSystem.Instance.DetachListener(typeof(CreateEnemyMessage), this.HandleCreateEnemy);
    }
}

      Wrapping up the Messaging System

   Disable unused scripts and objects

    Disabling objects by visibility

// Disable only this Component when its Renderer goes offscreen
void OnBecameVisible() { enabled = true; }
void OnBecameInvisible() { enabled = false; }

// Alternatively, deactivate the whole GameObject. Note that these callbacks
// must then come from a Renderer on a separate, still-active object, since a
// deactivated GameObject no longer receives OnBecameVisible()
void OnBecameVisible() { gameObject.SetActive(true); }
void OnBecameInvisible() { gameObject.SetActive(false); }

    Disabling objects by distance

[SerializeField] GameObject _target;
[SerializeField] float _maxDistance;
[SerializeField] int _coroutineFrameDelay;

void Start() {
    StartCoroutine(DisableAtADistance());
}

IEnumerator DisableAtADistance() {
    while (true) {
        float distSqrd = (transform.position - _target.transform.position).sqrMagnitude;

        if (distSqrd < _maxDistance * _maxDistance) {
            enabled = true;
        } else {
            enabled = false;
        }

        for(int i = 0; i < _coroutineFrameDelay; ++i) {
            yield return new WaitForEndOfFrame();
        }
    }
}

  Consider using distance-squared over distance
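The technique in a nutshell: compare squared magnitudes against a squared threshold, avoiding the square root hidden inside Vector3.Distance() and magnitude:

// Equivalent range checks; the second avoids a square root.
bool inRangeSlow = Vector3.Distance(transform.position, target.position) < range;

float sqrDist = (transform.position - target.position).sqrMagnitude;
bool inRangeFast = sqrDist < range * range;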

  Minimize Deserialization behavior

Unity's Serialization system is mainly used for Scenes, Prefabs, ScriptableObjects, and various Asset types (which tend to derive from ScriptableObject).

When one of these object types is saved to disk, it is converted into a text file using the Yet Another Markup Language (YAML) format, which can be deserialized back into the original object type at a later time.

All GameObjects and their properties get serialized when a Prefab or Scene is serialized, including private and protected fields, all of their Components, as well as their child GameObjects and those children's Components, and so on.

When our application is built, this serialized data is bundled together in large binary data files internally called Serialized Files in Unity.

Reading and deserializing this data from disk at runtime is an incredibly slow process (relatively speaking) and so all deserialization activity comes with a significant performance cost. 

This kind of deserialization takes place any time we call Resources.Load() for a file path found under a folder named Resources.
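For example (the asset path here is hypothetical), a Prefab saved under any folder named Resources can be loaded, and hence deserialized, like so:

// First call deserializes Assets/Resources/Enemies/Orc.prefab from disk;
// later calls for the same path return the cached object much faster.
GameObject orcPrefab = Resources.Load<GameObject>("Enemies/Orc");
if (orcPrefab != null) {
    GameObject orc = Object.Instantiate(orcPrefab);
}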

Once the data has been loaded from disk into memory, then reloading the same reference later is much faster, but disk activity is always required the first time it is accessed. 

Naturally, the larger the data set we need to deserialize, the longer this process takes.

Since every Component of a Prefab gets serialized, then the deeper the hierarchy is, the more data needs to be deserialized.

This can be a problem for Prefabs with very deep hierarchies, Prefabs with many empty GameObjects (since every GameObject always contains at least a Transform Component), and particularly problematic for User Interface(UI) Prefabs, since they tend to house many more Components than a typical Prefab. 

Loading large serialized data sets like these can cause a significant CPU spike the first time they are loaded, which tends to increase loading time if they are needed immediately at the start of the Scene.

More importantly, they can cause frame drops if they are loaded at runtime.

There are a couple of approaches we can use to minimize the costs of deserialization.

    Reduce serialized object size

    Load serialized objects asynchronously
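A sketch of this approach using Resources.LoadAsync() inside a Coroutine (the asset path is hypothetical); the load is processed in the background instead of blocking a single frame:

IEnumerator LoadEnemyPrefabAsync() {
    ResourceRequest request = Resources.LoadAsync<GameObject>("Enemies/Orc");
    yield return request; // resumes once loading has finished

    GameObject prefab = (GameObject)request.asset;
    // prefab is now ready to be instantiated without a deserialization spike
}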

    Keep previously loaded serialized objects in memory

    Move common data into ScriptableObjects

  Load scenes additively and asynchronously
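A sketch of additive asynchronous loading with the SceneManager API (the Scene name is hypothetical and must be listed in Build Settings):

using UnityEngine.SceneManagement;

IEnumerator LoadDistrictAsync() {
    AsyncOperation op = SceneManager.LoadSceneAsync("CityDistrict", LoadSceneMode.Additive);
    while (!op.isDone) {
        yield return null; // keep rendering frames while the Scene streams in
    }
}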

  Create a custom Update() layer

Earlier in this chapter, in the "Update, Coroutines and InvokeRepeating" section, we discussed the relative pros and cons of using these Unity Engine features as a means of avoiding excessive CPU workload during most of our frames.

Regardless of which of these approaches we might adopt, there is an additional risk when lots of MonoBehaviours are written to periodically call some function: too many methods may trigger in the same frame simultaneously.

Imagine thousands of MonoBehaviours that initialized together at the start of a Scene, each starting a Coroutine at the same time that will process their AI tasks every 500 milliseconds.

It is highly likely that they would all trigger within the same frame, causing a huge spike in CPU usage for a moment, which settles down temporarily and then spikes again a few moments later when the next round of AI processing is due.

Ideally, we would want to spread these invocations out over time. 

The following are possible solutions to this problem:

  • Generate a random time to wait each time the timer expires or the Coroutine triggers
  • Spread out Coroutine initialization so that only a handful of them are started in each frame
  • Pass the responsibility of calling updates to some God Class that places a limit on the number of invocations that occur in each frame

The first two options are appealing since they’re relatively simple and we know that Coroutines can potentially save us a lot of unnecessary overhead. 

However, as we discussed, there are many dangers and unexpected side effects associated with such drastic design changes. 

A potentially better approach to optimize updates is to not use Update() at all, or more accurately, to use it only once.

When Unity calls Update(), and in fact, any of its callbacks, it crosses the aforementioned Native-Managed Bridge, which can be a costly task.

In other words, the processing cost of executing 1,000 separate Update() callbacks will be more expensive than executing one Update() callback, which calls into 1,000 regular functions.

As we witnessed in the "Remove empty callback definitions" section, calling Update() thousands of times is not a trivial amount of work for the CPU to undertake, primarily because of the Bridge.

We can, therefore, minimize how often Unity needs to cross the Bridge by having a God Class MonoBehaviour use its own Update() callback to call a custom update-style system used by our custom Components. 

In fact, many Unity developers prefer implementing this design right from the start of their projects, as it gives them finer control over when and how updates propagate throughout the system; this can be used for things such as menu pausing, cool time manipulation effects, or prioritizing important tasks and/or suspending low priority tasks if we detect that we’re about to reach our CPU budget for the current frame. 

All objects wanting to integrate with such a system must have a common entry point.

We can achieve this through an Interface Class with the interface keyword.

Interface Classes essentially set up a contract whereby any class that implements the Interface Class must provide a specific series of methods.

In other words, if we know the object implements an Interface Class, then we can be certain about what methods are available.

In C#, classes can only derive from a single base class, but they can implement any number of Interface Classes (this avoids the deadly diamond of death problem that C++ programmers will be familiar with). 

The following Interface Class definition will suffice, which only requires the implementing class to define a single method called OnUpdate():


public interface IUpdateable {
    void OnUpdate(float dt);
}


It’s common practice to start an Interface Class definition with a capital ‘I’ to make it clear that it is an Interface Class we’re dealing with.

The beauty of Interface Classes is that they improve the decoupling of our codebase, allowing huge subsystems to be replaced; as long as the Interface Class is adhered to, we will have greater confidence that it will continue to function as intended. 

Next, we'll define a custom MonoBehaviour type which implements this Interface Class:


public class UpdateableComponent : MonoBehaviour, IUpdateable {
    public virtual void OnUpdate(float dt) {}
}


Note that we're naming the method OnUpdate() rather than Update().

We're defining a custom version of the same concept, but we want to avoid name collisions with the built-in Update() callback. 

The OnUpdate() method of the UpdateableComponent class receives the current delta time (dt) as an argument, which spares us from a bunch of unnecessary Time.deltaTime calls, which are commonly used in Update() callbacks.

We've also made the function virtual to allow derived classes to customize it. 

This function will never be called as it's currently written.

Unity automatically grabs and invokes methods defined with the Update() name, but has no concept of our OnUpdate() function, so we will need to implement something that will call this method when the time is appropriate.

For example, some kind of GameLogic God Class could be used for this purpose. 

During the initialization of this Component, we should do something to notify our GameLogic object of both its existence and its destruction so that it knows when to start and stop calling its OnUpdate() function. 

In the following example, we will assume that our GameLogic class is a SingletonComponent, as defined earlier in the "Singleton Components" section, and has appropriate static functions defined for registration and deregistration.

Bear in mind that it could just as easily use the aforementioned MessagingSystem to notify the GameLogic of its creation/destruction.

For MonoBehaviours to hook into this system, the most appropriate place is within their Start() and OnDestroy() callbacks:


void Start() {
    GameLogic.Instance.RegisterUpdateableObject(this);
}

void OnDestroy() {
    if (GameLogic.IsAlive) {
        GameLogic.Instance.DeregisterUpdateableObject(this);
    }
}


It is best to use the Start() method for the task of registration, since using Start() means that we can be certain all other pre-existing Components will have at least had their Awake() methods called prior to this moment.

This way, any critical initialization work will have already been done on the object before we start invoking updates on it. 

Note that because we're using Start() in a MonoBehaviour base class, if we define a Start() method in a derived class, it will effectively override the base class definition, and Unity will grab the derived Start() method as a callback instead.

It would, therefore, be wise to implement a virtual Initialize() method so that derived classes can override it to customize initialization behavior without interfering with the base class's task of notifying the GameLogic object of our Component's existence. 

The following code provides an example of how we might implement a virtual Initialize() method. 

 

void Start() {
    GameLogic.Instance.RegisterUpdateableObject(this);
    Initialize();
}

protected virtual void Initialize() {
    // derived classes should override this method for initialization code, and NOT reimplement Start()
}


Finally, we will need to implement the GameLogic class.

The implementation is effectively the same whether it is a SingletonComponent or a MonoBehaviour, and whether or not it uses the MessagingSystem.

Either way, our UpdateableComponent class must register and deregister as IUpdateable objects, and the GameLogic class must use its own Update() callback to iterate through every registered object and call their OnUpdate() function.

Here is the definition for our GameLogic class:

 

using System.Collections.Generic;
using UnityEngine;

public class GameLogicSingletonComponent : SingletonComponent<GameLogicSingletonComponent> {
    public static GameLogicSingletonComponent Instance {
        get { return ((GameLogicSingletonComponent)_Instance); }
        set { _Instance = value; }
    }

    List<IUpdateable> _updateableObjects = new List<IUpdateable>();


    public void RegisterUpdateableObject(IUpdateable obj) {
        if (!_updateableObjects.Contains(obj)) {
            _updateableObjects.Add(obj);
        }
    }

    public void DeregisterUpdateableObject(IUpdateable obj) {
        if (_updateableObjects.Contains(obj)) {
            _updateableObjects.Remove(obj);
        }
    }

    void Update() {
        float dt = Time.deltaTime;
        for (int i = 0; i < _updateableObjects.Count; ++i) {
            _updateableObjects[i].OnUpdate(dt);
        }
    }
}


If we make sure that all of our custom Components inherit from the UpdateableComponent class, then we've effectively replaced "N" invocations of the Update() callback with just one Update() callback, plus "N" virtual function calls.

This can save us a large amount of performance overhead because even though we're calling virtual functions (which cost slightly more than non-virtual function calls, since the call must be redirected to the correct implementation), we're still keeping the overwhelming majority of update behavior inside our Managed code and avoiding the Native-Managed Bridge as much as possible.

This class can even be expanded to provide priority systems, to skip low-priority tasks if it detects that the current frame has taken too long, and many other possibilities.

Depending on how deep you already are into your current project, such changes can be incredibly daunting, time-consuming, and likely to introduce a lot of bugs as subsystems are updated to make use of a completely different set of dependencies.

However, the benefits can outweigh the risks if time is on your side.

It would be wise to do some testing on a group of objects in a Scene designed similarly to your current Scene files to verify that the benefits outweigh the costs.

  Summary 

3. The Benefits of Batching

In 3D graphics and games, batching is a very general term describing the process of grouping a large number of wayward pieces of data together and processing them as a single, large block of data.

This situation is ideal for CPUs, and particularly GPUs, which can handle simultaneous processing of multiple tasks with their multiple cores.

Having a single core switching back and forth between different locations in memory takes time, so the less this needs to be done, the better.

In some cases, the act of batching refers to large sets of meshes, vertices, edges, UV coordinates, and other different data types that are used to represent a 3D object.

However, the term could just as easily refer to the act of batching audio files, sprites, texture files, and other large datasets.

So, just to clear up any confusion, when the topic of batching is mentioned in Unity, it is usually referring to the two primary mechanisms it offers for batching mesh data: Dynamic Batching and Static Batching.

These methods are essentially two different forms of geometry merging, where we combine mesh data of multiple objects together and render them all in a single instruction, as opposed to preparing and drawing each one separately. 

The process of batching together multiple meshes into a single mesh is possible because there is no reason a mesh object must fill a contiguous volume of 3D space.

The Rendering Pipeline is perfectly happy with accepting a collection of vertices that are not attached together with edges, and so we can take multiple separate meshes that might have resulted in multiple render instructions and combine them together into a single mesh, thus rendering it out using a single instruction.
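The same merging idea can be performed manually with Mesh.CombineMeshes(); this sketch (assuming all child meshes share a single Material) collapses every child mesh into one mesh that can be drawn with a single instruction:

MeshFilter[] filters = GetComponentsInChildren<MeshFilter>();
CombineInstance[] combine = new CombineInstance[filters.Length];
for (int i = 0; i < filters.Length; ++i) {
    combine[i].mesh = filters[i].sharedMesh;
    combine[i].transform = filters[i].transform.localToWorldMatrix;
}

Mesh merged = new Mesh();
merged.CombineMeshes(combine); // assumes one shared Material across all meshes
GetComponent<MeshFilter>().sharedMesh = merged;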

There has been a lot of confusion over the years surrounding the conditions under which the Dynamic Batching and Static Batching systems activate and where we might even see a performance improvement.

After all, in some cases, batching can actually degrade performance if it is not used wisely.

A proper understanding of these systems will give us the knowledge we need to improve the graphics performance of our application in significant ways.

This chapter intends to dispel much of the misinformation floating around about these systems.

We will observe, via explanation, exploration, and examples, just how these two batching methods operate.

This will enable us to make informed decisions, making the best use of them to improve our application's performance.

We will cover the following topics in this chapter:

  • A brief introduction to the Rendering Pipeline and the concept of Draw Calls 
  • How Unity's Materials and Shaders work together to render our objects 
  • Using the Frame Debugger to visualize rendering behavior
  • How Dynamic Batching works, and how to optimize it
  • How Static Batching works, and how to optimize it

  Draw Calls

Before we discuss Dynamic Batching and Static Batching independently, let's first understand the problems that they are both trying to solve within the Rendering Pipeline.

We will try to keep fairly light on the technicalities as we will explore this topic in greater detail in Chapter 6, Dynamic Graphics.

The primary goal of these batching methods is to reduce the number of Draw Calls required to render all objects in the current view.

At its most basic form, a Draw Call is a request sent from the CPU to the GPU asking it to draw an object.

Draw Call is the common industry vernacular for this process, although they are sometimes referred to as SetPass Calls in Unity, since some low-level methods are named as such.

Think of it as configuring options before initiating the current rendering pass.

We will refer to them as Draw Calls throughout the remainder of this book.

Before a Draw Call can be requested, several tasks need to be completed.

Firstly, mesh and texture data must be pushed from the CPU memory (RAM) into GPU memory (VRAM), which typically takes place during initialization of the Scene, but only for textures and meshes the Scene file knows about.

If we dynamically instantiate objects at runtime using texture and mesh data that hasn't appeared in the Scene yet, then they must be loaded at the time they are instantiated.

The Scene cannot know ahead of time which Prefabs we're planning to instantiate at runtime, as many of them are hidden behind conditional statements and much of our application's behavior depends upon user input.

Next, the CPU must prepare the GPU by configuring the options and rendering features that are needed to process the object that is the target of the Draw Call.

 

These communication tasks between the CPU and GPU take place through the underlying Graphics API, which could be DirectX, OpenGL, OpenGLES, Metal, WebGL, or Vulkan, depending on the platform we're targeting and certain graphical settings.

These API calls go through a library, called a driver, which maintains a long series of complex and interrelated settings, state variables, and datasets that can be configured and executed from our application (although drivers are designed to service multiple applications simultaneously, as well as render calls coming from multiple threads).

The available features change enormously based on the graphics card we're using and the version of the Graphics API we're targeting; more advanced graphics cards support more advanced features, which would need to be supported by newer versions of the API, so updated drivers would be needed to enable them.

The sheer number of settings, supported features, and compatibility levels between one version and another that have been created over the years (particularly for the older APIs such as DirectX and OpenGL) can be nothing short of mind-boggling.

Thankfully, at a certain level of abstraction, all of these APIs tend to operate in a similar fashion; hence Unity is able to support many different Graphics APIs through a common interface.

This utterly massive array of settings that must be configured to prepare the Rendering Pipeline just prior to rendering an object is often condensed into a single term known as the Render State.

Until these Render State options are changed, the GPU will maintain the same Render State for all incoming objects and render them in a similar fashion. 

Changing the Render State can be a time-consuming process.

So, for example, if we were to set the Render State to use a blue texture file and then ask it to render one gigantic mesh, then it would be rendered very rapidly with the whole mesh appearing blue.

We could then render nine more completely different meshes, and they would all be rendered blue, since we haven't changed which texture is being used.

If, however, we wanted to render 10 meshes using 10 different textures, then this would take longer.

This is because we will need to prepare the Render State with the new texture just prior to sending the Draw Call instruction for each mesh.

The texture being used to render the current object is effectively a global variable in the Graphics API, and changing a global variable within a parallel system is much easier said than done.

In a massively parallel system such as a GPU, we must effectively wait until all of the current jobs have reached the same synchronization point (in other words, the fastest cores need to stop and wait for the slowest ones to catch up, wasting processing time that they could be using on other tasks) before we can make a Render State change, at which point we will need to spin up all of the parallel jobs again.

This can waste a lot of time, so the less we need to ask the Render State to change, the faster the Graphics API will be able to process our requests.

Things that can trigger Render State synchronization include--but are not limited to--pushing a new texture to the GPU, and changing the Shader, lighting information, shadows, transparency, or pretty much any other graphical setting we can think of.

Once the Render State is configured, the CPU must decide what mesh to draw, what textures and Shader it should use, and where to draw the object based on its position, rotation, and scale (all represented within a 4x4 matrix known as a transform, which is where the Transform Component gets its name from) and then send an instruction to the GPU to draw it.

In order to keep the communication between CPU and GPU very dynamic, new instructions are pushed into a queue known as the Command Buffer.

This queue contains instructions that the CPU has created and that the GPU pulls from each time it finishes the preceding command.

The trick to how batching improves the performance of this process is that a new Draw Call does not necessarily mean that a new Render State must be configured.

If two objects share the exact same Render State information, then the GPU can immediately begin rendering the new object since the same Render State is maintained after the last object is finished.

This eliminates the time wasted due to a Render State synchronization.

It also serves to reduce the number of instructions that need to be pushed into the Command Buffer, reducing the workload on both the CPU and GPU.
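To make the batching condition concrete, here is a minimal sketch (the class and field names are ours, not from the book) of the key practical rule: objects that share the exact same Material keep the same Render State, so their Draw Calls can be batched.

```csharp
using UnityEngine;

// Hypothetical example: assigning one shared Material to many renderers
// keeps their Render State identical, letting Unity batch their Draw Calls.
public class SharedMaterialExample : MonoBehaviour
{
    [SerializeField] private Material sharedMaterial; // one Material asset for all objects
    [SerializeField] private Renderer[] renderers;    // the renderers we want batched

    private void Start()
    {
        foreach (var r in renderers)
        {
            // Use sharedMaterial, not material; accessing .material creates a
            // per-renderer Material copy, which changes the Render State and
            // breaks batching for that object.
            r.sharedMaterial = sharedMaterial;
        }
    }
}
```

This only illustrates the Render State aspect; the Dynamic Batching and Static Batching sections below cover the other conditions an object must meet to be batched.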
  Materials and Shaders


  The Frame Debugger

  Dynamic Batching


    Vertex attributes


    Mesh scaling


    Dynamic Batching summary


  Static Batching


  The Static flag

 

  Memory requirements


  Material references


  Static Batching caveats


    Edit Mode debugging of Static Batching


    Instantiating static meshes at runtime


  Static Batching summary


  Summary

 

4. Kickstart Your Art

  Audio

 

    Importing audio files


    Loading audio files


    Encoding formats and quality levels


    Audio performance enhancements


      Minimize active Audio Source count


      Enable Force to Mono for 3D sounds


      Resample to lower frequencies


      Consider all compression formats


      Beware of streaming


      Apply Filter Effects through Mixer Groups to reduce duplication


      Use remote content streaming responsibly


      Consider Audio Module files for background music


  Texture files

The terms texture and sprite often get confused in game development, so it's worth making the distinction--a texture is simply an image file, a big list of color data telling the interpreting program what color each pixel of the image should be, whereas a sprite is the 2D equivalent of a mesh, often just a single quad (a pair of triangles combined to make a rectangular mesh) that renders flat against the current Camera.

There are also things called Sprite Sheets, which are large collections of individual images contained within a larger texture file, commonly used to contain the animations of a 2D character.

These files can be split apart by tools, such as Unity's Sprite Editor, to form individual textures for the character's animated frames.

Both meshes and sprites use textures to render an image onto their surfaces.

Texture image files are typically generated in tools such as Adobe Photoshop or Gimp and then imported into our project in much the same way as audio files.

At runtime, these files are loaded into memory, pushed to the GPU's VRAM, and rendered by a Shader over the target sprite or mesh during a given Draw Call.
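As a hedged sketch of that runtime flow (the asset path "Textures/HeroDiffuse" is a hypothetical example), a texture can be loaded into CPU memory from a Resources folder and handed to a Renderer; Unity uploads it to VRAM when a Draw Call first needs it:

```csharp
using UnityEngine;

// Hypothetical example: load a texture at runtime and apply it to this
// object's Material so a Shader can render it during a Draw Call.
public class RuntimeTextureExample : MonoBehaviour
{
    private void Start()
    {
        // Loads the image data into CPU memory (RAM).
        Texture2D tex = Resources.Load<Texture2D>("Textures/HeroDiffuse");
        if (tex != null)
        {
            // Assigning via .material creates a per-object Material instance,
            // which has the Draw Call batching implications from Chapter 3.
            GetComponent<Renderer>().material.mainTexture = tex;
        }
    }
}
```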

    Texture compression formats

 


    Texture performance enhancements


      Reduce texture file size


      Use Mip Maps wisely


      Manage resolution downscaling externally


      Adjust Anisotropic Filtering levels


      Consider Atlasing


      Adjust compression rates for non-square textures


      Sparse Textures


      Procedural Materials


      Asynchronous Texture Uploading


  Mesh and animation files


    Reduce polygon count


    Tweak Mesh Compression


    Use Read-Write Enabled appropriately


    Consider baked animations


    Combine meshes


  Asset Bundles and Resources


  Summary

 

5. Faster Physics

In this chapter, we will cover the following areas:

Understanding how Unity's Physics Engine works:

  • Timesteps and FixedUpdates
  • Collider types
  • Collisions
  • Raycasting
  • Rigidbody active states

Physics performance optimizations:

  • How to structure Scenes for optimal physics behavior
  • Using the most appropriate types of Collider
  • Optimizing the Collision Matrix
  • Improving physics consistency and avoiding error-prone behavior
  • Ragdolls and other Joint-based objects

  Physics Engine internals

    Physics and time

      Maximum Allowed Timestep

It is important to note that if a lot of time has passed since the last Fixed Update (for example, the game froze momentarily), then Fixed Updates will continue to be calculated within the same Fixed Update loop until the Physics Engine has caught up with the current time.

For example, if the previous frame took 100 ms to render (say, a sudden CPU spike caused the main thread to block for a long time), then the Physics Engine will need to be updated five times.

The FixedUpdate() method will, therefore, be called five times before Update() can be called again due to the default Fixed Update Timestep of 20 milliseconds.

Of course, if there is a lot of physics activity to process during these five Fixed Updates, such that it takes more than 20 milliseconds to resolve them all, then the Physics Engine will need to invoke a sixth update. 

Consequently, it's possible during moments of heavy physics activity that the Physics Engine takes more time to process a Fixed Update than the amount of time it is simulating.

For example, if it took 30 ms to process a Fixed Update simulating 20 ms of Gameplay, then it has fallen behind, requiring it to process more Timesteps to try and keep up, but this could cause it to fall behind even further, requiring it to process even more Timesteps, and so on. 

In these situations the Physics Engine is never able to escape the Fixed Update loop and allow another frame to render.

This problem is often known as the spiral of death.

However, to prevent the Physics Engine from locking up our game during these moments, there is a maximum amount of time that the Physics Engine is allowed to process each Fixed Update loop.

This threshold is called the Maximum Allowed Timestep, and if the current batch of Fixed Updates takes too long to process, then it will simply stop and forgo further processing until the next render update completes.

This design allows the Rendering Pipeline to at least render the current state and allow for user input and gameplay logic to make some decisions during rare moments where the Physics Engine has gone ballistic (pun intended). 

This setting can be accessed through Edit | Project Settings | Time | Maximum Allowed Timestep.
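The same values can also be read or changed from script. A minimal sketch (the values shown are Unity's defaults, not tuning recommendations):

```csharp
using UnityEngine;

// Hypothetical example: the Time settings discussed above, set from code
// instead of the Project Settings window.
public class TimestepSettingsExample : MonoBehaviour
{
    private void Awake()
    {
        // Fixed Update Timestep: how much gameplay time each physics
        // update simulates (default 20 ms).
        Time.fixedDeltaTime = 0.02f;

        // Maximum Allowed Timestep: the cap on how long a single batch of
        // Fixed Updates may run before rendering is allowed to proceed
        // (default one third of a second).
        Time.maximumDeltaTime = 0.3333333f;
    }
}
```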

      Physics updates and runtime changes

When the Physics Engine processes a given timestep, it must move any active Rigidbody objects (GameObjects with a Rigidbody Component), detect any new collisions, and invoke the collision callbacks on the corresponding objects. 

The Unity documentation makes an explicit note that changes to Rigidbody objects should be handled within FixedUpdate() and other physics callbacks for exactly this reason.

These methods are tightly coupled with the update frequency of the Physics Engine as opposed to other parts of the Game Loop, such as Update(). 

This means that callbacks such as FixedUpdate() and OnTriggerEnter() are safe places to make Rigidbody changes, whereas methods such as Update() and Coroutines yielding on WaitForSeconds or WaitForEndOfFrame are not.

Ignoring this advice could cause unexpected physics behavior, as multiple changes may be made to the same object before the Physics Engine is given a chance to catch and process all of them. 

It's particularly dangerous to apply forces or impulses to objects in Update() callbacks without taking into account the frequency of those calls.

For instance, applying a 10-Newton force during each Update() while the player holds down a key would produce a completely different resultant velocity on different devices than doing the same thing in FixedUpdate(), since we can't rely on the number of Update() calls being consistent.

However, doing so in a FixedUpdate() callback will be much more consistent.

Therefore, we must ensure that all physics-related behavior is handled in the appropriate callbacks or we will risk introducing some especially confusing gameplay bugs that are very hard to reproduce. 
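A minimal sketch of the correct pattern (the class name and thrust value are hypothetical): continuous forces belong in FixedUpdate(), which runs once per physics timestep, rather than in Update(), which runs once per rendered frame at a rate that varies between devices.

```csharp
using UnityEngine;

// Hypothetical example: apply a continuous force in FixedUpdate() so the
// resulting acceleration is consistent regardless of frame rate.
public class ThrustExample : MonoBehaviour
{
    [SerializeField] private float thrust = 10f; // Newtons (example value)
    private Rigidbody rb;

    private void Awake()
    {
        rb = GetComponent<Rigidbody>();
    }

    private void FixedUpdate()
    {
        // GetKey() reports the held state, so polling it here is safe;
        // one-frame events such as GetKeyDown() should be caught in Update().
        if (Input.GetKey(KeyCode.W))
        {
            // ForceMode.Force is scaled by the fixed timestep internally,
            // so the same code yields the same velocity on every device.
            rb.AddForce(transform.forward * thrust, ForceMode.Force);
        }
    }
}
```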

It logically follows that the more time we spend in any given Fixed Update iteration, the less time we have for the next gameplay and rendering pass. 

Most of the time this results in minor, unnoticeable background processing tasks, since the Physics Engine barely has any work to do, and the FixedUpdate() callbacks have a lot of time to complete their work. 

However, in some games, the Physics Engine could be performing a lot of calculations during each and every Fixed Update.

This kind of bottlenecking in physics processing time will affect our frame rate, causing it to plummet as the Physics Engine is tasked with greater and greater workloads.

Essentially, the Rendering Pipeline will try to proceed as normal, but whenever a Fixed Update arrives that takes the Physics Engine a long time to process, the Rendering Pipeline will have very little time left to generate the current display before the frame is due, causing a sudden stutter.

This is in addition to the visual effect of the Physics Engine stopping early because it hit the Maximum Allowed Timestep.

All of this together would generate a very poor user experience. 

Hence, in order to keep a smooth and consistent frame rate, we will need to free up as much time as we can for rendering by minimizing the amount of time the Physics Engine takes to process any given timestep.

This applies in both the best-case scenario (nothing moving) and worst-case scenario (everything smashing into everything else at once).

There are a number of time-related features and values we can tweak within the Physics Engine to avoid performance pitfalls such as these.

    Static Colliders and Dynamic Colliders

A Dynamic Collider is simply a GameObject that contains both a Collider Component (which could be one of several types) and a Rigidbody Component.

We can also have Colliders that do not have a Rigidbody Component attached, and these are called Static Colliders.

    Collision detection

    Collider types

    The Collision Matrix

The Collision Matrix can be accessed through Edit | Project Settings | (Physics / Physics2D) | Layer Collision Matrix. 
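The matrix can also be adjusted from script. A hedged sketch, where the layer names ("Projectiles", "Debris") are hypothetical examples:

```csharp
using UnityEngine;

// Hypothetical example: disable collision testing between two layers at
// runtime, equivalent to unchecking their cell in the Layer Collision Matrix.
public class CollisionMatrixExample : MonoBehaviour
{
    private void Awake()
    {
        int projectiles = LayerMask.NameToLayer("Projectiles");
        int debris = LayerMask.NameToLayer("Debris");

        // The Physics Engine will no longer even test these two layers
        // against each other, saving collision-detection work.
        Physics.IgnoreLayerCollision(projectiles, debris, true);
    }
}
```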
    Rigidbody active and sleeping states

Every modern Physics Engine shares a common optimization technique, whereby objects that have come to rest have their internal state changed from an active state to a sleeping state.

The threshold value that controls the sleeping state can be modified under Edit | Project Settings | Physics | Sleep Threshold.
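Individual bodies can also be queried or put to sleep from script. A minimal sketch (the class name is ours):

```csharp
using UnityEngine;

// Hypothetical example: inspect and control a Rigidbody's sleep state.
// A sleeping body is skipped by the Physics Engine until something
// (a collision, an applied force, a WakeUp() call) reactivates it.
public class SleepStateExample : MonoBehaviour
{
    private Rigidbody rb;

    private void Awake()
    {
        rb = GetComponent<Rigidbody>();
    }

    public void ForceRest()
    {
        // Put the body to sleep immediately, regardless of its velocity.
        rb.Sleep();
    }

    public bool IsResting()
    {
        // True while the Physics Engine has this body in its sleeping state.
        return rb.IsSleeping();
    }
}
```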

    Ray and object casting

Another common feature of Physics Engines is the ability to cast a ray from one point to another and generate collision information with one or more of the objects in its path.

This is known as Raycasting. It is pretty common to implement several gameplay mechanics through Raycasting, such as firing a gun.

This is typically implemented by performing Raycasts from the player to the target location and finding any viable targets in its path (even if it's just a wall).

We can also obtain a list of targets within a finite distance of a fixed point in space using a Physics.OverlapSphere() check.

This is typically used to implement area-of-effect gameplay features, such as grenade or fireball explosions.

We can even cast entire objects forward in space using Physics.SphereCast() and Physics.CapsuleCast().

These methods are often used to simulate wide laser beams, or if we simply want to see what would be in the path of a moving character.
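The three query types above can be sketched as follows; the distances and radii are arbitrary example values, and the method names on the class are ours.

```csharp
using UnityEngine;

// Hypothetical examples of the three physics queries discussed above.
public class PhysicsQueryExample : MonoBehaviour
{
    public void FireWeapon()
    {
        // Raycast: hit-scan gunfire from this object forward, up to 100 units.
        RaycastHit hit;
        if (Physics.Raycast(transform.position, transform.forward, out hit, 100f))
        {
            Debug.Log("Hit " + hit.collider.name);
        }
    }

    public void Explode()
    {
        // OverlapSphere: every Collider within 5 units of this point,
        // e.g. targets caught in a grenade blast.
        Collider[] targets = Physics.OverlapSphere(transform.position, 5f);
        foreach (var c in targets)
        {
            // Apply damage, forces, and so on.
        }
    }

    public void WideLaser()
    {
        // SphereCast: sweep a 0.5-unit-radius sphere forward 50 units,
        // e.g. a wide laser beam or a moving character's path check.
        RaycastHit hit;
        if (Physics.SphereCast(transform.position, 0.5f, transform.forward,
                               out hit, 50f))
        {
            Debug.Log("Beam blocked by " + hit.collider.name);
        }
    }
}
```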

    Debugging Physics


  Physics performance optimizations


    Scene setup


      Scaling


      Positioning


      Mass


    Use Static Colliders appropriately


    Use Trigger Volumes responsibly


    Optimize the Collision Matrix


    Prefer Discrete collision detection


    Modify the Fixed Update frequency


    Adjust the Maximum Allowed Timestep


    Minimize Raycasting and bounding-volume checks


    Avoid complex Mesh Colliders


      Use simpler primitives


      Use simpler Mesh Colliders


    Avoid complex physics Components


    Let physics objects sleep


    Modify the Solver Iteration Count


    Optimize Ragdolls


      Reduce Joints and Colliders


      Avoid inter-Ragdoll collisions


      Replace, deactivate or remove inactive Ragdolls


    Know when to use physics


  Summary

 

6. Dynamic Graphics

  The Rendering Pipeline


  The GPU Front End


  The GPU Back End


  Fill Rate


  Overdraw


  Memory Bandwidth


  Lighting and Shadowing


  Forward Rendering


  Deferred Rendering


  Vertex Lit Shading (legacy)


  Global Illumination


  Multithreaded Rendering


  Low-level rendering APIs


  Detecting performance issues


  Profiling rendering issues


  Brute-force testing


  Rendering performance enhancements


  Enable/Disable GPU Skinning


  Reduce geometric complexity


  Reduce Tessellation


  Employ GPU Instancing


  Use mesh-based Level Of Detail (LOD)


  Culling Groups


  Make use of Occlusion Culling


  Optimizing Particle Systems


  Make use of Particle System Culling


  Avoid recursive Particle System calls


  Optimizing Unity UI


  Use more Canvases


  Separate objects between static and dynamic canvases


  Disable Raycast Target for noninteractive elements


  Hide UI elements by disabling the parent Canvas Component


  Avoid Animator Components


  Explicitly define the Event Camera for World Space Canvases


  Don't use alpha to hide UI elements


  Optimizing ScrollRects


  Make sure to use a RectMask2D


  Disable Pixel Perfect for ScrollRects


  Manually stop ScrollRect motion


  Use empty UIText elements for full-screen interaction


  Check the Unity UI source code


  Check the documentation


  Shader optimization


  Consider using Shaders intended for mobile platforms


  Use small data types


  Avoid changing precision while swizzling


  Use GPU-optimized helper functions


  Disable unnecessary features


  Remove unnecessary input data


  Expose only necessary variables


  Reduce mathematical complexity


  Reduce texture sampling


  Avoid conditional statements


  Reduce data dependencies


  Surface Shaders


  Use Shader-based LOD


  Use less texture data


  Test different GPU Texture Compression formats


  Minimize texture swapping


  VRAM limits


  Preload textures with hidden GameObjects


  Avoid texture thrashing


  Lighting optimization


  Use real-time Shadows responsibly


  Use Culling Masks


  Use baked Lightmaps


  Optimizing rendering performance for mobile devices


  Avoid Alpha Testing


  Minimize Draw Calls


  Minimize Material count


  Minimize texture size


  Make textures square and power-of-two


  Use the lowest possible precision formats in Shaders


  Summary

 

7. Virtual Velocity and Augmented Acceleration

  XR Development

 

  Emulation


  User comfort


  Performance enhancements


  The kitchen sink


  Single-Pass versus Multi-Pass Stereo Rendering


  Apply anti-aliasing


  Prefer Forward Rendering


  Image effects in VR


  Backface culling


  Spatialized audio


  Avoid camera physics collisions


  Avoid Euler angles


  Exercise restraint


  Keep up to date with the latest developments


  Summary

 

8. Masterful Memory Management

  The Mono platform


  Memory Domains

 

  Garbage collection


  Memory Fragmentation


  Garbage collection at runtime


  Threaded garbage collection


  Code compilation


  IL2CPP


  Profiling memory


  Profiling memory consumption


  Profiling memory efficiency


  Memory management performance enhancements


  Garbage collection tactics


  Manual JIT compilation


  Value types and Reference types


  Pass by value and by reference


  Structs are Value types


  Arrays are Reference types


  Strings are immutable Reference types


  String concatenation


  StringBuilder


  String formatting


  Boxing


  The importance of data layout


  Arrays from the Unity API


  Using InstanceIDs for dictionary keys


  foreach loops


  Coroutines


  Closures


  The .NET library functions


  Temporary work buffers


  Object Pooling


  Prefab Pooling


  Poolable Components


  The Prefab Pooling System


  Prefab pools


  Object spawning


  Instance prespawning


  Object despawning


  Prefab pool testing


  Prefab Pooling and Scene loading


  Prefab Pooling summary


  IL2CPP optimizations


  WebGL optimizations


  The future of Unity, Mono, and IL2CPP


  The upcoming C# Job System


  Summary

 

9. Tactical Tips and Tricks

  Editor hotkey tips


  GameObjects


  Scene window


  Arrays


  Interface


  In-editor documentation


  Editor UI tips


  Script Execution Order


  Editor files


  The Inspector window


  The Project window


  The Hierarchy window


  The Scene and Game windows


  Play Mode


  Scripting tips


  General


  Attributes


  Variable attributes


  Class attributes


  Logging


  Useful links


  Custom Editor scripts and menu tips


  External tips


  Other tips


  Summary
