Object Equality, Seen Through a System Performance Optimization

Date: 2019-11-08


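One of the interfaces in our company's system gets heavy traffic and has fairly complex internal computation. While optimizing it, I planned to put the Request parameters as the key and the Response as the value into an in-process cache, to reduce server load and speed up the interface's responses. Because some of the data in the Response has to stay fresh, the cache entries get a short expiration time (say 10 s).

This raises a question: how do you reliably decide that the parameters of two requests are equal? In C#, custom types inherit two methods from Object, Equals and GetHashCode, and you can override both to implement equality comparison that fits your needs.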
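To make the caching idea concrete, here is a minimal sketch of an in-process cache with a fixed time-to-live. The TtlCache type and its GetOrAdd method are hypothetical names invented for this illustration, not the cache the original system used. The point to notice is that every lookup depends on the key's Equals and GetHashCode.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical minimal in-process cache with a fixed time-to-live.
// Lookups rely entirely on TKey's Equals and GetHashCode.
public class TtlCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)> _items
        = new ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)>();
    private readonly TimeSpan _ttl;

    public TtlCache(TimeSpan ttl)
    {
        _ttl = ttl;
    }

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        if (_items.TryGetValue(key, out var entry) && entry.ExpiresAt > DateTime.UtcNow)
        {
            return entry.Value; // cache hit that has not expired yet
        }

        var value = factory(key); // miss or expired entry: recompute
        _items[key] = (value, DateTime.UtcNow + _ttl);
        return value;
    }
}
```

With a string key this works out of the box, because String already compares by value. With a custom request type as the key, two requests carrying the same parameters only hit the same entry if the type overrides Equals and GetHashCode.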

Object.Equals(Object)

The default implementations of Equals for the different kinds of types in .NET are as follows:

Type category	Equality defined by	Comments
Class derived directly from Object	Object.Equals(Object)	Reference equality; equivalent to calling Object.ReferenceEquals.
Structure	ValueType.Equals	Value equality; either direct byte-by-byte comparison or field-by-field comparison using reflection.
Enumeration	Enum.Equals	Values must have the same enumeration type and the same underlying value.
Delegate	MulticastDelegate.Equals	Delegates must have the same type with identical invocation lists.
Interface	Object.Equals(Object)	Reference equality.

Object

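The source code shows the implementation of Equals in Object, the default that every .NET type inherits unless a type in its hierarchy overrides it. As the table above says, it is a reference comparison.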
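The following sketch illustrates that default behavior rather than reproducing the framework source. PlainRequest is a hypothetical class that does not override Equals, so two instances with identical contents still compare as unequal.

```csharp
using System;

// Hypothetical request type that does not override Equals.
public class PlainRequest
{
    public string Keyword { get; set; }
}

public static class ReferenceEqualityDemo
{
    // Two distinct objects with the same content: the inherited Equals returns false.
    public static bool SameContentIsEqual()
    {
        var a = new PlainRequest { Keyword = "cache" };
        var b = new PlainRequest { Keyword = "cache" };
        return a.Equals(b);
    }

    // Two variables pointing at the same object: the inherited Equals returns true.
    public static bool SameReferenceIsEqual()
    {
        var a = new PlainRequest { Keyword = "cache" };
        var alias = a;
        return a.Equals(alias);
    }
}
```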

ValueType

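Decompiling shows how ValueType implements Equals, the default for value types; it overrides Object.Equals.

The ValueType implementation of Equals works as follows:

If obj == null, return false
If the runtime types of this and obj differ, return false
If the value type contains only value-type fields, compare the field values byte by byte
If it contains reference-type fields, use reflection to get the field information, then call each field's Equals method to compare field by field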
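As a rough illustration of the steps above, the sketch below contrasts a struct that relies on the inherited ValueType.Equals with one that supplies its own strongly typed comparison. Both struct names are hypothetical. Implementing IEquatable<T> is one common way to avoid the boxing and reflection of the default, not something the framework requires.

```csharp
using System;

// Hypothetical struct that relies on the inherited ValueType.Equals.
public struct DefaultPoint
{
    public int X;
    public int Y;
}

// Same data, with a strongly typed Equals that needs neither boxing nor reflection.
public struct FastPoint : IEquatable<FastPoint>
{
    public int X;
    public int Y;

    public bool Equals(FastPoint other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is FastPoint other && Equals(other);

    public override int GetHashCode() => unchecked((X * 397) ^ Y);
}
```

Both structs report two instances with the same X and Y as equal. The difference is in how the comparison is carried out, which matters when it runs on a hot path.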

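Overriding Equals

Object's Equals compares by reference only, which is really identity rather than equality, much like the difference between Python's is and == operators. ValueType's Equals uses reflection and performs poorly. These defaults usually do not meet the need, so a custom Equals can follow these steps:

If obj is null, return false; Equals is an instance method, so this is never null
For a reference type, return true if this and obj refer to the same object
Call GetType to check whether this and obj have the same runtime type
Where necessary, call the base class's Equals to compare the fields of the base class (normally not Object's own Equals)
Call Equals on each field to compare field by field

Following these steps, Equals for a custom type can be implemented like this: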

using System.Collections.Generic;
using Newtonsoft.Json;

public class Entity
{
    public Entity(string tag, int count, IDictionary<string, string> descriptions)
    {
        this.Tag = tag;
        this.Count = count;
        this.Descriptions = descriptions;
    }

    public string Tag { private set; get; }

    public int Count { private set; get; }

    public IDictionary<string, string> Descriptions { private set; get; }
    /// <summary>
    /// Compares equality field by field
    /// </summary>
    public override bool Equals(object obj)
    {
        if (obj == null)
        {
            return false;
        }

        if (object.ReferenceEquals(this, obj))
        {
            return true;
        }

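        // Check that this and obj have the same runtime type.
        // A check with the is keyword would also return true when obj is a subclass of Entity.
        // If the type is marked sealed, is can be used instead.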
        if (this.GetType().Equals(obj.GetType()) == false)
        {
            return false;
        }

        var other = obj as Entity;
        if (other == null)
        {
            return false;
        }
        if (this.Tag != other.Tag)
        {
            return false;
        }
        if (this.Count != other.Count)
        {
            return false;
        }
        if (this.Descriptions.FieldsEquals(other.Descriptions) == false)
        {
            return false;
        }

        return true;
    }
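    /// <summary>
    /// The hash code should stay the same for the lifetime of the object
    /// </summary>
    public override int GetHashCode()
    {
        // Hash only the read-only scalar fields. Hashing the JSON from ToString() would
        // depend on the dictionary's enumeration order, so two entities that Equals
        // treats as equal could produce different hash codes.
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (this.Tag == null ? 0 : this.Tag.GetHashCode());
            hash = hash * 31 + this.Count;
            return hash;
        }
    }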
    /// <summary>
    /// Same meaning as Equals(object obj)
    /// </summary>
    public static bool operator ==(Entity left, Entity right)
    {
        // The null keyword is a literal that represents a null reference, one that does not refer to any object.
        // null is the default value of reference-type variables. Ordinary value types cannot be null, except for nullable value types.
        if (object.ReferenceEquals(null, left))
        {
            // null == null is true; null == non-null is false
            return object.ReferenceEquals(null, right);
        }
        return left.Equals(right);
    }
    /// <summary>
    /// The opposite of ==
    /// </summary>
    public static bool operator !=(Entity left, Entity right) => !(left == right);

    public override string ToString() => JsonConvert.SerializeObject(this);
}

override Equals

using System.Collections.Generic;

public static class DictionaryExtension
{
    /// <summary>
    /// Calls Object.Equals(Object) to compare the entries one by one
    /// <para>Returns true when both sides are null, false when only one side is null</para>
    /// </summary>
    public static bool FieldsEquals<TKey, TValue>(this IDictionary<TKey, TValue> source, IDictionary<TKey, TValue> target)
    {
        if (source == null && target == null)
        {
            return true;
        }
        if ((source == null && target != null) ||
             (source != null && target == null))
        {
            return false;
        }
        if (object.ReferenceEquals(source, target))
        {
            return true;
        }
        if (source.Keys.Count != target.Keys.Count)
        {
            return false;
        }
        foreach (var key in source.Keys)
        {
            if (target.ContainsKey(key) == false)
            {
                return false;
            }

            // object.Equals(a, b) is null-safe: true when both are null,
            // false when only one is null, otherwise a.Equals(b)
            if (object.Equals(source[key], target[key]) == false)
            {
                return false;
            }

        }
        return true;
    }
}

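Dictionary FieldsEquals

⚠️ Call GetType to check whether this and obj have the same runtime type. A check with the is keyword also returns true when obj is an instance of a subclass of Entity. When the type cannot serve as a base class, for example because it is marked sealed or is a value type (struct, enum), is can be used instead.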
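The sketch below shows why the warning matters. Animal and Dog are hypothetical classes that deliberately use is for the type check, and the result is an Equals that is no longer symmetric.

```csharp
// Hypothetical base class whose Equals uses "is" and therefore accepts subclass instances.
public class Animal
{
    public string Name { get; set; }

    public override bool Equals(object obj) => obj is Animal other && Name == other.Name;

    public override int GetHashCode() => Name?.GetHashCode() ?? 0;
}

// Hypothetical subclass that adds a field and also compares with "is".
public class Dog : Animal
{
    public string Breed { get; set; }

    public override bool Equals(object obj) =>
        obj is Dog other && Name == other.Name && Breed == other.Breed;

    public override int GetHashCode() => base.GetHashCode();
}
```

Given an Animal and a Dog that share the same Name, animal.Equals(dog) returns true while dog.Equals(animal) returns false. Comparing GetType() on both sides, as the Entity listing does, makes both calls return false and restores symmetry.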

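An overridden Equals should satisfy the following:

Reflexive: x.Equals(x) returns true
Symmetric: x.Equals(y) == y.Equals(x)
Transitive: if x.Equals(y) == true and y.Equals(z) == true, then x.Equals(z) == true
Consistent: as long as the values of x and y do not change, the result of x.Equals(y) does not change
x.Equals(null) returns false
x.Equals(y) returns true if x and y are both NaN
Equals must not throw exceptions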
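The NaN rule is the least intuitive item in the list, so here is a small illustration. NaNDemo is a hypothetical helper class; the behavior it shows belongs to System.Double.

```csharp
public static class NaNDemo
{
    // The == operator follows IEEE 754: NaN is not equal to anything, itself included.
    public static bool OperatorSaysEqual(double x, double y) => x == y;

    // Double.Equals treats two NaN values as equal, as the Equals contract asks.
    public static bool EqualsSaysEqual(double x, double y) => x.Equals(y);
}
```

Calling both methods with double.NaN for x and y gives false from the operator and true from Equals. This keeps Equals reflexive, which hash-based collections depend on.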

For how String and StringBuilder implement Equals, and for more details on overriding Equals, see: Object.Equals.

Object.GetHashCode()

Object

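The default implementation computes the hash code from the object's reference (its identity), not from the values of its fields. In other words, two references for which ReferenceEquals returns true also have the same hash code.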

ValueType

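The default implementation uses reflection to compute the hash code from the field values. In other words, if all field values of two value-type instances are equal, their hash codes are equal too.

Overriding GetHashCode

Once Equals is overridden, GetHashCode usually needs to be overridden as well, and vice versa, because hash-based structures such as dictionaries use the key's hash code when storing and retrieving data. In the Dictionary source code on GitHub, the code that fetches a value by key first checks whether the hash codes are equal and only then calls Equals to compare the keys.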
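A short sketch of what goes wrong when only one of the two methods is overridden. EqualsOnlyKey, ConsistentKey and LookupDemo are hypothetical names made up for this example.

```csharp
using System.Collections.Generic;

// Overrides Equals only, so equal instances still get identity-based hash codes.
// (The compiler reports warning CS0659 for this class.)
public class EqualsOnlyKey
{
    public string Id { get; }

    public EqualsOnlyKey(string id) { Id = id; }

    public override bool Equals(object obj) => obj is EqualsOnlyKey other && Id == other.Id;
}

// Overrides both methods consistently.
public class ConsistentKey
{
    public string Id { get; }

    public ConsistentKey(string id) { Id = id; }

    public override bool Equals(object obj) => obj is ConsistentKey other && Id == other.Id;

    public override int GetHashCode() => Id?.GetHashCode() ?? 0;
}

public static class LookupDemo
{
    // Typically false: the second key hashes differently, so Equals is never reached.
    public static bool FindsEqualsOnlyKey()
    {
        var map = new Dictionary<EqualsOnlyKey, string> { [new EqualsOnlyKey("a")] = "value" };
        return map.ContainsKey(new EqualsOnlyKey("a"));
    }

    // True: equal keys produce equal hash codes, then Equals confirms the match.
    public static bool FindsConsistentKey()
    {
        var map = new Dictionary<ConsistentKey, string> { [new ConsistentKey("a")] = "value" };
        return map.ContainsKey(new ConsistentKey("a"));
    }
}
```

For a cache keyed by request objects, the first case means every request misses the cache even though Equals would have reported a match.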

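When overriding GetHashCode, keep the following in mind:

The algorithm should use at least one instance field and no static fields, so that the hash code is tied to the instance
The instance fields used by the algorithm should be as immutable as possible, so that the hash code stays the same for the lifetime of the object wherever possible
Two objects that are equal (according to Equals) should return the same hash code, but the converse does not hold
If the field values that affect Equals have not changed, the hash code returned by GetHashCode should not change either
The generated hash values should be randomly and evenly distributed
The computation should perform well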
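One way to follow these points is sketched below. RequestKey is a hypothetical immutable key type, and the multiply-and-add scheme with 17 and 31 is a common convention rather than anything the framework prescribes.

```csharp
using System;

// Hypothetical immutable cache key: the fields used for hashing never change.
public sealed class RequestKey : IEquatable<RequestKey>
{
    public string Tag { get; }
    public int Count { get; }

    public RequestKey(string tag, int count)
    {
        Tag = tag;
        Count = count;
    }

    public bool Equals(RequestKey other) =>
        other != null && Tag == other.Tag && Count == other.Count;

    public override bool Equals(object obj) => Equals(obj as RequestKey);

    public override int GetHashCode()
    {
        // unchecked lets the arithmetic wrap around instead of throwing on overflow
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Tag == null ? 0 : Tag.GetHashCode());
            hash = hash * 31 + Count;
            return hash;
        }
    }
}
```

The class is sealed, so the typed Equals needs no GetType check, and it hashes exactly the fields that Equals compares.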

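In general, for a mutable reference type, override GetHashCode only if one of the following holds:

The hash code is computed from fields that are immutable
The hash code of the object does not change while it is stored in a collection that relies on its hash code

If you do override GetHashCode for a mutable object, say so in the documentation: when an object serves as a key in a hash-based structure, avoid modifying it, otherwise reading the data back may throw a KeyNotFoundException.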
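The failure described above is easy to reproduce. In this sketch MutableKey is a hypothetical type whose hash code depends on a settable property.

```csharp
using System.Collections.Generic;

// Hypothetical key whose hash code depends on a mutable property.
public class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj) => obj is MutableKey other && Id == other.Id;

    public override int GetHashCode() => Id;
}

public static class MutationDemo
{
    public static bool StillFoundAfterMutation()
    {
        var key = new MutableKey { Id = 1 };
        var map = new Dictionary<MutableKey, string> { [key] = "cached" };

        key.Id = 2; // the hash code changes while the key sits inside the dictionary

        // The dictionary stored the entry under the old hash code,
        // so a lookup with the very same object no longer finds it.
        return map.ContainsKey(key);
    }
}
```

Indexing the dictionary with the mutated key throws KeyNotFoundException, and the entry stays in the dictionary without being reachable through that key.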

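⚠️ The default implementation of GetHashCode may differ between .NET versions and between platforms (32-bit and 64-bit systems). So when relying on the default GetHashCode, keep two things in mind:

Do not decide whether objects are equal from the hash code alone

Because objects can be passed across application domains, processes, and platforms, do not persist hash codes or use them outside the application domain that generated them

Below is a summary of GetHashCode from Microsoft's official documentation. Being lazy and not much of a translator, I have left it untranslated and simply copy it here for future reference: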

A hash function must have the following properties:

If two objects compare as equal, the GetHashCode() method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode() methods for the two objects do not have to return different values.

The GetHashCode() method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's System.Object.Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function should generate an even distribution for all input, including input that is heavily clustered. An implication is that small modifications to object state should result in large modifications to the resulting hash code for best hash table performance.

Hash functions should be inexpensive to compute.

The GetHashCode() method should not throw exceptions.

For example, the implementation of the GetHashCode() method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).

Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with keys that provide a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation).

In a hash table with a poor implementation of a hash function, the performance of a search depends on the number of items in the hash table (for example, an O(n) operation, where n is the number of items in the hash table).

A malicious user can input data that increases the number of collisions, which can significantly degrade the performance of applications that depend on hash tables, under the following conditions:

When hash functions produce frequent collisions.

When a large proportion of objects in a hash table produce hash codes that are equal or approximately equal to one another.

When users input the data from which the hash code is computed.

Derived classes that override GetHashCode() must also override Equals(Object) to guarantee that two objects considered equal have the same hash code; otherwise, the Hashtable type might not work correctly.

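System Optimization Ideas

Performance only needs to meet current requirements; do not chase the absolute maximum
Balance performance against readability; code that loses readability costs more to maintain
Reduce I/O (disk, network)

    Optimize database queries and select only the fields you need, which cuts disk I/O and saves bandwidth;

    Use caching sensibly;

    Split a request that returns a large amount of data into several requests where appropriate (for example, paged queries), and merge multiple queries with small result sets where appropriate (for example, Pipeline in Redis);
Avoid making the computer do useless work

    Use appropriate data structures;

    Keep the number of loop iterations as low as possible;
Make full use of the CPU (multithreading, parallel computation)

    Split a computation into several independent units, but note that not every task can be split. Also weigh the simple, safe but slower single-threaded version against the more complex but more efficient multithreaded one.
Replace synchronous calls with asynchronous ones to avoid blocking threads
Refactor where appropriate to reduce clutter and keep the system simple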
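As one small illustration of splitting a computation into independent units, the sketch below sums squares sequentially and with PLINQ. ParallelDemo is a hypothetical helper. Whether the parallel version is actually faster depends on the workload, and for work this small it may well be slower.

```csharp
using System.Linq;

public static class ParallelDemo
{
    // Simple and safe: one thread walks the whole range.
    public static long SumOfSquaresSequential(int n) =>
        Enumerable.Range(1, n).Sum(i => (long)i * i);

    // Each element is independent of the others, so PLINQ can partition the range across cores.
    public static long SumOfSquaresParallel(int n) =>
        Enumerable.Range(1, n).AsParallel().Sum(i => (long)i * i);
}
```

Both methods return the same result. The parallel one trades some simplicity for the chance to use more than one core, which is the trade-off described in the list above.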
