CLR via C# 3rd - 05 - Primitive, Reference, and Value Types

 

1. Primitive Types
 
     Any data types that the compiler directly supports are called primitive types.
 
     Primitive types map directly to types existing in the Framework Class Library (FCL).
 
     For the types that are compliant with the Common Language Specification (CLS), other languages will offer similar primitive types. However, languages aren’t required to offer any support for the non–CLS-compliant types.
 
      Primitives with Corresponding FCL Types
    
 
     Another way to think of this is that the C# compiler automatically assumes that you have the following using directives in all of your source code files.
using sbyte = System.SByte;
using byte = System.Byte;
using short = System.Int16;
using ushort = System.UInt16;
using int = System.Int32;
using uint = System.UInt32;
...
 
     About the compiler:
 
     First, the compiler is able to perform implicit or explicit casts between primitive types. C# allows implicit casts if the conversion is “safe,” that is, no loss of data is possible. C# requires explicit casts if the conversion is potentially unsafe. For numeric types, “unsafe” means that you could lose precision or magnitude as a result of the conversion.
 
     Be aware that different compilers can generate different code to handle these cast operations. For example, when casting a Single with a value of 6.8 to an Int32, some compilers could generate code to put a 6 in the Int32, and others could perform the cast by rounding the result up to 7. C# always truncates the result.
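 
     The casting rules above can be sketched as follows. CastDemo and its method names are hypothetical, chosen only for illustration.

```csharp
using System;

public static class CastDemo
{
    // Implicit cast: Int32 -> Int64 can never lose data, so no cast syntax is needed.
    public static Int64 Widen(Int32 value)
    {
        return value;
    }

    // Explicit cast: Single -> Int32 may lose precision, so C# requires the cast.
    // C# truncates toward zero instead of rounding.
    public static Int32 Truncate(Single value)
    {
        return (Int32) value;
    }
}
```

     With these definitions, CastDemo.Truncate(6.8f) yields 6 rather than 7.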
 
     In addition to casting, primitive types can be written as literals.
 
     If you have an expression consisting of literals, the compiler is able to evaluate the expression at compile time, improving the application’s performance.
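 
     A small sketch of both points about literals. LiteralDemo and its members are hypothetical names used only for illustration.

```csharp
using System;

public static class LiteralDemo
{
    // A constant expression built only from literals; the compiler evaluates
    // it at compile time and emits the result directly.
    public const Int32 SecondsPerDay = 60 * 60 * 24;

    // A literal is an instance of its primitive type, so instance methods
    // can be called on it.
    public static String JoinLiterals()
    {
        return 123.ToString() + 456.ToString();
    }
}
```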
     
2. Checked and Unchecked Primitive Type Operations
 
     The CLR offers IL instructions that allow the compiler to choose the desired behavior. The CLR has an instruction called add that adds two values together. The add instruction performs no overflow checking. The CLR also has an instruction called add.ovf that also adds two values together. However, add.ovf throws a System.OverflowException if an overflow occurs. In addition to these two IL instructions for the add operation, the CLR also has similar IL instructions for subtraction (sub/sub.ovf), multiplication (mul/mul.ovf), and data conversions (conv/conv.ovf).
 
     One way to get the C# compiler to control overflows is to use the /checked+ compiler switch. This switch tells the compiler to generate code that has the overflow-checking versions of the add, subtract, multiply, and conversion IL instructions. The code executes a little slower because the CLR is checking these operations to determine whether an overflow occurred. If an overflow occurs, the CLR throws an OverflowException.
 
     In addition to having overflow checking turned on or off globally, programmers can control overflow checking in specific regions of their code. C# allows this flexibility by offering checked and unchecked operators.
     e.g.
     UInt32 invalid = unchecked((UInt32) (-1)); // OK
     
     Byte b = 100;
     b = checked((Byte) (b + 200)); // OverflowException is thrown
     b = (Byte) checked(b + 200); // b contains 44; no OverflowException
 
     In addition to the checked and unchecked operators, C# also offers checked and unchecked statements. The statements cause all expressions within a block to be checked or unchecked.
     e.g.
checked { // Start of checked block
      Byte b = 100;
      b = (Byte) (b + 200); // This expression is checked for overflow.
}
     
     In fact, if you use a checked statement block, you can now use the += operator with the Byte, which simplifies the code a bit:
     e.g.
checked { // Start of checked block
     Byte b = 100;
     b += 200; // This expression is checked for overflow.
}
     
      Important: Because the only effect that the checked operator and statement have is to determine which versions of the add, subtract, multiply, and data conversion IL instructions are produced, calling a method within a checked operator or statement has no impact on that method, as the following code demonstrates:
checked {
     // Assume SomeMethod tries to load 400 into a Byte.
     SomeMethod(400);
     // SomeMethod might or might not throw an OverflowException.
     // It would if SomeMethod were compiled with checked instructions.
}
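 
     A runnable variant of the same point, as a sketch. NarrowUnchecked stands in for SomeMethod and is a hypothetical helper that explicitly opts out of overflow checking.

```csharp
using System;

public static class CheckedCallDemo
{
    // Hypothetical helper: the narrowing conversion is compiled here, inside
    // an explicit unchecked expression, so callers cannot change its behavior.
    public static Byte NarrowUnchecked(Int32 value)
    {
        return unchecked((Byte) value);
    }

    public static Byte CallFromCheckedBlock()
    {
        checked
        {
            // No OverflowException: the checked block only affects IL emitted
            // for expressions written in this block, not the callee's IL.
            return NarrowUnchecked(400);
        }
    }
}
```

     Here CallFromCheckedBlock returns 144 (400 modulo 256) instead of throwing.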
 
     Some recommended rules for programmers:
     (1) Use signed data types (such as Int32 and Int64) instead of unsigned numeric types (such as UInt32 and UInt64) wherever possible.
 
     (2) As you write your code, explicitly use checked around blocks where an unwanted overflow might occur due to invalid input data.
 
     (3) As you write your code, explicitly use unchecked around blocks where an overflow is OK, such as calculating a checksum.
 
     (4) For any code that doesn’t use checked or unchecked, the assumption is that you do want an exception to occur on overflow.
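 
     Rules (2) and (3) might look like this in practice. This is only a sketch: OverflowRulesDemo, TotalPrice, and Checksum are hypothetical names, and the checksum formula is an arbitrary illustration.

```csharp
using System;

public static class OverflowRulesDemo
{
    // Rule (2): bad input should surface as an exception, so check explicitly.
    public static Int32 TotalPrice(Int32 quantity, Int32 unitPrice)
    {
        checked
        {
            return quantity * unitPrice; // throws OverflowException if too large
        }
    }

    // Rule (3): wrap-around is acceptable in a checksum, so opt out explicitly.
    public static Int32 Checksum(Byte[] data)
    {
        Int32 sum = 17;
        unchecked
        {
            foreach (Byte b in data)
            {
                sum = sum * 31 + b;
            }
        }
        return sum;
    }
}
```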
 
     Important: The System.Decimal type is a very special type. Although many programming languages (C# and Visual Basic included) consider Decimal a primitive type, the CLR does not. This means that the CLR doesn’t have IL instructions that know how to manipulate a Decimal value. If you look up the Decimal type in the .NET Framework SDK documentation, you’ll see that it has public static methods called Add, Subtract, Multiply, Divide, and so on. In addition, the Decimal type provides operator overload methods for +, -, *, /, and so on.
 
     When you compile code that uses Decimal values, the compiler generates code to call Decimal’s members to perform the actual operation. This means that manipulating Decimal values is slower than manipulating CLR primitive values. Also, because there are no IL instructions for manipulating Decimal values, the checked and unchecked operators, statements, and compiler switches have no effect. Operations on Decimal values always throw an OverflowException if the operation can’t be performed safely.
 
     Similarly, the System.Numerics.BigInteger type is also special in that it internally uses an array of UInt32s to represent an arbitrarily large integer whose value has no upper or lower bound. Therefore, operations on a BigInteger never result in an OverflowException. However, a BigInteger operation may throw an OutOfMemoryException if the value gets too large and there is insufficient available memory to resize the array.
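 
     The behavior of both types can be observed with a short sketch. SpecialNumericsDemo and its methods are hypothetical names used for illustration.

```csharp
using System;
using System.Numerics;

public static class SpecialNumericsDemo
{
    // Returns true if Decimal addition overflows even inside an unchecked block.
    public static Boolean DecimalIgnoresUnchecked()
    {
        Decimal max = Decimal.MaxValue;
        try
        {
            unchecked
            {
                max = max + 1m; // compiles to a call to Decimal's + operator method
            }
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }

    // BigInteger grows as needed, so squaring UInt64.MaxValue does not overflow.
    public static BigInteger SquareOfMaxUInt64()
    {
        BigInteger big = UInt64.MaxValue;
        return big * big;
    }
}
```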
 
3. Reference Types and Value Types
 
     The CLR supports two kinds of types: reference types and value types.
 
     In C#, types declared using struct are value types, and types declared using class are reference types.
 
     Value type instances are usually allocated on a thread’s stack (although they can also be embedded as a field in a reference type object). The variable representing the instance doesn’t contain a pointer to an instance; the variable contains the fields of the instance itself.
 
     Reference types are always allocated from the managed heap, and the C# new operator returns the memory address of the object; that address refers to the object’s bits.
 
     All of the structures are immediately derived from the System.ValueType abstract type. System.ValueType is itself immediately derived from the System.Object type. By definition, all value types must be derived from System.ValueType. All enumerations are derived from the System.Enum abstract type, which is itself derived from System.ValueType. The CLR and all programming languages give enumerations special treatment.
 
     In addition, all value types are sealed, which prevents a value type from being used as a base type for any other reference type or value type.
 
     Important: For many developers (such as unmanaged C/C++ developers), reference types and value types will seem strange at first. In unmanaged C/C++, you declare a type, and then the code that uses the type gets to decide if an instance of the type should be allocated on the thread’s stack or in the application’s heap.
 
     In managed code, the developer defining the type indicates where instances of the type are allocated; the developer using the type has no control over this.
 
     How the CLR Controls the Layout of a Type’s Fields
 
     To improve performance, the CLR is capable of arranging the fields of a type any way it chooses.
 
     You tell the CLR what to do by applying the System.Runtime.InteropServices.StructLayoutAttribute attribute on the class or structure you’re defining. To this attribute’s constructor, you can pass LayoutKind.Auto to have the CLR arrange the fields, LayoutKind.Sequential to have the CLR preserve your field layout, or LayoutKind.Explicit to explicitly arrange the fields in memory by using offsets. If you don’t explicitly specify the StructLayoutAttribute on a type that you’re defining, your compiler selects whatever layout it determines is best.
 
     You should be aware that Microsoft’s C# compiler selects LayoutKind.Auto for reference types (classes) and LayoutKind.Sequential for value types (structures).
 
     The StructLayoutAttribute also allows you to explicitly indicate the offset of each field by passing LayoutKind.Explicit to its constructor. Then you apply an instance of the System.Runtime.InteropServices.FieldOffsetAttribute attribute to each field, passing to this attribute’s constructor an Int32 indicating the offset (in bytes) of the field’s first byte from the beginning of the instance. Explicit layout is typically used to simulate what would be a union in unmanaged C/C++ because you can have multiple fields starting at the same offset in memory.
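 
     A minimal sketch of such a union. ByteAndInt16Union and LayoutDemo are hypothetical names; the value read back depends on the machine’s byte order, so no particular result is assumed in the code.

```csharp
using System;
using System.Runtime.InteropServices;

// Both fields start at offset 0, so they overlap like a C/C++ union.
[StructLayout(LayoutKind.Explicit)]
public struct ByteAndInt16Union
{
    [FieldOffset(0)] public Byte AsByte;
    [FieldOffset(0)] public Int16 AsInt16;
}

public static class LayoutDemo
{
    public static Int16 WriteByteReadInt16(Byte value)
    {
        ByteAndInt16Union u = new ByteAndInt16Union(); // all bytes start as zero
        u.AsByte = value;   // writes the first byte of the shared storage
        return u.AsInt16;   // reads the same storage as a 16-bit value
    }
}
```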
 
     The Differences between Value Type and Reference Type:
     (1) Value type objects have two representations: an unboxed form and a boxed form. Reference types are always in a boxed form.
 
     (2)  Value types are derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects’ fields match. In addition, System.ValueType overrides the GetHashCode method to produce a hash code value by using an algorithm that takes into account the values in the object’s instance fields.
 
     (3) Because you can’t define a new value type or a new reference type by using a value type as a base class, you shouldn’t introduce any new virtual methods into a value type. No methods can be abstract, and all methods are implicitly   sealed  (can’t be overridden).
 
     (4) Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn’t currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException to be thrown. By contrast, value type variables always contain a value of the underlying type, and all members of the value type are initialized to 0. Since a value type variable isn’t a pointer, it’s not possible to generate a NullReferenceException when accessing a value type. The CLR does offer a special feature, called nullable types, that adds the notion of nullability to a value type.
 
     (5) When you assign a value type variable to another value type variable, a field-by-field copy is made. When you assign a reference type variable to another reference type variable, only the memory address is copied.
 
     (6) Two or more reference type variables can refer to a single object in the heap, allowing operations on one variable to affect the object referenced by the other variable. On the other hand, value type variables are distinct objects, and it’s not possible for operations on one value type variable to affect another.
 
     (7) Because unboxed value types aren’t allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active. This means that a value type instance doesn’t receive a notification (via a Finalize method) when its memory is reclaimed.
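 
     Differences (5) and (6) can be sketched with two tiny types. PointRef, PointVal, and CopySemanticsDemo are hypothetical names used only for illustration.

```csharp
using System;

public sealed class PointRef { public Int32 X; }   // reference type
public struct PointVal { public Int32 X; }         // value type

public static class CopySemanticsDemo
{
    public static Int32 CopyReferenceThenMutate()
    {
        PointRef r1 = new PointRef();
        r1.X = 5;
        PointRef r2 = r1;   // copies only the reference; both name one object
        r1.X = 8;
        return r2.X;        // sees the change made through r1
    }

    public static Int32 CopyValueThenMutate()
    {
        PointVal v1 = new PointVal();
        v1.X = 5;
        PointVal v2 = v1;   // field-by-field copy; v2 is a distinct instance
        v1.X = 9;
        return v2.X;        // unaffected by the change to v1
    }
}
```

     CopyReferenceThenMutate returns 8, while CopyValueThenMutate returns 5.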
 
4. Boxing and Unboxing Value Types
 
     It’s possible to convert a value type to a reference type by using a mechanism called boxing.
 
     Internally, here’s what happens when an instance of a value type is boxed:
     1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
 
     2. The value type’s fields are copied to the newly allocated heap memory.
 
     3. The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.
 
     Converting a boxed value type back to a value type takes two steps:
     First, the address of the value type’s fields in the boxed object is obtained. This process is called unboxing.
     Then, the values of these fields are copied from the heap to the stack-based value type instance.
     
     Unboxing is not the exact opposite of boxing. The unboxing operation is much less costly than boxing.
 
     Unboxing is really just the operation of obtaining a pointer to the raw value type (data fields) contained within an object. In effect, the pointer refers to the unboxed portion in the boxed instance. So, unlike boxing, unboxing doesn’t involve the copying of any bytes in memory. Having made this important clarification, it is important to note that an unboxing operation is typically followed by copying the fields.
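 
     The copies made by boxing and by unbox-then-copy can be observed with a short sketch. BoxingDemo and its methods are hypothetical names.

```csharp
using System;

public static class BoxingDemo
{
    // Boxing copies the value to the heap, so a later change to the
    // original variable is not visible through the boxed object.
    public static Int32 BoxThenChangeOriginal()
    {
        Int32 x = 5;
        Object boxed = x;       // box: allocate on the heap and copy the fields
        x = 10;
        return (Int32) boxed;   // unbox, then copy the fields back out
    }

    // The cast unboxes and copies, so changing the copy leaves the box alone.
    public static Boolean UnboxedCopyIsIndependent()
    {
        Object boxed = 5;
        Int32 copy = (Int32) boxed;
        copy = copy + 37;
        return copy == 42 && (Int32) boxed == 5;
    }
}
```

     BoxThenChangeOriginal returns 5, because the boxed object holds its own copy of the value.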
     
     Unboxed value types are lighter-weight types than reference types for two reasons:
     (1) They are not allocated on the managed heap.
     (2) They don’t have the additional overhead members that every object on the heap has: a type object pointer and a sync block index.
 
     Because unboxed value types don’t have a sync block index, you can’t have multiple threads synchronize their access to the instance by using the methods of the System.Threading.Monitor type.
 
5. Changing Fields in a Boxed Value Type by Using Interfaces
 
6. Object Equality and Identity
 
     The System.Object type offers a virtual method named Equals, whose purpose is to return true if two objects contain the same value. The implementation of Object’s Equals method looks like this:
 
public class Object {
     public virtual Boolean Equals(Object obj) {
          // If both references point to the same object,
          // they must have the same value.
          if (this == obj) return true;
          // Assume that the objects do not have the same value.
          return false;
     }
}
 
     At first, this seems like a reasonable default implementation of Equals: it returns true if the this and obj arguments refer to the same exact object. This seems reasonable because Equals knows that an object must have the same value as itself. However, if the arguments refer to different objects, Equals can’t be certain if the objects contain the same values, and therefore, false is returned. In other words, the default implementation of Object’s Equals method really implements identity, not value equality.
 
     Here is how to properly implement an Equals method internally:
     1. If the obj argument is null, return false because the current object identified by this is obviously not null when the nonstatic Equals method is called.
     2. If the this and obj arguments refer to the same object, return true. This step can improve performance when comparing objects with many fields.
     3. If the this and obj arguments refer to objects of different types, return false. Obviously, checking if a String object is equal to a FileStream object should result in a false result.
     4. For each instance field defined by the type, compare the value in the this object with the value in the obj object. If any fields are not equal, return false.
     5. Call the base class’s Equals method so it can compare any fields defined by it. If the base class’s Equals method returns false, return false; otherwise, return true.
 
     So Microsoft should have implemented Object’s Equals like this:
     e.g 
public class Object {
     public virtual Boolean Equals(Object obj) {
          // The given object to compare to can't be null
          if (obj == null) return false;
 
          // If objects are different types, they can't be equal.
          if (this.GetType() != obj.GetType()) return false;
 
          // If objects are same type, return true if all of their fields match
          // Since System.Object defines no fields, the fields match
          return true;
     }
}
 
     But, since Microsoft didn’t implement Equals this way, the rules for how to implement Equals are significantly more complicated than you would think. When a type overrides Equals, the override should call its base class’s implementation of Equals unless it would be calling Object’s implementation. This also means that since a type can override Object’s Equals method, this Equals method can no longer be called to test for identity. To fix this, Object offers a static ReferenceEquals method, which is implemented like this:
 
public class Object {
     public static Boolean ReferenceEquals(Object objA, Object objB) {
          return (objA == objB);
     }
}
     
     You should always call ReferenceEquals if you want to check for identity (if two references point to the same object). You shouldn’t use the C# == operator (unless you cast both operands to Object first) because one of the operands’ types could overload the == operator, giving it semantics other than identity.
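 
     System.String is a convenient type for seeing the difference, since it overloads == to compare characters. The sketch below uses a hypothetical IdentityDemo class.

```csharp
using System;

public static class IdentityDemo
{
    // Builds a new String object each time; the contents are always "xxx".
    private static String MakeText()
    {
        return new String('x', 3);
    }

    public static Boolean EqualByOperator()
    {
        String a = MakeText(), b = MakeText();
        return a == b;                        // String's overloaded ==: value equality
    }

    public static Boolean SameByReferenceEquals()
    {
        String a = MakeText(), b = MakeText();
        return Object.ReferenceEquals(a, b);  // identity check
    }

    public static Boolean SameByObjectCast()
    {
        String a = MakeText(), b = MakeText();
        return (Object) a == (Object) b;      // casting to Object restores identity semantics
    }
}
```

     EqualByOperator returns true, while the two identity checks return false because a and b are distinct objects.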
 
     As you can see, the .NET Framework has a very confusing story when it comes to object equality and identity. By the way, System.ValueType (the base class of all value types) does override Object’s Equals method and is correctly implemented to perform a value equality check (not an identity check). Internally, ValueType’s Equals is implemented this way:
 
     1. If the obj argument is null, return false.
     2. If the this and obj arguments refer to objects of different types, return false.
     3. For each instance field defined by the type, compare the value in the this object with the value in the obj object by calling the field’s Equals method. If any fields are not equal, return false.
     4. Return true. Object’s Equals method is not called by ValueType’s Equals method.
 
     Internally, ValueType’s Equals method uses reflection in step #3.
 
     The four properties of equality:
     .. Equals must be reflexive; that is, x.Equals(x) must return true.
     .. Equals must be symmetric; that is, x.Equals(y) must return the same value as y.Equals(x).
     .. Equals must be transitive; that is, if x.Equals(y) returns true and y.Equals(z) returns true, then x.Equals(z) must also return true.
     .. Equals must be consistent. Provided that there are no changes in the two values being compared, Equals should consistently return true or false.
 
     When overriding the Equals method, there are a few more things that you’ll probably want to do:
     .. Have the type implement the System.IEquatable<T> interface’s Equals method.
     This generic interface allows you to define a type-safe Equals method. Usually, you’ll implement the Equals method that takes an Object parameter to internally call the type-safe Equals method.
     .. Overload the == and != operator methods.
     Usually, you’ll implement these operator methods to internally call the type-safe Equals method.
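 
     Putting these pieces together might look like the sketch below. Money is a hypothetical type; because it is sealed and derives directly from Object, the type check and the call to the base class’s Equals from the earlier steps are not needed here.

```csharp
using System;

public sealed class Money : IEquatable<Money>
{
    private readonly Int64 m_cents;
    private readonly String m_currency;

    public Money(Int64 cents, String currency)
    {
        m_cents = cents;
        m_currency = currency;
    }

    // Type-safe Equals from IEquatable<T>. ReferenceEquals is used for the
    // null and identity tests so the overloaded == below is never re-entered.
    public Boolean Equals(Money other)
    {
        if (Object.ReferenceEquals(other, null)) return false;
        if (Object.ReferenceEquals(this, other)) return true;
        return m_cents == other.m_cents && m_currency == other.m_currency;
    }

    // The Object overload forwards to the type-safe overload.
    public override Boolean Equals(Object obj)
    {
        return Equals(obj as Money);
    }

    // Kept consistent with Equals: equal objects produce equal hash codes.
    public override Int32 GetHashCode()
    {
        unchecked
        {
            Int32 currencyHash = m_currency == null ? 0 : m_currency.GetHashCode();
            return m_cents.GetHashCode() * 31 + currencyHash;
        }
    }

    public static Boolean operator ==(Money a, Money b)
    {
        if (Object.ReferenceEquals(a, null)) return Object.ReferenceEquals(b, null);
        return a.Equals(b);
    }

    public static Boolean operator !=(Money a, Money b)
    {
        return !(a == b);
    }
}
```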
 
7. Object Hash Codes
 
     The designers of the FCL decided that it would be incredibly useful if any instance of any object could be placed into a hash table collection. To this end, System.Object provides a virtual GetHashCode method so that an Int32 hash code can be obtained for any and all objects.
 
     If you define a type and override the Equals method, you should also override the GetHashCode method. In fact, Microsoft’s C# compiler emits a warning if you define a type that overrides Equals without also overriding GetHashCode.
     
     The reason why a type that defines Equals must also define GetHashCode is that the implementation of the System.Collections.Hashtable type, the System.Collections.Generic.Dictionary type, and some other collections require that any two objects that are equal must have the same hash code value. So if you override Equals, you should override GetHashCode to ensure that the algorithm you use for calculating equality corresponds to the algorithm you use for calculating the object’s hash code.
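 
     A sketch of why this matters for dictionary lookups. GridCell and HashLookupDemo are hypothetical names, and the hash formula is just one simple way to mix two fields.

```csharp
using System;
using System.Collections.Generic;

public sealed class GridCell
{
    private readonly Int32 m_row, m_col;

    public GridCell(Int32 row, Int32 col)
    {
        m_row = row;
        m_col = col;
    }

    public override Boolean Equals(Object obj)
    {
        GridCell other = obj as GridCell;
        return other != null && m_row == other.m_row && m_col == other.m_col;
    }

    // Built from the same fields that Equals compares, so equal cells
    // always produce the same hash code.
    public override Int32 GetHashCode()
    {
        unchecked
        {
            return (m_row * 397) ^ m_col;
        }
    }
}

public static class HashLookupDemo
{
    // Stores a value under one key object and reads it back with a
    // different object that is equal to the first.
    public static String LookupWithEqualKey()
    {
        Dictionary<GridCell, String> labels = new Dictionary<GridCell, String>();
        labels[new GridCell(1, 2)] = "start";
        return labels[new GridCell(1, 2)];
    }
}
```

     The lookup succeeds only because both key objects hash to the same bucket and then compare equal.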
 
     Defining a GetHashCode method can be easy and straightforward. But depending on your data types and the distribution of data, it can be tricky to come up with a hashing algorithm that returns a well-distributed range of values. Here’s a simple example that will probably work just fine for Point objects:
 
internal sealed class Point {
     private readonly Int32 m_x, m_y;
     public override Int32 GetHashCode() {
          return m_x ^ m_y; // m_x XOR'd with m_y
     }
     ...
}
 
     When selecting an algorithm for calculating hash codes for instances of your type, try to follow these guidelines:
     .. Use an algorithm that gives a good random distribution for the best performance of the hash table.
     .. Your algorithm can also call the base type’s GetHashCode method, including its return value. However, you don’t generally want to call Object’s or ValueType’s GetHashCode method, because the implementation in either method doesn’t lend itself to high-performance hashing algorithms.
     .. Your algorithm should use at least one instance field.
     .. Ideally, the fields you use in your algorithm should be immutable; that is, the fields should be initialized when the object is constructed, and they should never again change during the object’s lifetime.
     .. Your algorithm should execute as quickly as possible.
     .. Objects with the same value should return the same code. For example, two String objects with the same text should return the same hash code value.
 
     System.Object’s implementation of the GetHashCode method doesn’t know anything about its derived type and any fields that are in the type. For this reason, Object’s GetHashCode method returns a number that is guaranteed to uniquely identify the object within the AppDomain; this number is guaranteed not to change for the lifetime of the object. After the object is garbage collected, however, its unique number can be reused as the hash code for a new object.
 
     Note: If a type overrides Object’s GetHashCode method, you can no longer call it to get a unique ID for the object. If you want to get a unique ID (within an AppDomain) for an object, the FCL provides a method that you can call. In the System.Runtime.CompilerServices namespace, see the RuntimeHelpers class’s public, static GetHashCode method that takes a reference to an Object as an argument. RuntimeHelpers’ GetHashCode method returns a unique ID for an object even if the object’s type overrides Object’s GetHashCode method. This method got its name because of its heritage, but it would have been better if Microsoft had named it something like GetUniqueObjectID.
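 
     A brief sketch of calling RuntimeHelpers.GetHashCode on a type that overrides GetHashCode. ConstantHash and ObjectIdDemo are hypothetical names; the sketch only shows that the ID stays the same across calls and makes no assumption about the actual number returned.

```csharp
using System;
using System.Runtime.CompilerServices;

// A type whose override hides Object's implementation behind a fixed value.
public sealed class ConstantHash
{
    public override Int32 GetHashCode()
    {
        return 1;
    }
}

public static class ObjectIdDemo
{
    // RuntimeHelpers.GetHashCode ignores any override on the object's type
    // and returns the same number each time for the same object.
    public static Boolean IdIsStable(Object obj)
    {
        Int32 first = RuntimeHelpers.GetHashCode(obj);
        Int32 second = RuntimeHelpers.GetHashCode(obj);
        return first == second;
    }
}
```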
 
     System.ValueType’s implementation of GetHashCode uses reflection (which is slow) and XORs some of the type’s instance fields together. This is a naïve implementation that might be good for some value types, but I still recommend that you implement GetHashCode yourself because you’ll know exactly what it does, and your implementation will be faster than ValueType’s implementation.
 
8. Dynamic Primitive Type
 
     Important: Do not confuse dynamic and var. Declaring a local variable using var is just a syntactical shortcut that has the compiler infer the specific data type from an expression. The var keyword can be used only for declaring local variables inside a method while the dynamic keyword can be used for local variables, fields, and arguments. You cannot cast an expression to var but you can cast an expression to dynamic. You must explicitly initialize a variable declared using var while you do not have to initialize a variable declared with dynamic.
 
     Important: A dynamic expression is really the same type as System.Object. The compiler assumes that whatever operation you attempt on the expression is legal, so the compiler will not generate any warnings or errors. However, exceptions will be thrown at runtime if you attempt to execute an invalid operation. In addition, Visual Studio cannot offer any IntelliSense support to help you write code against a dynamic expression. You cannot define an extension method that extends dynamic, although you can define one that extends Object. And, you cannot pass a lambda expression or anonymous method as an argument to a dynamic method call since the compiler cannot infer the types being used.
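 
     A short sketch of these points. DynamicDemo and its methods are hypothetical names, and NoSuchMethod is deliberately a member that String does not define.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicDemo
{
    // dynamic is allowed as a parameter type; var is not.
    public static String AddToItself(dynamic value)
    {
        // The + operator is bound at run time against the actual type of value.
        return (value + value).ToString();
    }

    // The compiler accepts any member access on a dynamic expression;
    // an invalid one fails only when the statement executes.
    public static Boolean InvalidCallFailsAtRunTime()
    {
        dynamic text = "abc";
        try
        {
            text.NoSuchMethod();
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }
}
```

     AddToItself(5) produces "10" and AddToItself("A") produces "AA", since the same source line is bound to integer addition in one call and string concatenation in the other.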