1. Primitive Types
Any data types the compiler directly supports are called primitive types. Primitive types map directly to types existing in the Framework Class Library (FCL). For the types that are compliant with the Common Language Specification (CLS), other languages will offer similar primitive types. However, languages aren’t required to offer any support for the non–CLS-compliant types.
Primitives with Corresponding FCL Types
Another way to think of this is that the C# compiler automatically assumes that you have the following using directives in all of your source code files.
using sbyte = System.SByte;
using byte = System.Byte;
using short = System.Int16;
using ushort = System.UInt16;
using int = System.Int32;
using uint = System.UInt32;
...
About the compiler:
First, the compiler is able to perform implicit or explicit casts between primitive types. C# allows implicit casts if the conversion is "safe," that is, no loss of data is possible. C# requires explicit casts if the conversion is potentially unsafe. For numeric types, "unsafe" means that you could lose precision or magnitude as a result of the conversion.
Be aware that different compilers can generate different code to handle these cast operations. For example, when casting a Single with a value of 6.8 to an Int32, some compilers could generate code to put a 6 in the Int32, and others could perform the cast by rounding the result up to 7. By the way, C# always truncates the result.
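The cast rules above can be illustrated with a minimal sketch. The names CastDemo, Widen, and Truncate are hypothetical and exist only for this example; the sketch assumes the default (unchecked) compiler setting.

```csharp
using System;

public static class CastDemo
{
    // Int32 -> Int64 can never lose data, so C# allows the conversion implicitly.
    public static Int64 Widen(Int32 value) => value;

    // Single -> Int32 can lose precision or magnitude, so C# requires an explicit cast.
    // C# discards the fractional part (truncation) instead of rounding.
    public static Int32 Truncate(Single value) => (Int32)value;
}
```

Calling CastDemo.Truncate(6.8f) yields 6, which matches the truncating behavior described above.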
In addition to casting, primitive types can be written as literals. If you have an expression consisting of literals, the compiler is able to evaluate the expression at compile time, improving the application’s performance.
2. Checked and Unchecked Primitive Type Operations
The CLR offers IL instructions that allow the compiler to choose the desired behavior. The CLR has an instruction called add that adds two values together. The add instruction performs no overflow checking. The CLR also has an instruction called add.ovf that also adds two values together. However, add.ovf throws a System.OverflowException if an overflow occurs. In addition to these two IL instructions for the add operation, the CLR also has similar IL instructions for subtraction (sub/sub.ovf), multiplication (mul/mul.ovf), and data conversions (conv/conv.ovf).
One way to get the C# compiler to control overflows is to use the /checked+ compiler switch. This switch tells the compiler to generate code that has the overflow-checking versions of the add, subtract, multiply, and conversion IL instructions. The code executes a little slower because the CLR is checking these operations to determine whether an overflow occurred. If an overflow occurs, the CLR throws an OverflowException.
In addition to having overflow checking turned on or off globally, programmers can control overflow checking in specific regions of their code. C# allows this flexibility by offering checked and unchecked operators. For example:
UInt32 invalid = unchecked((UInt32) (-1)); // OK
Byte b = 100;
b = checked((Byte) (b + 200)); // OverflowException is thrown
b = (Byte) checked(b + 200); // b contains 44; no OverflowException
In addition to the checked and unchecked operators, C# also offers checked and unchecked statements. The statements cause all expressions within a block to be checked or unchecked. For example:
checked {                  // Start of checked block
    Byte b = 100;
    b = (Byte) (b + 200);  // This expression is checked for overflow.
}
In fact, if you use a checked statement block, you can now use the += operator with the Byte, which simplifies the code a bit:
checked {        // Start of checked block
    Byte b = 100;
    b += 200;    // This expression is checked for overflow.
}
Important: Because the only effect that the checked operator and statement have is to determine which versions of the add, subtract, multiply, and data conversion IL instructions are produced, calling a method within a checked operator or statement has no impact on that method, as the following code demonstrates:
checked {
    // Assume SomeMethod tries to load 400 into a Byte.
    SomeMethod(400);
    // SomeMethod might or might not throw an OverflowException.
    // It would if SomeMethod were compiled with checked instructions.
}
Some recommended rules for programmers:
(1) Use signed data types (such as Int32 and Int64) instead of unsigned numeric types (such as UInt32 and UInt64) wherever possible.
(2) As you write your code, explicitly use checked around blocks where an unwanted overflow might occur due to invalid input data.
(3) As you write your code, explicitly use unchecked around blocks where an overflow is OK, such as calculating a checksum.
(4) For any code that doesn’t use checked or unchecked, the assumption is that you do want an exception to occur on overflow.
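Rules (2) and (3) might look like the following in practice. This is only a sketch: OverflowDemo, AddChecked, and Checksum are hypothetical names, and the multiply-by-31 checksum is an arbitrary illustration rather than a recommended algorithm.

```csharp
using System;

public static class OverflowDemo
{
    // Rule (2): an overflow caused by bad input should surface as an OverflowException.
    public static Int32 AddChecked(Int32 a, Int32 b)
    {
        checked { return a + b; }
    }

    // Rule (3): wrap-around is acceptable when computing a checksum.
    public static Int32 Checksum(Byte[] data)
    {
        Int32 sum = 0;
        unchecked
        {
            foreach (Byte b in data) sum = sum * 31 + b;
        }
        return sum;
    }
}
```

Both methods behave the same regardless of the /checked compiler switch, because the blocks state the intent explicitly.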
Important: The System.Decimal type is a very special type. Although many programming languages (C# and Visual Basic included) consider Decimal a primitive type, the CLR does not. This means that the CLR doesn’t have IL instructions that know how to manipulate a Decimal value. If you look up the Decimal type in the .NET Framework SDK documentation, you’ll see that it has public static methods called Add, Subtract, Multiply, Divide, and so on. In addition, the Decimal type provides operator overload methods for +, -, *, /, and so on.
When you compile code that uses Decimal values, the compiler generates code to call Decimal’s members to perform the actual operation. This means that manipulating Decimal values is slower than manipulating CLR primitive values. Also, because there are no IL instructions for manipulating Decimal values, the checked and unchecked operators, statements, and compiler switches have no effect. Operations on Decimal values always throw an OverflowException if the operation can’t be performed safely.
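A small sketch of the point about Decimal and unchecked follows. DecimalDemo and TryAdd are hypothetical names chosen for this example.

```csharp
using System;

public static class DecimalDemo
{
    // The + operator compiles to a call to Decimal's addition operator method,
    // so wrapping it in unchecked does not suppress the OverflowException.
    public static Boolean TryAdd(Decimal a, Decimal b, out Decimal sum)
    {
        try
        {
            unchecked { sum = a + b; }
            return true;
        }
        catch (OverflowException)
        {
            sum = 0;
            return false;
        }
    }
}
```

With this sketch, adding Decimal.MaxValue to itself reports failure even though the addition sits inside an unchecked block.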
Similarly, the System.Numerics.BigInteger type is also special in that it internally uses an array of UInt32s to represent an arbitrarily large integer whose value has no upper or lower bound. Therefore, operations on a BigInteger never result in an OverflowException. However, a BigInteger operation may throw an OutOfMemoryException if the value gets too large and there is insufficient available memory to resize the array.
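As a brief illustration of BigInteger growing past the range of Int64 without overflowing, consider this sketch. BigIntegerDemo and Factorial are hypothetical names.

```csharp
using System;
using System.Numerics;

public static class BigIntegerDemo
{
    // Multiplies 2 * 3 * ... * n; the result simply grows as large as needed.
    public static BigInteger Factorial(Int32 n)
    {
        BigInteger result = BigInteger.One;
        for (Int32 i = 2; i <= n; i++) result *= i;
        return result;
    }
}
```

Factorial(25) exceeds Int64.MaxValue, yet no OverflowException occurs.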
3. Reference Types and Value Types
The CLR supports two kinds of types: reference types and value types. In C#, types declared using struct are value types, and types declared using class are reference types.
Value type instances are usually allocated on a thread’s stack (although they can also be embedded as a field in a reference type object). The variable representing the instance doesn’t contain a pointer to an instance; the variable contains the fields of the instance itself.
Reference types are always allocated from the managed heap, and the C# new operator returns the memory address of the object; the memory address refers to the object’s bits.
All of the structures are immediately derived from the System.ValueType abstract type. System.ValueType is itself immediately derived from the System.Object type. By definition, all value types must be derived from System.ValueType. All enumerations are derived from the System.Enum abstract type, which is itself derived from System.ValueType. The CLR and all programming languages give enumerations special treatment. In addition, all value types are sealed, which prevents a value type from being used as a base type for any other reference type or value type.
Important: For many developers (such as unmanaged C/C++ developers), reference types and value types will seem strange at first. In unmanaged C/C++, you declare a type, and then the code that uses the type gets to decide if an instance of the type should be allocated on the thread’s stack or in the application’s heap. In managed code, the developer defining the type indicates where instances of the type are allocated; the developer using the type has no control over this.
How the CLR Controls the Layout of a Type’s Fields
To improve performance, the CLR is capable of arranging the fields of a type any way it chooses.
You tell the CLR what to do by applying the System.Runtime.InteropServices.StructLayoutAttribute attribute on the class or structure you’re defining. To this attribute’s constructor, you can pass LayoutKind.Auto to have the CLR arrange the fields, LayoutKind.Sequential to have the CLR preserve your field layout, or LayoutKind.Explicit to explicitly arrange the fields in memory by using offsets. If you don’t explicitly specify the StructLayoutAttribute on a type that you’re defining, your compiler selects whatever layout it determines is best.
You should be aware that Microsoft’s C# compiler selects LayoutKind.Auto for reference types (classes) and LayoutKind.Sequential for value types (structures).
The StructLayoutAttribute also allows you to explicitly indicate the offset of each field by passing LayoutKind.Explicit to its constructor. Then you apply an instance of the System.Runtime.InteropServices.FieldOffsetAttribute attribute to each field, passing to this attribute’s constructor an Int32 indicating the offset (in bytes) of the field’s first byte from the beginning of the instance. Explicit layout is typically used to simulate what would be a union in unmanaged C/C++ because you can have multiple fields starting at the same offset in memory.
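A union-like structure built with explicit layout could be sketched as follows. The type name Int32OrBytes and its field names are hypothetical; only StructLayoutAttribute, LayoutKind.Explicit, and FieldOffsetAttribute come from the text above.

```csharp
using System;
using System.Runtime.InteropServices;

// All three fields share the same 4 bytes of storage.
[StructLayout(LayoutKind.Explicit)]
public struct Int32OrBytes
{
    [FieldOffset(0)] public Int32 Value;  // occupies bytes 0 through 3
    [FieldOffset(0)] public Byte Byte0;   // overlaps the first byte of Value
    [FieldOffset(1)] public Byte Byte1;   // overlaps the second byte of Value
}
```

Writing to Byte0 changes Value as well, because both fields start at offset 0. Which numeric part of Value the byte corresponds to depends on the machine’s byte order.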
The Differences between Value Types and Reference Types:
(1) Value type objects have two representations: an unboxed form and a boxed form. Reference types are always in a boxed form.
(2) Value types are derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects’ fields match. In addition, System.ValueType overrides the GetHashCode method to produce a hash code value by using an algorithm that takes into account the values in the object’s instance fields.
(3) Because you can’t define a new value type or a new reference type by using a value type as a base class, you shouldn’t introduce any new virtual methods into a value type. No methods can be abstract, and all methods are implicitly sealed (can’t be overridden).
(4) Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn’t currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException to be thrown. By contrast, value type variables always contain a value of the underlying type, and all members of the value type are initialized to 0. Since a value type variable isn’t a pointer, it’s not possible to generate a NullReferenceException when accessing a value type. The CLR does offer a special feature that adds the notion of nullability to a value type. This feature is called nullable types.
(5) When you assign a value type variable to another value type variable, a field-by-field copy is made. When you assign a reference type variable to another reference type variable, only the memory address is copied.
(6) Two or more reference type variables can refer to a single object in the heap, allowing operations on one variable to affect the object referenced by the other variable. On the other hand, value type variables are distinct objects, and it’s not possible for operations on one value type variable to affect another.
(7) Because unboxed value types aren’t allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active. This means that a value type instance doesn’t receive a notification (via a Finalize method) when its memory is reclaimed.
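Differences (5) and (6) can be seen in a short sketch. The types PointValue and PointReference and the class CopyDemo are hypothetical, introduced only to contrast the two kinds of assignment.

```csharp
using System;

public struct PointValue { public Int32 X; }            // value type
public sealed class PointReference { public Int32 X; }  // reference type

public static class CopyDemo
{
    public static Int32 CopyThenModifyValue()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // field-by-field copy
        b.X = 2;            // changes only b
        return a.X;         // a still holds 1
    }

    public static Int32 CopyThenModifyReference()
    {
        PointReference a = new PointReference { X = 1 };
        PointReference b = a;   // only the memory address is copied
        b.X = 2;                // changes the single shared object
        return a.X;             // a sees 2
    }
}
```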
4. Boxing and Unboxing Value Types
It’s possible to convert a value type to a reference type by using a mechanism called boxing. Internally, here’s what happens when an instance of a value type is boxed:
1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
2. The value type’s fields are copied to the newly allocated heap memory.
3. The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.
Converting a boxed value type back to a value type takes two steps:
First, the address of the value type’s fields in the boxed value type’s object is obtained. This process is called unboxing.
Then, the values of these fields are copied from the heap to the stack-based value type instance.
Unboxing is not the exact opposite of boxing. The unboxing operation is much less costly than boxing. Unboxing is really just the operation of obtaining a pointer to the raw value type (data fields) contained within an object. In effect, the pointer refers to the unboxed portion in the boxed instance. So, unlike boxing, unboxing doesn’t involve the copying of any bytes in memory. Having made this important clarification, it is important to note that an unboxing operation is typically followed by copying the fields.
Unboxed value types are lighter-weight types than reference types for two reasons:
(1) They are not allocated on the managed heap.
(2) They don’t have the additional overhead members that every object on the heap has: a type object pointer and a sync block index.
Because unboxed value types don’t have a sync block index, you can’t have multiple threads synchronize their access to the instance by using the methods of the System.Threading.Monitor type.
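The copy made during boxing can be observed with a minimal sketch. BoxingDemo and BoxThenModifyOriginal are hypothetical names.

```csharp
using System;

public static class BoxingDemo
{
    public static Int32 BoxThenModifyOriginal()
    {
        Int32 v = 5;
        Object o = v;     // boxing: the value is copied into a new heap object
        v = 10;           // changes only the stack-based variable
        return (Int32)o;  // unboxing, followed by a copy of the boxed value
    }
}
```

The method returns 5, because the boxed object holds its own copy of the value taken at the moment of boxing.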
5. Changing Fields in a Boxed Value Type by Using Interfaces
6. Object Equality and Identity
The System.Object type offers a virtual method named Equals, whose purpose is to return true if two objects contain the same value. The implementation of Object’s Equals method looks like this:
public class Object {
    public virtual Boolean Equals(Object obj) {
        // If both references point to the same object,
        // they must have the same value.
        if (this == obj) return true;
        // Assume that the objects do not have the same value.
        return false;
    }
}
At first, this seems like a reasonable default implementation of Equals: it returns true if the this and obj arguments refer to the same exact object. This seems reasonable because Equals knows that an object must have the same value as itself. However, if the arguments refer to different objects, Equals can’t be certain if the objects contain the same values, and therefore, false is returned.
In other words, the default implementation of Object’s Equals method really implements identity, not value equality.
Here is how to properly implement an Equals method internally:
1. If the obj argument is null, return false because the current object identified by this is obviously not null when the nonstatic Equals method is called.
2. If the this and obj arguments refer to the same object, return true. This step can improve performance when comparing objects with many fields.
3. If the this and obj arguments refer to objects of different types, return false. Obviously, checking if a String object is equal to a FileStream object should result in a false result.
4. For each instance field defined by the type, compare the value in the this object with the value in the obj object. If any fields are not equal, return false.
5. Call the base class’s Equals method so it can compare any fields defined by it. If the base class’s Equals method returns false, return false; otherwise, return true.
So Microsoft should have implemented Object’s Equals like this:
public class Object {
    public virtual Boolean Equals(Object obj) {
        // The given object to compare to can't be null
        if (obj == null) return false;
        // If objects are different types, they can't be equal.
        if (this.GetType() != obj.GetType()) return false;
        // If objects are same type, return true if all of their fields match
        // Since System.Object defines no fields, the fields match
        return true;
    }
}
But, since Microsoft didn’t implement Equals this way, the rules for how to implement Equals are significantly more complicated than you would think. When a type overrides Equals, the override should call its base class’s implementation of Equals unless it would be calling Object’s implementation. This also means that since a type can override Object’s Equals method, this Equals method can no longer be called to test for identity. To fix this, Object offers a static ReferenceEquals method, which is implemented like this:
public class Object {
    public static Boolean ReferenceEquals(Object objA, Object objB) {
        return (objA == objB);
    }
}
You should always call ReferenceEquals if you want to check for identity (if two references point to the same object). You shouldn’t use the C# == operator (unless you cast both operands to Object first) because one of the operands’ types could overload the == operator, giving it semantics other than identity.
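String is a convenient type for seeing the difference, since it overloads == to compare text. The sketch below uses hypothetical names (IdentityDemo and its three methods) and is meant to be called with two separately created strings that hold the same characters.

```csharp
using System;

public static class IdentityDemo
{
    // Uses String's overloaded == operator: compares the text.
    public static Boolean OperatorEquals(String a, String b) => a == b;

    // Tests identity: true only if both references point to the same object.
    public static Boolean SameObject(String a, String b) => Object.ReferenceEquals(a, b);

    // Casting both operands to Object first also yields an identity test.
    public static Boolean SameObjectViaCast(String a, String b) => (Object)a == (Object)b;
}
```

For two distinct String objects built with new String('x', 3), OperatorEquals returns true while both identity checks return false.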
As you can see, the .NET Framework has a very confusing story when it comes to object equality and identity. By the way, System.ValueType (the base class of all value types) does override Object’s Equals method and is correctly implemented to perform a value equality check (not an identity check). Internally, ValueType’s Equals is implemented this way:
1. If the obj argument is null, return false.
2. If the this and obj arguments refer to objects of different types, return false.
3. For each instance field defined by the type, compare the value in the this object with the value in the obj object by calling the field’s Equals method. If any fields are not equal, return false.
4. Return true. Object’s Equals method is not called by ValueType’s Equals method.
Internally, ValueType’s Equals method uses reflection in step #3.
The four properties of equality:
- Equals must be reflexive; that is, x.Equals(x) must return true.
- Equals must be symmetric; that is, x.Equals(y) must return the same value as y.Equals(x).
- Equals must be transitive; that is, if x.Equals(y) returns true and y.Equals(z) returns true, then x.Equals(z) must also return true.
- Equals must be consistent. Provided that there are no changes in the two values being compared, Equals should consistently return true or false.
When overriding the Equals method, there are a few more things that you’ll probably want to do:
- Have the type implement the System.IEquatable<T> interface’s Equals method. This generic interface allows you to define a type-safe Equals method. Usually, you’ll implement the Equals method that takes an Object parameter to internally call the type-safe Equals method.
- Overload the == and != operator methods. Usually, you’ll implement these operator methods to internally call the type-safe Equals method.
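Putting the Equals steps and these two suggestions together, one possible shape for a simple sealed type is sketched below. This Point class is a hypothetical illustration; its fields and constructor are assumptions. Because the class is sealed and derives directly from Object, the type check is handled by the as cast and the base class’s Equals is not called.

```csharp
using System;

public sealed class Point : IEquatable<Point>
{
    private readonly Int32 m_x, m_y;
    public Point(Int32 x, Int32 y) { m_x = x; m_y = y; }

    // Type-safe Equals required by IEquatable<Point>.
    public Boolean Equals(Point other)
    {
        if (Object.ReferenceEquals(other, null)) return false;  // null argument
        if (Object.ReferenceEquals(this, other)) return true;   // same object
        return m_x == other.m_x && m_y == other.m_y;            // compare each field
    }

    // The Object overload forwards to the type-safe method;
    // "as" yields null when obj is not a Point.
    public override Boolean Equals(Object obj) => Equals(obj as Point);

    // Equal objects must produce equal hash codes.
    public override Int32 GetHashCode() => m_x ^ m_y;

    // The operators forward to the type-safe Equals method.
    public static Boolean operator ==(Point a, Point b) =>
        Object.ReferenceEquals(a, null) ? Object.ReferenceEquals(b, null) : a.Equals(b);
    public static Boolean operator !=(Point a, Point b) => !(a == b);
}
```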
7. Object Hash Codes
The designers of the FCL decided that it would be incredibly useful if any instance of any object could be placed into a hash table collection. To this end, System.Object provides a virtual GetHashCode method so that an Int32 hash code can be obtained for any and all objects.
If you define a type and override the Equals method, you should also override the GetHashCode method. In fact, Microsoft’s C# compiler emits a warning if you define a type that overrides Equals without also overriding GetHashCode.
The reason why a type that defines Equals must also define GetHashCode is that the implementation of the System.Collections.Hashtable type, the System.Collections.Generic.Dictionary type, and some other collections require that any two objects that are equal must have the same hash code value. So if you override Equals, you should override GetHashCode to ensure that the algorithm you use for calculating equality corresponds to the algorithm you use for calculating the object’s hash code.
Defining a GetHashCode method can be easy and straightforward. But depending on your data types and the distribution of data, it can be tricky to come up with a hashing algorithm that returns a well-distributed range of values. Here’s a simple example that will probably work just fine for Point objects:
internal sealed class Point {
    private readonly Int32 m_x, m_y;
    public override Int32 GetHashCode() {
        return m_x ^ m_y; // m_x XOR'd with m_y
    }
    ...
}
When selecting an algorithm for calculating hash codes for instances of your type, try to follow these guidelines:
- Use an algorithm that gives a good random distribution for the best performance of the hash table.
- Your algorithm can also call the base type’s GetHashCode method, including its return value. However, you don’t generally want to call Object’s or ValueType’s GetHashCode method, because the implementation in either method doesn’t lend itself to high-performance hashing algorithms.
- Your algorithm should use at least one instance field.
- Ideally, the fields you use in your algorithm should be immutable; that is, the fields should be initialized when the object is constructed, and they should never again change during the object’s lifetime.
- Your algorithm should execute as quickly as possible.
- Objects with the same value should return the same code. For example, two String objects with the same text should return the same hash code value.
System.Object’s implementation of the GetHashCode method doesn’t know anything about its derived type and any fields that are in the type. For this reason, Object’s GetHashCode method returns a number that identifies the object within the AppDomain; this number is guaranteed not to change for the lifetime of the object. Treat it as an identity-based hash code rather than a strictly unique value, because two distinct objects can occasionally receive the same number. After the object is garbage collected, its number can also be reused as the hash code for a new object.
Note: If a type overrides Object’s GetHashCode method, you can no longer call it to get an ID based on the object’s identity. If you want to get such an ID (within an AppDomain) for an object, the FCL provides a method that you can call. In the System.Runtime.CompilerServices namespace, see the RuntimeHelpers class’s public, static GetHashCode method that takes a reference to an Object as an argument. RuntimeHelpers’ GetHashCode method returns the identity-based ID for an object even if the object’s type overrides Object’s GetHashCode method. This method got its name because of its heritage, but it would have been better if Microsoft had named it something like GetUniqueObjectID.
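The call described in the note can be sketched as follows. AlwaysSeven and IdentityHashDemo are hypothetical names; RuntimeHelpers.GetHashCode is the FCL method mentioned above. The sketch only checks that the returned number is stable for a given object and does not assume any particular value.

```csharp
using System;
using System.Runtime.CompilerServices;

// A type whose GetHashCode override hides Object's implementation.
public sealed class AlwaysSeven
{
    public override Int32 GetHashCode() => 7;
}

public static class IdentityHashDemo
{
    // Bypasses any override and asks the runtime for the identity-based code.
    public static Int32 IdentityCode(Object o) => RuntimeHelpers.GetHashCode(o);

    // The identity-based code does not change for the lifetime of the object.
    public static Boolean IsStable(Object o) => IdentityCode(o) == IdentityCode(o);
}
```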
System.ValueType’s implementation of GetHashCode uses reflection (which is slow) and XORs some of the type’s instance fields together. This is a naïve implementation that might be good for some value types, but I still recommend that you implement GetHashCode yourself because you’ll know exactly what it does, and your implementation will be faster than ValueType’s implementation.
8. Dynamic Primitive Type
Important: Do not confuse dynamic and var. Declaring a local variable using var is just a syntactical shortcut that has the compiler infer the specific data type from an expression. The var keyword can be used only for declaring local variables inside a method while the dynamic keyword can be used for local variables, fields, and arguments. You cannot cast an expression to var but you can cast an expression to dynamic. You must explicitly initialize a variable declared using var while you do not have to initialize a variable declared with dynamic.
Important: A dynamic expression is really the same type as System.Object. The compiler assumes that whatever operation you attempt on the expression is legal, so the compiler will not generate any warnings or errors. However, exceptions will be thrown at runtime if you attempt to execute an invalid operation. In addition, Visual Studio cannot offer any IntelliSense support to help you write code against a dynamic expression. You cannot define an extension method that extends dynamic, although you can define one that extends Object. And, you cannot pass a lambda expression or anonymous method as an argument to a dynamic method call since the compiler cannot infer the types being used.
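The runtime behavior of dynamic described above can be sketched briefly. DynamicDemo, Plus, and InvalidCallThrows are hypothetical names, and NoSuchMethod is deliberately a member that Int32 does not have. The exception type caught here, RuntimeBinderException, is the one the C# runtime binder raises for a failed dynamic operation.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicDemo
{
    // The meaning of + is decided at run time from the actual argument types.
    public static dynamic Plus(dynamic a, dynamic b) => a + b;

    // The compiler accepts the call; the failure appears only at run time.
    public static Boolean InvalidCallThrows()
    {
        dynamic d = 5;
        try
        {
            d.NoSuchMethod();
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }
}
```

By contrast, a local declared as var n = 5; is an ordinary Int32 at compile time, so a call such as n.NoSuchMethod() would be rejected by the compiler.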