Introduction
In C#, as in most programming languages, we use variables constantly, almost never having to think about them. In this article I want to examine, with some degree of rigour, what variables are and why they matter. Hopefully, by the end, we will have a better understanding and appreciation of what these ubiquitous programming constructs really are.
(Note that I speak only for C#; anything I say may or may not apply to other programming languages. In the following sections, wherever I speak of memory or storage, I mean the process memory that the CLI operates within. The actual runtime may also use CPU registers, CPU caches etc.)
So, what is a variable?
Defining a variable is deceptively easy: a variable is a symbol that refers to a storage location. Most modern programming languages allow us to use convenient symbols to represent storage locations in source code. This is useful since it allows us to write programs with statements like
int i = 10;
rather than
0xFFFFE000EC3E8F20 = 10
A variable in this sense represents a binding between a storage location and a mnemonic symbol.
Variables in C#
C# is a type-safe language, which means that the C# compiler gives us some guarantees with respect to variables:
- Each variable has a type and can only store values of that type.
- Before a variable is used, it must have a value.
To ensure that a variable does have a value, the compiler checks by means of static analysis that before each variable's value is accessed, it has been definitely assigned. If the compiler cannot satisfy itself that all variables have a value before they are used, it signals a compilation failure. For instance consider the following piece of code
void Foo()
{
    int i;
    Console.WriteLine(i);
}
If you try to compile this code, it will fail with the error
Use of unassigned local variable 'i'
This is because the compiler has been unable to establish that the variable 'i' has been assigned at the point where it is written to the console. As we shall see later, for certain variables the compiler automatically sets a value if one has not been explicitly set in the code.
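To see the other side of this rule, here is a minimal sketch of code that does satisfy the compiler. The names DefiniteAssignmentDemo and Describe are made up for illustration. The local 'label' has no initializer, but it is assigned on every path that reaches the return statement, so the static analysis succeeds.

```csharp
using System;

static class DefiniteAssignmentDemo
{
    // 'label' is declared without a value, but every branch assigns it
    // before it is read, so it is definitely assigned at the return.
    public static string Describe(int n)
    {
        string label;
        if (n < 0)
        {
            label = "negative";
        }
        else
        {
            label = "non-negative";
        }
        return label;
    }

    static void Main()
    {
        Console.WriteLine(Describe(-5)); // negative
        Console.WriteLine(Describe(3));  // non-negative
    }
}
```

If the else branch were removed, the compiler would reject the return statement with the same 'Use of unassigned local variable' error.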
Variables vs Constants
So, if a variable represents a storage location, what does a constant represent? After all, their syntaxes are deceptively similar.
const int i = 100;
The answer is that, irrespective of syntax, the compiler processes constants very differently from variables. During compilation, every occurrence of a constant is replaced by its value. So, unlike variables, which represent locations in memory where a value can be stored and changed if needed (hence 'variable', since the value can vary), constants just represent the data and so cannot be altered during the lifetime of the program.
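One place where this difference shows up is anywhere the language demands a compile-time value. The sketch below uses hypothetical names (ConstantDemo, Limit, threshold) and is only meant to illustrate the idea: a constant can appear as a case label, while a field holding the same number cannot.

```csharp
using System;

static class ConstantDemo
{
    const int Limit = 100;       // constant: just the value 100
    static int threshold = 100;  // variable: a storage location holding 100

    public static string Classify(int n)
    {
        switch (n)
        {
            case Limit:          // allowed, the compiler substitutes 100
                return "at the limit";
            // case threshold:   // compile error: a constant is required here
            default:
                return "somewhere else";
        }
    }

    public static int RaiseThreshold()
    {
        threshold = threshold + 1; // fine, a variable can be reassigned
        // Limit = Limit + 1;      // compile error: a constant cannot be assigned
        return threshold;
    }

    static void Main()
    {
        Console.WriteLine(Classify(100));    // at the limit
        Console.WriteLine(RaiseThreshold()); // 101
    }
}
```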
Variable categories
C# defines seven categories of variables based on where the variable is seen. Consider the following piece of code that illustrates the different categories of variables.
class Foo
{
    public static int a;
    string b;
    public void Bar(object obj, ref string x, out long y)
    {
        int[] i = new int[10];
        int l;
        y = 0; // an output parameter must be assigned before the method returns
    }
}
- 'a' is a static field
- 'b' is an instance field
- 'obj' is a value parameter
- 'x' is a reference parameter
- 'y' is an output parameter
- 'i[0]' is an array element
- 'l' is a local variable
Let's explore each variable category
Static Fields
A static field is declared with the static keyword.
It comes into existence before the execution of the static constructor of the containing type and lives as long as the AppDomain in which the code is executing. For purposes of compilation, the variable is considered to be initially assigned, i.e. if there is no initial value explicitly set, the compiler assigns a default and hence there is no compilation error if its value is not explicitly assigned.
public static void Main()
{
    var x = Foo.X;
}
class Foo
{
    static object o;
    static Foo()
    {
        Console.WriteLine("The value of variable 'o' is " + (o ?? "null"));
    }
    public static int X { get; set; }
}
Running the above code will give the output
The value of variable 'o' is null
Note that the value of the static object field 'o' was initialized to null and there were no compilation errors when it was accessed. Note that, unlike C or C++, C# does not support static function variables.
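The same holds for static fields of other types. As a small sketch (StaticDefaultsDemo and its field names are invented for this example), none of the fields below is ever assigned, yet all of them can be read without a compilation error. The compiler may warn that the fields are never assigned, but the code still builds.

```csharp
using System;

static class StaticDefaultsDemo
{
    static int count;    // initially 0
    static bool ready;   // initially false
    static string name;  // initially null

    public static string Snapshot()
    {
        return count + ", " + ready + ", " + (name ?? "null");
    }

    static void Main()
    {
        Console.WriteLine(Snapshot()); // 0, False, null
    }
}
```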
Instance Fields
A non-static variable in a type is an instance field.
An instance variable in a class comes into existence when an instance of the class is created, but before the execution of the instance constructor. It lives as long as there are any references to the class instance and the instance destructor has not been executed. Instance fields are considered initially assigned and they take the default value of their type.
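A short sketch of what this means for a class (Point and InstanceFieldDemo are hypothetical names): by the time the constructor body runs, every instance field already exists and holds either its default value or the value from its initializer.

```csharp
using System;

class Point
{
    public int X;         // no initializer: starts at 0
    public string Label;  // no initializer: starts at null
    public int Y = 5;     // initializer runs before the constructor body

    public Point()
    {
        // X is already 0 and Y is already 5 at this point.
        X = X + Y;
    }
}

static class InstanceFieldDemo
{
    static void Main()
    {
        var p = new Point();
        Console.WriteLine(p.X);             // 5
        Console.WriteLine(p.Label == null); // True
    }
}
```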
Instance variables in structs are much the same. Their lifetimes are tied to the lifetime of the containing struct, i.e. when a variable of a struct type comes into existence, all its instance variables come into existence at the same time. Note that since an instance of a struct is always created with defaults for all its fields, a struct cannot have instance field initializers (although a struct may contain static fields with initializers). For the same reason, a struct cannot define a parameterless constructor. (C# 10 and later relax both restrictions; the code below reflects earlier versions.)
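struct Bar
{
    int i = 100;        // error before C# 10: instance field initializer in a struct
    static int j = 100; // fine: static field initializer
    Bar() {}            // error before C# 10: explicit parameterless constructor
}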
Value Parameters
A method parameter declared without an out or a ref keyword is a value parameter. Value parameters come into existence upon invoking a method. They typically live for the duration of the method, unless captured in an anonymous function. If captured, their lifetime extends until the capturing delegate or expression tree is eligible for garbage collection.
Note that a value parameter defines a brand new variable: it is a new memory location, distinct from the location where the original argument is stored. This can be seen in the code below. In Foo() the initial value of 'a' is 1, and it is passed as an argument into the parameter of Bar(). The value of 'a' in Foo() remains unchanged despite changes to the value of the parameter it was passed into.
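The capture case is easier to see in code. In this sketch (CapturedParameterDemo and MakeCounter are illustrative names), the parameter 'start' is captured by a lambda, so it keeps living, and keeps its updated value, after MakeCounter has returned.

```csharp
using System;

static class CapturedParameterDemo
{
    // 'start' is a value parameter. Because the lambda captures it,
    // it outlives the call to MakeCounter.
    public static Func<int> MakeCounter(int start)
    {
        return () => start++;
    }

    static void Main()
    {
        Func<int> next = MakeCounter(5);
        Console.WriteLine(next()); // 5
        Console.WriteLine(next()); // 6
    }
}
```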
public static void Main()
{
    Foo(1);
}
static void Foo(int a)
{
    Console.WriteLine("In Foo 'a' is - " + a);
    Bar(a);
    Console.WriteLine("In Foo 'a' is - " + a);
}
static void Bar(int a)
{
    Console.WriteLine("In Bar 'a' is - " + a);
    a = 100;
    Console.WriteLine("In Bar 'a' is - " + a);
}
Produces this output
In Foo 'a' is - 1
In Bar 'a' is - 1
In Bar 'a' is - 100
In Foo 'a' is - 1
Value parameters are considered initially assigned.
Reference Parameters
A reference parameter is declared with the 'ref' keyword. It shares its storage location with the argument that is passed in, so no new storage location is created. This can be seen in the code below, which is identical to the code above except that the call to Bar() now uses a reference parameter. So, any changes to its value in Bar() can be seen in Foo().
public static void Main()
{
    Foo(1);
}
static void Foo(int a)
{
    Console.WriteLine("In Foo 'a' is - " + a);
    Bar(ref a);
    Console.WriteLine("In Foo 'a' is - " + a);
}
static void Bar(ref int a)
{
    Console.WriteLine("In Bar 'a' is - " + a);
    a = 100;
    Console.WriteLine("In Bar 'a' is - " + a);
}
Output is
In Foo 'a' is - 1
In Bar 'a' is - 1
In Bar 'a' is - 100
In Foo 'a' is - 100
A variable must be definitely assigned before it can be passed as a reference parameter. Within the function in which they are used, reference parameters are considered initially assigned.
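A classic use of this sharing is a swap routine. The sketch below (RefDemo and Swap are names I have made up) would be impossible with value parameters, because the method would only ever exchange its own copies.

```csharp
using System;

static class RefDemo
{
    // 'left' and 'right' are aliases for the caller's variables.
    public static void Swap(ref int left, ref int right)
    {
        int temp = left;
        left = right;
        right = temp;
    }

    static void Main()
    {
        int x = 1, y = 2; // both must be assigned before being passed by ref
        Swap(ref x, ref y);
        Console.WriteLine(x + ", " + y); // 2, 1
    }
}
```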
Output Parameters
Output parameters are declared with the 'out' keyword. As their name suggests, output parameters are used to send outputs to the caller. They are useful if you want to return multiple values from a function. (Since .NET 4.0 we can also use Tuples to return multiple values. However, I personally still prefer output parameters, especially for public APIs: they are named individually and lend more semantic clarity to the code.)
Like reference parameters, output parameters do not create a new storage location, and share their location with the variable passed in as an argument. Within the function in which they are used, output parameters are considered initially unassigned, which is why we get a 'Use of unassigned out parameter' error if we try to read them before assigning them.
Before a function returns normally, every output parameter must be definitely assigned. The code below illustrates the use of output parameters.
public static void Main()
{
    Foo(1);
}
static void Foo(int a)
{
    int b = 2;
    Console.WriteLine("In Foo 'a' is - " + a + "; b is - " + b);
    Bar(out a, out b);
    Console.WriteLine("In Foo 'a' is - " + a + "; b is - " + b);
}
static void Bar(out int a, out int b)
{
    // Console.WriteLine("In Bar 'a' is - " + a); -- Error : Use of unassigned out parameter 'a'
    a = 100;
    b = 200;
    Console.WriteLine("In Bar 'a' is - " + a + "; b is - " + b);
}
Output is
In Foo 'a' is - 1; b is - 2
In Bar 'a' is - 100; b is - 200
In Foo 'a' is - 100; b is - 200
Another way of thinking about the difference between reference and output parameters is thinking of reference parameters as in-out parameters and output parameters as out parameters.
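A common shape for output parameters is the 'Try' pattern, where the return value reports success and the result travels back through an out parameter. The following is a minimal sketch with hypothetical names (OutDemo, TryDivide). Note that the caller's variable does not need a value beforehand, and that the method assigns the out parameter on every return path.

```csharp
using System;

static class OutDemo
{
    public static bool TryDivide(int dividend, int divisor, out int quotient)
    {
        if (divisor == 0)
        {
            quotient = 0; // must still be assigned before returning
            return false;
        }
        quotient = dividend / divisor;
        return true;
    }

    static void Main()
    {
        int q; // unassigned here, which is fine for an out argument
        if (TryDivide(10, 2, out q))
        {
            Console.WriteLine(q); // 5
        }
        Console.WriteLine(TryDivide(1, 0, out q)); // False
    }
}
```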
Array Elements
These are fairly straightforward: each element in an array is assigned a storage location. All elements come into existence when the array is created and live for the lifetime of the array. The initial value of each element is the default for the array's element type, unless an initializer supplies values as in the line below.
int [] array = new int[3] {1,2,3};
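Because each element is a variable in its own right, it can be used anywhere a variable is expected, including as a ref argument. Here is a small sketch of that idea (ArrayElementDemo and Increment are illustrative names).

```csharp
using System;

static class ArrayElementDemo
{
    public static void Increment(ref int value)
    {
        value++;
    }

    static void Main()
    {
        int[] numbers = new int[3];      // every element starts at 0
        string[] names = new string[2];  // every element starts at null

        Increment(ref numbers[1]);       // the element itself is passed by reference

        Console.WriteLine(string.Join(", ", numbers)); // 0, 1, 0
        Console.WriteLine(names[0] == null);           // True
    }
}
```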
Local Variables
Local variables can occur in various positions in code.
- In a block
{
int i = 100; int j;
}
- In a for statement
for (int i = 0; i < 100; i++)
{
}
- In a switch statement
switch (k)
{
case 0:
var p = 0; break;
}
- In a using statement
using (var sw = new StreamReader("aaa")) {
}
- In a foreach statement
foreach (var element in list) {
}
- In a catch clause
try
{
}
catch (Exception ex) {
}
All local variables are considered initially unassigned. Even if there is an initializer, the variable is considered assigned only after the initializing expression has executed. The lifetime of a local variable runs from entry into the block in which it is declared until execution of that block completes, except when the variable is captured in an anonymous function, in which case the lifetime extends until the anonymous function is eligible for garbage collection. It is illegal to refer to a variable prior to its declaration.
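A captured local is not copied into the anonymous function; both the method and the function work with the same variable. The sketch below (CapturedLocalDemo and CountTwice are names invented here) shows the enclosing method observing changes made through the lambda.

```csharp
using System;

static class CapturedLocalDemo
{
    public static int CountTwice()
    {
        int counter = 0;                    // local variable
        Action increment = () => counter++; // the lambda captures 'counter' itself

        increment();
        increment();

        return counter; // 2: the method sees the lambda's updates
    }

    static void Main()
    {
        Console.WriteLine(CountTwice()); // 2
    }
}
```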
Variable defaults
As we have seen above, certain categories of variables are considered initially assigned. What this means is that the location associated with the variable is initialized with a default value associated with the variable's type.
- For a reference type variable, this is null.
- For a variable of a value type, this is the value produced by the implicitly defined default constructor.
We can obtain the default value of any type using the default operator, e.g. default(int) gives us 0.
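A few more defaults, as a quick sketch. The struct Pair is a made-up type for this example; its default value is simply the struct with every field set to that field's own default.

```csharp
using System;

struct Pair
{
    public int Number;
    public string Text;
}

static class DefaultsDemo
{
    static void Main()
    {
        Console.WriteLine(default(int));            // 0
        Console.WriteLine(default(bool));           // False
        Console.WriteLine(default(string) == null); // True

        Pair p = default(Pair);
        Console.WriteLine(p.Number);                // 0
        Console.WriteLine(p.Text == null);          // True
    }
}
```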
Types and variables
Variables of a value type store their data directly in the storage location, whereas variables of a reference type store a reference to the data; the actual data may be stored elsewhere in memory. Consequently, it is possible for two reference variables to refer to the same object in memory.
In the figure below consider a simplistic view of memory laid out as a single flat sequence of bytes, where the address of the memory location is on the left and the corresponding data contained within the blocks. We can see three variable declarations and how they might be laid out in this memory.
'i' refers to an integer value of 1000. Integers are value types and as we see here, the storage location for i holds the actual value '1000'.
foo and foo2 refer to an instance of class Foo. Both variables hold the same data, which is the address of the location where the object is kept. So the actual object is at location 0x10464FBB, and this is the value stored in the locations referred to by 'foo' and 'foo2'.
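The practical consequence can be sketched in a few lines. Box is a hypothetical class standing in for Foo: copying a value type variable copies the data, while copying a reference type variable copies only the reference, so both variables end up referring to one object.

```csharp
using System;

class Box
{
    public int Value;
}

static class AliasDemo
{
    static void Main()
    {
        int i = 1000;
        int j = i;  // copies the value
        j = 1;
        Console.WriteLine(i); // 1000, 'i' is unaffected

        Box foo = new Box { Value = 1 };
        Box foo2 = foo;       // copies the reference, not the object
        foo2.Value = 42;
        Console.WriteLine(foo.Value);                  // 42
        Console.WriteLine(ReferenceEquals(foo, foo2)); // True
    }
}
```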
To conclude...
We have looked at variables in a little more detail, and hopefully I have demonstrated that there is a lot more to the humble variable than would appear at first glance. But this is still only scratching the surface, and there is a lot more that I have skipped or glossed over. I highly recommend the C# specification (surprisingly readable and accessible for a programming language specification) for a more rigorous look at variables and everything else C#.