Introduction
Modern C# code depends heavily on lambda expressions. Callbacks built on delegates such as Action and Func<T> are everywhere, and we don't hesitate to access variables from outside the lambda expression's own scope. This is handy, and it enables a programming model that would be quite a challenge with conventional object-oriented programming alone. Here I discuss the construct called a closure and the memory footprint it leaves behind. Being aware of these basics certainly helps in writing better code. My aim is to cover the core concept without going too deep into internal details.
Before we get into this - let's touch the basics.
What is a class? It is a template for defining objects. Wikipedia gives this definition:
In object-oriented programming, a class is a construct that is used to create instances of itself – referred to as class instances, class objects, instance objects or simply objects. A class defines constituent members which enable its instances to have state and behavior. Data field members (member variables or instance variables) enable a class instance to maintain state. Other kinds of members, especially methods, enable the behavior of class instances. Classes define the type of their instances.
If you are a C# developer there is nothing new here; we all know this. There are other class-related features as well, such as static classes, sealed classes, nested classes and generics, and there are the object-oriented behaviors that classes provide, such as abstraction, overloading and encapsulation.
The key differences between structured programming and object-oriented programming are the introduction of concepts such as encapsulation, association, aggregation and composition. For the time being let's set aside abstraction and polymorphism, although they are equally important, if not more.
OOP, in the most basic sense, is about tying data and functionality together and restricting unnecessary exposure of data. A class does this by marking members private, so that only methods defined in that class can operate on them directly. The modifier applies to behavior as well: functionality that is not relevant to the outside world is not visible from outside. This helps maintain a good design.
The data contained in an object at any point in time is known as its state. The state can change at any moment at run time. Storing this state requires memory, either on the heap or on the stack. When an object is no longer useful or can no longer be reached from code, it should die, and the memory occupied by its state should be freed and made available to others.
Before we proceed further let’s discuss something about the freeing of memory.
Languages in the C family are block structured. Before block-structured languages we relied on statements like jump and goto, which transfer control arbitrarily: you could go to any label from anywhere, so execution flow was unpredictable and extremely hard to follow. The introduction of blocks is one of the nicest features of high-level languages. In the C family, and in most other languages, blocks are delimited by curly braces { }. (In modern C# the block is often inferred.) A block defines the scope of whatever is declared inside it. In C a block can exist at function level and, inside that, in loops, branches and anywhere else. Nested blocks are fine. The function was the top-most level at which a block could exist; whatever you define outside it becomes global. By including a header file, variables declared at file level could be made available globally. In today's terms we can think of these as static. So everything was static; the concept of an instance did not exist.
To bring instances (OOP) into the picture, just one thing was needed: allowing blocks outside functions. This came in the form of the class. A class has a name that defines a scope, and whatever we define inside it is limited to that class. A class is not a unit of execution like a function. It is an entity with some state that offers behavior, services or functionality. This new concept is termed OOP. It then offers additional things such as inheritance, encapsulation, abstraction, polymorphism, static members and so on. But allowing blocks beyond the function boundary was one of the most fundamental differences between structured and object-oriented programming. System-defined types such as int, char and float were already there; now we got the freedom to define our own types as well.
I have a bad habit of drifting off topic! Let's come back to the concept behind freeing resources (memory in particular).
If a variable or object is defined inside a function, it does not exist until someone calls that function. When the function is called, the variable is created somewhere in memory and used, and as soon as the function finishes its job the variable is of no use. No code can reach it; it goes out of scope. This is the point at which the memory claimed by that variable should be freed. The same holds for a variable defined in any scope: as soon as the scope dies, the resources it claimed should be freed.
C++ destructors work that way: when an object with automatic storage goes out of scope, its destructor is called immediately. This was not always the best approach (memory fragmentation after a long period of execution being the main reason), so later mainstream languages (Java onward) adopted garbage collection for cleanup. It cleans up periodically instead of immediately, and for large systems it can help performance as well as keep the memory footprint cleaner.
Forget about the timing of the cleanup. The basic point is that when a variable or object goes out of scope, the resources it claimed should be freed, either immediately or after some (finite) time.
To make this happen, the framework must know that something is now out of scope. This is easiest to determine for inner blocks and functions, and it is manageable for classes too, but with closures it gets a bit complicated. We will see why, but first we need to know what a closure is.
What is CLOSURE?
Let’s see how it has been defined in Wikipedia.
In computer science, a closure (also lexical closure or function closure) is a function or reference to a function together with a referencing environment—a table storing a reference to each of the non-local variables (also called free variables) of that function. A closure—unlike a plain function pointer—allows a function to access those non-local variables even when invoked outside of its immediate lexical scope.
The concept of closures was developed in the 1960s and was first fully implemented in 1975 as a language feature in the Scheme programming language to support lexically scoped first-class functions. The explicit use of closures is associated with functional programming languages such as Lisp and ML, as traditional imperative languages such as Algol, C and Pascal did not support returning nested functions as results of higher-order functions and thus did not require supporting closures either. Many modern garbage-collected imperative languages support closures, such as Smalltalk (the first object-oriented language to do so) and C#. Support for closures in Java is planned for Java 8.
So it is not a new thing, and it is not an OOP concept either: it existed before object-oriented programming. But it has a really nice feature, treating a function as an entity with some state, and this is now supported by object-oriented languages like C#. In the basic sense it means treating a function as an object and expecting it to remember the environment in which it was created. Functions are passed around as objects that hold some state as well.
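As a minimal sketch of "a function that remembers its environment", consider the snippet below. The names CounterDemo and MakeCounter are made up for illustration and are not part of the article's demo project.

```csharp
using System;

// Hypothetical example type; not part of the article's demo project.
public static class CounterDemo
{
    // Returns a delegate that closes over the local variable 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;           // local to MakeCounter
        return () => ++count;    // the lambda keeps 'count' alive after MakeCounter returns
    }
}
```

Calling `var next = CounterDemo.MakeCounter();` and then `next()` twice yields 1 and then 2, even though MakeCounter has long since returned. A second call to MakeCounter produces an independent counter that starts from 1 again.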
But C# is a fully object-oriented language, so it has to treat closures as objects somehow. Let's look at an example.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

public class BasicClosure
{
    private IntOne one = new IntOne() { name = "IntOne+one", value = 30 };
    private IntOne two = new IntOne() { name = "IntOne+two", value = 30 };

    // Private instance method; matches the Action delegate signature.
    void IWillBeExposed()
    {
        one.value += 10;
    }

    // Hands a reference to the private method to the caller.
    public Action AddValue()
    {
        return IWillBeExposed;
    }

    // Finalizer: logs when the garbage collector reclaims this object.
    ~BasicClosure()
    {
        LogDetail.DebugLogs.Add(new LogDetail() { Name = "BasicClosure",
            Method = "~BasicClosure", Variable = "one", Value = one.value.ToString() });
    }
}

public class IntOne
{
    public int value = 0;
    public string name = "";

    // Finalizer: logs the final value when this object is reclaimed.
    ~IntOne()
    {
        LogDetail.DebugLogs.Add(new LogDetail() { Name = name, Method =
            "~IntOne", Variable = "value", Value = value.ToString() });
    }
}

public class LogDetail
{
    public string Name = "";
    public string Method = "";
    public string Variable = "";
    public string Value = "";

    public override string ToString()
    {
        return string.Format(
            "Name : {0}, Method : {1}, Variable : {2}, Value : {3}", Name, Method, Variable, Value);
    }

    // Shared log that the finalizers write to.
    public static List<LogDetail> DebugLogs = new List<LogDetail>();
}

public partial class TestBasicClosure : Form
{
    public TestBasicClosure()
    {
        InitializeComponent();
    }

    // Outlives btnStart_Click and keeps the delegate (and its target) reachable.
    Action addValue = null;

    private void btnStart_Click(object sender, EventArgs e)
    {
        BasicClosure fbc = new BasicClosure();
        addValue = fbc.AddValue();
    }

    private void btnCall_Click(object sender, EventArgs e)
    {
        addValue();
    }

    private void btnLog_Click(object sender, EventArgs e)
    {
        rtbLog.Clear();
        rtbLog.Lines = LogDetail.DebugLogs.Select(dl => dl.ToString()).ToArray();
        rtbLog.Refresh();
    }

    private void btnGC_Click(object sender, EventArgs e)
    {
        GC.Collect();
    }

    // Drops the delegate so the BasicClosure instance becomes collectable.
    private void btnClr_Click(object sender, EventArgs e)
    {
        addValue = null;
    }
}
We have a BasicClosure class with a private void method, IWillBeExposed, that conforms to the signature of the Action delegate. Another method, AddValue, returns a reference to the private method as an Action.
Now in the TestBasicClosure class, in btnStart_Click, we create a BasicClosure object, call AddValue and store the returned Action in the instance variable addValue. The local variable fbc goes out of scope as soon as btnStart_Click ends.
This is a bad design: functions should not be exposed outside the class this way. What happens here is that, because of the function reference, the lifetime of the object that fbc referred to is extended, and its memory will not be released as long as the function reference exists. The test class has another method (btnGC_Click) that asks the garbage collector to collect right away. Even after calling it, you can see that the finalizer of the fbc object does not run. btnLog_Click displays the log entries written by the finalizers of collected objects (I added this for easy tracing).
In the code above, no direct reference to the fbc object exists anywhere. We cannot perform any operation on that object apart from calling the function that was exposed (and even that never goes through the object). The function uses one instance variable, namely one. There is another variable, two, that the exposed function does not access, so logically it could be cleaned up. The lifetime of one must be extended because it is still reachable through the function, but two is never used, so there would be no harm in cleaning it. In a purely function-oriented language that is how it might work, but C# is an object-oriented language and it does not happen that way. The runtime knows one thing: the delegate holds a reference to the object, so the object, and everything it references, cannot be collected.
The exposed function is basically behaving like a CLOSURE here: a function acting as an entity and extending the lifetime of the context in which it was created. This is not really helpful so far. Whenever a reference to any instance method escapes, the lifetime of the object is extended, even if that method does not touch any instance variable.
This does not happen with static methods, as they are not tied to instances.
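One way to see the difference is to inspect the Target property of the delegate, which holds the instance the delegate will invoke the method on. The sketch below uses a made-up TargetDemo class (not part of the demo project) with one instance method and one static method.

```csharp
using System;

// Hypothetical example type; not part of the article's demo project.
public class TargetDemo
{
    private int total;

    private void AddTen() { total += 10; }     // instance method
    private static void DoNothing() { }         // static method

    // The returned delegate carries a reference to 'this'.
    public Action GetInstanceAction() { return AddTen; }

    // The returned delegate carries no instance at all.
    public static Action GetStaticAction() { return DoNothing; }
}
```

For `var demo = new TargetDemo();`, the expression `demo.GetInstanceAction().Target` is the demo object itself, which is exactly the reference that keeps the object alive. `TargetDemo.GetStaticAction().Target` is null, so holding that delegate keeps no instance alive.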
Now let's move to the next example:
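using System;
using System.Collections.Generic;
using System.Windows.Forms;

public class TestClosure
{
    public void Test()
    {
        var shout = Shout(new string[] { "John", "Bill", "Danish" });
        shout[0].Invoke();
        shout[2].Invoke();
        shout[1].Invoke();
    }

    // Builds one Action per name; each Action shows its name when invoked.
    List<Action> Shout(string[] names)
    {
        List<Action> actions = new List<Action>();
        foreach (string currName in names)
        {
            // Declared inside the loop body, so each iteration gets a new variable.
            string name = currName;
            actions.Add(
                () => MessageBox.Show(name)
            );
        }
        return actions;
    }
}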
Here the Shout method builds a List of Action delegates dynamically, one per string supplied in the names argument. The MessageBox.Show call inside the Action delegate body accesses the outer variable name, which is created inside the loop.
When you run the Test method it shows the names in a message box one by one, in the order the actions are invoked (John, Danish, Bill).
One thing to note is that name is declared in the scope of the loop body, so the Action delegate created in each iteration accesses a different name variable. In other words, each action has its own copy of the outer variable.
You can see the difference if you show currName instead of name. In C# 4 and earlier, currName is not scoped to the loop body; it is a single variable that is not re-created on each iteration, so every action shows the last value of currName. From C# 5 onward the foreach iteration variable is fresh in each iteration, so both versions behave the same; the variable of a for loop, however, is still shared across iterations.
When a method or delegate forms a closure over an outer variable, it does not copy the value; it captures the variable itself. When the delegate is invoked, the variable is read and whatever value it holds at that moment is used. This is why, when currName is a single shared variable, it always yields the last value: all the actions are invoked well after the loop ends.
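The capture-by-variable rule can be checked without any UI. The following is a small sketch with made-up names (CaptureDemo and its methods); it uses a for loop because its loop variable is shared across iterations in every C# version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example type; not part of the article's demo project.
public static class CaptureDemo
{
    // The lambda sees the value the variable holds when it is invoked.
    public static int ReadAfterChange()
    {
        int value = 1;
        Func<int> read = () => value;
        value = 42;              // changed after the lambda was created
        return read();           // returns 42, not 1
    }

    // A 'for' loop has a single loop variable shared by all iterations.
    public static List<int> SharedLoopVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            readers.Add(() => i);        // every lambda captures the same 'i'
        }
        var results = new List<int>();
        foreach (var reader in readers) results.Add(reader());
        return results;                  // 3, 3, 3
    }

    // Copying into a variable declared in the loop body gives each lambda its own variable.
    public static List<int> PerIterationCopy()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            readers.Add(() => copy);
        }
        var results = new List<int>();
        foreach (var reader in readers) results.Add(reader());
        return results;                  // 0, 1, 2
    }
}
```

The third method is the same trick the Shout method uses with its name variable.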
That covers scope and how outer variables are associated with lambda expressions and anonymous delegates. Now let's see how an object-oriented language like C# achieves this, how memory management plays along, and how methods are treated as first-class objects. Here is another code sample:
using System;

public class FCO
{
    public string name = "FCO";
    IntOne y = new IntOne() { name = "Y", value = 20 };

    public void TestLifeTime(out Func<int> func1, out Func<int> func2)
    {
        IntOne i = new IntOne() { name = "I", value = 10 };
        IntOne j = new IntOne() { name = "J", value = y.value };
        IntOne k = new IntOne() { name = "K", value = y.value };

        // Closes over i only.
        Func<int> add10 = () =>
        {
            i.value += 10;
            return i.value;
        };

        // Closes over both i and j.
        Func<int> addFive = () =>
        {
            j.value = i.value + 5;
            return j.value;
        };

        func1 = add10;
        func2 = addFive;

        // k is used only here; no lambda captures it.
        k.value++;
    }

    // Finalizer: deliberately creates a new object during cleanup, then logs.
    ~FCO()
    {
        y = new IntOne() { name = "new Y" };
        LogDetail.DebugLogs.Add(new LogDetail() { Name = name, Method = "~FCO", Variable = "y", Value = y.value.ToString() });
    }
}
Here is a Windows Form to test this class:
using System;
using System.Linq;
using System.Windows.Forms;

public partial class TestFCO : Form
{
    public TestFCO()
    {
        InitializeComponent();
    }

    // The two closures handed back by FCO.TestLifeTime.
    Func<int> f1 = null;
    Func<int> f2 = null;

    private void btnStart_Click(object sender, EventArgs e)
    {
        FCO fco = new FCO();
        fco.TestLifeTime(out f1, out f2);
    }

    private void btnAddTen_Click(object sender, EventArgs e)
    {
        if (f1 != null)
        {
            lblOne.Text = f1().ToString();
        }
    }

    private void btnAddFive_Click(object sender, EventArgs e)
    {
        if (f2 != null)
        {
            lblTwo.Text = f2().ToString();
        }
    }

    private void btnGC_Click(object sender, EventArgs e)
    {
        GC.Collect();
    }

    private void btnLog_Click(object sender, EventArgs e)
    {
        rtbLog.Clear();
        rtbLog.Lines = LogDetail.DebugLogs.Select(dl => dl.ToString()).ToArray();
        rtbLog.Refresh();
    }

    // Drop the first closure.
    private void btnClrTen_Click(object sender, EventArgs e)
    {
        f1 = null;
    }

    // Drop the second closure.
    private void btnClrFive_Click(object sender, EventArgs e)
    {
        f2 = null;
    }
}
We create an instance of FCO in btnStart_Click and receive two Func<int> delegates through out parameters. When the handler ends, the locally created object goes out of scope. Unlike the first sample, we are not exposing any instance member of FCO, so there is no reason for its finalizer not to fire. Because the garbage collector may take some time, we request an immediate collection using GC.Collect(). The log then shows whether any objects have been released. Here is what I get if I call start, then collect, then getLog:
Name : FCO, Method : ~FCO, Variable : y, Value : 0
Name : Y, Method : ~IntOne, Variable : value, Value : 20
Name : K, Method : ~IntOne, Variable : value, Value : 21
If I call collect once more and then getLog, I get one more line in the log:
Name : new Y, Method : ~IntOne, Variable : value, Value : 0
So what is happening here? The FCO finalizer gets called. Y is an instance member and nobody closes over it, so its finalizer gets called as well. Inside the FCO finalizer a new object is created and stored in y. K is a local variable of TestLifeTime and nobody closes over it (that is, nobody depends on it for later use), so its finalizer also gets called.
The newly created Y is not referenced from anywhere, but because it was created during cleanup it survives the first GC cycle. On the next GC.Collect() call it is collected too, which is why we get one extra log line after the second call to GC.Collect().
Now we have two closures. The FCO object has already died, but the closures survive in TestFCO as the member variables f1 and f2. The lifetimes of I and J are extended because f1 closes over I, and f2 closes over both I and J.
I and J were local variables of the function in which they were defined, and that function has already finished executing, so how can I and J still survive? What object are they tied to? Apart from the local variables of an executing context, state has to be associated with an object, or else the GC will clean it up. And what happens if we set the f2 reference in TestFCO to null? Logically J should be released: f1 does not close over J, it only needs I, so holding on to J serves no purpose. In my demo project I do this with the Clear 5 button, but to my surprise J does not get collected. Only when I set f1 to null as well do both I and J get collected.
This behavior is here to stay, and it stems from the object-oriented implementation. It does not pose a big threat, though.
When you create a closure, the compiler generates nested classes at compile time, one for each scope whose variables are accessed as outer variables by the closing function or delegate. But the compiler is not dumb: only the variables that something actually closes over become members of the nested class.
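To make this concrete, here is a hand-written approximation of what the compiler might generate for a method shaped like TestLifeTime. It is only a sketch: the real generated names and layout are compiler implementation details, and the types Holder, TestLifeTimeClosure and ClosureLowering are invented for this illustration (Holder stands in for IntOne without its finalizer).

```csharp
using System;

// Stand-in for the article's IntOne type (finalizer omitted); hypothetical name.
public class Holder { public int value; }

// Hand-written approximation of a compiler-generated closure class.
public class TestLifeTimeClosure
{
    public Holder i;    // captured by both lambdas
    public Holder j;    // captured only by the second lambda
    // 'k' is not captured by anything, so it would remain an ordinary local.

    public int Add10()   { i.value += 10; return i.value; }
    public int AddFive() { j.value = i.value + 5; return j.value; }
}

public static class ClosureLowering
{
    public static void Create(out Func<int> func1, out Func<int> func2)
    {
        // One closure object holds every captured variable of the scope.
        var closure = new TestLifeTimeClosure
        {
            i = new Holder { value = 10 },
            j = new Holder { value = 20 }
        };
        func1 = closure.Add10;     // both delegates target the same closure object
        func2 = closure.AddFive;
    }
}
```

Because both delegates target the same object, dropping func2 alone cannot free j: func1 still keeps the whole closure object, including j, reachable.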
The figure below shows how this looks in the IL Disassembler. c_displayClass5 is the compiler-generated type that supports the closure. We can see that I and J are both members of this class, but k is not. An instance of this class is associated with the delegate, so the instance lives as long as any reference to the delegate exists. In this case the same instance is shared by both delegates, f1 and f2, so even when f2 dies and J is no longer needed, J still exists because it is a member of the same object that I belongs to. This is how a function-oriented construct is supported in an object-oriented environment. It may extend the life of some unwanted variables (J in this case), but it is still useful. With careful design that keeps these concepts in mind the problem can mostly be avoided, and in most cases its impact is negligible.
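One possible way to keep lifetimes separate, sketched under the assumption that each lambda is created in its own method, is shown below. Every method has its own scope, so each lambda gets its own closure object holding only what that lambda captures. The names Item, SeparateClosures, MakeAdd10 and MakeAddFive are invented for this example; Item stands in for IntOne.

```csharp
using System;

// Stand-in for the article's IntOne type (finalizer omitted); hypothetical name.
public class Item { public int value; }

public static class SeparateClosures
{
    // This lambda captures only the parameter 'i'.
    static Func<int> MakeAdd10(Item i)
    {
        return () => { i.value += 10; return i.value; };
    }

    // This lambda captures 'i' and 'j' in a closure object of its own.
    static Func<int> MakeAddFive(Item i, Item j)
    {
        return () => { j.value = i.value + 5; return j.value; };
    }

    public static void Create(out Func<int> func1, out Func<int> func2)
    {
        var i = new Item { value = 10 };
        var j = new Item { value = 20 };
        func1 = MakeAdd10(i);        // its closure references only i
        func2 = MakeAddFive(i, j);   // its closure references i and j
    }
}
```

With this arrangement, dropping func2 leaves nothing that references j, so j becomes eligible for collection even while func1 is still held. Whether the extra methods are worth it depends on how large or long-lived the captured objects are.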
This is why we say methods in C# are treated as first-class objects. They act like simple objects; in fact the compiler creates simple objects to hold the state they need when they form a closure over an outer variable. These objects are plain and do not go to the lengths of abstraction and polymorphism, as that is not required in these cases.
With this I bring this article to a closure. I hope I haven't missed any important part. Thanks.