Introduction
This article is a beginner-level look at accessing related entities using the Entity Framework (EF). It is for people new to the Entity Framework, whether or not they are experienced developers generally.
I have used EF6 for the code in this article, but it's applicable to all Code First versions. In the download code, the database will be created and seeded when you first run it. It uses LocalDb.
Background
In my first job, a team of experienced developers was working on an application which used EF as its Object-Relational Mapper (ORM). Unfortunately, they didn’t understand how EF was behaving, and it all came down to how they were loading related entities. The application was painfully slow. Finally, one of them ran up SQL Server Profiler and discovered that there were something like 10,000 hits to the database every time a particular button was clicked on the page. Fail. So if this stuff could trip up a team of experienced developers, it’s worthy of a look.
We will be covering concepts such as Lazy Loading, Eager Loading, Explicit Loading and proxies. I won’t be going too deep, as people new to EF will find all of the various combinations of configuration and query-construction to be quite confusing at first. But I hope to lay these things out clearly in this article in such a way as to enable people to ramp up quickly when beginning with EF.
Note also that this is not an article about LINQ. But just so that readers are aware, I have used method syntax (i.e., "dot notation"), as distinct from the SQL-like query syntax.
Related Entities
So what do I mean by related entities? First, I will introduce you to the schema of the database which we will use to demonstrate the principles in this article.
Let’s say someone had an MSDN subscription and wanted to manage the various software licences and files which they acquired through that. The following schema depicts an example of such a scenario:
Figure 1
A related entity (for the purposes of this article) is simply an entity that is related to another entity by way of a foreign key relationship. So, if we had a Software entity, a collection of SoftwareFiles entities is related to that Software entity.
In traditional SQL data retrieval, we would use a join or sub-query to access information about related entities. That would result in a flattened result set:
Figure 2
Using an ORM such as EF, we end up with an object graph, which by its very nature, is not flattened:
Figure 3
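To make the difference between the two shapes concrete, here is a minimal sketch in plain C# (no EF involved) that turns flat, join-style rows into a small object graph. The SoftwareFileRow and SoftwareNode types, the ToGraph helper and the sample data are all hypothetical and exist only for this illustration; EF does its own materialization internally and does not work this way line for line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat row, shaped like one line of a SQL join result:
// the parent columns are repeated for every child.
public class SoftwareFileRow
{
    public int SoftwareId { get; set; }
    public string SoftwareName { get; set; }
    public string FileName { get; set; }
}

// Hypothetical graph type: one parent object holding its children.
public class SoftwareNode
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<string> FileNames { get; set; }
}

public static class GraphSketch
{
    // Collapses the repeated parent columns into one object per parent.
    public static List<SoftwareNode> ToGraph(IEnumerable<SoftwareFileRow> rows)
    {
        return rows
            .GroupBy(r => new { r.SoftwareId, r.SoftwareName })
            .Select(g => new SoftwareNode
            {
                Id = g.Key.SoftwareId,
                Name = g.Key.SoftwareName,
                FileNames = g.Select(r => r.FileName).ToList()
            })
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data: three rows, two distinct parents.
        var rows = new List<SoftwareFileRow>
        {
            new SoftwareFileRow { SoftwareId = 1, SoftwareName = "Product A", FileName = "a-x86.iso" },
            new SoftwareFileRow { SoftwareId = 1, SoftwareName = "Product A", FileName = "a-x64.iso" },
            new SoftwareFileRow { SoftwareId = 2, SoftwareName = "Product B", FileName = "b.iso" }
        };

        foreach (var software in ToGraph(rows))
        {
            Console.WriteLine("{0}: {1}", software.Name, string.Join(", ", software.FileNames));
        }
    }
}
```

The three flat rows become two parent objects, the first of which carries two file names.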
All ORMs have their own conventions/API for loading related entities. These then form part of the object graph which is loaded for that particular entity. The first type of access which we’ll look at is lazy loading.
Lazy Loading
Lazy loading is pretty much the default. That is to say, if you leave the default configuration as is, and don’t explicitly tell EF in your query that you want something other than lazy loading, then lazy loading is what you will get. This is the trap that the development team, which I referred to above, fell into. But I’ll comment on that at the end of this section.
To briefly explain, lazy loading is where a related entity will be automatically loaded from the database the first time it is accessed, but not before. Consider the following code:
Listing 1
private static void LazyLoadingRelatedProperty()
{
    using (var context = new LicenceTrackerContext())
    {
        foreach (var licence in context.Licences)
        {
            Console.WriteLine("{0} - {1} - {2}", licence.LicenceKey, licence.Software.Name,
                licence.Software.Description);
        }
    }
}
As you can see in the line which writes some values to the console, the related Software entity of each Licence entity is accessed in the foreach loop. If the only thing that was written to the console was the LicenceKey, then the related Software entity would not be retrieved from the database. But it was (by way of licence.Software.Name and licence.Software.Description) and consequently, Software is not null. All we had to do, to load that entity, was access the Software property of the licence objects.
So how does that automagically happen? Here are the prerequisites for lazy loading:
1. Your entity class must be public and not sealed
2. The navigation property (related entity) that you want to be lazy loaded has to be a virtual property
3. Two things need to be configured as true on the context:
   - lazy loading (context.Configuration.LazyLoadingEnabled = true;)
   - proxy creation (context.Configuration.ProxyCreationEnabled = true;)
Regarding 1 and 2, you can see that the Licence class is public, not sealed and the Software property is virtual. The reason the class has to be not sealed and the navigation property virtual, is because EF creates proxy classes at run-time, which provide the plumbing to enable lazy loading. The dynamic proxies are derived classes of the relevant entities.
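The idea of a derived proxy class can be sketched by hand. The following is plain C# with no EF involved: LicenceProxy, its loader delegate and the LoadCount counter are hypothetical stand-ins for what EF generates at run-time, and the entity classes are cut down to the bare minimum. It is only meant to show why the class must be inheritable and the property overridable.

```csharp
using System;

public class Software
{
    public string Name { get; set; }
}

// Public, not sealed, with a virtual navigation property.
public class Licence
{
    public string LicenceKey { get; set; }
    public virtual Software Software { get; set; }
}

// Hand-written stand-in for a generated proxy (hypothetical, not EF's actual code).
public class LicenceProxy : Licence
{
    private readonly Func<Software> _loader; // simulates the database query
    private bool _loaded;

    // Counts how many times the simulated query has run.
    public int LoadCount { get; private set; }

    public LicenceProxy(Func<Software> loader)
    {
        _loader = loader;
    }

    // The override is only possible because the base property is virtual.
    public override Software Software
    {
        get
        {
            if (!_loaded)
            {
                base.Software = _loader(); // first access: go and fetch
                _loaded = true;
                LoadCount++;
            }
            return base.Software;
        }
        set
        {
            base.Software = value;
            _loaded = true;
        }
    }
}

public static class ProxySketch
{
    public static void Main()
    {
        // Calling code only ever sees the Licence type.
        Licence licence = new LicenceProxy(() => new Software { Name = "Hypothetical product" })
        {
            LicenceKey = "ABC-123"
        };

        Console.WriteLine(licence.GetType().Name); // prints "LicenceProxy"
        Console.WriteLine(licence.Software.Name);  // triggers the simulated load
        Console.WriteLine(licence.Software.Name);  // already loaded, no second load
    }
}
```

If Licence were sealed, LicenceProxy could not derive from it; if Software were not virtual, the override would not compile. That is the same constraint the prerequisites describe.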
If you run the example application in the download code and select number 1 from the menu, an example of lazy loading runs which writes out the run-time type of the proxy to the output window:
Figure 4
If you run the application again and choose 2 from the menu, you will see what happens when the navigation property is not marked with the virtual keyword. In that case, I have tried to access the Software navigation property of the SoftwareFile class. A NullReferenceException is thrown. Selecting 3 from the menu will show what happens when proxies are not enabled.
Note that the decision as to whether to enable proxies or not is wider than the decision to lazy load. Lazy loading is not the only reason for which proxies can be used. There are also reasons not to use proxies, such as serialization (the proxies created by EF are awkward to serialize). In the download code, you'll see me make comments in places where I disable proxies. That is just to demonstrate that the operation does not need proxies, rather than being a comment as to whether I think they should be enabled or not.
Whilst it is very cool to get this automatic functionality where things just work, we need to know what is happening in terms of the application hitting the database. The following screenshot is taken from SQL Server Profiler after I ran up the example application and selected 1 from the menu:
Figure 5
It clearly shows that there is one SELECT statement sent to the database, followed by 3 more SELECTs. So the code in Listing 1 has resulted in 4 separate statements being sent to the database. And here, we see the cost of Lazy Loading. As a related property is lazily accessed in the foreach loop, a SELECT statement is sent to the database. So you could end up in a situation like the hapless developers I described above, where the resolution of a navigation property in a foreach loop (which has thousands of iterations), results in thousands of hits to the database server, rather than just 1.
Actually, a SELECT isn’t sent for every iteration of the foreach loop. In the example above, the Software with Id of 1 is the related entity of two licences. However, it is only retrieved once, because the context is already tracking it by the time the second licence is reached. So the number of database hits will depend on whether the related entity has already been returned in a previous iteration.
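As a rough model of that arithmetic (not EF code, just counting), the number of statements for a loop like Listing 1 is one for the Licences plus one per distinct related Software. The EstimateStatements helper and the sample Ids below are hypothetical, and the sketch assumes the context starts out empty and that every licence's Software is accessed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryCountSketch
{
    // One statement for the Licences themselves, plus one for each
    // distinct Software that has not been loaded in an earlier iteration.
    public static int EstimateStatements(IEnumerable<int> softwareIdPerLicence)
    {
        return 1 + softwareIdPerLicence.Distinct().Count();
    }

    public static void Main()
    {
        // Hypothetical data: four licences, two of which share Software 1.
        var softwareIds = new[] { 1, 1, 2, 3 };
        Console.WriteLine(EstimateStatements(softwareIds)); // prints 4
    }
}
```

With thousands of licences pointing at thousands of different Software rows, the same sum quickly reaches the kind of numbers described in the Background section.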
Anyhow, that’s not to say Lazy Loading should never be used. There may certainly be circumstances where you want to take advantage of it. You may want to access the related property only for a certain Licence (see Listing 2). But just be cognisant of the potential for multiple SQL statements and be sure to perform an analysis of whether this is preferred to Eager Loading (coming up next).
Listing 2
foreach (var licence in context.Licences)
{
    if (licence.Id == 2)
    {
        Console.WriteLine("{0} - {1} - {2}",
            licence.LicenceKey,
            licence.Software.Name,
            licence.Software.Description
        );
    }
}
Lazy loading even works where a database retrieval has been forced (see Listing 3). You will recall that an IQueryable does not actually hit the database until it is told to do so (Deferred Execution). So this means you could actually force an execution and then rely on lazy loading to access related entities. Menu item 4 in the example code does exactly that:
Listing 3
private static void LazyLoadingAfterQueryHasBeenForced()
{
    using (var context = new LicenceTrackerContext())
    {
        var licences = context.Licences.ToList();
        foreach (var licence in licences)
        {
            Console.WriteLine("{0} - {1} - {2}",
                licence.LicenceKey,
                licence.Software.Name,
                licence.Software.Description
            );
        }
    }
}
Eager Loading
Eager loading is basically the opposite idea. Rather than waiting for some condition to arise for us to load the related property, we load all related entities anyway; we issue instructions to the Entity Framework to load those related entities up front. This results in a single SELECT command being sent to the database, usually with all kinds of joins, depending on how deep in the object graph you are eager loading. It results in the whole object graph materializing all in one go (during one of the normal operations which tells EF to hit the database, i.e., an iteration, a call to ToList() or Single(), etc.).
The code looks very similar to before, except with the use of the Include extension method:
Listing 4
// The lambda overload of Include is an extension method; it needs: using System.Data.Entity;
using (var context = new LicenceTrackerContext())
{
    context.SoftwareFiles.Include(s => s.Software).ToList()
        .ForEach(s => Console.WriteLine("{0} - {1}", s.Software.Name, s.Software.Description));
}
In that example, we are navigating through the Software navigation property of the SoftwareFile object. By virtue of using Include, we are able to instruct EF to use one query statement to retrieve the data, regardless of how many items the ForEach call iterates over. And the lambda overload of Include gives us strong typing, and with it the immeasurably useful IntelliSense, which reassures us that we are on the right path.
If you are joining the dots in your mind, it is obvious that the development team I referred to earlier should have been using eager loading, rather than lazy loading. It would have returned a big dataset, but in only 1 database hit.
I’ll take a look at some more complex scenarios involving eager loading later in the article. Before that, I will cover explicit loading.
Explicit Loading
Explicit loading is exactly that: explicit. Unlike lazy loading, there is no ambiguity or possibility of confusion about when a query is run. In the next example, I will retrieve a single instance of a LicenceAllocation and explicitly load the Person navigation property. Explicit loading is done by virtue of the Entry method of the context:
Listing 5
private static void ExplicitRelatedProperty()
{
    using (var context = new LicenceTrackerContext())
    {
        context.Configuration.ProxyCreationEnabled = false;
        var licenceAllocation = context.LicenceAllocations.Single(la => la.Id == 1);
        context.Entry(licenceAllocation)
            .Reference(la => la.Person)
            .Load();
        Console.WriteLine("This Licence allocation is allotted to {0} {1}",
            licenceAllocation.Person.FirstName,
            licenceAllocation.Person.LastName);
    }
}
Entry returns an object of DbEntityEntry type. This has a number of public methods, but the two which we are interested in for the purposes of explicitly loading a related entity are Reference and Collection. In the case where the navigation property is not a collection, you can explicitly load the related entity by calling the Reference method and passing a lambda, as depicted in Listing 5. Following that and continuing with the fluent API, you call the Load method to actually invoke the query and load the entity.
And, for navigation properties which are collections, use the Collection method of the DbEntityEntry object. Everything else is the same as for non-collection navigation properties:
Listing 6
private static void ExplicitRelatedCollectionProperty()
{
    using (var context = new LicenceTrackerContext())
    {
        context.Configuration.ProxyCreationEnabled = false;
        var softwareType = context.SoftwareTypes.Single(st => st.Id == 2);
        context.Entry(softwareType)
            .Collection(st => st.SoftwareProducts)
            .Load();
        Console.WriteLine("This SoftwareType has the following {0} products:",
            softwareType.SoftwareProducts.Count);
        foreach (var softwareProduct in softwareType.SoftwareProducts)
        {
            Console.WriteLine("{0}", softwareProduct.Name);
        }
    }
}
You don’t tend to see explicit loading anywhere near as much as the other kinds. People tend to abstract away the data context behind a repository interface, and I’ve never seen explicit loading exposed in a repository. But that’s just anecdotal, based on my experience.
Going a Little Deeper with Include
We saw above how you could load the related property of an entity upfront with the Include method that uses a lambda to specify which property is to be included:
context.SoftwareFiles.Include(s => s.Software)
What happens if we want to go deeper into the graph? For example, we are starting with the Licences set of entities and for each Licence we want to eager load the Software, and for each Software entity, we want to eager load its Type navigation property (SoftwareType entity). That’s going to look something like Listing 7:
Listing 7
using (var differentContext = new LicenceTrackerContext())
{
    differentContext.Configuration.ProxyCreationEnabled = false;
    foreach (var licence in differentContext.Licences.Include(l => l.Software.Type))
    {
        Console.WriteLine("{0} - {1} - {2} - {3}",
            licence.LicenceKey,
            licence.Software.Name,
            licence.Software.Description,
            licence.Software.Type.Description);
    }
}
You can see there how I have just gone 1 level deeper in the object graph, by accessing the Type property of Software in the lambda expression. That's pretty simple, but things get a little different when it comes to collections.
Let’s see how that’s done with an example that has us starting from the People entity. Our objective is to navigate through the graph from People to Software. Looking at it from that perspective, we can see that we have the LicenceAllocations collection of each Person entity to go through. The syntax for that is:
Listing 8
private static void EagerLoadingThroughCollections()
{
    using (var context = new LicenceTrackerContext())
    {
        context.People.Include(p => p.LicenceAllocations.Select(la => la.Licence.Software)).ToList()
            .ForEach(p =>
            {
                if (p.LicenceAllocations.Any())
                    Console.WriteLine("{0} - {1}",
                        p.LicenceAllocations.First().Licence.Software.Name,
                        p.LicenceAllocations.First().Licence.LicenceKey
                    );
            });
    }
}
So for each LicenceAllocation collection, the LINQ Select extension method is called and passed in a lambda for the property of each LicenceAllocation that we want to eager load. And I've stepped through the Licence navigation property to access the Software entity for each Licence. As always with Include, that results in just the 1 statement being executed against the database. You can use Profiler to check it out.
But what about when you have gone down a path in the graph to a certain level, and you want to eager load something on another branch in the graph, which is some way down the path you previously eager loaded? That was a mouthful, so I’ll clarify that with an example. Let’s say you have 1 Include invocation in your query so far, which starts from the Licences entity and eager loads through LicenceAllocations to People:
Licences -> LicenceAllocations -> People
Now, in the same query, you also want to eager load through LicenceAllocations to SoftwareFiles:
Licences -> LicenceAllocations -> Licence -> Software -> SoftwareFiles
So, your navigation down the path in the object graph effectively branches at the LicenceAllocation object.
The Include calls required for that query would be as set out in Listing 9:
Listing 9
private static void EagerLoadingThroughCollectionsAgain()
{
    using (var context = new LicenceTrackerContext())
    {
        context.Configuration.ProxyCreationEnabled = false;
        var licencesQuery = context.Licences
            .Include(l => l.LicenceAllocations.Select(la => la.Person))
            .Include(l => l.LicenceAllocations.Select(la => la.Licence.Software.SoftwareFiles));
        foreach (var licence in licencesQuery)
        {
            licence.LicenceAllocations
                .Select(l =>
                    string.Concat(
                        l.Person.LastName, ", ",
                        l.Person.FirstName, " - ",
                        l.Licence.Software.SoftwareFiles.Select(sf => sf.FileName).First()))
                .ToList()
                .ForEach(Console.WriteLine);
        }
    }
}
So you can see the tactic there is not to try and attack both paths in a single Include method, but to actually use 2 Include methods, as they "terminate" on different paths.
Conclusion
ORM frameworks all have their own, distinct ways of transmogrifying flat, SQL result-sets into object-graphs which more closely resemble the domain at hand. This article has been an introduction to the ways in which EF performs that function.
Each alternative discussed has both pros and cons; trade-offs, if you will. As such, some will suit certain scenarios better than others.
It has also discussed how proxies can be relevant in the mix, but noted that there are several things to take into account in the decision to enable/disable them. The decision not to use lazy loading is not the only thing to ponder in that regard.
Finally, know what is going on. If you are seeing something out of the ordinary, or wondering why a query takes forever, spin up SQL Server Profiler to see exactly what SQL is being generated by LINQ-to-Entities. (If you are working on a Web application using ASP.NET, there's a Glimpse plugin which also shows you the SQL which is generated). See how many times statements are being executed. Make the black box white.
History
Article
Version | Date | Summary |
1.0 | 25th July, 2014 | Original published article |
Code
Version | Date |
1.0 | 25th July, 2014 |