Loading Related Entities with Entity Framework - A Beginner's Primer

David Rogers Dev

24 Jul 2014 | CPOL | 12 min read

Learn how to load related entities using the Entity Framework with simple examples

Download source code - 31.3 KB

Introduction

This article is a beginner-level look at accessing related entities using the Entity Framework (EF). It is for people new to the Entity Framework, whether or not they are experienced developers generally.

I have used EF6 for the code in this article, but it's applicable to all Code First versions. In the download code, the database will be created and seeded when you first run it. It uses LocalDB.

Background

In my first job, a team of experienced developers was working on an application which used EF as its Object-Relational Mapper (ORM). Unfortunately, they didn’t understand how EF was behaving, and it all came down to how they were loading related entities. The application was painfully slow. Finally, one of them ran up SQL Server Profiler and discovered that there were something like 10,000 hits to the database every time a particular button was clicked on the page. Fail. So if this stuff could trip up a team of experienced developers, it’s worthy of a look.

We will be covering concepts such as Lazy Loading, Eager Loading, Explicit Loading and proxies. I won’t be going too deep, as people new to EF will find all of the various combinations of configuration and query-construction to be quite confusing at first. But I hope to lay these things out clearly in this article in such a way as to enable people to ramp up quickly when beginning with EF.

Note also that this is not an article about LINQ. But just so that readers are aware, I have used the method syntax (i.e., "dot notation"), as distinct from the SQL-like query syntax.

Related Entities

So what do I mean by related entities? First, I will introduce you to the schema of the database which we will use to demonstrate the principles in this article.

Let’s say someone had an MSDN subscription and wanted to manage the various software licences and files which they acquired through that. The following schema depicts an example of such a scenario:

Figure 1

A related entity (for the purposes of this article) is simply an entity that is related to another entity by way of a foreign key relationship. So, if we had a Software entity, a collection of SoftwareFile entities is related to that Software entity.

In traditional SQL data retrieval, we would use a join or sub-query to access information about related entities. That would result in a flattened resultset:

Figure 2

Using an ORM such as EF, we end up with an object graph, which by its very nature, is not flattened:

Figure 3
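
To make the contrast between a flattened resultset and an object graph concrete, here is a minimal sketch in plain C# with no EF involved. The FlatRow type, the sample values and the ToGraph helper are all hypothetical. They only mimic the kind of reshaping an ORM performs, and are not a description of how EF itself materializes entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape, as a join of Software and SoftwareFile might return it.
public class FlatRow
{
    public int SoftwareId { get; set; }
    public string SoftwareName { get; set; }
    public string FileName { get; set; }
}

public class SoftwareFile
{
    public string FileName { get; set; }
    public Software Software { get; set; }
}

public class Software
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<SoftwareFile> SoftwareFiles { get; set; }
}

public static class GraphSketch
{
    // Collapses the repeated parent columns into one parent object per key
    // and hangs the child rows off it as a collection.
    public static List<Software> ToGraph(IEnumerable<FlatRow> rows)
    {
        var result = new List<Software>();
        foreach (var group in rows.GroupBy(r => new { r.SoftwareId, r.SoftwareName }))
        {
            var software = new Software
            {
                Id = group.Key.SoftwareId,
                Name = group.Key.SoftwareName,
                SoftwareFiles = new List<SoftwareFile>()
            };
            foreach (var row in group)
            {
                software.SoftwareFiles.Add(
                    new SoftwareFile { FileName = row.FileName, Software = software });
            }
            result.Add(software);
        }
        return result;
    }

    // Made-up sample data: two files for one product, one file for another.
    public static List<FlatRow> SampleRows()
    {
        return new List<FlatRow>
        {
            new FlatRow { SoftwareId = 1, SoftwareName = "Product A", FileName = "a_x86.iso" },
            new FlatRow { SoftwareId = 1, SoftwareName = "Product A", FileName = "a_x64.iso" },
            new FlatRow { SoftwareId = 2, SoftwareName = "Product B", FileName = "b.iso" }
        };
    }

    public static void Main()
    {
        foreach (var software in ToGraph(SampleRows()))
        {
            Console.WriteLine("{0} has {1} file(s)", software.Name, software.SoftwareFiles.Count);
        }
    }
}
```

Three flat rows become two Software objects, each holding its own collection of files, and each file points back at its parent.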

All ORMs have their own conventions/API for loading related entities. These then form part of the object graph which is loaded for that particular entity. The first type of access which we’ll look at is lazy loading.

Lazy Loading

Lazy loading is pretty much the default. That is to say, if you leave the default configuration as is, and don’t explicitly tell EF in your query that you want something other than lazy loading, then lazy loading is what you will get. This is the trap that the development team, which I referred to above, fell into. But I’ll comment on that at the end of this section.

To briefly explain, lazy loading is where a related entity will be automatically loaded from the database the first time it is accessed, but not before. Consider the following code:

Listing 1

private static void LazyLoadingRelatedProperty()
{
    //  Software is loaded as it is accessed in each iteration of the foreach loop
    using (var context = new LicenceTrackerContext())
    {
        foreach (var licence in context.Licences)
        {
            Console.WriteLine("{0} - {1} - {2}", licence.LicenceKey, licence.Software.Name,
                licence.Software.Description);
        }        
    }
}

As you can see in the line which writes some values to the console, the related Software entity of each Licence entity is accessed in the foreach loop. If the only thing that was written to the console was the LicenceKey, then the related Software entity would not be retrieved from the database. But it was (by way of licence.Software.Name and licence.Software.Description) and consequently, Software is not null. All we had to do, to load that entity, was access the Software property of the licence objects.

So how does that automagically happen? Here are the prerequisites for lazy loading:

1. Your entity class must be public and not sealed
2. The navigation property (related entity) that you want to be lazy loaded has to be a virtual property
3. Two things need to be configured as true on the context:
   - lazy loading (context.Configuration.LazyLoadingEnabled = true;)
   - proxy creation (context.Configuration.ProxyCreationEnabled = true;)

Regarding 1 and 2, you can see that the Licence class is public, not sealed and the Software property is virtual. The reason the class has to be not sealed and the navigation property virtual, is because EF creates proxy classes at run-time, which provide the plumbing to enable lazy loading. The dynamic proxies are derived classes of the relevant entities.
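
As a rough illustration of why the class must be unsealed and the property virtual, here is a hand-written stand-in for a proxy. This is only a sketch: LicenceProxySketch, the loader delegate, the SoftwareId property and the QueryCount counter are all assumptions made for the example, and the classes EF actually generates at run-time are more involved than this.

```csharp
using System;

public class Software
{
    public string Name { get; set; }
}

// Public, not sealed, with a virtual navigation property:
// the shape that lazy loading requires of an entity.
public class Licence
{
    public string LicenceKey { get; set; }
    public int SoftwareId { get; set; }
    public virtual Software Software { get; set; }
}

// Hand-written stand-in for a run-time proxy. It can only exist because
// Licence is not sealed, and it can only intercept access because
// the Software property is virtual.
public class LicenceProxySketch : Licence
{
    private readonly Func<int, Software> _loader; // stands in for a database query
    private bool _loaded;

    public LicenceProxySketch(Func<int, Software> loader)
    {
        _loader = loader;
    }

    public override Software Software
    {
        get
        {
            if (!_loaded)
            {
                // First access: fetch the related entity, then remember it.
                base.Software = _loader(SoftwareId);
                _loaded = true;
            }
            return base.Software;
        }
        set
        {
            base.Software = value;
            _loaded = true;
        }
    }
}

public static class ProxyDemo
{
    public static int QueryCount;

    public static Licence CreateLicence()
    {
        QueryCount = 0;
        return new LicenceProxySketch(id =>
        {
            QueryCount++; // counts simulated round trips
            return new Software { Name = "Software #" + id };
        })
        {
            LicenceKey = "ABC-123",
            SoftwareId = 1
        };
    }

    public static void Main()
    {
        Licence licence = CreateLicence();
        Console.WriteLine(licence.LicenceKey);    // no load yet
        Console.WriteLine(licence.Software.Name); // triggers the simulated load
        Console.WriteLine(licence.Software.Name); // already loaded, no second load
        Console.WriteLine("Loads: {0}", QueryCount);
    }
}
```

The calling code only ever sees a Licence, which is the point: the loading behaviour is hidden inside the derived class.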

If you run the example application in the download code and select number 1 from the menu, an example of lazy loading runs which writes out the run-time type of the proxy to the output window:

Figure 4

If you run the application again and choose 2 from the menu, you will see what happens when the navigation property is not marked with the virtual keyword. In that case, I have tried to access the Software navigation property of the SoftwareFile class. A NullReferenceException is thrown. Selecting 3 from the menu will show what happens when proxies are not enabled.

Note that the decision as to whether to enable proxies or not is wider than the decision to lazy load. Lazy loading is not the only reason for which proxies can be used. There are also reasons not to use proxies, such as serialization (the proxies created by EF can be troublesome to serialize). In the download code, you'll see me make comments in places where I disable proxies. That is just to demonstrate that the operation does not need proxies, rather than being a comment as to whether I think they should be enabled or not.

Whilst it is very cool to get this automatic functionality where things just work, we need to know what is happening in terms of the application hitting the database. The following screenshot is taken from SQL Server Profiler after I ran up the example application and selected 1 from the menu:

Figure 5

It clearly shows that there is one SELECT statement sent to the database, followed by 3 more SELECTs. So the code in Listing 1 has resulted in 4 separate statements being sent to the database. And here, we see the cost of Lazy Loading. As a related property is lazily accessed in the foreach loop, a SELECT statement is sent to the database. So you could end up in a situation like the hapless developers I described above, where the resolution of a navigation property in a foreach loop (which has thousands of iterations), results in thousands of hits to the database server, rather than just 1.

Actually, the related entity isn’t retrieved from the database on every iteration of the foreach loop. In the example above, the Software with Id of 1 is the related entity of two licences. However, it is only retrieved once. So the number of database hits will depend on whether an entity has already been loaded in a previous iteration.
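
The arithmetic can be sketched without a database. In the following simulation, FakeContext, its licence keys and its SoftwareId values are all hypothetical (they are not the seed data from the download); the data is simply arranged so that two licences share one Software, which gives 1 query for the licences plus 3 for the distinct Software entities.

```csharp
using System;
using System.Collections.Generic;

public class Software
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Licence
{
    public string LicenceKey { get; set; }
    public int SoftwareId { get; set; }
}

// Hypothetical stand-in for a database plus a context that remembers
// which Software rows it has already fetched.
public class FakeContext
{
    private readonly Dictionary<int, Software> _tracked = new Dictionary<int, Software>();

    public int QueryCount { get; private set; }

    public List<Licence> LoadLicences()
    {
        QueryCount++; // one SELECT for the licences themselves
        return new List<Licence>
        {
            new Licence { LicenceKey = "KEY-1", SoftwareId = 1 },
            new Licence { LicenceKey = "KEY-2", SoftwareId = 1 },
            new Licence { LicenceKey = "KEY-3", SoftwareId = 2 },
            new Licence { LicenceKey = "KEY-4", SoftwareId = 3 }
        };
    }

    public Software GetSoftware(int id)
    {
        Software software;
        if (!_tracked.TryGetValue(id, out software))
        {
            QueryCount++; // one more SELECT, only for Software not seen before
            software = new Software { Id = id, Name = "Software #" + id };
            _tracked[id] = software;
        }
        return software;
    }
}

public static class QueryCountDemo
{
    // Walks the licences the way Listing 1 does and returns the number
    // of simulated round trips.
    public static int Run()
    {
        var context = new FakeContext();
        foreach (var licence in context.LoadLicences())
        {
            var software = context.GetSoftware(licence.SoftwareId);
            Console.WriteLine("{0} - {1}", licence.LicenceKey, software.Name);
        }
        return context.QueryCount;
    }

    public static void Main()
    {
        Console.WriteLine("Queries issued: {0}", Run());
    }
}
```

With thousands of licences pointing at thousands of distinct related entities, the same loop would issue thousands of simulated queries, which is the trap described above.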

Anyhow, that’s not to say Lazy Loading should never be used. There may certainly be circumstances where you want to take advantage of it. You may want to only access the related property for a certain Licence (see Listing 2). But just be cognisant of the potential for multiple SQL statements and be sure to perform an analysis of whether this is preferred to Eager Loading (coming up next).

Listing 2

foreach (var licence in context.Licences)
{
    if (licence.Id == 2)
    {
        // This only gets lazy loaded for the licence with an id of 2
        Console.WriteLine("{0} - {1} - {2}",
            licence.LicenceKey,
            licence.Software.Name,
            licence.Software.Description
            );
    }
}

Lazy loading even works where a database retrieval has been forced (see Listing 3). You will recall that an IQueryable does not actually hit the database until it is told to do so (Deferred Execution). So this means you could actually force an execution and then rely on lazy loading to access related entities. Menu item 4 in the example code does exactly that:

Listing 3

private static void LazyLoadingAfterQueryHasBeenForced()
{
    using (var context = new LicenceTrackerContext())
    {
        var licences = context.Licences.ToList(); // First Database round trip is here    

        foreach (var licence in licences) // Further round trips happen in here
        {
            Console.WriteLine("{0} - {1} - {2}",
                licence.LicenceKey,
                licence.Software.Name,
                licence.Software.Description
                );
        }
    }
}
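
For readers who have not met deferred execution before, the idea can be shown with plain LINQ to Objects. This is an analogy only: the LicenceIds source and the SourceReads counter are hypothetical stand-ins for a table and its round trips, and no EF or database is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static int SourceReads;

    // Stands in for a table. The counter goes up each time the source
    // is actually enumerated, not when the query is composed.
    private static IEnumerable<int> LicenceIds()
    {
        SourceReads++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Returns the counter value after each step.
    public static int[] Run()
    {
        SourceReads = 0;

        var query = LicenceIds().Where(id => id > 1); // composed, not yet run
        int afterCompose = SourceReads;

        var list = query.ToList();                    // forces execution
        int afterToList = SourceReads;

        foreach (var id in list) { }                  // iterates the in-memory list
        int afterListLoop = SourceReads;

        foreach (var id in query) { }                 // iterating the query runs it again
        int afterQueryLoop = SourceReads;

        return new[] { afterCompose, afterToList, afterListLoop, afterQueryLoop };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

Composing the query reads nothing, ToList() reads the source once, looping over the resulting list reads nothing further, and looping over the query itself reads the source again.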

Eager Loading

Eager loading is basically the opposite idea. Rather than waiting for some condition to arise for us to load the related property, we load all related entities anyway; we issue instructions to the Entity Framework to load those related entities up front. This results in a single SELECT command being sent to the database, usually with all kinds of joins, depending on how deep in the object graph you are eager loading. It results in the whole object graph materializing all in one go (during one of the normal operations which tell EF to hit the database, i.e., an iteration, a call to ToList() or Single(), etc.).

The code looks very similar to before, except with the use of the Include extension method:

Listing 4

using (var context = new LicenceTrackerContext())
{
    //  Include brings in the related Software property of the SoftwareFile objects.
    //  It does not matter whether the property is marked virtual or not.
    //  The lambda overload of Include is an extension method in the System.Data.Entity namespace.
    context.SoftwareFiles.Include(s => s.Software).ToList()
        .ForEach(s => Console.WriteLine("{0} - {1}", s.Software.Name, s.Software.Description));
}

In that example, we are navigating through the Software navigation property of the SoftwareFile object. By virtue of using Include, we are able to instruct EF to use one query statement to retrieve the data, regardless of how many items the ForEach call iterates over. And the lambda overload of Include gives us strong typing and the immeasurably useful IntelliSense, which reassures us that we are on the right path.

If you are joining the dots in your mind, it is obvious that the development team I've referred to earlier should have been using eager loading, rather than lazy loading. It would have returned a big dataset, but in only the 1 database hit.

I’ll take a look at some more complex scenarios involving eager loading later in the article. Before that, I will cover explicit loading.

Explicit Loading

Explicit loading is exactly that; explicit. Unlike lazy loading, there is no ambiguity or possibility of confusion about when a query is run. In the next example, I will retrieve a single instance of a LicenceAllocation and explicitly load the Person navigation property. Explicit loading is done by virtue of the Entry method of the context:

Listing 5

private static void ExplicitRelatedProperty()
{
    using (var context = new LicenceTrackerContext())
    {
        //  Don't need proxies when explicitly loading. 
        context.Configuration.ProxyCreationEnabled = false;
 
        var licenceAllocation = context.LicenceAllocations.Single(la => la.Id == 1);
 
        context.Entry(licenceAllocation)
            .Reference(la => la.Person)
            .Load();
 
        Console.WriteLine("This Licence allocation is allotted to {0} {1}", 
            licenceAllocation.Person.FirstName,
            licenceAllocation.Person.LastName);
    }
}

Entry returns an object of DbEntityEntry type. This has a number of public methods, but the two which we are interested in for the purposes of explicitly loading a related entity are Reference and Collection. In the case where the navigation property is not a collection, you can explicitly load the related entity by calling the Reference method and passing a lambda, as depicted in Listing 5. Following that and continuing with the fluent API, you call the Load method to actually invoke the query and load the entity.

And, for navigation properties which are collections, use the Collection method of the DbEntityEntry object. Everything else is the same as for non-collection, navigation properties:

Listing 6

private static void ExplicitRelatedCollectionProperty()
{
    using (var context = new LicenceTrackerContext())
    {
        //  Don't need proxies when explicitly loading. 
        context.Configuration.ProxyCreationEnabled = false;

        var softwareType = context.SoftwareTypes.Single(st => st.Id == 2);

        context.Entry(softwareType)
            .Collection(st => st.SoftwareProducts)
            .Load();

        Console.WriteLine("This SoftwareType has the following {0} products:", 
                           softwareType.SoftwareProducts.Count);

        foreach (var softwareProduct in softwareType.SoftwareProducts)
        {
            Console.WriteLine("{0}", softwareProduct.Name);
        }
    }
}

You don’t tend to see explicit loading anywhere near as much as the other kinds. People tend to abstract away the data context behind a repository interface and I’ve never seen explicit loading exposed in a repository. But that’s just anecdotal, based on my experience.

Going a Little Deeper with Include

We saw above how you could load the related property of an entity upfront with the Include method that uses a lambda to specify which property is to be included:

context.SoftwareFiles.Include(s => s.Software)

What happens if we want to go deeper into the graph? For example, we are starting with the Licences set of entities and for each Licence we want to eager load the Software, and for each Software entity, we want to eager load its Type navigation property (SoftwareType entity). That’s going to look something like Listing 7:

Listing 7

using (var differentContext = new LicenceTrackerContext())
{
    //  When using Include, no proxies are required.
    differentContext.Configuration.ProxyCreationEnabled = false;
    
    foreach (var licence in differentContext.Licences.Include(l => l.Software.Type))
    {
        Console.WriteLine("{0} - {1} - {2} - {3}", 
            licence.LicenceKey, 
            licence.Software.Name,
            licence.Software.Description,
            licence.Software.Type.Description);
    }
}

You can see there how I have just gone 1 level deeper in the object graph, by accessing the Type property of Software in the lambda expression. That's pretty simple, but things get a little different when it comes to collections.

Let’s see how that’s done with an example that has us starting from the People entity. Our objective is to navigate through the graph from People to Software. Looking at it from that perspective, we can see that we have the LicenceAllocations collection of each Person entity to go through. The syntax for that is:

Listing 8

private static void EagerLoadingThroughCollections()
{
 
    using (var context = new LicenceTrackerContext())
    {
        context.People.Include(p => p.LicenceAllocations.Select(la => la.Licence.Software)).ToList()
            .ForEach(p =>
            {
                if (p.LicenceAllocations.Any())
                    Console.WriteLine("{0} - {1}", 
                        p.LicenceAllocations.First().Licence.Software.Name,
                        p.LicenceAllocations.First().Licence.LicenceKey
                        );
            });
    }
}

So for each LicenceAllocation collection, the LINQ Select extension method is called and passed in a lambda for the property of each LicenceAllocation that we want to eager load. And I've stepped through the Licence navigation property to access the Software entity for each Licence. As always with Include, that results in just the 1 statement being executed against the database. You can use Profiler to check it out.

But what about when you have gone down a path in the graph to a certain level, and you want to eager load something on another branch in the graph, which is some way down the path you previously eager loaded? That was a mouthful, so I’ll clarify that with an example. Let’s say you have 1 Include invocation in your query so far, which starts from the Licences entity and eager loads through LicenceAllocations to People:

Licences -> LicenceAllocations -> People

Now, in the same query, you also want to eager load through LicenceAllocations to SoftwareFiles:

Licences -> LicenceAllocations -> Licence -> Software -> SoftwareFiles

So, your navigation down the path in the object graph effectively branches at the LicenceAllocation object.

The Include calls required for that query would be as set out in Listing 9:

Listing 9

private static void EagerLoadingThroughCollectionsAgain()
{
    using (var context = new LicenceTrackerContext())
    {
        context.Configuration.ProxyCreationEnabled = false;
        
        var licencesQuery = context.Licences
            .Include(l => l.LicenceAllocations.Select(la => la.Person))
            .Include(l => l.LicenceAllocations.Select(la => la.Licence.Software.SoftwareFiles));

        foreach (var licence in licencesQuery)
        {
            licence.LicenceAllocations
                .Select(l => 
                    string.Concat(
                        l.Person.LastName, ", ", 
                        l.Person.FirstName, " - ", 
                        l.Licence.Software.SoftwareFiles.Select(sf => sf.FileName).First()))
                .ToList()
                .ForEach(Console.WriteLine);
        }
    }
}

So you can see the tactic there is not to try and attack both paths in a single Include call, but to actually use 2 Include calls, as they "terminate" on different paths.

Conclusion

ORM frameworks all have their own, distinct ways of transmogrifying flat, SQL result-sets into object-graphs which more closely resemble the domain at hand. This article has been an introduction to the ways in which EF performs that function.

Each alternative discussed has both pros and cons; trade-offs, if you will. As such, some will suit certain scenarios better than others.

This article has also discussed how proxies can be relevant in the mix, but noted that there are several things to take into account in the decision to enable/disable them. The decision not to use lazy loading is not the only thing to ponder in that regard.

Finally, know what is going on. If you are seeing something out of the ordinary, or wondering why a query takes forever, spin up SQL Server Profiler to see exactly what SQL is being generated by LINQ-to-Entities. (If you are working on a Web application using ASP.NET, there's a Glimpse plugin which also shows you the SQL which is generated). See how many times statements are being executed. Make the black box white.

History

Article

Version	Date	Summary
1.0	25th July, 2014	Original published article

Code

Version	Date
1.0	25th July, 2014

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)