Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Improving Entity Framework Query Performance Using Graph-Based Querying

4.88/5 (14 votes)
29 Aug 2011CPOL14 min read 97K  
This document presents a new, graph-based, way for expressing and executing Microsoft ADO.Net Entity Framework queries. Using an extensive performance comparison, we show that Graph-based Querying (GBQ) easily outperforms traditional querying with LINQ in terms of expressiveness and performance.

Introduction

Microsoft ADO.Net Entity Framework (EF) has some limitations in querying complex data models which result in sub-optimal performance and code complexity (which itself leads to software maintenance concerns).

Inheritance has probably the largest (negative) impact on performance in EF. In case of non-trivial inheritance hierarchies (using TPT), the performance of EF queries drops significantly. This is well-known and Microsoft is working on it (see the June 2011 CTP release of EF). TPT leads to very complex SQL queries that take a lot of resources to construct and to execute. We've seen examples where EF was not even able to construct a query at all.

Another issue is that specifying queries spanning graphs of entities is difficult in LINQ. Consider for example the following structure:

image001.png

Figure 1. An Entity model with a graph structure.

A LINQ query that includes all entities would look something like:

ObjectContext.OSet
   .Inlude("E00.E10.E20")
   .Inlude("E00.E10.E21")
   .Inlude("E00.E11.E22")
   .Inlude("E00.E11.E23")
   .Inlude("E00.A00.A10.B00")
   .Inlude("E00.A00.A11")
   .Inlude("E00.A00.A12")

A query spanning a graph of associated entities consists of the minimum set of paths from a given root ('E00' in this example) that defines that graph. This is a rather verbose and error prone way to define what should be included in a query: it includes redundant information and paths components are untyped.

Things get worse if a query includes inheritance and associations, because LINQ is not expressive enough. For instance, the following figure shows an entity model with inheritance.

image002.png

Figure 2. Depth3Assoc1 model. 1

Inheritance is depicted in this figure as dashed edges, associations as solid edges.

Given such a graph, it is no longer possible to fetch the data using a single query. The reason is that we cannot specify a path from 'E00' to 'B00'. That is, the query 'ObjectContext.OSet.Include("E00.A00.B00")' will raise an exception because there is no association between 'A00' and 'B00' (only from 'A10' to 'B00'). The query 'ObjectContext.OSet.Include("E00.A00.A10.B00")' also doesn't work because the inheritance relation between 'A00' and 'A10' cannot be expressed in LINQ query.

These limitations in combination with complex data models easily lead to badly-performing queries that are hard to understand and maintain. In this document we briefly present an alternative query mechanism that is designed for querying complex structures. We present an in-depth performance comparison that shows that this mechanism quickly pays-off in terms of performance and complexity.

Graph-Based Querying

Graph-based querying (GBQ) is a new approach for querying complex structures from a relational database. If your data model uses TPT inheritance and/or if you are using multiple SQL queries to fetch a graph of related data (such as in Figure 2), GBQ might be an option for you. Figure 3 shows an example of the tables, associations, and inheritance relations that are involved in a typical query that GBQ is designed for.

image003.png

Figure 3. Obfuscated graph representation of a complex query. GBQ is designed for queries as complex as this.

Instead of writing a complex SQL query, or worse, a sequence of SQL queries in order to get all the data from your database, with GBQ you define the shape of the graph of data you are interested in (i.e., the entity types and their relations), and then you call the 'Load()' method on this shape. For example:

var shape = new EntityGraphShape4SQL(ObjectContext)
    .Edge<O, E00>(x => x.E00Set)
    .Edge<E00, A00>(x => x.A00Set)
    .Edge<A10, B00>(x => x.B00Set);

This example shows how you can define a strongly-typed entity graph shape that covers the Depth3Assoc1 data model. You can now fetch all data described in this shape for all owner entities as follows:

shape.Load<O>();

If you would rather load the data only for a single owner object, then, given a primary key, you could issue:

var owner = new O { Id  = <my owner id> };
shape.Load(owner);

Edges in a shape definition may start anywhere in an inheritance hierarchy. This way you can easily express associations from some sub type to another entity. For example, to express that 'A10' has an association with 'B00' (see Figure 2), we define the following edge:

.Edge<A10, B00>(x => x.B00Set)

As you can see, entity graph shapes are concise and declarative, making it easy to define (and maintain) your queries. Moreover, shapes are objects that you can use anywhere in your program (checkout http://riaservicescontrib.codeplex.com/wikipage?title=EntityGraph to learn more about what you can do with entity graphs). Last but not least, GBQ uses the information in the shape definition to synthesize very efficient SQL queries. Consequently, fetching data from an SQL database according to a shape definition is efficient. This document demonstrates how efficient it is by showing that it easily outperforms traditional LINQ queries in EF.

GBQ is an extension to EF. For querying it completely bypasses EF by generating T-SQL and materializing result sets itself. The resulting entities are attached to an EF object context, so that they are further managed by EF. This means that GBQ does not replace Entity Framework. On the contrary, it is designed to work together with EF and is to be used only where EF queries do not satisfy. GBQ uses the Meta model of an EF model to analyze types, store mappings and so on for generating T-SQL and materializing the resulting data.

The Test

To benchmark the performance of GBQ we measure its performance for different data models of increasing complexity and for different population sizes. We compare the results with corresponding EF queries.

The Data Models

GBQ is designed to improve performance for TPT inheritance models with complex association structures. To test the GBQ performance for such models, we created 5 EF models of increasing complexity. Each data model defines an inheritance hierarchy rooted at the entity type 'E00'. The inheritance hierarchy is structured as a binary tree (except for the data model 'Depth5Wide', which has more than 2^(depth-1) nodes at its leaves). The models are of increasing depth, ranging from a depth of 3 (consisting of 4 concrete and 3 abstract base types) to 6 (consisting 32 concrete and 31 abstract base types). The models have an owning entity type 'O' that defines an association with 'E00'. The simplest model (Depth3) is depicted below:

image005.png

Figure 4. Depth3 model.

To also benchmark the performance for data models with associations, we created two flavors for each model: a model with and a model without associations. For two models with associations, we even created variants with 1, 2, and 3 associations. So in total we have 14 different data models. The following figure shows the simplest inheritance model with 3 associations (Depth3Assoc3).

image006.png

Figure 5. Depth3Assoc3 model.

Associations are depicted in this figure as solid edges, while dashed edges denote inheritance. These associations are pretty simple. In practice, such simple association structures are not very realistic. So, the performance gain of GBQ that we will show will be even greater in more realistic scenarios.

There are two important aspects of the associations:

  1. They form an inheritance hierarchy by themselves, rooted at the entity type 'A00'.
  2. Sub types of 'A00' have associations themselves.

This is important because data structures like this cannot be covered in a single EF query using the Include mechanism. That is, you can define a query like ObjectContext.E00Set.Include("A00Set"), but you cannot express to include 'B00Set' for the elements of type 'A10'.

Queries

We will compare GBQ queries with two forms EF queries.

EFSingleQuery

The first form is a single query and has the following form:

ObjectContext.E00Set.Where(x => x.OId == selectedEntityId)

Tis query obtains the elements of type 'E00' for a given owner with key 'selectedEntityId'. We populate the database in such a way that there is exactly one E00 object for each owner object. The result of this query is a single element of a type derived from 'E00'. EF takes care of joining the corresponding tables (in TPT each type has its own data base table), and instantiating a proper subtype of 'E00'. The queries for the models with associations have the form:

ObjectContext.E00Set.Where(x => x.OId == selectedEntityId).Include("A00Set")

This will not only fetch and materialize an instance of a subtype of 'E00', but also the associated entities of type 'A00'. As, we already indicated, this include mechanism is not expressive enough to indicate that also the 'B00' entities associated with 'A10' entities should be fetched. Consequently, this query is incomplete for the data models with associations.

EFMultiQuery

The second form of EF query is a multi-query.

One of the problems with the single EF query is that it performs not very well for complex inheritance hierarchies. This is because EF generates a very complex SQL query spanning all tables of the TPT hierarchy. An alternative approach is to separate this single query in multiple queries, one for each concrete type. Each query has the following form:

ObjectContext.E00Set.OfType<ConcreteType>().Where(x => x.OId == selectedEntityId)

ConcreteType denotes a concrete subtype of 'E00'. For the data model 'Depth3' we would get 4 queries:

ObjectContext.E00Set.OfType<E20>().Where(x => x.OId == selectedEntityId);
ObjectContext.E00Set.OfType<E21>().Where(x => x.OId == selectedEntityId);
ObjectContext.E00Set.OfType<E22>().Where(x => x.OId == selectedEntityId);
ObjectContext.E00Set.OfType<E23>().Where(x => x.OId == selectedEntityId);

Up till EF 4.0 this was a much faster query than the single query, although performance dropped for larger inheritance hierarchies. Since the June 2011 CTP release of EF, the single query is much faster (although the performance drops again for larger inheritance hierarchies). For the models with associations we append '.Include("A00Set")' to each query. To also fetch the 'B00' entities, we execute a join between 'E00', 'A10' and 'B00' entities:

var q1 = ObjectContext.OSet.Where(o => o.Id == SelectEntityId)
  .Join(ObjectContext.E00Set, o => o.Id, e00 => e00.OId, (o, e00) => e00)
  .Join(ObjectContext.A00Set.OfType<A10>(), e00 => e00.Id, a10 => a10.E00Id, (e00, a10) => a10)
  .Join(ObjectContext.B00Set, a10 => a10.Id, b00 => b00.A10Id, (a10, b00) => b00);
q1.ToList();

Observe the complexity of this query for this simple data model. Also observe that expressing such queries really becomes a nightmare if more associated data should be fetched (e.g., for the data models Depth3Assoc2 and Depth3Assoc3). We've experimented with alternative queries but these perform worse and are even more verbose. According to the posts here and here, there doesn't seem to be a good solution for this. I welcome alternative queries that form an improvement over the ones above.

GraphBasedQuery

The GBQ query consists of defining the shape of the entity graph that should be fetched from the database, and then calling the 'Load()' method on this shape. For the model without associations, this looks like:

var shape = new EntityGraphShape4SQL(ObjectContext)
              .Edge<O, E00>(x => x.E00Set);
shape.Load();

For the models with associations, we simply add the associations as edges to the graph shape:

var shape = new EntityGraphShape4SQL(ObjectContext)
                .Edge<O, E00>(x => x.E00Set)
                .Edge<E00, A00>(x => x.A00Set)
                .Edge<A10, B00>(x => x.B00Set);
shape.Load();

Observe how we express that all 'B00's should be fetched for entities of type 'A10'.

Populations

We test the queries for 10 different populations with different numbers of 'E00' entities, ranging from 100 to 1,000. For each concrete subtype of 'E00' we create an equal number of instances. For each 'E00' entity we create 75 instances of 'A00' (25 instances of 'A10' and 25 instances of 'A11', and 25 instances of 'A12'). For each instance of 'A10' we create 5 instances of 'B00'. Likewise, we create 5 instances of 'C00' and 5 instances of 'D00'. For each combination of a data model and a population, we generate a separate data base. So, our tests span 140 different data bases. The number of objects in the database range from 400 in Depth3 model (see Figure 4) to more than 540,000 in Depth6Assoc3 model (see Figure 6). The number of tables range from 8 to 71.

image005a-small.png

Figure 6. Depth6Assoc3 model.

Test Execution

For each of the 14 data models we have the three different queries. These queries are executed for each population. This means we have 42 queries that we execute over 10 populations. This gives 420 different query results.

We run a separate benchmark for each of the three query types. This gives 14 different queries per run over 10 different populations. Each combination of a query for a particular population forms a benchmark test. This gives 140 different benchmark tests per run. The outcome of a benchmark test forms a sample. Each run consists executing 14,000 benchmark tests, resulting in an equal number of samples. On average, each benchmark test is executed 100 times.

Each benchmark test is randomly selected from the set of 140. Each test consists of querying data for an 'owner' object. To minimize effects of result caching, we randomly select an owner object for each benchmark test run.

For each benchmark test we measure the time to complete. After completion, we record the average execution time for each benchmark test. From these average numbers we create the performance graphs that we will discuss later on.

EF requires quite some resources for the first execution of a query. To minimize the impact of this startup time in the benchmark results, we ignore the timings of each first run of a benchmark test. GBQ doesn't have any specific initialization. To prevent auto loading of data, we set the 'LazyLoadingEnabled' flag to 'false'.

Both EF and GBQ have a query plan caching mechanism which improves performance by caching compiled queries. If this mechanism is switched off, there is an immediate performance degrade of EF compared to GBQ. We therefore leave query plan caching switched on to make the performance comparison more challenging for GBQ.

We use EF 4.1 June 2011 CTP release for benchmarking. This release includes a significant performance improvement of TPT queries.

We run the benchmarks on a Dell Latitude E6520 Essential laptop, which has an Intel Core I7-2720QM (2.2 GHz) CPU and 8 Gb memory. On this computer we installed the 64 bits version Windows 7, Visual Studio 2010 SP1, and the June 2011 CTP of Entity framework. For the benchmarking we are not so much interested in the absolute time that is needed for the test runs to complete, but for the relative time difference between the different queries, for the different population and data models. In the end what we want to study is for what complexity of data model and/or population size GBQ starts to pay off.

Test Results

Data Models with Inheritance Only

In this section we present the results of querying the inheritance-only data models. The figures show that GBQ easily outperforms the other two forms of querying. Since the EF 4.1 June 2011 CTP release, the EF single query is much faster than the EF multi query for all tested data models. This used to be the other way around in previous EF versions. The performance of both query types significantly degrades when the inheritance hierarchy grows. Remarkably, the performance of the EFSingleQuery queries significantly drops for larger inheritance hierarchies. So much, that it becomes comparable with the performance of the EFMultiQuery queries. GBQ performance is much less dependent on this form of data model complexity.

Depth 3 Model

image007.png

image008.png

Depth4 Model

image009.png

image010.png

Depth 5 Model

image011.png

image012.png

Depth5Wide Model

image013.png

image014.png

Depth6 Model

image015.png

image016.png

Data Models with Inheritance and Associations

As indicated, it is not possible in EF to express in a single query that all association should be fetched. Therefore, the EFSingleQuery queries below return incomplete results. These queries are included in the figures below just for illustration. Their timings cannot be used for a valid comparison with the other two query types.

Below is the EFMultiQuery for the Depth3Assoc1 model. Observe the join query that is needed to fetch the 'B00' entities associated with 'A10' entities. If more associations should be fetched as well, the complexity of the query increases significantly. Also observe that we have separate queries for each concrete sub type, 'E20', 'E21', 'E22', and 'E23'. Each is a form of code duplication. When the number of concrete types increases, the number of queries increases equally.

ObjectContext.E00Set.OfType<E20>().Include("A00Set").Where(x => x.OId == selectedEntityId).ToList();
ObjectContext.E00Set.OfType<E21>().Include("A00Set").Where(x => x.OId == selectedEntityId).ToList();
ObjectContext.E00Set.OfType<E22>().Include("A00Set").Where(x => x.OId == selectedEntityId).ToList();
ObjectContext.E00Set.OfType<E23>().Include("A00Set").Where(x => x.OId == selectedEntityId).ToList();
var q1 = ObjectContext.OSet.Where(o => o.Id == SelectEntityId)
  .Join(ObjectContext.E00Set, o => o.Id, e00 => e00.OId, (o, e00) => e00)
  .Join(ObjectContext.A00Set.OfType<A10>(), e00 => e00.Id, a10 => a10.E00Id, (e00, a10) => a10)
  .Join(ObjectContext.B00Set, a10 => a10.Id, b00 => b00.A10Id, (a10, b00) => b00);
q1.ToList();

Below is the corresponding GBQ query:

var entity = new O { Id = SelectEntityId };

var shape = new EntityGraphShape4SQL(ObjectContext)
    .Edge<O, E00>(x => x.E00Set)
    .Edge<E00, A00>(x => x.A00Set)
    .Edge<A10, B00>(x => x.B00Set);

shape.Load(entity);

In this query you just define the shape of a graph of associated entities by enumerating its edges. Observe that edges can be defined on sub types (e.g., from 'A10' to 'B00'). Furthermore, in contrast to the EF query, the GBQ query remains the same for the different data models used in this section. The corresponding EF queries need to be adapted to add additional queries for sub types and to add new joins for additional associations.

Depth3Assoc1  Model

image017.png

image018.png

Depth3Assoc2  Model

image019.png

image020.png

Depth3Assoc3  Model

image021.png

image022.png

Depth4Assoc1 Model

image023.png

image024.png

Depth5Assoc1 Model

image025.png

image026.png

Depth5WideAssoc1 Model

image027.png

image028.png

Depth6Assoc1 Model

image029.png

image030.png

Depth6Assoc2 Model

image031.png

image032.png

Depth6Assoc3 Model

image033.png

image034.png

Summary

GBQ is an alternative approach for database querying that bypasses EF's querying mechanism. Since the resulting entities are attached to an EF context, GBQ can be used together with EF. GBQ is typically used when EF query performance is insufficient and/or because EF queries become too complex.

The performance figures in this report clearly show that GBQ queries quickly win in terms of performance. As stated before, the performance of the EFSingleQuery queries in the association models cannot be compared with the EFMultiQuery and EFGraphQuery queries because they yield incomplete results.

The data models and corresponding queries are still relatively simple. GBQ becomes particular interesting for more complex data models where the queries include many associations (e.g., see Figure 3). In these situations query performance of EF really becomes problematic as is the case with expressing and maintaining such queries.

The resulting entities of QBG queries are attached to an EF object context. Since we cannot access any internal APIs of the object context, we are forced to use the 'AttachTo' method. It is expected that a slight performance improvement could be gained if GBQ could access the object context at a lower level.

This report compared the performance of GBQ queries with EF queries. I welcome any suggestion for improving the EF queries that I used.

More information about EntityGraph and Graph-based querying can be found at http://entitygraph.codeplex.com

[1] The data models in this document use the following naming scheme: Depth<x> denotes the depth of the inheritance hierarchy of 'E00'. Assoc<y> denotes that the data model has an association from 'E00' to 'A00'. The number 'y' denotes the number of associations (e.g., 1 denotes a single association from 'A10' to 'B00', while 2 indicates a second association, from 'A11' to 'C00'.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)