Introduction
In this article, I would like to introduce different patterns in Entity Framework around one common problem.
The problem is the following: selecting a list of entities; more precisely, given a list of IDs, we want to select the corresponding entities.
We give a different approach in this article, weighing the pros and cons of each. The different patterns are:
- Selecting each entity one at a time
- Doing a union via EF
- Doing a where-in via EF
- Joining on another table
Using the code
The code sample contains a Visual Studio 2010 Solution. This solution contains a project with a class for each pattern I give here.
The solution also contains a solution folder named "DB Scripts" with three files: SchemaCreation.sql (creates a schema with three tables into a pre-existing database), DataInsert.sql (inserts sample data in the tables), and SchemaDrop.sql (drops the tables and schema if needed).
We use the same database schema for the entire project. Here is its Entity Model representation:
Selecting each entity one at a time
This pattern is the simplest. Basically, we do a database query for each ID.
public static Order[] DoSelect(int[] ids)
{
using (var context = new EFDemoEntities())
{
var list = new List<Order>(3);
foreach (var id in ids)
{
var query = from o in context.Order.Include("OrderDetail")
where o.OrderID == id
select o;
var objectQuery = (System.Data.Objects.ObjectQuery)query;
var sql = objectQuery.ToTraceString();
list.AddRange(query);
}
return list.ToArray();
}
}
Pros:
- Very simple
- The same parametrized SQL query is sent at each query (one per ID), so the query is cached on the server
Cons:
- Scalability: there is one query per ID, therefore the more IDs, the more queries
This pattern could be acceptable if you have a few (say under 10) IDs you want to select. Above that, you'll start to notice serious performance problems because the latency with your database server is going to hit you.
Doing a union via EF
Here, we aim at packaging everything in one query in order to be scalable.
The approach is to do one query per ID, and then to do a union of all the queries and send it to the server.
public static Order[] DoSelect(int[] ids)
{
using (var context = new EFDemoEntities())
{
IQueryable<Order> query = null;
foreach (var id in ids)
{
int concreteID = id;
var queryForID =
from o in context.Order.Include("OrderDetail")
where o.OrderID == concreteID
select o;
if (query == null)
{
query = queryForID;
}
else
{
query = query.Union(queryForID);
}
}
var objectQuery = (System.Data.Objects.ObjectQuery)query;
var sql = objectQuery.ToTraceString();
return query.ToArray();
}
}
Pros:
- Scalability: only one query is sent to the server, however how many IDs
Cons:
- The parametrized SQL query depends on the number of IDs, therefore if the number IDs varies in a system, it will be hard for SQL Server to cache the query plan
- The more IDs, the longer the SQL query is going to be, which hurts scalability
This pattern would be useful only if the number of IDs is always the same.
Doing a where-in via EF
Here, we aim at packaging everything in one query in order to be scalable.
The approach is to filter the IDs on the list of input IDs. This generates a SQL statement WHERE ID IN (X, Y, Z)
.
public static Order[] DoSelect(int[] ids)
{
using (var context = new EFDemoEntities())
{
var query = from o in context.Order.Include("OrderDetail")
where ids.Contains(o.OrderID)
select o;
var objectQuery = (System.Data.Objects.ObjectQuery)query;
var sql = objectQuery.ToTraceString();
return query.ToArray();
}
}
Pros:
- Scalability: only one query is sent to the server, however how many IDs
Cons:
- The parametrized SQL query depends on the IDs themselves; since they do end up in the ad-hoc query, it will be extremely hard for SQL Server to cache the query plan
The only problem here is that the list of IDs finds itself in the ad-hoc query itself: it isn't passed in parameters. Therefore, each query with different IDs you do will be different, and the query plan will never be cached. Otherwise, the generated query is simple and only one query is sent. If you're not concerned about query-plan caching, this could be a good option.
Joining on another table
Here, we use a third table to enter the list of IDs, then we join on it. Basically, we move the ID list from the app-server to the database server.
Entity Framework should insert in batch-mode, so we should have only one round-trip for all the Inserts (actually, the batch-size is limited, so at some point, more than one round-trip will occur). The same holds true for the Deletes after the Select-Join is done. So there should be three round-trips, however the number of IDs.
public static Order[] DoSelect(int[] ids)
{
using (var context = new EFDemoEntities())
{
var batchID = Guid.NewGuid();
var tempIDs = new List<IDList>(ids.Length);
foreach (var id in ids)
{
var idItem = new IDList { BatchID = batchID, OrderID = id };
tempIDs.Add(idItem);
context.IDList.AddObject(idItem);
}
context.SaveChanges();
var query = ((from o in context.Order
join i in context.IDList on o.OrderID equals i.OrderID
where i.BatchID == batchID
select o) as ObjectQuery<Order>).Include("OrderDetail");
var sql = query.ToTraceString();
var orders = query.ToArray();
foreach (var temp in tempIDs)
{
context.IDList.DeleteObject(temp);
}
context.SaveChanges();
return orders;
}
}
Pros:
- Scalability: with batch-Insert and batch-Delete, three queries are sent to the server, however how many IDs
- The same queries are sent to the server, hence the query plan is cached
Cons:
- You need a working-table to store the IDs
- You need to clean-up that table if the Insert and Delete aren't done in the same transaction
- Writing and deleting in order to do a Select is sub-optimal
This pattern offers great scalability properties, but the complexity is pretty high.
Conclusion
We've presented four patterns to solve the problem of selecting many entities. In summary:
Pattern | Comments |
Selecting each entity one at a time
| Simplest, poor scalability |
Doing a union via EF | Poor scalability |
Doing a where-in via EF | Good scalability but forbids query-plan caching |
Joining on another table | Good scalability and query-plan caching, but pretty complex |
There is no definite winner, and this is why we presented each pattern.
The best way to handle this case would be by passing a table in a parameter to a Stored Proc. EF doesn't support Stored Procs with table parameters though, so that would need to be done outside of EF.