Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#4.0

Entity Framework Patterns: Select Multiple Entities

4.69/5 (26 votes)
16 Dec 2010Ms-PL4 min read 177.1K   1.1K  
Introduce different patterns in Entity Framework around selecting a list of entities.

Introduction

In this article, I would like to introduce different patterns in Entity Framework around one common problem.

The problem is the following: selecting a list of entities; more precisely, given a list of IDs, we want to select the corresponding entities.

We give a different approach in this article, weighing the pros and cons of each. The different patterns are:

  • Selecting each entity one at a time
  • Doing a union via EF
  • Doing a where-in via EF
  • Joining on another table

Using the code

The code sample contains a Visual Studio 2010 Solution. This solution contains a project with a class for each pattern I give here.

The solution also contains a solution folder named "DB Scripts" with three files: SchemaCreation.sql (creates a schema with three tables into a pre-existing database), DataInsert.sql (inserts sample data in the tables), and SchemaDrop.sql (drops the tables and schema if needed).

We use the same database schema for the entire project. Here is its Entity Model representation:

EntityModel.JPG

Selecting each entity one at a time

This pattern is the simplest. Basically, we do a database query for each ID.

C#
public static Order[] DoSelect(int[] ids)
{
	using (var context = new EFDemoEntities())
	{
		var list = new List<Order>(3);

		foreach (var id in ids)
		{
			var query = from o in context.Order.Include("OrderDetail")
				where o.OrderID == id
				select o;
			var objectQuery = (System.Data.Objects.ObjectQuery)query;
			var sql = objectQuery.ToTraceString();

			list.AddRange(query);
		}

		return list.ToArray();
	}
}

Pros:

  • Very simple
  • The same parametrized SQL query is sent at each query (one per ID), so the query is cached on the server

Cons:

  • Scalability: there is one query per ID, therefore the more IDs, the more queries

This pattern could be acceptable if you have a few (say under 10) IDs you want to select. Above that, you'll start to notice serious performance problems because the latency with your database server is going to hit you.

Doing a union via EF

Here, we aim at packaging everything in one query in order to be scalable.

The approach is to do one query per ID, and then to do a union of all the queries and send it to the server.

C#
public static Order[] DoSelect(int[] ids)
{
	using (var context = new EFDemoEntities())
	{
		IQueryable<Order> query = null;

		foreach (var id in ids)
		{
		    // Must assign ID to a scope-local variable,
		    // otherwise, only the last ID is used
			int concreteID = id;
			var queryForID = 
			    from o in context.Order.Include("OrderDetail")
				where o.OrderID == concreteID
				select o;

			if (query == null)
			{
				query = queryForID;
			}
			else
			{
				query = query.Union(queryForID);
			}
		}
		var objectQuery = (System.Data.Objects.ObjectQuery)query;
		var sql = objectQuery.ToTraceString();

		return query.ToArray();
	}
}

Pros:

  • Scalability: only one query is sent to the server, however how many IDs

Cons:

  • The parametrized SQL query depends on the number of IDs, therefore if the number IDs varies in a system, it will be hard for SQL Server to cache the query plan
  • The more IDs, the longer the SQL query is going to be, which hurts scalability

This pattern would be useful only if the number of IDs is always the same.

Doing a where-in via EF

Here, we aim at packaging everything in one query in order to be scalable.

The approach is to filter the IDs on the list of input IDs. This generates a SQL statement WHERE ID IN (X, Y, Z).

C#
public static Order[] DoSelect(int[] ids)
{
	using (var context = new EFDemoEntities())
	{
		var query = from o in context.Order.Include("OrderDetail")
			where ids.Contains(o.OrderID)
			select o;
		var objectQuery = (System.Data.Objects.ObjectQuery)query;
		var sql = objectQuery.ToTraceString();

		return query.ToArray();
	}
}

Pros:

  • Scalability: only one query is sent to the server, however how many IDs

Cons:

  • The parametrized SQL query depends on the IDs themselves; since they do end up in the ad-hoc query, it will be extremely hard for SQL Server to cache the query plan

The only problem here is that the list of IDs finds itself in the ad-hoc query itself: it isn't passed in parameters. Therefore, each query with different IDs you do will be different, and the query plan will never be cached. Otherwise, the generated query is simple and only one query is sent. If you're not concerned about query-plan caching, this could be a good option.

Joining on another table

Here, we use a third table to enter the list of IDs, then we join on it. Basically, we move the ID list from the app-server to the database server.

Entity Framework should insert in batch-mode, so we should have only one round-trip for all the Inserts (actually, the batch-size is limited, so at some point, more than one round-trip will occur). The same holds true for the Deletes after the Select-Join is done. So there should be three round-trips, however the number of IDs.

C#
public static Order[] DoSelect(int[] ids)
{
	using (var context = new EFDemoEntities())
	{
		var batchID = Guid.NewGuid();
		var tempIDs = new List<IDList>(ids.Length);

		foreach (var id in ids)
		{
			var idItem = new IDList { BatchID = batchID, OrderID = id };

			tempIDs.Add(idItem);
			context.IDList.AddObject(idItem);
		}
		context.SaveChanges();

		var query = ((from o in context.Order
				join i in context.IDList on o.OrderID equals i.OrderID
				where i.BatchID == batchID
				select o) as ObjectQuery<Order>).Include("OrderDetail");
		var sql = query.ToTraceString();
		var orders = query.ToArray();

		foreach (var temp in tempIDs)
		{
			context.IDList.DeleteObject(temp);
		}
		context.SaveChanges();

		return orders;
	}
}

Pros:

  • Scalability: with batch-Insert and batch-Delete, three queries are sent to the server, however how many IDs
  • The same queries are sent to the server, hence the query plan is cached

Cons:

  • You need a working-table to store the IDs
  • You need to clean-up that table if the Insert and Delete aren't done in the same transaction
  • Writing and deleting in order to do a Select is sub-optimal

This pattern offers great scalability properties, but the complexity is pretty high.

Conclusion

We've presented four patterns to solve the problem of selecting many entities. In summary:

PatternComments

Selecting each entity one at a time

Simplest, poor scalability
Doing a union via EFPoor scalability
Doing a where-in via EFGood scalability but forbids query-plan caching
Joining on another tableGood scalability and query-plan caching, but pretty complex

There is no definite winner, and this is why we presented each pattern.

The best way to handle this case would be by passing a table in a parameter to a Stored Proc. EF doesn't support Stored Procs with table parameters though, so that would need to be done outside of EF.

License

This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)