Download Visual Studio 22 project files 4.1MB
Download SQL Server Script - Create DB, Table and script for 1M records
Introduction
This document provides an overview on how to configure your Entity Framework Core (6 or 7) settings to take advantage of the performance gains that OpenJson provides when using a large SQL IN
clause of values. But without having to migrate your project to EF Core 8.
Purpose
Your project maybe in a situation, where you now need to scale up and query more data using your cached values. But if you are not able to take advantage of EF Core 8, as the migration will entail more testing and development for your team – you have the ability to avail of OpenJson while still using EF Core 6 or 7 with this NuGet Package or directly incorporating the class into your codebase.
Scope
The scope of this document is to convey the configuration steps needed to start using OpenJson with EF Core 6\7.
Prerequisites
Setting up Test Database
Run the SQL Server script above to:
- Create a new test database called LinqPerformanceDemo
- Create a new table called LargeTable
- Insert 1 million records into the table LargeTable
You should end up with the following when completed:
Benchmarks (Seconds)
Using a simple LINQ statement with a large IN clause and a small NOT IN Clause:
context.LargeTables.Where(itm => chunkData.Contains(itm.CodeName) && !_listNotIn.Contains(itm.ProjectRef)).ToListAsync();
In Clause - Batch Size | Standard EF Settings | OpenJson EF Interceptor |
20K | 151 | 204 |
30K | 396 | 157 |
40K | Timeout | 132 |
100K | Timeout | 123 |
150K | Timeout | 99 |
200K | Timeout | 98 |
300k | Timeout | 91 |
500k | Timeout | 83 |
700K | Timeout | 86 |
1M | Timeout | 105 |
At 40K, standard EF failed with a timeout on the first IN clause:
NB: The Entity Framework SQL query generator will very often create SQL statements that are complex and convoluted, especially when there is an IN\NOT IN clause involved. It will duplicate the clause with any joined tables that need to match on your values. This can often make the query fail, as SQL Server is unable to execute it, due to its complexity. But with OpenJson you can get around that issue – by using a larger number of values within your chunking approach, and multithreading with a concurrentbag to maintain your results (example code further down).
Add Interceptor Using NuGet Package OR Add Class Directly into Your Project
Add Interceptor Class Uinsg NuGet Package
The NuGet package is located here on Nuget.Org, with the Readme.md file on how to edit the EF settings within your project – choose the latest version to install.
You can add the NuGet package within Visual Studio like a normal package, into your project – search for Linq.OpenJson.InClause.Middleware and click install.
Add Interceptor Class Directly
If you open the attached Visual Studio solution you will see three projects, the project named DemoLinqContainsOpenJson is to demo and test the new EF interceptor.
The second project called LinqOpenJsonInClauseMiddleware, is the NuGet package contents and contains the interceptor override methods (ReaderExecutingAsync and ReaderExecuting).
If you want to customise the code, for example, increase the threshold as to when to use OpenJson over a normal IN\NOT IN clause, you can do this in the SqlServerReaderExecutingOpenJson
class, by setting the constant MINIMUM_ITEM_SIZE. In this snippet of code, when the interceptor finds an IN\NOT IN clause with more than 1000 values, it will transform the SQL to use an OpenJson syntax.
To integrate the SqlServerReaderExecutingOpenJson
class into your project, simply copy the class form the LinqOpenJsonInClauseMiddleware project, renaming the Namespace in the class to match your project Namespace. Then, you can edit the SQL that gets generated to match your needs.
Edit Entity Framework Core Settings to Call Custom Interceptor Middleware
You can configure the SqlServerReaderExecutingOpenJson
class, to be included in your project in two ways – choose which best suits your project.
Program.cs Class
- By adding the namespace
using LinqOpenJsonInClauseMiddleware.Classes;
to your project - Adding the line of code
options.AddInterceptors(new SqlServerReaderExecutingOpenJson
to each of your dbContext related options (standard and factory contexts below).
Your dbContext Class
- By adding the namespace
using LinqOpenJsonInClauseMiddleware.Classes;
to your project - Adding the line of code
options.AddInterceptors(new SqlServerReaderExecutingOpenJson
to the OnConfiguring
method’s options
Interceptor Code Explanation
ReaderExecutingAsync | ReaderExecuting
In the override method below, you can see we intercept the SQL that is generated by LINQ (through the EF pipeline) and replace it with our OpenJson alternative SQL statement. We call out helper method ConvertInClauseToOpenJson to transpose the original EF SQL into our custom OpenJson oriented SQL statement.
Helper Classes
ConvertInClauseToOpenJson
The ConvertInClauseToOpenJson
method will parse out the IN\NOT IN clauses and then pass the values onto the ConvertToOpenJson
method that will replace the existing SQL with the OpenJson SQL.
ConvertToOpenJson
The ConvertToOpenJson
method will then determine if the values within the IN\NOT IN clause are integers\decimals or varchar related and build up the new OpenJson SQL related to that.
You can finally see the OpenJson statement being inserted to the SQL statement and returned to the parent IN\NOT IN clause.
Examples of the Original EF SQL and the Generated OpenJson SQL
In the original SQL (with slimmed down values for brevity), you can see a standard IN clause and its values.
Then in the OpenJson generated SQL, you can see that the values are string related and that the OpenJson has correctly used the associated NVARCHAR datatype.
NB: The NOT IN clause didn’t get converted to OpenJson as its value count was less than 1k.
Varchar Example
Integer Example
Below, you can see the equivalent for Integer related values within the IN clause, and the correct OpenJson datatype (Int) is associated with the query.
How to Collate Results Using Multithreading and ConcurrentBag
By using OpenJson you can get SQL Server to execute SQL statement with a larger IN\NOT IN clause. By default, EF Core 6\7 will execute with an IN clause of value count less than 20K, and this would be the total values for the whole statement (as your IN clause can be duplicated multiple times within a single EF generated SQL statement).
In previous projects, I have taken the approach to execute OpenJson queries based on the complexity of each query. I do this by limiting the size of the chunk to create a list of values.
Then loop this list, inserting the values into the OpenJson query and executing it, I use Parallel looping to take advantage of the multiple cores a server will have available, then I simply add the results to a (thread safe) ConcurrentBag.
You will find example code to execute the above scenario within the QueryHelper class in the DemoLinqContainsOpenJson project:
The code snippet below will:
- Define your chunk size (100K), this is the size of values you want to process at a time.
- Create the level of Parallelism to use.
- Retrieve 1 million values (integer Id’s)
- Create a chunked list (based on chunk size – 1 above).
- Run a query using a background Thread, add results to your ConcurrentBag
- Each Task will loop the appropriate number of times (chunk list size)
- Wait for all tasks to complete.
const int CHUNK_SIZE = 100_000;
var tskDatabaseQueries = new List<Task>();
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 2 };
var resultBag = new ConcurrentBag<List<LargeTable>>();
List<int> cacheInt = await Retrieve1MCacheIntegerAsync();
List<char> cacheChar = await Retrieve1MCacheCharAsync();
List<double?> cacheDouble = await Retrieve1MCacheDoubleAsync();
List<string> cacheString = await Retrieve1MCacheStringAsync();
var chunkInt = cacheInt.Chunk(CHUNK_SIZE);
var chunkChar = cacheChar.Chunk(CHUNK_SIZE);
var chunkDouble = cacheDouble.Chunk(CHUNK_SIZE);
var chunkString = cacheString.Chunk(CHUNK_SIZE);
tskDatabaseQueries.Add(Task.Run(async () =>
{
await Parallel.ForEachAsync(chunkInt, parallelOptions, async (sublist, cancellationToken) =>
{
var queryResult = await RetrieveIntegerChunkAsync(sublist);
resultBag.Add(queryResult);
});
}));
tskDatabaseQueries.Add(Task.Run(async () =>
{
await Parallel.ForEachAsync(chunkChar, parallelOptions, async (sublist, cancellationToken) =>
{
var queryResult = await RetrieveCharChunkAsync(sublist);
resultBag.Add(queryResult);
});
}));
tskDatabaseQueries.Add(Task.Run(async () =>
{
await Parallel.ForEachAsync(chunkDouble, parallelOptions, async (sublist, cancellationToken) =>
{
var queryResult = await RetrieveDoubleChunkAsync(sublist);
resultBag.Add(queryResult);
});
}));
tskDatabaseQueries.Add(Task.Run(async () =>
{
await Parallel.ForEachAsync(chunkString, parallelOptions, async (sublist, cancellationToken) =>
{
var queryResult = await RetrieveStringChunkAsync(sublist);
resultBag.Add(queryResult);
});
}));
await Task.WhenAll(tskDatabaseQueries);