Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / SQL

Improve EF Core 67 Contains Performance with OpenJson (SQL Server) - NuGet Package

5.00/5 (1 vote)
28 May 2024CPOL6 min read 2.4K   9  
How Entity Framework Core 6 or 7 projects can take advantage of EF Core 8's OpenJson without having to migrate to EF Core 8 - using a custom NuGet Package for SQL Server

Download Visual Studio 22 project files 4.1MB

Download SQL Server Script - Create DB, Table and script for 1M records

Introduction

This document provides an overview on how to configure your Entity Framework Core (6 or 7) settings to take advantage of the performance gains that OpenJson provides when using a large SQL IN clause of values. But without having to migrate your project to EF Core 8.

Purpose

Your project maybe in a situation, where you now need to scale up and query more data using your cached values. But if you are not able to take advantage of EF Core 8, as the migration will entail more testing and development for your team – you have the ability to avail of OpenJson while still using EF Core 6 or 7 with this NuGet Package or directly incorporating the class into your codebase.

Scope

The scope of this document is to convey the configuration steps needed to start using OpenJson with EF Core 6\7.

Prerequisites

Setting up Test Database

Run the SQL Server script above to:

  1. Create a new test database called LinqPerformanceDemo
  2. Create a new table called LargeTable
  3. Insert 1 million records into the table LargeTable

You should end up with the following when completed:

Image 1

Benchmarks (Seconds)

Using a simple LINQ statement with a large IN clause and a small NOT IN Clause:

context.LargeTables.Where(itm => chunkData.Contains(itm.CodeName) && !_listNotIn.Contains(itm.ProjectRef)).ToListAsync();

In Clause - Batch Size Standard EF Settings OpenJson EF Interceptor
20K 151 204
30K 396 157
40K Timeout 132
100K Timeout 123
150K Timeout 99
200K Timeout 98
300k Timeout 91
500k Timeout 83
700K Timeout 86
1M Timeout 105

At 40K, standard EF failed with a timeout on the first IN clause:

Image 2

NB: The Entity Framework SQL query generator will very often create SQL statements that are complex and convoluted, especially when there is an IN\NOT IN clause involved. It will duplicate the clause with any joined tables that need to match on your values. This can often make the query fail, as SQL Server is unable to execute it, due to its complexity. But with OpenJson you can get around that issue – by using a larger number of values within your chunking approach, and multithreading with a concurrentbag to maintain your results (example code further down).

Add Interceptor Using NuGet Package OR Add Class Directly into Your Project

Add Interceptor Class Uinsg NuGet Package

The NuGet package is located here on Nuget.Org, with the Readme.md file on how to edit the EF settings within your project – choose the latest version to install.

Image 3

You can add the NuGet package within Visual Studio like a normal package, into your project – search for Linq.OpenJson.InClause.Middleware and click install.

Image 4

 

Image 5

Image 6

 

Add Interceptor Class Directly

If you open the attached Visual Studio solution you will see three projects, the project named DemoLinqContainsOpenJson is to demo and test the new EF interceptor.

The second project called LinqOpenJsonInClauseMiddleware, is the NuGet package contents and contains the interceptor override methods (ReaderExecutingAsync and ReaderExecuting).

Image 7

If you want to customise the code, for example, increase the threshold as to when to use OpenJson  over a normal IN\NOT IN clause, you can do this in the SqlServerReaderExecutingOpenJson class, by setting the constant MINIMUM_ITEM_SIZE. In this snippet of code, when the interceptor finds an IN\NOT IN clause with more than 1000 values, it will transform the SQL to use an OpenJson syntax.

Image 8

To integrate the SqlServerReaderExecutingOpenJson class into your project, simply copy the class form the LinqOpenJsonInClauseMiddleware project, renaming the Namespace in the class to match your project Namespace. Then, you can edit the SQL that gets generated to match your needs.

Edit Entity Framework Core Settings to Call Custom Interceptor Middleware

You can configure the SqlServerReaderExecutingOpenJson class, to be included in your project in two ways – choose which best suits your project.

Program.cs Class

  1. By adding the namespace using LinqOpenJsonInClauseMiddleware.Classes; to your project
  2. Adding the line of code options.AddInterceptors(new SqlServerReaderExecutingOpenJson to each of your dbContext related options (standard and factory contexts below).

Image 9

Your dbContext Class

  1. By adding the namespace using LinqOpenJsonInClauseMiddleware.Classes; to your project
  2. Adding the line of code options.AddInterceptors(new SqlServerReaderExecutingOpenJson to the OnConfiguring method’s options

Image 10

Interceptor Code Explanation

ReaderExecutingAsync | ReaderExecuting

In the override method below, you can see we intercept the SQL that is generated by LINQ (through the EF pipeline) and replace it with our OpenJson alternative SQL statement. We call out helper method ConvertInClauseToOpenJson to transpose the original EF SQL into our custom OpenJson oriented SQL statement.

Image 11

Helper Classes

ConvertInClauseToOpenJson

The ConvertInClauseToOpenJson method will parse out the IN\NOT IN clauses and then pass the values onto the ConvertToOpenJson method that will replace the existing SQL with the OpenJson SQL.

Image 12

ConvertToOpenJson

The ConvertToOpenJson method will then determine if the values within the IN\NOT IN clause are integers\decimals or varchar related and build up the new OpenJson SQL related to that.

You can finally see the OpenJson statement being inserted to the SQL statement and returned to the parent IN\NOT IN clause.

Image 13

Examples of the Original EF SQL and the Generated OpenJson SQL

In the original SQL (with slimmed down values for brevity), you can see a standard IN clause and its values.

Then in the OpenJson generated SQL, you can see that the values are string related and that the OpenJson has correctly used the associated NVARCHAR datatype.

NB: The NOT IN clause didn’t get converted to OpenJson as its value count was less than 1k.

Varchar Example

Image 14

Integer Example

Below, you can see the equivalent for Integer related values within the IN clause, and the correct OpenJson datatype (Int) is associated with the query.

Image 15

How to Collate Results Using Multithreading and ConcurrentBag

By using OpenJson you can get SQL Server to execute SQL statement with a larger IN\NOT IN clause. By default, EF Core 6\7 will execute with an IN clause of value count less than 20K, and this would be the total values for the whole statement (as your IN clause can be duplicated multiple times within a single EF generated SQL statement).

In previous projects, I have taken the approach to execute OpenJson queries based on the complexity of each query. I do this by limiting the size of the chunk to create a list of values.

Then loop this list, inserting the values into the OpenJson query and executing it, I use Parallel looping to take advantage of the multiple cores a server will have available, then I simply add the results to a (thread safe) ConcurrentBag.

You will find example code to execute the above scenario within the QueryHelper class in the DemoLinqContainsOpenJson project:

The code snippet below will:

  1. Define your chunk size (100K), this is the size of values you want to process at a time.
  2. Create the level of Parallelism to use.
  3. Retrieve 1 million values (integer Id’s)
  4. Create a chunked list (based on chunk size – 1 above).
  5. Run a query using a background Thread, add results to your ConcurrentBag
  6. Each Task will loop the appropriate number of times (chunk list size)
  7. Wait for all tasks to complete.
C#
const int CHUNK_SIZE = 100_000;            
var tskDatabaseQueries = new List<Task>(); // maintain the running background tasks
var parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 2 };
var resultBag = new ConcurrentBag<List<LargeTable>>(); // ConcurrentBag to hold all results safely

// mock a client that uses a cache of data to filter against - retrieve cached data (1M in each collection) - data will be duplicated in cases like char!
List<int> cacheInt = await Retrieve1MCacheIntegerAsync();
List<char> cacheChar = await Retrieve1MCacheCharAsync();
List<double?> cacheDouble = await Retrieve1MCacheDoubleAsync();
List<string> cacheString = await Retrieve1MCacheStringAsync();

// chunk data into 100K batches for looping
var chunkInt = cacheInt.Chunk(CHUNK_SIZE);
var chunkChar = cacheChar.Chunk(CHUNK_SIZE);
var chunkDouble = cacheDouble.Chunk(CHUNK_SIZE);
var chunkString = cacheString.Chunk(CHUNK_SIZE);

// make a linq call using Integer as the IN clause
tskDatabaseQueries.Add(Task.Run(async () =>
{
    await Parallel.ForEachAsync(chunkInt, parallelOptions, async (sublist, cancellationToken) =>
    {
        var queryResult = await RetrieveIntegerChunkAsync(sublist);
        resultBag.Add(queryResult);
    });
}));

// make a linq call using Char as the IN clause
tskDatabaseQueries.Add(Task.Run(async () =>
{
    await Parallel.ForEachAsync(chunkChar, parallelOptions, async (sublist, cancellationToken) =>
    {
        var queryResult = await RetrieveCharChunkAsync(sublist);
        resultBag.Add(queryResult);
    });
}));

// make a linq call using Double as the IN clause
tskDatabaseQueries.Add(Task.Run(async () =>
{
    await Parallel.ForEachAsync(chunkDouble, parallelOptions, async (sublist, cancellationToken) =>
    {
        var queryResult = await RetrieveDoubleChunkAsync(sublist);
        resultBag.Add(queryResult);
    });
}));

// make a linq call using String as the IN clause
tskDatabaseQueries.Add(Task.Run(async () =>
{
    await Parallel.ForEachAsync(chunkString, parallelOptions, async (sublist, cancellationToken) =>
    {
        var queryResult = await RetrieveStringChunkAsync(sublist);
        resultBag.Add(queryResult);
    });
}));

// Wait for all tasks to complete
await Task.WhenAll(tskDatabaseQueries);

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)