Click here to Skip to main content
65,938 articles
CodeProject is changing. Read more.
Articles / Languages / C#

Slow LINQ Methods

4.65/5 (7 votes)
27 Jun 2014CPOL4 min read 27.7K   69  
LINQ methods can be slow inside loops as this little demo shows.

Introduction

Ever had problems with slow execution when looping through tables in a dataset?

If so, this little tip might help.

Background

In a project I had to save the data in a DataSet to a tab-separated text file. The dataset consists of several nested tables, where the the two largest is a Group that contains a SubGroup. Both these tables has a Description which is stored in another table for multi-language support.

Not exactly rocket science, I thought, and started to implement. After a few hours of coding and testing with a small number of data to get the text format right, I started to test the function with a real life scenario. This consists of about 3000 groups (MPG) with about 400 sub-groups (MP) each.

I ran the test again and got an unpleasant surprise. The test software totally froze. (no threading in my small test bench). Time to get a coffee and then start to figure out what went wrong. When I came back I saw that the execution had finsished. Hmm, so it was not frozen, it just took a long time.

After I added a timer I realized it took 4 minutes! to save a text file with about 16000 lines. As this functionality is part of a GUI, you can imagine how happy any user would be to have to wait 4 minutes for every time the file is saved.

Analyzing my code

After some analysis I found some stupid coding of my part, e.g. I had put one operation inside a loop that only needed to be done once. I also changed from String to StringBuilder where I had a lot of concatenations. This helped some, I cut the time by 50 seconds.

The real culprit, however, was the LINQ extensions I used to get data from indirectly referenced tables. These extensions are pretty neat as they save some lines of code used for type casting, but appearantly they are not very efficient when used inside a loop.

In my code I called the same function around 15000 times in my code (3000 groups + 3000 groups * 400 sub-groups). The only difference is if the type should be MPG or MP.

C#
string description = "Description is missing.";
DescrRow rowDescription = rowDescriptions.GetDescrRows().
  Where(x => x.Type == "MPG" && x.Name == rowMPG.Name).SingleOrDefault();
if (rowDescription != null)
  description = rowDescription.Text; // This text is written to the text file later

The little snippet above took a very long time to execute, so I had to do something about it. I tested quite a few different variants, but the final verdict was that the old Select method was the fastest in my case.

First I created two temporary tables, one for MPG and one for MP descriptions. This is done outside the loop.

C#
DataTable tableMPGDescriptions = null;
DataTable tableMPDescriptions = null;
if (rowDescriptions != null)
{
  IEnumerable<DescrRow> descriptions = null;
  descriptions = rowDescriptions.GetDescrRows().Where(x => x.Type == "MPG");
  tableMPGDescriptions = descriptions.CopyToDataTable<DescrRow>();

  descriptions = rowDescriptions.GetDescrRows().Where(x => x.Type == "MP");
  tableMPDescriptions = descriptions.CopyToDataTable<DescrRow>();
}

Inside the loop the code below is used.

C#
description = "Description is missing.";
if (tableMPGDescriptions != null)
{
  DataRow[] descriptionRows = null;
  descriptionRows = tableMPGDescriptions.Select(String.Format("Name='{0}'", rowMPG.Name));

  if (descriptionRows.Length == 1)
    description = descriptionRows[0]["Text"].ToString();
}

Final Result

After changing the code at all places where I used LINQ methods inside a loop the time to save the file became less than 4 seconds, which for this application is acceptable.

So from 4 minutes to 4 seconds in around 4 hours of work.

Using the code

As I can't post the code from my project here, I created a small demo project that shows the difference in time between the default method Select() and the extension method Where().

I created a small dataset to simulate my real code.

Test data table

I filled the table with dummy values

I updated the number of rows in the data table to get more data.

C#
DataSetTest dsTestData = new DataSetTest();
Random randy = new Random();
for (int i = 0; i < 45000; i++) // Changed from 15000 to 45000
{
  dsTestData.Test.AddTestRow(String.Format("Hello_{0}", i), 
                             (int)(1000 *randy.NextDouble()));
}

and then called this code different number of times.

C#
for (int n = 1; n <= 10000; n *= 10)
{
  Console.WriteLine("Function call made {0} times.", n);
  Console.WriteLine("");
  timee.Restart();
  for (int i = 0; i < n; i++)
  {
    name = String.Format("Hello_{0}", i);
    selectRows = (DataSetTest.TestRow[])dsTestData.Test.
                 Select(String.Format("Name = '{0}'", name));
  }
  Console.WriteLine("Using DataTable.Select()
   : {0,6} ms", timee.ElapsedMilliseconds);

  timee.Restart();
  for (int i = 0; i < n; i++)
  {
    name = String.Format("Hello_{0}", i);
    linqRows = dsTestData.Test.Where(x => x.Name == name);
    linqRow = linqRows.ElementAt(0);
  }
  Console.WriteLine("Using Where().ElementAt(0)      : {0,6} ms", timee.ElapsedMilliseconds);

  timee.Restart();
  for (int i = 0; i < n; i++)
  {
    name = String.Format("Hello_{0}", i);
    linqRows = dsTestData.Test.Where(x => x.Name == name);
    linqRow = linqRows.FirstOrDefault();
  }
  Console.WriteLine("Using Where().FirstOrDefault()  : {0,6} ms", timee.ElapsedMilliseconds);
  
  timee.Restart();
  for (int i = 0; i < n; i++)
  {
    name = String.Format("Hello_{0}", i);
    linqRows = dsTestData.Test.Where(x => x.Name == name);
    linqRow = linqRows.SingleOrDefault();
  }
  Console.WriteLine("Using Where().SingleOrDefault() 
                    : {0,6} ms", timee.ElapsedMilliseconds);
}

The table below shows the time elapsed for different values of n.

Time Elapsed (ms)
Method n = 1 100 1000 10000 Comment
DataTable.Select() 586  82  257  First call to takes a long time.
Where() 13  I kept this result for reference.
ElementAt(0) 4913  Exponentially longer time.
FirstOrDefault() 205  4887  Exponentially longer time.
SingleOrDefault() 15  1280  6944  72583  Slow as a turtle.

The times differ a bit from execution to execution, but the ratios are usually the same.

Conclusions

My little investigation shows that using the LINQ extension methods ElementAt(), FirstOrDefault() and SingleOrDefault() really can slow things down if used inside loops and the number of iterations > 1000.

For single calls and a low number of iterations the LINQ methods are fast.

That the Where() by itself is fast is because the execution is delayed until you try to access any element in the list. (Thanks Mike and SA for explaining that to me.)

That SingleOrDefault() takes a long time is because the whole list has to be searched. (Thanks Nicolas for that pointer.)

One thing I still don't get is why the execution time per iteration for ElementAt() and FirstOrDefault() increases with the number of iterations.

Time Elapsed / number of iterations (ms)
Method 100 1000 10000 Comment
DataTable.Select() 0.080  0.082  0.025  Getting faster and faster.
ElementAt(0) 0.163  0.491  Exponentially longer time.
FirstOrDefault() 0.01  0.205  0.489  Exponentially longer time.
SingleOrDefault() 12.800  6.944  7.253  Execution time decreases.

Points of Interest

I learned a thing or two about optimization and that new is not always better.

  • Use StringBuilder instead of String inside loops if a lot of concatenations are performed.
  • Avoid using the LINQ extension methods inside loops.

History

  1. First release of the tip.
  2. Source code and Conclusions section updated after comments.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)