|
That sounds like a good reason.
|
|
|
|
|
Any time I'm doing multiple queries, I manually open the connection. From my understanding, each query (without manually opening) is in essence doing this:
-> Open Connection
-> Query
-> Close connection
-> Open Connection
-> Query
-> Close connection
So by calling db.Database.Connection.Open() myself (roughly as in the sketch below), it should be doing this:
-> Open Connection
-> Query
-> Query
-> Close connection
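Roughly like this (a sketch only, assuming an EF-style DbContext named db; the entity sets shown are placeholders, not my actual classes):

db.Database.Connection.Open();   // open once, reuse for several queries
try
{
    var lockedUsers  = db.Users.Where(u => u.Locked).ToList();   // query 1 (placeholder)
    var mailboxSizes = db.MailboxSizes.ToList();                  // query 2 (placeholder)
}
finally
{
    db.Database.Connection.Close();   // closed once, at the end
}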
|
|
|
|
|
Here is the remarks section for your "Connection" property:
Quote: You can use this property to interoperate with relational ADO.NET code.
The returned connection will be closed unless it has been explicitly opened by the user.
(Which appears to also reflect what is in the EF docs).
DataContext.Connection Property (System.Data.Linq)[^]
|
|
|
|
|
OK, as a test I took out all of the following:
db.Connection.Open();
I kept my try/catch/finally, which calls Dispose on the DataContext, for testing. Since I am no longer manually opening connections, it should open and close around each query/update, but it should handle the connections by itself.
I'm going to keep an eye on memory today. If this doesn't work, then I will try the using statement, roughly as sketched below. I still want to handle the connection, because I don't want it opening and closing all the time since that causes a delay. Once I narrow down what it is, I'll call Close() before disposing of the DataContext.
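Something along these lines (a sketch; CloudPanelDbContext and the query are placeholders for my actual DataContext class and code):

using (var db = new CloudPanelDbContext(connectionString))
{
    db.Connection.Open();   // optional: hold one connection open across several queries
    var users = db.Users.Where(u => u.MailboxPlan > 0).ToList();
    // ... more queries / db.SubmitChanges() ...
}   // Dispose() closes the connection even if an exception is thrown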
I will let you all know once I have an update.
|
|
|
|
|
So I thought it was looking promising, but the memory maxed out again. I could see the memory going up and down, but over time it just steadily climbed.
I posted the code here:
KnowMoreIT/CloudPanel-Service · GitHub[^]
|
|
|
|
|
Hi again,
that's a lot of code, but I like your coding style (except for too many vars), so that kept me going; I have been browsing some of the code, and have a few questions already:
1) are you running this in 32-bit or in 64-bit mode? and how much RAM does your PC have?
2) what is the estimated count (order of magnitude) of your lists (users, messages received, messages sent, ...)?
3) what is the frequency of the Get_MessageTrackingLogsTask? (while hoping that is where the problem is).
In the end I will probably be curious to see all the intervals as collected by GetIntervalsFromConfig().
4) please give a quantitative description instead of "the memory going up and down but over time it just steadily climbed". E.g. is it like a sawtooth? how much does it go down when it does? how much does it go up from one tooth to the next? what is the periodicity of it going down? at what rate does it go up on average? what is the final value? and how long does it run fine?
5) when it fails, what is the exact message you are getting? is this repetitive? how many runs did you perform with the same code (and pretty much the same data)? do you know where in the code it comes to a sudden end?
6) do you know of anything in the application domain that (essentially without changing the code) dramatically changes the up-time? e.g. can you make it fail much more rapidly by altering some parameters?
7) suggestion: if acceptable, could you also provide a log of a failing run that fits the code currently on github?
modified 12-Jan-16 22:38pm.
|
|
|
|
|
Hey Luc!
Glad to hear you like my coding style... it's been completely self-taught (with help from random people on the web), so I never really knew, LOL.
1) I am running a 32-bit process. The server is a virtual machine running 2008 R2 with 16GB of memory (4GB is used normally).
2) Here is the breakdown:
Users: 4333 rows
Message Tracking Logs: 1432284 rows
Mailbox Sizes: 2209537 rows
Database Sizes: 10520 rows
History Stats table: 69806 rows
3) Get_MessageTrackingLogsTask normally would run every 720 minutes. To speed up the testing phase I changed it to every 60 minutes (along with the rest of the tasks)
4) The service starts out at 5MB of memory. From 6pm to 9pm tonight it has climbed to 314MB. I usually end up stopping the service around 2.8GB used. (I've let it just go once, and it crashed when it hit the max memory for a 32-bit process.) Here are the intervals:
GetDisabledUsers has run 180 times (every 10 min) [Users table]
GetLockedUsers has run 180 times (every 10 min) [Users table]
GetMailboxSizes has run 3 times (every 60 min) [Mailbox sizes table]
GetMailboxDatabaseSizes has run 6 times (every 30 min) [Database sizes table]
GetMessageTrackingLogs has run 3 times (every 60 min) [Message Tracking table]
FindMissingData has run 9 times (every 20 min) [Users table]
UpdateDatabaseHistory has run 3 times (every 60 min) [History Stats table]
5) I've only let it crash once, so I don't know where it crashed. I don't see any errors and it continues to function fine until it runs out of memory. This does happen every time I run it. I've tried removing the db.Connection.Open() and currently (right now) I'm trying the "using" statement.
6) I can make the memory climb faster by running the tasks more often (shorter intervals). So one of my tasks is causing the issue, but examining the code over and over I can't find where it possibly is doing it.
7) It would probably take about 2 days for it to fully climb and crash. I could set up something to capture a memory dump when it crashes, but it would take a few days.
One thing I can try is to increase the interval for all tasks to like 999999 minutes except for one (set at 10 min). Then I can slowly include each task until I see the memory problem.
But right now it has gone from 5MB to 314MB in 3 hours. It should just be querying from Exchange/AD, dumping the data to SQL, and disposing of all of it... so technically the memory should go back down, right?
|
|
|
|
|
Hi,
thanks for all that info. Here are 3 remarks:
1) changing from explicit Dispose() calls to using constructs will not solve the problem by itself; it merely results in simpler code that is less likely to contain errors. So it should not be a priority.
2) I have an experiment I want to suggest to you with high priority: your code contains some 16 instances of
List<something> somename=new List<something>();
I want you to replace those by:
List<something> somename=new List<something>(someconstant);
i.e. specify an initial capacity for each and every list, where someconstant is a small factor larger than your current number of items; also preferably do not use more than 2 different values. So I'd suggest 0x00400000 for Message Tracking Logs and Mailbox Sizes, and 0x00040000 for everything else; yes, those are hex numbers for a reason, and they exceed 4 million and a quarter of a million respectively. The idea is to have the Lists at a sufficient size right away (this saves a lot of data copying while populating lists that currently also have to keep growing), and to improve the probability that the same memory chunks will be reused for newer generations of those lists.
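In other words, something like this (the element types here are placeholders for your actual classes):

// pre-allocate the backing arrays once, so the lists never have to grow
List<MessageTrackingLog> trackingLogs = new List<MessageTrackingLog>(0x00400000);  // > 4 million slots
List<MailboxSizeRow>     mailboxSizes = new List<MailboxSizeRow>(0x00400000);      // > 4 million slots
List<UserRow>            users        = new List<UserRow>(0x00040000);             // > 256 K slots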
This may seem mysterious, so here is the rationale: a List internally has an array holding the items; an array is an object; objects larger than around 80KB are kept in what is known as the "Large Object Heap". That heap used to have very poor memory management in the early days of .NET (say .NET 2.0); the issue was kept quiet by MS as long as possible, but it was (and maybe or probably still is) a weakness of the .NET memory management. It has been said it got improved, however I don't know whether it has, how, and by how much. (BTW: I expect behavior to be better in a 64-bit application.) If my suggested experiment behaves much better, it more or less proves the issue still exists and is relevant to your application.
Warning: the memory usage may initially be higher, however the "climbing steadily" should disappear completely.
3) I discovered one thing that is wasteful, and fixing that would then probably be the final solution I'd suggest: in your scenario you first build a list of all messages (sent, received), then repetitively use LINQ to reduce that list to a smaller list, and then basically are interested in the count of that smaller list. With a little effort, you could write a method that doesn't build the big list, it could create a list of lists (one for each user) hopefully resulting in many lists each of them much smaller than the current big list (a list with fewer than 10K items is bound to be managed well on the regular heap). If needed we can discuss this more later on, after we have the results from the experiment under (2).
That's it for now.
|
|
|
|
|
Luc,
Thank you for assisting me with this. So I'm going to update the code and push it out tomorrow morning with the pre-allocated list sizes. Also, just to let you know, the process is at 386MB of memory right now.
I know you said not to worry about the using statements, but when I started this at 6pm I had already put them in place. The code on GitHub is still showing the try/catch/finally, but I can update it with what I changed if you think it is necessary.
Again,
I really appreciate you helping!
|
|
|
|
|
OK, the change to usings is no problem; once it works, it should be fine.
Yes, if anything is still the matter, I will want access to the code whose results you report later on.
This will be my last message for today. CU tomorrow.
|
|
|
|
|
Luc,
Sorry for the delay. Yesterday I discovered another issue in the code: the Quartz library I'm using would let a job fire again while a previous run was still executing; it didn't stop the timer while the job was executing and restart it after the job was done.
After that I discovered the memory leak still exists, and I'm trying your suggestions now. HOWEVER, the memory leak is MUCH slower now because the jobs are not occurring so often.
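(For reference, one way to keep a Quartz.NET job from overlapping itself is to decorate the job class with [DisallowConcurrentExecution] — a rough sketch with a placeholder job name:)

using Quartz;

[DisallowConcurrentExecution]   // Quartz will not fire this job again while a run is still executing
public class GetMessageTrackingLogsJob : IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // ... the task's work ...
    }
}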
From yesterday morning to now the memory has grown to 1.5GB.
|
|
|
|
|
OK, the code is updated and running now. I changed the triggers that were every 60 minutes to every 30 minutes to hopefully reproduce the problem quicker.
Updated code is here:
KnowMoreIT/CloudPanel-Service · GitHub[^]
|
|
|
|
|
OK, it is running with that new code and is currently at 677MB. Just a few minutes ago it went up to 1.5GB and back down... however, it is slowly climbing.
|
|
|
|
|
Hi,
1. once again I must ask you to be more specific; "slowly climbing" doesn't offer much info.
2. changing the code and changing run-time parameters too much makes it extremely hard if not impossible to understand what sporadic measurement results actually mean.
3. I won't be in tomorrow.
4. Today I have been, and still am, researching the behavior of the Large Object Heap, and probably will write a CP article about it; if so, that will take a few weeks though.
In summary, while there may be a way to make it perform as required (with a lot of conditions), I am still inclined to avoid large objects as much as possible. And hence:
5. I developed a little class that would keep your huge message lists in the regular heap, hence avoiding the LOH fragmentation you are currently bound to get. It does not offer an initial capacity; I don't think that would be very useful. It has been tested with foreach and with LINQ. Here it is:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

namespace LargeObjectsHeapTester {

    // Stores items in many small sub-lists so that no single backing array
    // ever grows large enough to land on the Large Object Heap.
    public class ListOfLists<T> : IEnumerable, IEnumerable<T> {
        public const int MAX_ITEMS_IN_LEVEL1 = 10000;

        private List<List<T>> level0;      // the list of sub-lists
        private List<T> currentLevel1;     // the sub-list currently being filled

        public ListOfLists() {
            Clear();
        }

        private void log(string msg) {
        }

        public void Clear() {
            level0 = new List<List<T>>();
            currentLevel1 = new List<T>();
            level0.Add(currentLevel1);
        }

        public void Add(T item) {
            // start a new sub-list once the current one is full
            if (currentLevel1.Count >= MAX_ITEMS_IN_LEVEL1) {
                currentLevel1 = new List<T>();
                level0.Add(currentLevel1);
            }
            currentLevel1.Add(item);
        }

        public int Count {
            get {
                int totalCount = 0;
                foreach (List<T> level1 in level0) totalCount += level1.Count;
                return totalCount;
            }
        }

        public void DumpCounts() {
            int totalCount = 0;
            log("level0.Count=" + level0.Count);
            int i = 0;
            foreach (List<T> level1 in level0) {
                int count1 = level1.Count;
                log("level0[" + i + "].Count=" + count1);
                totalCount += count1;
                i++;
            }
            log("total count=" + totalCount);
        }

        IEnumerator IEnumerable.GetEnumerator() {
            return this.GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator() {
            foreach (List<T> level1 in level0) {
                foreach (T item in level1) {
                    yield return item;
                }
            }
        }
    }
}
6. Your LINQ statements such as
var allMailboxes = db.Users.Where(x ... ... ... ).ToList();
clearly also create lists; I don't know how large they are, but if they have more than 10000 elements, they too will contribute to the LOH fragmentation, and they have escaped every remedy I have suggested so far.
Questions:
a) how many users satisfy x.MailboxPlan > 0
b) what would be a good upper limit for the Count of this allMailboxes?
c) how many users would there typically be in the below a.Users (sent or received)
List<MessageTrackingLog> totalSentLogs = sentLogs.Where(a => a.Users.Contains(x.EmailAddress, StringComparer.OrdinalIgnoreCase)).ToList();
That's it for now.
modified 15-Jan-16 8:18am.
|
|
|
|
|
Hi Luc!
I think after reading your questions/answers and doing some research that I have a much better understanding of what is going on. Basically, when a List has more than 10,000 objects in it, its backing array goes onto the LOH. Now, from what I understand, the garbage collector doesn't normally "compact" these, as it expects to reuse the space?
Now I read that in 4.5.1 they introduced this[^], which gives you the ability to tell it TO compact once; then it resets to the default, which doesn't compact the LOH.
Now, what you posted creates a List of Lists (as named) and keeps each list under 10,000 items, so it never enters the LOH.
I'm testing your class right now and recompiling...
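Basically swapping my big lists out like this (a rough sketch; the element type and the loop that fills it are just illustrative):

var sentLogs = new ListOfLists<MessageTrackingLog>();
foreach (var log in resultsFromExchange)     // whatever currently fills the List<T>
    sentLogs.Add(log);

int total = sentLogs.Count;                              // still available
var nonEmpty = sentLogs.Where(l => l.TotalBytes > 0);    // LINQ over IEnumerable<T> still works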
Edit: (Sorry forgot to answer your questions):
Questions:
a) how many users satisfy x.MailboxPlan > 0
Right now it is under 5000 but technically it could be well over 10,000 one day
b) what would be a good upper limit for the Count of this allMailboxes?
Same thing.. right now under 5000 but could be over 10,000 one day
c) how many users would there typically be in the below a.Users (sent or received)
This is how many messages a single user has sent. Most of the time it will be under 1000 but if someone spams it could be larger. The list of ALL users for sent messages is well over 10,000. I can put in some logs to get the exact data
modified 17-Jan-16 22:19pm.
|
|
|
|
|
1.
Yes, your first paragraph is correct.
2.
I have almost zero experience with the improved LOH GC in .NET 4.5/4.5.1/4.5.2/4.6 (yes it came incrementally!), I've read it all and I'm working on some experiments, however I do not fully trust it for the potential side effects. I'd rather avoid the fragmentation if there happens to be a reasonable way to do so. Keep in mind an LOH compaction is a potentially huge operation that is rumored to maybe take ten seconds or so, in which time your app probably doesn't respond to anything.
3.
Once you got an OOMExc, you're stuck, unless you put try-catch AND retry logic everywhere! As an OOMExc could occur in many places, you would have to:
(a) either include a lot of GCSettings.LargeObjectHeapCompactionMode=...Once statements,
(b) or trust that setting it once at the start of some/all intervals would suffice.
But that assumption might be hard to prove correct.
Anyway, I'd be more interested in the newer forms of GC.Collect, see here[^], but that requires 4.6.
You could also set LargeObjectHeapCompactionMode to CompactOnce AND call GC.Collect(2) to force a LOH collect right away. That should work on 4.5.1.
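i.e. something like this minimal sketch:

using System.Runtime;   // GCSettings, GCLargeObjectHeapCompactionMode

// request a one-time LOH compaction on the next blocking gen-2 collection,
// then trigger that collection right away
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(2);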
BTW: I've seen an article suggesting a timer that periodically causes an LOH compaction, but that sounds horrible, it would not synchronize to your app at all.
4.
I'm inclined to recommend you eventually switch to a 64-bit app if that is at all possible. Yes pointers become twice as large (so lists become LOH candidates sooner), but the usable virtual address space theoretically grows from 2 GB (maybe 3 or 4) up to 8 TB (other limits may apply, the Windows memory system is pretty complex!).
A 64-bit app needs a CPU that supports x64 and a Windows OS version that does the same (check under MyComputer/Properties, not sure how virtual machines handle it), and a .NET app that is built for "Any CPU", an option in Build/ConfigurationManager (no problem, unless you are referencing libraries/DLLs that are built explicitly for 32-bit).
5.
Warning: every little step you take to solve a problem like this may make it less probable, so it becomes harder to detect if anything more needs to be done. It is essential you find and use a way to "stress test" your code, maybe by shortening intervals, entering duplicates of actual list elements, etc.
6.
Finally, depending on how your program is going to be used, maybe an automatic periodic restart is quite acceptable. Application.Restart()[^] works really well!
|
|
|
|
|
I have no issues moving it to x64, except it seems like it would use way more memory than it should even if I switched it, right?
I have shortened the intervals (just an XML file) to every 30 minutes... I'm querying all of this from Exchange so I don't want to run it too often. I may need to just create some fake data, seed it into a list, and bypass Exchange for testing.
This is a Windows Service... The restart would technically help, but I don't like that idea lol
I'm testing your ListOfLists class right now... worst case I could combine my PowerShell commands with the SQL insert statements so I'm not passing Lists around. However, the PowerShell command does return an ICollection of PSObjects, so most likely I guess we would end up in the same situation?
|
|
|
|
|
1.
The C# code remains the same; after JIT compilation, your code is somewhat larger, but this won't be relevant.
IIRC the smallest object grows from 32 to 48B when switching to x64.
And obviously every reference grows from 4 to 8B.
So memory usage is bound to be less than twice the original, and typically much less than that, as value types, texts, etc. don't grow at all.
2.
A piece of code that returns a collection and doesn't care about fragmentation issues probably is based on an array; so is the ToList() you're using in LINQ. An array (or an array-based class) is just the easiest to produce and consume.
There are alternatives, such as streaming (produce while consuming, never have it all in memory; that is still likely to contain an array internally, just a smaller one). Or just ask for less data at once (you could keep start and end datetimes closer to each other, possibly in a loop, inside ExchActions.Get_TotalSentMessages()).
BTW: My ListOfLists class is an IEnumerable, i.e. to the consumer it only shows one element at a time. That is extreme streaming! Fortunately the compiler converts "yield return" statements into all the code required to keep track of where the consumer is currently fetching the data.
3.
What the powershell interface gives you is an ICollection, which is slightly more than an IEnumerable (e.g. it has a Count property, and an Add method). From this one can only gamble how it is implemented. Or look at it using Reflector or some other tool.
4.
I'm not sure you need ToList() in your LINQ statements. Did you try without?
Select() returns an IEnumerable, and I expect that is all you are needing. I don't know how Select is implemented, I would hope it works in some kind of streaming mode (IEnumerable in, IEnumerable out, and produce while being consumed).
If this holds true, that is a number of potentially big objects you no longer need.
You would have to declare differently, and count yourself (while at it, I also summed the bytes!):
IEnumerable<MessageTrackingLog> totalSentLogs = LINQ statement without .ToList();
int sentCount = 0;
int sentBytes = 0;
foreach (MessageTrackingLog item in totalSentLogs) {
    sentCount++;
    sentBytes += item.TotalBytes;
}
which is much cheaper than having the CLR hand you a List first, and then use LINQ to process it.
|
|
|
|
|
Just wanted to update you that I've tried a couple of different things... The first thing I've tried is completely disabling all Exchange actions (message tracking logs, mailbox sizes, etc.).
Now it basically just processes the Active Directory options, which give a total of 4533 items that can be put in a list.
What I am finding is that the memory usage is still growing, and is up to 1GB now, even with all those tasks disabled.
I've had this service working without memory issues in the past. I completely rewrote it, changing from Entity Framework to LINQ to SQL, because I didn't want to worry about the "context" being different. My goal was to make it so the scheduler version could last across multiple versions of the primary application. I'm really starting to wonder if it's LINQ to SQL, because nothing should be over 5000 items in a list now after disabling those other options.
I may try switching to SqlConnection and SqlCommand for a test (BTW, I updated my code if you want to check it out again in its current state).
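If I go that route it would be plain ADO.NET, roughly like this (a sketch only; the table and column names are placeholders):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Users WHERE MailboxPlan > 0", conn))
{
    conn.Open();
    int mailboxCount = (int)cmd.ExecuteScalar();
}   // connection and command are both disposed here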
|
|
|
|
|
Luc,
I've been running ANTS Memory Profiler 8.8 on the service... the Large Object Heap size is actually only about 40MB according to this profiler.
It shows the "Private Bytes" and "Working Set - Private" as the one that has all the memory.
When I took a snapshot this is what it is saying:
-> Generation 1 -> 0 bytes
-> Generation 2 -> 1.105MB
-> Large Object Heap -> 2.645MB
-> Unused memory allocated to .NET -> 108.6MB
-> Unmanaged -> 618.6MB
It also shows this for the class list (Live size (bytes)):
-> ConditionalWeakTable<TKey, TValue>+Entry<Object, PSMemberInfoInternalCollection<PSMemberInfo>> (8,073,936 bytes)
-> Int32[] (2,726,312 bytes)
-> string (337,040)
-> AdsValueHelper (151,956)
and it just goes down from there. It does show that "string" has 4,782 live instances and "AdsValueHelper" has 4,221 live instances.
|
|
|
|
|
1.
I have no idea what all that means.
2.
I'm not a PowerShell user, nor will I become one any time soon.
I have been reading up on it a bit, and seem to have hit on two reasons for it to leak memory:
one of the first results googling "C# powershell memory leak"[^]
leaky PowerShell scripts[^]
3.
If I were to expect lots of output from something like PowerShell, and having seen the number of questions and complaints on it after a 1 minute Google, I would opt for a file interface: launch it with Process.Start() and have it create a file, hence avoiding most potential trouble.
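For instance, roughly like this (the script name and arguments are hypothetical):

using System.Diagnostics;

var psi = new ProcessStartInfo
{
    FileName = "powershell.exe",
    Arguments = "-NoProfile -File Export-TrackingLogs.ps1",   // hypothetical script that writes a CSV
    UseShellExecute = false,
    CreateNoWindow = true
};
using (Process p = Process.Start(psi))
{
    p.WaitForExit();
}
// then read and parse the file here, instead of holding thousands of PSObjects in memory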
4.
I recommend you reduce your program to a fraction of the intended functionality, make the memory consumption numbers very visible, and work on it till your "climbing slowly" is completely gone. Then iteratively add code and functionality, keeping a sharp eye on the memory situation at all times.
|
|
|
|
|
edit: Piebald and others raised concerns about the possibility that the use of IEnumerable<T>.Count() here would make the code break with Stack and Queue.
That is not the case; the code works:
private void TestChunking()
{
    string testString = "aaabbbcccdddeeefffggghhh";
    int[] intary = new int[36];
    List<int> intlist = new List<int>(36);
    Stack<int> stack = new Stack<int>();
    Queue<int> queue = new Queue<int>();

    for (int i = 0; i < 36; i++)
    {
        intary[i] = i;
        intlist.Add(i);
        queue.Enqueue(i);
        stack.Push(i);
    }

    var result1 = intary.ToChunkedKvPList(9);
    var result2 = intlist.ToChunkedKvPList(6);
    var result3 = stack.Reverse().ToChunkedKvPList(9);
    var result4 = queue.ToChunkedKvPList(4);
    var result5 = testString.ToChunkedKvPList(3);
}
The goal here (an extension method on IEnumerable) was to take an IEnumerable of any Type, and a chunk-size, and return a List of KeyValuePairs where each KeyValuePair had as its 'Key the first element in a chunk, and as its 'Value a List containing all but the first element of the chunk:
public static class IEnumerableExtensions
{
    public static IEnumerable<KeyValuePair<T1, List<T1>>> ToChunkedKvPList<T1>(this IEnumerable<T1> source, int chunksz)
    {
        // the source must divide evenly into chunks of size chunksz
        if (source.Count() % chunksz != 0) throw new ArgumentException("source.Count() must be an exact multiple of chunksz");

        int ndx = 0;
        int listsz = chunksz - 1;

        return source
            .GroupBy(x => (ndx++ / chunksz))
            .Select(grp => grp.ToList())
            .Select(lst => new KeyValuePair<T1, List<T1>>(lst[0], lst.GetRange(1, listsz)));
    }
}
Yeah, this works, but I remain convinced there is probably a much more elegant way of doing this using LINQ; a way that would not require using an indexer external to the LINQ operation. Perhaps a way to avoid two levels of 'Select?
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
modified 12-Jan-16 6:18am.
|
|
|
|
|
You can avoid the first Select like this:
return source
    .GroupBy(x => (ndx++ / chunksz))
    .Select(grp => new KeyValuePair<T1, List<T1>>(grp.First(), grp.Skip(1).ToList()));
This also makes the variable listsz obsolete.
I don't see a way to get rid of the external indexer without making it more convoluted.
If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson
|
|
|
|
|
Thanks for this excellent response, Sascha.
It would be interesting to know if the use of 'First and 'Skip makes for any difference in computation-time and memory use compared to the code I showed. I doubt it.
cheers, Bill
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
modified 11-Jan-16 14:40pm.
|
|
|
|
|
You're very welcome, Bill. Best of luck with your eye surgery!
If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson
|
|
|
|
|