|
Luc,
Sorry for the delay. Yesterday I discovered another issue in the code: the Quartz library I'm using was letting the job fire multiple times. It didn't stop the timer while the job was executing and restart it after the job was done.
After fixing that I found the memory leak still exists, and I'm trying your suggestions now. HOWEVER, the leak is MUCH slower now because the jobs aren't running as often.
From yesterday morning to now the memory has grown to 1.5GB.
|
|
|
|
|
Ok, the code is updated and running now. I changed the triggers that were every 60 minutes to every 30 minutes to hopefully reproduce the issue quicker.
Updated code is here:
KnowMoreIT/CloudPanel-Service · GitHub[^]
|
|
|
|
|
Ok, it is running with the new code and is currently at 677MB. Just a few minutes ago it went up to 1.5GB and back down... however it is slowly climbing.
|
|
|
|
|
Hi,
1. once again I must ask you to be more specific; "slowly climbing" doesn't offer much info.
2. changing the code and changing run-time parameters too much makes it extremely hard, if not impossible, to understand what sporadic measurement results actually mean.
3. I won't be in tomorrow.
4. Today I have been, and still am, researching the behavior of the Large Object Heap, and probably will write a CP article about it; if so, that will take a few weeks though.
In summary, while there may be a way to make it perform as required (with a lot of conditions), I am still inclined to avoid large objects as much as possible. And hence:
5. I developed a little class that would keep your huge message lists in the regular heap, hence avoiding the LOH fragmentation you are currently bound to get. It does not offer an initial capacity, I don't think that would be very useful. It has been tested with foreach and with LINQ. Here it is:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

namespace LargeObjectsHeapTester {
    public class ListOfLists<T> : IEnumerable, IEnumerable<T> {
        public const int MAX_ITEMS_IN_LEVEL1=10000;
        private List<List<T>> level0;
        private List<T> currentLevel1;

        public ListOfLists() {
            Clear();
        }

        private void log(string msg) {
            // intentionally empty; hook up a logger here if needed
        }

        public void Clear() {
            level0=new List<List<T>>();
            currentLevel1=new List<T>();
            level0.Add(currentLevel1);
        }

        public void Add(T item) {
            // start a new sublist before the current one could grow into the LOH
            if(currentLevel1.Count>=MAX_ITEMS_IN_LEVEL1) {
                currentLevel1=new List<T>();
                level0.Add(currentLevel1);
            }
            currentLevel1.Add(item);
        }

        public int Count {
            get {
                int totalCount=0;
                foreach(List<T> level1 in level0) totalCount+=level1.Count;
                return totalCount;
            }
        }

        public void DumpCounts() {
            int totalCount=0;
            log("level0.Count="+level0.Count);
            int i=0;
            foreach(List<T> level1 in level0) {
                int count1=level1.Count;
                log("level0["+i+"].Count="+count1);
                totalCount+=count1;
                i++;
            }
            log("total count="+totalCount);
        }

        IEnumerator IEnumerable.GetEnumerator() {
            return this.GetEnumerator();
        }

        public IEnumerator<T> GetEnumerator() {
            foreach(List<T> level1 in level0) {
                foreach(T item in level1) {
                    yield return item;
                }
            }
        }
    }
}
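For completeness, here is a small usage sketch of the class above (the string element type and the counts are made up for illustration):

```csharp
// hypothetical usage of ListOfLists<T>
var logs = new ListOfLists<string>();
for (int i = 0; i < 25000; i++)
    logs.Add("message " + i);

// foreach walks all sublists transparently via the IEnumerable<T> implementation
int seen = 0;
foreach (string msg in logs)
    seen++;
// seen == 25000, yet no single backing array ever held more than 10000 references
```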
6. Your LINQ statements such as
var allMailboxes = db.Users.Where(x ... ... ... ).ToList();
clearly also create lists; I don't know how large they are, but if they exceed 10000 elements, they too will contribute to the LOH fragmentation, and they have escaped every remedy I have suggested so far.
Questions:
a) how many users satisfy x.MailboxPlan > 0
b) what would be a good upper limit for the Count of this allMailboxes?
c) how many users would there typically be in the below a.Users (sent or received)
List<MessageTrackingLog> totalSentLogs = sentLogs.Where(a => a.Users.Contains(x.EmailAddress, StringComparer.OrdinalIgnoreCase)).ToList();
That's it for now.
modified 15-Jan-16 8:18am.
|
|
|
|
|
Hi Luc!
I think after reading your questions/answers and researching that I have a much better understanding of what is going on. Basically, when a List has more than 10,000 objects in it, it goes onto the LOH. Now, from what I understand, the garbage collector doesn't normally "compact" the LOH, as it expects to reuse that space?
Now I read that in 4.5.1 they introduced this: whi[^] which gives you the ability to tell it TO compact once; it then resets to the default, which doesn't compact the LOH.
Now, what you posted creates a List of Lists (as named) and keeps each list under 10,000 elements, so it never enters the LOH.
I'm testing your class right now and recompiling..
Edit: (Sorry forgot to answer your questions):
Questions:
a) how many users satisfy x.MailboxPlan > 0
Right now it is under 5,000, but technically it could be well over 10,000 one day.
b) what would be a good upper limit for the Count of this allMailboxes?
Same thing... right now under 5,000 but it could be over 10,000 one day.
c) how many users would there typically be in the below a.Users (sent or received)
This is how many messages a single user has sent. Most of the time it will be under 1,000, but if someone spams it could be larger. The list of ALL users for sent messages is well over 10,000. I can put in some logging to get exact numbers.
modified 17-Jan-16 22:19pm.
|
|
|
|
|
1.
Yes, your first paragraph is correct.
2.
I have almost zero experience with the improved LOH GC in .NET 4.5/4.5.1/4.5.2/4.6 (yes, it came incrementally!). I've read it all and I'm working on some experiments; however, I do not fully trust it because of potential side effects. I'd rather avoid the fragmentation if there happens to be a reasonable way to do so. Keep in mind an LOH compaction is a potentially huge operation, rumored to take maybe ten seconds or so, during which your app probably doesn't respond to anything.
3.
Once you get an OutOfMemoryException, you're stuck, unless you put try-catch AND retry logic everywhere! As an OOMException could occur in many places, you would have to:
(a) either include a lot of GCSettings.LargeObjectHeapCompactionMode=...Once statements,
(b) or trust that setting it once at the start of some/all intervals would suffice.
But that assumption might be hard to prove correct.
Anyway, I'd be more interested in the newer forms of GC.Collect, see here[^], but that requires 4.6.
You could also set LargeObjectHeapCompactionMode to CompactOnce AND call GC.Collect(2) to force a LOH compaction right away. That should work on 4.5.1.
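As a minimal sketch of that combination (available from .NET 4.5.1 on):

```csharp
using System;
using System.Runtime;

// request a one-time LOH compaction on the next full (gen-2) collection
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect(2);
// after this collection, the setting automatically reverts to Default,
// so the LOH will not be compacted again unless you set it once more
```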
BTW: I've seen an article suggesting a timer that periodically causes an LOH compaction, but that sounds horrible, it would not synchronize to your app at all.
4.
I'm inclined to recommend you eventually switch to a 64-bit app if that is at all possible. Yes pointers become twice as large (so lists become LOH candidates sooner), but the usable virtual address space theoretically grows from 2 GB (maybe 3 or 4) up to 8 TB (other limits may apply, the Windows memory system is pretty complex!).
A 64-bit app needs a CPU that supports x64 and a Windows OS version that does the same (check under MyComputer/Properties, not sure how virtual machines handle it), and a .NET app that is built for "Any CPU", an option in Build/ConfigurationManager (no problem, unless you are referencing libraries/DLLs that are built explicitly for 32-bit).
5.
Warning: every little step you take to solve a problem like this may make it less probable, so it becomes harder to detect if anything more needs to be done. It is essential you find and use a way to "stress test" your code, maybe by shortening intervals, entering duplicates of actual list elements, etc.
6.
Finally, depending on how your program is going to be used, maybe an automatic periodic restart is quite acceptable. Application.Restart()[^] works really well!
|
|
|
|
|
I have no issues moving it to x64, except it seems like it would use way more memory than it should even after I switched, right?
I have shortened the intervals (just an XML file) to every 30 minutes... I'm querying all of this from Exchange, so I don't want to run it too often. I may need to just create some fake data, seed it into a list, and bypass Exchange for testing.
This is a Windows Service... the restart would technically help but I don't like that idea lol
I'm testing your ListOfLists class right now... worst case, I could combine my PowerShell command SQL insert statements so I'm not passing around Lists. However, the PowerShell command does return an ICollection of PSObjects, so most likely we would end up in the same situation?
|
|
|
|
|
1.
The C# code remains the same; after JIT compilation your code is somewhat larger, but this won't be relevant.
IIRC the smallest object grows from 32 to 48 bytes when switching to x64.
And obviously every reference grows from 4 to 8 bytes.
So memory usage is bound to be less than twice the original, and typically much less than that, as value types, text, etc. don't grow at all.
2.
A piece of code that returns a collection and doesn't care about fragmentation issues is probably based on an array; so is the ToList() you're using in LINQ. An array (or an array-based class) is simply the easiest to produce and consume.
There are alternatives, such as streaming (produce while consuming, never have it all in memory; that is still likely to contain an array internally, just a smaller one), or asking for less data at once (you could keep start and end datetimes closer to each other, possibly in a loop, inside ExchActions.Get_TotalSentMessages()).
BTW: My ListOfLists class is an IEnumerable, i.e. to the consumer it only shows one element at a time. That is extreme streaming! Fortunately the compiler converts "yield return" statements into all the code required to keep track of where the consumer is currently fetching the data.
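To sketch the "ask for less data at once" idea: querying Exchange in smaller date windows inside a loop might look like this. Note that the Get_TotalSentMessages overload taking a date range, and the Process() helper, are assumptions on my part:

```csharp
// hypothetical: query in one-hour windows instead of one big date range
DateTime start = DateTime.Today;
DateTime end = DateTime.Now;
for (DateTime t = start; t < end; t = t.AddHours(1))
{
    DateTime windowEnd = t.AddHours(1) < end ? t.AddHours(1) : end;
    // assumed overload taking (from, to); adapt to the real method signature
    foreach (var log in ExchActions.Get_TotalSentMessages(t, windowEnd))
        Process(log);  // consume each item as it arrives, never hold the full list
}
```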
3.
What the PowerShell interface gives you is an ICollection, which is slightly more than an IEnumerable (e.g. it has a Count property and an Add method). From this one can only guess how it is implemented. Or look at it using Reflector or some other tool.
4.
I'm not sure you need ToList() in your LINQ statements. Did you try without?
Select() returns an IEnumerable, and I expect that is all you need. I don't know how Select is implemented; I would hope it works in some kind of streaming mode (IEnumerable in, IEnumerable out, producing while being consumed).
If this holds true, that is a number of potentially big objects you no longer need.
You would have to declare differently, and count yourself (while at it, I also summed the bytes!):
    IEnumerable<MessageTrackingLog> totalSentLogs=LINQ statement without .ToList();
    int sentCount=0;
    int sentBytes=0;
    foreach(MessageTrackingLog item in totalSentLogs) {
        sentCount++;
        sentBytes+=item.TotalBytes;
    }
which is much cheaper than having the CLR hand you a List first, and then use LINQ to process it.
|
|
|
|
|
Just wanted to update you that I've tried a couple of different things... The first thing I tried was completely disabling all Exchange actions (message tracking logs, mailbox sizes, etc).
Now it basically just processes the Active Directory options, which have a total of 4,533 items that can be put in a list.
What I am finding is that memory usage is still up to 1GB now, even with all those tasks disabled, and growing.
I've had this service working without memory issues in the past. I completely rewrote it, changing from Entity Framework to LINQ to SQL, because I didn't want to worry about the "context" being different. My goal was to make it so the scheduler version could last multiple versions of the primary application. I'm really starting to wonder if it's LINQ to SQL, because nothing should be over 5,000 in a list now after disabling those other options.
I may try switching to SqlConnection and SqlCommand for a test (BTW, I updated my code if you want to check it out again in its current state).
|
|
|
|
|
Luc,
I've been running ANTS Memory Profiler 8.8 on the service... the Large Object Heap size is actually only about 40MB according to this profiler.
It shows "Private Bytes" and "Working Set - Private" as the counters that hold all the memory.
When I took a snapshot this is what it is saying:
-> Generation 1 0 bytes
-> Generation 2 -> 1.105MB
-> Large Object Heap -> 2.645MB
-> Unused memory allocated to .NET -> 108.6MB
-> Unmanaged -> 618.6MB
It also shows this for class list (Live size (bytes)):
-> ConditionalWeakTable<TKey, TValue>+Entry<Object, PSMemberInfoInternalCollection<PSMemberInfo>> (8,073,936 bytes)
-> Int32[] (2,726,312 bytes)
-> string (337,040)
-> AdsValueHelper (151,956)
and it just goes down from there. It does show "string" has 4,782 live instances and "AdsValueHelper" has 4,221 live instances
|
|
|
|
|
1.
I have no idea what all that means.
2.
I'm not a PowerShell user, nor will I become one any time soon.
I have been reading up on it a bit, and seem to have hit on two reasons for it to leak memory:
one of the first results googling "C# powershell memory leak"[^]
leaky PowerShell scripts[^]
3.
If I were to expect lots of output from something like PowerShell, and having seen the number of questions and complaints about it after a one-minute Google, I would opt for a file interface: launch it with Process.Start() and have it create a file, thus avoiding most potential trouble.
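A rough sketch of that file-based approach; the script name, its arguments, and the output file (collect.ps1, results.csv) are placeholders, not part of the actual project:

```csharp
using System.Diagnostics;
using System.IO;

// hypothetical: run the PowerShell work out-of-process and read its file output
var psi = new ProcessStartInfo("powershell.exe",
    "-File collect.ps1 -OutFile results.csv")  // placeholder arguments
{
    UseShellExecute = false,
    CreateNoWindow = true
};
using (Process p = Process.Start(psi))
{
    p.WaitForExit();
}
// parse one record at a time; any PowerShell memory died with the child process
foreach (string line in File.ReadLines("results.csv"))
{
    // handle one record here
}
```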
4.
I recommend you reduce your program to a fraction of the intended functionality, make the memory consumption numbers very visible, and work on it till your "climbing slowly" is completely gone. Then iteratively add code and functionality, keeping a sharp eye on the memory situation at all times.
|
|
|
|
|
edit: Piebald and others raised concerns here about the possibility that the use of IEnumerable<T>.Count() would make the code break with Stack and Queue.
That is not the case; the code works:
private void TestChunking()
{
    string testString = "aaabbbcccdddeeefffggghhh";
    int[] intary = new int[36];
    List<int> intlist = new List<int>(36);
    Stack<int> stack = new Stack<int>();
    Queue<int> queue = new Queue<int>();

    for (int i = 0; i < 36; i++)
    {
        intary[i] = i;
        intlist.Add(i);
        queue.Enqueue(i);
        stack.Push(i);
    }

    var result1 = intary.ToChunkedKvPList(9);
    var result2 = intlist.ToChunkedKvPList(6);
    var result3 = stack.Reverse().ToChunkedKvPList(9);
    var result4 = queue.ToChunkedKvPList(4);
    var result5 = testString.ToChunkedKvPList(3);
}
The goal here (extension method on IEnumerable) was to take an IEnumerable of any Type and a chunk-size, and return a List of KeyValuePairs where each KeyValuePair has as its 'Key the first element in a chunk, and as its 'Value a List of the remaining elements of the chunk:
public static class IEnumerableExtensions
{
    public static IEnumerable<KeyValuePair<T1, List<T1>>> ToChunkedKvPList<T1>(this IEnumerable<T1> source, int chunksz)
    {
        if (source.Count() % chunksz != 0) throw new ArgumentException("source.Count() must be an exact multiple of chunksz");
        int ndx = 0;
        int listsz = chunksz - 1;
        return source
            .GroupBy(x => (ndx++ / chunksz))
            .Select(grp => grp.ToList())
            .Select(lst => new KeyValuePair<T1, List<T1>>(lst[0], lst.GetRange(1, listsz)));
    }
}
Yeah, this works, but I remain convinced there is probably a much more elegant way of doing this using Linq; a way that would not require an indexer external to the Linq operation. Perhaps a way to avoid the two levels of 'Select?
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
modified 12-Jan-16 6:18am.
|
|
|
|
|
You can avoid the first Select like this:
return source
    .GroupBy(x => (ndx++ / chunksz))
    .Select(grp => new KeyValuePair<T1, List<T1>>(grp.First(), grp.Skip(1).ToList()));
This also makes the variable listsz obsolete.
I don't see a way to get rid of the external indexer without making it more convoluted.
If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson
|
|
|
|
|
Thanks for this excellent response, Sascha.
It would be interesting to know if the use of 'First and 'Skip makes any difference in computation time and memory use compared to the code I showed. I doubt it.
cheers, Bill
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
modified 11-Jan-16 14:40pm.
|
|
|
|
|
You're very welcome, Bill. Best of luck for your eyes surgery!
If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson
|
|
|
|
|
You could also get rid of the second Select :
return source.GroupBy(
x => (ndx++ / chunksz),
(key, grp) => new KeyValuePair<T1, List<T1>>(grp.First(), grp.Skip(1).ToList()));
Enumerable.GroupBy(TSource, TKey, TResult) Method (IEnumerable(TSource), Func(TSource, TKey), Func(TKey, IEnumerable(TSource), TResult)) (System.Linq)[^]
Add in a KeyValuePair<TKey, TValue> factory method:
public static class KeyValuePair
{
    public static KeyValuePair<TKey, TValue> Create<TKey, TValue>(TKey key, TValue value)
    {
        return new KeyValuePair<TKey, TValue>(key, value);
    }
}
and the statement becomes almost readable:
return source.GroupBy(
x => (ndx++ / chunksz),
(key, grp) => KeyValuePair.Create(grp.First(), grp.Skip(1).ToList()));
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
I'd actually prefer the separate .Select over the .GroupBy with resultSelector; to me that's a split second faster to recognize. I like the idea with the factory method, though.
If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson
|
|
|
|
|
thanks for this ! Bill
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
|
|
|
|
|
I suspect that would be so much easier (and quicker) in straight procedural code.
And IEnumerable doesn't have a Count member; so you're doomed to failure from the first statement. Maybe you want to use IList instead? I think a better behaviour would be not to attempt to count the items, but to leave the final chunk short, or to pad it with default(T1)s. Maybe even allow the caller to specify which behaviour to use (throw, pad, as-is). And, of course, document that behaviour.
|
|
|
|
|
PIEBALDconsult wrote: And IEnumerable doesn't have a Count member;
:cough: MSDN[^] :cough:
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
|
|
|
|
|
But that's an extension method, and it would consume the IEnumerable while counting.
|
|
|
|
|
That's an interesting comment: the word "consume" usually means "use up"; but, in this case, the code works, and works because a source IEnumerable can be "used" any number of times.
Of real interest is whether multiple evaluations of the IEnumerable source are very expensive ... in terms of memory and time.
Perhaps it is the case that transforming the IEnumerable to a List<T> is a good thing to do, if it needs to be evaluated more than once.
thanks, Bill
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
|
|
|
|
|
BillWoodruff wrote: because a source IEnumerable can be "used" any number of times.
Not all of them; and you can't tell. Queue and Stack implement IEnumerable, but they can be consumed only once (fortunately they have Count properties).
BillWoodruff wrote: whether multiple evaluations of the IEnumerable source are very expensive ... in terms of memory, time.
It may have to enumerate it fully; that takes time. Enumerating may also involve file or network access or similar (e.g. database access, reading from a socket) that uses time, IO, and memory.
And this particular result is not worth the effort in this case; so it's a waste.
Of course, it's possible that the Count method you use checks for certain types (e.g. Stack, Queue, Array, String) or interfaces (e.g. IList) and then uses the appropriate Count or Length methods rather than enumerating, but failing that, it must enumerate.
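That is in fact roughly how Enumerable.Count() behaves; conceptually, something like this simplified sketch (not the actual BCL source):

```csharp
using System.Collections;
using System.Collections.Generic;

public static int CountSketch<TSource>(IEnumerable<TSource> source)
{
    // fast path: collections that already know their size
    var generic = source as ICollection<TSource>;
    if (generic != null) return generic.Count;

    var nonGeneric = source as ICollection;
    if (nonGeneric != null) return nonGeneric.Count;

    // slow path: enumerate the whole sequence and count
    int count = 0;
    using (var e = source.GetEnumerator())
        while (e.MoveNext()) count++;
    return count;
}
```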
But the bottom line, in this case, is that there is no reason to check the Count anyway. And as Luc pointed out, if you're checking the Count you might as well check the chunksz before trying to divide by it. Which then leads to the question "what to do when the caller specifies a chunksz of zero?" -- and I suspect the "best" thing to do is to treat it as an "all the rest" value. But that's just my thought.
I suggest leaving the burden of checking such things to the caller. Document what the method does and let the buyer beware.
modified 9-Jan-16 23:34pm.
|
|
|
|
|
Thanks for the interesting response, the "quick sketch" I showed here was not meant to show all the programmer-is-an-idiot-proofing that might go in "production code."
I'll follow up on your comments by doing some testing with Queues and Stacks; never even thought of trying those.
cheers, Bill
«Tell me and I forget. Teach me and I remember. Involve me and I learn.» Benjamin Franklin
|
|
|
|
|
Be sure to hydrate.
|
|
|
|
|