Hi,
Nice question,
Solution :
http://msdn.microsoft.com/en-us/library/dd997372.aspx
Better Performance
Memory-mapped file.
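The MSDN link above is for .NET, but the idea is the same in any language. Here is a minimal sketch in Python using the standard `mmap` module, just to show the technique. It assumes a text file with one integer per line; that format and the function name `iter_numbers` are my own assumptions for illustration.

```python
import mmap
import os

def iter_numbers(path):
    """Yield integers from a text file with one number per line (assumed format)."""
    if os.path.getsize(path) == 0:
        return  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand,
        # so the file is never copied into memory as a whole.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # readline() returns b"" at end of file, which stops the loop.
            for line in iter(mm.readline, b""):
                line = line.strip()
                if line:  # skip blank lines
                    yield int(line)
```

Usage would be something like `for n in iter_numbers("numbers.txt"): ...`, so you only ever hold one number at a time.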
I have another solution if you are interested :)
[Update]
Your concern is to save memory, right?
You can split the file and search each part for unique numbers, but the problem is that by the time the search nears its end, the collected unique numbers may exceed the memory limit.
So it's better to sort first and then search.
For best performance, follow these steps:
1. Split the file into small parts.
2. Write a function that reads a number from a file and deletes it from that file.
3. Start separate threads and pass each file to that function.
4. Create some data block files, one per number range, like 1-1000, 1001-2000, 2001-3000...
5. Put each number into its block (but before putting it in, check whether that number already exists); if it does not exist, insert it.
6. When your code has traversed all the files, you will already have the unique numbers in the data block files.
7. Now merge all the data block files into a single file.
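A rough single-threaded sketch of the steps above, in Python. It is only an illustration under some assumptions: the input parts are text files with one integer per line, the names `unique_numbers`, `BLOCK_SIZE`, and `block_N.txt` are made up, and `work_dir` starts out empty. Note one change from step 5: instead of checking for a duplicate before every insert, this version appends everything to the block files and removes duplicates once per block during the merge. Threads are left out to keep it short.

```python
import os
from collections import defaultdict

BLOCK_SIZE = 1000  # block 0 holds 1-1000, block 1 holds 1001-2000, ...

def unique_numbers(part_paths, work_dir, out_path):
    """Bucket numbers by range on disk, then de-duplicate one block at a time."""
    os.makedirs(work_dir, exist_ok=True)
    block_ids = set()
    for part in part_paths:
        # Group this part's numbers by block (parts are assumed to be small).
        pending = defaultdict(list)
        with open(part) as f:
            for line in f:
                if line.strip():
                    n = int(line)
                    pending[(n - 1) // BLOCK_SIZE].append(n)
        # Append each group to its block file on disk.
        for block_id, nums in pending.items():
            block_ids.add(block_id)
            path = os.path.join(work_dir, f"block_{block_id}.txt")
            with open(path, "a") as block:
                block.writelines(f"{n}\n" for n in nums)
    # Merge: only one block's unique numbers are in memory at any time.
    with open(out_path, "w") as out:
        for block_id in sorted(block_ids):
            path = os.path.join(work_dir, f"block_{block_id}.txt")
            with open(path) as block:
                uniques = sorted({int(line) for line in block})
            out.writelines(f"{n}\n" for n in uniques)
            os.remove(path)
```

Because the blocks are merged in range order and each block is sorted, the output file ends up sorted as well as unique.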
Tips.
1. Don't use more than 10 threads.
2. Initially split the file into parts of 10 to 20 KB each.
3. Don't create more than 3000 data blocks. (You can use [binary] serialization, but only for processing; don't hold it in memory.)
If you don't understand any part, feel free to ask.
Sorry for the late reply.