Hi, I have a file containing card numbers. It has 1 million card numbers. I need to check for duplicate card numbers in this file, report them in another file, and generate another file containing all the unique card numbers. I can't use any database for this operation. Please guide me on the best way to do this in minimum time.
Comments
Zoltán Zörgő 18-Dec-12 8:12am    
Why can't you use a database? Is it homework? If not, there are really several embedded DBEs that could do that.

Read the file into a string array, sort it, and run through looking for identical values, which will be next to each other.
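
A minimal sketch of that sort-and-scan approach (the input path is an assumption):
C#
string[] lines = File.ReadAllLines(@"D:\Temp\101.txt");
Array.Sort(lines);                        // identical numbers become adjacent
for (int i = 1; i < lines.Length; i++)
{
    if (lines[i] == lines[i - 1])
        Console.WriteLine("Duplicate: " + lines[i]);
}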

Or, use Linq methods:
C#
// Group identical numbers and keep those that occur more than once.
string[] lines = File.ReadAllLines(@"D:\Temp\101.txt");
string[] dups = lines.GroupBy(i => i).Where(g => g.Count() > 1).Select(g => g.Key).ToArray();
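
To also produce the two output files the question asks for, the results can be written straight back out; a minimal sketch continuing from the snippet above (both output paths are assumptions):
C#
// Duplicated numbers go to one report file, the de-duplicated set to another.
File.WriteAllLines(@"D:\Temp\duplicates.txt", dups);
File.WriteAllLines(@"D:\Temp\unique.txt", lines.Distinct());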
 
 
Comments
prejval2006 18-Dec-12 8:35am    
Will it be an efficient way? I mean, any idea how long it will take to parse through 1 million records?
OriginalGriff 18-Dec-12 9:29am    
Try it.
I think you might be surprised.
(I can't try it for you - I don't have a file containing 1 million of your card numbers...)
Espen Harlinn 18-Dec-12 8:47am    
Two lines? ;)
OriginalGriff 18-Dec-12 9:28am    
Well, I could have got into one...:laugh:
Espen Harlinn 18-Dec-12 9:58am    
I know you certainly can - it was just too good an opportunity to miss ;)
Or perhaps:
C#
List<string> uniqueElements = (from w in System.IO.File.ReadAllLines("File.txt")
                               select w).Distinct().ToList();


Regards
Espen Harlinn
 
 
Comments
Sergey Alexandrovich Kryukov 29-Dec-12 20:48pm    
Nice, a 5.
—SA
Espen Harlinn 30-Dec-12 5:32am    
Thank you, Sergey :-D
I would suggest parsing the file line by line in a separate thread, so that your application stays responsive and you can cancel the operation at any time you want. Also, it's not a good idea to load the whole file into memory, since a million lines could eat a lot of memory depending on how much data the file contains.
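
A minimal sketch of the background-thread idea (ProcessFile is a hypothetical method that would contain the line-by-line loop below and check the token):
C#
// Requires System.Threading and System.Threading.Tasks.
CancellationTokenSource cts = new CancellationTokenSource();

// Run the parsing off the UI thread so the application stays responsive.
Task.Run(() => ProcessFile(@"D:\Temp\101.txt", cts.Token), cts.Token);

// Later, e.g. from a Cancel button handler:
cts.Cancel();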

You may try the TextReader or StreamReader ReadLine method; there are several varieties of read methods to choose from.
Read more: TextReader.ReadLine in MSDN[^]

C#
FileStream fileStream = new FileStream(unit.FileName, FileMode.Open, FileAccess.Read, FileShare.Read, 256, FileOptions.SequentialScan);
StreamReader streamReader = new StreamReader(fileStream);

while (true)
{
    string line = streamReader.ReadLine(); // reads a line and moves the file pointer to the next line
    if (line == null)
    {
        break; // you have reached the end of the file
    }
    // parse the line
}
streamReader.Close();
fileStream.Close();

"FileOptions.SequentialScan" will speed up your file reading if you are reading from top to bottom.
 
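Putting it together, the "parse the line" step can use a HashSet<string> so duplicates are detected in a single pass without sorting; a minimal sketch, with all three file paths as assumptions:
C#
HashSet<string> seen = new HashSet<string>();
using (StreamReader reader = new StreamReader(@"D:\Temp\101.txt"))
using (StreamWriter unique = new StreamWriter(@"D:\Temp\unique.txt"))
using (StreamWriter dups = new StreamWriter(@"D:\Temp\duplicates.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // Add returns false when the number has already been seen.
        if (seen.Add(line))
            unique.WriteLine(line);
        else
            dups.WriteLine(line);
    }
}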
 
