Introduction
This article shows how to find files that have the same contents.
Background
I found that my computer held many files with the same contents but different
file names. I wanted to clean them up and keep just one copy of each, so I wrote this small tool to find the duplicate files.
Using the code
I used a dictionary to store each file's mark (calculated with the MD5 algorithm) and the list of files that share that mark:
Dictionary<string, List<string>> dtFiles = new Dictionary<string, List<string>>();
Then I visit each file under the folder and compute its mark. A byte[] key would be compared by reference rather than by content, so in order to use the Dictionary.ContainsKey method, I convert the byte[] to a Base64 string.
// Body of the recursive method FindSameFile(strFolder_, delStart_, dtFiles_):
    string[] strFiles = Directory.GetFiles(strFolder_);
    foreach (string strFullFile in strFiles)
    {
        // Stop early if the user cancelled the search
        if (_bToStop)
            return;
        try
        {
            // Compute the file's mark and turn it into a string key
            byte[] byMd5 = Xugd.Hash.XMd5.CalcFile(strFullFile);
            string strMd5 = XConvert.BytesToString(byMd5);
            if (dtFiles_.ContainsKey(strMd5))
            {
                // Mark already seen: add this file to the existing list
                dtFiles_[strMd5].Add(strFullFile);
            }
            else
            {
                // First file with this mark: start a new list
                List<string> lstFiles = new List<string>(2);
                lstFiles.Add(strFullFile);
                dtFiles_.Add(strMd5, lstFiles);
            }
        }
        catch { } // Skip files that cannot be read
    }
    // Recurse into every sub-folder
    string[] strDirs = Directory.GetDirectories(strFolder_);
    foreach (string strSub in strDirs)
        FindSameFile(strSub, delStart_, dtFiles_);
}
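The scan above relies on my own helper classes (Xugd.Hash.XMd5 and XConvert). For readers who do not have them, here is a minimal sketch of the same idea that uses only the standard .NET library. The names DuplicateScanSketch, FileMark and Scan are hypothetical and chosen for illustration only; the sketch also assumes that the mark is the Base64 form of the MD5 digest, and it leaves out the stop flag and the delStart_ parameter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class DuplicateScanSketch
{
    // Hypothetical stand-in for XMd5.CalcFile + XConvert.BytesToString:
    // hashes the file's bytes with MD5 and returns the digest as Base64.
    public static string FileMark(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(md5.ComputeHash(stream));
        }
    }

    // Recursively groups the files under 'folder' by their mark.
    public static void Scan(string folder, Dictionary<string, List<string>> groups)
    {
        foreach (string file in Directory.GetFiles(folder))
        {
            try
            {
                string mark = FileMark(file);
                List<string> list;
                if (!groups.TryGetValue(mark, out list))
                {
                    // First file with this mark: start a new list
                    list = new List<string>(2);
                    groups.Add(mark, list);
                }
                list.Add(file);
            }
            catch (IOException) { }                 // unreadable or locked file: skip it
            catch (UnauthorizedAccessException) { } // no permission: skip it
        }

        foreach (string sub in Directory.GetDirectories(folder))
            Scan(sub, groups);
    }
}
```

To try it, create an empty Dictionary<string, List<string>> and pass it to DuplicateScanSketch.Scan together with the folder to search. TryGetValue is used here so that each mark is looked up only once; the result is the same as with ContainsKey followed by the indexer.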
After processing all files, we can check each item in the dictionary and pick out the files that have the same contents.
foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
{
    if (kvFile.Value.Count > 1)
    {
        // kvFile.Value holds the paths of the files that have the same contents
    }
}
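What to do with each group is up to the tool. As one possibility, the sketch below collects the groups that hold more than one file and prints them to the console. The class DuplicateReportSketch and its methods are hypothetical names used for illustration, and printing is only an assumed way of presenting the result.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateReportSketch
{
    // Returns only the groups that contain more than one file.
    public static List<List<string>> DuplicateGroups(Dictionary<string, List<string>> dtFiles)
    {
        var result = new List<List<string>>();
        foreach (KeyValuePair<string, List<string>> kvFile in dtFiles)
        {
            if (kvFile.Value.Count > 1)
                result.Add(kvFile.Value);
        }
        return result;
    }

    // Writes every group of same-content files to the console.
    public static void Print(Dictionary<string, List<string>> dtFiles)
    {
        foreach (List<string> group in DuplicateGroups(dtFiles))
        {
            Console.WriteLine("Same contents ({0} files):", group.Count);
            foreach (string file in group)
                Console.WriteLine("  " + file);
        }
    }
}
```

Two different files can in principle produce the same MD5 value, so comparing the files of a group byte by byte before deleting any of them is the safer choice.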
Points of Interest
This small article is written for developers who want to find files that have the same contents.
History
31 August 2012: First version