Introduction
I originally created this application to find similar (duplicate) songs on my computer, and with it I found several songs stored in more than one folder. However, the application can search for duplicates of any kind of file. Here we will also see how to search for similar files within a specified folder. Based on readers' suggestions, I have also updated the tool so that it can match files either by file length or by file hash value. Matching by length is fast, but it can occasionally group files that are actually different and only happen to have the same size. Matching by hash makes it practically impossible for different files to be grouped together, but it takes more time than matching by length, because every file has to be read completely.
Using the Code
We used C# and WPF to develop the application. My background is in Windows Forms, so my WPF code may not be optimal. Suggestions are welcome.
First, we need to find all the files in the specified folder. For this I created a recursive function that returns all the files (as a List<FileAttrib>) contained in the given directory and its subdirectories. It takes a DirectoryInfo object as its parameter.
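private List<FileAttrib> GetFiles(DirectoryInfo dinfo)
{
    List<FileAttrib> files = new List<FileAttrib>();
    if (this.searchExtension == "*")
    {
        // No filter: take every file in this folder.
        files.AddRange(dinfo.GetFiles().Select(s => this.ConvertFileInfo(s)));
    }
    else
    {
        // Keep only files with the chosen extension, ignoring case (".MP3" matches ".mp3").
        string extension = string.Format(".{0}", this.searchExtension);
        files.AddRange(dinfo.GetFiles()
            .Where(g => string.Equals(g.Extension, extension, StringComparison.OrdinalIgnoreCase))
            .Select(s => this.ConvertFileInfo(s)));
    }
    // Recurse into each subfolder and add its files as well.
    foreach (var directory in dinfo.GetDirectories())
    {
        files.AddRange(this.GetFiles(directory));
    }
    return files;
}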
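On .NET Core 2.1 or later, the hand-written recursion can probably be replaced by Directory.EnumerateFiles with EnumerationOptions. This also helps with protected folders such as "System Volume Information": in the recursive version above, DirectoryInfo.GetFiles throws UnauthorizedAccessException for such a folder and aborts the whole search. The following is a minimal sketch under those assumptions; FindFiles is a hypothetical name, and it returns plain paths rather than FileAttrib objects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the original tool): lists every file under
// `root` whose extension matches `extension` ("*" means all files).
// Assumes .NET Core 2.1+ for EnumerationOptions.
static List<string> FindFiles(string root, string extension)
{
    var options = new EnumerationOptions
    {
        RecurseSubdirectories = true, // walk all subfolders, no manual recursion
        IgnoreInaccessible = true     // skip folders we are not allowed to read
    };

    IEnumerable<string> files = Directory.EnumerateFiles(root, "*", options);

    if (extension != "*")
    {
        // Compare extensions case-insensitively, so "mp3" also matches "SONG.MP3".
        string wanted = "." + extension.TrimStart('.');
        files = files.Where(f => string.Equals(
            Path.GetExtension(f), wanted, StringComparison.OrdinalIgnoreCase));
    }

    return files.ToList();
}
```

To plug this into the article's code, each path could be converted with something like `FindFiles(root, ext).Select(p => ConvertFileInfo(new FileInfo(p)))`.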
We created a class, FileAttrib, to store the information we need from each FileInfo object. Here is the FileAttrib class:
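// Holds the details of one file that the search needs.
public class FileAttrib
{
    public string fileName { get; set; }       // file name with extension
    public string filePath { get; set; }       // full path of the file
    public string fileImpression { get; set; } // content hash; null for length searches
    public long fileLength { get; set; }       // size in bytes (FileInfo.Length is a long)
}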
We store only the file name, the full file path, the file length and the FileImpression (the value that identifies the file's contents). We also created a function that converts a FileInfo object into a FileAttrib. Here is the conversion code:
private FileAttrib ConvertFileInfo(FileInfo finfo)
{
    return new FileAttrib
    {
        fileName = finfo.Name,
        filePath = finfo.FullName,
        // Hash only when the user chose a hash search, because hashing reads the whole file.
        fileImpression = isHashSearch ? this.FileToMD5Hash(finfo.FullName) : null,
        fileLength = finfo.Length
    };
}
The FileToMD5Hash function generates the FileImpression from the given file location. Despite its name, it computes a SHA-256 hash, not an MD5 hash. Here is the code:
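// Despite the name, this computes a SHA-256 hash of the file's contents.
// Requires: using System.Security.Cryptography;
private string FileToMD5Hash(string _fileName)
{
    // Read the file through a ~1.2 MB buffer; disposing it also closes the file.
    using (var stream = new BufferedStream(File.OpenRead(_fileName), 1200000))
    using (SHA256 sha = SHA256.Create())
    {
        byte[] checksum = sha.ComputeHash(stream);
        // Hex string without dashes, e.g. "BA7816BF...".
        return BitConverter.ToString(checksum).Replace("-", string.Empty);
    }
}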
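As a standalone illustration, here is a minimal sketch of the same SHA-256 file hashing that can be compiled and checked on its own. FileToSha256Hex is a hypothetical name; the sketch assumes C# 8 or later (for `using var`) and relies on FileStream's own internal buffer instead of an extra BufferedStream.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: returns the SHA-256 of a file as an uppercase hex string.
// Both the file stream and the hash object are disposed when the method returns.
static string FileToSha256Hex(string path)
{
    using var stream = File.OpenRead(path);
    using var sha = SHA256.Create();
    byte[] checksum = sha.ComputeHash(stream);
    return BitConverter.ToString(checksum).Replace("-", string.Empty);
}
```

For a file containing the three bytes "abc", this should return the well-known SHA-256 test value BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD. On .NET 5 and later, `Convert.ToHexString(checksum)` produces the same uppercase string without the Replace call.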
Here is the LINQ query that groups the results and keeps only the groups that contain more than one file. The same pattern may also help you in other projects.
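// Group by content hash for a hash search, otherwise by file length,
// then keep only groups that contain more than one file.
e.Result = this.GetFiles(dinfo)
    .GroupBy(i => isHashSearch ? i.fileImpression : i.fileLength.ToString())
    .Where(g => g.Count() > 1)
    .SelectMany(list => list)
    .ToList();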
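The query above groups by a single key. A common refinement, which is not part of the original tool, combines both options: group by length first (cheap), then compute hashes only for files whose length is shared by at least one other file, so files with a unique size are never read. The sketch below assumes C# 8 or later; FindDuplicateGroups and HashFile are hypothetical names, and they work on plain paths rather than FileAttrib objects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical two-stage duplicate search: length grouping first,
// SHA-256 only inside groups whose files share the same length.
static List<List<string>> FindDuplicateGroups(IEnumerable<string> paths)
{
    return paths
        .GroupBy(p => new FileInfo(p).Length)        // stage 1: same size?
        .Where(sizeGroup => sizeGroup.Count() > 1)   // a unique size cannot be a duplicate
        .SelectMany(sizeGroup => sizeGroup
            .GroupBy(p => HashFile(p))               // stage 2: same content?
            .Where(hashGroup => hashGroup.Count() > 1))
        .Select(hashGroup => hashGroup.ToList())
        .ToList();
}

// Hypothetical helper: SHA-256 of the file, encoded as Base64 for use as a key.
static string HashFile(string path)
{
    using var stream = File.OpenRead(path);
    using var sha = SHA256.Create();
    return Convert.ToBase64String(sha.ComputeHash(stream));
}
```

With this approach, two files of equal length but different content end up in the same length group yet are separated by the hash stage, which addresses the rare false matches of a pure length search while keeping most of its speed.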
The rest of the code is straightforward. Here you can find the complete C# code. I have not explained the WPF parts, as I am not an expert in WPF. All suggestions, complaints and appreciation are welcome. I hope this application will be helpful to you.
Step 1: Select the folder in which to search for similar files.

Step 2: Select the folder to search for similar files.

Step 3: Choose whether to search by FileLength or by FileHash (more time-consuming).

Step 4: Click the Start button to search for similar files; the results are shown in groups.

Step 5: Delete the selected file(s) using the context menu.

I will keep updating this utility in the future. Keep posting your suggestions and questions. You can also follow this project on CodePlex @ FindDuplicateFile_CodePlex
History
31 Aug 2012 - Added Multiple File delete functionality
19 Jun 2012 - Added Search option through file length
25 May 2012 - Initial Release