|
Well, I just downloaded this today and made some quick changes so I could use it for my own purposes, but now I get this threading exception. I see there was already a post about it, but I don't see the solution...
-- exception occurs in frmMain.cs at --
private void deleteCheckedDuplicateFiles()
{
...
int len = lstFiles.CheckedItems.Count;
...
}
-- exception from VS2008 --
"An unhandled exception of type 'System.InvalidOperationException' occurred in System.Windows.Forms.dll
Additional information: Cross-thread operation not valid: Control 'lstFiles' accessed from a thread other than the thread it was created on."
I was hoping to put this to use later today. Can anyone offer a quick fix? (The code is currently using MD5 hashing, not that it matters.)
Thanks.
|
|
|
|
|
Hi,
The cross-thread exception is thrown when a program accesses a form control from a worker thread rather than the UI thread that created it. I believe it is only enforced when running under the debugger (F5).
public partial class frmMain : Form
{
    private Thread tdBad;

    // Touches a control owned by the UI thread; when called from tdBad,
    // this is what throws the InvalidOperationException.
    private void BadMethod()
    {
        Button1.Text = tdBad.Name; // cross-thread access to Button1
    }

    // Runs on the worker thread tdBad.
    private void MethodCalledByThread()
    {
        BadMethod();
    }
}
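A common quick fix (a sketch of the standard WinForms pattern, not necessarily how the posted sources do it) is to marshal the call back onto the UI thread:
private void deleteCheckedDuplicateFiles()
{
    // If we're on a worker thread, re-invoke this method on the thread
    // that created lstFiles, then bail out of the worker-thread call.
    if (lstFiles.InvokeRequired)
    {
        lstFiles.Invoke(new MethodInvoker(deleteCheckedDuplicateFiles));
        return;
    }
    int len = lstFiles.CheckedItems.Count;
    // ... safe to use lstFiles from here on ...
}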
The new version I posted today contains sources free of cross-thread exceptions.
O TEMPORA ! O MORES !
|
|
|
|
|
It's just what I need. I had almost finished my own project of exactly the same thing when I found this! I assume you went down a similar path to me: I was looking for a simple, free duplicate-image program, and they all (a) really sucked, (b) cost ridiculous amounts of money, and (c) had obvious missing features. This simple program of yours does more than any of them, and it's open source.
It is quite well architected. The only comment I have after very quickly skimming the code is that some of it seems a bit overly complex. For example, you seem to have implemented your own directory recursion rather than using Directory.GetFiles(), which supports recursion automatically (see the sketch below). Also, you use a lot of FileInfo and DirectoryInfo objects, which are more expensive than plain string paths, when you don't seem to make any other use of the Info classes. Perhaps you did and I just missed it in my quick scan.
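For reference, a minimal sketch of the built-in recursion (the folder path is just an example):
using System;
using System.IO;

class RecursionDemo
{
    static void Main()
    {
        // SearchOption.AllDirectories walks the whole tree in a single call,
        // so no hand-written recursion is needed.
        // (Note: it throws UnauthorizedAccessException on folders it cannot read.)
        string[] all = Directory.GetFiles(@"C:\SomeFolder", "*.*", SearchOption.AllDirectories);
        Console.WriteLine("{0} files found", all.Length);
    }
}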
In any case, great job; I like it a lot.
|
|
|
|
|
Thank you,
In a way, I could have done it that way, but MY problem was that Directory.GetFiles is not event-enabled, and it cannot return files if run from the root folder (try Directory.GetFiles(@"C:\");), which fails due to permissions...
The same applies to the MD5 class, which extends the .NET crypto class and is event-enabled.
I used a FileInfo list so as to access the file system only once (an optimisation).
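To illustrate, a hand-rolled walk can skip unreadable folders instead of aborting; this is only a sketch, and the FileFound event stands in for the article's actual event mechanism:
using System;
using System.IO;

class SafeWalker
{
    // Raised for every file found, so a UI can show progress.
    public event Action<FileInfo> FileFound;

    public void Walk(DirectoryInfo dir)
    {
        FileInfo[] files;
        DirectoryInfo[] subs;
        try
        {
            files = dir.GetFiles();
            subs = dir.GetDirectories();
        }
        catch (UnauthorizedAccessException)
        {
            return; // skip folders we cannot read instead of failing the scan
        }
        foreach (FileInfo f in files)
            if (FileFound != null) FileFound(f);
        foreach (DirectoryInfo d in subs)
            Walk(d); // recurse into readable subfolders
    }
}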
=Xc@libur= wrote: It's just what I need. I had almost finished my own project of exactly the same thing when I found this!
Anyway... I would have liked to see your project on CodeProject...
Perhaps very soon
Credo quia absurdum
|
|
|
|
|
I am in need of a similar tool, but one that finds non-duplicate files with the same name.
|
|
|
|
|
Hello,
I don't know if it really needs an application. Using Explorer, press F3 and search for all files (i.e. *.*) in advanced search mode, then sort them by name.
:: YOU make history ::
|
|
|
|
|
Hello,
Yes, that is a solution. However, with around 8,000 files at present and the need to check for duplicate names daily, the time involved and the high possibility of human error at that scale make this solution not viable in my circumstances. Thanks for the thought, though.
PS. I've attempted another solution.
From the command line, I CD to the directory and use:
DIR /S >1.txt
The /S switch lists files in the subdirectories.
The >1.txt redirects the output into a text file named 1.txt.
Then I imported the file into Excel, sorted by filename, and used a logical function to compare each entry with the one beneath it. Sorting by the logic-result column gives me a list of files with duplicate names.
One problem, however: each line does not contain the location of the file, so I have the list but am not sure which directories the files are in; I can run a search for them.
I thought it would be an easy option to add to the program, given that it already builds and compares the file list for the directory and sub-directories.
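In C#, that feature might look something like this rough sketch (the root path is hypothetical):
using System;
using System.IO;
using System.Linq;

class DuplicateNames
{
    static void Main()
    {
        string root = @"C:\Data"; // hypothetical root folder
        // Group full paths by bare file name; keep names that occur twice or more.
        var dupes = Directory.GetFiles(root, "*.*", SearchOption.AllDirectories)
                             .GroupBy(p => Path.GetFileName(p), StringComparer.OrdinalIgnoreCase)
                             .Where(g => g.Count() > 1);
        foreach (var g in dupes)
        {
            Console.WriteLine(g.Key);
            foreach (string path in g)
                Console.WriteLine("    " + path); // full path shows the directory
        }
    }
}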
Thanks
|
|
|
|
|
Could you use dir /s /b to get a list of all files including the path, import that into one column, then generate a second column containing just the text past the last "\", and sort on that latter column?
|
|
|
|
|
Yeah, that works.
Takes a few minutes of work in Excel, but not too bad. Thanks a lot.
|
|
|
|
|
That's really a nice article, thanks.
Aamer A. Alduais
final_zero
My Favorite Quote
"Failure is the beginning of Success"
Aamer A. Alduais (^_^Me^_^)
|
|
|
|
|
Thanks, man. It should encourage you to start writing; you know, all content is interesting, and networking articles especially are always helpful. You should start now!
Because -----|
|
|
|
V
:: YOU make history ::
|
|
|
|
|
Oh yeah, I will start writing, but only once my life problems are sorted out, because unfortunately I have a lot of troubles in my life, man! I will start writing soon, though. Thank you for this nice reply, man.
Aamer A. Alduais
final_zero
My Favorite Quote
"Failure is the beginning of Success"
Aamer A. Alduais (^_^Me^_^)
|
|
|
|
|
final_zero wrote: Failure is the beginning of Success
Good Luck with your troubles (we're in R A M A D A N, so... !)
:: YOU make history ::
|
|
|
|
|
Nice app, but the hashing part of the process takes significant time: several hours at least on a new quad-core PC with roughly 100k files (800 GB). Could the hashes be saved in some cache file so they don't have to be recomputed every time?
It looks like the app only really uses one CPU core, so perhaps it could be sped up by computing the hashes for multiple files at the same time.
|
|
|
|
|
GregSawin wrote: Could the hashes be saved in some cache file so they don't have to be recomputed every time
Of course; just like minimalist antivirus tools, the hash could be saved in an NTFS alternate data stream when available, or somewhere else if not, but you would need to be notified when the file content changes and then recalculate the hash...
It certainly IS feasible as described above... but is it really the goal of the app?
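As a sketch of the "somewhere else" case (a hypothetical class, not the app's code): key the cache on file size and last-write time, and re-hash only when either changes. Persisting the dictionary to disk between runs is omitted here.
using System;
using System.Collections.Generic;
using System.IO;

class HashCache
{
    private class Entry
    {
        public long Length;
        public DateTime LastWriteUtc;
        public string Hash;
    }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();

    // hashFile is whatever routine computes the hash of a path.
    public string GetOrCompute(string path, Func<string, string> hashFile)
    {
        FileInfo fi = new FileInfo(path);
        Entry e;
        if (entries.TryGetValue(path, out e) &&
            e.Length == fi.Length && e.LastWriteUtc == fi.LastWriteTimeUtc)
        {
            return e.Hash; // file unchanged since last run: skip the expensive read
        }
        string hash = hashFile(path);
        entries[path] = new Entry { Length = fi.Length, LastWriteUtc = fi.LastWriteTimeUtc, Hash = hash };
        return hash;
    }
}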
GregSawin wrote: it could be sped up by computing the hashes for multiple files at the same time
Do you mean using a hashing thread pool? Could you explain more?
:: YOU make history ::
|
|
|
|
|
Hi eRRaTuM, great job!
Try creating at least 2 stacks and 2 threads, one for each processor, with processor affinity set.
Then push even indexes to one stack and odd indexes to the other.
This should speed up the process, as the two threads run on different processors and each handles half of the files.
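A minimal sketch of that even/odd split (a hypothetical class, using an index stride instead of explicit stacks; processor affinity is left to the OS scheduler here):
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Threading;

class ParallelHasher
{
    private readonly IList<string> files;
    private readonly Dictionary<string, string> results = new Dictionary<string, string>();
    private readonly object gate = new object();

    public ParallelHasher(IList<string> files) { this.files = files; }

    // Each worker hashes every second file: start = 0 for even indexes, 1 for odd.
    private void HashSlice(object start)
    {
        using (MD5 md5 = MD5.Create())
        {
            for (int i = (int)start; i < files.Count; i += 2)
            {
                using (FileStream fs = File.OpenRead(files[i]))
                {
                    string hash = BitConverter.ToString(md5.ComputeHash(fs));
                    lock (gate) { results[files[i]] = hash; }
                }
            }
        }
    }

    public Dictionary<string, string> Run()
    {
        Thread even = new Thread(HashSlice);
        Thread odd = new Thread(HashSlice);
        even.Start(0);
        odd.Start(1);
        even.Join();
        odd.Join();
        return results;
    }
}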
Hope this helps.
By the way, I created a button to export the list to CSV; if you want the code, mail me.
Thanks!!
|
|
|
|
|
If the goal is to find duplicates (rather than guard against illicit modifications), does it make sense to compute the MD5 hash of every file, or would it make more sense to start by computing some easy hash functions (e.g. length, CRC32, sum32, tripling-sum(*), etc.) and then only compute the MD5 hashes of files for which all of those hashes yield identical results?
(*) For each chunk of the file: sum = sum + sum + sum + data, discarding any overflow (i.e. the running sum is tripled before each chunk is added).
None of the above-mentioned hash functions is very good by itself, but their results tend to be relatively orthogonal. There would be some false positives if files were compared using only those simple metrics, but I would expect them to be rare.
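A sketch of that staged approach (not the article's code), using the file length as the free first-pass filter and computing MD5 only for the survivors:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class StagedDuplicateFinder
{
    // Returns groups of paths that share an MD5 hash, but only hashes
    // files whose lengths already collide with another file's.
    public static IEnumerable<IGrouping<string, string>> FindDuplicates(IEnumerable<string> paths)
    {
        var candidates = paths.GroupBy(p => new FileInfo(p).Length)
                              .Where(g => g.Count() > 1)   // cheap filter: length comes from metadata, no read
                              .SelectMany(g => g);
        return candidates.GroupBy(Md5Of).Where(g => g.Count() > 1);
    }

    private static string Md5Of(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream fs = File.OpenRead(path))
            return BitConverter.ToString(md5.ComputeHash(fs));
    }
}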
|
|
|
|
|
Hi,
By definition, calculating a CRC32 means reading the file once and then computing the division results...
The MD5 method does the same thing; it reads the file once...
supercat9 wrote: and then only compute the MD5 hashes of files for which all of those hashes yield identical results?
That means the program MUST read the files at least once, and then read them again to compute the MD5 hash for the files with matching results...
Too long, and the gain is NOT obvious.
O TEMPORA ! O MORES !
|
|
|
|
|
I found a smallish bug...
The file streams are not being closed after the CRC32 check completes. This leaves the files in an open "in use" state if you subsequently want to move them or delete them to the Recycle Bin.
I added a call to the stream's Close method at the end of the CRC method in the crc32 class. I also changed the file-stream creation to allow full sharing of the files while they are open:
Dim st As New FileStream(Filename, FileMode.Open, FileAccess.Read, FileShare.Delete Or FileShare.ReadWrite)
(I forgot to mention that I converted the crc32 class to Visual Basic.) Maybe the C# version was closing the file streams automatically, but in VB they were left in an in-use state if you went to delete the files to the Recycle Bin after a search. That could potentially leave hundreds of file handles open.
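In C#, the equivalent fix is to wrap the stream in a using block so it is closed even if hashing throws (a sketch; ComputeCrc32 is a hypothetical stand-in for the real method in the crc32 class):
using (FileStream st = new FileStream(fileName, FileMode.Open, FileAccess.Read,
                                      FileShare.Delete | FileShare.ReadWrite))
{
    uint crc = ComputeCrc32(st); // hypothetical stand-in for the class's CRC routine
}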
|
|
|
|
|
The stream needed to be closed; otherwise an IOException is thrown when deleting the files.
I did mention it too, a bit later. Anyway, I would be pleased to see how you reorganized the code!
:: YOU make history ::
|
|
|
|
|
The only restructuring I did was to add a Stream.Close call at the bottom of the CRC method before it returns, and to allow more sharing on the Open.
As for converting the code from C# to VB, I simply used this web site:
http://labs.developerfusion.co.uk/convert/csharp-to-vb.aspx
Otherwise all the code is the same.
|
|
|
|
|
Oops, actually I did some deployment restructuring by simply placing all the source code into one .dll project and a separate project for the .exe.
|
|
|
|
|
A NEW, refined version with more options is available.
:: YOU make history ::
|
|
|
|
|
This is a very handy utility; nice job, I gave it a 5. I did restructure the source code into fewer library DLLs for easier deployment, though.
|
|
|
|
|
Is there support for hardlinks?
Q
|
|
|
|