Re: !!!DO NOT DUPLICATE CODE!!! - Visual Basic Discussion Boards

Luc Pattyn 6-May-10 8:34

good.
if you just remove the progress stuff for now, you can judge how much it affected the overall speed.
then add proper progress reporting; when done right, it should not influence speed at all.

:)

Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]

Prolific encyclopedia fixture proof-reader browser patron addict?
We all depend on the beast below.

Re: Multithreading slower?

Luc Pattyn 6-May-10 12:38

NikWing wrote:
result = Nothing
hashclass.HashFile(aryFi_hashdups(r2).FullName)
myNewRow = dt_db_cmp_fldr_b.NewRow()
myNewRow("filesize") = aryFi_hashdups(r2).Length
myNewRow("sha256hash") = result

I had a closer look at this part of your code; how is result getting its value here?
HashFile is a method that returns "result" but the caller is not storing it at all?

And inside HashFile the variable result is not declared and not initialized. This is all wrong.

:~


Re: Multithreading slower? [modified]

NikWing 6-May-10 13:23

Thanks for your time, Luc :)

no no, I just didn't paste the code you're missing :)

the program is about 4200 lines right now so I kinda forgot to paste this:

Public Module Variables_form1
    Friend result As String = ""
End Module

so I set result to nothing, call the hashfile function with the full filename, it returns the hash and stores it in a new datatable row

like I said, the original program runs without a flaw, just the double-thread stuff breaks it somehow ...

I gave up for today 10 mins ago.
the timer instead of invoke didn't speed it up.
the progressbar value is between 96% and 100%, usually 99%.
progbar.Maximum value is aryFi_hashdups.count, current value is the sum of each FOR .. NEXT counter, ((r1) + (r2 - aryficnt_b + 1)) where aryficnt_b = Math.Ceiling(aryFi_hashdups.Count / 2)

But besides this, the tables are broken somehow.
I thought the problem occurred when I copy each table row by row to a new table,
but both have the correct number of rows: I hashed 838 files and each table contains 419 rows.
I skipped making a new table out of table_a and table_b and altered the compare part to use table_a, then table_b.
the problem still occurs, the number of results varies. I tried it with my single thread version: the result doesn't vary and it's correct.

I now try to find out if the aryFi_hashdups array varies. I can't imagine this, why should it alter itself while the program runs? it's just a list of filenames as far as I know (di.getfiles(*.*), and after it's created nothing changes it ...)

I divide its count by 2, round one to floor, one to ceiling and have 2 loops

Dim aryficnt_a As Integer = Math.Floor(aryFi_hashdups.Count / 2)
For r1 = 0 To aryficnt_a - 1
Next

and

Dim aryficnt_b As Integer = Math.Ceiling(aryFi_hashdups.Count / 2)
For r2 = aryficnt_b To aryFi_hashdups.Count - 1
Next
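
One thing worth checking in the two loops above: for an odd file count, Math.Floor(Count / 2) and Math.Ceiling(Count / 2) differ by one, so the first loop ends at index floor - 1 while the second begins at index ceiling, and the middle file is never visited. The following is only a sketch of a split that covers every index; the module name RangeSplitDemo and the fixed count of 5 are made up for illustration.

```vb
Imports System

Module RangeSplitDemo
    Sub Main()
        Dim count As Integer = 5          ' odd on purpose (hypothetical file count)
        Dim half As Integer = count \ 2   ' integer division rounds down: 2

        ' First worker: indexes 0 .. half - 1
        For r1 As Integer = 0 To half - 1
            Console.WriteLine("thread A handles index " & r1)
        Next

        ' Second worker: indexes half .. count - 1,
        ' so it starts exactly where the first one stopped.
        For r2 As Integer = half To count - 1
            Console.WriteLine("thread B handles index " & r2)
        Next
    End Sub
End Module
```

Using a single split point for both loops, instead of a separate floor and ceiling, is what removes the gap.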

I just don't understand, I don't see anything that interferes ...

edit: uh, just thought of something ... could the problem be that I only have one hash function and also just one "result" ?
I'll copy the hash function and change result to result_a and result_b and see if that works ...
I will try that now, though it's already 1:25 am ^^

edit2: I guess that was the problem of the wrong results. now the result won't vary anymore. I also fixed the progressbar value, I have to make more tests with different files, but for now it finishes with 100%

with taskman I monitored the cpu usage. without visible caching (1st run) it uses between 2 and 5% of CPU, no idea why it doesn't use more. the hdd isn't very active, only flashing now and then.
WITH visible caching (2nd run etc) it uses 50% of the CPU and of course is much faster.

I have to compare both versions of the code again, how much time each takes for 1000 files and if anything improved ...

modified on Thursday, May 6, 2010 7:32 PM

!!!DO NOT DUPLICATE CODE!!!

Luc Pattyn 6-May-10 13:37

NikWing wrote:
Public Module Variables_form1
    Friend result As String = ""
End Module

OK, that is completely wrong and would explain why some of the hash values are off. You only have one "result" variable, yet two threads are writing to it and reading from it. So the order of operations will be determined by chance, and at some point in time it could be:

thread1 writes 111
thread2 writes 222
thread2 reads, hence gets 222
thread1 reads, hence gets 222 instead of 111

furthermore, while executing, the HashFile function constantly changes "result". So it is almost by accident that some of your hashes end up being correct!

Your HashFile function returns the result, no need to use an external variable at all. Here is what you do:
remove Friend result As String = ""
add a result variable inside HashFile
where you call HashFile, do it like so: Dim result as Long=...HashFile...
so now each thread receives the correct value and stores it locally.
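
As a rough illustration of the steps above, here is a sketch of a HashFile that keeps everything in local variables and hands the value back through Return. The original HashFile was never posted, so the module name HashHelper, the parameter name filePath and the hex-string output format are assumptions, not the poster's actual code.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

Public Module HashHelper
    ' Returns the SHA-256 digest of a file as an uppercase hex string.
    ' Only locals are used, so two threads calling this at the same
    ' time cannot overwrite each other's result.
    Public Function HashFile(ByVal filePath As String) As String
        Using sha As SHA256 = SHA256.Create()
            Using fs As FileStream = File.OpenRead(filePath)
                Dim digest As Byte() = sha.ComputeHash(fs)
                ' BitConverter yields "AB-CD-..."; strip the dashes.
                Return BitConverter.ToString(digest).Replace("-", "")
            End Using
        End Using
    End Function
End Module
```

The caller would then store the return value in its own local, for example Dim hash As String = HashHelper.HashFile(aryFi_hashdups(r2).FullName), and write that local into the new row.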

Haven't read most of your message yet. May have more to comment later.

:)


Re: !!!DO NOT DUPLICATE CODE!!!

NikWing 6-May-10 14:03

alright

did it and it still works :-D

(though I had to use String instead of Long because of hex conversion errors)

that's the problem when you (I mean myself) don't know what you're doing.
searching for info on the WWW might result in something that works, but it's hard to see if it couldn't be way better, safer, faster ...

Since I used another class I couldn't just Dim result like other variables, so I googled and found the way with Friend, I didn't know how to use the Return value :)

Since I edited the code using what you suggested, do I still need 2 hash functions? for testing it I copied the function and now have hashfile_a and hashfile_b.
Would having just 1 function matter if 2 threads access it? I'm not sure but I would say it doesn't matter?

Re: !!!DO NOT DUPLICATE CODE!!!

Luc Pattyn 6-May-10 14:12

NikWing wrote:
I had to use String instead of Long

right, I hadn't spotted yet that your hash is a string; when I calculate hashes, I use int or long, that works much faster. However, SHA256 returns a byte array (32 bytes), so you want to turn that into one thing and you chose a string; I still might fold those bytes into one long, but that isn't really relevant.

anyway, two pieces of advice: start studying from a real source, i.e. a book; and then solve your problem with much, much less code. It is complete nonsense to write 4000+ lines of code while not knowing the fundamentals of the language.

:)


Re: !!!DO NOT DUPLICATE CODE!!!

NikWing 6-May-10 14:57

I'll try to :)

by the way, I just took the operation time again.
1000 files each time, non-identical, 1st run, so no caching.
single thread: 35 seconds
dual-thread: 54 seconds ... :doh:

it's ... just ... odd ... :WTF:

oh well, bed now, it's 3 am.

Re: !!!DO NOT DUPLICATE CODE!!!

Dave Kreskowiak 6-May-10 16:57

It's not that odd when you consider that starting a thread is a very expensive operation by itself. And, judging by the code that has been written and discussed with Luc, it's not written with threading in mind. Just launching a second copy of the same code you wrote for a single-threaded operation in another thread does not mean it's correct for multi-threaded operation. Much greater care must be taken to make sure you don't introduce resource contention bottlenecks (access to collections, I/O operations, ...) or deadlocks, where two or more threads are all waiting on each other to give up control, among other problems.

Chances are your single-threaded code has to be scrapped and completely rewritten if you want to support multi-threaded operation. And even then, there's no guarantee your entire operation is going to run faster than the single-threaded version. You also have to look at what you're doing. Keep in mind that your everyday disk can't read more than one part of the disk at any one time, so, depending on your software design, only one thread will get data at any one time. Your algorithm could conceivably thrash the disk, moving the heads back and forth between two areas of the disk constantly, trashing the disk cache and spending more time seeking the heads than reading a segment of data.

A guide to posting questions on CodeProject

Dave Kreskowiak
Microsoft MVP
Visual Developer - Visual Basic
2006, 2007, 2008
But no longer in 2009...

Re: !!!DO NOT DUPLICATE CODE!!!

NikWing 7-May-10 0:43

Hello Dave :)

I just read more into asynchronous file reading, which seems to be a good way to prevent context switching etc.
this might be a way to go, though I don't yet see how to implement it since it's using a callback delegate and StateObj.
So IF this is a better/faster way (like it's described here: http://www.xtremevbtalk.com/showthread.php?t=195997) I just have to find out how to get it to make the hash and return it to where it was initiated, so it finally can be added to the datatable ...

Re: Multithreading slower?

Luc Pattyn 6-May-10 13:57

OK, I looked at some more and now I feel sick.

You have over 4000 lines of such code? I suggest you stop right away, buy a book on VB.NET, and study it (you may want to skip chapters on topics that are not of interest yet, e.g. networking, database, etc). After studying say one week, sit down and create your app from scratch. It may sound harsh, it will pay off pretty soon though.

:)


Re: Multithreading slower?

NikWing 6-May-10 14:21

LOL

well, this function is only one part of my program.
it's mainly for picture management.

one part is a duplicate file finder that 1st sorts out files that exist only once, then compares the leftover files of identical size by their hashes and displays the results in a datagridview with the option to delete selected rows.
next part is the problem above: hashing all leftover unique files, comparing them with a hashtable (database), then displaying the results in a datagridview. if I click on a field it will show the image in an imagebox (resized), then I can delete the found files, select the next hashtable database and compare the leftover files without rehashing (I'm clever, LOL)
another part allows me to add new hashes to the databases
another part allows me to compare the databases with themselves or each other, so I can throw out duplicate hashes
(and some other minor parts/functions I need)

and well, I can't complain, it does its job, saves me a lot of time :)

and with your help it works a little better now :-D

my problem is, I don't learn that well by books, I usually learn by doing.

for example: just recently I thought it might be faster to make a dataview using the hashtable and then search the hash of a new file in it. I found out it's WAY faster than how I did it before.
Before I had 2 FOR...NEXT loops, compare a row from table_a with every row in table_b. the hashtable grows and grows, so doing it that way takes a lot of time. 400,000,000 comparisons took more than 5 minutes, maybe 10.
now it's so fast it takes like 3 seconds or something :)
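
The speed-up described here comes from replacing a scan of every row with an indexed lookup. The post used a DataView for that; the sketch below swaps in a HashSet(Of String) to show the same idea in a few lines. The module name HashLookup, the function name FindKnown and the case-insensitive comparison of hex strings are assumptions for illustration.

```vb
Imports System
Imports System.Collections.Generic

Public Module HashLookup
    ' Returns the hashes from newHashes that already exist in knownHashes.
    Public Function FindKnown(ByVal newHashes As IEnumerable(Of String), _
                              ByVal knownHashes As IEnumerable(Of String)) As List(Of String)
        ' Build the index once; each Contains call is then a hash lookup
        ' instead of a pass over every known row.
        Dim known As New HashSet(Of String)(knownHashes, StringComparer.OrdinalIgnoreCase)
        Dim matches As New List(Of String)
        For Each h As String In newHashes
            If known.Contains(h) Then matches.Add(h)
        Next
        Return matches
    End Function
End Module
```

With two nested FOR...NEXT loops the work grows with rows_a times rows_b; with an index it grows roughly with rows_a plus rows_b, which is consistent with minutes turning into seconds.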

Thank you

Re: Multithreading slower?

Luc Pattyn 6-May-10 6:14

Your hash values not being correct means your code is not OK; you have to fix that first, before you can start working on performance and multi-threading.

BTW: to keep things simple, I would not touch any DataTable/DataSet in the threads, just store the results in an array, or a Dictionary<string filename, int hashValue>; then after the joins, enumerate and store the results.
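
A sketch of that arrangement, under some assumptions: each thread gets its own worker object with its own Dictionary, nothing is shared while the threads run, and the results are merged only after both joins. The names HashWorker, ParallelHash and HashAll are hypothetical, and the hash is kept as a hex string rather than an int.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography
Imports System.Threading

' Hypothetical worker: hashes one slice of the file list into its own dictionary.
Public Class HashWorker
    Private ReadOnly _files As FileInfo()
    Private ReadOnly _first As Integer
    Private ReadOnly _last As Integer
    Public ReadOnly Results As New Dictionary(Of String, String)

    Public Sub New(ByVal files As FileInfo(), ByVal first As Integer, ByVal last As Integer)
        _files = files
        _first = first
        _last = last
    End Sub

    Public Sub Run()
        For i As Integer = _first To _last
            Results(_files(i).FullName) = HashFile(_files(i).FullName)
        Next
    End Sub

    ' Local-only hashing, so concurrent calls cannot interfere.
    Private Shared Function HashFile(ByVal filePath As String) As String
        Using sha As SHA256 = SHA256.Create()
            Using fs As FileStream = File.OpenRead(filePath)
                Return BitConverter.ToString(sha.ComputeHash(fs)).Replace("-", "")
            End Using
        End Using
    End Function
End Class

Public Module ParallelHash
    Public Function HashAll(ByVal files As FileInfo()) As Dictionary(Of String, String)
        Dim half As Integer = files.Length \ 2
        Dim workerA As New HashWorker(files, 0, half - 1)
        Dim workerB As New HashWorker(files, half, files.Length - 1)
        Dim threadA As New Thread(AddressOf workerA.Run)
        Dim threadB As New Thread(AddressOf workerB.Run)
        threadA.Start()
        threadB.Start()
        threadA.Join()
        threadB.Join()

        ' Both threads have finished, so merging needs no locking.
        Dim merged As New Dictionary(Of String, String)(workerA.Results)
        For Each pair As KeyValuePair(Of String, String) In workerB.Results
            merged(pair.Key) = pair.Value
        Next
        Return merged
    End Function
End Module
```

The DataTable rows would then be created on the calling thread by enumerating the merged dictionary. This only addresses correctness; it says nothing about whether two threads are faster on a single disk.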

:)


Re: Multithreading slower?

supercat9 6-May-10 6:54

Using multiple threads will be helpful if it's possible for the different threads to simultaneously be performing useful work, generally using different resources. If two or more threads need to read files off the same physical disk, each thread will likely have to wait any time other threads access the disk, so adding additional threads won't help anything.

Note that because the operating system has its own caching and read-ahead logic, it's often difficult to predict how any particular piece of code will perform. I'm sure that in general the system caching boosts performance, but it does make it much harder to optimize code.

Re: Multithreading slower?

NikWing 6-May-10 7:56

Hello

yes, I noticed that caching thing. if I restart the process it usually hashes a lot faster
but that won't happen if I use new files or different folders, at least it seems so because it is as slow as on 1st start :)

hashing simultaneously was my idea
the files in the folder are pictures, usual folder size is around 200 MB, though it can vary.
I'm not sure but I think it should at least be faster than trying to hash files that are 200 MB each, copying the files doesn't take that much time either.

I know of a (professional) program that can find duplicate images. It allows the user to set how many threads should be used for processing files. It processes files in a list, which are added while it scans the folders for them.
It works really fast, also uses a progressbar and shows various information while working.
And the hdd is really active during this.

It's just hard to find information on the WWW that shows how to do that :)

Nik

Re: Multithreading slower?

supercat9 6-May-10 9:11

I'd like to have a utility to find duplicate files and implement a reasonable backup/archiving approach. One thing I would think might be helpful with large files would be to start by producing a catalog of file sizes. If a file size is unique, one needn't hash anything to know that the file isn't going to match any other. Otherwise, for large files, one could compute a 'quick hash' value by hashing a few 64K chunks of data taken from different areas of the file. If two files have identical quick-hashes, they may or may not be identical, but if a file's quick-hash is unique that's a sure sign that the file is.
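
A minimal sketch of both ideas, with the caveat that the details are guesses: the names DuplicateCandidates, GroupBySize and QuickHash are hypothetical, and sampling the start, middle and end of the file is just one way to pick "different areas".

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Security.Cryptography

Public Module DuplicateCandidates
    Private Const ChunkSize As Integer = 64 * 1024

    ' Groups paths by file length; a group with one member cannot contain duplicates.
    Public Function GroupBySize(ByVal paths As IEnumerable(Of String)) As Dictionary(Of Long, List(Of String))
        Dim groups As New Dictionary(Of Long, List(Of String))
        For Each p As String In paths
            Dim info As New FileInfo(p)
            Dim bucket As List(Of String) = Nothing
            If Not groups.TryGetValue(info.Length, bucket) Then
                bucket = New List(Of String)
                groups.Add(info.Length, bucket)
            End If
            bucket.Add(p)
        Next
        Return groups
    End Function

    ' Hashes up to three 64K samples (start, middle, end) instead of the whole file.
    ' Equal quick hashes only mean "maybe identical"; a full compare is still needed.
    Public Function QuickHash(ByVal filePath As String) As String
        Using fs As FileStream = File.OpenRead(filePath)
            Dim length As Long = fs.Length
            Dim offsets As Long() = {0L, _
                                     Math.Max(0L, length \ 2 - ChunkSize \ 2), _
                                     Math.Max(0L, length - ChunkSize)}
            Dim chunk(ChunkSize - 1) As Byte
            Using sample As New MemoryStream()
                For Each offset As Long In offsets
                    fs.Seek(offset, SeekOrigin.Begin)
                    ' Read may return fewer bytes than requested, so loop until
                    ' the chunk is full or the end of the file is reached.
                    Dim filled As Integer = 0
                    Do While filled < chunk.Length
                        Dim got As Integer = fs.Read(chunk, filled, chunk.Length - filled)
                        If got = 0 Then Exit Do
                        filled += got
                    Loop
                    sample.Write(chunk, 0, filled)
                Next
                Using sha As SHA256 = SHA256.Create()
                    Return BitConverter.ToString(sha.ComputeHash(sample.ToArray())).Replace("-", "")
                End Using
            End Using
        End Using
    End Function
End Module
```

For files smaller than 64K the three samples overlap and cover the whole file, so the quick hash is still deterministic, just no cheaper than a full hash.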

Re: Multithreading slower?

NikWing 7-May-10 0:50

yes, 2 good ideas IMO.

1st one is how I do it, getting filename and filesize into a datatable, sort it by filesize, compare it with itself to find duplicate rows and import them into a new table.
then process that table, hash all files in it (I hash completely since I'm doing that with pictures which aren't bigger than 2-3 MB) and again compare the datatable, removing unique rows (hash column)
works very well :)

2nd describes what some duplicate file finders do, hashing the 1st bytes of files with same size, if they are identical it'll hash some more bytes until it finds a difference. at least this is what I found out from my research lol

Maximum number in an array?

Adam Wike 6-May-10 4:51

I need to make a program that generates 15 random numbers as an array and then lists them in a listbox. I also need to be able to display the maximum and minimum in a label. Here's what I've got so far...

Public Class Form1
    Dim strNumbers(14) As String


    Private Sub btnGenerate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGenerate.Click
        Dim intRandom As New Random()
        Dim intLoop As Integer

        lstOutcome.Items.Clear()

        For intLoop = 0 To strNumbers.Length - 1
            lstOutcome.Items.Add(intRandom.Next(1, 100))
        Next intLoop

    End Sub

    Private Sub btnMaximum_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnMaximum.Click
        Dim intLoop As Integer
        Dim intMax As Integer


        For intLoop = 0 To strNumbers.Length - 1
            intMax = strNumbers(intLoop)
            If intMax < strNumbers(intLoop) Then
                intMax = strNumbers(intLoop)
            End If
        Next intLoop

        lblMinMax.Text = intMax
    End Sub
End Class

I've got the 15 random numbers in the listbox. I just can't figure out why my maximum code isn't working. Any tips would be awesome right now...

Re: Maximum number in an array?

Luc Pattyn 6-May-10 5:04

look again right here:

intMax = strNumbers(intLoop)
If intMax < strNumbers(intLoop) Then


Re: Maximum number in an array?

dan!sh 6-May-10 5:06

Put the intMax initialization outside the For loop. BTW, you can use a List instead of an array since it has a Sort method. Then all you would need is to get the first and last element.
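
A sketch of the loop with the initialization moved out, written as a console program so it stands alone; the names MinMaxDemo, numbers and rng are made up here. Note that the posted code also only added the random values to the listbox and never stored them in strNumbers, so the sketch fills the array too.

```vb
Imports System

Module MinMaxDemo
    Sub Main()
        Dim numbers(14) As Integer          ' 15 slots: indexes 0 .. 14
        Dim rng As New Random()

        ' Store each value in the array (the form would also add it to the listbox).
        For i As Integer = 0 To numbers.Length - 1
            numbers(i) = rng.Next(1, 101)   ' upper bound is exclusive, so 1 .. 100
        Next

        ' Seed min and max once, before the loop, then only compare inside it.
        Dim max As Integer = numbers(0)
        Dim min As Integer = numbers(0)
        For i As Integer = 1 To numbers.Length - 1
            If numbers(i) > max Then max = numbers(i)
            If numbers(i) < min Then min = numbers(i)
        Next

        Console.WriteLine("Max: " & max & "  Min: " & min)
    End Sub
End Module
```

Keeping the values as Integer rather than String also avoids implicit string-to-number conversions in the comparison.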

Re: Maximum number in an array?

Adam Wike 6-May-10 7:36

It worked...to do the minimum I should only have to change the sign I guess. Thanks a lot :)

Re: Maximum number in an array?

Adam Wike 7-May-10 4:08

I also figured out that the way I generated the random numbers made the code mess up. I ended up having to use

strNumbers(intLoop) = Int((100 - 1 + 1) * Rnd()) + 1

to get the random numbers.

Outlook Addin

Dominick Marciano 5-May-10 10:48

I'm trying to develop an addin for Outlook 2003 so that when a new calendar item is added, it fires an event that passes along the new calendar item's information to a web service. I already have the web service working, and I can get the addin to load in outlook. What I can't figure out is how to tie the adding of a new calendar event to a function. Any guidance would be appreciated.

AddIn stopped working

Sonhospa 5-May-10 2:38

Hi everybody,

I have a funny problem developing an Excel-AddIn with VB Express 2008, based on sample code. The funny thing is that it used to work (at this stage only installing a button), but after working on other projects and coming back it doesn't show the initial message anymore, i.e. the addin doesn't connect. In the other project I had played around with VB's IMessageFilter - so I'm a bit afraid that I keep blocking any Windows messages without knowing?

Maybe someone can have a quick look at the code or has an idea what else might go wrong now? Here's the code:

Option Explicit On
Imports Microsoft.Office.Core
Imports Extensibility

Public Class Connect
    Implements IDTExtensibility2

    Public ext_cm_Startup As ext_ConnectMode
    Public ext_dm_HostShutdown As ext_DisconnectMode
    Public edlcaption As String = ChrW(&H3B1) & ChrW(&H3A9) & "-ED&L"

    Dim oHostApp As Object
    Dim WithEvents MyButton As CommandBarButton

    Private Sub IDTExtensibility2_OnConnection(ByVal Application As Object, ByVal ConnectMode As ext_ConnectMode, _
                                               ByVal AddInInst As Object, ByRef custom As System.Array) _
                                               Implements IDTExtensibility2.OnConnection

        On Error Resume Next
        ' Set a reference to the host application...
        oHostApp = Application

        ' If you aren't in startup, then manually call OnStartupComplete...
        If (ConnectMode <> ext_cm_Startup) Then _
           Call IDTExtensibility2_OnStartupComplete(custom)

    End Sub

    Private Sub IDTExtensibility2_OnStartupComplete(ByRef custom As System.Array) Implements IDTExtensibility2.OnStartupComplete
        Dim oCommandBars As Microsoft.Office.Core.CommandBars
        Dim oStandardBar As Microsoft.Office.Core.CommandBar

        On Error Resume Next
        ' Set up a custom button on the "Standard" commandbar...
        oCommandBars = oHostApp.CommandBars
        If oCommandBars Is Nothing Then
            ' Outlook has the CommandBars collection on the Explorer object
            oCommandBars = oHostApp.ActiveExplorer.CommandBars
        End If

        oStandardBar = oCommandBars.Item("Standard")
        If oStandardBar Is Nothing Then
            ' Access names its main toolbar Database
            oStandardBar = oCommandBars.Item("Database")
        End If

        ' In case the button was not deleted, use the existing one...
        MyButton = oStandardBar.Controls.Item(edlcaption)
        If MyButton Is Nothing Then

            MyButton = oStandardBar.Controls.Add(1)
            With MyButton
                .Caption = edlcaption
                .Style = MsoButtonStyle.msoButtonCaption
                .Tag = "EDL"

                .OnAction = "!<MyCOMAddin.Connect>"
                .Visible = True
            End With
        End If

        ' Display a simple message to know which application you started in...
        MsgBox("Started in " & oHostApp.Name & ".")

        oStandardBar = Nothing
        oCommandBars = Nothing
    End Sub

    Private Sub IDTExtensibility2_OnDisconnection(ByVal RemoveMode As Extensibility.ext_DisconnectMode, ByRef custom As System.Array) _
    Implements IDTExtensibility2.OnDisconnection

        On Error Resume Next
        If RemoveMode <> ext_dm_HostShutdown Then _
           Call IDTExtensibility2_OnBeginShutdown(custom)

        oHostApp = Nothing

    End Sub

    Private Sub IDTExtensibility2_OnBeginShutdown(ByRef custom As System.Array) Implements IDTExtensibility2.OnBeginShutdown
        On Error Resume Next
        ' Notify the user you are shutting down, and delete the button...
        MsgBox(String.Format("The button '{0}' is being deleted.", MyButton.Caption))
        MyButton.Delete()
        MyButton = Nothing
    End Sub

    Private Sub IDTExtensibility2_OnAddInsUpdate(ByRef custom As System.Array) Implements IDTExtensibility2.OnAddInsUpdate
        'You do nothing if this is called, but you need to
        'add a comment so Visual Basic properly implements the function...
    End Sub

    Private Sub MyButton_Click1(ByVal Ctrl As Microsoft.Office.Core.CommandBarButton, ByRef CancelDefault As Boolean) Handles MyButton.Click
        MsgBox("The add-in performs an action here!")
    End Sub

    Private Function listFiles(ByVal dir As String) As List(Of String)

        Return Nothing
    End Function
End Class

After compiling, I can successfully register the dll with regasm

regasm MyCOMAddIn.dll /codebase /tlb=MyCOMAddIn.tlb
Gacutil /if MyCOMAddIn.dll

and into the registry

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Office\Excel\Addins\MyComAddin.Connect]
"LoadBehavior"=dword:00000003
"CommandLineSafe"=dword:00000000
"FriendlyName"="MyComAddin Connect Class"
"Description"="MyComAddin Connect Class"

Thank you for your time!
Mick

Re: AddIn stopped working

tosch 5-May-10 3:43

Have you tried a reboot?

Tosch

Re: AddIn stopped working

Sonhospa 5-May-10 6:01

Hi tosch, thanks for having a look. Of course I have tried a reboot - not only once. Do you think the code itself is ok? I have no idea where or for what to look...
