|
That is: the inner loop is executed about 2 million times?
Do iterations depend on each other? I.e. can you parallelize them, e.g. by starting new threads in the outer loop for each middle loop?
|
Bernhard Hiller wrote: the inner loop is executed about 2 million times? Unfortunately, yes.
Bernhard Hiller wrote: can you parallelize them, Unfortunately, no, since the processing order right down to the innermost loop needs to be enforced.
Whether I think I can, or think I can't, I am always bloody right!
|
Even parallelising just the inner loop can make a difference... think about it...
It's hard to say whether bringing external functions into the loop body will reduce execution time. In most cases it does, but it depends very much on your code...
One thing that can reduce execution time is a better design of your critical code:
1. Remove as many conditional statements as you can
2. Declare and initialize locals only if you have to
3. Find and eliminate unnecessary computations (e.g. do not compute the square root of a number inside the loop; use a lookup table instead - see the sketch below)
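To illustrate point 3, here is a minimal sketch - the sizes and names are made up, not taken from your code. The expensive call happens once per distinct value, outside the hot loops, and the inner loop degrades to a cheap array read:

// Sketch of point 3 (hypothetical sizes/names): precompute sqrt once,
// outside the nested loops, instead of calling Math.Sqrt 2.25M times.
using System;

class LookupDemo
{
    static void Main()
    {
        const int n = 1500; // 1500 x 1500 = 2,250,000 inner iterations

        // Build the lookup table once.
        double[] sqrtTable = new double[n];
        for (int j = 0; j < n; j++)
            sqrtTable[j] = Math.Sqrt(j);

        double sum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += sqrtTable[j]; // array read instead of a sqrt call

        Console.WriteLine(sum);
    }
}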
I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)
|
Thank you for your suggestions, I will see how I can apply those.
Whether I think I can, or think I can't, I am always bloody right!
|
Depends on how much work the inner loop is doing: the setup cost of a thread is not insignificant, and the total overhead can extend the processing time. Parallelizing works best when there are a few long-running tasks, and badly when there are significantly more tiny tasks than cores!
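To illustrate with a sketch (the workload here is hypothetical, not the OP's code): handing each worker one big chunk of the range keeps the per-element overhead tiny, instead of paying scheduling costs millions of times.

// Sketch of the "few long-running tasks" point. Partitioner.Create splits
// the index range into a handful of big chunks, so task overhead is paid
// per chunk, not per element.
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkedParallel
{
    static void Main()
    {
        double[] data = new double[2250000];

        // One tiny task per element would drown the work in overhead;
        // per-range chunks keep each worker busy for a long stretch.
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = Math.Sqrt(i);
        });
    }
}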
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
If you are seriously considering performance, and already have issues, then refactoring inline code into methods will make the problem worse, not better: there is a small overhead in each method call that is not present in the inline version. Depending on parameter usage, it can become far from trivial, particularly when executed in nested loops. And you are executing that code 2,250,000 times, so any small increase in execution time can become significant when considered against the loop as a whole.
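(If you do end up extracting the hot code into methods anyway, one partial hedge - assuming .NET 4.5 or later - is to ask the JIT to inline them. A sketch, with a made-up method:)

// Sketch (assuming .NET 4.5+): AggressiveInlining asks the JIT to inline
// the method body at the call site, removing the per-call overhead that
// otherwise multiplies across 2,250,000 iterations. The method itself is
// a hypothetical stand-in.
using System.Runtime.CompilerServices;

static class HotMath
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Combine(int a, int b)
    {
        return (a * 31) ^ b;
    }
}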
I'd start by looking at the code and the loops and seeing exactly what they all do, and whether it's all necessary, before refactoring anything: at the moment it's fairly obvious how it's all interrelated and whether anything can be moved out of the loop. Refactoring that into multiple DLLs would make that process a lot, lot harder.
But before you do anything, add some timing code to monitor what is actually happening, so you can tell exactly whether what you did improved or worsened the situation - and remember that the Release version will have different timings to the Debug one, as it includes a lot more optimisations!
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
Sure, thanks!
Whether I think I can, or think I can't, I am always bloody right!
|
Slightly off-topic but I was working on a little additive synthesis hobby application recently where I needed to sum together hundreds of sine waves 48,000 times a second. I spent quite a while getting it as fast as I could.
First observation is that computers are really bloody quick these days.
Obviously Math.Sin was too slow, so I used a wavetable.
Floats were too slow, so I used 32-bit signed integers (this seems to vary by processor quite a lot).
Arrays have bounds checking, so I moved into the unsafe domain.
I repeated the actual business code inside the loop (copy and paste) so fewer iterations were needed (a better work-to-loop-overhead ratio, plus fewer pipeline flushes on the processor).
But, the killer improvement I got was not to use member variables in the loops. By copying member data to local variables, doing all the calculations with this then updating member variables from the locals at the end it speeded up several fold. I think this is because the locals were assigned to a register and didn't need to be written back and forth to memory continually.
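Something like this sketch, in other words - the names and the fixed-point details are simplified stand-ins, not the real synth code:

// Sketch of the locals trick: hot fields are copied to locals once, the
// loop works on the locals (which the JIT can keep in registers), and the
// fields are written back once at the end.
class Oscillator
{
    private int _phase;     // 16.16 fixed-point phase accumulator
    private int _increment; // fixed-point phase step per sample

    public void Render(int[] buffer, int[] waveTable)
    {
        int phase = _phase;              // copy members to locals...
        int inc = _increment;
        int mask = waveTable.Length - 1; // assumes a power-of-two table

        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] += waveTable[(phase >> 16) & mask];
            phase += inc;
        }

        _phase = phase;                  // ...write back once at the end
    }
}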
I can't remember what the stats were, but they were outstanding: perhaps something in the order of 50,000 oscillators at 48kHz.
Regards,
Rob Philpott.
|
Yes, local variables can make a big difference - but Release optimisations should do that for you anyway, in theory.
Computers are indeed damn quick these days, but it's still true that a good, experienced coder can beat the compiler sometimes!
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
The whole release/debug thing is pretty blurred in my mind in .NET. I'm not sure that optimization can be made, due to the hideously complex business of multiple cores with caching at various levels.
I actually need to get to the bottom of the exact differences between Debug and Release. In C++ it was very clear, but it is much more confusing with two-stage compilation. Debug code may have NOPs in it, but the jitter can simply not bother doing anything with those. I think. Really, apart from the odd optimization, the MSIL should be the same in either case.
There might be an article in there somewhere..
Regards,
Rob Philpott.
|
Rob Philpott wrote: apart from the odd optimization the MSIL should be the same in either case.
Nope, not even close: http://www.hanselman.com/blog/ReleaseISNOTDebug64bitOptimizationsAndCMethodInliningInReleaseBuildCallStacks.aspx[^] and that's just an investigation into inlining! The loop optimization is reportedly very good, and so is the localization of variables.
This is one reason why it's important to do any performance timing / tuning against Release builds rather than Debug - because the compiler can easily render two days' work shaving a couple of seconds off irrelevant!
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
Interesting read - thanks.
>An anti-pattern is "a pattern that tells how to go from a problem to a bad solution."
I like that definition.
Yes, I need to do my homework here. I may share it one day!
Regards,
Rob Philpott.
|
OriginalGriff wrote: it's still true that a good, experienced coder can beat the compiler sometimes
Whilst I'm rambling: it became clear to me recently that hand-written assembly can considerably beat compilers. I was playing around with the ARM toolchain after a 20-year absence from the ARM world. In that time, conventions (I've forgotten what they're called) have been invented to preserve registers when branching from one function to another, to allow for compiler interoperability.
The net effect of this is that before a branch the compiler pushes registers to the stack, in case the branched-to function does anything with them, then pops them on the way back. If a register isn't actually used, that's a needless operation. So my little blinking LED flashed about 5 times as fast with native ARM assembly as with the compiler's output.
I was all for writing everything in assembler for at least a quarter of an hour after that!
Regards,
Rob Philpott.
|
Oh gawd, yes! IMO a good assembler programmer will get smaller, faster code than any compiler - but it will take considerably longer to write it!
That's the problem, of course, and it's why I try to use assembler only when I have to: time-critical code (mostly interrupts) and space-critical applications. It just takes too long to write and maintain otherwise.
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952)
Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
|
Rob Philpott wrote: By copying member data to local variables, doing all the calculations with this then updating member variables from the locals at the end it speeded up several fold. Wow, that's a great tip! Thanks!!
Whether I think I can, or think I can't, I am always bloody right!
|
Can you say what the nature of the work you're doing in the loop is?
Regards,
Rob Philpott.
|
Actually, I haven't written these modules; I've only been asked to refactor them. From what I can see, it's similar to heavily pipelined processing where processing order is important. I don't see anything fancy here at all, but I have yet to debug it thoroughly, so I can't say much.
Whether I think I can, or think I can't, I am always bloody right!
|
Fair enough. Normally when I do this sort of stuff I just call the 'business' part of the operation in a thundering great loop (1000x, 1000000x operations etc.) and time it with Diagnostics.Stopwatch. You can then just tinker away, try different things out and see what difference each makes.
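Something along these lines - DoWork here is just a hypothetical stand-in for the real operation:

// Sketch of the timing harness: run the code under test in a big loop
// and time the lot with System.Diagnostics.Stopwatch.
using System;
using System.Diagnostics;

class TimingHarness
{
    static void Main()
    {
        const int iterations = 1000000;

        DoWork(); // warm-up call so JIT compilation isn't included in the timing

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            DoWork();
        }
        sw.Stop();

        Console.WriteLine("{0} iterations took {1} ms", iterations, sw.ElapsedMilliseconds);
    }

    static void DoWork()
    {
        // placeholder for the real 'business' operation
    }
}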
Regards,
Rob Philpott.
|
Sure, thank you!
Whether I think I can, or think I can't, I am always bloody right!
|
One question that I would look to answer: are there any parts that could be offloaded to the GPU? I've had a lot of success offloading mathematically complex work that way in the past.
|
I hope there is some scope for that; I'm currently debugging the code to understand exactly what it does. I am not the one who wrote these modules, I have no idea whatsoever about the project, and still they are forcing me to refactor and optimize.
Thanks for your suggestion.
Whether I think I can, or think I can't, I am always bloody right!
|
Before you do ANY refactoring, make sure that the code is covered with meaningful unit tests. It will make your refactoring a lot easier if you can test each change against a repeatable, meaningful set of control values.
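For example, a characterization ("golden master") test - this sketch assumes NUnit, and Pipeline.Process plus all the values are hypothetical stand-ins; in practice you'd record the expected output by running the unmodified code first:

// Sketch of a characterization test: the recorded output of the original
// code defines "correct" for the refactoring, whether or not it is ideal.
using NUnit.Framework;

// Stand-in for the real module under test (here: a running sum).
static class Pipeline
{
    public static double[] Process(double[] input)
    {
        double[] output = new double[input.Length];
        double running = 0;
        for (int i = 0; i < input.Length; i++)
            output[i] = running += input[i];
        return output;
    }
}

[TestFixture]
public class PipelineCharacterizationTests
{
    [Test]
    public void Process_KnownInput_MatchesRecordedOutput()
    {
        double[] input = { 1.0, 2.0, 3.0, 5.0, 8.0 };

        // Control values recorded from the *unmodified* code.
        double[] expected = { 1.0, 3.0, 6.0, 11.0, 19.0 };

        double[] actual = Pipeline.Process(input);

        Assert.That(actual, Is.EqualTo(expected).Within(1e-9));
    }
}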
|
Sure, thank you.
Whether I think I can, or think I can't, I am always bloody right!
|
Step one is to examine the algorithm itself and consider completely rewriting it. If performance is paramount, consider creating a native DLL. It goes without saying that you should examine the use of the heap in all of this.
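For the native DLL route, a minimal P/Invoke sketch - the DLL name and the export are hypothetical, the point being that the hot loop itself runs as native code:

// Minimal P/Invoke sketch for calling into a native DLL from C#.
// "FastKernels.dll" and ProcessBlock are made-up names.
using System.Runtime.InteropServices;

static class NativeKernels
{
    [DllImport("FastKernels.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern void ProcessBlock(double[] data, int length);
}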
|
Hello from Mexico - apologies, my English is not very good.
I have an x628c device. I was doing some testing and deleted my punch records with my own ClearGLog function, and when I then tried to retrieve the information I did get it back, but with all the personal IDs set to 0.
I would like to ask whether there is any other way to remove records from the x628c device, and whether records can be removed by date.
Thank you very much in advance for reading this.
I am using the zkemkeeper dll and programming in C#.
|