Today I want to describe the way to find a hotspot without profilers. I often use this approach and find it effective and simple enough. The only tools you need are any Windows debugger (I prefer WinDbg, but you can use your favorite one) which should always be in your hands if you decided to investigate some performance issue (as well as any other problem with your software) and tool like Process Explorer which are available free from Windows Sysinternals. You can use this method without profiler at all or in conjunction with profiler to clarify the result of profilers work.
Before explaining the approach, I want to define the term “hotspot”. There are different definitions which can be applied in different contexts. In this article, we consider a hotspot as a limited set of instructions on which execution spends significant CPU time or more CPU time than you expect from it.
The process is divided into the three steps. If you used a profiler and have a call graph, you can miss the first two steps and jump to step number three to verify the profilers result and find out the hotspot most precisely. Nevertheless, I recommend you to go through all the steps because the results returned by profilers sometimes can be very obscure and you will lose time rolling around a call graph or other related information. Even if your profiler was absolutely right, you’ll have one more confirmation of its result. Sometimes it’s better to check twice. :)
1. Determine the Thread Which Spends Significant CPU Time
This is the easiest step because you should do nothing by your hands and can just rely on the results returned by one of the tools which can show you CPU usage for each thread in your process. There are a number of good tools. I prefer to use Process Explorer from Sysinternals because it’s free, it’s powerful and can be useful in step number two (but again, you can use your favorite).
2. Determine the Call Stack Relevant to the Hotspot
Once you found the thread which loads CPU over your expectations, you can find out the call stack relevant to your hotspot. The idea is to get a call stack of thread several times (in fact, more the better). If thread loads CPU too much, it should spend most of its time executing instructions which are part of its hotspot. In fact, profilers use this method as well. It’s known as sampling and it’s the most common method of hotspots analyzing if you can’t use more advanced instrumentation methods for some reason. Don’t forget to adjust your symbols paths, otherwise call stacks you’ll get could be far from reality.
When you have some call stacks, you should analyze it and choose one which observes most often and/or is most probably could be a relevant to the hotspot on your opinion.
3. Find Out the Hotspot Location
Once you have a suspect, it’s time to find out location of the hotspot and prove your assumption. This is most tricky part, so I’ll describe it in details.
Imagine we have the following call stack:
ChildEBP RetAddr
0012f588 0041349f Hotspot!write_string+0x33
0012f85c 00411e50 Hotspot!_output+0xc6f
0012f8a0 00411ca3 Hotspot!sprintf+0x90
0012fab8 00411d43 Hotspot!Func1+0x93
0012fb8c 00411d93 Hotspot!Func2+0x23
0012fc60 00411ed3 Hotspot!Func3+0x23
0012fd34 00412383 Hotspot!Func4+0x23
0012fe08 004123c3 Hotspot!Func5+0x23
0012fedc 004122e0 Hotspot!main+0x23
0012ffc0 7c816fe7 Hotspot!mainCRTStartup+0x170
0012fff0 00000000 kernel32!BaseProcessStart+0x23
Since we have to find out the instruction set which occupies too much CPU time, our first task is to figure out where this instruction set executes relative to the call stack. Even more precisely, we should find the function which is the initial point of the hotspot. That is if that function will no longer execute, our thread will no longer spend too much CPU time.
The way to find this function is very straightforward – we will exclude one call after another from our execution path and check the load of CPU. We can start from the top or from the bottom of the call stack, or if the call stack is too long and we have no idea where the hotspot can be, we can use some more advanced techniques to reduce amount of work. Generally it depends on the situation and you should use your knowledge about the application and your intuition to choose the better approach.
In this case, I prefer to start from the middle of the call stack and exclude all instructions starting from function Hotspot!Func3
. For this purpose, I’ll change the function code so that the function executes no instructions, but just returns instead.
To do this, we need to find the function start address and the stack size occupied for function parameters (it’s not necessary for __cdecl
calling convention because in that case a caller pops a function argument from the stack).
Hotspot!Func3
00411d70 55 push ebp
00411d71 8bec mov ebp,esp
00411d73 81ecc0000000 sub esp,0C0h
00411d79 53 push ebx
00411d7a 56 push esi
00411d7b 57 push edi
00411d7c 8dbd40ffffff lea edi,[ebp-0C0h]
00411d82 b930000000 mov ecx,30h
00411d87 b8cccccccc mov eax,0CCCCCCCCh
00411d8c f3ab rep stos dword ptr es:[edi]
00411d8e b804000000 mov eax,4
00411d93 e8f8570000 call Hotspot!_chkstk (00417590)
00411d98 e825f4ffff call Hotspot!ILT+445(?Func2YAXXZ) (004111c2)
00411d9d 8da534ffffff lea esp,[ebp-0CCh]
00411da3 5f pop edi
00411da4 5e pop esi
00411da5 5b pop ebx
00411da6 8be5 mov esp,ebp
00411da8 5d pop ebp
00411da9 c20800 ret 8
In the case of function Hotspot!Func3
, the start address is 00411d70
and the parameters size is 8
bytes.
Having the address of the first function instruction, we should change this instruction to the appropriate return instruction (in our case ret 8
).
0:000> eb 00411d70 c2 08 00
0:000> uf Hotspot!Func3
Hotspot!Func3
25 00411d70 c20800 ret 8
Don’t forget to return the first function instruction back before you continue your investigation. :)
Now we should execute the process again to analyze CPU usage of our thread. If it’s still high – at least part of our hotspot executes before the call of the Hotspot!Func3
function and we should continue to exclude the functions down the stack. Otherwise, if we observe CPU usage normalization, we can make a conclusion that we cut out all hotspot instructions set and the next step we should do is to make sure that Hotspot!Func3
is really the initial point of the hotspot. That is, we should exclude the next function up the stack (in our case, the function Hotspot!Func2
) and make sure that CPU usage is increased. If it’s not – go up the stack until you reach a function that increases CPU usage. That is, the function which contains a part of the hotspot.
Thus, moving up or down the call stack, you’ll have an opportunity to figure out where your hotspot is. If you examined the call stack and still have no idea about the hotspot location, it’s better to return to step number two to make sure that you chose the right call stack to examine.
Although the description of the process takes a few pages, in real time it takes several minutes of work. For me, it’s fast enough to not use profiles for this task in most cases.
Now the only thing left is to find the cause of the hotspot. Depending on hotspot nature, its cause can be located inside one function (for example, infinite loop) or can be distributed across several function calls, several call paths, several threads, processes or even computers. Unfortunately, there are no common algorithms to find the cause of a hotspot even knowing where the hotspot is, and there are no profilers which can help you. Here you should again rely on your knowledge and intuition.
CodeProject