Installing Intel® Integrated Native Developer Experience beta (Intel® INDE)
Intel INDE provides a complete and consistent set of C++/Java* tools, libraries & samples for environment setup, code creation, compilation, debugging and analysis on Intel® architecture-based devices and select capabilities on ARM*-based Android* devices.
Install Intel INDE from here: http://software.intel.com/en-us/intel-inde
Building Your App for Remote Debugging
To allow the GPA System Analyzer to connect to your app, the app's AndroidManifest.xml must request the INTERNET permission and set the debuggable attribute of the application element to "true".
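The two manifest requirements above can be illustrated with a small, self-contained Java sketch. It embeds a hypothetical manifest as a string and checks it for the INTERNET permission and the debuggable flag. The class name, the method name, and the package name com.example.artbrowser are assumptions made up for this illustration; they are not part of the sample app or of GPA.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ManifestCheck {
    // Standard XML namespace used by android:* attributes in a manifest.
    static final String ANDROID_NS = "http://schemas.android.com/apk/res/android";

    // Hypothetical manifest for illustration only; the package name is made up.
    static final String SAMPLE_MANIFEST =
        "<manifest xmlns:android=\"http://schemas.android.com/apk/res/android\"\n"
      + "          package=\"com.example.artbrowser\">\n"
      + "    <uses-permission android:name=\"android.permission.INTERNET\" />\n"
      + "    <application android:debuggable=\"true\" />\n"
      + "</manifest>\n";

    // Returns true when the manifest requests INTERNET and marks the app debuggable.
    static boolean isReadyForRemoteDebugging(String manifestXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(manifestXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse manifest", e);
        }

        // Look for <uses-permission android:name="android.permission.INTERNET"/>.
        boolean hasInternet = false;
        NodeList permissions = doc.getElementsByTagName("uses-permission");
        for (int i = 0; i < permissions.getLength(); i++) {
            Element permission = (Element) permissions.item(i);
            String name = permission.getAttributeNS(ANDROID_NS, "name");
            if ("android.permission.INTERNET".equals(name)) {
                hasInternet = true;
            }
        }

        // Look for <application android:debuggable="true">.
        boolean debuggable = false;
        NodeList apps = doc.getElementsByTagName("application");
        if (apps.getLength() > 0) {
            Element app = (Element) apps.item(0);
            debuggable = "true".equals(app.getAttributeNS(ANDROID_NS, "debuggable"));
        }
        return hasInternet && debuggable;
    }

    public static void main(String[] args) {
        System.out.println(isReadyForRemoteDebugging(SAMPLE_MANIFEST));
    }
}
```

In a real project these settings live in AndroidManifest.xml itself; the check is only a convenient way to show both entries side by side.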
Understanding the App
The Art Browser sample app is a bare-bones sample that encapsulates certain aspects of larger real-world applications. It’s an extremely simple image carousel, such as you might find in a media player or picture browsing app. It has two buttons that you would not find in a production application, which are useful for recreating common performance issues:
- A button for enabling and disabling "Advanced Math Calculations." This simulates real-world situations where you have a choice to use advanced, CPU-intensive physics engines, extremely verbose logging, heavy-duty bitmap effects, or any number of other CPU-intensive features that aren’t critical for the success of the app. When enabled, the button slows the app significantly by overloading the CPU, which gives us an opportunity to see how to debug CPU-bound apps. The button is disabled by default to allow the app to load smoothly.
- A dropdown for choosing geometry complexity. It can often be beneficial to subdivide large planes into grids of smaller tiles to avoid depth-sorting issues. The default value is a 128x128 grid, which is more complex than necessary but will help illustrate our points. Geometry this complex will overload the GPU and give us a chance to see what a GPU-bound app looks like in GPA. Reducing it to any of the lower values will cause the app to run more smoothly.
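To get a feel for how quickly grid subdivision increases the GPU's workload, here is a minimal Java sketch that counts vertices and triangles for an N x N grid of tiles. It assumes each tile is drawn as two triangles and that neighboring tiles share their corner vertices; the sample app's actual mesh layout may differ, and the class and method names are made up for this illustration.

```java
final class GridGeometry {
    // Vertices in an n x n grid of tiles, assuming shared corner vertices.
    static int vertexCount(int n) {
        return (n + 1) * (n + 1);
    }

    // Triangles in an n x n grid of tiles, assuming two triangles per tile.
    static int triangleCount(int n) {
        return 2 * n * n;
    }

    public static void main(String[] args) {
        // Grid sizes mentioned in this article.
        int[] sizes = {2, 8, 32, 128};
        for (int n : sizes) {
            System.out.println(n + "x" + n + ": "
                + vertexCount(n) + " vertices, "
                + triangleCount(n) + " triangles");
        }
    }
}
```

Under these assumptions, a 128x128 grid needs 32,768 triangles per plane, compared with 8 triangles for a 2x2 grid, so the triangle count grows with the square of the grid size.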
Connecting to Your App
Build and install the Art Browser sample app on your device or emulator. Then make sure your Android development tool (Eclipse, etc.) is not running, because a running development tool can cause problems when GPA connects to your app. Start the GPA System Analyzer tool. It will list your local machine, as well as any running emulators or attached devices.
Make sure your device or emulator is unlocked and USB Debugging is enabled. Click the "Connect" button to connect to it. GPA will show a list of all debugging-enabled apps.
GPA has a rich variety of data it can monitor, but for this app we’ll be most interested in the frame rate and CPU load. Drag CPU → Aggregated CPU Load from the left sidebar into the upper graph and drag OpenGL → FPS into the lower graph.
Optimizing Code
To demonstrate a CPU-bound situation, we’ll need to enable Advanced Math Calculations via the button in the app. We can immediately see that it is consuming 100% of the processor, which explains the frame rate of nearly 0 FPS. Clicking the button again to turn off the heavy-duty math improves the situation a bit, bringing the aggregated CPU load below 30% and raising our FPS to a marginally usable 10.
Optimizing OpenGL
We’ve disabled the only CPU-intensive pieces of code, but our performance is still relatively poor. Our only option now is to optimize the OpenGL rendering. Graphics bottlenecks can be more difficult to untangle than CPU bottlenecks, since the OpenGL graphics pipeline is a complex process, and there is not always a single metric that will reveal a problem. Fortunately, GPA comes with a rich set of OpenGL optimization tools, which consist of checkboxes that turn off or replace different parts of the OpenGL rendering pipeline.
The easiest way to determine if your app is GPU-bound with an OpenGL bottleneck is to use the Disable Draw Calls state override. This will turn off any operations that have been sent to the GPU. If using this override doesn’t improve performance, we know our problem is CPU-related. However, if FPS climbs significantly, we definitely have an OpenGL bottleneck.
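The decision rule described above can be written down as a short Java sketch. It compares the FPS measured normally with the FPS measured while the Disable Draw Calls override is active. The class name, the method name, and the minSpeedup threshold are hypothetical; GPA does not prescribe a particular cutoff for a "significant" climb, so the value below is only an illustrative assumption, as are the FPS numbers in main.

```java
final class BottleneckCheck {
    enum Bottleneck { GPU_BOUND, CPU_BOUND }

    // Classifies the bottleneck from two FPS readings.
    // minSpeedup is an assumed threshold for what counts as a significant FPS climb.
    static Bottleneck classify(double baselineFps,
                               double fpsWithDrawCallsDisabled,
                               double minSpeedup) {
        boolean improved = fpsWithDrawCallsDisabled > baselineFps;
        boolean significant = fpsWithDrawCallsDisabled >= baselineFps * minSpeedup;
        // If skipping GPU work barely helps, the CPU side is the limiting factor.
        return (improved && significant) ? Bottleneck.GPU_BOUND : Bottleneck.CPU_BOUND;
    }

    public static void main(String[] args) {
        // Illustrative readings only; these are not measurements from the sample app.
        System.out.println(classify(10.0, 55.0, 1.5));
        System.out.println(classify(10.0, 11.0, 1.5));
    }
}
```

Treating any small increase as insignificant keeps normal frame-to-frame noise from being mistaken for a GPU bottleneck.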
As you can see, the FPS graph shot up as a result, so we know our app is GPU-bound. We can see if perhaps our high-resolution textures are causing an issue by disabling all state overrides and then using the Texture 2x2 override.
This effected little change. We can then try using the Simple Fragment Shader override to see if our shader code is too complex.
Again, not the gains we were looking for. We can test for overly complex geometry by comparing the TA Load metric with the USSE Vertex Load metric. Drag the GPU → TA Load metric to the top graph, then hold CTRL and drag the GPU → USSE Vertex Load metric to the top graph as well, so that it is graphed beside the TA Load. Somewhat counterintuitively, a high TA Load with a low Vertex Load indicates that too many vertices are being processed.
Clearly this is an issue, since TA Load is an order of magnitude higher. However, notice that it’s still hovering under 50%. It’s worth noting again that even severe graphics bottlenecks may not send any one metric to 100%.
Using the Geometry Complexity spinner in our app, we can simplify our geometry to a 2x2 grid.
This gives us an immediate FPS boost, and TA Load becomes more balanced with USSE Vertex Load. We can also try 8x8 and 32x32 if we want to find the sweet spot between performance and depth sorting. Now, the app is ready for primetime!
Note: The application was tested, and the results were analyzed, on tablets based on the Intel Atom processor Z2760.
Conclusion
Although performance issues can be difficult to debug on commercial-scale apps, the GPA System Analyzer can be a huge asset in investigating complex performance bottlenecks. For more information and complete documentation, check out the GPA System Analyzer homepage on Intel.com.