
Finding performance bottlenecks in Android games and apps using Intel GPA

22 Nov 2013
This article will cover the process of debugging and evaluating Android based games and apps for performance hotspots using Intel’s Graphics Performance Analyzers.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

This article covers the process of debugging and evaluating Android-based games and apps for performance hotspots using Intel’s Graphics Performance Analyzers (Intel GPA). It gives an overview of the tool and then walks through its use on a live 3D game from the app store to detect that game’s real, current performance bottlenecks. The motivation behind this article is to encourage you to use performance analysis tools to methodically target code to optimize, rather than going in and blindly attempting to optimize code.

Intel GPA  

Intel’s Graphics Performance Analyzers (Intel GPA) is a collection of tools that let you analyze desktop and Android-based apps. In this article, we will install and run the Android version of the tool on a Mac.

Installation (http://software.intel.com/en-us/vcsource/tools/intel-gpa)

Download the Android version of the tools, unzip, and run the System Analyzer.dmg installer.

Running Intel GPA System Analyzer

When you first run the app, you’ll be presented with a screen offering you the option of typing in an IP address or connecting to any auto-detected Android devices. Note that the Intel GPA System Analyzer can only analyze Android devices with Intel-based architectures; for this walkthrough I’m using a Samsung Galaxy Tab3.

Once connected you’ll be offered a list of installed apps on the device which can be run in analysis mode.

Now simply select the app you’d like to analyze. The app will auto-launch and the analysis will begin.

Once you have the Intel GPA tool running, you’re immediately greeted with a list of performance metrics and overrides you can experiment with.

Metrics

CPU

In the CPU sub-group you can view the performance load of each individual CPU, but more useful are the Aggregate and Target App CPU Load metrics. By dragging these two metrics onto the right panel, you can see how much CPU your app is using compared with the overall system usage, which lets you detect whether any performance issues are being caused by background operations.
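The comparison between the two CPU metrics can be sketched as simple arithmetic. This is only an illustration: it assumes both readings are percentages on the same scale, and the function name and sample values are hypothetical rather than anything exported by Intel GPA.

```python
def background_cpu_load(aggregate_pct: float, target_app_pct: float) -> float:
    """Return the CPU load (in percent) that is not attributable to the target app."""
    # Clamp at zero: sampling jitter can make the app reading exceed the aggregate.
    return max(0.0, aggregate_pct - target_app_pct)


# Hypothetical (aggregate, target app) readings, not taken from the article.
samples = [(92.0, 35.0), (88.0, 33.0), (95.0, 36.0)]

for aggregate, app in samples:
    background = background_cpu_load(aggregate, app)
    print(f"aggregate={aggregate:.0f}% app={app:.0f}% background={background:.0f}%")
```

If the background share is consistently large, the slowdown may be coming from outside your app, and optimizing your own code first would be a guess.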

Device IO

In the Device IO sub-group you can view disk and network activity, which can give you a quick indication if your app’s performance is being affected by IO operations.

GPU

The GPU sub-section gives you direct metrics on the actual load on the GPU, ranging from tiling operations and vertices per second to texture and shader processing loads.

  • TA Load and USSE Vertex Load: TA Load tells you the percentage of time that the Tile Accelerator is being used. USSE Vertex Load tells you the percentage of time the shader engine is processing vertex instructions. Ideally these loads should be balanced for best performance. However, if the vertex load is low and the TA load is high, it means the scene is too complex. On the other hand, if the vertex load is high and the TA load is low, it means that the vertex shader should be targeted for optimization.
  • PB Primitives/Second: If this value is high, it indicates that your bottleneck is your vertex format size.
  • PB Vertices/Second: If this value is high, it’s worth inspecting the amount of vertex data you’re passing between the vertex and fragment shader.
  • PB Vertices/Primitive: If this value is high, it’s worth looking into reducing the LOD of your models or sharing vertices with index buffers.
  • ISP Load: The Image Synthesis Processor is responsible for hidden surface removal. If this metric is high, it’s worth looking into implementing a software culling technique or ordering your draw calls. It’s also worth checking whether a single Z-buffer is being used to manage several render targets and, if so, creating a Z-buffer per render target.
  • TSP Load, Texture Unit Load and USSE Pixel Load: The TSP metric gives you the percentage of time that the Texture and Shading Processor is busy. The Texture Unit Load tells you how busy the texture units are, and the USSE Pixel Load tells you how much time the shader units spend processing pixel instructions. Using these metrics in combination will help you gauge which area to optimize. If your TSP Load is high, looking at the Texture Unit Load and USSE Pixel Load lets you deduce whether the load is caused by texturing or by shader complexity. If the Texture Unit Load is high, it’s worth optimizing your textures by either using compression or reducing the resolution. However, if the USSE Pixel Load is high, it’s worth investigating the complexity of your fragment shaders.
  • USSE Total Load: Tells you the percentage of time that the shader units are being used. Worth using alongside the Pixel and Vertex Loads to deduce which area is the bottleneck.
  • USSE Cycles/Vertex, Cycles/Pixel and Stall Load: Gives you immediate stats on the processing efficiency of your fragment and vertex shaders.
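The rules of thumb in the list above can be written down as a small decision helper. This is a rough sketch, not part of Intel GPA: the dictionary keys, the `high` and `low` thresholds, and the sample readings are all assumptions made for illustration, and sensible thresholds will depend on your device and scene.

```python
def gpu_hints(metrics: dict, high: float = 70.0, low: float = 30.0) -> list:
    """Turn GPU load percentages into optimization hints (thresholds are assumed)."""
    hints = []
    ta = metrics.get("ta_load", 0.0)
    vertex = metrics.get("usse_vertex_load", 0.0)

    # Tile Accelerator vs. vertex shading balance.
    if ta >= high and vertex <= low:
        hints.append("scene geometry may be too complex")
    elif vertex >= high and ta <= low:
        hints.append("vertex shader is a candidate for optimization")

    # Hidden surface removal.
    if metrics.get("isp_load", 0.0) >= high:
        hints.append("consider software culling or ordering draw calls")

    # Texturing vs. fragment shading: only meaningful when the TSP is busy.
    if metrics.get("tsp_load", 0.0) >= high:
        if metrics.get("texture_unit_load", 0.0) >= high:
            hints.append("consider compressing or downscaling textures")
        if metrics.get("usse_pixel_load", 0.0) >= high:
            hints.append("fragment shaders may be too complex")
    return hints


# Hypothetical readings, not measured from a real device.
reading = {"ta_load": 20.0, "usse_vertex_load": 85.0,
           "tsp_load": 90.0, "texture_unit_load": 40.0, "usse_pixel_load": 88.0}
for hint in gpu_hints(reading):
    print(hint)
```

A helper like this is only a way of keeping the checks consistent between profiling sessions; the metrics still need to be read from the tool by hand.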

Memory

The Memory sub-section gives you an instant overview of your app’s memory usage and the available system memory. This can be a very quick and useful check for whether your app is leaking memory, or whether your performance issues are related to a lack of available system memory.
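One way to turn "is my app leaking?" into a number is to note down the app memory reading at regular intervals and look at the trend. The sketch below fits a least-squares line through such readings; the function name and sample values are hypothetical, and a steady upward slope is only a hint of a leak, since caches and level loads also grow memory.

```python
def growth_per_sample(samples: list) -> float:
    """Least-squares slope of memory readings, in units per sample interval."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical readings in MB, one noted every few seconds.
steady = [100.0, 101.0, 100.0, 101.0, 100.0]
growing = [100.0, 102.0, 104.0, 106.0]

print(f"steady:  {growth_per_sample(steady):+.2f} MB/sample")
print(f"growing: {growth_per_sample(growing):+.2f} MB/sample")
```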

OpenGL

This sub-section gives you really useful metrics for 3D apps and games, with instant access to the FPS count, frame times, number of buffer creations, draw calls and state changes.

The best way to explain these metrics is to go through each one individually:

  • Draw Calls & Indexed Draw Calls: Draw calls tend to be an expensive operation in the world of 3D graphics. By looking at this metric, you can decide whether you’re making too many individual calls per frame and need to compact your vertices into one draw call.
  • Vertex Count & Indexed Vertex Count: Gives you immediate stats on the number of vertices you’re pumping through.
  • FPS & Frame Time: The ever-critical frames per second metric. Anything below 60fps, or a frame time above roughly 16.7ms, means the app is no longer butter smooth.
  • Buffer Creations: Depending on your application, buffer creations should occur in the setup phase, as they tend to be a slow operation. If you find that you’re creating buffers at run-time, there’s a good chance you can optimize this area.
  • Error Gets: Another operation that is slow but useful for debugging. You should really be aiming for 0 here.
  • State Metrics: This section gives you immediate metrics on the number of state changes you’re making, either in total or individually as texture/shader/buffer changes. If you find that you’re GPU bound, reducing the state changes by batching similar draw calls together is a direction you’d be advised to go.
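To make the state-change advice concrete, here is a minimal sketch of batching similar draw calls by sorting them on the state they need. The `DrawCall` type, the shader and texture names, and the frame contents are all hypothetical; a real renderer would issue OpenGL calls where this sketch only counts changes.

```python
from typing import NamedTuple


class DrawCall(NamedTuple):
    shader: str
    texture: str
    mesh: str


def count_state_changes(calls) -> int:
    """Count shader and texture binds needed to issue the calls in order."""
    changes = 0
    prev_shader = prev_texture = None
    for call in calls:
        if call.shader != prev_shader:
            changes += 1
            prev_shader = call.shader
        if call.texture != prev_texture:
            changes += 1
            prev_texture = call.texture
    return changes


def batch_by_state(calls) -> list:
    """Order calls so that those sharing a shader and texture are adjacent."""
    return sorted(calls, key=lambda call: (call.shader, call.texture))


# Hypothetical frame, in the order the game logic happened to submit it.
frame = [
    DrawCall("lit", "tree", "tree_a"),
    DrawCall("unlit", "hud", "health_bar"),
    DrawCall("lit", "fighter", "fighter_a"),
    DrawCall("lit", "tree", "tree_b"),
    DrawCall("lit", "fighter", "fighter_b"),
]

print("state changes as submitted:", count_state_changes(frame))
print("state changes after batching:", count_state_changes(batch_by_state(frame)))
```

Sorting purely by state is only safe for opaque geometry; blended objects usually need a specific draw order, so they would be handled in a separate pass.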

Power

This sub-section gives you immediate stats on the battery. Though it isn’t really useful for performance analysis, it is pretty cool for gaining an understanding of your device.

State Overrides

The state overrides sub-section gives you a list of live experiments you can run, in order to test causes for your performance bottlenecks.

1x1 Scissor Rect & Simple Fragment Shader

The 1x1 scissor rect disables pixel rendering and the simple fragment shader replaces your current fragment shader with a simple colour output. Both these overrides let you test if your app is fragment shader bound. If performance doesn’t increase, you’re best served not attempting to optimize any fragment shaders, but instead checking your vertex counts and draw calls.

Disable Alpha Blending

Transparency operations tend to be the biggest performance killer for graphics-intensive apps such as games. By using this test, you can quickly see what effect disabling transparency has on your app’s performance.

Disable Draw Calls

A really simple but useful override to let you check if your app’s making too many draw calls. If so you should look into batching the vertex data of the calls together into one (where possible).
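Batching the vertex data of several calls into one can be sketched as merging vertex and index lists, with each mesh's indices shifted by the number of vertices already in the combined buffer. This is a simplified illustration: the mesh layout (a list of vertices plus a list of indices) and the sample shapes are assumptions, and it only applies to meshes that share the same shader and texture and whose transforms are already applied to the vertices.

```python
def merge_meshes(meshes):
    """Merge (vertices, indices) pairs into one vertex list and one index list."""
    vertices, indices = [], []
    for mesh_vertices, mesh_indices in meshes:
        base = len(vertices)  # index of this mesh's first vertex in the merged buffer
        vertices.extend(mesh_vertices)
        indices.extend(base + i for i in mesh_indices)
    return vertices, indices


# Hypothetical 2D meshes: a quad made of two triangles, and a single triangle.
quad = ([(0, 0), (1, 0), (1, 1), (0, 1)], [0, 1, 2, 0, 2, 3])
triangle = ([(2, 0), (3, 0), (2, 1)], [0, 1, 2])

vertices, indices = merge_meshes([quad, triangle])
print(len(vertices), "vertices in one buffer")
print("indices:", indices)
```

The merged buffers can then be submitted with a single indexed draw call instead of one call per mesh.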

Disable Z-Test

The Z-buffer is typically used to hide objects that are being drawn behind other objects. Running this override should slow down your rendering. If it doesn’t, it means that you’re probably drawing too many objects in back-to-front order, so the renderer is constantly drawing new objects on top of background objects. By implementing a front-to-back sort or manual occlusion code, your frame rate could be improved.
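A front-to-back sort can be as simple as ordering objects by their distance from the camera before drawing. The sketch below assumes each object is a dictionary with a `position` tuple, which is a made-up representation for illustration; it applies to opaque objects, since blended objects generally need the opposite order.

```python
import math


def sort_front_to_back(objects, camera_pos):
    """Return objects ordered nearest-first relative to the camera position."""
    return sorted(objects, key=lambda obj: math.dist(obj["position"], camera_pos))


# Hypothetical scene contents.
scene = [
    {"name": "far_tree", "position": (0.0, 0.0, -50.0)},
    {"name": "fighter", "position": (1.0, 0.0, -5.0)},
    {"name": "near_tree", "position": (-3.0, 0.0, -12.0)},
]
camera = (0.0, 1.0, 0.0)

for obj in sort_front_to_back(scene, camera):
    print(obj["name"])
```

Drawing nearest-first gives the depth test a chance to reject hidden pixels of the farther objects before they are shaded.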

Show Wireframe

This override shows your render in wireframe mode, which helps you debug what your meshes look like and how they’re laid out in your scene.

Texture 2x2

This override gives you a quick and easy test to find out if your app is texture bound. If your app’s performance increases when you turn this override on, you should try to optimize your textures. This can be done by reducing the number of textures, their resolution, their filter settings or how they’re used in your shaders.
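To get a feel for how much reducing resolution helps, the following sketch estimates the memory footprint of a texture. It assumes an uncompressed format with 4 bytes per pixel (such as RGBA8888) and a full mipmap chain down to 1x1; the function name is hypothetical and real drivers may add padding or alignment.

```python
def texture_bytes(width: int, height: int, bytes_per_pixel: int = 4,
                  mipmaps: bool = False) -> int:
    """Estimate the size in bytes of an uncompressed texture."""
    total = 0
    w, h = width, height
    while True:
        total += w * h * bytes_per_pixel
        if not mipmaps or (w == 1 and h == 1):
            break
        # Each mip level halves both dimensions, never going below 1.
        w, h = max(1, w // 2), max(1, h // 2)
    return total


full = texture_bytes(1024, 1024)
half = texture_bytes(512, 512)
print(f"1024x1024: {full / 2**20:.1f} MiB")
print(f"512x512:   {half / 2**20:.1f} MiB")
print(f"ratio:     {full // half}x")
```

Halving the resolution in both dimensions cuts the data to a quarter, which is why a resolution drop is often the first thing to try once the Texture 2x2 override shows an improvement.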

Use Case: Finding the performance bottleneck in an app

The app we’ll be testing here is called PLAYIR, a 3D multiplayer game designer app that lets you create and publish games across mobile and web devices using a drag and drop UX and real-time source code editing. For this use case, we’re going to load up one of the games inside, called World of Fighters, and experiment with how introducing more characters in the game affects our performance bottleneck.

In the default gameplay scene here, there are a few trees and two characters roaming around the level. If we inspect our app using the GPA tools, we can see the frame rate hitting around 60fps.

Now when we ramp up the number of enemies to 50, the frame rate drops to 8fps.
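It can help to express that drop as frame time rather than frames per second. The arithmetic below uses the two frame rates from this walkthrough; the function and variable names are just for illustration.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate into the time spent on each frame, in milliseconds."""
    return 1000.0 / fps


target_fps = 60.0   # frame rate of the default scene
observed_fps = 8.0  # frame rate with 50 enemies

budget = frame_time_ms(target_fps)
observed = frame_time_ms(observed_fps)
slowdown = target_fps / observed_fps

print(f"budget per frame:   {budget:.1f} ms")
print(f"observed per frame: {observed:.1f} ms")
print(f"each frame takes {slowdown:.1f}x longer than the budget")
```

Seen this way, each frame is spending over 100ms more than its budget somewhere, and the job of the profiler is to find out where.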

Without using analysis tools, our first thought may be that the frame rate has dropped because more characters have to be drawn on the screen, and we’d go ahead and start prematurely optimizing the rendering pipeline.

However, if we actually test this assumption by enabling the 1x1 Scissor Rect and Disable Draw Calls state overrides, we notice that the frame rate stays the same. This means that optimizing the rendering would yield no actual performance benefit. Instead, our app is CPU bound and we should investigate other areas of the code (e.g. the AI or physics subroutines).
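The reasoning behind this experiment can be sketched as a small helper that compares the frame rate under each override with the baseline. The 20% improvement threshold and the function name are assumptions made for illustration; the frame rates mirror this use case, where the overrides left the frame rate at about 8fps.

```python
def responsive_overrides(baseline_fps: float, override_fps: dict,
                         min_speedup: float = 1.2) -> list:
    """Return the overrides that improved the frame rate by at least min_speedup."""
    return [name for name, fps in override_fps.items()
            if fps / baseline_fps >= min_speedup]


baseline = 8.0
results = {"1x1 Scissor Rect": 8.0, "Disable Draw Calls": 8.0}

helped = responsive_overrides(baseline, results)
if helped:
    print("rendering-related bottleneck suggested by:", ", ".join(helped))
else:
    print("no override helped: look at CPU-side work outside rendering")
```

If removing the rendering work entirely does not raise the frame rate, the time must be going elsewhere, which is the conclusion reached above.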

Wrapping Up

Hopefully this guide has motivated you to at least fire up your app using Intel’s GPA tool and check out some of the metrics, whether that means simply checking your memory and battery usage or going in and finding out what your actual performance bottleneck is, as we did in the case of PLAYIR. In the world of 3D graphics, it’s very easy to assume that drawing more things means slower performance. But it’s always a good idea to first test your assumptions, as you may find that your time would be better invested optimizing other areas of the code.

To learn more about Intel tools for the Android developer, visit Intel® Developer Zone for Android.


License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.