To ensure outstanding results from 3D4Medical’s brilliant, interactive teaching and reference application for human anatomy on Android* devices, Intel pinpointed opportunities for performance improvements, helping expand the product’s market potential.
Students, medical practitioners, and others can manipulate and explore some 4,000 detailed anatomical structures in 3D4Medical’s Essential Anatomy application. The interface provides a virtual body from which users can peel back layers to reveal deeper structures, as well as isolate, compare, and analyze elements of the skeletal, circulatory, and nervous systems, among many others, as shown in Figure 1. Structures can also be zoomed and rotated for examination in fine detail from any angle. The company continually adds new features, such as animated functional views of anatomical systems and the ability to "slice" through structures so that users can obtain cross-sectional views.
The deep levels of detail and accuracy in these anatomical models, as well as the rich capabilities for manipulating them on demand, place enormous burdens on Essential Anatomy’s purpose-built 3D rendering engine. Keeping up with 3D4Medical’s high user-experience standards requires real-time rendering and fluid visual movement through the models, while also smoothly supporting other elements of the app, such as the rich user interface and reference material.
Figure 1. User-driven manipulation to expose deeper internal structures.
3D4Medical initially developed Essential Anatomy for iOS and distributed it through the Apple App Store. To expand the product’s market potential, the company contracted with an external provider to port it to Android and Microsoft Windows*. As in most such projects, the initial Android port left room for improvement in taking full advantage of the new platform’s capabilities, particularly in the smoothness of movement when zooming and rotating complex structures with multiple layers. The company engaged with Intel to identify opportunities for code optimization that would help create the best user experience possible, including performance headroom for software features and functionality that may be added in the future.
In particular, both companies were interested in optimizing the Essential Anatomy application for tablets and other devices based on the fourth-generation Intel® Atom™ processor system-on-chip (SoC). These devices are capable of supporting computationally intensive applications, and the platforms are also equipped with Intel Gen7 graphics, which helps support high-quality visuals and strong graphics performance.
As a first step in improving the performance of Essential Anatomy, Intel recommended porting the entire application to native code for Intel architecture, as opposed to working from the original code base that was created for ARM*. This approach enabled the use of code optimizations and development tools built specifically to deliver high performance on the Intel Atom processor. While significant effort was required to port the application to native code, eliminating any reliance on emulation, the team reported performance gains of approximately 1.55x over the previous version.1
Preliminary Performance Analysis of the Application
Frame-Rate Measurements
To quantify a key aspect of performance as it relates to user experience, the team monitored the frame rate delivered by the Essential Anatomy application as it performed a range of specific tasks. For a broad representation, frame-rate measurements were taken over sampling periods approximately 10-15 seconds long, while the application was idle, as well as while the anatomical model was being rotated on-screen and while the view from the user’s perspective was being zoomed in and out. An early set of results from those measurements is shown in Figure 2.1
Figure 2. Frame rates while performing various application tasks.
The majority of frame rate measurements were below six frames per second (fps), both while the application was idle and as it performed the various tasks tested. That frame rate is quite low, and users would experience the associated progression of images on-screen as choppy. While such perceptions are necessarily subjective, a frame rate of approximately 24 fps could be considered a reasonable minimum to convey a fluid sense of movement.
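As a rough illustration of the measurement described above, the following sketch computes an average frame rate from a list of frame timestamps and the per-frame time budget implied by a target rate. The input format and the function names `average_fps` and `frame_budget_ms` are assumptions made for this example; they are not part of Intel GPA or of the Essential Anatomy code.

```python
from typing import Sequence


def average_fps(frame_timestamps_s: Sequence[float]) -> float:
    """Average frames per second over one sampling window.

    frame_timestamps_s holds the presentation time of each frame, in
    seconds and in increasing order (a hypothetical input format).
    """
    if len(frame_timestamps_s) < 2:
        raise ValueError("need at least two frame timestamps")
    elapsed = frame_timestamps_s[-1] - frame_timestamps_s[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    # N timestamps delimit N - 1 frame intervals.
    return (len(frame_timestamps_s) - 1) / elapsed


def frame_budget_ms(target_fps: float) -> float:
    """Time available to produce one frame at the given target rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


# Synthetic data: one frame every 0.2 s across a 10-second window.
stamps = [i * 0.2 for i in range(51)]
print(f"measured: {average_fps(stamps):.1f} fps")
print(f"budget at 24 fps: {frame_budget_ms(24):.1f} ms per frame")
```

Seen this way, the 24 fps guideline means each frame has to be produced in under roughly 42 ms, while a frame rate below 6 fps corresponds to more than 166 ms per frame.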
These frame-rate measurements clearly illustrated the need to locate bottlenecks within the rendering pipeline. Identifying those bottlenecks and performing drill-down analysis to generate performance-tuning recommendations to help 3D4Medical resolve them was at the core of the Intel team’s approach. To shed light on the performance opportunities, the team began by investigating how the code made use of hardware resources within the target system.
GPU-Utilization Analysis
To better understand the causes of the frame-rate measurements they had gathered from the Essential Anatomy application, the Intel engineering team considered system-level aspects of that behavior, including utilization of the GPU. The Intel® Graphics Performance Analyzers (Intel® GPA) enabled the team to plot GPU-utilization statistics over time for the same tasks as those used for the frame-rate measurements discussed above. That data is presented in Figure 3.1
Figure 3. GPU-utilization rates while performing various application tasks.
Across the data set, measured GPU-utilization levels are very high. This is the case when the application is at idle, as well as when the model is being rotated or zoomed. All time periods measured show values above 60 percent utilization, with a number of time periods for the idle state and rotate task exceeding 90 percent. Furthermore, while the application is zooming the view of the on-screen model, GPU utilization is nearly always above 90 percent.
The high GPU utilization within the application’s idle state reveals that scant GPU resources are available for additional tasks. This GPU-bound behavior of the application as a whole suggested the need to look more deeply into the rendering architecture.
Deeper, Systematic Analysis with the Intel® Graphics Performance Analyzers
Throughout the engagement, Intel application engineers continued using Intel GPA to expose opportunities for graphics-performance improvements on Android and the latest Intel Atom processors. This tool set enabled them to examine the app’s behavior from the system level all the way down to individual draw calls. Key capabilities of Intel GPA that helped drive the effort and generate performance-tuning recommendations included the following:
- Real-time metrics, including more than two dozen CPU, GPU, and API properties
- "What if" experiments to identify the impact of hypothetical code changes
- Thread-level analysis, helping ensure efficient use of multi-core hardware resources
Capturing frames using Intel GPA Monitor and viewing the frame captures using Intel GPA Frame Analyzer identified the potential for performance improvements from the following rendering optimizations:
- Disable alpha blending when not needed. For example, in a scene where alpha blending was not required other than for the nasal cavity and the UI, simply disabling blending resulted in a frame-rate increase of approximately 15 percent.1
- Cull back-facing geometry. Back-facing triangles are those whose normal doesn’t face the camera. In closed convex 3D meshes, these triangles are behind front-facing ones, making it unnecessary to rasterize them. By modifying the rasterizer state from "cull none" to "cull back facing triangles," significant wasted rasterization and shading effort was avoided, providing a performance improvement of approximately six percent.1
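The source does not list the exact API calls that were changed. In OpenGL ES, state changes of this kind are typically made with `glDisable(GL_BLEND)` for blending and `glEnable(GL_CULL_FACE)` together with `glCullFace(GL_BACK)` for culling. The sketch below illustrates only the geometric idea behind back-face culling, under the assumption of counter-clockwise winding for front faces; the helper names are hypothetical, and a real GPU makes this decision from the triangle's winding order in screen space rather than from an explicit normal.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_back_facing(v0: Vec3, v1: Vec3, v2: Vec3, camera_pos: Vec3) -> bool:
    """True if the triangle's normal points away from the camera.

    Assumes front faces are wound counter-clockwise, so the normal is
    (v1 - v0) x (v2 - v0). Edge-on triangles count as back-facing.
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    to_camera = sub(camera_pos, v0)
    return dot(normal, to_camera) <= 0.0


# A triangle in the z = 0 plane whose normal points along +z.
tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(is_back_facing(*tri, camera_pos=(0.0, 0.0, 5.0)))   # camera in front
print(is_back_facing(*tri, camera_pos=(0.0, 0.0, -5.0)))  # camera behind
```

With culling enabled, triangles for which this test is true are discarded before rasterization, so no fragment shading is spent on them.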
Taking advantage of both of the improvements described above, a preliminary performance improvement of approximately 30 percent was achieved when rendering the full body with all options (muscles and various systems) enabled, as shown in Figure 4.
Figure 4. Intel GPA Frame Analyzer metrics; the second and third columns show performance before and after initial optimization.
Improving draw order is another significant opportunity for performance improvement that was identified during the optimization process. Since blending isn’t required for most of the scene models, the draw order can be optimized to minimize wasted fragment shading time, which is known as "overdraw." Improvements in this area can be achieved by drawing outer geometry (i.e., geometry closer to the camera) before inner geometry.
In addition, in most models generated by the app, many layers of anatomy are not visible. Unless the user manipulates the model to make those structures visible, there is no need to burden the rendering pipeline with shading fragments that are not actually needed for display. The team estimated that reducing overdraw could result in approximately a 10 percent performance improvement. In fact, however, code optimization in that area and others ultimately resulted in an overall 55 percent improvement in frames per second.1
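A minimal sketch of the draw-order idea follows, assuming each anatomical layer carries a hypothetical `depth_rank` (0 for the outermost layer) and that depth testing is enabled so that hidden fragments of inner layers can be rejected before shading. The `Layer` type, the layer names, and the ranks are invented for illustration and do not come from the Essential Anatomy code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    depth_rank: int            # 0 = outermost layer (assumed convention)
    needs_blending: bool = False


def order_draws(layers: List[Layer]) -> List[Layer]:
    """Opaque layers outer-to-inner, then blended layers afterwards.

    Drawing outer opaque geometry first lets the depth test reject
    fragments of inner layers that it hides. Blended layers are drawn
    last, in their original relative order, so they composite over the
    finished opaque scene.
    """
    opaque = sorted(
        (layer for layer in layers if not layer.needs_blending),
        key=lambda layer: layer.depth_rank,
    )
    blended = [layer for layer in layers if layer.needs_blending]
    return opaque + blended


scene = [
    Layer("skeleton", depth_rank=2),
    Layer("nasal cavity", depth_rank=1, needs_blending=True),
    Layer("skin", depth_rank=0),
    Layer("muscles", depth_rank=1),
]
print([layer.name for layer in order_draws(scene)])
```

For this synthetic scene the resulting order is skin, muscles, skeleton, and then the blended nasal cavity. A production renderer would more likely sort by actual view-space depth each frame; the fixed rank used here is only the simplest stand-in for "outer before inner."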
CPU Residency Analysis
As part of its analysis of the application, the Intel team performed user-experience testing, which generated overall positive results, including in areas such as touch response and graphics. Some computationally intensive functionality, such as rendering the full anatomy, resulted in sluggish interactions due to intricate rendering requirements. For that reason, the Intel team placed significant focus on investigating the rotation and zoom operations within the application.
The team also examined how the application used the platform’s CPU hardware when the application was idle, during the rotate operation, and while zooming the display of the model. To provide a more complete data set, the engineers also included baseline utilization metrics while the system was in use, but when the Essential Anatomy application was not running. This data, presented in Figure 5, was gathered in terms of CPU residency for each processor core, expressed as a percentage of time spent in the C0 (full-power) core state.
Figure 5. Per-core residency in C0 (full-power) core state for system baseline, application idle, rotate, and zoom tasks.
Residency in full-power state during application idle approaches 60 percent. In addition to having negative implications for performance as a whole, this high rate of residency and resource utilization corresponds to substantial power dissipation in the platform, which would be expected to negatively affect battery life. Reducing the display refresh rate is one potential means of addressing this issue that the Intel team recommended to 3D4Medical.
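To make the residency metric concrete, the sketch below computes the percentage of time a core spends in C0 from a list of (state, duration) samples. The sample format, the function name, and the numbers are assumptions for illustration only; they are not output from Intel tools or measurements from this project.

```python
from typing import Dict, List, Tuple

# One sample is (core state name, time spent in that state in milliseconds).
Sample = Tuple[str, float]


def c0_residency_percent(samples: List[Sample]) -> float:
    """Share of the observed time that a core spent in the C0 state."""
    total = sum(duration for _, duration in samples)
    if total <= 0:
        raise ValueError("no time observed")
    in_c0 = sum(duration for state, duration in samples if state == "C0")
    return 100.0 * in_c0 / total


# Synthetic per-core traces covering a one-second window.
traces: Dict[str, List[Sample]] = {
    "core0": [("C0", 600.0), ("C6", 400.0)],
    "core1": [("C0", 150.0), ("C1", 50.0), ("C6", 800.0)],
}
for core, samples in traces.items():
    print(f"{core}: {c0_residency_percent(samples):.0f}% in C0")
```

A core that shows high C0 residency while the application is nominally idle, like `core0` in this synthetic trace, is the pattern that points toward unnecessary work such as redrawing frames that have not changed.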
Code-Level Hotspot Identification
A common part of the performance-analysis projects that Intel undertakes on behalf of ecosystem members such as 3D4Medical is to perform hotspot analysis, which identifies sections of code that occupy particularly large amounts of processing time. While such hotspots do not necessarily correspond to code inefficiencies, the relatively large investment of system resources associated with them means that even small, incremental improvements in the efficiency of those sections of code can have significant benefit to the application as a whole. Identifying such hotspots is therefore a technique for suggesting software components that could deliver especially good return on investment from the developer resources invested in optimization.
To directly identify such sections of code in the Essential Anatomy application, the Intel team used Intel GPA to profile code behavior at application launch, idle state, and when performing the rotate and zoom tasks. In each case, the shared object libSystem.so was consistently one of the top consumers of clock cycles, and in many cases, libGLES_intel7.so was as well. Looking more deeply within libSystem.so, the optimization team identified the DenseMap function in particular as a hotspot.
For example, Figure 6 shows a timeline-analysis view—captured while rotate operations were carried out within the application—that clearly identifies each of these three components as a hotspot of interest. Based on those findings, the Intel team suggested to the development organization within 3D4Medical that these sections of code could be good candidates for tuning. Ultimately, optimizations to Essential Anatomy code produced a 20 percent lower processor load, compared to the pre-optimized version.1
Figure 6. Timeline view while the on-screen model is rotated within the application.
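The ranking step behind hotspot analysis can be sketched as follows, assuming a profiler that periodically records which module was executing. The sample counts are synthetic and chosen only to show the mechanics; they are not the measurements behind Figure 6, and `rank_hotspots` is a hypothetical helper rather than an Intel GPA function.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_hotspots(samples: Iterable[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank modules by their share of profiler samples, as percentages."""
    counts = Counter(samples)
    total = sum(counts.values())
    return [
        (module, 100.0 * hits / total)
        for module, hits in counts.most_common(top_n)
    ]


# Synthetic profile: each entry names the module active at one sample tick.
profile = (
    ["libSystem.so"] * 5
    + ["libGLES_intel7.so"] * 3
    + ["other"] * 2
)
for module, share in rank_hotspots(profile):
    print(f"{module}: {share:.0f}% of samples")
```

A module near the top of such a ranking is not necessarily inefficient; the ranking only shows where optimization effort has the most cycles to recover.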
Conclusion
Ongoing discussions between Intel and 3D4Medical during the course of this engagement helped ensure that the performance-tuning approach was well synchronized with both current needs and future plans for the Essential Anatomy application. That communication is part of the ongoing collaboration and knowledge transfer between the companies. Once the analysis was complete, Intel developed a comprehensive report for 3D4Medical, including extensive profiling data and recommendations for code optimizations.
As 3D4Medical prioritized the code-optimization tasks within the report and incorporated them into its overall development effort, the company obtained additional value from its relationship with Intel. For example, this engagement provided valuable insight into the potential roles of performance-tuning tools from Intel such as Intel GPA. The company also continues to benefit from engineering expertise from Intel on topics that range from general performance tuning to optimization of graphics pipelines on Intel architecture. In addition, Intel makes engineers with specific areas of domain expertise available as part of this type of engagement.
Performance-analysis co-engineering puts the unparalleled software and hardware expertise of Intel’s application engineering organization at the service of independent software vendors such as 3D4Medical. Because this activity is part of Intel’s commitment to enabling software performance on Intel architecture, there is no cost to software companies. As a result, the Essential Anatomy app delivers an outstanding user experience on Android, living up to its potential with smooth results, even for demanding operations such as rotating and zooming complex, multi-layered anatomical models on-screen.
The result is a compelling proof point for the value of co-engineering between Intel and the software ecosystem, as well as the capabilities of tablets based on the fourth-generation Intel Atom processor for robust, real-time graphics using Android. And 3D4Medical has achieved a competitive advantage by extending its Essential Anatomy app to Android users, while meeting its own strict demands for a first-rate user experience.
About the Authors
Stevan Rogers has been with Intel for over 20 years. He specializes in systems configuration and lab management. He is a Technical Marketing Engineer for Line Of Business applications in the Platform Launch and Scale Engineering (PLSE) group.
Priya Vaidya is a Sr. Applications Engineer in the Intel Software and Services Group with over 14 years of experience in power and performance optimization of mobile applications processors. Priya received her MS in Electrical Engg. from UMass, Amherst in 2000, and MS in Biomedical Engg. in 1998. She has over 12 issued US Patents and has multiple IEEE publications.
Raja Bala is a graphics software engineer in the game developer relations group at Intel. He's a proud alumnus of BITS-Pilani, India and enjoys dissecting the rendering process of games and finding ways to make it faster.
To learn more about 3D4Medical and its products, visit www.3d4medical.com.
For more about Android development for Intel® architecture, visit http://software.intel.com/android.