Intel® Developer Zone offers tools and how-to information for cross-platform app development, platform and technology information, code samples, and peer expertise to help developers innovate and succeed. Join our communities for the Internet of Things, Android, Intel® RealSense™ Technology and Windows to download tools, access dev kits, share ideas with like-minded developers, and participate in hackathons, contests, roadshows, and local events.
Related articles:
Developer's Guide for Intel® Processor Graphics for 4th Generation Intel® Core™ Processors
Touch and Sensors
How to Create a Usable Touch UI
How to Adjust Controls for Touch
1. Introduction
Recently, I had the task of preparing a game engine I’ve been working on for the Game Developers Conference (GDC). Given the importance of the event, I needed my game to run fast on the three devices I had on hand, which ranged from current Ultrabook™ technology to a system two generations old.
In this paper you will learn how to improve the speed of your 3D game and understand what to look out for when porting your application to Ultrabook systems. Whether you are an experienced game developer or a hobby coder getting into the industry, you will no doubt appreciate the importance of performance. A game that runs at a super smooth frame rate will feel polished and professional compared to a game staggering along at a measly five frames per second (FPS). No amount of gorgeous graphics will disguise the fact that your game lurches along, tears the screen as it continually misses the monitor’s vertical sync, and sends your game physics into pure pandemonium. With this case study of an actual game project port, I hope you will gain insight into the real-world problems you may encounter and possible solutions.
Figure 1: You may gain sales with your screen shot, but you’ll pay the price online if your FPS is low!
This article highlights a few of the common causes of performance loss and specifically helps game developers move a typical high-end AAA 3D game title to the Ultrabook device with the performance demanded by modern audiences. Such titles often require a high-end discrete graphics card to work well and put extremely high demands on the GPU. Understanding the architectural differences between a dedicated and integrated GPU can help, but the very best method of improving graphics performance is by analyzing the pipeline for bottlenecks and optimizing those areas without adversely affecting visual quality.
You should have a basic understanding of graphics API calls in general, a familiarity with the components that make up a typical 3D game, and some experience with Ultrabook devices.
2. Why Is Performance Important?
As the market for applications and games becomes increasingly crowded, the unique selling points of your product become ever more crucial to commercial success, and performance today is not just desirable but essential. Many users will not consider your game finished until it runs smoothly and consistently on their device, and will not bother to play beyond an initial negative experience.
Given how important this requirement is, and how rapidly mobile, tablet, and portable computing are growing, performance is critical. It is tempting to be complacent when adapting your game to the Ultrabook, given its exceptional power compared with these other devices, but users will demand the highest standard and expect a high-end gaming experience.
Figure 2: Ultrabook™ Systems pack a powerful punch in the right hands
From a skill development point of view, everything you can do to optimize and improve your game code now becomes a vital lesson that can be applied to future projects, making you a better game developer.
3. Why Optimize?
Many developers use a desktop PC system to create and test their 3D games, and the presence of a dedicated graphics card can create a sense of abundance, resulting in algorithms and shaders that push the very limits of what the GPU can do. When you run such a game on a more limited platform, performance can drop dramatically. Ultrabooks are amazingly powerful mobile devices, but they do not provide the same level of brute-force rendering available on next-gen, high-end GPUs. In addition, Ultrabooks are designed to be used on the go, so your game may well find itself running on battery power, requiring an efficient rendering pipeline to prevent rapid power drain. Your approach to creating in-game visuals must respect these facts.
Figure 3: The many destinations of a successful app
When developing an application, developers traditionally start at the top, and trim their way down to run on as many devices as is practical in the available time.
Developing on the Ultrabook and then moving your game to a desktop powered by a dedicated graphics card would be the easiest route, as a game that runs well on the Ultrabook needs little porting work to run on the more powerful machine. However, you may find yourself competing with games that have set the quality bar substantially higher. This approach does have one advantage: you are conscious of battery life from the very beginning, and are therefore more likely to develop a 3D game that dials down intensive activity at specific moments such as title screens and HUD pages. Developing on a desktop and optimizing down to the Ultrabook is more common and generally yields a higher level of quality, as your development philosophy aims high and then works out how to deliver the result on more form factors.
4. Desktop to Ultrabook – A Case Study in Performance
My story begins many weeks before the big GDC event, running my game on a relatively modern PCI Express* 3.0 graphics card worth about $200 and getting 60 FPS with visual settings at the highest quality. It was by no means a high-end gaming rig, but it could run any 3D game at the highest settings with no noticeable lag, and it packed a mean punch with its six cores, 6 GB of system memory, and an array of super-fast SSDs. I knew there would be no desktop systems waiting for me at the event, and I did not want to lug a huge PC halfway around the world. Naturally, the solution was to take my Ultrabook, the next most powerful device I owned and more than capable of putting on a good show.
Figure 4: GDC 2014 – One of the biggest developer conferences…but no pressure
My Ultrabook has a 4th generation Intel® Core™ processor with Intel® HD Graphics 4000™, and is my device of choice when away from the office. My initial test was painful, dropping so many frames that the whole endeavor seemed far too ambitious. The current build of the 3D game engine relied heavily on shaders and multiple targets for rendering, gobbling up CPU cycles like candy and running everything as fast and as loud as it could. As you can imagine, such a beast was a million miles away from the power-conscious and friendly apps you want on a portable device.
Despite the audaciousness of the plan, I also knew that modern Ultrabooks are very capable gaming systems that, used correctly, can match the desktop for productivity and beat it hands down for convenience. I had also played many games that ran well on Ultrabooks, so I knew the mission was not impossible, and I set to work to raise the frame rate to 60 FPS, my goal for the GDC event.
As an old-school coder, I learned to program long before the arrival of performance analyzers and graphics debuggers, so my primary method of detecting bottlenecks is to remove huge chunks of the engine until performance improves. By selectively reintroducing vital chunks of code, I could determine which parts of the engine were slowest. Once the bottlenecks were identified, and because simply removing them altogether was not an option, the careful process of reducing the intensity of each component could begin. Typical examples are skipping normal map calculations in the shader for pixels beyond a certain range from the player, or running AI updates only every other cycle to reduce the overhead of these processes. Cumulatively, these small improvements add up, and before long the game engine is running at full speed again with hardly any loss in visual quality.
For coders new to the world of performance tuning, I would heartily recommend you avoid this method of detecting bottlenecks. Numerous tools are available to help you identify performance problems in your application, and they reveal not only the location of a bottleneck but also the nature of the issue. One such set of free tools is the Intel® Graphics Performance Analyzers, which profile your application as it runs and give you a snapshot of what your program is doing and how long it takes to do it. While demonstrating the game at the event, I found a few issues that I later fixed to improve the performance and smoothness of the final result.
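Running AI updates only every other cycle can be sketched as follows. This is a minimal C++ illustration, not the engine’s code: the `Agent` type and the `updateAgentsStaggered` function are hypothetical. Agents are split by index parity so that each frame updates half of them, and each agent receives twice the frame delta so that its simulated time keeps pace with the game clock.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical AI agent; a real engine would run pathfinding,
// perception and decision making inside update().
struct Agent {
    float thinkTime = 0.0f;  // total simulated time this agent has processed
    int   updates   = 0;     // number of times update() has run
    void update(float dt) {
        thinkTime += dt;
        ++updates;
    }
};

// Updates only the agents whose index parity matches the frame parity,
// so each agent runs every other frame. Each agent is given twice the
// frame delta so its simulated time still matches the game clock.
void updateAgentsStaggered(std::vector<Agent>& agents,
                           unsigned long frameIndex, float frameDt) {
    const std::size_t phase = frameIndex % 2;
    for (std::size_t i = phase; i < agents.size(); i += 2) {
        agents[i].update(2.0f * frameDt);
    }
}
```

Doubling the delta is only exact when frames have a constant duration; with a variable frame time, each agent would instead accumulate the time elapsed since its own last update.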
Figure 5: Before & After – Screen Shots of the game before and after optimizations
As you can see in Figure 5, I went from 20 FPS to 62 FPS with only minor visual differences between the before and after scenes. The ‘after’ shot shows the removal of the strong dynamic lighting around the player and a less aggressive fragment shader.
Hungry Shaders
It did not take us long to realize that the biggest drain on our performance was in our graphics rendering step.
Figure 6: Performance Metrics Panel from the original low FPS version
As you can see in Figure 6, the horizontal bar marked ‘Rendering’ in the panel consumed most of our available cycles, and when we drilled down into the detail, it was apparent that rendering the objects to the screen was very costly. From here, it was a short step to realize that a scene rendering hundreds of thousands of polygons, each one using a heavy-duty fragment shader, contributes greatly to a loss in performance. Just how much was it costing? By adding MEDIUM and LOWEST techniques to the shader and scaling back the per-pixel visual eye candy, we improved performance by a factor of six.
To settle on what LOWEST and MEDIUM actually do, we first had to determine the lowest common denominator of features for the game. By figuring out which features were absolutely essential for playing the game and disregarding the rest, I could create the new LOWEST technique within the shader. Early on, this technique was amazingly simple, with almost all elements removed, including all shadows, normal mapping, dynamic lighting, texture overlays, specular mapping, and so on. Starting near zero made it possible to run the game and see the ‘best case’ scenario for this shader on the Ultrabook. Comparing a screen shot from the HIGHEST setting with one from the LOWEST setting showed which missing ingredients would most distress users who reduced the setting. The most noticeable omissions were shadows and texture overlays, each of which caused a dramatic reduction in quality when absent. Adding overlays back in was relatively inexpensive, and I could measure the cost by simply adding the shader code for this element back in and running the game again. Shadows, on the other hand, exacted a high price, both in their generation in another part of the engine and in their use within the shader itself. Given the importance of shadows to visual quality, I spent time investigating various approaches until I found a faster solution, which I’ll detail below.
Producing the MEDIUM technique was a little easier and simply involved writing a technique that sits between the highest and lowest settings, always erring on the side of performance. The intent with this setting was to keep the speed benefits of the lowest setting while including the less costly effects, such as the player’s flashlight, dynamic lighting, and slightly better shadows.
Had I simply removed all visual quality from the lowest setting, I could have achieved almost all the performance improvement required in one go, but gamers dislike poor graphics almost as much as poor performance. By making an effort to preserve 90% of the visual fidelity of the highest setting, and prioritizing which aspects could be reduced or eliminated, I achieved a significant improvement with minimal loss in visual quality. Moving from 5 FPS to over 40 FPS was my single biggest improvement.
When investigating why your desktop game runs so slowly on an Ultrabook, I highly recommend you dismantle your graphics rendering pipeline and ask some serious questions about where the time is being spent. You can try my method of butchery and remove whole slabs of functionality until your pipeline improves, or you can opt for a more sophisticated approach and use a performance analyzer tool. Whichever method you choose, once the issue has been located, your next most critical task is to find a solution that not only improves the speed of that element but does so without sacrificing visual quality.
To provide some inspiration for the work required to find these optimal solutions, here are a few of the techniques I devised to solve some of the bottlenecks I discovered.
Cheaper Shadows
To solve the shadow issue mentioned above, I had to look for alternatives to a technique called Cascaded Shadow Mapping. The technique will not be discussed here in detail, but you can find more information here: http://msdn.microsoft.com/en-gb/library/windows/desktop/ee416307(v=vs.85).aspx. The basic premise is that the view frustum is divided into several depth ranges, or cascades (four in this engine), and the shadows of all objects within each range are drawn into a separate render target, so that each cascade holds shadows at a different level of detail.
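How a frustum can be divided into cascades is illustrated by the short C++ sketch below. It uses the widely cited "practical split scheme", which blends logarithmic and uniform split distances; this is a common choice rather than necessarily the scheme this engine uses, and both function names are hypothetical. `selectCascade` plays the role of computing `iCurrentCascadeIndex` from a pixel’s view-space depth.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Far distance of each cascade. lambda = 1 gives purely logarithmic
// splits (more detail near the camera), lambda = 0 purely uniform splits.
std::vector<float> cascadeSplits(float nearZ, float farZ, int count, float lambda) {
    assert(nearZ > 0.0f && farZ > nearZ && count > 0);
    std::vector<float> splits(static_cast<std::size_t>(count));
    for (int i = 1; i <= count; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(count);
        const float logSplit = nearZ * std::pow(farZ / nearZ, t);
        const float uniSplit = nearZ + (farZ - nearZ) * t;
        splits[static_cast<std::size_t>(i - 1)] =
            lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;
}

// Index of the first cascade whose far split covers viewDepth; depths
// beyond the last split are clamped to the last cascade.
int selectCascade(const std::vector<float>& splits, float viewDepth) {
    for (std::size_t i = 0; i < splits.size(); ++i) {
        if (viewDepth <= splits[i]) return static_cast<int>(i);
    }
    return static_cast<int>(splits.size()) - 1;
}
```

Because nearer cascades cover a smaller slice of the frustum with a shadow map of the same size, they end up with the higher level of detail.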
Figure 7: Cascade Shadow Mapping – a debug view from the game engine
A shader then re-colors each pixel on screen according to whether it falls within the shadows previously calculated. The problem is that this is an intense shader effect that also requires a lot of video memory. In the fragment shader code below, notice that the IF branch statement is used several times; some GPU hardware incurs a performance penalty for each IF branch. In extreme cases, the hardware executes every branch for every pixel and discards the unused results, so branching over code brings no benefit at all.
```hlsl
// Cascaded shadow lookup: one IF branch per cascade, each sampling its own depth map.
// With a conventional depth range (larger = further), the comparison yields 1.0
// when the fragment lies behind the stored depth, i.e. when it is occluded.
fPercentLit = 0.0f;
if ( iCurrentCascadeIndex == 0 )
{
    fPercentLit += vShadowTexCoord.z > tex2D( DepthMap1, vShadowTexCoord.xy ).x ? 1.0f : 0.0f;
}
else if ( iCurrentCascadeIndex == 1 )
{
    fPercentLit += vShadowTexCoord.z > tex2D( DepthMap2, vShadowTexCoord.xy ).x ? 1.0f : 0.0f;
}
else if ( iCurrentCascadeIndex == 2 )
{
    fPercentLit += vShadowTexCoord.z > tex2D( DepthMap3, vShadowTexCoord.xy ).x ? 1.0f : 0.0f;
}
else if ( iCurrentCascadeIndex == 3 && vShadowTexCoord.z < 1.0f )
{
    // The last cascade also skips fragments beyond the shadow map's far plane.
    fPercentLit += vShadowTexCoord.z > tex2D( DepthMap4, vShadowTexCoord.xy ).x ? 1.0f : 0.0f;
}
```
It’s important to reduce both the video memory requirement and the dependence on IF branch statements. One solution (there are many) is to create a single large shadow mega-texture and deposit the results of the lowest level of detail shadow into this target.
A new, cheaper shader technique was written to simply read from this shadow mega-texture without needing a single IF statement. Again, the specifics of this technique go beyond the scope of this article, but the underlying practice of first identifying the cause of a performance drop and then creating a second technique that produces a similar visual look without the cost is a sound strategy.
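The mega-texture technique itself is not described here, but one general way to read several shadow maps from a single texture without any IF statement is to pack them as tiles of an atlas and turn the cascade index into a UV offset arithmetically. The sketch below is written in C++ so it can be tested, yet it uses only arithmetic that maps directly to shader code. The 2x2 layout, the `Float2` type and `atlasCoord` are assumptions for illustration, not the layout used by this engine.

```cpp
#include <cassert>

struct Float2 { float x, y; };

// Maps a cascade-local shadow coordinate in [0,1]^2 into a 2x2 atlas.
// Tile 0 is top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
// The tile offset comes from integer arithmetic, so no branch is needed
// to pick a texture (the assert is a CPU-side sanity check only).
Float2 atlasCoord(Float2 local, int cascadeIndex) {
    assert(cascadeIndex >= 0 && cascadeIndex < 4);
    const float tileX = static_cast<float>(cascadeIndex % 2);
    const float tileY = static_cast<float>(cascadeIndex / 2);
    return { (local.x + tileX) * 0.5f, (local.y + tileY) * 0.5f };
}
```

In HLSL the same idea becomes a single `tex2D` call on the atlas with `(uv + tile) * 0.5`. A real atlas usually also needs a small inset at tile borders so that filtering does not sample a neighbouring cascade.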
Maintaining Visual Fidelity
One thing to keep in mind as you optimize your engine is to protect the visual quality of your game at every stage of development. It’s easy to simply hack away beautiful yet expensive effects for the sake of performance, but it’s more rewarding to treat each issue as an opportunity to gain better performance while retaining the visual quality your game needs. Not only will you achieve the results you are after, but your game will run even better on higher-end systems, which of course means you can add even more features as your game scales up.
Figure 8: Comparison of a game scene when you reduce the visual quality too much
When you are developing on a desktop, you will be tempted to use clever and sophisticated fragment shaders to create all manner of surface effects and simply removing them for a low-end technique would destroy the appearance of the final image to a point where it no longer resembles the original. Maintaining a consistent visual style across all shader techniques is vital if you want to retain the integrity of your game. New users, impressed with a stunning screen shot in an online magazine, will be mighty disappointed when they run your game and see something significantly different.
Where possible, look for ways to reproduce the high-end shader effect with cheaper methods such as pre-baked textures, or, even better, limit the expensive pixel effects to an area close to the player.
Spend the Most on Those Closest To You
This sounds like family advice, but it is also a good strategy for making shaders look great on Ultrabooks. With a single IF branch statement, you can determine whether the pixel being calculated is close to the player. If it is, you can use the expensive high-end pixel effect as before; beyond that range, you can revert to a cheaper baked or faked effect.
Figure 9: The blending effect in action, notice the normal map effects up close
A good technique to use in concert with the above is blending. For the price of an extra IF branch, you can also check whether the pixel lies between two range points. Closer than the first range point, you use only the expensive effect; beyond the second, you calculate only the cheap effect; between the two, you calculate both and blend the results. Keep the range between these two points relatively narrow, because pixels inside it pay for both computations; it only needs to be wide enough for the transition to go unnoticed by the player. In the code below, each pixel is treated according to its distance from the view camera, and between 400 and 600 units both code paths are computed.
```hlsl
// Distance-based lighting LOD: expensive lighting up to 600 units, a blend
// between 400 and 600 units, and the cheap result alone beyond 600 units.
// cheaplighting is computed elsewhere in the shader (not shown).
float4 lighting = float4( 0, 0, 0, 0 );
float4 viewspacePos = mul( IN.WPos, View );
if ( viewspacePos.z < 600.0f )
{
    // Expensive path: half-Lambert diffuse term and specular term via lit().
    lighting = lit( pow( 0.5f * dot( Ln, Nb ) + 0.5f, 2 ), dot( Hn, Nb ), 24 );
}
if ( viewspacePos.z > 400.0f )
{
    // Blend weight rises from 0 at 400 units to 1 at 600 units and beyond.
    lighting = lerp( lighting, cheaplighting, min( ( viewspacePos.z - 400.0f ) / 200.0f, 1.0f ) );
}
```
The result is surprisingly good and creates a soft, almost unnoticeable transition when rendered. The upshot for the game is that around 90% of the scene now uses the cheap effect, which speeds up the game considerably.
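The blend weight used by the shader code above can be checked on the CPU. This C++ sketch reproduces the same arithmetic with the 400 and 600 unit range points from the shader; `cheapWeight`, `needsExpensive`, `needsCheap` and `blendLighting` are hypothetical helpers, and scalar values stand in for the shader’s `float4` lighting results.

```cpp
#include <algorithm>

// Range points from the shader: full-quality lighting up to nearRange,
// cheap lighting beyond farRange, and a linear blend in between.
constexpr float nearRange = 400.0f;
constexpr float farRange  = 600.0f;

// Weight of the cheap result: 0 up close, 1 far away.
float cheapWeight(float viewDepth) {
    const float t = (viewDepth - nearRange) / (farRange - nearRange);
    return std::max(0.0f, std::min(t, 1.0f));
}

// Which code paths a pixel at this depth evaluates in the shader.
bool needsExpensive(float viewDepth) { return viewDepth < farRange; }
bool needsCheap(float viewDepth)     { return viewDepth > nearRange; }

// Same blend as HLSL lerp(expensive, cheap, weight).
float blendLighting(float expensive, float cheap, float viewDepth) {
    return expensive + (cheap - expensive) * cheapWeight(viewDepth);
}
```

Only pixels between the two range points satisfy both `needsExpensive` and `needsCheap`, which is why that band should stay narrow.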
In-Process to Pre-Process
Having spent a good deal of time on the graphics optimization side, we were still running a few FPS short of our target of 60. The balance of visual quality and achievable performance was struck, but other parts of the game engine beyond the shader system were causing processing overhead sufficient to degrade game speed.
The game engine already had an internal performance metrics system that crudely measured each major section of the overall game engine pipeline. In addition to graphics, the engine measures the time taken by AI, Physics, Weapons, Debugging, and Occlusion, among others. One of the metrics monitored the generation of real-time grass, which gives the game the illusion of infinite grass. Once we had reduced the cost of graphics processing, we noticed that the relative cost of this process jumped up, making it the next hungriest element in the game engine pipeline. When you optimize, always watch out for such spikes, and if you determine that a process is using an unreasonable share of the game cycles, examine it more closely. Knowing what is reasonable often comes down to experience and an intimate understanding of the whole engine; in this case, the grass should not consume over 10% of the overall game cycles, not with so many other vital services requiring them. On the desktop PC this spike was not obvious, but on the Ultrabook it was a substantial performance hit. In addition to the metric spike, it was apparent during play that whenever new grass was generated ahead of the player, the frame rate would stutter as the spike interrupted the normally smooth running of the game.
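The engine’s own metrics system is not shown in this article, but the idea of flagging any section that exceeds its share of the frame can be sketched in C++. `FrameMetrics` and its methods are hypothetical names; in practice each recorded duration would come from a high-resolution timer wrapped around the relevant engine section, and the 10% budget is the figure quoted above for grass.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical per-frame metrics collector: each engine section reports
// how long it took, and sections exceeding a budget share are flagged.
class FrameMetrics {
public:
    void record(const std::string& section, double milliseconds) {
        times_[section] += milliseconds;
        total_ += milliseconds;
    }

    // Fraction of the measured frame time spent in a section (0 if unknown).
    double share(const std::string& section) const {
        auto it = times_.find(section);
        if (it == times_.end() || total_ <= 0.0) return 0.0;
        return it->second / total_;
    }

    // Sections whose share exceeds the budget, e.g. 0.10 for 10%,
    // returned in alphabetical order.
    std::vector<std::string> overBudget(double budget) const {
        std::vector<std::string> result;
        for (const auto& entry : times_) {
            if (total_ > 0.0 && entry.second / total_ > budget)
                result.push_back(entry.first);
        }
        return result;
    }

    void reset() { times_.clear(); total_ = 0.0; }

private:
    std::map<std::string, double> times_;
    double total_ = 0.0;
};
```

Calling `overBudget(0.10)` once per frame, or on an average over many frames, surfaces a process like the grass generator as soon as its relative cost jumps.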
Figure 10: A field of green – generating grass in real time can be extremely compute intensive
The solution, and another staple of the optimization coder, was to move the entire grass generation system to a pre-process step that happens before the game even starts. Instead of grass being generated on the fly, it was simply moved into place ahead of the player to create a near identical effect. Nothing needs to be generated, just moved, and the Ultrabook breathed a sigh of relief as precious CPU cycles were freed up for the rest of the game engine. I also sighed with relief as the magic 60 FPS was achieved and the game ran at the desired speed.
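The "move, don’t generate" idea can be sketched in C++ along a single axis: a fixed set of grass tiles is built once before play, and whenever the player walks past the rearmost tile, that tile is moved in front of the frontmost one. `GrassField`, `GrassTile` and the one-dimensional layout are simplifying assumptions; a real engine would recycle tiles in two dimensions and carry its pre-built grass instances with each tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A patch of pre-generated grass; only its position changes at run time.
struct GrassTile {
    float z;  // near edge of the tile along the direction of travel
};

// Hypothetical one-dimensional strip of recycled grass tiles.
class GrassField {
public:
    // Pre-process step: lay out `count` tiles of length `tileLength`,
    // starting at `startZ`. Any expensive generation would happen here.
    GrassField(std::size_t count, float tileLength, float startZ)
        : tileLength_(tileLength) {
        assert(count > 0 && tileLength > 0.0f);
        for (std::size_t i = 0; i < count; ++i)
            tiles_.push_back({startZ + tileLength * static_cast<float>(i)});
    }

    // Run time: every tile lying entirely behind the player is moved just
    // ahead of the frontmost tile. Nothing is regenerated, only repositioned.
    void update(float playerZ) {
        while (tiles_[rear_].z + tileLength_ < playerZ) {
            tiles_[rear_].z = frontZ() + tileLength_;
            rear_ = (rear_ + 1) % tiles_.size();
        }
    }

    float rearZ() const { return tiles_[rear_].z; }
    float frontZ() const {
        return tiles_[(rear_ + tiles_.size() - 1) % tiles_.size()].z;
    }

private:
    std::vector<GrassTile> tiles_;
    float tileLength_;
    std::size_t rear_ = 0;  // index of the rearmost tile
};
```

The per-frame cost is a handful of position updates, independent of how much grass each tile contains.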
The Mysterious Case of the Strange Stutter
Having achieved the ideal gameplay speed and travelled halfway around the world to present the game and engine to the harsh gaze of the GDC attendees, I found that a strange stutter emerged when I installed the game on the show devices. The stutter did not occur on the desktop development machines or on the Ultrabook I used for pre-event testing, but it did occur on these show devices, which, to make things more interesting, were more powerful than the ones I had tested on.
After much debate, and subsequent research back home, we traced the issue to something called “internal timer resolution.” In short, every game that runs at a machine-independent speed (that is, the player takes the same amount of time to run from A to B whatever machine the game runs on) needs access to a GetTime()-style command. There are several to choose from, but one of the most popular is timeGetTime(), which returns the number of milliseconds that have passed since Windows was started. This implies a granularity of 1 millisecond, and indeed many desktop systems report the time at this resolution. On Ultrabooks and other portable power-saving devices, however, the granularity is not fixed and can drop to the 10-15 millisecond range. If you use this timer to drive physics, as our game engine did, the result is a seemingly random, jagged stutter as the physics updates jump sporadically from one reported time to the next.
The granularity can go from 1 ms to 10-15 ms because some systems save battery power by stepping down the processor, and one side effect is that the timer ticks become less frequent and less predictable. There are a number of solutions; the one we chose, and recommend, is QueryPerformanceCounter(), which reads a high-resolution counter, together with its companion QueryPerformanceFrequency(), which reports how many counts occur per second so that counter differences can be reliably converted into seconds.
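A minimal C++ sketch of a high-resolution game clock built on QueryPerformanceCounter() and QueryPerformanceFrequency() follows. The `GameClock` class is a hypothetical name; on non-Windows builds it falls back to `std::chrono::steady_clock`, a portable monotonic clock, so only that branch is exercised outside Windows.

```cpp
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Monotonic, high-resolution game clock measuring seconds since creation.
class GameClock {
public:
    GameClock() {
#ifdef _WIN32
        LARGE_INTEGER freq, now;
        QueryPerformanceFrequency(&freq);  // counts per second
        QueryPerformanceCounter(&now);     // current count
        frequency_ = freq.QuadPart;
        start_ = now.QuadPart;
#else
        start_ = std::chrono::steady_clock::now();
#endif
    }

    // Seconds elapsed since the clock was created.
    double seconds() const {
#ifdef _WIN32
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return static_cast<double>(now.QuadPart - start_) /
               static_cast<double>(frequency_);
#else
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        return std::chrono::duration<double>(elapsed).count();
#endif
    }

private:
#ifdef _WIN32
    std::int64_t frequency_ = 1;
    std::int64_t start_ = 0;
#else
    std::chrono::steady_clock::time_point start_;
#endif
};
```

Call `seconds()` once per frame and subtract the previous value to obtain the frame delta used by physics and movement.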
5. Tricks and Tips
Do’s
- Augment shaders with additional techniques instead of replacing them when optimizing for Ultrabook. Your game still needs to run on desktops as well as Ultrabooks, and distribution is much easier with a single game binary. Effect frameworks available for both DirectX* and OpenGL* (such as HLSL effect files and CgFX) let you define several techniques within a single effect file. With additional techniques in place, your game code can detect the platform it is running on and select the best technique, whether for performance or for graphical quality.
- Offer your users an options screen so they can select the level of performance and quality they desire, as most game players expect this today. It is always a good idea to detect and pre-select the best settings based on the system specification, but these should always be changeable, and the defaults you select should always work on the user’s system.
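Selecting a technique at startup while still respecting the options screen might look like the following C++ sketch. The `ShaderQuality` enum, the idea of timing a short benchmark scene at the HIGHEST setting, and the 16.7 ms and 33.3 ms thresholds are all assumptions for illustration; a shipping game might instead detect the GPU model. The returned name would be passed to whatever call its effect framework uses to select a technique.

```cpp
#include <string>

enum class ShaderQuality { Lowest, Medium, Highest };

// Hypothetical detection: average frame time (ms) of a short benchmark
// scene rendered with the HIGHEST technique.
ShaderQuality defaultQuality(double benchmarkFrameMs) {
    if (benchmarkFrameMs <= 16.7) return ShaderQuality::Highest;  // about 60 FPS
    if (benchmarkFrameMs <= 33.3) return ShaderQuality::Medium;   // about 30 FPS
    return ShaderQuality::Lowest;
}

// A choice made on the options screen always overrides the detected default.
ShaderQuality effectiveQuality(double benchmarkFrameMs,
                               const ShaderQuality* userChoice) {
    return userChoice ? *userChoice : defaultQuality(benchmarkFrameMs);
}

// Technique names matching the HIGHEST/MEDIUM/LOWEST techniques in the text.
std::string techniqueName(ShaderQuality q) {
    switch (q) {
        case ShaderQuality::Highest: return "HIGHEST";
        case ShaderQuality::Medium:  return "MEDIUM";
        case ShaderQuality::Lowest:  return "LOWEST";
    }
    return "LOWEST";
}
```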
Don’ts
- Do not assume you have to run your game at 60 FPS. On most modern devices you can set the presentation interval so that each frame is shown for two vertical sync intervals instead of one, which gives the same smooth, tear-free display at 30 FPS on a 60 Hz monitor (waiting for more intervals lowers the rate further). It won’t be as smooth as 60 FPS, of course, but if your game timings are adjusted, the game will still feel smooth and very playable.
- Do not underestimate how costly fragment shaders are when developing your game, especially if you are running on low-scoring graphics hardware. If you find your game suffering low performance, switch off or downgrade all shader use as a process of elimination.
- Do not pre-select a resolution for the user that may not be supported by the display device. Use the Windows* API (for example, EnumDisplaySettings) to query the display device for a compatible default resolution.
- Do not assume timeGetTime() returns the time in intervals of 1 ms. When Ultrabook power-saving is enabled, it can be as infrequent as 10-15 ms!
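Two small C++ simulations make the timing advice above concrete. `observedDeltas` (a hypothetical helper) models a timer that only advances in coarse steps; the 15 ms step is an illustrative value taken from the 10-15 ms range quoted above, not a measurement. `simulateRun` shows why scaling movement by the frame delta lets a game run at 30 FPS and still cover the same distance per second as at 60 FPS.

```cpp
#include <cmath>
#include <vector>

// Frame deltas a game would observe from a timer that only advances in
// steps of tickMs, when the true frame time is trueFrameMs.
std::vector<double> observedDeltas(double trueFrameMs, double tickMs, int frames) {
    std::vector<double> deltas;
    double previous = 0.0;
    for (int i = 1; i <= frames; ++i) {
        const double trueTime = i * trueFrameMs;
        const double reported = std::floor(trueTime / tickMs) * tickMs;
        deltas.push_back(reported - previous);
        previous = reported;
    }
    return deltas;
}

// Moves a player at a fixed speed, scaling each step by the frame delta,
// and returns the distance covered after the given number of seconds.
double simulateRun(double speedUnitsPerSec, int framesPerSecond, double seconds) {
    const double dt = 1.0 / framesPerSecond;
    const int frames = static_cast<int>(seconds * framesPerSecond + 0.5);
    double position = 0.0;
    for (int i = 0; i < frames; ++i) position += speedUnitsPerSec * dt;
    return position;
}
```

With a true frame time of 10 ms and a 15 ms timer step, `observedDeltas(10.0, 15.0, 4)` reports deltas of 0, 15, 15 and 0 ms: frames that appear to take no time followed by frames that appear too long, which is exactly the jagged physics stepping described above.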
6. A Brief Tour of Ultrabook Gotchas
It might seem an exercise in the obvious, but here is a quick and handy guide to testing, running, and exhibiting your games and 3D applications on an Ultrabook.
Power-Saving
If you are presenting to a large audience and want to show your game in its best light, it is vital you plug in the Ultrabook. Do not run on battery power as the system will protect itself by dialling down all manner of hardware settings that you want to keep on ‘red hot maximum’.
Figure 11: Power Management on the Ultrabook
As an extra precaution, open the Power Management settings in the Control Panel and double-check that, when plugged in, all power-saving settings are off and as many settings as possible are set to HIGH.
Graphics
The Control Panel has another settings panel that gives you access to your device’s graphics accelerator settings, including those that control the GPU and driver in power-saving mode. Set this option to Performance, or the equivalent mode, to ensure your on-board GPU runs as fast as possible.
Figure 12: Graphic Acceleration Settings on the Ultrabook™
It might seem odd that you have to do these things, but the Ultrabook has been designed to conserve power at every turn, allowing you to use the device for hours on end. To achieve maximum performance on the Ultrabook, nothing beats plugging into a wall socket and turning every setting to 11.
Background Tasks
Old hands will nod sagely at this simple but crucial piece of advice: quickly scan for any background tasks that start with Windows on the Ultrabook. Each was originally intended to be lightweight and helpful, but combined they can place a steady load on the CPU.
As vital as some of these are, when you are demonstrating how fast your 3D game can run on an Ultrabook, it is prudent to cancel any tasks that you will not need for that session. Fear not, as they will reappear the next time you boot the Ultrabook, but for the remainder of the Windows session your device will be dedicated to running one application, yours!
7. Conclusions
The subject of game optimization is a broad one, and developers should consider the task of optimization part and parcel of their daily duties. The challenge is to enable your game to run on as wide a range of hardware as possible, and it’s at these times that experience and know-how come to the rescue. Using Intel® tools such as the VTune™ analyzer and the Intel Graphics Performance Analyzers accelerates the process of finding the problem. Articles such as this one may give you a few clues as to likely solutions, but it ultimately comes down to your ability to think laterally. How can you do this another way? Is there a faster way to do this? Is there a smarter way to do this? These are great questions to start the process, and the more you ask them, the better you will be at optimizing your games and applications. As I suggested at the start of this article, you will not only become a better coder, you will have expanded your reach into a market that’s growing at an incredible rate!
Related Content
Codemasters GRID 2* on 4th Generation Intel® Core™ Processors - Game development case study
Not built in a day - lessons learned on Total War: ROME II
Developer's Guide for Intel® Processor Graphics for 4th Generation Intel® Core™ Processors
PERCEPTUAL COMPUTING: Augmenting the FPS Experience
Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.
Intel, the Intel logo, Ultrabook, and VTune are trademarks of Intel Corporation in the U.S. and/or other countries.
Copyright © 2014 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.