In my previous post about the SOIL library, I talked about adding some new features. One of them was improving mipmap generation by simply using the glGenerateMipmap(EXT) function. In this post, I describe the changes needed to implement it and the benefits gained.
In short: for NPOT (non-power-of-two) sizes, I get around 4x faster texture loading and 2x smaller memory consumption. For POT (power-of-two) sizes, loading is about 2x faster (with no memory difference).
Code here: SOIL_ext @ github, changed soil.c
The Problem
Add the ability to use glGenerateMipmap in the SOIL library. The old functionality, the custom software solution for mipmap generation, will be (and should be) left unchanged.
The new generation method is selected by passing a new flag called SOIL_FLAG_GL_MIPMAPS. For desktop OpenGL, this should be much faster than the original SOIL method: it can be hardware accelerated, and it works for NPOT textures. With the standard SOIL_FLAG_MIPMAPS, SOIL rescales the image to POT and then creates the mipmaps. All of that happens in custom code on the CPU side.
Another assumption: Since the lib is small, I do not want to introduce GLEW or other extension loading libraries. Extension loading will be done manually.
Desired usage:
texID = SOIL_load_OGL_texture("test.jpg",
                              SOIL_LOAD_AUTO,
                              SOIL_CREATE_NEW_ID,
                              SOIL_FLAG_GL_MIPMAPS);
The Solution
Since there is no GL_EXT_mipmap extension, we need to find where our desired function is defined. The easiest way to do that is to download the latest version of glext.h and search for glGenerateMipmap. We will find two versions:
- glGenerateMipmap - in OpenGL 3.0 core or in GL_ARB_framebuffer_object
- glGenerateMipmapEXT - in GL_EXT_framebuffer_object
The code first tries to obtain the first function pointer; if that fails, it tries the second. If both lookups fail, we fall back to the same functionality as SOIL_FLAG_MIPMAPS.
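The lookup order described above can be sketched independently of OpenGL. This is only an illustration, not the SOIL code: the loader callback type, load_first_available and the two stub loaders are hypothetical names introduced here so that the fallback order can be exercised without a GL context.

```c
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type; real code casts it to the GL signature. */
typedef void (*generic_proc)(void);

/* Hypothetical loader signature: returns NULL when the name is unknown. */
typedef generic_proc (*proc_loader)(const char *name);

/* Return the first entry point the loader can resolve, or NULL if none. */
static generic_proc load_first_available(proc_loader load,
                                         const char *const *names,
                                         size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        generic_proc p = load(names[i]);
        if (p != NULL)
            return p;
    }
    return NULL; /* caller falls back to the software path */
}

/* Stub standing in for a driver that only exposes the EXT entry point. */
static void fake_generate_mipmap_ext(void) {}

static generic_proc ext_only_loader(const char *name)
{
    if (strcmp(name, "glGenerateMipmapEXT") == 0)
        return fake_generate_mipmap_ext;
    return NULL;
}

/* Stub standing in for a driver that exposes neither entry point. */
static generic_proc empty_loader(const char *name)
{
    (void)name;
    return NULL;
}
```

With the names passed in the order { "glGenerateMipmap", "glGenerateMipmapEXT" }, the first stub yields the EXT pointer and the second yields NULL, which corresponds to the software fallback.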
There is no need to load every function from the extension; only one is essential. First, the following code should be added:
typedef void (APIENTRY *P_PFNGLGENERATEMIPMAPPROC)(GLenum target);
static P_PFNGLGENERATEMIPMAPPROC soilGlGenerateMipmap = NULL;
Then the code for loading/checking:
static int has_gen_mipmap_capability = SOIL_CAPABILITY_UNKNOWN;
static int query_gen_mipmap_capability( void );
The first snippet adds the function pointer type (the proper declaration can be found in glext.h) and then the actual function pointer. The second snippet adds a cached capability state and declares the function that has to be invoked somewhere in the code to load and check the extension. The lookup itself is performed only on the first call.
Query Extension
Let us go inside query_gen_mipmap_capability():
int query_gen_mipmap_capability( void )
{
    P_PFNGLGENERATEMIPMAPPROC ext_addr = NULL;
    if( has_gen_mipmap_capability == SOIL_CAPABILITY_UNKNOWN )
    {
        ext_addr =
            (P_PFNGLGENERATEMIPMAPPROC)
            soilLoadProcAddr("glGenerateMipmap");
        if(ext_addr == NULL)
        {
            ext_addr =
                (P_PFNGLGENERATEMIPMAPPROC)
                soilLoadProcAddr("glGenerateMipmapEXT");
        }
        if(ext_addr == NULL)
        {
            has_gen_mipmap_capability = SOIL_CAPABILITY_NONE;
        } else
        {
            has_gen_mipmap_capability = SOIL_CAPABILITY_PRESENT;
            soilGlGenerateMipmap = ext_addr;
        }
    }
    return has_gen_mipmap_capability;
}
The code is quite simple. It checks whether our function pointer is available in the system. We could check the availability of the extension first, but this method should be equally safe. SOIL is usually called after the application's OpenGL extension setup, so GL_ARB_framebuffer_object should already have been checked.
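For readers who do want to check the extension name first, here is a minimal sketch. It assumes the space-separated extension list returned by glGetString(GL_EXTENSIONS), which is available in compatibility contexts (core profiles enumerate extensions with glGetStringi instead). The helper name extension_listed is hypothetical and not part of SOIL.

```c
#include <string.h>

/* Return 1 if `name` appears as a whole token in the space-separated
   list `ext_list`, 0 otherwise. A plain strstr() is not enough because
   one extension name can be a prefix of another. */
static int extension_listed(const char *ext_list, const char *name)
{
    size_t name_len;
    const char *p;

    if (ext_list == NULL || name == NULL || *name == '\0')
        return 0;

    name_len = strlen(name);
    p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at a token boundary and end at one. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token = (p[name_len] == ' ') || (p[name_len] == '\0');
        if (starts_token && ends_token)
            return 1;
        p += name_len;
    }
    return 0;
}
```

The whole-token comparison is the point of the sketch: a substring match alone would accept a longer extension name that merely begins with the one we are looking for.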
Let us look at the soilLoadProcAddr function:
void *soilLoadProcAddr(const char *procName)
{
#ifdef WIN32
    PROC p = wglGetProcAddress(procName);
    if (soilTestWinProcPointer(p))
        return (void *)p;
    else
        return NULL;
#elif defined(__APPLE__) || defined(__APPLE_CC__)
    /* No loader is shown for Apple platforms here; returning NULL
       makes the caller use the software fallback. */
    (void)procName;
    return NULL;
#elif defined ( linux ) || defined( __linux__ )
#if !defined(GLX_VERSION_1_4)
    return (void *)glXGetProcAddressARB((const GLubyte *)procName);
#else
    return (void *)glXGetProcAddress((const GLubyte *)procName);
#endif
#else
    (void)procName;
    return NULL;
#endif
}
An interesting helper is soilTestWinProcPointer:
#ifdef WIN32
static int soilTestWinProcPointer(const PROC pTest)
{
    ptrdiff_t iTest;
    if(!pTest) return 0;
    iTest = (ptrdiff_t)pTest;
    if(iTest == 1 || iTest == 2 || iTest == 3 || iTest == -1) return 0;
    return 1;
}
#endif
It turns out that we cannot assume that wglGetProcAddress returns either NULL or a valid pointer. Some implementations return 1, 2, 3 or -1 on failure, so we need to test for those values as well.
Usage
Now we can use our loading code in the SOIL texture loading function. This happens in SOIL_internal_create_OGL_texture:
if( flags & SOIL_FLAG_MIPMAPS || flags & SOIL_FLAG_GL_MIPMAPS)
{
...
}
Inside the if statement, we just need to write:
if ((flags & SOIL_FLAG_GL_MIPMAPS) &&
    query_gen_mipmap_capability() == SOIL_CAPABILITY_PRESENT)
{
    soilGlGenerateMipmap(opengl_texture_target);
}
else
{
    /* original SOIL software mipmap generation (unchanged) */
}
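The decision made by the two if statements above can be summarized as a small pure function. This is a sketch only: the DEMO_ constants and choose_mipmap_path are hypothetical names with illustrative values, and the real flag values live in SOIL's headers.

```c
/* Illustrative constants only; the real values live in SOIL's headers. */
enum {
    DEMO_FLAG_MIPMAPS    = 1 << 0,
    DEMO_FLAG_GL_MIPMAPS = 1 << 1
};

enum demo_mipmap_path {
    DEMO_PATH_NONE,     /* no mipmaps requested */
    DEMO_PATH_GL,       /* driver-side glGenerateMipmap */
    DEMO_PATH_SOFTWARE  /* original CPU-side SOIL generation */
};

/* Pick the mipmap path from the requested flags and from whether the
   glGenerateMipmap entry point was successfully loaded. */
static enum demo_mipmap_path choose_mipmap_path(unsigned flags,
                                                int gl_generate_available)
{
    if ((flags & (DEMO_FLAG_MIPMAPS | DEMO_FLAG_GL_MIPMAPS)) == 0)
        return DEMO_PATH_NONE;
    if ((flags & DEMO_FLAG_GL_MIPMAPS) && gl_generate_available)
        return DEMO_PATH_GL;
    return DEMO_PATH_SOFTWARE;
}
```

Note that requesting the GL path on a system without the entry point silently degrades to the software path, which matches the fallback described earlier.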
Benefits
In the introduction, I used catchy phrases like "4x speedup" or "2x lower memory consumption". Let me explain where those results may come from.
Memory Consumption
For POT sizes, there is no difference, of course. The new method creates exactly the same number of levels as the SOIL way. But for NPOT sizes, the situation changes. Let us take a simple case:
- Image 540x600 RGB8 - memory needed: 540*600*3 bytes = ~950 KB
- This image will have mipmaps: 270x300, 135x150, 67x75, 33x37, 16x18, 8x9, 4x4, 2x2, 1x1 - 10 levels (including the original image).
- In total, we will need around 1265 KB (33% more than with no mipmaps, of course).
- When we use the SOIL method, we first need to rescale the image to POT - the new size is 1024x1024! This is 3072 KB!
- Mipmaps: 512x512, 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, 1x1. In total, we will have 11 levels (one more than for NPOT).
- Total memory: around 4095 KB! As we can see, this is more than 3x larger than the NPOT case.
The difference is, of course, bigger when the input size is slightly larger than some POT size. If the input size is only slightly smaller than some POT size, the difference is small. As mentioned before, for POT sizes there is no difference (no need to scale the texture).
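The arithmetic above can be reproduced with a short sketch. It assumes tightly packed pixels (3 bytes per pixel for RGB8, no row padding) and the usual rule of halving each dimension, rounding down and clamping at 1. The helper names next_pot and mip_chain_bytes are introduced here for illustration and are not SOIL functions.

```c
#include <stddef.h>

/* Smallest power of two greater than or equal to v (v > 0). */
static unsigned next_pot(unsigned v)
{
    unsigned p = 1;
    while (p < v)
        p *= 2;
    return p;
}

/* Total bytes for a full mip chain: each dimension is halved (rounded
   down, clamped to 1) until the 1x1 level is reached. The number of
   levels, including the base image, is optionally reported. */
static unsigned long mip_chain_bytes(unsigned w, unsigned h,
                                     unsigned bytes_per_pixel,
                                     unsigned *levels_out)
{
    unsigned long total = 0;
    unsigned levels = 0;

    for (;;) {
        total += (unsigned long)w * h * bytes_per_pixel;
        ++levels;
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    if (levels_out != NULL)
        *levels_out = levels;
    return total;
}
```

For 540x600 at 3 bytes per pixel this gives 1,295,631 bytes over 10 levels. After rounding both dimensions up with next_pot, the 1024x1024 chain gives 4,194,303 bytes over 11 levels, which matches the figures in the list.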
Performance
The first gain comes from the smaller number of pixels to process when we use NPOT textures. The second comes from internal driver optimization, the possibility of hardware-accelerated scaling, and the lower cost of driver calls (one call to glGenerateMipmap vs several calls to glTexImage).
Brief Results
I load one image 50 times and create 50 different texture objects. In each pair, the first value is for the new glGenerateMipmap path and the second is for the original SOIL path.
- Image 540x600 RGB jpeg, 50 loads:
  - 0.5s vs 3.5s
  - 62MB vs 200MB (total memory for 50 textures)
- Image 1024x1024 RGB jpeg, 50 loads:
  - 1.1s vs 3.1s
  - 200MB in both cases, of course
These are only brief results, and I will describe my perf test in the next post.
We usually load textures in the init phase, so there is no need to fight for performance at all costs. Still, I think it is important to know that a simple improvement can give a nice speed-up. It will be significant in scenarios where we load textures dynamically throughout the game, or when we load a whole directory of photos to display them in a gallery. The user should see results as soon as possible.
Besides all that, it was quite an interesting experience for me. :) I dug into the code and had to verify my initial thoughts. :)
Notes
One more link to the code SOIL_ext @ github, changed soil.c