Topics covered include configuration,
buffer interoperability, and the use of primitive restart to accelerate GPU
rendering. The previous article, part 5, of this series, demonstrated how to
tie computation to data in a multi-device, multi-GPU environment.
Introduction
OpenGL is a common graphical programming API (Application
Programming Interface) that is standards based, cross-language, and
cross-platform. It can be used to create applications that can render 2D and 3D
images on most visualization hardware. OpenGL and OpenCL interoperability can
greatly accelerate both data generation as well as data visualization. Basically,
the OpenCL application maps the OpenGL buffers so they can be modified by
massively-parallel kernels running on the GPU. This keeps the data on the GPU
and avoids costly PCIe bus transfers.
Primitive restart is a new feature added in the
OpenGL 3.1 specification. In short, primitive restart lets the programmer
define a numeric value that acts as a token that tells the OpenGL state
machine to restart an OpenGL rendering instruction to begin with the next data
item. This has multiple advantages:
- For the programmer, this means that multiple lines,
triangle fans and irregular meshes can be rendered with a single command.
- The developers of the OpenGL state machine can optimize
their code so the test for the primitive restart token happens on the GPU
thus eliminating host/GPU communications bottlenecks across the PCIe bus.
- Increase rendering performance can be obtained by
arranging the data to achieve the highest reuse of the cache in the
texture units.
- Higher quality images can be created by alternating the
direction of tessellation as noted in the primitive restart specification
and illustrated in the following two figures:
Two triangle strips with artifacts
Data rendered as
four triangle fans (the center marked with a filled circle)
This article is not intended as an OpenGL tutorial. There
are a number of excellent books and tutorials available on the Internet. An
excellent one is the NeHe tutorial series.
The following application is intended to demonstrate OpenCL and OpenGL
interoperability and the use of primitive restart.
Building the Application
Copy and paste the complete source listing for gltest.cpp
into a file by that name. The complete source is provided at the end of
this article after the walk through. Similarly, copy and paste the source for
the OpenCL kernel, sinewave.cl.
The source can be compiled under Linux with the following
command:
g++ -I
$ATISTREAMSDKROOT/include -L $ATISTREAMSDKROOT/lib/x86_64 gltest.cpp -lglut
-lGLEW -lOpenCL -o gltest
The application runs on the GPU by default. The program runs
on the CPU when anything is specified on the command-line. This simple
application requires that the OpenCL kernel be in the file sinewave.cl
in the same directory.
./gltest # running on the GPU
./gltest CPU # This will run the application on the CPU
By default, the application starts by rendering a surface.
Use the mouse to rotate and zoom the image. Notice that the colors as well as
the shape of the image changes with time to show repeated computation by the
OpenCL kernel.
Pressing ‘D’ or ‘d’ on the keyboard cycles the rendering
mode from a surface, to a collection of points, to a set of lines, and finally
back to a surface again as can be seen in the following three images.
Default rendering as a surface
The surface rendered as a set of colored points
The surface rendered as a set of colored lines
Pressing ‘q’ exits the program.
A Walk Through the gltest.cpp Source Code
The initial section of the code specifies the include files,
constants and a set of global variables.
#include <iostream>
#include <fstream>
using namespace std;
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glew.h>
#include <GL/glut.h>
#ifndef _WIN32
#include <GL/glxew.h>
#endif
#define WIDTH 1408
#define HEIGHT 1024
const unsigned int mesh_width = 128, mesh_height = 128;
const unsigned int RestartIndex = 0xffffffff;
cl_platform_id platform;
cl_device_id device;
cl_context context;
cl_command_queue queue;
cl_program program;
cl_kernel kernel;
size_t kernelsize;
size_t global[] = {mesh_width, mesh_height};
har *pathname = NULL;
char *source = NULL;
A VBO (Vertex Buffer Object) is an OpenGL memory buffer in the
high speed memory of the GPU. The following source code shows that gltest.cpp
utilizes two VBOs, one that describes the coordinates of a set of vertices for
the surface and another that describes the color associated with each vertex.
VBOs can also store information such as normals, texcoords, indicies, and other
data.
const unsigned int p_vbo_size = mesh_width*mesh_height*4*sizeof(float);
GLuint p_vbo;
cl_mem p_vbocl;
const unsigned int c_vbo_size = mesh_width*mesh_height*4*sizeof(unsigned char);
GLuint c_vbo;
cl_mem c_vbocl;
Primitive restart requires that a set of indices be used to
point to each location of the data (or to the primitive restart token). The qIndicies
vector holds these indices.
GLuint* qIndices=NULL;
int qIndices_size = 5*(mesh_height-1)*(mesh_width-1);
This code utilizes the OpenGL Utility Toolkit GLUT, which is
a portable toolkit that runs under UNIX, Windows and other operating systems.
GLUT provides an easy API to specify callbacks to functions that handle mouse,
window, keyboard and other events. The following section provides some global
variables for those callbacks as well as forward references to the functions.
float anim = 0.0;
int drawMode=GL_TRIANGLE_FAN; const char* drawStr="fan";
const char* platformString="notset";
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -2.5;
void display();
void motion(int x, int y);
void mouse(int button, int state, int x, int y);
void keyboard(unsigned char key, int x, int y);
void initgl(int argc, const char** argv);
For convenience, the title of the window is used to report
important characteristics such as the device (CPU or GPU) and drawing mode.
This information is specified with the following helper function:
void setTitle()
{
char title[256];
sprintf(title, "GL Interop Wrapper: mode %s device %s",
drawStr, platformString);
glutSetWindowTitle(title);
}
The first part of main
initializes the window and
provides a very simple command-line processing to select the CPU or the GPU.
int main(int argc, const char **argv)
{
initgl(argc, argv);
clGetPlatformIDs(1, &platform, NULL);
if(argc > 1) {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
platformString = "CPU";
} else {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
platformString = "GPU";
}
The OpenCL context and queue is created. Note that the
context must be tied to the OpenGL context as highlighted by the #ifdef
preprocessor
statements. Otherwise, nothing will be displayed.
#ifdef _WIN32
HGLRC glCtx = wglGetCurrentContext();
#else GLXContext glCtx = glXGetCurrentContext();
#endif
cl_context_properties props[] = { CL_CONTEXT_PLATFORM,
(cl_context_properties)platform,
#ifdef _WIN32
CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(),
#else CL_GLX_DISPLAY_KHR, (intptr_t) glXGetCurrentDisplay(),
#endif CL_GL_CONTEXT_KHR, (intptr_t) glCtx, 0};
context = clCreateContext(props, 1, &device, NULL, NULL, NULL);
queue = clCreateCommandQueue(context, device, 0, NULL);
Both the position and color VBOs are created. Note that the
call to glBindBuffer
allocates the space on the device. VBOs are mapped
into the OpenCL memory space with a call to clCreateFromGLBuffer
, which
returns a pointer to global memory that can be passed to the OpenCL kernel.
glGenBuffers(1, &p_vbo);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
glBufferData(GL_ARRAY_BUFFER, p_vbo_size, 0, GL_DYNAMIC_DRAW);
p_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, p_vbo, NULL);
glGenBuffers(1, &c_vbo);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glBufferData(GL_ARRAY_BUFFER, c_vbo_size, 0, GL_DYNAMIC_DRAW);
c_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, c_vbo, NULL);
Even though this is a C-API OpenCL program, C++ was used to
load the program source. The OpenCL program is then built, the kernel arguments
are specified, and the indices for primitive restart are defined.
ifstream file("sinewave.cl");
string prog(istreambuf_iterator<char>(file), (istreambuf_iterator<char>()));
file.close();
const char* source = prog.c_str();
const size_t kernelsize = prog.length()+1;
program = clCreateProgramWithSource(context, 1, (const char**) &source,
&kernelsize, NULL);
int err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
size_t len;
char buffer[2048];
cerr << "Error: Failed to build program executable!" << endl;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
sizeof(buffer), buffer, &len);
cerr << buffer << endl;
exit(1);
}
kernel = clCreateKernel(program, "sinewave", &err);
if (!kernel || err != CL_SUCCESS) {
cerr << "Error: Failed to create compute kernel!" << endl;
exit(1);
}
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&p_vbocl);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &mesh_width);
clSetKernelArg(kernel, 2, sizeof(unsigned int), &mesh_height);
clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&c_vbocl);
qIndices = (GLuint *) malloc(qIndices_size*sizeof(GLint));
int index=0;
for(int i=1; i < mesh_height; i++) {
for(int j=1; j < mesh_width; j++) {
qIndices[index++] = (i)*mesh_width + j;
qIndices[index++] = (i)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j;
qIndices[index++] = RestartIndex;
}
}
The main
method completes by setting the window title
and calling the GLUT main loop, which does not exit.
setTitle();
glutMainLoop();
}
The initgl
method performs all the work of setting up
the window, registering the callbacks, and defining some initial viewing
conditions.
void initgl(int argc, const char** argv)
{
glutInit(&argc, (char**)argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowPosition (glutGet(GLUT_SCREEN_WIDTH)/2 - WIDTH/2,
glutGet(GLUT_SCREEN_HEIGHT)/2 - HEIGHT/2);
glutInitWindowSize(WIDTH, HEIGHT);
glutCreateWindow("");
glutDisplayFunc(display); glutKeyboardFunc(keyboard);
glutMouseFunc(mouse);
glutMotionFunc(motion);
glewInit();
glClearColor(0.0, 0.0, 0.0, 1.0);
glDisable(GL_DEPTH_TEST);
glViewport(0, 0, WIDTH, HEIGHT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(10000.0, (GLfloat)WIDTH / (GLfloat)HEIGHT, 0.1, 10.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
return;
}
The display
method is where all the work happens.
Basically, the OpenGL buffer objects are acquired, and the kernel is queued.
The OpenGL buffers are released once the kernel completes.
void display()
{
anim += 0.01f;
glFinish();
clEnqueueAcquireGLObjects(queue, 1, &p_vbocl, 0,0,0);
clEnqueueAcquireGLObjects(queue, 1, &c_vbocl, 0,0,0);
clSetKernelArg(kernel, 3, sizeof(float), &anim);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, 0, 0);
clEnqueueReleaseGLObjects(queue, 1, &c_vbocl, 0,0,0);
clEnqueueReleaseGLObjects(queue, 1, &p_vbocl, 0,0,0);
clFinish(queue);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
The newly computed data is then rendered according to the
drawing mode after which display
cleans up. Note that primitive restart
is enabled with the call to glEnableClientState
and that the unique
index is specified to the OpenGL state machine with glPrimitiveRestartIndexNV
.
Only a single glDrawElements
call is made to render all the triangle
fans.
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0, 0.0, translate_z);
glRotatef(rotate_x, 1.0, 0.0, 0.0);
glRotatef(rotate_y, 0.0, 1.0, 0.0);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
glVertexPointer(4, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
glEnableClientState(GL_COLOR_ARRAY);
switch(drawMode) {
case GL_LINE_STRIP:
for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
glDrawArrays(GL_LINE_STRIP, i, mesh_width);
break;
case GL_TRIANGLE_FAN:
glPrimitiveRestartIndexNV(RestartIndex);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);
glDrawElements(GL_TRIANGLE_FAN, qIndices_size, GL_UNSIGNED_INT, qIndices);
break;
default:
glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
break;
}
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
glutPostRedisplay();
}
The keyboard callback handles keyboard input. Pressing ‘D’
or ‘d’ cycles the drawing mode between rendering a surface, to a bunch of
points, then to lines and back to a surface. Pressing a ‘q’ or ESC terminates
the application.
void keyboard(unsigned char key, int x, int y)
{
switch(key) {
case('q') :
case(27) :
exit(0);
break;
case 'd':
case 'D':
switch(drawMode) {
case GL_POINTS: drawMode = GL_LINE_STRIP; drawStr = "line"; break;
case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; drawStr = "fan"; break;
default: drawMode=GL_POINTS; drawStr = "points"; break;
}
}
setTitle();
glutPostRedisplay();
}
Following are the mouse handlers.
void mouse(int button, int state, int x, int y)
{
if (state == GLUT_DOWN) {
mouse_buttons |= 1<<button;
} else if (state == GLUT_UP) {
mouse_buttons = 0;
}
mouse_old_x = x;
mouse_old_y = y;
glutPostRedisplay();
}
void motion(int x, int y)
{
float dx, dy;
dx = x - mouse_old_x;
dy = y - mouse_old_y;
if (mouse_buttons & 1) {
rotate_x += dy * 0.2;
rotate_y += dx * 0.2;
} else if (mouse_buttons & 4) {
translate_z += dy * 0.01;
}
mouse_old_x = x;
mouse_old_y = y;
}
The complete source for gltest.cpp follows:
#include <iostream>
#include <fstream>
using namespace std;
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glew.h>
#include <GL/glut.h>
#ifndef _WIN32
#include <GL/glxew.h>
#endif
#define WIDTH 1408
#define HEIGHT 1024
const unsigned int mesh_width = 128, mesh_height = 128;
const unsigned int RestartIndex = 0xffffffff;
cl_platform_id platform;
cl_device_id device;
cl_context context;
cl_command_queue queue;
cl_program program;
cl_kernel kernel;
size_t kernelsize;
size_t global[] = {mesh_width, mesh_height};
char *pathname = NULL;
char *source = NULL;
const unsigned int p_vbo_size = mesh_width*mesh_height*4*sizeof(float);
GLuint p_vbo;
cl_mem p_vbocl;
const unsigned int c_vbo_size = mesh_width*mesh_height*4*sizeof(unsigned char);
GLuint c_vbo;
cl_mem c_vbocl;
GLuint* qIndices=NULL;
int qIndices_size = 5*(mesh_height-1)*(mesh_width-1);
float anim = 0.0;
int drawMode=GL_TRIANGLE_FAN; const char* drawStr="fan";
const char* platformString="notset";
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -2.5;
void display();
void motion(int x, int y);
void mouse(int button, int state, int x, int y);
void keyboard(unsigned char key, int x, int y);
void initgl(int argc, const char** argv);
void setTitle()
{
char title[256];
sprintf(title, "GL Interop Wrapper: mode %s device %s",
drawStr, platformString);
glutSetWindowTitle(title);
}
int main(int argc, const char **argv)
{
initgl(argc, argv);
clGetPlatformIDs(1, &platform, NULL);
if(argc > 1) {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
platformString = "CPU";
} else {
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
platformString = "GPU";
}
#ifdef _WIN32
HGLRC glCtx = wglGetCurrentContext();
#else GLXContext glCtx = glXGetCurrentContext();
#endif
cl_context_properties props[] = { CL_CONTEXT_PLATFORM,
(cl_context_properties)platform,
#ifdef _WIN32
CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(),
#else CL_GLX_DISPLAY_KHR, (intptr_t) glXGetCurrentDisplay(),
#endif CL_GL_CONTEXT_KHR, (intptr_t) glCtx, 0};
context = clCreateContext(props, 1, &device, NULL, NULL, NULL);
queue = clCreateCommandQueue(context, device, 0, NULL);
glGenBuffers(1, &p_vbo);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
glBufferData(GL_ARRAY_BUFFER, p_vbo_size, 0, GL_DYNAMIC_DRAW);
p_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, p_vbo, NULL);
glGenBuffers(1, &c_vbo);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glBufferData(GL_ARRAY_BUFFER, c_vbo_size, 0, GL_DYNAMIC_DRAW);
c_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, c_vbo, NULL);
ifstream file("sinewave.cl");
string prog(istreambuf_iterator<char>(file), (istreambuf_iterator<char>()));
file.close();
const char* source = prog.c_str();
const size_t kernelsize = prog.length()+1;
program = clCreateProgramWithSource(context, 1, (const char**) &source,
&kernelsize, NULL);
int err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
size_t len;
char buffer[2048];
cerr << "Error: Failed to build program executable!" << endl;
clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
sizeof(buffer), buffer, &len);
cerr << buffer << endl;
exit(1);
}
kernel = clCreateKernel(program, "sinewave", &err);
if (!kernel || err != CL_SUCCESS) {
cerr << "Error: Failed to create compute kernel!" << endl;
exit(1);
}
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&p_vbocl);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &mesh_width);
clSetKernelArg(kernel, 2, sizeof(unsigned int), &mesh_height);
clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&c_vbocl);
qIndices = (GLuint *) malloc(qIndices_size*sizeof(GLint));
int index=0;
for(int i=1; i < mesh_height; i++) {
for(int j=1; j < mesh_width; j++) {
qIndices[index++] = (i)*mesh_width + j;
qIndices[index++] = (i)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j-1;
qIndices[index++] = (i-1)*mesh_width + j;
qIndices[index++] = RestartIndex;
}
}
setTitle();
glutMainLoop();
}
void initgl(int argc, const char** argv)
{
glutInit(&argc, (char**)argv);
glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
glutInitWindowPosition (glutGet(GLUT_SCREEN_WIDTH)/2 - WIDTH/2,
glutGet(GLUT_SCREEN_HEIGHT)/2 - HEIGHT/2);
glutInitWindowSize(WIDTH, HEIGHT);
glutCreateWindow("");
glutDisplayFunc(display); glutKeyboardFunc(keyboard);
glutMouseFunc(mouse);
glutMotionFunc(motion);
glewInit();
glClearColor(0.0, 0.0, 0.0, 1.0);
glDisable(GL_DEPTH_TEST);
glViewport(0, 0, WIDTH, HEIGHT);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(10000.0, (GLfloat)WIDTH / (GLfloat)HEIGHT, 0.1, 10.0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
return;
}
void display()
{
anim += 0.01f;
glFinish();
clEnqueueAcquireGLObjects(queue, 1, &p_vbocl, 0,0,0);
clEnqueueAcquireGLObjects(queue, 1, &c_vbocl, 0,0,0);
clSetKernelArg(kernel, 3, sizeof(float), &anim);
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, 0, 0);
clEnqueueReleaseGLObjects(queue, 1, &c_vbocl, 0,0,0);
clEnqueueReleaseGLObjects(queue, 1, &p_vbocl, 0,0,0);
clFinish(queue);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0, 0.0, translate_z);
glRotatef(rotate_x, 1.0, 0.0, 0.0);
glRotatef(rotate_y, 0.0, 1.0, 0.0);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
glVertexPointer(4, GL_FLOAT, 0, 0);
glEnableClientState(GL_VERTEX_ARRAY);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
glEnableClientState(GL_COLOR_ARRAY);
switch(drawMode) {
case GL_LINE_STRIP:
for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
glDrawArrays(GL_LINE_STRIP, i, mesh_width);
break;
case GL_TRIANGLE_FAN:
glPrimitiveRestartIndexNV(RestartIndex);
glEnableClientState(GL_PRIMITIVE_RESTART_NV);
glDrawElements(GL_TRIANGLE_FAN, qIndices_size, GL_UNSIGNED_INT, qIndices);
break;
default:
glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
break;
}
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glutSwapBuffers();
glutPostRedisplay();
}
void keyboard(unsigned char key, int x, int y)
{
switch(key) {
case('q') :
case(27) :
exit(0);
break;
case 'd':
case 'D':
switch(drawMode) {
case GL_POINTS: drawMode = GL_LINE_STRIP; drawStr = "line"; break;
case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; drawStr = "fan"; break;
default: drawMode=GL_POINTS; drawStr = "points"; break;
}
}
setTitle();
glutPostRedisplay();
}
void mouse(int button, int state, int x, int y)
{
if (state == GLUT_DOWN) {
mouse_buttons |= 1<<button;
} else if (state == GLUT_UP) {
mouse_buttons = 0;
}
mouse_old_x = x;
mouse_old_y = y;
glutPostRedisplay();
}
void motion(int x, int y)
{
float dx, dy;
dx = x - mouse_old_x;
dy = y - mouse_old_y;
if (mouse_buttons & 1) {
rotate_x += dy * 0.2;
rotate_y += dx * 0.2;
} else if (mouse_buttons & 4) {
translate_z += dy * 0.01;
}
mouse_old_x = x;
mouse_old_y = y;
}
Following is the complete source code for sinewave.cl. Note
that it operates on __global vectors, which also happen to be OpenGL buffers.
__kernel void sinewave(__global float4* pos, unsigned int width,
unsigned int height, float time, __global uchar4* color)
{
unsigned int x = get_global_id(0);
unsigned int y = get_global_id(1);
float u = x / (float) width;
float v = y / (float) height;
u = u*2.0f - 1.0f;
v = v*2.0f - 1.0f;
float freq = 4.0f;
float w = sin(u*freq + time) * cos(v*freq + time) * 0.5f;
pos[y*width+x] = (float4)(u, w, v, 1.0f);
color[y*width+x] = (uchar4) (
(uchar) 255.f *0.5f*(1.f+sin(w+x)),
(uchar) 255.f *0.5f*(1.f+sin(x)*cos(y)),
(uchar) 255.f *0.5f*(1.f+sin(w+time/10.f)), 0 );
}
Summary
GPGPU devices are powerful visualization as well as
computational devices. Utilizing the OpenCL graphical interoperability
capabilities is a fantastic way to speed visualization applications and utilize
the full capabilities of these devices. Comparing the speed of visualization on
the CPU vs. the GPU with the example code from this article is one way to see
the difference.
Primitive restart is an excellent way to further speed
visualization on GPU devices by keeping both command tokens and data on the GPU
to avoid PCIe bus transfers. Further, it gives the developer tremendous
flexibility in handling irregular meshes and other challenging visualization
tasks. In addition to speed, primitive restart can also generate higher quality
graphics.