Introduction
In any 3-D application, whether it's a video game or a CAD tool, one of the most basic operations is to select an object by clicking on it. This is a no-brainer for users, who expect pick selection to be performed quickly and accurately. But for developers, coding this operation isn't easy at all; it requires a solid understanding of computational geometry and high-performance programming. Specifically, pick selection involves finding the first polygon (triangle) that intersects a given ray. The goal of this article is to present a method for implementing high-speed pick selection using OpenCL (Open Computing Language). Unlike most pick selection routines, which execute on a CPU, the code in this article is intended to execute on a GPU. GPUs aren't as good at general-purpose computing as CPUs, but they compute mathematical operations quickly using parallel processing. The goal of this article is to show how the GPU's parallel processing can be used to detect ray-triangle intersection.
In writing this article, I assume that you've never coded a pick selection routine before and that you have no idea how pick selection relates to ray-triangle intersection. Therefore, the first part of this article describes the theory underlying pick selection. The second section explains how pick selection can be implemented in C and OpenCL.
Figure 1 presents the example rendering in OpenGL, which consists of ten selection targets. The figure on the right illustrates the result: the orange sphere is colored white because the user clicked on it.
Figure 1: Pick Selection Targets Before and After Selection
The goal of this article is to explain how this pick selection routine is implemented in C and OpenCL. But before you delve into the code, you should have a solid understanding of the underlying theory.
1. The Theory Behind Pick Selection
Despite its simplicity for the user, pick selection is a complex problem for programmers. Implementing this procedure requires a thorough knowledge of coordinate systems, matrix-vector operations, and barycentric coordinates. The method described in this section is the same as that described in the paper Fast, Minimum Storage Ray/Triangle Intersection by Tomas Moller and Ben Trumbore. You can download this paper here.
1.1 Converting Mouse-Clicks into Rays
When a user clicks in a rendering window, the event-handling routine receives the (x, y) location of the mouse pointer. This location is given in window coordinates, where x ranges from 0 to the window's pixel width and y ranges from 0 to the window's pixel height. To understand how pick selection works, you need to understand how these coordinates relate to the other coordinate systems used in 3-D rendering:
- Object coordinates - The 3-D vertices that initially define OpenGL figures are given in object coordinates. The modelview transformation positions the model relative to the viewer, and converts object coordinates into eye coordinates.
- Eye coordinates - In this coordinate system, the viewer is located at the origin and looks down the -z axis. The projection transformation defines the viewing region, and converts eye coordinates into clip coordinates.
- Clip coordinates - In this coordinate system, the x, y, and z coordinates range from -w to w, where w is the clip coordinate. Perspective division divides the clip coordinates by w, and converts clip coordinates to normalized device coordinates.
- Normalized device coordinates (NDCs) - In this system, the x, y, and z coordinates range from -1 to 1. The viewport transformation scales the NDCs to occupy the window, and converts NDCs into window coordinates.
- Window coordinates - These coordinates range from 0 to the window's pixel width in the x-direction and 0 to the window's pixel height in the y-direction.
In an ideal world, we'd know the window coordinates or NDCs of each figure in the model. But these coordinates are computed during the rendering process and aren't available to applications. But we do have the figures' object coordinates. This brings us to our first problem: before we can determine which figure the user selected, we need to make sure the mouse click (window coordinates) and the 3-D vertices of the figures (object coordinates) are in the same coordinate system.
Two choices are available. We can convert the click's window coordinates to object coordinates, or convert every vertex of every figure in the model to window coordinates. The first option requires much less computation, so this is the method we'll use.
Converting window coordinates to NDCs is straightforward. NDC's range from -1 to 1 in every dimension, so we can convert the click's location to the NDC system by subtracting half the maximum dimension from both coordinates and dividing the result by half the maximum dimension. Figure 2 shows how this works. On the left side, the window's dimensions are 400-by-300 and the user has clicked on the point (320, 195).
Figure 2: Window Coordinates and Normalized Device Coordinates (NDCs)
NOTE: The window coordinates in this figure equal (0, 0) in the upper-left corner. This is a GLUT convention, and may not hold true for other toolsets.
On the right side of the figure, the user's two-dimensional point (320, 195) is depicted as a three-dimensional arrow extending from the initial point (0.6, -0.3, -1.0). Geometrically speaking, this arrow is called a ray and it's initial point is called its origin, denoted O. Because O is given in normalized device coordinates, we'll call it Ondc, which equals (0.6, -0.3, -1.0). Similarly, the ray's direction is denoted Dndc, and because the ray points in the +z direction, Dndc equals (0, 0, 1). The set of points that form the ray can be given mathematically as Ondc + tDndc, where t identifies the distance from O to a point on the ray. Note that t is always greater than zero.
1.2 Converting the Ray into Object Coordinates
We've defined a ray in normalized device coordinates, but this isn't enough. To see where the ray intersects objects in the model, we need to compute Ondc and Dndc into the object coordinate system. To understand how this works, it's important to understand how the modelview and perspective transformations are performed. Suppose the object coordinates of a vertex v are given as (vx, vy, vz). The modelview transformation is performed by multiplying v by a matrix called the modelview matrix. If we call this matrix M, the result of the modelview transformation equals M * v.
Similarly, the projection transformation of a point is performed by multiplying the point by the perspective matrix, denoted P. Therefore, to convert v from object coordinates, an application computes P * (M * v). Matrix multiplication is associative, so this equals (P * M) * v. The product of P and M is also a matrix, and we'll call this the modelview-projection matrix, or the MVP matrix.
Our goal is to convert Ondc and Dndc to the object coordinate system, not from the object coordinate system. Therefore, we need to reverse the transformation represented by the MVP matrix. As it turns out, we can do this by multiplying Ondc and Dndc by the inverse of the MVP matrix, denoted MVP-1. In equation form, this is given as follows:
Oobj = MVP-1 * Ondc
Dobj = MVP-1 * Dndc
Computing a matrix's inverse is complicated. Thankfully, the OpenGL Math Library provides the inverse function. The following code shows how this works:
mvp_inverse = glm::inverse(mvp_matrix);
At this point, we've derived a method for deriving a ray from the user's mouse click and converting it to object coordinates (Oobj + tDobj). Next, we'll look at how to compute where this ray intersects the triangles in the model. For the sake of simplicity, we'll drop the subscript and identify the ray as O + tD.
1.3 Barycentric Coordinates
OpenGL renders surfaces in three dimensions by drawing triangles, and the rendering depicted in Figure 1 contains thousands of triangles. A ray may intersect zero, one, or multiple triangles as it travels outward from its origin. For the purposes of pick selection, we want the triangle closest to the ray's origin because it must belong to the figure closest to the user. In pseudocode, the algorithm is given as follows:
t_min = 1000
for each figure in the model
for each triangle in the figure
if the ray (O + tD) intersects the triangle
if t is less than t_min
set t_min equal to t
store the figure containing the triangle
end if
end if
end for
end for
if t_min = 1000
no intersection
else
the selected figure = figure corresponding to t_min
For the moment, our concern is determining "if the ray intersects the triangle." To see how ray-triangle intersection is computed, it's important to understand the concept of barycentric coordinates. These coordinates uniquely identify points on a shape, and are measured along barycentric axes. Every triangle has three barycentric axes, and each passes through a vertex and the midpoint of the edge opposite the vertex. Figure 3 depicts a barycentric axis that passes through Vertex K.
Figure 3: A Barycentric Axis
In this figure, the dotted line is the barycentric axis for Vertex K. The corresponding barycentric coordinate is denoted k. k equals 1 at Vertex K and 0 at the midpoint of the edge opposite Vertex K. k is less than zero beyond the opposite edge and greater than 1 to the right of K.
Figure 4 presents all three barycentric axes for Triangle KLM. The corresponding barycentric coordinates, given as (k, l, m) triples, identify points inside the triangle and on its edges.
Figure 4: Barycentric Coordinates
When dealing with barycentric coordinates, there are two points to keep in mind:
- If a point's coordinates all lie between 0 and 1, the point lies inside the shape.
- No matter where a point is located, the sum of its barycentric coordinates is always 1. Therefore, any coordinate can be replaced by the sum of the other two coordinates subtracted from 1. That is, (k, l, m) always equals (k, l, 1 - k - l).
An important question arises. How do we convert barycentric coordinates (k, l, m) to rectangular coordinates (x, y, z)? The answer is simple: multiply each barycentric coordinate by the coordinates of its corresponding vertex. For example, if a point's barycentric coordinates are given as (k, l, m), it's rectangular coordinates can be computed with the following equation:
P = kK + lL + mM
An example will make this clear. Suppose the vertices of Triangle KLM are given as K = (13, 3), L = (9, 10), and M = (1, 1). The following diagram computes the rectangular coordinates for points P0, P1, P2, and P3.
Figure 5: Converting Barycentric Coordinates to Rectangular Coordinates
Now that we know how to represent points on a triangle mathematically, we can examine how to determine if and where a ray intersects a point on a triangle. This is the topic of the next discussion.
1.4 Computing Ray-Triangle Intersection
If we denote a ray as O + tD and a triangle's points as kK + lL + mM, we can compute their intersection with the following equation:
O + tD = kK + lL + mM
The barycentric coordinate (k, l, m) can be represented as (k, l, 1 - k - l), so we can change this equation as follows:
O + tD = kK + lL + (1 - k - l)M
This gives us three unknowns: t, k, and l. By rearranging terms, we arrive at the following linear system:
To simplify this equation, we'll replace K - M with E, L - M with F, and O - M with G. This produces the following result:
Cramer's Rule makes it straightforward to solve for t, k, and l. The resulting equation is given as:
In this equation, vertical bars denote the determinant. That is, |A| is the determinant of A. The determinant can be computed using the formula |A B C| = -(A x C)·B = -(C x B)·A. Rearranging terms produces the following relationship:
By testing for k and l, we can find out whether the point lies on the triangle bounded by K, L, and M. Then, if t is the smallest positive value among all triangles tested, we can be sure the triangle belongs to the figure selected by the user.
2. Implementing Ray-Triangle Intersection in Code
Now that we've examined the mathematics behind pick selection, it's important to see how this algorithm can be implemented in code. This section starts by presenting regular C code that performs ray-triangle intersection. Then I'll show how this can be accelerated with OpenCL. But before I present the code, I need to present the ColGeom
data structure, which stores the model's vertex data.
2.1 The ColGeom Data Structure
This article discusses COLLADA and the ColGeom
data structure, which stores graphic data for a figure in a 3-D model. This data structure is defined as follows:
struct ColGeom {
std::string name;
SourceMap map;
GLenum primitive;
int index_count;
unsigned int* indices;
};</geometry>
The map field, of type SourceMap
, contains the geometric data of the figure represented by the structure. This type is defined with the following statement:
typedef std::map<std::string,> SourceMap;</std::string,>
The map matches the source's semantic name to its data. This name may take values such as POSITION
, NORMALS
, or TEXCOORDS
. The SourceData
element contains the actual data for the object's mesh, and is defined as follows:
struct SourceData {
GLenum type;
unsigned int size;
unsigned int stride;
void* data;
};
For the purposes of this article, the data array contains floating-point values that identify vertex coordinates. As specified by the stride
field, these vertices are accessed three at a time (x, y, and z). The indices
field tells us how the vertices combine to form triangles. For example, if the first six indices are {0, 1, 2, 1, 0, 3}, this means the first triangle consists of the first, second, and third vertices, and the second triangle consists of the second, first, and fourth vertices. Order is important—the orientation of the vertices (clockwise or counterclockwise) determines which triangles are culled by the renderer and which aren't.
2.2 Implementing Pick Selection in C
Now that we know how to access graphic data, we can create a routine that will perform pick selection through ray-triangle intersection. The following code shows how this can be accomplished in C. Note that the mouse coordinates are given as x and y, the index array for the mesh is accessed through geom_vec[i].indices
, and the vertex data is accessed through geom_vec[i].map["POSITION"].data
.
glm::vec4 origin = mvp_inverse * glm::vec4(
(x-half_width)/half_width, (half_height-y)/half_height, -1, 1);
glm::vec4 dir = mvp_inverse * glm::vec4(0, 0, 1, 0);
O = glm::vec3(origin.x, origin.y, origin.z);
D = glm::normalize(glm::vec3(dir.x, dir.y, dir.z));
tmin = 1000.0;
for(i=0; i<num_objects; i++) {
data_array = (float*)geom_vec[i].map["POSITION"].data;
for(j=0; j<geom_vec[i].index_count; j+=3) {
index = geom_vec[i].indices[j]*3;
K = glm::vec3(data_array[index],
data_array[index+1],
data_array[index+2]);
index = geom_vec[i].indices[j+1]*3;
L = glm::vec3(data_array[index],
data_array[index+1],
data_array[index+2]);
index = geom_vec[i].indices[j+2]*3;
M = glm::vec3(data_array[index],
data_array[index+1],
data_array[index+2]);
E = K - M;
F = L - M;
G = O - M;
tkl = 1/glm::dot(glm::cross(D, F), E) *
glm::vec3(glm::dot(glm::cross(G, E), F),
glm::dot(glm::cross(D, F), G),
glm::dot(glm::cross(G, E), D));
if(tkl.x < tmin[i] && tkl.y > 0.0f && tkl.z > 0.0f && (tkl.y + tkl.z) < 1) {
tmin = tkl.x;
sphere = i;
}
}
}
This code consists of two loops: one that iterates through each figure in the model and one that iterates through each triangle in a figure. Each time a triangle is processed, the computed distance, tkl.x
, is compared to the minimum distance, tmin
. If tkl.x
is less than tmin
and the intersection point lies inside the triangle, the output value, sphere
, is set equal to the object index, i
. Once all the triangles in the model have been processed, tmin
must be tested. If tmin
is less than 1000, the value of sphere
will identify the object selected by the user. If tmin
equals 1000, it means the ray didn't intersect with any of the triangles and that the user's mouse click didn't land on any of the figures.
2.3 Implementing Pick Selection in OpenCL
The C code presented earlier gets the job done, but if there are many figures in the model, iterating through all their triangles can take a great deal of time. This is where OpenCL shines. An OpenCL routine, called a kernel, can be executed by thousands of threads at the same time. Therefore, if we send all the triangles for a figure to a GPU at once, the work-items will process them in parallel, thereby achieving better performance than a regular C application could provide.
The application associated with this article renders the model shown in Figure 1 using OpenGL and performs pick selection using OpenCL. The overall application works as follows:
- When the application starts, it reads a COLLADA file (spheres.dae) and initializes a
ColGeom
structure for each of the ten spheres. For each sphere, the application also creates one vertex array object (VAO), one index buffer object (IBO), and two vertex buffer objects (VBOs). The first VBO stores the coordinates of the sphere's vertices and the second stores the normal vector components at each vertex. - When the user clicks inside the window, the GLUT callback function (
mouse
) converts the window coordinates of the mouse event to object coordinates and invokes a function called execute_selection_kernel
. - The
execute_selection_kernel
function creates kernel arguments from O and D and then iterates through each figure in the model. With each iteration, the function creates two OpenCL buffer objects: one from the VBO containing the figure's vertex locations and one from the IBO containing the figure's indices. After making these buffer objects into kernel arguments, the function executes the kernel. - The
pick_sphere
kernel implements ray-triangle intersection. More precisely, each work-item uses O, D, K, L, and M to compute E, F, and G for a given triangle. Then it computes and tests the barycentric coordinates k
and l
. If these coordinates have acceptable values, the work-item computes t
, which determines the distance from the ray's origin to the triangle. After each work-item in a work-group finishes its computation, the work-group determines the smallest value of t
and returns this to the application. - The application finds the smallest value of
t
and determines the corresponding sphere. It sets the color of this sphere to white, thereby indicating selection.
The following code presents the OpenCL instructions in the pick_selection
kernel. Each work-item reads the three coordinates of a triangle, and its goal is to compute a value of t_loc
, which identifies the distance from the ray's origin to the triangle. If the ray doesn't intersect the triangle, the distance from the origin to the triangle remains at 10000.0.
t_loc[get_local_id(0)] = 10000.0f;
if(get_global_id(0) < 528) {
indices = vload3(get_global_id(0), ibo);
K = vload3(indices.x, vbo);
L = vload3(indices.y, vbo);
M = vload3(indices.z, vbo);
E = K - M;
F = L - M;
t_test = dot(cross(D.s012, F), E);
if(t_test > 0.0001f) {
G = O.s012 - M;
k = dot(cross(D.s012, F), G);
if(k > 0.0f && k <= t_test) {
l = dot(cross(G, E), D.s012);
if(l > 0.0f && k + l <= t_test) {
t_loc[get_local_id(0)] = dot(cross(G, E), F)/t_test;
}
}
}
}
barrier(CLK_LOCAL_MEM_FENCE);
If the initial determinant, (D x F)·E
, is negative, the triangle faces the rear of the model instead of the front. Therefore, it is not a candidate for testing. The work-items check for this condition by testing whether the determinant (t_test
) is negative.
However, if the determinant is positive and the barycentric coordinates (k
and l
) are valid, the work-item sets t_loc
equal to ((G x E)·F)/
determinant. Afterward, the first work-item in each work-group searches for the smallest value of t_loc
and passes it to the host application.
3. Conclusion
Pick selection makes it possible for users to interact with 3-D models using 2-D mouse clicks. This procedure is modeled mathematically using ray-triangle intersection. This article has presented the theory of this method and has shown how to implement the theory using regular C and OpenCL.
The primary advantage of executing OpenCL kernels on GPUs is that the kernel can be executed by hundreds or thousands of work-items at the same time. The pick selection code in this article takes advantage of this, and when a kernel executes on the GPU, the work-items process the figure's triangles in parallel. However, performance can be improved further by sending the vertices of every figure to the GPU in a single kernel.
4. Using the code
The code archive for this article contains the source files needed to execute the application. It also contains the COLLADA file for the sphere (sphere.dae) and a Makefile for the project.
History
Submitted for editor approval: 7/24/2013