Introduction
PCs come with an amazingly powerful device: a graphics processing
unit (GPU). It is mostly underutilized, often doing little more than
rendering a desktop to the user. But computing on the GPU is
refreshingly fast compared to conventional CPU processing whenever
significant portions of your program can be run in parallel. The
applications are seemingly endless including: matrix computations,
signal transformations, random number generation, molecular modeling,
and password recovery. Why are GPUs so effective? They have
hundreds, in some cases thousands, of cores available for parallel
processing. Compare this to the typical one to four CPU cores on
today's PCs. (For a more technical treatment see:
graphics.stanford.edu/~mhouston/public_talks/cs448-gpgpu.pdf.)
Here I present a way to use the power of NVidia's Cuda-enabled
GPUs for computing using Java with an Eclipse-based IDE. My platform
is Debian Wheezy (64 and 32 bit), but I have also reproduced the
process on Linux Mint 13, and it can be done on many other Linux
distributions. The approach can be adapted to a Windows install, a
process that is well documented elsewhere.
Update Notes
This is a September 2013 update of the original article. Since the
original was written, there have been many new developments, particularly
in the process for installing the NVidia Development driver on
Linux. As distros evolve, it has become increasingly difficult to
disable the Nouveau driver, a requirement for installing the NVidia
driver. Also, occasionally the compiler (gcc) that ships with the
distro differs from the compiler used to compile the OS's kernel
itself. Finally, Linux systems using the NVidia Optimus technology
require additional gymnastics to configure the driver.
Background
Easily accessing the power of the GPU for general purpose
computing requires a GPU programming utility that exposes a set of
high-level methods and does all of the granular, hardware-level work
for us. The popular choices are OpenCL and Cuda. Cuda works only with
NVidia GPUs. I prefer NVidia devices and this article presents a Cuda
solution.
Eclipse is my favorite IDE for programming in Java, C++, and PHP.
NVidia provides an Eclipse-based IDE called Nsight, which is
pre-configured for Cuda C++ development. Other features, like Java,
PHP, etc., can be added to your Nsight installation from compatible
Eclipse software repositories (e.g. Nsight 5.5 is compatible with the
Eclipse Juno repository).
Direct programming with Cuda requires using unmanaged C++ code. I
prefer programming with managed code. To do this I use a method for
wrapping the C++ functionality of Cuda in bindings that are
accessible to Java. In the past, on a Windows 7 platform, I wrote my own wrappers for use with C#.net code (see my CodeProject Article). With Java, this
is not necessary because open source wrappers are available. I use
JCuda.
There are four basic elements presented here:
- Determining if you have a compatible GPU
- Installing/configuring Cuda
- Configuring Nsight for Java
- Utilizing JCuda
Sometimes tutorials present steps that the writer followed on an
existing production machine that already had certain prerequisite
configurations in place. Consequently, when a reader follows the
steps, the procedure may fail. To avoid this, I tested the process
described below from fresh installs of Linux Mint 13 64 bit, Linux Mint
13 32 bit, Debian Wheezy x32, and Debian Wheezy x64. For Mint, I
chose the Mate flavor in both cases. Here are the details of my
demonstration machines:
- Mint 13-Mate x64 and Debian Wheezy x64 were used for my AMD 64 machine with a GeForce GTX 560 Ti GPU
- Mint 13-Mate x32 and Debian Wheezy x32 were used for my Intel 32 machine with a Quadro NVS 160M GPU
- Fresh OS installs were fully updated with update manager
- No other software was added except gedit, for consistency in writing this tutorial
- No other hardware configurations were performed prior to testing
Special Considerations
Stable, Long Term Support releases for distributions were
explicitly chosen for this project. Interim releases frequently
change certain basic hardware configurations and filesystem
arrangements. After reviewing and contributing to several hundred
Linux forum posts, I am certain that you will experience fewer
headaches if you do the same.
On Linux systems there are configuration complications with
systems that use the NVidia Optimus technology. Simply stated, GPU
tasks that do not require the high-performance of the NVidia GPU are
delegated to a lower-performance, lower-power consumption GPU,
typically Intel devices. This process is currently not well
implemented on Linux machines, but it can be made to work! If you
are lucky, your machine has a BIOS setting for disabling Optimus
integration, but many PC manufacturers do not bother to provide this
option. Enter Bumblebee, a program that allows you to specify the GPU
to use for a given application. Because I have not constructed a test
on an Optimus system, details for Optimus-enabled GPUs are not
provided here and you will have to research the Bumblebee gymnastics
independently. Later, when you configure Eclipse for JCuda, my
understanding is that Eclipse (and Nsight) can be run with optirun
eclipse and the proper GPU will be used for debugging your programs.
Here are some promising resources:
http://forums.linuxmint.com/viewtopic.php?f=47&t=144049 (post # 7) and
http://bumblebee-project.org/install.html
Computationally intensive applications, e.g. Fourier
transforms, whether they are done on the CPU or the GPU, will give
your system a stress test. Start small and monitor system
temperatures when you have high computational overhead.
Setup
Step 1: Do you have a compatible GPU?
NVidia has an exhaustive list of Cuda-compatible GPUs on their
Developer Zone web site: http://developer.NVidia.com/Cuda-gpus. Check
to see if yours is listed. Also, determine whether your machine uses
the NVidia Optimus technology and, if it does, see the note above.
Step 2: Install dependencies:
There are some prerequisites. From a terminal, run the following
commands to get them:
- sudo apt-get update
- sudo apt-get install -y linux-headers-$(uname -r)
- sudo apt-get install freeglut3-dev build-essential libx11-dev
libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
gcc
Step 3: Download the CUDA Production Release and install
Download the latest Cuda release from:
https://developer.NVidia.com/Cuda-downloads. (Note: The NVidia site
only shows Ubuntu releases for Debian forks like Mint. The Cuda
releases for Ubuntu work well with Mint LTS 13 and Debian Wheezy.)
Select the proper 32/64 choice and prefer the .run file over the .deb
file. My most recent download was cuda_5.5.22_linux_32.run (or
cuda_5.5.22_linux_64.run).
Split the installer into its three
component installer scripts: toolkit, driver, and samples. This
fine-grained control is a great benefit if/when troubles occur. Here
is the syntax for splitting the installer.
sh cuda_5.5.22_linux_32.run -extract=<theCompletePathToYourDestination>
or
sh cuda_5.5.22_linux_64.run -extract=<theCompletePathToYourDestination>
The following three files are created:
- NVIDIA-Linux-x86-319.37.run or NVIDIA-Linux-x64-319.37.run (AKA Developer Driver)
- cuda-linux-rel-5.5.22-16488124.run or cuda-linux64-rel-5.5.22-16488124.run (AKA Toolkit)
- cuda-samples-linux-5.5.22-16488124.run
Install the developer driver
We start by installing the NVidia developer driver. This step
creates the most trouble for Linux users because it varies
substantially from distro to distro. Before you do anything, print
this page, save your work, and be sure you are backed up.
You cannot have an X server running when you install the developer
drivers. Do a preliminary test to make sure you can drop to a console
and stop your X server. Simultaneously press [ctrl][alt][f2]. If you
are lucky your desktop shows a console prompting you to login. If so,
login and stop the display manager:
- sudo service mdm stop (for Mint desktops)
- sudo service gdm3 stop (for Gnome 3 desktops)
- sudo service lightdm stop (for xfce desktops)
You should now see the console. If you see a blank screen, do
[ctrl][alt][f2] again. Now you can either run sudo reboot or
startx to return to your desktop. If this test fails, then you
should install your package manager's NVidia non-free driver, then
try it again... even though in a subsequent step we will be removing
it.
Debian and its siblings use a default driver called nouveau,
a wonderful, open-source solution for NVidia GPUs that is totally
incompatible with NVidia Cuda development. It must be disabled at
boot time. One way is to modify grub:
gksu gedit /etc/default/grub
Find the line that reads: “GRUB_CMDLINE_LINUX_DEFAULT=...” and
make it read:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nouveau.modeset=0"
Save the file, close gedit, and run:
sudo update-grub
sudo reboot
Another more conservative way is to interrupt the grub bootloader and
manually insert the nouveau.modeset=0 phrase as a one-time boot
option. To do this, your grub configuration must have a timeout that
enables you to view the grub menu. At the grub menu, highlight your
default boot option and press e to get the grub command line.
Find the line that starts with "linux ..." and add nouveau.modeset=0 to the
end of the line. Press [ctrl][x] to start.
If you use this method, you will need to repeat this process if
you reboot before the driver is installed and nouveau is removed. Here's a reference that
presents the basic idea on a Mint distro:
http://community.linuxmint.com/tutorial/view/842
Next, edit your blacklist configuration file (gksu gedit
/etc/modprobe.d/blacklist.conf) and add these lines to the end:
- blacklist amd76x_edac
- blacklist vga16fb
- blacklist nouveau
- blacklist rivafb
- blacklist nvidiafb
- blacklist rivatv
Then, remove everything NVidia from the system with:
sudo apt-get remove --purge nvidia*
Drop to a console ([ctrl][alt][f2]), exit the X server (e.g. sudo
service mdm stop), and run the installer:
sudo sh NVIDIA-Linux-x86-319.37.run (or sudo sh NVIDIA-Linux-x64-319.37.run)
- Read/accept EULA
- At question: "register kernel module sources with DKMS", I said YES.
- At question (64 bit only): "Install 32-bit OpenGL compatibility", I said NO.
- At question: "run the nvidia-xconfig utility", I said YES.
- On one test machine where I had not disabled nouveau at boot, the installer asked me if I wanted it to attempt to remove nouveau. This works occasionally, but don't count on it.
- When complete, reboot. Hopefully you will see the NVidia splash screen when your desktop loads.
Your installer may fail. The most common errors are that a display
manager is in use or that there is a conflict (with nouveau).
Retracing the steps above will remedy these problems. But, sometimes
an error will occur if the distro's kernel was compiled with an
earlier version of gcc. (You'll see something like: The compiler
used to compile the kernel (gcc 4.6) does not exactly match the
current compiler (gcc 4.7).) Occasionally selecting to ignore
this will work, but again, don't count on it. You need to install the
gcc version used to compile the kernel (e.g. 4.6 in the example
above). Do this using your preferred package manager. Next, because
your machine now has two gcc versions, we need to create
alternatives. Using the example of gcc 4.6 and gcc 4.7 we run:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 20
Now, when you run:
sudo update-alternatives --config gcc
you can pick gcc 4.6 as the active version. Later, after
the install, you can switch it back.
Install the Cuda Toolkit
Whew! Now it gets easier. Next, we install the toolkit with:
sudo sh cuda-linux-rel-5.5.22-16488124.run (or sudo sh cuda-linux64-rel-5.5.22-16488124.run)
(If you see a gcc version error, see Your installer may fail
under Install the Developer Driver above.)
Your toolkit install console will present the following text when
it is complete:
* Please make sure your PATH includes /usr/local/cuda-5.5/bin
* Please make sure your LD_LIBRARY_PATH
*   for 32-bit Linux distributions includes /usr/local/cuda-5.5/lib
*   for 64-bit Linux distributions includes /usr/local/cuda-5.5/lib64:/usr/local/cuda-5.5/lib
* OR
*   for 32-bit Linux distributions add /usr/local/cuda-5.5/lib
*   for 64-bit Linux distributions add /usr/local/cuda-5.5/lib64 and /usr/local/cuda-5.5/lib
* to /etc/ld.so.conf and run ldconfig as root
Save time and frustration
Set your additional paths persistently by editing (creating if
necessary) the .profile file in your home directory. Add
PATH=$PATH:/usr/local/cuda-5.5/bin to the end of the file,
save, then logout and login.
Use a persistent, modular approach for managing your
LD_LIBRARY_PATH. I never edit the /etc/ld.so.conf file.
Rather, my ld.so.conf file contains the line: include
/etc/ld.so.conf.d/*.conf. I create a new file in the
/etc/ld.so.conf.d folder named cuda.conf that has the
following line(s):
- /usr/local/cuda-5.5/lib
- /usr/local/cuda-5.5/lib64 (64 bit installs only)
Then run sudo ldconfig.
Step 4: Test CUDA Using NVidia CUDA Samples
Install the samples by running your third, split-out installer
script:
sudo sh cuda-samples-linux-5.5.22-16488124.run
Now let's run a test. From a terminal, change to the folder where
the deviceQuery sample is located (default is
/usr/local/cuda-5.5/samples/1_Utilities/deviceQuery). Make the
sample with the system compiler:
sudo make
(If you see a gcc version error when you run sudo make, see Your
installer may fail under Install the Developer Driver above.)
Then, run the sample with:
./deviceQuery
I see the following on my 64 bit test system:
/usr/local/cuda-5.5/samples/1_Utilities/deviceQuery $ ./deviceQuery
./deviceQuery Starting...
Cuda Device Query (Runtime API) version (CudaRT static linking)
Detected 1 Cuda Capable device(s)
Device 0: "GeForce GTX 560 Ti"
etc., etc., ...
Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce GTX 560 T
Step 5: Start the Nsight Eclipse edition
Nsight is a fork of Eclipse that is pre-configured for C++ and
Cuda. It is included in your toolkit install (you already have it).
For now, run it from a terminal:
/usr/local/cuda-5.5/libnsight/nsight. (Do not double-click the
file from your file manager.) Later you can make a desktop launcher.
Go ahead and choose the default folder for projects that it
recommends.
Let's test it.
- File > New > Cuda C++ Project
- Pick Import Cuda Sample
- Name the project test
- Click Next
- In the samples list pick Bandwidth Test
- Click Next
- Basic settings - use defaults
- Click Finish
- From the Project menu: Project > Build Project
- From the Run menu: Run > Run
My output in the console window is:
[Cuda Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GTX 560 Ti.
etc., ...
Step 6: Configure Nsight for Java Development
Nsight can be expanded through Help>Install New Software.
To add Java development, you need to add
http://download.eclipse.org/releases/juno to your Available
Software Sites. (Note: the Kepler repository does not work as of
Nsight 5.5) Then, install Eclipse Java Development Tools.
Follow the install dialog and restart Nsight.
Step 7: Download and Get Started with the JCuda Bindings
Download the zip for your platform from
http://www.jCuda.org/downloads/downloads.html.
Extract it to a folder in your home directory. Then start Nsight.
Create a new Java Project (File > New > Java Project)
and name it JCudaHello. Right-click the JCudaHello
project in the project explorer and select Properties. Go to
the Java Build Path tree item and select the Libraries
tab. Click Add External Jars, navigate to the extracted folder
you created, and pick jcuda-0.5.5.jar.
With the Libraries tab still open, expand the tree for the
jcuda-0.5.5.jar you
added and click on Native library location (none). Then click
the Edit button. You will be asked for a location. Click
External Folder and again navigate to the extracted folder.
Click OK.
Now, right-click your src folder in the JCudaHello project from
the Project Explorer and select New > Class. Name the
class cudaTest and select the public static void main
method stub.
Click Finish. Delete the code that is pre-generated in cudaTest.java
from the editor pane and paste this in:
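import jcuda.Pointer;
import jcuda.runtime.JCuda;

// The public class name must match the file name (cudaTest.java).
public class cudaTest {
    public static void main(String[] args) {
        // Allocate 4 bytes of device memory; success shows the bindings can reach the GPU.
        Pointer pointer = new Pointer();
        JCuda.cudaMalloc(pointer, 4);
        System.out.println("Pointer: " + pointer);
        // Release the device memory.
        JCuda.cudaFree(pointer);
    }
}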
When you run it, you should see something like this:
Pointer: Pointer[nativePointer=0x800100000,byteOffset=0]
Using the project code
The project code is a zipped Eclipse workspace that does not
include any hidden meta-data folders or information files. When you
unzip it to your location of choice, you will see two
sub-directories: JCudaFftDemo and Notes.
First, we need to create an Nsight Java project from the existing
sources in the JCudaFftDemo folder. Start Nsight and choose
your extracted directory (parent directory for JCudaFftDemo) when it
asks you to select a workspace. Create a new Java Project from the
File menu and give it the exact name: JCudaFftDemo. Then,
click Finish. If you expand the trees for the project in the
Project Explorer you should see:
Next, you need to add the JCuda binaries to the Java Build Path.
Right-click the JCudaFftDemo project in the Project Explorer
and select Properties. Go to the Java Build Path tree
item and select the Libraries tab. Click Add External Jars,
navigate to the JCuda binaries you downloaded in Setup – Step 7,
and pick jcuda-0.5.5.jar,
jcublas-0.5.5.jar, and jcufft-0.5.5.jar.
With the Libraries tab still open, one at a time,
expand the trees for the jars you added and click on Native
library location (none). Click the Edit button and set the
location to match your JCuda binaries directory. (We are repeating
Step 7 in the above Setup section, this time for the new
project.)
Then, run it as a Java application. Here is the output console from
my Linux Mint 13, 32 bit laptop:
Creating sin wave input data: Frequency = 11.0, N = 1048576, dt = 5.0E-5 ...
L2 Norm of original signal: 724.10583
Performing a 1D C2C FFT on GPU with JCufft...
GPU FFT time: 0.121 seconds
Performing a 1D C2C FFT on CPU...
CPU time: 3.698 seconds
GPU FFT L2 Norm: 741484.3
CPU FFT L2 Norm: 741484.4
Index at maximum in GPU power spectrum = 572, frequency = 10.910034
Index at maximum in CPU power spectrum = 572, frequency = 10.910034
Performing 1D C2C IFFT(FFT) on GPU with JCufft...
GPU time: 0.231 seconds
Performing 1D C2C IFFT(FFT) on CPU...
CPU time: 3.992 seconds
GPU FFT L2 Norm: 724.1056
CPU FFT L2 Norm: 724.10583
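The "Index at maximum" lines in this output pair a bin index with a frequency. A minimal sketch of that conversion follows, assuming the usual relation frequency = index / (N * dt). The class and method names are illustrative and are not taken from the project code, and restricting the peak search to the first half of the spectrum is my assumption.

```java
/** Illustrative helpers; the names are not taken from the project code. */
final class SpectrumPeakSketch {
    private SpectrumPeakSketch() {}

    /** Frequency represented by bin k of an n-point transform sampled every dt seconds. */
    static double binFrequency(int k, int n, double dt) {
        return k / (n * dt);
    }

    /** Index of the largest |X[k]|^2 among bins 0..n/2 of an interleaved (re, im) spectrum. */
    static int peakIndex(float[] spectrum) {
        int n = spectrum.length / 2;
        if (n == 0 || spectrum.length % 2 != 0) {
            throw new IllegalArgumentException("expected a non-empty interleaved complex array");
        }
        int best = 0;
        double bestPower = -1.0;
        // Assumption: only the first half is searched, because a real input
        // signal produces a mirrored peak in the second half.
        for (int k = 0; k <= n / 2; k++) {
            double re = spectrum[2 * k];
            double im = spectrum[2 * k + 1];
            double power = re * re + im * im;
            if (power > bestPower) {
                bestPower = power;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Index, N, and dt are the values shown in the console output above.
        System.out.println(binFrequency(572, 1048576, 5.0E-5));
    }
}
```

With index 572, N = 1048576, and dt = 5.0E-5 this works out to about 10.91, which agrees with the value the demo reports.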
More about the project code
First, a word about complex data arrays: CUDA and JCuda can work
with data arrays that contain complex vectors of type float or double,
provided you construct the array as an interleaved, complex number
sequence. This is best demonstrated with an example. Let’s say we have
a complex vector of length 2: (1 + 2i, 3 + 4i). The corresponding
interleaved data array has a length of 4 and has the form: (1, 2, 3,
4). In the project code I use this format for all complex vectors that
are submitted to JCuda methods.
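To make the layout concrete, here is a small sketch that packs separate real and imaginary parts into the interleaved form described above. The class and method names are hypothetical helpers written for this illustration, not code from the project.

```java
import java.util.Arrays;

/** Illustrative helper for the interleaved layout; not part of the project code. */
final class InterleavedComplex {
    private InterleavedComplex() {}

    /** Packs real and imaginary parts into (re0, im0, re1, im1, ...). */
    static float[] interleave(float[] real, float[] imag) {
        if (real.length != imag.length) {
            throw new IllegalArgumentException("real and imag must have the same length");
        }
        float[] packed = new float[2 * real.length];
        for (int i = 0; i < real.length; i++) {
            packed[2 * i] = real[i];
            packed[2 * i + 1] = imag[i];
        }
        return packed;
    }

    /** Real part of complex element k. */
    static float re(float[] packed, int k) {
        return packed[2 * k];
    }

    /** Imaginary part of complex element k. */
    static float im(float[] packed, int k) {
        return packed[2 * k + 1];
    }

    public static void main(String[] args) {
        // The vector (1 + 2i, 3 + 4i) from the text.
        float[] packed = interleave(new float[] {1f, 3f}, new float[] {2f, 4f});
        System.out.println(Arrays.toString(packed)); // prints [1.0, 2.0, 3.0, 4.0]
    }
}
```

Complex element k always occupies positions 2k and 2k + 1, so an N-element complex vector needs a float array of length 2N.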
In contrast, for CPU coding simplicity, I use a ComplexFloat class
to represent complex numbers. When using this class to form a complex
vector, the vector x = (1 + 2i, 3 + 4i) has the form ComplexFloat[2] =
(x[0].Real = 1, x[0].Imaginary = 2, x[1].Real = 3, x[1].Imaginary =
4). The array, and the vector it represents, both have the same
length: 2.
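The relationship between the two representations can be sketched as follows. ComplexFloatSketch is a hypothetical stand-in for the project's ComplexFloat class; its field names and the two conversion methods are assumptions made for illustration.

```java
/** Hypothetical stand-in for the project's ComplexFloat class; field names are assumptions. */
final class ComplexFloatSketch {
    final float real;
    final float imaginary;

    ComplexFloatSketch(float real, float imaginary) {
        this.real = real;
        this.imaginary = imaginary;
    }

    /** Flattens an object array of length n into an interleaved float array of length 2n. */
    static float[] toInterleaved(ComplexFloatSketch[] x) {
        float[] packed = new float[2 * x.length];
        for (int i = 0; i < x.length; i++) {
            packed[2 * i] = x[i].real;
            packed[2 * i + 1] = x[i].imaginary;
        }
        return packed;
    }

    /** Rebuilds the object array from an interleaved float array of even length. */
    static ComplexFloatSketch[] fromInterleaved(float[] packed) {
        if (packed.length % 2 != 0) {
            throw new IllegalArgumentException("interleaved array must have even length");
        }
        ComplexFloatSketch[] x = new ComplexFloatSketch[packed.length / 2];
        for (int i = 0; i < x.length; i++) {
            x[i] = new ComplexFloatSketch(packed[2 * i], packed[2 * i + 1]);
        }
        return x;
    }
}
```

The object form is easier to read in CPU code, while the flat form is the one handed to the GPU routines, so a conversion of this kind sits at the boundary between the two.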
Main.java is the entry point for the application. It creates a
sample signal and performs the demo. The signal produced is:
sin(2*pi*FREQ *t) sampled N times in increments of dT. The demo
computes forward and inverse Fourier transforms of the test signal
— both on the GPU and the CPU — and provides execution
times and signal characteristics for the results.
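A sketch of how such a test signal and its L2 norm might be produced is shown below. It assumes the signal is stored as interleaved complex data with zero imaginary parts and that the norm is the square root of the sum of squares; the names are illustrative and Main.java may do this differently.

```java
/** Illustrative signal generator; Main.java in the project may differ in detail. */
final class TestSignalSketch {
    private TestSignalSketch() {}

    /** Samples sin(2*pi*freq*t) at t = 0, dt, 2*dt, ... as interleaved complex data. */
    static float[] sineWave(double freq, int n, double dt) {
        float[] data = new float[2 * n];
        for (int i = 0; i < n; i++) {
            data[2 * i] = (float) Math.sin(2.0 * Math.PI * freq * i * dt);
            // data[2 * i + 1], the imaginary part, stays at 0 for a real signal.
        }
        return data;
    }

    /** L2 norm: square root of the sum of squares of every real and imaginary part. */
    static double l2Norm(float[] data) {
        double sum = 0.0;
        for (float v : data) {
            sum += (double) v * v;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Small values chosen for illustration only.
        float[] signal = sineWave(1.0, 4, 0.25);
        System.out.println("L2 norm: " + l2Norm(signal));
    }
}
```

Comparing the norm of the original signal with the norm after a forward and inverse transform is a cheap sanity check that the round trip preserved the data.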
The CPU FFT part of the code (FftCpuFloat.java) purposely implements
the Cooley–Tukey algorithm in an awkward way that depends on instances
of the ComplexFloat.java class. Little attention is paid to memory
allocation and access. Also, although I have multi-core CPUs, my CPU
thread executes on only one core. Doing this makes the radix-2
procedure intuitive and simple, but there is an overhead cost that
will overstate the advantage of using the GPU.
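For readers who have not seen the radix-2 procedure, here is a compact recursive sketch. It is not the project's FftCpuFloat.java: it works directly on interleaved float arrays rather than ComplexFloat instances, assumes the number of complex elements is a power of two, and favors clarity over speed.

```java
/** Textbook recursive radix-2 Cooley-Tukey sketch on interleaved (re, im) data. */
final class Radix2FftSketch {
    private Radix2FftSketch() {}

    /** Returns the forward DFT of x; the complex length must be a power of two. */
    static float[] fft(float[] x) {
        int n = x.length / 2; // number of complex elements
        if (x.length % 2 != 0 || n == 0 || (n & (n - 1)) != 0) {
            throw new IllegalArgumentException("complex length must be a power of two");
        }
        if (n == 1) {
            return x.clone();
        }
        // Split into even-indexed and odd-indexed complex elements.
        float[] even = new float[n];
        float[] odd = new float[n];
        for (int k = 0; k < n / 2; k++) {
            even[2 * k] = x[4 * k];
            even[2 * k + 1] = x[4 * k + 1];
            odd[2 * k] = x[4 * k + 2];
            odd[2 * k + 1] = x[4 * k + 3];
        }
        float[] e = fft(even);
        float[] o = fft(odd);

        // Butterfly: X[k] = E[k] + w^k O[k], X[k + n/2] = E[k] - w^k O[k].
        float[] out = new float[2 * n];
        for (int k = 0; k < n / 2; k++) {
            double angle = -2.0 * Math.PI * k / n;
            double wr = Math.cos(angle);
            double wi = Math.sin(angle);
            double tr = wr * o[2 * k] - wi * o[2 * k + 1];
            double ti = wr * o[2 * k + 1] + wi * o[2 * k];
            out[2 * k] = (float) (e[2 * k] + tr);
            out[2 * k + 1] = (float) (e[2 * k + 1] + ti);
            out[2 * (k + n / 2)] = (float) (e[2 * k] - tr);
            out[2 * (k + n / 2) + 1] = (float) (e[2 * k + 1] - ti);
        }
        return out;
    }
}
```

Like the project's CPU code, this version allocates new arrays at every level of recursion and runs on a single thread, which is part of why a comparison against it flatters the GPU.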
You can adjust the constants (FREQ, N, and dT) for creating the test
signal from the Main.java class. Using a Linux 32 bit
installation on an older Dell laptop I found that, by varying the
length of the test signal (N), the CPU FFT outperformed the JCuda FFT
with signals that had fewer than 4096 complex elements. Thereafter,
the JCuda FFT speeds overwhelmed my CPU FFT. At N = 4194304, JCuda was
roughly 25 times faster than the CPU FFT (CPU = 23 seconds, GPU = 0.9
seconds). Beyond that, the laptop fans blaze during the CPU
computation loop (system temp: 90 C) and fear of thermal overload
prompted me to curtail testing. (My Linux 64 bit desktop has a 6 core
AMD Phenom II on a Sabertooth motherboard, 16 GiB of memory, a GeForce GTX
560 Ti graphics card, and some great fans. It can process FFTs (CPU or
GPU) all night provided I manage memory effectively.)
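The speedup figure is just the ratio of the two wall-clock times. A rough sketch of how one might time a task and compute that ratio is given below; the helper names are hypothetical, and a single System.nanoTime measurement is a coarse benchmark at best.

```java
/** Illustrative timing helpers; the names are not taken from the project code. */
final class SpeedupSketch {
    private SpeedupSketch() {}

    /** Runs the task once and returns the elapsed wall-clock time in seconds. */
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    /** How many times faster the GPU run was than the CPU run. */
    static double speedup(double cpuSeconds, double gpuSeconds) {
        if (gpuSeconds <= 0.0) {
            throw new IllegalArgumentException("gpuSeconds must be positive");
        }
        return cpuSeconds / gpuSeconds;
    }

    public static void main(String[] args) {
        // Timings quoted in the text for N = 4194304.
        System.out.printf("speedup = %.1f%n", speedup(23.0, 0.9));
    }
}
```

For the timings quoted above, 23 seconds divided by 0.9 seconds comes to about 25.6.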
A fair amount of the speed advantage I observe is due to the
inefficiency of my poorly optimized CPU implementation. More rigorous
CPU/GPU evaluations using optimized CPU code suggest that gains are
roughly 10X. I'll take 10X over 1X, but the practical reality is that
the power of CUDA's underlying implementation efficiency, together with
the intrinsic GPU gain (whatever it really is), collectively gives me
an average 50X boost.
The Notes folder in the project download includes some tips on
how to run a deployed, runnable jar. Basically, you need to use the -Djava.library.path
switch to point to your JCuda binaries folder.
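If a deployed jar fails to find the native libraries, it can help to print what the JVM actually received. The sketch below reads the standard java.library.path system property and checks whether a given folder is on it; the class name and the folder passed on the command line are placeholders.

```java
import java.io.File;

/** Illustrative diagnostic; the class name and the folder argument are placeholders. */
final class LibraryPathCheck {
    private LibraryPathCheck() {}

    /** True if dir appears as an entry in a path list separated by File.pathSeparator. */
    static boolean containsDir(String pathList, String dir) {
        File wanted = new File(dir);
        for (String entry : pathList.split(File.pathSeparator)) {
            if (!entry.isEmpty() && new File(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String libPath = System.getProperty("java.library.path", "");
        System.out.println("java.library.path = " + libPath);
        if (args.length > 0) {
            System.out.println(args[0] + " on path: " + containsDir(libPath, args[0]));
        }
    }
}
```

The switch goes before -jar on the command line, for example java -Djava.library.path=/path/to/jcuda -jar YourApp.jar, where both the folder and the jar name are placeholders.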
In conclusion
Getting set up and becoming acquainted with CUDA, JCuda, and Nsight
takes a fair amount of work. But it's worth it. General-purpose
computing on graphics processing units (GPGPU) is a very important
tool to have in your coding toolbox. I hope this article helps make
the process more accessible to other GPGPU novices like me. I wish you
success as a cutting-edge JCuda coder!
Some references