Something More for Research

Explorer of Research #HEMBAD

Archive for the ‘Open CL’ Category

Open CL

Building a Beowulf cluster with Ubuntu

Posted by Hemprasad Y. Badgujar on December 25, 2014

Building a Beowulf cluster with Ubuntu

The beowulf cluster article on Wikipedia describes the Beowulf cluster as follows:

“A Beowulf cluster is a group of what are normally identical, commercially available computers, which are running a Free and Open Source Software (FOSS), Unix-like operating system, such as BSD, GNU/Linux, or Solaris. They are networked into a small TCP/IP LAN, and have libraries and programs installed which allow processing to be shared among them.” – Wikipedia, Beowulf cluster, 28 February 2011.

This means a Beowulf cluster can be easily built with “off the shelf” computers running GNU/Linux in a simple home network. So building a Beowulf like cluster is within reach if you already have a small TCP/IP LAN at home with desktop computers running Ubuntu Linux (or any other Linux distribution).

There are many ways to install and configure a cluster. There is OSCAR(1), which allows any user, regardless of experience, to easily install a Beowulf type cluster on supported Linux distributions. It installs and configures all required software according to user input.

There is also the NPACI Rocks toolkit(2), which incorporates the latest Red Hat distribution and cluster-specific software. Rocks addresses the difficulties of deploying manageable clusters. Rocks makes clusters easy to deploy, manage, upgrade and scale.

Both of the afore mentioned toolkits for deploying clusters were made to be easy to use and require minimal expertise from the user. But the purpose of this tutorial is to explain how to manually build a Beowulf like cluster. Basically, the toolkits mentioned above do most of the installing and configuring for you, rendering the learning experience mute. So it would not make much sense to use any of these toolkits if you want to learn the basics of how a cluster works. This tutorial therefore explains how to manually build a cluster, by manually installing and configuring the required tools. In this tutorial I assume that you have some basic knowledge of the Linux-based operating system and know your way around the command line. I tried however to make this as easy as possible to follow. Keep in mind that this is new territory for me as well and there’s a good chance that this tutorial shows methods that may not be the best.

I myself started off with the clustering tutorial from SCFBio which gives a great explanation on how to build a simple Beowulf cluster.(3) It describes the prerequisites for building a Beowulf cluster and why these are needed.


  • What’s a Beowulf Cluster, exactly?
  • Building a virtual Beowulf Cluster
  • Building the actual cluster
  • Configuring the Nodes
    • Add the nodes to the hosts file
    • Defining a user for running MPI jobs
    • Install and setup the Network File System
    • Setup passwordless SSH for communication between nodes
    • Setting up the process manager
      • Setting up Hydra
      • Setting up MPD
  • Running jobs on the cluster
    • Running MPICH2 example applications on the cluster
    • Running bioinformatics tools on the cluster
  • Credits
  • References

What’s a Beowulf Cluster, exactly?

The typical setup of a beowulf cluster

The definition I cited before is not very complete. The book “Engineering a Beowulf-style Compute Cluster”(4) by Robert G. Brown gives a more detailed answer to this question (if you’re serious about this, this book is a must read). According to this book, there is an accepted definition of a beowulf cluster. This book describes the true beowulf as a cluster of computers interconnected with a network with the following characteristics:

  1. The nodes are dedicated to the beowulf cluster.
  2. The network on which the nodes reside are dedicated to the beowulf cluster.
  3. The nodes are Mass Market Commercial-Off-The-Shelf (M2COTS) computers.
  4. The network is also a COTS entity.
  5. The nodes all run open source software.
  6. The resulting cluster is used for High Performance Computing (HPC).

Building a virtual Beowulf Cluster

It is not a bad idea to start by building a virtual cluster using virtualization software like VirtualBox. I simply used my laptop running Ubuntu as the master node, and two virtual computing nodes running Ubuntu Server Edition were created in VirtualBox. The virtual cluster allows you to build and test the cluster without the need for the extra hardware. However, this method is only meant for testing and not suited if you want increased performance.

When it comes to configuring the nodes for the cluster, building a virtual cluster is practically the same as building a cluster with actual machines. The difference is that you don’t have to worry about the hardware as much. You do have to properly configure the virtual network interfaces of the virtual nodes. They need to be configured in a way that the master node (e.g. the computer on which the virtual nodes are running) has network access to the virtual nodes, and vice versa.

Building the actual cluster

It is good practice to first build and test a virtual cluster as described above. If you have some spare computers and network parts lying around, you can use those to build the actual cluster. The nodes (the computers that are part of the cluster) and the network hardware are the usual kind available to the general public (beowulf requirement 3 and 4). In this tutorial we’ll use the Ubuntu operating system to power the machines and open source software to allow for distributed parallel computing (beowulf requirement 5). We’ll test the cluster with cluster specific versions of bioinformaticstools that perform some sort of heavy calculations (beowulf requirement 6).

The cluster consists of the following hardware parts:

  • Network
  • Server / Head / Master Node (common names for the same machine)
  • Compute Nodes
  • Gateway

All nodes (including the master node) run the following software:

I will not focus on setting up the network (parts) in this tutorial. I assume that all nodes are part of the same private network and that they are properly connected.

Configuring the Nodes

Some configurations need to be made to the nodes. I’ll walk you through them one by one.

Add the nodes to the hosts file

It is easier if the nodes can be accessed with their host name rather than their IP address. It will also make things a lot easier later on. To do this, add the nodes to the hosts file of all nodes.(8) (9) All nodes should have a static local IP address set. I won’t go into details here as this is outside the scope of this tutorial. For this tutorial I assume that all nodes are already properly configured to have a static local IP address.

Edit the hosts file (sudo vim /etc/hosts) like below and remember that you need to do this for all nodes,	localhost	master	node1	node2	node3

Make sure it doesn’t look like this:	localhost	master	node1	node2	node3

neither like this:	localhost	master	master	node1	node2	node3

Otherwise other nodes will try to connect to localhost when trying to reach the master node.

Once saved, you can use the host names to connect to the other nodes,

$ ping -c 3 master
PING master ( 56(84) bytes of data.
64 bytes from master ( icmp_req=1 ttl=64 time=0.606 ms
64 bytes from master ( icmp_req=2 ttl=64 time=0.552 ms
64 bytes from master ( icmp_req=3 ttl=64 time=0.549 ms

--- master ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.549/0.569/0.606/0.026 ms

Try this with different nodes on different nodes. You should get a response similar to the above.

In this tutorial, master is used as the master node. Once the cluster has been set up, the master node will be used to start jobs on the cluster. The master node will be used to spawn jobs on the cluster. The compute nodes are node1 to node3 and will thus execute the jobs.

Defining a user for running MPI jobs

Several tutorials explain that all nodes need a separate user for running MPI jobs.(8) (9) (6) I haven’t found a clear explanation to why this is necessary, but there could be several reasons:

  1. There’s no need to remember different user names and passwords if all nodes use the same username and password.
  2. MPICH2 can use SSH for communication between nodes. Passwordless login with the use of authorized keys only works if the username matches the one set for passwordless login. You don’t have to worry about this if all nodes use the same username.
  3. The NFS directory can be made accessible for the MPI users only. The MPI users all need to have the same user ID for this to work.
  4. The separate user might require special permissions.

The command below creates a new user with username “mpiuser” and user ID 999. Giving a user ID below 1000 prevents the user from showing up in the login screen for desktop versions of Ubuntu. It is important that all MPI users have the same username and user ID. The user IDs for the MPI users need to be the same because we give access to the MPI user on the NFS directory later. Permissions on NFS directories are checked with user IDs. Create the user like this,

$ sudo adduser mpiuser --uid 999

You may use a different user ID (as long as it is the same for all MPI users). Enter a password for the user when prompted. It’s recommended to give the same password on all nodes so you have to remember just one password. The above command should also create a new directory/home/mpiuser. This is the home directory for user mpiuser and we will use it to execute jobs on the cluster.

Install and setup the Network File System

Files and programs used for MPI jobs (jobs that are run in parallel on the cluster) need to be available to all nodes, so we give all nodes access to a part of the file system on the master node. Network File System (NFS) enables you to mount part of a remote file system so you can access it as if it is a local directory. To install NFS, run the following command on the master node:

master:~$ sudo apt-get install nfs-kernel-server

And in order to make it possible to mount a Network File System on the compute nodes, the nfs-common package needs to be installed on all compute nodes:

$ sudo apt-get install nfs-common

We will use NFS to share the MPI user’s home directory (i.e. /home/mpiuser) with the compute nodes. It is important that this directory is owned by the MPI user so that all MPI users can access this directory. But since we created this home directory with the adduser command earlier, it is already owned by the MPI user,

master:~$ ls -l /home/ | grep mpiuser
drwxr-xr-x   7 mpiuser mpiuser  4096 May 11 15:47 mpiuser

If you use a different directory that is not currently owned by the MPI user, you must change it’s ownership as follows,

master:~$ sudo chown mpiuser:mpiuser /path/to/shared/dir

Now we share the /home/mpiuser directory of the master node with all other nodes. For this the file /etc/exports on the master node needs to be edited. Add the following line to this file,

/home/mpiuser *(rw,sync,no_subtree_check)

You can read the man page to learn more about the exports file (man exports). After the first install you may need to restart the NFS daemon:

master:~$ sudo service nfs-kernel-server restart

This also exports the directores listed in /etc/exports. In the future when the /etc/exports file is modified, you need to run the following command to export the directories listed in /etc/exports:

master:~$ sudo exportfs -a

The /home/mpiuser directory should now be shared through NFS. In order to test this, you can run the following command from a compute node:

$ showmount -e master

In this case this should print the path /home/mpiuser. All data files and programs that will be used for running an MPI job must be placed in this directory on the master node. The other nodes will then be able to access these files through NFS.

The firewall is by default enabled on Ubuntu. The firewall will block access when a client tries to access an NFS shared directory. So you need to add a rule with UFW (a tool for managing the firewall) to allow access from a specific subnet. If the IP addresses in your network have the format192.168.1.*, then is the subnet. Run the following command to allow incoming access from a specific subnet,

master:~$ sudo ufw allow from

You need to run this on the master node and replace “” by the subnet for your network.

You should then be able to mount master:/home/mpiuser on the compute nodes. Run the following commands to test this,

node1:~$ sudo mount master:/home/mpiuser /home/mpiuser
node2:~$ sudo mount master:/home/mpiuser /home/mpiuser
node3:~$ sudo mount master:/home/mpiuser /home/mpiuser

If this fails or hangs, restart the compute node and try again. If the above command runs without a problem, you should test whether/home/mpiuser on any compute node actually has the content from /home/mpiuser of the master node. You can test this by creating a file inmaster:/home/mpiuser and check if that same file appears in node*:/home/mpiuser (where node* is any compute node).

If mounting the NFS shared directory works, we can make it so that the master:/home/mpiuser directory is automatically mounted when the compute nodes are booted. For this the file /etc/fstab needs to be edited. Add the following line to the fstab file of all compute nodes,

master:/home/mpiuser /home/mpiuser nfs

Again, read the man page of fstab if you want to know the details (man fstab). Reboot the compute nodes and list the contents of the/home/mpiuser directory on each compute node to check if you have access to the data on the master node,

$ ls /home/mpiuser

This should lists the files from the /home/mpiuser directory of the master node. If it doesn’t immediately, wait a few seconds and try again. It might take some time for the system to initialize the connection with the master node.

Setup passwordless SSH for communication between nodes

For the cluster to work, the master node needs to be able to communicate with the compute nodes, and vice versa.(8) Secure Shell (SSH) is usually used for secure remote access between computers. By setting up passwordless SSH between the nodes, the master node is able to run commands on the compute nodes. This is needed to run the MPI daemons on the available compute nodes.

First install the SSH server on all nodes:

$ sudo apt-get install ssh

Now we need to generate an SSH key for all MPI users on all nodes. The SSH key is by default created in the user’s home directory. Remember that in our case the MPI user’s home directory (i.e. /home/mpiuser) is actually the same directory for all nodes: /home/mpiuser on the master node. So if we generate an SSH key for the MPI user on one of the nodes, all nodes will automatically have an SSH key. Let’s generate an SSH key for the MPI user on the master node (but any node should be fine),

$ su mpiuser
$ ssh-keygen

When asked for a passphrase, leave it empty (hence passwordless SSH).

When done, all nodes should have an SSH key (the same key actually). The master node needs to be able to automatically login to the compute nodes. To enable this, the public SSH key of the master node needs to be added to the list of known hosts (this is usually a file~/.ssh/authorized_keys) of all compute nodes. But this is easy, since all SSH key data is stored in one location: /home/mpiuser/.ssh/ on the master node. So instead of having to copy master’s public SSH key to all compute nodes separately, we just have to copy it to master’s ownauthorized_keys file. There is a command to push the public SSH key of the currently logged in user to another computer. Run the following commands on the master node as user “mpiuser”,

mpiuser@master:~$ ssh-copy-id localhost

Master’s own public SSH key should now be copied to /home/mpiuser/.ssh/authorized_keys. But since /home/mpiuser/ (and everything under it) is shared with all nodes via NFS, all nodes should now have master’s public SSH key in the list of known hosts. This means that we should now be able to login on the compute nodes from the master node without having to enter a password,

mpiuser@master:~$ ssh node1
mpiuser@node1:~$ echo $HOSTNAME

You should now be logged in on node1 via SSH. Make sure you’re able to login to the other nodes as well.

Setting up the process manager

In this section I’ll walk you through the installation of MPICH and configuring the process manager. The process manager is needed to spawn and manage parallel jobs on the cluster. The MPICH wiki explains this nicely:

“Process managers are basically external (typically distributed) agents that spawn and manage parallel jobs. These process managers communicate with MPICH processes using a predefined interface called as PMI (process management interface). Since the interface is (informally) standardized within MPICH and its derivatives, you can use any process manager from MPICH or its derivatives with any MPI application built with MPICH or any of its derivatives, as long as they follow the same wire protocol.” – Frequently Asked Questions – Mpich.

The process manager is included with the MPICH package, so start by installing MPICH on all nodes with,

$ sudo apt-get install mpich2

MPD has been the traditional default process manager for MPICH till the 1.2.x release series. Starting the 1.3.x series, Hydra is the default process manager.(10) So depending on the version of MPICH you are using, you should either use MPD or Hydra for process management. You can check the MPICH version by running mpich2version in the terminal. Then follow either the steps for MPD or Hydra in the following sub sections.

Setting up Hydra

This section explains how to configure the Hydra process manager and is for users of MPICH 1.3.x series and up. In order to setup Hydra, we need to create one file on the master node. This file contains all the host names of the compute nodes.(11) You can create this file anywhere you want, but for simplicity we create it in the the MPI user’s home directory,

mpiuser@master:~$ cd ~
mpiuser@master:~$ touch hosts

In order to be able to send out jobs to the other nodes in the network, add the host names of all compute nodes to the hosts file,


You may choose to include master in this file, which would mean that the master node would also act as a compute node. The hosts file only needs to be present on the node that will be used to start jobs on the cluster, usually the master node. But because the home directory is shared among all nodes, all nodes will have the hosts file. For more details about setting up Hydra see this page: Using the Hydra Process Manager.

Setting up MPD

This section explains how to configure the MPD process manager and is for users of MPICH 1.2.x series and down. Before we can start any parallel jobs with MPD, we need to create two files in the home directory of the MPI user. Make sure you’re logged in as the MPI user and create the following two files in the home directory,

mpiuser@master:~$ cd ~
mpiuser@master:~$ touch mpd.hosts
mpiuser@master:~$ touch .mpd.conf

In order to be able to send out jobs to the other nodes in the network, add the host names of all compute nodes to the mpd.hosts file,


You may choose to include master in this file, which would mean that the master node would also act as a compute node. The mpd.hosts file only needs to be present on the node that will be used to start jobs on the cluster, usually the master node. But because the home directory is shared among all nodes, all nodes will have the mpd.hosts file.

The configuration file .mpd.conf (mind the dot at the beginning of the file name) must be accessible to the MPI user only (in fact, MPD refuses to work if you don’t do this),

mpiuser@master:~$ chmod 600 .mpd.conf

Then add a line with a secret passphrase to the configuration file,


The secretword can be set to any random passphrase. You may want to use a random password generator the generate a passphrase.

All nodes need to have the .mpd.conf file in the home directory of mpiuser with the same passphrase. But this is automatically the case since/home/mpiuser is shared through NFS.

The nodes should now be configured correctly. Run the following command on the master node to start the mpd deamon on all nodes,

mpiuser@master:~$ mpdboot -n 3

Replace “3” by the number of compute nodes in your cluster. If this was successful, all nodes should now be running the mpd daemon. Run the following command to check if all nodes entered the ring (and are thus running the mpd daemon),

mpiuser@master:~$ mpdtrace -l

This command should display a list of all nodes that entered the ring. Nodes listed here are running the mpd daemon and are ready to accept MPI jobs. This means that your cluster is now set up and ready to rock!

Running jobs on the cluster

Running MPICH2 example applications on the cluster

The MPICH2 package comes with a few example applications that you can run on your cluster. To obtain these examples, download the MPICH2 source package from the MPICH website and extract the archive to a directory. The directory to where you extracted the MPICH2 package should contain an “examples” directory. This directory contains the source codes of the example applications. You need to compile these yourself.

$ sudo apt-get build-dep mpich2
$ wget
$ tar -xvzf mpich2-1.4.1.tar.gz
$ cd mpich2-1.4.1/
$ ./configure
$ make
$ cd examples/

The example application cpi is compiled by default, so you can find the executable in the “examples” directory. Optionally you can build the other examples as well,

$ make hellow
$ make pmandel

Once compiled, place the executables of the examples somewhere inside the /home/mpiuser directory on the master node. It’s common practice to place executables in a “bin” directory, so create the directory /home/mpiuser/bin and place the executables in this directory. The executables should now be available on all nodes.

We’re going to run an MPI job using the example application cpi. Make sure you’re logged in as the MPI user on the master node,

$ su mpiuser

And run the job like this,

When using MPD:

mpiuser@master:~$ mpiexec -n 3 /home/mpiuser/bin/cpi

When using Hydra:

mpiuser@master:~$ mpiexec -f hosts -n 3 /home/mpiuser/bin/cpi

Replace “3” by the number of nodes on which you want to run the job. When using Hydra, the -f switch should point to the file containing the host names. When using MPD, it’s important that you use the absolute path to the executable in the above command, because only then MPD knows where to look for the executable on the compute nodes. The absolute path used should thus be correct for all nodes. But since/home/mpiuser is the NFS shared directory, all nodes have access to this path and the files within it.

The example application cpi is useful for testing because it shows on which nodes each sub process is running and how long it took to run the job. This application is however not useful to test performance because this is a very small application which takes only a few milliseconds to run. As a matter of fact, I don’t think it actually computes pi. If you look at the source, you’ll find that the value of pi is hard coded into the program.

Running bioinformatics tools on the cluster

By running actual bioinformatics tools you can give your cluster a more realistic test run. There are several parallel implementations of bioinformatics tools that are based on MPI. There are two that I currently know of:

It would be nice to test mpiBLAST, but because of a compilation issue, I was not able to do so. After some asking around at the mpiBLAST-Users mailing list, I got an answer:

“That problem is caused by a change in GCC version 4.4.X. We don’t have a fix to give out for the issue as yet, but switching to 4.3.X or lower should solve the issue for the time being.”(7)

Basically, I’m using a newer version of the GCC compiler which fails to build mpiBLAST. In order to compile it, I’d have to use an older version. But to instruct mpicc to use GCC 4.3 instead, requires that MPICH2 be compiled with GCC 4.3. Instead of going through that trouble, I’ve decided to give ClustalW-MPI a try instead.

The MPI implementation of ClustalW is fairly out-dated, but it’s good enough to perform a test run on your cluster. Download the source from the website, extract the package, and compile the source. Copy the resulting executable to the /home/mpiuser/bin directory on the master node. Use for example Entrez to search for some DNA/protein sequences and put these in a single FASTA file (the NCBI website can do that for you). Create several FASTA files with multiple sequences to test with. Copy the multi-sequence FASTA files to a data directory inside mirror (e.g./home/mpiuser/data). Then run a job like this,

When using MPD:

mpiuser@master:~$ mpiexec -n 3 /home/mpiuser/bin/clustalw-mpi /home/mpiuser/data/seq_tyrosine.fasta

When using Hydra:

mpiuser@master:~$ mpiexec -f hosts -n 3 /home/mpiuser/bin/clustalw-mpi /home/mpiuser/data/seq_tyrosine.fasta

and let the cluster do the work. Again, notice that we must use absolute paths. You can check if the nodes are actually doing anything by logging into the nodes (ssh node*) and running the top command. This should display a list of running processes with the processes using the most CPU on the top. In this case, you should see the process clustalw-mpi somewhere along the top.


Thanks to Reza Azimi for mentioning the nfs-common package.


  1. OpenClusterGroup. OSCAR.
  2. Philip M. Papadopoulos, Mason J. Katz, and Greg Bruno. NPACI Rocks: Tools and Techniques for Easily Deploying Manageable Linux Clusters. October 2001, Cluster 2001: IEEE International Conference on Cluster Computing.
  3. Supercomputing Facility for Bioinformatics & Computational Biology, IIT Delhi. Clustering Tutorial.
  4. Robert G. Brown. Engineering a Beowulf-style Compute Cluster. 2004. Duke University Physics Department.
  5. Pavan Balaji, et all. MPICH2 User’s Guide, Version 1.3.2. 2011. Mathematics and Computer Science Division Argonne National Laboratory.
  6. Kerry D. Wong. A Simple Beowulf Cluster.
  7. mpiBLAST-Users: unimplemented: inlining failed in call to ‘int fprintf(FILE*, const char*, …)’
  8. Ubuntu Wiki. Setting Up an MPICH2 Cluster in Ubuntu.
  9. Building a Beowulf Cluster in just 13 steps.
  10. Frequently Asked Questions – Mpich.
  11. Using the Hydra Process Manager – Mpich.

Posted in CLUSTER, Computer Hardware, Computer Hardwares, Computer Languages, Computer Vision, Computing Technology, CUDA, Free Tools, GPU (CUDA), Linux OS, Mixed, My Research Related, Open CL, OpenCV, OpenMP, OPENMPI, PARALLEL | 2 Comments »


Posted by Hemprasad Y. Badgujar on December 19, 2014

MPI is a well-known programming model for Distributed Memory Computing. If you have access to GPU resources, MPI can be used to distribute tasks to computers, each of which can use their CPU and also GPU to process the distributed task.

My toy problem in hand is to use  a mix of MPI and CUDA to handle traditional sparse-matrix vector multiplication. The program can be structured as:

Each node uses both CPU and GPU resources
Each node uses both CPU and GPU resources
  1. Read a sparse matrix from from disk, and split it into sub-matrices.
  2. Use MPI to distribute the sub-matrices to processes.
  3. Each process would call a CUDA kernel to handle the multiplication. The result of multiplication would be copied back to each computer memory.
  4. Use MPI to gather results from each of the processes, and re-form the final matrix.

One of the options is to put both MPI and CUDA code in a single file, This program can be compiled using nvcc, which internally uses gcc/g++ to compile your C/C++ code, and linked to your MPI library:

nvcc -I/usr/mpi/gcc/openmpi-1.4.6/include -L/usr/mpi/gcc/openmpi-1.4.6/lib64 -lmpi -o program

The downside is it might end up being a plate of spaghetti, if you have some seriously long program.

Another cleaner option is to have MPI and CUDA code separate in two files: main.c and respectively. These two files can be compiled using mpicc, and nvcc respectively into object files (.o) and combined into a single executable file using mpicc. This second option is an opposite compilation of the above, using mpicc, meaning that you have to link to your CUDA library.

module load openmpi cuda #(optional) load modules on your node
mpicc -c main.c -o main.o
nvcc -arch=sm_20 -c -o multiply.o
mpicc main.o multiply.o -lcudart -L/apps/CUDA/cuda-5.0/lib64/ -o program

And finally, you can request two processes and two GPUs to test your program on the cluster using PBS script like:

#PBS -l nodes=2:ppn=2:gpus=2
mpiexec -np 2 ./program

The main.c, containing the call to CUDA file, would look like:

#include "mpi.h"
int main(int argc, char *argv[])
/* It's important to put this call at the begining of the program, after variable declarations. */
MPI_Init(argc, argv);
/* Get the number of MPI processes and the rank of this process. */
        MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
        MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
// ==== Call function 'call_me_maybe' from CUDA file ==========
/* ... */

And in, define call_me_maybe() with the ‘extern‘ keyword to make it accessible from main.c (without additional #include …)

/* */
#include <cuda.h>
#include <cuda_runtime.h>
 __global__ void __multiply__ ()
 extern "C" void call_me_maybe()
     /* ... Load CPU data into GPU buffers  */
     __multiply__ <<< ...block configuration... >>> (x, y);
     /* ... Transfer data from GPU to CPU */


Mixing MPI and CUDA

Mixing MPI (C) and CUDA (C++) code requires some care during linking because of differences between the C and C++ calling conventions and runtimes. A helpful overview of the issues can be found at How to Mix C and C++.

One option is to compile and link all source files with a C++ compiler, which will enforce additional restrictions on C code. Alternatively, if you wish to compile your MPI/C code with a C compiler and call CUDA kernels from within an MPI task, you can wrap the appropriate CUDA-compiled functions with the extern keyword, as in the following example.

These two source files can be compiled and linked with both a C and C++ compiler into a single executable on Oscar using:

$ module load mvapich2 cuda
$ mpicc -c main.c -o main.o
$ nvcc -c -o multiply.o
$ mpicc main.o multiply.o -lcudart

The CUDA/C++ compiler nvcc is used only to compile the CUDA source file, and the MPI C compiler mpicc is user to compile the C code and to perform the linking.

01. /* */
03. #include 
04. #include 
06. __global__ void __multiply__ (const float *a, float *b)
07. {
08. const int i = threadIdx.x + blockIdx.x * blockDim.x;
09.     b[i] *= a[i];
10. }
12. extern "C" void launch_multiply(const float *a, const *b)
13. {
14.     /* ... load CPU data into GPU buffers a_gpu and b_gpu */
16.     __multiply__ <<< ...block configuration... >>> (a_gpu, b_gpu);
18.     safecall(cudaThreadSynchronize());
19.     safecall(cudaGetLastError());
21.     /* ... transfer data from GPU to CPU */

Note the use of extern "C" around the function launch_multiply, which instructs the C++ compiler (nvcc in this case) to make that function callable from the C runtime. The following C code shows how the function could be called from an MPI task.

01. /* main.c */
03. #include 
05. void launch_multiply(const float *a, float *b);
07. int main (int argc, char **argv)
08. {
09.        int rank, nprocs;
10.     MPI_Init (&argc, &argv);
11.     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
12.     MPI_Comm_size (MPI_COMM_WORLD, &nprocs);
14.     /* ... prepare arrays a and b */
16.     launch_multiply (a, b);
18.     MPI_Finalize();
19.        return 1;
20. }

Posted in CLUSTER, Computer Hardware, Computer Softwares, Computer Vision, Computing Technology, CUDA, GPU (CUDA), GPU Accelareted, GRID, Open CL, OpenMP, PARALLEL | Tagged: , , | 1 Comment »

Installing CUDA 6 on Ubuntu 12.04

Posted by Hemprasad Y. Badgujar on April 18, 2014

Setup New ubuntu 12.04

  • Clean Install
  • You can manually update via terminal by running:
    sudo apt-get update
    sudo apt-get upgrade

    Additionally you can run:

    sudo apt-get dist-upgrade
  • enable root login
     sudo passwd root
     sudo sh -c 'echo "greeter-show-manual-login=true" >> /etc/lightdm/lightdm.conf'

           Root won’t show up as a user, but “Login” will, which is how you manually log in with users not shown in the greeter.

           Rebooted and then you should be able to login as root.

  • Download the NVIDIA CUDA Toolkit.


Pre-installation Actions

Some actions must be taken beforetheCUDA Toolkit and Driver can be installed on Linux:

  • Verify the system has a CUDA-capable GPU.
  • Verify the system is running a supported version of Linux.
  • Verify the system has gcc installed.
  • Download the NVIDIA CUDA Toolkit.
Note: You can override the install-time prerequisite checks by running the installer with the -override flag. Remember that the prerequisites will still be required to use the NVIDIA CUDA Toolkit.

Verify You Have a CUDA-Capable GPU

To verify that your GPU is CUDA-capable, go to your distribution’s equivalent of System Properties, or, from the command line, enter:

lspci | grep -i nvidia

If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line and rerun the previous lspci command.

If your graphics card is from NVIDIA and it is listed in, your GPU is CUDA-capable.

The Release Notes for the CUDA Toolkit also contain a list of supported products.

 Verify You Have a Supported Version of Linux

The CUDA Development Tools are only supported on some specific distributions of Linux. These are listed in the CUDA Toolkit release notes.

To determine which distribution and release number you’re running, type the following at the command line:

uname -m && cat /etc/*release

You should see output similar to the following, modified for your particular system:

i386 Red Hat Enterprise Linux WS release 4 (Nahant Update 6)

The i386 line indicates you are running on a 32-bit system. On 64-bit systems running in 64-bit mode, this line will generally read: x86_64. The second line gives the version number of the operating system.

Verify the System Has gcc Installed

The gcc compiler and toolchain generally are installed as part of the Linux installation, and in most cases the version of gcc installed with a supported version of Linux will work correctly.

To verify the version of gcc installed on your system, type the following on the command line:

gcc --version

If an error message displays, you need to install the development tools from your Linux distribution or obtain a version of gcc and its accompanying toolchain from the Web.

For ARMv7 cross development, a suitable cross compiler is required. For example, performing the following on Ubuntu 12.04:

sudo apt-get install g++-4.6-arm-linux-gnueabihf

will install the gcc 4.6 cross compiler on your system which will be used by nvcc. Please refer to th NVCC manual on how to use nvcc to cross compile to the ARMv7 architecture

Choose an Installation Method

The CUDA Toolkit can be installed using either of two different installation mechanisms: distribution-specific packages, or a distribution-independent package. The distribution-independent package has the advantage of working across a wider set of Linux distributions, but does not update the distribution’s native package management system. The distribution-specific packages interface with the distribution’s native package management system. It is recommended to use the distribution-specific packages, where possible.

Note: Distribution-specific packages and repositories are not provided for Redhat 5 and Ubuntu 10.04. For those two Linux distributions, the stand-alone installer must be used.
Note: Standalone installers are not provided for the ARMv7 release. For both native ARMv7 as well as cross developent, the toolkit must be installed using the distribution-specific installer.

Download the NVIDIA CUDA Toolkit

The NVIDIA CUDA Toolkit is available at

Choose the platform you are using and download the NVIDIA CUDA Toolkit

The CUDA Toolkit contains the CUDA driver and tools needed to create, build and run a CUDA application as well as libraries, header files, CUDA samples source code, and other resources.

Download Verification

The download can be verified by comparing the MD5 checksum posted at with that of the downloaded file. If either of the checksums differ, the downloaded file is corrupt and needs to be downloaded again.

To calculate the MD5 checksum of the downloaded file, run the following:

$ md5sum

Runfile Installation


This section describes the installation and configuration of CUDA when using the standalone installer.


Pre-installation Setup

Before the stand-alone installation can be run, perform the pre-installation actions.



If you have already installed a standalone CUDA driver and desire to keep using it, you need to make sure it meets the minimum version requirement for the toolkit. This requirement can be found in the CUDA Toolkit release notes. With many distributions, the driver version number can be found in the graphical interface menus under Applications > System Tools > NVIDIA X Server Settings.. On the command line, the driver version number can be found by running /usr/bin/nvidia-settings.

The package manager installations (RPM/DEB packages) and the stand-alone installer installations (.run file) are incompatible. See below about how to uninstall any previous RPM/DEB installation.


Copy cuda_6.0.37_linux_*.run file to Root home folder for easy access.


The standalone installer can install any combination of the NVIDIA Driver (that includes the CUDA Driver), the CUDA Toolkit, or the CUDA Samples. If needed, each individual installer can be extracted by using the -extract=/absolute/path/to/extract/location/. The extraction path must be an absolute path.

The CUDA Toolkit installation includes a read-only copy of the CUDA Samples. The read-only copy is used to create a writable copy of the CUDA Samples at some other location at any point in time. To create this writable copy, use the script provided with the toolkit. It is equivalent to installing the CUDA Samples with the standalone installer.

Extra Libraries

If you wish to build all the samples, including those with graphical rather than command-line interfaces, additional system libraries or headers may be required. While every Linux distribution is slightly different with respect to package names and package installation procedures, the libraries and headers most likely to be necessary are OpenGL (e.g., Mesa), GLU, GLUT, and X11 (including Xi, Xmu, and GLX).

On Ubuntu, those can be installed as follows:

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libgl1-mesa-dri libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

sudo apt-get install libwxgtk2.8-0 libwxbase2.8-0 wx-common libglu1-mesa libgl1-mesa-glx zlib1g bzip2 gpsd gpsd-clients xcalib libportaudio2

Interaction with Nouveau

Proprietary Video Driver

The built-in nouveau video driver in Ubuntu is incompatible with the CUDA Toolkit, and you have to replace it with the proprietary NVIDIA driver.

$ sudo aptget remove purge \  xserverxorgvideonouveau 
The Nouveau drivers may be installed intoyourroot filesystem (initramfs) and may cause the Display Driver installation to fail. To fix the situation,theinitramfs image must be rebuilt with:

sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)

if Grub2 is used as the bootloader, the rdblacklist=nouveau nouveau.modeset=0 line must be added at the end of the GRUB_CMDLINE_LINUX entry in /etc/default/grub. Then, the Grub configuration must be remade by running:

sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Once this is done, the machine must be rebooted and the installation attempted again.


Graphical Interface Shutdown

Exit the GUI if you are in a GUI environment by pressing Ctrl-Alt-Backspace. Some distributions require you to press this sequence twice in a row; others have disabled it altogether in favor of a command such as sudo service ligthdm stop. Still others require changing the system runlevel using a command such as /sbin/init 3 Consult your distribution’s documentation to find out how to properly exit the GUI. This step is only required in the event that you want to install the NVIDIA Display Driver included in the standalone installer.


NVIDIA Driver RPM/Deb package uninstallation

If you want to install the NVIDIA Display Driver included in the standalone installer, any previous driver installed through RPM or DEB packages MUST be uninstalled first. Such installation may be part of the default installation of your Linux distribution. Or it could have been installed as part of the package installation described in the previous section. To uninstall a DEB package, use sudo apt-get –purge remove package_name or equivalent. To uninstall a RPM package, use sudo yum remove package_name or equivalent.



To install any combination of the driver, toolkit, and the samples, simply execute the .run script. The installation of the driver requires the script to be run with root privileges. Depending on the target location, the toolkit and samples installations may also require root privileges.

Shutdown the all the graphics

Ubuntu uses LightDM, so you need to stop this service:

$ sudo service lightdm stop

press Alt+F1 for Terminal

5.2 Run the installer

Go to (using cd) the directory where you have the CUDA installer (a file with *.run extension) and type the following:

$ sudo chmod +x *.run
$ sudo ./*.run

By default, the toolkit and the samples will install under /usr/local/cuda-6.0 and $(HOME)/NVIDIA_CUDA-6.0_Samples, respectively. In addition, a symbolic link is created from /usr/local/cuda to /usr/local/cuda-6.0. The symbolic link is created in order for existing projects to automatically make use of the newly installed CUDA Toolkit.

If the target system includes both an integrated GPU (iGPU) and a discrete GPU (dGPU), the –no-opengl-libs option must be used. Otherwise, the openGL library used by the graphics driver of the iGPU will be overwritten and the GUI will not work. In addition, the xorg.conf update at the end of the installation must be declined.

Note: Installing Mesa may overwrite the /usr/lib/ that was previously installed by the NVIDIA driver, so a reinstallation of the NVIDIA driver might be required after installing these libraries.


Environment Setup

The PATH variable needs to include /usr/local/cuda-6.0/bin

The LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-6.0/lib on a 32-bit system, and /usr/local/cuda-6.0/lib64 on a 64-bit system

  • To change the environment variables for 32-bit operating systems:

    $ export PATH=/usr/local/cuda-6.0/bin:$PATH
    $ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib:$LD_LIBRARY_PATH
  • To change the environment variables for 64-bit operating systems:

    $ export PATH=/usr/local/cuda-6.0/bin:$PATH
    $ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64:$LD_LIBRARY_PATH


Check that the device files/dev/nvidia* exist and have the correct (0666) file permissions. These files are used by the CUDA Driver to communicate with the kernel-mode portion of the NVIDIA Driver. Applications that use the NVIDIA driver, such as a CUDA application or the X server (if any), will normally automatically create these files if they are missing using the setuidnvidia-modprobe tool that is bundled with the NVIDIA Driver. Some systems disallow setuid binaries, however, so if these files do not exist, you can create them manually either by running the command nvidia-smi as root at boot time or by using a startup script such as the one below:


/sbin/modprobe nvidia

if [ "$?" -eq 0 ]; then
  # Count the number of NVIDIA controllers found.
  NVDEVS=`lspci | grep -i NVIDIA`
  N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
  NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

  N=`expr $N3D + $NVGA - 1`
  for i in `seq 0 $N`; do
    mknod -m 666 /dev/nvidia$i c 195 $i

  mknod -m 666 /dev/nvidiactl c 195 255

  exit 1

/sbin/modprobe nvidia-uvm

if [ "$?" -eq 0 ]; then
  # Find out the major device number used by the nvidia-uvm driver
  D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

  mknod -m 666 /dev/nvidia-uvm c $D 0
  exit 1


Graphical Interface Restart

Restart the GUI environment using the command startx, init 5, sudo service lightdm start, or the equivalent command on your system.



Post-installation Actions

Some actions must be taken after installingtheCUDA Toolkit and Driver before they can be completely used:

  • Setup evironment variables.
  • Install a writable copy of the CUDA Samples.
  • Verify the installation.

 Environment Setup

The PATH variable needs to include /usr/local/cuda-6.0/bin

The LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-6.0/lib on a 32-bit system, and /usr/local/cuda-6.0/lib64 on a 64-bit system

  • To change the environment variables for 32-bit operating systems:

    $ export PATH=/usr/local/cuda-6.0/bin:$PATH
    $ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib:$LD_LIBRARY_PATH
  • To change the environment variables for 64-bit operating systems:

    $ export PATH=/usr/local/cuda-6.0/bin:$PATH
    $ export LD_LIBRARY_PATH=/usr/local/cuda-6.0/lib64:$LD_LIBRARY_PATH

(Optional) Install Writable Samples

In order to modify, compile, and run the samples, the samples must be installedwithwrite permissions. A convenience installation script is provided:

$ <dir>

This script is installed with the cuda-samples-60 package. The cuda-samples-60 package installs only a read-only copy in /usr/local/cuda-6.0/samples.

 Verify the Installation

Before continuing, it is important to verify that the CUDA toolkit can find and communicate correctly with the CUDA-capable hardware. To do this, you need to compile and run some of the included sample programs.

Note: Ensure the PATH and LD_LIBRARY_PATH variables are set correctly.

Verify the Driver Version

If you installed the driver, verify that the correct version of it is installed.

This can be done through your System Properties (or equivalent) or by executing the command

cat /proc/driver/nvidia/version

Note that this command will not work on an iGPU/dGPU system.

Compiling the Examples

The version of the CUDA Toolkit can be checked by running nvcc -V in a terminal window. The nvcc command runs the compiler driver that compiles CUDA programs. It calls the gcc compiler for C code and the NVIDIA PTX compiler for the CUDA code.

The NVIDIA CUDA Toolkit includes sample programs in source form. You should compile them by changing to ~/NVIDIA_CUDA-6.0_Samples and typing make. The resulting binaries will be placed under ~/NVIDIA_CUDA-6.0_Samples/bin.

Running the Binaries

After compilation, find and run deviceQuery under ~/NVIDIA_CUDA-6.0_Samples. If the CUDA software is installed and configured correctly, the output for deviceQuery should look similar to that shown in Figure 1.

Figure 1. Valid Results from deviceQuery CUDA Sample

Valid Results from deviceQuery CUDA Sample.


The exact appearance and the output lines might be different on your system. The important outcomes are that a device was found (the first highlighted line), that the device matches the one on your system (the second highlighted line), and that the test passed (the final highlighted line).

If a CUDA-capable device and the CUDA Driver are installed but deviceQuery reports that no CUDA-capable devices are present, this likely means that the /dev/nvidia* files are missing or have the wrong permissions.

On systems where SELinux is enabled, you might need to temporarily disable this security feature to run deviceQuery. To do this, type:

# setenforce 0

from the command line as the superuser.

Running the bandwidthTest program ensures that the system and the CUDA-capable device are able to communicate correctly. Its output is shown in Figure 2.

Figure 2. Valid Results from bandwidthTest CUDA Sample

Valid Results from bandwidthTest CUDA Sample.


Note that the measurements for your CUDA-capable device description will vary from system to system. The important point is that you obtain measurements, and that the second-to-last line (in Figure 2) confirms that all necessary tests passed.

Should the tests not pass, make sure you have a CUDA-capable NVIDIA GPU on your system and make sure it is properly installed.

If you run into difficulties with the link step (such as libraries not being found), consult the Linux Release Notes found in the doc folder in the CUDA Samples directory.

6. Additional Considerations

Now that you have CUDA-capable hardware and the NVIDIA CUDA Toolkit installed, you can examine and enjoy the numerous included programs. To begin using CUDA to accelerate the performance of your own applications, consult the CUDA C Programming Guide, located in /usr/local/cuda-6.0/doc.

A number of helpful development tools are included in the CUDA Toolkit to assist you as you develop your CUDA programs, such as NVIDIA® Nsight™ Eclipse Edition, NVIDIA Visual Profiler, cuda-gdb, and cuda-memcheck.

For technical support on programming questions, consult and participate in the developer forums at


Posted in Apps Development, CUDA, GPU (CUDA), GPU Accelareted, Open CL, OpenCV, OpenMP, PARALLEL | Leave a Comment »

CPU vs GPU performance

Posted by Hemprasad Y. Badgujar on July 18, 2013

in the performance of GPUs and CPUs. This has to be quite a compromise since actual performance depends heavily on the suitability of the chip to a particular problem/algorithm among many other specifics. The simplest method is to plot theoretical peak performance over time; I chose to show it for single and double precision for NVIDIA GPUs and Intel CPUs.

In the past, I have used the graph in the CUDA C Programming Guide, but that is frequently out of date, I have no control of the formatting, I have to settle for a screenshot instead of vector output, and, until I did my own research, I wasn’t sure if it was biased.

Below is Michael Galloy attempt (click to enlarge).

CPU vs. GPU performance

by Michael Galloy

Posted in CLOUD, Computer Hardwares, Computing Technology, CUDA, GPU (CUDA), GPU Accelareted, Open CL, PARALLEL | Leave a Comment »

Installing and using OpenCV with Visual Studio 2010 express

Posted by Hemprasad Y. Badgujar on May 4, 2013

Here’s the low-down on installing and getting started with OpenCV on Visual Studio Express 2010 on Windows.

1) Download OpenCV2.2 from Source Forge

Download the OpenCV-2.2.0-win32-vs2010.exe as it already contains the libraries and the binaries required for running OpenCV.

You can untar the file into a directory named /opencv2.2/

2) Download and install Visual Studio 2010 express (free) version and install on your PC.

3) Once Visual Studio 2010  is installed open VC++ and select  “New Project” and choose Win32 Console Application for e.g. first

4) Include the any relevant OpenCV code into first.cpp (snippet given below)

5) Include the following 3 directories under

Project->first properties->VC++ Directories->Include Directories. Click Edit




Click Apply and OK

6) Now include the library path

Project->first properties->Library Directories


Click Apply and OK

7) Now the last step is to include all the necessary OpenCV libraries during the linking phase

For this go to Project->first properties->Linker->Input->Additional Dependencies and cut and paste all the following libraries by clicking the “Edit”


Click Apply and Ok.

8) Now you are good to go. Build by choosing Debug->Build Solution

It should through fine  when  you now run the code

You should see

Here is a sample snippet

#include "stdafx.h"
#include "cv.h"
#include "highgui.h"

int main( int argc, char** argv ) {
// cvLoadImage determines an image type and creates datastructure with appropriate size
    IplImage* img = cvLoadImage("baboon.jpg",1);
// create a window. Window name is determined by a supplied argument
    cvNamedWindow( "test", CV_WINDOW_AUTOSIZE );
    // Apply Gaussian smooth
    //cvSmooth( img, img, CV_GAUSSIAN, 9, 9, 0, 0 );
    cvErode (img, img,NULL,2);
// Display an image inside and window. Window name is determined by a supplied argument
    cvShowImage( "test", img );

    //Save image
        cvSaveImage( "c:\\shanthi\\baboon2.png", img, 0);
// wait indefinitely for keystroke
// release pointer to an object
    cvReleaseImage( &img );
// Destroy a window
    cvDestroyWindow( argv[1] );

From  blog:

Posted in Entertainment, GPU (CUDA), Open CL, OpenCV | Leave a Comment »

CUDA Open Source Projects

Posted by Hemprasad Y. Badgujar on March 4, 2013

CUDA Open Source Projects

In searching for projects to use for learning and developing with plus requests from the NVidia forums I have put together a list here of free and open source research and projects that use CUDA.  Please if you have one to add or updates of anything here let me know.

GNURadio Software defined radio. A hardware/software combination that does baseband signal processing in software. Experiments were carried out to integrate CUDA into this mix.
MediaCoder A transcoding application for videos with a strong focus on mobile players. Some operations (de-interlacing, scaling, encoding) are have been CUDA accelerated.
Bullet Bullet: physics simulation started to include CUDA but it is not fully capable yet.  Perhaps some CUDA genius will add to it?
Thrust (included in Release 4.0) Excellent Library!! A Parallel Template Library for CUDA. Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.
Pycuda A module which allows access to the complete range of CUDA functionality in Python, including seamless numpy integration, OpenGL interoperability and lots more. Released under the MIT/X consortium license.
FOLKI-GPU An optical-flow estimation, implemented on CUDA
Flam4 CUDA A CUDA accelerated renderer for fractal frames. Sample videos hereand here. Use other tools like Apophysis 2.0 to generate the parameter files (.flame files). A new and ongoing approach to port fractal frame rendering to CUDA is described here.
CUJ2K A CUDA accelerated JPEG 2000 encoder. Command line tool and C/C++ library. This is student work with excellent documentation. Notable speedup achieved only for large files.
Ocelot A Binary Translation Framework for PTX
Msieve A library for factoring large integers, as in RSA-size numbers. The polynomial selection phase of the general number field sieve has a great deal of CUDA code, and the speedup over a CPU is enormous (10-50x)
PFAC An open library for exact string matching performed on GPUs.
cuSVM A CUDA implementation of Support Vector Classification and Regression.
multisvm In this project, it is described how a naïve implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism to the GPU programming model and efficiently use its computational throughput.
gpuminer Parallel Data Mining on Graphics Processors
Cmatch Cmatch, performs exact string matching for a set of query sequences and achieves a speedup of as much as 35x on a recent GPU over the equivalent CPU-bound version.
R+GPU A popular Open Source solution for Statistical Analysis

Posted in Apps Development, Artificial Intelligence, Computer Languages, CUDA, GPU (CUDA), GPU Accelareted, Image Processing, Neural Network, Open CL, OpenMP, PARALLEL, Project Related, Simulation, Virtualization | 1 Comment »

Configuring Your First OpenCL Project

Posted by Hemprasad Y. Badgujar on October 16, 2012



To actually use OpenCL you’ll need to tell Visual Studio where to find the OpenCL libraries and functions.

First thing we need to do is create a new ‘Empty Project’ in Visual Studio.

Name this project whatever you want but leave the other options as default. Visual Studio will make a new solution for your project and create a new folder to keep your files in. For this article, I’ve named my new project ‘opencl_test’.

With your new empty project created it’s time to tweak the project settings so that your editor knows where to find OpenCL.

Go in to ‘Projects->opencl_test Properties’  and you’ll get your project’s configuration page. The first thing we need to do is change the Configuration that we’re editing. By default, mine is set to Active(Debug). We want to set it to “All Configurations”.

Next we need to find the Additional Include Directories option. (note that ‘opencl_test Properties’ will be different for you if you named your project something else.)

Double-click on the C/C++ menu to expand it and expose the C/C++ sub-menus. Click on General. Now you’ll see the Additional Include Directories option at the top of the right window.

Simply type in the environment variable “$(CUDA_INC_PATH)”. It will point to the actual include directory to wherever the CUDA Toolkit was installed to.

You should now have your settings looking like this:

Now that Visual Studio knows where to find our files, we need to tell it to use those files when compiling our program.

With the Properties page still open, double-click on Linker then on General. In the right window, there will be an option called ‘Additional Library Directories’. This is where we’re going to tell the linker where the libraries for the OpenCL functions are. Think of it like OpenCL’s “guts”.

Again, we’re going to type in the environment variable which points to the library directory for the CUDA Toolkit. In this case, we want to type in “$(CUDA_LIB_PATH)”. This will automatically use the lib folder from the CUDA Toolkit’s install path for us.

When all is said and done, you should look like this:

Referencing proper OpenCL libraries in Visual Studio

Now that Visual Studio knows where to find the library files, we need to tell it which library file to actually use. To do this, we’re going to take a peak at the Input sub-menu under Linker. Click that, and in the right window you’ll see an option called ‘Additional Dependencies’. All you need to do here is enter in ‘opencl.lib’ without the quotes.

For the visual learners out there, here’s what we should be looking at right now:

Linking OpenCL In Visual Studio 2008

Click Apply to apply the changes to the linker settings for your project. Now let’s test these settings to make sure you’ve configured your project correctly and that Visual Studio can find and use the OpenCL files.

To test our project settings, we’re going to make a quick and simple C file that includes the OpenCL header file and uses an OpenCL library function. If you missed a step somewhere, the compiler will spit out some errors at you.

Make a new file. Call it “main.c”. Paste the following in main.c:

#include “CL/cl.h”

int main(int argc, char **argv){
cl_platform_id test;
cl_uint num;
cl_uint ok = 1;
clGetPlatformIDs(ok, &test, &num);

return 0;

Try to compile now. If all is well, you’ll get 0 warnings and 0 errors (depending on your warning settings, which I didn’t cover, you may get two warnings about argc and argv. You can safely ignore those.)

Now, don’t get the wrong impression here. I am not the type of person to tell somebody to just paste some code and compile. You won’t learn that way and I’m here to help you learn. I will explain what’s going on in the next tutorial and don’t want to be redundant in my articles. This is why I choose to simply have you paste this code in main.c to test your configuration.


Posted in Open CL, PARALLEL, Project Related | Leave a Comment »


Posted by Hemprasad Y. Badgujar on October 1, 2012


“The OpenCL specifications” by the Khronos Group

Format: PDF
File Size: 3.3MB
Digital: 377 pages
Price: Free
Publisher: Khronos Group
Author: Aaftab Munshi (Editor)
Published Date: 15 November 2011 (version 1.2, revision 15)
OpenCL-version: 1.2

As a specifications-document, you cannot expect a nice piece of prose, but most of the knowledge you need to know is in it. There are certainly some gaps (especially in clear explanation), but every version is getting better. When studying other sources, always have this document with you as a reference. I printed it as two pages per side (A4).

Read chapters 1 to 3, and leave the rest as a reference. Other books explain the long, long lists of language specifications in a nicer form. Also the most you’ll better learn by doing.

“The OpenCL Programming Book” by Fixstars

Two version available, the 1.0 version and the 1.2 version. To start with the 1.2:

Format: PDF
File Size: 3.2MB
Pages: 325
Price: USD 19.50
Publisher: Fixstars Corporation
Authors: Ryoji Tsuchiyama, Takashi Nakamura, Takuro Iizuka, Aki Asahara, Satoshi Miki, Jeongdo Son and Satoshi Miki. Satoru Tagawa (translator)
Published Date: January 2012
OpenCL-version: 1.2


Format: PDF
File Size: 3.49MB
Pages: 246
Price: free
Publisher: Fixstars Corporation
Authors: Ryoji Tsuchiyama, Takashi Nakamura, Takuro Iizuka, Akihiro Asahara, Satoshi Miki
Published Date: 31 March 2010
OpenCL-version: 1.0

1.0-version: It seems to be translated from Japanese to English, but except some small typos and spelling errors the book is very easy to read. The book explains the chapters you could skip in Khronos’ specifications-document, but certainly is not complete since it discusses OpenCL 1.0 and has a focus on the basics. The parts where that build up a program step-by-step is a bit annoying to read, because they repeat the whole program again while only a few lines have changed. The book would be more like 180-200 pages if written more compact.

1.2-version: Thicker, more up to date and a promise there are less translation-errors.

Full review later.

Heterogeneous Computing with OpenCL by Benedict Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry & Dana Schaa

Format: print 
Pages: 400 (approx.)
Price: USD 69.95
Publisher: Morgan Kaufmann
Authors: Benedict Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry & Dana Schaa
Published Date: Sept 2011
OpenCL-version: 1.1

This is where we all chose OpenCL for: hybrid processors. And this book dives into that world completely, so we actually learn a lot new stuff about the advantages of having a GPU on your lap.

Full review later.

OpenCL in Action by Matthew Scarpino

Format: PDF and/or print
File Size: 8.1MB
Pages: 475 (approx.)
Price: USD 47.99 (e-book). USD 59.99 (p-book + e-book)
Publisher: Addison-Wesley Professional
Authors: Matthew Scarpino
Published Date: non-final version updated regularly, target November 2011
OpenCL-version: 1.1

Just like the above book, “OpenCL in Action” is work-in-progress and you can read along. Matthew Scarpino also wrote SWT/JFace In Action and Programming the Cell-processor, has a profession in Linux and has much experience in IT. The book seems to target an audience who want a more practical guide to learn OpenCL.

Full review later.

“OpenCL Programming Guide” by Aaftab Munshi, Benedict Gaster, Timothy G. Mattson and Dan Ginsburg

Format: PDF and/or print
File Size: ??MB
Pages: 648
Price: USD 35.19 (e-book), USD 43.99 (print), USD 59.39 (bundle)
Publisher: Addison-Wesley Professional
Authors: Aaftab Munshi (Apple, Khronos Group), Benedict Gaster (AMD), Timothy G. Mattson, Dan Ginsburg
Published Date: August 2011
OpenCL-version: 1.1
Homepage: and

Aaftab Munshi is also responsible for the OpenCL-specifications, so he probably knows where he’s talking about.

The 648 pages it is quite bigger than the targeted 480. Currently this is a very good replacement for Fixstars’ book. Disadvantage is that sending the printed book overseas (not USA/Canada) is much too expensive and people from the Eurasian continent, Africa and Latin America should just print it locally – looking into that to find better options.

Full review later.

“Programming Massively Parallel Processors” by David B. Kirk and Wen-mei W. Hwu

Format: Acid-free paper book
Pages: 258 pages
Price: USD 46.40
Publisher: Morgan Kaufmann
Authors: David B. Kirk (NVIDIA) and Wen-mei W. Hwu (University of Illinois)
Published Date: 28 January 2010
OpenCL-version: 1.0

The book claims to discuss both OpenCL and CUDA, but actually there was a chapter added after most of the book was written and the focus is strong towards NVIDIA hardware. It is a nice book for people who need to learn to program CUDA-only software/hardware and don’t want a book that’s too hard to understand. There are assignments at the end of each chapter and important subjects are explained to the bottom, so you don’t need to have a hard time with those assignments. After you read the book you have learned that initialisation of OpenCL programs is tedious and know a lot about optimising kernels for NVIDIA GPUs.

It is not good for people interested in OpenCL-compliant architectures from AMD, ARM and IBM besides NVIDIA’s. It is one of the best resources to understand NVIDIA architectures from a view of a GPGPU-programmer.

Posted in Computer Languages, CUDA, GPU (CUDA), Open CL, OpenCL, PARALLEL, Project Related | Leave a Comment »

Extracts from a Personal Diary

dedicated to the life of a silent girl who eventually learnt to open up

Num3ri v 2.0

I miei numeri - seconda versione


Just another site

Algunos Intereses de Abraham Zamudio Chauca

Matematica, Linux , Programacion Serial , Programacion Paralela (CPU - GPU) , Cluster de Computadores , Software Cientifico




A great site

Travel tips

Travel tips

Experience the real life.....!!!

Shurwaat achi honi chahiye ...

Ronzii's Blog

Just your average geek's blog

Karan Jitendra Thakkar

Everything I think. Everything I do. Right here.

Chetan Solanki

Helpful to u, if u need it.....


Explorer of Research #HEMBAD


Explorer of Research #HEMBAD


A great site


This is My Space so Dont Mess With IT !!

%d bloggers like this: