# Something More for Research


# Archive for the ‘GPU (CUDA)’ Category

## Deep Learning Software/Framework Links

1. Theano – CPU/GPU symbolic expression compiler in python (from MILA lab at University of Montreal)
2. Torch – provides a Matlab-like environment for state-of-the-art machine learning algorithms in lua (from Ronan Collobert, Clement Farabet and Koray Kavukcuoglu)
3. Pylearn2 – Pylearn2 is a library designed to make machine learning research easy.
4. Blocks – A Theano framework for training neural networks
5. Tensorflow – TensorFlow™ is an open source software library for numerical computation using data flow graphs.
6. MXNet – MXNet is a deep learning framework designed for both efficiency and flexibility.
7. Caffe – Caffe is a deep learning framework made with expression, speed, and modularity in mind.
8. Lasagne – Lasagne is a lightweight library to build and train neural networks in Theano.
9. Keras – A Theano-based deep learning library.
10. Deep Learning Tutorials – examples of how to do Deep Learning with Theano (from LISA lab at University of Montreal)
11. Chainer – A GPU based Neural Network Framework
12. DeepLearnToolbox – A Matlab toolbox for Deep Learning (from Rasmus Berg Palm)
13. Cuda-Convnet – A fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks. It can model arbitrary layer connectivity and network depth. Any directed acyclic graph of layers will do. Training is done using the back-propagation algorithm.
14. Deep Belief Networks. Matlab code for learning Deep Belief Networks (from Ruslan Salakhutdinov).
15. RNNLM – Tomas Mikolov’s Recurrent Neural Network based Language Modeling Toolkit.
16. RNNLIB – RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition.
17. matrbm. Simplified version of Ruslan Salakhutdinov’s code, by Andrej Karpathy (Matlab).
18. deeplearning4j– Deeplearning4J is an Apache 2.0-licensed, open-source, distributed neural net library written in Java and Scala.
19. Estimating Partition Functions of RBM’s. Matlab code for estimating partition functions of Restricted Boltzmann Machines using Annealed Importance Sampling (from Ruslan Salakhutdinov).
20. Learning Deep Boltzmann Machines Matlab code for training and fine-tuning Deep Boltzmann Machines (from Ruslan Salakhutdinov).
21. The LUSH programming language and development environment, which is used @ NYU for deep convolutional networks
22. Eblearn.lsh is a LUSH-based machine learning library for doing Energy-Based Learning. It includes code for “Predictive Sparse Decomposition” and other sparse auto-encoder methods for unsupervised learning. Koray Kavukcuoglu provides Eblearn code for several deep learning papers on this page.
23. deepmat– Deepmat, Matlab based deep learning algorithms.
24. MShadow – MShadow is a lightweight CPU/GPU Matrix/Tensor Template Library in C++/CUDA. The goal of MShadow is to provide an efficient, device-invariant, and simple tensor library for machine learning projects, aiming for both simplicity and performance. It supports CPU/GPU/multi-GPU and distributed systems.
25. CXXNET – CXXNET is fast, concise, distributed deep learning framework based on MShadow. It is a lightweight and easy extensible C++/CUDA neural network toolkit with friendly Python/Matlab interface for training and prediction.
26. Nengo – Nengo is a graphical and scripting based software package for simulating large-scale neural systems.
27. Eblearn is a C++ machine learning library with a BSD license for energy-based learning, convolutional networks, vision/recognition applications, etc. EBLearn is primarily maintained by Pierre Sermanet at NYU.
28. cudamat is a GPU-based matrix library for Python. Example code for training Neural Networks and Restricted Boltzmann Machines is included.
29. Gnumpy is a Python module that interfaces in a way almost identical to numpy, but does its computations on your computer’s GPU. It runs on top of cudamat.
30. The CUV Library (github link) is a C++ framework with python bindings for easy use of Nvidia CUDA functions on matrices. It contains an RBM implementation, as well as annealed importance sampling code and code to calculate the partition function exactly (from AIS lab at University of Bonn).
31. 3-way factored RBM and mcRBM is python code calling CUDAMat to train models of natural images (from Marc’Aurelio Ranzato).
32. Matlab code for training conditional RBMs/DBNs and factored conditional RBMs (from Graham Taylor).
33. mPoT is python code using CUDAMat and gnumpy to train models of natural images (from Marc’Aurelio Ranzato).
34. neuralnetworks is a java based gpu library for deep learning algorithms.
35. ConvNet is a matlab based convolutional neural network toolbox.
36. Elektronn is a deep learning toolkit that makes powerful neural networks accessible to scientists outside the machine learning community.
37. OpenNN is an open source class library written in C++ programming language which implements neural networks, a main area of deep learning research.
38. NeuralDesigner  is an innovative deep learning tool for predictive analytics.
39. Theano Generalized Hebbian Learning.

## OpenCV 3.1 with CUDA, Qt, and Python: Complete Installation on Windows with Extra Modules


The description here was tested on Windows 8.1 Pro. Nevertheless, it should also work on any other relatively modern version of Windows OS. If you encounter errors after following the steps described below, feel free to contact me.

Note :  To use the OpenCV library you have two options: Installation by Using the Pre-built Libraries or Installation by Making Your Own Libraries from the Source Files. While the first one is easier to complete, it only works if you are coding with the latest Microsoft Visual Studio IDE and doesn’t take advantage of the most advanced technologies we integrate into our library.

I am going to skip Installation by Using the Pre-built Libraries, as it is easy to install even for a new user. So let’s work on Installation by Making Your Own Libraries from the Source Files (if you are building your own libraries, you can take the source files from the OpenCV Git repository). Building the OpenCV library from scratch requires a couple of tools installed beforehand:

• ### IDE : Microsoft Visual Studio. However, you can use any other IDE that has a valid C/C++ compiler.

Installing by downloading from the product website: start installing Visual Studio by going to Visual Studio Downloads on the MSDN website and then choosing the edition you want to download. Here we are going to use Visual Studio 2012 / ISO keys with Visual Studio 2012 Update 4 / ISO and Step By Step Installing Visual Studio Professional 2012.
• ### Make Tool : CMake is a cross-platform, open-source build system.

Download and install the latest stable binary version; here we are going to use CMake 3. Choose the Windows installer (cmake-x.y.z-win32.exe) and install it. Letting the CMake installer add itself to your path will make things easier but is not required.

The Open Source Computer Vision Library has >2500 algorithms, extensive documentation and sample code for real-time computer vision. It works on Windows, Linux, Mac OS X, Android and iOS.

• ### Python and Python libraries : Installation notes

• It is recommended to uninstall any other Python distribution before installing Python(x,y)
• You may update your Python(x,y) installation via individual package installers which are updated more frequently — see the plugins page
• Please use the Issues page to request new features or report unknown bugs
• Python(x,y) can be easily extended with other Python libraries because Python(x,y) is compatible with all Python modules installers: distutils installers (.exe), Python eggs (.egg), and all other NSIS (.exe) or MSI (.msi) setups which were built for Python 2.7 official distribution – see the plugins page for customizing options
• Another Python(x,y) exclusive feature: all packages are optional (i.e. install only what you need)
• Basemap users (data plotting on map projections): please see the AdditionalPlugins
• ### Sphinx is a python documentation generator

After installation, you should add the Python executable directories to the PATH environment variable in order to run Python and package commands such as sphinx-build easily from the Command Prompt.

1. Right-click the “My Computer” icon and choose “Properties”

2. Click the “Environment Variables” button under the “Advanced” tab

3. If “Path” (or “PATH”) is already an entry in the “System variables” list, edit it. If it is not present, add a new variable called “PATH”.


4. Add these paths, separating entries by “;”:

• C:\Python27 – this folder contains the main Python executable
• C:\Python27\Scripts – this folder will contain executables added by Python packages installed with pip (see below)

This is for Python 2.7. If you use another version of Python or installed to a non-default location, change the digits “27” accordingly.

• Now run the Command Prompt. After the command prompt window appears, type python and press Enter. If the Python installation was successful, the installed Python version is printed and you are greeted by the >>> prompt. Type Ctrl+Z and Enter to quit.


• ### Install the pip command

Python has a very useful pip command which can download and install 3rd-party libraries with a single command. This is provided by the Python Packaging Authority (PyPA): https://groups.google.com/forum/#!forum/pypa-dev

To install pip, download https://bootstrap.pypa.io/get-pip.py and save it somewhere. After download, invoke the command prompt, go to the directory with get-pip.py and run this command:

C:\> python get-pip.py


Now pip command is installed. From there we can go to the Sphinx install.

Note: pip is included in the official Python installers since Python 3.4.0 and Python 2.7.9.
• ### Installing Sphinx with pip

If you finished the installation of pip, type this line in the command prompt:

C:\> pip install sphinx


After installation, type sphinx-build -h on the command prompt. If everything worked fine, you will get a Sphinx version number and a list of options for this command.

That’s it. Installation is over. Head to First Steps with Sphinx to make a Sphinx project.


• ### Install the easy_install command

Python has a very useful easy_install command which can download and install 3rd-party libraries with a single command. This is provided by the “setuptools” project: https://pypi.python.org/pypi/setuptools.

To install setuptools, download https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py and save it somewhere. After download, invoke the command prompt, go to the directory with ez_setup.py and run this command:

C:\> python ez_setup.py


Now setuptools and its easy_install command are installed. From there we can go to the Sphinx install.

### Installing Sphinx with easy_install

If you finished the installation of setuptools, type this line in the command prompt:
C:\> easy_install sphinx


After installation, type sphinx-build on the command prompt. If everything worked fine, you will get a Sphinx version number and a list of options for this command.

• Numpy is a scientific computing package for Python. Required for the Python interface.

Try the (unofficial) binaries in this site: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy
You can get numpy 1.6.2 x64, with or without Intel MKL libs, for Python 2.7.

I suggest WinPython, a Python 2.7 distribution for Windows with both 32- and 64-bit versions.

• ### Numpy : Required for the Python interface. The Python(x,y) installation above already includes the NumPy and SciPy libraries.

• Go to the TBB download page to download the open source binary releases. I chose the Commercial Aligned Release, because it has the most stable releases. I downloaded TBB 4.3 Update 3, specifically tbb43_20141204os for Windows. The release has the header files as well as the import library and DLL files prebuilt for Microsoft Visual C++ 8.0 and 9.0 on both x86 (IA32) and x64 (Intel64). If you need the source code of TBB, you can try the stable or development releases.
• Install TBB
• Extract the files in the zip file to a local directory, for example, C:\TBB. You should find tbb22_013oss under it. This is the installation directory, and doc, example, include etc should be directly under the installation folder.
• Set a Windows environment variable TBB22_INSTALL_DIR to the above directory, e.g., C:\TBB\tbb22_013oss.
• Develop with TBB
• Add $(TBB22_INSTALL_DIR)\include to your C++ project’s additional include directories.
• Add $(TBB22_INSTALL_DIR)\<arch>\<compiler>\lib (e.g., $(TBB22_INSTALL_DIR)\ia32\vc9\lib) to your project’s additional library directories.
• Add tbb.lib (Release) or tbb_debug.lib (Debug) to your project’s additional dependencies.
• Write your C++ code to use TBB.
• Deploy with TBB
• The TBB runtime is in the TBB DLLs (tbb.dll/tbbmalloc.dll/tbbmalloc_proxy.dll for Release, tbb_debug.dll/tbbmalloc_debug.dll/tbbmalloc_proxy_debug.dll for Debug). They can be found in $(TBB22_INSTALL_DIR)\<arch>\<compiler>\bin.
• Your executable should have these DLLs in the same folder for execution.

### Intel® Integrated Performance Primitives (IPP) may be used to improve the performance of color conversion. (Paid)

Intel Parallel Studio XE 2015 – Cluster Edition includes everything in the Professional edition (compilers, performance libraries, parallel models, performance profiler, threading design/prototyping, and memory & thread debugger). It adds a MPI cluster communications library, along with MPI error checking and tuning to design, build, debug and tune fast parallel code that includes MPI.

• ### Eigen is a C++ template library for linear algebra.

#### How do I “install” Eigen?

In order to use Eigen, you just need to download and extract Eigen‘s source code (see the wiki for download instructions). In fact, the header files in the Eigen subdirectory are the only files required to compile programs using Eigen. The header files are the same for all platforms. It is not necessary to use CMake or install anything.

#### A simple first program

Here is a rather simple program to get you started.

#include <iostream>
#include <Eigen/Dense>

using Eigen::MatrixXd;

// Compile with the Eigen headers on the include path, e.g.
//   g++ -I /path/to/eigen my_program.cpp -o my_program
int main()
{
  MatrixXd m(2,2);
  m(0,0) = 3;
  m(1,0) = 2.5;
  m(0,1) = -1;
  m(1,1) = m(1,0) + m(0,1);
  std::cout << m << std::endl;
}
• ### Installing CUDA Development Tools

The setup of CUDA development tools on a system running the appropriate version of Windows consists of a few simple steps:

• Verify the system has a CUDA-capable GPU.
• Install the NVIDIA CUDA Toolkit.
• Test that the installed software runs correctly and communicates with the hardware.
• CUDA Toolkit will allow you to use the power lying inside your GPU. We are going to use the CUDA 7.5 Toolkit.

To verify that your GPU is CUDA-capable, open the Control Panel (Start > Control Panel) and double click on System. In the System Properties window that opens, click the Hardware tab, then Device Manager. Expand the Display adapters entry. There you will find the vendor name and model of your graphics card. If it is an NVIDIA card that is listed in http://developer.nvidia.com/cuda-gpus, your GPU is CUDA-capable.

The Release Notes for the CUDA Toolkit also contain a list of supported products.

Choose the platform you are using and download the NVIDIA CUDA Toolkit

The CUDA Toolkit contains the CUDA driver and tools needed to create, build and run a CUDA application as well as libraries, header files, CUDA samples source code, and other resources.

Install the CUDA Software

Before installing the toolkit, you should read the Release Notes, as they provide details on installation and software functionality.

Note: The driver and toolkit must be installed for CUDA to function. If you have not installed a stand-alone driver, install the driver from the NVIDIA CUDA Toolkit.

### Graphical Installation

Install the CUDA Software by executing the CUDA installer and following the on-screen prompts.

### Silent Installation

Alternatively, the installer can be executed in silent mode by executing the package with the -s flag. Additional flags can be passed which will install specific subpackages instead of all packages. Allowed subpackage names are: CUDAToolkit_6.5, CUDASamples_6.5, CUDAVisualStudioIntegration_6.5, and Display.Driver. For example, to install only the driver and the toolkit components:

.exe -s CUDAToolkit_6.5 Display.Driver

This will drastically improve performance for some algorithms (e.g. the HOG descriptor). Getting more and more of our algorithms to work on the GPUs is a constant effort of the OpenCV team.

• ### JRE : Java Runtime Environment

Installing Ant: the binary distribution of Ant consists of the following directory layout:

  ant
+--- bin  // contains launcher scripts
|
+--- lib  // contains Ant jars plus necessary dependencies
|
+--- docs // contains documentation
|      |
|      +--- images  // various logos for html documentation
|      |
|      +--- manual  // Ant documentation (a must read ;-)
|
+--- etc // contains xsl goodies to:
//   - create an enhanced report from xml output of various tasks.
//   - migrate your build files and get rid of 'deprecated' warning
//   - ... and more ;-)


Only the bin and lib directories are required to run Ant. To install Ant, choose a directory and copy the distribution files there. This directory will be known as ANT_HOME.

Before you can run Ant there is some additional set up you will need to do unless you are installing the RPM version from jpackage.org:

• Add the bin directory to your path.
• Set the ANT_HOME environment variable to the directory where you installed Ant. On some operating systems, Ant’s startup scripts can guess ANT_HOME(Unix dialects and Windows NT/2000), but it is better to not rely on this behavior.
• Optionally, set the JAVA_HOME environment variable (see the Advanced section below). This should be set to the directory where your JDK is installed.

Operating System-specific instructions for doing this from the command line are in the Windows, Linux/Unix (bash), and Linux/Unix (csh) sections. Note that using this method, the settings will only be valid for the command line session you run them in. Note: Do not install Ant’s ant.jar file into the lib/ext directory of the JDK/JRE. Ant is an application, whilst the extension directory is intended for JDK extensions. In particular there are security restrictions on the classes which may be loaded by an extension.

 Windows Note: The ant.bat script makes use of three environment variables – ANT_HOME, CLASSPATH and JAVA_HOME. Ensure that ANT_HOME and JAVA_HOME variables are set, and that they do not have quotes (either ‘ or “) and they do not end with \ or with /. CLASSPATH should be unset or empty.

### Check Installation

You can check the basic installation by opening a new shell and typing ant. You should get a message like this:

Buildfile: build.xml does not exist!
Build failed


So Ant works. This message is there because you need to write an individual buildfile for your project. With ant -version you should get an output like

Apache Ant(TM) version 1.9.2 compiled on July 8 2013


If this does not work ensure your environment variables are set right. They must resolve to:

• required: %ANT_HOME%\bin\ant.bat
• optional: %JAVA_HOME%\bin\java.exe
• required: %PATH%=…maybe-other-entries…;%ANT_HOME%\bin;…maybe-other-entries

ANT_HOME is used by the launcher script for finding the libraries. JAVA_HOME is used by the launcher for finding the JDK/JRE to use. (JDK is recommended as some tasks require the java tools.) If not set, the launcher tries to find one via the %PATH% environment variable. PATH is set for user convenience. With that set you can just start ant instead of always typing the/complete/path/to/your/ant/installation/bin/ant.

• ### OpenNI Framework contains a set of open source APIs that provide support for natural interaction with devices.

Guideline: how to install OpenNI 2.0 + NiTE 2.0 + Kinect SDK 1.8 on Windows 7 32/64-bit

UPDATED [14th January 2013]



• #### Step 2

a. Install Kinect for Windows SDK. b. Install OpenNI 2.2 SDK, 32-bit / 64-bit (install both 64-bit and 32-bit if you are using Win64). c. Install NiTE 2.2 (install both 64-bit and 32-bit if you are using Win64).

• Step 3
Run NiViewer.exe for Testing.  It will be located at

64bits
~ (OpenNI ) C:\Program Files\OpenNI2\Samples\Bin
~ (Prime Sensor) C:\Program Files\PrimeSense\NiTE2\Samples\Bin

32bits
~ (OpenNI) C:\Program Files (x86)\OpenNI2\Samples\Bin
~ (Prime Sensor) C:\Program Files (x86)\PrimeSense\NiTE2\Samples\Bin

Programmer Guide
http://www.openni.org/openni-programmers-guide/
• ### MiKTeX is the best TeX implementation on the Windows OS, but we will use LyX

Install the programs found below in the order shown.

1. Install MiKTeX. This is the core program used by LyX to output nice documents. After installing it, make sure that MiKTeX is able to install missing packages on-the-fly. To do this, go to Start→Programs→MiKTeX→MiKTeX Options→General→Package installation→Ask me first.
Note: You should be able to use other LaTeX packages instead of MiKTeX. One example is TeXLive, a CD-based version.
2. Install Ghostscript. This is used to create Postscript documents, as well as for viewing .ps and .eps files inside LyX.
3. Install ImageMagick (choose the file “ImageMagick-6.x.x-Q16-windows-dll.exe”). ImageMagick is used to convert images into formats that LyX can understand.
4. Optionally, install GSview, which lets you view Postscript files. This is recommended.
5. Optionally, install a PDF viewer, if you do not already have one. This is also recommended. Adobe Reader is the most common. Alternates include the more compact Foxit Reader and GSview, which shows PDF files in addition to Postscript.
6. Download these math fonts and install them in the Windows fonts folder. To install the fonts use Start→Settings→Control Panel→Fonts and then File→Install New Font. These fonts are used to display symbols in the LyX math editor.
7. Finally, install LyX 1.4.1 for Windows. Installing it last ensures that it can find the programs listed above. If you install or update one of those programs later, go to Reconfigure in the Tools menu to make LyX look for it.

• ### Qt for GUI (for x64) and Visual Studio Add-in 1.2.4 for Qt5

Extract it into a directory like C:\Qt. Open a shell, cd to the directory containing the extracted Qt files, and enter the following command:

configure.exe -release -no-webkit -no-phonon -no-phonon-backend -no-script -no-scripttools
-no-qt3support -no-multimedia -no-ltcg

This will take around 10 minutes. Then enter the next command, which will take a lot longer (it can easily take more than a full hour): make. Create a system environment variable QTDIR pointing to C:\Qt\qt-everywhere-opensource-src-4.8.2 and add %QTDIR%\bin at the front of the system PATH.

setx -m QTDIR D:/OpenCV/dep/qt/qt-everywhere-opensource-src

## CUDA Unified Memory

CUDA is the language of NVIDIA GPUs.  To extract maximum performance from GPUs, you’ll want to develop applications in CUDA.

CUDA Toolkit is the primary IDE (integrated development environment) for developing CUDA-enabled applications.  The main roles of the Toolkit IDE are to simplify the software development process, maximize software developer productivity, and provide features that enhance GPU performance.  The Toolkit has been steadily evolving in tandem with GPU hardware and currently sits at Version 6.5.

One of the most important features of CUDA 6.5 is Unified Memory (UM).  (UM was actually first introduced in CUDA v.6.0).  CPU host memory and GPU device memory are physically separate entities, connected by a relatively slow PCIe bus.  Prior to v.6.0, data elements shared in both CPU and GPU memory required two copies – one copy in CPU memory and one copy in GPU memory.  Developers had to allocate memory on the CPU, allocate memory on the GPU, and then copy data from CPU to GPU and from GPU to CPU.  This dual data management scheme added complexity to programs, opportunities for the introduction of software bugs, and an excessive focus of time and energy on data management tasks.

UM corrects this.  UM creates a memory pool that is shared between CPU and GPU, with a single memory address space and single pointers accessible to both host and device code.  The CUDA driver and runtime libraries automatically handle data transfers between host and device memory, thus relieving developers from the burden of explicitly managing those data transfers.  UM improves performance by automatically providing data locality on the CPU or GPU, wherever it might be required by the application algorithm.  UM also guarantees global coherency of data on host and device, thus reducing the introduction of software bugs.

Let’s explore some sample code that illustrates these concepts.  We won’t concern ourselves with the function of this algorithm; instead, we’ll just focus on the syntax. (Credit to Nvidia for this C/CUDA template example).

Without Unified Memory


#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct DataElement {
  char *name;
  int value;
};

__global__ void Kernel(DataElement *elem) {
  printf("On device: name=%s, value=%d\n", elem->name, elem->value);
  elem->name[0] = 'd';
  elem->value++;
}

void launch(DataElement *elem) {
  DataElement *d_elem;
  char *d_name;

  int namelen = strlen(elem->name) + 1;

  // Allocate memory on GPU
  cudaMalloc(&d_elem, sizeof(DataElement));
  cudaMalloc(&d_name, namelen);

  // Copy data from CPU to GPU
  cudaMemcpy(d_elem, elem, sizeof(DataElement), cudaMemcpyHostToDevice);
  cudaMemcpy(d_name, elem->name, namelen, cudaMemcpyHostToDevice);
  cudaMemcpy(&(d_elem->name), &d_name, sizeof(char*), cudaMemcpyHostToDevice);

  // Launch kernel
  Kernel<<< 1, 1 >>>(d_elem);

  // Copy data from GPU to CPU
  cudaMemcpy(&(elem->value), &(d_elem->value), sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(elem->name, d_name, namelen, cudaMemcpyDeviceToHost);

  cudaFree(d_name);
  cudaFree(d_elem);
}

int main(void) {
  DataElement *e;

  // Allocate memory on CPU
  e = (DataElement*)malloc(sizeof(DataElement));
  e->value = 10;

  // Allocate memory on CPU
  e->name = (char*)malloc(sizeof(char) * (strlen("hello") + 1));
  strcpy(e->name, "hello");

  launch(e);

  printf("On host: name=%s, value=%d\n", e->name, e->value);

  free(e->name);
  free(e);
  cudaDeviceReset();
}

Note these key points:

• main(): allocate memory on the CPU with malloc
• launch(): allocate memory on the GPU with cudaMalloc
• launch(): copy data from CPU to GPU with cudaMemcpy
• launch(): run the kernel
• launch(): copy results from GPU back to CPU with cudaMemcpy

With Unified Memory

#include <string.h>
#include <stdio.h>

struct DataElement {
  char *name;
  int value;
};

__global__ void Kernel(DataElement *elem) {
  printf("On device: name=%s, value=%d\n", elem->name, elem->value);
  elem->name[0] = 'd';
  elem->value++;
}

void launch(DataElement *elem) {
  // Launch kernel
  Kernel<<< 1, 1 >>>(elem);
  cudaDeviceSynchronize();
}

int main(void) {
  DataElement *e;

  // Allocate unified memory, visible to both CPU and GPU
  cudaMallocManaged((void**)&e, sizeof(DataElement));
  e->value = 10;

  // Allocate unified memory, visible to both CPU and GPU
  cudaMallocManaged((void**)&(e->name), sizeof(char) * (strlen("hello") + 1));
  strcpy(e->name, "hello");

  launch(e);

  printf("On host: name=%s, value=%d\n", e->name, e->value);

  cudaFree(e->name);
  cudaFree(e);
  cudaDeviceReset();
}

Note these key points:

• main(): allocate unified memory, shared by CPU and GPU, with cudaMallocManaged
• launch(): run the kernel directly on the shared pointers, with no explicit copies

With UM, memory is allocated on the CPU and GPU in a single address space and managed with a single pointer.  Note how the “malloc’s” and “cudaMalloc’s” are condensed into single calls to cudaMallocManaged().  Furthermore, explicit cudaMemcpy() data transfers between CPU and GPU are eliminated, as the CUDA runtime handles these transfers automatically in the background. Collectively these actions simplify code development, code maintenance, and data management.

As software project managers, we like UM for the productivity enhancements it provides for our software development teams.  It improves software quality, reduces coding time, effort and cost, and enhances overall performance. As software engineers, we like UM because of reduced coding effort and the fact that we can focus time and effort on writing CUDA kernel code, where all the parallel performance comes from, instead of spending time on memory management tasks.  Unified Memory is a major step forward in GPU programming.

## CUDA Random Numbers

### CUDA Random Example

In order to use cuRAND, we need to add two include files into our program:

#include <curand.h>
#include <curand_kernel.h>


cuRAND uses a curandState_t type to keep track of the state of the random sequence. The normal C rand function also has a state, but it is global, and hidden from the programmer. This makes rand not thread-safe, but easier to use.

A curandState_t object must be initialized with a call to curand_init which has the following parameters:

• seed: The seed determines the beginning point of the sequence of random numbers.
• sequence: The sequence number is another seed-like value. It is used so that, if all cores have the same seed, but different sequence numbers, then they will get different random values.
• offset: The amount we skip ahead in the random sequence. This can be zero.
• state: A pointer to the curandState_t object to initialize.

Once we have an initialized curandState_t object, we can get random numbers with the curand function which takes a pointer to a curandState_t object and returns to us a random unsigned integer.

The following program uses these functions to generate random numbers:

#include <unistd.h>
#include <stdio.h>

/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>

#define MAX 100

/* this GPU kernel function calculates a random number and stores it in the parameter */
__global__ void random(int* result) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread  */
curandState_t state;

/* we have to initialize the state */
curand_init(0, /* the seed controls the sequence of random values that are produced */
0, /* the sequence number is only important with multiple cores */
0, /* the offset is how much extra we advance in the sequence for each call, can be 0 */
&state);

/* curand works like rand - except that it takes a state as a parameter */
*result = curand(&state) % MAX;
}

int main( ) {
/* allocate an int on the GPU */
int* gpu_x;
cudaMalloc((void**) &gpu_x, sizeof(int));

/* launch the kernel to generate a random number */
random<<<1, 1>>>(gpu_x);

/* copy the random number back */
int x;
cudaMemcpy(&x, gpu_x, sizeof(int), cudaMemcpyDeviceToHost);

printf("Random number = %d.\n", x);

/* free the memory we allocated */
cudaFree(gpu_x);

return 0;
}


When run, this program produces the exact same random number each time. This is because the seed passed in was 0. In order to get a different random number each time, we can pass in the current time as the seed.


#include <unistd.h>
#include <stdio.h>

/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>

/* time() is used to seed the generator */
#include <time.h>

#define MAX 100

/* this GPU kernel function calculates a random number and stores it in the parameter */
__global__ void random(unsigned int seed, int* result) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread  */
curandState_t state;

/* we have to initialize the state */
curand_init(seed, /* the seed controls the sequence of random values that are produced */
0, /* the sequence number is only important with multiple cores */
0, /* the offset is how much extra we advance in the sequence for each call, can be 0 */
&state);

/* curand works like rand - except that it takes a state as a parameter */
*result = curand(&state) % MAX;
}

int main( ) {
/* allocate an int on the GPU */
int* gpu_x;
cudaMalloc((void**) &gpu_x, sizeof(int));

/* launch the kernel to generate a random number */
random<<<1, 1>>>(time(NULL), gpu_x);

/* copy the random number back */
int x;
cudaMemcpy(&x, gpu_x, sizeof(int), cudaMemcpyDeviceToHost);

printf("Random number = %d.\n", x);

/* free the memory we allocated */
cudaFree(gpu_x);

return 0;
}


### Using Random Numbers Across Cores

If we want to get random numbers in multiple GPU cores, then we would need each core to have its own curandState_t.

If we want each run of the program to produce different sequences of random numbers, then we would need to set the seed to the current time.

However, now we would likely have each core get the same sequence of numbers. This is probably undesirable. To avoid it, we set the sequence parameter to the thread’s ID.

This way, each thread will have a different stream of random numbers, which will also be different each time the program is run.

The following program illustrates this by creating N curandState_t objects, then launching a GPU kernel to get N random numbers from them, in parallel.

#include <unistd.h>
#include <stdio.h>

/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>

/* time() is used to seed the generator */
#include <time.h>

#define N 25

#define MAX 100

/* this GPU kernel function is used to initialize the random states */
__global__ void init(unsigned int seed, curandState_t* states) {

/* we have to initialize the state */
curand_init(seed, /* the seed can be the same for each core, here we pass the time in from the CPU */
blockIdx.x, /* the sequence number should be different for each core (unless you
want all cores to get the same sequence of numbers for some reason) */
0, /* the offset is how much extra we advance in the sequence for each call, can be 0 */
&states[blockIdx.x]);
}

/* this GPU kernel takes an array of states, and an array of ints, and puts a random int into each */
__global__ void randoms(curandState_t* states, unsigned int* numbers) {
/* curand works like rand - except that it takes a state as a parameter */
numbers[blockIdx.x] = curand(&states[blockIdx.x]) % MAX;
}

int main( ) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread  */
curandState_t* states;

/* allocate space on the GPU for the random states */
cudaMalloc((void**) &states, N * sizeof(curandState_t));

/* invoke the GPU to initialize all of the random states */
init<<<N, 1>>>(time(0), states);

/* allocate an array of unsigned ints on the CPU and GPU */
unsigned int cpu_nums[N];
unsigned int* gpu_nums;
cudaMalloc((void**) &gpu_nums, N * sizeof(unsigned int));

/* invoke the kernel to get some random numbers */
randoms<<<N, 1>>>(states, gpu_nums);

/* copy the random numbers back */
cudaMemcpy(cpu_nums, gpu_nums, N * sizeof(unsigned int), cudaMemcpyDeviceToHost);

/* print them out */
for (int i = 0; i < N; i++) {
printf("%u\n", cpu_nums[i]);
}

/* free the memory we allocated for the states and numbers */
cudaFree(states);
cudaFree(gpu_nums);

return 0;
}


This program is also the first to use multiple GPU kernel functions.

### Random Distributions

In addition to the curand function which, together with modular arithmetic, can return to us random integers from any range we wish, cuRAND provides functions to get floating point numbers from different distributions:

__device__ float curand_uniform (curandState_t *state)

__device__ float curand_normal (curandState_t *state)


curand_uniform returns a random float uniformly distributed in the range (0.0, 1.0] (it excludes 0.0 and includes 1.0). This means that all floating point numbers in that range are equally likely to be produced.

curand_normal, by contrast, returns a normally distributed float with mean 0.0 and standard deviation 1.0, so values near 0.0 are the most likely and the result is not confined to a fixed range. Normal distributions are important for modelling many natural phenomena accurately.

## OpenCV: Color-spaces and splitting channels

### Conversion between color-spaces

Our goal here is to visualize each of the three channels of these color-spaces: RGB, HSV, YCrCb and Lab. In general, none of them are absolute color-spaces and the last three (HSV, YCrCb and Lab) are ways of encoding RGB information. Our images will be read in BGR (Blue-Green-Red), because of OpenCV defaults. For each of these color-spaces there is a mapping function and they can be found at OpenCV cvtColor documentation.
One important point: OpenCV's imshow() function always assumes that the Mat being shown is in the BGR color-space, which means we will always need to convert back to see what we want. Let's start.

#### HSV

While in BGR, an image is treated as an additive result of three base colors (blue, green and red), HSV stands for Hue, Saturation and Value (Brightness). We can say that HSV is a rearrangement of RGB in a cylindrical shape. The HSV ranges are:

• 0 ≤ H ≤ 360 ⇒ OpenCV range = H/2 (0 ≤ H ≤ 180)
• 0 ≤ S ≤ 1 ⇒ OpenCV range = 255*S (0 ≤ S ≤ 255)
• 0 ≤ V ≤ 1 ⇒ OpenCV range = 255*V (0 ≤ V ≤ 255)

#### YCrCb or YCbCr

It is widely used in video and image compression schemes. YCrCb stands for Luminance (sometimes written Y′, for luma), Red-difference and Blue-difference chroma components. The YCrCb ranges are:

• 0 ≤ Y ≤ 255
• 0 ≤ Cr ≤ 255
• 0 ≤ Cb ≤ 255

#### L*a*b

In this color-opponent space, L stands for the Luminance dimension, while a and b are the color-opponent dimensions. The Lab ranges are:

• 0 ≤ L ≤ 100 ⇒ OpenCV range = L*255/100 (0 ≤ L ≤ 255)
• -127 ≤ a ≤ 127 ⇒ OpenCV range = a + 128 (1 ≤ a ≤ 255)
• -127 ≤ b ≤ 127 ⇒ OpenCV range = b + 128 (1 ≤ b ≤ 255)

### Splitting channels

All the color-spaces mentioned above are built from three channels (dimensions). It is a good exercise to visualize each of these channels and realize what they really store. When I say that the third channel of HSV stores the brightness, what do you expect to see? Remember: a colored image is made of three channels (in our cases), so when we view each of them separately, what will the output be? If you said a grayscale image, you are correct! However, you might have seen these channels rendered as colored images out there. How? For that, we need to fix the other two channels at a constant value. Let's do this!
To visualize each channel with color, I used the same values used on the Slides 53 to 65 from CS143, Lecture 03 from Brown University.

RGB or BGR

HSV

YCrCb or YCbCr

Lab or CIE Lab

# Building VTK5 with Visual Studio

1. Download VTK 5.10.1 (VTK-5.10.1.zip) and unzip it to C:\VTK-5.10.1.
http://www.vtk.org/VTK/resources/software.html#previous
https://github.com/Kitware/VTK/tree/v5.10.1

## CMake

1. Specify the source code location and where to put the generated solution files.
• Where is the source code: C:\VTK-5.10.1
• Where to build the binaries: C:\VTK-5.10.1\build
2. Press [Configure] and select the target Visual Studio version.
3. Make the following settings:
• BUILD_SHARED_LIBS ☑ (check)
• BUILD_TESTING ☐ (uncheck)
• CMAKE_CONFIGURATION_TYPES Debug;Release
• CMAKE_INSTALL_PREFIX C:\Program Files\VTK (or C:\Program Files (x86)\VTK)
4. Add an entry: Name: CMAKE_DEBUG_POSTFIX, Type: STRING, Value: -gd

* A suffix appended to the file names of Debug builds.

5. Press [Generate] to output the solution file.

## Build

1. Start Visual Studio with administrative privileges and open the VTK solution file (C:\VTK-5.10.1\build\VTK.sln).
2. Modify the source code as follows:
• vtkOStreamWrapper.cxx
Line 60:

//VTKOSTREAM_OPERATOR(ostream&);
vtkOStreamWrapper& vtkOStreamWrapper::operator << (ostream& a) {
this->ostr << (void *)&a;
return *this;
}

Line 3925:

if (this->IFile->read(result, 80).fail())

Line 3944:

if (this->IFile->read(dummy, 8).fail())

Line 4001:

if (this->IFile->read(dummy, 4).fail())

Line 4008:

if (this->IFile->read((char*)result, sizeof(int)).fail())

Line 4025:

if (this->IFile->read(dummy, 4).fail())

Line 4048:

if (this->IFile->read(dummy, 4).fail())

Line 4055:

if (this->IFile->read((char*)result, sizeof(int)*numInts).fail())

Line 4072:

if (this->IFile->read(dummy, 4).fail())

Line 4095:

if (this->IFile->read(dummy, 4).fail())

Line 4102:

if (this->IFile->read((char*)result, sizeof(float)*numFloats).fail())

Line 4119:

if (this->IFile->read(dummy, 4).fail())

• vtkConvexHull2D.cxx
Line 31:

#include <algorithm>

• vtkNormalizeMatrixVectors.cxx
Line 30:

#include <algorithm>

• vtkPairwiseExtractHistogram2D.cxx
Line 39:

#include <algorithm>

• vtkControlPointsItem.cxx
Line 35:

#include <algorithm>

• vtkPiecewisePointHandleItem.cxx
Line 31:

#include <algorithm>

• vtkParallelCoordinatesRepresentation.cxx
Line 83:

#include <algorithm>

1. Build VTK (ALL_BUILD):
1. Set the solution configuration (Debug, Release).
2. Choose the ALL_BUILD project in Solution Explorer.
3. Press [Build] > [Build Solution] to build VTK.
2. Install VTK (INSTALL):
1. Choose the INSTALL project in Solution Explorer.
2. Press [Build] > [Project Only] > [Build Only INSTALL]. The necessary files are copied to the output destination specified by CMAKE_INSTALL_PREFIX.

## Environment Variable

1. Create an environment variable VTK_ROOT and set it to the VTK path (C:\Program Files\VTK).
2. Add %VTK_ROOT%\bin; to the Path environment variable.

# Building VTK6 with Visual Studio

1. Download VTK 6.1.0 (VTK-6.1.0.zip) and unzip it to C:\VTK-6.1.0.
http://www.vtk.org/VTK/resources/software.html#latestcand
https://github.com/Kitware/VTK/tree/v6.1.0

## CMake

1. Specify the source code location and where to put the generated solution files.
• Where is the source code: C:\VTK-6.1.0
• Where to build the binaries: C:\VTK-6.1.0\build
2. Press [Configure] and select the target Visual Studio version.
3. Make the following settings:
• BUILD_SHARED_LIBS ☑ (check)
• BUILD_TESTING ☐ (uncheck)
• CMAKE_CONFIGURATION_TYPES Debug;Release
• CMAKE_INSTALL_PREFIX C:\Program Files\VTK (or C:\Program Files (x86)\VTK)
4. Add an entry: Name: CMAKE_DEBUG_POSTFIX, Type: STRING, Value: -gd

* A suffix appended to the file names of Debug builds.

5. Press [Generate] to output the solution file.

## Build

1. Start Visual Studio with administrative privileges and open the VTK solution file (C:\VTK-6.1.0\build\VTK.sln).
2. Build VTK (ALL_BUILD):
1. Set the solution configuration (Debug, Release).
2. Choose the ALL_BUILD project in Solution Explorer.
3. Press [Build] > [Build Solution] to build VTK.
3. Install VTK (INSTALL):
1. Choose the INSTALL project in Solution Explorer.
2. Press [Build] > [Project Only] > [Build Only INSTALL]. The necessary files are copied to the output destination specified by CMAKE_INSTALL_PREFIX.

## Environment Variable

1. Create an environment variable VTK_DIR and set it to the VTK path (C:\Program Files\VTK).
2. Add %VTK_DIR%\bin; to the Path environment variable.

# Building VTK6 + Qt5 with Visual Studio

1. Download VTK 6.1.0 (VTK-6.1.0.zip) and unzip it to C:\VTK-6.1.0.
http://www.vtk.org/VTK/resources/software.html#latestcand
https://github.com/Kitware/VTK/tree/v6.1.0
2. Download and install Qt 5.4.0 with OpenGL. (C:\Qt)

• Qt 5.4.0 for Windows 32-bit (VS 2013, OpenGL, 694 MB)
(qt-opensource-windows-x86-msvc2013_opengl-5.4.0.exe)
• Qt 5.4.0 for Windows 64-bit (VS 2013, OpenGL, 709 MB)
(qt-opensource-windows-x86-msvc2013_64_opengl-5.4.0.exe)

## CMake

1. Specify the source code location and where to put the generated solution files.
• Where is the source code: C:\VTK-6.1.0
• Where to build the binaries: C:\VTK-6.1.0\build
2. Press [Configure] and select the target Visual Studio version.
3. Make the following settings (grouping the entries and checking Advanced is helpful). * For Win32 specify msvc2013_opengl; for x64, msvc2013_64_opengl.

Ungrouped Entries

• Qt5Core_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Core
• Qt5Designer_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Designer
• Qt5Gui_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Gui
• Qt5Network_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Network
• Qt5OpenGL_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5OpenGL
• Qt5Sql_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Sql
• Qt5WebKit_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5WebKit
• Qt5WebKitWidgets_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5WebKitWidgets
• Qt5Widgets_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Widgets
• Qt5Xml_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/cmake/Qt5Xml

BUILD

• BUILD_SHARED_LIBS ☑ (check)
• BUILD_TESTING ☐ (uncheck)

CMAKE

• CMAKE_CONFIGURATION_TYPES Debug;Release
• CMAKE_INSTALL_PREFIX C:\Program Files\VTK (or C:\Program Files (x86)\VTK)

Module

• Module_vtkGUISupportQt ☑ (check)
• Module_vtkGUISupportQtOpenGL ☑ (check)
• Module_vtkGUISupportQtSQL ☑ (check)
• Module_vtkGUISupportQtWebkit ☑ (check)
• Module_vtkRenderingQt ☑ (check)
• Module_vtkViewsQt ☑ (check)

OPENGL

• OPENGL_gl_LIBRARY opengl32
• OPENGL_glu_LIBRARY glu32

QT

• QT_MKSPECS_DIR C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/mkspecs/win32-msvc2013
• QT_QMAKE_EXECUTABLE C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/bin/qmake.exe
• QT_QTCORE_LIBRARY_DEBUG C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/Qt5Cored.lib
• QT_QTCORE_LIBRARY C:/Qt/Qt5.4.0/5.4/msvc2013_64_opengl/lib/Qt5Core.lib

VTK

• VTK_Group_Qt ☑ (check)
• VTK_INSTALL_QT_PLUGIN_DIR ${CMAKE_INSTALL_PREFIX}/${VTK_INSTALL_QT_DIR}
• VTK_QT_VERSION 5
4. Add entries:
Name: CMAKE_PREFIX_PATH, Type: PATH, Value: C:\Program Files (x86)\Windows Kits\8.1\Lib\winv6.3\um\x64 (or C:\Program Files (x86)\Windows Kits\8.1\Lib\winv6.3\um\x86)

* For Visual Studio 2013, the Windows Kits path is 8.1\Lib\winv6.3; for Visual Studio 2012, specify 8.0\Lib\Win8.

Name: CMAKE_DEBUG_POSTFIX, Type: STRING, Value: -gd

* A suffix appended to the file names of Debug builds.

5. Press [Generate] to output the solution file.

## Build

1. Start Visual Studio with administrative privileges and open the VTK solution file (C:\VTK-6.1.0\build\VTK.sln).
2. Build VTK (ALL_BUILD):
1. Set the solution configuration (Debug, Release).
2. Choose the ALL_BUILD project in Solution Explorer.
3. Press [Build] > [Build Solution] to build VTK.
3. Install VTK (INSTALL):
1. Choose the INSTALL project in Solution Explorer.
2. Press [Build] > [Project Only] > [Build Only INSTALL]. The necessary files are copied to the output destination specified by CMAKE_INSTALL_PREFIX.

## Environment Variable

1. Create an environment variable VTK_DIR and set it to the VTK path (C:\Program Files\VTK).
2. Create an environment variable QTDIR and set it to the Qt path (C:\Qt\Qt5.4.0\5.4\msvc2013_64_opengl\ (or C:\Qt\Qt5.4.0\5.4\msvc2013_opengl\)).
3. Add %VTK_DIR%\bin;%QTDIR%\bin; to the Path environment variable.

## Introduction

This article describes the step by step process of creating project template in Visual Studio 2012 and VSIX installer that deploys the project template. Each step contains an image snapshot that helps the reader to keep focused.

## Background

A number of predefined project and project item templates are installed when you install Visual Studio. You can use one of the many project templates to create the basic project container and a preliminary set of items for your application, class, control, or library. You can also use one of the many project item templates to create, for example, a Windows Forms application or a Web Forms page to customize as you develop your application.

You can create custom project templates and project item templates and have these templates appear in the New Project and Add New Item dialog boxes. The article describes the complete process of creating and deploying the project template.

## Using the Code

Here, I have taken a very simple example which contains nearly no code but this can be extended as per your needs.

## Create Project Template

First of all, create the piece (project or item) that looks like what you want to be created from the template we are going to build.

Then, export the template (we are going to use the exported template as a shortcut to build our Visual Studio template package):

## Visual Studio Project Templates

We are creating a project template here.

Fill all the required details:

A zip file should get created:

## Creating Visual Studio Package Project

To use VSIX projects, you need to install the Visual Studio 2012 VSSDK.

You should see new project template “Visual Studio Package” after installing SDK.

Select C# as our project template belongs to C#.

Provide details:

Currently, we don’t need unit test project but they are good to have.

In the solution, double-click the manifest, so designer opens.

Fill in all the tabs. The most important is Assets: here you give the path of our project template (DummyConsoleApplication.zip).

As a verification step, build the solution, you should see a .vsix being generated after its dependency project:

## Installing the Extension

Project template is located under “Visual C#” node.

## CUDA  with Visual Studio -The very first program

This post is my sharing about how to configure CUDA 5.0 (exactly 5.0.35) with Visual C++ Express 2010 on Windows 7. Besides, some other issues are mentioned, including how to compile a CUDA program, how to measure the runtime of a function or a part of the code, how to make Visual C++ and Visual Assist X aware of CUDA C++ code, and finally the answer to the question: "Is it possible to program (write code, compile only) on a non-CUDA machine?"

1. Installation

You need two programs, Visual C++ 2010 Express and CUDA 5 (32-bit or 64-bit based on your system). After downloading them, install Visual C++ first, then the CUDA library (choose all the options). There is nothing special about this step.

The files of a CUDA program are of two types: normal C++ source files (*.cpp, *.h, etc.) and CUDA C++ files (*.cu and *.cuh). The CUDA source files must be compiled by NVCC (a compiler from Nvidia), and the resulting binary code is combined with the code from the normal C++ files, which are compiled by the VS C++ compiler. So the question is how to make this compilation run smoothly. Here are the steps of writing a CUDA program:

+ Open VS C++ 2010 Express.

+ File->New->Project->Empty Project, enter the name of the project, Exp1.

+ In the Solution Explorer tab, add a new source file to your project: choose the C++ File (.cpp) type and type the name of the file as main.cu.

/**
* A matrix multiplication using the cuBLAS library.
*/


#include <cstdlib>
#include <iostream>
#include <string>

#include <time.h>

#include <cublas.h>

typedef float ScalarT;

// Some helper functions //
/**
* Calculates 1D index from row-major order to column-major order.
*/
#define index(r,c,rows) (((c)*(rows))+(r))

#define CudaSafeCall( err ) __cudaSafeCall( err, __FILE__, __LINE__ )
inline void __cudaSafeCall( cublasStatus err, const char *file, const int line )
{
if( err != CUBLAS_STATUS_SUCCESS )
{
std::cerr << "CUDA call failed at " << file << ":" << line << std::endl;
exit (EXIT_FAILURE);
}
}

#define AllocCheck( err ) __allocCheck( err, __FILE__, __LINE__ )
inline void __allocCheck( void* err, const char *file, const int line )
{
if( err == 0 )
{
std::cerr << "Allocation failed at " << file << ":" << line << std::endl;
exit (EXIT_FAILURE);
}
}

void printMat( const ScalarT* const mat, size_t rows, size_t columns, std::string prefix = "Matrix:" )
{
// Maximum to print
const size_t max_rows = 5;
const size_t max_columns = 16;

std::cout << prefix << std::endl;
for( size_t r = 0; r < rows && r < max_rows; ++r )
{
for( size_t c = 0; c < columns && c < max_columns; ++c )
{
std::cout << mat[index(r,c,rows)] << " ";
}
std::cout << std::endl;
}
}
// Main program //
int main( int argc, char** argv )
{
size_t HA = 4200;
size_t WA = 23000;
size_t WB = 1300;
size_t HB = WA;
size_t WC = WB;
size_t HC = HA;

size_t r, c;

cudaEvent_t tAllStart, tAllEnd;
cudaEvent_t tKernelStart, tKernelEnd;
float time;

// Prepare host memory and input data //
ScalarT* A = ( ScalarT* )malloc( HA * WA * sizeof(ScalarT) );
AllocCheck( A );
ScalarT* B = ( ScalarT* )malloc( HB * WB * sizeof(ScalarT) );
AllocCheck( B );
ScalarT* C = ( ScalarT* )malloc( HC * WC * sizeof(ScalarT) );
AllocCheck( C );

for( r = 0; r < HA; r++ )
{
for( c = 0; c < WA; c++ )
{
A[index(r,c,HA)] = ( ScalarT )index(r,c,HA);
}
}

for( r = 0; r < HB; r++ )
{
for( c = 0; c < WB; c++ )
{
B[index(r,c,HB)] = ( ScalarT )index(r,c,HB);
}
}

// Initialize cuBLAS //

cublasStatus status;
cublasInit();

// Prepare device memory //
ScalarT* dev_A;
ScalarT* dev_B;
ScalarT* dev_C;

status = cublasAlloc( HA * WA, sizeof(ScalarT), ( void** )&dev_A );
CudaSafeCall( status );

status = cublasAlloc( HB * WB, sizeof(ScalarT), ( void** )&dev_B );
CudaSafeCall( status );

status = cublasAlloc( HC * WC, sizeof(ScalarT), ( void** )&dev_C );
CudaSafeCall( status );

cudaEventCreate(&tAllStart);
cudaEventCreate(&tAllEnd);
cudaEventRecord(tAllStart, 0);

status = cublasSetMatrix( HA, WA, sizeof(ScalarT), A, HA, dev_A, HA );
CudaSafeCall( status );

status = cublasSetMatrix( HB, WB, sizeof(ScalarT), B, HB, dev_B, HB );
CudaSafeCall( status );

// Call cuBLAS function //
cudaEventCreate(&tKernelStart);
cudaEventCreate(&tKernelEnd);
cudaEventRecord(tKernelStart, 0);

// Use of cuBLAS constant CUBLAS_OP_N produces a runtime error!
const char CUBLAS_OP_N = 'n'; // 'n' indicates that the matrices are non-transposed.
cublasSgemm( CUBLAS_OP_N, CUBLAS_OP_N, HA, WB, WA, 1, dev_A, HA, dev_B, HB, 0, dev_C, HC ); // call for float
// cublasDgemm( CUBLAS_OP_N, CUBLAS_OP_N, HA, WB, WA, 1, dev_A, HA, dev_B, HB, 0, dev_C, HC ); // call for double
status = cublasGetError();
CudaSafeCall( status );

cudaEventRecord(tKernelEnd, 0);
cudaEventSynchronize(tKernelEnd);

cudaEventElapsedTime(&time, tKernelStart, tKernelEnd);
std::cout << "time (kernel only): " << time << " ms" << std::endl;

// Load result from device //
cublasGetMatrix( HC, WC, sizeof(ScalarT), dev_C, HC, C, HC );
CudaSafeCall( status );

cudaEventRecord(tAllEnd, 0);
cudaEventSynchronize(tAllEnd);

cudaEventElapsedTime(&time, tAllStart, tAllEnd);

std::cout << "time (incl. data transfer): " << time << " ms" << std::endl;

// Print result //
//printMat( A, HA, WA, “\nMatrix A:” );
//printMat( B, HB, WB, “\nMatrix B:” );
//printMat( C, HC, WC, “\nMatrix C:” );

// Free CUDA memory //
status = cublasFree( dev_A );
CudaSafeCall( status );

status = cublasFree( dev_B );
CudaSafeCall( status );

status = cublasFree( dev_C );
CudaSafeCall( status );

status = cublasShutdown();
CudaSafeCall( status );

// Free host memory //
free( A );
free( B );
free( C );

return EXIT_SUCCESS;
}



+ Configure the project as a CUDA project. In the Solution Explorer, right-click on the name of the project and choose Build Customizations; in the dialog that appears, check the CUDA 5.0 option, then OK.

+ Right-click on the CUDA code file (main.cu in this example) and choose Properties. In the dialog that appears, choose CUDA C/C++ as in the image below:

+ In the Property Manager tab (View->Property Manager), right-click on Microsoft.Cpp.Win32.user as in the image below and choose Properties.

+ Under VC++ Directories, add the paths to the CUDA include files, reference folders, and library files, as in the images below (do not close the dialog after this step):

+ In the Linker tree, choose Input and add the library files needed for CUDA programs as in the image below:

+ You will be asked to save the configuration (for all CUDA programs); choose Yes. The configuration steps (starting with the operations in the Property Manager above) are needed only once.
Now you can build your program (use Release option).

3. Timing measurement

In earlier versions of CUDA (before 5.0) there were two ways to measure the time of a program, a function, or a part of the program. But in CUDA 5 (to the best of my knowledge), there is only one way: using cudaEvent_t.

+ Declaration:

cudaEvent_t tAllStart, tAllEnd;
float time;

+ Start recording time information:

cudaEventCreate(&tAllStart);
cudaEventCreate(&tAllEnd);
cudaEventRecord(tAllStart, 0);

+ Stop recording time information:

cudaEventRecord(tAllEnd, 0);
cudaEventSynchronize(tAllEnd);

+ Get the time and output:

cudaEventElapsedTime(&time, tAllStart, tAllEnd);
std::cout << "time (incl. data transfer): " << time << " ms" << std::endl;

4. How to make Visual C++ and Visual Assist X aware of the CUDA source files?

You can get this information from the links below:

5. Is it possible to program (write code and compile only) on a non CUDA machine?

This question is related to my circumstance: I have one CUDA desktop machine at the lab, which can be remotely controlled from my house, so I would like to write and compile the program on my laptop, then copy the program file to the desktop machine to run. Fortunately, the answer is YES. We can write and compile a CUDA program on a non-CUDA machine. Install the Visual tool first, then the CUDA toolkit, but do not select the CUDA driver option, since your machine does not have any CUDA device. The same steps apply on the laptop.

## Display more than one image in a single window

There is no inbuilt support to display more than one image in OpenCV. Here is a function illustrating how to display more than one image in a single window using Intel OpenCV. The method used is to set the ROIs of a Single Big image and then resizing and copying the input images on to the Single Big Image.

#include <cv.h>
#include <highgui.h>

#include <stdio.h>
#include <stdarg.h>

/*Function///////////////////////////////////////////////////////////////

Name:       cvShowManyImages

Purpose:    This is a function illustrating how to display more than one
               image in a single window using Intel OpenCV

Parameters: char *title: Title of the window to be displayed
            int nArgs:   Number of images to be displayed
            ...:         IplImage*, which contains the images

Language:   C++

The method used is to set the ROIs of a Single Big image and then resizing
and copying the input images on to the Single Big Image.

This function does not stretch the image...
It resizes the image without modifying the width/height ratio.

This function can be called like this:

cvShowManyImages("Images", 2, img1, img2);
or
cvShowManyImages("Images", 5, img1, img2, img3, img4, img5);

This function can display up to 12 images in a single window.
It does not check whether the arguments are of type IplImage* or not.
The maximum window size is 700 by 660 pixels.
Does not display anything if the number of arguments is less than
    one or greater than 12.

If you pass a pointer that is not IplImage*, Error will occur.
Take care of the number of arguments you pass, and the type of arguments,
which should be of type IplImage* ONLY.

Idea was from BettySanchi of OpenCV Yahoo! Groups.

If you have trouble compiling and/or executing
this code, I would like to hear about it.

You could try posting on the OpenCV Yahoo! Groups
http://groups.yahoo.com/group/OpenCV/messages/

Parameswaran,
Chennai, India.

cegparamesh[at]gmail[dot]com

...
///////////////////////////////////////////////////////////////////////*/

void cvShowManyImages(char* title, int nArgs, ...) {

    // img - Used for getting the arguments
    IplImage *img;

    // DispImage - the image in which input images are to be copied
    IplImage *DispImage;

    int size;
    int i;
    int m, n;
    int x, y;

    // w - Maximum number of images in a row
    // h - Maximum number of images in a column
    int w, h;

    // scale - How much we have to resize the image
    float scale;
    int max;

    // If the number of arguments is less than 1 or greater than 12
    // return without displaying
    if(nArgs <= 0) {
        printf("Number of arguments too small....\n");
        return;
    }
    else if(nArgs > 12) {
        printf("Number of arguments too large....\n");
        return;
    }
    // Determine the size of the image,
    // and the number of rows/cols
    // from number of arguments
    else if (nArgs == 1) {
        w = h = 1;
        size = 300;
    }
    else if (nArgs == 2) {
        w = 2; h = 1;
        size = 300;
    }
    else if (nArgs == 3 || nArgs == 4) {
        w = 2; h = 2;
        size = 300;
    }
    else if (nArgs == 5 || nArgs == 6) {
        w = 3; h = 2;
        size = 200;
    }
    else if (nArgs == 7 || nArgs == 8) {
        w = 4; h = 2;
        size = 200;
    }
    else {
        w = 4; h = 3;
        size = 150;
    }

    // Create a new 3 channel image
    DispImage = cvCreateImage( cvSize(100 + size*w, 60 + size*h), 8, 3 );

    // Used to get the arguments passed
    va_list args;
    va_start(args, nArgs);

    // Loop for nArgs number of arguments
    for (i = 0, m = 20, n = 20; i < nArgs; i++, m += (20 + size)) {

        // Get the Pointer to the IplImage
        img = va_arg(args, IplImage*);

        // Check whether it is NULL or not
        // If it is NULL, release the image, and return
        if(img == 0) {
            printf("Invalid arguments");
            cvReleaseImage(&DispImage);
            return;
        }

        // Find the width and height of the image
        x = img->width;
        y = img->height;

        // Find whether height or width is greater in order to resize the image
        max = (x > y)? x: y;

        // Find the scaling factor to resize the image
        scale = (float) ( (float) max / size );

        // Used to Align the images
        if( i % w == 0 && m!= 20) {
            m = 20;
            n+= 20 + size;
        }

        // Set the image ROI to display the current image
        cvSetImageROI(DispImage, cvRect(m, n, (int)( x/scale ), (int)( y/scale )));

        // Resize the input image and copy it to the Single Big Image
        cvResize(img, DispImage);

        // Reset the ROI in order to display the next image
        cvResetImageROI(DispImage);
    }

    // Create a new window, and show the Single Big Image
    cvNamedWindow( title, 1 );
    cvShowImage( title, DispImage);

    cvWaitKey();
    cvDestroyWindow(title);

    // End the number of arguments
    va_end(args);

    // Release the Image Memory
    cvReleaseImage(&DispImage);
}


You can use this function as in this sample program (the image-loading lines use placeholder file names):

int main() {
    // Load the input images (placeholder file names - use your own)
    IplImage *img1 = cvLoadImage("image1.jpg");
    IplImage *img2 = cvLoadImage("image2.jpg");
    IplImage *img3 = cvLoadImage("image3.jpg");
    IplImage *img4 = cvLoadImage("image4.jpg");
    IplImage *img5 = cvLoadImage("image5.jpg");
    IplImage *img6 = cvLoadImage("image6.jpg");

    cvShowManyImages("Image", 6, img1, img2, img3, img4, img5, img6);

    return 0;
}

The method sets the ROI of a single big image, then resizes and copies each input image onto it.

This function does not stretch the images: each one is resized without changing its width/height ratio.

This function can be called like this:

cvShowManyImages("Image", 2, img1, img2);

or

cvShowManyImages("Image", 6, img1, img2, img3, img4, img5, img6);

with up to 12 images.

This function can display up to 12 images in a single window. It does not check whether the arguments are actually of type IplImage*. Depending on the layout chosen, the window can be up to 900 pixels wide and up to 660 pixels tall. Nothing is displayed if the number of arguments is less than one or greater than 12.

If you pass a pointer that is not an IplImage*, an error will occur. Take care with both the number of arguments you pass and their type, which must be IplImage* ONLY.

The idea came from BettySanchi of the OpenCV Yahoo! Groups.

If you have trouble compiling and/or executing this code, I would like to hear about it.

Here is a sample screenshot:

## How to change Screen buffer size in DOS Command Prompt

To change it from the command line, type the following in cmd:

mode con:cols=200 lines=170


The lines=170 part actually adjusts the Height in the ‘Screen Buffer Size’ setting, NOT the Height in the ‘Window Size’ setting.

This is easily proven by running the command with lines=2000 (or whatever buffer size you want) and then checking the Properties of the window: you'll see that the buffer is indeed now set to 2000.

My batch script ends up looking like this:

@echo off
cmd /k "mode con:cols=200 lines=2000"

(The /k switch tells cmd to run the command and then keep the window open; without it, the command string is not executed.)

Before Windows came into existence, the most commonly used operating system was the Disk Operating System (DOS). Although DOS itself has not been part of Windows for many years, the command prompt feature remains. The command prompt is a text command line interpreter. A buffer is a transitory holding segment of RAM that contains transferable information in the form of commands. Windows 7 can be customized to increase the buffer size of the command prompt: go to Run in the Start menu and type cmd to get the command prompt window, right-click on its title bar and click Properties. The screen buffer size is modified in the Layout tab (the "Buffer Size" value in the Options tab controls the command history instead).

To increase the screen buffer used by the command prompt:

• Click on Start > Run > cmd
• Right-click on the command prompt window's title bar > Properties
• In the "Layout" tab, increase the Height value under "Screen Buffer Size"

Posted in Coding Problems, Computer Vision, GPU (CUDA), OpenCV
