Something More for Research

Explorer of Research #HEMBAD

Archive for the ‘OpenCV Tutorial’ Category

OpenCV Tutorial

Bilateral Filtering

Posted by Hemprasad Y. Badgujar on September 14, 2015

Popular Filters

When smoothing or blurring images (most often to reduce noise), we can choose among several filters. Linear filters are easy to implement and reasonably fast; the most widely used are the homogeneous (box) filter and the Gaussian filter. The median filter, which is non-linear, is another popular choice.

A linear filter computes each output pixel value g(i, j) as a weighted sum of input pixel values f(i+k, j+l):

g(i, j) = SUM over (k, l) of [ f(i+k, j+l) * h(k, l) ]

where h(k, l) is called the kernel, which is nothing more than the set of filter coefficients.

The homogeneous filter is the simplest one: each output pixel is the mean of its kernel neighbors (all of them contribute with equal weights). Its kernel K is a matrix of ones scaled by 1/(kernel width * kernel height).


The Gaussian filter uses a kernel with unequal weights: in both the x and y directions, pixels near the center of the neighborhood receive larger weights, and the weights decrease with distance from the center, so pixels at the sides contribute less.

[Figure: example 5*5 Gaussian kernel]


The median filter replaces each pixel's value with the median of its neighboring pixels. This method works well against "salt and pepper" noise.

Bilateral Filter

Smoothing an image with any of the three filters above not only removes noise but also smooths edges, making them less sharp or even making them disappear. The bilateral filter addresses this problem. It extends the Gaussian filter with a second weight that measures how close (similar) two pixels are in value. By combining the spatial weight and the value weight, the bilateral filter keeps edges sharp while blurring the rest of the image.
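To make the two weights concrete, here is a minimal one-dimensional sketch in plain C++. It assumes both weights are Gaussians, which is the usual formulation; the function name `bilateral1D` and its parameters are chosen for this example and are not OpenCV's API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 1-D bilateral filter: each output sample is a normalised weighted mean of
// its neighbours, where each weight is the product of
//   a spatial Gaussian  exp(-distance^2 / (2 * sigmaSpace^2))  and
//   a range Gaussian    exp(-valueDifference^2 / (2 * sigmaColor^2)).
std::vector<double> bilateral1D(const std::vector<double>& f, int radius,
                                double sigmaSpace, double sigmaColor) {
    assert(radius >= 0 && sigmaSpace > 0.0 && sigmaColor > 0.0);
    const int n = static_cast<int>(f.size());
    std::vector<double> g(f.size());
    for (int i = 0; i < n; ++i) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (int k = -radius; k <= radius; ++k) {
            const int j = i + k;
            if (j < 0 || j >= n) continue;  // skip samples outside the signal
            const double diff = f[j] - f[i];
            const double w =
                std::exp(-(k * k) / (2.0 * sigmaSpace * sigmaSpace)) *
                std::exp(-(diff * diff) / (2.0 * sigmaColor * sigmaColor));
            weightedSum += w * f[j];
            weightTotal += w;
        }
        // weightTotal is at least 1 because the sample itself (k = 0) has weight 1.
        g[i] = weightedSum / weightTotal;
    }
    return g;
}
```

On a step signal such as {0, 0, 0, 0, 255, 255, 255, 255} with a small sigmaColor, samples on the far side of the step receive almost zero weight, so the step stays sharp; a plain Gaussian with the same radius would turn it into a ramp.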

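Let me show the process using this image, which has a sharp edge.

[Figure: noisy image with a sharp edge; a blue rectangle marks the neighborhood being filtered]

Say we are smoothing this image (noise is visible in it), and we are now processing the pixel in the middle of the blue rectangle.

[Figures: Gaussian kernel (left) and bilateral filter kernel (right)]

The left picture above is a Gaussian kernel, and the right picture is the bilateral filter kernel, which takes both weights into account.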

The difference between the Gaussian filter and the bilateral filter also shows in these pictures.

Say we have an original image with noise like this:

[Figure: original noisy image]

With the Gaussian filter, the image is smoother than before, but the edge is no longer sharp: a slope appears between the white and black pixels.

[Figure: Gaussian-filtered result]

With the bilateral filter, however, the image is smoother and the edge stays sharp.

[Figure: bilateral-filtered result]


OpenCV code

It is very easy to apply these filters in OpenCV:

// Homogeneous blur:
blur(image, dstHomo, Size(kernel_length, kernel_length), Point(-1,-1));
// Gaussian blur (sigmaX = sigmaY = 0, so the sigmas are derived from the kernel size):
GaussianBlur(image, dstGaus, Size(kernel_length, kernel_length), 0, 0);
// Median blur:
medianBlur(image, dstMed, kernel_length);
// Bilateral blur (arguments after dst: neighborhood diameter, sigmaColor, sigmaSpace):
bilateralFilter(image, dstBila, kernel_length, kernel_length*2, kernel_length/2);

For each function, you can find more details in the OpenCV documentation.

Test Images

Glad to use my favorite Van Gogh image :



From left to right: Homogeneous blur, Gaussian blur, Median blur, Bilateral blur.

[Figures: results for kernel lengths 3, 9, 15, 23, 31, 49 and 99; each row shows the homogeneous, Gaussian, median and bilateral results]

Posted in C, Image / Video Filters, Image Processing, OpenCV, OpenCV Tutorial

OpenCV: Color-spaces and splitting channels

Posted by Hemprasad Y. Badgujar on July 18, 2015

Conversion between color-spaces

Our goal here is to visualize each of the three channels of these color-spaces: RGB, HSV, YCrCb and Lab. In general, none of them are absolute color-spaces and the last three (HSV, YCrCb and Lab) are ways of encoding RGB information. Our images will be read in BGR (Blue-Green-Red), because of OpenCV defaults. For each of these color-spaces there is a mapping function and they can be found at OpenCV cvtColor documentation.
One important point: OpenCV's imshow() function always assumes that the Mat being shown is in the BGR color-space, which means we always need to convert back to BGR to see what we want. Let's start.



HSV

While in BGR an image is treated as an additive result of three base colors (blue, green and red), HSV stands for Hue, Saturation and Value (brightness). We can say that HSV is a rearrangement of RGB in a cylindrical shape. The HSV ranges are:

  • 0 ≤ H ≤ 360 ⇒ OpenCV range = H/2 (0 ≤ H ≤ 180)
  • 0 ≤ S ≤ 1 ⇒ OpenCV range = 255*S (0 ≤ S ≤ 255)
  • 0 ≤ V ≤ 1 ⇒ OpenCV range = 255*V (0 ≤ V ≤ 255)
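The scaling in the list above can be sketched as a small helper. This is plain C++ without OpenCV; `Hsv8` and `toOpenCvHsv` are hypothetical names, and rounding to the nearest integer is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cmath>

// 8-bit HSV triple in the ranges listed above (hypothetical helper type).
struct Hsv8 {
    int h, s, v;
};

// Map H in [0, 360] degrees and S, V in [0, 1] to the 8-bit ranges:
// H/2, 255*S and 255*V, each rounded to the nearest integer.
Hsv8 toOpenCvHsv(double hDegrees, double s, double v) {
    assert(hDegrees >= 0.0 && hDegrees <= 360.0);
    assert(s >= 0.0 && s <= 1.0 && v >= 0.0 && v <= 1.0);
    Hsv8 out;
    out.h = static_cast<int>(std::lround(hDegrees / 2.0));
    out.s = static_cast<int>(std::lround(255.0 * s));
    out.v = static_cast<int>(std::lround(255.0 * v));
    return out;
}
```

Halving the hue is what lets a 0 to 360 degree angle fit into a single byte, which is why hue thresholds for 8-bit OpenCV images are written against the halved range.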

YCrCb or YCbCr

It is used widely in video and image compression schemes. The YCrCb stands for Luminance (sometimes you can see Y’ as luma), Red-difference and Blue-difference chroma components. The YCrCb ranges are:

  • 0 ≤ Y ≤ 255
  • 0 ≤ Cr ≤ 255
  • 0 ≤ Cb ≤ 255


Lab or CIE Lab

In this color-opponent space, L stands for the Luminance dimension, while a and b are the color-opponent dimensions. The Lab ranges are:

  • 0 ≤ L ≤ 100 ⇒ OpenCV range = L*255/100 (0 ≤ L ≤ 255)
  • -127 ≤ a ≤ 127 ⇒ OpenCV range = a + 128 (1 ≤ a ≤ 255)
  • -127 ≤ b ≤ 127 ⇒ OpenCV range = b + 128 (1 ≤ b ≤ 255)
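The Lab scaling listed above follows the same pattern. Again this is a sketch in plain C++ with hypothetical names (`Lab8`, `toOpenCvLab`) and an assumed round-to-nearest policy.

```cpp
#include <cassert>
#include <cmath>

// 8-bit Lab triple in the ranges listed above (hypothetical helper type).
struct Lab8 {
    int L, a, b;
};

// Map L in [0, 100] and a, b in [-127, 127] to the 8-bit ranges:
// L*255/100, a + 128 and b + 128.
Lab8 toOpenCvLab(double L, double a, double b) {
    assert(L >= 0.0 && L <= 100.0);
    assert(a >= -127.0 && a <= 127.0 && b >= -127.0 && b <= 127.0);
    Lab8 out;
    out.L = static_cast<int>(std::lround(L * 255.0 / 100.0));
    out.a = static_cast<int>(std::lround(a + 128.0));
    out.b = static_cast<int>(std::lround(b + 128.0));
    return out;
}
```

A neutral gray has a = b = 0, which lands on 128 in the 8-bit encoding; that is the value to expect in the a and b channels of a colorless region.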

Splitting channels

All the color-spaces mentioned above were constructed using three channels (dimensions). It is a good exercise to visualize each of these channels and realize what they really store, because when I say that the third channel of HSV stores the brightness, what do you expect to see? Remember: a colored image is made of three-channels (in our cases) and when we see each of them separately, what do you think the output will be? If you said a grayscale image, you are correct! However, you might have seen these channels as colored images out there. So, how? For that, we need to choose a fixed value for the other two channels. Let’s do this!
To visualize each channel with color, I used the same values used on the Slides 53 to 65 from CS143, Lecture 03 from Brown University.


BGR

Original image (a) and its channels with color: blue (b), green (c) and red (d). On the second row, each channel in grayscale (single channel image), respectively.


HSV

Original image (a) and its channels with color: hue (b), saturation (c) and value or brightness (d). On the second row, each channel in grayscale (single channel image), respectively.

YCrCb or YCbCr

Original image (a) and its channels with color: luminance (b), red-difference (c) and blue difference (d). On the second row, each channel in grayscale (single channel image), respectively.

Lab or CIE Lab

Original image (a) and its channels with color: luminance (b), a-dimension (c) and b-dimension (d). On the second row, each channel in grayscale (single channel image), respectively.

Posted in Computer Vision, GPU (CUDA), OpenCV, OpenCV Tutorial

Conversion and copying between CvMat, Mat and IplImage

Posted by Hemprasad Y. Badgujar on May 15, 2015

In OpenCV, the Mat, CvMat and IplImage types can all represent and display an image. IplImage is derived from CvMat, and CvMat is derived from CvArr (CvArr -> CvMat -> IplImage); Mat is the C++ matrix type. CvArr is used as a function parameter type: whether a CvMat or an IplImage is passed in, it is handled internally as a CvMat.

The Mat type is oriented toward computation and mathematics, and OpenCV optimizes calculations on Mat. CvMat and IplImage are oriented more toward the "image", and OpenCV optimizes image operations on them (scaling, single-channel extraction, thresholding, etc.). The three types often need to be converted to one another; a brief overview follows.

Conversion and copying between CvMat and Mat

1. Copying between CvMat

  // Note: deep copy - memory is allocated separately and the two matrices are independent
  CvMat* a;
  CvMat* b = cvCloneMat(a);   // copy a to b

2. Copying between Mat

  // Note: shallow copy - only a new matrix header is created and the data is shared
  // (a change made through any one of a, b, c is visible through the other two)
  Mat a;
  Mat b = a; // a "copy" to b
  Mat c(a);  // a "copy" to c
  // Note: deep copy
  Mat a;
  Mat b = a.clone(); // copy a to b
  Mat c;
  a.copyTo(c);       // copy a to c

3. CvMat to Mat

  // Use the Mat constructor: Mat::Mat(const CvMat* m, bool copyData = false); copyData defaults to false
  CvMat* a;
  // Note: the following three forms give the same result; all are shallow copies
  Mat b(a);          // a "copy" to b
  Mat b(a, false);   // a "copy" to b
  Mat b = a;         // a "copy" to b
  // Note: with copyData set to true it is a deep copy (the whole image data is copied)
  Mat b = Mat(a, true); // copy a to b

4. Mat to CvMat

  // Note: shallow copy
  Mat a;
  CvMat b = a; // a "copy" to b
  // Note: deep copy
  Mat a;
  CvMat* b = cvCreateMat(a.rows, a.cols, a.type()); // the destination must be allocated before cvCopy
  CvMat temp = a;    // converts to a CvMat header without copying data
  cvCopy(&temp, b);  // actually copies the data

Conversion and copying between IplImage and the two types above

1. Copying between IplImage

This is not covered in detail here; it comes down to the difference between cvCopy and cvCloneImage.

[Figure: diagram found online comparing cvCopy and cvCloneImage]

2. IplImage to Mat

  // Use the Mat constructor: Mat::Mat(const IplImage* img, bool copyData = false); copyData defaults to false
  IplImage* srcImg = cvLoadImage("Lena.jpg");
  // Note: the following three forms give the same result; all are shallow copies
  Mat M(srcImg);
  Mat M(srcImg, false);
  Mat M = srcImg;
  // Note: with copyData set to true it is a deep copy (the whole image data is copied)
  Mat M(srcImg, true);

3. Mat to IplImage

  // Note: shallow copy - again only an image header is created; the data is not copied
  Mat M;
  IplImage img = M;
  IplImage img = IplImage(M);

4. IplImage to CvMat

  // Method 1: cvGetMat function
  IplImage* img;
  CvMat temp;
  // Marked as a deep copy in the original; cvGetMat only fills in a matrix header
  // that points at the image data, so the data is shared
  CvMat* mat = cvGetMat(img, &temp);
  // Method 2: cvConvert function
  CvMat *mat = cvCreateMat(img->height, img->width, CV_64FC3);  // note the order of height and width
  cvConvert(img, mat);     // deep copy

5. CvMat to IplImage

  // Method 1: cvGetImage function
  // (M.size(), M.depth() and M.channels() are kept from the original listing; CvMat is a
  // C struct without these methods, so compilable code would obtain the size with
  // cvGetSize(&M) and supply the depth and channel count explicitly)
  CvMat M;
  IplImage* img = cvCreateImageHeader(M.size(), M.depth(), M.channels());
  // Marked as a deep copy in the original; cvGetImage only initialises the header img
  // to point at the data of M, and returns img
  cvGetImage(&M, img);
  // Can also be written as
  CvMat M;
  IplImage* img = cvGetImage(&M, cvCreateImageHeader(M.size(), M.depth(), M.channels()));
  // Method 2: cvConvert function
  CvMat M;
  IplImage* img = cvCreateImage(M.size(), M.depth(), M.channels());
  cvConvert(&M, img);  // deep copy


A final note:

1. Mat manages its memory automatically, so no explicit release is needed (you can still call its release() method manually to force the matrix data to be freed). A CvMat must be released with cvReleaseMat(&cvmat), and an IplImage with cvReleaseImage(&iplimage).
2. When creating a CvMat, the first parameter is the number of rows and the second is the number of columns: CvMat* cvCreateMat(int rows, int cols, int type);
3. When creating an IplImage, the first member of CvSize is the width (the number of columns) and the second is the height (the number of rows): IplImage* cvCreateImage(CvSize size, int depth, int channels); CvSize cvSize(int width, int height);
4. Each row of an IplImage's internal buffer is aligned to 4 bytes; CvMat has no such restriction.
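The 4-byte row alignment mentioned in the notes above affects how a pixel's byte offset is computed. The sketch below is plain C++ and does not use OpenCV; `alignedRowBytes` and `pixelOffset` are hypothetical helpers that only illustrate the arithmetic behind a padded row stride (the quantity IplImage stores as widthStep).

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one image row once it is padded up to a multiple of
// `alignment` bytes (4 for the IplImage case described above).
std::size_t alignedRowBytes(int width, int channels, int bytesPerChannel,
                            int alignment = 4) {
    assert(width > 0 && channels > 0 && bytesPerChannel > 0 && alignment > 0);
    const std::size_t raw =
        static_cast<std::size_t>(width) * channels * bytesPerChannel;
    const std::size_t a = static_cast<std::size_t>(alignment);
    return (raw + a - 1) / a * a;  // round up to the next multiple of a
}

// Byte offset of the first channel of pixel (row, col) in a padded buffer.
// Rows come first, matching the (rows, cols) order of cvCreateMat.
std::size_t pixelOffset(int row, int col, int channels, int bytesPerChannel,
                        std::size_t rowBytes) {
    assert(row >= 0 && col >= 0);
    return static_cast<std::size_t>(row) * rowBytes +
           static_cast<std::size_t>(col) * channels * bytesPerChannel;
}
```

For a 5-pixel-wide, 3-channel, 8-bit image the raw row is 15 bytes but the padded row is 16, so indexing with `row * width * channels` would drift by one byte per row; indexing through the row stride avoids that.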

Posted in OpenCV, OpenCV Tutorial

Accessing the pixel values of an image

Posted by Hemprasad Y. Badgujar on March 14, 2015

#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
using namespace cv;
using namespace std;
int main( )
{
 Mat src1;
 src1 = imread("lena.jpg", CV_LOAD_IMAGE_COLOR);
 namedWindow( "Original image", CV_WINDOW_AUTOSIZE );
 imshow( "Original image", src1 );
 Mat gray;
 cvtColor(src1, gray, CV_BGR2GRAY);
 namedWindow( "Gray image", CV_WINDOW_AUTOSIZE );
 imshow( "Gray image", gray );
 // know the number of channels the image has
 cout<<"original image channels: "<<src1.channels()<<" gray image channels: "<<gray.channels()<<endl;

 /******************* READ the pixel intensity *********************/
 // single channel grey scale image (type 8UC1) and pixel coordinates x=5 and y=2
 // by convention, {row number = y} and {column number = x}
 // intensity1.val[0] contains a value from 0 to 255
 Scalar intensity1 = gray.at<uchar>(2, 5);
 cout << "Intensity = " << endl << " " << intensity1.val[0] << endl << endl;
 // 3 channel image with BGR color (type 8UC3)
 // the values can be stored in "int" or in "uchar". Here int is used.
 // (the row index was lost in the original listing; 10 is an assumed value)
 Vec3b intensity2 = src1.at<Vec3b>(10, 15);
 int blue = intensity2.val[0];
 int green = intensity2.val[1];
 int red = intensity2.val[2];
 cout << "Intensity = " << endl << " " << blue << " " << green << " " << red << endl << endl;

 /******************* WRITE to pixel intensity **********************/
 // This is an example from the OpenCV documentation
 Mat H(10, 10, CV_64F);
 for(int i = 0; i < H.rows; i++)
  for(int j = 0; j < H.cols; j++)
   H.at<double>(i,j)=1./(i+j+1);
 cout<<H<<endl<<endl;
 // modify the pixels of the BGR image
 // (the loop start indices were lost in the original listing; 0 is an assumed value)
 for (int i = 0; i<src1.rows; i++)
 {
  for (int j = 0; j<src1.cols; j++)
  {
   src1.at<Vec3b>(i,j)[0] = 0;
   src1.at<Vec3b>(i,j)[1] = 200;
   src1.at<Vec3b>(i,j)[2] = 0;
  }
 }
 namedWindow( "Modify pixel", CV_WINDOW_AUTOSIZE );
 imshow( "Modify pixel", src1 );
 waitKey(0); // keep the windows open until a key is pressed
 return 0;
}

Posted in OpenCV, OpenCV Tutorial

Detect and Track Objects

Posted by Hemprasad Y. Badgujar on January 20, 2015

Detect and Track Objects With OpenCV

The following is an overview of tutorials and guides for getting started with OpenCV for detecting and tracking objects. OpenCV is a computer vision library designed to analyze, process and understand objects in images in order to produce information.

  • OpenCV Tutorials – comprehensive list of basic OpenCV tutorials and source code based on the OpenCV library;
  • Object Detection & Tracking Using Color – example application in which OpenCV is used to detect objects based on color differences;
  • Face Detection Using OpenCV – guide to using OpenCV to detect one or more faces in the same image;
  • SURF in OpenCV – tutorial on using the SURF algorithm, which is designed to detect key-points and compute descriptors in images;
  • Introduction to Face Detection and Face Recognition – face detection and recognition are two of the most common applications of computer vision in robotics, and this tutorial presents the steps by which a face is detected and recognized in images;
  • Find Objects with a Webcam – using a simple webcam mounted on a robot and the Simple Qt interface designed to work with OpenCV, this tutorial shows how any object can be detected and tracked in images;
  • Features 2D + Homography to Find a Known Object – tutorial with code and explanation for two important functions included in OpenCV. These two functions, findHomography and perspectiveTransform, are used to find objects in images. findHomography estimates a transformation from matched key-points, while perspectiveTransform maps points from one image through that transformation;
  • Back Projection – tutorial based on the calcBackProject function, which calculates the back projection of a histogram;
  • Tracking Colored Objects in OpenCV – tutorial on detecting and tracking colored objects in images using the OpenCV library;
  • OpenCV Tutorials – Based on “Learning OpenCV – Computer Vision with the OpenCV Library” – these tutorials introduce computer vision concepts and can help both beginners and advanced users start building applications or improve their skills;
  • Image Processing on Pandaboard using OpenCV and Kinect – presentation with information about image processing on a Pandaboard single-board computer using the Kinect sensor and the OpenCV library;
  • Video Capture using OpenCV with VC++ – the OpenCV library can be integrated with Visual Studio, and this article explains how to use Visual C++ together with OpenCV;

Tutorials for Detecting and Tracking Objects with Mobile Devices

Mobile devices such as smartphones and tablets with iOS or Android operating systems can be integrated into robots and used to detect and track objects. Below is an overview of tutorials with comprehensive information for tracking objects using different mobile devices.

 Particle filter based trackers

  • Particle Filter Color Tracker [Link 1]
    • Matlab and c/c++ code.
    • Key words: region tracker, color histogram, ellipsoidal region, particle filter, SIR resampling.
  • Region Tracker based on a color Particle Filter [Link 1] [Example]
    • Matlab and c/c++ code.
    • Key words: region tracker, color histogram, ellipsoidal region, particle filter, SIR resampling.
  • Region Tracker based on an intensity Particle Filter [Link]
    • Matlab and c/c++ code.
    • Key words: region tracker, intensity histogram, ellipsoidal region, particle filter, SIR resampling.
  • Particle Filter Object Tracking [Link]
    • C/C++.

Mean shift based trackers

  • Scale and Orientation Adaptive Mean Shift Tracking. [Link]
    • Matlab.
  • Robust Mean Shift  Tracking with Corrected Background-Weighted Histogram. [Link]
    • Matlab.
  • Robust Object Tracking using Joint Color-Texture Histogram. [Link]
    • Matlab.
  • Tracking moving video objects using mean-shift algorithm. [Link]
    • Matlab.
  • Mean-shift Based Moving Object Tracker [Link]
    • C/C++.
  • Mean-Shift Video Tracking [Link]
    • Matlab.
  • Gray scale mean shift algorithm for tracking. [Link]
    • Matlab.
  • Mean shift tracking algorithm for tracking [Link]
    • Matlab.

Deformable/articulable object trackers

  • Visual Tracking with Integral Histograms and Articulating Blocks [Link]
    • Matlab and c/c++ code
    • Key words: region tracker, intensity histogram, multi-rectangular regions, integral histogram, exhaustive search, graph cut segmentation.

Appearance learning based trackers

  • Robust Object Tracking with Online Multiple Instance Learning. [Link]
    • C/C++.
  • Visual Tracking via Adaptive Structural Local Sparse Appearance Model. [Link]
    • Matlab.
  • Online Discriminative Object Tracking with Local Sparse Representation. [Link]
    • Matlab
  • Superpixel Tracking. [Link]
    • Matlab.
  • Online Multiple Support Instance Tracking. [Link]
    • Matlab.
  • Incremental Learning for Robust Visual Tracking. [Link]
    • Matlab.
  • Tracking with Online Multiple Instance Learning (MILTrack). [Link]
    • C/C++, OpenCV
  • Predator. [Link]
    • Matlab.
  • Object Tracking via Partial Least Squares Analysis. [Link]
    • Matlab.
  • Robust Object Tracking via Sparsity-based Collaborative Model. [Link]
    • Matlab.
  • On-line boosting trackers. [Link]
    • C/C++.

Advanced appearance model based trackers

  • Real-Time Compressive Tracking [Link]


Below is a list with resources including OpenCV documentation, libraries, and OpenCV compatible tools.

Posted in Computer Vision, OpenCV, OpenCV Tutorial

Real-Time Object Tracking

Posted by Hemprasad Y. Badgujar on January 19, 2015

Real-Time Object Tracking Using OpenCV


//Written by  Kyle Hounslow 2013

//Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software")
//, to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 
//and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

//The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.


#include <sstream>
#include <string>
#include <vector>
#include <opencv/highgui.h>
#include <opencv/cv.h>

using namespace cv;
using namespace std;
//initial min and max HSV filter values.
//these will be changed using trackbars
int H_MIN = 0;
int H_MAX = 256;
int S_MIN = 0;
int S_MAX = 256;
int V_MIN = 0;
int V_MAX = 256;
//default capture width and height
const int FRAME_WIDTH = 640;
const int FRAME_HEIGHT = 480;
//max number of objects to be detected in frame
const int MAX_NUM_OBJECTS=50;
//minimum and maximum object area
const int MIN_OBJECT_AREA = 20*20;
const int MAX_OBJECT_AREA = FRAME_HEIGHT*FRAME_WIDTH/1.5;
//names that will appear at the top of each window
const string windowName = "Original Image";
const string windowName1 = "HSV Image";
const string windowName2 = "Thresholded Image";
const string windowName3 = "After Morphological Operations";
const string trackbarWindowName = "Trackbars";

void on_trackbar( int, void* )
{//This function gets called whenever a
	// trackbar position is changed
}

string intToString(int number){
	std::stringstream ss;
	ss << number;
	return ss.str();
}

void createTrackbars(){
	//create window for trackbars
	namedWindow(trackbarWindowName,0);
	//create trackbars and insert them into window
	//3 parameters are: the address of the variable that is changing when the trackbar is moved(eg.H_LOW),
	//the max value the trackbar can move (eg. H_HIGH),
	//and the function that is called whenever the trackbar is moved(eg. on_trackbar)
	createTrackbar( "H_MIN", trackbarWindowName, &H_MIN, H_MAX, on_trackbar );
	createTrackbar( "H_MAX", trackbarWindowName, &H_MAX, H_MAX, on_trackbar );
	createTrackbar( "S_MIN", trackbarWindowName, &S_MIN, S_MAX, on_trackbar );
	createTrackbar( "S_MAX", trackbarWindowName, &S_MAX, S_MAX, on_trackbar );
	createTrackbar( "V_MIN", trackbarWindowName, &V_MIN, V_MAX, on_trackbar );
	createTrackbar( "V_MAX", trackbarWindowName, &V_MAX, V_MAX, on_trackbar );
}

void drawObject(int x, int y,Mat &frame){
	//use some of the openCV drawing functions to draw crosshairs
	//on your tracked image!

	//UPDATE:JUNE 18TH, 2013
	//added 'if' and 'else' statements to prevent
	//memory errors from writing off the screen (ie. (-25,-25) is not within the window!)
	if(y-25>0)
		line(frame,Point(x,y),Point(x,y-25),Scalar(0,255,0),2);
	else line(frame,Point(x,y),Point(x,0),Scalar(0,255,0),2);
	if(y+25<FRAME_HEIGHT)
		line(frame,Point(x,y),Point(x,y+25),Scalar(0,255,0),2);
	else line(frame,Point(x,y),Point(x,FRAME_HEIGHT),Scalar(0,255,0),2);
	if(x-25>0)
		line(frame,Point(x,y),Point(x-25,y),Scalar(0,255,0),2);
	else line(frame,Point(x,y),Point(0,y),Scalar(0,255,0),2);
	if(x+25<FRAME_WIDTH)
		line(frame,Point(x,y),Point(x+25,y),Scalar(0,255,0),2);
	else line(frame,Point(x,y),Point(FRAME_WIDTH,y),Scalar(0,255,0),2);

	putText(frame,intToString(x)+","+intToString(y),Point(x,y+30),1,1,Scalar(0,255,0),2);
}

void morphOps(Mat &thresh){
	//create structuring element that will be used to "dilate" and "erode" image.
	//the element chosen here is a 3px by 3px rectangle
	Mat erodeElement = getStructuringElement( MORPH_RECT,Size(3,3));
	//dilate with larger element so make sure object is nicely visible
	Mat dilateElement = getStructuringElement( MORPH_RECT,Size(8,8));

	erode(thresh,thresh,erodeElement);
	dilate(thresh,thresh,dilateElement);
}

void trackFilteredObject(int &x, int &y, Mat threshold, Mat &cameraFeed){
	Mat temp;
	threshold.copyTo(temp);
	//these two vectors needed for output of findContours
	vector< vector<Point> > contours;
	vector<Vec4i> hierarchy;
	//find contours of filtered image using openCV findContours function
	findContours(temp,contours,hierarchy,CV_RETR_CCOMP,CV_CHAIN_APPROX_SIMPLE );
	//use moments method to find our filtered object
	double refArea = 0;
	bool objectFound = false;
	if (hierarchy.size() > 0) {
		int numObjects = hierarchy.size();
		//if number of objects greater than MAX_NUM_OBJECTS we have a noisy filter
		if(numObjects<MAX_NUM_OBJECTS){
			for (int index = 0; index >= 0; index = hierarchy[index][0]) {

				Moments moment = moments((cv::Mat)contours[index]);
				double area = moment.m00;

				//if the area is less than 20 px by 20px then it is probably just noise
				//if the area is the same as the 3/2 of the image size, probably just a bad filter
				//we only want the object with the largest area so we save a reference area each
				//iteration and compare it to the area in the next iteration.
				//objectFound is not reset when a later contour fails the test, so a
				//small contour visited last cannot discard an object already found
				if(area>MIN_OBJECT_AREA && area<MAX_OBJECT_AREA && area>refArea){
					x = moment.m10/area;
					y = moment.m01/area;
					objectFound = true;
					refArea = area;
				}
			}
			//let user know you found an object
			if(objectFound ==true){
				putText(cameraFeed,"Tracking Object",Point(0,50),2,1,Scalar(0,255,0),2);
				//draw object location on screen
				drawObject(x,y,cameraFeed);
			}
		}else putText(cameraFeed,"TOO MUCH NOISE! ADJUST FILTER",Point(0,50),1,2,Scalar(0,0,255),2);
	}
}

int main(int argc, char* argv[])
{
	//some boolean variables for different functionality within this
	//program (set them to true to enable the corresponding step)
	bool trackObjects = false;
	bool useMorphOps = false;
	//Matrix to store each frame of the webcam feed
	Mat cameraFeed;
	//matrix storage for HSV image
	Mat HSV;
	//matrix storage for binary threshold image
	Mat threshold;
	//x and y values for the location of the object
	int x=0, y=0;
	//create slider bars for HSV filtering
	createTrackbars();
	//video capture object to acquire webcam feed
	VideoCapture capture;
	//open capture object at location zero (default location for webcam)
	capture.open(0);
	//set height and width of capture frame
	capture.set(CV_CAP_PROP_FRAME_WIDTH,FRAME_WIDTH);
	capture.set(CV_CAP_PROP_FRAME_HEIGHT,FRAME_HEIGHT);
	//start an infinite loop where webcam feed is copied to cameraFeed matrix
	//all of our operations will be performed within this loop
	while(1){
		//store image to matrix
		capture.read(cameraFeed);
		//convert frame from BGR to HSV colorspace
		cvtColor(cameraFeed,HSV,COLOR_BGR2HSV);
		//filter HSV image between values and store filtered image to
		//threshold matrix
		inRange(HSV,Scalar(H_MIN,S_MIN,V_MIN),Scalar(H_MAX,S_MAX,V_MAX),threshold);
		//perform morphological operations on thresholded image to eliminate noise
		//and emphasize the filtered object(s)
		if(useMorphOps)
			morphOps(threshold);
		//pass in thresholded frame to our object tracking function
		//this function will return the x and y coordinates of the
		//filtered object
		if(trackObjects)
			trackFilteredObject(x,y,threshold,cameraFeed);

		//show frames
		imshow(windowName2,threshold);
		imshow(windowName,cameraFeed);
		imshow(windowName1,HSV);

		//delay 30ms so that screen can refresh.
		//image will not appear without this waitKey() command
		waitKey(30);
	}

	return 0;
}
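The object position in the listing above comes from image moments: x = m10 / m00 and y = m01 / m00, where m00 is the area. The sketch below computes the same quantities for a binary mask in plain C++ without OpenCV. `Centroid` and `maskCentroid` are hypothetical names, and summing over mask pixels is a simplification: cv::moments applied to a contour derives the moments from the polygon outline, so its values can differ slightly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Area (m00) and centroid of the non-zero pixels of a binary mask.
struct Centroid {
    double area;
    double x;
    double y;
};

// Raw moments of a mask: m00 counts the pixels, m10 sums their column
// indices and m01 sums their row indices. The centroid is (m10/m00, m01/m00).
Centroid maskCentroid(const std::vector<std::vector<int>>& mask) {
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (std::size_t row = 0; row < mask.size(); ++row) {
        for (std::size_t col = 0; col < mask[row].size(); ++col) {
            if (mask[row][col] != 0) {
                m00 += 1.0;
                m10 += static_cast<double>(col);
                m01 += static_cast<double>(row);
            }
        }
    }
    if (m00 == 0.0) return Centroid{0.0, 0.0, 0.0};  // empty mask: no object
    return Centroid{m00, m10 / m00, m01 / m00};
}
```

Comparing the returned area against a minimum and a maximum, as the listing does with MIN_OBJECT_AREA, is what separates a plausible object from speckle noise or an over-permissive threshold.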

Posted in Computer Vision, OpenCV, OpenCV Tutorial

Writing Video to a File

Posted by Hemprasad Y. Badgujar on January 19, 2015

#include <opencv/highgui.h>
#include <opencv/cv.h>
#include <iostream>
#include <sstream>
#include <string>

using namespace cv;
using namespace std;

string intToString(int number){

	std::stringstream ss;
	ss << number;
	return ss.str();
}

int main(int argc, char* argv[])
{
	VideoCapture cap(0); // open the video camera no. 0

	if (!cap.isOpened())  // if not success, exit program
		return -1;

	const char* windowName = "Webcam Feed";
	namedWindow(windowName,CV_WINDOW_AUTOSIZE); //create a window to display our webcam feed

	while (1) {

		Mat frame;

		bool bSuccess = cap.read(frame); // read a new frame from camera feed

		if (!bSuccess) //test if frame successfully read
		{
			cout << "ERROR: Cannot read a frame from the camera feed" << endl;
			break;
		}

		imshow(windowName, frame); //show the frame in the "Webcam Feed" window

		//listen for 10ms for a key to be pressed
		switch(waitKey(10)){

		case 27:
			//'esc' has been pressed (ASCII value for 'esc' is 27)
			//exit program.
			return 0;
		}
	}

	return 0;
}



#include <opencv/highgui.h>
#include <opencv/cv.h>
#include <iostream>
#include <sstream>
#include <string>

using namespace cv;
using namespace std;

string intToString(int number){

	std::stringstream ss;
	ss << number;
	return ss.str();
}

int main(int argc, char* argv[])
{
	bool recording = false;
	bool startNewRecording = false;
	int inc=0;
	bool firstRun = true;

	VideoCapture cap(0); // open the video camera no. 0
	VideoWriter oVideoWriter;//create videoWriter object, not initialized yet

	if (!cap.isOpened())  // if not success, exit program
	{
		cout << "ERROR: Cannot open the video file" << endl;
		return -1;
	}

	namedWindow("MyVideo",CV_WINDOW_AUTOSIZE); //create a window called "MyVideo"

	double dWidth = cap.get(CV_CAP_PROP_FRAME_WIDTH); //get the width of frames of the video
	double dHeight = cap.get(CV_CAP_PROP_FRAME_HEIGHT); //get the height of frames of the video

	cout << "Frame Size = " << dWidth << "x" << dHeight << endl;

	//set framesize for use with videoWriter
	Size frameSize(static_cast<int>(dWidth), static_cast<int>(dHeight));

	while (1) {

		if (startNewRecording) {

			oVideoWriter  = VideoWriter("D:/MyVideo"+intToString(inc)+".avi", CV_FOURCC('D', 'I', 'V', '3'), 20, frameSize, true); //initialize the VideoWriter object
			//oVideoWriter  = VideoWriter("D:/MyVideo"+intToString(inc)+".avi", (int)cap.get(CV_CAP_PROP_FOURCC), 20, frameSize, true); //initialize the VideoWriter object

			recording = true;
			startNewRecording = false;
			cout<<"New video file created D:/MyVideo"+intToString(inc)+".avi "<<endl;
		}

		// The rest of the original loop is missing from this listing. What follows is
		// an assumed minimal completion, not the author's code: grab a frame, write it
		// while recording, display it, start a new file on 'n' and exit on 'esc'.
		Mat frame;
		if (!cap.read(frame))
			break;
		if (recording)
			oVideoWriter.write(frame);
		imshow("MyVideo", frame);

		switch (waitKey(10)) {
		case 'n':
			inc++;
			startNewRecording = true;
			break;
		case 27:
			return 0;
		}
	}

	return 0;
}


Posted in Computer Vision, OpenCV, OpenCV Tutorial

People Detection

Posted by Hemprasad Y. Badgujar on January 19, 2015

People Detection Sample from OpenCV

#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
int main (int argc, const char * argv[])
{
    VideoCapture cap(0);
    cap.set(CV_CAP_PROP_FRAME_WIDTH, 320);
    cap.set(CV_CAP_PROP_FRAME_HEIGHT, 240);
    if (!cap.isOpened())
        return -1;
    Mat img;
    namedWindow("opencv", CV_WINDOW_AUTOSIZE);
    HOGDescriptor hog;
    // load the default people detector; without it detectMultiScale has no model to apply
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
    while (true)
    {
        cap >> img;
        if (img.empty())
            continue;
        std::vector<Rect> found, found_filtered;
        hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);
        size_t i, j;
        // drop every detection that lies completely inside another detection
        for (i=0; i<found.size(); i++)
        {
            Rect r = found[i];
            for (j=0; j<found.size(); j++)
                if (j!=i && (r & found[j]) == r)
                    break;
            if (j == found.size())
                found_filtered.push_back(r);
        }
        // the detector returns slightly oversized boxes, so shrink them before drawing
        for (i=0; i<found_filtered.size(); i++)
        {
            Rect r = found_filtered[i];
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07); // factor taken from the OpenCV peopledetect sample; it was lost in this listing
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), Scalar(0,255,0), 3);
        }
        imshow("opencv", img);
        if (waitKey(10)>=0)
            break;
    }
    return 0;
}

Posted in Computer Vision, OpenCV, OpenCV Tutorial

OpenCV Viola & Jones object detection in MATLAB

Posted by Hemprasad Y. Badgujar on January 19, 2015

In image processing, one of the most successful object detectors devised is the Viola and Jones detector, proposed in their seminal CVPR paper in 2001. A popular implementation used by image processing researchers and implementers is provided by the OpenCV library. In this post, I'll show you how to run the OpenCV object detector in MATLAB for Windows. You should have some familiarity with OpenCV and with the Viola and Jones detector to work through this tutorial.

Steps in the object detector

MATLAB is able to call functions in shared libraries. This means that, using the compiled OpenCV DLLs, we are able to directly call various OpenCV functions from within MATLAB. The flow of our MATLAB program, including the required OpenCV external function calls (based on this example), will go something like this:

  1. cvLoadHaarClassifierCascade: Load object detector cascade
  2. cvCreateMemStorage: Allocate memory for detector
  3. cvLoadImage: Load image from disk
  4. cvHaarDetectObjects: Perform object detection
  5. For each detected object:
    1. cvGetSeqElem: Get next detected object of type cvRect
    2. Display this detection result in MATLAB
  6. cvReleaseImage: Unload the image from memory
  7. cvReleaseMemStorage: De-allocate memory for detector
  8. cvReleaseHaarClassifierCascade: Unload the cascade from memory

Loading shared libraries

The first step is to load the OpenCV shared libraries using MATLAB’s loadlibrary() function. To use the functions listed in the object detector steps above, we need to load the OpenCV libraries cxcore2410.dll, cv2410.dll and highgui2410.dll. Assuming that OpenCV has been installed to "C:\Program Files\OpenCV", the libraries are loaded like this:

opencvPath = 'C:\Program Files\OpenCV';
includePath = fullfile(opencvPath, 'cxcore\include');
loadlibrary(...
    fullfile(opencvPath, 'bin\cxcore2410.dll'), ...
    fullfile(opencvPath, 'cxcore\include\cxcore.h'), ...
        'alias', 'cxcore2410', 'includepath', includePath);
loadlibrary(...
    fullfile(opencvPath, 'bin\cv2410.dll'), ...
    fullfile(opencvPath, 'cv\include\cv.h'), ...
        'alias', 'cv2410', 'includepath', includePath);
loadlibrary(...
    fullfile(opencvPath, 'bin\highgui2410.dll'), ...
    fullfile(opencvPath, 'otherlibs\highgui\highgui.h'), ...
        'alias', 'highgui2410', 'includepath', includePath);

You will get some warnings; these can be ignored for our purposes. You can display the list of functions that a particular shared library exports with the libfunctions() command in MATLAB. For example, to list the functions exported by the highgui library:

>> libfunctions('highgui2410')
Functions in library highgui2410:
cvConvertImage             cvQueryFrame
cvCreateCameraCapture      cvReleaseCapture
cvCreateFileCapture        cvReleaseVideoWriter
cvCreateTrackbar           cvResizeWindow
cvCreateVideoWriter        cvRetrieveFrame
cvDestroyAllWindows        cvSaveImage
cvDestroyWindow            cvSetCaptureProperty
cvGetCaptureProperty       cvSetMouseCallback
cvGetTrackbarPos           cvSetPostprocessFuncWin32
cvGetWindowHandle          cvSetPreprocessFuncWin32
cvGetWindowName            cvSetTrackbarPos
cvGrabFrame                cvShowImage
cvInitSystem               cvStartWindowThread
cvLoadImage                cvWaitKey
cvLoadImageM               cvWriteFrame

The first step in our object detector is to load a detector cascade. We are going to load one of the frontal face detector cascades that is provided with a normal OpenCV installation:

classifierFilename = 'C:/Program Files/OpenCV/data/haarcascades/haarcascade_frontalface_alt.xml';
cvCascade = calllib('cv2410', 'cvLoadHaarClassifierCascade', classifierFilename, ...

The function calllib() returns a libpointer structure containing two fairly self-explanatory fields, DataType and Value. To display the return value from cvLoadHaarClassifierCascade(), we can run:

>> cvCascade.Value
ans =
               flags: 1.1125e+009
               count: 22
    orig_window_size: [1x1 struct]
    real_window_size: [1x1 struct]
               scale: 0
    stage_classifier: [1x1 struct]
         hid_cascade: []

The above output shows that MATLAB has successfully loaded the cascade file and returned a pointer to an OpenCV CvHaarClassifierCascade object.

Prototype M-files

We could now continue implementing all of our OpenCV function calls from the object detector steps like this. However, we will run into a problem when cvGetSeqElem is called. To see why, try this:

libfunctions('cxcore2410', '-full')

The -full option lists the signatures for each imported function. The signature for the function cvGetSeqElem() is listed as:

[cstring, CvSeqPtr] cvGetSeqElem(CvSeqPtr, int32)

This shows that the return value for the imported cvGetSeqElem() function will be a pointer to a character (cstring). This is based on the function declaration in the cxcore.h header file:

CVAPI(char*)  cvGetSeqElem( const CvSeq* seq, int index );

However, in step 5.1 of our object detector steps, we require a CvRect object. Normally in C++ you would simply cast the character pointer return value to a CvRect object, but MATLAB does not support casting of return values from calllib(), so there is no way we can cast this to a CvRect.

The solution is what is referred to as a prototype M-file. By constructing a prototype M-file, we can define our own signatures for the imported functions rather than using the declarations from the C++ header file.

Let’s generate the prototype M-file now:

loadlibrary(...
    fullfile(opencvPath, 'bin\cxcore2410.dll'), ...
    fullfile(opencvPath, 'cxcore\include\cxcore.h'), ...
        'mfilename', 'proto_cxcore');

This will automatically generate a prototype M-file named proto_cxcore.m based on the C++ header file. Open this file, find the function signature for cvGetSeqElem, and replace it with the following:

% char * cvGetSeqElem ( const CvSeq * seq , int index );
fcns.name{fcnNum}='cvGetSeqElem'; fcns.calltype{fcnNum}='cdecl'; fcns.LHS{fcnNum}='CvRectPtr'; fcns.RHS{fcnNum}={'CvSeqPtr', 'int32'};fcnNum=fcnNum+1;

This changes the return type for cvGetSeqElem() from a char pointer to a CvRect pointer.

We can now load the library using the new prototype:

loadlibrary(...
    fullfile(opencvPath, 'bin\cxcore2410.dll'), @proto_cxcore);

An example face detector

We now have all the pieces ready to write a complete object detector. The code listing below implements the object detector steps listed above to perform face detection on an image. Additionally, the image is displayed in MATLAB and a box is drawn around any detected faces.

opencvPath = 'C:\Program Files\OpenCV';
includePath = fullfile(opencvPath, 'cxcore\include');
inputImage = 'lenna.jpg';
%% Load the required libraries
if libisloaded('highgui2410'), unloadlibrary highgui2410, end
if libisloaded('cv2410'), unloadlibrary cv2410, end
if libisloaded('cxcore2410'), unloadlibrary cxcore2410, end
loadlibrary(...
    fullfile(opencvPath, 'bin\cxcore2410.dll'), @proto_cxcore);
loadlibrary(...
    fullfile(opencvPath, 'bin\cv2410.dll'), ...
    fullfile(opencvPath, 'cv\include\cv.h'), ...
        'alias', 'cv2410', 'includepath', includePath);
loadlibrary(...
    fullfile(opencvPath, 'bin\highgui2410.dll'), ...
    fullfile(opencvPath, 'otherlibs\highgui\highgui.h'), ...
        'alias', 'highgui2410', 'includepath', includePath);
%% Load the cascade
classifierFilename = 'C:/Program Files/OpenCV/data/haarcascades/haarcascade_frontalface_alt.xml';
cvCascade = calllib('cv2410', 'cvLoadHaarClassifierCascade', classifierFilename, ...
%% Create memory storage
cvStorage = calllib('cxcore2410', 'cvCreateMemStorage', 0);
%% Load the input image
cvImage = calllib('highgui2410', ...
    'cvLoadImage', inputImage, int16(1));
if ~cvImage.Value.nSize
    error('Image could not be loaded');
end
%% Perform object detection
cvSeq = calllib('cv2410', ...
    'cvHaarDetectObjects', cvImage, cvCascade, cvStorage, 1.1, 2, 0, ...
%% Loop through the detections and display bounding boxes
imshow(imread(inputImage)); %load and display image in MATLAB
for n =
    cvRect = calllib('cxcore2410', ...
        'cvGetSeqElem', cvSeq, int16(n));
    rectangle('Position', ...
        [cvRect.Value.x cvRect.Value.y ...
        cvRect.Value.width cvRect.Value.height], ...
        'EdgeColor', 'r', 'LineWidth', 3);
end
%% Release resources
calllib('cxcore2410', 'cvReleaseImage', cvImage);
calllib('cxcore2410', 'cvReleaseMemStorage', cvStorage);
calllib('cv2410', 'cvReleaseHaarClassifierCascade', cvCascade);

As an example, the following is the output after running the detector above on a greyscale version of the Lenna test image:

Note: If you get a segmentation fault attempting to run the code above, try evaluating the cells one-by-one (e.g. by pressing Ctrl-Enter) – it seems to fix the problem.


Tutorial: OpenCV haartraining

Posted by Hemprasad Y. Badgujar on January 18, 2015

Rapid Object Detection With A Cascade of Boosted Classifiers Based on Haar-like Features

The OpenCV library provides a very interesting face detection demonstration. It also provides the programs (or functions) that were used to train the classifiers for that face detection system, called HaarTraining, so we can create our own object classifiers with them.

Objective

However, I could not reproduce exactly how the OpenCV developers performed the haartraining for their face detection system, because they did not provide some of the necessary information, such as which images and parameters they used for training. The objective of this report is to provide a step-by-step procedure for people who want to do the same.

My working environment is Visual Studio + cygwin on Windows XP, or Linux. cygwin is required because I use several UNIX commands. If you are an engineer or a scientist, I am sure you will use cygwin (that is, UNIX commands) not only for this haartraining but also for other work in the future.

FYI: I recommend working on something else concurrently with haartraining, because training takes many days (possibly one week). My typical cycle was: 1. run haartraining on Friday, 2. forget about it completely, 3. check the results the next Friday, 4. run another haartraining (loop).


Data Preparation

FYI: There are database lists on Face Recognition Homepage – Databases and Computer Vision Test Images.

Positive (Face) Images

We need to collect positive images that contain only objects of interest, e.g., faces.

Kuranov et al. [3] mention that they used 5000 positive frontal face patterns, and that these 5000 patterns were derived from 1000 original faces. I describe how to increase the number of samples in a later section.

Previously, I downloaded and used The UMIST Face Database (Dead Link) because cropped face images were available there. The UMIST Face Database has video-like image sequences going from side faces to frontal faces. I thought training with such images would generate a face detector that is robust to facial pose. However, the generated face detector did not work well. Probably I hoped for too much. That was back in 2006.

I obtained a cropped frontal face database based on the CMU PIE Database, and I use it too. This dataset has large illumination variations, so it may lead to the same bad result as the UMIST Face Database, which had large variations in pose.
#Sorry, it looks like redistribution (of modifications) of the PIE database is not allowed. I made only a generated (distorted and reduced in size) .vec file available at the Download section. The PIE database is free (send a request e-mail), but it does not originally include the cropped faces.

MIT CBCL Face Data is another choice. It has 2,429 frontal faces with few illumination and pose variations. This data would be good for haartraining. However, the images are originally small (19 x 19), so we cannot perform experiments to determine good sample sizes.

The OpenCV developers probably used the FERET database. It appears that the FERET database became available for download over the internet on Jan. 31, 2008(?).

Negative (Background) Images

We need to collect negative images that do not contain objects of interest, e.g., faces, to train the haarcascade classifier.

Kuranov et al. [3] state that they used 3000 negative images.

Fortunately, I found a collection (Negatives sets, Set 1 – Various negatives) with about 3500 images (Dead Link). However, this collection was used for eye detection and includes faces in some pictures. Therefore, I deleted all suspicious images that appeared to include faces. About 2900 images remained, and I added 100 more. That number should be enough.

The collection is available at the Download section (But, it may take forever to download.)

Natural Test (Face in Background) Images

We can synthesize testing image sets using the createsamples utility, but having a natural testing image dataset is still good.

There is a CMU-MIT Frontal Face Test Set that the OpenCV developers used for their experiments. This dataset comes with a ground truth text giving the locations of the eyes, nose, and lip centers and tips. However, it does not give the face locations as rectangular regions, which the haartraining utilities require by default.

I created a simple script to compute facial regions from given ground truth information. My computation works as follows:

1. Get the vertical margin as nose height - mouth height
  • The lower boundary is located one margin below the mouth
  • The upper boundary is located one margin above the eyes
2. Get the horizontal margin as left mouth tip - right mouth tip
  • The right boundary is located one margin to the right of the right eye
  • The left boundary is located one margin to the left of the left eye

This was not perfect, but looked okay.
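The computation above can be sketched in a few lines of Python. This is only an illustration, not my actual script: the function name, the argument layout, and the assumption that image coordinates grow rightward and downward (so the "left" eye is the one with the smaller x) are all hypothetical.

```python
def face_region(left_eye, right_eye, nose, mouth, mouth_left, mouth_right):
    """Estimate a face rectangle (x, y, width, height) from landmark points.

    Every argument is an (x, y) tuple in image coordinates (assumed layout).
    """
    # Vertical margin: distance between the nose height and the mouth height.
    v_margin = abs(mouth[1] - nose[1])
    top = min(left_eye[1], right_eye[1]) - v_margin   # one margin above the eyes
    bottom = mouth[1] + v_margin                      # one margin below the mouth

    # Horizontal margin: distance between the two mouth tips.
    h_margin = abs(mouth_right[0] - mouth_left[0])
    left = left_eye[0] - h_margin                     # one margin left of the left eye
    right = right_eye[0] + h_margin                   # one margin right of the right eye

    return (left, top, right - left, bottom - top)
```

The returned tuple uses the same x, y, width, height order that the haartraining utilities expect for object rectangles.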

The generated ground truth text and image dataset are available at the Download section; you may download only the ground truth text. By the way, I converted GIF to PNG because OpenCV does not support GIF. The mogrify (ImageMagick) command is useful for this kind of image type conversion:

$ mogrify -format png *.gif

How to Crop Images Manually Fast

To collect positive images, you may have to crop a lot of images by hand.

I created a multi-platform tool, imageclipper, to help with this. The software is useful not only for haartraining but also for other computer vision and machine learning research. It has the following features:

  • You can open images in a directory sequentially
  • You can open a video file too, frame by frame
  • Clipping and moving to the next image can be done by one button (SPACE)
  • You will select a region to clip by dragging left mouse button
  • You can move or resize your selected region by dragging right mouse button
  • Your selected region is shown on the next image too.

Create Samples (Reference)

We can create training samples and testing samples with the createsamples utility. In this section, I describe the functions of the createsamples software because the Tutorial [1] did not explain them clearly enough for me (but please also see the Tutorial [1] for further options).

Below is the list of options. Note that the utility has four main functions, and the meaning of an option differs from function to function, which is confusing.

Usage: ./createsamples
  [-info <description_file_name>]
  [-img <image_file_name>]
  [-vec <vec_file_name>]
  [-bg <background_file_name>]
  [-num <number_of_samples = 1000>]
  [-bgcolor <background_color = 0>]
  [-inv] [-randinv] [-bgthresh <background_color_threshold = 80>]
  [-maxidev <max_intensity_deviation = 40>]
  [-maxxangle <max_x_rotation_angle = 1.100000>]
  [-maxyangle <max_y_rotation_angle = 1.100000>]
  [-maxzangle <max_z_rotation_angle = 0.500000>]
  [-show [<scale = 4.000000>]]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]

1. Create training samples from one

The 1st function of the createsamples utility is to create training samples from one image by applying distortions. This function (cvhaartraining.cpp#cvCreateTrainingSamples) is launched when the options -img, -bg, and -vec are specified.

  • -img <one_positive_image>
  • -bg <collection_file_of_negatives>
  • -vec <name_of_the_output_file_containing_the_generated_samples>

For example,

$ createsamples -img face.png -num 10 -bg negatives.dat -vec samples.vec -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0 -w 20 -h 20

This generates <num> number of samples from one <positive_image> applying distortions. Be careful that only the first <num> negative images in the <collection_file_of_negatives> are used.

The format of the <collection_file_of_negatives> is as follows:

[filename]
[filename]
[filename]

such as

img/img1.jpg
img/img2.jpg
img/img3.jpg

I will call this file format the collection file format.

How to create a collection file

This format can easily be created with the find command as

$ cd [your working directory]
$ find [image dir] -name '*.[image ext]' > [description file]

such as

$ find ../../data/negatives/ -name '*.jpg' > negatives.dat
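If the find command is not at hand, the same collection file can be produced with a few lines of Python. This is a minimal sketch, not part of the original tools; the function name write_collection_file is hypothetical, and the paths are sorted only to make the output reproducible (find itself does not sort).

```python
from pathlib import Path


def write_collection_file(image_dir, ext, output_file):
    """Write one image path per line, like `find image_dir -name '*.ext'`."""
    # rglob searches image_dir recursively, as find does.
    paths = sorted(str(p) for p in Path(image_dir).rglob(f"*.{ext}"))
    Path(output_file).write_text("".join(p + "\n" for p in paths))
    return paths
```

For example, write_collection_file("../../data/negatives", "jpg", "negatives.dat") would correspond to the find command above.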

2. Create training samples from some

The 2nd function is to create training samples from some images without applying distortions. This function (cvsamples.cpp#cvCreateTrainingSamplesFromInfo) is launched when the options -info and -vec are specified.

  • -info <description_file_of_samples>
  • -vec <name_of_the_output_file_containing_the_generated_samples>

For example,

$ createsamples -info samples.dat -vec samples.vec -w 20 -h 20

This generates samples without applying distortions. You can think of this function as a file format conversion.

The format of the <description_file_of_samples> is as follows:

[filename] [# of objects] [[x y width height] [... 2nd object] ...]
[filename] [# of objects] [[x y width height] [... 2nd object] ...]
[filename] [# of objects] [[x y width height] [... 2nd object] ...]

where (x,y) is the upper-left corner of the object, with the origin (0,0) at the upper-left corner of the image, such as

img/img1.jpg 1 140 100 45 45
img/img2.jpg 2 100 200 50 50 50 30 25 25
img/img3.jpg 1 0 0 20 20

I will call this format the description file format, as opposed to the collection file format, although the manual [1] does not differentiate them.
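To make the description file format concrete, here is a small Python sketch that parses one line of it. The function name parse_description_line is hypothetical, and the sketch assumes that filenames contain no spaces.

```python
def parse_description_line(line):
    """Parse '[filename] [# of objects] [x y width height] ...'.

    Returns (filename, [(x, y, width, height), ...]).
    """
    fields = line.split()
    filename, count = fields[0], int(fields[1])
    numbers = [int(v) for v in fields[2:]]
    # Each object contributes exactly four numbers.
    if len(numbers) != 4 * count:
        raise ValueError(f"expected {4 * count} numbers, got {len(numbers)}")
    boxes = [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]
    return filename, boxes
```

Applied to the second example line above, it returns the filename img/img2.jpg together with two rectangles, (100, 200, 50, 50) and (50, 30, 25, 25).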

This function crops the specified regions, resizes them, and converts them into the .vec format, but (let me say it again) it does not generate many samples from one image (one cropped image) by applying distortions. Therefore, use this 2nd function only when you already have a sufficient number of natural images and their ground truths (5000 or 7000 in total would be required).

Note that in this case the option -num is used only to restrict the number of samples to generate, not to increase the number of samples by applying distortions.

How to create a description file

Here I describe how to create a description file when already-cropped image files are available, because some people asked about it on the OpenCV forum. Note that my tutorial steps do not require this.

In that situation, you can use the find command and the identify command (cygwin should have the identify (ImageMagick) command) to create a description file as

$ cd <your working directory>
$ find <dir> -name '*.<ext>' -exec identify -format '%i 1 0 0 %w %h' \{\} \; > <description_file>

such as

$ find ../../data/umist_cropped -name '*.pgm' -exec identify -format '%i 1 0 0 %w %h' \{\} \; > samplesdescription.dat

If all images have the same size, it becomes simpler and faster:

$ find <dir> -name '*.<ext>' -exec echo \{\} 1 0 0 <width> <height> \; > <description_file>

such as

$ find ../../data/umist_cropped -name '*.pgm' -exec echo \{\} 1 0 0 20 20 \; > samplesdescription.dat

How can you automate cropping the images? If you can do that, you do not need haartraining. You already have an object detector (^-^

3. Create test samples

The 3rd function is to create test samples and their ground truth from a single image by applying distortions. This function (cvhaartraining.cpp#cvCreateTestSamples) is triggered when the options -img, -bg, and -info are specified.

  • -img <one_positive_image>
  • -bg <collection_file_of_negatives>
  • -info <generated_description_file_for_the_generated_test_images>

In this case, -w and -h are used to determine the minimal size of positives to be embedded in the test images.

$ createsamples -img face.png -num 10 -bg negatives.dat -info test.dat -maxxangle 0.6 -maxyangle 0 -maxzangle 0.3 -maxidev 100 -bgcolor 0 -bgthresh 0

Be careful that only the first <num> negative images in the <collection_file_of_negatives> are used.

This generates tons of jpg files.

The output image filename format is <number>_<x>_<y>_<width>_<height>.jpg, where x, y, width and height are the coordinates of the bounding rectangle of the placed object.

This also generates the description file given by -info, in the description file format (the same format as <description_file_of_samples> in the 2nd function).
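Because the ground truth is encoded in the file name itself, it can be recovered without the description file. The following Python sketch shows the idea; the function name and the sample file name in the usage note are hypothetical.

```python
import re

# <number>_<x>_<y>_<width>_<height>.jpg
_NAME = re.compile(r"^(\d+)_(\d+)_(\d+)_(\d+)_(\d+)\.jpg$")


def parse_test_image_name(filename):
    """Return (number, (x, y, width, height)) from a generated file name."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    number, x, y, width, height = (int(g) for g in match.groups())
    return number, (x, y, width, height)
```

For instance, a file named 0001_0153_0005_0020_0020.jpg would be read as sample 1 with a 20x20 object placed at (153, 5).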

4. Show images

The 4th function is to show the images within a vec file. This function (cvsamples.cpp#cvShowVecSamples) is triggered when only the -vec option is specified (no -info, -img, -bg). For example,

$ createsamples -vec samples.vec -w 20 -h 20
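Since the same options mean different things in each function, it may help to see the selection rule written out. The Python sketch below follows the description in this section only; it is not taken from the createsamples source code, and the function name createsamples_mode is hypothetical.

```python
def createsamples_mode(img=None, bg=None, vec=None, info=None):
    """Return which of the four functions (1-4) a set of options selects.

    Returns None when the combination matches none of the four descriptions.
    """
    if img and bg and vec:
        return 1  # training samples from one image, with distortions
    if info and vec:
        return 2  # training samples from some images, without distortions
    if img and bg and info:
        return 3  # test samples and ground truth from one image
    if vec and not (img or bg or info):
        return 4  # show the images within a vec file
    return None
```

For example, createsamples_mode(info="samples.dat", vec="samples.vec") gives 2, matching the example command of the 2nd function.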

EXTRA: random seed

The createsamples software applies the same sequence of distortions to each image. We may want to apply a different sequence of distortions to each image because, otherwise, the resulting detector may work only for those specific distortions.

This can be done by modifying createsamples slightly as:

Add the following at the top of the file:


Add the following in the main function:


The modified source code is available at svn:createsamples.cpp

Create Samples

Create Training Samples

Kuranov et al. [3] mention that they used 5000 positive frontal face patterns and 3000 negatives for training, and that the 5000 positive frontal face patterns were derived from 1000 original faces.

However, you may have noticed that none of the 4 functions of the createsamples utility can generate 5000 positive images from 1000 images in one go. We have to use the 1st function of createsamples to generate 5 (or some) positives from 1 image, repeat the procedure 1000 (or some) times, and finally merge the generated output vec files. *1

I wrote a program, mergevec.cpp, to merge vec files. I also wrote a script to repeat the procedure 1000 (or some) times. I specified 7000 instead of 5000 as the default because the Tutorial [1] states that “the reasonable number of positive samples is 7000.” Please modify the path to createsamples and its option parameters, which are written directly in the file.

The input format of the script is

$ perl <positives.dat> <negatives.dat> <vec_output_dir> [<totalnum = 7000>] [<createsample_command_options = "./createsamples -w 20 -h 20...">]

And, the input format of mergevec is

$ mergevec <collection_file_of_vecs> <output_vec_file_name>

A collection file (a file containing list of filenames) can be generated as

$ find [dir_name] -name '*.[ext]' > [collection_file_name]


$ cd HaarTraining/bin 
$ find ../../data/negatives/ -name '*.jpg' > negatives.dat
$ find ../../data/umist_cropped/ -name '*.pgm' > positives.dat

$ perl positives.dat negatives.dat samples 7000 "./createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 20 -h 20"
$ find samples/ -name '*.vec' > samples.dat # to create a collection file for vec files
$ mergevec samples.dat samples.vec
$ # createsamples -vec samples.vec -show -w 20 -h 20 # Extra: If you want to see inside
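The repetition itself is simple: split the total number of samples over the positive images and call the 1st function of createsamples once per image. The Python sketch below only builds the command lines and does not run them. It is not my Perl script; the function name, the .vec naming scheme, and the way the remainder is distributed are assumptions made for illustration.

```python
import shlex


def plan_createsamples(positives, total, options,
                       vec_dir="samples", bg="negatives.dat"):
    """Return one createsamples command (as an argument list) per positive image."""
    if not positives:
        raise ValueError("no positive images given")
    base, extra = divmod(total, len(positives))
    commands = []
    for k, image in enumerate(positives):
        # Spread the remainder over the first `extra` images.
        num = base + (1 if k < extra else 0)
        vec = f"{vec_dir}/{k:05d}.vec"  # hypothetical output naming
        commands.append(shlex.split(options)
                        + ["-img", image, "-bg", bg, "-vec", vec, "-num", str(num)])
    return commands
```

Each returned list could be passed to subprocess.run; the resulting .vec files are then merged with mergevec as shown above.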

Kuranov et al. [3] state that a sample size of 20×20 achieved the highest hit rate. Furthermore, they state that “For 18×18 four split nodes performed best, while for 20×20 two nodes were slightly better.” Thus, -w 20 -h 20 would be good.

Create Testing Samples

Testing samples are images that contain positives embedded in negative background images, with the locations of the positives known. It is possible to create such testing images by hand. We can also use the 3rd function of createsamples to synthesize such images. However, it accepts only one positive image at a time, so a script that repeats the procedure helps. A script for this is available. Please modify the path to createsamples and its option parameters directly in the file.

The input format of the script is

$ perl <positives.dat> <negatives.dat> <output_dir> [<totalnum = 1000>] [<createsample_command_options = "./createsamples -w 20 -h 20...">]

This generates lots of jpg files and info.dat in the <output_dir>. The jpg file name format is <number>_<x>_<y>_<width>_<height>.jpg, where x, y, width and height are the coordinates of the bounding rectangle of the placed object.


$ # cd HaarTraining/bin 
$ # find ../../data/negatives/ -name '*.jpg' > negatives.dat 
$ # find ../../data/umist_cropped/ -name '*.pgm' > positives.dat
$ perl positives.dat negatives.dat tests 1000 "./createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40"
$ find tests/ -name 'info.dat' -exec cat \{\} \; > tests.dat # merge info files


Haar Training

Now, we train our own classifier using the haartraining utility. Here is the usage of the haartraining.

Usage: ./haartraining
  -data <dir_name>
  -vec <vec_file_name>
  -bg <background_file_name>
  [-npos <number_of_positive_samples = 2000>]
  [-nneg <number_of_negative_samples = 2000>]
  [-nstages <number_of_stages = 14>]
  [-nsplits <number_of_splits = 1>]
  [-mem <memory_in_MB = 200>]
  [-sym (default)] [-nonsym]
  [-minhitrate <min_hit_rate = 0.995000>]
  [-maxfalsealarm <max_false_alarm_rate = 0.500000>]
  [-weighttrimming <weight_trimming = 0.950000>]
  [-mode <BASIC (default) | CORE | ALL>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]
  [-bt <DAB | RAB | LB | GAB (default)>]
  [-err <misclass (default) | gini | entropy>]
  [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
  [-minpos <min_number_of_positive_samples_per_cluster = 500>]

Kuranov et al. [3] state that a sample size of 20×20 achieved the highest hit rate. Furthermore, they state that “For 18×18 four split nodes performed best, while for 20×20 two nodes were slightly better. The difference between weak tree classifiers with 2, 3 or 4 split nodes is smaller than their superiority with respect to stumps.”

Furthermore, there was a description as “20 stages were trained. Assuming that my test set is representative for the learning task, I can expect a false alarm rate about 0.5^20 ≈ 9.6e-07 and a hit rate about 0.999^20 ≈ 0.98.”
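The arithmetic behind this estimate is just a product over stages: if every stage keeps a fraction h of the faces and lets through a fraction f of the non-faces, a cascade of n stages is expected to reach h^n and f^n overall. A small Python sketch of this (the function name cascade_rates is hypothetical, and the stages are assumed to behave independently):

```python
def cascade_rates(stage_hit_rate, stage_false_alarm, n_stages):
    """Expected overall (hit rate, false alarm rate) of an n-stage cascade."""
    return stage_hit_rate ** n_stages, stage_false_alarm ** n_stages


hit, false_alarm = cascade_rates(0.999, 0.5, 20)
print(f"hit rate {hit:.4f}, false alarm rate {false_alarm:.2e}")
# prints: hit rate 0.9802, false alarm rate 9.54e-07
```

This also shows why -minhitrate has to be very close to 1: with 0.99 per stage, 20 stages would already lose about 18% of the faces (0.99^20 ≈ 0.82).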

Therefore, a 20×20 sample size with nsplits = 2, nstages = 20, minhitrate = 0.999 (default: 0.995), maxfalsealarm = 0.5 (default: 0.5), and weighttrimming = 0.95 (default: 0.95) would be good, such as

$ haartraining -data haarcascade -vec samples.vec -bg negatives.dat -nstages 20 -nsplits 2 -minhitrate 0.999 -maxfalsealarm 0.5 -npos 7000 -nneg 3019 -w 20 -h 20 -nonsym -mem 512 -mode ALL

The “-nonsym” option is used when the object class does not have vertical (left-right) symmetry. If the object class has vertical symmetry, as frontal faces do, “-sym (default)” should be used. It speeds up processing because only half of the haar-like features (the centered ones and either the left-sided or the right-sided ones) are used.

The “-mode ALL” uses Extended Sets of Haar-like Features [2]. Default is BASIC and it uses only upright features, while ALL uses the full set of upright and 45 degree rotated feature set[1].

The “-mem 512” is the available memory in MB for precalculation [1]. The default is 200MB, so increase it if more memory is available. We should not specify all of the system RAM because this number covers only the precalculation, not the whole process. The maximum value that can be specified would be 2GB, because a 32-bit CPU can address at most 4GB (2^32 bytes = 4GB), and on 32-bit Windows a process can use only 2GB of that by default (the rest is reserved for the kernel).

There are other options that [1] does not list such as

 [-bt <DAB | RAB | LB | GAB (default)>]
 [-err <misclass (default) | gini | entropy>]
 [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
 [-minpos <min_number_of_positive_samples_per_cluster = 500>]

Please see my modified version of haartraining document [5] for details.

#Even if you increase the number of stages, the training may finish at an intermediate stage once it has reached your desired minimum hit rate or false alarm rate, because more cascading will certainly decrease these rates further (0.99 up to the current stage * 0.99 for the next = 0.9801 up to the next). The training may also finish because all samples were rejected. In that case, you must increase the number of training samples.

#You can use OpenMP (multi-processing) with compilers such as Intel C++ compiler and MS Visual Studio 2005 Professional Edition or better. See How to enable OpenMP section.

#One training took three days.

Generate an XML File

The haartraining utility generates an XML file when the process is completely finished (from OpenCV beta5).

If you want to convert the intermediate haartraining output directory tree into an XML file, there is a program at OpenCV/samples/c/convert_cascade.c (that is, in your installation directory). Compile it.

The input format is

$ convert_cascade --size="<sample_width>x<sample_height>" <haartraining_output_dir> <output_file>


$ convert_cascade --size="20x20" haarcascade haarcascade.xml


Performance Evaluation

We can evaluate the performance of the generated classifier using the performance utility. Here is the usage of the performance utility.

Usage: ./performance
  -data <classifier_directory_name>
  -info <collection_file_name>
  [-maxSizeDiff <max_size_difference = 1.500000>]
  [-maxPosDiff <max_position_difference = 0.300000>]
  [-sf <scale_factor = 1.200000>]
  [-nos <number_of_stages = -1>]
  [-rs <roc_size = 40>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]

Please see my modified version of haartraining document [5] for details of options.

I cite how the performance utility works here:

During detection, a sliding window was moved pixel by pixel over the picture at each scale. Starting with the original scale, the features were enlarged by 10% and 20%, respectively (i.e., representing a rescale factor of 1.1 and 1.2, respectively) until exceeding the size of the picture in at least one dimension. Often multiple faces are detect at near by location and scale at an actual face location. Therefore, multiple nearby detection results were merged. Receiver Operating Curves (ROCs) were constructed by varying the required number of detected faces per actual face before merging into a single detection result. During experimentation only one parameter was changed at a time. The best mode of a parameter found in an experiment was used for the subsequent experiments. [3]

Execute the performance utility as

$ performance -data haarcascade -w 20 -h 20 -info tests.dat -ni
$ performance -data haarcascade.xml -info tests.dat -ni

Note that you have to specify the size of the training samples when you pass the classifier directory, whereas the classifier xml file already includes this information *2.

The -ni option suppresses the creation of detection result image files. By default, the performance utility creates detection result images and stores them in directories named after the test image directories with the prefix ‘det-’ added. If you want to use this function, you have to create the destination directories yourself beforehand. Execute the following command to create them:

$ cat tests.dat | perl -pe 's!^(.*)/.*$!det-$1!g' | xargs mkdir -p

where tests.dat is the collection file for the testing images that you created in the Create Testing Samples step. Now you can execute the performance utility without the ‘-ni’ option.
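For readers who do not have perl and xargs at hand, the directory names can be derived in Python as well. This is a sketch of the same idea as the one-liner above, not a replacement shipped with the tools: the function name detection_dirs is hypothetical, and it assumes the image path is the first whitespace-separated field of each line and uses forward slashes.

```python
import posixpath


def detection_dirs(description_lines):
    """Return the 'det-' prefixed directory of every test image, without duplicates."""
    dirs = []
    for line in description_lines:
        fields = line.split()
        # Skip empty lines and images that are not inside any directory.
        if not fields or "/" not in fields[0]:
            continue
        target = "det-" + posixpath.dirname(fields[0])
        if target not in dirs:
            dirs.append(target)
    return dirs
```

Each returned name can then be created with os.makedirs(name, exist_ok=True), which behaves like mkdir -p.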

An output of the performance utility is as follows:

|            File Name           | Hits |Missed| False|
|tests/01/img01.bmp/0001_0153_005|     0|     1|     0|
|                           Total|   874|   554|    72|
Number of stages: 15
Number of weak classifiers: 68
Total time: 115.000000
        874     72      0.612045        0.050420
        874     72      0.612045        0.050420
        360     2       0.252101        0.001401
        115     0       0.080532        0.000000
        26      0       0.018207        0.000000
        8       0       0.005602        0.000000
        4       0       0.002801        0.000000
        1       0       0.000700        0.000000

‘Hits’ shows the number of correct detections. ‘Missed’ shows the number of missed detections, or false negatives (the object exists, but the detector failed to detect it). ‘False’ shows the number of false alarms, or false positives (no object exists, but the detector reported one).

The latter table is for ROC plot. Please see my modified version of haartraining document [5] for more.
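The numbers in the example output suggest how the two rate columns are computed: both the hit count and the false alarm count appear to be divided by the total number of ground truth objects (Hits + Missed). I have not checked this against the source of the performance utility, so take the Python sketch below as a consistency check on the table, with a hypothetical function name.

```python
def roc_row(hits, missed, false_alarms):
    """Return (hit rate, false alarm rate) relative to the number of true objects."""
    total_objects = hits + missed
    return hits / total_objects, false_alarms / total_objects


hit_rate, false_rate = roc_row(874, 554, 72)
print(f"{hit_rate:.6f} {false_rate:.6f}")
# prints: 0.612045 0.050420
```

The later rows follow the same pattern with the same denominator, for example 360 / 1428 ≈ 0.252101 and 2 / 1428 ≈ 0.001401.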

Fun with a USB camera

Have fun with a USB camera or some image files using the facedetect utility.

$ facedetect --cascade=<xml_file> [filename(image or video)|camera_index]

I modified facedetect.c slightly because the facedetect utility did not behave in the same way as the performance utility. I added options to change the parameters on the command line. The source code is available at the Download section (or direct link facedetect.c). Now the usage is as follows:

Usage: facedetect  --cascade="<cascade_xml_path>" or -c <cascade_xml_path>
  [ -sf < scale_factor = 1.100000 > ]
  [ -mn < min_neighbors = 1 > ]
  [ -fl < flags = 0 > ]
  [ -ms < min_size = 0 0 > ]
  [ filename | camera_index = 0 ]
See also: cvHaarDetectObjects() about option parameters.

FYI: The original facedetect.c used min_neighbors = 2, whereas performance.cpp uses min_neighbors = 1. This difference affected the face detection results considerably.
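To give a feel for the -sf option: scale_factor is the ratio by which the search window grows between successive scans, so 1.1 means a 10% larger window on each pass. The sketch below only enumerates window sizes under that rule. It is a simplification (the real cvHaarDetectObjects() may apply further checks such as min_size), and the function name, the 20x20 window and the 320x240 image size are assumptions chosen for illustration.

```python
def window_sizes(img_w, img_h, win_w, win_h, scale_factor=1.1):
    """List the search-window sizes visited when the window grows by scale_factor each pass."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    factor = 1.0
    # Stop as soon as the scaled window no longer fits inside the image.
    while win_w * factor <= img_w and win_h * factor <= img_h:
        sizes.append((round(win_w * factor), round(win_h * factor)))
        factor *= scale_factor
    return sizes


sizes = window_sizes(320, 240, 20, 20, scale_factor=1.1)
print(len(sizes), sizes[0], sizes[-1])
```

A larger scale_factor visits fewer window sizes and so runs faster, at the risk of skipping the size at which a face would have matched.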


PIE Experiment 1

The PIE dataset has only frontal faces with big illumination variations. The dataset used in PIE experiments looks as follows:

Sample images: img01_01.png (1st), img01_10.png (10th), img01_21.png (21st)
  • List of Commands
    • I used -w 18 -h 20 because the original images were not square but rectangular, with a ratio of about 18:20. I applied only small distortions in this experiment.
    • The training took 3 days on an Intel Xeon 2GHz machine with 1GB of memory.
  • Performance evaluation with pie_test (synthesized tests): haarcascade_frontalface_pie1.performance_pie_tests.txt
    |            File Name           | Hits |Missed| False|
    |                           Total|   847|   581|    67|
    Number of stages: 16
    Number of weak classifiers: 113
    Total time: 123.000000
            847     67      0.593137        0.046919
            847     67      0.593137        0.046919
            353     2       0.247199        0.001401
            110     0       0.077031        0.000000
            15      0       0.010504        0.000000
            1       0       0.000700        0.000000
  • Performance evaluation with cmu_tests (natural tests): haarcascade_frontalface_pie1.performance_cmu_tests.txt
    |            File Name           | Hits |Missed| False|
    |                           Total|    20|   491|     9|
    Number of stages: 16
    Number of weak classifiers: 113
    Total time: 5.830000
    	20	9	0.039139	0.017613
    	20	9	0.039139	0.017613
    	2	0	0.003914	0.000000

PIE Experiment 2

PIE Experiment 3

PIE Experiment 4

PIE Experiment 5

PIE Experiment 6

UMIST Experiment 1

UMIST is a multi-view face dataset.

Sample images: 1a000.png (0th frame), 1a021.png (21st frame), 1a033.png (33rd frame)

UMIST Experiment 2

CBCL Experiment 1



The created detectors outperformed the OpenCV default xml on synthesized test samples created from the training samples. This shows that the training was performed successfully. However, the detectors did not work well on general test samples. This might mean that the detectors were over-trained, or over-fitted, to the specific training samples. I still don’t know which parameters or training samples would make the detectors generalize well.

The false alarm rates of all my generated detectors were quite low compared with the OpenCV default detector. I don’t know which parameters make the difference. I set the false alarm rate to 0.5, which makes sense theoretically, so I cannot explain the gap.
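As a rough check of why 0.5 is reasonable in theory: in a cascade, a non-face window is reported only if every stage accepts it. If each stage lets through at most a fraction f of the non-face windows that reach it, then N stages let through at most about f^N of them. The sketch below evaluates that bound for the stage counts reported above (15 and 16). Note that this is a per-window rate, which is not the same quantity as the per-object ratio printed by the performance utility. The function name is an assumption for illustration.

```python
def cascade_false_alarm_bound(per_stage_rate, num_stages):
    """Upper bound on the per-window false alarm rate of an idealized cascade."""
    return per_stage_rate ** num_stages


for stages in (15, 16):
    print(stages, cascade_false_alarm_bound(0.5, stages))
```

Even with a loose per-stage setting of 0.5, the bound falls below one false alarm per ten thousand windows after 15 stages.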

Training faces with varying illumination in a single detector gave quite poor results. The generated detector became sensitive to illumination rather than robust to it. This detector does not detect normally lit frontal faces. This makes sense because the training sets contained few normal frontal faces. Training multi-view faces all at once gave the same result.

To construct a multi-view or illumination-robust face detector, we should train a separate detector for each face pose or illumination state, as in Fast Multi-view Face Detection. Viola and Jones extended their work to multi-view detection by training 12 separate detectors, one per face pose. For speed, they also built a pose estimator, a C4.5 decision tree that reuses the haar-like features, and cascaded the pose estimator with the face detectors. (Of course, this means that if pose estimation fails, face detection also fails.)
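The two-step structure described above can be sketched as follows. This is only a schematic of the control flow, not Viola and Jones's implementation: estimate_pose and the per-pose detector callables are hypothetical stand-ins supplied by the caller.

```python
def detect_multiview(window, estimate_pose, detectors):
    """Run only the detector selected by the pose estimator.

    estimate_pose: callable returning a pose label for the window (hypothetical).
    detectors: dict mapping pose label -> callable returning True for a face (hypothetical).
    """
    pose = estimate_pose(window)
    detector = detectors.get(pose)
    if detector is None:
        return False, pose
    return detector(window), pose


# Toy stand-ins: the "window" is just a dict describing its true content.
detectors = {
    "frontal": lambda w: w["is_face"] and w["pose"] == "frontal",
    "profile": lambda w: w["is_face"] and w["pose"] == "profile",
}
window = {"is_face": True, "pose": "profile"}

print(detect_multiview(window, lambda w: w["pose"], detectors))  # correct pose estimate
print(detect_multiview(window, lambda w: "frontal", detectors))  # wrong pose estimate
```

Only one pose-specific detector runs per window, which is where the speed comes from. The second call shows the cost: a wrong pose estimate sends the window to the wrong detector and the face is missed.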

Theory behind

The advantage of haar-like features is speed in the detection phase, not accuracy. We can of course construct another face detector that achieves better accuracy using, e.g., PCA or LDA, although it will be slower in the detection phase. Use such features when you do not need speed. PCA does not require training AdaBoost, so the training phase would finish quickly. I am pretty sure that such face detection methods already exist, although I have not searched for them because I am sure.


The files are available at (old repository)

Directory Tree

  • HaarTraining haartraining
    • src Source code. The haartraining sources and my additional C++ source files are here.
    • src/Makefile Makefile for Linux. Please read the comments inside.
    • bin Binaries for Windows. My perl scripts are also here. This directory is meant to be the working directory.
    • make Visual Studio project files
  • data The collected image datasets
  • result Generated files (vec, xml, etc.) and results

This is an svn repository, so you can download all the files at once if you have an svn client (you should have one on cygwin or Linux). For example,

$ svn co tutorial-haartraining

Sorry, but downloading (checking out) the image datasets may take forever. I created a zip file once, but the Google Code repository did not allow me to upload such a big file (100MB). I recommend that you check out only the HaarTraining directory first:

$ svn co HaarTraining

Here is the list of my additional utilities (I put them in the HaarTraining/src and HaarTraining/bin directories):

The following additional utilities can be obtained from OpenCV/samples/c in your OpenCV install directory (I also put them in HaarTraining/src directory).

How to enable OpenMP

I bundled Windows binaries in the Download section, but I did not enable OpenMP (multi-processing) support. Therefore, I describe here how to compile the haartraining utility with OpenMP using Visual Studio 2005 Professional Edition, based on my distribution files. (The procedure should be the same for the original files too, but I did not verify this.)

The solution file is in HaarTraining\make\haartraining.sln. Open it.

Right-click the cvhaartraining project and choose Properties.

Go to Configuration Properties > C/C++ > Language and change ‘OpenMP Support’ to ‘Yes (/openmp)’. If you cannot see this option, your environment probably does not support OpenMP.

Build cvhaartraining only (right-click the project > Project Only > Rebuild only cvhaartraining), then repeat the same procedure (enable OpenMP) for the haartraining project. Now haartraining.exe should work with OpenMP.

You can use Process Explorer to verify whether it is actually using OpenMP.

Run Process Explorer > View > Show Lower Pane (Ctrl+L), choose the ‘haartraining.exe’ process, and look at the Lower Pane. If you see two threads rather than one, it is using OpenMP.


*1 I could have modified the code of the 2nd function to apply distortions and generate many images from one image, but I chose to write scripts that repeat the 1st function because the same method can also be applied to the creation of test samples.
*2 The performance utility supports both a classifier directory and a haarcascade xml file. More precisely, the cvLoadHaarClassifierCascade() function supports both.
