Something More for Research

Explorer of Research #HEMBAD

Posts Tagged ‘Computer Vision’

Professional ways of tracking GPU memory leakage

Posted by Hemprasad Y. Badgujar on January 25, 2015

Depending on what I am doing and what I need to track/trace and profile I utilise all 4 packages above. They also have the added benefit of being a: free; b: well maintained; c: free; d: regularly updated; e: free.

In case you hadn’t guessed I like the free part:)

With regard to object management, I would recommend an old C++ coding principle: as soon as you create an object, add the line that deletes it. Every new should always (eventually) have a delete. That way you know that you are destroying the objects you create. However, it will not save you from orphaned-block memory leaks, where you change where pointers are pointing, for example:

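myclass* firstInstance = new myclass();
myclass* secondInstance = new myclass();
firstInstance = secondInstance; // the first object is now unreachable (leaked)
delete firstInstance;           // deletes the second object
delete secondInstance;          // deletes the second object again: undefined behaviour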

You have now created a small memory leak: the data for the original firstInstance is no longer pointed at by any pointer, so it can never be deleted. This is very hard to detect in a large code-base, and more common than it should be.
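
One way to avoid this kind of orphaned block is to let a smart pointer own the object. The following is only a minimal sketch, assuming C++14 or later. The myclass shown here is a hypothetical stand-in, and its live counter is an addition made purely so the effect can be observed:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the post's myclass; the live counter is
// an assumption added only to show when objects are destroyed.
struct myclass {
    static int live;
    myclass() { ++live; }
    ~myclass() { --live; }
};
int myclass::live = 0;

// Reassigns one owning pointer to another and returns how many
// objects are still alive afterwards.
int reassign_with_unique_ptr() {
    auto first = std::make_unique<myclass>();
    auto second = std::make_unique<myclass>();

    // Move assignment destroys the object that first used to own,
    // so nothing is orphaned. second is left empty.
    first = std::move(second);
    assert(second == nullptr);

    return myclass::live;  // one object remains, owned by first
}   // first goes out of scope here and deletes the remaining object
```

Called once, reassign_with_unique_ptr() returns 1 and myclass::live is back to 0 after it returns, with no explicit delete anywhere.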

Generally, these are the pairings you need to be aware of to ensure you properly dispose of all your objects:

new -> delete
new[] -> delete[]
malloc() -> free()
calloc() -> free()
realloc(nonzero) -> free()
// Note: do not rely on realloc(ptr, 0) as a substitute for free(); its behaviour is implementation-defined in older C standards and undefined in C23.
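
The pairings can be sketched in one small function. This is an illustration only, assuming C++11 or later. FreeDeleter and paired_allocations are hypothetical names introduced for the example, and allocation failure is handled with an assert purely for brevity:

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>

// Hypothetical deleter that releases malloc/calloc memory with free().
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

// Exercises each allocation pairing once and returns the sum of the
// values read back, as a simple check that every buffer was usable.
int paired_allocations() {
    int total = 0;

    int* single = new int(1);            // new   -> delete
    total += *single;
    delete single;

    int* array = new int[3]{1, 2, 3};    // new[] -> delete[]
    total += array[0] + array[1] + array[2];
    delete[] array;

    // malloc -> free, performed by the deleter when buf leaves scope.
    std::unique_ptr<int, FreeDeleter> buf(
        static_cast<int*>(std::malloc(sizeof(int))));
    // calloc -> free; calloc zero-initialises the block.
    std::unique_ptr<int, FreeDeleter> zeroed(
        static_cast<int*>(std::calloc(4, sizeof(int))));
    assert(buf && zeroed);  // real code should handle failure properly

    *buf = 10;
    total += *buf;
    total += zeroed.get()[3];            // adds 0

    return total;
}
```

Wrapping the malloc and calloc results in an owning type means the free() side of the pairing cannot be forgotten on an early return.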

If you are coming from a language with garbage collection to C++ it can take a while to get used to, but it quickly becomes habit:)

Posted in C, Computer Languages, Computer Vision, Computing Technology, CUDA

Computer Vision source codes

Posted by Hemprasad Y. Badgujar on January 19, 2015

Feature Detection and Description

General Libraries:

  • VLFeat – Implementation of various feature descriptors (including SIFT, HOG, and LBP) and covariant feature detectors (including DoG, Hessian, Harris Laplace, Hessian Laplace, Multiscale Hessian, Multiscale Harris). Easy-to-use Matlab interface. See Modern features: Software – Slides providing a demonstration of VLFeat and also links to other software. Check also VLFeat hands-on session training
  • OpenCV – Various implementations of modern feature detectors and descriptors (SIFT, SURF, FAST, BRIEF, ORB, FREAK, etc.)

Fast Keypoint Detectors for Real-time Applications:

  • FAST – High-speed corner detector implementation for a wide variety of platforms
  • AGAST – Even faster than the FAST corner detector. A multi-scale version of this method is used for the BRISK descriptor (ECCV 2010).

Binary Descriptors for Real-Time Applications:

  • BRIEF – C++ code for a fast and accurate interest point descriptor (not invariant to rotations and scale) (ECCV 2010)
  • ORB – OpenCV implementation of the Oriented FAST and Rotated BRIEF (ORB) descriptor (invariant to rotations, but not scale)
  • BRISK – Efficient Binary descriptor invariant to rotations and scale. It includes a Matlab mex interface. (ICCV 2011)
  • FREAK – Faster than BRISK (invariant to rotations and scale) (CVPR 2012)

SIFT and SURF Implementations:

Other Local Feature Detectors and Descriptors:

  • VGG Affine Covariant features – Oxford code for various affine covariant feature detectors and descriptors.
  • LIOP descriptor – Source code for the Local Intensity order Pattern (LIOP) descriptor (ICCV 2011).
  • Local Symmetry Features – Source code for matching of local symmetry features under large variations in lighting, age, and rendering style (CVPR 2012).

Global Image Descriptors:

  • GIST – Matlab code for the GIST descriptor
  • CENTRIST – Global visual descriptor for scene categorization and object detection (PAMI 2011)

Feature Coding and Pooling

  • VGG Feature Encoding Toolkit – Source code for various state-of-the-art feature encoding methods – including Standard hard encoding, Kernel codebook encoding, Locality-constrained linear encoding, and Fisher kernel encoding.
  • Spatial Pyramid Matching – Source code for feature pooling based on spatial pyramid matching (widely used for image classification)

Convolutional Nets and Deep Learning

  • Caffe – Fast C++ implementation of deep convolutional networks (GPU / CPU / ImageNet 2013 demonstration).
  • OverFeat – C++ library for integrated classification and localization of objects.
  • EBLearn – C++ Library for Energy-Based Learning. It includes several demos and step-by-step instructions to train classifiers based on convolutional neural networks.
  • Torch7 – Provides a Matlab-like environment for state-of-the-art machine learning algorithms, including a fast implementation of convolutional neural networks.
  • Deep Learning – Various links for deep learning software.

Facial Feature Detection and Tracking

  • IntraFace – Very accurate detection and tracking of facial features (C++/Matlab API).

Part-Based Models

Attributes and Semantic Features

Large-Scale Learning

  • Additive Kernels – Source code for fast additive kernel SVM classifiers (PAMI 2013).
  • LIBLINEAR – Library for large-scale linear SVM classification.
  • VLFeat – Implementation for Pegasos SVM and Homogeneous Kernel map.

Fast Indexing and Image Retrieval

  • FLANN – Library for performing fast approximate nearest neighbor search.
  • Kernelized LSH – Source code for Kernelized Locality-Sensitive Hashing (ICCV 2009).
  • ITQ Binary codes – Code for generation of small binary codes using Iterative Quantization and other baselines such as Locality-Sensitive-Hashing (CVPR 2011).
  • INRIA Image Retrieval – Efficient code for state-of-the-art large-scale image retrieval (CVPR 2011).

Object Detection

3D Recognition

Action Recognition





  • Animals with Attributes – 30,475 images of 50 animal classes with 6 pre-extracted feature representations for each image.
  • aYahoo and aPascal – Attribute annotations for images collected from Yahoo and Pascal VOC 2008.
  • FaceTracer – 15,000 faces annotated with 10 attributes and fiducial points.
  • PubFig – 58,797 face images of 200 people with 73 attribute classifier outputs.
  • LFW – 13,233 face images of 5,749 people with 73 attribute classifier outputs.
  • Human Attributes – 8,000 people with annotated attributes. Check also this link for another dataset of human attributes.
  • SUN Attribute Database – Large-scale scene attribute database with a taxonomy of 102 attributes.
  • ImageNet Attributes – Variety of attribute labels for the ImageNet dataset.
  • Relative attributes – Data for OSR and a subset of PubFig datasets. Check also this link for the WhittleSearch data.
  • Attribute Discovery Dataset – Images of shopping categories associated with textual descriptions.

Fine-grained Visual Categorization

Face Detection

  • FDDB – UMass face detection dataset and benchmark (5,000+ faces)
  • CMU/MIT – Classical face detection dataset.

Face Recognition

  • Face Recognition Homepage – Large collection of face recognition datasets.
  • LFW – UMass unconstrained face recognition dataset (13,000+ face images).
  • NIST Face Homepage – includes face recognition grand challenge (FRGC), vendor tests (FRVT) and others.
  • CMU Multi-PIE – contains more than 750,000 images of 337 people, with 15 different views and 19 lighting conditions.
  • FERET – Classical face recognition dataset.
  • Deng Cai’s face dataset in Matlab Format – Easy to use if you want to play with simple face datasets including Yale, ORL, PIE, and Extended Yale B.
  • SCFace – Low-resolution face dataset captured from surveillance cameras.

Handwritten Digits

  • MNIST – large dataset containing a training set of 60,000 examples, and a test set of 10,000 examples.

Pedestrian Detection

Generic Object Recognition

  • ImageNet – Currently the largest visual recognition dataset in terms of number of categories and images.
  • Tiny Images – 80 million 32×32 low resolution images.
  • Pascal VOC – One of the most influential visual recognition datasets.
  • Caltech 101 / Caltech 256 – Popular image datasets containing 101 and 256 object categories, respectively.
  • MIT LabelMe – Online annotation tool for building computer vision databases.

Scene Recognition

Feature Detection and Description

  • VGG Affine Dataset – Widely used dataset for measuring performance of feature detection and description. Check VLBenchmarks for an evaluation framework.

Action Recognition

RGBD Recognition


Posted in Computer Vision, OpenCV

Computer Vision Databases

Posted by Hemprasad Y. Badgujar on October 15, 2014


Index by Topic

  1. Action Databases
  2. Biological/Medical
  3. Face Databases
  4. Fingerprints
  5. General Images
  6. General RGBD and depth datasets
  7. Gesture Databases
  8. Image, Video and Shape Database Retrieval
  9. Object Databases
  10. People, Pedestrian, Eye/Iris, Template Detection/Tracking Databases
  11. Segmentation
  12. Surveillance
  13. Textures
  14. General Videos
  15. Other Collection Pages
  16. Miscellaneous Topics

Action Databases

  1. An analyzed collation of various labeled video datasets for action recognition (Kevin Murphy)
  2. 50 Salads – fully annotated 4.5 hour dataset of RGB-D video + accelerometer data, capturing 25 people preparing two mixed salads each (Dundee University, Sebastian Stein)
  3. ASLAN Action similarity labeling challenge database (Orit Kliper-Gross)
  4. Berkeley MHAD: A Comprehensive Multimodal Human Action Database (Ferda Ofli)
  5. BEHAVE Interacting Person Video Data with markup (Scott Blunsden, Bob Fisher, Aroosha Laghaee)
  6. CVBASE06: annotated sports videos (Janez Pers)
  7. G3D – synchronised video, depth and skeleton data for 20 gaming actions captured with Microsoft Kinect (Victoria Bloom)
  8. Hollywood 3D – 650 3D action recognition in the wild videos, 14 action classes (Simon Hadfield)
  9. Human Actions and Scenes Dataset (Marcin Marszalek, Ivan Laptev, Cordelia Schmid)
  10. HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion (Brown University)
  11. i3DPost Multi-View Human Action Datasets (Hansung Kim)
  12. i-LIDS video event image dataset (Imagery library for intelligent detection systems) (Paul Hosner)
  13. INRIA Xmas Motion Acquisition Sequences (IXMAS) (INRIA)
  14. JPL First-Person Interaction dataset – 7 types of human activity videos taken from a first-person viewpoint (Michael S. Ryoo, JPL)
  15. KTH human action recognition database (KTH CVAP lab)
  16. LIRIS human activities dataset – 2 cameras, annotated, depth images (Christian Wolf, et al)
  17. MuHAVi – Multicamera Human Action Video Data (Hossein Ragheb)
  18. Oxford TV based human interactions (Oxford Visual Geometry Group)
  19. Rochester Activities of Daily Living Dataset (Ross Messing)
  20. SDHA Semantic Description of Human Activities 2010 contest – aerial views (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury)
  21. SDHA Semantic Description of Human Activities 2010 contest – Human Interactions (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury)
  22. TUM Kitchen Data Set of Everyday Manipulation Activities (Moritz Tenorth, Jan Bandouch)
  23. TV Human Interaction Dataset (Alonso Patron-Perez)
  24. Univ of Central Florida – Feature Films Action Dataset (Univ of Central Florida)
  25. Univ of Central Florida – YouTube Action Dataset (sports) (Univ of Central Florida)
  26. Univ of Central Florida – 50 Action Category Recognition in Realistic Videos (3 GB) (Kishore Reddy)
  27. UCF 101 action dataset 101 action classes, over 13k clips and 27 hours of video data (Univ of Central Florida)
  28. Univ of Central Florida – Sports Action Dataset (Univ of Central Florida)
  29. Univ of Central Florida – ARG Aerial camera, Rooftop camera and Ground camera (UCF Computer Vision Lab)
  30. UCR Videoweb Multi-camera Wide-Area Activities Dataset (Amit K. Roy-Chowdhury)
  31. Verona Social interaction dataset (Marco Cristani)
  32. Videoweb (multicamera) Activities Dataset (B. Bhanu, G. Denina, C. Ding, A. Ivers, A. Kamal, C. Ravishankar, A. Roy-Chowdhury, B. Varda)
  33. ViHASi: Virtual Human Action Silhouette Data (userID: VIHASI password: virtual$virtual) (Hossein Ragheb, Kingston University)
  34. WorkoutSU-10 Kinect dataset for exercise actions (Ceyhun Akgul)
  35. YouCook – 88 open-source YouTube cooking videos with annotations (Jason Corso)
  36. WVU Multi-view action recognition dataset (Univ. of West Virginia)


Biological/Medical

  1. Annotated Spine CT Database for Benchmarking of Vertebrae Localization, 125 patients, 242 scans (Ben Glocker)
  2. Computed Tomography Emphysema Database (Lauge Sorensen)
  3. Dermoscopy images (Eric Ehrsam)
  4. DIADEM: Digital Reconstruction of Axonal and Dendritic Morphology Competition (Allen Institute for Brain Science et al)
  5. DIARETDB1 – Standard Diabetic Retinopathy Database (Lappeenranta Univ of Technology)
  6. DRIVE: Digital Retinal Images for Vessel Extraction (Univ of Utrecht)
  7. MiniMammographic Database (Mammographic Image Analysis Society)
  8. MIT CBCL Automated Mouse Behavior Recognition datasets (Nicholas Edelman)
  9. Mouse Embryo Tracking Database – cell division event detection (Marcelo Cicconet, Kris Gunsalus)
  10. Retinal fundus images – Ground truth of vascular bifurcations and crossovers (Univ of Groningen)
  11. Spine and Cardiac data (Digital Imaging Group of London Ontario, Shuo Li)
  12. Univ of Central Florida – DDSM: Digital Database for Screening Mammography (Univ of Central Florida)
  13. VascuSynth – 120 3D vascular tree like structures with ground truth (Mengliu Zhao, Ghassan Hamarneh)
  14. York Cardiac MRI dataset (Alexander Andreopoulos)

Face Databases

  1. 3D Mask Attack Database (3DMAD) – 76500 frames of 17 persons using Kinect RGBD with eye positions (Sebastien Marcel)
  2. Audio-visual database for face and speaker recognition (Mobile Biometry MOBIO)
  3. BANCA face and voice database (Univ of Surrey)
  4. Binghamton Univ 3D static and dynamic facial expression database (Lijun Yin, Peter Gerhardstein and teammates)
  5. BioID face database (BioID group)
  6. Biwi 3D Audiovisual Corpus of Affective Communication – 1000 high quality, dynamic 3D scans of faces, recorded while pronouncing a set of English sentences.
  7. CMU Facial Expression Database (CMU/MIT)
  8. CMU/MIT Frontal Faces (CMU/MIT)
  9. CMU/MIT Frontal Faces (CMU/MIT)
  10. CMU Pose, Illumination, and Expression (PIE) Database (Simon Baker)
  11. CSSE Frontal intensity and range images of faces (Ajmal Mian)
  12. Face Recognition Grand Challenge datasets (FRVT – Face Recognition Vendor Test)
  13. FaceTracer Database – 15,000 faces (Neeraj Kumar, P. N. Belhumeur, and S. K. Nayar)
  14. FDDB: Face Detection Data set and Benchmark – studying unconstrained face detection (University of Massachusetts Computer Vision Laboratory)
  15. FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network)
  16. Facial Recognition Technology (FERET) Database (USA National Institute of Standards and Technology)
  17. Hannah and her sisters database – a dense audio-visual person-oriented ground-truth annotation of faces, speech segments, shot boundaries (Patrick Perez, Technicolor)
  18. Hong Kong Face Sketch Database
  19. Japanese Female Facial Expression (JAFFE) Database (Michael J. Lyons)
  20. LFW: Labeled Faces in the Wild – unconstrained face recognition.
  21. Manchester Annotated Talking Face Video Dataset (Timothy Cootes)
  22. MIT Collation of Face Databases (Ethan Meyers)
  23. MMI Facial Expression Database – 2900 videos and high-resolution still images of 75 subjects, annotated for FACS AUs.
  24. MORPH (Craniofacial Longitudinal Morphological Face Database) (University of North Carolina Wilmington)
  25. MIT CBCL Face Recognition Database (Center for Biological and Computational Learning)
  26. NIST mugshot identification database (USA National Institute of Standards and Technology)
  27. ORL face database: 40 people with 10 views (ATT Cambridge Labs)
  28. Oxford: faces, flowers, multi-view, buildings, object categories, motion segmentation, affine covariant regions, misc (Oxford Visual Geometry Group)
  29. PubFig: Public Figures Face Database (Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar)
  30. Re-labeled Faces in the Wild – original images, but aligned using “deep funneling” method. (University of Massachusetts, Amherst)
  31. SCface – Surveillance Cameras Face Database (Mislav Grgic, Kresimir Delac, Sonja Grgic, Bozidar Klimpak)
  32. Trondheim Kinect RGB-D Person Re-identification Dataset (Igor Barros Barbosa)
  33. UB KinFace Database – University of Buffalo kinship verification and recognition database
  34. XM2VTS Face video sequences (295): The extended M2VTS Database (XM2VTS) – (Surrey University)
  35. Yale Face Database – 11 expressions of 10 people (A. Georghiades)
  36. Yale Face Database B – 576 viewing conditions of 10 people (A. Georghiades)


Fingerprints

  1. FVC fingerprint verification competition 2002 dataset (University of Bologna)
  2. FVC fingerprint verification competition 2004 dataset (University of Bologna)
  3. FVC – a subset of FVC (Fingerprint Verification Competition) 2002 and 2004 fingerprint image databases, manually extracted minutiae data & associated documents (Umut Uludag)
  4. NIST fingerprint databases (USA National Institute of Standards and Technology)
  5. SPD2010 Fingerprint Singular Points Detection Competition (SPD 2010 committee)

General Images

  1. Aerial color image dataset (Swiss Federal Institute of Technology)
  2. AMOS: Archive of Many Outdoor Scenes (20+m) (Nathan Jacobs)
  3. Brown Univ Large Binary Image Database (Ben Kimia)
  4. Caltech-UCSD Birds-200-2011 (Catherine Wah)
  5. Columbia Multispectral Image Database (F. Yasuma, T. Mitsunaga, D. Iso, and S.K. Nayar)
  6. HIPR2 Image Catalogue of different types of images (Bob Fisher et al)
  7. Hyperspectral images of natural scenes – 2002 (David H. Foster)
  8. Hyperspectral images of natural scenes – 2004 (David H. Foster)
  9. ImageNet Linguistically organised (WordNet) Hierarchical Image Database – 10E7 images, 15K categories (Li Fei-Fei, Jia Deng, Hao Su, Kai Li)
  10. ImageNet Large Scale Visual Recognition Challenge (Alex Berg, Jia Deng, Fei-Fei Li)
  11. OTCBVS Thermal Imagery Benchmark Dataset Collection (Ohio State Team)
  12. McGill Calibrated Colour Image Database (Adriana Olmos and Fred Kingdom)
  13. Tiny Images Dataset 79 million 32×32 color images (Fergus, Torralba, Freeman)

General RGBD Datasets

  1. Cornell-RGBD-Dataset – Office Scenes (Hema Koppula)
  2. NYU Depth Dataset V2 – Indoor Segmentation and Support Inference from RGBD Images
  3. Oakland 3-D Point Cloud Dataset (Nicolas Vandapel)
  4. Washington RGB-D Object Dataset – 300 common household objects and 14 scenes. (University of Washington and Intel Labs Seattle)

Gesture Databases

  1. FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network)
  2. Hand gesture and marine silhouettes (Euripides G.M. Petrakis)
  3. IDIAP Hand pose/gesture datasets (Sebastien Marcel)
  4. Sheffield gesture database – 2160 RGBD hand gesture sequences, 6 subjects, 10 gestures, 3 postures, 3 backgrounds, 2 illuminations (Ling Shao)

Image, Video and Shape Database Retrieval

  1. Brown Univ 25/99/216 Shape Databases (Ben Kimia)
  2. IAPR TC-12 Image Benchmark (Michael Grubinger)
  3. IAPR-TC12 Segmented and annotated image benchmark (SAIAPR TC-12) (Hugo Jair Escalante)
  4. ImageCLEF 2010 Concept Detection and Annotation Task (Stefanie Nowak)
  5. ImageCLEF 2011 Concept Detection and Annotation Task – multi-label classification challenge in Flickr photos
  6. CLEF-IP 2011 evaluation on patent images
  7. McGill 3D Shape Benchmark (Siddiqi, Zhang, Macrini, Shokoufandeh, Bouix, Dickinson)
  8. NIST SHREC 2010 – Shape Retrieval Contest of Non-rigid 3D Models (USA National Institute of Standards and Technology)
  9. NIST SHREC – other NIST retrieval contest databases and links (USA National Institute of Standards and Technology)
  10. NIST TREC Video Retrieval Evaluation Database (USA National Institute of Standards and Technology)
  11. Princeton Shape Benchmark (Princeton Shape Retrieval and Analysis Group)
  12. Queensland cross media dataset – millions of images and text documents for “cross-media” retrieval (Yi Yang)
  13. TOSCA 3D shape database (Bronstein, Bronstein, Kimmel)

Object Databases

  1. 2.5D/3D Datasets of various objects and scenes (Ajmal Mian)
  2. Amsterdam Library of Object Images (ALOI): 100K views of 1K objects (University of Amsterdam/Intelligent Sensory Information Systems)
  3. Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild – 12 class, 3000+ images each with 3D annotations (Yu Xiang, Roozbeh Mottaghi, Silvio Savarese)
  4. Caltech 101 (now 256) category object recognition database (Li Fei-Fei, Marco Andreetto, Marc’Aurelio Ranzato)
  5. Columbia COIL-100 3D object multiple views (Columbia University)
  6. Densely sampled object views: 2500 views of 2 objects, eg for view-based recognition and modeling (Gabriele Peters, Universiteit Dortmund)
  7. German Traffic Sign Detection Benchmark (Ruhr-Universitat Bochum)
  8. GRAZ-02 Database (Bikes, cars, people) (A. Pinz)
  9. Linkoping 3D Object Pose Estimation Database (Fredrik Viksten and Per-Erik Forssen)
  10. Microsoft Object Class Recognition image databases (Antonio Criminisi, Pushmeet Kohli, Tom Minka, Carsten Rother, Toby Sharp, Jamie Shotton, John Winn)
  11. Microsoft salient object databases (labeled by bounding boxes) (Liu, Sun Zheng, Tang, Shum)
  12. MIT CBCL Car Data (Center for Biological and Computational Learning)
  13. MIT CBCL StreetScenes Challenge Framework (Stan Bileschi)
  14. NEC Toy animal object recognition or categorization database (Hossein Mobahi)
  15. NORB 50 toy image database (NYU)
  16. PASCAL Image Database (motorbikes, cars, cows) (PASCAL Consortium)
  17. PASCAL 2007 Challenge Image Database (motorbikes, cars, cows) (PASCAL Consortium)
  18. PASCAL 2008 Challenge Image Database (PASCAL Consortium)
  19. PASCAL 2009 Challenge Image Database (PASCAL Consortium)
  20. PASCAL 2010 Challenge Image Database (PASCAL Consortium)
  21. PASCAL 2011 Challenge Image Database (PASCAL Consortium)
  22. PASCAL 2012 Challenge Image Database – category classification, detection, and segmentation, and still-image action classification (PASCAL Consortium)
  23. UIUC Car Image Database (UIUC)
  24. UIUC Dataset of 3D object categories (S. Savarese and L. Fei-Fei)
  25. Venezia 3D object-in-clutter recognition and segmentation (Emanuele Rodola)

People, Pedestrian, Eye/Iris, Template Detection/Tracking Databases

  1. 3D KINECT Gender Walking data base (L. Igual, A. Lapedriza, R. Borràs from UB, CVC and UOC, Spain)
  2. Caltech Pedestrian Dataset (P. Dollar, C. Wojek, B. Schiele and P. Perona)
  3. CASIA gait database (Chinese Academy of Sciences)
  4. CASIA-IrisV3 (Chinese Academy of Sciences, T. N. Tan, Z. Sun)
  5. CAVIAR project video sequences with tracking and behavior ground truth (CAVIAR team/Edinburgh University – EC project IST-2001-37540)
  6. Daimler Pedestrian Detection Benchmark 21790 images with 56492 pedestrians plus empty scenes (M. Enzweiler, D. M. Gavrila)
  7. Driver Monitoring Video Dataset (RobeSafe + Jesus Nuevo-Chiquero)
  8. Edinburgh overhead camera person tracking dataset (Bob Fisher, Bashia Majecka, Gurkirt Singh, Rowland Sillito)
  9. Eyetracking database summary (Stefan Winkler)
  10. HAT database of 27 human attributes (Gaurav Sharma, Frederic Jurie)
  11. INRIA Person Dataset (Navneet Dalal)
  12. ISMAR09 ground truth video dataset for template-based (i.e. planar) tracking algorithms (Sebastian Lieberknecht)
  13. MIT CBCL Pedestrian Data (Center for Biological and Computational Learning)
  14. MIT eye tracking database (1003 images) (Judd et al)
  15. Modena and Reggio Emilia first person head motion videos (Univ of Modena and Reggio Emilia)
  16. Notre Dame Iris Image Dataset (Patrick J. Flynn)
  17. PETS 2009 Crowd Challenge dataset (Reading University & James Ferryman)
  18. PETS: Performance Evaluation of Tracking and Surveillance (Reading University & James Ferryman)
  19. PETS Winter 2009 workshop data (Reading University & James Ferryman)
  20. Pixel-based change detection benchmark dataset (Goyette et al)
  21. Pointing’04 ICPR Workshop Head Pose Image Database
  22. Transient Biometrics Nails Dataset V01 (Igor Barros Barbosa)
  23. UBIRIS: Noisy Visible Wavelength Iris Image Databases (University of Beira)
  24. Univ of Central Florida – Crowd Dataset (Saad Ali)
  25. Univ of Central Florida – Crowd Flow Segmentation datasets (Saad Ali)
  26. UTIRIS cross-spectral iris image databank (Mahdi Hosseini)
  27. York Univ Eye Tracking Dataset (120 images) (Neil Bruce)


Segmentation

  1. Alpert et al. Segmentation evaluation database (Sharon Alpert, Meirav Galun, Ronen Basri, Achi Brandt)
  2. Berkeley Segmentation Dataset and Benchmark (David Martin and Charless Fowlkes)
  3. GrabCut Image database (C. Rother, V. Kolmogorov, A. Blake, M. Brown)
  4. LabelMe images database and online annotation tool (Bryan Russell, Antonio Torralba, Kevin Murphy, William Freeman)


Surveillance

  1. AVSS07: Advanced Video and Signal based Surveillance 2007 datasets (Andrea Cavallaro)
  2. ETISEO Video Surveillance Download Datasets (INRIA Orion Team and others)
  3. Heriot Watt Summary of datasets for human tracking and surveillance (Zsolt Husz)
  4. Openvisor – Video surveillance Online Repository (Univ of Modena and Reggio Emilia)
  5. SPEVI: Surveillance Performance EValuation Initiative (Queen Mary University London)
  6. Udine Trajectory-based anomalous event detection dataset – synthetic trajectory datasets with outliers (Univ of Udine Artificial Vision and Real Time Systems Laboratory)


Textures

  1. Color texture images by category (
  2. Columbia-Utrecht Reflectance and Texture Database (Columbia & Utrecht Universities)
  3. DynTex: Dynamic texture database (Renaud Piteri, Mark Huiskes and Sandor Fazekas)
  4. Oulu Texture Database (Oulu University)
  5. Prague Texture Segmentation Data Generator and Benchmark (Mikes, Haindl)
  6. Uppsala texture dataset of surfaces and materials – fabrics, grains, etc.
  7. Vision Texture (MIT Media Lab)

General Videos

  1. Large scale YouTube video dataset – 156,823 videos (2,907,447 keyframes) crawled from YouTube videos (Yi Yang)

Other Collections

  1. CANTATA Video and Image Database Index site (Multitel)
  2. Computer Vision Homepage list of test image databases (Carnegie Mellon Univ)
  3. ETHZ various, including 3D head pose, shape classes, pedestrians, buildings (ETH Zurich, Computer Vision Lab)
  4. Leibe’s Collection of people/vehicle/object databases (Bastian Leibe)
  5. Lotus Hill Image Database Collection with Ground Truth (Sealeen Ren, Benjamin Yao, Michael Yang)
  6. Oxford Misc, including Buffy, Flowers, TV characters, Buildings, etc (Oxford Visual Geometry Group)
  7. PEIPA Image Database Summary (Pilot European Image Processing Archive)
  8. Univ of Bern databases on handwriting, online documents, string edit and graph matching (Univ of Bern, Computer Vision and Artificial Intelligence)
  9. USC Annotated Computer Vision Bibliography database publication summary (Keith Price)
  10. USC-SIPI image databases: texture, aerial, favorites (eg. Lena) (USC Signal and Image Processing Institute)


Miscellaneous Topics

  1. 3D mesh watermarking benchmark dataset (Guillaume Lavoue)
  2. Active Appearance Models datasets (Mikkel B. Stegmann)
  3. Aircraft tracking (Ajmal Mian)
  4. Cambridge Motion-based Segmentation and Recognition Dataset (Brostow, Shotton, Fauqueur, Cipolla)
  5. Catadioptric camera calibration images (Yalin Bastanlar)
  6. Chars74K dataset – 74 English and Kannada characters (Teo de Campos –
  7. COLD (COsy Localization Database) – place localization (Ullah, Pronobis, Caputo, Luo, and Jensfelt)
  8. Columbia Camera Response Functions: Database (DoRF) and Model (EMOR) (M.D. Grossberg and S.K. Nayar)
  9. Columbia Database of Contaminants’ Patterns and Scattering Parameters (Jinwei Gu, Ravi Ramamoorthi, Peter Belhumeur, Shree Nayar)
  10. Dense outdoor correspondence ground truth datasets, for optical flow and local keypoint evaluation (Christoph Strecha)
  11. DTU controlled motion and lighting image dataset (135K images) (Henrik Aanaes)
  12. EISATS: .enpeda.. Image Sequence Analysis Test Site (Auckland University Multimedia Imaging Group)
  13. FlickrLogos-32 – 8240 images of 32 product logos (Stefan Romberg)
  14. Flowchart images (Allan Hanbury)
  15. Geometric Context – scene interpretation images (Derek Hoiem)
  16. Image/video quality assessment database summary (Stefan Winkler)
  17. INRIA feature detector evaluation sequences (Krystian Mikolajczyk)
  18. INRIA’s PERCEPTION’s database of images and videos gathered with several synchronized and calibrated cameras (INRIA Rhone-Alpes)
  19. INRIA’s Synchronized and calibrated binocular/binaural data sets with head movements (INRIA Rhone-Alpes)
  20. KITTI dataset for stereo, optical flow and visual odometry (Geiger, Lenz, Urtasun)
  21. Large scale 3D point cloud data from terrestrial LiDAR scanning (Andreas Nuechter)
  22. Linkoping Rolling Shutter Rectification Dataset (Per-Erik Forssen and Erik Ringaby)
  23. Middlebury College stereo vision research datasets (Daniel Scharstein and Richard Szeliski)
  24. MPI-Sintel optical flow evaluation dataset (Michael Black)
  25. Multiview stereo images with laser based groundtruth (ESAT-PSI/VISICS,FGAN-FOM,EPFL/IC/ISIM/CVLab)
  26. The Cancer Imaging Archive (National Cancer Institute)
  27. NCI Cancer Image Archive – prostate images (National Cancer Institute)
  28. NIST 3D Interest Point Detection (Helin Dutagaci, Afzal Godil)
  29. NRCS natural resource/agricultural image database (USDA Natural Resources Conservation Service)
  30. Occlusion detection test data (Andrew Stein)
  31. The Open Video Project (Gary Marchionini, Barbara M. Wildemuth, Gary Geisler, Yaxiao Song)
  32. Outdoor Ground Truth Evaluation Dataset for Sensor-Aided Visual Handheld Camera Localization (Daniel Kurz, metaio)
  33. Pics ‘n’ Trails – Dataset of Continuously archived GPS and digital photos (Gamhewage Chaminda de Silva)
  34. PRINTART: Artistic images of prints of well known paintings, including detail annotations. A benchmark for automatic annotation and retrieval tasks with this database was published at ECCV. (Nuno Miguel Pinho da Silva)
  35. RAWSEEDS SLAM benchmark datasets (Rawseeds Project)
  36. Robotic 3D Scan Repository – 3D point clouds from robotic experiments of scenes (Osnabruck and Jacobs Universities)
  37. ROMA (ROad MArkings) : Image database for the evaluation of road markings extraction algorithms (Jean-Philippe Tarel, et al)
  38. Stuttgart Range Image Database – 66 views of 45 objects
  39. UCL Ground Truth Optical Flow Dataset (Oisin Mac Aodha)
  40. Univ of Genoa Datasets for disparity and optic flow evaluation (Manuela Chessa)
  41. Validation and Verification of Neural Network Systems (Francesco Vivarelli)
  42. VSD: Technicolor Violent Scenes Dataset – a collection of ground-truth files based on the extraction of violent events in movies
  43. WILD: Weather and Illumination Database (S. Narasimhan, C. Wang, S. Nayar, D. Stolyarov, K. Garg, Y. Schechner, H. Peri)

Posted in Computer Vision, OpenCV, Project Related, Research Menu

Computer Vision Algorithm Implementations

Posted by Hemprasad Y. Badgujar on May 6, 2014

Participate in Reproducible Research

General Image Processing

(C/C++ code, BSD lic) Image manipulation, matrix manipulation, transforms
(C/C++ code, BSD lic) Basic image processing, matrix manipulation and feature extraction algorithms: rotation, flip, photometric normalisations (Histogram Equalization, Multiscale Retinex, Self-Quotient Image or Gross-Brajovic), edge detection, 2D DCT, 2D FFT, 2D Gabor, PCA to do Eigen-Faces, LDA to do Fisher-Faces. Various metrics (Euclidean, Mahalanobis, ChiSquare, NormalizeCorrelation, TangentDistance, …)
(C/C++ code, MIT lic) A Free Experimental System for Image Processing (loading, transforms, filters, histogram, morphology, …)
(C/C++ code, GPL and LGPL lic) CImg Library is an open source C++ toolkit for image processing
Generic Image Library (GIL), Boost integration
(C/C++ code, MIT lic) Adobe open source C++ Generic Image Library (GIL)
SimpleCV, a kinder, gentler machine vision library
(python code, MIT lic) SimpleCV is a Python interface to several powerful open source computer vision libraries in a single convenient package
PCL, The Point Cloud Library
(C/C++ code, BSD lic) The Point Cloud Library (or PCL) is a large-scale, open project for point cloud processing. The PCL framework contains numerous state-of-the-art algorithms including filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation.
Population, imaging library in C++ for processing, analysing, modelling and visualising
(C/C++ code, CeCill lic) Population is an open-source imaging library in C++ for processing, analysing, modelling and visualising including more than 200 algorithms designed by V. Tariel.
(C/C++ code, LGPL 3) A computer vision framework based on Qt and OpenCV that provides an easy-to-use interface to display, analyze and run computer vision algorithms. The library is provided with multiple application examples including stereo, SURF, Sobel and Hough transform.
Machine Vision Toolbox
(MATLAB/C, LGPL lic) image processing, segmentation, blob/line/point features, multiview geometry, camera models, colorimetry.
(Java code, Apache lic) BoofCV is an open source Java library for real-time computer vision and robotics applications. BoofCV is organized into several packages: image processing, features, geometric vision, calibration, visualize, and IO.
(C++ code, MIT lic) Simd is a free open source C++ library. It includes high-performance image processing algorithms, optimized using SIMD CPU extensions such as SSE2, SSSE3, SSE4.2 and AVX2.
Free but not open source – ArrayFire (formerly LibJacket) is a matrix library for CUDA
(CUDA/C++, free lic) ArrayFire offers hundreds of general matrix and image processing functions, all running on the GPU. The syntax is very Matlab-like, with the goal of offering easy porting of Matlab code to C++/ArrayFire.
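
Histogram equalization, one of the photometric normalisations mentioned in the list above, is small enough to sketch without any of these libraries. What follows is only a minimal illustration, assuming 8-bit grayscale pixels held in a flat std::vector; the function name equalizeHistogram is made up for this example and is not taken from any library listed here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from any library listed above): histogram
// equalization of 8-bit grayscale pixels stored in a flat vector.
std::vector<std::uint8_t> equalizeHistogram(const std::vector<std::uint8_t>& pixels) {
    if (pixels.empty()) return {};

    // 1. Count how often each of the 256 grey levels occurs.
    std::array<std::size_t, 256> hist{};
    for (std::uint8_t p : pixels) ++hist[p];

    // 2. Build the cumulative distribution and remember its first non-zero value.
    std::array<std::size_t, 256> cdf{};
    std::size_t running = 0, cdfMin = 0;
    for (int v = 0; v < 256; ++v) {
        running += hist[v];
        cdf[v] = running;
        if (cdfMin == 0 && running != 0) cdfMin = running;
    }

    // A constant image has nothing to spread out; return it unchanged.
    const std::size_t total = pixels.size();
    if (total == cdfMin) return pixels;

    // 3. Remap every pixel so the cumulative distribution becomes roughly linear.
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const double scaled = 255.0 * (cdf[pixels[i]] - cdfMin) / (total - cdfMin);
        out[i] = static_cast<std::uint8_t>(scaled + 0.5);
    }
    return out;
}
```

With the four pixels 50, 50, 100, 150 this sketch stretches the values to 0, 0, 128, 255, so the darkest level maps to black and the brightest to white.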

Image Acquisition, Decoding & encoding

(C/C++ code, LGPL or GPL lic) Record, convert and stream audio and video (many codecs)
(C/C++ code, BSD lic) PNG, JPEG,… images, avi video files, USB webcam,…
(C/C++ code, BSD lic) Video file decoding/encoding (ffmpeg integration), image capture from a frame grabber or from USB, Sony pan/tilt/zoom camera control using VISCA interface
lib VLC
(C/C++ code, GPL lic) Used by VLC player: record, convert and stream audio and video
(C/C++ code, LGPL lic) RTSP streams
(C/C++ code, GPL lic) Loading & saving DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, TIFF, and more
(C/C++ code, LGPL lic) Loading & saving various image formats
(C/C++ code, GPL & FPL lic) PNG, BMP, JPEG, TIFF loading
(C/C++ code, LGPL lic) VideoMan aims to make capturing images from cameras, video files or image sequences easier.

Segmentation

(C/C++ code, BSD lic) Pyramid image segmentation
(C/C++ code, Microsoft Research Lic) Branch-and-Mincut Algorithm for Image Segmentation
Efficiently solving multi-label MRFs (Readme)
(C/C++ code) Segmentation, object category labelling, stereo

Machine Learning

(C/C++ code, BSD lic) Gradient machines (multi-layered perceptrons, radial basis functions, mixtures of experts, convolutional networks and even time-delay neural networks), Support vector machines, Ensemble models (bagging, adaboost), Non-parametric models (K-nearest-neighbors, Parzen regression and Parzen density estimator), distributions (K-means, Gaussian mixture models, hidden Markov models, input-output hidden Markov models, and Bayes classifier), speech recognition tools

Object Detection

(C/C++ code, BSD lic) Viola-Jones face detection (Haar features)
(C/C++ code, BSD lic) MLP & cascade of Haar-like classifiers face detection
Hough Forests
(C/C++ code, Microsoft Research Lic) Class-Specific Hough Forests for Object Detection
Efficient Subwindow Object Detection
(C/C++ code, Apache Lic) Christoph Lampert's “Efficient Subwindow” algorithms for Object Detection
INRIA Object Detection and Localization Toolkit
(C/C++ code, Custom Lic) Histograms of Oriented Gradients library for Object Detection
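
The Haar features used in Viola-Jones style detectors are cheap to evaluate because rectangle sums come from an integral image (summed-area table). Below is a rough sketch of that idea only, not code from any of the libraries above: the IntegralImage struct and the haarEdgeFeature function are hypothetical names, and the image is assumed to be a row-major vector of ints.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch (names are illustrative): a summed-area table with one
// extra row and column of zeros, so rectangle sums need no boundary checks.
struct IntegralImage {
    std::size_t width, height;
    std::vector<long long> sums;  // (height + 1) x (width + 1), row-major

    IntegralImage(const std::vector<int>& pixels, std::size_t w, std::size_t h)
        : width(w), height(h), sums((w + 1) * (h + 1), 0) {
        for (std::size_t y = 0; y < h; ++y) {
            for (std::size_t x = 0; x < w; ++x) {
                sums[(y + 1) * (w + 1) + (x + 1)] =
                    pixels[y * w + x]
                    + sums[y * (w + 1) + (x + 1)]   // sum of everything above
                    + sums[(y + 1) * (w + 1) + x]   // sum of everything to the left
                    - sums[y * (w + 1) + x];        // overlap that was counted twice
            }
        }
    }

    // Sum of the pixels in the rectangle [x, x + w) x [y, y + h), in four lookups.
    long long rectSum(std::size_t x, std::size_t y, std::size_t w, std::size_t h) const {
        const std::size_t stride = width + 1;
        return sums[(y + h) * stride + (x + w)] - sums[y * stride + (x + w)]
             - sums[(y + h) * stride + x] + sums[y * stride + x];
    }
};

// Two-rectangle (edge) Haar-like feature: left half minus right half.
long long haarEdgeFeature(const IntegralImage& ii, std::size_t x, std::size_t y,
                          std::size_t w, std::size_t h) {
    const std::size_t half = w / 2;
    return ii.rectSum(x, y, half, h) - ii.rectSum(x + half, y, half, h);
}
```

Once the table is built, every feature costs a fixed number of lookups regardless of its size, which is what makes scanning many windows at many scales affordable.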

Object Category Labelling

Efficiently solving multi-label MRFs (Readme)
(C/C++ code) Segmentation, object category labelling, stereo
Multi-label optimization
(C/C++/MATLAB code) The gco-v3.0 library is for optimizing multi-label energies. It supports energies with any combination of unary, pairwise, and label cost terms.
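
To make the three kinds of terms concrete, here is a small sketch that only evaluates such an energy for a given labelling; it does not optimize anything and is not the gco-v3.0 API. The Edge struct, the labellingEnergy function and the choice of a Potts pairwise model (a fixed penalty when neighbours disagree) are all assumptions made for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical data layout, for illustration only; this is not the gco-v3.0 API.
struct Edge { std::size_t a, b; };

// Energy of a labelling: unary (data) terms, pairwise (smoothness) terms over
// neighbouring sites, and a fixed cost for every label that is used at all.
double labellingEnergy(const std::vector<int>& labels,
                       const std::vector<std::vector<double>>& unary,  // unary[site][label]
                       const std::vector<Edge>& edges,
                       double pottsWeight,                             // assumed Potts pairwise model
                       const std::vector<double>& labelCost) {         // labelCost[label]
    double energy = 0.0;

    // Unary terms: how badly each site fits the label assigned to it.
    for (std::size_t s = 0; s < labels.size(); ++s) energy += unary[s][labels[s]];

    // Pairwise terms: pay a penalty whenever two neighbours disagree.
    for (const Edge& e : edges)
        if (labels[e.a] != labels[e.b]) energy += pottsWeight;

    // Label costs: pay once for each distinct label that appears.
    std::set<int> used(labels.begin(), labels.end());
    for (int l : used) energy += labelCost[l];

    return energy;
}
```

On a three-site chain, a labelling that uses two labels pays both label costs plus one disagreement penalty, so it only wins when the unary terms favour the split strongly enough.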

Optical flow

(C/C++ code, BSD lic) Horn & Schunck algorithm, Lucas & Kanade algorithm, Lucas-Kanade optical flow in pyramids, block matching.
(C/C++/OpenGL/Cg code, LGPL) Gain-Adaptive KLT Tracking and TV-L1 optical flow on the GPU.
(C/C++/Matlab code, Custom Lic.) The RLOF library provides GPU / CPU implementation of Optical Flow and Feature Tracking method.
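
The core step of the Lucas & Kanade algorithm listed above is a least-squares solve over one window: the spatial gradients and temporal differences inside the window give a 2x2 system for a single flow vector. The sketch below shows just that step under simplifying assumptions (gradients already computed, no pyramid, no iteration); the Flow struct and lucasKanadeWindow function are hypothetical names, not part of any library in this list.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: estimate one flow vector (u, v) for a window from the
// spatial gradients Ix, Iy and temporal differences It sampled at its pixels.
struct Flow { double u, v; bool valid; };

Flow lucasKanadeWindow(const std::vector<double>& Ix,
                       const std::vector<double>& Iy,
                       const std::vector<double>& It) {
    // Accumulate the 2x2 normal equations  A^T A [u v]^T = -A^T It.
    double sxx = 0, sxy = 0, syy = 0, sxt = 0, syt = 0;
    for (std::size_t i = 0; i < Ix.size(); ++i) {
        sxx += Ix[i] * Ix[i];
        sxy += Ix[i] * Iy[i];
        syy += Iy[i] * Iy[i];
        sxt += Ix[i] * It[i];
        syt += Iy[i] * It[i];
    }

    // A near-zero determinant means the window has too little texture
    // (the aperture problem), so no reliable flow can be recovered.
    const double det = sxx * syy - sxy * sxy;
    if (std::fabs(det) < 1e-12) return {0.0, 0.0, false};

    // Closed-form inverse of the symmetric 2x2 matrix.
    return {(-syy * sxt + sxy * syt) / det,
            ( sxy * sxt - sxx * syt) / det,
            true};
}
```

The pyramidal variants mentioned above repeat this kind of solve from coarse to fine resolution so that larger motions stay within reach of the linear approximation.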

Features Extraction & Matching

SIFT by R. Hess
(C/C++ code, GPL lic) SIFT feature extraction & RANSAC matching
(C/C++ code) SURF feature extraction algorithm (kind of fast SIFT)
(C/C++ code, Ecole Polytechnique and ENS Cachan for commercial Lic) Affine SIFT (ASIFT)
VLFeat (formerly Sift++)
(C/C++ code) SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, and quick shift
A GPU Implementation of Scale Invariant Feature Transform (SIFT)
(C/C++ code, GPL lic) An enhanced version of RANSAC that considers the correlation between data points

Nearest Neighbors matching

(C/C++ code, BSD lic) Approximate Nearest Neighbors (Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration)
(C/C++ code, LGPL lic) Approximate Nearest Neighbor Searching

Tracking

(C/C++ code, BSD lic) Kalman, Condensation, CAMSHIFT, Mean shift, Snakes
KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker
(C/C++ code, public domain) Kanade-Lucas-Tomasi Feature Tracker
(C/C++/OpenGL/Cg code) A GPU-based Implementation of the Kanade-Lucas-Tomasi Feature Tracker
(C/C++/OpenGL/Cg code, LGPL) Gain-Adaptive KLT Tracking and TV-L1 optical flow on the GPU
On-line boosting trackers
(C/C++, LGPL) On-line boosting tracker, semi-supervised tracker, beyond semi-supervised tracker
Single Camera background subtraction tracking
(C/C++, LGPL) Background subtraction based tracking algorithm using OpenCV.
Multi-camera tracking
(C/C++, LGPL) Multi-camera particle filter tracking algorithm using OpenCV and Intel IPP.

Simultaneous localization and mapping

Real-Time SLAM – SceneLib
(C/C++ code, LGPL lic) Real-time vision-based SLAM with a single camera
(C/C++ code, Isis Innovation Limited lic) Parallel Tracking and Mapping for Small AR Workspaces
(C/C++ code, BSD lic) GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices

Camera Calibration & constraint

(C/C++ code, BSD lic) Chessboard calibration, calibration with rig or pattern
Geometric camera constraint – Minimal Problems in Computer Vision
Minimal problems in computer vision arise when computing geometrical models from image data. They often lead to solving systems of algebraic equations.
Camera Calibration Toolbox for Matlab
(Matlab toolbox) Camera Calibration Toolbox for Matlab by Jean-Yves Bouguet (C implementation in OpenCV)

Multi-View Reconstruction

Bundle Adjustment – SBA
(C/C++ code, GPL lic) A Generic Sparse Bundle Adjustment Package Based on the Levenberg-Marquardt Algorithm
Bundle Adjustment – SSBA
(C/C++ code, LGPL lic) Simple Sparse Bundle Adjustment (SSBA)

Stereo

Efficiently solving multi-label MRFs (Readme)
(C/C++ code) Segmentation, object category labelling, stereo
LIBELAS: Library for Efficient LArge-scale Stereo Matching
(C/C++ code) Disparity maps, stereo

Structure from motion

(C/C++ code, GPL lic) A structure-from-motion system for unordered image collections
Patch-based Multi-view Stereo Software (Windows version)
(C/C++ code, GPL lic) A multi-view stereo software that takes a set of images and camera parameters, then reconstructs 3D structure of an object or a scene visible in the images
libmv – work in progress
(C/C++ code, MIT lic) A structure from motion library
Multicore Bundle Adjustment
(C/C++/GPU code, GPL3 lic) Design and implementation of new inexact Newton-type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large-scale 3D scene reconstruction problems.
(C/C++/GPU code, MPL2 lic) OpenMVG (“open Multiple View Geometry”) is a library for computer-vision scientists, targeted especially at the Multiple View Geometry community. It is designed to provide easy access to the classical problem solvers in Multiple View Geometry and to solve them accurately.

Visual odometry

LIBVISO2: Library for VISual Odometry 2
(C/C++ code, Matlab, GPL lic) Libviso 2 is a very fast cross-platform (Linux, Windows) C++ library with MATLAB wrappers for computing the 6 DOF motion of a moving mono/stereo camera.

Posted in Apps Development, C, Computer Hardware, Computer Network & Security, CUDA, Game Development, GPU (CUDA), GPU Accelareted, Graphics Cards, Image Processing, OpenCV, PARALLEL, Simulation, Virtualization
