Something More for Research

Explorer of Research in Information Technology


How to add Twitter and Facebook buttons to your Moodle site

Posted by Hemprasad Y. Badgujar on November 18, 2014

Adding Twitter and Facebook like buttons to a website is always a good idea if you’d like to spread the word about your site through social networking. In this tutorial I will walk you through how to add the commonly seen buttons to your Moodle site.

First things first. Before we do anything in Moodle, let's get the buttons from Twitter's and Facebook's official websites.

Get Twitter “Share a link” button

1) Go to Twitter’s official buttons page: http://twitter.com/about/resources/buttons

2) Select “Share a link” and enter the desired options as shown in the figure below.

screenshot

3) Once you are happy with the button's preview, you can keep the page open for later use.
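For reference, the markup Twitter generates looks roughly like the sketch below. This is the standard widgets.js loader snippet of that era; the data-url and data-text values (and example.com) are placeholders standing in for whatever options you chose on the buttons page.

    <!-- Twitter "Share a link" button; example.com and the text are placeholders -->
    <a href="https://twitter.com/share" class="twitter-share-button"
       data-url="http://example.com" data-text="Check out our Moodle site">Tweet</a>
    <!-- Standard async loader for Twitter's widgets.js -->
    <script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];
    if(!d.getElementById(id)){js=d.createElement(s);js.id=id;
    js.src="//platform.twitter.com/widgets.js";
    fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>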

Get Facebook “like” button

1) Go to the Facebook developers plugin page: http://developers.facebook.com/docs/reference/plugins/like/

2) Configure the like button as shown in the figure below. For the like button to work in Moodle, you must use the IFRAME version rather than the HTML5/XFBML version.

screenshot

3) Once you are happy with the button’s preview, you can click the “Get Code” button. In the popup window you need to choose the IFRAME option as shown in the figure below. Keep the page open for later use.

screenshot
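The IFRAME code you get will look roughly like this sketch. The href value, width, layout and height all depend on the options you chose in the configurator; example.com is a placeholder for your site's URL.

    <!-- Facebook like button, IFRAME version; example.com is a placeholder -->
    <iframe src="//www.facebook.com/plugins/like.php?href=http%3A%2F%2Fexample.com&amp;width=150&amp;layout=button_count&amp;action=like&amp;show_faces=false&amp;share=false&amp;height=21"
            scrolling="no" frameborder="0"
            style="border:none; overflow:hidden; width:150px; height:21px;"
            allowTransparency="true"></iframe>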

OK, now that the buttons are ready for use, we can dive into Moodle to add them.

Step 1

In a new window, log in to Moodle as an administrator. Select the "HTML" option from the "Add a block" drop-down menu.

screenshot

Step 2

Now you should see that a new HTML block has been added. Click the configuration icon, which is the second icon, as shown in the figure below. (Your Moodle site's configuration icon may look different.)

screenshot

Step 3

On the “Configuring a (new HTML block) block” page, turn on the HTML Source Editor for the “Content” text field by clicking the HTML icon in the editor menu as shown below.

screenshot

Step 4

Copy and paste the relevant buttons’ code from the previous Twitter and Facebook pages into the HTML Source Editor and click the “Update” button.

screenshot
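The finished block body is simply the two snippets one after the other. A compact sketch, using the same placeholders as above (here widgets.js is included directly via a script tag, a simplified variant of the async loader):

    <!-- HTML block body: Twitter button first, then the Facebook IFRAME -->
    <a href="https://twitter.com/share" class="twitter-share-button" data-url="http://example.com">Tweet</a>
    <script src="//platform.twitter.com/widgets.js"></script>
    <iframe src="//www.facebook.com/plugins/like.php?href=http%3A%2F%2Fexample.com&amp;width=150&amp;layout=button_count&amp;height=21"
            scrolling="no" frameborder="0"
            style="border:none; overflow:hidden; width:150px; height:21px;"></iframe>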

Step 5

Enter a block title and configure other options before saving.

Step 6

You need to turn off editing to see the changes.

screenshot

Conclusion:

Using a Moodle HTML block to add Twitter and Facebook buttons is only one way of doing it. If your Moodle theme offers extra block/widget areas that accept HTML code, you can take advantage of those as well.

For example, in our premium Moodle Theme “Ace”, you can go to the theme settings page and add the Twitter and Facebook buttons to the page’s header as shown below.

screenshot

Posted in Computer Software, Installation

Pedestrian Datasets for Vision-Based Detection and Tracking

Posted by Hemprasad Y. Badgujar on November 8, 2014

Keywords: person / human / pedestrian identification / verification / detection / tracking

DID | Name | Description | Tags | URL

254 ChokePoint Dataset We collected a video dataset, termed ChokePoint, designed for experiments in person identification/verification under real-world surveillance conditions link
253 Street View House Number (SVHN) SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatti… streetview number link
252 Volleyball Activity Dataset 2014 This dataset contains 7 challenging volleyball activity classes annotated in 6 videos from professionals in the Austrian Volley League (season 2011/12). A total… action activity sport volleyball detection recognition video analysis link
251 ETHZ CVL RueMonge 2014 The ETHZ CVL RueMonge 2014 dataset is used for 3D reconstruction and semantic mesh labelling for urban scene understanding. It was first published in [1] and p… semantic segmentation 3d reconstruction architecture paris benchmark source code urban recognition classification outdoor pointcloud mesh link
250 Curve Text (CUTE80) Curve text dataset, namely CUTE80 that consists of 80 curve text line images with complex background, perspective distortion effect and poor resolution effect (… text, detection, recognition, classification link
249 Image Sequence Analysis Test Site (EISATS) The .enpeda.. Image Sequence Analysis Test Site (EISATS) offers sets of long bi- or trinocular image sequences recorded in the context of vision-based driver as… stereo vision optical flow motion analysis semantic segmentation link
248 VIDEO datasets overview Many different labeled video datasets have been collected over the past few years, but it is hard to compare them at a glance. So we have created a handy spread… video benchmark recognition classification detection object action link
247 PASCAL VOC Parts The PASCAL VOC is augmented with segmentation annotation for semantic parts of objects. For example, for the person category, we provide segmentation mask for 2… detection recognition pascal object part pedestrian human segmentation semantic link
246 Bristol Egocentric Object Interactions Dataset The BEOID dataset includes object interactions ranging from preparing a coffee to operating a weight lifting machine and opening a door. The dataset is recorded… video interaction object egocentric pose 3d tracking link
245 Video SumMe We provide a new dataset (SumMe) consisting of 25 videos, each annotated with at least 15 human summaries (390 in total). The data consists of videos, annotatio… video summary benchmark human link
244 Pedestrian Parsing on Surveillance Scenes (PPSS) dataset The Pedestrian Parsing dataset contains 3,673 images from 171 videos of different Surveillance Scenes (PPSS), where 2,064 images are occluded and 1,609 are not…. Pedestrian, Parsing, Segmentation link
243 PEdesTrian Attribute (PETA) dataset A new large-scale PEdesTrian Attribute (PETA) dataset. The dataset is by far the largest of its kind, covering more than 60 attributes on 19000 images. In comp… Pedestrian, crowd link
242 Stanford Dogs Dataset The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for… classification, detection, fine-grained categorization, dogs link
241 Malaya Abrupt Motion (MAMo) Dataset The Malaya Abrupt Motion (MAMo) dataset is targeted for visual tracking, particularly for abrupt motion tracking. It was collected from publicly accessible data… visual tracking, abrupt motion tracking link
240 Microsoft COCO – Common Objects in Context Microsoft COCO is a new image recognition and segmentation dataset that will be released in Summer 2014. Microsoft COCO has several features: More than 70 ca… object context segmentation detection recognition benchmark link
239 CUHK crowd dataset CUHK crowd dataset introduces the largest publicly available crowd dataset of 474 videos from 215 crowded scenes. It has been used in the paper: “Scene-Independ… crowd analysis, group detection and analysis, scene understanding link
238 Multiple Foreground Video Co-segmentation The multiple foreground video co-segmentation dataset, consisting of four sets, each with a video pair and two foreground objects in common. The dataset includ… video co-segmentation link
237 MOViCS video co-segmentation dataset The video co-segmentation dataset contains 4 video sets with a total of 11 videos, where 5 frames of each video are labeled with pixel-level ground-truth…. video co-segmentation dataset link
236 iCoseg dataset iCoseg dataset introduces the largest publicly available co-segmentation dataset of 38 groups (643 images), along with pixel ground-truth hand annotations…. image co-segmentation link
235 Kindergarten Video Surveillance The dataset consists of about 50 hours of kindergarten surveillance video, totaling approximately 100 video sequences (1000GB, 50 hours)… Human action behavior segmentation video background, video surveillance. link
234 UMD Dynamic Scene Recognition Dataset Dataset to classify dynamic scenes. The dataset has been described and used in the paper: “Moving Vistas: Exploiting Motion for Dynamic Scenes”, Shroff, N… Scene link
233 PASCAL Context We would like to announce the release of PASCAL-Context dataset. We augmented PASCAL VOC 2010 dataset with annotations for 400+ additional categories. In the cu… semantic segmentation pascal benchmark category recognition dense shape link
232 Pratheepan Human Skin Detection Dataset The images in this dataset are downloaded randomly from Google for human skin detection research. It has been used in the paper: W.R. Tan, C.S. Chan, Y. Prathee… skin detection, skin segmentation, human detection, skin dataset link
231 RGB-D human detection and tracking dataset The dataset was captured in a real-world clothing store using Kinect (640*480, about 30fps). Publications: [1] J. Liu, Y. Liu, Y. Cui, and Y. Q. Chen, Real-ti… Human detection, people tracking, scene understanding, RGB-D, Kinect, depth, pose link
230 FGVC-Aircraft Fine-Grained Visual Classification of Aircraft (FGVC-Aircraft) is a benchmark dataset for the fine grained visual categorization of aircraft. Data, annotatio… fine-grain classification recognition benchmark evaluation aircraft link
229 Paris Rue Madame Paris-rue-Madame dataset contains 3D Mobile Laser Scanning (MLS) data from rue Madame, a street in the 6th Parisian district (France). The test zone contains ap… semantic segmentation pointcloud 3d laser classification link
228 MPI VehicleScenes Abstract Scene understanding has (again) become a focus of computer vision research, leveraging advances in detection, context modeling, and tracking. In thi… semantic segmentation scene understanding classification 3d car pedestrian link
227 Omnidirectional and panoramic image dataset We share our omnidirectional and panoramic image dataset (with annotations) to be used for human and car detection. Please reach through: http://cvrg.iyte.edu…. panorama detection car omnidirection human recognition link
226 Fish4Knowledge The Fish4Knowledge project (groups.inf.ed.ac.uk/f4k/) is pleased to announce the availability of 2 subsets of our tropical coral reef fish video and extracted… classification animal fish video motion nature recognition water camera link
225 California-ND An Annotated Dataset For Near-Duplicate Detection In Personal Photo Collections Managing photo collections involves a variety of image quality assessment tas… retrieval duplicate copyright groundtruth detection link
224 Extreme View Dataset 15 wide baseline stereo image pairs with large viewpoint change, provided ground truth homographies. Image size (~1000×700 pixels, RGB) D. Mishkin and M. … feature detection description matching viewpoint link
223 SHOT 3D shape description The 3D shape description dataset consists of multiple sub-datasets Descriptor Matching – Dataset 1 & 2 (Stanford) These datasets, created from some of the m… 3d shape descriptor benchmark reconstruction registration matching link
222 Ford Car Dataset The Ford Car dataset is joint effort of Pandey et al. (for collecting images, Lidar points, calibration etc.) and us (for annotation of 2D and 3D objects). … car detection lidar 3d groundtruth sfm link
221 EPFL Multi-View Cars The EPFL Multi-View Car dataset contains 20 sequences of cars as they rotate by 360 degrees. There is one image approximately every 3-4 degrees. Using the time o… pose multiview car detection estimation rotation link
220 3D Mask Attack Dataset The 3D Mask Attack Database (3DMAD) is a biometric (face) spoofing database. It currently contains 76500 frames of 17 persons, recorded using Kinect for both re… 3d biometry face recognition segmentation front-view emotion link
219 JPL First-Person Interaction JPL First-Person Interaction dataset (JPL-Interaction dataset) is composed of human activity videos taken from a first-person viewpoint. The dataset particularl… video action recognition interactive motion human link
218 VidPairs The VidPairs dataset contains 133 pairs of images, taken from 1080p HD (~2 megapixel) official movie trailers. Each pair consists of images of the same scene wi… video pair matching patch descriptor flow dense optical link
217 Youtube-Objects dataset The YouTube-Objects dataset is composed of videos collected from YouTube by querying for the names of 10 object classes. It contains between 9 and 24 videos for… video object detection segmentation flow optical link
216 CVC Partial Occlusion Virtual Pedestrian The CVC Partial Occlusion Virtual Pedestrian datasets (CVC-01 to CVC-06) cover a range of scenarios of occluded pedestrians generated in a virtual and real envi… detection classification tracking pedestrian synthetic urban occlusion link
215 WILD – Weather and Illumination Database The Weather and Illumination Database (WILD) is an extensive database of high quality images of an outdoor urban scene, acquired every hour over all seasons. It… webcam light illumination camera video static change urban time depth estimation weather newyork link
214 The Webcam Clip Art Dataset This is a subset of the dataset introduced in the SIGGRAPH Asia 2009 paper, Webcam Clip Art: Appearance and Illuminant Transfer from Time-lapse Sequences. As… webcam light illumination camera video static change urban nature time link
213 ChairGest Gestures ChairGest is an open challenge / benchmark. The task consists in spotting and recognizing gestures from multiple synchronized sensors: 1 Kinect and 4 Xsens Ine… benchmark recognition kinect gesture detection human link
212 polo instance segmentation The polo instance segmentation dataset is a semantic segmentation task for hough transform based segmentation masks. … n/a
211 POSTECH Labeled Faces in the Wild POSTECH Labeled Faces in the Wild, a collection of faces proposed for studying face identification in unconstrained environments; its purpose is to serve as a… face identification wild recognition registration link
210 Traffic Video dataset The Traffic Video dataset consists of X videos from an overhead camera showing a street crossing with multiple traffic scenarios. The dataset can be downloaded … urban traffic tracking detection overhead view road video link
209 Symmetry Set The Symmetry set dataset is a collection of images at different illuminations for the purpose of image matching using local symmetry features. Image Matching… symmetry matching feature image illumination lighting urban building link
208 Landmark 1000 The Landmark 1000 or 1k dataset is a collection of the top 1000 popular flickr landmarks mined from flickr. It is maintained by Noah Snavely and published in… landmark 3d reconstruction pose estimation pointcloud world location link
207 CASIA Gait Recognition Dataset Dataset A (formerly the NLPR Gait Database) was created on Dec. 10, 2001, including 20 persons. Each person has 12 image sequences, 4 sequences for each of the three … gait recognition biometry action classification motion human link
206 GaTech VideoContext The GaTech VideoContext dataset consists of over 100 groundtruth annotated outdoor videos with over 20000 frames for the task of geometric context evaluation i… video geometry context classification semantic segmentation unsupervised supervised outdoor urban nature link
205 GaTech VideoStab The GaTech VideoStab dataset consists of N videos for the task of video stabilization. This code is implemented in Youtube video editor for stabilization. … video stabilization camera path link
204 UCF Person and Car VideoSeg The UCF Person and Car VideoSeg dataset consists of six videos with groundtruth for video object segmentation. Surfing, jumping, skiing, sliding, big car, sm… video segmentation object motion model camera groundtruth link
203 GaTech VideoSeg The GaTech VideoSeg dataset consists of two (waterski and yunakim?) video sequences for object segmentation. There exists no groundtruth segmentation annotat… video segmentation object motion model camera link
202 GaTech SegTrack The SegTrack dataset consists of six videos (five are used) with ground truth pixelwise segmentation (6th penguin is not usable). The dataset is used for accura… video segmentation object proposal flow optical motion model camera stationary groundtruth link
201 50 Salads The dataset captures 25 people preparing 2 mixed salads each and contains over 4h of annotated accelerometer and RGB-D video data. Annotated activities correspo… action activity recognition classification detection tracking video link
200 Landmark 3D This dataset provides a collection of web images and 3D models for research on landmark recognition (especially for methods based on 3D models). We hope it coul… landmark recognition classification retrieval 3d reconstruction codebook matching feature flickr link
199 THUR15000 We introduce a labeled dataset of categorized images for evaluating sketch based image retrieval. Using Flickr, we downloaded about 3000 images for each of the … group saliency object detection visual attention sketch shape retrieval internet link
198 THUS10000 The THUS10000 benchmark dataset comprises 10,000 images, each of which has an unambiguous salient object and the object region is accurately annotated with p… segmentation saliency object detection visual attention link
197 Stanford Background Dataset The Stanford Background Dataset is a new dataset introduced in Gould et al. (ICCV 2009) for evaluating methods for geometric and semantic scene understanding. T… semantic segmentation urban classification nature geometry link
196 New College Data The New College Data Set contains 30GB of data intended for use by the mobile robotics and vision research communities. Our anticipated users are parties intere… odometry urban path 3d reconstruction panorama stereo navigation link
195 Yotta The Yotta dataset consists of 70 images for semantic labeling given in 11 classes. It also contains multiple videos and camera matrices for 14km or driving. … semantic segmentation urban video camera 3d reconstruction classification link
194 HCI 4D Lightfields The HCI 4D Lightfields dataset contains 11 objects with corresponding lightfields for depth estimation. Datasets can be downloaded individually below. For ma… 3d 4d lightfield benchmark depth reconstruction evaluation link
193 City planar and non-planar The city planar and non-planar dataset consists of urban scenes accompanied by text files describing the plane/non-plane locations. Training Set (University)… plane detection 3d urban building estimation link
192 Our Database of Faces The Database of Faces (ORL) contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different tim… face recognition illumination human expression link
191 Daimler Mono Pedestrian Classification Benchmark The Daimler Mono Pedestrian Classification Benchmark dataset consists of two parts: a base data set. The base data set contains a total of 4000 pedestrian- a… pedestrian classification outdoor urban object scale illumination link
190 Daimler Mono Pedestrian Detection Benchmark The Daimler Mono Pedestrian Detection Benchmark dataset contains a large training and test set. The training set contains 15,560 pedestrian samples (image cut-o… pedestrian detection outdoor urban mono scale object link
189 Farman Institute 3D Point Sets The Farman Institute 3D Point Sets dataset contains 11 objects captured by a 3D laser scanner. This dataset was peer-reviewed by Image Processing On Line: Farman Institu… 3d laser scanner object reconstruction model point link
188 KTH Multiview Football The KTH Multiview Football dataset contains 771 images of football players, taken from 3 views at 257 time instances, with 14 annotated body joints … multiview pedestrian tracking detection object camera outdoor game soccer pose recognition link
187 Aspect Layout dataset The Aspect Layout dataset is designed to allow evaluation of object detection for aspect ratios in perspective images. Author text: In this project we see… detection object aspect ratio perspective layout link
186 Symmetry Facades The Symmetry Facades dataset contains 9 building facades with multiple images. It used for coupled symmetry and structure from motion detection. Coupled Str… symmetry facade building urban reconstruction sfm 3d repetition link
185 Kung-Fu fighter Multi-View The test sequences provide interested researchers a real-world multi-view test data set captured in the blue-c portals. The data is meant to be used for testing… multiview tracking segmentation camera action link
184 MSR Learning to Rank The MSR Learning to Rank are two large scale datasets for research on learning to rank: MSLR-WEB30k with more than 30,000 queries and a random sampling of it MS… rank learning sampling search link
183 MSR RGB-D 7-Scenes The MSR RGB-D Dataset 7-Scenes dataset is a collection of tracked RGB-D camera frames. The dataset may be used for evaluation of methods for different applicati… depth video kinect tracking location reconstruction link
182 MSR Action The MSR Action datasets are a collection of various 3D datasets for action recognition. See details http://research.microsoft.com/en-us/um/people/zliu/action… video action recognition detection reconstruction 3d link
181 All I Have Seen (AIHS) The All I Have Seen (AIHS) dataset is created to study the properties of total visual input in humans, for around two weeks Nebojsa Jojic wore a camera capturin… video summary user study clustering similarity outdoor indoor scene 3d link
180 Airport MotionSeg The Airport MotionSeg dataset contains 12 sequences of videos of an airport scenario with small and large moving objects and various speeds. It is challenging b… motion segmentation airport video clustering camera zoom link
179 CMP Facades The CMP Facade dataset consists of facade images assembled at the Center for Machine Perception, which includes 600 rectified images of facades from various sou… facade rectification urban semantic classification recognition structure similarity link
178 VSUMM (Video SUMMarization) The VSUMM (Video SUMMarization) dataset is of 50 videos from Open Video. All videos are in MPEG-1 format (30 fps, 352 x 240 pixels), in color and with sound. Th… video summary type user study keyframe static similarity link
177 SIPI textures The Textures volume currently contains 154 images, all monochrome, 129 512×512 and 25 1024×1024. For the Brodatz texture images, the number in parenthesis (i… texture, segmentation, classification, benchmark, synthetic, evaluation link
176 Brodatz texture The Brodatz dataset consists of 112 textures in grayscale images of various texture types. http://www.ee.oulu.fi/research/imag/texture/image_data/Brodatz32.h… texture, segmentation, classification, benchmark, synthetic link
175 Outex texture bench The Outex dataset is part of a framework for empirical evaluation of texture classification and segmentation algorithms. The framework is being constructed acc… texture, segmentation, classification, benchmark, synthetic link
174 Pittsburgh Fast-food Image dataset The Pittsburgh Fast-food Image dataset (PFID) consists of 4545 still images, 606 stereo pairs, 303 360° videos for structure from motion, and 27 privacy-preservi… food recognition classification reconstruction video laboratory real link
173 MuHAVi and MAS human action The Multicamera Human Action Video Data (MuHAVi) Manually Annotated Silhouette Data (MAS) are two datasets consisting of selected action sequences for the eval… human action behavior segmentation video background link
172 DynTex dataset The DynTex dataset consists of a comprehensive set of Dynamic Textures. Dynamic, or temporal, texture is a spatially repetitive, time-varying visual pattern tha… texture, segmentation, dynamic, synthetic, video repetition link
171 CHALEARN Multi-modal Gesture Challenge The CHALEARN Multi-modal Gesture Challenge is a dataset of 700+ sequences for gesture recognition using images, Kinect depth, segmentation and skeleton data. ht… gesture, kinect, recognition, human, action, illumination, depth, segmentation, skeleton link
170 Sheffield Kinect Gesture (SKIG) dataset The Sheffield Kinect Gesture (SKIG) dataset contains 2160 hand gesture sequences (1080 RGB sequences and 1080 depth sequences) collected from 6 subjects. … gesture, kinect, recognition, human, action, illumination, depth link
169 QMUL Junction Dataset The QMUL Junction dataset is a busy traffic scenario for research on activity analysis and behavior understanding. Video length: 1 hour (90000 frames) Fra… detection tracking crowd counting pedestrian video motion behavior link
168 Mall Dataset The Mall dataset was collected from a publicly accessible webcam for crowd counting and profiling research. Ground truth: Over 60,000 pedestrians were label… detection tracking crowd counting pedestrian indoor video webcam link
167 Text and Vision (TVGraz) Dataset The Text and Vision (TVGraz) dataset is an annotated multi-modal dataset which currently contains 10 visual object categories, 4030 images and associated text. … text appearance classification evaluation link
166 ICG Multi-Camera Datasets The ICG Multi-Camera datasets consist of: an Easy Data Set (just one person), a Medium Data Set (3-5 persons, used for the experiments), and a Hard Data Set (crowded sc… multiview pedestrian tracking detection object camera calibration graz indoor video link
165 ICG Multi-Camera and Virtual PTZ The ICG Multi-Camera and Virtual PTZ dataset contains the video streams and calibrations of several static Axis P1347 cameras and one panoramic video from a sph… multiview pedestrian tracking detection object camera calibration graz network video panorama crowd outdoor link
164 ICG Lab 6 (Multi-Camera Multi-Object Tracking) The ICG Lab 6 (Multi-Camera Multi-Object Tracking) dataset contains 6 indoor people tracking scenarios recorded at our laboratory using 4 static Axis P1347 came… multiview pedestrian tracking detection object laboratory camera calibration evaluation segmentation graz link
163 ICG Longterm Pedestrian Dataset The Longterm Pedestrian dataset consists of images from a stationary camera running 24 hours for 7 days at about 1fps. It is used for adaptive detection and backg… pedestrian change detection background illumination robust indoor coffee graz link
162 ICG PRID 2011 The Person Re-ID (PRID) 2011 dataset was created in co-operation with the Austrian Institute of Technology for the purpose of testing person re-identification a… pedestrian classification identification multiview trajectory illumination appearance change graz link
161 ICG Annotated Facial Landmarks in the Wild (AFLW) The Annotated Facial Landmarks in the Wild (AFLW) consists of a large-scale collection of annotated face images gathered from the web, exhibiting a large variet… face detection landmark pose age annotation link
160 Caltech Lanes Dataset The Caltech Lanes dataset includes four clips taken around streets in Pasadena, CA at different times of day. The archive below includes 1225 individual frame… urban road lane detection caltech pasadena link
159 Caltech Game Covers Dataset The Caltech Game Covers dataset consists of CD/DVD covers of video games. The set was downloaded from freecovers.net during the summer of 2008. The set includes… classification retrieval game cover caltech hierarchy taxonomy link
158 Caltech Buildings Dataset The Caltech Buildings dataset consists of images taken for 50 buildings around the Caltech campus. Five different images were taken for each building from diffe… building urban retrieval hierarchy taxonomy caltech link
157 Background Models Challenge (BMC) Background Models Challenge (BMC) is a dataset and competition for the comparison of background subtraction algorithms. The main topics are detection of movi… background model change motion segmentation surveillance video synthetic link
156 KUL Belgium Traffic Signs BelgiumTS is a large dataset with 10000+ traffic sign annotations, thousands of physically distinct traffic signs. 4 video sequences recorded with 8 high resolu… traffic sign classification urban road belgium camera calibration link
155 KUL Belgium Traffic Sign Classification BelgiumTSC dataset is built for traffic sign classification purposes. It is a subset of the BelgiumTS dataset and contains cropped images around annotations for 62 … traffic sign classification urban road belgium link
154 WordNet WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a di… language, hierarchy, imagenet, classification link
153 MSRC Kinect Gesture Dataset The Microsoft Research Cambridge-12 Kinect gesture dataset consists of sequences of human movements, represented as body-part locations, and the associated gest… gesture, kinect, recognition, human, action link
152 Colosseum and San Marco The Colosseum and San Marco are two image datasets for dense multiview stereo reconstructions used for evaluating the visual photo realism. The datasets are … 3d, reconstruction, landmark, urban, sfm, aerial, streetside, flickr link
151 People in WBCN This dataset is for people tracking in wide baseline camera networks and was designed as a contest at ICPR 2012. The contest consists of two challenges: … detection, tracking, pedestrian, trajectory, crowd, overlap, occlusion, aerial link
150 SDHA Contest The Semantic Description of Human Activities (SDHA) was a contest at ICPR 2010. The contest is composed of three different types of activity recognition cha… detection, tracking, pedestrian, trajectory, crowd, overlap, occlusion, aerial link
149 NYU Depth v2 The NYU-Depth V2 data set is comprised of video sequences from a variety of indoor scenes as recorded by both the RGB and Depth cameras from the Microsoft Kinec… semantic segmentation depth kinect label reconstruction link
148 NYU Depth v1 The NYU-Depth data set is comprised of video sequences from a variety of indoor scenes as recorded by both the RGB and Depth cameras from the Microsoft Kinect. … semantic segmentation depth kinect label reconstruction link
147 FlickrLogos-32 The FlickrLogos-32 dataset contains photos showing brand logos and is meant for the evaluation of multi-class logo recognition as well as logo retrieval methods… flickr, logo, detection, retrieval, image, object recognition, machine learning, classification brand bounding box link
146 Multiple Instance Learning dataset MIL data sets used in our 2002 NIPS paper for Elephant, Musk, TREC http://www.cs.cmu.edu/~juny/MILL/MIL-experiments.htm… machine learning, classification link
145 KnapSack KNAPSACK_01 is a dataset directory which contains some examples of data for 01 Knapsack problems. In the 01 Knapsack problem, we are given a knapsack of fixe… machine learning, classification link
144 MNIST hand-written digits The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of … text, classification, letter link
143 KITTI Odometry http://www.cvlibs.net/datasets/kitti/eval_odometry.php Related Datasets TUM RGB-D Dataset: Indoor dataset captured with Microsoft Kinect and high-accuracy… registration, localization, odometry, slam, matching, navigation, urban path 3d reconstruction link
142 German Traffic Sign Recognition Benchmark The German Traffic Sign Recognition Benchmark is a dataset for a multi-class detection problem in natural images; the organizers cordially invite you to participate. The b… detection, traffic, urban, recognition link
141 Berkeley Multimodal Human Action Database (MHAD) The Berkeley Multimodal Human Action Database (MHAD) contains 11 actions performed by 7 male and 5 female subjects in the range 23-30 years of age except for on… action classification multiview motion recognition link
140 RGB-D Person Re-identification The RGB-D Person Re-identification dataset is for person re-identification using depth information. The main motivation is that the standard techniques (such as… identification, classification, shape, depth, pedestrian, 3d link
139 image panorama gdbicp Generalized Dual Bootstrap-ICP Algorithm … registration, panorama, matching link
138 Buffy The Buffy dataset contains images selected from the TV series Buffy the Vampire Slayer. We select a set of 452 images from the first two episodes for training… segmentation, detection, buffy, movie, human link
137 Synthetic CAD models The Synthetic CAD Models dataset consists of X synthetic CAD models for detection (planar) primitives. Efficient RANSAC for Point-Cloud Shape Detection Ruwe… model, ransac, 3d object, reconstruction, primitive, synthetic link
136 3D Object in Clutter Recognition and Segmentation The dataset is composed of 150 synthetic scenes, captured with a (perspective) virtual camera, and each scene contains 3 to 5 objects. The model set is composed… recognition, segmentation, mesh, synthetic link
135 Quad 6K The Quad 6K dataset is a Structure-from-Motion dataset taken at Arts Quad at Cornell University campus and consists of 6514 images with ground truth positions o… reconstruction, sfm, urban, groundtruth, landmark, 3d gps link
134 Image Memorability Image memorability dataset contains target and filler images, precomputed features and annotations, and memorability. It gives features and annotations for t… aesthetics, semantic, quality, memorability link
133 Aesthetic Photo.net The Photo.net dataset contains 20,278 images with properties to aesthetics, emotion, and image quality. The only differences are that it is a larger dataset, an… aesthetics, quality, emotion link
132 Aesthetic Visual Analysis Aesthetic Visual Analysis (AVA) dataset studies the organization of content by aesthetic preference. It contains over 250,000 images along with a rich variety o… aesthetics, semantic, quality, memorability link
131 Dubrovnik6K and Rome16K The Dubrovnik6K and Rome16K datasets are image collections for SfM reconstruction, where the suffix refers to the number of images in the dataset. Dubrovnik6… reconstruction, sfm, urban, landmark, dubrovnik, rome link
129 NBVbench The NBVbench is a reference object and benchmark criteria for defining and evaluating the performance of a next best view (NBV) method. … reconstruction, view, planning, geometry link
128 The Stanford 3D Scanning Repository The Stanford 3D Scanning Repository dataset is a compilation of 3D scans of objects like Stanford Bunny, Happy Buddha, Dragon, Armadillo and Lucy. These contain… reconstruction, laser, bunny, triangulation link
127 Stable Structure from Motion Due to size limitations, the Stable Structure from Motion datasets do not include the images online. Instead, here are the tracked image points and the final reconstr… sfm, reconstruction, geometry, stability, robust, 3d, landmark, church link
126 ISPRS Urban Classification ISPRS Test Project on Urban Classification and 3D Building Reconstruction http://www.itc.nl/ISPRS_WGIII4/tests_datasets.html … 3d, reconstruction, building, urban, city link
125 Google Street View Pittsburgh Research The Google Street View Pittsburgh Research dataset is a street-level image collection provided by Google for research purposes. The dataset provided here co… 3d, reconstruction, sfm, urban, pittsburgh, panorama n/a
124 CMU Geometric Context The CMU Geometric Context dataset by Derek Hoiem, Alexei A. Efros, Martial Hebert consists of 300 images used for training and testing the geometric context met… reconstruction, single view, depth, context, geometry link
123 CMU/VMR Urban Image+Laser CMU/VMR Urban Image+Laser dataset contains 372 images linked with 3D laser points projections. There are additional images (due to the laser scanner being turne… reconstruction, sfm, urban, semantic, segmentation, laser link
122 Symmetric Bundle Adjustment The Symmetric Bundle Adjustment dataset contains four sequences of the CAB building, Barcelona, Redmond and Capitole for 3D reconstruction considering symmetrie… reconstruction, sfm, urban, bundle adjustment, symmetry link
121 Oakland 3D This repository contains labeled 3-D point cloud laser data collected from a moving platform in an urban environment. Data are provided for research purposes. … reconstruction, sfm, urban, semantic, segmentation, laser link
120 Samantha The SAMANTHA (Structure-and-Motion Pipeline on a Hierarchical Cluster Tree) dataset contains 4 sequences for 3D reconstruction: Pozzoveggiani, Piazza Dante, Pia… reconstruction, sfm, landmark, model, geometry link
119 AdelaideRMF AdelaideRMF: Robust Model Fitting Data Set AdelaideRMF is a data set for robust geometric model fitting (homography estimation and fundamental matrix estimat… feature, matching, geometry, model link
118 Hermann Maier Nagano 1998 The Hermann Maier Nagano 1998 dataset is an extremely difficult deformable tracking scenario. There are 4 parts to this sequence: a) 00088 – 00554 Maie… tracking, single, trajectory, clutter, deformation n/a
117 YorkUrbanDB The York Urban Line Segment Database is a compilation of 102 images (45 indoor, 57 outdoor) of urban environments consisting mostly of scenes from the campus of… vanishing, point, pose, urban, reconstruction, outdoor, geometry, manhattan link
116 Sheffield Building Sheffield Building Image Dataset consists of over 3,000 low-resolution images of forty different buildings – typically between 70 and 120 images per building. T… retrieval, classification, urban, sheffield link
115 Pankrac Marseille Our repetitive pattern dataset with 106 images of approx. 30 buildings from Pankrac, Prague and Marseille appearing in more than one image, number of appearances r… classification, retrieval, symmetry, repetition, urban link
114 TUD Shapes 1+2 This material is supplementary to Michael Stark, Bernt Schiele. How Good are Local Features for Classes of Geometric Objects. Eleventh IEEE International C… shape object classification tool binary link
113 Penn-Fudan Pedestrian Penn-Fudan Pedestrian Detection and Segmentation… pedestrian detection segmentation background motion link
112 SHREC Unlike the previous SHREC contests, the objective of this SHREC 2012 contest is to evaluate the performance of 3D-mesh segmentation techniques instead of evalua… segmentation, mesh, part, 3d link
111 Grabcut To evaluate our method we designed a new ground truth database of 50 images. The following zip-files contain: Data, Segmentation, Labelling – Lasso, Labelling -… segmentation, bounding box, color link
110 EITZ Sketch Quality Humans have used sketching to depict our visual world since prehistoric times. Even today, sketching is possibly the only rendering technique readily available … shape, matching, retrieval, partial, sketch link
109 EITZ Sketch-Based Image Retrieval We introduce a benchmark for evaluating the performance of large scale sketch-based image retrieval systems. The necessary data is acquired in a controlled user… shape, matching, retrieval, partial, sketch link
108 ICG Sketch Retrieval The ICG Sketch Retrieval dataset consists of XXX hand-drawn sketches for five categories. It is used for content-based image retrieval using shape features for … shape, matching, retrieval, partial, sketch n/a
107 BIWI Pedestrians We provide the three datasets used for testing our system for our ICCV 2007 publication, including annotations. Data was recorded using a pair of AVT Marlins mo… detection, tracking, pedestrian, trajectory, crowd, overlap, occlusion link
106 BIWI Walking Pedestrians (EWAP) The BIWI Walking Pedestrians (EWAP) dataset shows walking pedestrians in busy scenarios from a bird's-eye view. Manually annotated. Data used for training in our … detection, tracking, pedestrian, trajectory, crowd, overlap, occlusion, aerial link
105 MSR 3D Video These sequences were used for our video interpolation work described in High-quality video view interpolation using a layered representation, C.L. Zitnick, … reconstruction, camera, segmentation, depth link
104 Make3D Depth The Make3D Depth dataset is designed to learn features to estimate scene depth from a single image. This dataset contains aligned image and range data: Make3… depth, learning, single view, outdoor, indoor link
103 COIL-100 The COIL-100 (Columbia University Image Library) consists of 100 objects. For formal documentation look at the corresponding compressed technical report, [gzipp… classification, retrieval link
102 Tiny Images The Tiny Images dataset consists of 79,302,017 images, each being a 32×32 color image. This data is stored in the form of large binary files which can be accese… classification, tiny, color, retrieval link
101 CIFAR-10 / 100 The CIFAR-10 dataset consists of 60000 32×32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. … classification, tiny, color, patch, scene, object link
100 Sowerby The Sowerby dataset contains 105 images for semantic segmentation…. semantic, segmentation, outdoor n/a
99 BSDS500 This new dataset is an extension of the BSDS300, where the original 300 images are used for training / validation and 200 fresh images, together with human anno… segmentation, edge, contour, detection link
98 BSDS300 The goal of this work is to provide an empirical basis for research on image segmentation and boundary detection. To this end, we have collected 12,000 hand-la… segmentation, edge, contour, detection link
97 KU Leuven Facade The KU Leuven Facade dataset is used for architectural styles classification. M. Mathias, A. Martinovic, J. Weissenberg, S. Haegler, L. Van Gool: Automatic A… classification, architecture, urban link
96 USPS Handwritten Digits USPS: 10 classes, 7291 training examples, 2007 test examples, 256 features; 8-bit grayscale images of "0" through "9"; handwritten digits; … text, recognition, classification, handwritten link
95 Stroke Width Transform Text Stroke Width Transform Text dataset is by Boris Epshtein and consists of 307 images and XXX text instances. Detecting Text in Natural Scenes with Stroke Wid… text, detection, recognition, classification link
94 Chars74K The Chars74K dataset consists of 64 classes (0-9, A-Z, a-z), 7705 characters obtained from natural images, 3410 hand drawn characters using a tablet PC, 62992 s… text, detection, recognition, classification link
93 Street View Text The Street View Text (SVT) dataset contains 647 words and 3796 letters in 249 images harvested from Google Street View. The dataset is more challenging becaus… text, detection, recognition, classification, outdoor, urban link
92 ICDAR 2011 This challenge is set up around three tasks: Text Localisation, Text Segmentation and Word Recognition. Participation in any or all tasks is welcome. Check the … text, detection, recognition, classification link
91 ICDAR 2003 The ICDAR 2003 datasets available for download on this site: Robust Reading , Robust Word Recognition , Robust OCR , Text Locating and Cursive Script . Pleas… text, detection, recognition, classification link
90 eTrims The eTrims dataset is comprised of two datasets, the 4-Class eTRIMS Dataset with 4 annotated object classes and the 8-Class eTRIMS Dataset with 8 annotated obje… semantic, segmentation, urban, reconstruction link
89 Corel Photo Gallery This image database is a part of the “Corel Gallery Magic” (commercial product). It contains 80000 images divided into 800 categories of 100 images. These image… semantic, segmentation, outdoor n/a
88 Change Detection The dataset folder contains 7 folders (one for each category). Each category folder contains 4 to 6 folders (one for each video). Each video folder contains: … change, detection, background link
87 Simpsons 40 years Simpsons Homer 40 years is a dataset showing Homer Simpson over the course of 40 years. It is used for video segmentation and shape matching between frames…. video, segmentation, shape, matching n/a
86 ICG Graz240 The ICG Graz240 dataset consists of 240 buildings with 5400 redundant images with a total of 5542 window instances. Window detection itself is difficult due to … segmentation, detection, semantic, urban, graz n/a
85 Leaves The Leaves dataset from X contains X images of leaves. Leaves dataset taken by Markus Weber. California Institute of Technology PhD student under Pietro Per… shape, binary, matching, retrieval, partial n/a
84 Aachen Retrieval The Aachen dataset consists of 4479 images taken with multiple cameras (3GB), 369 query images taken with the camera of a mobile phone together with their SIFT … retrieval, aachen, landmark, sfm, reconstruction link
83 Ikonos Aerial Since its launch in September 1999, Space Imaging's IKONOS earth-imaging satellite has provided a reliable stream of image data that has become the standard for c… reconstruction, sfm, urban, aerial link
82 Zurich City Hall Zurich City Hall dataset (also CIPA dataset) Information: Place: City Hall, Zurich, Switzerland Number of Images: 15, 1280 x 1000 pixels Camera: Fuji DS 30… reconstruction, sfm, urban, zurich link
81 Zurich Hoengg Zurich Hoengg (Switzerland) is an aerial dataset. The dataset consists of 4 aerial images in colour (Figures 2-5), scanned with 14 microns, the format is Ti… aerial, semantic, segmentation, outdoor link
80 Hopkins 155 The Hopkins 155 Dataset has been created with the goal of providing an extensive benchmark for testing feature based motion segmentation algorithms. It contains… flow, stereo, motion, segmentation, urban link
79 LabelMe The goal of LabelMe is to provide an online annotation tool to build image databases for computer vision research. You can contribute to the database by visitin… segmentation, semantic, outdoor, detection, urban, software link
78 Caltech Pedestrian The Caltech Pedestrian Dataset consists of approximately 10 hours of 640×480 30Hz video taken from a vehicle driving through regular traffic in an urban environ… pedestrian, detection, urban link
77 Daimler Stereo Pedestrian Daimler Stereo Pedestrian Detection Benchmark C. Keller, M. Enzweiler, and D. M. Gavrila, A New Benchmark for Stereo-based Pedestrian Detection, Proc. of th… pedestrian, detection, urban link
76 Daimler Pedestrian Classification Daimler Multi-Cue, Occluded Pedestrian Classification Benchmark Training and test samples have a resolution of 48 x 96 pixels with a 12-pixel border around t… detection, classification, pedestrian, urban link
75 ETHZ Shape The ETHZ Shape classes dataset from Vittorio Ferrari [?] consists of five object classes and a total of 255 images. All classes contain significant intra-class … shape, detection, matching, segmentation, clutter, applelogo, bottle, giraffe, nature, swan, mug link
74 PMVS 3D Photography The following are multiview stereo data sets captured in our lab: a set of images, camera parameters and extracted apparent contours of a single rigid object. E… sfm, reconstruction, depth, dense, mesh link
73 Strecha Dense MVS An evaluation benchmark for dense MVS for these datasets fountain-P11, Herz-Jesu-P8, entry-P10, castle-P19, Herz-Jesu-P25, castle-P30 . Images (corrected for… sfm, reconstruction, benchmark, depth, dense, mesh link
72 Acute3D Aiguille du Midi MVS Aiguille du Midi, France. Camera: Mamiya ZD, 55mm; resolution: 5 Mpixels; 53 images; photographer: B. Vallet (Imagine/EVD – 2006) … sfm, reconstruction, mesh, large scale, outdoor link
71 University of Tsukuba Stereo Flow This dataset contains 1800 stereo pairs with ground truth disparity maps, occlusion maps and discontinuity maps that will help to further develop the state of t… flow, depth, stereo, graphics, synthetic, optical, link
70 MPI Sintel Flow The MPI Sintel Flow dataset is an optical flow / stereo dataset based on the Blender movie Sintel: http://www.sintel.org The goal of this project was to crea… flow, depth, stereo, graphics, synthetic link
69 HCI Robust Vision Estimate robust and reliable depth or motion fields on our challenging real world videos! … flow, depth, stereo, outdoor link
68 The KITTI Vision Benchmark Suite We take advantage of our autonomous driving platform Annieway to develop novel challenging real-world computer vision benchmarks. Our tasks of interest are: ste… stereo, depth, flow, detection tracking, reconstruction, sfm, odometry, segmentation, semantic car depth link
67 Middlebury MVS Dino The object is a plaster dinosaur (stegosaurus). Click on thumbnail for a full-sized (640×480) image. Resolution of ground truth model: 0.00025m (you may wish to… sfm, reconstruction, benchmark, multiview, 3d, link
66 Middlebury MVS Temple The object is a plaster reproduction of Temple of the Dioskouroi in Agrigento, Sicily. Click on thumbnail for a full-sized (640×480) image. Resolution of ground… sfm, reconstruction, benchmark, multiview, 3d, link
65 Middlebury Stereo Two-frame stereo depth estimation… flow, depth, stereo link
64 Middlebury Flow Optical flow estimation benchmark… flow, depth link
63 Paris500k The Paris500k dataset consists of 501,356 geotagged images collected from Flickr and Panoramio. The dataset was collected from a geographic bounding box rather … retrieval, paris, landmark, geotag, flickr, panoramio, sfm, reconstruction link
62 Deformed Lattice Detection The Deformed Lattice Detection In Real-World Images dataset is used for regular grid detection. The authors have developed a robust and fast lattice detection a… texture, segmentation, symmetry, lattice, detection, urban link
61 Digital Papercutting Papercutting is a widespread and ancient artform which, as far as we could tell, had no previous computational treatment. We developed algorithms to analyze the… symmetry link
60 PSU HUB The PSU HUB dataset is a detection, tracking dataset. Ground truth trajectory and grouping information for pedestrians walking in the PSU student union building… detection, tracking, pedestrian, trajectory, crowd, overlap, occlusion link
59 Near-Regular Textures The Near-Regular Textures dataset contains textures from completely regular to completely irregular patterns, with a focus on near-regular textures. It also inc… texture, segmentation, classification, symmetry, regular, stochastic link
58 INRIA Horses The INRIA Horses dataset from Frederic Jurie and Vittorio Ferrari consists of 170 images with one or more horses in side-view at several scales and cluttered ba… detection, shape, segmentation, clutter, nature, horse link
57 Weizmann Horses The multi-scale Weizmann horses (originally from Eran Borenstein, adapted by Jamie Shotton) consists of 656 images which is split into 50+50 training, 50+50 vali… detection, shape, segmentation, clutter, nature, horse link
56 ETHZ Extended Shape The ETHZ Extended Shape classes dataset from Konrad Schindler is larger dataset of shape categories, created by merging ETHZ shape classes with Konrad Schindler… detection, shape, segmentation, clutter link
55 Prague Texture Segmentation The Prague Texture Segmentation Datagenerator and Benchmark is designed to mutually compare and rank different (dynamic/static) texture segmenters (supervised o… texture, segmentation, classification, benchmark, synthetic link
54 Notre Dame The Notre Dame de Paris dataset is used for 3D SfM reconstruction and contains 715 images provided by Noah Snavely. There are also versions for NotreDame by Mic… limited, flickr, landmark, sfm, paris, front-view, reconstruction link
53 DTU Robot The DTU Robot dataset consists of color images of 60 scenes acquired in a controlled setup from 119 different positions and under different lighting. For each s… feature, detection, description, matching, sfm, reconstruction, illumination link
52 Graffiti The Graffiti dataset by Krystian Mikolajczyk and Cordelia Schmid contains 48 images split into 8 sequences with 6 images each showing different structured and t… feature, detection, description, rectification, benchmark link
51 PN Learning PN Learning… single, tracking, learning n/a
50 Babenko Tracking The Babenko tracking dataset contains 12 video sequences for single object tracking. For each clip they provide (1) a directory with the original image s… tracking single object animal face occlusion video link
49 PhotoTourism Pair The data is taken from Photo Tourism reconstructions from Trevi Fountain (Rome), Notre Dame (Paris) and Half Dome (Yosemite). Each dataset consists of a series … feature, matching, description, pair, sfm link
48 CALTECH 101 Category Patch Pairs The CALTECH 101 Category Patch Pairs dataset measures invariance to intra-category variation. The dataset contains a training set and testing set of image patc… feature, matching, description, pair, binary link
47 CMP Retrieval CMP Dataset by Ondra Chum contains 5 million images collected from the internet…. retrieval, urban, large scale link
46 Paris Retrieval The Paris dataset consists of 6412 images. Images have high resolution and are in JPEG format. http://www.robots.ox.ac.uk/~vgg/data/parisbuildings/paris_1…. retrieval, urban, paris, landmark link
45 Oxford Buildings The Oxford Buildings dataset by James Philbin and Andrew Zisserman consists of 5062 images collected from Flickr by searching for particular Oxford landmarks. T… retrieval, urban, oxford, landmark link
44 UK Bench The UK Bench dataset from Henrik Stewenius and David Nister contains 10,200 images of N=2550 groups, each with four images at size 640×480. The images are rotated,… retrieval image object centered rotation link
43 ZuBud The Zurich Building dataset (ZuBud) from Hao Shao, Tomas Svoboda and Luc Van Gool [?] contains 1005 images with 201 buildings each in five views. There is also … retrieval, urban, procedural, rectification link
42 Hollywood Videos The Hollywood-2 dataset contains 12 classes of human actions and 10 classes of scenes distributed over 3669 video clips and approximately 20.1 hours of video in t… action, classification, video, segmentation link
41 KTH Action The current video database containing six types of human actions (walking, jogging, running, boxing, hand waving and hand clapping) performed several times by 2… action, classification, video, segmentation link
40 Weizmann Action The Weizmann actions dataset by Blank, Gorelick, Shechtman, Irani, and Basri consists of ten different types of actions: bending, jumping jack, jumping, jump in… segmentation, action, classification n/a
39 Leuven Stereo Scene The Leuven Stereo Scene dataset is a scene and depth dataset. There exist two variants of this dataset – a CVPR 2007 paper [1] by Leibe et al. for detection and… segmentation, semantic, reconstruction, urban, sfm, 3d, leuven, depth, stereo link
38 IcgBench The Interactive Segmentation (IcgBench) dataset from Jakob Santner contains 243 images and 262 segmentations. Some images have multiple segmentations. The annota… interactive, segmentation, user link
37 MSRC vNIPS The MSRC vNIPS dataset is the MSRC v2 dataset with new annotations for much more accurate segmentations for 93 images. Efficient Inference in Fully Connected… segmentation, semantic, outdoor link
36 MSRC v2 The MSRC v2 dataset is an extension of the MSRC v1 dataset from Microsoft Research in Cambridge. It contains 591 images and 23 object classes with accurate pixe… segmentation, semantic, outdoor link
35 MSRC v1 The MSRC v1 dataset from Microsoft Research in Cambridge contains 240 images and 9 object classes with coarse pixel-wise labeled images. The dataset is commonl… segmentation, semantic, outdoor link
34 CamVid The Cambridge-driving Labeled Video Database (CamVid) dataset from Gabriel Brostow [?] contains ten minutes of video footage and corresponding semantically labe… sfm, depth, semantic, segmentation, urban link
33 ECP New York 2011 The ECP New York dataset contains 10 manually segmented buildings from New York City, USA. Segmentation is evaluated using the Dice coefficient, calculated for the … segmentation, semantic, procedural, reconstruction, urban, newyork link
32 ECP Paris 2011 The ECP Paris 2011 dataset consists of 104 images taken from rue Monge in the fifth district of Paris, we kept only 20 for training and 10 for testing. Howev… segmentation, semantic, procedural, reconstruction, urban, paris link
31 ECP Paris 2010 The Ecole Centrale Paris 2010 (Paris 2010) dataset consists of 30 images of densely annotated building facades in seven classes – wall, window, sky, shop, balco… segmentation, semantic, procedural, reconstruction, urban, paris link
30 ICG Graz50 This is a dataset of rectified facade images and semantic labels. The goal of the annotation is to study the layout of the facades. It contains 50 images of… segmentation, semantic, procedural, reconstruction, urban, graz link
29 The Yale Face The Yale Face dataset from A. Georghiades contains 5760 single light source images of ten subjects, each shown in 9 poses and 64 illumination setups (leading to… face, pedestrian, detection, pose, illumination link
28 CMU Faces – Frontal faces The MIT + CMU frontal face dataset from H. Rowley contains 130 images with 507 labeled frontal faces from movie, portrait and media sources. It is mostly graysc… front-view, face, detection link
27 Idiap/ETHZ Faces and Poses Idiap/ETHZ Faces and Poses Dataset dataset by L. Jie, B. Caputo and V. Ferrari contains 1703 image-caption pairs. [author] Captions contain the names of some of… face, pose, pedestrian, text link
26 We Are Family Stickmen The We Are Family Stickmen dataset from Eichner and Ferrari contains X images with X people in group photos for human pose estimation with annotated 2D human bo… pose, pedestrian, body part link
25 PASCAL VOCs The PASCAL VOC Challenge datasets by Mark Everingham is a yearly dataset which has a central evaluation server and the final test data is not released. The late… detection segmentation pose pedestrian chair animal car building airplane link
24 UIUC Cars This UIUC Cars dataset by Shivani Agarwal, Aatif Awan and Dan Roth contains images of side views of cars for use in evaluating object detection algorithms. The … car, sideview, detection, scale, recognition, urban, scale link
23 Graz02 The Graz02 dataset by Andreas Opelt and Axel Pinz contains four categories of images: bikes, people, cars and a single background class. The annotation has been… bike, pedestrian, background, detection, clutter, car, graz link
22 Graz01 The Graz01 dataset by Andreas Opelt and Axel Pinz contains four types of images: bikes, people, background with no bikes, background with no people…. bike, pedestrian, background, detection, clutter, graz, occlusion link
21 ImageNET The ImageNET dataset is the latest dataset by Li Fei-Fei containing various dataset ranging from 1000 to 10000 categories…. retrieval, segmentation, classification link
20 CALTECH 256 The CALTECH 256 dataset by Li Fei-Fei contains 30607 images for 256 categories…. classification centered object scene image link
19 CALTECH 101 The CALTECH 101 dataset by Li Fei-Fei contains images for 101 categories with about 40 to 800 images per category. Most categories have about 50 images at rough… classification centered object scene image link
18 Leeds Cows The Leeds Cows dataset by Derek Magee consists of 14 different video sequences showing a total of 18 cows walking from right to left in front of different backg… detection segmentation cow video background animal link
17 TUD Motorbike The TUD Motorbike dataset from Bastian Leibe contains 115 images collected from the internet. Each image contains one or more motorbikes at different scales and… motorbike, detection, pascal link
16 PETS 2009 The PETS 2009 dataset contains 3 parts showing multi-view sequences containing pedestrians walking in an outdoor environment. The parts are used for person coun… front-view, outdoor, pedestrian, detection, tracking, overlap, occlusion link
15 PETS 2006 The PETS 2006 dataset contains 7 parts showing multi-sensor sequences containing left-luggage scenarios with increasing scene complexity at a train station scen… front-view, indoor, pedestrian, detection, tracking link
14 INRIA People The INRIA People dataset from Navneet Dalal and Bill Triggs [DalalCVPR2005] consists of training and testing data. The training contains 1805 images and X peopl… detection, pedestrian, sideview, front-view link
13 CBCL / MIT Pedestrian MIT Pedestrian dataset from Papageorgiou and Poggio [IJCV2000] contains 509 training and 200 test images of pedestrians in city scenes (plus left-right reflecti… pedestrian, front-view, detection, urban link
12 TUD Pedestrians training The TUD Pedestrians training dataset from Micha Andriluka, Stefan Roth and Bernt Schiele consists of 210 and 400 training images with X pedestrians with signifi… segmentation, pedestrian, sideview link
11 TUD Campus The TUD Campus dataset from Micha Andriluka, Stefan Roth and Bernt Schiele consists of 71 images and 303 highly overlapping pedestrians with large scale changes… segmentation, pedestrian, sideview, overlap link
10 TUD Pedestrians The TUD Pedestrians dataset from Micha Andriluka, Stefan Roth and Bernt Schiele [AndrilukaCVPR2008] consists of 250 images with 311 fully visible people with si… segmentation, pedestrian, sideview link
9 TUD Crossing The TUD Crossing dataset from Micha Andriluka, Stefan Roth and Bernt Schiele consists of 201 images with 1008 highly overlapping pedestrians with significant va… segmentation, pedestrian, sideview, overlap link
8 Tools2D The Tools 2D dataset from Bronstein, Bronstein, Bruckstein, and Kimmel [?] for partial similarity experiments and consists of 15 shapes: 5 humans, 5 horses and … shape, binary, matching, retrieval, partial link
7 Mythological Creatures The Mythological Creatures consists of articulated shapes (silhouettes) for partial similarity experiments and contains 15 shapes: 5 humans, 5 horses and 5 cent… shape, binary, matching, retrieval, partial, animal link
6 SIID The SIID silhouette dataset contains… and is from the Shape Indexing of Image Database (SIID). Download SIID silhouette dataset http://www.lems.brown.edu/… shape, binary, matching, retrieval link
5 KIMIA216 The Kimia 216 dataset has 18 classes, each consisting of 12 images. It contains shape silhouettes for birds, bones, brick, camels, car, children, classic cards, elephan… shape, binary, matching, retrieval, kimia, animal link
4 KIMIA99 The Kimia 99 dataset has 9 classes, each consisting of 11 images. They are part of the Shape Indexing of Image Database (SIID) project, which also contains the SIID… shape, binary, matching, retrieval, kimia link
3 KIMIA25 The Kimia 25 consists of 6 classes and 25 images. They are part of the Shape Indexing of Image Database (SIID) project, which also contains the SIID silhouette … shape, binary, matching, retrieval, kimia link
2 MPEG-7 Core Experiment CE-Shape-1 MPEG-7 Core Experiment CE-Shape-1 [?] is a popular database for shape matching evaluation consisting of 70 shape categories, where each category is represented … shape, binary, matching, retrieval, bullseye link

Posted in Computer Vision, OpenCV | Tagged: , , , , , , | Leave a Comment »

Literature Review versus Literature Survey. What is the difference?

Posted by Hemprasad Y. Badgujar on November 3, 2014

Literature Survey: the process of analyzing, summarizing, organizing, and presenting novel conclusions from the results of a technical review of a large number of recently published scholarly articles. The results of a literature survey can contribute to the body of knowledge when peer-reviewed and published as survey articles.

Literature Review: the process of technically and critically reviewing published papers to extract technical and scientific metadata from the presented contents. The metadata are usually used during a literature survey to technically compare different but relevant works and draw conclusions on the weaknesses and strengths of those works.

Second View: The second view of literature survey versus review is that in a survey researchers usually utilize the author-provided contents available in the published works to qualitatively analyze and compare them with other related works. In a review, by contrast, the analysis should not be qualitative. Rather, it should be quantitative, meaning that every research work under study should be implemented and benchmarked under certain criteria. The results of this benchmarking study can then be used to compare the works and to criticize or appreciate them.

So basically you can look at the current literature and find which approach is dominating in your field. Hope it helps. I will try to revise it if I come across other points or useful comments here.

We can also use the following definitions from CS journals.

  • According to the definition of survey paper provided by the IEEE Communications Surveys & Tutorials journal (one of the best CS journals), “The term survey, as applied here, is defined to mean a survey of the literature. A survey article should provide a comprehensive review of developments in a selected area”.
  • In ACM Computing Survey (another prestigious CS journal), survey paper is described as “A paper that summarizes and organizes recent research results in a novel way that integrates and adds understanding to work in the field. A survey article emphasizes the classification of the existing literature, developing a perspective on the area, and evaluating trends.”
  • In the Elsevier journal Computer Science Review, you will see that “Critical review of the relevant literature” is a required component of every typical survey paper.

 

Posted in Computer Research, Documentations, Journals & Conferences, My Research Related, Placement, Project Related, Research Menu | Tagged: , , , , , , | Leave a Comment »

Computer Vision Databases

Posted by Hemprasad Y. Badgujar on October 15, 2014


Index by Topic

  1. Action Databases
  2. Biological/Medical
  3. Face Databases
  4. Fingerprints
  5. General Images
  6. General RGBD and depth datasets
  7. Gesture Databases
  8. Image, Video and Shape Database Retrieval
  9. Object Databases
  10. People, Pedestrian, Eye/Iris, Template Detection/Tracking Databases
  11. Segmentation
  12. Surveillance
  13. Textures
  14. General Videos
  15. Other Collection Pages
  16. Miscellaneous Topics

Action Databases

  1. An analyzed collation of various labeled video datasets for action recognition (Kevin Murphy)
  2. 50 Salads – fully annotated 4.5 hour dataset of RGB-D video + accelerometer data, capturing 25 people preparing two mixed salads each (Dundee University, Sebastian Stein)
  3. ASLAN Action similarity labeling challenge database (Orit Kliper-Gross)
  4. Berkeley MHAD: A Comprehensive Multimodal Human Action Database (Ferda Ofli)
  5. BEHAVE Interacting Person Video Data with markup (Scott Blunsden, Bob Fisher, Aroosha Laghaee)
  6. CVBASE06: annotated sports videos (Janez Pers)
  7. G3D – synchronised video, depth and skeleton data for 20 gaming actions captured with Microsoft Kinect (Victoria Bloom)
  8. Hollywood 3D – 650 3D action recognition in the wild videos, 14 action classes (Simon Hadfield)
  9. Human Actions and Scenes Dataset (Marcin Marszalek, Ivan Laptev, Cordelia Schmid)
  10. HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion (Brown University)
  11. i3DPost Multi-View Human Action Datasets (Hansung Kim)
  12. i-LIDS video event image dataset (Imagery library for intelligent detection systems) (Paul Hosner)
  13. INRIA Xmas Motion Acquisition Sequences (IXMAS) (INRIA)
  14. JPL First-Person Interaction dataset – 7 types of human activity videos taken from a first-person viewpoint (Michael S. Ryoo, JPL)
  15. KTH human action recognition database (KTH CVAP lab)
  16. LIRIS human activities dataset – 2 cameras, annotated, depth images (Christian Wolf, et al)
  17. MuHAVi – Multicamera Human Action Video Data (Hossein Ragheb)
  18. Oxford TV based human interactions (Oxford Visual Geometry Group)
  19. Rochester Activities of Daily Living Dataset (Ross Messing)
  20. SDHA Semantic Description of Human Activities 2010 contest – aerial views (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury)
  21. SDHA Semantic Description of Human Activities 2010 contest – Human Interactions (Michael S. Ryoo, J. K. Aggarwal, Amit K. Roy-Chowdhury)
  22. TUM Kitchen Data Set of Everyday Manipulation Activities (Moritz Tenorth, Jan Bandouch)
  23. TV Human Interaction Dataset (Alonso Patron-Perez)
  24. Univ of Central Florida – Feature Films Action Dataset (Univ of Central Florida)
  25. Univ of Central Florida – YouTube Action Dataset (sports) (Univ of Central Florida)
  26. Univ of Central Florida – 50 Action Category Recognition in Realistic Videos (3 GB) (Kishore Reddy)
  27. UCF 101 action dataset 101 action classes, over 13k clips and 27 hours of video data (Univ of Central Florida)
  28. Univ of Central Florida – Sports Action Dataset (Univ of Central Florida)
  29. Univ of Central Florida – ARG Aerial camera, Rooftop camera and Ground camera (UCF Computer Vision Lab)
  30. UCR Videoweb Multi-camera Wide-Area Activities Dataset (Amit K. Roy-Chowdhury)
  31. Verona Social interaction dataset (Marco Cristani)
  32. Videoweb (multicamera) Activities Dataset (B. Bhanu, G. Denina, C. Ding, A. Ivers, A. Kamal, C. Ravishankar, A. Roy-Chowdhury, B. Varda)
  33. ViHASi: Virtual Human Action Silhouette Data (userID: VIHASI password: virtual$virtual) (Hossein Ragheb, Kingston University)
  34. WorkoutSU-10 Kinect dataset for exercise actions (Ceyhun Akgul)
  35. YouCook – 88 open-source YouTube cooking videos with annotations (Jason Corso)
  36. WVU Multi-view action recognition dataset (Univ. of West Virginia)

Biological/Medical

  1. Annotated Spine CT Database for Benchmarking of Vertebrae Localization, 125 patients, 242 scans (Ben Glockern)
  2. Computed Tomography Emphysema Database (Lauge Sorensen)
  3. Dermoscopy images (Eric Ehrsam)
  4. DIADEM: Digital Reconstruction of Axonal and Dendritic Morphology Competition (Allen Institute for Brain Science et al)
  5. DIARETDB1 – Standard Diabetic Retinopathy Database (Lappeenranta Univ of Technology)
  6. DRIVE: Digital Retinal Images for Vessel Extraction (Univ of Utrecht)
  7. MiniMammographic Database (Mammographic Image Analysis Society)
  8. MIT CBCL Automated Mouse Behavior Recognition datasets (Nicholas Edelman)
  9. Mouse Embryo Tracking Database – cell division event detection (Marcelo Cicconet, Kris Gunsalus)
  10. Retinal fundus images – Ground truth of vascular bifurcations and crossovers (Univ of Groningen)
  11. Spine and Cardiac data (Digital Imaging Group of London Ontario, Shuo Li)
  12. Univ of Central Florida – DDSM: Digital Database for Screening Mammography (Univ of Central Florida)
  13. VascuSynth – 120 3D vascular tree like structures with ground truth (Mengliu Zhao, Ghassan Hamarneh)
  14. York Cardiac MRI dataset (Alexander Andreopoulos)

Face Databases

  1. 3D Mask Attack Database (3DMAD) – 76500 frames of 17 persons using Kinect RGBD with eye positions (Sebastien Marcel)
  2. Audio-visual database for face and speaker recognition (Mobile Biometry MOBIO http://www.mobioproject.org/)
  3. BANCA face and voice database (Univ of Surrey)
  4. Binghampton Univ 3D static and dynamic facial expression database (Lijun Yin, Peter Gerhardstein and teammates)
  5. BioID face database (BioID group)
  6. Biwi 3D Audiovisual Corpus of Affective Communication – 1000 high quality, dynamic 3D scans of faces, recorded while pronouncing a set of English sentences.
  7. CMU Facial Expression Database (CMU/MIT)
  8. CMU/MIT Frontal Faces (CMU/MIT)
  9. CMU/MIT Frontal Faces (CMU/MIT)
  10. CMU Pose, Illumination, and Expression (PIE) Database (Simon Baker)
  11. CSSE Frontal intensity and range images of faces (Ajmal Mian)
  12. Face Recognition Grand Challenge datasets (FRVT – Face Recognition Vendor Test)
  13. FaceTracer Database – 15,000 faces (Neeraj Kumar, P. N. Belhumeur, and S. K. Nayar)
  14. FDDB: Face Detection Data set and Benchmark – studying unconstrained face detection (University of Massachusetts Computer Vision Laboratory)
  15. FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network)
  16. Facial Recognition Technology (FERET) Database (USA National Institute of Standards and Technology)
  17. Hannah and her sisters database – a dense audio-visual person-oriented ground-truth annotation of faces, speech segments, shot boundaries (Patrick Perez, Technicolor)
  18. Hong Kong Face Sketch Database
  19. Japanese Female Facial Expression (JAFFE) Database (Michael J. Lyons)
  20. LFW: Labeled Faces in the Wild – unconstrained face recognition.
  21. Manchester Annotated Talking Face Video Dataset (Timothy Cootes)
  22. MIT Collation of Face Databases (Ethan Meyers)
  23. MMI Facial Expression Database – 2900 videos and high-resolution still images of 75 subjects, annotated for FACS AUs.
  24. MORPH (Craniofacial Longitudinal Morphological Face Database) (University of North Carolina Wilmington)
  25. MIT CBCL Face Recognition Database (Center for Biological and Computational Learning)
  26. NIST mugshot identification database (USA National Institute of Standards and Technology)
  27. ORL face database: 40 people with 10 views (ATT Cambridge Labs)
  28. Oxford: faces, flowers, multi-view, buildings, object categories, motion segmentation, affine covariant regions, misc (Oxford Visual Geometry Group)
  29. PubFig: Public Figures Face Database (Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar)
  30. Re-labeled Faces in the Wild – original images, but aligned using “deep funneling” method. (University of Massachusetts, Amherst)
  31. SCface – Surveillance Cameras Face Database (Mislav Grgic, Kresimir Delac, Sonja Grgic, Bozidar Klimpak))
  32. Trondheim Kinect RGB-D Person Re-identification Dataset (Igor Barros Barbosa)
  33. UB KinFace Database – University of Buffalo kinship verification and recognition database
  34. XM2VTS Face video sequences (295): The extended M2VTS Database (XM2VTS) - (Surrey University)
  35. Yale Face Database – 11 expressions of 10 people (A. Georghaides)
  36. Yale Face Database B – 576 viewing conditions of 10 people (A. Georghaides)

Fingerprints

  1. FVC fingerprint verification competition 2002 dataset (University of Bologna)
  2. FVC fingerprint verification competition 2004 dataset (University of Bologna)
  3. FVC – a subset of FVC (Fingerprint Verification Competition) 2002 and 2004 fingerprint image databases, manually extracted minutiae data & associated documents (Umut Uludag)
  4. NIST fingerprint databases (USA National Institute of Standards and Technology)
  5. SPD2010 Fingerprint Singular Points Detection Competition (SPD 2010 committee)

General Images

  1. Aerial color image dataset (Swiss Federal Institute of Technology)
  2. AMOS: Archive of Many Outdoor Scenes (20+m) (Nathan Jacobs)
  3. Brown Univ Large Binary Image Database (Ben Kimia)
  4. Caltech-UCSD Birds-200-2011 (Catherine Wah)
  5. Columbia Multispectral Image Database (F. Yasuma, T. Mitsunaga, D. Iso, and S.K. Nayar)
  6. HIPR2 Image Catalogue of different types of images (Bob Fisher et al)
  7. Hyperspectral images of natural scenes – 2002 (David H. Foster)
  8. Hyperspectral images of natural scenes – 2004 (David H. Foster)
  9. ImageNet Linguistically organised (WordNet) Hierarchical Image Database – 10E7 images, 15K categories (Li Fei-Fei, Jia Deng, Hao Su, Kai Li)
  10. ImageNet Large Scale Visual Recognition Challenge (Alex Berg, Jia Deng, Fei-Fei Li)
  11. OTCBVS Thermal Imagery Benchmark Dataset Collection (Ohio State Team)
  12. McGill Calibrated Colour Image Database (Adriana Olmos and Fred Kingdom)
  13. Tiny Images Dataset 79 million 32×32 color images (Fergus, Torralba, Freeman)

General RGBD Datasets

  1. Cornell-RGBD-Dataset – Office Scenes (Hema Koppula)
  2. NYU Depth Dataset V2 – Indoor Segmentation and Support Inference from RGBD Images
  3. Oakland 3-D Point Cloud Dataset (Nicolas Vandapel)
  4. Washington RGB-D Object Dataset – 300 common household objects and 14 scenes. (University of Washington and Intel Labs Seattle)

Gesture Databases

  1. FG-Net Aging Database of faces at different ages (Face and Gesture Recognition Research Network)
  2. Hand gesture and marine silhouettes (Euripides G.M. Petrakis)
  3. IDIAP Hand pose/gesture datasets (Sebastien Marcel)
  4. Sheffield gesture database – 2160 RGBD hand gesture sequences, 6 subjects, 10 gestures, 3 postures, 3 backgrounds, 2 illuminations (Ling Shao)

Image, Video and Shape Database Retrieval

  1. Brown Univ 25/99/216 Shape Databases (Ben Kimia)
  2. IAPR TC-12 Image Benchmark (Michael Grubinger)
  3. IAPR-TC12 Segmented and annotated image benchmark (SAIAPR TC-12): (Hugo Jair Escalante)
  4. ImageCLEF 2010 Concept Detection and Annotation Task (Stefanie Nowak)
  5. ImageCLEF 2011 Concept Detection and Annotation Task – multi-label classification challenge in Flickr photos
  6. CLEF-IP 2011 evaluation on patent images
  7. McGill 3D Shape Benchmark (Siddiqi, Zhang, Macrini, Shokoufandeh, Bouix, Dickinson)
  8. NIST SHREC 2010 – Shape Retrieval Contest of Non-rigid 3D Models (USA National Institute of Standards and Technology)
  9. NIST SHREC – other NIST retrieval contest databases and links (USA National Institute of Standards and Technology)
  10. NIST TREC Video Retrieval Evaluation Database (USA National Institute of Standards and Technology)
  11. Princeton Shape Benchmark (Princeton Shape Retrieval and Analysis Group)
  12. Queensland cross media dataset – millions of images and text documents for “cross-media” retrieval (Yi Yang)
  13. TOSCA 3D shape database (Bronstein, Bronstein, Kimmel)

Object Databases

  1. 2.5D/3D Datasets of various objects and scenes (Ajmal Mian)
  2. Amsterdam Library of Object Images (ALOI): 100K views of 1K objects (University of Amsterdam/Intelligent Sensory Information Systems)
  3. Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild – 12 class, 3000+ images each with 3D annotations (Yu Xiang, Roozbeh Mottaghi, Silvio Savarese)
  4. Caltech 101 (now 256) category object recognition database (Li Fei-Fei, Marco Andreeto, Marc’Aurelio Ranzato)
  5. Columbia COIL-100 3D object multiple views (Columbia University)
  6. Densely sampled object views: 2500 views of 2 objects, eg for view-based recognition and modeling (Gabriele Peters, Universiteit Dortmund)
  7. German Traffic Sign Detection Benchmark (Ruhr-Universitat Bochum)
  8. GRAZ-02 Database (Bikes, cars, people) (A. Pinz)
  9. Linkoping 3D Object Pose Estimation Database (Fredrik Viksten and Per-Erik Forssen)
  10. Microsoft Object Class Recognition image databases (Antonio Criminisi, Pushmeet Kohli, Tom Minka, Carsten Rother, Toby Sharp, Jamie Shotton, John Winn)
  11. Microsoft salient object databases (labeled by bounding boxes) (Liu, Sun Zheng, Tang, Shum)
  12. MIT CBCL Car Data (Center for Biological and Computational Learning)
  13. MIT CBCL StreetScenes Challenge Framework: (Stan Bileschi)
  14. NEC Toy animal object recognition or categorization database (Hossein Mobahi)
  15. NORB 50 toy image database (NYU)
  16. PASCAL Image Database (motorbikes, cars, cows) (PASCAL Consortium)
  17. PASCAL 2007 Challenge Image Database (motorbikes, cars, cows) (PASCAL Consortium)
  18. PASCAL 2008 Challenge Image Database (PASCAL Consortium)
  19. PASCAL 2009 Challenge Image Database (PASCAL Consortium)
  20. PASCAL 2010 Challenge Image Database (PASCAL Consortium)
  21. PASCAL 2011 Challenge Image Database (PASCAL Consortium)
  22. PASCAL 2012 Challenge Image Database Category classification, detection, and segmentation, and still-image action classification (PASCAL Consortium)
  23. UIUC Car Image Database (UIUC)
  24. UIUC Dataset of 3D object categories (S. Savarese and L. Fei-Fei)
  25. Venezia 3D object-in-clutter recognition and segmentation (Emanuele Rodola)

People, Pedestrian, Eye/Iris, Template Detection/Tracking Databases

  1. 3D KINECT Gender Walking data base (L. Igual, A. Lapedriza, R. Borràs from UB, CVC and UOC, Spain)
  2. Caltech Pedestrian Dataset (P. Dollar, C. Wojek, B. Schiele and P. Perona)
  3. CASIA gait database (Chinese Academy of Sciences)
  4. CASIA-IrisV3 (Chinese Academy of Sciences, T. N. Tan, Z. Sun)
  5. CAVIAR project video sequences with tracking and behavior ground truth (CAVIAR team/Edinburgh University – EC project IST-2001-37540)
  6. Daimler Pedestrian Detection Benchmark 21790 images with 56492 pedestrians plus empty scenes (M. Enzweiler, D. M. Gavrila)
  7. Driver Monitoring Video Dataset (RobeSafe + Jesus Nuevo-Chiquero)
  8. Edinburgh overhead camera person tracking dataset (Bob Fisher, Bashia Majecka, Gurkirt Singh, Rowland Sillito)
  9. Eyetracking database summary (Stefan Winkler)
  10. HAT database of 27 human attributes (Gaurav Sharma, Frederic Jurie)
  11. INRIA Person Dataset (Navneet Dalal)
  12. ISMAR09 ground truth video dataset for template-based (i.e. planar) tracking algorithms (Sebastian Lieberknecht)
  13. MIT CBCL Pedestrian Data (Center for Biological and Computational Learning)
  14. MIT eye tracking database (1003 images) (Judd et al)
  15. Modena and Reggio Emilia first person head motion videos (Univ of Modena and Reggio Emilia)
  16. Notre Dame Iris Image Dataset (Patrick J. Flynn)
  17. PETS 2009 Crowd Challenge dataset (Reading University & James Ferryman)
  18. PETS: Performance Evaluation of Tracking and Surveillance (Reading University & James Ferryman)
  19. PETS Winter 2009 workshop data (Reading University & James Ferryman)
  20. Pixel-based change detection benchmark dataset (Goyette et al)
  21. Pointing’04 ICPR Workshop Head Pose Image Database
  22. Transient Biometrics Nails Dataset V01 (Igor Barros Barbosa)
  23. UBIRIS: Noisy Visible Wavelength Iris Image Databases (University of Beira)
  24. Univ of Central Florida – Crowd Dataset (Saad Ali)
  25. Univ of Central Florida – Crowd Flow Segmentation datasets (Saad Ali)
  26. UTIRIS cross-spectral iris image databank (Mahdi Hosseini)
  27. York Univ Eye Tracking Dataset (120 images) (Neil Bruce)

Segmentation

  1. Alpert et al. Segmentation evaluation database (Sharon Alpert, Meirav Galun, Ronen Basri, Achi Brandt)
  2. Berkeley Segmentation Dataset and Benchmark (David Martin and Charless Fowlkes)
  3. GrabCut Image database (C. Rother, V. Kolmogorov, A. Blake, M. Brown)
  4. LabelMe images database and online annotation tool (Bryan Russell, Antonio Torralba, Kevin Murphy, William Freeman)

Surveillance

  1. AVSS07: Advanced Video and Signal based Surveillance 2007 datasets (Andrea Cavallaro)
  2. ETISEO Video Surveillance Download Datasets (INRIA Orion Team and others)
  3. Heriot Watt Summary of datasets for human tracking and surveillance (Zsolt Husz)
  4. Openvisor – Video surveillance Online Repository (Univ of Modena and Reggio Emilia)
  5. SPEVI: Surveillance Performance EValuation Initiative (Queen Mary University London)
  6. Udine Trajectory-based anomalous event detection dataset – synthetic trajectory datasets with outliers (Univ of Udine Artificial Vision and Real Time Systems Laboratory)

Textures

  1. Color texture images by category (textures.forrest.cz)
  2. Columbia-Utrecht Reflectance and Texture Database (Columbia & Utrecht Universities)
  3. DynTex: Dynamic texture database (Renaud Piteri, Mark Huiskes and Sandor Fazekas)
  4. Oulu Texture Database (Oulu University)
  5. Prague Texture Segmentation Data Generator and Benchmark (Mikes, Haindl)
  6. Uppsala texture dataset of surfaces and materials – fabrics, grains, etc.
  7. Vision Texture (MIT Media Lab)

General Videos

  1. Large scale YouTube video dataset – 156,823 videos (2,907,447 keyframes) crawled from YouTube videos (Yi Yang)

Other Collections

  1. CANTATA Video and Image Database Index site (Multitel)
  2. Computer Vision Homepage list of test image databases (Carnegie Mellon Univ)
  3. ETHZ various, including 3D head pose, shape classes, pedestrians, buildings (ETH Zurich, Computer Vision Lab)
  4. Leibe’s Collection of people/vehicle/object databases (Bastian Leibe)
  5. Lotus Hill Image Database Collection with Ground Truth (Sealeen Ren, Benjamin Yao, Michael Yang)
  6. Oxford Misc, including Buffy, Flowers, TV characters, Buildings, etc (Oxford Visual geometry Group)
  7. PEIPA Image Database Summary (Pilot European Image Processing Archive)
  8. Univ of Bern databases on handwriting, online documents, string edit and graph matching (Univ of Bern, Computer Vision and Artificial Intelligence)
  9. USC Annotated Computer Vision Bibliography database publication summary (Keith Price)
  10. USC-SIPI image databases: texture, aerial, favorites (eg. Lena) (USC Signal and Image Processing Institute)

Miscellaneous

  1. 3D mesh watermarking benchmark dataset (Guillaume Lavoue)
  2. Active Appearance Models datasets (Mikkel B. Stegmann)
  3. Aircraft tracking (Ajmal Mian)
  4. Cambridge Motion-based Segmentation and Recognition Dataset (Brostow, Shotton, Fauqueur, Cipolla)
  5. Catadioptric camera calibration images (Yalin Bastanlar)
  6. Chars74K dataset – 74 English and Kannada characters (Teo de Campos – t.decampos@surrey.ac.uk)
  7. COLD (COsy Localization Database) – place localization (Ullah, Pronobis, Caputo, Luo, and Jensfelt)
  8. Columbia Camera Response Functions: Database (DoRF) and Model (EMOR) (M.D. Grossberg and S.K. Nayar)
  9. Columbia Database of Contaminants’ Patterns and Scattering Parameters (Jinwei Gu, Ravi Ramamoorthi, Peter Belhumeur, Shree Nayar)
  10. Dense outdoor correspondence ground truth datasets, for optical flow and local keypoint evaluation (Christoph Strecha)
  11. DTU controlled motion and lighting image dataset (135K images) (Henrik Aanaes)
  12. EISATS: .enpeda.. Image Sequence Analysis Test Site (Auckland University Multimedia Imaging Group)
  13. FlickrLogos-32 – 8240 images of 32 product logos (Stefan Romberg)
  14. Flowchart images (Allan Hanbury)
  15. Geometric Context – scene interpretation images (Derek Hoiem)
  16. Image/video quality assessment database summary (Stefan Winkler)
  17. INRIA feature detector evaluation sequences (Krystian Mikolajczyk)
  18. INRIA’s PERCEPTION’s database of images and videos gathered with several synchronized and calibrated cameras (INRIA Rhone-Alpes)
  19. INRIA’s Synchronized and calibrated binocular/binaural data sets with head movements (INRIA Rhone-Alpes)
  20. KITTI dataset for stereo, optical flow and visual odometry (Geiger, Lenz, Urtasun)
  21. Large scale 3D point cloud data from terrestrial LiDAR scanning (Andreas Nuechter)
  22. Linkoping Rolling Shutter Rectification Dataset (Per-Erik Forssen and Erik Ringaby)
  23. Middlebury College stereo vision research datasets (Daniel Scharstein and Richard Szeliski)
  24. MPI-Sintel optical flow evaluation dataset (Michael Black)
  25. Multiview stereo images with laser based groundtruth (ESAT-PSI/VISICS,FGAN-FOM,EPFL/IC/ISIM/CVLab)
  26. The Cancer Imaging Archive (National Cancer Institute)
  27. NCI Cancer Image Archive – prostate images (National Cancer Institute)
  28. NIST 3D Interest Point Detection (Helin Dutagaci, Afzal Godil)
  29. NRCS natural resource/agricultural image database (USDA Natural Resources Conservation Service)
  30. Occlusion detection test data (Andrew Stein)
  31. The Open Video Project (Gary Marchionini, Barbara M. Wildemuth, Gary Geisler, Yaxiao Song)
  32. Outdoor Ground Truth Evaluation Dataset for Sensor-Aided Visual Handheld Camera Localization (Daniel Kurz, metaio)
  33. Pics ‘n’ Trails – Dataset of Continuously archived GPS and digital photos (Gamhewage Chaminda de Silva)
  34. PRINTART: Artistic images of prints of well known paintings, including detail annotations. A benchmark for automatic annotation and retrieval tasks with this database was published at ECCV. (Nuno Miguel Pinho da Silva)
  35. RAWSEEDS SLAM benchmark datasets (Rawseeds Project)
  36. Robotic 3D Scan Repository – 3D point clouds from robotic experiments of scenes (Osnabruck and Jacobs Universities)
  37. ROMA (ROad MArkings) : Image database for the evaluation of road markings extraction algorithms (Jean-Philippe Tarel, et al)
  38. Stuttgart Range Image Database – 66 views of 45 objects
  39. UCL Ground Truth Optical Flow Dataset (Oisin Mac Aodha)
  40. Univ of Genoa Datasets for disparity and optic flow evaluation (Manuela Chessa)
  41. Validation and Verification of Neural Network Systems (Francesco Vivarelli)
  42. VSD: Technicolor Violent Scenes Dataset – a collection of ground-truth files based on the extraction of violent events in movies
  43. WILD: Weather and Illumunation Database (S. Narasimhan, C. Wang. S. Nayar, D. Stolyarov, K. Garg, Y. Schechner, H. Peri)

Posted in Computer Vision, OpenCV, Project Related, Research Menu | Tagged: , , , , , | Leave a Comment »

cuSVM for CUDA 6.0 and Matlab x64

Posted by Hemprasad Y. Badgujar on October 13, 2014


This page shows how to build cuSVM, a GPU-accelerated SVM with dense format. The library was written by Austin Carpenter. The procedure uses CUDA 6.0, MATLAB x64 and Visual Studio 2012. The code and project files were modified in order to compile and link the library; many steps were taken from http://www.parallelcoding.com/2012/02/09/cusvm-in-visual-studio-2010-with-cuda-4-0/

Modifications:

  1. Added Matlab variables:
    1. cuSVMTrainIter – contains the number of iterations the solver performs
    2. cuSVMTrainObj – contains the final objective function value after the training
  2. In file cuSVMSolver.cu, lines 869-874, all calls of cudaMemcpyToSymbol were changed because of changes made in the CUDA 6.0 runtime library – http://stackoverflow.com/questions/12947914/error-in-cudamemcpytosymbol-using-cuda-5
    before the change:
    mxCUDA_SAFE_CALL(cudaMemcpyToSymbol("taumin", &h_taumin, sizeof(float)));
    after the change:
    mxCUDA_SAFE_CALL(cudaMemcpyToSymbol(taumin, &h_taumin, sizeof(float)));
  3. In the functions FindBI, FindBJ and FindStoppingJ, the way the reduction in shared memory is done was changed (http://stackoverflow.com/questions/6510427/cuda-finding-max-using-reduction-error)
  4. The kernel cache size is constrained to 400 MB; if you want a bigger cache you can modify cuSVMSolver.cu line 24:
    #define KERNEL_CACHE_SIZE (400*1024*1024)

 

Build Procedure

Download the preconfigured cuSVM Visual Studio 2012 solution with LibSVM and a Matlab script for classification

All the steps described below are already done in the downloaded solution; you only have to check that all paths are set correctly and that your GPU compute capability is set properly.

My setup:

  • windows 7 x64
  • visual studio 2012
  • CUDA 6.0
  • Matlab R2014a
  • the code was tested on Quadro 5000 and Geforce GTX 580

Prerequisites:

Determine paths:

  1. Matlab include path, mine is "D:\Program Files\MATLAB\R2014a\extern\include" (Matlab was installed on drive D:\)
  2. Matlab library path: "D:\Program Files\MATLAB\R2014a\extern\lib\win64\microsoft"
  3. CUDA toolkit include path: "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v6.0\include"
  4. GPU compute capability, mine is 1.2 in case of the GeForce GT 330M (compute_12,sm_12), and 3.0 in case of the GeForce GTX 690 (compute_30,sm_30)

 

Changes made in projects properties (the same steps are for both projects: cuSVMPredict, cuSVMTrain):

  1. Open the solution in VS 2012
  2. Right click on the project (cuSVMTrain or cuSVMPredict) and choose "Build Customizations …"; make sure that "CUDA 6.0 (.targets, .props)" is checked
  3. Right click on cuSVMTrain and choose project "Properties"
    1. Expand „Configuration Properties”
      1. General->Target Extension: .mexw64
      2. General->Configuration Type: Dynamic Library (.dll)
    2. Expand C/C++
      1. General->Additional Include Directories: $(SolutionDir)inc\;D:\Program Files\MATLAB\R2014a\extern\include;$(CudaToolkitIncludeDir);%(AdditionalIncludeDirectories)
    3. Expand CUDA C/C++
      1. Common->Additional Include Directories: $(SolutionDir)inc\;D:\Program Files\MATLAB\R2014a\extern\include;$(CudaToolkitIncludeDir);%(AdditionalIncludeDirectories)
      2. Common->Target Machine Platform: 64-bit (--machine 64)
      3. Device->Code Generation: compute_30,sm_30 – this depends on your GPU compute capability
    4. Expand Linker
      1. General->Additional Library Directories: %(AdditionalLibraryDirectories); $(CudaToolkitLibDir); D:\Program Files\MATLAB\R2014a\extern\lib\win64\microsoft
      2. Input->Additional Dependencies: cuda.lib;cublas.lib;libmex.lib;libmat.lib;libmx.lib;cudart.lib;kernel32.lib;user32.lib;gdi32.lib;winspool.lib;comdlg32.lib;advapi32.lib;shell32.lib;ole32.lib;oleaut32.lib;uuid.lib;odbc32.lib;odbccp32.lib;%(AdditionalDependencies)
      3. Input->Module Definition File: TrainModule.def (for the cuSVMTrain project; for cuSVMPredict set PredictModule.def) – a minimal sketch of such a file is shown below
    5. Expand Build Events
      1. Post-Build Event->Command Line:
        echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
        copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
        (each command on a separate line)

Finally, you can check whether it is a "Release" or "Debug" build.
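
For reference, a module definition file just tells the linker which symbols the DLL exports; for a Matlab MEX binary the conventional single export is mexFunction. A minimal sketch of what TrainModule.def is expected to contain — an assumption, so check the file shipped with the solution (PredictModule.def would name cuSVMPredict instead):

LIBRARY "cuSVMTrain"
EXPORTS
   mexFunction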

 

How to use cuSVM

The zip package contains two folders:

  • cuSVM – Visual Studio 2012 solution
  • cuSVMmatlab – contains:
  1. libsvm,
  2. compiled cuSVMTrain.mexw64 and cuSVMPredict.mexw64 in the Lib folder,
  3. sample datasets in the data folder,
  4. the Matlab script cuSVMTest.m

To use the library:

  1. Build cuSVM in Release or Debug mode – important: check your GPU compute capability
  2. Copy cuSVMTrain.mexw64 and cuSVMPredict.mexw64 to the Lib folder
  3. Add the Lib folder to the Matlab search path
  4. To classify a dataset, open the cuSVMTest.m file

 

 

 

Posted in Computing Technology, GPU (CUDA), GPU Accelareted, Image Processing | Tagged: , | Leave a Comment »

Posted by Hemprasad Y. Badgujar on October 12, 2014

Mathematics Function handbook

Abstract. The function handbook contains all elementary, trigonometric and hyperbolic functions — their definitions, graphs, properties and identities.

 

Function handbook

1. Introduction

When working on the scientific calculator Li-L project we wanted to give a gift to its future users. This scientific calculator has an embedded Help available in just two clicks from every part of the application, and we had the idea to create a function handbook and include it as a part of the Help, so that a click on the Li-L calculator keypad takes the user to the handbook page with the function description.

The most challenging part of the idea was the graphics. We wanted every function to have its graph, and we set high standards for the graphics: they were to be properly scaled, numbered and antialiased. It was an interesting and difficult job: every figure was created in several stages, with a final stage of editing at pixel level. When the task was completed, maybe for the first time in my life I saw all the functions in the same scale, properly curved and cut, so that all the features were clearly visible.

The result was so great that we decided to put this beauty online. And now you can access what Li-L calculator users have on their desktops. Below you will find the functions put into several lists — every list holds all the functions, but the ordering is different, so that you can look up a function by its notation, name or article head.

The Li-L calculator handbook contains all elementary, trigonometric and hyperbolic functions. The scientific calculator Li-X got the handbook extended with gamma and Bessel functions, and the scientific calculator Li-Xc got the handbook extended with modified Bessel functions.

2. Content

  1. √ or sqrt — square root
  2. ^ — power
  3. Γ — gamma function
  4. abs — absolute value
  5. arccos — trigonometric arc cosine
  6. arccot or arcctg — trigonometric arc cotangent
  7. arccsc or arccosec — trigonometric arc cosecant
  8. arcosh or arch — arc-hyperbolic cosine
  9. arcoth or arcth — arc-hyperbolic cotangent
  10. arcsch — arc-hyperbolic cosecant
  11. arcsec — trigonometric arc secant
  12. arcsin — trigonometric arc sine
  13. arctan or arctg — trigonometric arc tangent
  14. arsech or arsch — arc-hyperbolic secant
  15. arsinh or arsh — arc-hyperbolic sine
  16. artanh or arth — arc-hyperbolic tangent
  17. ceil — ceiling
  18. cos — trigonometric cosine
  19. cosh or ch — hyperbolic cosine
  20. cot or ctg — trigonometric cotangent
  21. coth or cth — hyperbolic cotangent
  22. csc or cosec — trigonometric cosecant
  23. csch — hyperbolic cosecant
  24. exp — exponent
  25. floor — floor
  26. Iν — modified Bessel function of the first kind
  27. Jν — Bessel function of the first kind
  28. Kν — modified Bessel function of the second kind
  29. ln — natural logarithm
  30. log or lg — decimal logarithm
  31. n! — factorial
  32. sec — trigonometric secant
  33. sech — hyperbolic secant
  34. sign — signum
  35. sin — trigonometric sine
  36. sinh or sh — hyperbolic sine
  37. tan or tg — trigonometric tangent
  38. tanh or th — hyperbolic tangent
  39. Yν — Bessel function of the second kind

3. Function index by name

  1. Absolute value abs
  2. Arc-hyperbolic cosecant arcsch
  3. Arc-hyperbolic cosine arcosh or arch
  4. Arc-hyperbolic cotangent arcoth or arcth
  5. Arc-hyperbolic secant arsech or arsch
  6. Arc-hyperbolic sine arsinh or arsh
  7. Arc-hyperbolic tangent artanh or arth
  8. Bessel function of the first kind Jν
  9. Bessel function of the second kind Yν
  10. Ceiling function ceil
  11. Decimal logarithm log or lg
  12. Exponent exp
  13. Factorial n!
  14. Floor function floor
  15. Gamma function Γ
  16. Hyperbolic cosecant csch
  17. Hyperbolic cosine cosh or ch
  18. Hyperbolic cotangent coth or cth
  19. Hyperbolic secant sech
  20. Hyperbolic sine sinh or sh
  21. Hyperbolic tangent tanh or th
  22. Modified Bessel function of the first kind Iν
  23. Modified Bessel function of the second kind Kν
  24. Natural logarithm ln
  25. Power ^
  26. Signum sign
  27. Square root √ or sqrt
  28. Trigonometric arc cosecant arccsc or arccosec
  29. Trigonometric arc cosine arccos
  30. Trigonometric arc cotangent arccot or arcctg
  31. Trigonometric arc secant arcsec
  32. Trigonometric arc sine arcsin
  33. Trigonometric arc tangent arctan or arctg
  34. Trigonometric cosecant csc or cosec
  35. Trigonometric cosine cos
  36. Trigonometric cotangent cot or ctg
  37. Trigonometric secant sec
  38. Trigonometric sine sin
  39. Trigonometric tangent tan or tg

4. Function index by notation

  1. √ — square root
  2. ^ — power
  3. Γ — gamma function
  4. abs — absolute value
  5. arccos — trigonometric arc cosine
  6. arccosec — trigonometric arc cosecant
  7. arccot — trigonometric arc cotangent
  8. arccsc — trigonometric arc cosecant
  9. arcctg — trigonometric arc cotangent
  10. arch — arc-hyperbolic cosine
  11. arcosh — arc-hyperbolic cosine
  12. arcoth — arc-hyperbolic cotangent
  13. arcsch — arc-hyperbolic cosecant
  14. arcsec — trigonometric arc secant
  15. arcsin — trigonometric arc sine
  16. arctan — trigonometric arc tangent
  17. arctg — trigonometric arc tangent
  18. arcth — arc-hyperbolic cotangent
  19. arsch — arc-hyperbolic secant
  20. arsech — arc-hyperbolic secant
  21. arsh — arc-hyperbolic sine
  22. arsinh — arc-hyperbolic sine
  23. artanh — arc-hyperbolic tangent
  24. arth — arc-hyperbolic tangent
  25. ceil — ceiling
  26. ch — hyperbolic cosine
  27. cos — trigonometric cosine
  28. cosec — trigonometric cosecant
  29. cosh — hyperbolic cosine
  30. cot — trigonometric cotangent
  31. coth — hyperbolic cotangent
  32. csc — trigonometric cosecant
  33. csch — hyperbolic cosecant
  34. ctg — trigonometric cotangent
  35. exp — exponent
  36. floor — floor
  37. Iν — modified Bessel function of the first kind
  38. Jν — Bessel function of the first kind
  39. Kν — modified Bessel function of the second kind
  40. lg — decimal logarithm
  41. ln — natural logarithm
  42. log — decimal logarithm
  43. n! — factorial
  44. sec — trigonometric secant
  45. sech — hyperbolic secant
  46. sh — hyperbolic sine
  47. sign — signum
  48. sin — trigonometric sine
  49. sinh — hyperbolic sine
  50. sqrt — square root
  51. tan — trigonometric tangent
  52. tanh — hyperbolic tangent
  53. tg — trigonometric tangent
  54. th — hyperbolic tangent
  55. Yν — Bessel function of the second kind

References. The handbook is a part of scientific calculator Li-L and scientific calculator Li-X products.

Posted in Apps Development, Computer Languages, Computer Research, Free Tools, Image / Video Filters, My Research Related, PARALLEL | Leave a Comment »

Fast Fourier transform — FFT

Posted by Hemprasad Y. Badgujar on October 12, 2014


Category. Digital signal processing (DSP) software development.

Abstract. The article is a practical tutorial for understanding and implementing the fast Fourier transform — FFT. It contains theory, C++ source code and programming instructions. The popular Cooley-Tukey technique is considered.

References. Case study: ECG processing — R-peaks detection and 4D ultrasound volume image processing.

Download fast Fourier transform — FFT — C++ source code (zip, 4 Kb)

Fourier transform

1. Introduction to fast Fourier transform

The fast Fourier transform — FFT — is a speed-up technique for calculating the discrete Fourier transform — DFT — which in turn is the discrete version of the continuous Fourier transform, which indeed is the origin of all the versions. So, historically the continuous form of the transform was discovered first, then the discrete form was created for sampled signals, and then an algorithm for fast calculation of the discrete version was invented.

2. Understanding FFT

First of all let us have a look at what the Fourier transform is. The Fourier transform is an integral of the form:

$$F(\omega) = \int_{-\infty}^{+\infty} f(t)\,e^{-i\omega t}\,dt \qquad (1)$$

The transform operates in the complex domain. Recall that the imaginary exponent can be written as:

$$e^{i\varphi} = \cos\varphi + i\sin\varphi \qquad (2)$$

For a sampled function the continuous transform (1) turns into a discrete one:

$$F_n = \sum_{k=0}^{N-1} f_k\,e^{-2\pi i\,nk/N}, \qquad n = 0, 1, \dots, N-1 \qquad (3)$$

Expression (3) is the discrete Fourier transform — DFT. Here {f0, f1, … , fN-1} is the input discrete function and {F0, F1, … , FN-1} is the result of the Fourier transform.

It is easy to see that to program the DFT it is enough to write a double loop and just calculate the sums of products of the input samples and the imaginary exponents. The number of operations required is obviously of O(N²) order. But due to the transform's properties it is possible to reduce the number of operations to the order of O(N log₂N). Now, let us explore the tricks we can use to speed up the calculations.
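
As an illustration — a minimal sketch, not part of the original library — the naive double-loop DFT of expression (3) can be coded like this (using std::complex here; the article's own complex class would do equally well):

#include <cmath>
#include <complex>
#include <vector>

//   Naive O(N^2) DFT: direct evaluation of expression (3)
std::vector< std::complex<double> > NaiveDFT(const std::vector< std::complex<double> > &f)
{
   const double pi = 3.14159265358979323846;
   const size_t N = f.size();
   std::vector< std::complex<double> > F(N);
   //   One output component per outer iteration
   for (size_t n = 0; n < N; ++n)
   {
      std::complex<double> Sum(0.);
      //   Sum of products of input samples and imaginary exponents
      for (size_t k = 0; k < N; ++k)
      {
         const double Angle = -2. * pi * double(n) * double(k) / double(N);
         Sum += f[k] * std::complex<double>(cos(Angle), sin(Angle));
      }
      F[n] = Sum;
   }
   return F;
}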

Let us put N=8 and write down our DFT:

$$F_n = \sum_{k=0}^{7} f_k\,e^{-2\pi i\,nk/8} \qquad (4)$$

It is easy to see that we can split the sum into two similar sums, separating the odd and even terms and factoring the exponent out of the odd sum:

$$F_n = \sum_{m=0}^{3} f_{2m}\,e^{-2\pi i\,n(2m)/8} + e^{-2\pi i\,n/8}\sum_{m=0}^{3} f_{2m+1}\,e^{-2\pi i\,n(2m)/8} \qquad (5)$$

Now we can split the sums in brackets again:

$$F_n = \left[\left(f_0 + f_4 e^{-2\pi i\,n\cdot 4/8}\right) + e^{-2\pi i\,n\cdot 2/8}\left(f_2 + f_6 e^{-2\pi i\,n\cdot 4/8}\right)\right] + e^{-2\pi i\,n/8}\left[\left(f_1 + f_5 e^{-2\pi i\,n\cdot 4/8}\right) + e^{-2\pi i\,n\cdot 2/8}\left(f_3 + f_7 e^{-2\pi i\,n\cdot 4/8}\right)\right] \qquad (6)$$

Thus, we have 3 = log₂8 levels of summation: the deepest one in parentheses, the middle one in brackets and the outer one. For every level the exponent multiplier for the second term is the same:

$$F_n = \left[\left(f_0 + f_4 e^{-\pi i n}\right) + e^{-\pi i n/2}\left(f_2 + f_6 e^{-\pi i n}\right)\right] + e^{-\pi i n/4}\left[\left(f_1 + f_5 e^{-\pi i n}\right) + e^{-\pi i n/2}\left(f_3 + f_7 e^{-\pi i n}\right)\right] \qquad (7)$$

And now the most important observation one can make to get the speed-up: the periodicity of the exponent multipliers.

$$e^{-2\pi i\,n/N} = e^{-2\pi i\,(n+N)/N} \qquad (8)$$

For the exponent multiplier in parentheses the period is n=2, which means the sums in parentheses are exactly the same for n=0, 2, 4, 6 and for n=1, 3, 5, 7. It means that on the deepest level, in parentheses, we need 4×2=8 sums in total — the number of sums times the period. And note that since n=1, 3, 5, 7 corresponds to half of the period π, the exponent multiplier is the same as for n=0, 2, 4, 6 but with the opposite sign.

$$e^{-i(\varphi+\pi)} = -e^{-i\varphi} \qquad (9)$$

Indeed they are 1 for n=0, 2, 4, 6 and -1 for n=1, 3, 5, 7.

For the exponent multiplier in brackets the period is n=4, which means we have the same sums for the pairs n=0, 4; n=1, 5; n=2, 6 and n=3, 7. It means that on the middle level, in brackets, we have 2×4=8 sums, and the second half of them can be obtained again by changing the sign in the first half — due to the fact that the distance between n and n+2 is π. Thus, for n=0, 4 the factor is 1 and for n=2, 6 it is -1; for n=1, 5 it equals -i and for n=3, 7 it is i.

On the outer level we have just one sum for every transform component, and the period of the exponent multiplier is 8. This gives us 1×8=8 sums, and the second half of them can be obtained by changing the sign in the first half.

So, on every calculation level we have 8 sums. In terms of N this means we have log₂N levels and N sums on every level, which gives us O(N log₂N) order for the number of operations. On the other hand, having a constant number of sums on every level means we can process the data in-place.

In summary, we have got the fast Fourier transform — FFT. Now it is time to develop a step-by-step instruction list to be carved in code.

3. FFT algorithm

Having an understanding of what features of the DFT we are going to exploit to speed up the calculation, we can write down the following algorithm:

  1. Prepare input data for summation — put them into convenient order;
  2. For every summation level:
    1. For every exponent factor of the half-period:
      1. Calculate factor;
      2. For every sum of this factor:
        1. Calculate product of the factor and the second term of the sum;
        2. Calculate sum;
        3. Calculate difference.

The first step of reordering is putting the input data into their natural order in (7): {f0, f1, f2, f3, f4, f5, f6, f7}→{f0, f4, f2, f6, f1, f5, f3, f7}. Since this new order was obtained by successively splitting the terms into even and odd ones, in the binary domain it looks like ordering based on bit significance starting from the least significant bit — see table 1.

Position   Binary   After 1st split   After 2nd split (mirrored)   New position
   0        000           000                   000                     0
   1        001           010                   100                     4
   2        010           100                   010                     2
   3        011           110                   110                     6
   4        100           001                   001                     1
   5        101           011                   101                     5
   6        110           101                   011                     3
   7        111           111                   111                     7

Table. 1. Reordering in binary domain.

This leads to “mirrored arithmetic” — the resulting binary column in the mirror looks like {0, 1, 2, 3, 4, 5, 6, 7} — our initial order indeed. Thus, to reorder the input elements it is enough just to count in a mirrored manner. Since double mirroring gives back the same number, the reordering reduces to swaps.
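
As a standalone illustration — a sketch with a hypothetical helper name Mirror, while the Rearrange method shown later achieves the same effect by mirrored counting — the mirrored index can also be computed directly by reversing the log₂N least significant bits:

#include <cstdio>

//   Reverse the lowest 'Bits' bits of 'Position' - the mirrored index of table 1
unsigned int Mirror(unsigned int Position, unsigned int Bits)
{
   unsigned int Result = 0;
   for (unsigned int Bit = 0; Bit < Bits; ++Bit)
   {
      //   Shift result left and append the next least significant bit
      Result = (Result << 1) | (Position & 1);
      Position >>= 1;
   }
   return Result;
}

int main()
{
   //   Prints 0 4 2 6 1 5 3 7 - the reordered indices of table 1 for N=8
   for (unsigned int n = 0; n < 8; ++n)
      printf("%u ", Mirror(n, 3));
   return 0;
}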

The summation levels include our parentheses, brackets and the outer level. In the general case this leads to iterations over pairs, quadruples, octads and so on.

Further, we iterate over the components of the half-period; the second half-period we get as a result of taking differences instead of sums for the first half. Thus, for the deepest level of parentheses the period is 2 and the half-period is 1, which means this cycle will be executed only once. For the second level the period is 4 and the half-period is 2, and the cycle will be executed 2 times. In the general case we have the succession 1, 2, 4, 8, … for this cycle.

Factor calculation is the calculation of our imaginary exponent. To restrict the number of trigonometric function calls (2) we use the recurrence:

$$e^{i(\varphi+\delta)} = e^{i\varphi} + e^{i\varphi}\left(-2\sin^2\frac{\delta}{2} + i\sin\delta\right) \qquad (10)$$

which is indeed the expression

$$e^{i(\varphi+\delta)} = e^{i\varphi}\left(\cos\delta + i\sin\delta\right) \qquad (11)$$

written in a tricky way so as not to lose significance for small δ — in this case cos δ≈1 and sin δ≈δ, which tells us that for cos δ memory will be just packed with .999999(9), while for sin δ there will be much more useful information. Thus, (10) is just a way to eliminate cos δ from the calculations. If you look back at (7) you will find that for N=8 δ=π/2, π/4 — not small numbers. But for transforms of much bigger N, δ=π/2, π/4, π/8, … down to 2π/N can, for sure, be very small.
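
The equivalence of (10) and (11) is just the half-angle identity cos δ = 1 − 2 sin²(δ/2) written out:

$$e^{i(\varphi+\delta)} = e^{i\varphi}\left(\cos\delta + i\sin\delta\right) = e^{i\varphi}\left(1 - 2\sin^2\frac{\delta}{2} + i\sin\delta\right) = e^{i\varphi} + e^{i\varphi}\left(-2\sin^2\frac{\delta}{2} + i\sin\delta\right)$$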

The innermost loop walks over the sums where the calculated imaginary exponent is present, calculates the product and takes the sum and the difference — the latter being the sum for the second half-period at distance π, where our exponent changes its sign but not its absolute value, according to (9). To perform in-place processing we utilize the following scheme:

Fig. 1. Butterfly.

This operation is the elementary brick of in-place FFT calculation and is usually called a butterfly. The bottom term is multiplied by the imaginary exponent, then the sum of the terms is stored in place of the upper term and the difference is stored in place of the bottom term. The general butterfly picture is depicted below — fig. 2.

Fig. 2. All butterflies for N=8.
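
In code one butterfly takes just three lines — the Perform method shown later inlines exactly these; here they are factored into a small helper for illustration (the helper name Butterfly is mine, complex is the article's complex number class):

//   A single in-place butterfly over the pair (Upper, Bottom)
inline void Butterfly(complex &Upper, complex &Bottom, const complex &Factor)
{
   //   Bottom term times imaginary exponent
   const complex Product(Factor * Bottom);
   //   Difference is stored in place of the bottom term
   Bottom = Upper - Product;
   //   Sum is stored in place of the upper term
   Upper += Product;
}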

An elegant scheme. It is time to engrave this beauty in code. But before we delve into programming let us make a small digression: it is a known thing that the Fourier transform is a transfer to the frequency domain, so let us see how to get back from the realm of frequencies.

4. Inverse Fourier transform

The expression for the inverse Fourier transform is

$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} F(\omega)\,e^{i\omega t}\,d\omega \qquad (12)$$

and its discrete counterpart is

$$f_k = \frac{1}{N}\sum_{n=0}^{N-1} F_n\,e^{2\pi i\,nk/N} \qquad (13)$$

Thus, the difference between the forward (3) and inverse (13) transforms is just the sign in the exponent and the scale factor — which is not necessary if one is interested in the ratio between components rather than in absolute values. It means that the routine for the forward transform can, with a slight modification, perform the inverse one as well.

5. FFT programming

Here is our FFT class declaration.

class CFFT
{
public:
   //   FORWARD FOURIER TRANSFORM
   //     Input  - input data
   //     Output - transform result
   //     N      - length of both input data and result
   static bool Forward(const complex *const Input, complex *const Output,
      const unsigned int N);

   //   FORWARD FOURIER TRANSFORM, INPLACE VERSION
   //     Data - both input data and output
   //     N    - length of both input data and result
   static bool Forward(complex *const Data, const unsigned int N);

   //   INVERSE FOURIER TRANSFORM
   //     Input  - input data
   //     Output - transform result
   //     N      - length of both input data and result
   //     Scale  - if to scale result
   static bool Inverse(const complex *const Input, complex *const Output,
      const unsigned int N, const bool Scale = true);

   //   INVERSE FOURIER TRANSFORM, INPLACE VERSION
   //     Data  - both input data and output
   //     N     - length of both input data and result
   //     Scale - if to scale result
   static bool Inverse(complex *const Data, const unsigned int N,
      const bool Scale = true);

protected:
   //   Rearrange function and its inplace version
   static void Rearrange(const complex *const Input, complex *const Output,
      const unsigned int N);
   static void Rearrange(complex *const Data, const unsigned int N);

   //   FFT implementation
   static void Perform(complex *const Data, const unsigned int N,
      const bool Inverse = false);

   //   Scaling of inverse FFT result
   static void Scale(complex *const Data, const unsigned int N);
};

The class has four public methods for performing the FFT: two functions for the forward transform and two for the inverse transform. Every couple consists of an in-place version and a version that preserves the input data and outputs the transform result into a provided array.

The protected section of the class likewise has four functions: two functions for data preprocessing — putting the data into convenient order — the core function that performs the transform, and an auxiliary function for scaling the result of the inverse transform.

Each of the four public methods is very similar and is indeed a wrapper that controls the processing workflow. Let us see how one of them is designed.

//   FORWARD FOURIER TRANSFORM, INPLACE VERSION
//     Data - both input data and output
//     N    - length of both input data and result
bool CFFT::Forward(complex *const Data, const unsigned int N)
{
   //   Check input parameters
   if (!Data || N < 1 || N & (N - 1))
      return false;
   //   Rearrange
   Rearrange(Data, N);
   //   Call FFT implementation
   Perform(Data, N);
   //   Succeeded
   return true;
}

Inside the wrapper you can find a check of the input parameters (the data pointer must be valid and N must be a nonzero power of two), then data preparation — rearrangement — and the transform itself. The Rearrange function uses our “mirrored mathematics” to define the new position for every element and swaps the elements:

//   Inplace version of rearrange function
void CFFT::Rearrange(complex *const Data, const unsigned int N)
{
   //   Swap position
   unsigned int Target = 0;
   //   Process all positions of input signal
   for (unsigned int Position = 0; Position < N; ++Position)
   {
      //   Only for not yet swapped entries
      if (Target > Position)
      {
         //   Swap entries
         const complex Temp(Data[Target]);
         Data[Target] = Data[Position];
         Data[Position] = Temp;
      }
      //   Bit mask
      unsigned int Mask = N;
      //   While bit is set
      while (Target & (Mask >>= 1))
         //   Drop bit
         Target &= ~Mask;
      //   The current bit is 0 - set it
      Target |= Mask;
   }
}

The while loop implements addition of 1 in a mirrored manner to get the target position for the element.

Now comes the turn of the core method, which performs our fast Fourier transform.

//   FFT implementation
void CFFT::Perform(complex *const Data, const unsigned int N,
   const bool Inverse /* = false */)
{
   const double pi = Inverse ? 3.14159265358979323846 : -3.14159265358979323846;
   //   Iteration through dyads, quadruples, octads and so on...
   for (unsigned int Step = 1; Step < N; Step <<= 1)
   {
      //   Jump to the next entry of the same transform factor
      const unsigned int Jump = Step << 1;
      //   Angle increment
      const double delta = pi / double(Step);
      //   Auxiliary sin(delta / 2)
      const double Sine = sin(delta * .5);
      //   Multiplier for trigonometric recurrence
      const complex Multiplier(-2. * Sine * Sine, sin(delta));
      //   Start value for transform factor, fi = 0
      complex Factor(1.);
      //   Iteration through groups of different transform factor
      for (unsigned int Group = 0; Group < Step; ++Group)
      {
         //   Iteration within group 
         for (unsigned int Pair = Group; Pair < N; Pair += Jump)
         {
            //   Match position
            const unsigned int Match = Pair + Step;
            //   Second term of two-point transform
            const complex Product(Factor * Data[Match]);
            //   Transform for fi + pi
            Data[Match] = Data[Pair] - Product;
            //   Transform for fi
            Data[Pair] += Product;
         }
         //   Successive transform factor via trigonometric recurrence
         Factor = Multiplier * Factor + Factor;
      }
   }
}

The code is an exact reflection of our FFT algorithm and the butterfly scheme in fig. 2. The difference between the forward and inverse transforms shows up in the first line of the method, where the proper sign for the exponent argument is set. The initializations inside the outer loop are just preparations for the successive calculation of the factors via the trigonometric recurrence. The job is done inside the inner loop, which performs the butterfly operations. The trigonometric recurrence in the last line is exactly our expression (10).

The wrappers for the inverse transform are designed the same way as those for the forward one:

//   INVERSE FOURIER TRANSFORM, INPLACE VERSION
//     Data  - both input data and output
//     N     - length of both input data and result
//     Scale - if to scale result
bool CFFT::Inverse(complex *const Data, const unsigned int N,
   const bool Scale /* = true */)
{
   //   Check input parameters
   if (!Data || N < 1 || N & (N - 1))
      return false;
   //   Rearrange
   Rearrange(Data, N);
   //   Call FFT implementation
   Perform(Data, N, true);
   //   Scale if necessary
   if (Scale)
      CFFT::Scale(Data, N);
   //   Succeeded
   return true;
}

The only difference is the conditional scaling of the result at the postprocessing stage. By default the scaling is performed according to (13), but if you are interested only in relative values you can clear the corresponding flag to skip this operation. Scaling itself is the primitive function below.

//   Scaling of inverse FFT result
void CFFT::Scale(complex *const Data, const unsigned int N)
{
   const double Factor = 1. / double(N);
   //   Scale all data entries
   for (unsigned int Position = 0; Position < N; ++Position)
      Data[Position] *= Factor;
}

Well, our FFT is ready. You can download the full source code here:

Download fast Fourier transform — FFT — C++ source code (zip, 4 Kb)


6. How to use

To utilize the FFT you should include the header file fft.h and place in your code lines like:

...Usually at the top of the file
//   Include FFT header
#include "fft.h"
.
...Some code here
.
   ...Inside your signal processing function
   //   Allocate memory for signal data
   complex *pSignal = new complex[1024];
   ...Fill signal array with data
   //   Apply FFT
   CFFT::Forward(pSignal, 1024);
   ...Utilize transform result
   //   Free memory
   delete[] pSignal;

Here an in-place forward Fourier transform is performed on a signal of 1024 samples.
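
For a more complete picture, here is a self-contained sketch of the same workflow. It assumes the complex class from this article exposes re() and im() accessors (adjust to the actual interface if it differs); the buffer is filled with a 1 kHz sine sampled at 8 kHz, so the strongest bin should be 1000/8000 × 1024 = 128:

#include <stdio.h>
#include <math.h>
//   Headers from this article
#include "complex.h"
#include "fft.h"

int main()
{
   const unsigned int N = 1024;
   complex *pSignal = new complex[N];
   //   Fill the signal with a 1 kHz sine wave sampled at 8 kHz
   for (unsigned int n = 0; n < N; ++n)
      pSignal[n] = complex(sin(2. * 3.14159265358979323846 * 1000. * double(n) / 8000.), 0.);
   //   In-place forward transform
   if (CFFT::Forward(pSignal, N))
   {
      //   Find the strongest bin in the first half of the spectrum
      //   (re()/im() accessors are assumed here)
      unsigned int Peak = 0;
      double Max = 0.;
      for (unsigned int k = 0; k < N / 2; ++k)
      {
         const double Magnitude = sqrt(pSignal[k].re() * pSignal[k].re() +
            pSignal[k].im() * pSignal[k].im());
         if (Magnitude > Max)
         {
            Max = Magnitude;
            Peak = k;
         }
      }
      printf("Peak at bin %u\n", Peak);
   }
   delete[] pSignal;
   return 0;
}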

7. Final remarks

The FFT algorithm implemented here is called the Cooley-Tukey algorithm (decimation in time) in the literature. There is also the Sande-Tukey algorithm (decimation in frequency), which rearranges data after performing the butterflies; its butterflies look like ours in fig. 2 but mirrored to the right, so that the big butterflies come first and the small ones last.

From all our considerations it follows that the length of the input data for our algorithm should be a power of 2. In case the length of the input data is not a power of 2, it is a good idea to extend the data to the nearest larger power of 2 and pad the additional space with zeroes, or with the input signal itself in a periodic manner (just copy the necessary part of the signal from its beginning). Padding with zeroes usually works well.
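
A minimal sketch of such zero-padding (NextPowerOfTwo is my own helper, not part of the library; Length and pSignal stand for your actual signal length and data):

//   Round N up to the nearest power of two
unsigned int NextPowerOfTwo(unsigned int N)
{
   unsigned int Result = 1;
   while (Result < N)
      Result <<= 1;
   return Result;
}

   //   Copy the signal into a zero-padded buffer before the transform
   const unsigned int Padded = NextPowerOfTwo(Length);
   complex *pBuffer = new complex[Padded];
   for (unsigned int n = 0; n < Padded; ++n)
      pBuffer[n] = n < Length ? pSignal[n] : complex(0.);
   CFFT::Forward(pBuffer, Padded);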

If you were attentive, you may have noticed that the butterflies in parentheses and brackets in (7) do not really need multiplications, only additions and subtractions, since their transform factors are ±1 and ±i. So, by special-casing the two deepest levels of butterflies we could improve the FFT performance even further.

Posted in Image / Video Filters | Leave a Comment »

Gaussian filter, or Gaussian blur

Posted by Hemprasad Y. Badgujar on October 12, 2014


Category. Digital signal and image processing (DSP and DIP) software development.

Abstract. The article is a practical tutorial for Gaussian filter, or Gaussian blur understanding and implementation of its separable version. Article contains theory, C++ source code, programming instructions and sample application.

Download Gaussian filter, or Gaussian blur for Win32 (zip, 890 Kb)

Download Gaussian filter, or Gaussian blur C++ source code (zip, 4 Kb)

Original and blurred images

1. Introduction to Gaussian filter, or Gaussian blur

The Gaussian filter is a windowed filter of the linear class; by its nature it is a weighted mean. It is named after the famous scientist Carl Gauss because its weights are calculated according to the Gaussian distribution, the function Gauss used in his works. Another name for this filter is Gaussian blur.

To get acquainted with filter window idea in signal and image processing read our “Filter window, or filter mask” article.

2. Understanding Gaussian filter, or Gaussian blur

First of all, let us have a look at what the Gaussian distribution is. The Gaussian distribution, or normal distribution, is really a function from probability theory. Often this function is referred to as the bell function because of its shape. The most general formula is:

G(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}}    (1)

And its plot is depicted below — fig. 1.

Fig. 1. Gaussian or normal distribution.

In our case we can set the parameter a, which is called the distribution mean or statistical expectation and is responsible for shifting the distribution along the x axis, to zero: a = 0; and work with the simplified form:

G(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}    (2)

Thus the function is a negative exponential of the squared argument. The argument divider σ plays the role of a scale factor. The parameter σ has a special name, standard deviation, and its square σ² is the variance. The premultiplier in front of the exponent is chosen so that the area below the plot equals 1. Pay attention that the function is defined everywhere on the real axis x∈(-∞, ∞), which means it spreads endlessly to the left and to the right.

Now, the first point is that we are working in the discrete realm, so our Gaussian distribution turns into a set of values at discrete points.

Second, we cannot work with something that spreads endlessly to the left and to the right, which means the Gaussian distribution has to be truncated. The question is: where? In practice the rule of 3σ is usually used, that is, the part of the Gaussian distribution utilized is x∈[-3σ, 3σ], see fig. 2.

Fig. 2. Gaussian distribution truncated at points ±3σ.

Why? Good question. To understand that, let us see how much we have trimmed. The area below the truncated part is:

\int_{-3\sigma}^{3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}\, dx    (3)

Now we can start up MatLab and type lines like:

G=inline('1/sqrt(2*pi)*exp(-x.^2/2)');
quad(G, -3, 3)

Which tells MatLab to calculate our integral (3). The result is

ans =
    0.9973

So, the area below the trimmed part is ≈0.997; that is, the function is trimmed at points beyond which less than 0.5% of the total area is lost. Very good in our case.

Back to our discrete normal distribution: we are interested in the points {x_{-N}, x_{-N+1}, …, x_{-1}, x_0, x_1, …, x_{N-1}, x_N}, where x_{-n} = -x_n and x_N = 3σ, and correspondingly in the values of the normal distribution at these points: {G(x_{-N}), …, G(x_0), …, G(x_N)}. So, we have a set of 2N+1 values {G_n | n = -N, -N+1, …, N}, where G_n = G(x_n).

What shall we do now with this set of values? Use it as window weights! That means we have a weighted window of size 2N+1. To be exact, we have to scale our G_n: G′_n = G_n/k so that ∑G′_n = 1, the sum of all weights being one: G′_n = G_n/∑G_n, which means k = ∑G_n.

Thus, we have our window weights {G′_n | n = -N, -N+1, …, N}, where G′_n = G_n/k, G_n = G(x_n), k = ∑G_n and x_N = 3σ, which also means x_n = 3σn/N. To get an expression for G′_n suitable for practical use, let us write down the detailed formula:

G'_n = \frac{G(x_n)}{\sum_{m=-N}^{N} G(x_m)} = \frac{e^{-x_n^2/(2\sigma^2)}}{\sum_{m=-N}^{N} e^{-x_m^2/(2\sigma^2)}} = \frac{e^{-9n^2/(2N^2)}}{\sum_{m=-N}^{N} e^{-9m^2/(2N^2)}} = \frac{G''_n}{k'}    (4)

As you can see we have a simplification: σ is eliminated from consideration, and G′_n can be calculated more easily via the new values G″_n = e^{-9n²/(2N²)} and their sum k′.

How do we use these weights? If we have some input signal S = {s_i}, then for every signal element s_i the new filtered value s′_i is s′_i = ∑ G′_n s_{i+n}, n = -N, -N+1, …, N. In words: for every element, place the window so that the element is in the center of the window, multiply every element in the window by the corresponding weight, and sum up all the products; the sum is the new filtered value.

Now we can write down step-by-step instructions for processing by the 1D Gaussian filter, or blur; a small standalone sketch of the computation follows the list.

1D Gaussian filter, or Gaussian blur algorithm:

  1. Given the window size 2N+1, calculate the support points x_n = 3n/N, n = -N, -N+1, …, N;
  2. Calculate the values G″_n = e^{-9n²/(2N²)};
  3. Calculate the scale factor k′ = ∑G″_n;
  4. Calculate the window weights G′_n = G″_n/k′;
  5. For every signal element:
    1. Place window over it;
    2. Pick up elements;
    3. Multiply elements by corresponding window weights;
    4. Sum up products — this sum is new filtered value.
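
Here is a small standalone sketch of steps 1–4 (independent of the class developed below) that computes and prints the normalized weights for a window of size 7:

#include <math.h>
#include <stdio.h>

int main()
{
   const int N = 3;   //   Half-window: window size is 2N + 1 = 7
   double G[2 * N + 1];
   double k = 0.;
   //   Steps 1-3: support points x_n = 3n/N, values G''_n and their sum k'
   for (int n = -N; n <= N; ++n)
   {
      const double x = 3. * double(n) / double(N);
      G[n + N] = exp(-x * x / 2.);
      k += G[n + N];
   }
   //   Step 4: scale the weights so they sum up to 1
   for (int n = 0; n < 2 * N + 1; ++n)
      G[n] /= k;
   //   Print the weights - they are symmetric around the center
   for (int n = 0; n < 2 * N + 1; ++n)
      printf("G'[%d] = %f\n", n - N, G[n]);
   return 0;
}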

Let us proceed with 2D case.

3. 2D case

Expression for 2D Gaussian distribution is:

G(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}    (5)

And its plot is depicted below — fig. 3.

Fig. 3. 2D Gaussian or normal distribution.

The Gaussian distribution has a surprising property. Look, its expression can be rewritten as:

G(x,y) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}\right) \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}}\right) = G(x)\, G(y)    (6)

This means the 2D distribution is split into a pair of 1D ones, and so the 2D filter window (fig. 3) is separated into a couple of the 1D windows from fig. 2. A filter version that utilizes this fact is called separable.

In practice it means that to apply the filter to an image it is enough to filter it in the horizontal direction with the 1D filter and then filter the result with the same filter in the vertical direction. Which direction goes first really makes no difference, since the operation is commutative. Thus,

2D separable Gaussian filter, or Gaussian blur, algorithm:

  1. Calculate the 1D window weights G′_n;
  2. Filter every image line as a 1D signal;
  3. Filter every column of the filtered image as a 1D signal.

2D Gaussian filtering with a [2N+1]×[2N+1] window is thus reduced to a couple of 1D filterings with a 2N+1 window. That means a significant speed-up, especially for large images, because the per-pixel cost drops from O(N²) to O(N) operations: for a 13×13 window, for example, 26 multiply-adds per pixel instead of 169.

Now, when we have the algorithm, it is time to write some code — let us come down to programming.

4. 2D Gaussian filter, or 2D Gaussian blur programming

We are starting with the 2D filter because the 1D one can easily be obtained from it just by treating the signal as a one-line image and canceling the vertical filtering.

First of all a couple of simple auxiliary structures.

//   Internal auxiliary structure - data array size descriptor
struct CSize
{
   unsigned int x;   //   Array width
   unsigned int y;   //   Array height

   //   Default constructor
   CSize(): x(0), y(0) {}
   //   Constructor with initialization
   CSize(unsigned int _x, unsigned int _y): x(_x), y(_y) {}

   //   Initialization
   void Set(unsigned int _x, unsigned int _y) { x = _x; y = _y; }
   //   Area
   unsigned int Area() const { return x * y; }
};

//   Internal auxiliary structure - array descriptor
struct CArray
{
   CSize Size;   //   Array size
   T *Buffer;    //   Element buffer

   //   Default constructor
   CArray(): Buffer(NULL) {}
   //   Constructors with initialization
   CArray(T *_Buffer, const CSize &_Size): Buffer(_Buffer), Size(_Size) {}
   CArray(T *_Buffer, unsigned int _N): Buffer(_Buffer), Size(_N, 1) {}
};

The CSize structure keeps the image size, and the CArray structure is an image descriptor: its size and a pointer to the image data. Now a more complicated structure, our Gaussian window.

//   Internal auxiliary structure - filter window descriptor
struct CWindow
{
   double *Weights;   //   Window weights
   unsigned int Size;   //   Window size

   //   Default constructor
   CWindow(): Weights(NULL), Size(0), Correction(.5 - double(T(.5))) {}
   //   Destructor
   ~CWindow() { if (Weights) delete[] Weights; }

   //   Filter window creation
   bool Create(unsigned int _Size);

   //   FILTER WINDOW APPLICATION
   //     _Element - start element in signal/image
   T Apply(const T *_Element) const
   {
      //   Apply filter - calculate weighted mean
      double Sum = 0.;
      const double *WeightIter = Weights;
      const T *ElIter = _Element;
      const double *const End = Weights + Size;
      while (WeightIter < End)
         Sum += *(WeightIter++) * double(*(ElIter++));
      return T(Sum + Correction);
   }

protected:
   const double Correction;   //   Result correction
};

The CWindow structure is designed to keep the window size and the set of weights; it also has two methods, Create and Apply, to calculate the weights and to apply the 1D Gaussian filter starting from a given image element. As you can see the Apply code is trivial, and the code for Create is a little bit more complicated:

//   FILTER WINDOW CREATION
//     _Size - window size
template <class T> bool TGaussianBlur<T>::CWindow::Create(unsigned int _Size)
{
   //   Allocate window buffer
   Weights = new double[_Size];
   //   Check allocation
   if (!Weights)
      //   Window buffer allocation failed
      return false;
   //   Set window size
   Size = _Size;
   //   Window half
   const unsigned int Half = Size >> 1;
   //   Central weight
   Weights[Half] = 1.;
   //   The rest of weights
   for (unsigned int Weight = 1; Weight < Half + 1; ++Weight)
   {
      //   Support point
      const double x = 3.* double(Weight) / double(Half);
      //   Corresponding symmetric weights
      Weights[Half - Weight] = Weights[Half + Weight] = exp(-x * x / 2.);
   }
   //   Weight sum
   double k = 0.;
   for (unsigned int Weight = 0; Weight < Size; ++Weight)
      k += Weights[Weight];
   //   Weight scaling
   for (unsigned int Weight = 0; Weight < Size; ++Weight)
      Weights[Weight] /= k;
   //   Succeeded
   return true;
}

The method implements steps 1–4 of the 1D Gaussian filter algorithm to calculate the weights according to expression (4), and utilizes the window symmetry G_{-n} = G_n.

Now, the last problem to be solved before we can start filtering the image is its extension.

5. Extension

There is a problem every windowed filter deals with: placed at the edge, the filter window lacks data elements to process. There are two ways to solve the problem: the first is not to process the edges, and the second, cleverer, one is to extend the data across the edges. We take the second approach and extend our data as depicted in fig. 4.

Fig. 4. Data extension.

And here is CExtension class that will do that job for us.

//   Internal auxiliary structure - array extension descriptor
struct CExtension: public CArray
{
   unsigned int Margin;   //   Extension margins

   enum EMode {ModeHorizontal, ModeVertical};

   //   Default constructor
   CExtension(): Margin(0), Mode(ModeHorizontal) {}
   //   Destructor
   ~CExtension() { if (Buffer) delete[] Buffer; }

   //   Mode setting
   void SetMode(EMode _Mode) { Mode = _Mode; }
   //   Extension memory allocation
   bool Allocate(unsigned int _N, unsigned int _W)
   { return _Allocate(CSize(_N, 1), _W >> 1); }
   bool Allocate(const CSize &_Size, unsigned int _W)
   { return _Allocate(_Size, _W >> 1); }
   //   Pasting data into extension from data array
   void Paste(const T * _Start);
   //   Extension
   void Extend();

protected:
   EMode Mode;   //   Current mode

   //   Extension memory allocation
   bool _Allocate(const CSize &_Size, unsigned int _Margin);
};

The job is done inside the methods _Allocate, Paste and Extend. The _Allocate method allocates enough memory to keep an extended image line or column.

//   EXTENSION MEMORY ALLOCATION
//     _Size   - signal/image size
//     _Margin - extension margins
template <class T> bool TGaussianBlur<T>::CExtension::_Allocate(
   const CSize &_Size, unsigned int _Margin)
{
   //   Allocate extension buffer
   Buffer = new T[(_Size.x > _Size.y ? _Size.x : _Size.y) + (_Margin << 1)];
   //   Check buffer allocation
   if (!Buffer)
      //   Buffer allocation failed
      return false;
   //   Initialize size descriptors
   Size = _Size;
   Margin = _Margin;
   //   Succeeded
   return true;
}

The Paste method inserts an image line or column at the proper place in the extension.

//   PASTING DATA LINE/COLUMN INTO EXTENSION FROM DATA ARRAY
//     _Start - start position in image/signal to paste from
template <class T> void TGaussianBlur<T>::CExtension::Paste(const T *const _Start)
{
   if (Mode == ModeHorizontal)
   {
      //   Paste line
      memcpy(Buffer + Margin, _Start, Size.x * sizeof(T));
   }
   else
   {
      //   Stop position
      const T *const Stop = _Start + Size.Area();
      //   Array iterator
      const T *ArrIter = _Start;
      //   Extension iterator
      T *ExtIter = Buffer + Margin;
      //   Paste array column element by element
      while (ArrIter < Stop)
      {
         //   Copy line
         *(ExtIter++) = *ArrIter;
         //   Jump to the next line
         ArrIter += Size.x;
      }
   }
}

Pasting has two modes, horizontal and vertical. In horizontal mode a line is copied into the extension, and in vertical mode a column is copied element by element. The Extend method mirrors the pasted data.

//   EXTENSION CREATION
template <class T> void TGaussianBlur<T>::CExtension::Extend()
{
   //   Line size
   const unsigned int Line = Mode == ModeHorizontal ? Size.x : Size.y;
   //   Stop position
   const T *const Stop = Buffer - 1;
   //   Left extension iterator
   T *ExtLeft = Buffer + Margin - 1;
   //   Left array iterator
   const T *ArrLeft = ExtLeft + 2;
   //   Right extension iterator
   T *ExtRight = ExtLeft + Line + 1;
   //   Right array iterator
   const T *ArrRight = ExtRight - 2;
   //   Create extension line element by element
   while (ExtLeft > Stop)
   {
      //   Left extension
      *(ExtLeft--) = *(ArrLeft++);
      //   Right extension
      *(ExtRight++) = *(ArrRight--);
   }
}

Now we have everything we need to program the filtering method.

//   2D GAUSSIAN BLUR
//     pImage  - input image
//     pResult - output image, NULL for inplace processing
//     N       - width of the image
//     M       - height of the image
//     W       - window size
template <class T> bool TGaussianBlur<T>::Filter(T *pImage, T *pResult,
   unsigned int N, unsigned int M, unsigned int W) const
{
   //   Check input data consistency
   if (!Consistent(pImage, CSize(N, M), W))
     return false;
   //   Allocate extension
   CExtension Extension;
   if (!Extension.Allocate(CSize(N, M), W))
      return false;
   //   Create image descriptor
   CArray Image(pImage, CSize(N, M));
   //   Create filter window
   CWindow Window;
   if (!Window.Create(W))
      return false;
   //   Stop position
   const T * ExtStop = Extension.Buffer + Extension.Size.x;
   //   Result iterator
   T *ResIter = pResult ? pResult : pImage;
   //   Image iterator
   const T *ImIter = Image.Buffer;
   //   Image stop position
   const T * ImStop = Image.Buffer + Image.Size.Area();
   //   Filter line by line
   while (ImIter < ImStop)
   {
      //   Paste image line into extension
      Extension.Paste(ImIter);
      //   Extend image line
      Extension.Extend();
      //   Extension iterator
      const T *ExtIter = Extension.Buffer;
      //   Apply filter to every pixel of the line
      while (ExtIter < ExtStop)
         *(ResIter++) = Window.Apply(ExtIter++);
      //   Move to the next line
      ImIter += Image.Size.x;
   }
   //   Initialize image descriptor with filter result
   Image.Buffer = pResult ? pResult : pImage;
   //   Set vertical extension mode
   Extension.SetMode(CExtension::ModeVertical);
   //   Extension stop position
   ExtStop = Extension.Buffer + Extension.Size.y;
   //   Result column iterator
   T *ResColumnIter = pResult ? pResult : pImage;
   //   Image iterator
   ImIter = Image.Buffer;
   //   Image stop position
   ImStop = Image.Buffer + Image.Size.x;
   //   Filter column by column
   while (ImIter < ImStop)
   {
      //   Paste image column into extension
      Extension.Paste(ImIter++);
      //   Extend image column
      Extension.Extend();
      //   Extension iterator
      const T *ExtIter = Extension.Buffer;
      //   Result pixel iterator
      ResIter = ResColumnIter;
      //   Apply filter to every pixel of the column
      while (ExtIter < ExtStop)
      {
         *ResIter = Window.Apply(ExtIter++);
         ResIter += Image.Size.x;
      }
      //   Move to the next column
      ++ResColumnIter;
   }
   //   Succeeded
   return true;
}

The structure of the method is quite straightforward: input parameter check, memory allocation, window weight calculation, application of the 1D filter in the horizontal direction (first loop) and in the vertical direction (second loop). The parameter check is the short function below.

//   Internal auxiliary functions - check input data consistency
bool Consistent(const T *_Image, const CSize &_Size, unsigned int _W) const
{
   return  _Image && _Size.x && _Size.y && _W &&
      _Size.x > (_W >> 1) && _Size.y > (_W >> 1) && _W & 1;
}

This means the pointer to the image should not be NULL, the image and filter window sizes should be positive numbers, the window size 2N+1 ≡ _W should be an odd number, and N should be less than the image size in either direction.

6. 1D Gaussian filter, or Gaussian blur programming

The 1D filter is a truncated version of the 2D filter:

//   1D GAUSSIAN BLUR
//     pSignal - input signal;
//     pResult - output signal, NULL for inplace processing
//     N       - length of the signal
//     W       - window size, odd positive number
template <class T> bool TGaussianBlur<T>::Filter(T *pSignal, T *pResult,
   unsigned int N, unsigned int W) const
{
   //   Check input data consistency
   if (!Consistent(pSignal, N, W))
      return false;
   //   Allocate extension
   CExtension Extension;
   if (!Extension.Allocate(N, W))
      return false;
   //   Create signal descriptor
   const CArray Signal(pSignal, N);
   //   Paste signal into extension
   Extension.Paste(Signal.Buffer);
   //   Extend signal
   Extension.Extend();
   //   Create filter window
   CWindow Window;
   if (!Window.Create(W))
      return false;
   //   Extension iterator
   const T *ExtIter = Extension.Buffer;
   //   Extension stop position
   const T *const ExtStop = Extension.Buffer + Extension.Size.x;
   //   Result iterator
   T *ResIter = pResult ? pResult : pSignal;
   //   Filter - apply to every element
   while (ExtIter < ExtStop)
      *(ResIter++) = Window.Apply(ExtIter++);
   //   Succeeded
   return true;
}

You can see the familiar steps here; the only difference is that the filter makes a single pass along the signal array.

Our separable Gaussian filter library is ready! You can download full source code here:

Download Gaussian filter, or Gaussian blur C++ source code (zip, 4 Kb)


7. How to use

Pay attention that our solution is universal. First, our filter has a window of arbitrary size. Second, we have developed a class template that suits any type of input data:

TGaussianBlur<unsigned char> BlurFilter;
TGaussianBlur<short> BlurFilter;
TGaussianBlur<float> BlurFilter;
TGaussianBlur<int> BlurFilter;
TGaussianBlur<unsigned int> BlurFilter;

All these declarations are valid and will process your data gracefully.

To use the filter you should include the header file gaussianblur.h and place in your code lines like:

...Usually at the top of the file
//   Include Gaussian blur header
#include "gaussianblur.h"
.
...Some code here
.
   ...Inside your signal/image processing function
   //   Create instance of Gaussian blur filter
   TGaussianBlur<double> BlurFilter;
   //   Apply Gaussian blur
   BlurFilter.Filter(pImage, NULL, 512, 512, 13);

Here the image is stored as a double array of 512×512 size and is filtered in-place with a Gaussian window 13 pixels wide, which corresponds to σ ≈ 2 pixels, since the window spans ±3σ.

And now — an application to play around!

8. Color Gaussian blur

Download Gaussian filter, or Gaussian blur for Win32 (zip, 890 Kb)

We have created an application to see the Gaussian filter in action. The sample package includes 3 files: the application, a sample image and a description:

  • gaussianblur.exe — Gaussian filter,
  • sample.bmp — 24-bit sample image,
  • readme.txt — description.

Be aware that this sample uses OpenGL, so OpenGL should be supported by your system (usually that is the case).

9. How to use

Start up gaussianblur.exe application. Load the image.

Fig. 5. Original image.

Set the filter window size: choose Set >> Window size… or click the w button in the toolbar, and key in the size in the dialog, for instance 13. Blur the image with the Gaussian filter by choosing Set >> Filter or clicking the F-button in the toolbar. See the result.

Fig. 6. Image blurred by Gaussian filter.

Posted in Image / Video Filters | Leave a Comment »

Hybrid median filter

Posted by Hemprasad Y. Badgujar on October 12, 2014


Category. Digital image processing (DIP) software development.

Abstract. The article is a practical tutorial for hybrid median filter understanding and implementation. Article contains theory, C++ source code, programming instructions and sample applications.

Download hybrid median filter for Win32 (zip, 1.03 Mb)

Download hybrid median filter C++ source code (zip, 2 Kb)

Corrupted and restored images

1. Introduction to hybrid median filter

The hybrid median filter is a windowed filter of the nonlinear class that easily removes impulse noise while preserving edges. In comparison with the basic version of the median filter, the hybrid one has better corner-preserving characteristics. The basic idea behind the filter is, for every element of the signal (image), to apply the median technique several times with varying window shape and then take the median of the obtained median values.

To get acquainted with filter window idea in signal and image processing read our “Filter window, or filter mask” article.

2. Understanding hybrid median filter

Now let us see how the hybrid version of the median filter works. In one phrase, the idea is: apply the median filter with a cross-mask, apply the median filter with an x-mask, and take the median of the two results and the element itself. The hybrid median filter workflow is depicted below in fig. 1.

Fig. 1. Hybrid median filter workflow.

Due to the nature of the workflow, the filter is defined for dimensions of two and higher.

Now let us write down step-by-step instructions for processing by hybrid median filter.

Hybrid median filter algorithm:

  1. Place a cross-window over element;
  2. Pick up elements;
  3. Order elements;
  4. Take the middle element;
  5. Place a x-window over element;
  6. Pick up elements;
  7. Order elements;
  8. Take the middle element;
  9. Pick up result in point 4, 8 and element itself;
  10. Order elements;
  11. Take the middle element.

Now, when we have the algorithm, it is time to write some code — let us come down to programming.

3. Hybrid median filter programming

We can see that the operation of taking the median (“order elements” and “take the middle element”) is repeated three times in the list. So it is a good idea to put those steps into a separate function:

//   MEDIAN calculation
//     elements - input elements
//     N        - number of input elements
element median(element* elements, int N)
{
   //   Order elements (only half of them)
   for (int i = 0; i < (N >> 1) + 1; ++i)
   {
      //   Find position of minimum element
      int min = i;
      for (int j = i + 1; j < N; ++j)
         if (elements[j] < elements[min])
            min = j;
      //   Put found minimum element in its place
      const element temp = elements[i];
      elements[i] = elements[min];
      elements[min] = temp;
   }
   //   Get result - the middle element
   return elements[N >> 1];
}

In the function above we are using a code optimization trick. Since all we need is the middle element, it is enough to order just the first half of the elements (plus one).

Let us have a 2D signal, or image, of size N×M as input, and let us choose a filter window of 3×3 size. The first step is window placing; we do that by changing the index of the leading element:

//   Move window through all elements of the image
for (int m = 1; m < M - 1; ++m)
   for (int n = 1; n < N - 1; ++n)

Pay attention that we are not processing the first and last rows and columns. The problem is that we cannot process elements at the edges, because there part of the filter window is empty. We will discuss below how to solve that problem.

The second step is picking up the elements around it with the cross-window:

//   Pick up cross-window elements
window[0] = image[(m - 1) * N + n];
window[1] = image[m * N + n - 1];
window[2] = image[m * N + n];
window[3] = image[m * N + n + 1];
window[4] = image[(m + 1) * N + n];

The third and fourth steps are done by our auxiliary function:

//   Get median
results[0] = median(window, 5);

Fifth and sixth steps are picking up elements with x-window:

//   Pick up x-window elements
window[0] = image[(m - 1) * N + n - 1];
window[1] = image[(m - 1) * N + n + 1];
window[2] = image[m * N + n];
window[3] = image[(m + 1) * N + n - 1];
window[4] = image[(m + 1) * N + n + 1];

The seventh and eighth steps are performed again by our auxiliary function:

//   Get median
results[1] = median(window, 5);

To complete the ninth step it is enough to pick up the leading element itself:

//   Pick up leading element
results[2] = image[m * N + n];

The final steps are performed by the same useful function:

//   Get result
result[(m - 1) * (N - 2) + n - 1] = median(results, 3);

At last, let us write down the entire algorithm as a function:

//   2D HYBRID MEDIAN FILTER implementation
//     image  - input image
//     result - output image
//     N      - width of the image
//     M      - height of the image
void _hybridmedianfilter(const element* image, element* result, int N, int M)
{
   //   Move window through all elements of the image
   for (int m = 1; m < M - 1; ++m)
      for (int n = 1; n < N - 1; ++n)
      {
         element window[5];
         element results[3];
         //   Pick up cross-window elements
         window[0] = image[(m - 1) * N + n];
         window[1] = image[m * N + n - 1];
         window[2] = image[m * N + n];
         window[3] = image[m * N + n + 1];
         window[4] = image[(m + 1) * N + n];
         //   Get median
         results[0] = median(window, 5);
         //   Pick up x-window elements
         window[0] = image[(m - 1) * N + n - 1];
         window[1] = image[(m - 1) * N + n + 1];
         window[2] = image[m * N + n];
         window[3] = image[(m + 1) * N + n - 1];
         window[4] = image[(m + 1) * N + n + 1];
         //   Get median
         results[1] = median(window, 5);
         //   Pick up leading element
         results[2] = image[m * N + n];
         //   Get result
         result[(m - 1) * (N - 2) + n - 1] = median(results, 3);
      }
}

Looks very straightforward.

Type element could be defined as:

typedef double element;

4. Treating edges

For all window filters there is the same problem: edge treatment. If you place the window over an element at the edge, some part of the window will be empty. To fill the gap, the signal should be extended. For the hybrid median filter it is a good idea to extend the image symmetrically, like this:

Fig. 2. Image extension.

In other words, we add lines at the top and at the bottom of the image and add columns to its left and right.

So, before the image is passed to our hybrid median filter function it should be extended. Let us write down the wrapper that makes all the preparations.

//   2D HYBRID MEDIAN FILTER wrapper
//     image  - input image
//     result - output image
//     N      - width of the image
//     M      - height of the image
void hybridmedianfilter(element* image, element* result, int N, int M)
{
   //   Check arguments
   if (!image || N < 1 || M < 1)
      return;
   //   Allocate memory for signal extension
   element* extension = new element[(N + 2) * (M + 2)];
   //   Check memory allocation
   if (!extension)
      return;
   //   Create image extension
   for (int i = 0; i < M; ++i)
   {
      memcpy(extension + (N + 2) * (i + 1) + 1,
         image + N * i,
         N * sizeof(element));
      extension[(N + 2) * (i + 1)] = image[N * i];
      extension[(N + 2) * (i + 2) - 1] = image[N * (i + 1) - 1];
   }
   //   Fill first line of image extension
   memcpy(extension,
      extension + N + 2,
      (N + 2) * sizeof(element));
   //   Fill last line of image extension
   memcpy(extension + (N + 2) * (M + 1),
      extension + (N + 2) * M,
      (N + 2) * sizeof(element));
   //   Call hybrid median filter implementation
   _hybridmedianfilter(extension, result ? result : image, N + 2, M + 2);
   //   Free memory
   delete[] extension;
}

As you can see, our code takes into account some practical issues. First of all we check our input parameters — image should not be NULL, and image sizes should be positive:

//   Check arguments
if (!image || N < 1 || M < 1)
   return;

Now let us allocate memory for signal extension.

//   Allocate memory for signal extension
element* extension = new element[(N + 2) * (M + 2)];

And check memory allocation.

//   Check memory allocation
if (!extension)
   return;

Now we are building extension.

//   Create image extension
for (int i = 0; i < M; ++i)
{
   memcpy(extension + (N + 2) * (i + 1) + 1,
      image + N * i,
      N * sizeof(element));
   extension[(N + 2) * (i + 1)] = image[N * i];
   extension[(N + 2) * (i + 2) - 1] = image[N * (i + 1) - 1];
}
//   Fill first line of image extension
memcpy(extension,
   extension + N + 2,
   (N + 2) * sizeof(element));
//   Fill last line of image extension
memcpy(extension + (N + 2) * (M + 1),
   extension + (N + 2) * M,
   (N + 2) * sizeof(element));

Finally, everything is ready — filtering!

//   Call hybrid median filter implementation
_hybridmedianfilter(extension, result ? result : image, N + 2, M + 2);

And to complete the job — free memory.

//   Free memory
delete[] extension;

Since we are using a memory management function from the standard library, we should include its header.

#include <memory.h>

5. Hybrid median filter declaration file

Now we have three functions: the helper function, the hybrid median filter itself, and the entry point that makes the preparations. Let us write the code for the header file.

#ifndef _HYBRIDMEDIANFILTER_H_
#define _HYBRIDMEDIANFILTER_H_

//   Image element type
typedef double element;

//   2D HYBRID MEDIAN FILTER, window size 3x3
//     image  - input image
//     result - output image, NULL for inplace processing
//     N      - width of the image
//     M      - height of the image
void hybridmedianfilter(element* image, element* result, int N, int M);

#endif

Now everything is ready. The code we have written works on both Linux/Unix and Windows platforms. You can download the full hybrid median filter source code here:

Download hybrid median filter C++ source code (zip, 2 Kb)

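To use the filter in your own code, include the header file hybridmedianfilter.h declared above and call the wrapper. A minimal sketch in the style of the other tutorials (the 512×512 image size is just an example):

...Usually at the top of the file
//   Include hybrid median filter header
#include "hybridmedianfilter.h"
.
...Some code here
.
   ...Inside your image processing function
   //   Apply hybrid median filter in-place (result = NULL)
   hybridmedianfilter(pImage, NULL, 512, 512);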

And now — a couple of applications to play around!

6. Color median filter: image restoration

Download hybrid median filter for Win32 (zip, 1.03 Mb)

We have created a couple of applications to show the hybrid median filter capabilities in restoring images corrupted by impulse noise. The sample package includes 4 files: two applications, a sample image and a description:

  • hybridmedian.exe — hybrid median filter,
  • corrupter.exe — destructive noise generator,
  • sample.bmp — 24-bit sample image,
  • readme.txt — description.

Be aware that this sample uses OpenGL, so OpenGL should be supported by your system (usually that is the case).

7. Step 1: prepare corrupted image

We have created an impulse noise generator that will help us prepare a corrupted image. Start up the corrupter.exe application and load the image to be corrupted. Choose Set >> Corruption… or click the N button in the toolbar and set the noise level at 5–15%. Click OK. Then save the corrupted image.

Fig. 3. Corruption by impulse noise.

8. Step 2: restore corrupted image

Start up the hybridmedian.exe application. Load the saved corrupted image. Apply the hybrid median filter by choosing Set >> Filter or clicking the F-button in the toolbar. See the result. If necessary, filter the image one more time.

Fig. 4. Image restored by hybrid median filter.

Posted in Image / Video Filters | Leave a Comment »

Alpha-trimmed mean filter

Posted by Hemprasad Y. Badgujar on October 12, 2014


Category. Digital signal and image processing (DSP and DIP) software development.

Abstract. The article is a practical tutorial for alpha-trimmed mean filter understanding and implementation. Article contains theory, C++ source code, programming instructions and sample applications.

Download alpha-trimmed mean filter for Win32 (zip, 603 Kb)

Download alpha-trimmed mean filter C++ source code (zip, 2 Kb)

Corrupted and restored images

1. Introduction to alpha-trimmed mean filter

The alpha-trimmed mean filter is a windowed filter of the nonlinear class; by its nature it is a hybrid of the mean and median filters. The basic idea behind the filter is, for every element of the signal (image), to look at its neighborhood, discard the most atypical elements and calculate the mean value using the rest of them. The alpha in the name of the filter is indeed the parameter responsible for the number of trimmed elements.

To get acquainted with filter window idea in signal and image processing read our “Filter window, or filter mask” article.

2. Understanding alpha-trimmed mean filter

Now let us see how to get the alpha-trimmed mean value in practice. The basic idea is to order the elements, discard elements at the beginning and at the end of the resulting ordered set, and then calculate the average value of the rest. For instance, let us calculate the alpha-trimmed mean for the case depicted in fig. 1.

Fig. 1. Alpha-trimmed mean calculation.

Thus, to get an alpha-trimmed mean we order the elements, eliminate elements at the beginning and at the end of the ordered collection, and take the average of the remaining elements. That is all: the alpha-trimmed mean is calculated and the signal, 1D in our case, is filtered by the alpha-trimmed mean filter! Let us summarize and write down step-by-step instructions for processing by the alpha-trimmed mean filter.

Alpha-trimmed mean filter algorithm:

  1. Place a window over element;
  2. Pick up elements;
  3. Order elements;
  4. Discard elements at the beginning and at the end of the got ordered set;
  5. Take an average — sum up the remaining elements and divide the sum by their number.

A couple of words about the alpha parameter responsible for the trimmed elements. In practice alpha is the number of elements to be discarded; for instance, in our case alpha is two. Since our filter is symmetric, alpha is an even nonnegative number less than the size of the filter window. For example, with window size 5 and alpha = 2 the window {3, 7, 1, 9, 5} is ordered into {1, 3, 5, 7, 9}, the 1 and the 9 are discarded, and the result is (3 + 5 + 7)/3 = 5. The minimum value for the alpha parameter is zero, and in this case the alpha-trimmed mean filter degenerates into the mean filter. The maximum value for alpha is the filter window size minus one, and in this case the filter degenerates into the median filter.

Now, when we have the algorithm, it is time to write some code — let us come down to programming.

3. 1D alpha-trimmed mean filter programming

In this section we develop a 1D alpha-trimmed mean filter with a window of size 5. Let us have a 1D signal of length N as input. The first step is window placing; we do that by changing the index of the leading element:

//   Move window through all elements of the signal
for (int i = 2; i < N - 2; ++i)

Pay attention that we are starting with the third element and finishing with the last but two. The problem is that we cannot start with the first element, because then the left part of the filter window is empty. We will discuss below how to solve that problem.

The second step is picking up the elements around it:

//   Pick up window elements
for (int j = 0; j < 5; ++j)
   window[j] = signal[i - 2 + j];

The third step is putting the window elements in order. But we will use a code optimization trick here. Since we need only the middle part of the ordered set, we can omit ordering the last α/2 elements. So, it is enough to process just 5−α/2 elements:

//   End of the trimmed ordered set
const int end = 5 - (alpha >> 1);

Now:

//   Order elements (only necessary part of them)
for (int j = 0; j < end; ++j)
{
   //   Find position of minimum element
   int min = j;
   for (int k = j + 1; k < 5; ++k)
      if (window[k] < window[min])
         min = k;
   //   Put found minimum element in its place
   const element temp = window[j];
   window[j] = window[min];
   window[min] = temp;
}

To discard elements at the beginning of the ordered set let us define the proper constant:

//   Start of the trimmed ordered set
const int start = alpha >> 1;

The final step — taking the average:

//   Get result - the mean value of the elements in trimmed set
result[i - 2] = window[start];
for (int j = start + 1; j < end; ++j)
   result[i - 2] += window[j];
result[i - 2] /= 5 - alpha;

At last, let us write down the entire algorithm as a function:

//   1D ALPHA-TRIMMED MEAN FILTER implementation
//     signal - input signal
//     result - output signal
//     N      - length of the signal
//     alpha  - filter alpha parameter
void _alphatrimmedmeanfilter(const element* signal, element* result,
   int N, int alpha)
{
   //   Start of the trimmed ordered set
   const int start = alpha >> 1;
   //   End of the trimmed ordered set
   const int end = 5 - (alpha >> 1);
   //   Move window through all elements of the signal
   for (int i = 2; i < N - 2; ++i)
   {
      //   Pick up window elements
      element window[5];
      for (int j = 0; j < 5; ++j)
         window[j] = signal[i - 2 + j];
      //   Order elements (only necessary part of them)
      for (int j = 0; j < end; ++j)
      {
         //   Find position of minimum element
         int min = j;
         for (int k = j + 1; k < 5; ++k)
            if (window[k] < window[min])
               min = k;
         //   Put found minimum element in its place
         const element temp = window[j];
         window[j] = window[min];
         window[min] = temp;
      }
      //   Get result - the mean value of the elements in trimmed set
      result[i - 2] = window[start];
      for (int j = start + 1; j < end; ++j)
         result[i - 2] += window[j];
      result[i - 2] /= 5 - alpha;
   }
}

Type element could be defined as:

typedef double element;

4. Treating edges

For all window filters there is the same problem: edge treatment. If you place the window over the first (last) element, the left (right) part of the window will be empty. To fill the gap, the signal should be extended. For the alpha-trimmed mean filter it is a good idea to extend the signal symmetrically, like this:

Fig. 2. Signal extension.

So, before the signal is passed to our alpha-trimmed mean filter function it should be extended. Let us write down the wrapper that makes all the preparations.

//   1D ALPHA-TRIMMED MEAN FILTER wrapper
//     signal - input signal
//     result - output signal
//     N      - length of the signal
//     alpha  - filter alpha parameter
void alphatrimmedmeanfilter(element* signal, element* result, int N, int alpha)
{
   //   Check arguments
   if (!signal || N < 1 || alpha < 0 || 4 < alpha || alpha & 1)
      return;
   //   Treat special case N = 1
   if (N == 1)
   {
      if (result)
         result[0] = signal[0];
      return;
   }
   //   Allocate memory for signal extension
   element* extension = new element[N + 4];
   //   Check memory allocation
   if (!extension)
      return;
   //   Create signal extension
   memcpy(extension + 2, signal, N * sizeof(element));
   for (int i = 0; i < 2; ++i)
   {
      extension[i] = signal[1 - i];
      extension[N + 2 + i] = signal[N - 1 - i];
   }
   //   Call alpha-trimmed mean filter implementation
   _alphatrimmedmeanfilter(extension, result ? result : signal,
      N + 4, alpha);
   //   Free memory
   delete[] extension;
}

As you can see, our code takes into account some practical issues. First of all we check our input parameters: the signal should not be NULL, the signal length should be positive, and the alpha parameter should be an even nonnegative number less than the window size:

//   Check arguments
if (!signal || N < 1 || alpha < 0 || 4 < alpha || alpha & 1)
   return;

Second, we check the case N = 1. This case is special because to build the extension we need at least two elements. For a signal of length 1 the result is the signal itself. Also pay attention that our alpha-trimmed mean filter works in-place if the output parameter result is NULL.

//   Treat special case N = 1
if (N == 1)
{
   if (result)
      result[0] = signal[0];
   return;
}

Now let us allocate memory for signal extension.

//   Allocate memory for signal extension
element* extension = new element[N + 4];

And check memory allocation.

//   Check memory allocation
if (!extension)
   return;

Now we are building extension.

//   Create signal extension
memcpy(extension + 2, signal, N * sizeof(element));
for (int i = 0; i < 2; ++i)
{
   extension[i] = signal[1 - i];
   extension[N + 2 + i] = signal[N - 1 - i];
}

Finally, everything is ready — filtering!

//   Call alpha-trimmed mean filter implementation
_alphatrimmedmeanfilter(extension, result ? result : signal, N + 4, alpha);

And to complete the job — free memory.

//   Free memory
delete[] extension;

Since we are using a memory management function from the standard library, we should include its header.

#include <memory.h>

5. 2D alpha-trimmed mean filter programming

In the 2D case we have a 2D signal, or image. The idea is the same, just now the alpha-trimmed mean filter has a 2D window. The window influences only the element selection. All the rest is the same: ordering the elements, trimming the resulting set and calculating the average of the remaining elements. So, let us have a look at 2D alpha-trimmed mean filter programming. For the 2D case we choose a window of 3×3 size.

//   2D ALPHA-TRIMMED MEAN FILTER implementation
//     image  - input image
//     result - output image
//     N      - width of the image
//     M      - height of the image
//     alpha  - filter alpha parameter
void _alphatrimmedmeanfilter(const element* image, element* result,
   int N, int M, int alpha)
{
   //   Start of the trimmed ordered set
   const int start = alpha >> 1;
   //   End of the trimmed ordered set
   const int end = 9 - (alpha >> 1);
   //   Move window through all elements of the image
   for (int m = 1; m < M - 1; ++m)
      for (int n = 1; n < N - 1; ++n)
      {
         //   Pick up window elements
         int k = 0;
         element window[9];
         for (int j = m - 1; j < m + 2; ++j)
            for (int i = n - 1; i < n + 2; ++i)
               window[k++] = image[j * N + i];
         //   Order elements (only necessary part of them)
         for (int j = 0; j < end; ++j)
         {
            //   Find position of minimum element
            int min = j;
            for (int l = j + 1; l < 9; ++l)
            if (window[l] < window[min])
               min = l;
            //   Put found minimum element in its place
            const element temp = window[j];
            window[j] = window[min];
            window[min] = temp;
         }
         //   Target index in result image
         const int target = (m - 1) * (N - 2) + n - 1;
         //   Get result - the mean value of the elements in trimmed set
         result[target] = window[start];
         for (int j = start + 1; j < end; ++j)
            result[target] += window[j];
         result[target] /= 9 - alpha;
      }
}

6. Treating edges in 2D case

As in the 1D case, in the 2D case we should extend our input image as well. To do that we add lines at the top and at the bottom of the image and add columns to its left and right.

Fig. 3. Image extension.

Here is the wrapper function that does that job.

//   2D ALPHA-TRIMMED MEAN FILTER wrapper
//     image  - input image
//     result - output image
//     N      - width of the image
//     M      - height of the image
//     alpha  - filter alpha parameter
void alphatrimmedmeanfilter(element* image, element* result,
   int N, int M, int alpha)
{
   //   Check arguments
   if (!image || N < 1 || M < 1 || alpha < 0 || 8 < alpha || alpha & 1)
      return;
   //   Allocate memory for signal extension
   element* extension = new element[(N + 2) * (M + 2)];
   //   Check memory allocation
   if (!extension)
      return;
   //   Create image extension
   for (int i = 0; i < M; ++i)
   {
      memcpy(extension + (N + 2) * (i + 1) + 1,
         image + N * i,
         N * sizeof(element));
      extension[(N + 2) * (i + 1)] = image[N * i];
      extension[(N + 2) * (i + 2) - 1] = image[N * (i + 1) - 1];
   }
   //   Fill first line of image extension
   memcpy(extension, extension + N + 2, (N + 2) * sizeof(element));
   //   Fill last line of image extension
   memcpy(extension + (N + 2) * (M + 1),
      extension + (N + 2) * M,
      (N + 2) * sizeof(element));
   //   Call alpha-trimmed mean filter implementation
   _alphatrimmedmeanfilter(extension, result ? result : image,
      N + 2, M + 2, alpha);
   //   Free memory
   delete[] extension;
}

7. Alpha-trimmed mean filter library

Now we have four functions: two of them process 1D signals with the alpha-trimmed mean filter, and the other two filter 2D images. It is time to put everything together and create a small alpha-trimmed mean filter library. Let us write the code for the header file.

//   alphatrimmedmeanfilter.h - declarations for 
//   1D and 2D alpha-trimmed mean filter routines
//
//   The code is property of LIBROW
//   You can use it on your own
//   When utilizing credit LIBROW site

#ifndef _ALPHATRIMMEDMEANFILTER_H_
#define _ALPHATRIMMEDMEANFILTER_H_

//   Signal/image element type
typedef double element;

//   1D ALPHA-TRIMMED MEAN FILTER, window size 5
//     signal - input signal
//     result - output signal, NULL for inplace processing
//     N      - length of the signal
//     alpha  - filter alpha parameter
void alphatrimmedmeanfilter(element* signal, element* result,
   int N, int alpha);

//   2D ALPHA-TRIMMED MEAN FILTER, window size 3x3
//     image  - input image
//     result - output image, NULL for inplace processing
//     N      - width of the image
//     M      - height of the image
//     alpha  - filter alpha parameter
void alphatrimmedmeanfilter(element* image, element* result,
   int N, int M, int alpha);

#endif

Our library is ready. The code we have written works on both Linux/Unix and Windows platforms. You can download the full alpha-trimmed mean filter library source code here:

Download alpha-trimmed mean filter C++ source code (zip, 2 Kb)

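To use the library, include the header file alphatrimmedmeanfilter.h and call the appropriate overload. A minimal sketch in the style of the other tutorials (the signal/image sizes and alpha values are just examples):

...Usually at the top of the file
//   Include alpha-trimmed mean filter header
#include "alphatrimmedmeanfilter.h"
.
...Some code here
.
   ...Inside your signal/image processing function
   //   1D: filter a signal of 1024 samples in-place with alpha = 2
   alphatrimmedmeanfilter(pSignal, NULL, 1024, 2);
   //   2D: filter a 512x512 image in-place with alpha = 6
   alphatrimmedmeanfilter(pImage, NULL, 512, 512, 6);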

And now — a couple of applications to play around!

8. Alpha-trimmed mean filter: image restoration

Download alpha-trimmed mean filter for Win32 (zip, 603 Kb)

We have created a couple of applications to show the alpha-trimmed mean filter capabilities in restoring images corrupted by impulse noise. The demo package includes 4 files: two applications, a sample image and a description:

  • alphatrimmedmean.exe — alpha-trimmed mean filter,
  • corrupter.exe — impulse noise generator,
  • sample.bmp — 24-bit sample image,
  • readme.txt — description.

Be aware that this sample uses OpenGL, so OpenGL should be supported by your system (usually that is the case).

9. Step 1: prepare corrupted image

We have created an impulse noise generator that will help us prepare a corrupted image. Start up the corrupter.exe application and load the image to be corrupted. Choose Set >> Corruption… or click the N button in the toolbar and set the noise level at 5–15%. Click OK. Then save the corrupted image.

Fig. 4. Corruption by impulse noise.

10. Step 2: restore corrupted image

Start up the alphatrimmedmean.exe application. Load the saved corrupted image. Set the alpha parameter to 6: choose Set >> Alpha… or click the α button in the toolbar, select 6 in the dialog’s combo-box, and click OK.

Fig. 5. Alpha parameter selection.

Apply the alpha-trimmed mean filter by choosing Set >> Filter or clicking the F-button in the toolbar. See the result. If necessary, filter the image once more.

Fig. 6. Image restored by alpha-trimmed mean filter.

Posted in Image / Video Filters | Tagged: | Leave a Comment »

 