Something More for Research

Explorer of Research #HEMBAD

Archive for the ‘Simulation’ Category

Computer Vision Algorithm Implementations

Posted by Hemprasad Y. Badgujar on May 6, 2014

Participate in Reproducible Research

General Image Processing

(C/C++ code, BSD lic) Image manipulation, matrix manipulation, transforms
(C/C++ code, BSD lic) Basic image processing, matrix manipulation and feature extraction algorithms: rotation, flip, photometric normalisations (Histogram Equalization, Multiscale Retinex, Self-Quotient Image or Gross-Brajovic), edge detection, 2D DCT, 2D FFT, 2D Gabor, PCA for Eigenfaces, LDA for Fisherfaces. Various metrics (Euclidean, Mahalanobis, ChiSquare, NormalizeCorrelation, TangentDistance, …)
(C/C++ code, MIT lic) A Free Experimental System for Image Processing (loading, transforms, filters, histogram, morphology, …)
(C/C++ code, GPL and LGPL lic) CImg Library is an open source C++ toolkit for image processing
Generic Image Library (GIL) – boost integration
(C/C++ code, MIT lic) Adobe open source C++ Generic Image Library (GIL)
SimpleCV a kinder, gentler machine vision library
(python code, MIT lic) SimpleCV is a Python interface to several powerful open source computer vision libraries in a single convenient package
PCL, The Point Cloud Library
(C/C++ code, BSD lic) The Point Cloud Library (or PCL) is a large scale, open project for point cloud processing. The PCL framework contains numerous state-of-the art algorithms including filtering, feature estimation, surface reconstruction, registration, model fitting and segmentation.
Population, imaging library in C++ for processing, analysing, modelling and visualising
(C/C++ code, CeCill lic) Population is an open-source imaging library in C++ for processing, analysing, modelling and visualising including more than 200 algorithms designed by V. Tariel.
(C/C++ code, LGPL 3) A computer vision framework based on Qt and OpenCV that provides an easy-to-use interface to display, analyse and run computer vision algorithms. The library ships with multiple application examples, including stereo, SURF, Sobel and Hough transform.
Machine Vision Toolbox
(MATLAB/C, LGPL lic) image processing, segmentation, blob/line/point features, multiview geometry, camera models, colorimetry.
(Java code, Apache lic) BoofCV is an open source Java library for real-time computer vision and robotics applications. BoofCV is organized into several packages: image processing, features, geometric vision, calibration, visualize, and IO.
(C++ code, MIT lic) Simd is a free open source C++ library of high-performance image processing algorithms, optimised using SIMD CPU extensions such as SSE2, SSSE3, SSE4.2 and AVX2.
Free but not open source – ArrayFire (formerly LibJacket) is a matrix library for CUDA
(CUDA/C++, free lic) ArrayFire offers hundreds of general matrix and image processing functions, all running on the GPU. The syntax is very Matlab-like, with the goal of offering easy porting of Matlab code to C++/ArrayFire.

Image Acquisition, Decoding & Encoding

(C/C++ code, LGPL or GPL lic) Record, convert and stream audio and video (many codecs)
(C/C++ code, BSD lic) PNG, JPEG,… images, avi video files, USB webcam,…
(C/C++ code, BSD lic) Video file decoding/encoding (ffmpeg integration), image capture from a frame grabber or from USB, Sony pan/tilt/zoom camera control using VISCA interface
lib VLC
(C/C++ code, GPL lic) Used by VLC player: record, convert and stream audio and video
(C/C++ code, LGPL lic) RTSP streams
(C/C++ code, GPL lic) Loading & saving DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, TIFF, and more
(C/C++ code, LGPL lic) Loading & saving various image formats
(C/C++ code, GPL & FPL lic) PNG, BMP, JPEG, TIFF loading
(C/C++ code, LGPL lic) VideoMan aims to make image capture from cameras, video files or image sequences easier.

Image Segmentation

(C/C++ code, BSD lic) Pyramid image segmentation
(C/C++ code, Microsoft Research Lic) Branch-and-Mincut Algorithm for Image Segmentation
Efficiently solving multi-label MRFs (Readme)
(C/C++ code) Segmentation, object category labelling, stereo

Machine Learning

(C/C++ code, BSD lic) Gradient machines (multi-layer perceptrons, radial basis functions, mixtures of experts, convolutional networks and even time-delay neural networks), support vector machines, ensemble models (bagging, AdaBoost), non-parametric models (k-nearest-neighbours, Parzen regression and Parzen density estimation), distributions (k-means, Gaussian mixture models, hidden Markov models, input-output hidden Markov models, and Bayes classifiers), speech recognition tools

Object Detection

(C/C++ code, BSD lic) Viola-Jones face detection (Haar features)
(C/C++ code, BSD lic) MLP & cascade of Haar-like classifiers face detection
Hough Forests
(C/C++ code, Microsoft Research Lic) Class-Specific Hough Forests for Object Detection
Efficient Subwindow Object Detection
(C/C++ code, Apache Lic) Christoph Lampert “Efficient Subwindow” algorithms for Object Detection
INRIA Object Detection and Localization Toolkit
(C/C++ code, Custom Lic) Histograms of Oriented Gradients library for Object Detection

Object Category Labelling

Efficiently solving multi-label MRFs (Readme)
(C/C++ code) Segmentation, object category labelling, stereo
Multi-label optimization
(C/C++/MATLAB code) The gco-v3.0 library is for optimizing multi-label energies. It supports energies with any combination of unary, pairwise, and label cost terms.

Optical flow

(C/C++ code, BSD lic) Horn & Schunck algorithm, Lucas & Kanade algorithm, Lucas-Kanade optical flow in pyramids, block matching.
(C/C++/OpenGL/Cg code, LGPL) Gain-Adaptive KLT Tracking and TV-L1 optical flow on the GPU.
(C/C++/Matlab code, Custom Lic.) The RLOF library provides GPU/CPU implementations of optical flow and feature tracking methods.

Feature Extraction & Matching

SIFT by R. Hess
(C/C++ code, GPL lic) SIFT feature extraction & RANSAC matching
(C/C++ code) SURF feature extraction algorithm (a kind of fast SIFT)
(C/C++ code, Ecole Polytechnique and ENS Cachan for commercial Lic) Affine SIFT (ASIFT)
VLFeat (formerly Sift++)
(C/C++ code) SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, and quick shift
A GPU Implementation of Scale Invariant Feature Transform (SIFT)
(C/C++ code, GPL lic) An enhanced version of RANSAC that considers the correlation between data points

Nearest Neighbors matching

(C/C++ code, BSD lic) Approximate Nearest Neighbors (Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration)
(C/C++ code, LGPL lic) Approximate Nearest Neighbor Searching

Tracking

(C/C++ code, BSD lic) Kalman, Condensation, CAMSHIFT, Mean shift, Snakes
KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker
(C/C++ code, public domain) Kanade-Lucas-Tomasi Feature Tracker
(C/C++/OpenGL/Cg code) A GPU-based Implementation of the Kanade-Lucas-Tomasi Feature Tracker
(C/C++/OpenGL/Cg code, LGPL) Gain-Adaptive KLT Tracking and TV-L1 optical flow on the GPU
On-line boosting trackers
(C/C++, LGPL) On-line boosting tracker, semi-supervised tracker, beyond semi-supervised tracker
Single Camera background subtraction tracking
(C/C++, LGPL) Background subtraction based tracking algorithm using OpenCV.
Multi-camera tracking
(C/C++, LGPL) Multi-camera particle filter tracking algorithm using OpenCV and Intel IPP.

Simultaneous localization and mapping

Real-Time SLAM – SceneLib
(C/C++ code, LGPL lic) Real-time vision-based SLAM with a single camera
(C/C++ code, Isis Innovation Limited lic) Parallel Tracking and Mapping for Small AR Workspaces
(C/C++ code, BSD lic) GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices

Camera Calibration & constraint

(C/C++ code, BSD lic) Chessboard calibration, calibration with rig or pattern
Geometric camera constraint – Minimal Problems in Computer Vision
Minimal problems in computer vision arise when computing geometrical models from image data. They often lead to solving systems of algebraic equations.
Camera Calibration Toolbox for Matlab
(Matlab toolbox) Camera Calibration Toolbox for Matlab by Jean-Yves Bouguet (C implementation in OpenCV)

Multi-View Reconstruction

Bundle Adjustment – SBA
(C/C++ code, GPL lic) A Generic Sparse Bundle Adjustment Package Based on the Levenberg-Marquardt Algorithm
Bundle Adjustment – SSBA
(C/C++ code, LGPL lic) Simple Sparse Bundle Adjustment (SSBA)

Stereo

Efficiently solving multi-label MRFs (Readme)
(C/C++ code) Segmentation, object category labelling, stereo
LIBELAS: Library for Efficient LArge-scale Stereo Matching
(C/C++ code) Disparity maps, stereo

Structure from motion

(C/C++ code, GPL lic) A structure-from-motion system for unordered image collections
Patch-based Multi-view Stereo Software (Windows version)
(C/C++ code, GPL lic) A multi-view stereo software that takes a set of images and camera parameters, then reconstructs 3D structure of an object or a scene visible in the images
libmv – work in progress
(C/C++ code, MIT lic) A structure from motion library
Multicore Bundle Adjustment
(C/C++/GPU code, GPL3 lic) Design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems.
(C/C++/GPU code, MPL2 lic) OpenMVG ("open Multiple View Geometry") is a library for computer-vision scientists, especially targeted at the Multiple View Geometry community. It is designed to provide easy access to the classical problem solvers in Multiple View Geometry and to solve them accurately.

Visual odometry

LIBVISO2: Library for VISual Odometry 2
(C/C++ code, Matlab, GPL lic) Libviso 2 is a very fast cross-platform (Linux, Windows) C++ library with MATLAB wrappers for computing the 6 DOF motion of a moving mono/stereo camera.

Posted in Apps Development, C, Computer Hardware, Computer Network & Security, CUDA, Game Development, GPU (CUDA), GPU Accelerated, Graphics Cards, Image Processing, OpenCV, PARALLEL, Simulation, Virtualization | 3 Comments »

Research skills

Posted by Hemprasad Y. Badgujar on March 26, 2013

Research skills

Writing papers, giving research talks, and writing research proposals are key skills, but they aren’t easy. This page describes how I approach each of these three challenges, in the hope that they may be useful to you.

You probably have ideas of your own. Can you share them with other readers of this page? Here’s a Wiki page that anyone can edit, where you can add your comments on the presentations below, your own thoughts and suggestions, and pointers to other material you have found useful.


How to give a good research talk

“How to give a good research talk”, Simon Peyton Jones, John Launchbury, John Hughes, SIGPLAN Notices 28(11), Nov 1993.

Since we wrote the paper quite a few people have written with constructive comments. Nick Nethercote also has a useful 2-page guide about giving a talk.


How to write a good research paper

“How to write a good research paper”, Simon Peyton Jones.

I gave this talk at the Technical University of Vienna in October 2004. There isn't a paper about it, only the talk.


How to write a good research proposal

“How to write a good research proposal”, Simon Peyton Jones and Alan Bundy.

Other resources

Finally, here are some pointers to other advice I have found useful, though Google will find you a lot more besides.


Posted in Artificial Intelligence, Entertainment, Image Processing, Journals & Conferences, Neural Network, Placement, Project Related, Simulation | 1 Comment »

CUDA Open Source Projects

Posted by Hemprasad Y. Badgujar on March 4, 2013

CUDA Open Source Projects

I have put together a list of free and open source research projects and applications that use CUDA, drawn from my own search for projects to learn from and develop with, plus requests from the NVidia forums. If you have one to add, or an update to anything here, please let me know.

GNURadio Software defined radio. A hardware/software combination that does baseband signal processing in software. Experiments were carried out to integrate CUDA into this mix.
MediaCoder A transcoding application for videos with a strong focus on mobile players. Some operations (de-interlacing, scaling, encoding) have been CUDA accelerated.
Bullet Physics simulation library that has started to include CUDA, but the support is not fully capable yet. Perhaps some CUDA genius will add to it?
Thrust (included in Release 4.0) Excellent Library!! A Parallel Template Library for CUDA. Thrust provides a flexible high-level interface for GPU programming that greatly enhances developer productivity.
Pycuda A module which allows access to the complete range of CUDA functionality in Python, including seamless numpy integration, OpenGL interoperability and lots more. Released under the MIT/X consortium license.
FOLKI-GPU An optical-flow estimation, implemented on CUDA
Flam4 CUDA A CUDA accelerated renderer for fractal flames. Sample videos here and here. Use other tools like Apophysis 2.0 to generate the parameter files (.flame files). A new and ongoing approach to port fractal flame rendering to CUDA is described here.
CUJ2K A CUDA accelerated JPEG 2000 encoder. Command line tool and C/C++ library. This is student work with excellent documentation. Notable speedup achieved only for large files.
Ocelot A Binary Translation Framework for PTX
Msieve A library for factoring large integers, as in RSA-size numbers. The polynomial selection phase of the general number field sieve has a great deal of CUDA code, and the speedup over a CPU is enormous (10-50x)
PFAC An open library for exact string matching performed on GPUs.
cuSVM A CUDA implementation of Support Vector Classification and Regression.
multisvm This project describes how a naïve implementation of a multiclass classifier based on SVMs can map its inherent degrees of parallelism to the GPU programming model and efficiently use its computational throughput.
gpuminer Parallel Data Mining on Graphics Processors
Cmatch Performs exact string matching for a set of query sequences, achieving a speedup of as much as 35x on a recent GPU over the equivalent CPU-bound version.
R+GPU A popular Open Source solution for Statistical Analysis

Posted in Apps Development, Artificial Intelligence, Computer Languages, CUDA, GPU (CUDA), GPU Accelerated, Image Processing, Neural Network, Open CL, OpenMP, PARALLEL, Project Related, Simulation, Virtualization | 1 Comment »

Getting Started with CUDA

Posted by Hemprasad Y. Badgujar on March 4, 2013

What are the capabilities of Nvidia’s CUDA running on the GPU and how does it compare to CPU performance? I bought a GeForce 9800GT and set about finding out, starting off by installing the CUDA drivers, toolkit and SDK from the Cuda Zone.

The first thing I noticed was that on my Vista64 machine the sample projects had been installed to:

C:\ProgramData\NVIDIA Corporation\NVIDIA CUDA SDK\projects

which is read only. Rather than fight with Vista’s UAC I copied everything into the C:\CUDA directory. To build the solution in VS2008 on my Vista 64 machine all I needed to do was switch the platform to x64, ignore the warning:


Command line warning D9035 : option 'Wp64' has been deprecated and will be removed in a future release


and everything was fine. The SDK’s sample template conveniently included both a gold (CPU) implementation of a function and a GPU implementation. An initial run of the template project showed that only the GPU section was timed. Since the reason to use CUDA is performance and I wanted a comparison, the first modification I made was to put a timer around the CPU implementation:


cutilCheckError( cutStartTimer( timer));
computeGold( reference, h_idata, num_threads);  // reference solution
cutilCheckError( cutStopTimer( timer));


and raced them – but the results weren’t too inspiring:


GPU Processing time: 84.362747 (ms)
CPU Processing time: 0.001257 (ms)


The CPU solution wasn’t even threaded. I remembered the question of a student at the Stanford CUDA lecture on YouTube:


Q: Since there’s overhead in moving the data to the GPU how do you decide when it’s worthwhile?

A: Generally speaking it makes the most sense for large problems with high data intensity where you have to do multiple calculations per data element. 

Hmm, the template code only processed 128 bytes with 32 threads so I had paid the setup costs and then not sent enough data to the GPU – no wonder the CPU was faster. So I needed to increase the data set, but there’s a problem with that since the provided kernel code assumes the entire data set will fit in shared memory and binds the size of the data to the thread count. There needed to be some changes. But you can’t just increase the number of threads or you’ll get:


cutilCheckMsg() CUTIL CUDA error: Kernel execution failed in file <>, line 88 : invalid configuration argument.


First step was to find out what resources were available on the GPU, then I’d need to work out how to get at those resources. Running the SDK Device Query told me how much global and shared memory was available as well as how many threads I could use:


Device 0: "GeForce 9800 GT"
  CUDA Capability Major revision number:         1
  CUDA Capability Minor revision number:         1
  Total amount of global memory:                 1073741824 bytes
  Number of multiprocessors:                     14
  Number of cores:                               112
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       16384 bytes
  Total number of registers available per block: 8192
  Warp size:                                     32
  Maximum number of threads per block:           512
  Maximum sizes of each dimension of a block:    512 x 512 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
  Maximum memory pitch:                          262144 bytes
  Texture alignment:                             256 bytes
  Clock rate:                                    1.50 GHz
  Concurrent copy and execution:                 Yes
  Run time limit on kernels:                     No
  Integrated:                                    No
  Support host page-locked memory mapping:       No
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)


Some interesting numbers there. Since each of the GeForce's stream processors can dual-issue a MAD (2 flops) and a MUL (1 flop) per clock, the maximum theoretical throughput attainable is 1.5 GHz * 112 * (2 + 1) = 504 Gflops. By way of comparison, the E8400 in my test machine has a peak of 24 Gflops according to Intel's data sheet:




But back to the problem of pushing more data through.  A few problems:

1) The data size needs to be uncoupled from the thread count, which means a change to the grid dimensions from this:


// setup execution parameters
dim3  grid( 1, 1, 1);
dim3  threads( num_threads, 1, 1);


to something more like this:


cThreadsPerBlock = 64;
cBlocksPerGridx = 1024;
cBlocksPerGridy = 1024;

cData = cThreadsPerBlock * cBlocksPerGridx * cBlocksPerGridy;

dim3  grid ( cBlocksPerGridx, cBlocksPerGridy, 1); 
dim3  block( cThreadsPerBlock, 1, 1);


where the counts of blocks per grid in the x and y directions would need to be derived from the data. To simplify the example I've done it backwards and set the data size based on the thread and block breakdown. These grid and block variables are then passed to the GPU using the triple angle bracket <<< >>> notation:


testKernel<<< grid, block, shared_mem_size >>>( d_idata, d_odata);


which is the same as:


testKernel<<< grid, 64, shared_mem_size >>> ( d_idata, d_odata);


because the passed argument is converted to a CUDA dim3 type which “is an integer vector type based on uint3 that is used to specify dimensions. When defining a variable of type dim3, any component left unspecified is initialized to 1.” from the programming guide.

Specifying a shared_mem_size on the kernel call as above allows you to specify the size at runtime. You can then pick up a reference to the memory in the kernel code with:


extern  __shared__  float sdata[];


Alternatively if you know the size at compilation time you can also declare the shared memory inside the kernel like this:


__shared__ float sdata[256];


which would mean the kernel call would just be:


testKernel<<< grid, 64 >>> ( d_idata, d_odata);


2) The kernel code must loop through the grid: calculate the thread id, block id and then global id to figure out where in the global data we are up to. The size of the data (int len) must be passed in, since num_threads is no longer coupled with the data length. The __umul24 in the code provides increased performance but comes with a warning: “Throughput of 32-bit integer multiplication is 2 operations per clock cycle, but __mul24 and __umul24 provide signed and unsigned 24-bit integer multiplication with a throughput of 8 operations per clock cycle. On future architectures however, __[u]mul24 will be slower than 32-bit integer multiplication”.


__global__ void
testKernel( float* g_idata, float* g_odata, int len) 
{
  // shared memory
  // the size is determined by the host application
  extern  __shared__  float sdata[];

  // thread id
  const unsigned int tid = threadIdx.x;
  // block id
  const unsigned int bid = __umul24(gridDim.x, blockIdx.y) + blockIdx.x ;
  // global memory id
  const unsigned int gid = tid + __umul24(blockDim.x, bid);

  const unsigned int cThreadsPerBlock = __umul24(__umul24(blockDim.x, blockDim.y),blockDim.z);


3) The kernel needs to read from global memory and then synchronise across threads. This syncs the threads across warps and thus presents a consistent shared memory picture, so thread 0 can read from SDATA(1) and will see the data which thread 1 loaded. A call to __syncthreads() is only needed when the count of threads per block exceeds the warpSize because, as mentioned in the performance optimisation whitepaper, “Instructions are SIMD synchronous within a warp”. Of course every call has a cost, and the programming guide states that “throughput for __syncthreads is 8 operations per clock cycle in the case where no thread has to wait for any other threads.”

None of this is important in the sample template code because there is no communication between threads, and thus no need for shared memory or thread syncing. It's a situation in which registers would normally be used, but shared memory has presumably been used by Nvidia for example purposes.


const unsigned int cThreadsPerBlock = __umul24(__umul24(blockDim.x, blockDim.y),blockDim.z); 
SDATA(tid) = g_idata[tid];
if (cThreadsPerBlock > warpSize) __syncthreads();


At this point I had revised the template to time the CPU for comparison, remove the size restrictions to allow a decent amount of data to be pushed through and was ready to attempt to answer the question – given the overhead of pushing the data to the GPU, when it is worth doing so? Running the code gave some unexpected answers. Keeping the thread count constant I varied the cBlocksPerGridy to yield various data sizes:



The GPU and CPU seemed to take the same amount of time with different data loads, but the GPU was hampered by a constant overhead of 80ms, the exact same difference I noted when only 128 bytes were trialled in the very first instance before any modification. Where was the time going? Some sort of setup cost? Also, how much was being taken in the kernel and how much in the data transfer? I needed more fine-grained data to see what was going on.

I had modified the supplied SDK template code in a minimal way in order to measure CPU vs GPU performance, and found that for the simple test code (1 float multiplication) the E8400 CPU with a claimed 24 Gflops was handily outperforming a GPU with a theoretical max of 504 Gflops. Where was all the time going? Was the kernel the culprit, the memory copy or something else? I started out by trying to reuse the


cutilCheckError( cutStartTimer( timer));


timing method already in the template. Looking into the CUDA source in SDK\common\src\stopwatch_win.cpp showed that on Windows it was using the QueryPerformanceFrequency method which uses the highest possible resolution hardware timer … on the CPU. Using it to measure GPU performance is problematic because timing the GPU using a CPU timer requires the GPU and the CPU to be synchronised with:

cudaThreadSynchronize();

and ruins the timing information. To measure times on the GPU I needed to use GPU based timing on stream 0 using events:

cudaEventRecord(start, 0);

So I created an array of start and stop events, broke the GPU processes into 5 steps and timed everything. The 5 GPU processes were:

1) Alloc: Host to Device – The allocation of memory on the device for the input array which needed to be copied over from the host.

2) Copy: Host to Device – Copying the input array from the host onto the device. Data size divided by time taken here would give bandwidth.

3) Alloc: Device to Host – The allocation of memory on the device for the output array where the result would be stored before being copied back to the host.

4) Compute – Running the actual kernel, reading from the input array, processing and writing results to the output array.

5) Copy: Device to Host – Copying the output array back to the host.

I also retained my CPU timing to measure the amount of time it took for the GPU to do everything and get the answer back onto the host – that way I’d have a 1:1 comparison against the CPU version. That gives one more thing to measure, how does the sum of the GPU times compare to the overall CPU time?

6) Sync with CPU – CPU time minus sum of GPU times indicates how long it takes to sync the two.

Set up 5 GPU timers to get a breakdown of where the GPU was spending time and keep the 2 CPU timers for the original comparison:


// GPU timers - used to time GPU streams
int cGpuTimer = 5;

cudaEvent_t* rgGpuTimer_start = (cudaEvent_t*) malloc (sizeof(cudaEvent_t)*cGpuTimer);
cudaEvent_t* rgGpuTimer_stop = (cudaEvent_t*) malloc (sizeof(cudaEvent_t)*cGpuTimer);

for (int i=0;i<cGpuTimer;i++)
{
    cutilSafeCall( cudaEventCreate( &rgGpuTimer_start[i] ) );
    cutilSafeCall( cudaEventCreate( &rgGpuTimer_stop[i] ) );
}


and wrap all the GPU calls with timing calls:


cutilCheckError( cutStartTimer( rgTimer[0]));

  // Alloc: Host to Device
cutilSafeCall( cudaEventRecord( rgGpuTimer_start[0], 0 ) );
  float* d_idata;
  cutilSafeCall( cudaMalloc( (void**) &d_idata, global_mem_size));
cutilSafeCall( cudaEventRecord( rgGpuTimer_stop[0], 0 ) );

  // Copy: Host to Device
cutilSafeCall( cudaEventRecord( rgGpuTimer_start[1], 0 ) );
  cutilSafeCall( cudaMemcpy( d_idata, h_idata, global_mem_size, cudaMemcpyHostToDevice) );
cutilSafeCall( cudaEventRecord( rgGpuTimer_stop[1], 0 ) );

  // Alloc: Device to Host
cutilSafeCall( cudaEventRecord( rgGpuTimer_start[2], 0 ) );
  float* d_odata;
  cutilSafeCall( cudaMalloc( (void**) &d_odata, global_mem_size)); // The pad won't be read back
cutilSafeCall( cudaEventRecord( rgGpuTimer_stop[2], 0 ) );

  // Compute
cutilSafeCall( cudaEventRecord( rgGpuTimer_start[3], 0 ) );
  dim3  gridDim ( cBlocksPerGridx, cBlocksPerGridy, 1);
  dim3  blockDim( cThreadsPerBlock, 1, 1);

  testKernel<<< gridDim, blockDim, shared_mem_size >>>( d_idata, d_odata, cData);

  cutilCheckMsg("Kernel execution failed");
cutilSafeCall( cudaEventRecord( rgGpuTimer_stop[3], 0 ) );

  // Copy: Device to Host
cutilSafeCall( cudaEventRecord( rgGpuTimer_start[4], 0 ) );
  cutilSafeCall( cudaMemcpy( h_odata, d_odata, global_mem_size, cudaMemcpyDeviceToHost) );
cutilSafeCall( cudaEventRecord( rgGpuTimer_stop[4], 0 ) );

cudaThreadSynchronize(); // Block until memory copy is done to ensure accurate timing

cutilCheckError( cutStopTimer( rgTimer[0]));


With this code in place I was ready to find out where the extra 80ms that the GPU took compared to the CPU was coming from and how much time each of the GPU tasks took. First a baseline comparison to verify that the code was still the same and gave the same numbers.

So here’s the graph from before on the left, and here’s the new graph, which should be identical, on the right:




Wow! What’s happened here? All the CPU times are the same, as expected, but the GPU has suddenly closed the gap and now takes only a few ms extra – the 80ms gap has vanished. A diff of the two versions shows that the only change to the code is the addition of GPU timing – and that turns out to be why the GPU suddenly sped up. Directly after setting the device, sending a wakeup call to the GPU like this:


if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") )
    cutilDeviceInit(argc, argv);
else
    cudaSetDevice( cutGetMaxGflopsDeviceId() );

    cudaEvent_t wakeGPU;
    cutilSafeCall( cudaEventCreate( &wakeGPU) );


means that 80ms vanishes from the timed loop later in the code. Note that the variable is scoped so it isn't otherwise used. Is the GeForce like a person, going faster when it knows it is being watched?! Or is it some wakeup from a power-saving mode? I'm not sure. This is the only extra code needed to cut 80ms from the timing, which shows how tricky it is to time accurately on the ms scale: the slightest change can have a significant effect. It is always advisable to run tests on large volumes of data with a lot of loops to drown out one-off costs like this where possible. While on the topic of getting accurate performance readings, note that all timing should be done on release code, particularly timing breakdowns, as the SDK/common/cutil_readme.txt file states:


“These macros are compiled out in release builds and so they will not affect performance. Note that in debug mode they call cudaThreadSynchronize() to ensure that kernel execution has completed, which can affect performance.” 

Well now that the extra 80ms has been eliminated what does our new GPU timing code show us about how the GPU spends its time? Here’s a chart showing the breakdown for a 16MB sample:



The majority of the time, and this holds for the other data sizes, is taken copying data back and forth. So experimentally, the overhead in moving the data back and forth is quite significant: of the 24.8ms required in total to process 16MB, 21.9ms were spent copying data. The actual processing takes almost no time. Running a variety of input sizes and timing each one tells us what kind of bandwidth we are typically getting, as shown in the table below, where times are in ms:

        Copy: Host to Device         Copy: Device to Host
        Time (ms)   MB/s             Time (ms)   MB/s
16MB      9.0       1771.9            11.8       1359.3
32MB     16.3       1966.0            22.2       1442.8
64MB     30.6       2093.9            49.8       1285.4
128MB    58.2       2198.2            83.9       1526.4
256MB   114.9       2228.7           171.4       1493.4

We wanted to find where the GPU was spending its time, and have now discovered that most of it goes on moving data back and forth. Can we now answer the question of where the GPU outperforms the CPU? Is 2GB/s the expected throughput? Well, Nvidia provides a tool in the SDK to answer that – the “Bandwidth Test”. Running it through the provided GUI tool yields the following results:


Running on......
      device 0:GeForce 9800 GT
Quick Mode
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               2152.6

Quick Mode
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               1919.2

Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes)   Bandwidth(MB/s)
 33554432               48507.8


So we can see that for 32MB, performance is roughly in line with the template results, so that's case closed … or is it? Two things give cause for concern:

1) PCIe 2.0 is theoretically capable of 500 MB/s per lane, and with an x16 slot there are 16 lanes. So throughput should be up around 8GB/s, not the 2GB/s observed.

2) What exactly does “Host to Device Bandwidth for Pageable memory” in the bandwidth test results mean? Pageable memory?

So I found that the bulk of the time was spent copying data, confirmed that the speeds observed were similar to those given by Nvidia's own test suite, and then raised a new question about whether we are getting everything out of the hardware, given the 2GB/s observed against the 8GB/s theoretical. Now I need to confirm that my hardware really is PCIe 2.0 x16 and figure out what pageable memory is.

I’d added GPU based timing to my template code and found out that most of the time was spent copying data back and forth between the host and the device. The “Bandwidth Test” in the SDK gave roughly similar results although it mentioned something about pageable memory. But the big problem was the theoretical performance of PCIe 2.0 x16 far exceeded what I was seeing. So the first step was to confirm that both my graphics card and my motherboard supported and were using PCIe 2.0 x16. To do this I used CPU-Z and GPU-Z, with the following results:




So after confirming the hardware should have been capable of better speeds, I took another look at the bandwidthTest. Running it with the --help switch reveals several options:


C:\ProgramData\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win64\Release>bandwidthTest.exe --help
Usage:  bandwidthTest [OPTION]...
Test the bandwidth for device to host, host to device, and device to device transfers

Example:  measure the bandwidth of device to host pinned memory copies in the range 1024 Bytes
          to 102400 Bytes in 1024 Byte increments
./bandwidthTest --memory=pinned --mode=range --start=1024 --end=102400 --increment=1024 --dtoh

--help  Display this help menu
--csv   Print results as a CSV
--device=[deviceno]     Specify the device device to be used
  all - compute cumulative bandwidth on all the devices
  0,1,2,...,n - Specify any particular device to be used
--memory=[MEMMODE]      Specify which memory mode to use
  pageable - pageable memory
  pinned   - non-pageable system memory
--mode=[MODE]   Specify the mode to use
  quick - performs a quick measurement
  range - measures a user-specified range of values
  shmoo - performs an intense shmoo of a large range of values
--htod  Measure host to device transfers
--dtoh  Measure device to host transfers
--dtod  Measure device to device transfers
--wc    Allocate pinned memory as write-combined
--cputiming     Force CPU-based timing always
Range mode options
--start=[SIZE]  Starting transfer size in bytes
--end=[SIZE]    Ending transfer size in bytes
--increment=[SIZE]      Increment size in bytes


Particularly of interest is the “pinned” memory mode. Let’s try that:


C:\ProgramData\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win64\Release>bandwidthTest.exe --memory=pinned

Running on......
device 0:GeForce 9800 GT
Quick Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5256.9
Quick Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4891.6
Quick Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 48498.6


and we see that this mode vastly improves the maximum throughput; I'm not sure why Nvidia didn't make it the default option. Speeds are now up to 5GB/s. A short investigation of the code reveals that the timing isn't quite analogous to the testing we are doing in the template code:


// defines, project
#define MEMCOPY_ITERATIONS  10


as the bandwidthTest copies the same memory 10 times in a row as compared to the single copy we are doing. So we expect our performance to lag slightly behind this 5GB/s. Conveniently, all the code needed to use pinned memory is provided in the bandwidthTest, so putting it into a few wrapper functions called freeHost, mallocHost and memCpy yields:


//  Memory functions to switch between pinned and pageable memory as required

cudaError_t freeHost(void* h_mem, memoryMode memMode)
{
    if( PINNED == memMode ) {
        return cudaFreeHost(h_mem);
    } else {
        free(h_mem);
        return cudaSuccess;
    }
}

cudaError_t mallocHost(void** h_mem, uint memSize, memoryMode memMode, bool wc)
{
    if( PINNED == memMode ) {
#if CUDART_VERSION >= 2020
        return cudaHostAlloc( h_mem, memSize, (wc) ? cudaHostAllocWriteCombined : 0 );
#else
        if (wc) printf("Write-Combined unavailable on CUDART_VERSION less than 2020, running is: %d\n", CUDART_VERSION);
        return cudaMallocHost( h_mem, memSize );
#endif
    } else { // PAGEABLE memory mode
        *h_mem = malloc( memSize );
        return cudaSuccess;
    }
}

cudaError_t memCpy(void* sink, void* source, uint memSize, cudaMemcpyKind direction, memoryMode memMode)
{
    if( PINNED == memMode ) {
        return cudaMemcpyAsync( sink, source, memSize, direction, 0 );
    } else {
        return cudaMemcpy( sink, source, memSize, direction );
    }
}

These functions take the same parameters as the existing functions, plus the memory mode and, for mallocHost, whether or not the memory is write-combined. Changing the allocation, copying and freeing over to these new functions allows use of pinned memory. Running the same test set shows that now the time is much more evenly spread between tasks:




and running the new numbers on the throughput we get:

          Copy: Host to Device     Copy: Device to Host
          ms       MB/s            ms       MB/s
16MB      3.2      5026.7          3.3      4878.0
32MB      6.1      5242.5          6.5      4891.5
64MB      12.2     5251.1          13.1     4871.7
128MB     24.4     5247.6          26.2     4894.1
256MB     48.9     5239.0          52.3     4894.7

So now the throughput approaches the theoretical limit and matches the best the bandwidthTest provides. The total times are down significantly and the GPU is faster on all tested sizes. The 256MB trial runs in 30% less time, down from 340ms to 236ms.
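The quoted saving is simple arithmetic; this one-liner of mine (not from the post) checks it:

```python
# Percentage reduction in total processing time for the 256MB trial
before_ms, after_ms = 340.0, 236.0
reduction = (before_ms - after_ms) / before_ms
print(f"{reduction:.1%}")  # prints 30.6%
```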



The next challenge is to find where else time is lost. The pie charts show that most of the time is still spent in allocation and copying, with very little in compute, so there's no need to look at the kernel. We've probably already cut most of the time we can from the copying, so that leaves allocation. A good idea would probably be to allocate the memory once and then reuse it across multiple kernel executions, the kind of intensive process Nvidia suggests is best suited for CUDA. But what if the code needs to be as shown: one kernel run on one large set of data before returning to another application? This is the kind of flow seen in Matlab MEX files where CUDA is used. Matlab passes the data through the C/C++ MEX file, which runs a CUDA program, gets the result, and then returns to Matlab. Could parallel memory copies and allocations speed things up in this situation?

So we’ve switched the code over to use pinned memory in preference to pageable and attained the desired speedup in memory operations from 2GB/s to about 5GB/s. Theoretically PCIe 2.0 x16 should be able to hit 8GB/s and I don’t know why we aren’t able to achieve speeds closer to this number. If anyone knows please leave a comment or e-mail me. From here the next thing to investigate to get more throughput in the single kernel scenario is parallel allocations and copies.

Posted in Artificial Intelligence, Computer Languages, Computing Technology, CUDA, Game Development, GPU (CUDA), GPU Accelareted, Image Processing, Neural Network, OpenCL, PARALLEL, Simulation, Virtualization | Leave a Comment »

MATLAB workshop

Posted by Hemprasad Y. Badgujar on February 25, 2013

MATLAB workshop


1    Introduction to Matlab
  1.1    What Is MATLAB?
  1.2    The MATLAB System
  1.3    Starting and stopping MATLAB
  1.4    Basic MATLAB syntax
  1.5    Where to get help

2    MATLAB Desktop and Examples
  2.1    MATLAB Desktop
  2.2    Solution of linear system
  2.3    Solution of linear differential system
  2.4    Fourier series analysis
  2.5    Taylor series expansion

3    Matrices and vectors
  3.1    Transpose of matrices and vectors
  3.2    Creating vectors
  3.3    Creating matrices
  3.4    Basic matrix operations
  3.5    Indexing into a matrix

4    Graphics
  4.1    2-D plots
  4.2    3-D plots
  4.3    Tables

5    Programming with MATLAB
  5.1    Using m-files
  5.2    Scripts
  5.3    Functions
  5.4    Program flow control

6    MATLAB workspace and File I/O
  6.1    MATLAB workspace
  6.2    Function workspace
  6.3    Native data files
  6.4    Data import and export

7    Ordinary Differential Equations
  7.1    Second order homogeneous linear equation with constant coefficients
  7.2    Non-homogeneous 2nd order differential equations

8    Sparse Matrices
  8.1    Storage of data
  8.2    Creating sparse matrices
  8.3    Viewing sparse matrices
  8.4    Sparse matrix computations
  8.5    Reordering of matrices

9    Numerical solutions of Ordinary Differential Equations

10   SIMULINK
  10.1   Getting Started in Simulink
  10.2   Block Diagram Construction
  10.3   General Simulink Tips
  10.4   More information

1           Introduction to Matlab 

1.1           What Is MATLAB?

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include

  • Math and computation
  • Algorithm development
  • Data acquisition
  • Modeling, simulation, and prototyping
  • Data analysis, exploration, and visualization
  • Scientific and engineering graphics
  • Application development, including graphical user interface building

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, and several toolboxes that allow for real-life engineering problem solving through an intuitive interface.

1.2           The MATLAB System

The MATLAB system consists of several different components, all of which can be used individually or together to solve a problem. The first and most apparent piece is the Development Environment: the set of tools that let you do basic tasks like entering commands and viewing and saving data. The second element is the Mathematical Function Library. This contains mathematical functions ranging from the elementary (sum, sine, etc.) to the complicated (Bessel functions, etc.). The third important tool within MATLAB is the graphics package that comes with it, which allows users to graph both data and functions in 2D and 3D. When using MATLAB in interactive mode, these three are probably the most used components of MATLAB.

In addition to these three components MATLAB has the MATLAB language. This is a high-level matrix/array language with flow control, functions, data structures etc. This component is particularly useful when writing scripts and functions to run in a non-interactive mode.

The final component in MATLAB is the Application Programming Interface, otherwise known as the API. This allows users to extend MATLAB by writing specialized functions and methods in other high-level languages like C/C++ or Fortran. We will not be using this component in this workshop.

1.3           Starting and stopping MATLAB

  • On Windows platforms, to start MATLAB, double-click the MATLAB shortcut icon on your Windows desktop.
  • On UNIX platforms, to start MATLAB, type matlab at the operating system prompt.
  • On Mac OS X, start MATLAB by double-clicking the LaunchMATLAB icon in Applications.

You can change the directory in which MATLAB starts, define startup options including running a script upon startup, and reduce startup time in some situations.

To end your MATLAB session, select Exit MATLAB from the File menu in the desktop, or type quit in the Command Window. To execute specified functions each time MATLAB quits, such as saving the workspace, you can create and run a finish.m script.

On Unix platforms, typing matlab -h will give a listing of command line options that allow control over how MATLAB is opened. On Mac OS X use:

/Applications/MATLAB6p5/bin/matlab -h

The options of greatest use are -nodesktop, -nojvm and -nosplash. If you are using MATLAB by connecting to a remote unix machine with either a poor connection or no X windows support, these will launch a bare bones MATLAB environment that runs a lot faster than the one with full graphical user interface support. Note that on a Mac OS X machine it is possible to get this brief version of MATLAB by typing

/Applications/MATLAB6p5/bin/matlab -nodesktop -nosplash -nojvm

(this assumes that MATLAB was installed in the default location on the disk).

NOTE: MATLAB on Mac OS X will start X11 before starting MATLAB. It is very important that the X11 application stay running for the entire duration MATLAB is running. If the X11 application is closed before MATLAB is quit, you will not be able to run MATLAB any further and will have to shut it down.

1.4           Basic MATLAB syntax

MATLAB requires that all variable names be assigned numerical values prior to being used. Typing the name (say x), then the equals sign (=), followed by the numerical value (e.g. 5) and finally Enter assigns the numerical value to the variable (in our example, 5 is assigned to x).

For example:

>> p=7.1
p =
    7.1000

A semicolon at the end of the expression typed by the user suppresses the system’s echoing of entered data

>> p=7.1;


It is also possible to combine expressions separated by semicolons or commas. Depending on whether a semicolon or a comma is used, different things are echoed by the system.

>> p=7.1; x=4.92;

>> p=7.1, x=4.92;
p =
    7.1000

>> p=7.1, x=4.92
p =
    7.1000

x =
    4.9200

>> p=7.1; x=4.92
x =
    4.9200

The arithmetic operators in MATLAB are addition (+), subtraction (-), multiplication (*), division (/) and exponentiation (^). For example the equation

t = (1/(1 + px))^k

is written in MATLAB as: t = (1/(1+p*x))^k

Some useful keys to remember are:

  • The ↑ key scrolls through previously typed commands. To recall a particular entry from the history, type the first few letters of the entry and then press the ↑ key.
  • The ← and → keys allow you to edit the previously typed command
  • The ESC key clears the command line.
  • Ctrl+C quits the current operation and returns control to the command line.

If you need to shut down MATLAB in the middle of work, it is possible to save your work with the save command. This writes out a file called matlab.mat to the current directory. When MATLAB is restarted, you can load your work back with the load command. This loads all the variables you defined in your last session into the current session, and you can continue your work.

>> p=7.1
p =
    7.1000

>> save
Saving to: matlab.mat
>> quit

Restart MATLAB and then say the following:

>> load

Loading from: matlab.mat

>> p

p =
    7.1000


Note that you now have p defined in the new session. Note that this will not work if you do not have write permission in your current directory:

>> cd /etc

>> save

??? Error using ==> save

Unable to write file matlab.mat: permission denied.

1.5           Where to get help

MATLAB comes with an enormous amount of help. You can type help at the command line; typing help followed by some keyword or function will give detailed help on that function. If you are not running MATLAB with the -nodesktop option you can view a large set of demos by typing demo.

There is a lot of material online at the web site of MathWorks (the makers of MATLAB).

In addition you can email for assistance or post a message in the forum.

2           MATLAB Desktop and Examples

2.1           MATLAB Desktop

When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user interfaces) for managing files, variables, and applications associated with MATLAB.

The first time MATLAB starts, the desktop appears as shown in Figure 1.


Figure 1: The MATLAB desktop

You can change the way your desktop looks by opening, closing, moving, and resizing the tools in it. Use the View menu (see Figure 2) to open or close the tools.

Figure 2: The MATLAB desktop: View menu

You can also move tools outside the desktop or move them back into the desktop (docking), see the Figure 3.

Note that these screenshots are from a linux machine. They may look different in a Mac OS X machine or a Windows machine.

In particular on a Mac OS X machine the File, View, etc. menu is not at the top of the MATLAB desktop but (like all other Mac OS X applications) at the top of the screen, as is shown in the Figure 4.

On a Windows XP machine the MATLAB desktop looks as in Figure 5.

You can specify certain characteristics for the desktop tools by selecting Preferences from the File menu. For example, you can specify the font characteristics for Command Window text. For more information, click the Help button in the Preferences dialog box.

In addition to the File menu at the top MATLAB also has a Start button at the bottom left corner, that has shortcuts to some commonly used activities as shown in Figure 6

Figure 3: The MATLAB desktop: Docking menu

2.2           Solution of linear system

We will now focus on some examples to demonstrate how to work with MATLAB. First we look at the solution of a system of linear algebraic equations. Note that the purpose of this section is to demonstrate the power and ease of use of MATLAB; we will look at the specifics of MATLAB in much greater detail in the later sections.

Consider a system of n equations with n unknowns xk, k = 1, 2, …, n:

a11 x1 + a12 x2 + ··· + a1n xn = b1
a21 x1 + a22 x2 + ··· + a2n xn = b2
  ⋮
an1 x1 + an2 x2 + ··· + ann xn = bn

We can rewrite this in matrix notation as Ax = b, where A is the n × n matrix of the coefficients Aij.

Figure 4: The MATLAB desktop: Mac OS X


Here x = (x1, x2, …, xn)ᵀ is the column vector of unknowns and b = (b1, b2, …, bn)ᵀ is the right-hand side vector.

The symbolic solution is obtained by multiplying both sides by the inverse A⁻¹:

A⁻¹Ax = A⁻¹b,  so  x = A⁻¹b

since A⁻¹A = I and Ix = x, where I is the identity matrix (ones on the diagonal and zeros everywhere else).

In MATLAB A\b is equivalent to inv(A)*b (where inv(A) calculates the inverse of the matrix A), but it is computed without inverting A. This results in significantly reduced computational effort and time. So in MATLAB the way to solve this set of equations is as simple as x = A\b!

An example of a linear system is the following set of equations:

8x1 + x2                       =              7

3x1 + 5x2                    =              4

In this case the matrix A is:

A = [8 1; 3 5]

Figure 5: The MATLAB desktop: Windows XP

Clearly the determinant of A is 37, and so the inverse of A is:

A⁻¹ = (1/37)·[5 −1; −3 8] = [0.1351 −0.0270; −0.0811 0.2162]

The solution then becomes:

x = A⁻¹b = (1/37)·[5 −1; −3 8]·[7; 4] = (1/37)·[31; 11] = [0.8378; 0.2973]

If we solve the same problem in MATLAB

>> A=[8 1; 3 5];

>> b=[7 4]';

>> x=A\b
x =
    0.8378
    0.2973

We can confirm that we get the same answer with considerably greater facility of use.

It can be confirmed that x is the solution of the equations, by multiplying it with A

>> z=A*x
z =
    7.0000
    4.0000

Figure 6: The MATLAB desktop: Start button

which is identical to b. Look at the section on Matrices for details on matrix operations.
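The same check can be run outside MATLAB. Here is a NumPy sketch (my own illustration, not part of the workshop notes) that solves the identical system and verifies the round trip:

```python
import numpy as np

# The 2x2 system from the example: 8*x1 + x2 = 7, 3*x1 + 5*x2 = 4
A = np.array([[8.0, 1.0], [3.0, 5.0]])
b = np.array([7.0, 4.0])

# np.linalg.solve plays the role of MATLAB's backslash operator:
# it solves Ax = b without explicitly forming inv(A).
x = np.linalg.solve(A, b)
print(x)        # approximately [0.8378, 0.2973]

# Multiplying back recovers b, confirming the solution.
print(A @ x)    # approximately [7.0, 4.0]
```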

2.3           Solution of linear differential system

MATLAB has a number of functions to solve first order differential equations. One in particular is ode45, which solves non-stiff differential equations (non-stiff meaning the solutions evolve on a single timescale). The function ode45 solves a differential equation of the form:


dyi/dt = fi(y1, y2, …, yn),    i = 1, 2, …, n

over the interval t0 ≤ t ≤ tf subject to the initial conditions yj(t0) = aj,j = 1,2,…,n, where aj are constants. The usage of the ode45 are as follows:

[t, y] = ode45(@FunctionName, [t0 tf], [a1 a2 … an]’, …

options, p1, p2, …)

In the above, [t, y] denotes that ode45 returns two results. The first is a column vector of the times in the range [t0 tf] that are determined by ode45, and the second output y is the matrix of solutions, such that the rows are the solutions at the time ti in the corresponding row of the first output t. Also, @FunctionName is the handle for the name of the function file FunctionName (ignoring the .m at the end of the file) that represents the array of functions forming the right hand side of the equations. Its form must be: function yprime=FunctionName(t, g, p1, p2, …) where t is the independent variable, g is the vector representing yj, and p1, p2, etc. are parameters.

Consider the following second order ordinary differential equation, which could represent a forced damped oscillator.

d²y/dt² + 2ξ (dy/dt) + y = h(t)

Let us now make the substitution

y1 = y,    y2 = dy/dt
Then the second order equation can be replaced by two first order equations: dy1/dt = y2 and dy2/dt = −2ξ y2 − y1 + h(t).

Assume that ξ = 0.15 and that we start at time t0 = 0 and end at tf = 35. At t0 the displacement and the velocity are both zero, viz. y1(t0) = 0 and y2(t0) = 0. Finally we assume h(t) = 1.

First create the function which returns the array of right hand side functions (in this case a two element column vector).

function y=ForcingFunction(t, w, xi)
% ForcingFunction - return the right hand side of the linear differential system
% ForcingFunction takes in the time t, vector w, and the constant xi. The
% vector w gives the values of the dependent variables at the current time.
y = [w(2); -2*xi*w(2) - w(1) + 1];

Save this as a file ForcingFunction.m. This file may be created using MATLAB's own editor as displayed in Figure 7.

Figure 7: Matlab’s built in editor

Then run the following commands:

>> [tt, yy] = ode45(@ForcingFunction, [0 35], [0 0]’, [], 0.15) ;

>> plot(tt, yy(:, 1))

>> xlabel(’Time’);

>> ylabel(’y(Time)’);

You should get Figure 8, displaying the displacement y(t) of the oscillator with time t. As expected, the asymptotic value is the forcing value of 1, and the oscillator displays a transient with a damping related to the constant ξ.


Figure 8: Solution of the differential system
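The same oscillator can be integrated outside MATLAB. The following pure-Python sketch (my own illustration, not from the notes) uses a fixed-step RK4 integrator standing in for ode45's adaptive scheme:

```python
# Damped forced oscillator: y'' + 2*xi*y' + y = 1, integrated by classic RK4.
XI = 0.15

def forcing_function(t, w):
    # First-order system: w[0] = y, w[1] = dy/dt, forcing h(t) = 1.
    return [w[1], -2.0 * XI * w[1] - w[0] + 1.0]

def rk4(f, t0, tf, w0, h=0.01):
    """Fixed-step fourth-order Runge-Kutta from t0 to tf."""
    t, w = t0, list(w0)
    while t < tf - 1e-12:
        k1 = f(t, w)
        k2 = f(t + h/2, [wi + h/2*ki for wi, ki in zip(w, k1)])
        k3 = f(t + h/2, [wi + h/2*ki for wi, ki in zip(w, k2)])
        k4 = f(t + h,   [wi + h*ki   for wi, ki in zip(w, k3)])
        w = [wi + h/6*(a + 2*b + 2*c + d)
             for wi, a, b, c, d in zip(w, k1, k2, k3, k4)]
        t += h
    return w

y_final, v_final = rk4(forcing_function, 0.0, 35.0, [0.0, 0.0])
print(y_final)  # settles towards the forcing value of 1
```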

2.4           Fourier series analysis

Any real valued periodic function f(x) can be represented as an infinite Fourier series, a sum of sin and cos functions:

f(x) = a0/2 + Σ (n = 1 to ∞) [ an cos(nπx/L) + bn sin(nπx/L) ]

We can use the orthogonality properties of sin and cos to get the following relations:

a0 = (1/L) ∫ from −L to L of f(x′) dx′
an = (1/L) ∫ from −L to L of f(x′) cos(nπx′/L) dx′
bn = (1/L) ∫ from −L to L of f(x′) sin(nπx′/L) dx′

where 2L is the periodicity of the function. Now let us look at the square wave function, defined as:

f(x) = h for 0 < x < π,    f(x) = 0 for π < x < 2π

Then using the above relations for the constants an and bn (here 2L = 2π) we have:

f(x) = h/2 + (2h/π) [ sin(x) + sin(3x)/3 + sin(5x)/5 + ··· ]

The square wave function can be represented in MATLAB using the following function mysquare.m:

function f=mysquare(x, h)
% mysquare -- a square hat function
% Given a vector x, mysquare returns a vector f
% which is the square hat function of height h
% and periodicity of 2*pi
xmod = mod(x, 2*pi);
f = h*(xmod <= pi);


Note the use of the vectorized logical comparison above: h*(xmod <= pi) is h wherever the condition holds and 0 elsewhere. The code to calculate the function f(x) using the Fourier sum is fouriersquare.m:

function f=fouriersquare(x, h, n)
% fouriersquare - Fourier series fit for a square hat function
% fouriersquare takes in the vector x, the height h and the no.
% n of terms in the expansion and returns a fourier fit to the
% square hat function
f = 0.5*h*ones(size(x)); % zeroth order term
sum = zeros(size(x));
for i=1:2:n % include only the odd terms
    sum = sum + sin(i*x)/i;
end
f = f + 2*h*sum/pi;


Using the following set of commands we can view the quality of the fit in Figure 9.


>> x=0:0.1:5*pi;

>> p1=plot(x, mysquare(x, 1), ’k’);

>> hold on

>> p2=plot(x, fouriersquare(x, 1, 10), ’r’);

>> p3=plot(x, fouriersquare(x, 1, 50), ’b’);

>> p4=plot(x, fouriersquare(x, 1, 200), ’g’);

>> legend([p1, p2, p3, p4], ’Exact’, ’n=10’, ’n=50’, ’n=200’); xlabel(’\theta’)

>> ylabel(’f(\theta)’)

>> title(’Fourier fitting square wave’)


Figure 9: Fourier series fit of the square wave

As can be seen, the fit gets closer as the number of terms increases, but even in sums with very high numbers of Fourier terms there remains a ringing near the edges of the square wave. This is a well known phenomenon and can be understood as a natural problem occurring when trying to fit a discontinuous function with a sum of trigonometric functions, all of whose higher derivatives are continuous.
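The same partial sums are easy to reproduce outside MATLAB. This Python sketch (my own illustration, not from the notes) evaluates the series at two instructive points: mid-plateau, where it converges to h, and at the jump, where it converges to the midpoint h/2.

```python
import math

def fourier_square(x, h, n):
    """Partial Fourier sum for the square wave of height h:
    f(x) = h/2 + (2h/pi) * sum over odd i <= n of sin(i*x)/i."""
    s = sum(math.sin(i * x) / i for i in range(1, n + 1, 2))
    return 0.5 * h + 2.0 * h * s / math.pi

# In the middle of the "high" half the series converges to h ...
print(fourier_square(math.pi / 2, 1.0, 201))
# ... and at the discontinuity x = pi it converges to the midpoint h/2.
print(fourier_square(math.pi, 1.0, 201))
```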

2.5           Taylor series expansion

Most well-behaved functions (viz. possessing derivatives of any order) can be expressed as a polynomial series expansion about some location a:

f(x) = Σ (n = 0 to ∞) [ f⁽ⁿ⁾(a) / n! ] (x − a)ⁿ

In the above, f⁽ⁿ⁾ is the nth order derivative of f(x), i.e.:

f⁽ⁿ⁾(x) = dⁿf/dxⁿ

and n! is of course the factorial n(n−1)(n−2)···2·1 of n. This is known as the Taylor series expansion.

Consider for example sin(x). This can be expanded about x = 0 as:

sin(x) = x − x³/3! + x⁵/5! − x⁷/7! + …

This result is easily derived if one recalls that all even derivatives of sin(x) are ±sin(x) and so must vanish at x = 0, while all odd derivatives of sin(x) are ±cos(x), which equals 1 (or −1) at x = 0.

We can then use MATLAB to create a function to calculate the series for a given number of terms n. The function taylorsine (in the file taylorsine.m) is:


function y=taylorsine(x, n)
% taylorsine - Calculates the taylor series approximation to the sine function
% taylorsine takes in the value of x and number n of terms to sum to and
% returns the value of the fit y
sum = 0;
for m=1:2:n % pick each odd term to n
    sign = (-1)^((m-1)/2); % sign of each term
    yterm = sign*x.^m/factorial(m);
    sum = sum + yterm;
end
y = sum;
return

Then the following series of commands on MATLAB

>> x=1:1:15;

>> format long
>> for i=1:1:15
     y(i)=taylorsine(pi/4,i);
     ex(i)=sin(pi/4);
   end

>> hold off

>> p1=plot(x, y, ’ro’);

>> hold on

>> p2=plot(x, ex, ’k+’);

>> xlabel(’n’);

>> ylabel(’sin(\pi/4)’);

>> legend([p1, p2], ’Taylor series’, ’Exact’);
>> title(’Taylor series approximation’);

produces Figure 10, which displays the rapidity with which the approximation approaches the actual value of sin(π/4).


Figure 10: Taylor’s approximation

This surprising result can be visualized better if we plot the sin function and the Taylor’s series to different terms. Use the following code:

>> x=-pi:0.1:pi;

>> y=sin(x);

>> y1=x;

>> y2=x-x.^3/factorial(3);

>> y5=taylorsine(x, 5) ;

>> y15=taylorsine(x, 15) ;

>> hold off;

>> p=plot(x, y, ’k-’);

>> hold on

>> p1=plot(x, y1, ’k.’);

>> p2=plot(x, y2, ’k+’);

>> p5=plot(x, y5, ’k*’);

>> p15=plot(x, y15, ’kx’);

>> xlabel(’\theta’);

>> ylabel(’sin(\theta)’);

>> legend([p, p1, p2, p5, p15], ’Exact’, ’1 term’, ’2 terms’, ’5 terms’, …

’15 terms’);

>> title(’Taylors series fit of sin(\theta)’);

Note that we used the function taylorsine defined above to calculate the series fits for 5 terms and 15 terms. This results in the Figure 11.


Figure 11: Taylor series expansion of Sin(θ)

As noticed before, up to about 45° the linear fit is quite good! After that, the linear fit rises too quickly. Adding the second term (which is negative) brings down the contribution of the linear term, but it does so too fast! Adding the next term brings the fit back up above the exact curve, and so on, until at about 15 terms the fit looks quite good over the range −π < θ < π.
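The convergence described above is easy to cross-check. This Python snippet (my own illustration, not part of the workshop) mirrors the taylorsine logic and compares against the library sine:

```python
import math

def taylor_sine(x, n):
    """Sum the odd terms x^m / m! with alternating signs, up to order n."""
    total = 0.0
    for m in range(1, n + 1, 2):
        sign = (-1) ** ((m - 1) // 2)
        total += sign * x ** m / math.factorial(m)
    return total

# Fifteen terms already reproduce sin(pi/4) to high accuracy.
print(taylor_sine(math.pi / 4, 15), math.sin(math.pi / 4))
```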

3           Matrices and vectors

An array A of m rows and n columns is called a matrix of order (m × n). The elements of A are referred to as Aij where i is the row number and j is the column number. The simplest way of entering the matrix in MATLAB is by entering it explicitly.

To enter the matrix, simply type in the Command Window

>> A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

A =
    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1
The order of the matrix A is determined with:

>> size(A)
ans =
     4     4

Note that the function size returns two values. It is possible to assign these values to variables as follows:

>> [m, n] = size(A)
m =
     4

n =
     4


Note that to enter the matrix as a list of its elements you only have to follow a few basic conventions:

  • Separate the elements of a row with spaces or commas.
  • Use a semicolon, ;, to indicate the end of each row.
  • Surround the entire list of elements with square brackets, [ ].

It is possible to mix spaces and commas when declaring a matrix as shown below

>> A = [16, 3, 2, 13; 5, 10 11, 8; 9, 6 7, 12; 4, 15, 14 1]

A =

    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1

But this can get very hard to read.

Vectors are just a special case of matrices. If m = 1, then A is a row vector. Similarly, if n = 1, then A is a column vector.

The distinction between row and column vectors is important because of the rules for multiplying vectors and matrices. For example, suppose you have a matrix A, a column vector c and a row vector r. Only the products A*c and r*A are allowed. This can be seen in MATLAB as follows:

>> c=[3 2 1 4]';

>> r=[3 2 1 4];

>> r*A

ans =

    83    95    91    71

>> A*c

ans =

   108
    78
    94
    60

>> A*r

??? Error using ==> *

Inner matrix dimensions must agree.

>> c*A

??? Error using ==> *

Inner matrix dimensions must agree.

3.1           Transpose of matrices and vectors

A transpose of a matrix is defined as follows:

T  3    4  =

1            2

1       3              5

2       4              6

5            6   

In general, the transpose AT of a matrix A with elements Aij is simply the matrix with elements Aji.

In MATLAB the ' operator takes the transpose of a matrix or a vector. Transposing a row vector turns it into a column vector and vice versa. For example, we can take our column vector c from above and transpose it to get a row vector.

>> cr=c';

>> cr*A

ans =

    83    95    91    71

3.2           Creating vectors

There are several ways of creating vectors that can be very useful. The simplest and probably most commonly used method of creating a vector uses the colon notation x = s:d:f

where s is the start of vector, d is the increment (or decrement) between the elements of the vector and f is the last element of the vector. Obviously this can be used when the elements of the vector are equispaced. For example:

>> x=0:0.3:pi;

>> x'

ans =

         0
    0.3000
    0.6000
    0.9000
    1.2000
    1.5000
    1.8000
    2.1000
    2.4000
    2.7000
    3.0000

The size of the vector can be obtained from length(x).

>> length(x)

ans =

    11
If d is omitted MATLAB assumes an increment of 1.

>> x=0:pi

x =

     0     1     2     3

On the other hand to specify n equally spaced intervals use the following:

>> x=linspace(0, pi, 7)

x =

0     0.5236      1.0472      1.5708      2.0944      2.6180      3.1416

In this case the increment (or decrement) is (f − s)/(n − 1).

To specify equal spacing in logarithm space use the following:

>> logspace(1, 2, 7)

ans =

10.0000 14.6780 21.5443 31.6228 46.4159 68.1292 100.0000

In this case MATLAB creates the vector [10^s, 10^(s+d), ..., 10^f], where d = (f − s)/(n − 1). Note that if f is π then the elements of the vector are numbers between 10^s and π; in that case the interval d is (log10(π) − s)/(n − 1).

Of course it is possible to explicitly write out the matrices as we have seen before. It is also possible to create vectors from matrices as will be shown later.

3.3           Creating matrices

The easiest way of creating matrices is as described before, by listing members explicitly.

>> A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]

A =

    16     3     2    13
     5    10    11     8
     9     6     7    12
     4    15    14     1

It is also possible to create a matrix from a group of row vectors. For example

>> v_1 = [1 2 3] ;

>> v_2 = [4 5 6] ;

>> v_3 = [7 8 9] ;

>> A = [v_1; v_2; v_3]

A =

     1     2     3
     4     5     6
     7     8     9

The order of A is 3 × length(v_1).

>> size(A)

ans =

     3     3

>> length(v_1)

ans =

     3
In addition there are a few utility routines to create matrices:

  • zeros(m, n): a matrix with all zeros of order m × n.
  • ones(m, n): a matrix with all ones.
  • eye(m, n): the identity matrix (ones along the diagonal, zeros everywhere else).
  • rand(m, n): uniformly distributed random elements.
  • randn(m, n) : normally distributed random elements.
  • magic(m): a square matrix whose rows, columns and diagonals each sum to the same value. An example:

>> magic(3)

ans =

     8     1     6
     3     5     7
     4     9     2
  • pascal(m): a Pascal matrix. An example would be:

>> pascal(3)

ans =

     1     1     1
     1     2     3
     1     3     6
3.4           Basic matrix operations

You have already seen the transpose operator ' above. In addition, the following operations are possible on matrices:

  • ^: exponentiation
  • *: multiplication
  • /: division
  • \: left division. The operation A\B is effectively the same as inv(A)*B, although left division is calculated differently (without explicitly forming the inverse).
  • +: addition
  • -: subtraction
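As a small sketch of the difference, both of the following solve the linear system A*x = b; the variable names are illustrative:

```matlab
% solve A*x = b two ways
A = [2 1; 1 3];
b = [3; 5];

x1 = inv(A)*b;   % explicit inverse (works, but discouraged)
x2 = A\b;        % left division: solves the system directly

% both give x = [0.8; 1.4]
```

For large or badly conditioned systems the left-division form is the one to prefer.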

One very important thing to note is the automatic promotion of scalars. For example, when adding a scalar x to a matrix A of order m × n, the scalar is promoted to a matrix of order m × n with every element equal to the original scalar.

>> w = [1 2; 3 4] + 5

w =

     6     7
     8     9

There is also a set of operations that apply to matrices on an element-by-element basis. These are called array operations. Examples are:

  • .’ : array transpose
  • .^ : array power
  • .* : array multiplication
  • ./ : array division

It is very important to distinguish between these. In the example below with two 2 × 2 matrices, a matrix multiplication * and an array multiplication .* result in completely different matrices.

>> A=[1 2; 3 4];

>> B=[5 6; 7 8];

>> A*B

ans =

    19    22
    43    50

>> A.*B

ans =

     5    12
    21    32

3.5           Indexing into a matrix

Indices in MATLAB follow the Fortran convention of starting at 1 and going up to the order of the matrix. So we have the following:

>> A=rand(2)

A =

0.9501   0.6068
0.2311   0.4860

>> A(2,2)

ans =

    0.4860

It is also possible to use a single index, which runs down a column first (top to bottom) and then moves left to right across columns.

>> A(4)

ans =

    0.4860

In other words it is possible to refer to the element Aij as A(i, j) or as A((j-1)*m+i), where m is the number of rows of the matrix.
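A quick sketch checking the equivalence of the two indexing forms (using magic(4) as an arbitrary example):

```matlab
% A(i,j) and the single (column-major) index refer to the same element
A = magic(4);
[m, n] = size(A);
i = 3; j = 2;
isequal(A(i, j), A((j-1)*m + i))   % returns 1 (true)
```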

A very powerful operator in indexing into a MATLAB matrix is the : operator. For example:

>> A(:,end)

ans =

    0.6068
    0.4860

gives the last column of the matrix. Or

>> A(1:2,1:1)

ans =

    0.9501
    0.2311

gives the first column (1:1) and both rows (1:2). It can now be seen that it is possible to create vectors from the rows and columns of a matrix as follows:

>> r=A(1:1, 1:2)

r =

    0.9501   0.6068

>> c=A(1:2, 1:1)

c =

    0.9501
    0.2311
MATLAB has a lot more information about matrices and the kinds of operations you can do with them. To read it, click on the Help link at the top of the desktop (on Mac OS X it is at the top of the screen). Then select the Contents view and click on the word MATLAB. If you see a small "+" sign to the left of MATLAB, click it to open the documentation tree. Then click on the "+" sign to the left of Mathematics and click on Matrices and Linear Algebra.

4           Graphics

4.1           2-D plots

The basic 2-D plotting routine in MATLAB is plot(xdata, ydata, 'color-linestyle-marker'), where the string combines a color, a line style and a marker. For example:

>> x=-5:0.1:5;

>> sqr=x.^2;

>> pl1=plot(x, sqr, 'r:s');

produces Figure 12.



Figure 12: A simple 2D plot

To plot a second curve on top of an existing plot, use hold on. This is demonstrated in Figure 13. Conversely, hold off makes the next plot command replace the contents of the current figure rather than overlaying them.

>> cub=x.^3;

>> hold on

>> pl2=plot(x, cub, 'k-o');

MATLAB allows the annotation of the plots with a few keywords.

>> title('Demo plot');

>> xlabel('X Axis');

>> ylabel('Y Axis');

>> legend([pl1, pl2], 'x^2', 'x^3');

produces Figure 14.

Figure 13: A 2D plot displaying overlay

4.2           3-D plots

It is possible to draw 3-D line plots exactly the same way as 2-D plots using plot3(x, y, z), where x, y and z are vectors of the same length. For example the following:

>> z=0:0.1:40;

>> x=cos(z);

>> y=sin(z);

>> pl=plot3(x, y, z);

produces Figure 15.

A far more powerful set of 3D plotting functions are those that create surfaces, contours and so on. The basic surface plotting routines are surf and mesh. If we have a surface defined by z = f(x, y) then the surface plot is generated by surf(x, y, z). For example the following code:

>> xx1=linspace(-3, 3, 15);

>> xx2=linspace(-3, 13, 17);

>> [x1, x2] = meshgrid(xx1, xx2);

>> z=x1.^4+3*x1.^2-2*x1+6-2*x2.*x1.^2+x2.^2-2*x2;

>> pl=surf(x1, x2, z);

results in Figure 16.

The possibilities of complex plots are quite enormous. To see the capabilities of MATLAB look at the graphics demos. To do this click on Help at the top


Figure 14: A 2D plot with annotations

of the desktop, as usual. Then click on the word Demos on the top left. Then click the "+" sign to the left of MATLAB, and then the "+" sign to the left of Graphics. Try any one of the demos listed. Particularly attractive ones are Teapot, Viewing a Penny and Earth's Topography.

4.3           Tables

Another important related need when presenting data is to produce tables. Frequently summary data can be most easily read when presented in the form of a table. This allows the most comparable numbers to be visible next to each other for quick numerical comparison. This can be easily accomplished by using the file I/O routines.

Consider for example finding the roots of a function, that is, the values where the function is equal to zero. Let us take as an example the function

f(x) = x − 12x^(1/3) + 12

Let us use a fixed point method to find the roots. This is an iterative method where x_{k+1} = g(x_k), with g chosen so that a fixed point satisfies x* = g(x*), which in turn gives f(x*) = 0 (for example, g(x) = x + f(x) is one such choice). For the

Figure 15: A simple 3D plot

function of interest let us pick three iteration functions g(x):

g1(x_k) = 12 x_k^(1/3) − 12

g2(x_k) = ((x_k + 12)/12)^3

g3(x_k) = (8 x_k^(1/3) − 12)/(1 − 4 x_k^(−2/3))

The first function can be got by setting f(x) = 0 and separating out the first x term. The second function comes from separating out the x^(1/3) term. The last function is just a good guess (it is in fact the Newton-Raphson update x − f(x)/f'(x)).

If we plot the function f(x) we will easily see where the zeros are. We can do this using the following commands:

>> x=linspace(0.1, 30, 50); % generate 50 equispaced points between 0.1 and 30

>> f=x-12*x.^(1/3)+12; % calculate the function value at each x (note the array op.)

>> z=zeros(length(x), 1); % a column vector of the same size as x, full of zeros

>> plot(x, f, 'b-', x, z, 'k-'); % plot x vs. f (blue line) and x vs. z (black line)

>> xlabel('x');

>> ylabel('f(x)');

This produces Figure 17.

As can be seen, the function f(x) has two roots, one between 1 and 2 and another between 21 and 22. The root finding algorithm is very simple. Given a choice of g(x),

Figure 16: A 3D surface plot

>> xguess=1; % start with an initial guess of 1

>> g=xguess;

>> f=g-12*g.^(1/3)+12;

>> err=norm(f);      % (\sum_k f(x_k)^2)^(1/2)

>> tol=1.e-5;

>> while (err>tol) % iterate until the error is small
g=12*g.^(1/3)-12;
f=g-12*g.^(1/3)+12;
err=norm(f);
disp(sprintf('Error = %f', err));
end


Error = 12.000000

Error = 27.473142

Error = 18.106115

Error = 8.829625

Error = 4.256514

Error = 2.108523

Error = 1.067871

Error = 0.548209

Error = 0.283578

Error = 0.147292

Error = 0.076671

Error = 0.039956

Error = 0.020835

Error = 0.010868

Error = 0.005670


Figure 17: The roots of the function f(x)

Error = 0.002958

Error = 0.001543

Error = 0.000805

Error = 0.000420

Error = 0.000219

Error = 0.000114

Error = 0.000060

Error = 0.000031

Error = 0.000016

Error = 0.000008

>> g

g =

  21.2248 + 0.0000i

What we need, though, is to compare the results of the three functions and check on the convergence as we iterate. This is where a tabular representation of data can be particularly useful. We will use the m-file fixed_point2.m.

function fixed_point2(xguess)
% function: fixed_point2
% find successive approximations to f(x) = x-12*x^(1/3)+12
% Input: starting point for x in the iterative scheme
% Output: table for the different iteration functions g

n=10;  % no. of iterations
% open a file
fid = fopen('succ_approx.txt', 'wt');
% create a string and print it, both on the screen and into the file
line = sprintf('%4s %15s %15s %15s %15s %15s %15s', 'k', 'g1', 'f(g1)', ...
    'g2', 'f(g2)', 'g3', 'f(g3)');
disp(line);                  % print to screen
fprintf(fid, '%s\n', line);  % print to file
% start the iteration
g1=xguess; g2=xguess; g3=xguess;
line = sprintf('%4.0f %15.10f %15.10f %15.10f %15.10f %15.10f %15.10f', ...
    0, g1, fx(g1), g2, fx(g2), g3, fx(g3));
disp(line); fprintf(fid, '%s\n', line);
for k=1:n
    g1=12*g1.^(1/3)-12;
    g2=((g2+12)/12).^3;
    g3=(8*g3^(1/3)-12)/(1-4*g3^(-2/3));
    line = sprintf('%4.0f %15.10f %15.10f %15.10f %15.10f %15.10f %15.10f', ...
        k, g1, fx(g1), g2, fx(g2), g3, fx(g3));
    disp(line); fprintf(fid, '%s\n', line);
end
fclose(fid);
end

function fx=fx(x)
% calculate the function given x
fx=x-12*x.^(1/3)+12;
end

We can now call the function fixed_point2 from the MATLAB prompt with an argument giving the initial guess for the root.

>> fixed_point2(1)

(a table of k, g1, f(g1), g2, f(g2), g3 and f(g3) for ten iterations is printed to the screen)
In addition a file succ_approx.txt is created which looks like the following:

>> type succ_approx.txt

(the file contains the same table as was printed to the screen)

It is now easy to see from the table that the function g3 converges very rapidly to the lower root, while g1 and g2 converge much more slowly to the higher and lower roots respectively.

5           Programming with MATLAB

5.1           Using m-files

MATLAB provides a full programming language that enables you to write a series of MATLAB statements into a file and then execute them with a single command. You write your program in an ordinary text file, giving the file a name of filename.m. The term you use for filename becomes the new command that MATLAB associates with the program. The file extension .m makes this a MATLAB M-file.

M-files can be scripts that simply execute a series of MATLAB statements, or they can be functions that also accept arguments and produce output. You create M-files using a text editor, then use them as you would any other MATLAB function or command.

The process looks as displayed in Figure 18.

Figure 18: Steps in using a m-file

What goes in a M-file?

function f = fact(n)                    % Function definition line
% FACT Factorial.                       % H1 line
% FACT(N) returns the factorial of N,   % Help text
% usually denoted by N!
% Put simply, FACT(N) is PROD(1:N).
f = prod(1:n);                          % Function body

This function has some elements that are common to all MATLAB functions:

  • A function definition line. This line defines the function name, and the number and order of input and output arguments.
  • An H1 line. H1 stands for "help 1" line. MATLAB displays the H1 line for a function when you use lookfor or request help on an entire directory.
  • Help text. MATLAB displays the help text entry together with the H1 line when you request help on a specific function.
  • The function body. This part of the function contains code that performs the actual computations and assigns values to any output arguments.

5.2           Scripts

Scripts are the simplest kind of M-file because they have no input or output arguments. They’re useful for automating series of MATLAB commands, such as computations that you have to perform repeatedly from the command line. Scripts operate on existing data in the workspace, or they can create new data on which to operate. Any variables that scripts create remain in the workspace after the script finishes so you can use them for further computations.

The following demonstrates a simple script m-file. These statements calculate ρ for several trigonometric functions of θ, then create a series of polar plots.

% An M-file script to produce     % Comment lines
% "flower petal" plots
theta = -pi:0.01:pi;              % Computations
rho(1,:) = 2*sin(5*theta).^2;
rho(2,:) = cos(10*theta).^3;
rho(3,:) = sin(theta).^2;
rho(4,:) = 5*cos(3.5*theta).^3;
for k = 1:4
    polar(theta, rho(k,:))        % Graphics output
    pause
end

Try entering these commands in an M-file called petals.m. This file is now a MATLAB script. Typing petals at the MATLAB command line executes the statements in the script. In this case it will cycle through four plots. The pause command causes MATLAB to wait after drawing each figure until a key is pressed. After the script displays a plot, press Return to move to the next plot. There are no input or output arguments; petals creates the variables it needs in the MATLAB workspace. When execution completes, the variables (k, theta and rho) remain in the workspace. To see a listing of them, enter whos at the command prompt. You can also see the variables listed in the workspace window if you have that open. Note that if you click on a variable listed in the workspace you open the Array Editor, which displays and allows you to edit the variable array.

5.3           Functions

Functions are M-files that accept input arguments and return output arguments. They operate on variables within their own workspace. This is separate from the workspace you access at the MATLAB command prompt, as will be explained in more detail in the next section.

The myaverage function shown below is a simple M-file that calculates the average of the elements in a vector.

function y = myaverage(x)
% myaverage Mean of vector elements.
% myaverage(X), where X is a vector, is the mean of the vector elements.
% Nonvector input results in an error.
[m,n] = size(x);
if (~((m == 1) | (n == 1)) | (m == 1 & n == 1))
    error('Input must be a vector')
end
y = sum(x)/length(x);       % Actual computation


Enter these commands in an M-file called myaverage.m. The myaverage function accepts a single input argument and returns a single output argument. To call it, enter

>> x=1:99;

>> myaverage(x)

ans =

    50
5.4           Program flow control

MATLAB has four basic flow control structures in programming: while, if, for, and switch. Each of these control elements must have a matching end keyword later in the program.

Logic control structures are:

if/elseif/else
switch/case/otherwise

Iterative loop structures are:

for
while

An example of the if, elseif programming is as follows:

if i==j
    A(i, j) = 2;   % called only when i is equal to j
elseif abs(i-j)==1
    A(i, j) = -1;  % called only when i and j differ by 1
else
    A(i, j) = 0;   % all other situations
end

The above assigns a tri-diagonal matrix to A. Similarly, an example of switch is:

switch algorithm     % switch depending on the value of the variable "algorithm"
case 'ode23'
    str = '2nd/3rd order';
case {'ode15s', 'ode23s'}
    str = 'stiff system';
otherwise
    str = 'other algorithm';
end
Note that, unlike most other languages, there is no need for a break statement. Also switch is more efficient than if when comparing string arguments.

A simple iterative loop using for is:

n=10;
for i=1:n
    for j=1:n
        a(i, j) = 1/(i+j-1);
    end
end

Because MATLAB is designed to work with matrices it is possible to dramatically speed up a loop, and when done correctly the code can become more readable in the process as well. The following displays the traditional way of writing a loop over an m × n matrix:

mass = rand(5, 10000);
length = rand(5, 10000);
width = rand(5, 10000);
height = rand(5, 10000);

[m, n] = size(mass);
for i=1:m
    for j=1:n
        density(i, j) = mass(i, j) / (length(i, j)*width(i, j)*height(i, j));
    end
end
Using MATLAB "vector" notation the above piece of code becomes:

density = mass ./ (length .* width .* height);
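The speedup claim can be checked with tic and toc. The sketch below times both versions (exact timings will vary by machine; note it deliberately avoids shadowing the built-in length by using shorter names):

```matlab
% sketch: timing the loop version against the vectorized version
mass = rand(5, 10000); len = rand(5, 10000);
wid  = rand(5, 10000); hgt = rand(5, 10000);

tic
d1 = zeros(size(mass));            % preallocate the result
for i = 1:size(mass, 1)
    for j = 1:size(mass, 2)
        d1(i, j) = mass(i, j) / (len(i, j)*wid(i, j)*hgt(i, j));
    end
end
t_loop = toc;

tic
d2 = mass ./ (len .* wid .* hgt);  % vectorized form
t_vec = toc;

% the two results agree; the vectorized form is typically much faster
max(abs(d1(:) - d2(:)))
```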

6           MATLAB workspace and File I/O

6.1           MATLAB workspace

Note that as you work in the MATLAB command window, MATLAB remembers your commands. You can always recall your previous commands using the up arrow key. This is true of all the variables created through these commands as well. For example:

>> x=-5:0.1:5;

>> sqr=x.^2;

>> sqrnorm = norm(sqr);

>> sqrnorm

sqrnorm =


>> pl1=plot(x, sqr, 'r:s');

>> sqrnorm

sqrnorm =


The variable sqrnorm remained in the workspace. The command who recalls the list of variables in the workspace.

>> who

Your variables are:

A      i      pl     rho    x      xx1
ans    j      pl1    sqr    x1     xx2
cub    k      pl2    theta  x2     y


The command whos gives more detailed information about the workspace.

>> whos

(a table listing each variable's name, size, bytes and class is printed; all of them are of class double array)

Grand total is 204 elements using 1632 bytes

The command clear removes a variable from the workspace. The command clear by itself clears the workspace of all variables. Use this command with caution as there is no undo!

>> clear z

>> z

??? Undefined function or variable ’z’.

6.2           Function workspace

Script files (i.e., m-files with no functions) share their workspace with the base workspace. Functions, however, have their own workspace, so variables defined within a function have no value outside the function. This is usually referred to as encapsulation in programming parlance. It is a good thing that encourages well designed programs that are readable and manageable. However, in case it is necessary for variables within a function workspace to be available in the base workspace, it is possible to use the global keyword. A related concept is the persistent keyword, which allows a variable in a function workspace to retain its value between invocations of the function (although it is still not available in the base workspace).
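As a sketch of persistent, the hypothetical function below keeps a counter alive between calls while remaining invisible to the base workspace:

```matlab
function count = call_counter()
% CALL_COUNTER returns how many times it has been called.
% ncalls lives in this function's workspace and survives between
% calls, but it is not visible from the base workspace.
persistent ncalls
if isempty(ncalls)
    ncalls = 0;      % first call: initialise the counter
end
ncalls = ncalls + 1;
count = ncalls;
end
```

Calling call_counter three times returns 1, 2 and 3; clearing the function (clear call_counter) resets the counter.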

6.3           Native data files

The existing workspace data can be saved to a file using save. To save only some variables to a specified file use save filename var1 var2. The code fragment below demonstrates saving the workspace to a file called workspace (this produces a binary file called workspace.mat in the working directory). You can then quit MATLAB and restart it. Obviously the new workspace is empty; however, it is possible to load the saved workspace back in using load, as shown below.

>> save workspace

%-- 8/12/03 11:00 AM --%    <-- quit MATLAB

>> load workspace           <-- new MATLAB session

>> sqrnorm=norm(sqr)        <-- sqr read in from the saved workspace

sqrnorm =


6.4           Data import and export

In addition to the native file format, MATLAB provides a large set of file I/O functions.

>> help fileformats

Readable file formats.

Data formats
                                            Command     Returns
   MAT  - MATLAB workspace                  load        Variables in file.
   CSV  - Comma separated numbers           csvread     Double array.
   DAT  - Formatted text                    importdata  Double array.
   DLM  - Delimited text                    dlmread     Double array.
   TAB  - Tab separated text                dlmread     Double array.

Spreadsheet formats
   XLS  - Excel worksheet                   xlsread     Double array and cell array.
   WK1  - Lotus 123 worksheet               wk1read     Double array and cell array.

Scientific data formats
   CDF  - Common Data Format                cdfread     Cell array of CDF records.
   FITS - Flexible Image Transport System   fitsread    Primary or extension table data.
   HDF  - Hierarchical Data Format          hdfread     HDF or HDF-EOS data set.

Movie formats
   AVI  - Movie                             aviread     MATLAB movie.

Image formats (all read with imread)
   TIFF - TIFF image                        Truecolor, grayscale or indexed image(s).
   PNG  - PNG image                         Truecolor, grayscale or indexed image.
   HDF  - HDF image                         Truecolor or indexed image(s).
   BMP  - BMP image                         Truecolor or indexed image.
   JPEG - JPEG image                        Truecolor or grayscale image.
   GIF  - GIF image                         Indexed image.
   PCX  - PCX image                         Indexed image.
   XWD  - XWD image                         Indexed image.
   CUR  - Cursor image                      Indexed image.
   ICO  - Icon image                        Indexed image.
   RAS  - Sun raster image                  Truecolor or indexed.
   PBM  - PBM image                         Grayscale image.
   PGM  - PGM image                         Grayscale image.
   PPM  - PPM image                         Truecolor image.

Audio formats
   AU   - NeXT/Sun sound                    auread      Sound data and sample rate.
   SND  - NeXT/Sun sound                    auread      Sound data and sample rate.
   WAV  - Microsoft Wave sound              wavread     Sound data and sample rate.

It is also possible to do low-level file I/O using the standard, file open, write, read and file close functions. The code fragment below demonstrates a trivial use of these functions.

>> fid = fopen('square_mat.txt', 'wt');

>> fprintf(fid, '%s\n', 'This is a square matrix');

>> fprintf(fid, '%i\t%i\t%i\n', [1 2 3; 4 5 6; 7 8 9]');

>> fclose(fid);

This produces a file square_mat.txt in the current working directory which contains:

This is a square matrix

1      2      3

4      5      6

7      8      9
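The same low-level functions can read the file back in. The sketch below rewrites the file and then recovers the matrix with fscanf (note that fscanf fills its output column-wise, so the result must be transposed):

```matlab
% write the file as in the transcript above, then read it back
fid = fopen('square_mat.txt', 'wt');
fprintf(fid, '%s\n', 'This is a square matrix');
fprintf(fid, '%i\t%i\t%i\n', [1 2 3; 4 5 6; 7 8 9]');
fclose(fid);

fid = fopen('square_mat.txt', 'rt');
header = fgetl(fid);              % the text line
M = fscanf(fid, '%i', [3 3])';    % fscanf fills column-wise, so transpose
fclose(fid);
```

After this, M holds the original 3 × 3 matrix.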

7           Ordinary Differential Equations

7.1           Second order homogeneous linear equation with constant coefficients

Consider the second order linear equation:

d^2y/dx^2 + a dy/dx + b y = 0

where a and b are constants. This is second order because the highest derivative of the dependent variable y is the second. It is homogeneous because the right hand side is 0.

If we choose a solution of the form y = exp(kx), where k is a constant, then

dy/dx = k exp(kx),    d^2y/dx^2 = k^2 exp(kx)

and for this to be a solution of the differential equation the auxiliary equation k^2 + ak + b = 0 must be true. This equation will (in general) have two complex roots, k1 and k2:

k1 = −a/2 + (1/2) sqrt(a^2 − 4b)
k2 = −a/2 − (1/2) sqrt(a^2 − 4b)
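For specific values of a and b the roots of the auxiliary equation can be checked numerically with roots, as a quick sketch:

```matlab
% numeric check of the auxiliary-equation roots k^2 + a*k + b = 0
% for the special case a = 1, b = 1 used below
a = 1; b = 1;
k = roots([1 a b])
% k = -0.5000 + 0.8660i
%     -0.5000 - 0.8660i
```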

The general solution of the equation is then:

y(x) = C1 exp(k1x) + C2 exp(k2x)

Let us now add the initial conditions:


y(0) = d;    dy/dx at x = 0 equals g

Then, C1 + C2 = d and k1 C1 + k2 C2 = g. Solving the simultaneous equations:

C1 + C2       = d
C1 k1 + C2 k2 = g

we get the constants

C1 = (g − d k2)/(k1 − k2)
C2 = (d k1 − g)/(k1 − k2)

In MATLAB we can use the symbolic package as following:

>> syms a b c d g k1 k2 x y   % declare variables as symbolic

>> k1=-(a/2)+1/2*sqrt(a^2-4*b);

>> k2=-(a/2)-1/2*sqrt(a^2-4*b);

>> y=(g-d*k2)/(k1-k2)*exp(k1*x)+(d*k1-g)/(k1-k2)*exp(k2*x);

In the special case:

a = 1    b = 1       d = 1       g = 0

we get:

>> a=1;

>> b=1;

>> d=1;

>> g=0;

>> pretty(eval(y))

(1/2 - 1/6 i 3^(1/2)) exp((-1/2 + 1/2 i 3^(1/2)) x)

  + (1/2 + 1/6 i 3^(1/2)) exp((-1/2 - 1/2 i 3^(1/2)) x)

>> x=0:0.1:6*pi;

>> g=-10;

>> y1=eval(y);

>> g=0;

>> y2=eval(y);

>> g=10;

>> y3=eval(y);

>> plot(x, y1, 'k-', x, y2, 'k--', x, y3, 'k-.');

>> legend('g=-10', 'g=0', 'g=10');

>> xlabel('x');

>> ylabel('y');

>> title('Damped oscillator')

This produces Figure 19, displaying the evolution of the function y. This is exactly as expected, with the displacement y starting off at 1 at x = 0 and the damping term (∝ dy/dx) bringing the displacement very rapidly down to zero. Depending on the initial value of dy/dx at x = 0, the transient behaves differently. Note that with a = b = 1 we have a^2 − 4b < 0, so this is an underdamped system: the damping rapidly brings the system to rest, but the approach is oscillatory.

It is also possible to use MATLAB’s own differential equation solver dsolve to get the above solution almost trivially.

>> S=dsolve('D2y+a*Dy+b*y=0', 'y(0)=d, Dy(0)=g');

>> t=x;

>> plot(x, eval(S), 'k-', x, y3, 'kx');

Note that the line t=x; has to be there because dsolve returns its result in terms of t and not x. As can be seen from Figure 20, the solution we derived and the result of dsolve are identical.


Figure 19: Solution of the ordinary 2nd order homogeneous differential equation

7.2           Non-homogeneous 2nd order differential equations

We can now use dsolve to solve for a non-homogeneous 2nd order differential equation. In particular we want to look at the equation:

d^2y/dx^2 + a dy/dx + b y = c

with the same initial conditions as before:


y(0) = d;    dy/dx at x = 0 equals g

Then again using the symbolic math package in MATLAB:

>> syms a b c d g y

>> y=dsolve('D2y+a*Dy+b*y=c', 'y(0)=d, Dy(0)=g', 'x');

>> y

y =

1/2*exp((-1/2*a+1/2*(a^2-4*b)^(1/2))*x)*(-a*c+a*d*b-(a^2-4*b)^(1/2)*c
  +(a^2-4*b)^(1/2)*d*b+2*g*b)/(a^2-4*b)^(1/2)/b
+ 1/2*exp((-1/2*a-1/2*(a^2-4*b)^(1/2))*x)*(a*c-a*d*b-(a^2-4*b)^(1/2)*c
  +(a^2-4*b)^(1/2)*d*b-2*g*b)/(a^2-4*b)^(1/2)/b + 1/b*c

Note the last argument 'x' to the function dsolve. This tells MATLAB that the solution y should be in terms of the explicit variable x instead of the default t.

Figure 20: Comparison of dsolve and the traditional method of solving

If we now look at the same special case:

>> a=1;

>> b=1;

>> c=1;

>> d=1;

>> x=0:0.1:6*pi;

>> g=-1;

>> y1=eval(y);

>> g=0;

>> y2=eval(y);

>> g=1;

>> y3=eval(y);

>> plot(x, y1, 'k-', x, y2, 'k--', x, y3, 'k-.');

This gives us Figure 21. As can be seen it is similar to the homogeneous case, except that because of the forcing constant on the right hand side, the solution asymptotes to that constant rather than to 0.

A simple way of quickly visualizing a solution from dsolve is the ezplot function. The script below demonstrates the use of the ezplot function.

>> syms a b c d g y x

>> y=dsolve(’D2y+a*Dy+b*y=c’, ’y(0)=d, Dy(0)=g’, ’x’);

>> a=1;b=1;c=1;d=1;g=1;

Figure 21: Solution of an inhomogeneous equation

>> ezplot(eval(y), [0, 6*pi]);

>> axis([-0.5 20 0.8 1.6]);

This produces Figure 22.


Figure 22: Using the ezplot function

8           Sparse Matrices

8.1           Storage of data

Matrices and linear algebra frequently provide a way to analyze complex problems that can be framed in terms of partial differential equations. These equations can then be discretized using some scheme, so that the system is represented by discrete fields (instead of continuous fields). Discretizations of partial differential equations almost always lead to sparse matrices: matrices in which most of the elements are zeros. Although it is still possible to use the same algorithms for sparse matrices as for dense matrices (matrices with few zero elements), it is very inefficient to do so. In particular, some problems are so large that writing out the entire matrix would require prohibitive amounts of memory. It is therefore critical to take advantage of the sparseness of the matrix.

The first advantage that can be taken is in the storage of data. Normally a matrix in MATLAB is stored by writing out every element. A sparse matrix instead stores its data using three vectors, and only non-zero elements are stored. The allocated length of the data vectors is called nzmax; this must be greater than or equal to the number of non-zero elements in the matrix. The first vector is the list of the (nnz) non-zero elements, stored column by column. The second vector holds the row index of each stored element. The third vector holds, for each column, the position in the first two vectors where that column's entries begin, so that the start and end of every column can be found. For example consider the matrix:

1   0   3   0
0   4   0   0
0   0   6   0
0   3   2   0

>> A=[1 0 3 0; 0 4 0 0; 0 0 6 0; 0 3 2 0] ;

>> A1=sparse(A);

>> A1

A1 =

(1,1)     1

(2,2)     4

(4,2)     3

(1,3)     3

(3,3)     6

(4,3)     2

>> whos

Name   Size   Bytes  Class

A      4x4    128    double array
A1     4x4    92     double array (sparse)

Grand total is 22 elements using 220 bytes

>> nzmax(A1)

ans =

     6

>> nnz(A1)

ans =

     6
The reason for a possible difference between nzmax and nnz is efficiency. It is more efficient to allocate more space initially and allow the matrix to expand to fill it than to allocate exactly what is needed each time. However, the tradeoff between the extra memory needed and the improved performance changes at some large matrix size. In our tiny example nzmax and nnz are the same.

Note that if the matrix is complex, then there is a fourth vector holding the imaginary parts of the stored elements.

8.2           Creating sparse matrices

We saw in the previous section that sparse is one way of creating sparse matrices from dense ones. But clearly this is not a very efficient way of creating them.

Another usage of sparse is to create the matrix directly from its member vectors. For example, for our example matrix:

>> A2=sparse([1 2 4 1 3 4], [1 2 2 3 3 3], [1 4 3 3 6 2], 4, 4);

>> A2

A2 =

(1,1)     1

(2,2)     4

(4,2)     3

(1,3)     3

(3,3)     6

(4,3)     2

>> A3=sparse([1 2 4 1 3 4], [1 2 2 3 3 3], [1 4 3 3 6 2], 4, 4, 10);

>> A3

A3 =

(1,1)     1

(2,2)     4

(4,2)     3

(1,3)     3

(3,3)     6

(4,3)     2

>> nzmax(A2)

ans =

     6

>> nzmax(A3)

ans =

    10

>> nnz(A2)

ans =

     6

>> nnz(A3)

ans =

     6
The second usage of sparse sets nzmax to 10 instead of the default 6 (the number of non-zero elements). This would be useful if we were to grow the number of non-zero elements in A3.
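The triplet form of sparse has a direct counterpart in SciPy's `coo_matrix`, shown here as an illustrative sketch (SciPy assumed installed; MATLAB's 1-based indices are shifted to Python's 0-based ones):

```python
import numpy as np
from scipy.sparse import coo_matrix

# MATLAB's 1-based triplets, shifted to Python's 0-based indexing
i = np.array([1, 2, 4, 1, 3, 4]) - 1
j = np.array([1, 2, 2, 3, 3, 3]) - 1
v = np.array([1, 4, 3, 3, 6, 2])

A2 = coo_matrix((v, (i, j)), shape=(4, 4))
print(A2.toarray())
```

`toarray` here plays the role of MATLAB's full, converting back to a dense array for display.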

Another way of creating sparse matrices is to use spdiags, which uses the Compressed-Diagonal storage mode. A matrix of order n has 2n − 1 diagonals: the main diagonal d0 (a11, a22, …, ann), the superdiagonals d1, d2, …, dn−1 above it (d1 being a12, a23, …), and the subdiagonals d−1, d−2, …, d−(n−1) below it (d−1 being a21, a32, …).
The matrix A can then stored using two matrices B and d, such that each diagonal (including its zero elements) of A that has at least one non-zero element is a column of B. These columns are padded such that:

  • Each superdiagonal (k > 0) which has n − k elements, is padded with k leading zeros.
  • The main diagonal (which has n elements), isn’t padded.
  • Each subdiagonal (k < 0), which has n − |k| elements, is padded with |k| trailing zeros.

And d is a vector with the list of the diagonal numbers corresponding to the columns of B.

For example, for the matrix A of the previous section, we have

    A = | 1  0  3  0 |        B = | 0  0  1  0 |
        | 0  4  0  0 |            | 3  0  4  0 |
        | 0  0  6  0 |            | 0  2  6  3 |
        | 0  3  2  0 |            | 0  0  0  0 |

    d = [ −2  −1  0  2 ]

So we can construct our matrix A using:

>> B=[0 0 1 0; 3 0 4 0; 0 2 6 3; 0 0 0 0] ;

>> d=[-2 -1 0 2] ;

>> A2=spdiags(B, d, 4, 4)

A2 =

(1,1)     1

(2,2)     4

(4,2)     3

(1,3)     3

(3,3)     6

(4,3)     2
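SciPy offers the same construction in `scipy.sparse.spdiags`, with one difference worth flagging: the diagonals are supplied as the rows of the data array, i.e. the transpose of MATLAB's B. A minimal sketch, assuming SciPy is installed:

```python
import numpy as np
from scipy.sparse import spdiags

# Rows of `data` are the diagonals (the transpose of MATLAB's B)
data = np.array([[0, 3, 0, 0],   # diagonal -2; trailing |k| entries ignored
                 [0, 0, 2, 0],   # diagonal -1
                 [1, 4, 6, 0],   # main diagonal
                 [0, 0, 3, 0]])  # diagonal +2; leading k entries ignored
d = np.array([-2, -1, 0, 2])

A2 = spdiags(data, d, 4, 4)
print(A2.toarray())
```

The padding convention matches the one described above, so the output reproduces our matrix A.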

Some of the built-in functions that create dense matrices have sparse equivalents: speye corresponds to eye, and sprand and sprandn correspond to rand and randn.

There is no sparse equivalent of ones, for obvious reasons: a matrix of all ones is not sparse. There is, however, a function spones which takes a sparse matrix as an argument, preserves its structure, and replaces each non-zero element with one.


In addition there is the command spconvert that will use load to read in data (in row, column index and value format) and convert it to a sparse matrix.
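The sparse creation helpers have close SciPy analogues; the following sketch (SciPy assumed installed) shows `eye` and `random` standing in for speye and sprand, and a two-line idiom standing in for spones:

```python
import numpy as np
from scipy.sparse import eye, random, csr_matrix

I5 = eye(5, format='csr')        # sparse identity, like MATLAB's speye(5)
R = random(5, 5, density=0.2)    # like sprand: roughly 20% of entries non-zero

# spones analogue: keep the structure, set every stored value to one
A = csr_matrix(np.array([[0.0, 2.5], [7.0, 0.0]]))
S = A.copy()
S.data[:] = 1
print(S.toarray())
```

Because only the stored values are touched, the zero pattern of A is preserved exactly, which is the whole point of spones.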

8.3           Viewing sparse matrices

MATLAB has a function spy to view sparse matrices graphically. For example, spy(A2) produces Figure 23. This is not a very interesting picture, but if we use spy on larger matrices we can see patterns emerge.

For example Figure 24 shows a matrix that comes with MATLAB.

A particularly useful function is find. It returns the lists of indices and the values of all non-zero elements of a matrix, regardless of whether the matrix is dense or sparse. For example:

>> [i, j, v]=find(A2)

i =

     1
     2
     4
     1
     3
     4

j =

     1
     2
     2
     3
     3
     3


Figure 23: Viewing a sparse matrix graphically







v =

     1
     4
     3
     3
     6
     2

Figure 24: Viewing a large sparse matrix graphically
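SciPy provides the same triple-of-vectors extraction as `scipy.sparse.find`; a short sketch (SciPy assumed installed, indices zero-based):

```python
import numpy as np
from scipy.sparse import csc_matrix, find

A2 = csc_matrix(np.array([[1, 0, 3, 0],
                          [0, 4, 0, 0],
                          [0, 0, 6, 0],
                          [0, 3, 2, 0]]))
i, j, v = find(A2)   # row indices, column indices, values of the non-zeros
print(list(zip(i, j, v)))
```

As in MATLAB, find works equally well on the dense array; only the index base differs.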

8.4           Sparse matrix computations

A sparse matrix may stop being sparse when operated on, if it is not possible to preserve sparsity in that operation. This can inadvertently produce a very large dense matrix, so it is important to keep this in mind when operating on sparse matrices. Functions that act on a matrix and return a vector or a scalar return a dense result, regardless of whether the original matrix was sparse or dense; the original matrix itself is unaffected.

Most unary operations (operations not involving a second matrix) preserve the sparsity of the matrix. For example max(A) is a sparse matrix if A is a sparse matrix. Obviously, the functions sparse and full change the sparsity of the matrix.

Binary operations (operations between two matrices) preserve sparsity where possible. So A.*B is sparse if either A or B is sparse, but A+B is dense if either A or B is dense.
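The same sparsity-propagation rules can be observed in SciPy; this sketch (SciPy assumed installed) checks the type of each result, with `multiply` standing in for MATLAB's .* operator:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

A = csr_matrix(np.array([[1.0, 0.0], [0.0, 2.0]]))
B = csr_matrix(np.array([[0.0, 3.0], [4.0, 0.0]]))
D = np.ones((2, 2))                 # a dense matrix

print(issparse(A.multiply(B)))      # element-wise product with sparse: sparse
print(issparse(A + B))              # sparse + sparse: sparse
print(issparse(A + D))              # sparse + dense: dense result
```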

8.5           Reordering of matrices

The simplest way of reordering a matrix is to use a permutation vector, that is, a vector listing the new ordering of the rows or columns of the matrix. Using our matrix A2 from before:

>> full(A2)

ans =

1 0     3     0

0 4     0     0

0 0     6     0

0 3     2     0

>> p=[4 2 3 1] ;

>> A2(p,:)

ans =

(4,1)     1

(1,2)     3

(2,2)     4

(1,3)     2

(3,3)     6

(4,3)     3

>> full(A2(p,:))

ans =

     0     3     2     0
     0     4     0     0
     0     0     6     0
     1     0     3     0

It is also possible to use a permutation matrix to represent the permutation. It is possible to convince oneself that the permutation matrix P is simply I(p,:) where I is the identity matrix.

>> I=eye(4,4);

>> P=I(p,:);
>> full(P*A2)

ans =

     0     3     2     0
     0     4     0     0
     0     0     6     0
     1     0     3     0
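Both forms of permutation carry over to SciPy directly; a sketch (SciPy assumed installed, permutation shifted to zero-based indexing):

```python
import numpy as np
from scipy.sparse import csr_matrix

A2 = csr_matrix(np.array([[1, 0, 3, 0],
                          [0, 4, 0, 0],
                          [0, 0, 6, 0],
                          [0, 3, 2, 0]]))
p = [3, 1, 2, 0]                    # MATLAB's [4 2 3 1], zero-based

# Fancy indexing permutes the rows and keeps the result sparse
print(A2[p, :].toarray())

# Equivalently, multiply by the permutation matrix I(p,:)
P = csr_matrix(np.eye(4)[p, :])
print((P @ A2).toarray())
```

Both expressions produce the same reordered matrix, just as A2(p,:) and P*A2 do in MATLAB.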

A particularly useful function is colperm, which returns a permutation vector such that, after permutation, the matrix has its columns ordered by increasing number of non-zero elements.

Some other ordering functions are symrcm, symamd and symmmd that work on symmetric matrices and colamd and colmmd that work on non-symmetric matrices.

9           Numerical solutions of Ordinary Differential Equations


Consider a second order differential equation:

    d²y/dx² + a(x) dy/dx + b(x) y = c(x)

where a and b are functions of x. This is second order because the highest derivative of the dependent variable y is the second.

This can be written as two coupled first order differential equations:

    dy/dx = z

    dz/dx + a(x) z + b(x) y = c(x)

We can then focus, momentarily, on a single first order ODE:

    dy/dx = f(x, y)

To the simplest approximation, a solution is given by Euler's method:

    yn+1 = yn + h f(xn, yn)

This is exact if y(x) is linear in x. In most other cases, however, it yields rather poor results. In addition, the method can become unstable: small errors accumulate and the final yn can be very different from the actual y(xn). An improvement on it is the second order Runge-Kutta method:
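As a runnable illustration (in Python rather than MATLAB, so it can be tried anywhere), here is Euler's method applied to dy/dx = y, whose exact solution is eˣ; shrinking the step h brings the numerical answer closer to e at x = 1:

```python
import math

def euler(f, x0, y0, h, steps):
    """Advance y' = f(x, y) by repeated fixed-step Euler updates."""
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)
        x += h
    return y

f = lambda x, y: y                    # y' = y, so y(1) = e for y(0) = 1
coarse = euler(f, 0.0, 1.0, 0.1, 10)  # h = 0.1
fine = euler(f, 0.0, 1.0, 0.01, 100)  # h = 0.01: visibly closer to e
print(coarse, fine)
```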

    k1   = h f(xn, yn)

    k2   = h f(xn + h/2, yn + k1/2)

    yn+1 = yn + k2 + O(h³)

where O(h³) refers to terms of order h³. Effectively, this introduces an intermediate step into Euler's method in order to refine our guess of the change of y with x.
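A short Python sketch of the same second order (midpoint) Runge-Kutta step, run against Euler on dy/dx = y, shows how much the intermediate evaluation helps for the same step size:

```python
import math

def rk2_step(f, x, y, h):
    """One midpoint (second-order Runge-Kutta) step for y' = f(x, y)."""
    k1 = h * f(x, y)
    k2 = h * f(x + h / 2, y + k1 / 2)
    return y + k2

def euler_step(f, x, y, h):
    return y + h * f(x, y)

f = lambda x, y: y                 # exact solution: y = exp(x)
y_rk, y_eu, x, h = 1.0, 1.0, 0.0, 0.1
for _ in range(10):                # integrate from x = 0 to x = 1
    y_rk = rk2_step(f, x, y_rk, h)
    y_eu = euler_step(f, x, y_eu, h)
    x += h

print(abs(y_rk - math.e), abs(y_eu - math.e))  # RK2 error is much smaller
```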

It is possible to generalize this to m stages as:

    k1   = h f(xn, yn)

    k2   = h f(xn + a2 h, yn + b21 k1)

    ⋮

    km   = h f(xn + am h, yn + bm1 k1 + bm2 k2 + ··· + bm(m−1) km−1)

    yn+1 = yn + c1 k1 + c2 k2 + ··· + cm km + O(hᵐ)

A particularly useful aspect of this method is that a different choice of the constants ci results in a different order of approximation to y. It is thus possible to get an error estimate for yn+1 without extra evaluations of the function f(x, y): summing the same ki once with the constants ci and once with an alternative set of constants gives two approximations of different orders, and their difference estimates the error.

The popular versions of these embedded Runge-Kutta methods are 4(5) and 2(3). MATLAB implements both, as ode45 and ode23 respectively, and in both functions it uses the error estimate to adapt the step size.

Note that both of these are for non-stiff ODEs, where non-stiff means the differential equations have solutions with a single “timescale” (or at least multiple “timescales” that are not very different from each other). Here “timescale” refers to the scale of the independent variable x over which y changes significantly.

The function ode45 solves a system of differential equations of the form:

    dyi/dt = fi(y1, y2, …, yn)            i = 1, 2, …, n

over the interval t0 ≤ t ≤ tf, subject to the initial conditions yj(t0) = aj, j = 1, 2, …, n, where the aj are constants. The usage of ode45 is as follows:

[t, y] = ode45(@FunctionName, [t0 tf], [a1 a2 … an]’, …

options, p1, p2, …)

In the above, [t, y] denotes that ode45 returns two results. The first, t, is a column vector of the times in the range [t0 tf] chosen by ode45; the second, y, is a matrix of solutions such that row i holds the solution at the time in the corresponding row t(i). Also, @FunctionName is the handle for the function file FunctionName (ignoring the .m at the end of the file name) that evaluates the right hand sides of the equations. Its form must be: function y=FunctionName(t, g, p1, p2, …), where t is the independent variable, g is the vector representing the yj, and p1, p2 etc. are parameters.

Consider the following second order ordinary differential equation, which could represent a forced damped oscillator.

    d²y/dt² + 2ξ dy/dt + y = h(t)


Let us now make the substitution,

    y1 = y

    y2 = dy/dt

Then the second order equation can be replaced by two first order equations.

    dy1/dt = y2

    dy2/dt = −2ξ y2 − y1 + h(t)

Assume that ξ = 0.15 and that we start at time t0 = 0 and end at tf = 35. At t0 the displacement and the velocity are both zero, viz. y1(t0) = 0 and y2(t0) = 0. Finally we assume h(t) = 1.

Figure 25: Matlab’s built in editor

First create the function which returns the array of right hand side functions (in this case a two element column vector).

function y=ForcingFunction(t, w, xi)
% ForcingFunction - return the right hand side of the system.
% ForcingFunction takes in the time t, the vector w, and the constant xi.
% The vector w gives the values of the dependent variables at the current time.
y = [w(2); -2*xi*w(2) - w(1) + 1];
return

Save this as a file ForcingFunction.m. The file may be created using MATLAB's own editor, as shown in Figure 25.

Then run the following commands:

>> [tt, yy] = ode45(@ForcingFunction, [0 35], [0 0]', [], 0.15);
>> plot(tt, yy(:, 1))
>> xlabel('Time');
>> ylabel('y(Time)');


Figure 26: Solution of the differential system

You should get Figure 26, displaying the displacement y(t) of the oscillator with time t. As expected, the asymptotic value is the forcing value of 1, and the oscillator displays a transient whose damping is governed by the constant ξ.
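The same system can be cross-checked outside MATLAB with SciPy's solve_ivp, whose default RK45 method is an embedded 4(5) Runge-Kutta pair like ode45. A sketch, assuming SciPy is installed, with ξ = 0.15 and h(t) = 1 as above:

```python
import numpy as np
from scipy.integrate import solve_ivp

xi = 0.15

def forcing(t, w):
    # w[0] is the displacement y1, w[1] the velocity y2; h(t) = 1
    return [w[1], -2 * xi * w[1] - w[0] + 1]

sol = solve_ivp(forcing, [0, 35], [0.0, 0.0], rtol=1e-8)
print(sol.y[0, -1])   # the displacement settles toward the forcing value 1
```

By t = 35 the transient has decayed almost completely, so the final displacement should be very close to 1, in agreement with Figure 26.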

10            SIMULINK

SIMULINK is an extension to MATLAB which uses an icon-driven interface for the construction of a block diagram representation of a process. A block diagram is simply a graphical representation of a process (which is composed of an input, the system, and an output).

Typically, the MATLAB m-file ode45 is used to solve sets of linear and nonlinear ordinary differential equations. The “traditional” numerical methods approach is used, e.g. supply the equations to be solved in a function file, and use a general purpose equation solver (linear or nonlinear algebraic, linear or nonlinear differential equation, etc.) which “calls” the supplied function file to obtain the solution. One of the reasons why MATLAB is relatively easy to use is that the “equation solvers” are supplied for us, and we access these through a command line interface (CLI). However, SIMULINK uses a graphical user interface (GUI) for solving process simulations. Instead of writing MATLAB code, we simply connect the necessary “icons” together to construct the block diagram. The “icons” represent possible inputs to the system, parts of the systems, or outputs of the system. SIMULINK allows the user to easily simulate systems of linear and nonlinear ordinary differential equations. Many of the features of SIMULINK are user-friendly due to the icon-driven interface, yet it is important to spend some time experimenting with SIMULINK and its many features.

10.1           Getting Started in Simulink

SIMULINK is an icon-driven, state-of-the-art dynamic simulation package that allows the user to specify a block diagram representation of a dynamic process. Assorted sections of the block diagram are represented by icons which are available via various “windows” that the user opens (by double clicking on the icon). The block diagram is composed of icons representing different sections of the process (inputs, state-space models, transfer functions, outputs, etc.) and connections between the icons (which are made by “drawing” a line connecting the icons). Once the block diagram is “built”, one has to specify the parameters in the various blocks, for example the gain of a transfer function. Once these parameters are specified, the user has to set the integration method (of the dynamic equations), stepsize, start and end times of the integration, etc. in the simulation menu of the block diagram window.

In order to use SIMULINK start a MATLAB session (click on the MATLAB button). Once MATLAB has started up, type simulink (SMALL LETTERS!) at the MATLAB prompt (>>) followed by a carriage return (press the return key). A SIMULINK window should appear shortly, with the several icons: Sources, Sinks, Discrete, etc. This is shown in the Figure 27.

Next, go to the file menu in this window and choose New in order to begin building the block diagram representation of the system of interest.

Figure 27: The SIMULINK library

10.2           Block Diagram Construction

As mentioned previously, the block diagram representation of the system is made up of various type of icons. Basically, one has to specify the model of the system (state space, discrete, transfer functions, nonlinear ODE’s, etc), the input (source) to the system, and where the output (sink) of the simulation of the system will go. Open up the Sources, Sinks, and Linear windows by clicking on the appropriate icons. Note the different types of sources (step function, sinusoidal, white noise, etc.), sinks (scope, file, workspace), and linear systems (transfer function, state space model, etc.).

Let us illustrate this by trying to model a simple harmonic oscillator (a pendulum). The equation of motion of the pendulum is:

    ml² d²θ/dt² + γ dθ/dt + mgl sinθ = A f(ωD t)

In order to represent this as a block diagram, re-write the equation of motion in the following way:

    d²θ/dt² = (1/ml²) A f(ωD t) − (γ/ml²) dθ/dt − (g/l) sinθ

    θ = ∫∫ [ (1/ml²) A f(ωD t) − (γ/ml²) dθ/dt − (g/l) sinθ ] dt dt

In the above ∫ is the integration operator. For our example we will consider the case where the forcing function A f(ωD t) = sin(t), i.e., A = 1, ωD = 1 and f(ωD t) = sin(ωD t). In addition we will take the case where m = 1, γ = 10 and l = 1. In S.I. units g = 9.81 m/s².
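Before wiring up the block diagram, the same pendulum equation can be sanity-checked numerically; the following Python/SciPy sketch (a cross-check, not part of the Simulink workflow, with SciPy assumed installed) integrates the equation with the stated parameters:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, l, gamma, g = 1.0, 1.0, 10.0, 9.81
A, omega_d = 1.0, 1.0

def pendulum(t, s):
    # s[0] is the angle theta, s[1] its time derivative
    theta, dtheta = s
    ddtheta = (A * np.sin(omega_d * t) - gamma * dtheta
               - m * g * l * np.sin(theta)) / (m * l**2)
    return [dtheta, ddtheta]

sol = solve_ivp(pendulum, [0, 30], [0.0, 0.0], rtol=1e-8)
print(np.max(np.abs(sol.y[0])))   # heavy damping keeps theta small
```

With γ = 10 the pendulum is heavily damped, so θ stays well below one radian; the Simulink model built below should show the same behaviour.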

Let us now try to construct this system using Simulink. Following the instructions given before start a new block diagram.

Figure 28: The SIMULINK model of a simple pendulum

  1. On the MATLAB prompt (>>) type simulink.
  2. Once the Library: simulink window is up click on File and then New and select Model.
  3. Double click on Sources. This opens a list of sources.
  4. Click on the Sine wave block and drag it to the new model window.
  5. Double click on the Sine wave block. A new window pops up displaying the parameters controlling the block. Make sure that the Amplitude is 1 and the Frequency is set to 1 (according to the parameters of our problem). This is the forcing function on the pendulum.
  6. Next double click on the Continuous icon in the library window. Drag an integrator to your model. Repeat this, as our solution has two integrations.
  7. Double click on the Math Operations icon and drag two Gain and one Sum icon to the model.
  8. Double click the Sum block and change the shape to rectangular. In the signs box enter +|-|-. This means there will be three inputs and one output. The second two inputs are to be subtracted from the first input.
  9. Double click on the Gain blocks and put in the correct multiplicative factors.
  10. Finally double click on the Sinks icon and drag a To workspace icon to the model.
  11. All the required blocks are now in, and we can connect them into a system.
  12. To connect blocks click on the source block, and click on the destination block while pressing the ctrl key. In order to branch bring the mouse to the path that will be branched. Then keeping the ctrl key pressed click and drag the mouse to the destination block.
  13. Connect the blocks as shown in the Figure 28.
  14. Double click on the To workspace block and name the variable ( say, theta). Choose the save format as Array.
  15. Now click on the Simulation tab on the top of the model window and select Simulation parameters. Select the Solver tab and set the Stop time to be 30. Select the Workspace I/O tab and uncheck everything but the Time button. Leave the variable name as tout. Uncheck the Limit datapoint to last: button.

Now you can run the system by clicking Simulation and selecting Start. If you now go back to the MATLAB prompt, you will find that the new variables tout and theta are in the workspace. Plotting theta against tout gives the evolution of the pendulum over time.

10.3           General Simulink Tips

The following are general tips and should be used often:

  1. In order to save your work, select Save from the file menu and give the file that you want to save a name (or choose an old name if you are “writing over” an old version), and click the ok button (using the left-most mouse button). Realize that you have a choice of the “folder” that the file is saved in.
  2. The results of a simulation can be sent to the MATLAB window by the use of the to workspace icon from the Sinks window. Open the to workspace icon and select the variable name that you want the results stored in the MATLAB workspace.
  3. If your simulation has n state (or output) variables and you want to save them as different names, then you have to use a special connection called a Demux (as in demultiplexer) icon which is found in the Signal Routing window. Basically, it takes a vector input and converts it into several scalar lines. You can set the number of outputs (scalar lines) by double clicking on the icon and changing the number of outputs. A Mux icon takes several scalar inputs and multiplexes those in a vector (useful sometimes in transferring the results of a simulation to the MATLAB workspace, for example).
  4. You can add steady state constants to variables using a Constant icon in the Sources window. To do this for a scalar output variable, just enter the value of the steady-state into the Constant icon and add this to the scalar output using the Sum icon. For a vector output, you must first “break-up” the vector into scalar outputs using the Demux icon and then add the steady-state value to each scalar output.
  5. Parameters can be “passed” to SIMULINK from the MATLAB window by using the parameter in a SIMULINK block or parameter box and defining the parameter in the MATLAB window. For example, say that one wants to run the simulation with many different damping constants γ, just rewrite the constant in the Gain block as gamma. Then make sure that you define the variable gamma in MATLAB to some constant. Now run the simulation. If you do not take the last step and gamma is undefined, the simulation will stop with an error on the Gain block using gamma.
  6. In order to print the block diagram, first save the block diagram. Then click File and select Print.

10.4           More information

For more information on SIMULINK, please click Help at the top of the model window and select Using Simulink.

Posted in Simulation
