Something More for Research

Explorer of Research #HEMBAD

Increment Array

CUDA provides several extensions to the C language. The function type qualifier __global__ declares a function as a kernel that executes on the CUDA device and can only be called from the host. All kernels must be declared with a return type of void.

The kernel incrementArrayOnDevice performs the same calculation as incrementArrayOnHost. Looking within incrementArrayOnDevice, you see that there is no loop! This is because the function is executed simultaneously by an array of threads on the CUDA device. However, each thread is provided with a unique ID that can be used to compute different array indices or to make control decisions (such as doing nothing when the thread index exceeds the array size). This makes incrementArrayOnDevice as simple as calculating the unique ID in the register variable idx, which is then used to reference each element in the array and increment it by one. Since the number of threads can be larger than the size of the array, idx is first checked against N, an argument passed to the kernel that specifies the number of elements in the array, to see if any work needs to be done.

So how is the kernel called and the execution configuration specified? Control flows sequentially through the source code, starting at main, until it reaches the line just after the Part 2 of 2 comment in Listing One.



#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

void incrementArrayOnHost(float *a, int N)
{
  int i;
  for (i=0; i < N; i++) a[i] = a[i]+1.f;
}

__global__ void incrementArrayOnDevice(float *a, int N)
{
  int idx = blockIdx.x*blockDim.x + threadIdx.x;
  if (idx < N) a[idx] = a[idx]+1.f;
}

int main(void)
{
  float *a_h, *b_h;           // pointers to host memory
  float *a_d;                 // pointer to device memory
  int i, N = 10;
  size_t size = N*sizeof(float);
  // allocate arrays on host
  a_h = (float *)malloc(size);
  b_h = (float *)malloc(size);
  // allocate array on device
  cudaMalloc((void **) &a_d, size);
  // initialization of host data
  for (i=0; i<N; i++) a_h[i] = (float)i;
  // copy data from host to device
  cudaMemcpy(a_d, a_h, sizeof(float)*N, cudaMemcpyHostToDevice);
  // do calculation on host
  incrementArrayOnHost(a_h, N);
  // do calculation on device:
  // Part 1 of 2. Compute execution configuration
  int blockSize = 4;
  int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
  // Part 2 of 2. Call incrementArrayOnDevice kernel
  incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d, N);
  // Retrieve result from device and store in b_h
  cudaMemcpy(b_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  // check results
  for (i=0; i<N; i++) assert(a_h[i] == b_h[i]);
  // cleanup
  free(a_h); free(b_h);
  cudaFree(a_d);
  return 0;
}
This queues the launch of incrementArrayOnDevice on the CUDA-enabled device and illustrates another CUDA addition to the C language: an asynchronous call to a CUDA kernel. The call specifies the name of the kernel and the execution configuration enclosed between triple angle brackets “<<<” and “>>>”. Notice the two parameters that specify the execution configuration: nBlocks and blockSize, which are discussed next. Any arguments to the kernel are passed via a standard C argument list delimited in the usual fashion with “(” and “)”. In this example, the pointer to device global memory a_d (which contains the array elements) and N (the number of array elements) are passed to the kernel.
