CUDA Random Example
In order to use cuRAND, we need to add two include files into our program:
#include <curand.h>
#include <curand_kernel.h>
cuRAND uses a curandState_t type to keep track of the state of the random sequence. The normal C rand function also has a state, but it is global and hidden from the programmer. This makes rand easier to use, but not thread-safe.
A curandState_t object must be initialized with a call to curand_init which has the following parameters:
- seed: The seed determines the beginning point of the sequence of random numbers.
- sequence: The sequence number is another seed-like value. It is used so that, if all cores have the same seed, but different sequence numbers, then they will get different random values.
- offset: The amount we skip ahead in the random sequence. This can be zero.
- state: A pointer to the curandState_t object to initialize.
Once we have an initialized curandState_t object, we can get random numbers with the curand function, which takes a pointer to a curandState_t object and returns a random unsigned integer.
The following program uses these functions to generate random numbers:
#include <unistd.h>
#include <stdio.h>
/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>
#define MAX 100
/* this GPU kernel function calculates a random number and stores it in the parameter */
__global__ void random(int* result) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread */
curandState_t state;
/* we have to initialize the state */
curand_init(0, /* the seed controls the sequence of random values that are produced */
0, /* the sequence number is only important with multiple cores */
0, /* the offset is how far we skip ahead in the sequence before the first value, can be 0 */
&state);
/* curand works like rand - except that it takes a state as a parameter */
*result = curand(&state) % MAX;
}
int main( ) {
/* allocate an int on the GPU */
int* gpu_x;
cudaMalloc((void**) &gpu_x, sizeof(int));
/* invoke the GPU to initialize all of the random states */
random<<<1, 1>>>(gpu_x);
/* copy the random number back */
int x;
cudaMemcpy(&x, gpu_x, sizeof(int), cudaMemcpyDeviceToHost);
printf("Random number = %d.\n", x);
/* free the memory we allocated */
cudaFree(gpu_x);
return 0;
}
When run, this program produces the exact same random number each time. This is because the seed passed in is always 0, so the generator always starts from the same point in the sequence. In order to get a different random number each time, we can pass in the current time as the seed:
#include <unistd.h>
#include <stdio.h>
#include <time.h> /* for time(), which supplies the seed */
/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>
#define MAX 100
/* this GPU kernel function calculates a random number and stores it in the parameter */
__global__ void random(unsigned int seed, int* result) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread */
curandState_t state;
/* we have to initialize the state */
curand_init(seed, /* the seed controls the sequence of random values that are produced */
0, /* the sequence number is only important with multiple cores */
0, /* the offset is how far we skip ahead in the sequence before the first value, can be 0 */
&state);
/* curand works like rand - except that it takes a state as a parameter */
*result = curand(&state) % MAX;
}
int main( ) {
/* allocate an int on the GPU */
int* gpu_x;
cudaMalloc((void**) &gpu_x, sizeof(int));
/* invoke the GPU to initialize all of the random states */
random<<<1, 1>>>(time(NULL), gpu_x);
/* copy the random number back */
int x;
cudaMemcpy(&x, gpu_x, sizeof(int), cudaMemcpyDeviceToHost);
printf("Random number = %d.\n", x);
/* free the memory we allocated */
cudaFree(gpu_x);
return 0;
}
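The effect of the seed can be checked directly. The following is a minimal sketch, not part of the original programs: the kernel name draw and the host helper draw_once are hypothetical names introduced here, and the seeds 1234 and 5678 are arbitrary. It runs the same one-thread kernel twice with the same seed and once with a different seed.

```cuda
#include <stdio.h>
#include <curand_kernel.h>

#define MAX 100

/* draws one value in [0, MAX) from a freshly seeded state */
__global__ void draw(unsigned long long seed, unsigned int* result) {
  curandState_t state;
  curand_init(seed, 0, 0, &state);
  *result = curand(&state) % MAX;
}

/* hypothetical helper: runs the kernel once and returns the value it produced */
unsigned int draw_once(unsigned long long seed) {
  unsigned int* gpu_x;
  unsigned int x = 0;
  cudaMalloc((void**) &gpu_x, sizeof(unsigned int));
  draw<<<1, 1>>>(seed, gpu_x);
  cudaMemcpy(&x, gpu_x, sizeof(unsigned int), cudaMemcpyDeviceToHost);
  cudaFree(gpu_x);
  return x;
}

int main() {
  unsigned int a = draw_once(1234ULL);
  unsigned int b = draw_once(1234ULL);
  unsigned int c = draw_once(5678ULL);
  /* the same seed, sequence and offset always give the same first value */
  printf("seed 1234: %u and %u (%s)\n", a, b, a == b ? "same" : "different");
  printf("seed 5678: %u\n", c);
  return 0;
}
```

The two draws with seed 1234 should match. The draw with seed 5678 will usually differ, although after taking the remainder modulo MAX a coincidental match is possible.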
Using Random Numbers Across Cores
If we want to get random numbers in multiple GPU cores, then we would need each core to have its own curandState_t.
If we want each run of the program to produce different sequences of random numbers, then we would need to set the seed to the current time.
However, now we would likely have each core get the same sequence of numbers. This is probably undesirable. To avoid it, we set the sequence parameter to the thread’s ID.
This way, each thread will have a different stream of random numbers, which will also be different each time the program is run.
The following program illustrates this by creating N curandState_t objects, then launching a GPU kernel to get N random numbers from them, in parallel.
#include <unistd.h>
#include <stdio.h>
#include <time.h> /* for time(), which supplies the seed */
/* we need these includes for CUDA's random number stuff */
#include <curand.h>
#include <curand_kernel.h>
#define N 25
#define MAX 100
/* this GPU kernel function is used to initialize the random states */
__global__ void init(unsigned int seed, curandState_t* states) {
/* we have to initialize the state */
curand_init(seed, /* the seed can be the same for each core, here we pass the time in from the CPU */
blockIdx.x, /* the sequence number should be different for each core (unless you want all
cores to get the same sequence of numbers for some reason); each block here runs a single thread, so the block index serves as the thread id */
0, /* the offset is how far we skip ahead in the sequence before the first value, can be 0 */
&states[blockIdx.x]);
}
/* this GPU kernel takes an array of states, and an array of ints, and puts a random int into each */
__global__ void randoms(curandState_t* states, unsigned int* numbers) {
/* curand works like rand - except that it takes a state as a parameter */
numbers[blockIdx.x] = curand(&states[blockIdx.x]) % MAX;
}
int main( ) {
/* CUDA's random number library uses curandState_t to keep track of the seed value
we will store a random state for every thread */
curandState_t* states;
/* allocate space on the GPU for the random states */
cudaMalloc((void**) &states, N * sizeof(curandState_t));
/* invoke the GPU to initialize all of the random states */
init<<<N, 1>>>(time(0), states);
/* allocate an array of unsigned ints on the CPU and GPU */
unsigned int cpu_nums[N];
unsigned int* gpu_nums;
cudaMalloc((void**) &gpu_nums, N * sizeof(unsigned int));
/* invoke the kernel to get some random numbers */
randoms<<<N, 1>>>(states, gpu_nums);
/* copy the random numbers back */
cudaMemcpy(cpu_nums, gpu_nums, N * sizeof(unsigned int), cudaMemcpyDeviceToHost);
/* print them out */
for (int i = 0; i < N; i++) {
printf("%u\n", cpu_nums[i]);
}
/* free the memory we allocated for the states and numbers */
cudaFree(states);
cudaFree(gpu_nums);
return 0;
}
This program is also the first to use multiple GPU kernel functions.
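The program above launches N blocks of one thread each, so blockIdx.x is enough to tell the threads apart. With several threads per block, the block index alone would give every thread in a block the same sequence number and the same state slot. The following is a minimal sketch of one way to handle that, assuming a one-dimensional launch whose size exactly matches the arrays; the names BLOCKS, THREADS, TOTAL, init_states, fill and generate are hypothetical and introduced only for this illustration.

```cuda
#include <stdio.h>
#include <time.h>
#include <curand_kernel.h>

#define BLOCKS 4
#define THREADS 8
#define TOTAL (BLOCKS * THREADS)
#define MAX 100

/* each thread seeds its own state, using its global index as the sequence number */
__global__ void init_states(unsigned long long seed, curandState_t* states) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  curand_init(seed, id, 0, &states[id]);
}

/* each thread draws one number from its own state */
__global__ void fill(curandState_t* states, unsigned int* numbers) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  numbers[id] = curand(&states[id]) % MAX;
}

/* hypothetical helper: fills out[0..TOTAL-1], returns 0 on success and 1 on a CUDA error */
int generate(unsigned long long seed, unsigned int* out) {
  curandState_t* states;
  unsigned int* gpu_nums;
  if (cudaMalloc((void**) &states, TOTAL * sizeof(curandState_t)) != cudaSuccess) {
    return 1;
  }
  if (cudaMalloc((void**) &gpu_nums, TOTAL * sizeof(unsigned int)) != cudaSuccess) {
    cudaFree(states);
    return 1;
  }
  /* the launch covers exactly TOTAL threads, so no bounds check is needed in the kernels */
  init_states<<<BLOCKS, THREADS>>>(seed, states);
  fill<<<BLOCKS, THREADS>>>(states, gpu_nums);
  cudaError_t err = cudaMemcpy(out, gpu_nums, TOTAL * sizeof(unsigned int),
                               cudaMemcpyDeviceToHost);
  cudaFree(states);
  cudaFree(gpu_nums);
  return err == cudaSuccess ? 0 : 1;
}

int main() {
  unsigned int nums[TOTAL];
  if (generate((unsigned long long) time(NULL), nums) != 0) {
    fprintf(stderr, "CUDA call failed\n");
    return 1;
  }
  for (int i = 0; i < TOTAL; i++) {
    printf("%u\n", nums[i]);
  }
  return 0;
}
```

If the launch were rounded up to more threads than there are states, each kernel would also need to compare id against the array length before touching memory.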
Random Distributions
In addition to the curand function which, together with modular arithmetic, can return to us random integers from any range we wish, cuRAND provides functions to get floating point numbers from different distributions:
__device__ float curand_uniform (curandState_t *state)
__device__ float curand_normal (curandState_t *state)
curand_uniform returns a random float between 0.0 and 1.0 following a uniform distribution. The cuRAND documentation describes the range as excluding 0.0 and including 1.0. Uniform means that values anywhere in that range are equally likely to be produced.
curand_normal returns a random float following a normal distribution with mean 0.0 and standard deviation 1.0. Its results are not confined to a fixed range: values near 0.0 are the most likely, and values further from 0.0, in either direction, become progressively rarer. Normal distributions are important for modelling many natural phenomena accurately.
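A quick way to see how the two functions behave is to draw a batch from each and summarize it on the CPU. The following is a minimal sketch under a few assumptions: the sample count SAMPLES, the kernel sample, the helper run_sample and the seed 2024 are hypothetical choices made for this illustration, and each thread creates a temporary state rather than reusing a stored one.

```cuda
#include <stdio.h>
#include <curand_kernel.h>

#define SAMPLES 256
#define THREADS 64

/* each thread draws one uniform and one normal value from its own state */
__global__ void sample(unsigned long long seed, float* uniform, float* normal) {
  int id = blockIdx.x * blockDim.x + threadIdx.x;
  curandState_t state;
  curand_init(seed, id, 0, &state);
  uniform[id] = curand_uniform(&state);
  normal[id] = curand_normal(&state);
}

/* hypothetical helper: fills both host arrays with SAMPLES values each */
void run_sample(unsigned long long seed, float* uniform, float* normal) {
  float* gpu_u;
  float* gpu_n;
  cudaMalloc((void**) &gpu_u, SAMPLES * sizeof(float));
  cudaMalloc((void**) &gpu_n, SAMPLES * sizeof(float));
  /* SAMPLES is a multiple of THREADS, so the launch covers the arrays exactly */
  sample<<<SAMPLES / THREADS, THREADS>>>(seed, gpu_u, gpu_n);
  cudaMemcpy(uniform, gpu_u, SAMPLES * sizeof(float), cudaMemcpyDeviceToHost);
  cudaMemcpy(normal, gpu_n, SAMPLES * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(gpu_u);
  cudaFree(gpu_n);
}

int main() {
  float u[SAMPLES], n[SAMPLES];
  run_sample(2024ULL, u, n);
  float u_sum = 0.0f, n_sum = 0.0f, n_min = n[0], n_max = n[0];
  for (int i = 0; i < SAMPLES; i++) {
    u_sum += u[i];
    n_sum += n[i];
    if (n[i] < n_min) n_min = n[i];
    if (n[i] > n_max) n_max = n[i];
  }
  printf("uniform mean: %f\n", u_sum / SAMPLES);
  printf("normal mean:  %f (min %f, max %f)\n", n_sum / SAMPLES, n_min, n_max);
  return 0;
}
```

Comparing the two means, together with the smallest and largest normal sample, shows where each distribution is centered and how far its values spread.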