Posted by Hemprasad Y. Badgujar on April 2, 2013
libgpucrypto is a subset of the SSLShader software that implements a few cryptographic algorithms (AES, SHA1, and RSA) using CUDA. It also includes several data structures that help utilize CUDA streams for better performance. See here for more details.
Install required libraries
You can download the CUDA components at http://developer.nvidia.com/cuda-toolkit-40
libgpucrypto requires the CUDA developer driver, the CUDA toolkit, and the CUDA SDK.
We have tested it with the software configurations below.
CUDA driver : 270.41.19
CUDA toolkit : 4.0.17
CUDA SDK : 4.0.17
CUDA driver : 260.19.26
CUDA toolkit : 3.2.16
CUDA SDK : 3.2.16
Ubuntu 10.04 LTS 64bit
Install OpenSSL libraries and headers
You can download OpenSSL at http://openssl.org/source/
Configure the following variables in Makefile.in
If you’re using the system default OpenSSL development library, you may leave OPENSSL_DIR blank.
Try running test code
#./bin/aes_test -m ENC
AES-128-CBC ENC, Size: 16KB
#msg latency(usec) thruput(Mbps)
1 6012 21
2 6305 41
4 7020 74
8 8737 120
16 11834 177
32 16168 259
64 17244 486
128 19256 871
256 24579 1365
512 27067 2479
1024 31605 4246
2048 40924 6559
4096 61402 8743
Correctness check (batch, random): ………….OK
#./bin/rsa_test -m MP
You can see more detailed usage by running a program without arguments or with an incorrect one :).
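The throughput column in the AES output above is consistent with total bits divided by latency in microseconds, truncated to an integer. The sketch below reproduces that arithmetic. It assumes 16KB means 16384 bytes and Mbps means 10^6 bits per second; the helper name batch_throughput_mbps is hypothetical and not part of libgpucrypto, and the test tool may compute its figures differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of libgpucrypto.
// Bits per microsecond is numerically equal to megabits per second,
// so no extra scaling factor is needed.
inline std::uint64_t batch_throughput_mbps(std::uint64_t num_msgs,
                                           std::uint64_t msg_bytes,
                                           std::uint64_t latency_usec)
{
    assert(latency_usec > 0);  // avoid division by zero
    const std::uint64_t total_bits = num_msgs * msg_bytes * 8;
    return total_bits / latency_usec;  // integer division truncates
}
```

For example, batch_throughput_mbps(4096, 16384, 61402) gives 8743, matching the last row of the table, which shows how batching more messages amortizes the fixed launch and copy latency.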
How to use?
Here, I’ll explain how to use libgpucrypto with an example of AES. Below is part of the code from aes_test.cc.
//1. initialize device context
dev_ctx.init(num_flows * flow_len * 3, 0);
//2. create aes_context.
//generate random test case
//3. prepare data to be encrypted
pool = new pinned_mem_pool();
pool->init(num_flows * flow_len * 3);
aes_cbc_encrypt_prepare(&ops, &param, pool);
//4. Launch GPU code
//5. Wait for completion
Initialize device_context: libgpucrypto has several wrappers for CUDA initialization and stream manipulation. To use libgpucrypto, you need to create a device_context.
Create aes_context: the aes_context class provides APIs to launch GPU code using the CUDA library. You need an initialized device_context for this.
Prepare data to be encrypted: To use aes_context, you need to organize the data and prepare some metadata. The GPU requires a large batch size to reach maximum throughput, and you need to copy data into the GPU’s memory before processing. The cost of copying between GPU memory and host memory is relatively high when you copy small amounts of data. For this reason, we gather all data into one big buffer before passing it to aes_context. Please read the sample code aes_test.cc in the test directory for details.
In the above example we used pinned pages (via pinned_mem_pool) to avoid another copy in the CPU’s memory. Before CUDA 4.0, unless you allocate pinned pages using CUDA, the data is copied into a pinned page internally before it is copied to the GPU. To avoid this, we use pinned pages explicitly.
We know it’s not very friendly. We’re working on improving the interface.
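The idea of gathering many small inputs into one big buffer can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: FlowBatch and gather_flows are hypothetical names, the real metadata layout is whatever aes_cbc_encrypt_prepare sets up in aes_test.cc, and an ordinary std::vector stands in for the pinned memory that pinned_mem_pool would provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical batch layout, for illustration only.
struct FlowBatch {
    std::vector<std::uint8_t> data;     // all flows packed back to back
    std::vector<std::size_t>  offsets;  // start of each flow inside data
    std::vector<std::size_t>  lengths;  // length of each flow in bytes
};

// Pack many small flows into one contiguous buffer so that one large
// host-to-device copy can replace many small ones.
inline FlowBatch gather_flows(const std::vector<std::vector<std::uint8_t>>& flows)
{
    FlowBatch batch;

    // First pass: compute the total size so the buffer is allocated once.
    std::size_t total = 0;
    for (const auto& f : flows) total += f.size();
    batch.data.resize(total);
    batch.offsets.reserve(flows.size());
    batch.lengths.reserve(flows.size());

    // Second pass: copy each flow and record where it landed.
    std::size_t pos = 0;
    for (const auto& f : flows) {
        batch.offsets.push_back(pos);
        batch.lengths.push_back(f.size());
        if (!f.empty()) {
            std::memcpy(batch.data.data() + pos, f.data(), f.size());
        }
        pos += f.size();
    }
    assert(pos == total);
    return batch;
}
```

The offsets and lengths arrays play the role of the metadata mentioned above: they let the consumer find each flow inside the single buffer after it has been transferred in one copy.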
Launch GPU code: aes_context copies the data into the GPU’s memory and launches the GPU kernel.
Wait for completion: the sync function polls to check whether GPU execution has finished, and copies the data back to host memory once kernel execution is done. You can also use this function asynchronously to just check the status. See here for more details.
Please see files in test directory for more examples.
Doxygen API documentation
click here to download
This source code is distributed under a BSD-style license. Read LICENSE for more details.