
Today I will introduce you to CUDA-aware MPI. CUDA-aware MPI is a fairly new technology that shows high potential when used together with GPUDirect. Basically, it is a CUDA application which has been divided across multiple nodes, so it is an MPI application too. It is just like a dream. You may find Nvidia's official introduction blog post through this link.
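To give a feel for what "CUDA-aware" means before the build walkthrough: with a CUDA-aware MPI build, you can hand GPU device pointers directly to MPI calls, and the library moves the data itself (via GPUDirect where available) instead of you staging it through a host buffer. The sketch below is my own minimal illustration, not from the official post, and it assumes a CUDA-aware MPI installation and at least two ranks:

```c
/* cuda_aware_sketch.c -- a minimal sketch (hypothetical file name).
 * Assumes a CUDA-aware MPI build: device pointers go straight into
 * MPI_Send/MPI_Recv, with no manual cudaMemcpy to host memory. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *buf_dev;                              /* buffer in GPU memory */
    cudaMalloc(&buf_dev, 5 * sizeof(float));

    if (rank == 0)
        /* note: a device pointer is passed to MPI directly */
        MPI_Send(buf_dev, 5, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf_dev, 5, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(buf_dev);
    MPI_Finalize();
    return 0;
}
```

With a non-CUDA-aware MPI, the same calls would require copying the data to a host buffer first; that copy is exactly what this technology removes.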

I want to show you how to compile and run it.

First of all, you should have more than one node with an Nvidia GPU card. Secondly (if you are used to MPI, you already know this part), you have to sync the working directories. As mentioned there, we have to use object files to mix MPI and CUDA. We need two different source files: one for MPI (the main code) and one for CUDA (the parallel part).

CUDA part:

/* multiply.cu */
#include <cuda.h>
#include <cuda_runtime.h>

__global__ void __multiply__ (const float *a, float *b)
{
     const int i = threadIdx.x + blockIdx.x * blockDim.x;
     b[ i ] *= a[ i ];
}

extern "C" void launch_multiply(const float *a, float *b)
{
     /* ... load CPU data into GPU buffers a_dev and b_dev */
     float *a_dev, *b_dev;
     cudaMalloc(&a_dev, 5 * sizeof(float));
     cudaMalloc(&b_dev, 5 * sizeof(float));
     cudaMemcpy(a_dev, a, 5 * sizeof(float), cudaMemcpyHostToDevice);
     cudaMemcpy(b_dev, b, 5 * sizeof(float), cudaMemcpyHostToDevice);

     /* one block, one thread per array element */
     __multiply__ <<< 1, 5 >>> (a_dev, b_dev);

     /* ... transfer the result from GPU back to CPU */
     cudaMemcpy(b, b_dev, 5 * sizeof(float), cudaMemcpyDeviceToHost);

     cudaFree(a_dev);
     cudaFree(b_dev);
}

Main code:

/* main.c */

#include <mpi.h>

void launch_multiply(const float *a, float *b);

int main (int argc, char **argv)
{
     int rank, nprocs;
     MPI_Init (&argc, &argv);
     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
     MPI_Comm_size (MPI_COMM_WORLD, &nprocs);

     /* ... prepare arrays a and b */
     float a[5] = {1, 2, 3, 3, 4};
     float b[5] = {9, 8, 7, 6, 5};

     launch_multiply (a, b);

     MPI_Finalize ();
     return 0;
}

Then, we have to compile the CUDA code and create an object file:

nvcc -c multiply.cu -o multiply.o

Finally, using this object file (and the other necessary libraries), we compile the whole application:

mpicc main.c multiply.o -L/usr/local/cuda-5.5/lib64 -L/usr/local/lib/openmpi/ -lcudart -lstdc++ -o main

Then, you may run your application, main.
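A typical launch across two nodes with Open MPI looks like the line below; the hostnames node1 and node2 are just placeholders for your own machines, and both must see the synced working directory:

```shell
# Start 2 processes, one on each (hypothetical) node.
mpirun -np 2 --host node1,node2 ./main
```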

Note that there are a few important points here:

1) In the CUDA part, there is a launcher function called "launch_multiply". It is declared as extern "C", so it can be called from the C code in the main file.

2) mpicc takes parameters for both the CUDA library and MPI library paths. Also, linking libcudart and libstdc++ is a must.

That's all :)

by zgrw on 2014-02-07 20:40:31