CyEnce: How to Use Accelerator Nodes

The CyEnce cluster has 24 GPU nodes (each containing two NVIDIA Tesla K20 GPUs) and 24 Intel MIC nodes (each containing two 60-core Intel Xeon Phi accelerator cards).

To run programs on these accelerator nodes, YOU MUST FIRST LOG IN TO NODE 'STREAM'. Then create a PBS script using the PBS script writer, selecting node type "gpu" or "mic". For more details on using the job scheduler, refer to "Managing jobs using the PBS job scheduler". Submit the job to the appropriate queue:

qsub -q gpu  (to run on a GPU node)
qsub -q mic  (to run on a MIC node)
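
For orientation, a job script for a GPU node might look roughly like the sketch below. It is written by hand as an illustration and is not the script writer's actual output: the resource values and the BATCH_ERRORS file name are assumptions, and the executable name is borrowed from the first example on this page.

```shell
# Write a minimal PBS job script to the file "myscript".
# The quoted 'EOF' keeps $PBS_O_WORKDIR from being expanded now.
cat > myscript <<'EOF'
#!/bin/bash
# One node and a ten-minute limit; both values are assumptions.
#PBS -l nodes=1
#PBS -l walltime=00:10:00
# Standard output goes to BATCH_OUTPUT, as referenced on this page.
#PBS -o BATCH_OUTPUT
# Standard error file name is an assumption.
#PBS -e BATCH_ERRORS

# Jobs start in the home directory; return to the submission directory.
cd "$PBS_O_WORKDIR"

./hello_world_cuda
EOF
```

The resulting file would then be submitted with "qsub -q gpu myscript". The script writer remains the recommended way to produce a script that matches the cluster's settings.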

Programs for accelerator nodes can be compiled either within such a script, or using one of the following commands:

Examples

1. bgnvcc

Copy and compile a simple "Hello World" program written in CUDA:

   cp /home/SAMPLES/hello_world_cuda.cu ./
   bgnvcc -o hello_world_cuda hello_world_cuda.cu

To execute hello_world_cuda, create a PBS script as described above. In the script writer, select node type "gpu" and in the command field type "./hello_world_cuda". Save the generated script in the file myscript, and submit the job by issuing "qsub -q gpu myscript".

When the job is completed, you should see "Hello World!" in the BATCH_OUTPUT file.
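
Since the output file only becomes useful once the job has finished, a small helper can poll for it. The following is a minimal sketch, assuming that BATCH_OUTPUT appears in the submission directory when the job ends; the function name wait_for_file and its default timings are hypothetical.

```shell
# wait_for_file FILE [TIMEOUT] [INTERVAL]
# Poll until FILE exists and is non-empty. Gives up after TIMEOUT seconds
# (default 600), checking every INTERVAL seconds (default 10).
wait_for_file() {
    wf_file=$1
    wf_timeout=${2:-600}
    wf_interval=${3:-10}
    wf_waited=0
    while [ ! -s "$wf_file" ]; do
        if [ "$wf_waited" -ge "$wf_timeout" ]; then
            echo "timed out waiting for $wf_file" >&2
            return 1
        fi
        sleep "$wf_interval"
        wf_waited=$((wf_waited + wf_interval))
    done
    return 0
}

# Example use after submitting the job:
#   wait_for_file BATCH_OUTPUT && grep 'Hello World!' BATCH_OUTPUT
```

The helper returns a non-zero status on timeout, so it can be chained with "&&" as in the commented example.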

2. nvcc

The nvcc compiler cannot be used on the head node. To compile and run two sample programs from the NVIDIA Toolkit, create a PBS script as described above. In the script writer, select node type "gpu" and in the command field type:

   mkdir cuda_samples
   cd cuda_samples
   mkdir 0_Simple
   cp -r /usr/local/cuda/samples/0_Simple/matrixMul 0_Simple/
   cp -r /usr/local/cuda/samples/0_Simple/matrixMulCUBLAS 0_Simple/
   cp -r /usr/local/cuda/samples/common/ ./
   mkdir -p bin/linux/release/
   cd 0_Simple/matrixMul
   make
   cd ../matrixMulCUBLAS
   make
   cd ../../bin/linux/release/
   ./matrixMul
   ./matrixMulCUBLAS

Save the generated script in the file myscript, and submit the job by issuing "qsub -q gpu myscript". When the job is completed, the BATCH_OUTPUT file should contain the performance results of the two versions of the matrix-matrix multiplication program.
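
The copy steps in the command list run unchecked, so a missing sample directory would only surface later as a failed make. A possible variant with an explicit check is sketched below. The function stage_sample and the SAMPLES_ROOT variable are hypothetical names introduced for illustration; the default path is the one used in the commands above.

```shell
# Root of the CUDA samples; defaults to the path used on this page.
SAMPLES_ROOT=${SAMPLES_ROOT:-/usr/local/cuda/samples}

# stage_sample NAME
# Copy $SAMPLES_ROOT/0_Simple/NAME into ./0_Simple/, failing early
# with a message if the sample directory does not exist.
stage_sample() {
    ss_src="$SAMPLES_ROOT/0_Simple/$1"
    if [ ! -d "$ss_src" ]; then
        echo "sample not found: $ss_src" >&2
        return 1
    fi
    mkdir -p 0_Simple && cp -r "$ss_src" 0_Simple/
}

# Example use inside the job script:
#   stage_sample matrixMul && stage_sample matrixMulCUBLAS
```

The remaining steps (copying the common directory, creating bin/linux/release/, and running make) stay as listed in the command field.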

3. bgpgfortran

Copy and compile a simple "GPU Info" program written in CUDA Fortran:

   cp /home/SAMPLES/cufinfo.cuf ./
   bgpgfortran -o cufinfo cufinfo.cuf

To execute cufinfo, create a PBS script as described above. In the script writer, select node type "gpu" and in the command field type "./cufinfo". Save the generated script in the file myscript, and submit the job by issuing "qsub -q gpu myscript".

When the job is completed, you should see information on the Tesla K20m GPU card in the BATCH_OUTPUT file.