Using the Allinea MAP profiler on hpc-class.its.iastate.edu

Introduction

To optimize your code, you first need to find its performance bottlenecks. Performance can be limited by load imbalance in parallel programs, by cache misses, and by loops that the compiler is unable to optimize. Profilers show which parts of the code should be modified to improve performance.

Two profiler tools are available on hpc-class: Allinea MAP and Intel Trace Analyzer and Collector.

  • MAP is a low-overhead profiler from Allinea for both serial and MPI programs. The program must be compiled with the -g compiler option. To profile it, launch it with map instead of mpirun, e.g. "map -n 4 a.out".

    Before trying the examples below, set up your environment following the instructions on the Using the DDT Parallel Debugger page. We also recommend running the DDT example to become familiar with the Allinea tool interface. The MAP examples below are provided by Allinea.

    Note that although DDT and MAP support GPU programming models such as HMPP, OpenMP Accelerators, CUDA, and CUDA Fortran, our license does not cover using DDT or MAP on GPUs.

  • To be profiled with the Intel Trace Analyzer and Collector, the program must be compiled with the Intel compilers and, if it is linked statically, linked with the -trace option. If the program is linked dynamically, add the -trace option to the mpirun command instead. When the program runs, the Intel Trace Collector writes a .stf trace file, which can later be analyzed with the Intel Trace Analyzer. The last example on this page demonstrates the use of the Intel Trace Analyzer and Collector.
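
    The two workflows described above can be sketched as follows. This is a minimal illustration, not an official recipe: it assumes an MPI program in a hypothetical source file prog.c, the Intel MPI compiler wrapper mpiicc, and that the commands are run where MPI jobs are allowed (for example, inside a batch job). The trace file name a.out.stf is an assumption based on the Trace Collector naming the trace after the executable, as in the sqrtmax_itac example.

```shell
# Allinea MAP: build with debug symbols (-g), then launch under map instead of mpirun.
# prog.c is a hypothetical source file name.
mpiicc -g prog.c -o a.out
map -n 4 ./a.out

# Intel Trace Analyzer and Collector, dynamically linked build:
# do not link with -trace; pass -trace to mpirun instead.
mpiicc prog.c -o a.out
mpirun -trace -np 4 ./a.out

# Open the resulting trace (assumed to be named after the executable).
traceanalyzer a.out.stf
```

    For a statically linked build, the -trace option moves from mpirun to the link step, as shown in the last example.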

Allinea MAP Example 1

This example shows how to measure the performance of MPI programs.
  • Copy the example programs to your directory:
    	cp -r /home/SAMPLES/MAP/1-use-the-source ./
    
  • Change the current working directory to the directory that contains the example code:
    	cd 1-use-the-source/problem
    
  • Follow the instructions in the example guide at 1-Use-The-Source-Handout.pdf.
  • Note: Use Intel compilers (e.g. mpiicc) to avoid linking errors.

Allinea MAP Example 2

This example shows how to optimize the program by using compiler flags and improving memory utilization.
  • Copy the example programs to your directory:
    	cp -r /home/SAMPLES/MAP/2-cpu-optimization ./
    
  • Change the current working directory to the directory that contains the example code:
    	cd 2-cpu-optimization/problem
    
  • Follow the instructions in the example guide at 2-CPU-Optimization-Handout.pdf.
  • Note: Use Intel compilers (e.g. mpiicc) to avoid linking errors.
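
    One way to see the effect of compiler flags is to build an unoptimized and an optimized version of the same program and profile each with MAP. The sketch below assumes the example's source is a single file called prog.c (a hypothetical name; substitute the actual file in 2-cpu-optimization/problem), and the flags shown are common Intel compiler options, not necessarily the ones the handout uses.

```shell
SRC=prog.c   # hypothetical name; replace with the example's source file

# Baseline build: debug symbols, optimization disabled
mpiicc -g -O0 "$SRC" -o prog_O0

# Optimized build: -O3 plus -xHost (targets the instruction set of the CPU doing the compiling)
mpiicc -g -O3 -xHost "$SRC" -o prog_O3

# Profile both builds and compare the MAP reports
map -n 4 ./prog_O0
map -n 4 ./prog_O3
```

    Because -xHost targets the processor of the machine that compiles the code, build on the same node type the program will run on.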

Intel Trace Analyzer and Collector Example

This example uses the same source file as Allinea MAP Example 1.
  • Copy the example programs to your directory:
    	cp -r /home/SAMPLES/MAP/1-use-the-source ./
    
  • Change the current working directory to the directory that contains the example code:
    	cd 1-use-the-source/problem
    
  • Compile and link the program with the -trace option:
    	mpiicc -trace sqrtmax.c -o sqrtmax_itac
    
  • Use the hpc-class PBS script_writer to create a job script for the batch scheduler. In the script writer, choose 4 for "Total # of CPUs needed" and enter the command "mpirun -np 4 ./sqrtmax_itac" to start 4 MPI processes.
  • When the job completes, the file sqrtmax_itac.stf will have been generated. Analyze the program's performance by running:
    	traceanalyzer sqrtmax_itac.stf
    
  • Use the Intel Trace Analyzer Reference Guide and Intel Trace Collector Reference Guide to get more information about these tools.
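
    For reference, a job script for the submission step of this example might look roughly like the sketch below. It is not the exact output of the hpc-class script_writer: the job name, the Torque-style nodes/ppn resource request, and the walltime are assumptions, so prefer the script the script_writer generates.

```shell
#!/bin/bash
# Hypothetical PBS job script for the ITAC example; the job name,
# resource request, and walltime below are assumptions.
#PBS -N sqrtmax_itac
#PBS -l nodes=1:ppn=4
#PBS -l walltime=00:10:00

# Many PBS setups start jobs in the home directory; switch to the
# directory the job was submitted from (1-use-the-source/problem).
cd "$PBS_O_WORKDIR"

# Start 4 MPI processes; because sqrtmax_itac was linked with -trace,
# the Trace Collector writes sqrtmax_itac.stf when the run finishes.
mpirun -np 4 ./sqrtmax_itac
```

    Saved as, for example, itac_job.sh (a hypothetical name), it would be submitted with "qsub itac_job.sh" from the 1-use-the-source/problem directory.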