lightning3: Using MPI


Parallelism on lightning3 is obtained by using MPI. All accounts are set up so that MPI uses the high-performance InfiniBand communication network. To use MPI:

* To compile Fortran 77, Fortran 90, or Fortran 95 MPI programs, use mpif77, 
     mpif90, or mpif95 respectively. To compile C and C++ MPI programs, use 
     mpicc or mpiCC. 
* The default MPI is Open MPI 1.5.3 built with the GNU compilers. To use Open MPI 1.8.3 
     built with the Intel compilers, set PATH to /shared/openmpi-1.8.3/intel/bin/:$PATH
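For example, the switch is a one-line PATH change; the compile commands below are shown as comments, and the source file names in them are hypothetical:

```shell
# Put the Intel build of Open MPI 1.8.3 ahead of the default on PATH
export PATH=/shared/openmpi-1.8.3/intel/bin/:$PATH

# The compiler wrappers then use the Intel build
# (hypothetical source file names):
#   mpif90 -o prog prog.f90
#   mpicc  -o prog prog.c
```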
* Use the lightning3 PBS script_writer to write
     a script to submit to the batch scheduler. Remember that there
     are 16 processors per node, so a 32-processor job needs
     only 2 nodes.
* In the script, use mpirun -np 32 ./a.out
     Open MPI's mpirun is aware of the nodes that PBS assigns to it,
     so your job runs only on its assigned nodes, and no one else will
     use the nodes on which your MPI job is running.
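Putting the two items above together, a minimal batch script for a 32-processor job might look like the sketch below. The job name, walltime, and the nodes/ppn resource-request syntax are assumptions; the script_writer generates the exact directives for this site.

```shell
#!/bin/sh
#PBS -N mpi_job            # job name (assumption)
#PBS -l nodes=2:ppn=16     # 2 nodes x 16 processors = 32 (typical Torque/PBS syntax)
#PBS -l walltime=01:00:00  # adjust to your job's needs

cd $PBS_O_WORKDIR          # run from the directory the job was submitted from
mpirun -np 32 ./a.out      # Open MPI's mpirun picks up the PBS-assigned nodes
```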
* Make sure that the executable (a.out in the example above) resides in one of 
     the following locations:
       /home/user     (where 'user' is your user name)
       /work/group    (where 'group' is your group name, issue 'groups' to find it out)
       /ptmp
     All these locations are mounted on each of the compute nodes.
     Don't place the executable in the local filesystem (/tmp), as each node has its 
     own /tmp. Files placed in /tmp on the front-end node won't be available on 
     the compute nodes, so mpirun won't be able to start processes on the compute nodes.
* One can use the storage on the local disk drive of each compute node by reading 
     and writing to $TMPDIR.  This is temporary storage that can be used only during 
     the execution of your program, and only processes executing on a node have access 
     to that node's disk drive.  Since 16 processors share this same storage, you must 
     include the rank of the executing MPI process in the names of files you read and 
     write in $TMPDIR. The size of $TMPDIR is about 3 TB.

* The -e and -o PBS files are not available until the PBS job finishes, so you
     may want to use 'mpirun -np 12 a.out >& output_file' .  Then you can see
     the output from lightning3 while the job is running. Alternatively, you can use 
     the qpeek command:
       qpeek    job# shows STDOUT while the job is running.
       qpeek -e job# shows STDERR while the job is running.
* For convenience an mpirun command can be submitted to the batch queues using
  the command bmpirun rather than mpirun.  E.g. bmpirun -np 8 ./a.out
  Restrictions:  no more than 16 processes
                 no more than 1 hour
                 output does not appear until after the command is complete.
  The command runs immediately if enough nodes are free; otherwise it waits
  in the queue.  If you use Ctrl-C to exit the command, you also need to 
  qdel the associated job in the PBS queues.