LAM/MPI C++ Example

Compiling a C++ Example Using the LAM/MPI Modules

All of the sample code can be found on the cluster at /share/apps/samples/lammpi-c++. You can copy it to your home folder using:
$ cp -r /share/apps/samples/lammpi-c++ ./
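
Then change into the copied folder (its name matches the sample directory copied above):
$ cd lammpi-c++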

 

First you need to create your source code. See the example mpi_hello.cpp below:

// Filename: mpi_hello.cpp
// Description: A parallel hello world program

#include <iostream>
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
        // Start the MPI environment (LAM/MPI's MPI-2 C++ bindings)
        MPI::Init(argc, argv);

        // This process's rank and the total number of processes
        int rank = MPI::COMM_WORLD.Get_rank();
        int size = MPI::COMM_WORLD.Get_size();

        // Sleep for two seconds, then report this process's rank
        std::cout << "Returned: " << system("sleep 2") << " ";
        std::cout << "Hello World! I am " << rank << " of " << size << std::endl;

        // Shut down MPI before exiting
        MPI::Finalize();
        return 0;
}
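
The example above uses the MPI-2 C++ bindings (MPI::Init, MPI::COMM_WORLD, and so on) that LAM/MPI provides. If you prefer the plain C API, which mpiCC also compiles, an equivalent hello world could look like the sketch below (the filename mpi_hello_c.cpp is only an example name, and the sleep call is kept just to mirror the program above):

// Filename: mpi_hello_c.cpp (example name)
// Description: The same parallel hello world using the MPI C API

#include <iostream>
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
        MPI_Init(&argc, &argv);

        // This process's rank and the total number of processes
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        std::cout << "Returned: " << system("sleep 2") << " ";
        std::cout << "Hello World! I am " << rank << " of " << size << std::endl;

        MPI_Finalize();
        return 0;
}

It can be compiled with the same mpiCC command used below.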
 

Then you must load the correct lammpi module:

$ module load lam-intel (loads the lammpi/intel-12 module)
or
$ module load lammpi/intel-11
or
$ module load lammpi/intel-12
or
$ module load lammpi/gnu 
or
$ module load lam-gnu (loads the lammpi/gnu module)
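
If you are not sure what is currently loaded, the standard Environment Modules command below will list it (this is general module-system behaviour, not specific to LAM/MPI):
$ module list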

Then compile the code:

$ mpiCC -o lam-hello mpi_hello.cpp
 
You should not see any error output, and you should have an executable file called lam-hello in your current directory.
CAUTION: If you do not load the module, your code will not compile correctly!
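
You can also confirm that the compiler wrapper is coming from the loaded module rather than the system path (the exact path shown will depend on which module you loaded):
$ which mpiCC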

Write the jobfile (jobfile.job)

#PBS -N lam-hello
#PBS -q @nic-cluster.mst.edu
#PBS -l nodes=2
#PBS -l walltime=01:00:00
#PBS -m abe
#PBS -M joeminer@mst.edu
#PBS -V
cd $PBS_O_WORKDIR
mpiexec -v -boot -n 2 ./lam-hello
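
If you want to run more MPI processes, request more processor slots in the resource line and match the count given to mpiexec. A hedged example (whether ppn=4 is available depends on the cluster's nodes, so adjust it to your allocation):
#PBS -l nodes=2:ppn=4
mpiexec -v -boot -n 8 ./lam-hello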

Submit the Job

If you have logged out since you last loaded the module, you will need to load it again before submitting the job.
$ qsub jobfile.job
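
Once submitted, you can watch the job in the queue with qstat (replace joeminer with your own username):
$ qstat -u joeminer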

Output

The job's standard output goes to a file named after the job, lam-hello.o#### in this example:
mpiexec: Booting lam..
Running: lamboot  -v
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
mpiexec: Lamboot Complete
mpiexec: Launching MPI programs
Running: mpirun  -v  /tmp/lam_appschema_5Q0gj722678 ./lam-hello running on n0 (o)
15318 ./lam-hello running on n1
Returned: 0 Hello World! I am 1 of 2
Returned: 0 Hello World! I am 0 of 2
mpiexec: MPI program execution over..
mpiexec: Performing Lamhalt
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
Shutting down LAM
hreq: received HALT_ACK from n1 (compute-0-1.local)
hreq: received HALT_ACK from n0 (compute-0-2.local)
lamhalt: sleeping to wait for lamds to die
LAM halted
mpiexec: Lamhalt complete

Errors

If the output is not what you expected, check the matching error file, lam-hello.e#### in this example.

Not loading module

Forgetting to load the module before executing the program.
The .e#### file looks like:
--------------------------------------------------------------------------
Failed to find the following executable:
Host:       compute-0-2.local
Executable: -boot
Cannot continue.
--------------------------------------------------------------------------
Solution: Load module, resubmit job.

Compiling without module

Forgetting to load the module before compiling. This one is tricky, because it still produces an executable file (in this case broken-lam) and gives some output. This error will also occur if a compiled program is launched under the wrong module.
Output file:
mpiexec: Booting lam..
Running: lamboot  -v
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
mpiexec: Lamboot Complete
mpiexec: Launching MPI programs
Running: mpirun  -v  /tmp/lam_appschema_wrR54i12722 ./broken-lam running on n0 (o)
5600 ./broken-lam running on n1
Returned: 0 Hello World! I am 0 of 1
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
Shutting down LAM
hreq: received HALT_ACK from n1 (compute-0-1.local)
hreq: received HALT_ACK from n0 (compute-0-2.local)
lamhalt: sleeping to wait for lamds to die
LAM halted
mpiexec: Lamhalt complete
That doesn't quite look right: the program reports "I am 0 of 1", so only one process ran instead of the two that were requested. Checking the error file.
Error file:
n-1<12717> ssi:boot:base:linear_windowed: booting n0 (compute-0-2.local)
n-1<12717> ssi:boot:base:linear_windowed: booting n1 (compute-0-1.local)
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
mpirun failed with exit status 252
 
Solution: Load module, recompile code, resubmit job.