Lightweight MPI analysis

Intel MPI

The Intel MPI library has a built-in statistics-gathering facility that collects essential performance data without perturbing the application's execution. This data can afterwards be analysed with the Intel MPI Performance Snapshot utility to get an overview of the application's performance.

For a high-level profile, e.g. how much time is spent in MPI and in which MPI subroutines, it is sufficient to set the following environment variables:

export I_MPI_STATS=ipm
export I_MPI_STATS_FILE=prof.dat

The output contains wallclock, user, system and MPI time; note that the time reported for user code includes the time spent in MPI. To analyse these numbers in terms of efficiency, you might read the following article: Lightweight MPI Profiling with Intel MPI
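
As a minimal sketch, assuming the usual SLURM batch script (the job settings and executable name below are placeholders), the two variables are simply exported before the application is launched:

#!/bin/bash
#SBATCH --job-name=mpi_stats      # placeholder job settings
#SBATCH --nodes=2

# enable the IPM-style statistics gathering of Intel MPI
export I_MPI_STATS=ipm
export I_MPI_STATS_FILE=prof.dat

srun ./appname                    # launch the application as usual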

For a more detailed analysis, the native statistics provide in-depth MPI performance data, e.g. data transfer metrics, which can help identify performance problems. The job's batch script can be used as usual, but the additional environment variable I_MPI_STATS needs to be set to one of the values 1, 2, 3, 4, 10 or 20. The higher the value, the more information is provided, i.e.

export I_MPI_STATS=20

will give you the most detailed information on the MPI runtime.

You can further restrict data collection to specific MPI subsystems via

export I_MPI_STATS_SCOPE="<subsystem>[:<ops>][;<subsystem>[:<ops>][...]]"

where <subsystem> is one of coll (for collective MPI operations), p2p (for point-to-point MPI operations) or all (the default); <ops> can be used to explicitly select individual MPI operations. For details refer to the Intel MPI developer reference.
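
For example, to collect statistics for all point-to-point operations but only for the Allreduce collective, a setting along the following lines could be used (illustrative value; check the developer reference for the exact operation names):

export I_MPI_STATS_SCOPE="p2p;coll:allreduce"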

Once the application has finished, the results are stored in a plain text file called stats.txt. If you want to store the data in a different file, set the filename to be used via the environment variable I_MPI_STATS_FILE.
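
A native-statistics run in the batch script could then look like this sketch (the file name is a placeholder):

export I_MPI_STATS=20                     # most detailed native statistics
export I_MPI_STATS_FILE=native_stats.txt  # write results here instead of stats.txt

srun ./appname                            # launch the application as usual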

You can further analyse the data using the Intel mps tool (installed at /sw/rhel6-x64/intel/traceanalyzer/9.1_update2/itac/9.1.2.024/intel64/bin/mps) to produce a more readable summary in text or HTML format:

$ /sw/rhel6-x64/intel/traceanalyzer/9.1_update2/itac/9.1.2.024/intel64/bin/mps -g -t -D -o stats.txt

OpenMPI and (deprecated) bullxMPI

For the following OpenMPI-based installations we provide corresponding installations of the profiling tools described below:

  • bullxmpi_mlx/bullxmpi_mlx-1.2.8.3 (deprecated)

  • openmpi/2.0.2p2_hpcx-intel14

mpiP

mpiP is a lightweight profiling library for MPI applications. In addition to the MPI summary profile, it can provide “call site” statistics showing which calls in the code dominate MPI execution time. Since mpiP gathers its information through the MPI profiling layer, you don’t have to recompile your application; it is enough to relink your binary against the appropriate library. Make sure the mpiP library appears before the MPI library on the link line, which is easiest to achieve by using the MPI compiler wrapper. For OpenMPI-2 you need to modify the link command to:

mpif90 my_obj.o -o appname -L/sw/rhel6-x64/mpi/mpiP-3.4.1-openmpi2-intel14/lib \
-Wl,-rpath,/sw/rhel6-x64/mpi/mpiP-3.4.1-openmpi2-intel14/lib -lmpiP
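
At runtime, mpiP can optionally be tuned via its MPIP environment variable set in the batch script; the option values below are only examples, see the mpiP documentation for the full list:

# optional: adjust mpiP behaviour at runtime
export MPIP="-k 2 -f $PWD"   # call-site stack depth of 2, write the report to the current directory

srun ./appname               # run the relinked binary as usual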

After your batch job finishes, mpiP generates a text-based output file named like

<appname>.<ntasks>.20290.1.mpiP

For details refer to the official mpiP website.

IPM

IPM is a tool that provides a low-overhead profile of the performance and resource utilization of a parallel program. Using IPM does not require recompilation. Instead, LD_PRELOAD is used to dynamically load the IPM library (libipm.so) as a wrapper around the MPI runtime. You therefore only need to modify the batch script as shown here:

export LD_PRELOAD=/sw/rhel6-x64/mpi/ipm-0.983-openmpi2-intel14/lib/libipm.so
srun ... # as usual
unset LD_PRELOAD

By default, the IPM profiler prints a summary of the application's performance information to stdout. It also writes an XML file that can be used to generate a graphical webpage. This file is named something like

<username>.1459775774.278838.0

The postprocessing then needs to be done on your local machine (a combined sketch of the steps follows the list):

  1. make sure that ploticus is installed on your local machine (see here)

  2. copy the XML output file as well as the ipm_parse script and the ipm_key file (the latter two are located at /sw/rhel6-x64/mpi/ipm-0.983-openmpi2-intel14) to your machine

  3. create the webpage using

    IPM_KEYFILE=./ipm_key ./ipm_parse -html <ipm_xml_file>
    
  4. you will find a new directory containing an index.html file that can be opened with any browser
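
Put together, the postprocessing steps might look as follows on the local machine (hostname, job directory and XML file name are placeholders):

# copy the XML output file and the IPM helper scripts (placeholder host and job directory)
$ scp user@hpc-system:/path/to/job/dir/<ipm_xml_file> .
$ scp user@hpc-system:/sw/rhel6-x64/mpi/ipm-0.983-openmpi2-intel14/{ipm_parse,ipm_key} .

# generate the webpage and open the resulting index.html in a browser
$ IPM_KEYFILE=./ipm_key ./ipm_parse -html <ipm_xml_file>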

For details refer to the official IPM website.