Intel Tools¶
The Intel toolchain comprises VTune Amplifier, Advisor and Inspector.
Intel VTune Amplifier¶
Intel® VTune™ Amplifier XE is a performance profiler. Use it to analyse algorithm choices, find serial and parallel code bottlenecks, understand where and how your application can benefit from available hardware resources, and speed up the execution.
Step 1: Start the VTune Amplifier¶
Build your target application in the Release mode with all optimizations enabled.
Set up the environment variables:
module add inteltools
Launch the VTune Amplifier:
For the standalone GUI, run the amplxe-gui command.
For the command line interface, run the amplxe-cl command.
For the analysis of your application running on mistral, we recommend using the command line interface in combination with SLURM’s multiple program configuration (MPMD). Since most VTune analyses are node based, it makes sense to analyse only one task or all tasks on one node.
To use the multiple program configuration, create a suitable config file and modify the srun command as follows:
cat > vtune.conf <<EOF
0 amplxe-cl -c [analysis type] -r vtune-results [vtune options] -- ./myapp
1-N ./myapp
EOF
srun [any slurm options] --multi-prog vtune.conf
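The placeholder N in the config file stands for the highest task rank, i.e. the number of tasks minus one. A minimal sketch of how the file could be generated for a concrete task count follows; the helper name write_vtune_conf and the example values (48 tasks, hotspots, ./myapp) are assumptions for illustration and are not part of VTune or SLURM.

```shell
#!/bin/bash
# Hypothetical helper (not part of VTune or SLURM): writes a --multi-prog
# config in which rank 0 runs under amplxe-cl and all other ranks run the
# plain application.
write_vtune_conf() {
    local ntasks=$1 analysis=$2 app=$3 conf=$4
    {
        echo "0 amplxe-cl -c ${analysis} -r vtune-results -- ${app}"
        if [ "${ntasks}" -eq 2 ]; then
            # Only one remaining rank: write it as a single rank number.
            echo "1 ${app}"
        elif [ "${ntasks}" -gt 2 ]; then
            # Remaining ranks 1..ntasks-1 run without the profiler.
            echo "1-$((ntasks - 1)) ${app}"
        fi
    } > "${conf}"
}

# Example values (assumptions): 48 tasks, hotspots analysis, ./myapp.
write_vtune_conf 48 hotspots ./myapp vtune.conf
cat vtune.conf

# Inside the batch job the config would then be used as in the text:
#   srun --ntasks=48 --multi-prog vtune.conf
```

Only rank 0 is profiled here, which keeps the overhead and the result size small; the task count passed to the helper has to match the one given to srun.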
The analysis types that can be collected are:
hotspots: identifies the most time-consuming source code.
advanced-hotspots: as before but uses the VTune Amplifier kernel driver to extend the hotspot analysis by collecting call stacks, context switches and statistical call count data as well as analysing the CPI metric (cycles per instruction).
concurrency: usage of available logical CPUs, discovers where parallelism is incurring synchronisation overhead and identifies potential candidates for parallelisation.
locksandwaits: identifies where the application is waiting on synchronisation objects or I/O operations.
general-exploration: uses hardware event-based sampling to analyse general issues affecting the performance of the application.
memory-access: measures a set of metrics to identify memory access related issues.
Further options to amplxe-cl you might use are:
-trace-mpi: Configure collectors to trace MPI code, and determine MPI rank IDs in case of a non-Intel MPI library implementation.
-data-limit=0: Limit the amount of raw data to be collected by setting the maximum possible result size (in MB). VTune Amplifier starts collecting data from the beginning of the target execution and ends when the limit for the result size is reached. For unlimited data size, specify 0.
-call-stack-mode=all: Choose how to show system functions in the stack.
-target-duration-type=long: Estimate the application duration time. This value affects the size of collected data. For long running targets, the sampling interval is increased to reduce the result size. For hardware event-based analysis types, the duration estimate affects a multiplier applied to the configured Sample after value.
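As an illustration, the options above can be combined into the rank 0 line of the multi-prog config file. This is only a sketch: the analysis type hotspots and the binary ./myapp are example values, and whether all four options are useful together depends on your application.

```shell
#!/bin/bash
# Options taken from the list above:
#   -trace-mpi                  trace MPI code and determine MPI rank IDs
#   -data-limit=0               no limit on the size of the collected data
#   -call-stack-mode=all        how system functions are shown in the stack
#   -target-duration-type=long  larger sampling interval for long-running targets
vtune_opts="-trace-mpi -data-limit=0 -call-stack-mode=all -target-duration-type=long"

# Line for rank 0 in the multi-prog config file.
# hotspots and ./myapp are example values (assumptions).
rank0_line="0 amplxe-cl -c hotspots -r vtune-results ${vtune_opts} -- ./myapp"
echo "${rank0_line}"
```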
Step 2: Set Up the Analysis Target (only if using the GUI)¶
Create a VTune Amplifier project:
Click the menu button in the right corner and go to New > Project… .
Specify the project name and location in the Create Project dialog box.
In the Analysis Target tab, select a target system from the left pane - just use ‘local’ on mistralpp for very small, serial tests. Otherwise, use the command line interface and submit your analysis job to the queue.
Select the Analysis Type from the corresponding tab. If you are using the command line interface (recommended), specify the analysis type after the -collect option.
Configure your target: application location, parameters, and search directories (if required).
Step 3: View and Analyse Performance Data¶
If you are using the GUI, click the Start button on the right to launch the analysis.
If you used the command line interface, you should end up with some directories (one per compute node) containing the analysis results and labelled vtune-results (or according to the -r option given).
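A small sketch for locating the result directories after the job has finished is shown below. The helper list_vtune_results is hypothetical; it only assumes that each directory name starts with the value passed to -r. The guarded call to amplxe-cl -report summary is taken from the VTune command line documentation rather than from this page, so check it against the installed version.

```shell
#!/bin/bash
# Hypothetical helper: print every directory whose name starts with the
# prefix given to -r (default: vtune-results).
list_vtune_results() {
    local prefix=${1:-vtune-results} dir
    for dir in "${prefix}"*; do
        [ -d "${dir}" ] && echo "${dir}"
    done
    return 0
}

for dir in $(list_vtune_results vtune-results); do
    echo "Found result directory: ${dir}"
    # Print a text summary only where amplxe-cl is available.
    if command -v amplxe-cl >/dev/null 2>&1; then
        amplxe-cl -report summary -r "${dir}"
    fi
done
```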
Start your analysis with the Summary window to get an overview of the application performance, then switch to other windows to explore performance in more depth at the granularity of functions, source lines and so on.
Please also have a look at the Intel VTune tutorials: https://software.intel.com/en-us/articles/intel-vtune-amplifier-tutorials
Intel Advisor¶
Intel® Advisor offers a vectorization analysis tool and a threading design and prototyping tool to help ensure your Fortran, C and C++ applications take full performance advantage of today’s processors.
Step 1: Prerequisites¶
Build your target application in the Release mode with all optimizations enabled: -O2 or higher
Request full debug information (compiler and linker): -g
Produce compiler diagnostics: -qopt-report=5
Enable vectorization: -vec
Enable SIMD directives: -simd
Enable generation of multi-threaded code based on OpenMP directives if applicable: -qopenmp
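Put together, the flags above could be passed to the compiler as sketched here. The compiler name ifort and the file names myapp.f90 and myapp are placeholders chosen for illustration; substitute the compiler wrapper and sources you actually use. The script only prints the command so it can be inspected before running it.

```shell
#!/bin/bash
# Flags from the list above, collected in one variable.
ADVISOR_FLAGS="-O2 -g -qopt-report=5 -vec -simd -qopenmp"

# Assumption: ifort, myapp.f90 and myapp are placeholder names.
compile_cmd="ifort ${ADVISOR_FLAGS} -o myapp myapp.f90"

# Print the command instead of running it.
echo "${compile_cmd}"
```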
Set up the environment variables by loading the module inteltools:
module add inteltools
Launch the Intel Advisor:
For the standalone GUI, run the advixe-gui command.
For the command line interface, run the advixe-cl command.
For the analysis of your application running on mistral, we recommend using the command line interface in combination with SLURM’s multiple program configuration (MPMD).
To use the multiple program configuration, create a suitable config file and modify the srun command as follows:
cat > advisor.conf <<EOF
0 advixe-cl --collect [analysis type] --project-dir advisor-results -- ./myapp
1-N ./myapp
EOF
srun [any slurm options] --multi-prog advisor.conf
The analysis types that can be collected are:
survey: Explore where to add efficient vectorization and/or threading.
dependencies: Identify and explore loop-carried dependencies for marked loops.
map: Identify and explore complex memory accesses for marked loops.
suitability: Analyze the annotated program to check its predicted parallel performance.
Further options to advixe-cl you might use are:
-trace-mpi: Configure collectors to trace MPI code, and determine MPI rank IDs in case of a non-Intel MPI library implementation.
-data-limit=0: Limit the amount of raw data to be collected by setting the maximum possible result size (in MB). Intel Advisor starts collecting data from the beginning of the target execution and ends when the limit for the result size is reached. For unlimited data size, specify 0.
Step 2: Run Survey Analysis¶
If you are using the Command Line Interface: just submit your SLURM batch job as normal
If you are using the GUI: Under Survey Target in the VECTORIZATION WORKFLOW, click the Run control button to collect Survey data while your application executes.
CAUTION: Perform the interactive survey analysis only on mistralpp and only for very small application settings!
Step 3: View and Analyse the Data¶
After the batch job running the advixe-cl command line interface has finished successfully, the results are reported in the project subdirectory. Use advixe-gui to open the result file and start analysing.
Please refer to the Intel Advisor Getting Started Guide: https://software.intel.com/en-us/get-started-with-advisor-vectorization-linux
Intel Inspector¶
Intel® Inspector is a dynamic memory and threading error checking tool for users developing serial and multithreaded applications. It offers a standalone GUI and command line operational environments. Key features are:
A wealth of reported memory errors, including on-demand memory leak detection
Memory growth measurement to help ensure your application uses no more memory than expected
Data race, deadlock, lock hierarchy violation, and cross-thread stack access error detection, including error detection on the stack
Step 1: Prerequisites¶
Build your application in debug mode to produce the most accurate and complete Intel Inspector analysis results:
Use optimal compiler/linker settings. For more information, see: Building Applications in Intel Inspector Help.
Ensure your application creates more than one thread before you run threading analyses.
Verify your application runs outside the Intel Inspector environment
Set up the environment variables:
module add inteltools
Launch the Intel Inspector:
For the standalone GUI, run the inspxe-gui command.
For the command line interface, run the inspxe-cl command.
For the analysis of your application running on mistral, we recommend using the command line interface in combination with SLURM’s multiple program configuration (MPMD). For example, to check for memory leaks, modify your batch script as follows:
cat > inspector.conf <<EOF
0 inspxe-cl -c mi3 -trace-mpi -r inspector-results -- ./myapp
1-N ./myapp
EOF
srun [any slurm options] --multi-prog inspector.conf
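For orientation, a complete batch script built around these lines might look like the sketch below. The #SBATCH values are placeholders, and partition and account settings are deliberately left out because they are site specific. SLURM_NTASKS is set by SLURM inside a job; the fallback value of 4 and the guards around module and srun are only there so the script can also be tried outside the cluster.

```shell
#!/bin/bash
#SBATCH --job-name=inspector
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00
# The #SBATCH values above are placeholders (assumptions).

# Number of tasks of this job; falls back to 4 outside of SLURM.
NTASKS=${SLURM_NTASKS:-4}

# The module command only exists on the cluster, so skip it elsewhere.
if type module >/dev/null 2>&1; then
    module add inteltools
fi

# Rank 0 runs under Inspector (memory analysis mi3), all others run plain.
cat > inspector.conf <<EOF
0 inspxe-cl -c mi3 -trace-mpi -r inspector-results -- ./myapp
1-$((NTASKS - 1)) ./myapp
EOF

if command -v srun >/dev/null 2>&1; then
    srun --ntasks="${NTASKS}" --multi-prog inspector.conf
else
    echo "srun not found; wrote inspector.conf only"
fi
```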
Step 2: Run Analysis¶
If you are using the Command Line Interface: just submit your SLURM batch job as normal
If you are using the GUI: choose or create a project, then configure the project and the targeted analysis. Finally, start the analysis.
CAUTION: Perform the interactive analysis only on mistralpp and only for very small application settings!
Step 3: View and Analyse the Data¶
After the batch job running the inspxe-cl command line interface has finished successfully, the results are reported in the project subdirectory: a short summary is given in inspxe-cl.txt, while the full results are in *.inspxe. Use inspxe-gui to open the result file and start analysing.
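Before opening the GUI, the text summaries can be printed directly in the terminal. The helper below is a hypothetical sketch that assumes nothing beyond the file name inspxe-cl.txt mentioned above; it searches the given directory recursively, since the exact location inside the result directory may differ between versions.

```shell
#!/bin/bash
# Hypothetical helper: print every inspxe-cl.txt summary found below the
# given directory (default: current directory).
show_inspector_summaries() {
    local root=${1:-.} f
    find "${root}" -type f -name 'inspxe-cl.txt' | sort | while read -r f; do
        echo "== ${f}"
        cat "${f}"
    done
}

show_inspector_summaries .
```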
Please refer to the Intel Inspector Getting Started Guide: https://software.intel.com/en-us/node/595380