Reordering MPI Ranks

Introduction

When a parallel program runs, MPI tasks are assigned to compute cores. Because compute nodes (each of which contains 24 cores) sit at different positions on the 3D torus network, communication time between tasks depends not only on where the nodes are placed, but also on where each task is placed within the allocated nodes. This study explores how application performance changes when the placement of MPI tasks across an application's allocated nodes is changed.

Methodology

One way to change MPI task placement on cores is to change the rank ordering, the order in which MPI tasks (or ranks) are assigned to cores. When a parallel program is run on Hopper using the aprun command, the environment variable MPICH_RANK_REORDER_METHOD determines the order in which tasks are assigned to cores.

MPICH_RANK_REORDER_METHOD can be set to an integer from 0 to 3. Methods 0 (round-robin), 1 (SMP-style), and 2 (folded-rank) are predefined orderings; method 3 reads a custom ordering from a file.

Rank reorder method 1 (SMP-Style) is the default, i.e., all programs run with SMP-style rank ordering if MPICH_RANK_REORDER_METHOD is not set.
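To make the difference between the predefined orderings concrete, the sketch below models which node each rank lands on. It is an illustrative model, not Cray's implementation: it assumes the commonly documented meanings of the methods (0 cycles through the nodes round-robin, 1 fills each node before moving to the next, 2 sweeps forward across the nodes and then backward), it assumes fully populated nodes, and the function name rank_to_node is hypothetical.

```python
def rank_to_node(method, n_ranks, ranks_per_node):
    """Map each rank to a node index under a simplified model of methods 0-2."""
    if n_ranks % ranks_per_node != 0:
        raise ValueError("this sketch assumes fully populated nodes")
    n_nodes = n_ranks // ranks_per_node
    placement = []
    for rank in range(n_ranks):
        if method == 0:
            # Round-robin: consecutive ranks go to consecutive nodes, wrapping around.
            node = rank % n_nodes
        elif method == 1:
            # SMP-style: fill one node completely before moving to the next.
            node = rank // ranks_per_node
        elif method == 2:
            # Folded: sweep forward over the nodes, then backward, and so on.
            sweep, offset = divmod(rank, n_nodes)
            node = offset if sweep % 2 == 0 else n_nodes - 1 - offset
        else:
            raise ValueError("method must be 0, 1, or 2")
        placement.append(node)
    return placement


# Small example: 8 ranks on 4 nodes with 2 ranks per node.
for method in (0, 1, 2):
    print(method, rank_to_node(method, 8, 2))
```

In this model, neighboring ranks share a node only under method 1, which is one reason nearest-neighbor communication patterns often favor the default.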

Setting MPICH_RANK_REORDER_METHOD=3 tells aprun to read a custom rank order from the file named MPICH_RANK_ORDER in the current directory. CrayPAT's pat_report tool can generate recommended rank order files by specifying the -Ompi_sm_rank_order flag. It generates two files, MPICH_RANK_ORDER.d and MPICH_RANK_ORDER.u.
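A custom order can also be written by hand. The sketch below builds the text of such a file under the assumption that MPICH_RANK_ORDER holds comma-separated rank numbers in placement order, with lines beginning with # treated as comments; check the mpi man page for the exact format before relying on it. The function name rank_order_text and the grouping of one node per line are illustrative choices, not part of any Cray tool.

```python
def rank_order_text(node_groups, comment="custom rank order (illustrative)"):
    """Return file text listing ranks in placement order, one group per line."""
    seen = [rank for group in node_groups for rank in group]
    # Every rank from 0 to N-1 should appear exactly once.
    if sorted(seen) != list(range(len(seen))):
        raise ValueError("each rank from 0 to N-1 must appear exactly once")
    lines = ["# " + comment]
    lines += [",".join(str(rank) for rank in group) for group in node_groups]
    return "\n".join(lines) + "\n"


# Hypothetical example: 8 ranks, 2 per node, paired from both ends.
groups = [[0, 7], [1, 6], [2, 5], [3, 4]]
print(rank_order_text(groups), end="")
```

Saving this text as MPICH_RANK_ORDER in the run directory and setting MPICH_RANK_REORDER_METHOD=3 would then select it.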

More information can be found on the mpi man page. (Search for MPICH_RANK_REORDER_METHOD.)

Experiment

A series of benchmark programs were run with the different rank orders using the following procedure:

Build with the perftools module loaded.
Use pat_build to make a CrayPAT-instrumented version.
Run the instrumented version using the default rank order method.
Use pat_report to generate CrayPAT's two recommended rank orders, d and u.
Run the noninstrumented version with each of the five (three predefined and two generated) rank orders.

Each application has its own method of recording run times; these run times were collected and analyzed.
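The final step of the procedure, running with each of the five rank orders, can be sketched as a dry-run plan. This is a minimal sketch under stated assumptions: the executable name ./app.x and the task count 48 are hypothetical, the plan assumes the generated file (MPICH_RANK_ORDER.d or MPICH_RANK_ORDER.u) is copied to MPICH_RANK_ORDER before a method 3 run, and nothing is actually launched.

```python
RANK_ORDERS = ["0", "1", "2", "3d", "3u"]  # three predefined, two generated


def run_plan(executable, n_tasks):
    """Build (but do not execute) one run description per rank order."""
    plan = []
    for label in RANK_ORDERS:
        method = label[0]  # "3d" and "3u" both use method 3
        # Custom orders need the generated file in place as MPICH_RANK_ORDER.
        copy_from = "MPICH_RANK_ORDER." + label[1] if method == "3" else None
        plan.append({
            "label": label,
            "env": {"MPICH_RANK_REORDER_METHOD": method},
            "copy_from": copy_from,
            "command": ["aprun", "-n", str(n_tasks), executable],
        })
    return plan


# Print what each run would do; "./app.x" and 48 are placeholders.
for step in run_plan("./app.x", 48):
    if step["copy_from"]:
        print("copy", step["copy_from"], "-> MPICH_RANK_ORDER")
    print(step["env"], " ".join(step["command"]))
```

To execute the plan inside a batch job, each step could be passed to subprocess.run with the environment merged into os.environ.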

Results and Analysis

Run time* (in seconds) for each rank reorder method; 3d and 3u are the two CrayPAT-generated custom orders

* minimum of two independent runs

Application	0	1	2	3d	3u
CAM	349	361	354	351	352
GTC	1,333	1,336	1,334	1,336	1,332
IMPACT-T	637	596	643	630	666
MAESTRO	1,933	1,981	1,939	N/A	N/A
MILC	1,809	996	1,583	1,293	1,315
PARATEC	460	408	442	498	485

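The data show that the default rank ordering (method 1) is not the fastest for every program. CAM and MAESTRO ran fastest with method 0, although only by about 2 to 3 percent, and the GTC run times were within 4 seconds of one another across all five orders. IMPACT-T, MILC, and PARATEC ran fastest with the default; for MILC, the default run time was about 45 percent lower than with method 0.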
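The comparison can be reproduced directly from the table. The sketch below copies the run times above into a dictionary, then reports the fastest rank order for each application and how much faster it is than the default (method 1). The function name summarize is illustrative; MAESTRO has no entries for 3d and 3u because those runs are listed as N/A.

```python
RUN_TIMES = {
    "CAM":      {"0": 349,  "1": 361,  "2": 354,  "3d": 351,  "3u": 352},
    "GTC":      {"0": 1333, "1": 1336, "2": 1334, "3d": 1336, "3u": 1332},
    "IMPACT-T": {"0": 637,  "1": 596,  "2": 643,  "3d": 630,  "3u": 666},
    "MAESTRO":  {"0": 1933, "1": 1981, "2": 1939},  # 3d and 3u are N/A
    "MILC":     {"0": 1809, "1": 996,  "2": 1583, "3d": 1293, "3u": 1315},
    "PARATEC":  {"0": 460,  "1": 408,  "2": 442,  "3d": 498,  "3u": 485},
}


def summarize(times):
    """Return {app: (fastest order, percent gain over the default method 1)}."""
    summary = {}
    for app, by_method in times.items():
        best = min(by_method, key=by_method.get)
        default = by_method["1"]
        gain = 100.0 * (default - by_method[best]) / default
        summary[app] = (best, gain)
    return summary


for app, (best, gain) in summarize(RUN_TIMES).items():
    print(f"{app:9s} fastest with {best:2s} ({gain:.1f}% faster than default)")
```

With only two runs per configuration, differences of a fraction of a percent, such as those for GTC, are best read as noise rather than as a real preference.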

Conclusions

The default rank order method is generally the best: it was the fastest, or within 4 percent of the fastest, for all six applications.
Hopper users may be able to increase the efficiency of their programs by trying different rank order methods.
In experiments with six applications, CrayPAT's custom rank orders provided no meaningful benefit over the predefined methods.
Users wishing to experiment with different rank orders on their own programs may follow the procedure used in this study.
