An MPI + GPU implementation case study (Part of NZ HPC Applications Workshop)

In this case study presentation we discuss the design and implementation of a high-performance electromagnetic field simulation code. The code is part of an ongoing simulation and modelling research project at the University of Auckland's Institute of Earth Science and Engineering. Because we were seeking optimal performance, we chose native C++ programming over application packages or general-purpose libraries.

The initial desktop version of the code has recently been ported to run on the University of Auckland High Performance Cluster, targeting the Intel CPU + NVIDIA GPU nodes currently available there. However, using multiple GPUs across multiple nodes posed a design challenge. We approached the implementation using top-down design, starting with data-cube partitioning for MPI parallelization and defining a minimal mesh of MPI node communication links. At the innermost level, we dispatch computation to two GPUs per MPI node (note: as of 18.04.13, GPU coding tests have been successful and the algorithm implementation is in progress).

As part of the HPC port, we also took on the goal of scaling up the simulation resolution, which has pushed the memory, storage, and data-throughput requirements to system limits.

We validated the correctness of the HPC port by comparing its results against those of the desktop version on a modest model size.

Initial performance profiling indicated good (nearly linear) MPI scaling as well as a positive outlook for GPU scaling. Up-to-date results will be presented at the Symposium.

eResearch NZ 2013 session type: 


Submitted by Tim McNamara on