HPC: Iterative solver benchmark

by Kirill Pichon Gostaf
and Jean-Baptiste Apoung Kamga

Overview

Finite element simulations of arterial blood flow provide the pressure, velocity pattern and wall shear stress in regions of arterial dilatations and bifurcations. Even moderate-size models require solving linear systems with millions of unknowns, and a sequential run takes 15 to 20 hours per time step on average; parallel computing is therefore a necessity.

Benchmark details

Here, we compare the OpenMP and MPI parallelization frameworks for solving large-scale hemodynamics problems. We demonstrate that the computational performance of a finite element code can be improved dramatically when the multi-core architecture is used properly. This benchmark evaluates the computational performance of the new SGI Altix UV 100 cluster, which has recently been installed in our lab. We use an extended version of the open source FreeFem++ platform.

Problem setup

P2/P1 finite elements are used to generate the sparse matrices. The sparsity pattern does not depend on mesh refinement: there are roughly 28 nonzero entries per row. We solve a linear system Au = F for matrices of different sizes, indicated on the charts. The charts show the performance of the sparse matrix-vector product for the OpenMP and MPI implementations; a sketch of such a kernel is given below.
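For reference, the kernel being timed is a sparse matrix-vector product over a compressed sparse row (CSR) matrix. The following is a minimal OpenMP sketch of such a kernel, not the actual FreeFem++ code used in the benchmark; the CSRMatrix structure and the spmv name are illustrative assumptions.

// Minimal OpenMP CSR sparse matrix-vector product y = A*x.
// Illustrative sketch only -- not the FreeFem++ kernel used in the benchmark.
#include <vector>
#include <omp.h>

struct CSRMatrix {
    int n;                       // number of rows
    std::vector<int> row_ptr;    // size n+1, start of each row in col_idx/val
    std::vector<int> col_idx;    // column indices of the nonzeros (~28 per row here)
    std::vector<double> val;     // nonzero values
};

void spmv(const CSRMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    // y must already be sized to A.n. Each thread processes a contiguous
    // block of rows; static scheduling is reasonable because the number of
    // nonzeros per row is nearly constant.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.val[k] * x[A.col_idx[k]];
        y[i] = sum;
    }
}

On a shared-memory machine such as the Altix UV, the speed-up of this kernel is limited mostly by memory bandwidth and data placement rather than by the arithmetic itself.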

MPI/OpenMP libraries

Modules
Compilateurs/gcc/4.5.2
Bibliotheques/sgi-mpi/2.02

OpenMP

[Figure: OpenMP SGI Altix UV benchmark]

MPI sgi-mpi/2.02

[Figure: MPI SGI Altix UV benchmark]

Observations

First, I would like to thank our IT staff, Khashayar, Philippe and Pascal, for their professional and continuous work, their help, and their commitment to providing us students with the most sophisticated hardware.

OpenMP: I observe that the stage-two (new) cluster is much more efficient than the previous one, on which I could not get more than a x25 speed-up when running the OpenMP code (b&w figure below). The main reason is the presence of the front-end machine; I suspect the hardware interconnect has also been improved. So the current speed-up of x60 on 120 cores (about 50% parallel efficiency) is encouraging.

MPI: Here, the results are less reassuring. I used the MPI sgi-mpi/2.02 library in order to compare with the previous charts, obtained in July 2011. The current results (timings) are almost the same, so there is no visible improvement with the MPI version of our code. A speed-up of x20 on 120 cores (roughly 17% parallel efficiency) is poor. However, the same code performs quite efficiently on the UPMC IBM iDataPlex cluster (see the figure below).

OpenMP

[Figure: OpenMP no pre-fetch benchmark]

MPI

[Figure: mixed MPI benchmark]
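For comparison with the OpenMP sketch above, a row-partitioned MPI sparse matrix-vector product could look like the sketch below. This is an illustrative assumption rather than the code run on the cluster: each rank owns a contiguous block of rows and, for simplicity, rebuilds the full vector x with MPI_Allgatherv, whereas a production kernel would exchange only the off-process entries it actually needs. Either way, this communication step is where the quality of the interconnect shows up in the MPI timings.

// Row-partitioned MPI sparse matrix-vector product, y_local = A_local * x.
// Illustrative sketch only -- not the benchmark code.
#include <mpi.h>
#include <vector>

struct CSRBlock {
    int n_local;                 // rows owned by this rank
    int n_global;                // global number of columns
    std::vector<int> row_ptr;    // size n_local+1
    std::vector<int> col_idx;    // global column indices of the nonzeros
    std::vector<double> val;     // nonzero values
};

void spmv_mpi(const CSRBlock& A, const std::vector<double>& x_local,
              std::vector<double>& y_local, MPI_Comm comm) {
    int size;
    MPI_Comm_size(comm, &size);

    // Gather the size of every rank's block of x, then the blocks themselves,
    // so that each rank holds the full vector before the local product.
    int n_local = static_cast<int>(x_local.size());
    std::vector<int> counts(size), displs(size);
    MPI_Allgather(&n_local, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);
    displs[0] = 0;
    for (int r = 1; r < size; ++r) displs[r] = displs[r - 1] + counts[r - 1];

    std::vector<double> x_global(A.n_global);
    MPI_Allgatherv(x_local.data(), n_local, MPI_DOUBLE,
                   x_global.data(), counts.data(), displs.data(), MPI_DOUBLE, comm);

    // Local sparse matrix-vector product on the owned rows.
    for (int i = 0; i < A.n_local; ++i) {
        double sum = 0.0;
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
            sum += A.val[k] * x_global[A.col_idx[k]];
        y_local[i] = sum;
    }
}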