Session S23.5
Error Estimates and Communication Overhead in the Computation of the Bidomain Equations on the Distributed Memory Parallel Blue Gene/L Supercomputer
M Reumann*, BG Fitch, A Rayshubskiy, DL Weiss, G Seemann,
O Doessel, MC Pitman, JJ Rice
IBM T. J. Watson Research Center
Yorktown Heights, NY, USA
While CPU power has followed Moore’s Law since the 1970s, the consensus is that clock speeds will not increase appreciably in the foreseeable future with silicon technology. Further increases in computational power will therefore demand higher levels of parallelism through multi-core approaches. In tissue-level cardiac models, this parallelism requires distributing the workload, generally by volumetric decomposition of the heart tissue itself. Balancing computational load against inter-processor communication becomes paramount in a highly distributed environment where memory is not shared among all processors (typical of non-SMP machines such as clusters). We present such an example: bidomain tissue-level cardiac models implemented on the Blue Gene/L architecture. Standard bidomain calculations require knowledge of the electrical potential at all points of the 3-D mesh. The results therefore demonstrate an important use case: a completely distributed environment that does not permit the data to be coalesced into a commonly shared memory space, the approach typical of existing implementations. We model a tissue block of 50 x 50 x 100 cubic elements at 0.2 mm resolution, with the computation distributed over 512 compute nodes. The bidomain equations with the ten Tusscher et al. (2004) cell model are solved for heterogeneous anisotropic tissue using the formulation of Saleheen et al. (1998). The extracellular potential is computed with the Gauss-Seidel (GS) iterative method, which terminates when the global sum of squared residuals across all compute nodes falls below 1e-6 or when a maximum of 100 iterations is reached. In simulation runs of 10000 time steps (100 ms of simulated time), the extracellular potential is computed either at every time step or at every fifth time step (a commonly used heuristic to reduce communication overhead). In each run, a single stimulus initiates a propagating wave front.
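For scale, the 50 x 50 x 100 mesh has 250,000 elements, roughly 488 per node on 512 nodes, and 10000 steps over 100 ms corresponds to a time step of 0.01 ms. The sketch below illustrates the GS stopping rules described above (sum of squared residuals below 1e-6, or 100 iterations). It is a serial stand-in under stated assumptions, not the authors' code: it solves the 7-point discrete Poisson equation on a small n x n x n grid with zero Dirichlet boundaries instead of the anisotropic bidomain elliptic equation, and all function names are hypothetical. On a distributed machine the residual sum would be a global reduction across nodes, which is only marked by a comment here.

```python
def idx(i, j, k, m):
    """Flat index of grid point (i, j, k) in an m x m x m array (m = n + 2)."""
    return (i * m + j) * m + k


def neighbour_sum(phi, p, m):
    """Sum of the six face neighbours of flat index p (7-point stencil)."""
    return (phi[p - m * m] + phi[p + m * m]
            + phi[p - m] + phi[p + m]
            + phi[p - 1] + phi[p + 1])


def gs_sweep(phi, f, n):
    """One in-place lexicographic GS sweep for 6*phi - sum(neighbours) = f."""
    m = n + 2
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                p = idx(i, j, k, m)
                phi[p] = (f[p] + neighbour_sum(phi, p, m)) / 6.0


def sum_sq_residual(phi, f, n):
    """Sum of squared residuals over all interior points.

    In a distributed run each node would sum over its own points and a
    global reduction (e.g. an all-reduce) would combine the partial sums.
    """
    m = n + 2
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                p = idx(i, j, k, m)
                r = f[p] - (6.0 * phi[p] - neighbour_sum(phi, p, m))
                total += r * r
    return total


def solve_gs(f, n, tol=1e-6, max_iter=100):
    """Sweep until the sum of squared residuals is < tol or max_iter is reached."""
    phi = [0.0] * len(f)  # boundary layer is never written, so it stays 0
    res = sum_sq_residual(phi, f, n)
    it = 0
    while res >= tol and it < max_iter:
        gs_sweep(phi, f, n)
        it += 1
        res = sum_sq_residual(phi, f, n)
    return phi, it, res


if __name__ == "__main__":
    n = 8
    m = n + 2
    f = [0.0] * (m ** 3)
    f[idx(4, 4, 4, m)] = 1.0  # hypothetical point source in the interior
    phi, iters, res = solve_gs(f, n)
    print(f"{iters} sweeps, sum of squared residuals = {res:.2e}")
```

The residual is checked after every sweep here, which in a distributed setting costs one global reduction per iteration; that per-iteration communication is what the heuristics discussed in the results aim to reduce.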
The results show that the number of GS iterations per time step quickly falls below 20 and seldom exceeds 10 after 20 ms of simulation time. Communication overhead was reduced mainly by communicating less frequently during the GS iterations. Specifically, we arbitrarily reduced inter-processor communication to every 5th or 10th iteration, with little impact on convergence or simulation results. In fact, this reduction had a greater impact on computation overhead than calculating the extracellular potential only every 5th time step. While preliminary, our results show that a fully distributed bidomain computation is feasible. Moreover, they suggest that heuristic approaches may further reduce inter-processor communication and improve the execution time of large-scale simulations. Future work will focus on simulating large anatomical data sets on systems of more than 1024 processors.
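The idea of communicating only every 5th or 10th GS iteration can be sketched in the same serial setting. The following are assumptions, not details from the abstract: the grid is split into slabs along one axis, each slab stands in for a node holding one ghost plane on each side, ghost planes are refreshed only every `exchange_every` sweeps, and the global residual is evaluated only at those exchange points. Between exchanges each slab keeps sweeping against stale neighbour data, so the scheme behaves like block Jacobi across slabs with Gauss-Seidel inside them. All names are hypothetical.

```python
def idx(i, j, k, m):
    return (i * m + j) * m + k


def sum_sq_residual(phi, f, n):
    """Global sum of squared residuals (an all-reduce in a parallel code)."""
    m, total = n + 2, 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                p = idx(i, j, k, m)
                nb = (phi[p - m * m] + phi[p + m * m] + phi[p - m]
                      + phi[p + m] + phi[p - 1] + phi[p + 1])
                r = f[p] - (6.0 * phi[p] - nb)
                total += r * r
    return total


def slab_bounds(n, parts):
    """Split interior planes 1..n into slabs [lo, hi); assumes parts divides n."""
    size = n // parts
    return [(1 + s * size, 1 + (s + 1) * size) for s in range(parts)]


def exchange(phi, bounds, m):
    """Halo exchange: copy each slab's two neighbouring planes into ghost storage."""
    plane = m * m
    return [{lo - 1: phi[(lo - 1) * plane:lo * plane],
             hi: phi[hi * plane:(hi + 1) * plane]}
            for lo, hi in bounds]


def local_sweep(phi, f, n, lo, hi, ghost):
    """GS sweep over planes lo..hi-1, reading off-slab neighbours only from ghost."""
    m = n + 2
    plane = m * m
    for i in range(lo, hi):
        below = ghost[i - 1] if i == lo else None      # possibly stale copy
        above = ghost[i + 1] if i == hi - 1 else None  # possibly stale copy
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                p = idx(i, j, k, m)
                q = j * m + k  # offset of (j, k) within a plane
                down = below[q] if below is not None else phi[p - plane]
                up = above[q] if above is not None else phi[p + plane]
                nb = down + up + phi[p - m] + phi[p + m] + phi[p - 1] + phi[p + 1]
                phi[p] = (f[p] + nb) / 6.0


def solve_lazy_halo(f, n, parts, exchange_every, tol=1e-6, max_iter=3000):
    """Slab-decomposed GS that communicates only every exchange_every sweeps."""
    m = n + 2
    phi = [0.0] * len(f)
    bounds = slab_bounds(n, parts)
    ghosts = exchange(phi, bounds, m)
    exchanges = 0
    for it in range(1, max_iter + 1):
        # Each slab writes only its own planes and reads ghosts for the rest,
        # so the serial order of slabs mimics simultaneous execution.
        for (lo, hi), ghost in zip(bounds, ghosts):
            local_sweep(phi, f, n, lo, hi, ghost)
        if it % exchange_every == 0:
            ghosts = exchange(phi, bounds, m)  # neighbour-to-neighbour messages
            exchanges += 1
            res = sum_sq_residual(phi, f, n)   # global reduction
            if res < tol:
                return phi, it, exchanges, res
    return phi, max_iter, exchanges, sum_sq_residual(phi, f, n)


if __name__ == "__main__":
    n = 8
    m = n + 2
    f = [0.0] * (m ** 3)
    f[idx(4, 4, 4, m)] = 1.0
    for k in (1, 5, 10):
        _, it, ex, res = solve_lazy_halo(f, n, parts=4, exchange_every=k)
        print(f"exchange every {k:2d}: {it} sweeps, {ex} exchanges, res={res:.1e}")
```

Larger `exchange_every` values trade extra local sweeps for fewer neighbour messages and fewer global reductions; whether that pays off depends on the relative cost of computation and communication on the target machine.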
(Abstract Control Number: 147)