Very large Markov chains often arise from stochastic models of complex real-life systems. In this paper we investigate disk-based techniques for solving such Markov chains on a distributed-memory parallel computer. We focus on two scalable numerical methods, namely the Jacobi and Conjugate Gradient Squared (CGS) algorithms.
The critical bottleneck in these methods is the parallel sparse matrix-vector multiply operation. By exploiting the data locality typically found in automatically generated state-transition-rate matrices, we develop an efficient matrix-vector multiply kernel that is characterised by low per-processor memory usage, low communication cost, and good load balance. At a slightly higher memory cost, the kernel also allows for the overlapping of communication and computation.
We describe a distributed software architecture which makes use of two processes per node to allow for the overlapping of disk I/O with computation and communication. By embedding our matrix-vector multiply kernel in this architecture, we implement a high-performance disk-based solver on a 16-node Fujitsu AP3000 computer. We demonstrate results showing good speedups and the capability to analyse very large models with over 50 million states and over 500 million transitions.
Information from pubs.doc.ic.ac.uk/distributed-disk-markov-nsmc.