Near-optimal decomposition is central to the efficient solution of numerous problems in scientific computing and computer-aided design. In particular, intelligent a priori partitioning of input data can greatly improve the runtime and scalability of large-scale parallel computations. Discrete data structures such as graphs and hypergraphs are used to formalise such partitioning problems, with hypergraphs typically preferred for their greater expressiveness. Optimal graph and hypergraph partitioning are NP-complete problems; however, serial heuristic algorithms that run in low-order polynomial time have been studied extensively and good tool support exists. Yet, to date, only graph partitioning algorithms have been parallelised.
This thesis presents the first parallel hypergraph partitioning algorithms, enabling both partitioning of much larger hypergraphs, and computation of partitions with significantly reduced runtimes. In the multilevel approach which we adopt, the coarsening and refinement phases are performed in parallel while the initial partitioning phase is computed serially. During coarsening and refinement, a two-phase approach overcomes concurrency issues involved in distributing the serial algorithms, while a hash-based data distribution scheme maintains load balance. A theoretical analysis demonstrates our algorithms' asymptotic scalability.
The algorithms are implemented in the Parkway tool. Experiments on hypergraphs from several application domains validate our algorithms and scalability model in two ways. Very large hypergraphs (10^8 vertices) are partitioned with consistent improvements in partition quality (of up to 60%) over an approximate approach that uses a state-of-the-art parallel graph partitioning tool. The algorithms also exhibit good empirical scalability and speedups are observed over serial hypergraph partitioning tools, while maintaining competitive partition quality. An application case study of parallel PageRank computation is presented. Applying hypergraph models for one- and two-dimensional sparse matrix decomposition on a number of publicly-available web graphs results in a significant reduction in communication overhead and a halving of per-iteration runtime.
Information from pubs.doc.ic.ac.uk/atthesis.