CFL (Communication Fusion Library) is an experimental C++ library that supports shared reduction variables in MPI programs. It uses overloading to distinguish private variables from replicated, shared variables, and automatically introduces MPI communication to keep replicated data consistent. This paper concerns a simple but surprisingly effective technique that improves performance substantially: CFL operators are executed lazily in order to expose opportunities for run-time, context-dependent optimisations such as message aggregation and operator fusion. We evaluate the idea using both toy benchmarks and a 'production' code for simulating plankton population dynamics in the upper ocean. The results demonstrate the library's software engineering benefits, and show that performance close to that of manually optimised code can be achieved automatically in many cases.
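The lazy-execution idea can be illustrated with a minimal C++ sketch. It is not CFL's actual interface: the names `SimulatedComm`, `SharedDouble`, `LazyReductions` and `demo_collective_calls` are hypothetical, and a single-process stand-in that counts collective calls takes the place of MPI. The sketch assumes that each reduction is only recorded when it is issued, and that all pending reductions are packed into one collective operation when a shared value is first read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MPI: simulates all ranks in one process and
// counts collective calls so the effect of aggregation is visible.
struct SimulatedComm {
    int collective_calls = 0;

    // Element-wise sum over ranks: per_rank[r][i] is rank r's i-th contribution.
    std::vector<double> allreduce_sum(const std::vector<std::vector<double>>& per_rank) {
        ++collective_calls;
        std::vector<double> result(per_rank.empty() ? 0 : per_rank.front().size(), 0.0);
        for (const auto& contributions : per_rank)
            for (std::size_t i = 0; i < contributions.size(); ++i)
                result[i] += contributions[i];
        return result;
    }
};

class LazyReductions;

// Hypothetical shared (replicated) variable. Reading it forces any pending
// reductions, so the value returned by get() is always consistent.
class SharedDouble {
public:
    explicit SharedDouble(LazyReductions& owner) : owner_(owner) {}
    double get();

private:
    friend class LazyReductions;
    LazyReductions& owner_;
    double value_ = 0.0;
};

// Queue of deferred sum-reductions (an assumption of this sketch, not CFL's design).
class LazyReductions {
public:
    LazyReductions(SimulatedComm& comm, std::size_t ranks) : comm_(comm), packed_(ranks) {}

    // Record a sum-reduction into target; no communication happens yet.
    void add(SharedDouble& target, const std::vector<double>& per_rank_values) {
        assert(per_rank_values.size() == packed_.size());
        targets_.push_back(&target);
        for (std::size_t r = 0; r < packed_.size(); ++r)
            packed_[r].push_back(per_rank_values[r]);
    }

    // Message aggregation: one collective carries every pending reduction.
    void flush() {
        if (targets_.empty()) return;
        const std::vector<double> sums = comm_.allreduce_sum(packed_);
        for (std::size_t i = 0; i < targets_.size(); ++i)
            targets_[i]->value_ += sums[i];
        targets_.clear();
        for (auto& buffer : packed_) buffer.clear();
    }

private:
    SimulatedComm& comm_;
    std::vector<std::vector<double>> packed_;  // packed_[r] = rank r's queued values
    std::vector<SharedDouble*> targets_;       // destination of each queued value
};

inline double SharedDouble::get() {
    owner_.flush();
    return value_;
}

// Usage: two reductions over three simulated ranks, then two reads.
// Returns the number of collective calls that were needed.
int demo_collective_calls() {
    SimulatedComm comm;
    LazyReductions pending(comm, 3);
    SharedDouble mass(pending), energy(pending);
    pending.add(mass, {1.0, 2.0, 3.0});
    pending.add(energy, {10.0, 20.0, 30.0});
    const double total = mass.get() + energy.get();  // first read triggers the flush
    (void)total;
    return comm.collective_calls;
}
```

An eager implementation would issue one collective per `add` call. Deferring the work until the first read lets the two reductions in `demo_collective_calls` share a single collective, which is the kind of run-time aggregation the abstract describes.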
Information from pubs.doc.ic.ac.uk/Reduction-variables.