AESOP home


Scalable Performance Analysis of Massively Parallel Stochastic Systems

Richard Hayden

PhD Thesis
Imperial College London
March, 2011

The accurate performance analysis of large-scale computer and communication systems is directly inhibited by an exponential growth in the state-space of the underlying Markovian performance model. This is particularly true when considering massively-parallel architectures such as cloud or grid computing infrastructures. Nevertheless, an ability to extract quantitative performance measures such as passage-time distributions from performance models of these systems is critical for providers of these services. Indeed, without such an ability, they remain unable to offer realistic end-to-end service level agreements (SLAs) which they can have any confidence of honouring. Additionally, this must be possible in a short enough period of time to allow many different parameter combinations in a complex system to be tested. If we can achieve this rapid performance analysis goal, it will enable service providers and engineers to determine the cost-optimal behaviour which satisfies the SLAs.

In this thesis, we develop a scalable performance analysis framework for the grouped PEPA stochastic process algebra. Our approach is based on the approximation of key model quantities such as means and variances by tractable systems of ordinary differential equations (ODEs). Crucially, the size of these systems of ODEs is independent of the number of interacting entities within the model, making these analysis techniques extremely scalable. The reliability of our approach is directly supported by convergence results and, in some cases, explicit error bounds.

We focus on extracting passage-time measures from performance models since these are very commonly the language in which a service level agreement is phrased. We design scalable analysis techniques which can handle passages defined both in terms of entire component populations as well as individual or tagged members of a large population.

A precise and straightforward specification of a passage-time service level agreement is as important to the performance engineering process as its evaluation. This is especially true of large and complex models of industrial-scale systems. To address this, we introduce the unified stochastic probe framework. Unified stochastic probes are used to generate a model augmentation which exposes explicitly the SLA measure of interest to the analysis toolkit. In this thesis, we deploy these probes to define many detailed and derived performance measures that can be automatically and directly analysed using rapid ODE techniques. In this way, we tackle applicable problems at many levels of the performance engineering process: from specification and model representation to efficient and scalable analysis.


This thesis was a finalist in the 2012 INFORMS Doctoral Dissertation Award for Operations Research in Telecommunications - - and a runner-up in the CPHC/BCS Distinguished Dissertation awards in October 2012 -

PDF of full publication (2.8 megabytes)
(need help viewing PDF files?)

Information from