- PhD Thesis
- Imperial College London
- December, 2009
- Abstract
Performance analysis calculations, for models of any complexity, require a distributed computation effort that can easily occupy a large compute cluster for many days. Producing a simple steady-state measure involves an enormous dominant eigenvector calculation, with even modest performance models having upwards of 10

^{12}variables. Computations such as passage-time analysis are an order of magnitude more difficult, producing many hundreds of repeated linear system calculations. As models describe greater concurrency, so the state space of the model increases and with it the magnitude of any performance analysis problem that may be being attempted as well.The PageRank algorithm is used by Google to measure the relative importance of web pages. It does this by formulating and solving a similarly enormous dominant eigenvector problem, with one variable for every page on the web. As with performance problems, as the number of web pages grows, so the size of the underlying system calculation grows also. With the number of web pages currently estimated to exceed 1 trillion, the PageRank problem requires many thousands of computers running concurrently over many different clusters.

Both problems share the same underlying mathematical type and also the same requirement to run effectively on large distributed clusters. Traditional iterative solution methods scale poorly over large distributed architectures. This is because of the inherent requirement to communicate and synchronise at every iteration step.

While asynchronous iterative methods have been around since the 1950s, they have, as yet, not been applied to dominant eigenvector problems without some form of restriction. These methods have been shown to be very successful in other contexts when implemented across large distributed architectures.

According to the current state of the art in asynchronous techniques, application to dominant eigenvector problems requires a fixed bound on how and when updates can happen, and thus effectively a bound on the asynchronous communication itself. In this thesis, we show how to apply asynchronous iterative methods to dominant eigenvector problems without any such restrictions. We do this by showing how to map homogeneous, singular linear systems to inhomogeneous, non-singular linear systems which share the same solution.

We present a single asynchronous iterative solution framework for performance analysis problems. We also present three particular solution algorithms. We demonstrate analytically and empirically that asynchronous iterative methods offer significant advantages over traditional synchronous solution methods.

We use the theoretical tools which we introduce in this thesis to reduce the complexity of the PageRank problem, limiting the ever-increasing impact of dangling web pages. We generate a smaller, sparser problem which may be solved using asynchronous iterative methods.

- PDF of full publication (1.3 megabytes)
- (need help viewing PDF files?)

Information from pubs.doc.ic.ac.uk/asynchronous-iterations-pagerank.