Douglas de Jager, Jeremy T. Bradley

- Conference or Workshop Paper
- ICTIR'09, 2nd International Conference on the Theory of Information Retrieval
- September, 2009
- Lecture Notes in Computer Science
- Volume 5766
- pp.17–28
- Springer Verlag
- **DOI** 10.1007/978-3-642-04417-5_3
- Abstract
The PageRank algorithm is used today within web information retrieval to provide a content-neutral ranking metric over web pages. It employs power method iterations to solve for the steady-state vector of a discrete-time Markov chain (DTMC). The defining one-step probability transition matrix of this DTMC is derived from the hyperlink structure of the web and a model of web surfing behaviour which accounts for user bookmarks and memorised URLs.
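As a rough illustration of the power method iterations mentioned above, the following is a minimal sketch and not the authors' implementation. The function name `pagerank`, the damping factor `alpha = 0.85`, the uniform teleport vector, and the choice to let dangling pages jump according to that same vector are all assumptions made for the example. It also assumes every link target appears as a key of `links`.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power-method sketch. `links` maps each page to the list of pages it links to."""
    pages = sorted(links)
    n = len(pages)
    index = {p: i for i, p in enumerate(pages)}
    v = [1.0 / n] * n  # teleport (bookmark / typed-URL) vector; uniform is an assumption
    x = [1.0 / n] * n  # initial distribution over pages
    for _ in range(max_iter):
        new = [0.0] * n
        dangling_mass = 0.0
        for p, outs in links.items():
            i = index[p]
            if outs:
                # Follow a hyperlink: split this page's mass evenly over its out-links.
                share = x[i] / len(outs)
                for q in outs:
                    new[index[q]] += alpha * share
            else:
                # Page without outgoing links: its mass is redistributed below.
                dangling_mass += x[i]
        # Dangling mass and teleportation are both spread according to v.
        for j in range(n):
            new[j] += (alpha * dangling_mass + (1.0 - alpha)) * v[j]
        converged = sum(abs(a - b) for a, b in zip(new, x)) < tol
        x = new
        if converged:
            break
    return dict(zip(pages, x))


ranks = pagerank({"a": ["b"], "b": ["a"], "c": []})
```

Here page `c` has no outgoing links, so it plays the role of a dangling page. The returned values sum to one, since each iteration preserves the total probability mass.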

In this paper we look to provide a more accessible, more broadly applicable explanation than has been given in the literature of how to make PageRank calculation more tractable through removal of the dangling-page matrix. This allows web pages without outgoing links to be removed before we employ power method iterations. It also allows decomposition of the problem according to irreducible subcomponents of the original transition matrix. Our explanation also covers a PageRank extension to accommodate TrustRank. In setting out our alternative explanation, we introduce and apply a general linear algebraic theorem which allows us to map homogeneous singular linear systems of index one to inhomogeneous non-singular linear systems with a shared solution vector. As an aside, we show in this paper that irreducibility is not required for PageRank to be well-defined.
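To make the idea of trading a homogeneous singular system for an inhomogeneous non-singular one more concrete, the standard special case from the PageRank literature can be written out as below. This is only a sketch under stated assumptions and is not the paper's general theorem. The symbols are assumptions for the example: $H$ is the raw hyperlink matrix with zero rows for dangling pages, $d$ is the indicator vector of dangling pages, $v$ is the personalisation vector with $v^{T} e = 1$, $e$ is the all-ones vector, $0 < \alpha < 1$, and dangling pages are assumed to jump according to $v$.

```latex
% Line 1: the homogeneous singular system (pi is a left eigenvector for eigenvalue 1).
% Line 2: moving alpha*H to the left leaves a positive scalar multiple of v^T.
% Line 3: the inhomogeneous non-singular system, followed by normalisation.
\begin{align*}
\pi^{T} &= \pi^{T}\bigl(\alpha H + \alpha\, d v^{T} + (1-\alpha)\, e v^{T}\bigr),
  \qquad \pi^{T} e = 1 \\
\pi^{T}(I - \alpha H) &= \bigl(\alpha\, \pi^{T} d + 1 - \alpha\bigr)\, v^{T}
  = \gamma\, v^{T}, \qquad \gamma > 0 \\
x^{T}(I - \alpha H) &= v^{T}, \qquad \pi = \frac{x}{\lVert x \rVert_{1}}
\end{align*}
```

Because every row sum of $\alpha H$ is at most $\alpha < 1$, the matrix $I - \alpha H$ is non-singular, so the last system has a unique solution $x$, and that solution is proportional to $\pi$. The dangling-page term $d v^{T}$ no longer appears in the system being solved.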

- PDF of full publication (121.5 kilobytes)
- PDF of presentation slides (1.1 megabytes)

Information from pubs.doc.ic.ac.uk/pagerank-explanation.