Project Title: Random Walking on the World-Wide Web


Timely and accurate statistics about web pages are becoming increasingly important for academic and commercial use. Some relevant questions are: What percentage of web pages are in the .com domain? How many pages are indexed by a particular search engine? What is the distribution of sizes, modification times, and content of web pages?

To answer these questions, we need to estimate the size of certain sets of web pages. In this project we implemented and ran a "webwalker", which samples web pages using a random walk on the web. The random walk surfs the web "at random": at each step it either follows a random link out of the current page, or follows "in reverse" a random link that points to the current page.
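
The walk described above can be illustrated with a minimal sketch on a small in-memory link graph. This is not the project's webwalker: the graph TOY_LINKS, the helper names, and the rule of choosing uniformly among all outgoing and incoming links are assumptions made for illustration. A real walker would also have to fetch pages and discover incoming links by some external means, which is not modeled here.

```python
import random
from collections import defaultdict

# Hypothetical toy link graph (an assumption for illustration):
# each page maps to the list of pages it links to.
TOY_LINKS = {
    "a.com": ["b.com", "c.org"],
    "b.com": ["c.org"],
    "c.org": ["a.com"],
    "d.edu": ["a.com"],
}


def build_in_links(out_links):
    """Invert the graph: map each page to the pages that link to it."""
    in_links = defaultdict(list)
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)
    return in_links


def random_walk(out_links, start, steps, rng=None):
    """Walk the link graph, treating every link as traversable both ways.

    At each step the walker picks uniformly among the current page's
    outgoing links and its incoming links (followed in reverse).
    The uniform choice is an assumption; the text does not specify it.
    """
    rng = rng or random.Random()
    in_links = build_in_links(out_links)
    current = start
    visited = [current]
    for _ in range(steps):
        neighbors = list(out_links.get(current, [])) + list(in_links.get(current, []))
        if not neighbors:
            break  # isolated page: nowhere to go
        current = rng.choice(neighbors)
        visited.append(current)
    return visited


if __name__ == "__main__":
    print(random_walk(TOY_LINKS, "a.com", 10, random.Random(0)))
```

Note that a simple walk of this kind visits well-connected pages more often than poorly connected ones, so the list of visited pages is not a uniform sample as it stands.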

