Timely and accurate statistics about
web pages are becoming increasingly important for academic and commercial use.
Some relevant questions are: What percentage of web pages are in the .com
domain? How many pages are indexed by a particular search engine? What is the
distribution of sizes, modification times, and content of web pages?
In order to answer these questions we need to
estimate the size of certain sets of web pages. In this project we implemented and
ran a "webwalker", a technique for sampling web pages using a random walk on
the web. The random walk surfs the web "at random". At each step it
either follows a random link from the current page, or follows a random link
that enters the current page "in reverse".
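The step rule above can be sketched as follows. This is a minimal illustrative sketch, not the project's actual implementation: it assumes the link graph is already available as adjacency lists (`out_links` for ordinary links and `in_links` for links entering a page), whereas a real webwalker would discover out-links by fetching pages and in-links by querying a search engine's link index.

```python
import random

def random_walk(out_links, in_links, start, steps, seed=None):
    """Surf the link graph "at random": at each step move to a page
    reached by following a random out-link of the current page, or a
    random in-link "in reverse". Returns the sequence of visited pages."""
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(steps):
        # Candidate moves: forward along an out-link, or backward along an in-link.
        neighbors = out_links.get(current, []) + in_links.get(current, [])
        if not neighbors:
            break  # dead end: no links in either direction
        current = rng.choice(neighbors)
        visited.append(current)
    return visited
```

Following in-links as well as out-links makes the walk's underlying graph undirected, which helps it avoid getting trapped in pages with few or no outgoing links.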