Research And Markets

SharpSpider: A Continuous, Parallel and Distributed Spider. Edition No. 1

  • ID: 1913049
  • July 2009
  • 160 Pages
  • VDM Publishing House

Search engines have become so indispensable that
they rank second only to e-mail as the most popular
online activity. To respond to queries in a timely
fashion, search engines make use of large indices of
word occurrences on Web pages to cross-reference
websites to keywords. Such indices are maintained by
spiders, a special kind of computer program that
browses the Web autonomously. However, due to a
variety of technological limitations, a single
spider has proven insufficient to maintain a search
engine's index. In this book, we therefore review
several ways of splitting a spider's work across
multiple processes, and define a methodology for
keeping an index of the Web up to date.
SharpSpider, our prototype spider, has been
evaluated using the resources of PlanetLab, a
globally distributed platform for developing and
deploying planetary-scale services. Despite using
very modest equipment, we have
performed large crawls of the Web, distributing the
workload amongst various computers spread across
different continents. The statistics derived from
our research offer valuable insight into the nature
of educational Web resources.

Note: Product cover images may vary from those shown

Marco Palomino
After completing his PhD in Computer Science at the University
of Cambridge, Marco Palomino worked as a software consultant in
London, and then joined the Information Retrieval Group of the
University of Sunderland in 2007. Currently, Marco works as a
research associate, and his work focuses on the automatic
indexing of multimedia collections.
