This paper evaluates scalable distributed crawling by means of geographical partitioning of the Web. The approach assumes multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones. The work considers a distributed crawler in which the assignment of pages to visit is based on the geographical scope of page content. For the initial assignment of a page to a partition, we use a simple heuristic that places a page within the scope of its hosting web server's geographical location. During download, if analysis of a page's contents suggests a different geographical scope, the page is forwarded to the appropriately located server. A sample of the Portuguese We...
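The two-step assignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the zone table, the IP-prefix lookup, and the place-name markers are all hypothetical stand-ins for whatever geographic databases and content analysis the real crawler would use.

```python
# Hypothetical zone table: IP prefix of the hosting server -> geographic zone.
SERVER_ZONES = {
    "193.136.": "north",
    "193.137.": "center",
    "194.65.": "south",
}

def initial_zone(server_ip: str) -> str:
    """Initial heuristic: a page inherits the zone of its hosting web server."""
    for prefix, zone in SERVER_ZONES.items():
        if server_ip.startswith(prefix):
            return zone
    return "unknown"

def content_zone(page_text: str):
    """Toy content analysis: infer a scope from zone-specific place names."""
    markers = {"Porto": "north", "Coimbra": "center", "Faro": "south"}
    for marker, zone in markers.items():
        if marker in page_text:
            return zone
    return None  # no geographic evidence; keep the initial assignment

def assign(server_ip: str, page_text: str) -> str:
    """Assign by server location first; reassign if content analysis disagrees."""
    zone = initial_zone(server_ip)
    refined = content_zone(page_text)
    if refined is not None and refined != zone:
        zone = refined  # forward the page to the better-located crawler
    return zone
```

A page hosted on a northern server but whose text mentions only Faro would be reassigned to the southern partition, mirroring the forwarding step in the abstract.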
Distributed crawling has shown that it can overcome important limitations of today's crawling pa...
The need to quickly locate, gather, and store the vast amount of material in the Web necessitates pa...
A Web crawler visits websites for the purpose of indexing. The dynamic nature of today's web makes the...
This paper presents a multi-objective approach to Web space partitioning, aimed at improving distribut...
A collaborative crawler is a group of crawling nodes, in which each crawling node is responsible for...
Single crawlers are no longer sufficient to run on the web efficiently, as the explosive growth of the we...
We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we a...
In this paper, we present the design and implementation of a distributed web crawler. We begin by mo...
Parallel web crawling is an important technique employed by large-scale search engines for content a...
Web page crawlers are an essential component in a number of Web applications. The sheer size of the ...
Web crawlers visit internet applications, collect data, and learn about new web pages from visited p...