This paper presents a multi-objective approach to Web space partitioning, aimed at improving distributed crawling efficiency. The investigation is supported by the construction of two different weighted graphs. The first is used to model the topological communication infrastructure between crawlers and Web servers, and the second is used to represent the amount of link connections between servers' pages. The graph edge weights represent, respectively, computed RTTs and page links between nodes. The two graphs are further combined, using a multi-objective partitioning algorithm, to support Web space partitioning and load allocation for an adaptable number of geographically distributed crawlers. Partitioning strategies were...
In this report we will outline the relevant background research, the design, the implementation an...
Abstract—With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring cont...
A Web crawler visits websites for the purpose of indexing. The dynamic nature of today’s web makes the...
This paper evaluates scalable distributed crawling by means of the geographical partition of the Web...
Parallel web crawling is an important technique employed by large-scale search engines for content a...
Single crawlers are no longer sufficient to run on the web efficiently, as the explosive growth of the we...
The need to quickly locate, gather, and store the vast amount of material in the Web necessitates pa...
In this paper, we present the design and implementation of a distributed web crawler. We begin by mo...
A collaborative crawler is a group of crawling nodes, in which each crawling node is responsible for...
Web crawlers visit internet applications, collect data, and learn about new web pages from visited p...
Web page crawlers are an essential component in a number of Web applications. The sheer size of the ...
We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we a...